Project Description
Acquisition of Communication and Recognition Skills -- ACORNS
ACORNS is a three-year project funded as part of the Future and Emerging Technologies programme, within the Information Society Technologies thematic priority of the 6th Framework Programme of the European Union.

Starting date: 1 December 2006
Ending date: 30 November 2009
This page gives a summary of the original project proposal. Please go to the Progress and Document pages for up-to-date information about the progress and status of the project.
The ACORNS project intends to develop, implement and test the mathematical models and computational mechanisms needed to create an artificial agent capable of acquiring human verbal communication behaviour, based on recent findings related to emergence as well as the memory-prediction theory of intelligence, popularised in the 2004 book ‘On Intelligence’ by Jeff Hawkins and Sandra Blakeslee. Unlike conventional pattern recognizers, the ACORNS agent will learn dynamic emergent patterns of speech and non-speech sounds from rich and redundant representations of the acoustic input. Learning will be guided by the agent’s intention to fulfill basic needs, driving the acquisition of verbal and non-verbal communication skills in much the same way as a human toddler acquires language and communication skills.
To reach this goal, advances in understanding and technology development are needed in five closely interrelated areas:
1. representations of acoustic signals at multiple parallel temporal and spectral resolutions (see the sketch after this list);
2. methods for patterning signals into coherent structures that correspond to potentially meaningful acoustic events;
3. methods for building and maintaining dynamic emergent patterns in memory;
4. methods for searching in an associative memory; and
5. methods for handling natural interaction between an artificial agent and a human, including verbal and non-verbal (affective) information.
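To make area 1 concrete, the sketch below computes magnitude spectrograms of a single signal at several temporal resolutions, so that fine temporal detail and fine spectral detail are available in parallel. The window lengths and the use of plain STFT magnitudes are illustrative assumptions for this page, not the representations actually developed in the project.

```python
# Minimal sketch: one spectrogram per temporal resolution (illustrative
# window lengths; not the actual ACORNS front-end).
import numpy as np
from scipy.signal import stft

def multiresolution_spectra(signal, sample_rate, window_lengths_ms=(5, 25, 100)):
    """Return a magnitude spectrogram for each analysis window length.

    Short windows resolve rapid events (e.g. plosive bursts); long
    windows resolve spectral detail (e.g. vowel formants).
    """
    spectra = {}
    for win_ms in window_lengths_ms:
        nperseg = int(sample_rate * win_ms / 1000)
        _, _, Z = stft(signal, fs=sample_rate, nperseg=nperseg,
                       noverlap=nperseg // 2)
        spectra[win_ms] = np.abs(Z)
    return spectra

# Example with one second of synthetic audio at 16 kHz.
audio = np.random.randn(16000)
for win_ms, S in multiresolution_spectra(audio, 16000).items():
    print(f"{win_ms} ms window: {S.shape[0]} frequency bins x {S.shape[1]} frames")
```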
These five issues are being addressed in five closely intertwined work packages, which form the core of the ACORNS project. Figure 1 shows a graphical representation of the structure of the project. There are two additional work packages, one devoted to project management and another to dissemination, both of which will be coordinated by the project manager Els den Os.
Fig. 1 Structure of the ACORNS project.
ACORNS sets out to prove that the memory-prediction theory can account for the acquisition of communication skills and language if a suitably designed system is fed with a combination of speech and visual input.

Reactions of the learning agent to the inputs will be guided by the innate intention to maximise the appreciation it receives from its ‘caretaker’. To that end, the ‘caretaker’ will give feedback to the system according to the appropriateness of its latest response. Thus, the learning system will always receive two types of input: a multimodal message and feedback on its response to the original multimodal message. Depending on the type of feedback, the learning regime can be varied from unsupervised (no meaningful feedback at all) to tightly supervised learning. In ACORNS we will use lightly supervised learning, meaning that there will be some kind of feedback following most of the input-response pairs, but that the feedback will be quite general in nature. Initially, both the system’s responses and the caretaker’s feedback will be symbolic. Symbolic feedback implies that the system does not need to apply error-prone processing to interpret the meaning of the feedback. Rather, it will be possible to interpret the feedback unambiguously on a scale from neutral to positive. From the second year onward, part of the caretaker’s feedback will be in the form of speech, requiring the learning system to estimate the affective meaning of the feedback from the prosody. This will complicate the learning situation.
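The sketch below illustrates this lightly supervised interaction loop. The toy agent, the association-strength update, and the scene vocabulary are hypothetical stand-ins invented for this page; only the loop structure (a multimodal message, a response, and symbolic feedback on a scale from neutral to positive) follows the description above.

```python
# Toy sketch of the lightly supervised loop: multimodal message in,
# response out, symbolic feedback back. All names are hypothetical.
import random

class ToyAgent:
    """Associates utterances with visual tags; feedback reinforces links."""
    def __init__(self):
        self.memory = {}  # utterance -> {visual tag: association strength}

    def respond(self, utterance, visual_tags):
        scores = self.memory.get(utterance, {})
        # Answer with the strongest learned association, or guess among
        # the objects currently present in the scene.
        return max(scores, key=scores.get) if scores else random.choice(visual_tags)

    def learn(self, utterance, response, feedback):
        # Symbolic feedback needs no error-prone interpretation: it is a
        # value on a scale from neutral (0.0) to positive (1.0).
        strengths = self.memory.setdefault(utterance, {})
        strengths[response] = strengths.get(response, 0.0) + feedback

def caretaker_feedback(response, intended_referent):
    # Appropriateness of the latest response; neutral, never negative.
    return 1.0 if response == intended_referent else 0.0

agent = ToyAgent()
for _ in range(20):
    scene = ["ball", "dog"]                    # objects (virtually) present
    referent = random.choice(scene)
    utterance = f"look at the {referent}"      # input 1: multimodal message
    response = agent.respond(utterance, scene)
    feedback = caretaker_feedback(response, referent)
    agent.learn(utterance, response, feedback) # input 2: feedback on response
print(agent.memory)
```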
At the end of the first year the system must be able to build internal representations of some 10 words that are not too difficult to distinguish acoustically and that will be produced by four speakers. However, it must be able to handle somewhat similar words such as ‘papa’ and ‘mamma’. The system must be able to form and access these representations from continuous speech input, although the input utterances will have the characteristics of ‘parentese’, i.e. the somewhat exaggerated type of speech that is often used to address babies.
At the end of the second year the learning system must be able to learn a vocabulary of 50 words, starting from its language proficiency at the end of the first year. The words will still refer to concrete objects and events. At the same time the system must be able to distinguish ten speakers (both male and female) and to recognise the words spoken by these speakers. The focus will be on learning words for concrete objects, such as animals, fruits and furniture, since concrete objects facilitate the coupling of speech and visual inputs.
At the end of the third year the system must have extended its vocabulary to some 250 words, including adjectives and present-tense forms of simple action verbs. The most important additional communicative skills and functionalities include the capability to understand arbitrary speakers and to acquire additional words on the basis of a small number of ‘training’ tokens. If new words occur without a non-verbal context that allows establishing links between the words and real-world entities, only acoustic representations will be formed. If new words can be linked to visual information, these links must be established automatically. Examples of word-concept combinations that the system should be able to learn from a small number of inputs include colour names, spatial relations and adjectives referring to size.
In addition to fundamental scientific knowledge, ACORNS will also produce novel techniques that will be integrated into, and tested in, more conventional systems that perform pattern recognition and human-machine interaction. By doing so, ACORNS remedies important weaknesses in today's state-of-the-art speech recognition and dialogue systems.
It is intended that ACORNS will provide a radically new approach to the creation of artificial systems capable of human-like communicative behaviour.
In the first year of the project we intended to show that an artificial agent is able to discover structure in child-directed speech signals, build internal representations of the acoustic signals in memory, and link the acoustic representations to a small number of physical objects that are (virtually) present in the scene while a corresponding utterance is spoken. These targets have been reached, despite the fact that some partners were only able to put together a complete team in the second half of the year. Thanks to intensive and effective collaboration between the partners, we have been able to build a platform for conducting learning experiments, to integrate into the platform the module for conventional feature extraction from WP1 and a module for information discovery from WP4, and to conduct experiments. The design and results of the experiments show how the pattern discovery techniques under development in WP2 can be integrated in the platform now that initial software modules are available. It has also become clear how the results of the experiments performed so far can be mapped onto the memory architecture under development in WP3.
The focus of the research in the next year will be on the development of novel features in the acoustic pre-processing, on further development of techniques for structure discovery and information discovery, and on a tighter integration of these modules in the platform for conducting learning and interaction experiments. The learning task will become more complicated, in that the artificial agent must be able to handle a larger number (±50) of concepts (not only nouns, but also verbs and adjectives), to discover multiple semantic units in an utterance, and to build internal representations that can be linked both to the acoustic signals and to the semantic/pragmatic value of an utterance, and that can be used as a stepping stone for learning additional words and concepts. We will also investigate whether the emergent internal representations can explain the loss of sensitivity to non-native phonetic contrasts that is observed in babies after their first birthday. Additional success criteria will be defined in the first project meeting, which will take place in March 2008.
Last updated: 6 January 2009. Please contact Els den Os with any comments, complaints, or reports of broken links.