Project Description
Acquisition of Communication and Recognition Skills -- ACORNS
ACORNS is a three-year project funded as part of the Future and Emerging Technologies programme, within the Information Society Technologies thematic priority of the 6th Framework Programme of the European Union.

Starting date: 1 December 2006
Ending date: 30 November 2009
This page gives a summary of the original project proposal. Please go to the Progress and Document pages for up-to-date information about the progress and status of the project.
The ACORNS project intends to develop, implement and test the mathematical models and computational mechanisms needed to create an artificial agent capable of acquiring human verbal communication behaviour, based on recent findings related to emergence as well as the memory-prediction theory of intelligence, popularised in the 2004 book ‘On Intelligence’ by Jeff Hawkins and Sandra Blakeslee. Unlike conventional pattern recognizers, the ACORNS agent will learn dynamic emergent patterns of speech and non-speech sounds from rich and redundant representations of the acoustic input. Learning will be guided by the agent’s intention to fulfill basic needs, driving the acquisition of verbal and non-verbal communication skills in much the same way as a human toddler acquires language and communication skills.
To reach this goal, advances in understanding and technology development are needed in five closely interrelated areas:
1. representations of acoustic signals at multiple parallel temporal and spectral resolutions (see the sketch after this list);
2. methods for patterning signals into coherent structures that correspond to potentially meaningful acoustic events;
3. methods for building and maintaining dynamic emergent patterns in memory;
4. methods for searching in an associative memory; and
5. methods for handling natural interaction between an artificial agent and a human, including verbal and non-verbal (affective) information.
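To make area 1 concrete, the sketch below computes magnitude spectrograms of a single signal at several temporal resolutions, so that fine temporal detail and fine spectral detail are available in parallel. The window lengths and the use of plain STFT magnitudes are illustrative assumptions for this page, not the representations actually developed in the project.

```python
# Minimal sketch: one spectrogram per temporal resolution (illustrative
# window lengths; not the actual ACORNS front-end).
import numpy as np
from scipy.signal import stft

def multiresolution_spectra(signal, sample_rate, window_lengths_ms=(5, 25, 100)):
    """Return a magnitude spectrogram for each analysis window length.

    Short windows resolve rapid events (e.g. plosive bursts); long
    windows resolve spectral detail (e.g. vowel formants).
    """
    spectra = {}
    for win_ms in window_lengths_ms:
        nperseg = int(sample_rate * win_ms / 1000)
        _, _, Z = stft(signal, fs=sample_rate, nperseg=nperseg,
                       noverlap=nperseg // 2)
        spectra[win_ms] = np.abs(Z)
    return spectra

# Example with one second of synthetic audio at 16 kHz.
audio = np.random.randn(16000)
for win_ms, S in multiresolution_spectra(audio, 16000).items():
    print(f"{win_ms} ms window: {S.shape[0]} frequency bins x {S.shape[1]} frames")
```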
These five issues are being addressed in five closely intertwined work packages, which form the core of the ACORNS project. Figure 1 shows a graphical representation of the structure of the project. There are two additional work packages, one devoted to project management and another to dissemination, both of which will be coordinated by the project manager Els den Os.
Fig. 1 Structure of the ACORNS project.
ACORNS sets out to prove that the memory-prediction theory can account for the acquisition of communication skills and language if a suitably designed system is fed with a combination of speech and visual input.

Reactions of the learning agent to the inputs will be guided by the innate intention to maximise the appreciation it receives from its ‘caretaker’. To that end, the ‘caretaker’ will give feedback to the system according to the appropriateness of its latest response. Thus, the learning system will always receive two types of input: a multimodal message and feedback on its response to the original multimodal message. Depending on the type of feedback, the learning regime can be varied from unsupervised (no meaningful feedback at all) to tightly supervised learning. In ACORNS we will use lightly supervised learning, meaning that there will be some kind of feedback following most of the input-response pairs, but that the feedback will be quite general in nature. Initially, both the system’s responses and the caretaker’s feedback will be symbolic. Symbolic feedback implies that the system does not need to apply error-prone processing to interpret the meaning of the feedback. Rather, it will be possible to interpret the feedback unambiguously on a scale from neutral to positive. From the second year onward, part of the caretaker’s feedback will be in the form of speech, requiring the learning system to estimate the affective meaning of the feedback from the prosody. This will complicate the learning situation.
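The sketch below illustrates this lightly supervised interaction loop. The toy agent, the association-strength update, and the scene vocabulary are hypothetical stand-ins invented for this page; only the loop structure (a multimodal message, a response, and symbolic feedback on a scale from neutral to positive) follows the description above.

```python
# Toy sketch of the lightly supervised loop: multimodal message in,
# response out, symbolic feedback back. All names are hypothetical.
import random

class ToyAgent:
    """Associates utterances with visual tags; feedback reinforces links."""
    def __init__(self):
        self.memory = {}  # utterance -> {visual tag: association strength}

    def respond(self, utterance, visual_tags):
        scores = self.memory.get(utterance, {})
        # Answer with the strongest learned association, or guess among
        # the objects currently present in the scene.
        return max(scores, key=scores.get) if scores else random.choice(visual_tags)

    def learn(self, utterance, response, feedback):
        # Symbolic feedback needs no error-prone interpretation: it is a
        # value on a scale from neutral (0.0) to positive (1.0).
        strengths = self.memory.setdefault(utterance, {})
        strengths[response] = strengths.get(response, 0.0) + feedback

def caretaker_feedback(response, intended_referent):
    # Appropriateness of the latest response; neutral, never negative.
    return 1.0 if response == intended_referent else 0.0

agent = ToyAgent()
for _ in range(20):
    scene = ["ball", "dog"]                    # objects (virtually) present
    referent = random.choice(scene)
    utterance = f"look at the {referent}"      # input 1: multimodal message
    response = agent.respond(utterance, scene)
    feedback = caretaker_feedback(response, referent)
    agent.learn(utterance, response, feedback) # input 2: feedback on response
print(agent.memory)
```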
At the end of the first year the system must be able to build internal representations of some 10 words that are not too difficult to distinguish acoustically and that will be produced by four speakers. However, it must be able to handle somewhat similar words such as ‘papa’ and ‘mamma’. The system must be able to form and access these representations from continuous speech input, although the input utterances will have the characteristics of ‘parentese’, i.e. the somewhat exaggerated type of speech that is often used to address babies.
At the end of the second year the learning system must be able to learn a vocabulary of 50 words, starting from its language proficiency at the end of the first year. The words will still refer to concrete objects and events. At the same time the system must be able to distinguish ten speakers (both male and female) and to recognise the words spoken by these speakers. The focus will be on learning words for concrete objects, such as animals, fruits and furniture, since concrete objects facilitate the coupling of speech and visual inputs.
At the end of the third year the system must have extended its vocabulary to some 250 words, including adjectives and present-tense forms of simple action verbs. The most important additional communicative skills and functionalities include the capability to understand arbitrary speakers and to acquire additional words on the basis of a small number of ‘training’ tokens. If new words occur without a non-verbal context that allows establishing links between the words and real-world entities, only acoustic representations will be formed. If new words can be linked to visual information, these links must be established automatically. Examples of word-concept combinations that the system should be able to learn from a small number of inputs include colour names, spatial relations and adjectives referring to size.
In addition to fundamental scientific knowledge, ACORNS will also produce novel techniques that will be integrated into, and tested in, more conventional systems that perform pattern recognition and human-machine interaction. By doing so, ACORNS remedies important weaknesses in today's state-of-the-art speech recognition and dialogue systems.
It is intended that ACORNS will provide a radically new approach to the creation of artificial systems capable of human-like communicative behaviour.
In the first year of the project we intended to show that an artificial agent is able to discover structure in child-directed speech signals, build internal representations of the acoustic signals in memory, and link the acoustic representations to a small number of physical objects that are (virtually) present in the scene while a corresponding utterance is spoken. These targets have been reached, despite the fact that some partners were only able to put together a complete team in the second half of the year. Thanks to intensive and effective collaboration between the partners, we have been able to build a platform for conducting learning experiments, to integrate into the platform the module for conventional feature extraction from WP1 and a module for information discovery from WP4, and to conduct experiments. The design and results of the experiments show how the pattern discovery techniques under development in WP2 can be integrated in the platform now that initial software modules are available. It has also become clear how the results of the experiments performed so far can be mapped onto the memory architecture under development in WP3.
The focus of the research in the next year will be on the development of novel features in the acoustic pre-processing, on further development of techniques for structure discovery and information discovery, and on a tighter integration of these modules in the platform for conducting learning and interaction experiments. The learning task will become more complicated, in that the artificial agent must be able to handle a larger number (±50) of concepts (not only nouns, but also verbs and adjectives), to discover multiple semantic units in an utterance, and to build internal representations that can be linked both to the acoustic signals and to the semantic/pragmatic value of an utterance, and that can be used as a stepping stone for learning additional words and concepts. We will also investigate whether the emergent internal representations can explain the loss of sensitivity to non-native phonetic contrasts that is observed in babies after their first birthday. Additional success criteria will be defined in the first project meeting, which will take place in March 2008.
Last updated: 6 January 2009. Please contact Els den Os with any comments, complaints, or reports of broken links.