IS2007 special session
Special session at Interspeech 2007: STeB-ASR

Structure-Based and Template-Based Automatic Speech Recognition - Comparing parametric and non-parametric approaches

Although hidden Markov modeling (HMM) is the dominant technology for acoustic modeling in automatic speech recognition today, many of its weaknesses are well known and have become the focus of intensive research. One prominent weakness of current HMMs is their limited ability to represent long-span temporal dependency in the acoustic feature sequence of speech, which is nevertheless an essential property of speech dynamics. The main cause of this handicap is the conditional IID (independent and identically distributed) assumption inherent in the HMM formalism. Furthermore, the standard HMM approach focuses on verbal information. However, experiments have shown that non-verbal information also plays an important role in human speech recognition, which the HMM framework has not attempted to address directly. Numerous approaches have been taken over the past dozen years to address these weaknesses of HMMs. They can be broadly classified into the following two categories.

The first, parametric, structure-based approach establishes mathematical models for stochastic trajectories/segments of speech utterances using various forms of parametric characterization, including polynomials, linear dynamic systems, and nonlinear dynamic systems that embed the hidden structure of speech dynamics. In this parametric modeling framework, systematic speaker variation can also be satisfactorily handled. The essence of such a hidden-dynamic approach is that it exploits knowledge and mechanisms of human speech production to provide the structure of the multi-tiered stochastic process models. A specific layer in this type of model represents long-range temporal dependency in a parametric form.
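As a rough sketch of the contrast, the conditional independence assumption of a standard HMM and one simple linear dynamic alternative can be written as below. The notation is an illustrative assumption, not taken from the session papers: $o_t$ is the acoustic feature vector at frame $t$, $q_t$ the discrete HMM state, $x_t$ a continuous hidden state, $A$ and $C$ parameter matrices, and $w_t$, $v_t$ zero-mean Gaussian noise terms.

```latex
\begin{align}
\text{HMM:}\quad
  & p(o_{1:T} \mid q_{1:T}) = \prod_{t=1}^{T} p(o_t \mid q_t) \\
\text{Linear dynamic system:}\quad
  & x_t = A\,x_{t-1} + w_t, \qquad o_t = C\,x_t + v_t
\end{align}
```

In the first line each frame depends only on its own state, so no dependency between frames is modeled within a state. In the second, the hidden state $x_t$ carries information from frame to frame, which is one way a parametric model can represent temporal dependency across a segment.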

The second, non-parametric, template-based approach to overcoming the HMM weaknesses involves direct exploitation of speech feature trajectories (i.e., “templates”) in the training data without any modeling assumptions. Owing to the dramatic growth of speech databases and computer storage capacity available for training, as well as exponentially expanded computational power, non-parametric methods using the traditional pattern recognition techniques of kNN (k-nearest-neighbor decision rule) and DTW (dynamic time warping) have recently received substantial attention. Such template-based methods have also been called exemplar-based or data-driven techniques in the literature.
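The combination of DTW and kNN mentioned above can be illustrated with a minimal Python sketch. It is a toy example under stated assumptions, not the method of any particular session paper: the function names `dtw_distance` and `knn_classify` are hypothetical, templates are short lists of feature vectors with a label, the local distance is Euclidean, and no path constraints or pruning are applied.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Uses the basic step pattern (match, insertion, deletion) with a
    Euclidean local distance and no slope constraints.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def knn_classify(query, templates, k=3):
    """Label a query sequence by majority vote over its k nearest templates.

    `templates` is a list of (sequence, label) pairs (hypothetical format).
    """
    scored = sorted(
        ((dtw_distance(query, seq), label) for seq, label in templates),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy one-dimensional "feature trajectories" of different lengths.
    templates = [
        ([(0.0,), (1.0,), (2.0,)], "rise"),
        ([(0.0,), (0.0,), (1.0,), (2.0,)], "rise"),
        ([(2.0,), (1.0,), (0.0,)], "fall"),
        ([(2.0,), (2.0,), (1.0,), (0.0,)], "fall"),
    ]
    query = [(0.0,), (1.0,), (1.0,), (2.0,)]
    print(knn_classify(query, templates, k=3))
```

The sketch compares the query against every stored template, which is why storage capacity and computational power matter for this family of methods: cost grows with the number and length of the templates.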

The purpose of this special session is to bring together researchers with a special interest in novel techniques aimed at overcoming the weaknesses of HMMs for acoustic modeling in speech recognition. In particular, we plan to address issues related to the representation and exploitation of long-range temporal dependency in speech feature sequences, the incorporation of fine phonetic detail in speech recognition algorithms and systems, comparisons of the pros and cons of the parametric and non-parametric approaches, and the computational resource requirements of the two approaches.

This Special Session addresses key issues of Sound to Sense (S2S), a Marie Curie Research Training Network that started in 2007. S2S's unifying theme is the role of fine phonetic detail (FPD) in speech processing. This special session focuses on alternative theoretical and computational modeling paradigms for encoding FPD.

The special session is on Wednesday, August 29, 10:00 – 12:00, in the Astrid Park Plaza (APP) hotel, which is on the same square as the nearby Flanders Congress & Concert Centre (FCCC). We start with a small poster session of 45 minutes, followed by 45 minutes for 3 oral presentations, and we end with the panel discussion. Information about this special session can also be found at http://www.interspeech2007.org/Technical/structure_template_based_asr.php

Session organizers:
Li Deng <deng [at] microsoft.com>
Helmer Strik <strik [at] let.ru.nl>
Last updated on 16-08-2007