Using Articulatory Knowledge in Automatic Speech Recognition

In hidden Markov models (HMMs), at least of the conventional type, phonetic/phonological knowledge usually plays only a limited role. Because research on speech recognition during the last decade has been confined almost entirely to conventional HMMs, the gap between speech technology on the one hand and phonetics and phonology on the other has widened. Clearly, this is not an ideal situation, because both fields could and should benefit from each other.

Based on the considerations mentioned above, we decided to test a new approach to speech recognition. This approach makes explicit use of phonetic/phonological knowledge, especially knowledge about articulation, and integrates it into a probabilistic framework. To that end, the basic units used for speech recognition are coded in terms of multi-valued articulatory features. These basic units are used to build transition networks for phonemes and words, which depict how the feature values change during an utterance.
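
To make the idea of feature-coded units and transition networks more concrete, here is a minimal sketch under loudly stated assumptions. It does not reproduce the project's actual model. The feature set (`FEATURES`), the phoneme codings (`PHONEMES`) and the function `build_word_network` are hypothetical. The sketch also assumes that, between two consecutive phoneme targets, each differing feature may switch to its new value independently of the others, so that all partial combinations appear as intermediate network states. Only the network topology is built. The probabilistic part (transition and emission probabilities, duration self-loops) is left out.

```python
from itertools import combinations

# Hypothetical articulatory features and values, for illustration only;
# the project's actual feature set is not specified in this description.
FEATURES = ("manner", "place", "voicing")

# Hypothetical phoneme targets, each coded as one value per feature.
PHONEMES = {
    "s": ("fricative", "alveolar", "voiceless"),
    "a": ("vowel", "back", "voiced"),
    "t": ("stop", "alveolar", "voiceless"),
}


def build_word_network(phones):
    """Build a transition network for a word given as phoneme symbols.

    A node is (segment index, feature-value tuple). Between two consecutive
    phoneme targets, every differing feature may switch on its own (an
    assumption: asynchronous feature changes), so all partial combinations
    become intermediate nodes. Each edge switches exactly one feature.
    Self-loops for duration and all probabilities are omitted.
    """
    targets = [PHONEMES[p] for p in phones]
    nodes = {(0, targets[0])}
    edges = set()
    for i, (src, dst) in enumerate(zip(targets, targets[1:])):
        diff = [f for f in range(len(FEATURES)) if src[f] != dst[f]]
        if not diff:
            # Identical consecutive targets: link them directly.
            nodes.add((i + 1, dst))
            edges.add(((i, src), (i + 1, dst)))
            continue
        for k in range(len(diff) + 1):
            for switched in combinations(diff, k):
                values = list(src)
                for f in switched:
                    values[f] = dst[f]
                # Once every differing feature has switched, the state is
                # the next phoneme's target, which belongs to segment i + 1.
                here = (i + 1 if k == len(diff) else i, tuple(values))
                nodes.add(here)
                # Connect to each state that switches one more feature.
                for f in diff:
                    if f in switched:
                        continue
                    nxt = list(values)
                    nxt[f] = dst[f]
                    nxt_pos = i + 1 if k + 1 == len(diff) else i
                    edges.add((here, (nxt_pos, tuple(nxt))))
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_word_network(["s", "a", "t"])
    print(len(nodes), len(edges))  # 15 24
```

Keying each node by its segment index as well as its feature values keeps positions in the word apart. Without the index, a state reached on the way from /s/ to /a/ could be merged with an identical state between /a/ and /t/, which would open a path that skips the vowel.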

Our most important goal is (1) to bridge the gap between speech technology and phonetics/phonology mentioned above. We try to do this by using a model (i.e. the new HMM) that we expect to represent speech production more realistically than the conventional HMM does. In this way we hope to achieve two further goals: (2) to obtain (statistical) knowledge about articulation from large amounts of 'natural speech' (as opposed to 'lab speech', on which most current knowledge is based); and (3) to improve speech recognition.


Time-scale: Three years, started May 1, 1995
Supervisor: Lou Boves
Type of project: KNAW post-doc
For more information: Helmer Strik
