HMMs are models in which phonetic/phonological knowledge usually plays a limited role, at least in conventional HMMs. Because research on speech recognition has been confined almost completely to conventional HMMs during the last decade, the gap between speech technology on the one hand and phonetics and phonology on the other has widened. Clearly, this is not an ideal situation, because the two fields could and should benefit from each other.
Given these considerations, we decided to test a new approach to speech recognition. This approach makes explicit use of phonetic/phonological knowledge, especially knowledge about articulation, and integrates it in a probabilistic framework. To that end, the basic units used for speech recognition are coded in terms of multi-valued articulatory features. The basic units are used to build transition networks for phonemes and words. These transition networks depict the way the feature values change during utterances.
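As a rough illustration of the idea of coding basic units as multi-valued articulatory features and chaining them into a network, the following minimal sketch may help. The feature names, feature values, phoneme codings, and function names are all hypothetical and are not taken from the project; transition probabilities, which a probabilistic framework would attach to the network, are left out.

```python
from typing import Dict, List, Tuple

# Hypothetical multi-valued articulatory features (illustrative only).
FEATURES: Tuple[str, ...] = ("voicing", "lips", "velum")

# A basic unit holds one value per feature, in the order given by FEATURES.
Unit = Tuple[str, ...]

# Hypothetical coding of phonemes as sequences of basic units.
PHONEME_UNITS: Dict[str, List[Unit]] = {
    "m": [("voiced", "closed", "lowered")],
    "a": [("voiced", "open", "raised")],
    # A plosive modelled as a closure unit followed by a release unit.
    "p": [("voiceless", "closed", "raised"), ("voiceless", "open", "raised")],
}


def word_network(phonemes: List[str]) -> List[Unit]:
    """Concatenate phoneme unit sequences into a linear left-to-right network."""
    units: List[Unit] = []
    for phoneme in phonemes:
        units.extend(PHONEME_UNITS[phoneme])
    return units


def feature_changes(units: List[Unit]) -> List[Dict[str, Tuple[str, str]]]:
    """For each transition, record the features whose value changes."""
    changes = []
    for prev, cur in zip(units, units[1:]):
        changes.append(
            {f: (a, b) for f, a, b in zip(FEATURES, prev, cur) if a != b}
        )
    return changes


network = word_network(["m", "a", "p"])
changes = feature_changes(network)
for step, change in enumerate(changes):
    print(step, change)
```

Each entry in `changes` lists only the features that take a new value at that transition, which is one simple way to inspect how feature values evolve over an utterance.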
Our most important goal is (1) to bridge the gap between speech technology and phonetics/phonology mentioned above. We try to do this by using a model (i.e. the new HMM) that probably represents speech production more realistically than the conventional HMM. In this way we hope to achieve two other goals, viz. (2) to obtain (statistical) knowledge about articulation from large amounts of 'natural speech' (as opposed to 'lab speech', on which most current knowledge is based); and (3) to improve speech recognition.
A more elaborate description can be found here.
| Time-scale: | Three years, started May 1, 1995 |
| Supervisors: | Lou Boves |
| Type of project: | KNAW post-doc |
| For more information: | Helmer Strik |