One of the major difficulties in automatic speech recognition (ASR) is the high degree of variation in the pronunciation of individual words. Conventional Hidden Markov Models (HMMs) capture this variation only implicitly, through Gaussian mixture models for the state observations and context-dependent phone models. It is known, however, that the poor performance of conventional HMMs on spontaneous speech stems from their inability to model this pronunciation variation effectively. A further limitation is that each acoustic unit is given a fixed-length model structure, regardless of how much the unit varies in time. This fixed-length topology imposes an unrealistic constraint on the duration of the unit's realizations.

The aim of this project is to study data-driven methods for automatically designing HMM topologies that describe longer acoustic units, so that long-term patterns due to pronunciation variation can be modeled better. As a first step, a method called trajectory clustering will be used to analyse longer speech segments. In this method, utterance segments are regarded as continuous trajectories through time in feature-vector space. The similarities and differences between these trajectories reflect the dynamics inherent in speech. By clustering speech segments in terms of their trajectories, long-term patterns due to pronunciation variation can be identified. The trajectory models describing each cluster will then be used to design longer HMM topologies.
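To make the idea of clustering segments as trajectories more concrete, the following is a minimal sketch and not the project's actual method. It assumes that each segment is a `(frames, dims)` array of feature vectors, that variable-length segments can be compared after linear resampling to a fixed number of points, and that plain k-means with a farthest-point initialisation is an acceptable stand-in for the clustering step. The function names `resample_trajectory` and `cluster_trajectories` and all parameters are hypothetical.

```python
import numpy as np


def resample_trajectory(segment, num_points=10):
    """Linearly resample a (frames, dims) segment to num_points frames.

    Assumption: linear time normalisation is adequate for comparing
    segments of different durations.
    """
    segment = np.asarray(segment, dtype=float)
    src = np.linspace(0.0, 1.0, len(segment))
    dst = np.linspace(0.0, 1.0, num_points)
    # Interpolate each feature dimension independently along normalised time.
    columns = [np.interp(dst, src, segment[:, d]) for d in range(segment.shape[1])]
    return np.stack(columns, axis=1)


def cluster_trajectories(segments, num_clusters=2, num_points=10, num_iters=20):
    """Cluster segments with k-means on their resampled trajectories.

    Returns one label per segment and the mean trajectory of each cluster,
    shaped (num_clusters, num_points, dims).
    """
    # Flatten every resampled trajectory into one fixed-length vector.
    X = np.stack([resample_trajectory(s, num_points).ravel() for s in segments])

    # Deterministic farthest-point initialisation (an illustrative choice).
    centroids = [X[0]]
    for _ in range(1, num_clusters):
        nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[nearest.argmax()])
    centroids = np.array(centroids)

    def assign(points, centres):
        # Squared Euclidean distance from every point to every centre.
        dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(num_iters):
        labels = assign(X, centroids)
        for k in range(num_clusters):
            members = X[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)

    labels = assign(X, centroids)
    return labels, centroids.reshape(num_clusters, num_points, -1)
```

In this sketch the mean trajectory of each cluster plays the role of a very simple trajectory model. How such a model would be turned into an HMM topology is left open here, as it is in the description above.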
| Time-scale: | 01-01-2004 through 30-06-2007 |
| Type of project: | PhD-project |