Speech recognition in noisy conditions using missing feature approach

The research in this thesis addresses the problem of automatic speech recognition in noisy environments. Automatic speech recognition systems achieve acceptable performance in noise-free conditions, but their performance degrades dramatically in the presence of additive noise. This is mainly due to the mismatch between the training conditions and the noisy operating conditions. In the time-frequency representation of the noisy speech signal, some of the clean speech features are masked by noise. In this case the clean speech features cannot be correctly estimated from the noisy speech, and they are therefore considered missing or unreliable. To improve the performance of speech recognition systems in additive noise, special attention should be paid to the detection and compensation of these unreliable features. This thesis is concerned with the problem of missing features applied to automatic speaker-independent speech recognition, and it treats both the detection and the compensation of unreliable features. The detection of unreliable features consists of selecting the regions of the time-frequency representation of the speech signal where noise is dominant; in these regions the information about the speech signal is masked by noise. Two approaches are possible to compensate for unreliable features. The first approach estimates an interval in which the unreliable values should lie, and during recognition the emission probabilities of the hidden Markov models (HMMs) are integrated over these intervals. The second approach replaces the unreliable features by their most probable values, calculated from the reliable features and a statistical model representing clean speech. In this work, we first study the problem of pattern recognition when some data are missing. From this study we derive two possible approaches for the compensation of the missing (unreliable) data.
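The first compensation approach (integrating the emission probabilities over an interval) can be illustrated with a minimal Python sketch, assuming a single HMM state whose emission density is a diagonal-covariance Gaussian mixture. Reliable dimensions contribute an ordinary Gaussian density; each unreliable dimension contributes the probability mass of the Gaussian between interval bounds `(lo, hi)` supplied by the caller. The function names and argument layout are hypothetical, and this is the generic bounded-marginalization formulation from the missing-feature literature, not necessarily the exact integration method proposed in the thesis.

```python
import math
from typing import Optional, Sequence, Tuple


def gauss_pdf(x: float, mean: float, var: float) -> float:
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gauss_cdf(x: float, mean: float, var: float) -> float:
    """Univariate Gaussian cumulative distribution (math.erf accepts +/-inf)."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def bounded_marginal_likelihood(
    features: Sequence[float],
    reliable: Sequence[bool],
    bounds: Sequence[Optional[Tuple[float, float]]],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> float:
    """Emission likelihood of one state (diagonal GMM) with missing features.

    Hypothetical helper: reliable dimensions use the density at the observed
    value; unreliable dimensions are integrated over bounds[d] = (lo, hi).
    """
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        comp = w
        for d, x in enumerate(features):
            if reliable[d]:
                comp *= gauss_pdf(x, mu[d], var[d])
            else:
                lo, hi = bounds[d]
                # Probability mass of this component between lo and hi.
                comp *= gauss_cdf(hi, mu[d], var[d]) - gauss_cdf(lo, mu[d], var[d])
        total += comp
    return total
```

For spectral energy features under additive noise, a common choice (assumed here) is the interval `[0, y]`, since the clean energy is approximately bounded above by the observed noisy energy `y`. Passing `(-math.inf, math.inf)` reduces the computation to plain marginalization, which is equivalent to simply dropping the unreliable dimension.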
The problems of detection and compensation of the missing features both require an estimate of the disturbing noise. We therefore study and evaluate several methods for estimating the noise distribution. A new statistical detector, based on the noise model, is developed to divide the features into two sets, reliable and unreliable. We then develop and evaluate several methods to compensate for the unreliable features within the framework of a continuous-density HMM recognizer. We propose a new integration method based on an approximation of the clean features and a new imputation method based on a Gaussian mixture model. These methods for compensating unreliable data are then compared with classical robust speech recognition methods. On noisy speech recognition tasks, the proposed detection and compensation methods outperform, or perform similarly to, the classical methods. This work opens several directions of research on speech recognition in noisy conditions. The concept of detection and compensation of unreliable features can be used to extend other methods commonly used in robust speech recognition, such as parallel model combination.
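The noise estimation, detection and imputation steps can be chained in a second minimal sketch. It rests on several assumptions that are not taken from the thesis: features are linear filter-bank energies, the first `n_init` frames contain noise only, the reliability decision is a simple local-SNR threshold (a stand-in for, not a reproduction of, the statistical detector described above), and the clean-speech GMM, with strictly positive weights, is trained on the same features. Imputation replaces each unreliable value by the posterior-weighted mean of the GMM components, with posteriors computed from the reliable dimensions only. All names and the 0 dB default threshold are hypothetical.

```python
import math
from typing import List, Sequence


def estimate_noise(frames: Sequence[Sequence[float]], n_init: int) -> List[float]:
    """Mean noise energy per channel, assuming the first n_init frames are speech-free."""
    n_ch = len(frames[0])
    return [sum(f[c] for f in frames[:n_init]) / n_init for c in range(n_ch)]


def reliability_mask(frame: Sequence[float], noise: Sequence[float],
                     snr_db_threshold: float = 0.0) -> List[bool]:
    """Hypothetical local-SNR detector: True marks a reliable feature."""
    mask = []
    for y, n in zip(frame, noise):
        speech = max(y - n, 1e-12)  # spectral-subtraction estimate of clean energy
        snr_db = 10.0 * math.log10(speech / max(n, 1e-12))
        mask.append(snr_db > snr_db_threshold)
    return mask


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log of a univariate Gaussian density."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def impute(frame: Sequence[float], mask: Sequence[bool], weights: Sequence[float],
           means: Sequence[Sequence[float]],
           variances: Sequence[Sequence[float]]) -> List[float]:
    """Replace unreliable features using a diagonal-covariance clean-speech GMM."""
    # Component log posteriors (unnormalized) from reliable dimensions only.
    logp = []
    for w, mu, var in zip(weights, means, variances):
        lp = math.log(w)
        for d, x in enumerate(frame):
            if mask[d]:
                lp += log_gauss(x, mu[d], var[d])
        logp.append(lp)
    # Normalize with the log-sum-exp trick for numerical stability.
    m = max(logp)
    post = [math.exp(v - m) for v in logp]
    z = sum(post)
    post = [p / z for p in post]

    out = list(frame)
    for d, x in enumerate(frame):
        if not mask[d]:
            est = sum(p * mu[d] for p, mu in zip(post, means))
            # Heuristic: clean energy should not exceed the noisy observation.
            out[d] = min(est, x)
    return out
```

Clipping the imputed value to the observed energy is a heuristic that exploits the same upper bound used by bounded integration; without it, the estimate is the plain conditional mean of the unreliable features under the diagonal-covariance GMM.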

File Type: pdf
File Size: 5 KB
Publication Year: 2001
Author: Renevey, Philippe
Supervisors: Andrzej Drygajlo
Institution: Swiss Federal Institute of Technology