Confidence Measures for Speech/Speaker Recognition and Applications on Turkish LVCSR

Confidence measures for the results of speech/speaker recognition make these systems more useful in real-time applications. A confidence measure provides a test statistic for accepting or rejecting the recognition hypothesis of a speech/speaker recognition system. Speech/speaker recognition systems are usually based on statistical modeling techniques. In this thesis we defined confidence measures for the statistical modeling techniques used in speech/speaker recognition systems. For speech recognition, we tested the available confidence measures and a newly defined confidence measure based on acoustic prior information under two different conditions that cause errors: out-of-vocabulary words and the presence of additive noise. We showed that the newly defined confidence measure performs better in both tests. A review of speech recognition and speaker recognition techniques and some related statistical methods is given throughout the thesis. We also defined a new interpretation technique for confidence measures, based on the Fisher transformation of the likelihood ratios obtained in speaker verification. The transformation provided us with a linearly interpretable confidence level that can be used directly in real-time applications such as dialog management. We also tested the confidence measures for speaker verification systems and evaluated their efficiency for the adaptation of speaker models. We showed that using confidence measures to select adaptation data improves the accuracy of the speaker model adaptation process. Another contribution of this thesis is the preparation of a phonetically rich continuous speech database for the Turkish language. The database is used to develop an HMM/MLP hybrid speech recognition system for Turkish.
Experiments on the test sets of the database showed that the speech recognition system has good accuracy for long speech sequences, while performance is lower for short words, as is the case for current speech recognition systems for other languages. A new language modeling technique for the Turkish language, which can also be used for other agglutinative languages, is introduced in this thesis. Performance evaluations showed that the newly defined language modeling technique outperforms the classical n-gram language modeling technique.

File Type: pdf
File Size: 893 KB
Publication Year: 2004
Author: Mengusoglu, Erhan
Supervisors: H. Leich
Institution: Universite de Mons
Keywords: