Confidence Measures for Speech/Speaker Recognition and Applications on Turkish LVCSR
Confidence measures for the results of speech/speaker recognition make these systems more useful in real-time applications. Confidence measures provide a test statistic for accepting or rejecting the recognition hypothesis of a speech/speaker recognition system. Speech/speaker recognition systems are usually based on statistical modeling techniques. In this thesis we defined confidence measures for the statistical modeling techniques used in speech/speaker recognition systems. For speech recognition, we tested available confidence measures and a newly defined confidence measure based on acoustic prior information under two different conditions that cause errors: out-of-vocabulary words and the presence of additive noise. We showed that the newly defined confidence measure performs better in both tests. A review of speech recognition and speaker recognition techniques and some related statistical methods is given throughout the thesis. We also defined a new interpretation technique for confidence measures, based on the Fisher transformation of the likelihood ratios obtained in speaker verification. The transformation provided us with a linearly interpretable confidence level that can be used directly in real-time applications such as dialog management. We also tested the confidence measures for speaker verification systems and evaluated the efficiency of the confidence measures for the adaptation of speaker models. We showed that using confidence measures to select adaptation data improves the accuracy of the speaker model adaptation process. Another contribution of this thesis is the preparation of a phonetically rich continuous speech database for the Turkish language. The database is used for developing an HMM/MLP hybrid speech recognition system for the Turkish language. Experiments on the test sets of the database showed that the speech recognition system has good accuracy for long speech sequences, while performance is lower for short words, as is the case for current speech recognition systems for other languages.
A new language modeling technique for the Turkish language, which can also be used for other agglutinative languages, is introduced in this thesis. Performance evaluations showed that the newly defined language modeling technique outperforms the classical n-gram language modeling technique.
