Statistical and Discriminative Language Modeling for Turkish Large Vocabulary Continuous Speech Recognition (2009)
Abstract (truncated to 115 words)
Turkish, an agglutinative language with rich morphology, presents challenges for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. First, the agglutinative nature of Turkish leads to a high number of Out-of-Vocabulary (OOV) words, which in turn lower Automatic Speech Recognition (ASR) accuracy. Second, Turkish has a relatively free word order, which leads to non-robust language model estimates. These challenges have mostly been handled by using meaningful segmentations of words, called sub-lexical units, in language modeling. However, a shortcoming of sub-lexical units is over-generation, which needs to be dealt with to achieve higher accuracy. This dissertation aims to address the challenges of Turkish in LVCSR. Grammatical and statistical sub-lexical units for language modeling are investigated and ...
language modeling – automatic speech recognition – discriminative training – sub-lexical language modeling units
Information
- Author: Arisoy, Ebru
- Institution: Bogazici University
- Supervisor:
- Publication Year: 2009
- Upload Date: Jan. 18, 2010