Statistical and Discriminative Language Modeling for Turkish Large Vocabulary Continuous Speech Recognition

Abstract (truncated to 115 words)

Turkish, being an agglutinative language with rich morphology, presents challenges for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. First, the agglutinative nature of Turkish leads to a high number of Out-of-Vocabulary (OOV) words, which in turn lowers Automatic Speech Recognition (ASR) accuracy. Second, Turkish has a relatively free word order, which leads to non-robust language model estimates. These challenges have mostly been handled by using meaningful segmentations of words, called sub-lexical units, in language modeling. However, a shortcoming of sub-lexical units is over-generation, which needs to be dealt with to achieve higher accuracy. This dissertation aims to address the challenges of Turkish in LVCSR. Grammatical and statistical sub-lexical units for language modeling are investigated and ...
Keywords: language modeling; automatic speech recognition; discriminative training; sub-lexical language modeling units
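To make the OOV argument concrete, here is a toy sketch in Python. It assumes hand-written, hypothetical morph segmentations (the "+" boundary notation and the SEGS table are illustrative only, not the thesis's grammatical or statistical units and not the output of any real morphological analyzer). It counts out-of-vocabulary test tokens before and after splitting words into morphs.

```python
# Toy illustration with hypothetical data: word-level vs. morph-level OOV rate.
# Segmentations are hand-written for illustration only; "+" marks a suffix morph.

def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur among the training tokens."""
    vocab = set(train_tokens)
    misses = sum(1 for tok in test_tokens if tok not in vocab)
    return misses / len(test_tokens)

def to_morphs(words, segs):
    """Replace each word by its (hypothetical) morph sequence."""
    return [m for w in words for m in segs[w]]

SEGS = {
    "evler": ["ev", "+ler"],                          # houses
    "evimiz": ["ev", "+imiz"],                        # our house
    "evden": ["ev", "+den"],                          # from the house
    "okuldan": ["okul", "+dan"],                      # from school
    "arabalar": ["araba", "+lar"],                    # cars
    "evlerimizden": ["ev", "+ler", "+imiz", "+den"],  # from our houses
    "okullar": ["okul", "+lar"],                      # schools
}

train_words = ["evler", "evimiz", "evden", "okuldan", "arabalar"]
test_words = ["evler", "evlerimizden", "okullar"]

word_oov = oov_rate(train_words, test_words)
morph_oov = oov_rate(to_morphs(train_words, SEGS), to_morphs(test_words, SEGS))

print(f"word-level OOV:  {word_oov:.2f}")   # 0.67
print(f"morph-level OOV: {morph_oov:.2f}")  # 0.00
```

Splitting into morphs covers unseen word forms such as "evlerimizden" ("from our houses"), but the same small inventory can also be recombined into sequences that are not valid Turkish words, which is the over-generation problem the abstract mentions.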

Information

Author

Arisoy, Ebru

Institution

Bogazici University

Supervisor

Murat Saraclar

Publication Year

2009

Upload Date

Jan. 18, 2010
