Efficient Integration of Hierarchical Knowledge Sources and the Estimation of Semantic Confidences for Automatic Speech Interpretation

This thesis presents a system for the interpretation of natural speech that serves as the input module of a spoken dialog system. Its task is to extract application-specific pieces of information from the user utterance and pass them to the control module of the dialog system. By integrating speech recognition and speech interpretation, the system determines the spoken word sequence, together with the hierarchical utterance structure needed for information extraction, directly from the recorded speech signal. The efficient implementation of the underlying decoder is based on weighted finite-state transducers (WFSTs). This formalism makes it possible to compile all knowledge sources involved into an optimized network representation of the search space, which is constructed dynamically during decoding. In addition to the best-matching result, the integrated decoder architecture can determine grammatical alternatives, which are exploited to estimate semantic confidence values for the extracted pieces of information. This new method improves robustness against interpretation errors without requiring any additional knowledge source.
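
The abstract does not state how the confidence values are computed from the grammatical alternatives. As a purely illustrative sketch, one common approach is to treat the alternatives as a scored list, normalize the scores into posterior-like weights, and sum the weight of all alternatives that agree on a given piece of information. The function name `slot_confidences`, the input format (log score plus a slot-to-value dictionary), and the normalization are assumptions made for this example, not the method of the thesis.

```python
import math
from collections import defaultdict


def slot_confidences(alternatives):
    """Estimate a confidence for each slot/value pair of the best alternative.

    `alternatives` is a hypothetical input format: a list of
    (log_score, {slot_name: value}) pairs, one per decoded alternative.
    """
    if not alternatives:
        return {}

    # Subtract the maximum log score before exponentiating, for numerical stability.
    max_log = max(log_score for log_score, _ in alternatives)
    weights = [math.exp(log_score - max_log) for log_score, _ in alternatives]
    total = sum(weights)

    # Accumulate the normalized weight of every alternative containing a pair.
    mass = defaultdict(float)
    for weight, (_, slots) in zip(weights, alternatives):
        for pair in slots.items():
            mass[pair] += weight / total

    # Report confidences only for the pairs in the best-scoring alternative.
    best_slots = max(alternatives, key=lambda alt: alt[0])[1]
    return {pair: mass[pair] for pair in best_slots.items()}


example = [
    (math.log(0.6), {"city": "Munich", "time": "9am"}),
    (math.log(0.3), {"city": "Munich", "time": "5pm"}),
    (math.log(0.1), {"city": "Muenster", "time": "9am"}),
]
confidences = slot_confidences(example)
```

In this made-up example, the city value of the best alternative is shared by two alternatives and therefore receives a higher confidence than the time value. A dialog system could use such values to decide which pieces of information to confirm with the user.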

File Type: pdf
File Size: 1 MB
Publication Year: 2006
Author: Lieb, Robert
Supervisors: Gunther Ruske, Gernot A. Fink
Institution: Technische Universität München
Keywords: speech recognition, natural speech, speech interpretation, speech understanding, spoken dialog, hierarchical language model, statistical language model, semantic interpretation grammar, one-stage decoding, weighted finite-state transducer, WFST, semantic confidences, grammatical alternatives, out-of-vocabulary words