(When) Does it Harm to Be Incomplete? Comparing Human and Automatic Speech Recognition of Syntactically Disfluent Utterances (2025)
Abstract / truncated to 115 words
This thesis presents a corpus-based, comparative analysis of error patterns in human and automatic speech recognition (ASR), based on utterances taken from spontaneous, unscripted face-to-face conversations. The utterances reflect patterns characteristic of this speaking style: they are disfluent through a pause, a filler particle (FP), a break in the syntax, or a combination of these. Utterances that originally contained FPs were generally easier to recognise for both humans and ASR than disfluent utterances without FPs, regardless of whether the FP was cut out of or left in the presented stimuli. In the easier utterances, the best ASR system still had an average word error rate (WER) that was about 4.45% higher ...
syntactic disfluencies – automatic speech recognition – human transcription – word error rate analysis – conversational speech
Information
- Author
- Wepner, Saskia
- Institution
- Signal Processing and Speech Communication Laboratory, Graz University of Technology
- Supervisors
- Publication Year
- 2025
- Upload Date
- Sept. 1, 2025