(When) Does it Harm to Be Incomplete? Comparing Human and Automatic Speech Recognition of Syntactically Disfluent Utterances

Abstract (truncated to 115 words)

This thesis presents a corpus-based, comparative analysis of error patterns in human and automatic speech recognition (ASR), based on utterances taken from spontaneous, unscripted face-to-face conversations. The utterances reflect specific patterns that are characteristic of this speaking style: They are disfluent through either a pause, a filler particle (FP), a break in the syntax, or a combination of them. Utterances that originally contained FPs were generally easier to recognise for both humans and ASR – regardless of the FP being cut out or left in the presented stimuli – than disfluent utterances without FPs. In the easier utterances, the best ASR system still had an average word error rate (WER) that was about 4.45% higher ...
Keywords: syntactic disfluencies – automatic speech recognition – human transcription – word error rate analysis – conversational speech
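
The abstract compares human and ASR transcripts by word error rate (WER). As a rough illustration only, WER is commonly defined as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below assumes that standard definition with plain whitespace tokenisation; the function name and the example sentences are hypothetical, and the thesis may apply its own text normalisation and scoring conventions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words.

    Assumption: whitespace tokenisation and exact string match per word.
    This is not necessarily the scoring setup used in the thesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one of six reference words is dropped.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER = {wer:.3f}")
```

Because insertions count as errors but the denominator is the reference length, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.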

Information

Author

Wepner, Saskia

Institution

Signal Processing and Speech Communication Laboratory, Graz University of Technology

Supervisors

Publication Year

2025

Upload Date

Sept. 1, 2025
