Abstract

This thesis presents a corpus-based, comparative analysis of error patterns in human and automatic speech recognition (ASR), based on utterances taken from spontaneous, unscripted face-to-face conversations. The utterances reflect specific patterns that are characteristic of this speaking style: They are disfluent through either a pause, a filler particle (FP), a break in the syntax, or a combination of them. Utterances that originally contained FPs were generally easier to recognise for both humans and ASR – regardless of the FP being cut out or left in the presented stimuli – than disfluent utterances without FPs. In the easier utterances, the best ASR system still had an average word error rate (WER) that was about 4.45% higher ...

Keywords: syntactic disfluencies, automatic speech recognition, human transcription, word error rate analysis, conversational speech

Information

Author
Wepner, Saskia
Institution
Signal Processing and Speech Communication Laboratory, Graz University of Technology
Supervisors
Publication Year
2025
Upload Date
Sept. 1, 2025
