Abstract

A major goal in distant-speech recognition is to transform speech signals of a target speaker into symbols in order to trigger a dialog manager. Spatio-temporal filters, so-called beamformers, usually enhance the target speaker's speech signals in a noisy and reverberant environment. However, a beamformer requires information on the target speaker's position. A source localizer provides this information, which facilitates steering a beam into the direction of the target speaker. Unfortunately, the beamformer also captures noise and reverberation, especially from the target speaker's direction. To additionally reduce these artifacts, one can employ bandpass filters in order to emphasize the target speaker's harmonic components. But these bandpass filters require information on the target speaker's fundamental frequency. ...

Keywords: chirp z-transform, data association, direction of arrival, fundamental frequency, glottogram, GM-PHD, GM-CPHD, GM-CBMeMBer, joint estimation, microphone array, multiple-target tracking, optimal subpattern assignment, pitch analysis, pitch estimation, pitch-period doubling, position-pitch algorithm, POPI, probability hypothesis density filter, relative phase-delay masking, RPDM, source localization, sparse joint parameter space, speaker separation, speaker tracking, variable-scale sampling, VSS

Information

Author
Pessentheiner, Hannes
Institution
Graz University of Technology, Signal Processing and Speech Communication Laboratory
Supervisor
Publication Year
2017
Upload Date
Sept. 8, 2025
