Abstract / truncated to 115 words (read the full abstract)

This thesis presents a novel approach to speech enhancement by exploiting the bimodality of speech production and the correlation that exists between audio and visual speech information. An analysis into the correlation of a range of audio and visual features reveals significant correlation to exist between visual speech features and audio filterbank features. The amount of correlation was also found to be greater when the correlation is analysed with individual phonemes rather than across all phonemes. This led to building a Gaussian Mixture Model (GMM) that is capable of estimating filterbank features from visual features. Phoneme-specific GMMs gave lower filterbank estimation errors and a phoneme transcription is decoded using audio-visual Hidden Markov Model (HMM). Clean ... toggle 3 keywords

audio-visual speech processing speech enhancement

Information

Author
Almajai, Ibrahim
Institution
University of East Anglia
Supervisors
Publication Year
2009
Upload Date
Sept. 27, 2011

First few pages / click to enlarge

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.