Advances in Glottal Analysis and its Applications

From artificial voices in GPS devices to automatic dictation systems, and from voice-based identity verification to voice pathology detection, speech processing applications are now omnipresent in daily life. Because it offers companies ways to improve efficiency while reducing costs, the speech technology market is forecast to be particularly promising in the coming years. This thesis deals with advances in glottal analysis that allow new techniques to be incorporated into speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, called the glottal flow, is expected to provide complementary information. Unfortunately, glottal analysis from speech recordings requires specific and complex processing, which explains why it has generally been avoided.

The main goal of this thesis is to advance glottal analysis so as to popularize its use in speech processing. First, new techniques for glottal excitation estimation and modeling are proposed and shown to outperform other state-of-the-art approaches on large corpora of real speech. Second, the proposed techniques are integrated into various speech processing applications (speech synthesis, voice pathology detection, speaker recognition and expressive speech analysis), where they are shown to yield a substantial improvement over other existing methods.

More specifically, this thesis is made up of three separate but connected parts. In the first part, new algorithms are developed for robust pitch tracking and for the automatic determination of glottal closure instants. This step is necessary because accurate glottal analysis requires processing pitch-synchronous speech frames. In the second part, a new non-parametric method based on the Complex Cepstrum is proposed for glottal flow estimation. In addition, a way to perform this Complex Cepstrum-based decomposition asynchronously is investigated. A comprehensive comparative study of glottal flow estimation approaches is also presented. Building on this work, the usefulness of glottal information for voice pathology detection and expressive speech analysis is explored. In the third part, a new excitation model called the Deterministic plus Stochastic Model (DSM) of the residual signal is proposed. Applied to speech synthesis, this model is shown to enhance the naturalness and quality of the delivered voice. Finally, glottal signatures derived from this model are observed to increase identification rates in speaker recognition.

File Type: pdf
File Size: 6 MB
Publication Year: 2011
Author: Drugman, Thomas
Supervisor: Thierry Dutoit
Institution: Université de Mons
Keywords: Speech Processing, Speech Analysis, Speech Synthesis, Speaker Recognition, Voice Pathology Detection, Expressive Speech