Glottal Source Estimation and Automatic Detection of Dysphonic Speakers

Among biomedical signals, speech is one of the most complex, since it is both produced and perceived by humans. Extracting and analyzing the information conveyed by this signal underlies many applications, including the two topics addressed in this thesis: the estimation of the glottal source and the automatic detection of voice pathologies. The first part of the thesis reviews existing methods for glottal source estimation and then focuses on the occurrence of irregular glottal source estimates obtained with the representation based on the Zeros of the Z-Transform (ZZT). Because this method is sensitive to the location of the analysis window, it is proposed to regularize the estimation by shifting the analysis window around its initial location. The best shift is found with a dynamic programming algorithm that includes constraints on the glottal source and the vocal tract response, both of which are estimated by the ZZT-based method for each shift. From the regularized glottal source, characteristic parameters are estimated by finding the best-fitting glottal source model. The application of this method to real speech is presented. The second part of the thesis is devoted to the development of automatic methods for the detection of voice pathologies. In clinics, these pathologies are usually assessed by means of perceptual and objective analyses. To support this assessment, new objective methods are needed to detect a pathology or to evaluate voice quality before and after surgery. After a broad overview of existing methods in terms of features and classification approaches, and a comparison of different feature selection methodologies, the thesis investigates to what extent a limited number of features can be combined in a simple classification approach to detect the presence of a pathology.
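The shift selection by dynamic programming described above can be illustrated with a minimal sketch. It assumes that each analysis frame has a fixed set of candidate window shifts, that a local cost per shift and a transition cost between the shifts of consecutive frames are available, and that the best sequence of shifts minimizes their sum. The names `best_shift_path`, `local_cost` and `transition_cost` are hypothetical, and the costs are placeholders rather than the thesis's actual constraints on the glottal source and the vocal tract response.

```python
def best_shift_path(local_cost, transition_cost):
    """Pick one candidate shift per frame by dynamic programming.

    local_cost[t][k]: hypothetical cost of candidate shift k at frame t.
    transition_cost(t, j, k): hypothetical cost of going from shift j at
    frame t-1 to shift k at frame t.
    Returns (list of chosen shift indices, total cost of that path).
    """
    n_frames = len(local_cost)
    n_shifts = len(local_cost[0])
    total = [list(local_cost[0])]  # accumulated cost per shift, per frame
    back = []                      # back-pointers for frames 1..n_frames-1

    for t in range(1, n_frames):
        row, ptr = [], []
        for k in range(n_shifts):
            # Best predecessor shift for candidate k at frame t.
            best_j = min(
                range(n_shifts),
                key=lambda j: total[-1][j] + transition_cost(t, j, k),
            )
            row.append(
                total[-1][best_j] + transition_cost(t, best_j, k) + local_cost[t][k]
            )
            ptr.append(best_j)
        total.append(row)
        back.append(ptr)

    # Backtrack from the cheapest final state.
    k = min(range(n_shifts), key=lambda j: total[-1][j])
    best_total = total[-1][k]
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path, best_total


# Toy example: 3 frames, 2 candidate shifts, penalty for changing shift.
toy_costs = [[0, 5], [5, 0], [0, 5]]
smooth_path, smooth_total = best_shift_path(
    toy_costs, lambda t, j, k: 10 * abs(j - k)
)
```

With a large penalty on changing shift, the sketch keeps the same shift across frames even where another shift is locally cheaper, which is the regularizing effect that the constraints are meant to provide.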
A first application shows that the correlation between acoustic descriptors that do not require the estimation of the fundamental period discriminates well between normal and pathological sustained vowels. A second application shows the benefit of combining information extracted from the speech signal and from the glottal source estimate for the detection of voice pathologies. In this application, two features (one computed on the speech signal and the other on the glottal contribution) are selected by means of a mutual information-based measure, and their distributions for normal and pathological voices are estimated to derive a simple classifier based on Gaussian Mixture Models. The ability of this classification approach to discriminate between normal and pathological sustained vowels is demonstrated, and it is proposed to nuance the decision provided by the classifier by including indetermination zones in the normal/pathological decision. These precautions increase the reliability of the decision provided to the clinician.
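The idea of a two-feature Gaussian Mixture Model classifier with an indetermination zone can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the thesis: the mixtures use diagonal covariances, their parameters are given rather than trained, and the indetermination zone is assumed to be the region where the absolute log-likelihood ratio stays below a threshold. The names `gmm_log_likelihood`, `decide` and `margin`, as well as all numeric values, are hypothetical.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of a feature vector x under a diagonal-covariance GMM."""
    terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, m, v in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        terms.append(log_p)
    # Log-sum-exp over the mixture components for numerical stability.
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def decide(x, normal_gmm, patho_gmm, margin=1.0):
    """Three-way decision based on the log-likelihood ratio.

    margin is a hypothetical half-width of the indetermination zone.
    """
    llr = gmm_log_likelihood(x, *patho_gmm) - gmm_log_likelihood(x, *normal_gmm)
    if llr > margin:
        return "pathological"
    if llr < -margin:
        return "normal"
    return "undetermined"


# Hypothetical models: (weights, means, variances) for two features.
normal_model = ([0.5, 0.5], [(0.0, 0.0), (0.5, 0.5)], [(1.0, 1.0), (1.0, 1.0)])
patho_model = ([1.0], [(3.0, 3.0)], [(1.0, 1.0)])
label = decide((2.9, 3.1), normal_model, patho_model)
```

Samples that fall in the zone are reported as undetermined instead of being forced into one of the two classes, so the clinician only receives a normal or pathological label when the two models disagree clearly.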

File Type: pdf
File Size: 8 MB
Publication Year: 2011
Author: Dubuisson, Thomas
Supervisors: Thierry Dutoit
Institution: University of Mons
Keywords: Speech signal, glottal source estimation, zeros of the z-transform, voice pathology, detection, features selection, mutual information, Gaussian Mixture Models