Automatic Speaker Characterization; Identification of Gender, Age, Language and Accent from Speech Signals

Speech signals carry important information about a speaker such as age, gender, language, accent and emotional/psychological state. Automatic recognition of speaker characteristics has a wide range of commercial, medical and forensic applications such as interactive voice response systems, service customization, natural human-machine interaction, recognizing the type of pathology of speakers, and directing the forensic investigation process. This research aims to develop accurate methods and tools to identify different physical characteristics of the speakers. Due to the lack of required databases, among all characteristics of speakers, our experiments cover gender recognition, age estimation, language recognition and accent/dialect identification. However, similar approaches and techniques can be applied to identify other characteristics such as emotional/psychological state. For speaker characterization, we first convert variable-duration speech signals into fixed-dimensional vectors suitable for classification/regression algorithms. This is performed by fitting a probability density function to acoustic ...

Bahari, Mohamad Hasan — KU Leuven


Automatic Recognition of Ageing Speakers

The process of ageing causes changes to the voice over time. There have been significant research efforts in the automatic speaker recognition community towards improving performance in the presence of everyday variability. The influence of long-term variability, due to vocal ageing, has received only marginal attention however. In this Thesis, the impact of vocal ageing on speaker verification and forensic speaker recognition is assessed, and novel methods are proposed to counteract its effect. The Trinity College Dublin Speaker Ageing (TCDSA) database, compiled for this study, is first introduced. Containing 26 speakers, with recordings spanning an age difference of between 28 and 58 years per speaker, it is the largest longitudinal speech database in the public domain. A Gaussian Mixture Model-Universal Background Model (GMM-UBM) speaker verification experiment demonstrates a progressive decline in the scores of genuine-speakers as the age difference between ...

Kelly, Finnian — Trinity College Dublin


Discrete-time speech processing with application to emotion recognition

The subject of this PhD thesis is the efficient and robust processing and analysis of the audio recordings that are derived from a call center. The thesis comprises two parts. The first part is dedicated to dialogue/non-dialogue detection and to speaker segmentation. The systems that are developed are prerequisite for detecting (i) the audio segments that actually contain a dialogue between the system and the call center customer and (ii) the change points between the system and the customer. This way the volume of the audio recordings that need to be processed is significantly reduced, while the system is automated. To detect the presence of a dialogue several systems are developed. This is the first effort in the international literature in which the audio channel is exclusively exploited. Also, it is the first time that the speaker utterance ...

Kotti, Margarita — Aristotle University of Thessaloniki


Forensic Evaluation of the Evidence Using Automatic Speaker Recognition Systems

This Thesis is focused on the use of automatic speaker recognition systems for forensic identification, in what is called forensic automatic speaker recognition. More generally, forensic identification aims at individualization, defined as the certainty of distinguishing an object or person from any other in a given population. This objective is followed by the analysis of the forensic evidence, understood as the comparison between two samples of material, such as glass, blood, speech, etc. An automatic speaker recognition system can be used in order to perform such comparison between some recovered speech material of questioned origin (e.g., an incriminating wire-tapping) and some control speech material coming from a suspect (e.g., recordings acquired in police facilities). However, the evaluation of such evidence is not a trivial issue at all. In fact, the debate about the presentation of forensic evidence in a court ...

Ramos, Daniel — Universidad Autonoma de Madrid


Perception and Production of Greek Vowels by Egyptian Arabic Learners of Greek as a Second Language

The purpose of the thesis is the investigation of the perception and production of the Cypriot Greek vowels by Egyptian Arab learners of Greek as a second language (L2). This particular group of adult learners has been taught Greek in formal education settings (schools, universities) while also living permanently in a country where Greek is dominant. Moreover, the study aims to show the effect of the pedagogical intervention (vowel instruction/training) on the perception and production of the Greek vowels by the adult L2 learners. The thesis employs the theoretical hypotheses of two models: the Speech Learning Model (SLM) and the Perceptual Assimilation Model (PAM). The present study constitutes the first cross-linguistic study which examines the perception and production of Greek segments by learners with Arabic first language (L1) background while the studies provided by the bibliography regarding the acquisition of ...

Georgios P. Georgiou — University of Cyprus


Dealing with Variability Factors and Its Application to Biometrics at a Distance

This Thesis is focused on dealing with the variability factors in biometric recognition and applications of biometrics at a distance. In particular, this PhD Thesis explores the problem of variability factors assessment and how to deal with them by the incorporation of soft biometrics information in order to improve person recognition systems working at a distance. The proposed methods, supported by experimental results, show the benefits of adapting the system considering the variability of the sample at hand. Although relatively young compared to other mature and long-used security technologies, biometrics have emerged in the last decade as a compelling alternative for applications where automatic recognition of people is needed. Certainly, biometrics are very attractive and useful for video surveillance systems at a distance, widely distributed in our lives, and for the final user: forget about PINs and passwords, you ...

Tome, Pedro — Universidad Autónoma de Madrid


Decision threshold estimation and model quality evaluation techniques for speaker verification

The number of biometric applications has increased considerably in the last few years. In this context, automatic person recognition by physical traits such as fingerprints, face, voice or iris plays an important role. Users increasingly demand this type of application and the technology seems already mature. People look for security, low cost and accuracy but, at the same time, there are many other factors in connection with biometric applications that are growing in importance. Intrusiveness is undoubtedly a pressing factor in deciding which biometrics we will use for our application. At this point, one can appreciate the suitability of speaker recognition because voice is the natural way of communicating, can be used remotely and is low cost. Automatic speaker recognition is commonly used in telephonic applications although it can also be used in ...

Rodriguez Saeta, Javier — Universitat Politecnica de Catalunya


Vision Based Sign Language Recognition: Modeling and Recognizing Isolated Signs With Manual and Non-manual Components

This thesis addresses the problem of vision based sign language recognition and focuses on three main tasks to design improved techniques that increase the performance of sign language recognition systems. We first attack the markerless tracking problem during natural and unrestricted signing in less restricted environments. We propose a joint particle filter approach for tracking multiple identical objects, in our case the two hands and the face, which is robust to situations including fast movement, interactions and occlusions. Our experiments show that the proposed approach has a robust tracking performance during the challenging situations and is suitable for tracking long durations of signing with its ability of fast recovery. Second, we attack the problem of the recognition of signs that include both manual (hand gestures) and non-manual (head/body gestures) components. We investigated multi-modal fusion techniques to model the different temporal ...

Aran, Oya — Bogazici University


Biometric Sample Quality and Its Application to Multimodal Authentication Systems

This Thesis is focused on the quality assessment of biometric signals and its application to multimodal biometric systems. Since the establishment of biometrics as a specific research area in the late 90s, the biometric community has focused its efforts on the development of accurate recognition algorithms and nowadays, biometric recognition is a mature technology that is used in many applications. However, recent studies demonstrate how the performance of biometric systems is heavily affected by the quality of biometric signals. Quality measurement has emerged in the biometric community as an important concern after the poor performance observed in biometric systems on certain pathological samples. We first summarize the state-of-the-art in the biometric quality problem. We present the factors influencing biometric quality, which mainly have to do with four issues: the individual itself, the sensor used in the acquisition, the ...

Alonso-Fernandez, Fernando — Universidad Politecnica de Madrid


Confidence Measures for Speech/Speaker Recognition and Applications on Turkish LVCSR

Confidence measures for the results of speech/speaker recognition make the systems more useful in real-time applications. Confidence measures provide a test statistic for accepting or rejecting the recognition hypothesis of the speech/speaker recognition system. Speech/speaker recognition systems are usually based on statistical modeling techniques. In this thesis we defined confidence measures for statistical modeling techniques used in speech/speaker recognition systems. For speech recognition we tested available confidence measures and the newly defined acoustic prior information based confidence measure in two different conditions which cause errors: the out-of-vocabulary words and presence of additive noise. We showed that the newly defined confidence measure performs better in both tests. Review of speech recognition and speaker recognition techniques and some related statistical methods is given through the thesis. We defined also ...

Mengusoglu, Erhan — Universite de Mons


Improving Speech Recognition for Pluricentric Languages exemplified on Varieties of German

A method is presented to improve speech recognition for pluricentric languages. Both the effect of adaptation of acoustic data and phonetic transcriptions for several subregions of the German speaking area are investigated and discussed. All experiments were carried out for German spoken in Germany and Austria using large telephone databases (Speech-Dat). In the first part triphone-based acoustic models (AMOs) were trained for several regions and their word error rates (WERs) were compared. The WERs vary between 9.89% and 21.78% and demonstrate the importance of regional variety adaptation. In the pronunciation modeling part narrow phonetic transcriptions for a subset of the Austrian database were carried out to derive pronunciation rules for Austrian German and to generate phonetic lexica for Austrian German which are the first of their kind. These lexica were used for both triphone-based and monophone-based AMOs with German and ...

Micha Baum — TU Graz


A Robust Face Recognition Algorithm for Real-World Applications

Face recognition is one of the most challenging problems of computer vision and pattern recognition. The difficulty in face recognition arises mainly from facial appearance variations caused by factors such as expression, illumination, partial face occlusion, and time gap between training and testing data capture. Moreover, the performance of face recognition algorithms depends heavily on the prior facial feature localization step. That is, face images need to be aligned very well before they are fed into a face recognition algorithm, which requires precise facial feature localization. This thesis addresses these two main problems (facial appearance variations due to changes in expression, illumination, occlusion and time gap, and imprecise face alignment due to mislocalized facial features) in order to accomplish its goal of building a generic face recognition algorithm that can function reliably under real-world conditions. The proposed face recognition algorithm ...

Ekenel, Hazim Kemal — University of Karlsruhe


Adapted Fusion Schemes for Multimodal Biometric Authentication

This Thesis is focused on the combination of multiple biometric traits for automatic person authentication, in what is called a multimodal biometric system. More generally, any type of biometric information can be combined in what is called a multibiometric system. The information sources in multibiometrics include not only multiple biometric traits but also multiple sensors, multiple biometric instances (e.g., different fingers in fingerprint verification), repeated instances, and multiple algorithms. Most of the approaches found in the literature for combining these various information sources are based on the combination of the matching scores provided by individual systems built on the different biometric evidences. The combination schemes following this architecture are typically based on combination rules or trained pattern classifiers, and most of them assume that the score level fusion function is fixed at verification time. This Thesis considers the problem of ...

Fierrez, Julian — Universidad Politecnica de Madrid


Deep learning for semantic description of visual human traits

The recent progress in artificial neural networks (rebranded as “deep learning”) has significantly boosted the state-of-the-art in numerous domains of computer vision, offering an opportunity to approach problems which were hardly solvable with conventional machine learning. Thus, in the frame of this PhD study, we explore how deep learning techniques can help in the analysis of two of the most basic and essential semantic traits revealed by a human face, namely, gender and age. In particular, two complementary problem settings are considered: (1) gender/age prediction from given face images, and (2) synthesis and editing of human faces with the required gender/age attributes. The Convolutional Neural Network (CNN) has become a standard model for image-based object recognition in general, and therefore is a natural choice for addressing the first of these two problems. However, our preliminary studies have shown that the ...

Antipov, Grigory — Télécom ParisTech (Eurecom)


Robust Speech Recognition on Intelligent Mobile Devices with Dual-Microphone

Despite the outstanding progress made on automatic speech recognition (ASR) throughout the last decades, noise-robust ASR still poses a challenge. Tackling acoustic noise in ASR systems is more important than ever before for a twofold reason: 1) ASR technology has begun to be extensively integrated in intelligent mobile devices (IMDs) such as smartphones to easily accomplish different tasks (e.g. search-by-voice), and 2) IMDs can be used anywhere at any time, that is, under many different acoustic (noisy) conditions. On the other hand, with the aim of enhancing noisy speech, IMDs have begun to embed small microphone arrays, i.e. microphone arrays comprised of a few sensors close to each other. These multi-sensor IMDs often embed one microphone (usually at their rear) intended to capture the acoustic environment more than the speaker’s voice. This is the so-called secondary microphone. While classical microphone ...

López-Espejo, Iván — University of Granada
