Automatic Speaker Characterization; Identification of Gender, Age, Language and Accent from Speech Signals

Speech signals carry important information about a speaker such as age, gender, language, accent and emotional/psychological state. Automatic recognition of speaker characteristics has a wide range of commercial, medical and forensic applications such as interactive voice response systems, service customization, natural human-machine interaction, recognizing the type of pathology of speakers, and directing the forensic investigation process. This research aims to develop accurate methods and tools to identify different physical characteristics of the speakers. Due to the lack of required databases, among all characteristics of speakers, our experiments cover gender recognition, age estimation, language recognition and accent/dialect identification. However, similar approaches and techniques can be applied to identify other characteristics such as emotional/psychological state. For speaker characterization, we first convert variable-duration speech signals into fixed-dimensional vectors suitable for classification/regression algorithms. This is performed by fitting a probability density function to acoustic ...

Bahari, Mohamad Hasan — KU Leuven


Fusing prosodic and acoustic information for speaker recognition

Automatic speaker recognition is the use of a machine to identify an individual from a spoken sentence. Recently, this technology has been undergone an increasing use in applications such as access control, transaction authentication, law enforcement, forensics, and system customisation, among others. One of the central questions addressed by this field is what is it in the speech signal that conveys speaker identity. Traditionally, automatic speaker recognition systems have relied mostly on short-term features related to the spectrum of the voice. However, human speaker recognition relies on other sources of information; therefore, there is reason to believe that these sources can play also an important role in the automatic speaker recognition task, adding complementary knowledge to the traditional spectrum-based recognition systems and thus improving their accuracy. The main objective of this thesis is to add prosodic information to a traditional ...

Farrus, Mireia — Universitat Politecnica de Catalunya


Forensic Evaluation of the Evidence Using Automatic Speaker Recognition Systems

This Thesis is focused on the use of automatic speaker recognition systems for forensic identification, in what is called forensic automatic speaker recognition. More generally, forensic identification aims at individualization, defined as the certainty of distinguishing an object or person from any other in a given population. This objective is followed by the analysis of the forensic evidence, understood as the comparison between two samples of material, such as glass, blood, speech, etc. An automatic speaker recognition system can be used in order to perform such comparison between some recovered speech material of questioned origin (e.g., an incriminating wire-tapping) and some control speech material coming from a suspect (e.g., recordings acquired in police facilities). However, the evaluation of such evidence is not a trivial issue at all. In fact, the debate about the presentation of forensic evidence in a court ...

Ramos, Daniel — Universidad Autonoma de Madrid


Dealing with Variability Factors and Its Application to Biometrics at a Distance

This Thesis is focused on dealing with the variability factors in biometric recognition and applications of biometrics at a distance. In particular, this PhD Thesis explores the problem of variability factors assessment and how to deal with them by the incorporation of soft biometrics information in order to improve person recognition systems working at a distance. The proposed methods supported by experimental results show the benefits of adapting the system considering the variability of the sample at hand. Although being relatively young compared to other mature and long-used security technologies, biometrics have emerged in the last decade as a pushing alternative for applications where automatic recognition of people is needed. Certainly, biometrics are very attractive and useful for video surveillance systems at a distance, widely distributed in our lifes, and for the final user: forget about PINs and passwords, you ...

Tome, Pedro — Universidad Autónoma de Madrid


Decision threshold estimation and model quality evaluation techniques for speaker verification

The number of biometric applications has increased a lot in the last few years. In this context, the automatic person recognition by some physical traits like fingerprints, face, voice or iris, plays an important role. Users demand this type of applications every time more and the technology seems already mature. People look for security, low cost and accuracy but, at the same time, there are many other factors in connection with biometric applications that are growing in importance. Intrusiveness is undoubtedly a burning factor to decide about the biometrics we will used for our application. At this point, one can realize about the suitability of speaker recognition because voice is the natural way of communicating, can be remotely used and provides a low cost. Automatic speaker recognition is commonly used in telephonic applications although it can also be used in ...

Rodriguez Saeta, Javier — Universitat Politecnica de Catalunya


Adapted Fusion Schemes for Multimodal Biometric Authentication

This Thesis is focused on the combination of multiple biometric traits for automatic person authentication, in what is called a multimodal biometric system. More generally, any type of biometric information can be combined in what is called a multibiometric system. The information sources in multibiometrics include not only multiple biometric traits but also multiple sensors, multiple biometric instances (e.g., different fingers in fingerprint verification), repeated instances, and multiple algorithms. Most of the approaches found in the literature for combining these various information sources are based on the combination of the matching scores provided by individual systems built on the different biometric evidences. The combination schemes following this architecture are typically based on combination rules or trained pattern classifiers, and most of them assume that the score level fusion function is fixed at verification time. This Thesis considers the problem of ...

Fierrez, Julian — Universidad Politecnica de Madrid


Deep learning for semantic description of visual human traits

The recent progress in artificial neural networks (rebranded as “deep learning”) has significantly boosted the state-of-the-art in numerous domains of computer vision offering an opportunity to approach the problems which were hardly solvable with conventional machine learning. Thus, in the frame of this PhD study, we explore how deep learning techniques can help in the analysis of one the most basic and essential semantic traits revealed by a human face, namely, gender and age. In particular, two complementary problem settings are considered: (1) gender/age prediction from given face images, and (2) synthesis and editing of human faces with the required gender/age attributes. Convolutional Neural Network (CNN) has currently become a standard model for image-based object recognition in general, and therefore, is a natural choice for addressing the first of these two problems. However, our preliminary studies have shown that the ...

Antipov, Grigory — Télécom ParisTech (Eurecom)


Automatic Signature and Graphical Password Verification: Discriminant Features and New Application Scenarios

The proliferation of handheld devices such as smartphones and tablets brings a new scenario for biometric authentication, and in particular to automatic signature verification. Research on signature verification has been traditionally carried out using signatures acquired on digitizing tablets or Tablet-PCs. This PhD Thesis addresses the problem of user authentication on handled devices using handwritten signatures and graphical passwords based on free-form doodles, as well as the effects of biometric aging on signatures. The Thesis pretends to analyze: (i) which are the effects of mobile conditions on signature and doodle verification, (ii) which are the most distinctive features in mobile conditions, extracted from the pen or fingertip trajectory, (iii) how do different similarity computation (i.e. matching) algorithms behave with signatures and graphical passwords captured on mobile conditions, and (iv) what is the impact of aging on signature features and verification ...

Martinez-Diaz, Marcos — Universidad Autonoma de Madrid


Deep Learning for i-Vector Speaker and Language Recognition

Over the last few years, i-vectors have been the state-of-the-art technique in speaker and language recognition. Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors but the DL techniques in use are computationally expensive and need speaker or/and phonetic labels for the background data, which are not easily accessible in practice. On the other hand, the lack of speaker-labeled background data makes a big performance gap, in speaker recognition, between two well-known cosine and Probabilistic Linear Discriminant Analysis (PLDA) i-vector scoring techniques. It has recently been a challenge how to fill this gap without speaker labels, which are expensive in practice. Although some unsupervised clustering techniques are proposed to estimate the speaker labels, they cannot accurately estimate the labels. This thesis tries to solve the problems above by using the DL technology in different ways, without ...

Ghahabi, Omid — Universitat Politecnica de Catalunya


Advances in Glottal Analysis and its Applications

From artificial voices in GPS to automatic systems of dictation, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking for efficiency enhancement with simultaneous cost saving, the market of speech technology is forecast to be especially promising in the next years. The present thesis deals with advances in glottal analysis in order to incorporate new techniques within speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, and called glottal flow, is expected to exhibit a relevant complementarity. Unfortunately, glottal analysis from speech recordings requires specific complex processing operations, which explains why it has been generally avoided. The main goal of this thesis is to provide new advances in glottal analysis ...

Drugman, Thomas — Universite de Mons


Representation and Metric Learning Advances for Deep Neural Network Face and Speaker Biometric Systems

The increasing use of technological devices and biometric recognition systems in people daily lives has motivated a great deal of research interest in the development of effective and robust systems. However, there are still some challenges to be solved in these systems when Deep Neural Networks (DNNs) are employed. For this reason, this thesis proposes different approaches to address these issues. First of all, we have analyzed the effect of introducing the most widespread DNN architectures to develop systems for face and text-dependent speaker verification tasks. In this analysis, we observed that state-of-the-art DNNs established for many tasks, including face verification, did not perform efficiently for text-dependent speaker verification. Therefore, we have conducted a study to find the cause of this poor performance and we have noted that under certain circumstances this problem is due to the use of a ...

Mingote, Victoria — University of Zaragoza


Joint Source-Cryptographic-Channel Coding for Real-Time Secure Voice Communications on Voice Channels

The growing risk of privacy violation and espionage associated with the rapid spread of mobile communications renewed interest in the original concept of sending encrypted voice as audio signal over arbitrary voice channels. The usual methods used for encrypted data transmission over analog telephony turned out to be inadequate for modern vocal links (cellular networks, VoIP) equipped with voice compression, voice activity detection, and adaptive noise suppression algorithms. The limited available bandwidth, nonlinear channel distortion, and signal fadings motivate the investigation of a dedicated, joint approach for speech encoding and encryption adapted to modern noisy voice channels. This thesis aims to develop, analyze, and validate secure and efficient schemes for real-time speech encryption and transmission via modern voice channels. In addition to speech encryption, this study covers the security and operational aspects of the whole voice communication system, as this ...

Krasnowski, Piotr — Université Côte d'Azur


Direct Pore-based Identification For Fingerprint Matching Process

Fingerprint, is considered one of the most crucial scientific tools in solving criminal cases. This biometric feature is composed of unique and distinctive patterns found on the fingertips of each individual. With advancing technology and progress in forensic sciences, fingerprint analysis plays a vital role in forensic investigations and the analysis of evidence at crime scenes. The fingerprint patterns of each individual start to develop in early stagesof life and never change thereafter. This fact makes fingerprints an exceptional means of identification. In criminal cases, fingerprint analysis is used to decipher traces, evidence, and clues at crime scenes. These analyses not only provide insights into how a crime was committed but also assist in identifying the culprits or individuals involved. Computer-based fingerprint identification systems yield faster and more accurate results compared to traditional methods, making fingerprint comparisons in large databases ...

Vedat DELICAN, PhD — Istanbul Technical University


Deep Learning-based Speaker Verification In Real Conditions

Smart applications like speaker verification have become essential in verifying the user's identity for availing of personal assistants or online banking services based on the user's voice characteristics. However, far-field or distant speaker verification is constantly affected by surrounding noises which can severely distort the speech signal. Moreover, speech signals propagating in long-range get reflected by various objects in the surrounding area, which creates reverberation and further degrades the signal quality. This PhD thesis explores deep learning-based multichannel speech enhancement techniques to improve the performance of speaker verification systems in real conditions. Multichannel speech enhancement aims to enhance distorted speech using multiple microphones. It has become crucial to many smart devices, which are flexible and convenient for speech applications. Three novel approaches are proposed to improve the robustness of speaker verification systems in noisy and reverberated conditions. Firstly, we integrate ...

Dowerah Sandipana — Universite de Lorraine, CNRS, Inria, Loria


Glottal Source Estimation and Automatic Detection of Dysphonic Speakers

Among all the biomedical signals, speech is among the most complex ones since it is produced and received by humans. The extraction and the analysis of the information conveyed by this signal are the basis of many applications, including the topics discussed in this thesis: the estimation of the glottal source and the automatic detection of voice pathologies. In the first part of the thesis, after a presentation of existing methods for the estimation of the glottal source, a focus is made on the occurence of irregular glottal source estimations when the representation based on the Zeros of the Z-Transform (ZZT) is concerned. As this method is sensitive to the location of the analysis window, it is proposed to regularize the estimation by shifting the analysis window around its initial location. The best shift is found by using a dynamic ...

Dubuisson, Thomas — University of Mons

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.