Dealing with Variability Factors and Its Application to Biometrics at a Distance

This Thesis is focused on dealing with the variability factors in biometric recognition and applications of biometrics at a distance. In particular, this PhD Thesis explores the problem of variability factors assessment and how to deal with them by the incorporation of soft biometrics information in order to improve person recognition systems working at a distance. The proposed methods supported by experimental results show the benefits of adapting the system considering the variability of the sample at hand. Although being relatively young compared to other mature and long-used security technologies, biometrics have emerged in the last decade as a pushing alternative for applications where automatic recognition of people is needed. Certainly, biometrics are very attractive and useful for video surveillance systems at a distance, widely distributed in our lifes, and for the final user: forget about PINs and passwords, you ...

Tome, Pedro — Universidad Autónoma de Madrid


Automatic Speaker Characterization; Identification of Gender, Age, Language and Accent from Speech Signals

Speech signals carry important information about a speaker such as age, gender, language, accent and emotional/psychological state. Automatic recognition of speaker characteristics has a wide range of commercial, medical and forensic applications such as interactive voice response systems, service customization, natural human-machine interaction, recognizing the type of pathology of speakers, and directing the forensic investigation process. This research aims to develop accurate methods and tools to identify different physical characteristics of the speakers. Due to the lack of required databases, among all characteristics of speakers, our experiments cover gender recognition, age estimation, language recognition and accent/dialect identification. However, similar approaches and techniques can be applied to identify other characteristics such as emotional/psychological state. For speaker characterization, we first convert variable-duration speech signals into fixed-dimensional vectors suitable for classification/regression algorithms. This is performed by fitting a probability density function to acoustic ...

Bahari, Mohamad Hasan — KU Leuven


Automatic Recognition of Ageing Speakers

The process of ageing causes changes to the voice over time. There have been significant research efforts in the automatic speaker recognition community towards improving performance in the presence of everyday variability. The influence of long-term variability, due to vocal ageing, has received only marginal attention however. In this Thesis, the impact of vocal ageing on speaker verification and forensic speaker recognition is assessed, and novel methods are proposed to counteract its effect. The Trinity College Dublin Speaker Ageing (TCDSA) database, compiled for this study, is first introduced. Containing 26 speakers, with recordings spanning an age difference of between 28 and 58 years per speaker, it is the largest longitudinal speech database in the public domain. A Gaussian Mixture Model-Universal Background Model (GMM-UBM) speaker verification experiment demonstrates a progressive decline in the scores of genuine-speakers as the age difference between ...

Kelly, Finnian — Trinity College Dublin


Direct Pore-based Identification For Fingerprint Matching Process

Fingerprint, is considered one of the most crucial scientific tools in solving criminal cases. This biometric feature is composed of unique and distinctive patterns found on the fingertips of each individual. With advancing technology and progress in forensic sciences, fingerprint analysis plays a vital role in forensic investigations and the analysis of evidence at crime scenes. The fingerprint patterns of each individual start to develop in early stagesof life and never change thereafter. This fact makes fingerprints an exceptional means of identification. In criminal cases, fingerprint analysis is used to decipher traces, evidence, and clues at crime scenes. These analyses not only provide insights into how a crime was committed but also assist in identifying the culprits or individuals involved. Computer-based fingerprint identification systems yield faster and more accurate results compared to traditional methods, making fingerprint comparisons in large databases ...

Vedat DELICAN, PhD — Istanbul Technical University


Decision threshold estimation and model quality evaluation techniques for speaker verification

The number of biometric applications has increased a lot in the last few years. In this context, the automatic person recognition by some physical traits like fingerprints, face, voice or iris, plays an important role. Users demand this type of applications every time more and the technology seems already mature. People look for security, low cost and accuracy but, at the same time, there are many other factors in connection with biometric applications that are growing in importance. Intrusiveness is undoubtedly a burning factor to decide about the biometrics we will used for our application. At this point, one can realize about the suitability of speaker recognition because voice is the natural way of communicating, can be remotely used and provides a low cost. Automatic speaker recognition is commonly used in telephonic applications although it can also be used in ...

Rodriguez Saeta, Javier — Universitat Politecnica de Catalunya


Adapted Fusion Schemes for Multimodal Biometric Authentication

This Thesis is focused on the combination of multiple biometric traits for automatic person authentication, in what is called a multimodal biometric system. More generally, any type of biometric information can be combined in what is called a multibiometric system. The information sources in multibiometrics include not only multiple biometric traits but also multiple sensors, multiple biometric instances (e.g., different fingers in fingerprint verification), repeated instances, and multiple algorithms. Most of the approaches found in the literature for combining these various information sources are based on the combination of the matching scores provided by individual systems built on the different biometric evidences. The combination schemes following this architecture are typically based on combination rules or trained pattern classifiers, and most of them assume that the score level fusion function is fixed at verification time. This Thesis considers the problem of ...

Fierrez, Julian — Universidad Politecnica de Madrid


Fusing prosodic and acoustic information for speaker recognition

Automatic speaker recognition is the use of a machine to identify an individual from a spoken sentence. Recently, this technology has been undergone an increasing use in applications such as access control, transaction authentication, law enforcement, forensics, and system customisation, among others. One of the central questions addressed by this field is what is it in the speech signal that conveys speaker identity. Traditionally, automatic speaker recognition systems have relied mostly on short-term features related to the spectrum of the voice. However, human speaker recognition relies on other sources of information; therefore, there is reason to believe that these sources can play also an important role in the automatic speaker recognition task, adding complementary knowledge to the traditional spectrum-based recognition systems and thus improving their accuracy. The main objective of this thesis is to add prosodic information to a traditional ...

Farrus, Mireia — Universitat Politecnica de Catalunya


Discrete-time speech processing with application to emotion recognition

The subject of this PhD thesis is the efficient and robust processing and analysis of the audio recordings that are derived from a call center. The thesis is comprised of two parts. The first part is dedicated to dialogue/non-dialogue detection and to speaker segmentation. The systems that are developed are prerequisite for detecting (i) the audio segments that actually contain a dialogue between the system and the call center customer and (ii) the change points between the system and the customer. This way the volume of the audio recordings that need to be processed is significantly reduced, while the system is automated. To detect the presence of a dialogue several systems are developed. This is the first effort found in the international literature that the audio channel is exclusively exploited. Also, it is the first time that the speaker utterance ...

Kotti, Margarita — Aristotle University of Thessaloniki


Vulnerabilities and Attack Protection in Security Systems Based on Biometric Recognition

Absolute security does not exist: given funding, willpower and the proper technology, every security system can be compromised. However, the objective of the security community should be to develop such applications that the funding, the will, and the resources needed by the attacker to crack the system prevent him from attempting to do so. This Thesis is focused on the vulnerability assessment of biometric systems. Although being relatively young compared to other mature and long-used security technologies, biometrics have emerged in the last decade as a pushing alternative for applications where automatic recognition of people is needed. Certainly, biometrics are very attractive and useful for the final user: forget about PINs and passwords, you are your own key. However, we cannot forget that as any technology aimed to provide a security service, biometric systems are exposed to external attacks which ...

Javier Galbally — Universidad Autonoma de Madrid


Deep Learning for i-Vector Speaker and Language Recognition

Over the last few years, i-vectors have been the state-of-the-art technique in speaker and language recognition. Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors but the DL techniques in use are computationally expensive and need speaker or/and phonetic labels for the background data, which are not easily accessible in practice. On the other hand, the lack of speaker-labeled background data makes a big performance gap, in speaker recognition, between two well-known cosine and Probabilistic Linear Discriminant Analysis (PLDA) i-vector scoring techniques. It has recently been a challenge how to fill this gap without speaker labels, which are expensive in practice. Although some unsupervised clustering techniques are proposed to estimate the speaker labels, they cannot accurately estimate the labels. This thesis tries to solve the problems above by using the DL technology in different ways, without ...

Ghahabi, Omid — Universitat Politecnica de Catalunya


Modelling context in automatic speech recognition

Speech is at the core of human communication. Speaking and listing comes so natural to us that we do not have to think about it at all. The underlying cognitive processes are very rapid and almost completely subconscious. It is hard, if not impossible not to understand speech. For computers on the other hand, recognising speech is a daunting task. It has to deal with a large number of different voices "influenced, among other things, by emotion, moods and fatigue" the acoustic properties of different environments, dialects, a huge vocabulary and an unlimited creativity of speakers to combine words and to break the rules of grammar. Almost all existing automatic speech recognisers use statistics over speech sounds "what is the probability that a piece of audio is an a-sound" and statistics over word combinations to deal with this complexity. The ...

Wiggers, Pascal — Delft University of Technology


Deep Learning Techniques for Visual Counting

The explosion of Deep Learning (DL) added a boost to the already rapidly developing field of Computer Vision to such a point that vision-based tasks are now parts of our everyday lives. Applications such as image classification, photo stylization, or face recognition are nowadays pervasive, as evidenced by the advent of modern systems trivially integrated into mobile applications. In this thesis, we investigated and enhanced the visual counting task, which automatically estimates the number of objects in still images or video frames. Recently, due to the growing interest in it, several Convolutional Neural Network (CNN)-based solutions have been suggested by the scientific community. These artificial neural networks, inspired by the organization of the animal visual cortex, provide a way to automatically learn effective representations from raw visual data and can be successfully employed to address typical challenges characterizing this task, ...

Ciampi Luca — University of Pisa


Dialogue Enhancement and Personalization - Contributions to Quality Assessment and Control

The production and delivery of audio for television involve many creative and technical challenges. One of them is concerned with the level balance between the foreground speech (also referred to as dialogue) and the background elements, e.g., music, sound effects, and ambient sounds. Background elements are fundamental for the narrative and for creating an engaging atmosphere, but they can mask the dialogue, which the audience wishes to follow in a comfortable way. Very different individual factors of the people in the audience clash with the creative freedom of the content creators. As a result, service providers receive regular complaints about difficulties in understanding the dialogue because of too loud background sounds. While this has been a known issue for at least three decades, works analyzing the problem and up-to-date statics were scarce before the contributions in this work. Enabling the ...

Torcoli, Matteo — Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)


Some Contributions to Music Signal Processing and to Mono-Microphone Blind Audio Source Separation

For humans, the sound is valuable mostly for its meaning. The voice is spoken language, music, artistic intent. Its physiological functioning is highly developed, as well as our understanding of the underlying process. It is a challenge to replicate this analysis using a computer: in many aspects, its capabilities do not match those of human beings when it comes to speech or instruments music recognition from the sound, to name a few. In this thesis, two problems are investigated: the source separation and the musical processing. The first part investigates the source separation using only one Microphone. The problem of sources separation arises when several audio sources are present at the same moment, mixed together and acquired by some sensors (one in our case). In this kind of situation it is natural for a human to separate and to recognize ...

Schutz, Antony — Eurecome/Mobile


Diplophonic Voice - Definitions, models, and detection

Voice disorders need to be better understood because they may lead to reduced job chances and social isolation. Correct treatment indication and treatment effect measurements are needed to tackle these problems. They must rely on robust outcome measures for clinical intervention studies. Diplophonia is a severe and often misunderstood sign of voice disorders. Depending on its underlying etiology, diplophonic patients typically receive treatment such as logopedic therapy or phonosurgery. In the current clinical practice diplophonia is determined auditively by the medical doctor, which is problematic from the viewpoints of evidence-based medicine and scientific methodology. The aim of this thesis is to work towards objective (i.e., automatic) detection of diplophonia. A database of 40 euphonic, 40 diplophonic and 40 dysphonic subjects has been acquired. The collected material consists of laryngeal high-speed videos and simultaneous high-quality audio recordings. All material has been ...

Aichinger, Philipp — Division of Phoniatrics-Logopedics, Department of Otorhinolaryngology, Medical University of Vienna; Signal Processing and Speech Communication Laboratory Graz University of Technology, Austria

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.