Automatic Recognition of Ageing Speakers (2014)
Speech signals carry important information about a speaker such as age, gender, language, accent and emotional/psychological state. Automatic recognition of speaker characteristics has a wide range of commercial, medical and forensic applications such as interactive voice response systems, service customization, natural human-machine interaction, recognizing the type of pathology of speakers, and directing the forensic investigation process. This research aims to develop accurate methods and tools to identify different physical characteristics of the speakers. Due to the lack of required databases, among all characteristics of speakers, our experiments cover gender recognition, age estimation, language recognition and accent/dialect identification. However, similar approaches and techniques can be applied to identify other characteristics such as emotional/psychological state. For speaker characterization, we first convert variable-duration speech signals into fixed-dimensional vectors suitable for classification/regression algorithms. This is performed by fitting a probability density function to acoustic ...
Bahari, Mohamad Hasan — KU Leuven
Forensic Evaluation of the Evidence Using Automatic Speaker Recognition Systems
This Thesis is focused on the use of automatic speaker recognition systems for forensic identification, in what is called forensic automatic speaker recognition. More generally, forensic identification aims at individualization, defined as the certainty of distinguishing an object or person from any other in a given population. This objective is followed by the analysis of the forensic evidence, understood as the comparison between two samples of material, such as glass, blood, speech, etc. An automatic speaker recognition system can be used in order to perform such comparison between some recovered speech material of questioned origin (e.g., an incriminating wire-tapping) and some control speech material coming from a suspect (e.g., recordings acquired in police facilities). However, the evaluation of such evidence is not a trivial issue at all. In fact, the debate about the presentation of forensic evidence in a court ...
Ramos, Daniel — Universidad Autonoma de Madrid
Fusing prosodic and acoustic information for speaker recognition
Automatic speaker recognition is the use of a machine to identify an individual from a spoken sentence. Recently, this technology has undergone increasing use in applications such as access control, transaction authentication, law enforcement, forensics, and system customisation, among others. One of the central questions addressed by this field is what it is in the speech signal that conveys speaker identity. Traditionally, automatic speaker recognition systems have relied mostly on short-term features related to the spectrum of the voice. However, human speaker recognition relies on other sources of information; therefore, there is reason to believe that these sources can also play an important role in the automatic speaker recognition task, adding complementary knowledge to the traditional spectrum-based recognition systems and thus improving their accuracy. The main objective of this thesis is to add prosodic information to a traditional ...
Farrus, Mireia — Universitat Politecnica de Catalunya
Deep learning for semantic description of visual human traits
The recent progress in artificial neural networks (rebranded as “deep learning”) has significantly boosted the state-of-the-art in numerous domains of computer vision, offering an opportunity to approach problems which were hardly solvable with conventional machine learning. Thus, in the frame of this PhD study, we explore how deep learning techniques can help in the analysis of two of the most basic and essential semantic traits revealed by a human face, namely gender and age. In particular, two complementary problem settings are considered: (1) gender/age prediction from given face images, and (2) synthesis and editing of human faces with the required gender/age attributes. The Convolutional Neural Network (CNN) has become a standard model for image-based object recognition in general, and therefore is a natural choice for addressing the first of these two problems. However, our preliminary studies have shown that the ...
Antipov, Grigory — Télécom ParisTech (Eurecom)
Dealing with Variability Factors and Its Application to Biometrics at a Distance
This Thesis is focused on dealing with the variability factors in biometric recognition and applications of biometrics at a distance. In particular, this PhD Thesis explores the problem of variability factors assessment and how to deal with them by the incorporation of soft biometrics information in order to improve person recognition systems working at a distance. The proposed methods, supported by experimental results, show the benefits of adapting the system considering the variability of the sample at hand. Although relatively young compared to other mature and long-used security technologies, biometrics have emerged in the last decade as a compelling alternative for applications where automatic recognition of people is needed. Certainly, biometrics are very attractive and useful for video surveillance systems at a distance, widely distributed in our lives, and for the final user: forget about PINs and passwords, you ...
Tome, Pedro — Universidad Autónoma de Madrid
Adapted Fusion Schemes for Multimodal Biometric Authentication
This Thesis is focused on the combination of multiple biometric traits for automatic person authentication, in what is called a multimodal biometric system. More generally, any type of biometric information can be combined in what is called a multibiometric system. The information sources in multibiometrics include not only multiple biometric traits but also multiple sensors, multiple biometric instances (e.g., different fingers in fingerprint verification), repeated instances, and multiple algorithms. Most of the approaches found in the literature for combining these various information sources are based on the combination of the matching scores provided by individual systems built on the different biometric evidences. The combination schemes following this architecture are typically based on combination rules or trained pattern classifiers, and most of them assume that the score level fusion function is fixed at verification time. This Thesis considers the problem of ...
Fierrez, Julian — Universidad Politecnica de Madrid
Deep Learning for i-Vector Speaker and Language Recognition
Over the last few years, i-vectors have been the state-of-the-art technique in speaker and language recognition. Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors, but the DL techniques in use are computationally expensive and need speaker and/or phonetic labels for the background data, which are not easily accessible in practice. On the other hand, the lack of speaker-labeled background data creates a large performance gap, in speaker recognition, between two well-known i-vector scoring techniques: cosine scoring and Probabilistic Linear Discriminant Analysis (PLDA). How to fill this gap without speaker labels, which are expensive in practice, has recently been a challenge. Although some unsupervised clustering techniques have been proposed to estimate the speaker labels, they cannot estimate the labels accurately. This thesis tries to solve the problems above by using the DL technology in different ways, without ...
Ghahabi, Omid — Universitat Politecnica de Catalunya
The proliferation of handheld devices such as smartphones and tablets brings a new scenario for biometric authentication, and in particular for automatic signature verification. Research on signature verification has traditionally been carried out using signatures acquired on digitizing tablets or Tablet-PCs. This PhD Thesis addresses the problem of user authentication on handheld devices using handwritten signatures and graphical passwords based on free-form doodles, as well as the effects of biometric aging on signatures. The Thesis aims to analyze: (i) what the effects of mobile conditions on signature and doodle verification are, (ii) which features, extracted from the pen or fingertip trajectory, are the most distinctive in mobile conditions, (iii) how different similarity computation (i.e. matching) algorithms behave with signatures and graphical passwords captured in mobile conditions, and (iv) what the impact of aging is on signature features and verification ...
Martinez-Diaz, Marcos — Universidad Autonoma de Madrid
The increasing use of technological devices and biometric recognition systems in people's daily lives has motivated a great deal of research interest in the development of effective and robust systems. However, there are still some challenges to be solved in these systems when Deep Neural Networks (DNNs) are employed. For this reason, this thesis proposes different approaches to address these issues. First of all, we have analyzed the effect of introducing the most widespread DNN architectures to develop systems for face and text-dependent speaker verification tasks. In this analysis, we observed that state-of-the-art DNNs established for many tasks, including face verification, did not perform efficiently for text-dependent speaker verification. Therefore, we have conducted a study to find the cause of this poor performance and we have noted that under certain circumstances this problem is due to the use of a ...
Mingote, Victoria — University of Zaragoza
Decision threshold estimation and model quality evaluation techniques for speaker verification
The number of biometric applications has increased considerably in the last few years. In this context, automatic person recognition by physical traits such as fingerprints, face, voice or iris plays an important role. Users demand this type of application more and more, and the technology already seems mature. People look for security, low cost and accuracy but, at the same time, there are many other factors in connection with biometric applications that are growing in importance. Intrusiveness is undoubtedly a key factor in deciding which biometrics to use for an application. At this point, one can appreciate the suitability of speaker recognition, because voice is the natural way of communicating, can be used remotely and is low cost. Automatic speaker recognition is commonly used in telephonic applications although it can also be used in ...
Rodriguez Saeta, Javier — Universitat Politecnica de Catalunya
Direct Pore-based Identification For Fingerprint Matching Process
The fingerprint is considered one of the most crucial scientific tools in solving criminal cases. This biometric feature is composed of unique and distinctive patterns found on the fingertips of each individual. With advancing technology and progress in forensic sciences, fingerprint analysis plays a vital role in forensic investigations and the analysis of evidence at crime scenes. The fingerprint patterns of each individual start to develop in the early stages of life and never change thereafter. This fact makes fingerprints an exceptional means of identification. In criminal cases, fingerprint analysis is used to decipher traces, evidence, and clues at crime scenes. These analyses not only provide insights into how a crime was committed but also assist in identifying the culprits or individuals involved. Computer-based fingerprint identification systems yield faster and more accurate results compared to traditional methods, making fingerprint comparisons in large databases ...
Vedat DELICAN, PhD — Istanbul Technical University
Advances in Glottal Analysis and its Applications
From artificial voices in GPS to automatic dictation systems, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking efficiency gains with simultaneous cost savings, the market for speech technology is forecast to be especially promising in the coming years. The present thesis deals with advances in glottal analysis in order to incorporate new techniques within speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, called the glottal flow, is expected to provide relevant complementary information. Unfortunately, glottal analysis from speech recordings requires specific complex processing operations, which explains why it has been generally avoided. The main goal of this thesis is to provide new advances in glottal analysis ...
Drugman, Thomas — Universite de Mons
Contributions to Human Motion Modeling and Recognition using Non-intrusive Wearable Sensors
This thesis contributes to motion characterization through inertial and physiological signals captured by wearable devices and analyzed using signal processing and deep learning techniques. This research leverages the possibilities of motion analysis for three main applications: to know what physical activity a person is performing (Human Activity Recognition), to identify who is performing that motion (user identification) or know how the movement is being performed (motor anomaly detection). Most previous research has addressed human motion modeling using invasive sensors in contact with the user or intrusive sensors that modify the user’s behavior while performing an action (cameras or microphones). In this sense, wearable devices such as smartphones and smartwatches can collect motion signals from users during their daily lives in a less invasive or intrusive way. Recently, there has been an exponential increase in research focused on inertial-signal processing to ...
Gil-Martín, Manuel — Universidad Politécnica de Madrid
Semantic Similarity in Automatic Speech Recognition for Meetings
This thesis investigates the application of language models based on semantic similarity to Automatic Speech Recognition for meetings. We consider data-driven Latent Semantic Analysis based and knowledge-driven WordNet-based models. Latent Semantic Analysis based models are trained for several background domains and it is shown that all background models reduce perplexity compared to the n-gram baseline models, and some background models also significantly improve speech recognition for meetings. A new method for interpolating multiple models is introduced and the relation to cache-based models is investigated. The semantics of the models is investigated through a synonymity task. WordNet-based models are defined for different word-word similarities that use information encoded in the WordNet graph and corpus information. It is shown that these models can significantly improve over baseline random models on the task of word prediction, and that the chosen part-of-speech context is ...
Pucher, Michael — Graz University of Technology
Robust Speech Recognition on Intelligent Mobile Devices with Dual-Microphone
Despite the outstanding progress made on automatic speech recognition (ASR) throughout the last decades, noise-robust ASR still poses a challenge. Tackling acoustic noise in ASR systems is more important than ever before for two reasons: 1) ASR technology has begun to be extensively integrated in intelligent mobile devices (IMDs) such as smartphones to easily accomplish different tasks (e.g. search-by-voice), and 2) IMDs can be used anywhere at any time, that is, under many different acoustic (noisy) conditions. On the other hand, with the aim of enhancing noisy speech, IMDs have begun to embed small microphone arrays, i.e. microphone arrays comprising a few sensors close to each other. These multi-sensor IMDs often embed one microphone (usually at their rear) intended to capture the acoustic environment more than the speaker’s voice. This is the so-called secondary microphone. While classical microphone ...
López-Espejo, Iván — University of Granada