Discrete-time speech processing with application to emotion recognition

The subject of this PhD thesis is the efficient and robust processing and analysis of the audio recordings that are derived from a call center. The thesis is comprised of two parts. The first part is dedicated to dialogue/non-dialogue detection and to speaker segmentation. The systems that are developed are prerequisite for detecting (i) the audio segments that actually contain a dialogue between the system and the call center customer and (ii) the change points between the system and the customer. This way the volume of the audio recordings that need to be processed is significantly reduced, while the system is automated. To detect the presence of a dialogue several systems are developed. This is the first effort found in the international literature that the audio channel is exclusively exploited. Also, it is the first time that the speaker utterance ...

Kotti, Margarita — Aristotle University of Thessaloniki


Emotion assessment for affective computing based on brain and peripheral signals

Current Human-Machine Interfaces (HMI) lack of “emotional intelligence”, i.e. they are not able to identify human emotional states and take this information into account to decide on the proper actions to execute. The goal of affective computing is to fill this lack by detecting emotional cues occurring during Human-Computer Interaction (HCI) and synthesizing emotional responses. In the last decades, most of the studies on emotion assessment have focused on the analysis of facial expressions and speech to determine the emotional state of a person. Physiological activity also includes emotional information that can be used for emotion assessment but has received less attention despite of its advantages (for instance it can be less easily faked than facial expressions). This thesis reports on the use of two types of physiological activities to assess emotions in the context of affective computing: the activity ...

Chanel, Guillaume — University of Geneva


Automatic Speaker Characterization; Identification of Gender, Age, Language and Accent from Speech Signals

Speech signals carry important information about a speaker such as age, gender, language, accent and emotional/psychological state. Automatic recognition of speaker characteristics has a wide range of commercial, medical and forensic applications such as interactive voice response systems, service customization, natural human-machine interaction, recognizing the type of pathology of speakers, and directing the forensic investigation process. This research aims to develop accurate methods and tools to identify different physical characteristics of the speakers. Due to the lack of required databases, among all characteristics of speakers, our experiments cover gender recognition, age estimation, language recognition and accent/dialect identification. However, similar approaches and techniques can be applied to identify other characteristics such as emotional/psychological state. For speaker characterization, we first convert variable-duration speech signals into fixed-dimensional vectors suitable for classification/regression algorithms. This is performed by fitting a probability density function to acoustic ...

Bahari, Mohamad Hasan — KU Leuven


Geometric Approach to Statistical Learning Theory through Support Vector Machines (SVM) with Application to Medical Diagnosis

This thesis deals with problems of Pattern Recognition in the framework of Machine Learning (ML) and, specifically, Statistical Learning Theory (SLT), using Support Vector Machines (SVMs). The focus of this work is on the geometric interpretation of SVMs, which is accomplished through the notion of Reduced Convex Hulls (RCHs), and its impact on the derivation of new, efficient algorithms for the solution of the general SVM optimization task. The contributions of this work is the extension of the mathematical framework of RCHs, the derivation of novel geometric algorithms for SVMs and, finally, the application of the SVM algorithms to the field of Medical Image Analysis and Diagnosis (Mammography). Geometric SVM Framework's extensions: The geometric interpretation of SVMs is based on the notion of Reduced Convex Hulls. Although the geometric approach to SVMs is very intuitive, its usefulness was restricted by ...

Mavroforakis, Michael — University of Athens


Contributions to the Information Fusion : application to Obstacle Recognition in Visible and Infrared Images

The interest for the intelligent vehicle field has been increased during the last years, must probably due to an important number of road accidents. Many accidents could be avoided if a device attached to the vehicle would assist the driver with some warnings when dangerous situations are about to appear. In recent years, leading car developers have recorded significant efforts and support research works regarding the intelligent vehicle field where they propose solutions for the existing problems, especially in the vision domain. Road detection and following, pedestrian or vehicle detection, recognition and tracking, night vision, among others are examples of applications which have been developed and improved recently. Still, a lot of challenges and unsolved problems remain in the intelligent vehicle domain. Our purpose in this thesis is to design an Obstacle Recognition system for improving the road security by ...

Apatean, Anca Ioana — Institut National des Sciences Appliquées de Rouen


Least squares support vector machines classification applied to brain tumour recognition using magnetic resonance spectroscopy

Magnetic Resonance Spectroscopy (MRS) is a technique which has evolved rapidly over the past 15 years. It has been used specifically in the context of brain tumours and has shown very encouraging correlations between brain tumour type and spectral pattern. In vivo MRS enables the quantification of metabolite concentrations non-invasively, thereby avoiding serious risks to brain damage. While Magnetic Resonance Imaging (MRI) is commonly used for identifying the location and size of brain tumours, MRS complements it with the potential to provide detailed chemical information about metabolites present in the brain tissue and enable an early detection of abnormality. However, the introduction of MRS in clinical medicine has been difficult due to problems associated with the acquisition of in vivo MRS signals from living tissues at low magnetic fields acceptable for patients. The low signal-to-noise ratio makes accurate analysis of ...

Lukas, Lukas — Katholieke Universiteit Leuven


Epigraphical splitting of convex constraints. Application to image recovery, supervised classification, and image forgery detection.

In this thesis, we present a convex optimization approach to address three problems arising in multicomponent image recovery, supervised classification, and image forgery detection. The common thread among these problems is the presence of nonlinear convex constraints difficult to handle with state-of-the-art methods. Therefore, we present a novel splitting technique to simplify the management of such constraints. Relying on this approach, we also propose some contributions that are tailored to the aforementioned applications. The first part of the thesis presents the epigraphical splitting of nonlinear convex constraints. The principle is to decompose the sublevel set of a block-separable function into a collection of epigraphs. So doing, we reduce the complexity of optimization algorithms when the above constraint involves the sum of absolute values, distance functions to a convex set, Euclidean norms, infinity norms, or max functions. We demonstrate through numerical ...

Chierchia, Giovanni — Telecom ParisTech


Automatic Analysis of Head and Facial Gestures in Video Streams

Automatic analysis of head gestures and facial expressions is a challenging research area and it has significant applications for intelligent human-computer interfaces. An important task is the automatic classification of non-verbal messages composed of facial signals where both facial expressions and head rotations are observed. This is a challenging task, because there is no definite grammar or code-book for mapping the non-verbal facial signals into a corresponding mental state. Furthermore, non-verbal facial signals and the observed emotions have dependency on personality, society, state of the mood and also the context in which they are displayed or observed. This thesis mainly addresses the three desired tasks for an effective visual information based automatic face and head gesture (FHG) analyzer. First we develop a fully automatic, robust and accurate 17-point facial landmark localizer based on local appearance information and structural information of ...

Cinar Akakin, Hatice — Bogazici University


Automatic Recognition of Ageing Speakers

The process of ageing causes changes to the voice over time. There have been significant research efforts in the automatic speaker recognition community towards improving performance in the presence of everyday variability. The influence of long-term variability, due to vocal ageing, has received only marginal attention however. In this Thesis, the impact of vocal ageing on speaker verification and forensic speaker recognition is assessed, and novel methods are proposed to counteract its effect. The Trinity College Dublin Speaker Ageing (TCDSA) database, compiled for this study, is first introduced. Containing 26 speakers, with recordings spanning an age difference of between 28 and 58 years per speaker, it is the largest longitudinal speech database in the public domain. A Gaussian Mixture Model-Universal Background Model (GMM-UBM) speaker verification experiment demonstrates a progressive decline in the scores of genuine-speakers as the age difference between ...

Kelly, Finnian — Trinity College Dublin


Offline Signature Verification with User-Based and Global Classifiers of Local Features

Signature verification deals with the problem of identifying forged signatures of a user from his/her genuine signatures. The difficulty lies in identifying allowed variations in a user’s signatures, in the presence of high intra-class and low inter-class variability (the forgeries may be more similar to a user’s genuine signature, compared to his/her other genuine signatures). The problem can be seen as a non-rigid object matching where classes are very similar. In the field of biometrics, signature is considered a behavioral biometric and the problem possesses further difficulties compared to other modalities (e.g. fingerprints) due to the added issue of skilled forgeries. A novel offline (image-based) signature verification system is proposed in this thesis. In order to capture the signature’s stable parts and alleviate the difficulty of global matching, local features (histogram of oriented gradients, local binary patterns) are used, based ...

Yılmaz, Mustafa Berkay — Sabancı University


Advances in Glottal Analysis and its Applications

From artificial voices in GPS to automatic systems of dictation, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking for efficiency enhancement with simultaneous cost saving, the market of speech technology is forecast to be especially promising in the next years. The present thesis deals with advances in glottal analysis in order to incorporate new techniques within speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, and called glottal flow, is expected to exhibit a relevant complementarity. Unfortunately, glottal analysis from speech recordings requires specific complex processing operations, which explains why it has been generally avoided. The main goal of this thesis is to provide new advances in glottal analysis ...

Drugman, Thomas — Universite de Mons


Deep Learning for i-Vector Speaker and Language Recognition

Over the last few years, i-vectors have been the state-of-the-art technique in speaker and language recognition. Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors but the DL techniques in use are computationally expensive and need speaker or/and phonetic labels for the background data, which are not easily accessible in practice. On the other hand, the lack of speaker-labeled background data makes a big performance gap, in speaker recognition, between two well-known cosine and Probabilistic Linear Discriminant Analysis (PLDA) i-vector scoring techniques. It has recently been a challenge how to fill this gap without speaker labels, which are expensive in practice. Although some unsupervised clustering techniques are proposed to estimate the speaker labels, they cannot accurately estimate the labels. This thesis tries to solve the problems above by using the DL technology in different ways, without ...

Ghahabi, Omid — Universitat Politecnica de Catalunya


Contributions to Human Motion Modeling and Recognition using Non-intrusive Wearable Sensors

This thesis contributes to motion characterization through inertial and physiological signals captured by wearable devices and analyzed using signal processing and deep learning techniques. This research leverages the possibilities of motion analysis for three main applications: to know what physical activity a person is performing (Human Activity Recognition), to identify who is performing that motion (user identification) or know how the movement is being performed (motor anomaly detection). Most previous research has addressed human motion modeling using invasive sensors in contact with the user or intrusive sensors that modify the user’s behavior while performing an action (cameras or microphones). In this sense, wearable devices such as smartphones and smartwatches can collect motion signals from users during their daily lives in a less invasive or intrusive way. Recently, there has been an exponential increase in research focused on inertial-signal processing to ...

Gil-Martín, Manuel — Universidad Politécnica de Madrid


Cross-Lingual Voice Conversion

Cross-lingual voice conversion refers to the automatic transformation of a source speaker’s voice to a target speaker’s voice in a language that the target speaker can not speak. It involves a set of statistical analysis, pattern recognition, machine learning, and signal processing techniques. This study focuses on the problems related to cross-lingual voice conversion by discussing open research questions, presenting new methods, and performing comparisons with the state-of-the-art techniques. In the training stage, a Phonetic Hidden Markov Model based automatic segmentation and alignment method is developed for cross-lingual applications which support textindependent and text-dependent modes. Vocal tract transformation function is estimated using weighted speech frame mapping in more detail. Adjusting the weights, similarity to target voice and output quality can be balanced depending on the requirements of the cross- lingual voice conversion application. A context-matching algorithm is developed to reduce ...

Turk, Oytun — Bogazici University


Adapted Fusion Schemes for Multimodal Biometric Authentication

This Thesis is focused on the combination of multiple biometric traits for automatic person authentication, in what is called a multimodal biometric system. More generally, any type of biometric information can be combined in what is called a multibiometric system. The information sources in multibiometrics include not only multiple biometric traits but also multiple sensors, multiple biometric instances (e.g., different fingers in fingerprint verification), repeated instances, and multiple algorithms. Most of the approaches found in the literature for combining these various information sources are based on the combination of the matching scores provided by individual systems built on the different biometric evidences. The combination schemes following this architecture are typically based on combination rules or trained pattern classifiers, and most of them assume that the score level fusion function is fixed at verification time. This Thesis considers the problem of ...

Fierrez, Julian — Universidad Politecnica de Madrid

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.