Automatic Analysis of Head and Facial Gestures in Video Streams

Automatic analysis of head gestures and facial expressions is a challenging research area and it has significant applications for intelligent human-computer interfaces. An important task is the automatic classification of non-verbal messages composed of facial signals where both facial expressions and head rotations are observed. This is a challenging task, because there is no definite grammar or code-book for mapping the non-verbal facial signals into a corresponding mental state. Furthermore, non-verbal facial signals and the observed emotions depend on personality, society, mood, and the context in which they are displayed or observed. This thesis mainly addresses the three desired tasks for an effective visual information based automatic face and head gesture (FHG) analyzer. First we develop a fully automatic, robust and accurate 17-point facial landmark localizer based on local appearance information and structural information of ...

Cinar Akakin, Hatice — Bogazici University


Discrete-time speech processing with application to emotion recognition

The subject of this PhD thesis is the efficient and robust processing and analysis of the audio recordings that are derived from a call center. The thesis is comprised of two parts. The first part is dedicated to dialogue/non-dialogue detection and to speaker segmentation. The systems that are developed are a prerequisite for detecting (i) the audio segments that actually contain a dialogue between the system and the call center customer and (ii) the change points between the system and the customer. This way the volume of the audio recordings that need to be processed is significantly reduced, while the system is automated. To detect the presence of a dialogue several systems are developed. This is the first effort found in the international literature in which the audio channel is exclusively exploited. Also, it is the first time that the speaker utterance ...

Kotti, Margarita — Aristotle University of Thessaloniki


Emotion assessment for affective computing based on brain and peripheral signals

Current Human-Machine Interfaces (HMI) lack “emotional intelligence”, i.e. they are not able to identify human emotional states and take this information into account to decide on the proper actions to execute. The goal of affective computing is to fill this gap by detecting emotional cues occurring during Human-Computer Interaction (HCI) and synthesizing emotional responses. In the last decades, most of the studies on emotion assessment have focused on the analysis of facial expressions and speech to determine the emotional state of a person. Physiological activity also includes emotional information that can be used for emotion assessment but has received less attention despite its advantages (for instance it can be less easily faked than facial expressions). This thesis reports on the use of two types of physiological activities to assess emotions in the context of affective computing: the activity ...

Chanel, Guillaume — University of Geneva


A Robust Face Recognition Algorithm for Real-World Applications

Face recognition is one of the most challenging problems of computer vision and pattern recognition. The difficulty in face recognition arises mainly from facial appearance variations caused by factors such as expression, illumination, partial face occlusion, and time gap between training and testing data capture. Moreover, the performance of face recognition algorithms heavily depends on a prior facial feature localization step. That is, face images need to be aligned very well before they are fed into a face recognition algorithm, which requires precise facial feature localization. This thesis addresses these two main problems (facial appearance variations due to changes in expression, illumination, occlusion, and time gap; and imprecise face alignment due to mislocalized facial features) in order to accomplish its goal of building a generic face recognition algorithm that can function reliably under real-world conditions. The proposed face recognition algorithm ...

Ekenel, Hazim Kemal — University of Karlsruhe


Video person recognition strategies using head motion and facial appearance

In this doctoral dissertation, we principally explore the use of the temporal information available in video sequences for person and gender recognition; in particular, we focus on the analysis of head and facial motion, and their potential application as biometric identifiers. We also investigate how to exploit as much video information as possible for the automatic recognition; more precisely, we examine the possibility of integrating the head and mouth motion information with facial appearance into a multimodal biometric system, and we study the extraction of novel spatio-temporal facial features for recognition. We initially present a person recognition system that exploits the unconstrained head motion information, extracted by tracking a few facial landmarks in the image plane. In particular, we detail how each video sequence is firstly pre-processed by semiautomatically detecting the face, and then automatically tracking the facial landmarks over ...

Matta, Federico — Eurécom / Multimedia communications


Non-rigid Registration-based Data-driven 3D Facial Action Unit Detection

Automated analysis of facial expressions has been an active area of study due to its potential applications not only for intelligent human-computer interfaces but also for human facial behavior research. To advance automatic expression analysis, this thesis proposes and empirically proves two hypotheses: (i) 3D face data is a better data modality than conventional 2D camera images, not only for being much less disturbed by illumination and head pose effects but also for capturing true facial surface information. (ii) It is possible to perform detailed face registration without resorting to any face modeling. This means that data-driven methods in automatic expression analysis can compensate for the confounding effects like pose and physiognomy differences, and can process facial features more effectively, without suffering the drawbacks of model-driven analysis. Our study is based upon Facial Action Coding System (FACS) as this paradigm ...

Savran, Arman — Bogazici University


Three dimensional shape modeling: segmentation, reconstruction and registration

Accounting for uncertainty in three-dimensional (3D) shapes is important in a large number of scientific and engineering areas, such as biometrics, biomedical imaging, and data mining. It is well known that 3D polar shaped objects can be represented by Fourier descriptors such as spherical harmonics and double Fourier series. However, the statistics of these spectral shape models have not been widely explored. This thesis studies several areas involved in 3D shape modeling, including random field models for statistical shape modeling, optimal shape filtering, parametric active contours for object segmentation and surface reconstruction. It also investigates multi-modal image registration with respect to tumor activity quantification. Spherical harmonic expansions over the unit sphere not only provide a low dimensional polarimetric parameterization of stochastic shape, but also correspond to the Karhunen-Loève (K-L) expansion of any isotropic random field on the unit sphere. Spherical ...

Li, Jia — University of Michigan


Methods For Detection and Classification In ECG Analysis

The first part of the presented work is focused on measuring QT intervals. The QT interval can be an indicator of the cardiovascular health of the patient and can help detect potential abnormalities. The QT interval is measured from the onset of the QRS complex to the end of the T wave. However, measurements for the end of the T wave are often highly subjective and the corresponding verification is difficult. Here we propose two methods of QT interval measurement: a wavelet-based method and a template matching method. The methods are compared with each other and tested on the standard QT database. The second part of the presented work is focused on modelling of arrhythmias using McSharry’s model followed by classification using an artificial neural network. The proposed method uses pre-processing of signals with Linear Approximation Distance Thresholding method and Line Segment Clustering method ...

Kicmerova, Dina — Brno University of Technology / Department of Biomedical Engineering


Computational models of expressive gesture in multimedia systems

This thesis focuses on the development of paradigms and techniques for the design and implementation of multimodal interactive systems, mainly for performing arts applications. The work addresses research issues in the fields of human-computer interaction, multimedia systems, and sound and music computing. The thesis is divided into two parts. In the first one, after a short review of the state-of-the-art, the focus moves on the definition of environments in which novel forms of technology-integrated artistic performances can take place. These are distributed active mixed reality environments in which information at different layers of abstraction is conveyed mainly non-verbally through expressive gestures. Expressive gesture is therefore defined and the internal structure of a virtual observer able to process it (and inhabiting the proposed environments) is described in a multimodal perspective. The definition of the structure of the environments, of the virtual ...

Volpe, Gualtiero — University of Genova


Perceptually-Based Signal Features for Environmental Sound Classification

This thesis faces the problem of automatically classifying environmental sounds, i.e., any non-speech or non-music sounds that can be found in the environment. Broadly speaking, two main processes are needed to perform such classification: the signal feature extraction so as to compose representative sound patterns and the machine learning technique that performs the classification of such patterns. The main focus of this research is put on the former, studying relevant signal features that optimally represent the sound characteristics since, according to several references, it is a key issue to attain a robust recognition. This type of audio signals holds many differences with speech or music signals, thus specific features should be determined and adapted to their own characteristics. In this sense, new signal features, inspired by the human auditory system and the human perception of sound, are proposed to improve ...

Valero, Xavier — La Salle-Universitat Ramon Llull


Glottal Source Estimation and Automatic Detection of Dysphonic Speakers

Among all the biomedical signals, speech is among the most complex ones since it is produced and received by humans. The extraction and the analysis of the information conveyed by this signal are the basis of many applications, including the topics discussed in this thesis: the estimation of the glottal source and the automatic detection of voice pathologies. In the first part of the thesis, after a presentation of existing methods for the estimation of the glottal source, the focus is on the occurrence of irregular glottal source estimations when the representation based on the Zeros of the Z-Transform (ZZT) is used. As this method is sensitive to the location of the analysis window, it is proposed to regularize the estimation by shifting the analysis window around its initial location. The best shift is found by using a dynamic ...

Dubuisson, Thomas — University of Mons


Realtime and Accurate Musical Control of Expression in Voice Synthesis

In the early days of speech synthesis research, understanding voice production attracted the attention of scientists with the goal of producing intelligible speech. Later, the need to produce more natural voices led researchers to use prerecorded voice databases, containing speech units, reassembled by a concatenation algorithm. With the growth of computer capacities, the length of units increased, going from diphones to non-uniform units, in the so-called unit selection framework, using a strategy referred to as 'take the best, modify the least'. Today the new challenge in voice synthesis is the production of expressive speech or singing. The mainstream solution to this problem is based on the “there is no data like more data” paradigm: emotion-specific databases are recorded and emotion-specific units are segmented. In this thesis, we propose to restart the expressive speech synthesis problem, from its original voice ...

D'Alessandro, N. — Universite de Mons


Three-Dimensional Face Recognition

In this thesis, we attack the problem of identifying humans from their three dimensional facial characteristics. For this purpose, a complete 3D face recognition system is developed. We divide the whole system into sub-processes. These sub-processes can be categorized as follows: 1) registration, 2) representation of faces, 3) extraction of discriminative features, and 4) fusion of matchers. For each module, we evaluate the state-of-the-art methods, and also propose novel ones. For the registration task, we propose to use a generic face model which speeds up the correspondence establishment process. We compare the benefits of rigid and non-rigid registration schemes using a generic face model. In terms of face representation schemes, we implement a diverse range of approaches such as point clouds, curvature-based descriptors, and range images. In relation to these, various feature extraction methods are used to determine the ...

Gokberk, Berk — Bogazici University


Mixed structural models for 3D audio in virtual environments

In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


Improvements in Pose Invariance and Local Description for Gabor-based 2D Face Recognition

Automatic face recognition has attracted a lot of attention not only because of the large number of practical applications where human identification is needed but also due to the technical challenges involved in this problem: large variability in facial appearance, non-linearity of face manifolds and high dimensionality are some of the most critical handicaps. In order to deal with the above-mentioned challenges, there are two possible strategies: the first is to construct a “good” feature space in which the manifolds become simpler (more linear and more convex). This scheme usually comprises two levels of processing: (1) normalize images geometrically and photometrically and (2) extract features that are stable with respect to these variations (such as those based on Gabor filters). The second strategy is to use classification structures that are able to deal with non-linearities and to generalize properly. To ...

Gonzalez-Jimenez, Daniel — University of Vigo
