Automatic Analysis of Head and Facial Gestures in Video Streams

Automatic analysis of head gestures and facial expressions is a challenging research area with significant applications for intelligent human-computer interfaces. An important task is the automatic classification of non-verbal messages composed of facial signals in which both facial expressions and head rotations are observed. This task is challenging because there is no definite grammar or code-book for mapping non-verbal facial signals to a corresponding mental state. Furthermore, non-verbal facial signals and the observed emotions depend on personality, society, mood, and the context in which they are displayed or observed. This thesis mainly addresses three tasks required for an effective automatic face and head gesture (FHG) analyzer based on visual information. First, we develop a fully automatic, robust and accurate 17-point facial landmark localizer based on local appearance information and structural information of ...

Cinar Akakin, Hatice — Bogazici University


Discrete-time speech processing with application to emotion recognition

The subject of this PhD thesis is the efficient and robust processing and analysis of audio recordings derived from a call center. The thesis comprises two parts. The first part is dedicated to dialogue/non-dialogue detection and to speaker segmentation. The systems developed in this part are prerequisites for detecting (i) the audio segments that actually contain a dialogue between the system and the call center customer and (ii) the change points between the system and the customer. This significantly reduces the volume of audio recordings that needs to be processed, while the process remains automated. Several systems are developed to detect the presence of a dialogue. This is the first effort in the international literature to exploit the audio channel exclusively. Also, it is the first time that the speaker utterance ...

Kotti, Margarita — Aristotle University of Thessaloniki


Emotion assessment for affective computing based on brain and peripheral signals

Current Human-Machine Interfaces (HMI) lack “emotional intelligence”, i.e. they are not able to identify human emotional states and take this information into account when deciding on the proper actions to execute. The goal of affective computing is to fill this gap by detecting emotional cues occurring during Human-Computer Interaction (HCI) and synthesizing emotional responses. In recent decades, most studies on emotion assessment have focused on the analysis of facial expressions and speech to determine the emotional state of a person. Physiological activity also carries emotional information that can be used for emotion assessment, but it has received less attention despite its advantages (for instance, it is harder to fake than facial expressions). This thesis reports on the use of two types of physiological activities to assess emotions in the context of affective computing: the activity ...

Chanel, Guillaume — University of Geneva


Video person recognition strategies using head motion and facial appearance

In this doctoral dissertation, we principally explore the use of the temporal information available in video sequences for person and gender recognition; in particular, we focus on the analysis of head and facial motion and their potential application as biometric identifiers. We also investigate how to exploit as much video information as possible for automatic recognition; more precisely, we examine the possibility of integrating head and mouth motion information with facial appearance into a multimodal biometric system, and we study the extraction of novel spatio-temporal facial features for recognition. We initially present a person recognition system that exploits unconstrained head motion information, extracted by tracking a few facial landmarks in the image plane. In particular, we detail how each video sequence is first pre-processed by semi-automatically detecting the face and then automatically tracking the facial landmarks over ...

Matta, Federico — Eurécom / Multimedia communications


Three dimensional shape modeling: segmentation, reconstruction and registration

Accounting for uncertainty in three-dimensional (3D) shapes is important in many scientific and engineering areas, such as biometrics, biomedical imaging, and data mining. It is well known that 3D polar-shaped objects can be represented by Fourier descriptors such as spherical harmonics and double Fourier series. However, the statistics of these spectral shape models have not been widely explored. This thesis studies several areas involved in 3D shape modeling, including random field models for statistical shape modeling, optimal shape filtering, and parametric active contours for object segmentation and surface reconstruction. It also investigates multi-modal image registration for tumor activity quantification. Spherical harmonic expansions over the unit sphere not only provide a low-dimensional polarimetric parameterization of stochastic shape, but also correspond to the Karhunen-Loève (K-L) expansion of any isotropic random field on the unit sphere. Spherical ...

Li, Jia — University of Michigan


Glottal Source Estimation and Automatic Detection of Dysphonic Speakers

Among biomedical signals, speech is one of the most complex, since it is both produced and received by humans. The extraction and analysis of the information conveyed by this signal are the basis of many applications, including the topics discussed in this thesis: the estimation of the glottal source and the automatic detection of voice pathologies. In the first part of the thesis, after a presentation of existing methods for estimating the glottal source, the focus is on the occurrence of irregular glottal source estimates when the representation based on the Zeros of the Z-Transform (ZZT) is used. As this method is sensitive to the location of the analysis window, it is proposed to regularize the estimation by shifting the analysis window around its initial location. The best shift is found by using a dynamic ...

Dubuisson, Thomas — University of Mons
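The abstract does not detail how the ZZT representation is computed. The following is a minimal Python sketch under stated assumptions: it uses the usual ZZT definition (the zeros of the Z-transform of a windowed speech frame, obtained as the roots of the polynomial whose coefficients are the frame samples), a Blackman analysis window, and the common convention that zeros outside the unit circle are attributed to the maximum-phase (glottal) component. The function names `zzt_split` and `shifted_frames` are hypothetical, and the criterion for picking the best shift (the truncated "dynamic ..." step) is intentionally not reproduced.

```python
import numpy as np


def zzt_split(frame):
    """Return (outside, inside): ZZT zeros outside / on-or-inside the unit circle.

    X(z) = sum_n x[n] z^(-n). Multiplying by z^(N-1) gives a polynomial whose
    coefficients, highest power first, are x[0], ..., x[N-1], so np.roots on
    the frame samples yields the zeros of X(z).
    """
    zeros = np.roots(np.asarray(frame, dtype=float))
    mag = np.abs(zeros)
    outside = zeros[mag > 1.0]   # maximum-phase part (assumed glottal contribution)
    inside = zeros[mag <= 1.0]   # minimum-phase part (vocal tract, return phase)
    return outside, inside


def shifted_frames(signal, center, length, shifts):
    """Blackman-windowed frames of `length` samples centred at center + s.

    Shifts whose frame would run past either end of the signal are skipped.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.blackman(length)
    half = length // 2
    frames = {}
    for s in shifts:
        start = center + s - half
        if start < 0 or start + length > len(signal):
            continue
        frames[s] = signal[start:start + length] * window
    return frames
```

Each candidate frame returned by `shifted_frames` can be passed to `zzt_split`; comparing the resulting decompositions across shifts is where a selection criterion such as the one developed in the thesis would come in.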


A Robust Face Recognition Algorithm for Real-World Applications

Face recognition is one of the most challenging problems in computer vision and pattern recognition. The difficulty arises mainly from facial appearance variations caused by factors such as expression, illumination, partial face occlusion, and the time gap between training and testing data capture. Moreover, the performance of face recognition algorithms depends heavily on the preceding facial feature localization step: face images need to be aligned very well before they are fed into a face recognition algorithm, which requires precise facial feature localization. This thesis addresses these two main problems (facial appearance variations due to changes in expression, illumination, occlusion and time gap, and imprecise face alignment due to mislocalized facial features) in order to accomplish its goal of building a generic face recognition algorithm that can function reliably under real-world conditions. The proposed face recognition algorithm ...

Ekenel, Hazim Kemal — University of Karlsruhe


Three-Dimensional Face Recognition

In this thesis, we attack the problem of identifying humans from their three-dimensional facial characteristics. For this purpose, a complete 3D face recognition system is developed. We divide the whole system into sub-processes, which can be categorized as follows: 1) registration, 2) representation of faces, 3) extraction of discriminative features, and 4) fusion of matchers. For each module, we evaluate state-of-the-art methods and also propose novel ones. For the registration task, we propose to use a generic face model, which speeds up the correspondence establishment process. We compare the benefits of rigid and non-rigid registration schemes using a generic face model. In terms of face representation schemes, we implement a diverse range of approaches such as point clouds, curvature-based descriptors, and range images. In relation to these, various feature extraction methods are used to determine the ...

Gokberk, Berk — Bogazici University


Improvements in Pose Invariance and Local Description for Gabor-based 2D Face Recognition

Automatic face recognition has attracted a lot of attention not only because of the large number of practical applications where human identification is needed but also because of the technical challenges involved: large variability in facial appearance, non-linearity of face manifolds and high dimensionality are some of the most critical difficulties. To deal with these challenges, there are two possible strategies. The first is to construct a “good” feature space in which the manifolds become simpler (more linear and more convex). This scheme usually comprises two levels of processing: (1) normalize images geometrically and photometrically and (2) extract features that are stable with respect to these variations (such as those based on Gabor filters). The second strategy is to use classification structures that are able to deal with non-linearities and to generalize properly. To ...

Gonzalez-Jimenez, Daniel — University of Vigo
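Gabor filters are mentioned only as an example of features that are stable with respect to geometric and photometric variations. As a hedged illustration, the sketch below builds the real part of a standard 2D Gabor kernel and a small bank over orientations and wavelengths. The parameter names (`sigma`, `theta`, `lam`, `gamma`, `psi`) follow a common textbook convention and are not taken from the thesis, whose actual filter bank and local descriptors are not described in this abstract; `gabor_kernel` and `gabor_bank` are hypothetical names.

```python
import numpy as np


def gabor_kernel(size, sigma, theta, lam, gamma=1.0, psi=0.0):
    """Real part of a 2D Gabor filter sampled on a size x size grid (size odd)."""
    if size % 2 == 0:
        raise ValueError("size should be odd so the kernel has a centre pixel")
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the cosine carrier varies along orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope (gamma sets its aspect ratio) times a cosine carrier.
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / lam + psi)
    return envelope * carrier


def gabor_bank(size, sigma, lams, n_orient):
    """One kernel per (wavelength, orientation) pair, orientations evenly spaced in [0, pi)."""
    return [gabor_kernel(size, sigma, k * np.pi / n_orient, lam)
            for lam in lams for k in range(n_orient)]
```

Filtering a geometrically and photometrically normalized face image with each kernel (for example with `scipy.signal.fftconvolve`) gives one response map per kernel; how those responses are sampled and compared is specific to the thesis and not shown here.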


Methods For Detection and Classification In ECG Analysis

The first part of the presented work focuses on measuring QT intervals. The QT interval can be an indicator of the cardiovascular health of the patient and can reveal potential abnormalities. The QT interval is measured from the onset of the QRS complex to the end of the T wave. However, measurements of the end of the T wave are often highly subjective and the corresponding verification is difficult. Here we propose two methods for measuring the QT interval: a wavelet-based method and a template matching method. The methods are compared with each other and tested on the standard QT database. The second part of the presented work focuses on modelling arrhythmias using McSharry’s model, followed by classification using an artificial neural network. The proposed method pre-processes the signals with the Linear Approximation Distance Thresholding method and the Line Segment Clustering method ...

Kicmerova, Dina — Brno University of Technology / Department of Biomedical Engineering
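The QT definition given in the abstract translates directly into arithmetic once the fiducial points are known. The sketch below converts the sample indices of the QRS onset and the T-wave end into milliseconds and, as a generic illustration of template matching, finds where a template best matches a signal by normalized cross-correlation. It assumes the fiducial indices and the template are already available; `qt_interval_ms` and `best_template_offset` are hypothetical names, and this is not the wavelet-based or template matching method proposed in the thesis.

```python
import numpy as np


def qt_interval_ms(qrs_onset_idx, t_end_idx, fs_hz):
    """QT interval in milliseconds: from QRS onset to T-wave end."""
    if t_end_idx <= qrs_onset_idx:
        raise ValueError("T-wave end must come after QRS onset")
    return (t_end_idx - qrs_onset_idx) * 1000.0 / fs_hz


def best_template_offset(signal, template):
    """Offset and score of the best normalized cross-correlation match."""
    signal = np.asarray(signal, dtype=float)
    template = np.asarray(template, dtype=float)
    n = len(template)
    # z-normalize the template once; the small constant avoids division by zero.
    t = (template - template.mean()) / (template.std() + 1e-12)
    best_idx, best_score = 0, -np.inf
    for i in range(len(signal) - n + 1):
        w = signal[i:i + n]
        w = (w - w.mean()) / (w.std() + 1e-12)
        score = float(np.dot(w, t)) / n  # correlation coefficient, at most 1
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score
```

For example, a QRS onset at sample 100 and a T-wave end at sample 300 in a 500 Hz recording span 200 samples, i.e. 0.4 s or 400 ms.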


Non-rigid Registration-based Data-driven 3D Facial Action Unit Detection

Automated analysis of facial expressions has been an active area of study due to its potential applications not only for intelligent human-computer interfaces but also for human facial behavior research. To advance automatic expression analysis, this thesis proposes and empirically proves two hypotheses: (i) 3D face data is a better data modality than conventional 2D camera images, not only because it is much less disturbed by illumination and head pose effects but also because it captures true facial surface information; (ii) it is possible to perform detailed face registration without resorting to any face modeling. This means that data-driven methods in automatic expression analysis can compensate for confounding effects such as pose and physiognomy differences, and can process facial features more effectively, without suffering the drawbacks of model-driven analysis. Our study is based upon the Facial Action Coding System (FACS) as this paradigm ...

Savran, Arman — Bogazici University


Facial Feature Extraction and Estimation of Gaze Direction in Human-Computer Interaction

In the modern age of information, there is a growing interest in improving the interaction between humans and computers in an unremitting attempt to make it as seamless as the interaction between humans. At the core of this endeavor are the study of the human face and the focus of attention, determined by the eye gaze. The main objective of this thesis is to develop accurate and reliable methods for extracting facial information, localizing the eye centers and tracking the eye gaze. Such systems are usually grounded upon various assumptions regarding the topology of the features and the camera parameters, or require dedicated hardware. With regard to ubiquitous computing, all the methods developed in this thesis use images and videos acquired with standard cameras under natural illumination, without the requirement ...

Skodras, Evangelos — University of Patras


Segmentation by a Surface Deformable Model Locally Regularized by Splines (Segmentation par modèle déformable surfacique localement régularisé par spline)

Image segmentation through deformable models is a method that localizes object boundaries. In difficult segmentation contexts, caused by noise or a lack of information, the use of prior knowledge in the deformation process increases segmentation accuracy. Medical imaging often involves such contexts. Moreover, medical applications deal with large amounts of data, so processing must be both robust and fast. This requirement led us to a local regularization of the deformable model. Building on the active contour framework, also known as the snake model, we propose a new regularization scheme. It works by filtering the displacements at each iteration. The filter is based on a smoothing spline kernel, whose aim is to approximate a set of points rather than interpolate it. We point out the consistency of the regularization parameter in such a method. ...

Velut, Jerome — INSA-Lyon / CREATIS-LRMN
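The regularization idea in the abstract (filter the displacements at each iteration so that they are approximated rather than interpolated) can be sketched as follows. This is a hedged illustration, not the thesis's method: it assumes a closed 2D contour rather than a surface, assumes the external forces are already computed, and swaps in a discrete penalized least-squares smoother with a second-difference penalty (a Whittaker-type smoother, the discrete analogue of a cubic smoothing spline) for the thesis's spline kernel. The names `circular_second_difference`, `smooth_displacements` and `snake_step` are hypothetical; `lam` plays the role of the regularization parameter.

```python
import numpy as np


def circular_second_difference(n):
    """n x n second-difference operator for a closed contour (assumes n >= 3)."""
    D = np.zeros((n, n))
    for i in range(n):
        D[i, (i - 1) % n] = 1.0
        D[i, i] = -2.0
        D[i, (i + 1) % n] = 1.0
    return D


def smooth_displacements(disp, lam):
    """Penalized least-squares smoothing of per-vertex displacements.

    Solves (I + lam * D^T D) u = disp, i.e. minimizes
    ||u - disp||^2 + lam * ||D u||^2: u approximates the raw displacements
    rather than interpolating them, and lam controls how strongly it is smoothed.
    """
    disp = np.asarray(disp, dtype=float)
    n = disp.shape[0]
    D = circular_second_difference(n)
    A = np.eye(n) + lam * (D.T @ D)
    return np.linalg.solve(A, disp)


def snake_step(points, forces, lam):
    """One iteration: move each vertex by its filtered displacement."""
    return np.asarray(points, dtype=float) + smooth_displacements(forces, lam)
```

Calling `snake_step` repeatedly, with `forces` recomputed from the image at each iteration, gives a locally regularized contour evolution; `lam = 0` leaves the raw displacements untouched, while larger values smooth them more strongly. A uniform translation of the whole contour passes through the filter unchanged, since its second differences are zero.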


Perceptually-Based Signal Features for Environmental Sound Classification

This thesis addresses the problem of automatically classifying environmental sounds, i.e., any sounds other than speech or music that can be found in the environment. Broadly speaking, two main processes are needed to perform such classification: signal feature extraction, which composes representative sound patterns, and the machine learning technique that classifies those patterns. This research focuses mainly on the former, studying relevant signal features that optimally represent the sound characteristics since, according to several references, this is a key issue for attaining robust recognition. These audio signals differ in many ways from speech or music signals, so specific features should be determined and adapted to their own characteristics. To this end, new signal features, inspired by the human auditory system and the human perception of sound, are proposed to improve ...

Valero, Xavier — La Salle-Universitat Ramon Llull


Mixed structural models for 3D audio in virtual environments

In the world of information and communications technology (ICT), strategies for innovation and development increasingly focus on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges such applications have to address is user-centricity, reflected, for example, in the development of complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces are a key factor in enabling inclusive use of new technologies by everyone. To achieve this, multimodal realistic models that describe our environment are needed, in particular models that accurately describe the acoustics of the environment and communication through the auditory modality. Examples of currently active research directions and application areas include 3DTV and the future internet, 3D visual-sound scene coding, transmission and reconstruction, and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova
