Automatic recognition of French Cued Speech gestures (2007)
This thesis addresses the problem of vision based sign language recognition and focuses on three main tasks to design improved techniques that increase the performance of sign language recognition systems. We first attack the markerless tracking problem during natural and unrestricted signing in less restricted environments. We propose a joint particle filter approach for tracking multiple identical objects, in our case the two hands and the face, which is robust to situations including fast movement, interactions and occlusions. Our experiments show that the proposed approach has a robust tracking performance during the challenging situations and is suitable for tracking long durations of signing with its ability of fast recovery. Second, we attack the problem of the recognition of signs that include both manual (hand gestures) and non-manual (head/body gestures) components. We investigated multi-modal fusion techniques to model the different temporal ...
Aran, Oya — Bogazici University
Speech derereverberation in noisy environments using time-frequency domain signal models
Reverberation is the sum of reflected sound waves and is present in any conventional room. Speech communication devices such as mobile phones in hands-free mode, tablets, smart TVs, teleconferencing systems, hearing aids, voice-controlled systems, etc. use one or more microphones to pick up the desired speech signals. When the microphones are not in the proximity of the desired source, strong reverberation and noise can degrade the signal quality at the microphones and can impair the intelligibility and the performance of automatic speech recognizers. Therefore, it is a highly demanded task to process the microphone signals such that reverberation and noise are reduced. The process of reducing or removing reverberation from recorded signals is called dereverberation. As dereverberation is usually a completely blind problem, where the only available information are the microphone signals, and as the acoustic scenario can be non-stationary, ...
Braun, Sebastian — Friedrich-Alexander Universität Erlangen-Nürnberg
A Robust Face Recognition Algorithm for Real-World Applications
Face recognition is one of the most challenging problems of computer vision and pattern recognition. The difficulty in face recognition arises mainly from facial appearance variations caused by factors, such as expression, illumination, partial face occlusion, and time gap between training and testing data capture. Moreover, the performance of face recognition algorithms heavily depends on prior facial feature localization step. That is, face images need to be aligned very well before they are fed into a face recognition algorithm, which requires precise facial feature localization. This thesis addresses on solving these two main problems -facial appearance variations due to changes in expression, illumination, occlusion, time gap, and imprecise face alignment due to mislocalized facial features- in order to accomplish its goal of building a generic face recognition algorithm that can function reliably under real-world conditions. The proposed face recognition algorithm ...
Ekenel, Hazim Kemal — University of Karlsruhe
Automatic Analysis of Head and Facial Gestures in Video Streams
Automatic analysis of head gestures and facial expressions is a challenging research area and it has significant applications for intelligent human-computer interfaces. An important task is the automatic classification of non-verbal messages composed of facial signals where both facial expressions and head rotations are observed. This is a challenging task, because there is no definite grammar or code-book for mapping the non-verbal facial signals into a corresponding mental state. Furthermore, non-verbal facial signals and the observed emotions have dependency on personality, society, state of the mood and also the context in which they are displayed or observed. This thesis mainly addresses the three desired tasks for an effective visual information based automatic face and head gesture (FHG) analyzer. First we develop a fully automatic, robust and accurate 17-point facial landmark localizer based on local appearance information and structural information of ...
Cinar Akakin, Hatice — Bogazici University
Techniques for improving the performance of distributed video coding
Distributed Video Coding (DVC) is a recently proposed paradigm in video communication, which fits well emerging applications such as wireless video surveillance, multimedia sensor networks, wireless PC cameras, and mobile cameras phones. These applications require a low complexity encoding, while possibly affording a high complexity decoding. DVC presents several advantages: First, the complexity can be distributed between the encoder and the decoder. Second, the DVC is robust to errors, since it uses a channel code. In DVC, a Side Information (SI) is estimated at the decoder, using the available decoded frames, and used for the decoding and reconstruction of other frames. In this Ph.D thesis, we propose new techniques in order to improve the quality of the SI. First, successive refinement of the SI is performed after each decoded DCT band, using a Partially Decoded WZF (PDWZF), along with the ...
Abou-Elailah, Abdalbassir — Telecom Paristech
Video person recognition strategies using head motion and facial appearance
In this doctoral dissertation, we principally explore the use of the temporal information available in video sequences for person and gender recognition; in particular, we focus on the analysis of head and facial motion, and their potential application as biometric identifiers. We also investigate how to exploit as much video information as possible for the automatic recognition; more precisely, we examine the possibility of integrating the head and mouth motion information with facial appearance into a multimodal biometric system, and we study the extraction of novel spatio-temporal facial features for recognition. We initially present a person recognition system that exploits the unconstrained head motion information, extracted by tracking a few facial landmarks in the image plane. In particular, we detail how each video sequence is firstly pre-processed by semiautomatically detecting the face, and then automatically tracking the facial landmarks over ...
Matta, Federico — Eurécom / Multimedia communications
Speech Enhancement Algorithms for Audiological Applications
The improvement of speech intelligibility is a traditional problem which still remains open and unsolved. The recent boom of applications such as hands-free communi- cations or automatic speech recognition systems and the ever-increasing demands of the hearing-impaired community have given a definitive impulse to the research in this area. This PhD thesis is focused on speech enhancement for audiological applications. Most of the research conducted in this thesis has been focused on the improvement of speech intelligibility in hearing aids, considering the variety of restrictions and limitations imposed by this type of devices. The combination of source separation techniques and spatial filtering with machine learning and evolutionary computation has originated novel and interesting algorithms which are included in this thesis. The thesis is divided in two main parts. The first one contains a preliminary study of the problem and a ...
Ayllón, David — Universidad de Alcalá
Robust Speech Recognition: Analysis and Equalization of Lombard Effect in Czech Corpora
When exposed to noise, speakers will modify the way they speak in an effort to maintain intelligible communication. This process, which is referred to as Lombard effect (LE), involves a combination of both conscious and subconscious articulatory adjustment. Speech production variations due to LE can cause considerable degradation in automatic speech recognition (ASR) since they introduce a mismatch between parameters of the speech to be recognized and the ASR system’s acoustic models, which are usually trained on neutral speech. The main objective of this thesis is to analyze the impact of LE on speech production and to propose methods that increase ASR system performance in LE. All presented experiments were conducted on the Czech spoken language, yet, the proposed concepts are assumed applicable to other languages. The first part of the thesis focuses on the design and acquisition of a ...
Boril, Hynek — Czech Technical University in Prague
A statistical approach to motion estimation
Digital video technology has been characterized by a steady growth in the last decade. New applications like video e-mail, third generation mobile phone video communications, videoconferencing, video streaming on the web continuously push for further evolution of research in digital video coding. In order to be sent over the internet or even wireless networks, video information clearly needs compression to meet bandwidth requirements. Compression is mainly realized by exploiting the redundancy present in the data. A sequence of images contains an intrinsic, intuitive and simple idea of redundancy: two successive images are very similar. This simple concept is called temporal redundancy. The research of a proper scheme to exploit the temporal redundancy completely changes the scenario between compression of still pictures and sequence of images. It also represents the key for very high performances in image sequence coding when compared ...
Moschetti, Fulvio — Swiss Federal Institute of Technology
Signal processing algorithms for wireless acoustic sensor networks
Recent academic developments have initiated a paradigm shift in the way spatial sensor data can be acquired. Traditional localized and regularly arranged sensor arrays are replaced by sensor nodes that are randomly distributed over the entire spatial field, and which communicate with each other or with a master node through wireless communication links. Together, these nodes form a so-called ‘wireless sensor network’ (WSN). Each node of a WSN has a local sensor array and a signal processing unit to perform computations on the acquired data. The advantage of WSNs compared to traditional (wired) sensor arrays, is that many more sensors can be used that physically cover the full spatial field, which typically yields more variety (and thus more information) in the signals. It is likely that future data acquisition, control and physical monitoring, will heavily rely on this type of ...
Bertrand, Alexander — Katholieke Universiteit Leuven
Perceptually-Based Signal Features for Environmental Sound Classification
This thesis faces the problem of automatically classifying environmental sounds, i.e., any non-speech or non-music sounds that can be found in the environment. Broadly speaking, two main processes are needed to perform such classification: the signal feature extraction so as to compose representative sound patterns and the machine learning technique that performs the classification of such patterns. The main focus of this research is put on the former, studying relevant signal features that optimally represent the sound characteristics since, according to several references, it is a key issue to attain a robust recognition. This type of audio signals holds many differences with speech or music signals, thus specific features should be determined and adapted to their own characteristics. In this sense, new signal features, inspired by the human auditory system and the human perception of sound, are proposed to improve ...
Valero, Xavier — La Salle-Universitat Ramon Llull
Statistical Parametric Speech Synthesis Based on the Degree of Articulation
Nowadays, speech synthesis is part of various daily life applications. The ultimate goal of such technologies consists in extending the possibilities of interaction with the machine, in order to get closer to human-like communications. However, current state-of-the-art systems often lack of realism: although high-quality speech synthesis can be produced by many researchers and companies around the world, synthetic voices are generally perceived as hyperarticulated. In any case, their degree of articulation is fixed once and for all. The present thesis falls within the more general quest for enriching expressivity in speech synthesis. The main idea consists in improving statistical parametric speech synthesis, whose most famous example is Hidden Markov Model (HMM) based speech synthesis, by introducing a control of the articulation degree, so as to enable synthesizers to automatically adapt their way of speaking to the contextual situation, like humans ...
Picart, Benjamin — Université de Mons (UMONS)
Design and Evaluation of Feedback Control Algorithms for Implantable Hearing Devices
Using a hearing device is one of the most successful approaches to partially restore the degraded functionality of an impaired auditory system. However, due to the complex structure of the human auditory system, hearing impairment can manifest itself in different ways and, therefore, its compensation can be achieved through different classes of hearing devices. Although the majority of hearing devices consists of conventional hearing aids (HAs), several other classes of hearing devices have been developed. For instance, bone-conduction devices (BCDs) and cochlear implants (CIs) have successfully been used for more than thirty years. More recently, other classes of implantable devices have been developed such as middle ear implants (MEIs), implantable BCDs, and direct acoustic cochlear implants (DACIs). Most of these different classes of hearing devices rely on a sound processor running different algorithms able to compensate for the hearing impairment. ...
Bernardi, Giuliano — KU Leuven
Computational models of expressive gesture in multimedia systems
This thesis focuses on the development of paradigms and techniques for the design and implementation of multimodal interactive systems, mainly for performing arts applications. The work addresses research issues in the fields of human-computer interaction, multimedia systems, and sound and music computing. The thesis is divided into two parts. In the first one, after a short review of the state-of-the-art, the focus moves on the definition of environments in which novel forms of technology-integrated artistic performances can take place. These are distributed active mixed reality environments in which information at different layers of abstraction is conveyed mainly non-verbally through expressive gestures. Expressive gesture is therefore defined and the internal structure of a virtual observer able to process it (and inhabiting the proposed environments) is described in a multimodal perspective. The definition of the structure of the environments, of the virtual ...
Volpe, Gualtiero — University of Genova
Multi-channel EMG pattern classification based on deep learning
In recent years, a huge body of data generated by various applications in domains like social networks and healthcare have paved the way for the development of high performance models. Deep learning has transformed the field of data analysis by dramatically improving the state of the art in various classification and prediction tasks. Combined with advancements in electromyography it has given rise to new hand gesture recognition applications, such as human computer interfaces, sign language recognition, robotics control and rehabilitation games. The purpose of this thesis is to develop novel methods for electromyography signal analysis based on deep learning for the problem of hand gesture recognition. Specifically, we focus on methods for data preparation and developing accurate models even when few data are available. Electromyography signals are in general one-dimensional time-series with a rich frequency content. Various feature sets have ...
Tsinganos, Panagiotis — University of Patras, Greece - Vrije Universiteit Brussel, Belgium
The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.
The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.