Multi-channel EMG pattern classification based on deep learning

In recent years, a huge body of data generated by various applications in domains like social networks and healthcare have paved the way for the development of high performance models. Deep learning has transformed the field of data analysis by dramatically improving the state of the art in various classification and prediction tasks. Combined with advancements in electromyography it has given rise to new hand gesture recognition applications, such as human computer interfaces, sign language recognition, robotics control and rehabilitation games. The purpose of this thesis is to develop novel methods for electromyography signal analysis based on deep learning for the problem of hand gesture recognition. Specifically, we focus on methods for data preparation and developing accurate models even when few data are available. Electromyography signals are in general one-dimensional time-series with a rich frequency content. Various feature sets have ...

Tsinganos, Panagiotis — University of Patras, Greece - Vrije Universiteit Brussel, Belgium


Central and peripheral mechanisms: a multimodal approach to understanding and restoring human motor control

All human actions involve motor control. Even the simplest movement requires the coordinated recruitment of many muscles, orchestrated by neuronal circuits in the brain and the spinal cord. As a consequence, lesions affecting the central nervous system, such as stroke, can lead to a wide range of motor impairments. While a certain degree of recovery can often be achieved by harnessing the plasticity of the motor hierarchy, patients typically struggle to regain full motor control. In this context, technology-assisted interventions offer the prospect of intense, controllable and quantifiable motor training. Yet, clinical outcomes remain comparable to conventional approaches, suggesting the need for a paradigm shift towards customized knowledge-driven treatments to fully exploit their potential. In this thesis, we argue that a detailed understanding of healthy and impaired motor pathways can foster the development of therapies optimally engaging plasticity. To this ...

Kinany, Nawal — Ecole Polytechnique Fédérale de Lausanne (EPFL)


Realtime and Accurate Musical Control of Expression in Voice Synthesis

In the early days of speech synthesis research, understanding voice production has attracted the attention of scientists with the goal of producing intelligible speech. Later, the need to produce more natural voices led researchers to use prerecorded voice databases, containing speech units, reassembled by a concatenation algorithm. With the outgrowth of computer capacities, the length of units increased, going from diphones to non-uniform units, in the so-called unit selection framework, using a strategy referred to as 'take the best, modify the least'. Today the new challenge in voice synthesis is the production of expressive speech or singing. The mainstream solution to this problem is based on the “there is no data like more data” paradigm: emotionspecific databases are recorded and emotion-specific units are segmented. In this thesis, we propose to restart the expressive speech synthesis problem, from its original voice ...

D' Alessandro, N. — Universite de Mons


Video person recognition strategies using head motion and facial appearance

In this doctoral dissertation, we principally explore the use of the temporal information available in video sequences for person and gender recognition; in particular, we focus on the analysis of head and facial motion, and their potential application as biometric identifiers. We also investigate how to exploit as much video information as possible for the automatic recognition; more precisely, we examine the possibility of integrating the head and mouth motion information with facial appearance into a multimodal biometric system, and we study the extraction of novel spatio-temporal facial features for recognition. We initially present a person recognition system that exploits the unconstrained head motion information, extracted by tracking a few facial landmarks in the image plane. In particular, we detail how each video sequence is firstly pre-processed by semiautomatically detecting the face, and then automatically tracking the facial landmarks over ...

Matta, Federico — Eurécom / Multimedia communications


Mixed structural models for 3D audio in virtual environments

In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


Content-based search and browsing in semantic multimedia retrieval

Growth in storage capacity has led to large digital video repositories and complicated the discovery of specific information without the laborious manual annotation of data. The research focuses on creating a retrieval system that is ultimately independent of manual work. To retrieve relevant content, the semantic gap between the searcher's information need and the content data has to be overcome using content-based technology. Semantic gap constitutes of two distinct elements: the ambiguity of the true information need and the equivocalness of digital video data. The research problem of this thesis is: what computational content-based models for retrieval increase the effectiveness of the semantic retrieval of digital video? The hypothesis is that semantic search performance can be improved using pattern recognition, data abstraction and clustering techniques jointly with human interaction through manually created queries and visual browsing. The results of this ...

Rautiainen, Mika — University of Oulou


Audio-visual processing and content management techniques, for the study of (human) bioacoustics phenomena

The present doctoral thesis aims towards the development of new long-term, multi-channel, audio-visual processing techniques for the analysis of bioacoustics phenomena. The effort is focused on the study of the physiology of the gastrointestinal system, aiming at the support of medical research for the discovery of gastrointestinal motility patterns and the diagnosis of functional disorders. The term "processing" in this case is quite broad, incorporating the procedures of signal processing, content description, manipulation and analysis, that are applied to all the recorded bioacoustics signals, the auxiliary audio-visual surveillance information (for the monitoring of experiments and the subjects' status), and the extracted audio-video sequences describing the abdominal sound-field alterations. The thesis outline is as follows. The main objective of the thesis, which is the technological support of medical research, is presented in the first chapter. A quick problem definition is initially ...

Dimoulas, Charalampos — Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece


Selected Topics in Inertial and Visual Sensor Fusion: Calibration, Observability Analysis and Applications

Recent improvements in the development of inertial and visual sensors allow building small, lightweight, and cheap motion capture systems, which are becoming a standard feature of smartphones and personal digital assistants. This dissertation describes developments of new motion sensing strategies using the inertial and inertial-visual sensors. The thesis contributions are presented in two parts. The first part focuses mainly on the use of inertial measurement units. First, the problem of sensor calibration is addressed and a low-cost and accurate method to calibrate the accelerometer cluster of this unit is proposed. The method is based on the maximum likelihood estimation framework, which results in a minimum variance unbiased estimator.Then using the inertial measurement unit, a probabilistic user-independent method is proposed for pedestrian activity classification and gait analysis.The work targets two groups of applications including human activity classificationand joint human activity and ...

Panahandeh Ghazaleh — KTH Royal Institute of Technology


Constrained Non-negative Matrix Factorization for Vocabulary Acquisition from Continuous Speech

One desideratum in designing cognitive robots is autonomous learning of communication skills, just like humans. The primary step towards this goal is vocabulary acquisition. Being different from the training procedures of the state-of-the-art automatic speech recognition (ASR) systems, vocabulary acquisition cannot rely on prior knowledge of language in the same way. Like what infants do, the acquisition process should be data-driven with multi-level abstraction and coupled with multi-modal inputs. To avoid lengthy training efforts in a word-by-word interactive learning process, a clever learning agent should be able to acquire vocabularies from continuous speech automatically. The work presented in this thesis is entitled \emph{Constrained Non-negative Matrix Factorization for Vocabulary Acquisition from Continuous Speech}. Enlightened by the extensively studied techniques in ASR, we design computational models to discover and represent vocabularies from continuous speech with little prior knowledge of the language to ...

Sun, Meng — Katholieke Universiteit Leuven


Motion Analysis and Modeling for Activity Recognition and 3-D Animation based on Geometrical and Video Processing Algorithms

The analysis of audiovisual data aims at extracting high level information, equivalent with the one(s) that can be extracted by a human. It is considered as a fundamental, unsolved (in its general form) problem. Even though the inverse problem, the audiovisual (sound and animation) synthesis, is judged easier than the previous, it remains an unsolved problem. The systematic research on these problems yields solutions that constitute the basis for a great number of continuously developing applications. In this thesis, we examine the two aforementioned fundamental problems. We propose algorithms and models of analysis and synthesis of articulated motion and undulatory (snake) locomotion, using data from video sequences. The goal of this research is the multilevel information extraction from video, like object tracking and activity recognition, and the 3-D animation synthesis in virtual environments based on the results of analysis. An ...

Panagiotakis, Costas — University of Crete


A multimicrophone approach to speech processing in a smart-room environment

Recent advances in computer technology and speech and language processing have made possible that some new ways of person-machine communication and computer assistance to human activities start to appear feasible. Concretely, the interest on the development of new challenging applications in indoor environments equipped with multiple multimodal sensors, also known as smart-rooms, has considerably grown. In general, it is well-known that the quality of speech signals captured by microphones that can be located several meters away from the speakers is severely distorted by acoustic noise and room reverberation. In the context of the development of hands-free speech applications in smart-room environments, the use of obtrusive sensors like close-talking microphones is usually not allowed, and consequently, speech technologies must operate on the basis of distant-talking recordings. In such conditions, speech technologies that usually perform reasonably well in free of noise and ...

Abad, Alberto — Universitat Politecnica de Catalunya


Human-Centered Content-Based Image Retrieval

Retrieval of images that lack a (suitable) annotations cannot be achieved through (traditional) Information Retrieval (IR) techniques. Access through such collections can be achieved through the application of computer vision techniques on the IR problem, which is baptized Content-Based Image Retrieval (CBIR). In contrast with most purely technological approaches, the thesis Human-Centered Content-Based Image Retrieval approaches the problem from a human/user centered perspective. Psychophysical experiments were conducted in which people were asked to categorize colors. The data gathered from these experiments was fed to a Fast Exact Euclidean Distance (FEED) transform (Schouten & Van den Broek, 2004), which enabled the segmentation of color space based on human perception (Van den Broek et al., 2008). This unique color space segementation was exploited for texture analysis and image segmentation, and subsequently for full-featured CBIR. In addition, a unique CBIR-benchmark was developed (Van ...

van den Broek, Egon L. — Radboud University Nijmegen


Fusing prosodic and acoustic information for speaker recognition

Automatic speaker recognition is the use of a machine to identify an individual from a spoken sentence. Recently, this technology has been undergone an increasing use in applications such as access control, transaction authentication, law enforcement, forensics, and system customisation, among others. One of the central questions addressed by this field is what is it in the speech signal that conveys speaker identity. Traditionally, automatic speaker recognition systems have relied mostly on short-term features related to the spectrum of the voice. However, human speaker recognition relies on other sources of information; therefore, there is reason to believe that these sources can play also an important role in the automatic speaker recognition task, adding complementary knowledge to the traditional spectrum-based recognition systems and thus improving their accuracy. The main objective of this thesis is to add prosodic information to a traditional ...

Farrus, Mireia — Universitat Politecnica de Catalunya


Fading in Wearable Communications Channels and its Mitigation

The fabrication of miniature electronics and sensors has encouraged the creation of a wide range of wireless enabled devices designed to be worn on the human body. This has led to the prominence of so-called wearable communications, which have emerged to satisfy the demand for wireless connectivity between these devices and with external networks. The work in this thesis has focused on the characterization of the composite fading (i.e combined multipath and shadowing) observed in wearable communications channels. It has also investigated the mitigation of the deleterious effects of both of these propagation phenomena in wearable communications. In order to accurately characterize the behaviour of the composite fading signal observed in wearable communications channels, new fading models such as F, $\kappa$-$\mu$ / inverse gamma and $\eta$-$\mu$ / inverse gamma composite fading models, have been proposed. The generality and utility of ...

Seong Ki Yoo — Queen's University Belfast


Non-rigid Registration-based Data-driven 3D Facial Action Unit Detection

Automated analysis of facial expressions has been an active area of study due to its potential applications not only for intelligent human-computer interfaces but also for human facial behavior research. To advance automatic expression analysis, this thesis proposes and empirically proves two hypotheses: (i) 3D face data is a better data modality than conventional 2D camera images, not only for being much less disturbed by illumination and head pose effects but also for capturing true facial surface information. (ii) It is possible to perform detailed face registration without resorting to any face modeling. This means that data-driven methods in automatic expression analysis can compensate for the confounding effects like pose and physiognomy differences, and can process facial features more effectively, without suffering the drawbacks of model-driven analysis. Our study is based upon Facial Action Coding System (FACS) as this paradigm ...

Savran, Arman — Bogazici University

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.