Biosignal processing and activity modeling for multimodal human activity recognition

This dissertation's primary goal was to systematically study human activity recognition and enhance its performance by advancing human activities' sequential modeling based on HMM-based machine learning. Driven by these purposes, this dissertation has the following major contributions: The proposal of our HAR research pipeline that guides the building of a robust wearable end-to-end HAR system and the implementation of the recording and recognition software Activity Signal Kit (ASK) according to the pipeline; Collecting several datasets of multimodal biosignals from over 25 subjects using the self-implemented ASK software and implementing an easy mechanism to segment and annotate the data; The comprehensive research on the offline HAR system based on the recorded datasets and the implementation of an end-to-end real-time HAR system; A novel activity modeling method for HAR, which partitions the human activity into a sequence of shared, meaningful, and activity ...

Liu, Hui — University of Bremen


Deep Learning for Event Detection, Sequence Labelling and Similarity Estimation in Music Signals

When listening to music, some humans can easily recognize which instruments play at what time or when a new musical segment starts, but cannot describe exactly how they do this. To automatically describe particular aspects of a music piece – be it for an academic interest in emulating human perception, or for practical applications –, we can thus not directly replicate the steps taken by a human. We can, however, exploit that humans can easily annotate examples, and optimize a generic function to reproduce these annotations. In this thesis, I explore solving different music perception tasks with deep learning, a recent branch of machine learning that optimizes functions of many stacked nonlinear operations – referred to as deep neural networks – and promises to obtain better results or require less domain knowledge than more traditional techniques. In particular, I employ ...

Schlüter, Jan — Department of Computational Perception, Johannes Kepler University Linz


Automated detection of epileptic seizures in pediatric patients based on accelerometry and surface electromyography

Epilepsy is one of the most common neurological diseases that manifests in repetitive epileptic seizures as a result of an abnormal, synchronous activity of a large group of neurons. Depending on the affected brain regions, seizures produce various severe clinical symptoms. There is no cure for epilepsy and sometimes even medication and other therapies, like surgery, vagus nerve stimulation or ketogenic diet, do not control the number of seizures. In that case, long-term (home) monitoring and automatic seizure detection would enable the tracking of the evolution of the disease and improve objective insight in any responses to medical interventions or changes in medical treatment. Especially during the night, supervision is reduced; hence a large number of seizures is missed. In addition, an alarm should be integrated into the automated seizure detection algorithm for severe seizures in order to help the ...

Milošević, Milica — KU Leuven


Robust Speech Recognition on Intelligent Mobile Devices with Dual-Microphone

Despite the outstanding progress made on automatic speech recognition (ASR) throughout the last decades, noise-robust ASR still poses a challenge. Tackling with acoustic noise in ASR systems is more important than ever before for a twofold reason: 1) ASR technology has begun to be extensively integrated in intelligent mobile devices (IMDs) such as smartphones to easily accomplish different tasks (e.g. search-by-voice), and 2) IMDs can be used anywhere at any time, that is, under many different acoustic (noisy) conditions. On the other hand, with the aim of enhancing noisy speech, IMDs have begun to embed small microphone arrays, i.e. microphone arrays comprised of a few sensors close each other. These multi-sensor IMDs often embed one microphone (usually at their rear) intended to capture the acoustic environment more than the speaker’s voice. This is the so-called secondary microphone. While classical microphone ...

López-Espejo, Iván — University of Granada


Automatic Analysis of Head and Facial Gestures in Video Streams

Automatic analysis of head gestures and facial expressions is a challenging research area and it has significant applications for intelligent human-computer interfaces. An important task is the automatic classification of non-verbal messages composed of facial signals where both facial expressions and head rotations are observed. This is a challenging task, because there is no definite grammar or code-book for mapping the non-verbal facial signals into a corresponding mental state. Furthermore, non-verbal facial signals and the observed emotions have dependency on personality, society, state of the mood and also the context in which they are displayed or observed. This thesis mainly addresses the three desired tasks for an effective visual information based automatic face and head gesture (FHG) analyzer. First we develop a fully automatic, robust and accurate 17-point facial landmark localizer based on local appearance information and structural information of ...

Cinar Akakin, Hatice — Bogazici University


Computational models of expressive gesture in multimedia systems

This thesis focuses on the development of paradigms and techniques for the design and implementation of multimodal interactive systems, mainly for performing arts applications. The work addresses research issues in the fields of human-computer interaction, multimedia systems, and sound and music computing. The thesis is divided into two parts. In the first one, after a short review of the state-of-the-art, the focus moves on the definition of environments in which novel forms of technology-integrated artistic performances can take place. These are distributed active mixed reality environments in which information at different layers of abstraction is conveyed mainly non-verbally through expressive gestures. Expressive gesture is therefore defined and the internal structure of a virtual observer able to process it (and inhabiting the proposed environments) is described in a multimodal perspective. The definition of the structure of the environments, of the virtual ...

Volpe, Gualtiero — University of Genova


Deep Learning for i-Vector Speaker and Language Recognition

Over the last few years, i-vectors have been the state-of-the-art technique in speaker and language recognition. Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors but the DL techniques in use are computationally expensive and need speaker or/and phonetic labels for the background data, which are not easily accessible in practice. On the other hand, the lack of speaker-labeled background data makes a big performance gap, in speaker recognition, between two well-known cosine and Probabilistic Linear Discriminant Analysis (PLDA) i-vector scoring techniques. It has recently been a challenge how to fill this gap without speaker labels, which are expensive in practice. Although some unsupervised clustering techniques are proposed to estimate the speaker labels, they cannot accurately estimate the labels. This thesis tries to solve the problems above by using the DL technology in different ways, without ...

Ghahabi, Omid — Universitat Politecnica de Catalunya


Learning Transferable Knowledge through Embedding Spaces

The unprecedented processing demand, posed by the explosion of big data, challenges researchers to design efficient and adaptive machine learning algorithms that do not require persistent retraining and avoid learning redundant information. Inspired from learning techniques of intelligent biological agents, identifying transferable knowledge across learning problems has been a significant research focus to improve machine learning algorithms. In this thesis, we address the challenges of knowledge transfer through embedding spaces that capture and store hierarchical knowledge. In the first part of the thesis, we focus on the problem of cross-domain knowledge transfer. We first address zero-shot image classification, where the goal is to identify images from unseen classes using semantic descriptions of these classes. We train two coupled dictionaries which align visual and semantic domains via an intermediate embedding space. We then extend this idea by training deep networks that ...

Mohammad Rostami — University of Pennsylvania


Spike train discrimination and analysis in neural and surface electromyography (sEMG) applications

The term "spike" is used to describe a short-time event that is the result of the activity of its source. Spikes can be seen in different signal modalities. In these modalities, often more than one source generates spikes. Classification algorithms can be used to group similar spikes, ideally spikes from the same source. This work examines the classification of spikes generated from neurons and muscles. When each detected spike is assigned to its source, the spike trains of these sources can provide information on complex brain network functioning, muscle disorders, and other applications. During the past several decades, there were many attempts to create and improve spike classification algorithms. No matter how advanced these methods are today, errors in classification cannot be avoided. Therefore, methods that would determine and improve reliability of classification are very desirable. In this work, it ...

Gligorijevic, Ivan — KU Leuven


Emotion assessment for affective computing based on brain and peripheral signals

Current Human-Machine Interfaces (HMI) lack of “emotional intelligence”, i.e. they are not able to identify human emotional states and take this information into account to decide on the proper actions to execute. The goal of affective computing is to fill this lack by detecting emotional cues occurring during Human-Computer Interaction (HCI) and synthesizing emotional responses. In the last decades, most of the studies on emotion assessment have focused on the analysis of facial expressions and speech to determine the emotional state of a person. Physiological activity also includes emotional information that can be used for emotion assessment but has received less attention despite of its advantages (for instance it can be less easily faked than facial expressions). This thesis reports on the use of two types of physiological activities to assess emotions in the context of affective computing: the activity ...

Chanel, Guillaume — University of Geneva


Acoustic Event Detection: Feature, Evaluation and Dataset Design

It takes more time to think of a silent scene, action or event than finding one that emanates sound. Not only speaking or playing music but almost everything that happens is accompanied with or results in one or more sounds mixed together. This makes acoustic event detection (AED) one of the most researched topics in audio signal processing nowadays and it will probably not see a decline anywhere in the near future. This is due to the thirst for understanding and digitally abstracting more and more events in life via the enormous amount of recorded audio through thousands of applications in our daily routine. But it is also a result of two intrinsic properties of audio: it doesn’t need a direct sight to be perceived and is less intrusive to record when compared to image or video. Many applications such ...

Mina Mounir — KU Leuven, ESAT STADIUS


Vision Based Sign Language Recognition: Modeling and Recognizing Isolated Signs With Manual and Non-manual Components

This thesis addresses the problem of vision based sign language recognition and focuses on three main tasks to design improved techniques that increase the performance of sign language recognition systems. We first attack the markerless tracking problem during natural and unrestricted signing in less restricted environments. We propose a joint particle filter approach for tracking multiple identical objects, in our case the two hands and the face, which is robust to situations including fast movement, interactions and occlusions. Our experiments show that the proposed approach has a robust tracking performance during the challenging situations and is suitable for tracking long durations of signing with its ability of fast recovery. Second, we attack the problem of the recognition of signs that include both manual (hand gestures) and non-manual (head/body gestures) components. We investigated multi-modal fusion techniques to model the different temporal ...

Aran, Oya — Bogazici University


Automated Face Recognition from Low-resolution Imagery

Recently, significant advances in the field of automated face recognition have been achieved using computer vision, machine learning, and deep learning methodologies. However, despite claims of super-human performance of face recognition algorithms on select key benchmark tasks, there remain several open problems that preclude the general replacement of human face recognition work with automated systems. State-of-the-art automated face recognition systems based on deep learning methods are able to achieve high accuracy when the face images they are tasked with recognizing subjects from are of sufficiently high quality. However, low image resolution remains one of the principal obstacles to face recognition systems, and their performance in the low-resolution regime is decidedly below human capabilities. In this PhD thesis, we present a systematic study of modern automated face recognition systems in the presence of image degradation in various forms. Based on our ...

Grm, Klemen — University of Ljubljana


Non-intrusive Quality Evaluation of Speech Processed in Noisy and Reverberant Environments

In many speech applications such as hands-free telephony or voice-controlled home assistants, the distance between the user and the recording microphones can be relatively large. In such a far-field scenario, the recorded microphone signals are typically corrupted by noise and reverberation, which may severely degrade the performance of speech recognition systems and reduce intelligibility and quality of speech in communication applications. In order to limit these effects, speech enhancement algorithms are typically applied. The main objective of this thesis is to develop novel speech enhancement algorithms for noisy and reverberant environments and signal-based measures to evaluate these algorithms, focusing on solutions that are applicable in realistic scenarios. First, we propose a single-channel speech enhancement algorithm for joint noise and reverberation reduction. The proposed algorithm uses a spectral gain to enhance the input signal, where the gain is computed using a ...

Cauchi, Benjamin — University of Oldenburg


Digital design and experimental validation of high-performance real-time OFDM systems

The goal of this Ph.D. dissertation is to address a number of challenges encountered in the digital baseband design of modern and future wireless communication systems. The fast and continuous evolution of wireless communications has been driven by the ambitious goal of providing ubiquitous services that could guarantee high throughput, reliability of the communication link and satisfy the increasing demand for efficient re-utilization of the heavily populated wireless spectrum. To cope with these ever-growing performance requirements, researchers around the world have introduced sophisticated broadband physical (PHY)-layer communication schemes able to accommodate higher bandwidth, which indicatively include multiple antennas at the transmitter and receiver and are capable of delivering improved spectral efficiency by applying interference management policies. The merging of Multiple Input Multiple Output (MIMO) schemes with the Orthogonal Frequency Division Multiplexing (OFDM) offers a flexible signal processing substrate to implement ...

Font-Bach, Oriol — Centre Tecnològic de Telecomunicacions de Catalunya (CTTC)

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.