Visual Analysis of Faces with Application in Biometrics, Forensics and Health Informatics

Computer vision-based analysis of human facial video provides information regarding to expression, diseases symptoms, and physiological parameters such as heartbeat rate, blood pressure and respiratory rate. It also provides a convenient source of heartbeat signal to be used in biometrics and forensics. This thesis is a collection of works done in five themes in the realm of computer vision-based facial image analysis: Monitoring elderly patients at private homes, Face quality assessment, Measurement of physiological parameters, Contact-free heartbeat biometrics, and Decision support system for healthcare. The work related to monitoring elderly patients at private homes includes a detailed survey and review of the monitoring technologies relevant to older patients living at home by discussing previous reviews and relevant taxonomies, different scenarios for home monitoring solutions for older patients, sensing and data acquisition techniques, data processing and analysis techniques, available datasets for ...

Haque, Mohammad Ahsanul — Aalborg Univeristy


Contributions to Human Motion Modeling and Recognition using Non-intrusive Wearable Sensors

This thesis contributes to motion characterization through inertial and physiological signals captured by wearable devices and analyzed using signal processing and deep learning techniques. This research leverages the possibilities of motion analysis for three main applications: to know what physical activity a person is performing (Human Activity Recognition), to identify who is performing that motion (user identification) or know how the movement is being performed (motor anomaly detection). Most previous research has addressed human motion modeling using invasive sensors in contact with the user or intrusive sensors that modify the user’s behavior while performing an action (cameras or microphones). In this sense, wearable devices such as smartphones and smartwatches can collect motion signals from users during their daily lives in a less invasive or intrusive way. Recently, there has been an exponential increase in research focused on inertial-signal processing to ...

Gil-Martín, Manuel — Universidad Politécnica de Madrid


Deep Learning Techniques for Visual Counting

The explosion of Deep Learning (DL) added a boost to the already rapidly developing field of Computer Vision to such a point that vision-based tasks are now parts of our everyday lives. Applications such as image classification, photo stylization, or face recognition are nowadays pervasive, as evidenced by the advent of modern systems trivially integrated into mobile applications. In this thesis, we investigated and enhanced the visual counting task, which automatically estimates the number of objects in still images or video frames. Recently, due to the growing interest in it, several Convolutional Neural Network (CNN)-based solutions have been suggested by the scientific community. These artificial neural networks, inspired by the organization of the animal visual cortex, provide a way to automatically learn effective representations from raw visual data and can be successfully employed to address typical challenges characterizing this task, ...

Ciampi Luca — University of Pisa


Biosignal processing and activity modeling for multimodal human activity recognition

This dissertation's primary goal was to systematically study human activity recognition and enhance its performance by advancing human activities' sequential modeling based on HMM-based machine learning. Driven by these purposes, this dissertation has the following major contributions: The proposal of our HAR research pipeline that guides the building of a robust wearable end-to-end HAR system and the implementation of the recording and recognition software Activity Signal Kit (ASK) according to the pipeline; Collecting several datasets of multimodal biosignals from over 25 subjects using the self-implemented ASK software and implementing an easy mechanism to segment and annotate the data; The comprehensive research on the offline HAR system based on the recorded datasets and the implementation of an end-to-end real-time HAR system; A novel activity modeling method for HAR, which partitions the human activity into a sequence of shared, meaningful, and activity ...

Liu, Hui — University of Bremen


Acoustic Event Detection: Feature, Evaluation and Dataset Design

It takes more time to think of a silent scene, action or event than finding one that emanates sound. Not only speaking or playing music but almost everything that happens is accompanied with or results in one or more sounds mixed together. This makes acoustic event detection (AED) one of the most researched topics in audio signal processing nowadays and it will probably not see a decline anywhere in the near future. This is due to the thirst for understanding and digitally abstracting more and more events in life via the enormous amount of recorded audio through thousands of applications in our daily routine. But it is also a result of two intrinsic properties of audio: it doesn’t need a direct sight to be perceived and is less intrusive to record when compared to image or video. Many applications such ...

Mina Mounir — KU Leuven, ESAT STADIUS


Detection of epileptic seizures based on video and accelerometer recordings

Epilepsy is one of the most common neurological diseases, especially in children. And although the majority of patients can be treated through medication or surgery (70%-75%), a significant group of patients cannot be treated. For this latter group of patients it is advisable to follow the evolution of the disease. This can be done through a long-term automatic monitoring, which gives an objective measure of the number of seizures that the patient has, for example during the night. On the other hand, there is a reduced social control overnight and the parents or caregivers can miss some seizures. In severe seizures, it is sometimes necessary, however, to avoid dangerous situations during or after the seizure (e.g. the danger of suffocation caused by vomiting or a position that obstructs breathing, or the risk of injury during violent movements), and to comfort ...

Cuppens, Kris — Katholieke Universiteit Leuven


Sound Event Detection by Exploring Audio Sequence Modelling

Everyday sounds in real-world environments are a powerful source of information by which humans can interact with their environments. Humans can infer what is happening around them by listening to everyday sounds. At the same time, it is a challenging task for a computer algorithm in a smart device to automatically recognise, understand, and interpret everyday sounds. Sound event detection (SED) is the process of transcribing an audio recording into sound event tags with onset and offset time values. This involves classification and segmentation of sound events in the given audio recording. SED has numerous applications in everyday life which include security and surveillance, automation, healthcare monitoring, multimedia information retrieval, and assisted living technologies. SED is to everyday sounds what automatic speech recognition (ASR) is to speech and automatic music transcription (AMT) is to music. The fundamental questions in designing ...

[Pankajakshan], [Arjun] — Queen Mary University of London


Video Based Detection of Driver Fatigue

This thesis addresses the problem of drowsy driver detection using computer vision techniques applied to the human face. Specifically we explore the possibility of discriminating drowsy from alert video segments using facial expressions automatically extracted from video. Several approaches were previously proposed for the detection and prediction of drowsiness. There has recently been increasing interest in computer vision approaches as it is a potentially promising approach due to its non-invasive nature for detecting drowsiness. Previous studies with vision based approaches detect driver drowsiness primarily by making pre-assumptions about the relevant behavior, focusing on blink rate, eye closure, and yawning. Here we employ machine learning to explore, understand and exploit actual human behavior during drowsiness episodes. We have collected two datasets including facial and head movement measures. Head motion is collected through an accelerometer for the first dataset (UYAN-1) and an ...

Vural, Esra — Sabanci University


Visual ear detection and recognition in unconstrained environments

Automatic ear recognition systems have seen increased interest over recent years due to multiple desirable characteristics. Ear images used in such systems can typically be extracted from profile head shots or video footage. The acquisition procedure is contactless and non-intrusive, and it also does not depend on the cooperation of the subjects. In this regard, ear recognition technology shares similarities with other image-based biometric modalities. Another appealing property of ear biometrics is its distinctiveness. Recent studies even empirically validated existing conjectures that certain features of the ear are distinct for identical twins. This fact has significant implications for security-related applications and puts ear images on a par with epigenetic biometric modalities, such as the iris. Ear images can also supplement other biometric modalities in automatic recognition systems and provide identity cues when other information is unreliable or even unavailable. In ...

Emeršič, Žiga — University of Ljubljana, Faculty of Computer and Information Science


Cognitive Models for Acoustic and Audiovisual Sound Source Localization

Sound source localization algorithms have a long research history in the field of digital signal processing. Many common applications like intelligent personal assistants, teleconferencing systems and methods for technical diagnosis in acoustics require an accurate localization of sound sources in the environment. However, dynamic environments entail a particular challenge for these systems. For instance, voice controlled smart home applications, where the speaker, as well as potential noise sources, are moving within the room, are a typical example of dynamic environments. Classical sound source localization systems only have limited capabilities to deal with dynamic acoustic scenarios. In this thesis, three novel approaches to sound source localization that extend existing classical methods will be presented. The first system is proposed in the context of audiovisual source localization. Determining the position of sound sources in adverse acoustic conditions can be improved by including ...

Schymura, Christopher — Ruhr University Bochum


Multi-channel EMG pattern classification based on deep learning

In recent years, a huge body of data generated by various applications in domains like social networks and healthcare have paved the way for the development of high performance models. Deep learning has transformed the field of data analysis by dramatically improving the state of the art in various classification and prediction tasks. Combined with advancements in electromyography it has given rise to new hand gesture recognition applications, such as human computer interfaces, sign language recognition, robotics control and rehabilitation games. The purpose of this thesis is to develop novel methods for electromyography signal analysis based on deep learning for the problem of hand gesture recognition. Specifically, we focus on methods for data preparation and developing accurate models even when few data are available. Electromyography signals are in general one-dimensional time-series with a rich frequency content. Various feature sets have ...

Tsinganos, Panagiotis — University of Patras, Greece - Vrije Universiteit Brussel, Belgium


Non-rigid Registration-based Data-driven 3D Facial Action Unit Detection

Automated analysis of facial expressions has been an active area of study due to its potential applications not only for intelligent human-computer interfaces but also for human facial behavior research. To advance automatic expression analysis, this thesis proposes and empirically proves two hypotheses: (i) 3D face data is a better data modality than conventional 2D camera images, not only for being much less disturbed by illumination and head pose effects but also for capturing true facial surface information. (ii) It is possible to perform detailed face registration without resorting to any face modeling. This means that data-driven methods in automatic expression analysis can compensate for the confounding effects like pose and physiognomy differences, and can process facial features more effectively, without suffering the drawbacks of model-driven analysis. Our study is based upon Facial Action Coding System (FACS) as this paradigm ...

Savran, Arman — Bogazici University


Fire Detection Algorithms Using Multimodal Signal and Image Analysis

Dynamic textures are common in natural scenes. Examples of dynamic textures in video include fire, smoke, clouds, volatile organic compound (VOC) plumes in infra-red (IR) videos, trees in the wind, sea and ocean waves, etc. Researchers extensively studied 2-D textures and related problems in the fields of image processing and computer vision. On the other hand, there is very little research on dynamic texture detection in video. In this dissertation, signal and image processing methods developed for detection of a specific set of dynamic textures are presented. Signal and image processing methods are developed for the detection of flames and smoke in open and large spaces with a range of up to $30$m to the camera in visible-range (IR) video. Smoke is semi-transparent at the early stages of fire. Edges present in image frames with smoke start loosing their sharpness ...

Toreyin, Behcet Ugur — Bilkent University


Central and peripheral mechanisms: a multimodal approach to understanding and restoring human motor control

All human actions involve motor control. Even the simplest movement requires the coordinated recruitment of many muscles, orchestrated by neuronal circuits in the brain and the spinal cord. As a consequence, lesions affecting the central nervous system, such as stroke, can lead to a wide range of motor impairments. While a certain degree of recovery can often be achieved by harnessing the plasticity of the motor hierarchy, patients typically struggle to regain full motor control. In this context, technology-assisted interventions offer the prospect of intense, controllable and quantifiable motor training. Yet, clinical outcomes remain comparable to conventional approaches, suggesting the need for a paradigm shift towards customized knowledge-driven treatments to fully exploit their potential. In this thesis, we argue that a detailed understanding of healthy and impaired motor pathways can foster the development of therapies optimally engaging plasticity. To this ...

Kinany, Nawal — Ecole Polytechnique Fédérale de Lausanne (EPFL)


Emotion assessment for affective computing based on brain and peripheral signals

Current Human-Machine Interfaces (HMI) lack of “emotional intelligence”, i.e. they are not able to identify human emotional states and take this information into account to decide on the proper actions to execute. The goal of affective computing is to fill this lack by detecting emotional cues occurring during Human-Computer Interaction (HCI) and synthesizing emotional responses. In the last decades, most of the studies on emotion assessment have focused on the analysis of facial expressions and speech to determine the emotional state of a person. Physiological activity also includes emotional information that can be used for emotion assessment but has received less attention despite of its advantages (for instance it can be less easily faked than facial expressions). This thesis reports on the use of two types of physiological activities to assess emotions in the context of affective computing: the activity ...

Chanel, Guillaume — University of Geneva

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.