Vision-based human activities recognition in supervised or assisted environment (2022)
Visual Analysis of Faces with Application in Biometrics, Forensics and Health Informatics
Computer vision-based analysis of human facial video provides information regarding expression, disease symptoms, and physiological parameters such as heartbeat rate, blood pressure and respiratory rate. It also provides a convenient source of heartbeat signal to be used in biometrics and forensics. This thesis is a collection of works done in five themes in the realm of computer vision-based facial image analysis: Monitoring elderly patients at private homes, Face quality assessment, Measurement of physiological parameters, Contact-free heartbeat biometrics, and Decision support system for healthcare. The work related to monitoring elderly patients at private homes includes a detailed survey and review of the monitoring technologies relevant to older patients living at home by discussing previous reviews and relevant taxonomies, different scenarios for home monitoring solutions for older patients, sensing and data acquisition techniques, data processing and analysis techniques, available datasets for ...
Haque, Mohammad Ahsanul — Aalborg University
Contributions to Human Motion Modeling and Recognition using Non-intrusive Wearable Sensors
This thesis contributes to motion characterization through inertial and physiological signals captured by wearable devices and analyzed using signal processing and deep learning techniques. This research leverages the possibilities of motion analysis for three main applications: to know what physical activity a person is performing (Human Activity Recognition), to identify who is performing that motion (user identification) or know how the movement is being performed (motor anomaly detection). Most previous research has addressed human motion modeling using invasive sensors in contact with the user or intrusive sensors that modify the user’s behavior while performing an action (cameras or microphones). In this sense, wearable devices such as smartphones and smartwatches can collect motion signals from users during their daily lives in a less invasive or intrusive way. Recently, there has been an exponential increase in research focused on inertial-signal processing to ...
Gil-Martín, Manuel — Universidad Politécnica de Madrid
Deep Learning Techniques for Visual Counting
The explosion of Deep Learning (DL) added a boost to the already rapidly developing field of Computer Vision to such a point that vision-based tasks are now parts of our everyday lives. Applications such as image classification, photo stylization, or face recognition are nowadays pervasive, as evidenced by the advent of modern systems trivially integrated into mobile applications. In this thesis, we investigated and enhanced the visual counting task, which automatically estimates the number of objects in still images or video frames. Recently, due to the growing interest in it, several Convolutional Neural Network (CNN)-based solutions have been suggested by the scientific community. These artificial neural networks, inspired by the organization of the animal visual cortex, provide a way to automatically learn effective representations from raw visual data and can be successfully employed to address typical challenges characterizing this task, ...
Ciampi, Luca — University of Pisa
Biosignal processing and activity modeling for multimodal human activity recognition
This dissertation's primary goal was to systematically study human activity recognition and enhance its performance by advancing human activities' sequential modeling based on HMM-based machine learning. Driven by these purposes, this dissertation has the following major contributions: The proposal of our HAR research pipeline that guides the building of a robust wearable end-to-end HAR system and the implementation of the recording and recognition software Activity Signal Kit (ASK) according to the pipeline; Collecting several datasets of multimodal biosignals from over 25 subjects using the self-implemented ASK software and implementing an easy mechanism to segment and annotate the data; The comprehensive research on the offline HAR system based on the recorded datasets and the implementation of an end-to-end real-time HAR system; A novel activity modeling method for HAR, which partitions the human activity into a sequence of shared, meaningful, and activity ...
Liu, Hui — University of Bremen
Acoustic Event Detection: Feature, Evaluation and Dataset Design
It takes more time to think of a silent scene, action or event than to find one that emanates sound. Not only speaking or playing music but almost everything that happens is accompanied by or results in one or more sounds mixed together. This makes acoustic event detection (AED) one of the most researched topics in audio signal processing nowadays, and it will probably not see a decline in the near future. This is due to the thirst for understanding and digitally abstracting more and more events in life via the enormous amount of audio recorded through thousands of applications in our daily routine. But it is also a result of two intrinsic properties of audio: it does not need a direct line of sight to be perceived and is less intrusive to record when compared to image or video. Many applications such ...
Mina Mounir — KU Leuven, ESAT STADIUS
Predictive modelling and deep learning for quantifying human health
Machine learning and deep learning techniques have emerged as powerful tools for addressing complex challenges across diverse domains. These methodologies are powerful because they extract patterns and insights from large and complex datasets, automate decision-making processes, and continuously improve over time. They enable us to observe and quantify patterns in data that a normal human would not be able to capture, leading to deeper insights and more accurate predictions. This dissertation presents two research papers that leverage these methodologies to tackle distinct yet interconnected problems in neuroimaging and computer vision for the quantification of human health. The first investigation, "Age prediction using resting-state functional MRI," addresses the challenge of understanding brain aging. By employing the Least Absolute Shrinkage and Selection Operator (LASSO) on resting-state functional MRI (rsfMRI) data, we identify the most predictive correlations related to brain age. Our study, ...
Chang Jose — National Cheng Kung University
Detection of epileptic seizures based on video and accelerometer recordings
Epilepsy is one of the most common neurological diseases, especially in children. Although the majority of patients can be treated through medication or surgery (70%-75%), a significant group of patients cannot be treated. For this latter group of patients it is advisable to follow the evolution of the disease. This can be done through long-term automatic monitoring, which gives an objective measure of the number of seizures that the patient has, for example during the night. On the other hand, there is reduced social control overnight and the parents or caregivers can miss some seizures. In severe seizures, it is sometimes necessary, however, to avoid dangerous situations during or after the seizure (e.g. the danger of suffocation caused by vomiting or a position that obstructs breathing, or the risk of injury during violent movements), and to comfort ...
Cuppens, Kris — Katholieke Universiteit Leuven
Sound Event Detection by Exploring Audio Sequence Modelling
Everyday sounds in real-world environments are a powerful source of information by which humans can interact with their environments. Humans can infer what is happening around them by listening to everyday sounds. At the same time, it is a challenging task for a computer algorithm in a smart device to automatically recognise, understand, and interpret everyday sounds. Sound event detection (SED) is the process of transcribing an audio recording into sound event tags with onset and offset time values. This involves classification and segmentation of sound events in the given audio recording. SED has numerous applications in everyday life which include security and surveillance, automation, healthcare monitoring, multimedia information retrieval, and assisted living technologies. SED is to everyday sounds what automatic speech recognition (ASR) is to speech and automatic music transcription (AMT) is to music. The fundamental questions in designing ...
Pankajakshan, Arjun — Queen Mary University of London
Video Based Detection of Driver Fatigue
This thesis addresses the problem of drowsy driver detection using computer vision techniques applied to the human face. Specifically, we explore the possibility of discriminating drowsy from alert video segments using facial expressions automatically extracted from video. Several approaches were previously proposed for the detection and prediction of drowsiness. There has recently been increasing interest in computer vision approaches, which are promising for detecting drowsiness because of their non-invasive nature. Previous studies with vision-based approaches detect driver drowsiness primarily by making pre-assumptions about the relevant behavior, focusing on blink rate, eye closure, and yawning. Here we employ machine learning to explore, understand and exploit actual human behavior during drowsiness episodes. We have collected two datasets including facial and head movement measures. Head motion is collected through an accelerometer for the first dataset (UYAN-1) and an ...
Vural, Esra — Sabanci University
Cognitive Models for Acoustic and Audiovisual Sound Source Localization
Sound source localization algorithms have a long research history in the field of digital signal processing. Many common applications like intelligent personal assistants, teleconferencing systems and methods for technical diagnosis in acoustics require an accurate localization of sound sources in the environment. However, dynamic environments entail a particular challenge for these systems. For instance, voice controlled smart home applications, where the speaker, as well as potential noise sources, are moving within the room, are a typical example of dynamic environments. Classical sound source localization systems only have limited capabilities to deal with dynamic acoustic scenarios. In this thesis, three novel approaches to sound source localization that extend existing classical methods will be presented. The first system is proposed in the context of audiovisual source localization. Determining the position of sound sources in adverse acoustic conditions can be improved by including ...
Schymura, Christopher — Ruhr University Bochum
Visual ear detection and recognition in unconstrained environments
Automatic ear recognition systems have seen increased interest over recent years due to multiple desirable characteristics. Ear images used in such systems can typically be extracted from profile head shots or video footage. The acquisition procedure is contactless and non-intrusive, and it also does not depend on the cooperation of the subjects. In this regard, ear recognition technology shares similarities with other image-based biometric modalities. Another appealing property of ear biometrics is its distinctiveness. Recent studies even empirically validated existing conjectures that certain features of the ear are distinct for identical twins. This fact has significant implications for security-related applications and puts ear images on a par with epigenetic biometric modalities, such as the iris. Ear images can also supplement other biometric modalities in automatic recognition systems and provide identity cues when other information is unreliable or even unavailable. In ...
Emeršič, Žiga — University of Ljubljana, Faculty of Computer and Information Science
Non-rigid Registration-based Data-driven 3D Facial Action Unit Detection
Automated analysis of facial expressions has been an active area of study due to its potential applications not only for intelligent human-computer interfaces but also for human facial behavior research. To advance automatic expression analysis, this thesis proposes and empirically proves two hypotheses: (i) 3D face data is a better data modality than conventional 2D camera images, not only for being much less disturbed by illumination and head pose effects but also for capturing true facial surface information. (ii) It is possible to perform detailed face registration without resorting to any face modeling. This means that data-driven methods in automatic expression analysis can compensate for the confounding effects like pose and physiognomy differences, and can process facial features more effectively, without suffering the drawbacks of model-driven analysis. Our study is based upon Facial Action Coding System (FACS) as this paradigm ...
Savran, Arman — Bogazici University
Multi-channel EMG pattern classification based on deep learning
In recent years, a huge body of data generated by various applications in domains like social networks and healthcare has paved the way for the development of high performance models. Deep learning has transformed the field of data analysis by dramatically improving the state of the art in various classification and prediction tasks. Combined with advancements in electromyography, it has given rise to new hand gesture recognition applications, such as human computer interfaces, sign language recognition, robotics control and rehabilitation games. The purpose of this thesis is to develop novel methods for electromyography signal analysis based on deep learning for the problem of hand gesture recognition. Specifically, we focus on methods for data preparation and developing accurate models even when few data are available. Electromyography signals are in general one-dimensional time-series with a rich frequency content. Various feature sets have ...
Tsinganos, Panagiotis — University of Patras, Greece - Vrije Universiteit Brussel, Belgium
Fire Detection Algorithms Using Multimodal Signal and Image Analysis
Dynamic textures are common in natural scenes. Examples of dynamic textures in video include fire, smoke, clouds, volatile organic compound (VOC) plumes in infra-red (IR) videos, trees in the wind, sea and ocean waves, etc. Researchers have extensively studied 2-D textures and related problems in the fields of image processing and computer vision. On the other hand, there is very little research on dynamic texture detection in video. In this dissertation, signal and image processing methods developed for detection of a specific set of dynamic textures are presented. Signal and image processing methods are developed for the detection of flames and smoke in open and large spaces with a range of up to 30 m to the camera in visible-range and infra-red (IR) video. Smoke is semi-transparent at the early stages of fire. Edges present in image frames with smoke start losing their sharpness ...
Toreyin, Behcet Ugur — Bilkent University
This thesis focuses on wearables for health status monitoring, covering applications aimed at emergency solutions to the COVID-19 pandemic and the aging society. Methods of ambient assisted living (AAL) are presented for the neurodegenerative disease Parkinson’s disease (PD), facilitating ’aging in place’ thanks to machine learning and wearable-based mHealth solutions. Furthermore, approaches using machine learning and wearables are discussed for early-stage COVID-19 detection, with encouraging accuracy. Firstly, a publicly available dataset containing COVID-19, influenza, and healthy control data was reused for research purposes. The solution presented in this thesis treats the task as a classification problem and outperformed the state-of-the-art methods, whereas the original paper introduced only anomaly detection and did not report the specificity of the created models. The proposed model in the thesis for early detection of COVID-19 achieved 78 % for the k-NN classifier. Moreover, a ...
Justyna Skibińska — Brno University of Technology & Tampere University