Disentanglement for improved data-driven modeling of dynamical systems

Modeling dynamical systems is a fundamental task in various scientific and engineering domains, requiring accurate predictions, robustness to varying conditions, and interpretability of the underlying mechanisms. Traditional data-driven approaches often struggle with long-term prediction accuracy, generalization to out-of-distribution (OOD) scenarios, and providing insights into the system's behavior. This thesis explores the integration of supervised disentanglement into deep learning models as a means to address these challenges. We begin by advancing the state-of-the-art in modeling wave propagation governed by the Saint-Venant equations. Utilizing U-Net architectures and purposefully designed training strategies, we develop deep learning models that significantly improve prediction accuracy. Through OOD analysis, we highlight the limitations of standard deep learning models in capturing complex spatiotemporal dynamics, demonstrating how integrating domain knowledge through architectural design and training practices can enhance model performance. We further extend our supervised disentanglement approach to high-dimensional ...

Stathi Fotiadis — Imperial College London
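
The abstract names U-Net architectures for predicting spatiotemporal fields governed by the Saint-Venant equations. As a point of reference only, the sketch below shows a minimal U-Net-style encoder-decoder with skip connections trained to map the current field to the next time step; the channel widths, depth, and synthetic data are illustrative assumptions, not the thesis's configuration.

```python
# Minimal U-Net-style encoder-decoder for next-step prediction of a 2-D field
# (e.g., water depth on a grid). Hypothetical sketch; sizes and loss are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        self.enc1 = block(in_ch, 32)
        self.enc2 = block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bott = block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = block(128, 64)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = block(64, 32)
        self.head = nn.Conv2d(32, out_ch, 1)

    def forward(self, x):
        e1 = self.enc1(x)                      # skip connection 1
        e2 = self.enc2(self.pool(e1))          # skip connection 2
        b = self.bott(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)

# One training step on synthetic data: predict the field at t+1 from the field at t.
model = TinyUNet()
x_t  = torch.randn(8, 1, 64, 64)   # batch of current fields
x_t1 = torch.randn(8, 1, 64, 64)   # fields one step ahead (targets)
loss = F.mse_loss(model(x_t), x_t1)
loss.backward()
```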


Digital Audio Processing Methods for Voice Pathology Detection

Voice pathology encompasses a diverse range of disorders affecting vocal quality and production. Using audio machine learning for voice pathology classification is an innovative approach to diagnosing a wide range of voice disorders. Despite extensive research in this area, a significant gap remains in the development of classifiers that can adapt and generalize effectively. This thesis aims to address this gap by contributing new insights and methods. It provides a comprehensive exploration of automatic voice pathology classification, focusing on challenges such as data limitations and on the potential of integrating multiple modalities to enhance diagnostic accuracy and adaptability. To improve generalization and make the classifier flexible across diverse types of voice disorders, this research explores a wide variety of datasets and pathology types. It covers a broad range of voice disorders, including functional dysphonia, ...

Ioanna Miliaresi — University of Piraeus


Machine Learning For Data-Driven Signal Separation and Interference Mitigation in Radio-Frequency Communications

Single-channel source separation for radio-frequency (RF) systems is a challenging problem relevant to key applications, including wireless communications, radar, and spectrum monitoring. This thesis addresses this challenge by focusing on data-driven approaches for source separation, leveraging datasets of sample realizations when source models are not explicitly provided. To this end, deep learning techniques are employed as function approximators for source separation, with models trained using available data. Two problem abstractions are studied as benchmarks for our proposed deep-learning approaches. Through a simplified problem involving Orthogonal Frequency Division Multiplexing (OFDM), we reveal the limitations of existing deep learning solutions and suggest modifications that account for the signal modality for improved performance. Further, we study the impact of time shifts on the formulation of an optimal estimator for cyclostationary Gaussian time series, serving as a performance lower bound for evaluating data-driven methods. ...

Lee, Cheng Feng Gary — Massachusetts Institute of Technology
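
The abstract refers to an optimal estimator for Gaussian time series used as a performance bound for data-driven separation. The sketch below illustrates only the simpler stationary case: LMMSE (Wiener) separation of a single-channel mixture of two zero-mean Gaussian sources with known covariances. The AR(1)-style covariances are illustrative assumptions; the cyclostationary formulation studied in the thesis is more involved.

```python
# LMMSE separation of two zero-mean Gaussian sources from a single-channel mixture
# y = s + b, given their covariance matrices. Illustrative model-based baseline only.
import numpy as np

rng = np.random.default_rng(0)
n = 256

def ar1_cov(rho, n):
    """Covariance matrix of a stationary AR(1)-like Gaussian process."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

C_s = ar1_cov(0.95, n)   # slowly varying source of interest
C_b = ar1_cov(0.30, n)   # broadband interference

s = rng.multivariate_normal(np.zeros(n), C_s)
b = rng.multivariate_normal(np.zeros(n), C_b)
y = s + b                                   # observed single-channel mixture

# LMMSE estimate: s_hat = C_s (C_s + C_b)^{-1} y
s_hat = C_s @ np.linalg.solve(C_s + C_b, y)

mse = np.mean((s - s_hat) ** 2)
print(f"per-sample MSE of the LMMSE estimate: {mse:.3f}")
```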


Deep Learning Techniques for Visual Counting

The explosion of Deep Learning (DL) has given a further boost to the already rapidly developing field of Computer Vision, to the point that vision-based tasks are now part of our everyday lives. Applications such as image classification, photo stylization, and face recognition are nowadays pervasive, as evidenced by modern systems seamlessly integrated into mobile applications. In this thesis, we investigate and enhance visual counting, the task of automatically estimating the number of objects in still images or video frames. Owing to the growing interest in this task, several Convolutional Neural Network (CNN)-based solutions have been proposed by the scientific community. These artificial neural networks, inspired by the organization of the animal visual cortex, provide a way to automatically learn effective representations from raw visual data and can be successfully employed to address the typical challenges characterizing this task, ...

Ciampi Luca — University of Pisa
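
The abstract mentions CNN-based solutions for estimating object counts. One formulation widely used in the counting literature, not necessarily the one adopted in this thesis, regresses a density map whose spatial integral is the count; the untrained sketch below illustrates the idea with arbitrary layer sizes.

```python
# Density-map regression for visual counting: a fully convolutional network outputs a
# per-pixel density whose spatial sum is the estimated count. Minimal, untrained sketch.
import torch
import torch.nn as nn

class DensityCounter(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 1, 1),                      # 1-channel density map (1/4 resolution)
        )

    def forward(self, x):
        density = torch.relu(self.features(x))
        count = density.sum(dim=(1, 2, 3))            # estimated count per image
        return density, count

model = DensityCounter()
images = torch.randn(4, 3, 256, 256)                  # batch of RGB frames
density, count = model(images)
print(density.shape, count.shape)                     # [4, 1, 64, 64] and [4]
```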


Contributions to Human Motion Modeling and Recognition using Non-intrusive Wearable Sensors

This thesis contributes to motion characterization through inertial and physiological signals captured by wearable devices and analyzed using signal processing and deep learning techniques. The research leverages motion analysis for three main applications: recognizing what physical activity a person is performing (Human Activity Recognition), identifying who is performing that motion (user identification), and determining how the movement is being performed (motor anomaly detection). Most previous research has addressed human motion modeling using invasive sensors in contact with the user or intrusive sensors, such as cameras or microphones, that modify the user's behavior while performing an action. In contrast, wearable devices such as smartphones and smartwatches can collect motion signals from users during their daily lives in a less invasive and less intrusive way. Recently, there has been an exponential increase in research focused on inertial-signal processing to ...

Gil-Martín, Manuel — Universidad Politécnica de Madrid
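
To make the three applications concrete, the sketch below shows a generic sliding-window activity-recognition pipeline: segment triaxial inertial signals into fixed-length windows, extract simple per-window features, and classify each window. The synthetic signals, window length, features, and classifier are illustrative placeholders rather than the models used in the thesis.

```python
# Generic wearable HAR pipeline: window the accelerometer stream, featurize, classify.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
fs, win_s = 50, 2.56                       # 50 Hz sampling, ~2.5 s windows
win = int(fs * win_s)

def make_activity(freq, n_windows):
    """Synthetic triaxial signal for one activity, returned as per-window feature rows."""
    t = np.arange(win) / fs
    rows = []
    for _ in range(n_windows):
        sig = np.stack([np.sin(2 * np.pi * freq * t + rng.uniform(0, np.pi)) +
                        0.3 * rng.standard_normal(win) for _ in range(3)])
        feats = np.concatenate([sig.mean(1), sig.std(1),
                                np.abs(np.diff(sig, axis=1)).mean(1)])
        rows.append(feats)
    return np.array(rows)

X = np.vstack([make_activity(1.0, 200),    # "walking"
               make_activity(2.5, 200),    # "running"
               make_activity(0.1, 200)])   # "standing"
y = np.repeat([0, 1, 2], 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0).fit(X_tr, y_tr)
print("window-level accuracy:", round(clf.score(X_te, y_te), 3))
```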


Model-Based Deep Speech Enhancement for Improved Interpretability and Robustness

Advances in technology profoundly impact numerous aspects of life, including how we communicate and interact. For instance, hearing aids enable hearing-impaired or elderly people to participate comfortably in daily conversations; telecommunications equipment lifts distance constraints, enabling people to communicate remotely; smart machines are developed to interact with humans by understanding and responding to their instructions. These applications involve speech-based interaction not only between humans but also between humans and machines. However, the microphones mounted on these devices capture both target speech and interfering sounds, posing challenges to the reliability of speech communication in noisy environments. For example, distorted speech signals may reduce communication fluency among participants during teleconferencing. Additionally, noise interference can negatively affect the speech recognition and understanding modules of a voice-controlled machine. This calls for speech enhancement algorithms to extract clean speech and suppress undesired interfering signals, ...

Fang, Huajian — University of Hamburg


Predictive modelling and deep learning for quantifying human health

Machine learning and deep learning techniques have emerged as powerful tools for addressing complex challenges across diverse domains. These methodologies are effective because they extract patterns and insights from large and complex datasets, automate decision-making processes, and continuously improve over time. They enable us to observe and quantify patterns in data that a human observer would not be able to capture, leading to deeper insights and more accurate predictions. This dissertation presents two research papers that leverage these methodologies to tackle distinct yet interconnected problems in neuroimaging and computer vision for the quantification of human health. The first investigation, "Age prediction using resting-state functional MRI," addresses the challenge of understanding brain aging. By employing the Least Absolute Shrinkage and Selection Operator (LASSO) on resting-state functional MRI (rsfMRI) data, we identify the most predictive correlations related to brain age. Our study, ...

Chang Jose — National Cheng Kung University
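
The abstract explicitly names LASSO applied to rsfMRI-derived features for brain-age prediction. The sketch below illustrates that setup with synthetic connectivity features standing in for real rsfMRI correlations; feature counts, noise levels, and the cross-validated regularization are assumptions for illustration only.

```python
# LASSO regression for age prediction from functional-connectivity features (synthetic data).
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_subjects, n_edges = 300, 500       # e.g., upper-triangle entries of a connectivity matrix
X = rng.standard_normal((n_subjects, n_edges))
true_coef = np.zeros(n_edges)
true_coef[:10] = rng.uniform(1.0, 2.0, 10)    # only a few edges actually track age
age = 45 + X @ true_coef + rng.normal(0, 3, n_subjects)

X_tr, X_te, y_tr, y_te = train_test_split(X, age, random_state=0)
model = LassoCV(cv=5).fit(X_tr, y_tr)

print("test R^2:", round(model.score(X_te, y_te), 3))
print("edges selected as predictive:", int(np.sum(model.coef_ != 0)))
```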


Bipolar and high-density surface EMG to investigate electrical signs of muscular fatigue

Surface electromyography (sEMG) has become an indispensable tool, extensively used across various fields such as medical diagnosis, rehabilitation, sports science, and prosthetic control. Among these applications, the study of neuromuscular adaptations related to muscle fatigue stands out due to its complexity and the intricate physiological processes underlying muscle activity. This PhD thesis aims to address this challenge by exploring the use of bipolar and high-density surface EMG (HD-EMG) to study the electrical signs of muscle fatigue across different scenarios. The primary objective is to advance our understanding of the neuromuscular system's strategies during fatigue and to use non-invasive sEMG as a reliable method for accurately detecting and characterizing the progression of muscle fatigue. This research is structured around several key questions addressing different aspects of muscle fatigue assessment. The first part focuses on evaluating various spectral estimation techniques, as changes ...

Corvini Giovanni — Roma Tre University
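
Spectral estimation for fatigue assessment typically tracks the downward shift of the sEMG spectrum over a sustained contraction. The sketch below computes one such descriptor, the median frequency of a Welch periodogram over sliding windows, on a synthetic signal; it is meant only to illustrate the kind of estimator being compared, not the thesis's specific methods.

```python
# Median frequency of the sEMG spectrum over sliding windows (synthetic fatigue-like signal).
import numpy as np
from scipy.signal import welch

fs = 2000                                    # Hz, typical sEMG sampling rate
t = np.arange(0, 60, 1 / fs)                 # one minute of "contraction"
rng = np.random.default_rng(0)

# Narrowband component whose instantaneous frequency falls from ~120 Hz to ~24 Hz,
# mimicking the spectral compression seen with fatigue, plus broadband noise.
phase = 2 * np.pi * (120 * t - 0.8 * t ** 2)
emg = np.sin(phase) + 0.2 * rng.standard_normal(t.size)

def median_frequency(x, fs):
    f, pxx = welch(x, fs=fs, nperseg=1024)
    cum = np.cumsum(pxx)
    return f[np.searchsorted(cum, cum[-1] / 2)]

win = 5 * fs                                 # 5-second analysis windows
for k in range(0, emg.size - win, win):
    mdf = median_frequency(emg[k:k + win], fs)
    print(f"t = {k / fs:4.0f} s   median frequency = {mdf:6.1f} Hz")
```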


Deep Learning for Event Detection, Sequence Labelling and Similarity Estimation in Music Signals

When listening to music, some humans can easily recognize which instruments play at what time or when a new musical segment starts, but cannot describe exactly how they do this. To automatically describe particular aspects of a music piece – be it out of academic interest in emulating human perception, or for practical applications – we therefore cannot directly replicate the steps taken by a human. We can, however, exploit the fact that humans can easily annotate examples, and optimize a generic function to reproduce these annotations. In this thesis, I explore solving different music perception tasks with deep learning, a recent branch of machine learning that optimizes functions composed of many stacked nonlinear operations – referred to as deep neural networks – and promises better results or less required domain knowledge than more traditional techniques. In particular, I employ ...

Schlüter, Jan — Department of Computational Perception, Johannes Kepler University Linz


Robust Lung Sound and Acoustic Scene Classification

Auscultation with a stethoscope enables us to recognize pathological changes of the lung. It is a fast and inexpensive diagnostic method. However, it has several disadvantages: it is subjective, i.e. the evaluation of lung sounds depends on the experience of the physician; it cannot provide continuous monitoring; and a trained expert is required. Furthermore, the characteristics of lung sounds lie in the low frequency range, where human hearing has limited sensitivity and which is susceptible to noise artifacts. Exploiting advances in digital recording devices, signal processing and machine learning, computational methods for the analysis of lung sounds have proven a successful and effective approach. Computational lung sound analysis is beneficial for computer-supported diagnosis, digital storage and monitoring in critical care. Besides computational lung sound analysis, the recognition of acoustic contextual information is important in various applications. The motivation for recent research on ...

Truc Nguyen — SPSC - TUGraz


Improving Efficiency and Generalization in Deep Learning Models for Industrial Applications

Over the last decade, deep learning methods have gained increasing traction in industrial applications, ranging from image-based automated quality control and signal enhancement to condition monitoring tasks. While deep learning has immensely increased the performance and capabilities of machine learning models, it has also increased the vulnerability of those models. Moreover, these models require vast amounts of data in order to generalize well. This is problematic for industrial applications, since the amount of available data is often limited and most practical applications outside the field of big data have to deal with scarce data. This is especially true for supervised tasks, as creating labeled datasets often involves expensive expert labor. In contrast, big data methods can rely on increasingly large datasets, solving the problem of generalization on a data level and allowing for even bigger and more flexible models ...

Fuchs, Alexander — Graz University of Technology


Interpretable Machine Learning for Machine Listening

Recent years have witnessed significant interest in interpretable machine learning (IML) research, which develops techniques to analyse machine learning (ML) models. Understanding ML models is essential to gain trust in their predictions and to improve datasets, model architectures and training techniques. The majority of effort in IML research has gone into analysing models that classify images or structured data, and comparatively little work exists that analyses models for other domains. This research focuses on developing novel IML methods and on extending existing methods to understand machine listening models that analyse audio. In particular, this thesis reports the results of three studies that apply three different IML methods to analyse five singing voice detection (SVD) models that predict singing voice activity in musical audio excerpts. The first study introduces SoundLIME (SLIME), a method to generate temporal, spectral or time-frequency explanations ...

Mishra, Saumitra — Queen Mary University of London
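
SoundLIME follows the LIME recipe: perturb interpretable components of the input, query the model, and fit a local linear surrogate whose coefficients rank the components by importance. The sketch below illustrates a temporal variant with a hypothetical singing voice detector standing in for a trained model; the segment count, weighting scheme, and ridge surrogate are generic choices, not the exact SLIME implementation.

```python
# LIME-style temporal explanation: silence random subsets of time segments, query the
# classifier, and fit a weighted linear surrogate over the segment masks.
import numpy as np
from sklearn.linear_model import Ridge

def svd_model(audio):
    """Placeholder: returns a 'singing voice' probability for a 1-D audio excerpt."""
    return float(np.clip(np.mean(np.abs(audio)) * 5, 0, 1))

def temporal_explanation(audio, n_segments=10, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    segments = np.array_split(np.arange(audio.size), n_segments)

    masks = rng.integers(0, 2, size=(n_samples, n_segments))    # 1 = keep segment
    preds, weights = [], []
    for mask in masks:
        perturbed = audio.copy()
        for seg_idx, keep in enumerate(mask):
            if not keep:
                perturbed[segments[seg_idx]] = 0.0               # silence the segment
        preds.append(svd_model(perturbed))
        weights.append(np.exp(-np.sum(mask == 0) / n_segments))  # favour small perturbations

    surrogate = Ridge(alpha=1.0).fit(masks, preds, sample_weight=weights)
    return surrogate.coef_                                       # per-segment importance

audio = np.random.default_rng(1).standard_normal(22050)          # 1 s at 22.05 kHz
print(np.round(temporal_explanation(audio), 3))
```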


Representation and Metric Learning Advances for Deep Neural Network Face and Speaker Biometric Systems

The increasing use of technological devices and biometric recognition systems in people's daily lives has motivated a great deal of research interest in the development of effective and robust systems. However, some challenges remain to be solved in these systems when Deep Neural Networks (DNNs) are employed. For this reason, this thesis proposes different approaches to address these issues. First of all, we analyzed the effect of introducing the most widespread DNN architectures to develop systems for face and text-dependent speaker verification tasks. In this analysis, we observed that state-of-the-art DNNs established for many tasks, including face verification, did not perform efficiently for text-dependent speaker verification. Therefore, we conducted a study to find the cause of this poor performance and noted that under certain circumstances this problem is due to the use of a ...

Mingote, Victoria — University of Zaragoza


Visual ear detection and recognition in unconstrained environments

Automatic ear recognition systems have seen increased interest over recent years due to multiple desirable characteristics. Ear images used in such systems can typically be extracted from profile head shots or video footage. The acquisition procedure is contactless and non-intrusive, and it also does not depend on the cooperation of the subjects. In this regard, ear recognition technology shares similarities with other image-based biometric modalities. Another appealing property of ear biometrics is its distinctiveness. Recent studies even empirically validated existing conjectures that certain features of the ear are distinct for identical twins. This fact has significant implications for security-related applications and puts ear images on a par with epigenetic biometric modalities, such as the iris. Ear images can also supplement other biometric modalities in automatic recognition systems and provide identity cues when other information is unreliable or even unavailable. In ...

Emeršič, Žiga — University of Ljubljana, Faculty of Computer and Information Science


Wireless Localization via Learned Channel Features in Massive MIMO Systems

Future wireless networks will evolve to integrate communication, localization, and sensing capabilities. This evolution is driven by emerging application platforms such as digital twins, on the one hand, and advancements in wireless technologies, on the other, characterized by increased bandwidths, more antennas, and enhanced computational power. Crucial to this development is the application of artificial intelligence (AI), which is set to harness the vast amounts of data available in sixth-generation (6G) mobile networks and beyond. Integrating AI and machine learning (ML) algorithms, in particular, with wireless localization offers substantial opportunities to refine communication systems, improve the ability of wireless networks to locate users precisely, enable context-aware transmission, and utilize processing and energy resources more efficiently. In this dissertation, advanced ML algorithms for enhanced wireless localization are proposed. Motivated by the capabilities of deep neural networks (DNNs) and ...

Artan Salihu — TU Wien
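
The dissertation concerns learning to localize users from channel features with neural networks. As a toy illustration of the fingerprinting idea, the sketch below regresses 2-D positions from synthetic per-antenna channel magnitudes with a small multilayer perceptron; the data model, feature choice, and network size are assumptions for illustration only.

```python
# Fingerprinting-style localization: regress user position from per-antenna channel magnitudes.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_antennas = 2000, 64

# Synthetic "CSI": channel magnitude depends smoothly on the (x, y) user position.
pos = rng.uniform(0, 100, size=(n_samples, 2))                  # metres
antenna_xy = rng.uniform(0, 100, size=(n_antennas, 2))
dist = np.linalg.norm(pos[:, None, :] - antenna_xy[None, :, :], axis=-1)
csi_mag = 1.0 / (1.0 + dist) + 0.01 * rng.standard_normal(dist.shape)

X_tr, X_te, y_tr, y_te = train_test_split(csi_mag, pos, random_state=0)
net = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=2000, random_state=0)
net.fit(X_tr, y_tr)

err = np.linalg.norm(net.predict(X_te) - y_te, axis=1)
print(f"median positioning error: {np.median(err):.2f} m")
```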
