Voice biometric system security: Design and analysis of countermeasures for replay attacks

Voice biometric systems use automatic speaker verification (ASV) technology for user authentication. Although ASV is among the most convenient means of biometric authentication, its robustness and security in the face of spoofing attacks (or presentation attacks) are of growing concern and are now well acknowledged by the research community. A spoofing attack involves gaining illegitimate access to the personal data of a targeted user. Replay is among the simplest attacks to mount, yet it is difficult to detect reliably, and it is the focus of this thesis. This research focuses on the analysis and design of existing and novel countermeasures for replay attack detection in ASV, organised in two major parts. The first part of the thesis investigates existing methods for spoofing detection from several perspectives. I first study the generalisability of hand-crafted features for replay detection that show promising results ...
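
As a rough illustration of the kind of baseline countermeasure this line of work builds on (a sketch only; the features and models below are stand-ins, not those studied in the thesis), one can pool log-spectral frames per utterance, fit separate Gaussian mixture models to genuine and replayed speech, and score trials with a log-likelihood ratio:

    # Minimal GMM-based replay countermeasure sketch (assumed front-end and
    # model sizes): log power spectra stand in for hand-crafted features.
    import numpy as np
    from scipy.signal import stft
    from sklearn.mixture import GaussianMixture

    def log_spec_frames(wave, fs=16000, n_fft=512):
        """Simple front-end stand-in: log power spectrogram frames."""
        _, _, Z = stft(wave, fs=fs, nperseg=n_fft)
        return np.log(np.abs(Z).T ** 2 + 1e-10)       # (frames, bins)

    def fit_countermeasure(genuine_waves, replay_waves, n_comp=8):
        X_gen = np.vstack([log_spec_frames(w) for w in genuine_waves])
        X_rep = np.vstack([log_spec_frames(w) for w in replay_waves])
        gmm_gen = GaussianMixture(n_comp, covariance_type="diag").fit(X_gen)
        gmm_rep = GaussianMixture(n_comp, covariance_type="diag").fit(X_rep)
        return gmm_gen, gmm_rep

    def llr_score(wave, gmm_gen, gmm_rep):
        """Positive score -> more likely genuine; negative -> more likely replay."""
        X = log_spec_frames(wave)
        return gmm_gen.score(X) - gmm_rep.score(X)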

Bhusan Chettri — Queen Mary University of London


Deep Learning for Event Detection, Sequence Labelling and Similarity Estimation in Music Signals

When listening to music, some humans can easily recognize which instruments play at what time or when a new musical segment starts, but cannot describe exactly how they do this. To automatically describe particular aspects of a music piece – be it for an academic interest in emulating human perception, or for practical applications – we thus cannot directly replicate the steps taken by a human. We can, however, exploit the fact that humans can easily annotate examples, and optimize a generic function to reproduce these annotations. In this thesis, I explore solving different music perception tasks with deep learning, a recent branch of machine learning that optimizes functions of many stacked nonlinear operations – referred to as deep neural networks – and promises to obtain better results or require less domain knowledge than more traditional techniques. In particular, I employ ...
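
A minimal sketch of this "optimize a generic function to reproduce annotations" idea, with a toy architecture assumed for illustration rather than the networks used in the thesis: a small convolutional network predicts a framewise event probability from a spectrogram excerpt and is trained against human annotations with binary cross-entropy.

    # Toy framewise event detector (assumed architecture and input sizes).
    import torch
    import torch.nn as nn

    class FrameEventDetector(nn.Module):
        def __init__(self, n_bins=80):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            )
            self.fc = nn.Linear(16 * n_bins, 1)       # one probability per frame

        def forward(self, spec):                      # spec: (batch, 1, frames, bins)
            h = self.conv(spec)                       # (batch, 16, frames, bins)
            h = h.permute(0, 2, 1, 3).flatten(2)      # (batch, frames, 16*bins)
            return torch.sigmoid(self.fc(h)).squeeze(-1)   # (batch, frames)

    model = FrameEventDetector()
    spec = torch.randn(4, 1, 100, 80)                 # toy batch of spectrogram excerpts
    target = torch.randint(0, 2, (4, 100)).float()    # human annotations per frame
    loss = nn.functional.binary_cross_entropy(model(spec), target)
    loss.backward()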

Schlüter, Jan — Department of Computational Perception, Johannes Kepler University Linz


Machine learning methods for multiple sclerosis classification and prediction using MRI brain connectivity

In this thesis, the power of Machine Learning (ML) algorithms is combined with brain connectivity patterns, using Magnetic Resonance Imaging (MRI), for classification and prediction of Multiple Sclerosis (MS). White Matter (WM) as well as Grey Matter (GM) graphs are studied as connectome data types. The thesis addresses three main research objectives. The first objective aims to generate realistic brain connectome data to improve the classification of MS clinical profiles in cases of data scarcity and class imbalance. To solve the problem of limited and imbalanced data, a Generative Adversarial Network (GAN) was developed for the generation of realistic and biologically meaningful connectomes. This network achieved a 10% better MS classification performance compared to classical approaches. The second research objective aims to improve the classification of MS clinical profiles using morphological features extracted only from GM brain tissue. ...
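
A minimal GAN sketch for this kind of connectome synthesis, with an assumed 68-region atlas and assumed MLP generator/discriminator sizes (the thesis's actual architecture and training details are not reproduced here); connectomes are flattened to vectors of edge weights.

    # Toy GAN for synthetic connectivity matrices (assumed dimensions).
    import torch
    import torch.nn as nn

    N_REGIONS = 68                                    # assumed atlas size
    N_EDGES = N_REGIONS * (N_REGIONS - 1) // 2

    generator = nn.Sequential(
        nn.Linear(64, 256), nn.ReLU(),
        nn.Linear(256, N_EDGES), nn.Sigmoid(),        # edge weights in [0, 1]
    )
    discriminator = nn.Sequential(
        nn.Linear(N_EDGES, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1),                            # real/fake logit
    )

    def gan_step(real_connectomes, opt_g, opt_d):
        bce = nn.functional.binary_cross_entropy_with_logits
        z = torch.randn(real_connectomes.size(0), 64)
        fake = generator(z)
        # Discriminator: push real toward 1, generated toward 0.
        real_logits = discriminator(real_connectomes)
        fake_logits = discriminator(fake.detach())
        d_loss = (bce(real_logits, torch.ones_like(real_logits))
                  + bce(fake_logits, torch.zeros_like(fake_logits)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # Generator: try to fool the discriminator.
        fool_logits = discriminator(fake)
        g_loss = bce(fool_logits, torch.ones_like(fool_logits))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()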

Barile, Berardino — KU Leuven


Wireless Localization via Learned Channel Features in Massive MIMO Systems

Future wireless networks will evolve to integrate communication, localization, and sensing capabilities. This evolution is driven by emerging application platforms such as digital twins, on the one hand, and advancements in wireless technologies, on the other, characterized by increased bandwidths, more antennas, and enhanced computational power. Crucial to this development is the application of artificial intelligence (AI), which is set to harness the vast amounts of available data in the sixth-generation (6G) of mobile networks and beyond. Integrating AI and machine learning (ML) algorithms, in particular, with wireless localization offers substantial opportunities to refine communication systems, improve the ability of wireless networks to locate the users precisely, enable context-aware transmission, and utilize processing and energy resources more efficiently. In this dissertation, advanced ML algorithms for enhanced wireless localization are proposed. Motivated by the capabilities of deep neural networks (DNNs) and ...
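
A hedged sketch of the basic idea behind learned-feature localization (the architecture, dimensions, and features below are assumptions, not those of the dissertation): a feed-forward network regresses user coordinates from channel features such as the magnitudes of estimated channel coefficients across antennas and subcarriers.

    # Toy DNN position regressor from channel-state-information features.
    import torch
    import torch.nn as nn

    N_ANT, N_SUB = 64, 100                            # assumed massive-MIMO dimensions
    model = nn.Sequential(
        nn.Linear(N_ANT * N_SUB, 512), nn.ReLU(),
        nn.Linear(512, 256), nn.ReLU(),
        nn.Linear(256, 2),                            # (x, y) position estimate
    )

    def training_step(csi, positions, optimizer):
        """csi: (batch, N_ANT, N_SUB) complex channel estimates; positions: (batch, 2)."""
        features = csi.abs().flatten(1)               # simple magnitude features
        loss = nn.functional.mse_loss(model(features), positions)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        return loss.item()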

Artan Salihu — TU Wien


Data-driven Speech Enhancement: from Non-negative Matrix Factorization to Deep Representation Learning

In natural listening environments, speech signals are easily distorted by various acoustic interference, which reduces speech quality and intelligibility for human listeners; meanwhile, it creates difficulties for many speech-related applications, such as automatic speech recognition (ASR). Thus, many speech enhancement (SE) algorithms have been developed in the past decades. However, most current SE algorithms struggle to capture underlying speech information (e.g., phonemes) in the SE process. This makes it challenging to know what specific information is lost or interfered with in the SE process, which limits the application of enhanced speech. For instance, some SE algorithms aimed at improving human listening usually damage the ASR system. The objective of this dissertation is to develop SE algorithms that have the potential to capture various underlying speech representations (information) and improve the quality and intelligibility of noisy speech. This ...
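
To make the classical starting point of the title concrete, here is a minimal NMF-based enhancement sketch under assumed parameters (dictionary sizes, STFT settings): magnitude dictionaries are learned for speech and noise, their activations are estimated on the noisy spectrogram with the dictionaries held fixed, and a Wiener-like mask is applied.

    # Toy NMF speech enhancement (assumed settings; not the thesis's models).
    import numpy as np
    from scipy.signal import stft, istft
    from sklearn.decomposition import NMF

    def learn_dictionary(train_wave, n_atoms=32, fs=16000):
        _, _, Z = stft(train_wave, fs=fs, nperseg=512)
        model = NMF(n_components=n_atoms, init="nndsvda", max_iter=400)
        model.fit(np.abs(Z).T)                        # sklearn expects (samples, features)
        return model.components_.T                    # dictionary: (freq, atoms)

    def enhance(noisy, W_speech, W_noise, fs=16000, n_iter=100, eps=1e-10):
        _, _, Z = stft(noisy, fs=fs, nperseg=512)
        V = np.abs(Z)                                 # (freq, time)
        W = np.hstack([W_speech, W_noise])
        H = np.random.rand(W.shape[1], V.shape[1])
        for _ in range(n_iter):                       # multiplicative updates, W fixed
            H *= (W.T @ V) / (W.T @ W @ H + eps)
        V_speech = W_speech @ H[:W_speech.shape[1]]
        V_noise = W_noise @ H[W_speech.shape[1]:]
        mask = V_speech / (V_speech + V_noise + eps)  # Wiener-like mask
        _, enhanced = istft(mask * Z, fs=fs, nperseg=512)
        return enhanced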

Xiang, Yang — Aalborg University, Capturi A/S


Non-linear Spatial Filtering for Multi-channel Speech Enhancement

A large part of human speech communication takes place in noisy environments and is supported by technical devices. For example, a hearing-impaired person might use a hearing aid to take part in a conversation in a busy restaurant. These devices, but also telecommunication in noisy environments or voice-controlled assistants, make use of speech enhancement and separation algorithms that improve the quality and intelligibility of speech by separating speakers and suppressing background noise as well as other unwanted effects such as reverberation. If the devices are equipped with more than one microphone, which is very common nowadays, then multi-channel speech enhancement approaches can leverage spatial information in addition to single-channel tempo-spectral information to perform the task. Traditionally, linear spatial filters, so-called beamformers, have been employed to suppress signal components arriving from directions other than the target direction and thereby enhance the desired ...
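
For reference, a minimal sketch of such a traditional linear spatial filter: a plain delay-and-sum beamformer for an assumed uniform linear array, given only as an illustration of the baseline that the thesis moves beyond (the non-linear filters it develops are not shown).

    # Delay-and-sum beamformer in the STFT domain (assumed array geometry).
    import numpy as np
    from scipy.signal import stft, istft

    def delay_and_sum(mics, angle_deg, fs=16000, d=0.05, c=343.0, nperseg=512):
        """mics: (n_mics, n_samples) time-domain signals; angle_deg: target direction."""
        n_mics = mics.shape[0]
        f, _, X = stft(mics, fs=fs, nperseg=nperseg)  # X: (n_mics, freq, time)
        delays = d * np.arange(n_mics) * np.cos(np.deg2rad(angle_deg)) / c
        steering = np.exp(-2j * np.pi * f[None, :] * delays[:, None])  # (n_mics, freq)
        # Compensate the per-microphone delays and average across channels.
        Y = np.mean(np.conj(steering)[:, :, None] * X, axis=0)         # (freq, time)
        _, y = istft(Y, fs=fs, nperseg=nperseg)
        return y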

Tesch, Kristina — Universität Hamburg


Advanced time-domain methods for nuclear magnetic resonance spectroscopy data analysis

Over the past years, magnetic resonance spectroscopy (MRS) has been of significant importance both as a fundamental research technique in different fields and as a diagnostic tool in medical environments. With MRS, for example, spectroscopic information, such as the concentrations of chemical substances, can be determined non-invasively. To that end, the signals are first modeled by an appropriate model function and mathematical techniques are subsequently applied to determine the model parameters. In this thesis, signal processing algorithms are developed to quantify in-vivo and ex-vivo MRS signals. These are usually characterized by a poor signal-to-noise ratio, overlapping peaks, deviations from the model function and in some cases the presence of disturbing components (e.g. the residual water in proton spectra). The work presented in this thesis addresses a part of the total effort to provide accurate, efficient and automatic data analysis ...
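
A minimal sketch of the time-domain fitting idea, deliberately reduced to a single exponentially damped sinusoid (real MRS quantification involves many components, prior knowledge, and careful starting values): the model parameters are found by nonlinear least squares on the complex free induction decay.

    # Toy single-component time-domain quantification (assumed model and start values).
    import numpy as np
    from scipy.optimize import least_squares

    def model(params, t):
        amp, phase, damping, freq = params
        return amp * np.exp(1j * phase) * np.exp((-damping + 2j * np.pi * freq) * t)

    def residuals(params, t, y):
        r = model(params, t) - y
        return np.concatenate([r.real, r.imag])       # least_squares needs real residuals

    def quantify(fid, dt, start=(1.0, 0.0, 10.0, 100.0)):
        """fid: complex time-domain signal; dt: sampling interval in seconds."""
        t = np.arange(len(fid)) * dt
        fit = least_squares(residuals, start, args=(t, fid))
        amp, phase, damping, freq = fit.x
        return {"amplitude": amp, "phase": phase, "damping": damping, "frequency": freq}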

Vanhamme, Leentje — Katholieke Universiteit Leuven


Representation and Metric Learning Advances for Deep Neural Network Face and Speaker Biometric Systems

The increasing use of technological devices and biometric recognition systems in people's daily lives has motivated a great deal of research interest in the development of effective and robust systems. However, there are still some challenges to be solved in these systems when Deep Neural Networks (DNNs) are employed. For this reason, this thesis proposes different approaches to address these issues. First of all, we have analyzed the effect of introducing the most widespread DNN architectures to develop systems for face and text-dependent speaker verification tasks. In this analysis, we observed that state-of-the-art DNNs established for many tasks, including face verification, did not perform efficiently for text-dependent speaker verification. Therefore, we have conducted a study to find the cause of this poor performance and we have noted that under certain circumstances this problem is due to the use of a ...
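
A minimal metric-learning sketch under assumed choices (embedding size, triplet loss, cosine scoring), not the specific approaches proposed in the thesis: an embedding extractor is trained so that same-identity samples lie closer than different-identity ones, and a trial is verified by thresholding the cosine similarity of two embeddings.

    # Toy embedding network with triplet loss and cosine-similarity verification.
    import torch
    import torch.nn as nn

    embedder = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
    triplet = nn.TripletMarginLoss(margin=0.3)

    def train_step(anchor, positive, negative, optimizer):
        """Each argument: (batch, 512) input features of the respective samples."""
        loss = triplet(embedder(anchor), embedder(positive), embedder(negative))
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        return loss.item()

    def verify(x1, x2, threshold=0.5):
        """Accept the trial if the embeddings' cosine similarity exceeds the threshold."""
        with torch.no_grad():
            e1, e2 = embedder(x1), embedder(x2)
            return nn.functional.cosine_similarity(e1, e2, dim=-1) > threshold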

Mingote, Victoria — University of Zaragoza


Sound Event Detection by Exploring Audio Sequence Modelling

Everyday sounds in real-world environments are a powerful source of information by which humans can interact with their environments. Humans can infer what is happening around them by listening to everyday sounds. At the same time, it is a challenging task for a computer algorithm in a smart device to automatically recognise, understand, and interpret everyday sounds. Sound event detection (SED) is the process of transcribing an audio recording into sound event tags with onset and offset time values. This involves classification and segmentation of sound events in the given audio recording. SED has numerous applications in everyday life which include security and surveillance, automation, healthcare monitoring, multimedia information retrieval, and assisted living technologies. SED is to everyday sounds what automatic speech recognition (ASR) is to speech and automatic music transcription (AMT) is to music. The fundamental questions in designing ...
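
A minimal sketch of the final transcription step described here (threshold and hop size are assumptions): framewise class probabilities produced by a detector are binarized, and contiguous active regions are reported as events with onset and offset times.

    # Turn framewise probabilities into sound events with onset/offset times.
    import numpy as np

    def probs_to_events(probs, class_names, hop_s=0.02, threshold=0.5):
        """probs: (frames, classes) detector outputs in [0, 1]."""
        events = []
        active = probs > threshold
        for c, name in enumerate(class_names):
            track = np.concatenate([[False], active[:, c], [False]])
            onsets = np.flatnonzero(~track[:-1] & track[1:])      # inactive -> active
            offsets = np.flatnonzero(track[:-1] & ~track[1:])     # active -> inactive
            for on, off in zip(onsets, offsets):
                events.append({"event": name, "onset": on * hop_s, "offset": off * hop_s})
        return sorted(events, key=lambda e: e["onset"])

    # e.g. probs_to_events(np.random.rand(500, 3), ["speech", "dog bark", "siren"])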

Pankajakshan, Arjun — Queen Mary University of London


Diplophonic Voice - Definitions, models, and detection

Voice disorders need to be better understood because they may lead to reduced job chances and social isolation. Correct treatment indication and treatment effect measurements are needed to tackle these problems. They must rely on robust outcome measures for clinical intervention studies. Diplophonia is a severe and often misunderstood sign of voice disorders. Depending on the underlying etiology, diplophonic patients typically receive treatments such as logopedic therapy or phonosurgery. In current clinical practice, diplophonia is determined auditorily by the medical doctor, which is problematic from the viewpoints of evidence-based medicine and scientific methodology. The aim of this thesis is to work towards objective (i.e., automatic) detection of diplophonia. A database of 40 euphonic, 40 diplophonic and 40 dysphonic subjects has been acquired. The collected material consists of laryngeal high-speed videos and simultaneous high-quality audio recordings. All material has been ...

Aichinger, Philipp — Division of Phoniatrics-Logopedics, Department of Otorhinolaryngology, Medical University of Vienna; Signal Processing and Speech Communication Laboratory Graz University of Technology, Austria


Making music through real-time voice timbre analysis: machine learning and timbral control

People can achieve rich musical expression through vocal sound -- see for example human beatboxing, which achieves a wide timbral variety through a range of extended techniques. Yet the vocal modality is under-exploited as a controller for music systems. If we can analyse a vocal performance suitably in real time, then this information could be used to create voice-based interfaces with the potential for intuitive and fulfilling levels of expressive control. Conversely, many modern techniques for music synthesis do not imply any particular interface. Should a given parameter be controlled via a MIDI keyboard, or a slider/fader, or a rotary dial? Automatic vocal analysis could provide a fruitful basis for expressive interfaces to such electronic musical instruments. The principal questions in applying vocal-based control are how to extract musically meaningful information from the voice signal in real time, and how ...
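
A minimal sketch of real-time descriptor extraction under assumed block size and feature choices (not the feature set studied in the thesis): each incoming audio block is reduced to a few low-latency timbre cues that could be mapped, after smoothing, onto synthesis parameters.

    # Per-block timbre descriptors suitable for a real-time audio callback.
    import numpy as np

    def timbre_features(block, fs=44100):
        """block: 1-D array of the latest audio samples (e.g. 1024 at 44.1 kHz)."""
        spectrum = np.abs(np.fft.rfft(block * np.hanning(len(block))))
        freqs = np.fft.rfftfreq(len(block), 1.0 / fs)
        power = spectrum ** 2
        centroid = np.sum(freqs * power) / (np.sum(power) + 1e-12)   # brightness cue
        zcr = np.mean(np.abs(np.diff(np.sign(block)))) / 2           # noisiness cue
        rms = np.sqrt(np.mean(block ** 2))                           # loudness cue
        return {"centroid_hz": centroid, "zcr": zcr, "rms": rms}

    # In a real-time loop, call timbre_features on every audio block and map the
    # (smoothed) descriptors onto synthesizer parameters.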

Stowell, Dan — Queen Mary University of London


Representation Learning in Distributed Networks

The effectiveness of machine learning (ML) in today's applications largely depends on the quality of the data representation used within the ML algorithms. While the massive dimensionality of modern-day data often requires lower-dimensional data representations in many applications for efficient use of available computational resources, the use of uncorrelated features is also known to enhance the performance of ML algorithms. Thus, an efficient representation learning solution should focus on dimension reduction as well as uncorrelated feature extraction. Even though Principal Component Analysis (PCA) and linear autoencoders are fundamental data preprocessing tools that are largely used for dimension reduction, when engineered properly they can also be used to extract uncorrelated features. At the same time, factors like the ever-increasing volume of data or inherently distributed data generation impede the use of existing centralized solutions for representation learning that require ...
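
A minimal centralized PCA sketch (via the SVD) showing dimension reduction with uncorrelated output features; the distributed algorithms developed in the thesis are not shown here.

    # PCA by SVD: project onto the top-k principal axes; scores are uncorrelated.
    import numpy as np

    def pca_reduce(X, k):
        """X: (n_samples, n_features). Returns (n_samples, k) uncorrelated scores."""
        Xc = X - X.mean(axis=0)
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ Vt[:k].T

    X = np.random.randn(1000, 50) @ np.random.randn(50, 50)   # correlated toy data
    Z = pca_reduce(X, k=5)
    # Off-diagonal sample covariances of Z are numerically zero -> uncorrelated features.
    print(np.round(np.cov(Z, rowvar=False), 6))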

Gang, Arpita — Rutgers University-New Brunswick


Respiratory sinus arrhythmia estimation: closing the gap between research and applications

The respiratory sinus arrhythmia (RSA) is a form of cardiorespiratory coupling in which the heart rate accelerates during inhalation and decelerates during exhalation. Its quantification has been suggested as a tool to assess different diseases and conditions. However, whilst the potential of the RSA estimation as a diagnostic tool is shown in research works, its use in clinical practice and mobile applications is rather limited. This can be attributed to the lack of understanding of the mechanisms generating the RSA. To try to explain the RSA, studies are done using noninvasive signals, namely, respiration and heart rate variability (HRV), which are combined using different algorithms. Nevertheless, the algorithms are not standardized, making it difficult to draw solid conclusions from these studies. Therefore, the first aim of this thesis was to develop a framework to evaluate algorithms for RSA estimation. To ...
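
A minimal sketch of one possible RSA-quantification strategy under assumed design choices (the thesis evaluates several such non-standardized algorithms rather than this one specifically): the HRV signal is band-pass filtered around the respiratory frequency estimated from the respiration signal, and the amplitude of the result is taken as the RSA estimate.

    # Toy RSA amplitude estimate from evenly resampled HRV and respiration signals.
    import numpy as np
    from scipy.signal import butter, filtfilt, periodogram

    def rsa_amplitude(hrv, resp, fs=4.0, bandwidth=0.05):
        """hrv, resp: signals resampled to a common uniform rate fs (Hz)."""
        f, pxx = periodogram(resp, fs=fs)
        mask = (f > 0.1) & (f < 0.5)                  # plausible breathing band
        f_resp = f[mask][np.argmax(pxx[mask])]        # dominant breathing frequency
        b, a = butter(2, [f_resp - bandwidth, f_resp + bandwidth], btype="band", fs=fs)
        rsa = filtfilt(b, a, hrv - np.mean(hrv))      # HRV component at breathing rate
        return 2 * np.sqrt(2) * np.std(rsa)           # peak-to-trough amplitude estimate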

Morales, John — KU Leuven


Machine Learning-Aided Monitoring and Prediction of Respiratory and Neurodegenerative Diseases Using Wearables

This thesis focuses on wearables for health status monitoring, covering applications aimed at emergency solutions to the COVID-19 pandemic and the aging society. Ambient assisted living (AAL) methods are presented for the neurodegenerative disorder Parkinson’s disease (PD), facilitating ’aging in place’ through machine learning built around wearables - mHealth solutions. Furthermore, approaches using machine learning and wearables are discussed for early-stage COVID-19 detection, with encouraging accuracy. Firstly, a publicly available dataset containing COVID-19, influenza, and healthy control data was reused for research purposes. The solution presented in this thesis addresses the classification problem and outperformed the state-of-the-art methods, whereas the original paper introduced only anomaly detection and did not report the specificity of the created models. The model proposed in the thesis for early detection of COVID-19 achieved 78% with the k-NN classifier. Moreover, a ...
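
A minimal sketch of a wearable-based classification setup; only the k-NN classifier is taken from the abstract, while the features, preprocessing, and toy data shown here are placeholders.

    # Toy k-NN classification of wearable-derived features with cross-validation.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # X: per-subject wearable features (e.g. resting heart rate, sleep, step counts);
    # y: labels such as COVID-19 / influenza / healthy. Random toy data shown here.
    X = np.random.randn(120, 6)
    y = np.random.randint(0, 3, size=120)

    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    scores = cross_val_score(model, X, y, cv=5)
    print("mean CV accuracy:", scores.mean())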

Justyna Skibińska — Brno University of Technology & Tampere University


Machine Learning For Data-Driven Signal Separation and Interference Mitigation in Radio-Frequency Communications

Single-channel source separation for radio-frequency (RF) systems is a challenging problem relevant to key applications, including wireless communications, radar, and spectrum monitoring. This thesis addresses the challenge by focusing on data-driven approaches for source separation, leveraging datasets of sample realizations when source models are not explicitly provided. To this end, deep learning techniques are employed as function approximations for source separation, with models trained using available data. Two problem abstractions are studied as benchmarks for our proposed deep-learning approaches. Through a simplified problem involving Orthogonal Frequency Division Multiplexing (OFDM), we reveal the limitations of existing deep learning solutions and suggest modifications that account for the signal modality for improved performance. Further, we study the impact of time shifts on the formulation of an optimal estimator for cyclostationary Gaussian time series, serving as a performance lower bound for evaluating data-driven methods. ...
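
A minimal sketch of a data-driven single-channel separator under assumed choices (architecture, loss, and a real/imaginary two-channel representation of the complex baseband); the OFDM-aware modifications proposed in the thesis are not reproduced.

    # Toy 1-D convolutional RF source separator trained on simulated mixtures.
    import torch
    import torch.nn as nn

    separator = nn.Sequential(
        nn.Conv1d(2, 64, kernel_size=9, padding=4), nn.ReLU(),
        nn.Conv1d(64, 64, kernel_size=9, padding=4), nn.ReLU(),
        nn.Conv1d(64, 2, kernel_size=9, padding=4),
    )

    def training_step(mixture, target, optimizer):
        """mixture, target: (batch, 2, n_samples) real/imag baseband tensors."""
        estimate = separator(mixture)
        loss = nn.functional.mse_loss(estimate, target)   # simple reconstruction criterion
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        return loss.item()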

Lee, Cheng Feng Gary — Massachusetts Institute of Technology
