Integrating monaural and binaural cues for sound localization and segregation in reverberant environments

The problem of segregating a sound source of interest from an acoustic background has been extensively studied due to applications in hearing prostheses, robust speech/speaker recognition and audio information retrieval. Computational auditory scene analysis (CASA) approaches the segregation problem by utilizing grouping cues involved in the perceptual organization of sound by human listeners. Binaural processing, where input signals resemble those that enter the two ears, is of particular interest in the CASA field. The dominant approach to binaural segregation has been to derive spatially selective filters in order to enhance the signal in a direction of interest. As such, the problems of sound localization and sound segregation are closely tied. While spatial filtering has been widely utilized, substantial performance degradation is incurred in reverberant environments and more fundamentally, segregation cannot be performed without sufficient spatial separation between sources. This dissertation ...

Woodruff, John — The Ohio State University


Mixed structural models for 3D audio in virtual environments

In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


Contributions to Human Motion Modeling and Recognition using Non-intrusive Wearable Sensors

This thesis contributes to motion characterization through inertial and physiological signals captured by wearable devices and analyzed using signal processing and deep learning techniques. This research leverages the possibilities of motion analysis for three main applications: to know what physical activity a person is performing (Human Activity Recognition), to identify who is performing that motion (user identification) or know how the movement is being performed (motor anomaly detection). Most previous research has addressed human motion modeling using invasive sensors in contact with the user or intrusive sensors that modify the user’s behavior while performing an action (cameras or microphones). In this sense, wearable devices such as smartphones and smartwatches can collect motion signals from users during their daily lives in a less invasive or intrusive way. Recently, there has been an exponential increase in research focused on inertial-signal processing to ...

Gil-Martín, Manuel — Universidad Politécnica de Madrid


Speech dereverberation in noisy environments using time-frequency domain signal models

Reverberation is the sum of reflected sound waves and is present in any conventional room. Speech communication devices such as mobile phones in hands-free mode, tablets, smart TVs, teleconferencing systems, hearing aids, voice-controlled systems, etc. use one or more microphones to pick up the desired speech signals. When the microphones are not in the proximity of the desired source, strong reverberation and noise can degrade the signal quality at the microphones and can impair the intelligibility and the performance of automatic speech recognizers. Therefore, it is a highly demanded task to process the microphone signals such that reverberation and noise are reduced. The process of reducing or removing reverberation from recorded signals is called dereverberation. As dereverberation is usually a completely blind problem, where the only available information is the microphone signals, and as the acoustic scenario can be non-stationary, ...

Braun, Sebastian — Friedrich-Alexander Universität Erlangen-Nürnberg


Automated audio captioning with deep learning methods

In the audio research field, the majority of machine learning systems focus on recognizing a limited number of sound events. However, when a machine interacts with real data, it must be able to handle much more varied and complex situations. To tackle this problem, annotators use natural language, which allows any sound information to be summarized. Automated Audio Captioning (AAC) was introduced recently to develop systems capable of automatically producing a description of any type of sound in text form. This task concerns all kinds of sound events such as environmental, urban, domestic sounds, sound effects, music or speech. This type of system could be used by people who are deaf or hard of hearing, and could improve the indexing of large audio databases. In the first part of this thesis, we present the state of the art of the ...

Labbé, Étienne — IRIT


Non-intrusive Quality Evaluation of Speech Processed in Noisy and Reverberant Environments

In many speech applications such as hands-free telephony or voice-controlled home assistants, the distance between the user and the recording microphones can be relatively large. In such a far-field scenario, the recorded microphone signals are typically corrupted by noise and reverberation, which may severely degrade the performance of speech recognition systems and reduce intelligibility and quality of speech in communication applications. In order to limit these effects, speech enhancement algorithms are typically applied. The main objective of this thesis is to develop novel speech enhancement algorithms for noisy and reverberant environments and signal-based measures to evaluate these algorithms, focusing on solutions that are applicable in realistic scenarios. First, we propose a single-channel speech enhancement algorithm for joint noise and reverberation reduction. The proposed algorithm uses a spectral gain to enhance the input signal, where the gain is computed using a ...

Cauchi, Benjamin — University of Oldenburg


Deep Learning-based Speaker Verification In Real Conditions

Smart applications like speaker verification have become essential in verifying the user's identity for availing of personal assistants or online banking services based on the user's voice characteristics. However, far-field or distant speaker verification is constantly affected by surrounding noises which can severely distort the speech signal. Moreover, speech signals propagating in long-range get reflected by various objects in the surrounding area, which creates reverberation and further degrades the signal quality. This PhD thesis explores deep learning-based multichannel speech enhancement techniques to improve the performance of speaker verification systems in real conditions. Multichannel speech enhancement aims to enhance distorted speech using multiple microphones. It has become crucial to many smart devices, which are flexible and convenient for speech applications. Three novel approaches are proposed to improve the robustness of speaker verification systems in noisy and reverberated conditions. Firstly, we integrate ...

Dowerah, Sandipana — Universite de Lorraine, CNRS, Inria, Loria


On Bayesian Methods for Black-Box Optimization: Efficiency, Adaptation and Reliability

Recent advances in many fields ranging from engineering to natural science, require increasingly complicated optimization tasks in the experiment design, for which the target objectives are generally in the form of black-box functions that are expensive to evaluate. In a common formulation of this problem, a designer is expected to solve the black-box optimization tasks via sequentially attempting candidate solutions and receiving feedback from the system. This thesis considers Bayesian optimization (BO) as the black-box optimization framework, and investigates the enhancements on BO from the aspects of efficiency, adaptation and reliability. Generally, BO consists of a surrogate model for providing probabilistic inference and an acquisition function which leverages the probabilistic inference for selecting the next candidate solution. Gaussian process (GP) is a prominent non-parametric surrogate model, and the quality of its inference is a critical factor on the optimality performance ...

Zhang, Yunchuan — King's College London


Deep Learning for i-Vector Speaker and Language Recognition

Over the last few years, i-vectors have been the state-of-the-art technique in speaker and language recognition. Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors but the DL techniques in use are computationally expensive and need speaker or/and phonetic labels for the background data, which are not easily accessible in practice. On the other hand, the lack of speaker-labeled background data makes a big performance gap, in speaker recognition, between two well-known cosine and Probabilistic Linear Discriminant Analysis (PLDA) i-vector scoring techniques. It has recently been a challenge how to fill this gap without speaker labels, which are expensive in practice. Although some unsupervised clustering techniques are proposed to estimate the speaker labels, they cannot accurately estimate the labels. This thesis tries to solve the problems above by using the DL technology in different ways, without ...

Ghahabi, Omid — Universitat Politecnica de Catalunya


GRAPH-TIME SIGNAL PROCESSING: FILTERING AND SAMPLING STRATEGIES

The necessity to process signals living in non-Euclidean domains, such as signals defined on the top of a graph, has led to the extension of signal processing techniques to the graph setting. Among different approaches, graph signal processing distinguishes itself by providing a Fourier analysis of these signals. Analogously to the Fourier transform for time and image signals, the graph Fourier transform decomposes the graph signals in terms of the harmonics provided by the underlying topology. For instance, a graph signal characterized by a slow variation between adjacent nodes has a low frequency content. Along with the graph Fourier transform, graph filters are the key tool to alter the graph frequency content of a graph signal. This thesis focuses on graph filters that are performed distributively in the node domain, that is, each node needs to exchange information ...

Isufi, Elvin — Delft University of Technology


Transmission over Time- and Frequency-Selective Mobile Wireless Channels

The wireless communication industry has experienced rapid growth in recent years, and digital cellular systems are currently designed to provide high data rates at high terminal speeds. High data rates give rise to intersymbol interference (ISI) due to so-called multipath fading. Such an ISI channel is called frequency selective. On the other hand, due to terminal mobility and/or receiver frequency offset the received signal is subject to frequency shifts (Doppler shifts). Doppler shift induces time-selectivity characteristics. The Doppler effect in conjunction with ISI gives rise to a so-called doubly selective channel (frequency- and time-selective). In addition to the channel effects, the analog front-end may suffer from an imbalance between the I and Q branch amplitudes and phases as well as from carrier frequency offset. These analog front-end imperfections then result in an additional and significant degradation in system performance, especially ...

Barhumi, Imad — Katholieke Universiteit Leuven


Machine Learning For Data-Driven Signal Separation and Interference Mitigation in Radio-Frequency Communications

Single-channel source separation for radio-frequency (RF) systems is a challenging problem relevant to key applications, including wireless communications, radar, and spectrum monitoring. This thesis addresses the challenge by focusing on data-driven approaches for source separation, leveraging datasets of sample realizations when source models are not explicitly provided. To this end, deep learning techniques are employed as function approximations for source separation, with models trained using available data. Two problem abstractions are studied as benchmarks for our proposed deep-learning approaches. Through a simplified problem involving Orthogonal Frequency Division Multiplexing (OFDM), we reveal the limitations of existing deep learning solutions and suggest modifications that account for the signal modality for improved performance. Further, we study the impact of time shifts on the formulation of an optimal estimator for cyclostationary Gaussian time series, serving as a performance lower bound for evaluating data-driven methods. ...

Lee, Cheng Feng Gary — Massachusetts Institute of Technology


Dynamic Scheme Selection in Image Coding

This thesis deals with the coding of images with multiple coding schemes and their dynamic selection. In our society of information highways, electronic communication is taking a bigger place in our lives every day. The number of transmitted images is also increasing every day. Therefore, research on image compression is still an active area. However, the current trend is to add several functionalities to the compression scheme such as progressiveness for more comfortable browsing of web-sites or databases. Classical image coding schemes have a rigid structure. They usually process an image as a whole and treat the pixels as a simple signal with no particular characteristics. Second generation schemes use the concept of objects in an image, and introduce a model of the human visual system in the design of the coding scheme. Dynamic coding schemes, as their name tells us, make ...

Fleury, Pascal — Swiss Federal Institute of Technology


Analysis and improvement of quantification algorithms for magnetic resonance spectroscopy

Magnetic Resonance Spectroscopy (MRS) is a technique used in fundamental research and in clinical environments. During recent years, clinical application of MRS gained importance, especially as a non-invasive tool for diagnosis and therapy monitoring of brain and prostate tumours. The most important asset of MRS is its ability to determine the concentration of chemical substances non-invasively. To extract relevant signal parameters, MRS data have to be quantified. This usually doesn't prove to be straightforward since in vivo MRS signals are characterized by poor signal-to-noise ratios, overlapping peaks, acquisition related artefacts and the presence of disturbing components (e.g. residual water in proton spectra). The work presented in this thesis aims to improve the quantification in different applications of MRS in vivo. To obtain the signal parameters related to MRS data, different approaches were suggested in the past. Black-box methods don't require ...

Pels, Pieter — Katholieke Universiteit Leuven


Adaptive Digital Predistortion of Nonlinear Systems

Compensating or reducing the nonlinear distortion - usually resulting from a nonlinear system - is becoming an essential requirement in many areas. In this thesis adaptive digital predistortion techniques for a wide class of nonlinear systems are presented. For estimating the coefficients of the predistorter, different learning architectures are considered: the Direct Learning Architecture (DLA) and Indirect Learning Architecture (ILA). In the DLA approach, we propose a new adaptation algorithm - the Nonlinear Filtered-x Prediction Error Method (NFxPEM) algorithm, which has much faster convergence and much better performance compared to the conventional Nonlinear Filtered-x Least Mean Squares (NFxLMS) algorithm. All of these time domain adaptive algorithms require accurate system identification of the nonlinear system. In order to relax or avoid this strict requirement, the NFxLMS with Initial Subsystem Estimates (NFxLMS-ISE) and NFxPEM-ISE algorithms are proposed. Furthermore, we propose a frequency ...

Gan, Li — Graz University of Technology
