Multi-microphone noise reduction and dereverberation techniques for speech applications

In typical speech communication applications, such as hands-free mobile telephony, voice-controlled systems and hearing aids, the recorded microphone signals are corrupted by background noise, room reverberation and far-end echo signals. This signal degradation can lead to total unintelligibility of the speech signal and decreases the performance of automatic speech recognition systems. In this thesis several multi-microphone noise reduction and dereverberation techniques are developed. In Part I we present a Generalised Singular Value Decomposition (GSVD) based optimal filtering technique for enhancing multi-microphone speech signals which are degraded by additive coloured noise. Several techniques are presented for reducing the computational complexity and we show that the GSVD-based optimal filtering technique can be integrated into a `Generalised Sidelobe Canceller' type structure. Simulations show that the GSVD-based optimal filtering technique achieves a larger signal-to-noise ratio improvement than standard fixed and adaptive beamforming techniques and ...

Doclo, Simon — Katholieke Universiteit Leuven


Speech Modeling and Robust Estimation for Diagnosis of Parkinson's Disease

According to the Parkinson’s Foundation, more than 10 million people world- wide suffer from Parkinson’s disease (PD). The common symptoms are tremor, muscle rigidity and slowness of movement. There is no cure available cur- rently, but clinical intervention can help alleviate the symptoms significantly. Recently, it has been found that PD can be detected and telemonitored by voice signals, such as sustained phonation /a/. However, the voiced-based PD detector suffers from severe performance degradation in adverse envi- ronments, such as noise, reverberation and nonlinear distortion, which are common in uncontrolled settings. In this thesis, we focus on deriving speech modeling and robust estima- tion algorithms capable of improving the PD detection accuracy in adverse environments. Robust estimation algorithms using parametric modeling of voice signals are proposed. We present both segment-wise and sample-wise robust pitch tracking algorithms using the harmonic model. ...

Shi, Liming — Aalborg University


Pre-processing of Speech Signals for Robust Parameter Estimation

The topic of this thesis is methods of pre-processing speech signals for robust estimation of model parameters in models of these signals. Here, there is a special focus on the situation where the desired signal is contaminated by colored noise. In order to estimate the speech signal, or its voiced and unvoiced components, from a noisy observation, it is important to have robust estimators that can handle colored and non-stationary noise. Two important aspects are investigated. The first one is a robust estimation of the speech signal parameters, such as the fundamental frequency, which is required in many contexts. For this purpose, fast estimation methods based on a simple white Gaussian noise (WGN) assumption are often used. To keep using those methods, the noisy signal can be pre-processed using a filter. If the colored noise is modelled as an autoregressive ...

Esquivel Jaramillo, Alfredo — Aalborg University


Fundamental Frequency and Direction-of-Arrival Estimation for Multichannel Speech Enhancement

Audio systems receive the speech signals of interest usually in the presence of noise. The noise has profound impacts on the quality and intelligibility of the speech signals, and it is therefore clear that the noisy signals must be cleaned up before being played back, stored, or analyzed. We can estimate the speech signal of interest from the noisy signals using a priori knowledge about it. A human speech signal is broadband and consists of both voiced and unvoiced parts. The voiced part is quasi-periodic with a time-varying fundamental frequency (or pitch as it is commonly referred to). We consider the periodic signals basically as the sum of harmonics. Therefore, we can pass the noisy signals through bandpass filters centered at the frequencies of the harmonics to enhance the signal. In addition, although the frequencies of the harmonics are the ...

Karimian-Azari, Sam — Aalborg Univeristy


Sparse Modeling Heuristics for Parameter Estimation - Applications in Statistical Signal Processing

This thesis examines sparse statistical modeling on a range of applications in audio modeling, audio localizations, DNA sequencing, and spectroscopy. In the examined cases, the resulting estimation problems are computationally cumbersome, both as one often suffers from a lack of model order knowledge for this form of problems, but also due to the high dimensionality of the parameter spaces, which typically also yield optimization problems with numerous local minima. In this thesis, these problems are treated using sparse modeling heuristics, with the resulting criteria being solved using convex relaxations, inspired from disciplined convex programming ideas, to maintain tractability. The contributions to audio modeling and estimation focus on the estimation of the fundamental frequency of harmonically related sinusoidal signals, which is commonly used model for, e.g., voiced speech or tonal audio. We examine both the problems of estimating multiple audio sources ...

Adalbjörnsson, Stefan Ingi — Lund University


Oscillator-plus-Noise Modeling of Speech Signals

In this thesis we examine the autonomous oscillator model for synthesis of speech signals. The contributions comprise an analysis of realizations and training methods for the nonlinear function used in the oscillator model, the combination of the oscillator model with inverse filtering, both significantly increasing the number of `successfully' re-synthesized speech signals, and the introduction of a new technique suitable for the re-generation of the noise-like signal component in speech signals. Nonlinear function models are compared in a one-dimensional modeling task regarding their presupposition for adequate re-synthesis of speech signals, in particular considering stability. The considerations also comprise the structure of the nonlinear functions, with the aspect of the possible interpolation between models for different speech sounds. Both regarding stability of the oscillator and the premiss of a nonlinear function structure that may be pre-defined, RBF networks are found a ...

Rank, Erhard — Vienna University of Technology


Speech Enhancement for Disordered and Substitution Voices

This thesis presents methods to enhance the speech of patients with voice disorders or with substitution voices. The first method enhances speech of patients with laryngeal neoplasm. The enhancement enables a reduction of pitch and a strengthening of the harmonics of voiced segments as well as decreasing the perceived speaking effort. The need for reliable pitch mark determination on disordered and substitution voices led to the implementation of a state-space based algorithm. Its performance is comparable to a state-of-the art pitch detection algorithm but does not require post processing. A subsequent part of the thesis deals with alaryngeal speech, with a focus on Electro-Larynx (EL) speech. After investigating an EL speech production model, which takes into account the common source of the speech signal and the directly radiated EL (DREL) sound, a solution to suppress the direct sound is based ...

Hagmuller, Martin — Graz University of Technology


Enhancement of Periodic Signals: with Application to Speech Signals

The topic of this thesis is the enhancement of noisy, periodic signals with application to speech signals. Generally speaking, enhancement methods can be divided into signal- and noise-driven methods. In this thesis, we focus on the signal-driven approach by employing relevant signal parameters for the enhancement of periodic signals. The enhancement problem consists of two major subproblems: the estimation of relevant parameters or statistics, and the actual noise reduction of the observed signal. We consider both of these subproblems. First, we consider the problem of estimating signal parameters relevant to the enhancement of periodic signals. The fundamental frequency is one example of such a parameter. Furthermore, in multichannel scenarios, the direction-of-arrival of the periodic sources onto an array of sensors is another parameter of relevance. We propose methods for the estimation of the fundamental frequency that have benefits compared to ...

Jensen, Jesper Rindom — Aalborg University


The Bionic Electro-Larynx Speech System - Challenges, Investigations, and Solutions

Humans without larynx need to use a substitution voice to re-obtain speech. The electro-larynx (EL) is a widely used device but is known for its unnatural and monotonic speech quality. Previous research tackled these problems, but until now no significant improvements could be reported. The EL speech system is a complex system including hardware (artificial excitation source or sound transducer) and software (control and generation of the artificial excitation signal). It is not enough to consider one separated problem, but all aspects of the EL speech system need to be taken into account. In this thesis we would like to push forward the boundaries of the conventional EL device towards a new bionic electro-larynx speech system. We formulate two overall scenarios: a closed-loop scenario, where EL speech is excited and simultaneously recorded using an EL speech system, and the artificial ...

Fuchs, Anna Katharina — Graz University of Technology, Signal Processing and Speech Communication Laboratory


Informed spatial filters for speech enhancement

In modern devices which provide hands-free speech capturing functionality, such as hands-free communication kits and voice-controlled devices, the received speech signal at the microphones is corrupted by background noise, interfering speech signals, and room reverberation. In many practical situations, the microphones are not necessarily located near the desired source, and hence, the ratio of the desired speech power to the power of the background noise, the interfering speech, and the reverberation at the microphones can be very low, often around or even below 0 dB. In such situations, the comfort of human-to-human communication, as well as the accuracy of automatic speech recognisers for voice-controlled applications can be signi cantly degraded. Therefore, e ffective speech enhancement algorithms are required to process the microphone signals before transmitting them to the far-end side for communication, or before feeding them into a speech recognition ...

Taseska, Maja — Friedrich-Alexander Universität Erlangen-Nürnberg


Spatio-Temporal Speech Enhancement in Adverse Acoustic Conditions

Never before has speech been captured as often by electronic devices equipped with one or multiple microphones, serving a variety of applications. It is the key aspect in digital telephony, hearing devices, and voice-driven human-to-machine interaction. When speech is recorded, the microphones also capture a variety of further, undesired sound components due to adverse acoustic conditions. Interfering speech, background noise and reverberation, i.e. the persistence of sound in a room after excitation caused by a multitude of reflections on the room enclosure, are detrimental to the quality and intelligibility of target speech as well as the performance of automatic speech recognition. Hence, speech enhancement aiming at estimating the early target-speech component, which contains the direct component and early reflections, is crucial to nearly all speech-related applications presently available. In this thesis, we compare, propose and evaluate existing and novel approaches ...

Dietzen, Thomas — KU Leuven


Compressive Sensing of Cyclostationary Propeller Noise

This dissertation is the combination of three manuscripts –either published in or submitted to journals– on compressive sensing of propeller noise for detection, identification and localization of water crafts. Propeller noise, as a result of rotating blades, is broadband and radiates through water dominating underwater acoustic noise spectrum especially when cavitation develops. Propeller cavitation yields cyclostationary noise which can be modeled by amplitude modulation, i.e., the envelope-carrier product. The envelope consists of the so-called propeller tonals representing propeller characteristics which is used to identify water crafts whereas the carrier is a stationary broadband process. Sampling for propeller noise processing yields large data sizes due to Nyquist rate and multiple sensor deployment. A compressive sensing scheme is proposed for efficient sampling of second-order cyclostationary propeller noise since the spectral correlation function of the amplitude modulation model is sparse as shown in ...

Fırat, Umut — Istanbul Technical University


MIMO Instantaneous Blind Identification and Separation based on Arbitrary Order Temporal Structure in the Data

This thesis is concerned with three closely related problems. The first one is called Multiple-Input Multiple-Output (MIMO) Instantaneous Blind Identification, which we denote by MIBI. In this problem a number of mutually statistically independent source signals are mixed by a MIMO instantaneous mixing system and only the mixed signals are observed, i.e. both the mixing system and the original sources are unknown or ‘blind’. The goal of MIBI is to identify the MIMO system from the observed mixtures of the source signals only. The second problem is called Instantaneous Blind Signal Separation (IBSS) and deals with recovering mutually statistically independent source signals from their observed instantaneous mixtures only. The observation model and assumptions on the signals and mixing system are the same as those of MIBI. However, the main purpose of IBSS is the estimation of the source signals, whereas ...

van de Laar, Jakob — TU Eindhoven


MIMO instantaneous blind idenfitication and separation based on arbitrary order

This thesis is concerned with three closely related problems. The first one is called Multiple-Input Multiple-Output (MIMO) Instantaneous Blind Identification, which we denote by MIBI. In this problem a number of mutually statistically independent source signals are mixed by a MIMO instantaneous mixing system and only the mixed signals are observed, i.e. both the mixing system and the original sources are unknown or ¡blind¢. The goal of MIBI is to identify the MIMO system from the observed mixtures of the source signals only. The second problem is called Instantaneous Blind Signal Separation (IBSS) and deals with recovering mutually statistically independent source signals from their observed instantaneous mixtures only. The observation model and assumptions on the signals and mixing system are the same as those of MIBI. However, the main purpose of IBSS is the estimation of the source signals, whereas ...

van de Laar, Jakob — T.U. Eindhoven


Speech Enhancement Using Nonnegative Matrix Factorization and Hidden Markov Models

Reducing interference noise in a noisy speech recording has been a challenging task for many years yet has a variety of applications, for example, in handsfree mobile communications, in speech recognition, and in hearing aids. Traditional single-channel noise reduction schemes, such as Wiener filtering, do not work satisfactorily in the presence of non-stationary background noise. Alternatively, supervised approaches, where the noise type is known in advance, lead to higher-quality enhanced speech signals. This dissertation proposes supervised and unsupervised single-channel noise reduction algorithms. We consider two classes of methods for this purpose: approaches based on nonnegative matrix factorization (NMF) and methods based on hidden Markov models (HMM). The contributions of this dissertation can be divided into three main (overlapping) parts. First, we propose NMF-based enhancement approaches that use temporal dependencies of the speech signals. In a standard NMF, the important temporal ...

Mohammadiha, Nasser — KTH Royal Institute of Technology

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.