Kernel PCA and Pre-Image Iterations for Speech Enhancement

In this thesis, we present novel methods to enhance speech corrupted by noise. All methods are based on the processing of complex-valued spectral data. First, kernel principal component analysis (PCA) for speech enhancement is proposed. Subsequently, a simplification of kernel PCA, called pre-image iterations (PI), is derived. This method computes enhanced feature vectors iteratively by linear combination of noisy feature vectors. The weighting for the linear combination is found by a kernel function that measures the similarity between the feature vectors. The kernel variance is a key parameter for the degree of de-noising and has to be set according to the signal-to-noise ratio (SNR). Initially, PI were proposed for speech corrupted by additive white Gaussian noise. To be independent of knowledge about the SNR and to generalize to other stationary noise types, PI are extended by automatic determination of the ...

Leitner, Christina — Graz University of Technology


Robust Speech Recognition on Intelligent Mobile Devices with Dual-Microphone

Despite the outstanding progress made on automatic speech recognition (ASR) throughout the last decades, noise-robust ASR still poses a challenge. Tackling with acoustic noise in ASR systems is more important than ever before for a twofold reason: 1) ASR technology has begun to be extensively integrated in intelligent mobile devices (IMDs) such as smartphones to easily accomplish different tasks (e.g. search-by-voice), and 2) IMDs can be used anywhere at any time, that is, under many different acoustic (noisy) conditions. On the other hand, with the aim of enhancing noisy speech, IMDs have begun to embed small microphone arrays, i.e. microphone arrays comprised of a few sensors close each other. These multi-sensor IMDs often embed one microphone (usually at their rear) intended to capture the acoustic environment more than the speaker’s voice. This is the so-called secondary microphone. While classical microphone ...

López-Espejo, Iván — University of Granada


Informed spatial filters for speech enhancement

In modern devices which provide hands-free speech capturing functionality, such as hands-free communication kits and voice-controlled devices, the received speech signal at the microphones is corrupted by background noise, interfering speech signals, and room reverberation. In many practical situations, the microphones are not necessarily located near the desired source, and hence, the ratio of the desired speech power to the power of the background noise, the interfering speech, and the reverberation at the microphones can be very low, often around or even below 0 dB. In such situations, the comfort of human-to-human communication, as well as the accuracy of automatic speech recognisers for voice-controlled applications can be signi cantly degraded. Therefore, e ffective speech enhancement algorithms are required to process the microphone signals before transmitting them to the far-end side for communication, or before feeding them into a speech recognition ...

Taseska, Maja — Friedrich-Alexander Universität Erlangen-Nürnberg


Non-intrusive Quality Evaluation of Speech Processed in Noisy and Reverberant Environments

In many speech applications such as hands-free telephony or voice-controlled home assistants, the distance between the user and the recording microphones can be relatively large. In such a far-field scenario, the recorded microphone signals are typically corrupted by noise and reverberation, which may severely degrade the performance of speech recognition systems and reduce intelligibility and quality of speech in communication applications. In order to limit these effects, speech enhancement algorithms are typically applied. The main objective of this thesis is to develop novel speech enhancement algorithms for noisy and reverberant environments and signal-based measures to evaluate these algorithms, focusing on solutions that are applicable in realistic scenarios. First, we propose a single-channel speech enhancement algorithm for joint noise and reverberation reduction. The proposed algorithm uses a spectral gain to enhance the input signal, where the gain is computed using a ...

Cauchi, Benjamin — University of Oldenburg


Acoustic sensor network geometry calibration and applications

In the modern world, we are increasingly surrounded by computation devices with communication links and one or more microphones. Such devices are, for example, smartphones, tablets, laptops or hearing aids. These devices can work together as nodes in an acoustic sensor network (ASN). Such networks are a growing platform that opens the possibility for many practical applications. ASN based speech enhancement, source localization, and event detection can be applied for teleconferencing, camera control, automation, or assisted living. For this kind of applications, the awareness of auditory objects and their spatial positioning are key properties. In order to provide these two kinds of information, novel methods have been developed in this thesis. Information on the type of auditory objects is provided by a novel real-time sound classification method. Information on the position of human speakers is provided by a novel localization ...

Plinge, Axel — TU Dortmund University


Digital Processing Based Solutions for Life Science Engineering Recognition Problems

The field of Life Science Engineering (LSE) is rapidly expanding and predicted to grow strongly in the next decades. It covers areas of food and medical research, plant and pests’ research, and environmental research. In each research area, engineers try to find equations that model a certain life science problem. Once found, they research different numerical techniques to solve for the unknown variables of these equations. Afterwards, solution improvement is examined by adopting more accurate conventional techniques, or developing novel algorithms. In particular, signal and image processing techniques are widely used to solve those LSE problems require pattern recognition. However, due to the continuous evolution of the life science problems and their natures, these solution techniques can not cover all aspects, and therefore demanding further enhancement and improvement. The thesis presents numerical algorithms of digital signal and image processing to ...

Hussein, Walid — Technische Universität München


Speech derereverberation in noisy environments using time-frequency domain signal models

Reverberation is the sum of reflected sound waves and is present in any conventional room. Speech communication devices such as mobile phones in hands-free mode, tablets, smart TVs, teleconferencing systems, hearing aids, voice-controlled systems, etc. use one or more microphones to pick up the desired speech signals. When the microphones are not in the proximity of the desired source, strong reverberation and noise can degrade the signal quality at the microphones and can impair the intelligibility and the performance of automatic speech recognizers. Therefore, it is a highly demanded task to process the microphone signals such that reverberation and noise are reduced. The process of reducing or removing reverberation from recorded signals is called dereverberation. As dereverberation is usually a completely blind problem, where the only available information are the microphone signals, and as the acoustic scenario can be non-stationary, ...

Braun, Sebastian — Friedrich-Alexander Universität Erlangen-Nürnberg


Prediction and Optimization of Speech Intelligibility in Adverse Conditions

In digital speech-communication systems like mobile phones, public address systems and hearing aids, conveying the message is one of the most important goals. This can be challenging since the intelligibility of the speech may be harmed at various stages before, during and after the transmission process from sender to receiver. Causes which create such adverse conditions include background noise, an unreliable internet connection during a Skype conversation or a hearing impairment of the receiver. To overcome this, many speech-communication systems include speech processing algorithms to compensate for these signal degradations like noise reduction. To determine the effect on speech intelligibility of these signal processing based solutions, the speech signal has to be evaluated by means of a listening test with human listeners. However, such tests are costly and time consuming. As an alternative, reliable and fast machine-driven intelligibility predictors are ...

Taal, Cees — Delft University of Technology


Some Contributions to Machine Learning-based System Identification and Speech Enhancement for Nonlinear Acoustic Echo Control

Given the widespread use of miniaturized audio interfaces, echo control systems are faced with increasing challenges to address a large variety of acoustic conditions observed by such interfaces. This motivates the use of sophisticated machine learning-based techniques to overcome the limitations of conventional methods. The contributions in this thesis can be outlined by decomposing the task of nonlinear acoustic echo control into two subtasks: Nonlinear Acoustic Echo Cancellation (NAEC) and Acoustic Echo Suppression (AES). In particular, by formulating the single-channel NAEC model-adaptation task as a Bayesian recursive filtering problem, an evolutionary resampling strategy for particle filtering is proposed. The resulting Elitist Resampling Particle Filter (ERPF) is shown experimentally to be an efficient and high-performing approach that can be extended to address challenging conditions such as non-stationary interferers. The fundamental problem of nonlinear model design is addressed by proposing a novel ...

Halimeh, Mhd Modar — Friedrich-Alexander-Universität Erlangen-Nürnberg


Speech Enhancement Algorithms for Audiological Applications

The improvement of speech intelligibility is a traditional problem which still remains open and unsolved. The recent boom of applications such as hands-free communi- cations or automatic speech recognition systems and the ever-increasing demands of the hearing-impaired community have given a definitive impulse to the research in this area. This PhD thesis is focused on speech enhancement for audiological applications. Most of the research conducted in this thesis has been focused on the improvement of speech intelligibility in hearing aids, considering the variety of restrictions and limitations imposed by this type of devices. The combination of source separation techniques and spatial filtering with machine learning and evolutionary computation has originated novel and interesting algorithms which are included in this thesis. The thesis is divided in two main parts. The first one contains a preliminary study of the problem and a ...

Ayllón, David — Universidad de Alcalá


Wavelet Analysis For Robust Speech Processing and Applications

In this work, we study the application of wavelet analysis for robust speech processing. Reliable time-scale features (TS) which characterize the relevant phonetic classes such as voiced (V), unvoiced (UV), silence (S), mixed-excitation, and stop sounds are extracted. By training neural and Bayesian networks, the classification rates provided by only 7 TS features are mostly similar to the ones obtained by 13 MFCC features. The TS features are further enhanced to design a reliable and low-complexity V/UV/S classifier. Quantile filtering and slope tracking are used for deriving adaptive thresholds. A robust voice activity detector is then built and used as a pre-processing stage to improve the performance of a speaker verification system. Based on wavelet shrinkage, a statistical wavelet filtering (SWF) method is designed for speech enhancement. Non-stationary and colored noise is handled by employing quantile filtering and time-frequency adaptive ...

Pham, Van Tuan — Graz University of Technology


Embedded Optimization Algorithms for Perceptual Enhancement of Audio Signals

This thesis investigates the design and evaluation of an embedded optimization framework for the perceptual enhancement of audio signals which are degraded by linear and/or nonlinear distortion. In general, audio signal enhancement has the goal to improve the perceived audio quality, speech intelligibility, or another desired perceptual attribute of the distorted audio signal by applying a real-time digital signal processing algorithm. In the designed embedded optimization framework, the audio signal enhancement problem under consideration is formulated and solved as a per-frame numerical optimization problem, allowing to compute the enhanced audio signal frame that is optimal according to a desired perceptual attribute. The first stage of the embedded optimization framework consists in the formulation of the per-frame optimization problem aimed at maximally enhancing the desired perceptual attribute, by explicitly incorporating a suitable model of human sound perception. The second stage of ...

Defraene, Bruno — KU Leuven


Spherical Microphone Array Processing for Acoustic Parameter Estimation and Signal Enhancement

In many distant speech acquisition scenarios, such as hands-free telephony or teleconferencing, the desired speech signal is corrupted by noise and reverberation. This degrades both the speech quality and intelligibility, making communication difficult or even impossible. Speech enhancement techniques seek to mitigate these effects and extract the desired speech signal. This objective is commonly achieved through the use of microphone arrays, which take advantage of the spatial properties of the sound field in order to reduce noise and reverberation. Spherical microphone arrays, where the microphones are arranged in a spherical configuration, usually mounted on a rigid baffle, are able to analyze the sound field in three dimensions; the captured sound field can then be efficiently described in the spherical harmonic domain (SHD). In this thesis, a number of novel spherical array processing algorithms are proposed, based in the SHD. In ...

Jarrett, Daniel P. — Imperial College London


Speech Enhancement Using Nonnegative Matrix Factorization and Hidden Markov Models

Reducing interference noise in a noisy speech recording has been a challenging task for many years yet has a variety of applications, for example, in handsfree mobile communications, in speech recognition, and in hearing aids. Traditional single-channel noise reduction schemes, such as Wiener filtering, do not work satisfactorily in the presence of non-stationary background noise. Alternatively, supervised approaches, where the noise type is known in advance, lead to higher-quality enhanced speech signals. This dissertation proposes supervised and unsupervised single-channel noise reduction algorithms. We consider two classes of methods for this purpose: approaches based on nonnegative matrix factorization (NMF) and methods based on hidden Markov models (HMM). The contributions of this dissertation can be divided into three main (overlapping) parts. First, we propose NMF-based enhancement approaches that use temporal dependencies of the speech signals. In a standard NMF, the important temporal ...

Mohammadiha, Nasser — KTH Royal Institute of Technology


Deep Learning-based Speaker Verification In Real Conditions

Smart applications like speaker verification have become essential in verifying the user's identity for availing of personal assistants or online banking services based on the user's voice characteristics. However, far-field or distant speaker verification is constantly affected by surrounding noises which can severely distort the speech signal. Moreover, speech signals propagating in long-range get reflected by various objects in the surrounding area, which creates reverberation and further degrades the signal quality. This PhD thesis explores deep learning-based multichannel speech enhancement techniques to improve the performance of speaker verification systems in real conditions. Multichannel speech enhancement aims to enhance distorted speech using multiple microphones. It has become crucial to many smart devices, which are flexible and convenient for speech applications. Three novel approaches are proposed to improve the robustness of speaker verification systems in noisy and reverberated conditions. Firstly, we integrate ...

Dowerah Sandipana — Universite de Lorraine, CNRS, Inria, Loria

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.