Spatio-Temporal Speech Enhancement in Adverse Acoustic Conditions (2019)
Speech dereverberation in noisy environments using time-frequency domain signal models
Reverberation is the sum of reflected sound waves and is present in any conventional room. Speech communication devices such as mobile phones in hands-free mode, tablets, smart TVs, teleconferencing systems, hearing aids and voice-controlled systems use one or more microphones to pick up the desired speech signals. When the microphones are not in the proximity of the desired source, strong reverberation and noise can degrade the signal quality at the microphones, impair intelligibility and reduce the performance of automatic speech recognizers. There is therefore a strong demand for processing the microphone signals such that reverberation and noise are reduced. The process of reducing or removing reverberation from recorded signals is called dereverberation. As dereverberation is usually a completely blind problem, where the microphone signals are the only available information, and as the acoustic scenario can be non-stationary, ...
Braun, Sebastian — Friedrich-Alexander Universität Erlangen-Nürnberg
Broadband adaptive beamforming with low complexity and frequency invariant response
This thesis proposes different methods to reduce the computational complexity and to increase the adaptation rate of adaptive broadband beamformers, using the generalised sidelobe canceller (GSC) structure as an example. The GSC is an alternative implementation of the linearly constrained minimum variance beamformer which can utilise well-known adaptive filtering algorithms, such as least mean squares (LMS) or recursive least squares (RLS), to perform unconstrained adaptive optimisation. A direct DFT implementation, by which broadband signals are decomposed into frequency bins and processed by independent narrowband beamforming algorithms, is thought to be computationally optimal. However, this setup fails to converge to the time-domain minimum mean square error (MMSE) solution if signal components are not aligned to the frequency bins, resulting in a large worst-case error. To mitigate this problem of the so-called independent frequency bin (IFB) processor, overlap-save ...
Koh, Choo Leng — University of Southampton
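As a rough illustration of the GSC structure described above, the following sketch implements a two-microphone, time-domain GSC with a normalised LMS canceller. The instantaneous mixing model, filter length and step size are assumptions for this toy example, not taken from the thesis (which operates on DFT-domain frequency bins).

    import numpy as np

    # Toy two-microphone GSC: fixed beamformer + blocking matrix + NLMS canceller.
    rng = np.random.default_rng(0)
    n = 20000
    s = rng.standard_normal(n)           # desired broadside signal (equal at both mics)
    v = rng.standard_normal(n)           # interferer, leaking differently into each mic
    x1, x2 = s + 0.7 * v, s + 0.3 * v

    d = 0.5 * (x1 + x2)                  # fixed beamformer: delay-and-sum towards broadside
    u = x1 - x2                          # blocking matrix output: desired signal cancelled

    L, mu = 16, 0.5                      # canceller length and NLMS step size (assumed)
    w, e = np.zeros(L), np.zeros(n)
    for k in range(L, n):
        u_blk = u[k - L + 1:k + 1][::-1]                  # most recent L reference samples
        e[k] = d[k] - w @ u_blk                           # GSC output (noise-reduced)
        w += mu * e[k] * u_blk / (u_blk @ u_blk + 1e-8)   # NLMS update

Since the blocking matrix cancels the desired signal, the adaptive filter can remove the interference from the fixed beamformer output without distorting the target.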
Multi-microphone noise reduction and dereverberation techniques for speech applications
In typical speech communication applications, such as hands-free mobile telephony, voice-controlled systems and hearing aids, the recorded microphone signals are corrupted by background noise, room reverberation and far-end echo signals. This signal degradation can lead to total unintelligibility of the speech signal and decreases the performance of automatic speech recognition systems. In this thesis several multi-microphone noise reduction and dereverberation techniques are developed. In Part I we present a Generalised Singular Value Decomposition (GSVD) based optimal filtering technique for enhancing multi-microphone speech signals which are degraded by additive coloured noise. Several techniques are presented for reducing the computational complexity, and we show that the GSVD-based optimal filtering technique can be integrated into a 'Generalised Sidelobe Canceller'-type structure. Simulations show that the GSVD-based optimal filtering technique achieves a larger signal-to-noise ratio improvement than standard fixed and adaptive beamforming techniques and ...
Doclo, Simon — Katholieke Universiteit Leuven
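The optimal filter described above can be sketched compactly. The thesis computes it via a GSVD of speech-plus-noise and noise-only data matrices; the snippet below instead uses the closely related generalised eigenvalue decomposition of the corresponding covariance matrices, an assumption made here for brevity.

    import numpy as np
    from scipy.linalg import eigh

    def mwf_gevd(Ryy, Rvv, rank=1, ref=0):
        # Multichannel Wiener filter W = Ryy^{-1} Rxx with a low-rank speech model,
        # from the GEVD of (Ryy, Rvv): Q^H Rvv Q = I, Q^H Ryy Q = diag(lam).
        lam, Q = eigh(Ryy, Rvv)                  # ascending generalised eigenvalues
        lam, Q = lam[::-1], Q[:, ::-1]           # reorder to descending
        gains = np.zeros(len(lam))
        gains[:rank] = np.maximum(lam[:rank] - 1.0, 0.0) / lam[:rank]
        W = Q @ np.diag(gains) @ np.linalg.inv(Q)
        return W[:, ref]                         # filter estimating speech at the reference mic

    # Toy example: one speech source in spatially white noise at a 4-mic array
    M = 4
    Rvv = np.eye(M)
    Ryy = Rvv + 10.0 * np.outer(np.ones(M), np.ones(M))
    w = mwf_gevd(Ryy, Rvv)        # apply as w.conj() @ y per time-frequency bin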
Informed spatial filters for speech enhancement
In modern devices which provide hands-free speech capturing functionality, such as hands-free communication kits and voice-controlled devices, the received speech signal at the microphones is corrupted by background noise, interfering speech signals, and room reverberation. In many practical situations, the microphones are not necessarily located near the desired source, and hence, the ratio of the desired speech power to the power of the background noise, the interfering speech, and the reverberation at the microphones can be very low, often around or even below 0 dB. In such situations, the comfort of human-to-human communication, as well as the accuracy of automatic speech recognisers for voice-controlled applications, can be significantly degraded. Therefore, effective speech enhancement algorithms are required to process the microphone signals before transmitting them to the far-end side for communication, or before feeding them into a speech recognition ...
Taseska, Maja — Friedrich-Alexander Universität Erlangen-Nürnberg
Adaptive filtering techniques for noise reduction and acoustic feedback cancellation in hearing aids
Understanding speech in noise and the occurrence of acoustic feedback are among the major problems of current hearing aid users. Hence, there is an urgent demand for efficient and effective digital signal processing algorithms that offer a solution to these issues. In this thesis we develop adaptive filtering techniques for noise reduction and acoustic feedback cancellation. Thanks to the availability of low-power digital signal processors, these algorithms can be integrated in a hearing aid. Because of the ongoing miniaturization in the hearing aid industry and the growing tendency towards multi-microphone hearing aids, robustness against imperfections such as microphone mismatch has become a major issue in the design of a noise reduction algorithm. In this thesis we propose multi-microphone noise reduction techniques that are based on multi-channel Wiener filtering (MWF). Theoretical and experimental analyses demonstrate that these MWF-based techniques are less ...
Spriet, Ann — Katholieke Universiteit Leuven
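A standard member of the MWF family studied in the thesis is the speech-distortion-weighted MWF, which makes the trade-off between noise reduction and speech distortion explicit. In generic notation (assumed here: x and v are the stacked speech and noise components at the microphones, x_1 the speech at the reference microphone, and mu the trade-off parameter):

    \mathbf{w}_{\mu} = \arg\min_{\mathbf{w}} \; E\{|x_1 - \mathbf{w}^H \mathbf{x}|^2\}
                       + \mu \, E\{|\mathbf{w}^H \mathbf{v}|^2\}
                     = (\mathbf{R}_x + \mu \mathbf{R}_v)^{-1} \mathbf{R}_x \mathbf{e}_1

For mu = 1 this reduces to the standard MWF; a larger mu yields more noise reduction at the price of more speech distortion. In practice the speech correlation matrix R_x is estimated as the difference between speech-plus-noise and noise-only correlation matrices.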
MVDR Broadband Beamforming Using Polynomial Matrix Techniques
This thesis addresses the formulation of and solution to broadband minimum variance distortionless response (MVDR) beamforming. Two approaches to this problem are considered, namely generalised sidelobe canceller (GSC) and Capon beamformers. These are examined using a novel technique which relies on polynomial matrix formulations. The new scheme uses the second-order statistics of the array sensor measurements to estimate a space-time covariance matrix, based on which the beamforming problem can be formulated. Akin to the narrowband problem, where an optimum solution can be derived from the eigenvalue decomposition (EVD) of a constant covariance matrix, this approach is extended here to the broadband case. The decoupling of the space-time covariance matrix is in this case provided by means of a polynomial matrix EVD. The proposed approach is initially exploited to design a GSC ...
Alzin, Ahmed — University of Strathclyde
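In generic notation (an assumption following the polynomial matrix literature, not quoted from the thesis), the space-time covariance matrix and its polynomial EVD take the form:

    \mathbf{R}(\tau) = E\{\mathbf{x}[n]\, \mathbf{x}^H[n-\tau]\}, \qquad
    \mathbf{R}(z) = \sum_{\tau} \mathbf{R}(\tau)\, z^{-\tau},

    \mathbf{R}(z) \approx \mathbf{Q}(z)\, \boldsymbol{\Lambda}(z)\, \mathbf{Q}^P(z),
    \qquad \mathbf{Q}(z)\, \mathbf{Q}^P(z) = \mathbf{I},

where Q(z) is paraunitary, Lambda(z) is diagonal, and (.)^P denotes the para-Hermitian transpose. This factorisation plays the role that the EVD of the constant covariance matrix plays in the narrowband case.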
Model-based Techniques and Diffusion Models for Speech Dereverberation
Reverberation occurs in most of our environments and often degrades the intelligibility and quality of human speech, with an aggravated effect on hearing-impaired listeners. Meanwhile, the evolution of technologies for multimedia entertainment, communications and medical applications has led to a greater demand for improved sound quality. Therefore, many embedded devices now include a dereverberation algorithm, which aims to recover the anechoic component of speech. Dereverberation is an arduous task and an ill-posed inverse problem: even perfect knowledge of the room acoustics does not guarantee that a perfectly dereverberated signal can be obtained. Furthermore, in most real-life cases such knowledge is not available, and therefore most dereverberation algorithms are blind, i.e. they must extract information from the reverberant speech signal only. Traditional dereverberation algorithms derive anechoic speech estimators by exploiting statistical properties of speech signals, distributional assumptions and, when available, knowledge of the room acoustics. ...
Lemercier, Jean-Marie — University of Hamburg
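As a rough illustration of the diffusion-based approach, the sketch below runs a reverse-diffusion (score-based) sampler with an Euler-Maruyama discretisation. In an actual dereverberation system the analytic Gaussian score used here would be replaced by a trained score network conditioned on the reverberant signal; the noise schedule and all names are assumptions for this toy.

    import numpy as np

    rng = np.random.default_rng(1)

    def score(x, sigma):
        # Score of N(0, (1 + sigma^2) I): a runnable stand-in for a learned,
        # reverberant-speech-conditioned score model.
        return -x / (1.0 + sigma**2)

    sigmas = np.geomspace(10.0, 0.01, 50)        # decreasing noise schedule (assumed)
    x = sigmas[0] * rng.standard_normal(128)     # start from (almost) pure noise
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        step = s_cur**2 - s_next**2              # Euler-Maruyama step of the reverse VE SDE
        x += step * score(x, s_cur) + np.sqrt(step) * rng.standard_normal(x.shape)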
Low-complexity acoustic echo cancellation and model-based residual echo suppression
Hands-free speech communication devices, typically equipped with multiple microphones and loudspeakers, are used for a wide variety of applications, such as teleconferencing, in-car communication and personal assistants. In addition to capturing the desired speech from the user, the microphones pick up undesired interferences such as background noise and acoustic echo due to the acoustic coupling between the loudspeakers and the microphones. These interferences typically degrade speech quality and intelligibility, and negatively affect the performance of automatic speech recognition systems. Acoustic echo control systems typically employ a combination of acoustic echo cancellation (AEC) and residual echo suppression (RES). An AEC system uses adaptive filters to compensate for the acoustic echo paths between the loudspeakers and the microphones. When short AEC filters are used to reduce computational complexity and increase convergence speed, this may lead to a significant amount of residual echo, ...
Desiraju, Naveen Kumar — University of Oldenburg, Germany
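A textbook NLMS echo canceller, the building block behind the AEC systems discussed above, can be sketched as follows. The toy echo path, signal lengths and step size are assumptions; the thesis is concerned with reducing the complexity of such adaptive filters and suppressing the residual echo they leave behind.

    import numpy as np

    rng = np.random.default_rng(2)
    n, L = 30000, 64
    h = rng.standard_normal(L) * np.exp(-0.1 * np.arange(L))  # toy echo path
    x = rng.standard_normal(n)                                # far-end (loudspeaker) signal
    d = np.convolve(x, h)[:n]                                 # echo picked up by the microphone

    w, e = np.zeros(L), np.zeros(n)                           # adaptive AEC filter and output
    mu, eps = 0.5, 1e-8
    for k in range(L, n):
        x_blk = x[k - L + 1:k + 1][::-1]                      # most recent L far-end samples
        e[k] = d[k] - w @ x_blk                               # echo-cancelled microphone signal
        w += mu * e[k] * x_blk / (x_blk @ x_blk + eps)        # NLMS update

If w is chosen shorter than the true echo path, the uncancelled tail appears as residual echo, which is exactly what the RES stage is meant to suppress.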
Distributed Signal Processing Algorithms for Multi-Task Wireless Acoustic Sensor Networks
Recent technological advances in analogue and digital electronics as well as in hardware miniaturization have taken wireless sensing devices to another level by introducing low-power communication protocols, improved digital signal processing capabilities and compact sensors. When these devices perform a pre-defined signal processing task, such as the estimation or detection of phenomena of interest, a cooperative scheme based on wireless connections can significantly enhance the overall performance, especially in adverse conditions. The resulting network of such connected devices (or nodes) is referred to as a wireless sensor network (WSN). In acoustic applications (e.g., speech enhancement), a variant of WSNs, called wireless acoustic sensor networks (WASNs), can be employed in which the sensing unit at each node consists of a single microphone or a microphone array. The nodes of such a WASN can then cooperate to perform a multi-channel acoustic ...
Hassani, Amin — KU Leuven
Dereverberation and noise reduction techniques based on acoustic multi-channel equalization
In many hands-free speech communication applications such as teleconferencing or voice-controlled applications, the recorded microphone signals contain not only the desired speech signal, but also attenuated and delayed copies of the desired speech signal due to reverberation, as well as additive background noise. Reverberation and background noise cause a signal degradation which can impair speech intelligibility and decrease the performance of many signal processing techniques. Acoustic multi-channel equalization techniques, which aim at inverting or reshaping the measured or estimated room impulse responses between the speech source and the microphone array, constitute an attractive approach to speech dereverberation, since in theory perfect dereverberation can be achieved. In practice, however, such techniques suffer from several drawbacks, such as uncontrolled perceptual effects, sensitivity to perturbations in the measured or estimated room impulse responses, and background noise amplification. The aim of this thesis ...
Kodrasi, Ina — University of Oldenburg
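The inversion idea behind acoustic multi-channel equalization can be sketched in a few lines: stack the convolution matrices of the room impulse responses and solve for inverse filters that map them to a delayed impulse. This is a plain least-squares MINT-style design with toy dimensions and randomly generated responses (assumptions for illustration); the thesis is precisely about making such designs robust and perceptually controlled.

    import numpy as np
    from scipy.linalg import toeplitz

    def conv_matrix(h, Lg):
        # (len(h)+Lg-1) x Lg convolution matrix of impulse response h
        col = np.concatenate([h, np.zeros(Lg - 1)])
        row = np.zeros(Lg); row[0] = h[0]
        return toeplitz(col, row)

    rng = np.random.default_rng(3)
    M, Lh, Lg, delay = 2, 32, 40, 8                    # mics, RIR length, filter length
    H = np.hstack([conv_matrix(rng.standard_normal(Lh), Lg) for _ in range(M)])
    d = np.zeros(Lh + Lg - 1); d[delay] = 1.0          # target: delayed unit impulse
    g, *_ = np.linalg.lstsq(H, d, rcond=None)          # stacked inverse filters
    print(np.abs(H @ g - d).max())                     # ~0 when the MINT conditions hold

Perfect equalization requires enough filter taps, (M-1)Lg >= Lh-1, and impulse responses without common zeros; with estimated rather than true responses, the plain least-squares solution degrades sharply, which motivates the robust designs developed in the thesis.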
Non-linear Spatial Filtering for Multi-channel Speech Enhancement
A large part of human speech communication takes place in noisy environments and is supported by technical devices. For example, a hearing-impaired person might use a hearing aid to take part in a conversation in a busy restaurant. These devices, as well as telecommunication systems in noisy environments and voice-controlled assistants, make use of speech enhancement and separation algorithms that improve the quality and intelligibility of speech by separating speakers and suppressing background noise as well as other unwanted effects such as reverberation. If a device is equipped with more than one microphone, which is very common nowadays, then multi-channel speech enhancement approaches can leverage spatial information in addition to single-channel tempo-spectral information to perform the task. Traditionally, linear spatial filters, so-called beamformers, have been employed to suppress signal components arriving from directions other than the target direction and thereby enhance the desired ...
Tesch, Kristina — Universität Hamburg
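For reference, the traditional linear spatial filter mentioned above, the MVDR beamformer, minimises the output noise power subject to a distortionless constraint towards the target. This is a generic sketch in which the steering vector and noise covariance are assumed known; the thesis investigates non-linear filters that go beyond it.

    import numpy as np

    def mvdr_weights(Rn, d):
        # Rn: (M, M) noise covariance; d: (M,) steering / relative transfer vector
        Rn_inv_d = np.linalg.solve(Rn, d)
        return Rn_inv_d / (d.conj() @ Rn_inv_d)    # w = Rn^{-1} d / (d^H Rn^{-1} d)

    M = 4
    w = mvdr_weights(np.eye(M), np.ones(M))        # spatially white noise, broadside target
    print(w)                                       # reduces to delay-and-sum: all weights 1/M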
Spherical Microphone Array Processing for Acoustic Parameter Estimation and Signal Enhancement
In many distant speech acquisition scenarios, such as hands-free telephony or teleconferencing, the desired speech signal is corrupted by noise and reverberation. This degrades both the speech quality and intelligibility, making communication difficult or even impossible. Speech enhancement techniques seek to mitigate these effects and extract the desired speech signal. This objective is commonly achieved through the use of microphone arrays, which take advantage of the spatial properties of the sound field in order to reduce noise and reverberation. Spherical microphone arrays, where the microphones are arranged in a spherical configuration, usually mounted on a rigid baffle, are able to analyze the sound field in three dimensions; the captured sound field can then be efficiently described in the spherical harmonic domain (SHD). In this thesis, a number of novel spherical array processing algorithms are proposed, formulated in the SHD. In ...
Jarrett, Daniel P. — Imperial College London
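A minimal SHD building block, assuming SciPy's spherical-harmonic convention and omitting the rigid-sphere mode-strength compensation for brevity: the vector of conjugated spherical harmonics up to order N that describes a plane wave arriving from a given direction.

    import numpy as np
    from scipy.special import sph_harm

    def shd_plane_wave_vector(azimuth, polar, N):
        # Collect conj(Y_n^m(azimuth, polar)) for n = 0..N, m = -n..n
        y = [np.conj(sph_harm(m, n, azimuth, polar))
             for n in range(N + 1) for m in range(-n, n + 1)]
        return np.array(y)                 # length (N+1)^2

    v = shd_plane_wave_vector(np.pi / 4, np.pi / 3, N=3)
    print(v.shape)                         # (16,) SHD coefficients up to order 3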
Single-Microphone Multi-Frame Speech Enhancement Exploiting Speech Interframe Correlation
Speech communication devices such as hearing aids or mobile phones are often used in acoustically challenging situations, where the desired speech signal is affected by undesired background noise. Since in these situations speech quality and speech intelligibility may be degraded, speech enhancement algorithms are required to suppress the undesired background noise while preserving the desired speech signal. In this thesis, we focus on single-microphone speech enhancement algorithms in the short-time Fourier transform domain, in particular on multi-frame algorithms that aim at exploiting speech correlation across time frames. In principle, exploiting the speech interframe correlation makes it possible to suppress the undesired background noise while keeping speech distortion low. Existing single-microphone multi-frame speech enhancement algorithms, such as the multi-frame minimum variance distortionless response (MFMVDR) filter and the multi-frame minimum power distortionless response (MFMPDR) filter, depend on the normalized speech correlation vector, which is ...
Fischer, Dörte — University of Oldenburg, Germany
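The MFMVDR filter named above has the same closed form as the spatial MVDR beamformer, but operates on a vector of consecutive STFT frames of a single frequency bin. In this sketch all quantities are simply given; estimating the normalized speech interframe correlation vector from noisy data is the hard part the thesis addresses.

    import numpy as np

    def mfmvdr_weights(Rn, gamma):
        # Rn: (N, N) noise correlation matrix across N consecutive frames
        # gamma: (N,) normalized speech interframe correlation vector, gamma[0] = 1
        Rn_inv_g = np.linalg.solve(Rn, gamma)
        return Rn_inv_g / (gamma.conj() @ Rn_inv_g)   # w = Rn^{-1} g / (g^H Rn^{-1} g)

    N = 4
    w = mfmvdr_weights(np.eye(N), np.array([1.0, 0.6, 0.3, 0.1]))
    # Per bin: output[l] = w.conj() @ [y[l], y[l-1], ..., y[l-N+1]]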
A speech signal captured by multiple microphones often suffers from reduced intelligibility and quality due to the presence of noise and room acoustic interferences. Multi-microphone speech enhancement systems therefore aim at the suppression or cancellation of such undesired signals without substantial distortion of the speech signal. A fundamental aspect of the design of several multi-microphone speech enhancement systems is the spatial information which relates each microphone signal to the desired speech source. This spatial information is unknown in practice and has to be estimated. Under certain conditions, however, the estimated spatial information can be inaccurate, which subsequently degrades the performance of a multi-microphone speech enhancement system. This doctoral dissertation is focused on the development and evaluation of acoustic signal processing algorithms in order to address this issue. Specifically, as opposed to conventional means of estimating ...
Ali, Randall — KU Leuven
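The spatial information referred to above is often parameterised by the relative transfer function (RTF) of the desired source. One common estimator, shown here as an assumption rather than as the procedure of the thesis, is covariance whitening: take the principal generalised eigenvector of the speech-plus-noise and noise-only covariance pair, map it back through the noise covariance and normalise to a reference microphone.

    import numpy as np
    from scipy.linalg import eigh

    def rtf_covariance_whitening(Ryy, Rvv, ref=0):
        lam, Q = eigh(Ryy, Rvv)       # generalised EVD, ascending eigenvalues
        h = Rvv @ Q[:, -1]            # unnormalised steering-vector estimate
        return h / h[ref]             # RTF relative to the reference microphone

    # Toy check: rank-1 speech covariance added to a random SPD noise covariance
    rng = np.random.default_rng(4)
    M = 4
    a = np.array([1.0, -0.5, 0.8, 0.3])        # true RTF, normalised to mic 0
    B = rng.standard_normal((M, M))
    Rvv = B @ B.T + M * np.eye(M)
    Ryy = Rvv + 5.0 * np.outer(a, a)
    print(np.allclose(rtf_covariance_whitening(Ryy, Rvv), a))   # True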
Modern devices such as mobile phones, tablets or smart speakers are commonly equipped with several loudspeakers and microphones. If, for instance, one employs such a device for hands-free communication applications, the signals that are reproduced by the loudspeakers propagate through the room and are inevitably acquired by the microphones. If no processing is applied, the participants in the far-end room receive delayed, reverberated replicas of their own voice, which strongly degrades both speech intelligibility and user comfort. In order to prevent these so-called acoustic echoes from being transmitted back to the far-end room, acoustic echo cancelers are commonly employed. The latter make use of adaptive filtering techniques to identify the propagation paths between loudspeakers and microphones. The estimated propagation paths are then employed to compute acoustic echo estimates, which are finally subtracted from the signals acquired by the microphones. In ...
Luis Valero, Maria — International Audio Laboratories Erlangen
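In generic notation (assumed here, not quoted from the thesis), the described echo cancellation scheme amounts to:

    d(n) = s(n) + \sum_{l=1}^{L} (h_l * x_l)(n), \qquad
    e(n) = d(n) - \sum_{l=1}^{L} (\hat{h}_l * x_l)(n),

where x_l is the l-th loudspeaker signal, h_l the echo path from loudspeaker l to the microphone, \hat{h}_l its adaptive estimate, s(n) the near-end speech, and e(n) the echo-cancelled signal transmitted back to the far end.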