Speech dereverberation in noisy environments using time-frequency domain signal models

Reverberation is the sum of reflected sound waves and is present in any conventional room. Speech communication devices such as mobile phones in hands-free mode, tablets, smart TVs, teleconferencing systems, hearing aids, and voice-controlled systems use one or more microphones to pick up the desired speech signals. When the microphones are not close to the desired source, strong reverberation and noise can degrade the signal quality at the microphones and can impair intelligibility and the performance of automatic speech recognizers. Processing the microphone signals such that reverberation and noise are reduced is therefore a task in high demand. The process of reducing or removing reverberation from recorded signals is called dereverberation. As dereverberation is usually a completely blind problem, where the only available information is the microphone signals, and as the acoustic scenario can be non-stationary, dereverberation is one of the most challenging tasks in speech enhancement. While in theory perfect dereverberation can be achieved by inverse filtering under some conditions and with knowledge of the room impulse response (RIR), in practice the blind identification of the RIR is not sufficiently accurate and robust in time-varying and noisy acoustic conditions. Therefore, successful dereverberation methods have been developed in the time-frequency domain that often relax the problem to partial dereverberation, where mainly the late reverberation tail is reduced. Although in recent years some robust and efficient methods have been proposed that can reduce the late reverberation tail to some extent, it is still challenging to obtain a dereverberated signal with high audio quality, without speech distortion and artifacts, using real-time processing techniques with minimal delay. In this thesis, we focus on robust dereverberation methods for online processing as required in real-time speech communication systems.
To achieve dereverberation, two main aspects can be exploited: temporal and spatial information. Firstly, reverberation introduces correlation over time and extends the duration of phonemes or sound events. By exploiting temporal correlation, filters can be derived to extract the desired speech signal or to reduce the reverberation. Secondly, by using multiple microphones, spatial information can be exploited to distinguish between the coherent direct sound and the reverberation, which has a spatially diffuse property. To extract the coherent sound, spatial filters, also known as beamformers, can be used that combine the microphone signals such that only sound from a certain direction is extracted, whereas sound from other directions and diffuse sound components are suppressed. In this thesis, a variety of signal models is exploited to model reverberation using temporal and spatial aspects. All considered signal models are defined in the short-time Fourier transform (STFT) domain, which is widely used in many speech and audio processing techniques, therefore allowing simple integration with other existing techniques. In particular, we utilize a narrowband moving average model, a narrowband multichannel autoregressive model, and a spatial coherence based model. For each of these three signal models, a method for dereverberation and noise reduction is proposed. The first main contribution is a single-channel estimator of the late reverberation power spectral density (PSD), which is required to compute a Wiener filter reducing reverberation and noise. The proposed reverberation PSD estimator is based on a narrowband moving average model using relative convolutive transfer functions (RCTFs). 
In contrast to other single-channel reverberation PSD estimators, the proposed estimator explicitly models time-varying acoustic conditions and additive noise, and requires no prior information on the room acoustics such as the reverberation time or the direct-to-reverberation ratio (DRR). The second main contribution is a multichannel reverberation PSD estimator based on the spatial coherence, where the reverberation is modeled as an additive diffuse sound component with a time-invariant spatial coherence. In the multichannel case, the desired signal can be estimated by a multichannel Wiener filter (MWF) that requires the reverberation PSD. To mitigate speech distortion and artifacts, a generalized method to independently control the attenuation of reverberation and noise at the output of an MWF is proposed. As there exists a wide variety of such single- and multichannel reverberation PSD estimators, an extensive overview, comparison and benchmark of state-of-the-art estimators is provided. As a cure for a common weakness of all reverberation PSD estimators, a bias compensation for high DRRs is proposed. The third main contribution is an online solution for dereverberation and noise reduction based on a narrowband multichannel autoregressive (MAR) signal model for time-varying acoustic environments. Using this model, the late reverberation is predicted from previous reverberant speech samples using the MAR coefficients, and is then subtracted from the current reverberant signal. A main novelty of this approach is a parallel estimation structure that makes it possible to obtain causal estimates of time-varying MAR coefficients in noisy environments. In addition, a method to control the amount of reverberation and noise reduction independently is proposed. In the last part of this thesis, the three proposed dereverberation systems are compared using objective measures, a listening test, and an automatic speech recognition system.
It is shown that the proposed algorithms efficiently reduce reverberation and noise, and can be directly applied in speech communication devices. The theoretical overview and the evaluation show that each dereverberation method has different strengths and limitations. By considering these algorithms as representatives of their dereverberation class, useful insights and conclusions are provided that can help in choosing a dereverberation method for a specific application.
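
As a rough illustration of two operations mentioned in the abstract, the following Python sketch shows a per-bin Wiener gain computed from reverberation and noise PSD estimates, and the subtraction of late reverberation predicted by a multichannel autoregressive model in one frequency bin. It is a minimal sketch under stated assumptions, not the estimators proposed in the thesis: the function names, the power-subtraction estimate of the desired speech PSD, the gain floor `min_gain`, and the convention that the MAR coefficients are already known are all assumptions made for illustration. The thesis instead estimates the PSDs and the time-varying MAR coefficients online from noisy signals.

```python
import numpy as np


def wiener_gain(y_psd, reverb_psd, noise_psd, min_gain=0.1):
    """Per-bin Wiener gain from PSD estimates (all arrays share one shape)."""
    interference = reverb_psd + noise_psd
    # Assumed desired-speech PSD estimate: power subtraction, floored at zero.
    speech_psd = np.maximum(y_psd - interference, 0.0)
    gain = speech_psd / np.maximum(speech_psd + interference, 1e-12)
    # A lower bound on the gain limits the maximum attenuation.
    return np.maximum(gain, min_gain)


def mar_subtract(frames, coeffs, delay):
    """Subtract predicted late reverberation in one frequency bin.

    frames: complex array (num_frames, num_mics), reverberant STFT frames.
    coeffs: complex array (order, num_mics, num_mics), assumed known here.
    delay:  index offset of the first past frame used for prediction (>= 1).
    """
    out = frames.copy()
    for n in range(frames.shape[0]):
        for k in range(coeffs.shape[0]):
            past = n - delay - k
            if past >= 0:
                # Prediction uses past reverberant frames, not past outputs.
                out[n] -= coeffs[k] @ frames[past]
    return out
```

In this sketch the gain would be multiplied with the reverberant STFT coefficient of the same bin, and `delay` keeps the most recent frames out of the prediction so that the direct sound and early reflections are not removed.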

File Type: pdf
File Size: 4 MB
Publication Year: 2018
Author: Braun, Sebastian
Supervisors: Emanuel Habets
Institution: Friedrich-Alexander Universität Erlangen-Nürnberg
Keywords: Dereverberation, noise suppression, speech enhancement, beamforming, adaptive filtering