Dereverberation and noise reduction techniques based on acoustic multi-channel equalization
In many hands-free speech communication applications such as teleconferencing or voice-controlled applications, the recorded microphone signals contain not only the desired speech signal, but also attenuated and delayed copies of the desired speech signal due to reverberation, as well as additive background noise. Reverberation and background noise degrade the signal, which can impair speech intelligibility and decrease the performance of many signal processing techniques. Acoustic multi-channel equalization techniques, which aim at inverting or reshaping the measured or estimated room impulse responses between the speech source and the microphone array, are an attractive approach to speech dereverberation, since in theory perfect dereverberation can be achieved. In practice, however, such techniques suffer from several drawbacks, such as uncontrolled perceptual effects, sensitivity to perturbations in the measured or estimated room impulse responses, and background noise amplification. The aim of this thesis is to tackle these drawbacks by designing perceptually advantageous and robust acoustic multi-channel equalization techniques for speech dereverberation as well as for joint dereverberation and noise reduction.
First, in order to control the perceptual speech quality, we propose the perceptually advantageous partial multi-channel equalization technique based on the multiple-input/output inverse theorem (PMINT), which aims not only at suppressing the late reflections but also at controlling the early reflections. Simulation results show that the proposed PMINT technique results in a better perceptual speech quality than state-of-the-art acoustic multi-channel equalization techniques, such as the multiple-input/output inverse theorem (MINT), channel shortening (CS), and relaxed multichannel least-squares (RMCLS).
Second, we propose several methods to increase the robustness of all considered acoustic multi-channel equalization techniques, i.e., the MINT, CS, RMCLS, and PMINT techniques, against room impulse response perturbations. On the one hand, we propose signal-independent methods, namely decreasing the reshaping filter length to improve the conditioning of the optimization criteria, or incorporating (automatic) regularization to reduce the energy of distortions due to room impulse response perturbations. On the other hand, we propose a signal-dependent method, namely using a sparsity-promoting penalty function to promote sparsity in the output speech signal and reduce the artifacts generated by non-robust techniques. All proposed methods are validated using instrumental performance measures and subjective listening tests, which show that the regularized and sparsity-promoting extensions of the PMINT technique yield the best dereverberation performance in comparison to the robust extensions of state-of-the-art acoustic multi-channel equalization techniques.
Finally, in order to achieve joint dereverberation and noise reduction, we propose two techniques based on robust acoustic multi-channel equalization. The first technique, namely regularized PMINT for joint dereverberation and noise reduction (RP-DNR), can be seen as an extension of the regularized PMINT technique that explicitly takes the noise statistics into account. The second technique, namely multichannel Wiener filter for joint dereverberation and noise reduction (MWF-DNR), additionally takes the speech statistics into account and uses the dereverberated output signal of the regularized PMINT technique as the reference signal for the multichannel Wiener filter. In addition to the regularization parameter used in the regularized PMINT technique, a weighting parameter is introduced in the RP-DNR and MWF-DNR techniques to trade off between dereverberation and noise reduction.
To determine the regularization and weighting parameters, we propose automatic non-intrusive procedures based on the L-hypersurface and the L-curve. Simulation results show that the RP-DNR technique maintains the high dereverberation performance of the regularized PMINT technique while improving the noise reduction performance. Furthermore, simulation results show that the MWF-DNR technique yields a significantly better noise reduction performance than the RP-DNR technique at the expense of a worse dereverberation performance, depending on the amount of estimation errors in the speech correlation matrix.
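As an illustration of the regularized PMINT idea summarized above, the following minimal sketch designs reshaping filters by regularized least squares. It rests on assumptions not spelled out in this abstract: the target response is taken to be the first `direct_len` taps (direct path and early reflections) of a reference room impulse response followed by zeros, and regularization is taken to be a Tikhonov penalty `delta` on the filter energy. The function names `convolution_matrix` and `regularized_pmint` are hypothetical.

```python
import numpy as np

def convolution_matrix(h, filter_len):
    """Return C such that C @ g equals np.convolve(h, g) for len(g) == filter_len."""
    C = np.zeros((len(h) + filter_len - 1, filter_len))
    for j in range(filter_len):
        C[j:j + len(h), j] = h
    return C

def regularized_pmint(rirs, filter_len, direct_len, delta=0.0, ref=0):
    """Sketch of a regularized partial equalization filter design (assumed form).

    Minimizes ||H g - target||^2 + delta * ||g||^2, where H stacks the
    per-channel convolution matrices side by side and the target keeps the
    first `direct_len` taps of the reference RIR and is zero afterwards.
    """
    rirs = np.asarray(rirs, dtype=float)           # shape (M, L_h)
    num_mics = rirs.shape[0]
    H = np.hstack([convolution_matrix(h, filter_len) for h in rirs])
    target = np.zeros(H.shape[0])
    target[:direct_len] = rirs[ref, :direct_len]   # assumed target response
    # Solve the penalized problem as an augmented least-squares system.
    A = np.vstack([H, np.sqrt(delta) * np.eye(H.shape[1])])
    b = np.concatenate([target, np.zeros(H.shape[1])])
    g, *_ = np.linalg.lstsq(A, b, rcond=None)
    return g.reshape(num_mics, filter_len), H, target

# Toy example with synthetic (random) impulse responses, not measured data.
rng = np.random.default_rng(0)
true_rirs = rng.standard_normal((3, 8))
g_plain, H, target = regularized_pmint(true_rirs, filter_len=4, direct_len=3)
g_reg, _, _ = regularized_pmint(true_rirs, filter_len=4, direct_len=3, delta=1e-2)
equalized = H @ g_plain.ravel()                    # equalized impulse response
```

With unperturbed impulse responses and `delta=0`, the equalized response matches the target. A positive `delta` gives up some of that exactness in exchange for lower filter energy, which is the mechanism the abstract credits with reducing distortions when the impulse responses are perturbed.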
