Low-complexity acoustic echo cancellation and model-based residual echo suppression

Hands-free speech communication devices, typically equipped with multiple microphones and loudspeakers, are used in a wide variety of applications, such as teleconferencing, in-car communication and personal assistants. In addition to capturing the desired speech of the user, the microphones pick up undesired interference such as background noise and acoustic echo, the latter caused by the acoustic coupling between the loudspeakers and the microphones. This interference typically degrades speech quality and intelligibility, and negatively affects the performance of automatic speech recognition systems. Acoustic echo control systems typically employ a combination of acoustic echo cancellation (AEC) and residual echo suppression (RES). An AEC system uses adaptive filters to compensate for the acoustic echo paths between the loudspeakers and the microphones. Using short AEC filters to reduce computational complexity and increase convergence speed may leave a significant amount of residual echo, which is typically suppressed using a RES postfilter. To compute the spectral weights of this postfilter, an accurate estimate of the power spectral density (PSD) of the residual echo is required. The main aim of this thesis is to achieve low-complexity acoustic echo cancellation for multichannel systems by developing efficient tap selection schemes for partially updating the AEC filters, and to develop model-based residual echo PSD estimators for improved residual echo suppression. First, we propose novel tap selection schemes which exploit input signal sparsity across frequency, channels and time, leading to efficient partial updates of multichannel AEC filters in the subband domain. In particular, the proposed dynamic effort allocation scheme selects proportionately more filter taps for update in subbands and channels with larger-magnitude tap inputs, while not ignoring the filters with smaller-magnitude tap inputs.
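The allocation idea described above can be illustrated with a minimal sketch. This is not the scheme from the thesis: the function name `allocate_taps`, the `budget` and `min_per_filter` parameters, the use of summed tap-input magnitude as the per-filter weight, and the rule for distributing rounding leftovers are all assumptions made for illustration. Each row of the input stands for one (channel, subband) filter.

```python
import numpy as np

def allocate_taps(tap_inputs, budget, min_per_filter=1):
    """Pick `budget` taps to update across all filters (hypothetical sketch).

    tap_inputs: array of shape (num_filters, filter_len), one row per
                (channel, subband) filter, holding the current tap inputs.
    Returns a boolean mask of the same shape marking the selected taps.
    """
    mag = np.abs(np.asarray(tap_inputs))
    num_filters, filter_len = mag.shape
    if min_per_filter > filter_len:
        raise ValueError("min_per_filter exceeds the filter length")
    if not num_filters * min_per_filter <= budget <= mag.size:
        raise ValueError("budget is outside the feasible range")

    # Assumed weighting: each filter's share follows its summed input magnitude.
    energy = mag.sum(axis=1)
    total = energy.sum()
    share = energy / total if total > 0 else np.full(num_filters, 1.0 / num_filters)

    # Every filter keeps a minimum number of taps, so none is ignored;
    # the remaining budget is split proportionally to the shares.
    spare = budget - num_filters * min_per_filter
    counts = min_per_filter + np.floor(spare * share).astype(int)
    counts = np.minimum(counts, filter_len)

    # Hand out taps lost to rounding, strongest filters first.
    leftover = budget - counts.sum()
    order = np.argsort(-energy)
    while leftover > 0:
        for f in order:
            if leftover == 0:
                break
            if counts[f] < filter_len:
                counts[f] += 1
                leftover -= 1

    # Within each filter, update the taps with the largest input magnitude.
    mask = np.zeros(mag.shape, dtype=bool)
    for f in range(num_filters):
        top = np.argsort(-mag[f])[:counts[f]]
        mask[f, top] = True
    return mask
```

With a budget of 4 taps and two 4-tap filters whose inputs differ by a factor of ten, this sketch assigns three taps to the stronger filter and one to the weaker, so the weaker filter still adapts, only more slowly.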
Simulation results for both synthetic and real-world multichannel input signals show that the proposed tap selection scheme achieves echo cancellation performance similar to that of updating all filter taps, at a significantly reduced computational cost (about 28%). Second, we propose novel signal-based methods to estimate the late residual echo PSD in online mode. The late residual echo PSD is modeled using an infinite impulse response (IIR) filter on the PSD of the loudspeaker signal, based on frequency-dependent reverberation scaling and decay parameters. We propose several signal-based methods, based on output error and equation error, to jointly estimate both reverberation parameters by minimizing a single cost function in online mode. Simulation results using both artificially generated and measured impulse responses show that the output error method minimizing the mean squared log error (MSLE) cost function outperforms state-of-the-art offline and online methods in terms of parameter estimation accuracy, late residual echo PSD estimation accuracy and residual echo suppression performance. Third, we propose a novel model for the early residual echo PSD and combine it with the IIR filter model for the late residual echo PSD to yield a novel model for the residual echo PSD. In particular, we model the early residual echo PSD using a moving average filter on the PSD of the loudspeaker signal, based on a frequency-dependent coupling factor. We propose signal-based output error methods to jointly estimate all three model parameters, i.e., the coupling factor and the reverberation scaling and decay parameters, by minimizing a single MSLE cost function in online mode.
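To make the structure of such a model concrete, the sketch below evaluates one possible form for a single frequency bin. The abstract does not give the equations, so everything here is an assumption: the early part is taken as the coupling factor times the sum of the most recent `num_early` loudspeaker PSD frames, the late part as a first-order recursion driven by the loudspeaker PSD delayed by `num_early` frames, and the MSLE as the mean squared difference of natural logarithms. The names `residual_echo_psd`, `msle` and `num_early` are hypothetical.

```python
import numpy as np

def residual_echo_psd(x_psd, coupling, scaling, decay, num_early=2):
    """Assumed residual echo PSD model for one frequency bin.

    early[l] = coupling * sum of x_psd over the last `num_early` frames
    late[l]  = decay * late[l-1] + scaling * x_psd[l - num_early]
    """
    x_psd = np.asarray(x_psd, dtype=float)
    num_frames = len(x_psd)
    early = np.zeros(num_frames)
    late = np.zeros(num_frames)
    for l in range(num_frames):
        # Moving average (FIR) part, scaled by the coupling factor.
        start = max(0, l - num_early + 1)
        early[l] = coupling * x_psd[start:l + 1].sum()
        # Recursive (IIR) part: exponentially decaying reverberation tail.
        previous = late[l - 1] if l > 0 else 0.0
        delayed = x_psd[l - num_early] if l >= num_early else 0.0
        late[l] = decay * previous + scaling * delayed
    return early + late

def msle(estimate, reference, floor=1e-12):
    """Mean squared log error between two PSD sequences (assumed definition)."""
    estimate = np.maximum(np.asarray(estimate, dtype=float), floor)
    reference = np.maximum(np.asarray(reference, dtype=float), floor)
    return float(np.mean((np.log(estimate) - np.log(reference)) ** 2))
```

Feeding a single-frame impulse through this sketch shows the two regimes: a flat early response set by the coupling factor, followed by a tail that starts at the scaling parameter and shrinks by the decay parameter each frame. A joint estimator would adjust all three parameters to reduce `msle` between the modeled and the observed residual echo PSD.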
Simulation results using both artificially generated and measured impulse responses show that the proposed output error method with the recursive prediction error algorithm outperforms state-of-the-art offline and online parameter estimation methods in terms of parameter estimation accuracy and residual echo PSD estimation accuracy. Compared to state-of-the-art RES methods, the proposed method yields the best segmental speech-to-speech distortion ratio score (about 2-5 dB better) while also yielding the best segmental residual echo attenuation score (about 1-2 dB better).
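For context on how a residual echo PSD estimate is used, the sketch below computes spectral weights for a RES postfilter with a generic spectral-subtraction-style rule. This rule is a common textbook choice and is only assumed here; the abstract does not state which gain function the thesis uses. The names `res_gain` and `gain_floor` are hypothetical.

```python
import numpy as np

def res_gain(error_psd, residual_echo_psd, gain_floor=0.1):
    """Per-bin postfilter gain from a residual echo PSD estimate (assumed rule).

    error_psd:         PSD of the AEC output (error) signal per frequency bin.
    residual_echo_psd: estimated residual echo PSD per frequency bin.
    """
    error_psd = np.asarray(error_psd, dtype=float)
    residual_echo_psd = np.asarray(residual_echo_psd, dtype=float)
    # Fraction of the error signal power attributed to residual echo.
    ratio = residual_echo_psd / np.maximum(error_psd, 1e-12)
    # Attenuate in proportion to that fraction, limited by a gain floor.
    return np.clip(1.0 - ratio, gain_floor, 1.0)
```

Under this rule, overestimating the residual echo PSD attenuates the desired speech, while underestimating it leaves audible echo, which is why the accuracy of the PSD estimate matters for both reported scores.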

File Type: pdf
File Size: 6 MB
Publication Year: 2022
Author: Naveen Kumar Desiraju
Supervisors: Tobias Wolff, Simon Doclo
Institution: University of Oldenburg, Germany
Keywords: acoustic echo cancellation, residual echo suppression, tap selection schemes, partial update adaptive filters, dynamic effort allocation scheme, output error, equation error