Cognitive-driven speech enhancement using EEG-based auditory attention decoding for hearing aid applications
Identifying the target speaker is essential for improving speech intelligibility in hearing aid applications. Although several speech enhancement algorithms are available to reduce background noise or to perform source separation in multi-speaker scenarios, their performance depends on correctly identifying the target speaker to be enhanced. Recent advances in electroencephalography (EEG) have shown that the target speaker to whom the listener is attending can be identified using single-trial EEG-based auditory attention decoding (AAD) methods. However, in realistic acoustic environments the AAD performance is influenced by undesired disturbances such as interfering speakers, noise and reverberation. In addition, for real-world hearing aid applications it is important to close the AAD loop by presenting on-line auditory feedback. This thesis deals with the problem of identifying and enhancing the target speaker in realistic acoustic environments based on decoding the auditory attention of the listener using single-trial EEG recordings. To this end, we thoroughly analyze the AAD performance in noisy and reverberant environments, propose novel methods for decoding auditory attention, and propose open-loop and closed-loop cognitive-driven speech enhancement systems for hearing aid applications. First, we analyze the impact of different acoustic conditions (anechoic, reverberant, noisy, and reverberant-noisy) on the performance of a least-squares-based AAD method. We show that auditory attention can be decoded with considerably high performance in all considered acoustic conditions, but that the decoding performance is significantly affected by the presence of background noise and especially of the interfering speaker in the reference signals used for decoding. Second, we propose several open-loop and closed-loop cognitive-driven speech enhancement systems.
The first system is an open-loop cognitive-driven binaural beamformer, which aims to enhance the target speaker and suppress the interfering speaker and background noise while preserving the spatial impression of the acoustic scene. In this system a binaural minimum-variance-distortionless-response (MVDR) or binaural linearly-constrained-minimum-variance (LCMV) beamformer is steered based on AAD. For a two-speaker scenario in diffuse babble noise we show that the proposed cognitive-driven binaural beamforming system yields a significantly larger speech enhancement performance than a fixed forward-steered binaural MVDR beamformer, in both anechoic and reverberant conditions. The second system is an open-loop cognitive-driven convolutional beamformer, which aims to enhance the target speaker while jointly suppressing the interfering speaker, reverberation and background noise. This system combines a neural-network-based mask estimator, convolutional beamformers and AAD. We show that the proposed cognitive-driven convolutional beamforming system yields a significantly larger speech enhancement performance than cognitive-driven systems based on conventional beamformers. The third system is a closed-loop cognitive-driven gain controller, where real-time AAD enables the listener to directly interact with an adaptive gain controller. Although there is a significant delay in detecting attention switches, experimental results demonstrate the feasibility of the proposed system, which is able to improve the signal-to-interference ratio (SIR) between the attended and the unattended speaker. Third, we propose novel methods to decode auditory attention. More specifically, we propose a reference signal generation approach based on binary masking, which uses binary masks based on directional speech presence probability to discard low-energy intervals that are susceptible to interfering speech and background noise.
In addition, we propose an AAD method based on a state-space model, which translates the correlation coefficients into more reliable probabilistic attention measures and improves the decoding performance of linear and non-linear methods when small correlation windows are used.
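
To illustrate the general idea behind the least-squares-based AAD method mentioned above, the following Python sketch trains a linear decoder that reconstructs the attended speech envelope from time-lagged EEG and then selects the speaker whose envelope correlates more strongly with the reconstruction. It is a minimal sketch and not the implementation used in the thesis: the function names (`lagged`, `train_decoder`, `decode_attention`, `demo`), the number of lags, the ridge regularization and the synthetic EEG model are all assumptions made for this example.

```python
import numpy as np


def lagged(eeg, n_lags):
    """Stack time-shifted copies of the EEG: (samples, channels * n_lags).

    The neural response follows the stimulus, so the envelope at time t is
    reconstructed from EEG samples at times t, t+1, ..., t+n_lags-1.
    """
    n_samples, n_channels = eeg.shape
    X = np.zeros((n_samples, n_channels * n_lags))
    for lag in range(n_lags):
        cols = slice(lag * n_channels, (lag + 1) * n_channels)
        X[: n_samples - lag, cols] = eeg[lag:]
    return X


def train_decoder(eeg, attended_env, n_lags=8, ridge=1e-3):
    """Regularized least-squares decoder mapping lagged EEG to the envelope."""
    X = lagged(eeg, n_lags)
    R = X.T @ X                              # EEG autocorrelation matrix
    r = X.T @ attended_env                   # EEG/envelope cross-correlation
    reg = ridge * np.trace(R) / R.shape[0]   # assumed regularization scaling
    return np.linalg.solve(R + reg * np.eye(R.shape[0]), r)


def decode_attention(eeg, env_a, env_b, decoder, n_lags=8):
    """Return (index of decoded speaker, correlation with A, correlation with B)."""
    reconstruction = lagged(eeg, n_lags) @ decoder
    r_a = np.corrcoef(reconstruction, env_a)[0, 1]
    r_b = np.corrcoef(reconstruction, env_b)[0, 1]
    return (0 if r_a > r_b else 1), r_a, r_b


def demo(seed=0):
    """Synthetic example (assumed model): the EEG tracks speaker A more strongly."""
    rng = np.random.default_rng(seed)
    n, n_channels, delay = 4000, 8, 2

    def envelope():
        env = np.abs(rng.standard_normal(n))  # toy stand-in for a speech envelope
        return env - env.mean()

    def delayed(env):
        return np.concatenate([np.zeros(delay), env[:-delay]])

    env_a, env_b = envelope(), envelope()
    mix_attended = rng.standard_normal(n_channels)
    mix_unattended = 0.2 * rng.standard_normal(n_channels)
    eeg = (np.outer(delayed(env_a), mix_attended)
           + np.outer(delayed(env_b), mix_unattended)
           + 0.5 * rng.standard_normal((n, n_channels)))

    half = n // 2  # train on the first half, decode on the second half
    decoder = train_decoder(eeg[:half], env_a[:half])
    return decode_attention(eeg[half:], env_a[half:], env_b[half:], decoder)


if __name__ == "__main__":
    choice, r_a, r_b = demo()
    print("decoded speaker:", "A" if choice == 0 else "B", "| r_a, r_b:", r_a, r_b)
```

In this sketch the decision is taken over the whole test segment. Shorter correlation windows would allow faster decisions but yield noisier correlation coefficients, which is the trade-off that a probabilistic post-processing stage such as the state-space model described above is meant to address.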
