Robust speaker verification with reverberation suppression and spoofing detection

Since speaker verification (SV) systems are typically used for access control, the business decision to deploy them depends largely on their efficacy and security. This dissertation investigates methods that increase the robustness of SV to fraud attempts and to reverberation mismatch between speaker enrollment and verification. The impact of reverberation is investigated first, in the context of speaker verification in a room with randomly distributed microphones. The main contributions of this study are a preferred microphone selection strategy for training and testing an SV system and a novel feature extraction method that integrates reverberation-robust features with features extracted from a dereverberated signal. The results of experiments conducted with all major speaker modeling methods confirm that such integration improves SV efficacy more than other existing methods. A large part of this thesis is dedicated to introducing a novel dereverberation method that enforces the sparsity of the short-time Fourier transform (STFT) coefficients of the desired signal. The algorithm generalizes all major current sparse Multichannel Linear Prediction-based (MCLP) dereverberation techniques. It yields superior improvements both in dereverberation performance measures and in the efficacy of speaker verification and automatic speech recognition (ASR) when used as a preprocessing step for these two tasks. Additionally, a study of the relation between dereverberation performance and the efficacy of the subsequent SV and ASR indicates that improvements in Cepstral Distance and Frequency-Weighted Signal-to-Noise Ratio correlate most strongly with improvements in the metrics related to both tasks.
The security of SV is addressed through the detection of spoofing in which pre-recorded speech is played back into the input microphone. This spoofing technique is known as a replay attack and is considered the most frequent and the most likely to occur among spoofing methods. The pioneering study in this research area reveals that relevant spoofing cues can be found at high frequencies. Additionally, a novel feature extraction method is proposed that integrates cepstra derived from LP coefficients with cepstra derived from the LP residual signal.

File Type: pdf
File Size: 10 MB
Publication Year: 2022
Author: Witkowski, Marcin
Supervisors: Konrad Kowalczyk, Jakub Gałka
Institution: AGH University of Krakow
Keywords: speaker verification, dereverberation, multichannel linear prediction, replay attack detection, antispoofing