Contributions to Statistical Modeling for Minimum Mean Square Error Estimation in Speech Enhancement
This thesis deals with minimum mean square error (MMSE) speech enhancement schemes in the short-time Fourier transform (STFT) domain, with a focus on statistical models for speech and the corresponding estimators. MMSE speech enhancement approaches that take speech presence uncertainty (SPU) into account usually consist of a common MMSE estimator for speech and an a posteriori speech presence probability (SPP) estimator. It is shown that both estimators should be based on the same statistical speech model, since they belong to the same estimation framework and assume the same a priori knowledge. To give a synopsis of consistent MMSE estimation under SPU, typical common MMSE estimators and a posteriori SPP estimators are recapitulated. Furthermore, a new a posteriori SPP estimator is derived based on a novel statistical model for speech. Then, a synopsis of approaches to consistent MMSE estimation under SPU is given.
In the context of statistical modeling, we enhance a modern a posteriori SPP estimation approach based on fixed parameters. More precisely, the conservative speech model of this reference approach is replaced by an improved one. A new a posteriori SPP estimator is then derived, and its fixed parameters are trained. The resulting approach combines the advantages of fixed parameters with those of a novel statistical speech model.
Although both speech enhancement and error concealment deal with distorted (speech) signals, there has not yet been an attempt to relate the two fields to each other. Since the two disciplines have much in common, many links between them are discussed on the basis of recursive MMSE estimation. Beyond these commonalities, differences between the fields are also analyzed, and a general advantage of error concealment is identified. Based on this finding, research perspectives for speech enhancement that are inspired by error concealment are sketched.
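To make the structure of MMSE estimation under SPU more concrete, the following is a minimal sketch assuming the conventional complex Gaussian model for speech and noise STFT coefficients. This is the kind of conservative model the thesis improves upon, not its novel model. The function names, the fixed a priori SNR of 15 dB and the fixed prior P(H1) = 0.5 are illustrative assumptions. Both functions rely on the same Gaussian model, which reflects the consistency requirement stated above.

```python
import numpy as np


def spp_fixed_prior(noisy_power, noise_psd, xi_opt_db=15.0, p_h1=0.5):
    """Hypothetical helper: a posteriori SPP P(H1|Y) under a complex Gaussian
    model with a fixed a priori SNR xi_opt and a fixed prior P(H1) (assumed values)."""
    xi = 10.0 ** (xi_opt_db / 10.0)           # fixed a priori SNR (linear)
    gamma = noisy_power / noise_psd           # a posteriori SNR |Y|^2 / sigma_N^2
    prior_ratio = (1.0 - p_h1) / p_h1         # P(H0) / P(H1)
    # Inverse likelihood ratio p(Y|H0) / p(Y|H1) for Gaussian speech and noise
    inv_lr = (1.0 + xi) * np.exp(-gamma * xi / (1.0 + xi))
    return 1.0 / (1.0 + prior_ratio * inv_lr)


def mmse_under_spu(noisy_stft, xi_hat, spp):
    """Hypothetical helper: E[S|Y] = P(H1|Y) * E[S|Y, H1], assuming S = 0 under H0.
    Under the Gaussian model, E[S|Y, H1] is the Wiener estimate xi/(1+xi) * Y."""
    wiener_gain = xi_hat / (1.0 + xi_hat)     # xi_hat: estimated a priori SNR
    return spp * wiener_gain * noisy_stft


# Usage on a few STFT bins with unit noise PSD
noise_psd = np.ones(4)
noisy_stft = np.array([0.5, 1.0, 3.0, 10.0], dtype=complex)
spp = spp_fixed_prior(np.abs(noisy_stft) ** 2, noise_psd)
speech_estimate = mmse_under_spu(noisy_stft, xi_hat=np.full(4, 2.0), spp=spp)
```

Keeping the prior parameters fixed prevents the SPP from depending on an a priori SNR estimate that may itself be biased. Here, the a priori SNR estimate `xi_hat` is simply passed in as an input, and how it is obtained is left open.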
This thesis also provides a new statistical framework for recursive MMSE speech enhancement, which allows the improved statistical models of classical, nonrecursive speech enhancement to be applied to the recursive case. As a specific enhancement scheme, we extend recursive MMSE estimation to take SPU into account.
Finally, a new reference-free signal-to-noise ratio (SNR) measurement approach is proposed. It aims at estimating the SNR of a speech signal distorted by car noise as closely as possible to the reference-based measurement according to ITU-T Recommendation P.56, but without requiring a reference signal. The proposed approach achieves small estimation errors and shows high correlation with the ITU-T P.56 measurement within a typical SNR range. Furthermore, it has low computational complexity and can be applied to both narrowband and wideband signals. Within ITU-T Study Group 12, the Focus Group on Car Communication (FG CarCOM) has decided to adopt the new reference-free SNR measurement approach for the draft of a recommendation proposal.
