Non-Intrusive Speech Intelligibility Prediction

The ability to communicate through speech is important for social interaction, and we rely on it even in noisy conditions. Ideally, speech is easy to understand, but this is not always the case when the speech is degraded, e.g., by background noise, distortion or hearing impairment. One of the most important factors to consider in relation to such degradations is speech intelligibility, a measure of how easy or difficult the speech is to understand. This thesis focuses on speech intelligibility prediction. It consists of an introduction to the field and a collection of scientific papers. The introduction gives a background to the challenges of speech communication in noisy conditions, followed by an account of how speech is produced and how it is perceived by the listener. It then covers speech intelligibility and the factors that govern it. Finally, it introduces the concept of speech intelligibility prediction and reviews existing intrusive and non-intrusive speech intelligibility prediction measures. The primary contribution of the thesis is the collection of papers, which propose objective measures for non-intrusive speech intelligibility prediction. The measures share the same approach: an existing intrusive speech intelligibility measure is extended so that it can predict speech intelligibility non-intrusively, without access to a clean reference signal. The principle is to estimate a reference signal from its degraded counterpart and use this estimate as input to an intrusive measure. The measures differ in how the reference signal is estimated and can broadly be divided into two approaches: Papers A, B and F propose a multichannel solution, in which the spatial content of the desired source is used to extract the signal, while Papers C-E propose a single-channel solution, in which the reference signal is estimated by finding the combination of signals from a model of the speech production system that best fits the data. The measures are shown to correlate well with both the intrusive scores and data from subjective listening tests.
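
The principle described above (estimate a reference from the degraded signal, then feed it to an intrusive measure) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the function names are hypothetical, the reference estimator is a plain average over time-aligned channels standing in for a multichannel method, and the intrusive measure is a toy envelope correlation, not STOI or any measure proposed in the papers.

```python
import numpy as np


def estimate_reference(noisy_channels: np.ndarray) -> np.ndarray:
    """Hypothetical multichannel estimator: average time-aligned channels.

    noisy_channels has shape (n_channels, n_samples). Averaging keeps the
    coherent target and attenuates noise that is uncorrelated across channels.
    """
    return noisy_channels.mean(axis=0)


def frame_energies(signal: np.ndarray, frame_len: int = 256) -> np.ndarray:
    """Short-time energy of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return (frames ** 2).sum(axis=1)


def intrusive_score(reference: np.ndarray, degraded: np.ndarray) -> float:
    """Toy intrusive measure (a stand-in, not STOI): correlation between the
    short-time energy envelopes of the reference and the degraded signal."""
    ref_env = frame_energies(reference)
    deg_env = frame_energies(degraded)
    return float(np.corrcoef(ref_env, deg_env)[0, 1])


def non_intrusive_score(noisy_channels: np.ndarray) -> float:
    """Estimate a reference from the degraded input, then score intrusively.

    No clean signal is passed in; the first channel is used as the degraded
    signal to be scored against the estimated reference.
    """
    reference_estimate = estimate_reference(noisy_channels)
    return intrusive_score(reference_estimate, noisy_channels[0])


# Synthetic example: an amplitude-modulated tone observed on four channels
# with independent noise, once lightly and once heavily degraded.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = (1.0 + np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 440 * t)
mild = clean + 0.1 * rng.standard_normal((4, clean.size))
heavy = clean + 5.0 * rng.standard_normal((4, clean.size))

score_mild = non_intrusive_score(mild)
score_heavy = non_intrusive_score(heavy)
print(f"mild noise: {score_mild:.3f}, heavy noise: {score_heavy:.3f}")
```

In this toy setup the lightly degraded input should score higher than the heavily degraded one, which is the qualitative behaviour expected of an intelligibility predictor. Swapping `estimate_reference` for a single-channel, model-based estimator would leave the rest of the pipeline unchanged.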

File Type: pdf
File Size: 3 MB
Publication Year: 2020
Author: Sørensen, Charlotte
Supervisors: Mads Græsbøll Christensen, Jesper Bünsow Boldt
Institution: Aalborg University
Keywords: speech, intelligibility, understanding, non-intrusive, metric, metrics, measure, prediction, estimation, speech enhancement, short-time objective intelligibility, STOI, hearing aids