Artificial Bandwidth Extension of Telephone Speech Signals Using Phonetic A Priori Knowledge

The narrowband frequency range of telephone speech signals, a legacy of former analog transmission techniques, still imposes acoustic limitations on today's digital telephony systems. It makes phone calls sound muffled and reduces speech intelligibility and quality. Artificial speech bandwidth extension approaches estimate and reconstruct the missing frequency components. However, the artificially extended speech typically suffers from annoying artifacts. The fricatives /s/ and /z/ are particularly susceptible: they can hardly be estimated from the narrowband spectrum and are therefore easily confused with other phonemes as well as with speech pauses. This work exploits phonetic a priori knowledge to improve the performance of artificial bandwidth extension. Both the offline training stage, conducted in advance, and the main processing stage, performed later, are to be supplied with phoneme information. Because the training stage does not require online processing, phonetic a priori knowledge can be made available there. Its availability during the later processing stage, however, depends on the online requirements of the particular application. This work addresses the two main application areas of artificial bandwidth extension. First, existing telephone speech databases are to be upgraded in bandwidth so that telephony-based wideband interactive voice response systems can be trained on them. In this human-to-machine application (i.e., a telephone conversation with an automatic speech recognizer), the artificial bandwidth extension takes place offline, before the speech recognition training, and therefore requires no online capabilities. Consequently, phonetic a priori knowledge can be exploited. Second, narrowband telephone speech services are to be artificially extended in bandwidth to enhance their intelligibility and quality. This human-to-human application (i.e., a telephone conversation with another conversational partner) must be online-capable, so the phonetic a priori knowledge has to be estimated. The artificial bandwidth extension approach developed in this work successfully demonstrated its capabilities in both application areas in comparison with the state of the art.

File Type: pdf
File Size: 2 MB
Publication Year: 2017
Author: Bauer, Patrick Marcel
Supervisors: Tim Fingscheidt
Institution: Institute for Communications Technology, Technical University Braunschweig
Keywords: speech processing, telephone signals, artificial bandwidth extension, automatic speech recognition