Best Signal Selection with Automatic Delay Compensation in VoIP Environment

In recent decades, air traffic has spread across the world, connecting more and more places. At the same time, the need to manage all flights correctly and securely has increased. Air traffic authorities have imposed and updated several standards for the air traffic management (ATM) system to keep pace with the growing traffic flow. To meet these standards, special voice communication systems (VCS) were developed. They ensure communication between pilots and the operators in ground control centers. When a communication is initiated between the aircraft's pilot and the ground air traffic control operator, several systems are involved. The pilot speaks through the aircraft's radio station and the signal is received by several ground radio stations. The signal from each ground radio station then reaches the control center over a different path. There, one of the received signals is played to the operator. Ideally, this should be the clearest one, offering the highest intelligibility. This is the Best Signal Selection (BSS) problem for the received signals, which is now solved through special signal processing algorithms. The main objective of this thesis is the development of a solution for best signal selection in a VoIP environment. The system has to integrate the new work in order to provide highly accurate selections under real-time conditions. Besides this main goal, another aim was to provide an enhanced best signal selection (eBSS), which should make use of all the input signals to offer a more intelligible output than the BSS. To achieve this, several specific objectives were addressed: a) Develop a fast and accurate time delay estimation (TDE) method, based on generalized cross-correlation. This is needed to align incoming signals before analyzing their intelligibility. 
Also, TDE is the first step in further speech enhancement processing for eBSS; b) Implement an optimum voice activity detection (VAD) algorithm as an important part of the best signal selection process; c) Search for optimum signal-to-noise ratio (SNR) estimators which could be used for multichannel speech enhancement. The thesis is organized in seven chapters, as follows: Chapter 1 starts with an introduction to the best signal selection problem in air traffic management voice communication systems. It then summarizes the main tasks needed to obtain an effective BSS and eBSS, and highlights the difficulties encountered by the system designer. Finally, Chapter 1 describes the main objectives and outlines the thesis. Chapter 2 presents the time delay estimation problem and different state-of-the-art approaches to solving it. First, the general adaptive filtering solution is presented, with two basic variants, the least mean square (LMS) and recursive least squares (RLS) algorithms. Then, in a manner similar to the previous approaches, the adaptive eigenvalue decomposition (AED) is presented. This is followed by an introduction to the difference functions. Continuing the presentation of TDE solutions, the generalized cross-correlation (GCC) method is introduced, with all its traditional approaches. The end of the chapter describes wavelet-based TDE, including specific methods. Chapter 3 proposes the accumulated GCC TDE methods for multi-frame analysis. First, the accumulated cross-power spectrum is presented, as well as ways to extend it to all well-known GCC methods. Then the database and the metrics used in the subsequent experiments are described. Further, all traditional GCC methods, implemented with both the conventional and the proposed approaches, are analyzed using several metrics (accuracy and error rate, relative error, standard deviation of relative error, computing time). Chapter 4 is dedicated to the description of several VAD algorithms. 
The standard G.729, ETSI-AMR1 and ETSI-AMR2 VADs are presented at the beginning, as important reference points in this field. They are followed by two recently proposed VAD algorithms which were integrated into the BSS solution. Chapter 5 describes the analysis of the VAD algorithms for the BSS solution. This is supported by a detailed discussion of the correlation between the VAD scores and speech intelligibility. Besides the BSS solutions based on the two VAD algorithms presented in Chapter 4, a new BSS solution is introduced here, called Smoothed Sub-band Spectral Flatness Measure (3SFM). Moreover, it is shown that this solution could also be used as a VAD in a proper configuration. Chapter 6 is dedicated to the enhanced best signal selection. It describes the state-of-the-art unbiased estimator based on speech presence probability and distributed multi-channel speech enhancement. It then proposes the eBSS solution based on SNR estimation. The simulated results are analyzed based on the Perceptual Objective Listening Quality Assessment (POLQA). Chapter 7 summarizes the conclusions of this thesis regarding time delay estimation, voice activity detection, best signal selection and enhanced best signal selection. It then lists the personal contributions and describes further directions to be followed.
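
To make the time delay estimation step more concrete, the following is a minimal Python sketch of GCC with PHAT weighting, in which the cross-power spectrum is accumulated over several frames before the inverse transform. It is an illustration under stated assumptions, not the implementation evaluated in the thesis: the function name `gcc_phat_delay`, the frame length, the zero-padding and the choice of PHAT as the weighting are all hypothetical.

```python
import numpy as np


def gcc_phat_delay(x, y, frame_len=1024, max_lag=None):
    """Estimate by how many samples y lags x (negative if y leads x).

    Hypothetical helper: PHAT-weighted generalized cross-correlation,
    with the cross-power spectrum summed over non-overlapping frames.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_fft = 2 * frame_len  # zero-padding avoids circular wrap-around
    n_frames = min(len(x), len(y)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    if max_lag is None:
        max_lag = frame_len - 1

    # Accumulate the cross-power spectrum over all frames.
    acc = np.zeros(n_fft // 2 + 1, dtype=complex)
    for i in range(n_frames):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        X = np.fft.rfft(x[seg], n_fft)
        Y = np.fft.rfft(y[seg], n_fft)
        acc += Y * np.conj(X)

    # PHAT weighting: keep only the phase of the accumulated spectrum.
    acc /= np.abs(acc) + 1e-12

    # Back to the lag domain; reorder so lags run from -max_lag to +max_lag.
    cc = np.fft.irfft(acc, n_fft)
    cc = np.concatenate((cc[n_fft - max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag
```

Given two recordings of the same transmission, the returned lag could be used to shift one channel so that both are aligned before their intelligibility is compared. Accumulating the spectrum over frames trades some latency for a more stable correlation peak.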

File Type: pdf
File Size: 5 MB
Publication Year: 2013
Author: Marinescu, Radu-Sebastian
Supervisors: Corneliu Burileanu
Institution: University Politehnica of Bucharest
Keywords: Signal Processing, Time Delay Estimation, Voice Activity Detection, Speech Enhancement, Voice Quality