Auditory Inspired Methods for Multiple Speaker Localization and Tracking Using a Circular Microphone Array

This thesis presents a new approach to the problem of localizing and tracking multiple acoustic sources using a microphone array. Microphone arrays make it possible to enhance speech signals recorded in meeting rooms and office spaces. A common solution for speech enhancement in realistic environments with ambient noise and multi-path propagation is the application of so-called beamforming techniques, which enhance signals arriving from the desired angle by constructive interference while attenuating signals coming from other directions by destructive interference. Such beamforming algorithms require the source location as prior knowledge. Therefore, source localization and tracking algorithms are an integral part of such a system. However, the performance of conventional localization algorithms deteriorates in realistic scenarios with multiple concurrent speakers. In contrast to conventional localization algorithms, the localization algorithm presented in this thesis makes use of the fundamental frequency, or pitch, of speech signals in addition to the location information. This “position-pitch”-based algorithm pre-processes the speech signals with a multiband gammatone filterbank that is inspired by the auditory model of the human inner ear. The role of this gammatone filterbank is analyzed and discussed in detail. For robust localization of multiple concurrent speakers, a frequency-selective criterion is explored that is based on a study of the human neural system’s use of correlations between adjacent sub-band frequencies. This frequency-selective criterion leads to more robust localization and pitch cues. Two different kinds of tracking algorithms that further improve the localization accuracy for an arbitrary number of speakers are then presented. The first one is based on grouping spectro-temporal regions formed by fundamental frequency cues. The second one applies sequential Monte Carlo methods, or particle filters, using the location cues provided by the multiband position-pitch algorithm. Finally, a novel particle filter-based joint position and pitch tracking algorithm is presented. Various solutions are proposed for the existing problems faced by particle filter-based trackers, including an improved likelihood model that incorporates information on source activity. All proposed speaker localization and tracking algorithms are tested on real-world recordings made with a 24-channel uniform circular microphone array, using loudspeakers and human speakers in various acoustic environments. The proposed techniques give on average 20% more accurate results than the state-of-the-art SRP-PHAT algorithm.
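
As an illustration of the beamforming principle described in the abstract (constructive interference at the steered angle, destructive interference elsewhere), the following is a minimal delay-and-sum sketch for a uniform circular array. It is not taken from the thesis: the far-field plane-wave model, the 0.1 m radius, the 343 m/s speed of sound, and all function names are assumptions chosen for illustration. Only the 24-channel circular geometry comes from the abstract.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def uca_positions(num_mics=24, radius=0.1):
    """Return (num_mics, 2) x/y coordinates of a uniform circular array.

    The radius is a hypothetical value, not the one used in the thesis.
    """
    angles = 2.0 * np.pi * np.arange(num_mics) / num_mics
    return radius * np.column_stack((np.cos(angles), np.sin(angles)))


def steering_delays(positions, azimuth):
    """Far-field arrival-time offsets (seconds) relative to the array centre."""
    direction = np.array([np.cos(azimuth), np.sin(azimuth)])
    # A microphone closer to the source receives the wavefront earlier,
    # hence the negative sign.
    return -(positions @ direction) / SPEED_OF_SOUND


def beam_response(positions, look_azimuth, source_azimuth, freq):
    """Magnitude of the delay-and-sum output for a unit plane wave at freq (Hz)."""
    tau_source = steering_delays(positions, source_azimuth)
    tau_look = steering_delays(positions, look_azimuth)
    # Compensating with the look-direction delays aligns all channel phases
    # only when the look direction matches the source direction.
    phases = np.exp(-2j * np.pi * freq * (tau_source - tau_look))
    return np.abs(phases.mean())


if __name__ == "__main__":
    mics = uca_positions()
    print(beam_response(mics, 0.5, 0.5, 1000.0))          # steered at the source
    print(beam_response(mics, 0.5, 0.5 + np.pi, 1000.0))  # steered away from it
```

Steering at the source direction sums all channels in phase, while steering away from it leaves residual phase differences that partially cancel. This is also why a beamformer of this kind needs a source location estimate before it can enhance anything.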

File Type: pdf
File Size: 4 MB
Publication Year: 2012
Author: Habib, Tania
Supervisors: Gernot Kubin, Harald Romsdorfer
Institution: Signal Processing and Speech Communication Laboratory, Graz University of Technology, Austria
Keywords: source localisation, microphone arrays