Cognitive Models for Acoustic and Audiovisual Sound Source Localization
Sound source localization algorithms have a long research history in the field of digital signal processing. Many common applications, such as intelligent personal assistants, teleconferencing systems, and methods for technical diagnosis in acoustics, require accurate localization of sound sources in the environment. However, dynamic environments pose a particular challenge for these systems. Voice-controlled smart home applications, where the speaker as well as potential noise sources move within the room, are a typical example. Classical sound source localization systems have only limited capabilities to deal with such dynamic acoustic scenarios. In this thesis, three novel approaches to sound source localization that extend existing classical methods are presented. The first system is proposed in the context of audiovisual source localization. Determining the position of sound sources under adverse acoustic conditions can be improved by incorporating visual information obtained from cameras. For instance, computer vision methods like face detection provide additional information about the position of a speaker. The system proposed in this thesis introduces a novel weighting function that weights acoustic and visual sensor information according to their reliability. In the presence of prominent acoustic disturbances, the system favors the visual modality, whereas in rooms with poor lighting conditions, the focus shifts to the acoustic modality. The second contribution of this thesis focuses on sound source localization in the domain of robotics. Starting from psychoacoustic evidence that human listeners utilize head movements to refine the localization of sounds, a closed-loop feedback control system inspired by these findings is presented.
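The reliability-based fusion of the two modalities can be illustrated with a minimal sketch. The convex weighting and the circular averaging of angles used here are illustrative assumptions for this example, not the specific weighting function developed in the thesis:

```python
import numpy as np

def fuse_doa_estimates(doa_acoustic, doa_visual, rel_acoustic, rel_visual):
    """Fuse two direction-of-arrival estimates (in degrees) by a
    reliability-weighted convex combination.

    Angles are averaged on the unit circle so that wrap-around is
    handled correctly (e.g. 350 deg and 10 deg fuse to roughly 0 deg).
    """
    # Normalize the reliability scores to convex weights.
    total = rel_acoustic + rel_visual
    w_a, w_v = rel_acoustic / total, rel_visual / total

    # Map each estimate to a unit vector, combine, and convert back.
    a = np.deg2rad(doa_acoustic)
    v = np.deg2rad(doa_visual)
    x = w_a * np.cos(a) + w_v * np.cos(v)
    y = w_a * np.sin(a) + w_v * np.sin(v)
    return np.rad2deg(np.arctan2(y, x)) % 360.0
```

With equal reliabilities the fused estimate is the circular mean of both observations; as one reliability score approaches zero (e.g. a dark room suppressing the visual branch), the fused estimate converges to the other modality's estimate.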
Subsequently, this system is extended to mobile robotic agents, which can perform rotational as well as translational movements to explore their environment. This ultimately yields an active exploration framework proposed for the challenging task of acoustic simultaneous localization and mapping. In the last part of this thesis, an algorithm for estimating the direct sound direction-of-arrival of a sound source in reverberant environments is introduced. This framework is based on the causal relationship between the incoming direct sound and the corresponding reflections, which follows from the physical laws of sound propagation. Mathematically, this relationship can be captured via causal models, which enable a causal analysis of the microphone signals aimed at identifying the direct sound components. This yields improved sound source localization performance in challenging acoustic environments with long reverberation times.
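Direction-of-arrival estimation in reverberant rooms typically starts from time differences of arrival between microphone pairs, which reflections can bias. A minimal sketch of the standard GCC-PHAT baseline is given below; it is a classical reference method, not the causal-model analysis developed in the thesis:

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the time difference of arrival between two microphone
    signals using the generalized cross-correlation with PHAT weighting.

    Returns the delay in seconds; the value is positive when `sig`
    arrives later than `ref`.
    """
    # Zero-pad so the circular correlation covers all linear lags.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # PHAT weighting: keep only the phase of the cross-spectrum, which
    # sharpens the correlation peak under reverberation.
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12
    cc = np.fft.irfft(R, n=n)

    # Restrict the search to physically plausible delays.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs
```

In strongly reverberant rooms the global correlation peak may stem from a reflection rather than the direct path; exploiting the fact that the direct sound must causally precede its reflections is precisely what motivates the causal analysis summarized above.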
