Distributed Localization and Tracking of Acoustic Sources
Localization, separation and tracking of acoustic sources are age-old tasks that many animals and humans perform intuitively, sometimes with impressive accuracy. Artificial methods have been developed for various applications and conditions. Most of these methods are centralized, meaning that all signals are processed jointly to produce the estimates.

Distributed sensor networks are becoming increasingly realistic as technology advances in nanotechnology, micro-electro-mechanical systems (MEMS) and communication. A distributed sensor network comprises scattered nodes, each an autonomous, self-powered module equipped with sensors, actuators and communication capabilities. A variety of layouts and connectivity graphs are commonly used. Distributed sensor networks have a broad range of applications, which can be grouped into ecology, military, environmental monitoring, medicine, security and surveillance.

In this dissertation we develop algorithms for distributed sensor networks with applications to speech processing, although some of the techniques apply to other domains as well. Such wireless acoustic sensor networks (WASNs) are useful in many modern scenarios. The first example is ambient immersive communication. Nowadays, almost everyone carries personal microphones as part of a cellular phone, laptop computer or tablet. These spatially distributed sensors allow the exploitation of spatial information, namely the locations of active speakers and other acoustic sources, in addition to spectro-temporal information. They make it feasible to establish an ad hoc (distributed) microphone network and to apply sophisticated signal extraction algorithms without installing expensive audio systems. A second example is smart homes, which have become very popular in recent years.
Intelligent microphone networks are crucial components of their control and monitoring systems, as well as of emergency communication. The last example mentioned here is law enforcement. Authorities such as the police or homeland security use eavesdropping and acoustic surveillance of public spaces as part of their regular procedures, usually under adverse conditions.

The availability of only partial information at the nodes, the dynamics of the network, and the limited communication, connectivity and power capabilities call for novel algorithms that address these challenges. Such challenges are typical of distributed algorithms and do not arise in classical array processing algorithms.

The contribution of this dissertation is fivefold. Firstly, distributed localization algorithms are derived using a novel set of hidden variables that are estimated by static or dynamic microphone arrays. Beyond enabling distributed computation, the new hidden variables improve convergence speed and accuracy compared with previous approaches, since they allow the incremental expectation-maximization (IEM) principle to be applied in the spatial domain. The developed distributed localization algorithms cover both batch EM and online recursive expectation-maximization (REM).

Secondly, we develop several localization techniques that significantly reduce the effect of reverberation on performance. Processing that emphasizes the direct path is integrated with our modified localization algorithm and is shown to improve performance, especially as the number of concurrent speakers in the room increases. To strengthen the two-microphone node, we show that instead of the pairwise relative phase ratio (PRP), the raw samples themselves can be used with any known microphone geometry.
The raw samples can be processed within a new model that accounts for the late reverberation tail in addition to the direct path.

Thirdly, we find that the localization results of these algorithms can also be utilized for blind source separation (BSS). A key ingredient of the separation algorithms is the set of hidden variables used in the EM mechanism. These variables were shown to be very effective spectral masks, since their physical meaning is the association of time-frequency bins with the various speakers.

Fourthly, a major challenge in ad hoc networks is that the array locations are unknown. We propose a solution for the joint calibration of the arrays and localization of the sources, all estimated relative to an anchor array.

Finally, we address dynamic problems. Distributed tracking based on the recursive distributed expectation-maximization (RDEM) algorithm is first described for static arrays. Tracking multiple concurrent speakers is highly challenging, since the signals and the room impulse responses (RIRs) vary in a complex way. For example, speakers do not talk continuously, yet they may move continuously, leaving time gaps that must be filled or extrapolated. One way to handle these gaps is to exploit future information about the speakers: a short delay allows non-causal processing to be added to the classical non-Bayesian tracking mechanism. Another, more realistic, problem is localizing and tracking speakers with dynamic microphone arrays. Here, the movement of a microphone pair is exploited to localize and track speakers using both Bayesian and non-Bayesian techniques.
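The EM mechanism summarized above, in which hidden variables associate time-frequency bins with candidate source positions and then serve as spectral masks, can be illustrated with a minimal single-node sketch in Python. It rests on assumptions that are not taken from the dissertation: one microphone pair, a grid of candidate inter-microphone delays (TDOAs) standing in for candidate positions, and one noisy PRP per time-frequency bin modeled as a circular complex Gaussian around the expected PRP of each grid point. The grid weights psi are estimated either by batch EM or by a simple online EM with a constant step size gamma; the step size and all function names (batch_em, recursive_em, spectral_masks and so on) are hypothetical. The distributed and incremental (IEM) aspects across nodes, the reverberation model and the array calibration are not shown.

```python
import cmath
import math
import random

# Toy single-node model (an assumption, not the dissertation's exact model):
# one microphone pair, candidate positions reduced to a grid of TDOAs, and
# one observed PRP per time-frequency bin.


def prp_template(tau, freq):
    """Expected PRP of a source with inter-microphone delay tau (seconds)."""
    return cmath.exp(-2j * math.pi * freq * tau)


def likelihood(obs, mean, var):
    """Circular complex Gaussian density N_c(obs; mean, var)."""
    return math.exp(-abs(obs - mean) ** 2 / var) / (math.pi * var)


def e_step(frame, freqs, grid, psi, var):
    """post[k][p]: probability that bin k is dominated by a source at grid[p]."""
    post = []
    for obs, f in zip(frame, freqs):
        w = [psi_p * likelihood(obs, prp_template(tau, f), var)
             for psi_p, tau in zip(psi, grid)]
        total = sum(w) or 1.0  # guard against numerical underflow
        post.append([wi / total for wi in w])
    return post


def batch_em(frames, freqs, grid, var=0.1, iters=30):
    """Batch EM: each M-step sets psi to the mean posterior over all bins."""
    psi = [1.0 / len(grid)] * len(grid)
    for _ in range(iters):
        acc, count = [0.0] * len(grid), 0
        for frame in frames:
            for row in e_step(frame, freqs, grid, psi, var):
                acc = [a + r for a, r in zip(acc, row)]
                count += 1
        psi = [a / count for a in acc]
    return psi


def recursive_em(frames, freqs, grid, var=0.1, gamma=0.2):
    """Online EM: blend each frame's mean posterior into psi with step gamma."""
    psi = [1.0 / len(grid)] * len(grid)
    for frame in frames:
        post = e_step(frame, freqs, grid, psi, var)
        frame_avg = [sum(row[p] for row in post) / len(post)
                     for p in range(len(grid))]
        psi = [(1 - gamma) * a + gamma * b for a, b in zip(psi, frame_avg)]
    return psi


def pick_peaks(psi, n_sources):
    """Grid indices of the n_sources largest local maxima of psi, ascending."""
    peaks = [p for p in range(len(psi))
             if (p == 0 or psi[p] >= psi[p - 1])
             and (p == len(psi) - 1 or psi[p] >= psi[p + 1])]
    return sorted(sorted(peaks, key=lambda p: psi[p], reverse=True)[:n_sources])


def spectral_masks(frame, freqs, grid, psi, var, source_idx):
    """Posteriors restricted to the detected sources and renormalized per bin."""
    masks = []
    for row in e_step(frame, freqs, grid, psi, var):
        sel = [row[p] for p in source_idx]
        total = sum(sel) or 1.0
        masks.append([s / total for s in sel])
    return masks


def simulate(n_frames, freqs, taus, noise=0.05, seed=0):
    """Synthetic PRPs: each bin is dominated by one randomly chosen source."""
    rng = random.Random(seed)
    return [[prp_template(rng.choice(taus), f)
             + complex(rng.gauss(0, noise), rng.gauss(0, noise))
             for f in freqs] for _ in range(n_frames)]


if __name__ == "__main__":
    freqs = [100.0 * k for k in range(1, 10)]         # 100 ... 900 Hz
    grid = [-0.5e-3 + 0.1e-3 * i for i in range(11)]   # -0.5 ... +0.5 ms
    frames = simulate(50, freqs, [grid[2], grid[7]])
    psi = batch_em(frames, freqs, grid)
    print("batch EM peaks:", pick_peaks(psi, 2))
    print("recursive EM peaks:", pick_peaks(recursive_em(frames, freqs, grid), 2))
    masks = spectral_masks(frames[0], freqs, grid, psi, 0.1, pick_peaks(psi, 2))
    print("masks of first frame:", [[round(v, 2) for v in m] for m in masks])
```

With the synthetic data placing two sources at grid indices 2 and 7, both estimators are expected to put the two largest local maxima of psi at those indices. The printed masks are per-bin weights that sum to one across the two detected sources; low-frequency bins, where the expected PRPs of the two sources differ little, receive less decisive masks than high-frequency bins.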
