Flexible Multi-Microphone Acquisition and Processing of Spatial Sound Using Parametric Sound Field Representations

This thesis deals with the efficient and flexible acquisition and processing of spatial sound using multiple microphones. In spatial sound acquisition and processing, we use multiple microphones to capture the sound of multiple sources being simultaneously active at a reverberant recording side and process the sound depending on the application at the application side. Typical applications include source extraction, immersive spatial sound reproduction, or speech enhancement. A flexible sound acquisition and processing means that we can capture the sound with almost arbitrary microphone configurations without constraining the application at the application side. This means that we can realize and adjust the different applications independently of the microphone configuration used at the recording side. For example in spatial sound reproduction, where we aim at reproducing the sound such that the listener perceives the same impression as if he ...

Thiergart, Oliver — Friedrich-Alexander-Universität Erlangen-Nürnberg


Spherical Microphone Array Processing for Acoustic Parameter Estimation and Signal Enhancement

In many distant speech acquisition scenarios, such as hands-free telephony or teleconferencing, the desired speech signal is corrupted by noise and reverberation. This degrades both the speech quality and intelligibility, making communication difficult or even impossible. Speech enhancement techniques seek to mitigate these effects and extract the desired speech signal. This objective is commonly achieved through the use of microphone arrays, which take advantage of the spatial properties of the sound field in order to reduce noise and reverberation. Spherical microphone arrays, where the microphones are arranged in a spherical configuration, usually mounted on a rigid baffle, are able to analyze the sound field in three dimensions; the captured sound field can then be efficiently described in the spherical harmonic domain (SHD). In this thesis, a number of novel spherical array processing algorithms are proposed, based in the SHD. In ...

Jarrett, Daniel P. — Imperial College London


Synthetic reproduction of head-related transfer functions by using microphone arrays

Spatial hearing for human listeners is based on the interaural as well as on the monaural analysis of the signals arriving at both ears, enabling the listeners to assign certain spatial components to these signals. This spatial aspect gets lost when the signals are reproduced via headphones without considering the acoustical influence of the head and torso, i.e. head-related transfer functions (HRTFs). A common procedure to take into account spatial aspects in a binaural reproduction is to use so-called artificial heads. Artificial heads are replicas of a human head and torso with average anthropometric geometries and built-in microphones in the ears. Although the signals recorded with artificial heads contain relevant spatial aspects, binaural recordings using artificial heads often suffer from front-back confusions and the perception of the sound source being inside the head (internalization). These shortcomings can be attributed to ...

Rasumow, Eugen — University of Oldenburg


Fundamental Frequency and Direction-of-Arrival Estimation for Multichannel Speech Enhancement

Audio systems receive the speech signals of interest usually in the presence of noise. The noise has profound impacts on the quality and intelligibility of the speech signals, and it is therefore clear that the noisy signals must be cleaned up before being played back, stored, or analyzed. We can estimate the speech signal of interest from the noisy signals using a priori knowledge about it. A human speech signal is broadband and consists of both voiced and unvoiced parts. The voiced part is quasi-periodic with a time-varying fundamental frequency (or pitch as it is commonly referred to). We consider the periodic signals basically as the sum of harmonics. Therefore, we can pass the noisy signals through bandpass filters centered at the frequencies of the harmonics to enhance the signal. In addition, although the frequencies of the harmonics are the ...

Karimian-Azari, Sam — Aalborg University


Implementation of the radiation characteristics of musical instruments in wave field synthesis applications

In this thesis a method to implement the radiation characteristics of musical instruments in wave field synthesis systems is developed. It is applied and tested in two loudspeaker systems. Because the loudspeaker systems have a comparatively low number of loudspeakers, the wave field is synthesized at discrete listening positions by solving a linear equation system. Thus, for every constellation of listening and source position all loudspeakers can be used for the synthesis. The calculations are done in the spectral domain, neglecting sound propagation velocity at first. This approach causes artefacts in the loudspeaker signals and synthesis errors in the listening area which are compensated by means of psychoacoustic methods. With these methods the aliasing frequency is determined by the extent of the listening area whereas in other wave field synthesis systems it is determined by the distance of adjacent loudspeakers. Musical ...

Ziemer, Tim — University of Hamburg


Mixed structural models for 3D audio in virtual environments

In the world of information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting, e.g., on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


A multimicrophone approach to speech processing in a smart-room environment

Recent advances in computer technology and speech and language processing have made new forms of person-machine communication and computer assistance to human activities appear feasible. In particular, interest in developing new challenging applications in indoor environments equipped with multiple multimodal sensors, also known as smart-rooms, has grown considerably. In general, it is well known that the quality of speech signals captured by microphones that can be located several meters away from the speakers is severely degraded by acoustic noise and room reverberation. In the context of the development of hands-free speech applications in smart-room environments, the use of obtrusive sensors like close-talking microphones is usually not allowed, and consequently, speech technologies must operate on the basis of distant-talking recordings. In such conditions, speech technologies that usually perform reasonably well in noise-free and ...

Abad, Alberto — Universitat Politècnica de Catalunya


Application of Sound Source Separation Methods to Advanced Spatial Audio Systems

This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats, such as WFS. This is due to the fact that WFS needs the original source signals to be available, in order to accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mixing is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to ...

Cobos, Maximo — Universidad Politécnica de Valencia


Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple-Output Systems in the Wave Domain

Recently emerging techniques like wave field synthesis (WFS) or Higher-Order Ambisonics (HOA) allow for high-quality spatial audio reproduction, which makes them candidates for the audio reproduction in future telepresence systems or interactive gaming environments with acoustic human-machine interfaces. In such scenarios, acoustic echo cancellation (AEC) will generally be necessary to remove the loudspeaker echoes in the recorded microphone signals before further processing. Moreover, the reproduction quality of WFS or HOA can be improved by adaptive pre-equalization of the loudspeaker signals, as facilitated by listening room equalization (LRE). However, AEC and LRE require adaptive filters, where the large number of reproduction channels of WFS and HOA implies major computational and algorithmic challenges for the implementation of adaptive filters. A technique called wave-domain adaptive filtering (WDAF) promises to master these challenges. However, known literature is still far away from providing sufficient insight ...

Schneider, Martin — Friedrich-Alexander-University Erlangen-Nuremberg


Array Signal Processing Algorithms for Beamforming and Direction Finding

Array processing is an area of study devoted to processing the signals received from an antenna array and extracting information of interest. It has played an important role in widespread applications like radar, sonar, and wireless communications. Numerous adaptive array processing algorithms have been reported in the literature in the last several decades. These algorithms, in a general view, exhibit a trade-off between performance and required computational complexity. In this thesis, we focus on the development of array processing algorithms in the application of beamforming and direction of arrival (DOA) estimation. In the beamformer design, we employ the constrained minimum variance (CMV) and the constrained constant modulus (CCM) criteria to propose full-rank and reduced-rank adaptive algorithms. Specifically, for the full-rank algorithms, we present two low-complexity adaptive step size mechanisms with the CCM criterion for the step size adaptation of the ...

Wang, Lei — University of York


Distributed Signal Processing Algorithms for Multi-Task Wireless Acoustic Sensor Networks

Recent technological advances in analogue and digital electronics as well as in hardware miniaturization have taken wireless sensing devices to another level by introducing low-power communication protocols, improved digital signal processing capabilities and compact sensors. When these devices perform a certain pre-defined signal processing task such as the estimation or detection of phenomena of interest, a cooperative scheme through wireless connections can significantly enhance the overall performance, especially in adverse conditions. The resulting network consisting of such connected devices (or nodes) is referred to as a wireless sensor network (WSN). In acoustical applications (e.g., speech enhancement), a variant of WSNs, called wireless acoustic sensor networks (WASNs), can be employed in which the sensing unit at each node consists of a single microphone or a microphone array. The nodes of such a WASN can then cooperate to perform a multi-channel acoustic ...

Hassani, Amin — KU Leuven


Embedded Optimization Algorithms for Perceptual Enhancement of Audio Signals

This thesis investigates the design and evaluation of an embedded optimization framework for the perceptual enhancement of audio signals which are degraded by linear and/or nonlinear distortion. In general, audio signal enhancement has the goal of improving the perceived audio quality, speech intelligibility, or another desired perceptual attribute of the distorted audio signal by applying a real-time digital signal processing algorithm. In the designed embedded optimization framework, the audio signal enhancement problem under consideration is formulated and solved as a per-frame numerical optimization problem, making it possible to compute the enhanced audio signal frame that is optimal according to a desired perceptual attribute. The first stage of the embedded optimization framework consists in the formulation of the per-frame optimization problem aimed at maximally enhancing the desired perceptual attribute, by explicitly incorporating a suitable model of human sound perception. The second stage of ...

Defraene, Bruno — KU Leuven


Post-Filter Optimization for Multichannel Automotive Speech Enhancement

In an automotive environment, the quality of speech communication using hands-free equipment is often deteriorated by interfering car noise. In order to preserve the speech signal without car noise, a multichannel speech enhancement system including a beamformer and a post-filter can be applied. Since employing a beamformer alone is insufficient to substantially reduce the level of car noise, a post-filter has to be applied to provide further noise reduction, especially at low frequencies. In this thesis, two novel post-filter designs along with their optimization for different driving conditions are presented. The first post-filter design utilizes an adaptive smoothing factor for the power spectral density estimation as well as a hybrid noise coherence function. The hybrid noise coherence function is a mixture of the diffuse and the measured noise coherence functions for a specific driving condition. The second post-filter design applies ...

Yu, Huajun — Technische Universität Braunschweig


Integrating monaural and binaural cues for sound localization and segregation in reverberant environments

The problem of segregating a sound source of interest from an acoustic background has been extensively studied due to applications in hearing prostheses, robust speech/speaker recognition and audio information retrieval. Computational auditory scene analysis (CASA) approaches the segregation problem by utilizing grouping cues involved in the perceptual organization of sound by human listeners. Binaural processing, where input signals resemble those that enter the two ears, is of particular interest in the CASA field. The dominant approach to binaural segregation has been to derive spatially selective filters in order to enhance the signal in a direction of interest. As such, the problems of sound localization and sound segregation are closely tied. While spatial filtering has been widely utilized, substantial performance degradation is incurred in reverberant environments and more fundamentally, segregation cannot be performed without sufficient spatial separation between sources. This dissertation ...

Woodruff, John — The Ohio State University


Exploiting Prior Information in Parametric Estimation Problems for Multi-Channel Signal Processing Applications

This thesis addresses a number of problems all related to parameter estimation in sensor array processing. The unifying theme is that some of these parameters are known before the measurements are acquired. We thus study how to improve the estimation of the unknown parameters by incorporating the knowledge of the known parameters; exploiting this knowledge successfully has the potential to dramatically improve the accuracy of the estimates. For covariance matrix estimation, we exploit that the true covariance matrix is Kronecker and Toeplitz structured. We then devise a method to ensure that the estimates possess this structure. Additionally, we can show that our proposed estimator has better performance than the state of the art when the number of samples is low, and that it is also efficient in the sense that the estimates have Cramér-Rao lower bound (CRB) equivalent variance. In the direction of ...

Wirfält, Petter — KTH Royal Institute of Technology
