Speech derereverberation in noisy environments using time-frequency domain signal models

Reverberation is the sum of reflected sound waves and is present in any conventional room. Speech communication devices such as mobile phones in hands-free mode, tablets, smart TVs, teleconferencing systems, hearing aids, voice-controlled systems, etc. use one or more microphones to pick up the desired speech signals. When the microphones are not in the proximity of the desired source, strong reverberation and noise can degrade the signal quality at the microphones and can impair the intelligibility and the performance of automatic speech recognizers. Therefore, it is a highly demanded task to process the microphone signals such that reverberation and noise are reduced. The process of reducing or removing reverberation from recorded signals is called dereverberation. As dereverberation is usually a completely blind problem, where the only available information are the microphone signals, and as the acoustic scenario can be non-stationary, ...

Braun, Sebastian — Friedrich-Alexander Universität Erlangen-Nürnberg


Acoustic echo reduction for multiple loudspeakers and microphones: Complexity reduction and convergence enhancement

Modern devices such as mobile phones, tablets or smart speakers are commonly equipped with several loudspeakers and microphones. If, for instance, one employs such a device for hands-free communication applications, the signals that are reproduced by the loudspeakers are propagated through the room and are inevitably acquired by the microphones. If no processing is applied, the participants in the far-end room receive delayed reverberated replicas of their own voice, which strongly degrades both speech intelligibility and user comfort. In order to prevent that so-called acoustic echoes are transmitted back to the far-end room, acoustic echo cancelers are commonly employed. The latter make use of adaptive filtering techniques to identify the propagation paths between loudspeakers and microphones. The estimated propagation paths are then employed to compute acoustic echo estimates, which are finally subtracted from the signals acquired by the microphones. In ...

Luis Valero, Maria — International Audio Laboratories Erlangen


Adaptive filtering algorithms for acoustic echo cancellation and acoustic feedback control in speech communication applications

Multimedia consumer electronics are nowadays everywhere from teleconferencing, hands-free communications, in-car communications to smart TV applications and more. We are living in a world of telecommunication where ideal scenarios for implementing these applications are hard to find. Instead, practical implementations typically bring many problems associated to each real-life scenario. This thesis mainly focuses on two of these problems, namely, acoustic echo and acoustic feedback. On the one hand, acoustic echo cancellation (AEC) is widely used in mobile and hands-free telephony where the existence of echoes degrades the intelligibility and listening comfort. On the other hand, acoustic feedback limits the maximum amplification that can be applied in, e.g., in-car communications or in conferencing systems, before howling due to instability, appears. Even though AEC and acoustic feedback cancellation (AFC) are functional in many applications, there are still open issues. This means that ...

Gil-Cacho, Jose Manuel — KU Leuven


Cognitive Models for Acoustic and Audiovisual Sound Source Localization

Sound source localization algorithms have a long research history in the field of digital signal processing. Many common applications like intelligent personal assistants, teleconferencing systems and methods for technical diagnosis in acoustics require an accurate localization of sound sources in the environment. However, dynamic environments entail a particular challenge for these systems. For instance, voice controlled smart home applications, where the speaker, as well as potential noise sources, are moving within the room, are a typical example of dynamic environments. Classical sound source localization systems only have limited capabilities to deal with dynamic acoustic scenarios. In this thesis, three novel approaches to sound source localization that extend existing classical methods will be presented. The first system is proposed in the context of audiovisual source localization. Determining the position of sound sources in adverse acoustic conditions can be improved by including ...

Schymura, Christopher — Ruhr University Bochum


Flexible Multi-Microphone Acquisition and Processing of Spatial Sound Using Parametric Sound Field Representations

This thesis deals with the efficient and flexible acquisition and processing of spatial sound using multiple microphones. In spatial sound acquisition and processing, we use multiple microphones to capture the sound of multiple sources being simultaneously active at a rever- berant recording side and process the sound depending on the application at the application side. Typical applications include source extraction, immersive spatial sound reproduction, or speech enhancement. A flexible sound acquisition and processing means that we can capture the sound with almost arbitrary microphone configurations without constraining the application at the ap- plication side. This means that we can realize and adjust the different applications indepen- dently of the microphone configuration used at the recording side. For example in spatial sound reproduction, where we aim at reproducing the sound such that the listener perceives the same impression as if he ...

Thiergart, Oliver — Friedrich-Alexander-Universitat Erlangen-Nurnberg


Mixed structural models for 3D audio in virtual environments

In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


Cognitive-driven speech enhancement using EEG-based auditory attention decoding for hearing aid applications

Identifying the target speaker in hearing aid applications is an essential ingredient to improve speech intelligibility. Although several speech enhancement algorithms are available to reduce background noise or to perform source separation in multi-speaker scenarios, their performance depends on correctly identifying the target speaker to be enhanced. Recent advances in electroencephalography (EEG) have shown that it is possible to identify the target speaker which the listener is attending to using single-trial EEG-based auditory attention decoding (AAD) methods. However, in realistic acoustic environments the AAD performance is influenced by undesired disturbances such as interfering speakers, noise and reverberation. In addition, it is important for real-world hearing aid applications to close the AAD loop by presenting on-line auditory feedback. This thesis deals with the problem of identifying and enhancing the target speaker in realistic acoustic environments based on decoding the auditory attention ...

Aroudi, Ali — University of Oldenburg, Germany


Feedback Delay Networks in Artificial Reverberation and Reverberation Enhancement

In today's audio production and reproduction as well as in music performance practices it has become common practice to alter reverberation artificially through electronics or electro-acoustics. For music productions, radio plays, and movie soundtracks, the sound is often captured in small studio spaces with little to no reverberation to save real estate and to ensure a controlled environment such that the artistically intended spatial impression can be added during post-production. Spatial sound reproduction systems require flexible adjustment of artificial reverberation to the diffuse sound portion to help the reconstruction of the spatial impression. Many modern performance spaces are multi-purpose, and the reverberation needs to be adjustable to the desired performance style. Employing electro-acoustic feedback, also known as Reverberation Enhancement Systems (RESs), it is possible to extend the physical to the desired reverberation. These examples demonstrate a wide range of applications ...

Schlecht, Sebastian Jiro — Friedrich-Alexander-Universität Erlangen-Nürnberg


Performance Improvement of Multichannel Audio by Graphics Processing Units

Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, what is considered as Immersive Audio Schemes. In this phenomenon, several acoustic effects are involved: 3D spatial sound, room compensation, crosstalk cancelation, sound source localization, among others. However, high computing capacity is required to achieve any of these effects in a real large-scale system, what represents a considerable limitation for real-time applications. The increase of the computational capacity has been historically linked to the number of transistors in a chip. However, nowadays the improvements in the computational capacity are mainly given by increasing the number of processing units, i.e expanding parallelism in computing. This is the case of the Graphics Processing Units ...

Belloch, Jose A. — Universitat Politècnica de València


Solving inverse problems in room acoustics using physical models, sparse regularization and numerical optimization

Reverberation consists of a complex acoustic phenomenon that occurs inside rooms. Many audio signal processing methods, addressing source localization, signal enhancement and other tasks, often assume absence of reverberation. Consequently, reverberant environments are considered challenging as state-ofthe-art methods can perform poorly. The acoustics of a room can be described using a variety of mathematical models, among which, physical models are the most complete and accurate. The use of physical models in audio signal processing methods is often non-trivial since it can lead to ill-posed inverse problems. These inverse problems require proper regularization to achieve meaningful results and involve the solution of computationally intensive large-scale optimization problems. Recently, however, sparse regularization has been applied successfully to inverse problems arising in different scientific areas. The increased computational power of modern computers and the development of new efficient optimization algorithms makes it possible ...

Antonello, Niccolò — KU Leuven


Parametric spatial audio processing utilising compact microphone arrays

This dissertation focuses on the development of novel parametric spatial audio techniques using compact microphone arrays. Compact arrays are of special interest since they can be adapted to fit in portable devices, opening the possibility of exploiting the potential of immersive spatial audio algorithms in our daily lives. The techniques developed in this thesis consider the use of signal processing algorithms adapted for human listeners, thus exploiting the capabilities and limitations of human spatial hearing. The findings of this research are in the following three areas of spatial audio processing: directional filtering, spatial audio reproduction, and direction of arrival estimation. In directional filtering, two novel algorithms have been developed based on the cross-pattern coherence (CroPaC). The method essentially exploits the directional response of two different types of beamformers by using their cross-spectrum to estimate a soft masker. The soft masker ...

Delikaris-Manias, Symeon — Aalto University


On Adaptive Filtering in Oversampled Subbands

For a number of applications like acoustic echo cancellation, adaptive filters are required to identify very long impulse responses. To reduce the computational cost in implementations, adaptive filtering in subband is known to be beneficial. Based on a review of popular fullband adaptive filtering algorithms and various subband approaches, this thesis investigates the implementation, design, and limitations of oversampled subband adaptive filter systems based on modulated complex and real valued filter banks. The main aim is to achieve a computationally efficient implementation for adaptive filter systems, for which fast methods of performing both the subband decomposition and the subband processing are researched. Therefore, a highly efficient polyphase implementation of a complex valued modulated generalized DFT (GDFT) lter bank with a judicious selection of properties for non-integer oversampling ratios is introduced. By modification, a real valued single sideband modulated lter bank ...

Weiss, Stephan — University of Strathclyde


Efficient parametric modeling, identification and equalization of room acoustics

Room acoustic signal enhancement (RASE) applications, such as digital equalization, acoustic echo and feedback cancellation, which are commonly found in communication devices and audio equipment, aim at processing the acoustic signals with the final goal of improving the perceived sound quality in rooms. In order to do so, signal processing algorithms require the acoustic response of the room to be represented by means of parametric models and to be identified from the input and output signals of the room acoustic system. In particular, a good model should be both accurate, thus capturing those features of room acoustics that are physically and perceptually most relevant, and efficient, so that it can be implemented as a digital filter and used in practical signal processing tasks. This thesis addresses the fundamental question in room acoustic signal processing concerning the appropriateness of different parametric ...

Vairetti, Giacomo — KU Leuven


Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple-Output Systems in the Wave Domain

Recently emerging techniques like wave field synthesis (WFS) or Higher-Order Ambisonics (HOA) allow for high-quality spatial audio reproduction, which makes them candidates for the audio reproduction in future telepresence systems or interactive gaming environments with acoustic human-machine interfaces. In such scenarios, acoustic echo cancellation (AEC) will generally be necessary to remove the loudspeaker echoes in the recorded microphone signals before further processing. Moreover, the reproduction quality of WFS or HOA can be improved by adaptive pre-equalization of the loudspeaker signals, as facilitated by listening room equalization (LRE). However, AEC and LRE require adaptive filters, where the large number of reproduction channels of WFS and HOA imply major computational and algorithmic challenges for the implementation of adaptive filters. A technique called wave-domain adaptive filtering (WDAF) promises to master these challenges. However, known literature is still far away from providing sufficient insight ...

Schneider, Martin — Friedrich-Alexander-University Erlangen-Nuremberg


Integrating monaural and binaural cues for sound localization and segregation in reverberant environments

The problem of segregating a sound source of interest from an acoustic background has been extensively studied due to applications in hearing prostheses, robust speech/speaker recognition and audio information retrieval. Computational auditory scene analysis (CASA) approaches the segregation problem by utilizing grouping cues involved in the perceptual organization of sound by human listeners. Binaural processing, where input signals resemble those that enter the two ears, is of particular interest in the CASA field. The dominant approach to binaural segregation has been to derive spatially selective filters in order to enhance the signal in a direction of interest. As such, the problems of sound localization and sound segregation are closely tied. While spatial filtering has been widely utilized, substantial performance degradation is incurred in reverberant environments and more fundamentally, segregation cannot be performed without sufficient spatial separation between sources. This dissertation ...

Woodruff, John — The Ohio State University

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.