Speech dereverberation in noisy environments using time-frequency domain signal models

Reverberation is the sum of reflected sound waves and is present in any conventional room. Speech communication devices such as mobile phones in hands-free mode, tablets, smart TVs, teleconferencing systems, hearing aids, and voice-controlled systems use one or more microphones to pick up the desired speech signals. When the microphones are not close to the desired source, strong reverberation and noise can degrade the signal quality at the microphones and can impair intelligibility and the performance of automatic speech recognizers. Processing the microphone signals to reduce reverberation and noise is therefore in high demand. The process of reducing or removing reverberation from recorded signals is called dereverberation. As dereverberation is usually a completely blind problem, where the only available information is the microphone signals, and as the acoustic scenario can be non-stationary, ...

Braun, Sebastian — Friedrich-Alexander Universität Erlangen-Nürnberg


Solving inverse problems in room acoustics using physical models, sparse regularization and numerical optimization

Reverberation is a complex acoustic phenomenon that occurs inside rooms. Many audio signal processing methods for source localization, signal enhancement and other tasks assume the absence of reverberation. Consequently, reverberant environments are considered challenging, as state-of-the-art methods can perform poorly. The acoustics of a room can be described using a variety of mathematical models, among which physical models are the most complete and accurate. The use of physical models in audio signal processing methods is often non-trivial since it can lead to ill-posed inverse problems. These inverse problems require proper regularization to achieve meaningful results and involve the solution of computationally intensive large-scale optimization problems. Recently, however, sparse regularization has been applied successfully to inverse problems arising in different scientific areas. The increased computational power of modern computers and the development of new efficient optimization algorithms make it possible ...

Antonello, Niccolò — KU Leuven


Feedback Delay Networks in Artificial Reverberation and Reverberation Enhancement

In today's audio production and reproduction as well as in music performance practices it has become common practice to alter reverberation artificially through electronics or electro-acoustics. For music productions, radio plays, and movie soundtracks, the sound is often captured in small studio spaces with little to no reverberation to save real estate and to ensure a controlled environment such that the artistically intended spatial impression can be added during post-production. Spatial sound reproduction systems require flexible adjustment of artificial reverberation to the diffuse sound portion to help the reconstruction of the spatial impression. Many modern performance spaces are multi-purpose, and the reverberation needs to be adjustable to the desired performance style. Employing electro-acoustic feedback, also known as Reverberation Enhancement Systems (RESs), it is possible to extend the physical to the desired reverberation. These examples demonstrate a wide range of applications ...

Schlecht, Sebastian Jiro — Friedrich-Alexander-Universität Erlangen-Nürnberg


Informed spatial filters for speech enhancement

In modern devices which provide hands-free speech capturing functionality, such as hands-free communication kits and voice-controlled devices, the received speech signal at the microphones is corrupted by background noise, interfering speech signals, and room reverberation. In many practical situations, the microphones are not necessarily located near the desired source, and hence, the ratio of the desired speech power to the power of the background noise, the interfering speech, and the reverberation at the microphones can be very low, often around or even below 0 dB. In such situations, the comfort of human-to-human communication, as well as the accuracy of automatic speech recognisers for voice-controlled applications can be significantly degraded. Therefore, effective speech enhancement algorithms are required to process the microphone signals before transmitting them to the far-end side for communication, or before feeding them into a speech recognition ...

Taseska, Maja — Friedrich-Alexander Universität Erlangen-Nürnberg


Performance Improvement of Multichannel Audio by Graphics Processing Units

Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, which is referred to as Immersive Audio Schemes. Several acoustic effects are involved in such schemes: 3D spatial sound, room compensation, crosstalk cancelation, and sound source localization, among others. However, high computing capacity is required to achieve any of these effects in a real large-scale system, which represents a considerable limitation for real-time applications. The increase in computational capacity has historically been linked to the number of transistors in a chip. Nowadays, however, improvements in computational capacity come mainly from increasing the number of processing units, i.e., expanding parallelism in computing. This is the case of the Graphics Processing Units ...

Belloch, Jose A. — Universitat Politècnica de València


Mixed structural models for 3D audio in virtual environments

In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


Spatio-Temporal Speech Enhancement in Adverse Acoustic Conditions

Never before has speech been captured as often by electronic devices equipped with one or multiple microphones, serving a variety of applications. It is the key aspect in digital telephony, hearing devices, and voice-driven human-to-machine interaction. When speech is recorded, the microphones also capture a variety of further, undesired sound components due to adverse acoustic conditions. Interfering speech, background noise and reverberation, i.e. the persistence of sound in a room after excitation caused by a multitude of reflections on the room enclosure, are detrimental to the quality and intelligibility of target speech as well as the performance of automatic speech recognition. Hence, speech enhancement aiming at estimating the early target-speech component, which contains the direct component and early reflections, is crucial to nearly all speech-related applications presently available. In this thesis, we compare, propose and evaluate existing and novel approaches ...

Dietzen, Thomas — KU Leuven


Efficient parametric modeling, identification and equalization of room acoustics

Room acoustic signal enhancement (RASE) applications, such as digital equalization, acoustic echo and feedback cancellation, which are commonly found in communication devices and audio equipment, aim at processing the acoustic signals with the final goal of improving the perceived sound quality in rooms. In order to do so, signal processing algorithms require the acoustic response of the room to be represented by means of parametric models and to be identified from the input and output signals of the room acoustic system. In particular, a good model should be both accurate, thus capturing those features of room acoustics that are physically and perceptually most relevant, and efficient, so that it can be implemented as a digital filter and used in practical signal processing tasks. This thesis addresses the fundamental question in room acoustic signal processing concerning the appropriateness of different parametric ...

Vairetti, Giacomo — KU Leuven


Application of Sound Source Separation Methods to Advanced Spatial Audio Systems

This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats, such as WFS. This is due to the fact that WFS needs the original source signals to be available, in order to accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mixing is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to ...

Cobos, Maximo — Universidad Politécnica de Valencia


Acoustic sensor network geometry calibration and applications

In the modern world, we are increasingly surrounded by computation devices with communication links and one or more microphones. Such devices are, for example, smartphones, tablets, laptops or hearing aids. These devices can work together as nodes in an acoustic sensor network (ASN). Such networks are a growing platform that opens the possibility for many practical applications. ASN-based speech enhancement, source localization, and event detection can be applied to teleconferencing, camera control, automation, or assisted living. For these kinds of applications, the awareness of auditory objects and their spatial positioning are key properties. In order to provide these two kinds of information, novel methods have been developed in this thesis. Information on the type of auditory objects is provided by a novel real-time sound classification method. Information on the position of human speakers is provided by a novel localization ...

Plinge, Axel — TU Dortmund University


Acoustic echo reduction for multiple loudspeakers and microphones: Complexity reduction and convergence enhancement

Modern devices such as mobile phones, tablets or smart speakers are commonly equipped with several loudspeakers and microphones. If, for instance, one employs such a device for hands-free communication applications, the signals that are reproduced by the loudspeakers are propagated through the room and are inevitably acquired by the microphones. If no processing is applied, the participants in the far-end room receive delayed reverberated replicas of their own voice, which strongly degrades both speech intelligibility and user comfort. To prevent these so-called acoustic echoes from being transmitted back to the far-end room, acoustic echo cancelers are commonly employed. The latter make use of adaptive filtering techniques to identify the propagation paths between loudspeakers and microphones. The estimated propagation paths are then employed to compute acoustic echo estimates, which are finally subtracted from the signals acquired by the microphones. In ...

Luis Valero, Maria — International Audio Laboratories Erlangen


Robust Direction-of-Arrival estimation and spatial filtering in noisy and reverberant environments

The advent of multi-microphone setups on a plethora of commercial devices in recent years has generated a newfound interest in the development of robust microphone array signal processing methods. These methods are generally used to either estimate parameters associated with the acoustic scene or to extract signal(s) of interest. In most practical scenarios, the sources are located in the far-field of a microphone array, where the main spatial information of interest is the direction-of-arrival (DOA) of the plane waves originating from the source positions. The focus of this thesis is to incorporate robustness against either lack of or imperfect/erroneous information regarding the DOAs of the sound sources within a microphone array signal processing framework. The DOAs of sound sources are by themselves important information; however, they are most often used as parameters for a subsequent processing method. One of the ...

Chakrabarty, Soumitro — Friedrich-Alexander Universität Erlangen-Nürnberg


Flexible Multi-Microphone Acquisition and Processing of Spatial Sound Using Parametric Sound Field Representations

This thesis deals with the efficient and flexible acquisition and processing of spatial sound using multiple microphones. In spatial sound acquisition and processing, we use multiple microphones to capture the sound of multiple sources being simultaneously active at a reverberant recording side and process the sound depending on the application at the application side. Typical applications include source extraction, immersive spatial sound reproduction, or speech enhancement. A flexible sound acquisition and processing means that we can capture the sound with almost arbitrary microphone configurations without constraining the application at the application side. This means that we can realize and adjust the different applications independently of the microphone configuration used at the recording side. For example in spatial sound reproduction, where we aim at reproducing the sound such that the listener perceives the same impression as if he ...

Thiergart, Oliver — Friedrich-Alexander-Universität Erlangen-Nürnberg


Multi-microphone noise reduction and dereverberation techniques for speech applications

In typical speech communication applications, such as hands-free mobile telephony, voice-controlled systems and hearing aids, the recorded microphone signals are corrupted by background noise, room reverberation and far-end echo signals. This signal degradation can lead to total unintelligibility of the speech signal and decreases the performance of automatic speech recognition systems. In this thesis several multi-microphone noise reduction and dereverberation techniques are developed. In Part I we present a Generalised Singular Value Decomposition (GSVD) based optimal filtering technique for enhancing multi-microphone speech signals which are degraded by additive coloured noise. Several techniques are presented for reducing the computational complexity and we show that the GSVD-based optimal filtering technique can be integrated into a 'Generalised Sidelobe Canceller' type structure. Simulations show that the GSVD-based optimal filtering technique achieves a larger signal-to-noise ratio improvement than standard fixed and adaptive beamforming techniques and ...

Doclo, Simon — Katholieke Universiteit Leuven


Acoustic Event Detection: Feature, Evaluation and Dataset Design

It takes more time to think of a silent scene, action or event than to find one that emanates sound. Not only speaking or playing music but almost everything that happens is accompanied by or results in one or more sounds mixed together. This makes acoustic event detection (AED) one of the most researched topics in audio signal processing nowadays, and it will probably not see a decline in the near future. This is due to the thirst for understanding and digitally abstracting more and more events in life via the enormous amount of audio recorded through thousands of applications in our daily routine. But it is also a result of two intrinsic properties of audio: it doesn’t need a direct line of sight to be perceived and is less intrusive to record when compared to image or video. Many applications such ...

Mina Mounir — KU Leuven, ESAT STADIUS
