Non-linear Spatial Filtering for Multi-channel Speech Enhancement

A large part of human speech communication takes place in noisy environments and is supported by technical devices. For example, a hearing-impaired person might use a hearing aid to take part in a conversation in a busy restaurant. These devices, as well as telecommunication systems in noisy environments and voice-controlled assistants, make use of speech enhancement and separation algorithms that improve the quality and intelligibility of speech by separating speakers and suppressing background noise as well as other unwanted effects such as reverberation. If the devices are equipped with more than one microphone, which is very common nowadays, then multi-channel speech enhancement approaches can leverage spatial information in addition to single-channel tempo-spectral information to perform the task. Traditionally, linear spatial filters, so-called beamformers, have been employed to suppress signal components arriving from directions other than the target direction and thereby enhance the desired ...
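
As a minimal illustration of the linear spatial filtering that the abstract contrasts with its non-linear counterparts, the sketch below implements a frequency-domain delay-and-sum beamformer under a far-field plane-wave assumption. All function and parameter names are illustrative, not taken from the thesis.

```python
import numpy as np

def delay_and_sum(x, fs, mic_positions, doa_deg, c=343.0):
    """Frequency-domain delay-and-sum beamformer for a linear array.

    x: (n_mics, n_samples) microphone signals
    mic_positions: (n_mics,) mic coordinates along the array axis in metres
    doa_deg: target direction of arrival, measured from broadside
    """
    n_mics, n_samples = x.shape
    # Far-field plane-wave delays relative to the array origin
    delays = mic_positions * np.sin(np.deg2rad(doa_deg)) / c
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    X = np.fft.rfft(x, axis=1)
    # Phase shifts that time-align the channels for the target direction
    # (the exponent's sign depends on the chosen propagation convention)
    steering = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(np.mean(X * steering, axis=0), n=n_samples)
```

Steering the array this way time-aligns the channels so that the target adds coherently while sound from other directions adds incoherently and is attenuated.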

Tesch, Kristina — Universität Hamburg


Speech derereverberation in noisy environments using time-frequency domain signal models

Reverberation is the sum of reflected sound waves and is present in any conventional room. Speech communication devices such as mobile phones in hands-free mode, tablets, smart TVs, teleconferencing systems, hearing aids, voice-controlled systems, etc. use one or more microphones to pick up the desired speech signals. When the microphones are not in the proximity of the desired source, strong reverberation and noise can degrade the signal quality at the microphones and can impair the intelligibility and the performance of automatic speech recognizers. Therefore, there is a strong demand for processing the microphone signals such that reverberation and noise are reduced. The process of reducing or removing reverberation from recorded signals is called dereverberation. As dereverberation is usually a completely blind problem, where the only available information is the microphone signals, and as the acoustic scenario can be non-stationary, ...

Braun, Sebastian — Friedrich-Alexander Universität Erlangen-Nürnberg


Deep Learning for Distant Speech Recognition

Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, a natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. These disturbances severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses this scenario and proposes some novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. ...

Ravanelli, Mirco — Fondazione Bruno Kessler


Robust Direction-of-Arrival estimation and spatial filtering in noisy and reverberant environments

The advent of multi-microphone setups on a plethora of commercial devices in recent years has generated a newfound interest in the development of robust microphone array signal processing methods. These methods are generally used either to estimate parameters associated with the acoustic scene or to extract signal(s) of interest. In most practical scenarios, the sources are located in the far-field of a microphone array, where the main spatial information of interest is the direction-of-arrival (DOA) of the plane waves originating from the source positions. The focus of this thesis is to incorporate robustness against either a lack of or imperfect/erroneous information regarding the DOAs of the sound sources within a microphone array signal processing framework. The DOAs of the sound sources are important information in themselves; however, they are most often used as parameters for a subsequent processing method. One of the ...
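
As a small, self-contained example of far-field DOA estimation (a generic GCC-PHAT sketch, not the method proposed in the thesis), consider a two-microphone pair: the phase-transform-weighted cross-correlation yields the time difference of arrival, which maps to an angle under the plane-wave assumption. All names are illustrative.

```python
import numpy as np

def gcc_phat_doa(x1, x2, fs, mic_dist, c=343.0):
    """Estimate the far-field DOA of a source from a two-mic pair via GCC-PHAT."""
    n = len(x1) + len(x2)                   # zero-pad to avoid circular wrap
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_lag = int(np.ceil(mic_dist / c * fs))
    # Gather lags -max_lag .. +max_lag around zero
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lag = np.argmax(cc) - max_lag           # negative lag: mic 1 leads
    tdoa = lag / fs
    return np.degrees(np.arcsin(np.clip(tdoa * c / mic_dist, -1.0, 1.0)))
```

The PHAT weighting whitens the spectrum so that the correlation peak stays sharp for broadband speech even under moderate reverberation, which is one reason GCC-PHAT is a common baseline.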

Chakrabarty, Soumitro — Friedrich-Alexander Universität Erlangen-Nürnberg


Wireless Localization via Learned Channel Features in Massive MIMO Systems

Future wireless networks will evolve to integrate communication, localization, and sensing capabilities. This evolution is driven by emerging application platforms such as digital twins, on the one hand, and advancements in wireless technologies, on the other, characterized by increased bandwidths, more antennas, and enhanced computational power. Crucial to this development is the application of artificial intelligence (AI), which is set to harness the vast amounts of available data in the sixth generation (6G) of mobile networks and beyond. Integrating AI and machine learning (ML) algorithms, in particular, with wireless localization offers substantial opportunities to refine communication systems, improve the ability of wireless networks to locate users precisely, enable context-aware transmission, and utilize processing and energy resources more efficiently. In this dissertation, advanced ML algorithms for enhanced wireless localization are proposed. Motivated by the capabilities of deep neural networks (DNNs) and ...

Salihu, Artan — TU Wien


Sparse Multi-Channel Linear Prediction for Blind Speech Dereverberation

In many speech communication applications, such as hands-free telephony and hearing aids, the microphones are located at a distance from the speaker. Therefore, in addition to the desired speech signal, the microphone signals typically contain undesired reverberation and noise, caused by acoustic reflections and undesired sound sources. Since these disturbances tend to degrade the quality of speech communication, decrease speech intelligibility and negatively affect speech recognition, efficient dereverberation and denoising methods are required. This thesis deals with blind dereverberation methods, not requiring any knowledge about the room impulse responses between the speaker and the microphones. More specifically, we propose a general framework for blind speech dereverberation based on multi-channel linear prediction (MCLP) and exploiting sparsity of the speech signal in the time-frequency domain.
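
A minimal sketch of MCLP-based dereverberation in a single STFT frequency bin, with sparsity of the desired signal promoted via iteratively reweighted least squares, is given below. This is in the spirit of the framework described above, not its exact algorithm; the function name, the prediction delay, and the filter order are illustrative defaults.

```python
import numpy as np

def mclp_dereverb(Y, delay=2, order=8, n_iter=3, eps=1e-8):
    """Sparsity-promoting multi-channel linear prediction in one STFT bin.

    Y: (n_mics, n_frames) complex STFT coefficients of one frequency bin.
    Returns the dereverberated coefficients of the first channel: the late
    reverberation is predicted from delayed frames of all channels and
    subtracted; the per-frame weights 1/lambda promote a sparse output.
    """
    n_mics, n_frames = Y.shape
    # Stacked, delayed multi-channel observations used as the predictor
    G = np.zeros((n_mics * order, n_frames), dtype=complex)
    for m in range(n_mics):
        for k in range(order):
            shift = delay + k
            G[m * order + k, shift:] = Y[m, :n_frames - shift]
    d = Y[0].copy()
    for _ in range(n_iter):
        lam = np.maximum(np.abs(d) ** 2, eps)   # per-frame power weights
        Gw = G / lam                            # weight each frame (column)
        R = Gw @ G.conj().T                     # weighted correlation matrix
        p = Gw @ Y[0].conj()                    # weighted cross-correlation
        g = np.linalg.solve(R + eps * np.eye(n_mics * order), p)
        d = Y[0] - g.conj() @ G                 # subtract predicted late reverb
    return d
```

The prediction delay keeps the direct sound and early reflections out of the predictor, so only the late reverberant tail is cancelled, which is what distinguishes dereverberation by MCLP from plain linear prediction.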

Jukić, Ante — University of Oldenburg


Data-driven Speech Enhancement: from Non-negative Matrix Factorization to Deep Representation Learning

In natural listening environments, speech signals are easily distorted by various acoustic interference, which reduces speech quality and intelligibility for human listeners; meanwhile, it degrades the performance of many speech-related applications, such as automatic speech recognition (ASR). Thus, many speech enhancement (SE) algorithms have been developed in the past decades. However, most current SE algorithms struggle to capture underlying speech information (e.g., phonemes) in the SE process. This makes it challenging to know what specific information is lost or interfered with in the SE process, which limits the application of enhanced speech. For instance, some SE algorithms that aim to improve human listening often degrade the performance of ASR systems. The objective of this dissertation is to develop SE algorithms that have the potential to capture various underlying speech representations (information) and improve the quality and intelligibility of noisy speech. This ...
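
As a baseline illustration of the NMF starting point named in the title (a generic textbook sketch, not the dissertation's algorithm), the multiplicative-update rules for the Euclidean cost take only a few lines; all names and defaults here are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=300, eps=1e-10, seed=0):
    """Basic NMF with multiplicative updates for the Euclidean cost.

    V: non-negative matrix (e.g. a magnitude spectrogram), factorised as
    V ~= W @ H with non-negative W (spectral bases) and H (activations).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

For supervised SE, one would typically train separate speech and noise bases on clean material, fix them, factorise the noisy magnitude spectrogram over the concatenated bases, and reconstruct the speech component with a Wiener-like mask built from the two partial reconstructions.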

Xiang, Yang — Aalborg University, Capturi A/S


Non-intrusive Quality Evaluation of Speech Processed in Noisy and Reverberant Environments

In many speech applications such as hands-free telephony or voice-controlled home assistants, the distance between the user and the recording microphones can be relatively large. In such a far-field scenario, the recorded microphone signals are typically corrupted by noise and reverberation, which may severely degrade the performance of speech recognition systems and reduce intelligibility and quality of speech in communication applications. In order to limit these effects, speech enhancement algorithms are typically applied. The main objective of this thesis is to develop novel speech enhancement algorithms for noisy and reverberant environments and signal-based measures to evaluate these algorithms, focusing on solutions that are applicable in realistic scenarios. First, we propose a single-channel speech enhancement algorithm for joint noise and reverberation reduction. The proposed algorithm uses a spectral gain to enhance the input signal, where the gain is computed using a ...
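
The spectral-gain principle mentioned above can be sketched in a few lines. The Wiener-style gain below is a generic illustration (not the estimator proposed in the thesis); the gain floor and the crude a-priori SNR estimate are illustrative choices.

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, gain_floor=0.1):
    """Wiener-style spectral gain from noisy-speech and noise power estimates.

    The enhanced spectrum is gain * noisy_spectrum; a gain floor limits
    musical-noise artefacts at the cost of less suppression.
    """
    noise_power = np.maximum(noise_power, 1e-12)
    # Crude a-priori SNR via power subtraction (maximum-likelihood style)
    snr_prior = np.maximum(noisy_power / noise_power - 1.0, 0.0)
    gain = snr_prior / (snr_prior + 1.0)
    return np.maximum(gain, gain_floor)
```

In a complete system, the noise power would be tracked by a noise estimator and the gain applied per time-frequency bin before the inverse STFT.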

Cauchi, Benjamin — University of Oldenburg


Low-complexity acoustic echo cancellation and model-based residual echo suppression

Hands-free speech communication devices, typically equipped with multiple microphones and loudspeakers, are used for a wide variety of applications, such as teleconferencing, in-car communication and personal assistants. In addition to capturing the desired speech from the user, the microphones pick up undesired interferences such as background noise and acoustic echo due to the acoustic coupling between the loudspeakers and the microphones. These interferences typically degrade speech quality and intelligibility, and negatively affect the performance of automatic speech recognition systems. Acoustic echo control systems typically employ a combination of acoustic echo cancellation (AEC) and residual echo suppression (RES). An AEC system uses adaptive filters to compensate for the acoustic echo paths between the loudspeakers and the microphones. When short AEC filters are used to reduce computational complexity and increase convergence speed, this may lead to a significant amount of residual echo, ...
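
The AEC stage described above can be illustrated with a basic normalised least-mean-squares (NLMS) adaptive filter, the textbook workhorse for echo cancellation. This is a generic sketch with illustrative names and parameters, not the thesis's low-complexity design.

```python
import numpy as np

def nlms_aec(far_end, mic, filt_len=16, mu=0.5, eps=1e-6):
    """NLMS adaptive filter cancelling the acoustic echo of the far-end signal.

    Returns the error signal (residual echo plus any near-end speech) and
    the final filter estimate of the echo path.
    """
    w = np.zeros(filt_len)
    e = np.zeros(len(mic))
    x_buf = np.zeros(filt_len)            # most recent far-end samples
    for n in range(len(mic)):
        x_buf[1:] = x_buf[:-1]
        x_buf[0] = far_end[n]
        y_hat = w @ x_buf                 # echo estimate
        e[n] = mic[n] - y_hat             # cancel the estimated echo
        # Normalisation by input power gives step-size-invariant convergence
        w += mu * e[n] * x_buf / (x_buf @ x_buf + eps)
    return e, w
```

When the adaptive filter is shorter than the true echo path, as in the low-complexity setting the abstract describes, the uncancelled tail shows up in the error signal as residual echo, motivating the RES stage.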

Desiraju, Naveen Kumar — University of Oldenburg, Germany


Multi-microphone noise reduction and dereverberation techniques for speech applications

In typical speech communication applications, such as hands-free mobile telephony, voice-controlled systems and hearing aids, the recorded microphone signals are corrupted by background noise, room reverberation and far-end echo signals. This signal degradation can lead to total unintelligibility of the speech signal and decreases the performance of automatic speech recognition systems. In this thesis several multi-microphone noise reduction and dereverberation techniques are developed. In Part I we present a Generalised Singular Value Decomposition (GSVD) based optimal filtering technique for enhancing multi-microphone speech signals which are degraded by additive coloured noise. Several techniques are presented for reducing the computational complexity and we show that the GSVD-based optimal filtering technique can be integrated into a 'Generalised Sidelobe Canceller' type structure. Simulations show that the GSVD-based optimal filtering technique achieves a larger signal-to-noise ratio improvement than standard fixed and adaptive beamforming techniques and ...
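
The GSVD-based optimal filter itself is involved; a closely related construction, sketched below, is the max-SNR spatial filter obtained from the generalised eigendecomposition of the noisy-speech and noise covariance matrices. This is a simplified relative of the technique named in the abstract, not the thesis's algorithm, and the toy steering vector and variances are invented for illustration.

```python
import numpy as np
from scipy.linalg import eigh

def max_snr_filter(Rx, Rv):
    """Spatial filter maximising output SNR via the generalised
    eigendecomposition Rx w = lambda Rv w (Rx: noisy-speech covariance,
    Rv: noise covariance)."""
    eigvals, eigvecs = eigh(Rx, Rv)   # generalised eigenvalues, ascending
    return eigvecs[:, -1]             # principal generalised eigenvector

# Toy scenario: one source over a 3-mic array in spatially white noise
a = np.array([1.0, 0.8, 0.6])         # illustrative steering vector
Rs = 2.0 * np.outer(a, a)             # rank-one speech covariance
Rv = 0.1 * np.eye(3)                  # noise covariance
w = max_snr_filter(Rs + Rv, Rv)
snr_out = (w @ Rs @ w) / (w @ Rv @ w)
snr_mic0 = Rs[0, 0] / Rv[0, 0]        # SNR of the first microphone alone
```

With white noise the filter reduces to a matched spatial filter, so the output SNR grows with the squared norm of the steering vector relative to any single microphone.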

Doclo, Simon — Katholieke Universiteit Leuven


Dereverberation and noise reduction techniques based on acoustic multi-channel equalization

In many hands-free speech communication applications such as teleconferencing or voice-controlled applications, the recorded microphone signals contain not only the desired speech signal, but also attenuated and delayed copies of the desired speech signal due to reverberation, as well as additive background noise. Reverberation and background noise cause a signal degradation which can impair speech intelligibility and decrease the performance of many signal processing techniques. Acoustic multi-channel equalization techniques, which aim at inverting or reshaping the measured or estimated room impulse responses between the speech source and the microphone array, represent an attractive approach to speech dereverberation, since in theory perfect dereverberation can be achieved. However, in practice such techniques suffer from several drawbacks, such as uncontrolled perceptual effects, sensitivity to perturbations in the measured or estimated room impulse responses, and background noise amplification. The aim of this thesis ...
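
The theoretical result behind acoustic multi-channel equalization — with two or more channels whose impulse responses share no common zeros, exact inversion is possible (the multiple-input/output inverse theorem, MINT) — can be sketched with a plain least-squares design. This is an illustrative, unregularised formulation; the names and the delay parameter are invented, and it deliberately omits the robustness measures the thesis is about.

```python
import numpy as np

def conv_matrix(h, eq_len):
    """Convolution matrix: conv_matrix(h, L) @ g equals np.convolve(h, g)."""
    rows = len(h) + eq_len - 1
    C = np.zeros((rows, eq_len))
    for i in range(eq_len):
        C[i:i + len(h), i] = h
    return C

def mint_equalizer(impulse_responses, eq_len, delay=0):
    """Design equalizing filters so the overall multi-channel system
    collapses to a pure delay (MINT, least-squares formulation)."""
    H = np.hstack([conv_matrix(h, eq_len) for h in impulse_responses])
    d = np.zeros(H.shape[0])
    d[delay] = 1.0                       # target: unit impulse at 'delay'
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return np.split(g, len(impulse_responses))
```

The fragility the abstract mentions is visible in this formulation: small perturbations of the impulse responses change the matrix H, and the unregularised least-squares inverse can then deviate strongly from a pure delay.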

Kodrasi, Ina — University of Oldenburg


Perceptually Motivated Speech Enhancement

Speech Enhancement (SE) is a vital technology for online human communication. Applications of Deep Neural Network (DNN) technologies in concert with traditional signal processing approaches have revolutionised both the research and implementation of SE in recent years. However, the training objectives of these Neural Network Speech Enhancement (NNSE) systems generally do not consider the psychoacoustic processing which occurs in the human auditory system. As a result, enhanced audio can often contain auditory artefacts which degrade the perceptual quality or intelligibility of the speech. To overcome this, systems which directly incorporate psychoacoustically motivated measures into the training objectives of NNSE systems have been proposed. A key development in speech audio processing in recent years is the emergence of Self-Supervised Speech Representation (SSSR) models. These are powerful foundational DNN models which can be utilised for a number of ...

Close, George — University of Sheffield


Cognitive-driven speech enhancement using EEG-based auditory attention decoding for hearing aid applications

Identifying the target speaker in hearing aid applications is an essential ingredient to improve speech intelligibility. Although several speech enhancement algorithms are available to reduce background noise or to perform source separation in multi-speaker scenarios, their performance depends on correctly identifying the target speaker to be enhanced. Recent advances in electroencephalography (EEG) have shown that it is possible to identify the target speaker whom the listener is attending to using single-trial EEG-based auditory attention decoding (AAD) methods. However, in realistic acoustic environments the AAD performance is influenced by undesired disturbances such as interfering speakers, noise and reverberation. In addition, it is important for real-world hearing aid applications to close the AAD loop by presenting on-line auditory feedback. This thesis deals with the problem of identifying and enhancing the target speaker in realistic acoustic environments based on decoding the auditory attention ...

Aroudi, Ali — University of Oldenburg, Germany


Spatio-Temporal Speech Enhancement in Adverse Acoustic Conditions

Never before has speech been captured as often by electronic devices equipped with one or multiple microphones, serving a variety of applications. It is the key aspect in digital telephony, hearing devices, and voice-driven human-to-machine interaction. When speech is recorded, the microphones also capture a variety of additional, undesired sound components due to adverse acoustic conditions. Interfering speech, background noise and reverberation, i.e. the persistence of sound in a room after excitation caused by a multitude of reflections off the room enclosure, are detrimental to the quality and intelligibility of target speech as well as the performance of automatic speech recognition. Hence, speech enhancement aiming at estimating the early target-speech component, which contains the direct component and early reflections, is crucial to nearly all speech-related applications presently available. In this thesis, we compare, propose and evaluate existing and novel approaches ...

Dietzen, Thomas — KU Leuven


Robust speaker verification with reverberation suppression and spoofing detection

Since speaker verification (SV) systems are typically used for access control, the business decision about their deployment largely depends on their efficacy and security. This dissertation investigates methods aiming to increase the robustness of SV systems against fraud attempts and against reverberation mismatch between speaker enrollment and verification. The impact of reverberation is investigated first in the context of speaker verification in a room with randomly distributed microphones. The main contributions of this study include the preferred microphone selection strategy for training and testing of an SV system and a novel feature extraction method, which integrates reverberation-robust features with features extracted from a dereverberated signal. The results of the experiments, conducted using all major speaker modeling methods, confirm that such integration provides a higher SV efficacy improvement compared to other existing methods. A large part of this thesis is dedicated ...

Witkowski, Marcin — AGH University of Krakow
