Speech Enhancement Algorithms for Audiological Applications

The improvement of speech intelligibility is a traditional problem which still remains open and unsolved. The recent boom of applications such as hands-free communi- cations or automatic speech recognition systems and the ever-increasing demands of the hearing-impaired community have given a definitive impulse to the research in this area. This PhD thesis is focused on speech enhancement for audiological applications. Most of the research conducted in this thesis has been focused on the improvement of speech intelligibility in hearing aids, considering the variety of restrictions and limitations imposed by this type of devices. The combination of source separation techniques and spatial filtering with machine learning and evolutionary computation has originated novel and interesting algorithms which are included in this thesis. The thesis is divided in two main parts. The first one contains a preliminary study of the problem and a ...

Ayllón, David — Universidad de Alcalá

Some Contributions to Music Signal Processing and to Mono-Microphone Blind Audio Source Separation

For humans, the sound is valuable mostly for its meaning. The voice is spoken language, music, artistic intent. Its physiological functioning is highly developed, as well as our understanding of the underlying process. It is a challenge to replicate this analysis using a computer: in many aspects, its capabilities do not match those of human beings when it comes to speech or instruments music recognition from the sound, to name a few. In this thesis, two problems are investigated: the source separation and the musical processing. The first part investigates the source separation using only one Microphone. The problem of sources separation arises when several audio sources are present at the same moment, mixed together and acquired by some sensors (one in our case). In this kind of situation it is natural for a human to separate and to recognize ...

Schutz, Antony — Eurecome/Mobile

MIMO instantaneous blind idenfitication and separation based on arbitrary order

This thesis is concerned with three closely related problems. The first one is called Multiple-Input Multiple-Output (MIMO) Instantaneous Blind Identification, which we denote by MIBI. In this problem a number of mutually statistically independent source signals are mixed by a MIMO instantaneous mixing system and only the mixed signals are observed, i.e. both the mixing system and the original sources are unknown or ¡blind¢. The goal of MIBI is to identify the MIMO system from the observed mixtures of the source signals only. The second problem is called Instantaneous Blind Signal Separation (IBSS) and deals with recovering mutually statistically independent source signals from their observed instantaneous mixtures only. The observation model and assumptions on the signals and mixing system are the same as those of MIBI. However, the main purpose of IBSS is the estimation of the source signals, whereas ...

van de Laar, Jakob — T.U. Eindhoven

Preserving binaural cues in noise reduction algorithms for hearing aids

Hearing aid users experience great difficulty in understanding speech in noisy environments. This has led to the introduction of noise reduction algorithms in hearing aids. The development of these algorithms is typically done monaurally. However, the human auditory system is a binaural system, which compares and combines the signals received by both ears to perceive a sound source as a single entity in space. Providing two monaural, independently operating, noise reduction systems, i.e. a bilateral configuration, to the hearing aid user may disrupt binaural information, needed to localize sound sources correctly and to improve speech perception in noise. In this research project, we first examined the influence of commercially available, bilateral, noise reduction algorithms on binaural hearing. Extensive objective and perceptual evaluations showed that the bilateral adaptive directional microphone (ADM) and the bilateral fixed directional microphone, two of the most ...

Van den Bogaert, Tim — Katholieke Universiteit Leuven

Signal Separation

The problem of signal separation is a very broad and fundamental one. A powerful paradigm within which signal separation can be achieved is the assumption that the signals/sources are statistically independent of one another. This is known as Independent Component Analysis (ICA). In this thesis, the theoretical aspects and derivation of ICA are examined, from which disparate approaches to signal separation are drawn together in a unifying framework. This is followed by a review of signal separation techniques based on ICA. Second order statistics based output decorrelation methods are employed to try to solve the challenging problem of separating convolutively mixed signals, in the context of mainly audio source separation and the Cocktail Party Problem. Various optimisation techniques are devised to implement second order signal separation of both artificially mixed signals and real mixtures. A study of the advantages and ...

Ahmed, Alijah — University of Cambridge

Integrating monaural and binaural cues for sound localization and segregation in reverberant environments

The problem of segregating a sound source of interest from an acoustic background has been extensively studied due to applications in hearing prostheses, robust speech/speaker recognition and audio information retrieval. Computational auditory scene analysis (CASA) approaches the segregation problem by utilizing grouping cues involved in the perceptual organization of sound by human listeners. Binaural processing, where input signals resemble those that enter the two ears, is of particular interest in the CASA field. The dominant approach to binaural segregation has been to derive spatially selective filters in order to enhance the signal in a direction of interest. As such, the problems of sound localization and sound segregation are closely tied. While spatial filtering has been widely utilized, substantial performance degradation is incurred in reverberant environments and more fundamentally, segregation cannot be performed without sufficient spatial separation between sources. This dissertation ...

Woodruff, John — The Ohio State University

Integrated active noise control and noise reduction in hearing aids

In every day life conversations and listening scenarios the desired speech signal is rarely delivered alone. The listener most commonly faces a scenario where he has to understand speech in a noisy environment. Hearing impairments, and more particularly sensorineural losses, can cause a reduction of speech understanding in noise. Therefore, in a hearing aid compensating for such kind of losses it is not sufficient to just amplify the incoming sound. Hearing aids also need to integrate algorithms that allow to discriminate between speech and noise in order to extract a desired speech from a noisy environment. A standard noise reduction scheme in general aims at maximising the signal-to-noise ratio of the signal to be fed in the hearing aid loudspeaker. This signal, however, does not reach the eardrum directly. It first has to propagate through an acoustic path and encounter ...

Serizel, Romain — KU Leuven

Sparse Multi-Channel Linear Prediction for Blind Speech Dereverberation

In many speech communication applications, such as hands-free telephony and hearing aids, the microphones are located at a distance from the speaker. Therefore, in addition to the desired speech signal, the microphone signals typically contain undesired reverberation and noise, caused by acoustic reflections and undesired sound sources. Since these disturbances tend to degrade the quality of speech communication, decrease speech intelligibility and negatively affect speech recognition, efficient dereverberation and denoising methods are required. This thesis deals with blind dereverberation methods, not requiring any knowledge about the room impulse responses between the speaker and the microphones. More specifically, we propose a general framework for blind speech dereverberation based on multi-channel linear prediction (MCLP) and exploiting sparsity of the speech signal in the time-frequency domain.

Jukić, Ante — University of Oldenburg

Informed spatial filters for speech enhancement

In modern devices which provide hands-free speech capturing functionality, such as hands-free communication kits and voice-controlled devices, the received speech signal at the microphones is corrupted by background noise, interfering speech signals, and room reverberation. In many practical situations, the microphones are not necessarily located near the desired source, and hence, the ratio of the desired speech power to the power of the background noise, the interfering speech, and the reverberation at the microphones can be very low, often around or even below 0 dB. In such situations, the comfort of human-to-human communication, as well as the accuracy of automatic speech recognisers for voice-controlled applications can be signi cantly degraded. Therefore, e ffective speech enhancement algorithms are required to process the microphone signals before transmitting them to the far-end side for communication, or before feeding them into a speech recognition ...

Taseska, Maja — Friedrich-Alexander Universität Erlangen-Nürnberg

Signal processing algorithms for wireless acoustic sensor networks

Recent academic developments have initiated a paradigm shift in the way spatial sensor data can be acquired. Traditional localized and regularly arranged sensor arrays are replaced by sensor nodes that are randomly distributed over the entire spatial field, and which communicate with each other or with a master node through wireless communication links. Together, these nodes form a so-called ‘wireless sensor network’ (WSN). Each node of a WSN has a local sensor array and a signal processing unit to perform computations on the acquired data. The advantage of WSNs compared to traditional (wired) sensor arrays, is that many more sensors can be used that physically cover the full spatial field, which typically yields more variety (and thus more information) in the signals. It is likely that future data acquisition, control and physical monitoring, will heavily rely on this type of ...

Bertrand, Alexander — Katholieke Universiteit Leuven

Synthetic reproduction of head-related transfer functions by using microphone arrays

Spatial hearing for human listeners is based on the interaural as well as on the monaural analysis of the signals arriving at both ears, enabling the listeners to assign certain spatial components to these signals. This spatial aspect gets lost when the signals are reproduced via headphones without considering the acoustical influence of the head and torso, i.e. head-related transfer function (HRTFs). A common procedure to take into account spatial aspects in a binaural reproduction is to use so-called artificial heads. Artificial heads are replicas of a human head and torso with average anthropometric geometries and built-in microphones in the ears. Although, the signals recorded with artificial heads contain relevant spatial aspects, binaural recordings using artificial heads often suffer from front-back confusions and the perception of the sound source being inside the head (internalization). These shortcomings can be attributed to ...

Rasumow, Eugen — University of Oldenburg

Source-Filter Model Based Single Channel Speech Separation

In a natural acoustic environment, multiple sources are usually active at the same time. The task of source separation is the estimation of individual source signals from this complex mixture. The challenge of single channel source separation (SCSS) is to recover more than one source from a single observation. Basically, SCSS can be divided in methods that try to mimic the human auditory system and model-based methods, which find a probabilistic representation of the individual sources and employ this prior knowledge for inference. This thesis presents several strategies for the separation of two speech utterances mixed into a single channel and is structured in four parts: The first part reviews factorial models in model-based SCSS and introduces the soft-binary mask for signal reconstruction. This mask shows improved performance compared to the soft and the binary masks in automatic speech recognition ...

Stark, Michael — Graz University of Technology

New strategies for single-channel speech separation

We present new results on single-channel speech separation and suggest a new separation approach to improve the speech quality of separated signals from an observed mix- ture. The key idea is to derive a mixture estimator based on sinusoidal parameters. The proposed estimator is aimed at finding sinusoidal parameters in the form of codevectors from vector quantization (VQ) codebooks pre-trained for speakers that, when combined, best fit the observed mixed signal. The selected codevectors are then used to reconstruct the recovered signals for the speakers in the mixture. Compared to the log- max mixture estimator used in binary masks and the Wiener filtering approach, it is observed that the proposed method achieves an acceptable perceptual speech quality with less cross- talk at different signal-to-signal ratios. Moreover, the method is independent of pitch estimates and reduces the computational complexity of the ...

Pejman Mowlaee — Department of Electronic Systems, Aalborg University

Design and evaluation of noise reduction techniques for binaural hearing aids

One of the main complaints of hearing aid users is their degraded speech understanding in noisy environments. Modern hearing aids therefore include noise reduction techniques. These techniques are typically designed for a monaural application, i.e. in a single device. However, the majority of hearing aid users currently have hearing aids at both ears in a so-called bilateral fitting, as it is widely accepted that this leads to a better speech understanding and user satisfaction. Unfortunately, the independent signal processing (in particular the noise reduction) in a bilateral fitting can destroy the so-called binaural cues, namely the interaural time and level differences (ITDs and ILDs) which are used to localize sound sources in the horizontal plane. A recent technological advance are so-called binaural hearing aids, where a wireless link allows for the exchange of data (or even microphone signals) between the ...

Cornelis, Bram — KU Leuven

Probabilistic Model-Based Multiple Pitch Tracking of Speech

Multiple pitch tracking of speech is an important task for the segregation of multiple speakers in a single-channel recording. In this thesis, a probabilistic model-based approach for estimation and tracking of multiple pitch trajectories is proposed. A probabilistic model that captures pitch-dependent characteristics of the single-speaker short-time spectrum is obtained a priori from clean speech data. The resulting speaker model, which is based on Gaussian mixture models, can be trained either in a speaker independent (SI) or a speaker dependent (SD) fashion. Speaker models are then combined using an interaction model to obtain a probabilistic description of the observed speech mixture. A factorial hidden Markov model is applied for tracking the pitch trajectories of multiple speakers over time. The probabilistic model-based approach is capable to explicitly incorporate timbral information and all associated uncertainties of spectral structure into the model. While ...

Wohlmayr, Michael — Graz University of Technology

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.