Speech recognition in noisy conditions using missing feature approach

The research in this thesis addresses the problem of automatic speech recognition in noisy environments. Automatic speech recognition systems obtain acceptable performances in noise free conditions but these performances degrade dramatically in presence of additive noise. This is mainly due to the mismatch between the training and the noisy operating conditions. In the time-frequency representation of the noisy speech signal, some of the clean speech features are masked by noise. In this case the clean speech features cannot be correctly estimated from the noisy speech and therefore they are considered as missing or unreliable. In order to improve the performance of speech recognition systems in additive noise conditions, special attention should be paid to the problems of detection and compensation of these unreliable features. This thesis is concerned with the problem of missing features applied to automatic speaker-independent speech recognition. ...

Renevey, Philippe — Swiss Federal Institute of Technology


Compressive Sensing of Cyclostationary Propeller Noise

This dissertation is the combination of three manuscripts –either published in or submitted to journals– on compressive sensing of propeller noise for detection, identification and localization of water crafts. Propeller noise, as a result of rotating blades, is broadband and radiates through water dominating underwater acoustic noise spectrum especially when cavitation develops. Propeller cavitation yields cyclostationary noise which can be modeled by amplitude modulation, i.e., the envelope-carrier product. The envelope consists of the so-called propeller tonals representing propeller characteristics which is used to identify water crafts whereas the carrier is a stationary broadband process. Sampling for propeller noise processing yields large data sizes due to Nyquist rate and multiple sensor deployment. A compressive sensing scheme is proposed for efficient sampling of second-order cyclostationary propeller noise since the spectral correlation function of the amplitude modulation model is sparse as shown in ...

Fırat, Umut — Istanbul Technical University


Robust Speech Recognition on Intelligent Mobile Devices with Dual-Microphone

Despite the outstanding progress made on automatic speech recognition (ASR) throughout the last decades, noise-robust ASR still poses a challenge. Tackling with acoustic noise in ASR systems is more important than ever before for a twofold reason: 1) ASR technology has begun to be extensively integrated in intelligent mobile devices (IMDs) such as smartphones to easily accomplish different tasks (e.g. search-by-voice), and 2) IMDs can be used anywhere at any time, that is, under many different acoustic (noisy) conditions. On the other hand, with the aim of enhancing noisy speech, IMDs have begun to embed small microphone arrays, i.e. microphone arrays comprised of a few sensors close each other. These multi-sensor IMDs often embed one microphone (usually at their rear) intended to capture the acoustic environment more than the speaker’s voice. This is the so-called secondary microphone. While classical microphone ...

López-Espejo, Iván — University of Granada


Kernel PCA and Pre-Image Iterations for Speech Enhancement

In this thesis, we present novel methods to enhance speech corrupted by noise. All methods are based on the processing of complex-valued spectral data. First, kernel principal component analysis (PCA) for speech enhancement is proposed. Subsequently, a simplification of kernel PCA, called pre-image iterations (PI), is derived. This method computes enhanced feature vectors iteratively by linear combination of noisy feature vectors. The weighting for the linear combination is found by a kernel function that measures the similarity between the feature vectors. The kernel variance is a key parameter for the degree of de-noising and has to be set according to the signal-to-noise ratio (SNR). Initially, PI were proposed for speech corrupted by additive white Gaussian noise. To be independent of knowledge about the SNR and to generalize to other stationary noise types, PI are extended by automatic determination of the ...

Leitner, Christina — Graz University of Technology


Non-intrusive Quality Evaluation of Speech Processed in Noisy and Reverberant Environments

In many speech applications such as hands-free telephony or voice-controlled home assistants, the distance between the user and the recording microphones can be relatively large. In such a far-field scenario, the recorded microphone signals are typically corrupted by noise and reverberation, which may severely degrade the performance of speech recognition systems and reduce intelligibility and quality of speech in communication applications. In order to limit these effects, speech enhancement algorithms are typically applied. The main objective of this thesis is to develop novel speech enhancement algorithms for noisy and reverberant environments and signal-based measures to evaluate these algorithms, focusing on solutions that are applicable in realistic scenarios. First, we propose a single-channel speech enhancement algorithm for joint noise and reverberation reduction. The proposed algorithm uses a spectral gain to enhance the input signal, where the gain is computed using a ...

Cauchi, Benjamin — University of Oldenburg


Sparsity in Linear Predictive Coding of Speech

This thesis deals with developing improved modeling methods for speech and audio processing based on the recent developments in sparse signal representation. In particular, this work is motivated by the need to address some of the limitations of the well-known linear prediction (LP) based all-pole models currently applied in many modern speech and audio processing systems. In the first part of this thesis, we introduce \emph{Sparse Linear Prediction}, a set of speech processing tools created by introducing sparsity constraints into the LP framework. This approach defines predictors that look for a sparse residual rather than a minimum variance one, with direct applications to coding but also consistent with the speech production model of voiced speech, where the excitation of the all-pole filter is model as an impulse train. Introducing sparsity in the LP framework, will also bring to develop the ...

Giacobello, Daniele — Aalborg University


Robust Methods for Sensing and Reconstructing Sparse Signals

Compressed sensing (CS) is a recently introduced signal acquisition framework that goes against the traditional Nyquist sampling paradigm. CS demonstrates that a sparse, or compressible, signal can be acquired using a low rate acquisition process. Since noise is always present in practical data acquisition systems, sensing and reconstruction methods are developed assuming a Gaussian (light-tailed) model for the corrupting noise. However, when the underlying signal and/or the measurements are corrupted by impulsive noise, commonly employed linear sampling operators, coupled with Gaussian-derived reconstruction algorithms, fail to recover a close approximation of the signal. This dissertation develops robust sampling and reconstruction methods for sparse signals in the presence of impulsive noise. To achieve this objective, we make use of robust statistics theory to develop appropriate methods addressing the problem of impulsive noise in CS systems. We develop a generalized Cauchy distribution (GCD) ...

Carrillo, Rafael — University of Delaware


Contributions to signal analysis and processing using compressed sensing techniques

Chapter 2 contains a short introduction to the fundamentals of compressed sensing theory, which is the larger context of this thesis. We start with introducing the key concepts of sparsity and sparse representations of signals. We discuss the central problem of compressed sensing, i.e. how to adequately recover sparse signals from a small number of measurements, as well as the multiple formulations of the reconstruction problem. A large part of the chapter is devoted to some of the most important conditions necessary and/or sufficient to guarantee accurate recovery. The aim is to introduce the reader to the basic results, without the burden of detailed proofs. In addition, we also present a few of the popular reconstruction and optimization algorithms that we use throughout the thesis. Chapter 3 presents an alternative sparsity model known as analysis sparsity, that offers similar recovery ...

Cleju, Nicolae — "Gheorghe Asachi" Technical University of Iasi


The Removal of Environmental Noise in Cellular Communications by Perceptual Techniques

This thesis describes the application of a perceptually based spectral subtraction algorithm for the enhancement of non-stationary noise corrupted speech. Through examination of speech enhancement techniques, explanations are given for the choice of magnitude spectral subtraction and how the human auditory system can be modelled for frequency domain speech enhancement. It is discovered, that the cochlea provides the mechanical speech enhancement in the auditory system, through the use of masking. Frequency masking is used in spectral subtraction, to improve the algorithm execution time, and to shape the enhancement process making it sound natural to the ear. A new technique for estimation of background noise is presented, which operates during speech sections as well as pauses. This uses two microphones placed on opposite ends of the cellular handset. Using these, the algorithm determines whether the signal is speech, or noise, by ...

Tuffy, Mark — University Of Edinburgh


Exploiting Sparsity for Efficient Compression and Analysis of ECG and Fetal-ECG Signals

Over the last decade there has been an increasing interest in solutions for the continuous monitoring of health status with wireless, and in particular, wearable devices that provide remote analysis of physiological data. The use of wireless technologies have introduced new problems such as the transmission of a huge amount of data within the constraint of limited battery life devices. The design of an accurate and energy efficient telemonitoring system can be achieved by reducing the amount of data that should be transmitted, which is still a challenging task on devices with both computational and energy constraints. Furthermore, it is not sufficient merely to collect and transmit data, and algorithms that provide real-time analysis are needed. In this thesis, we address the problems of compression and analysis of physiological data using the emerging frameworks of Compressive Sensing (CS) and sparse ...

Da Poian, Giulia — University of Udine


Parameter Estimation and Filtering Using Sparse Modeling

Sparsity-based estimation techniques deal with the problem of retrieving a data vector from an undercomplete set of linear observations, when the data vector is known to have few nonzero elements with unknown positions. It is also known as the atomic decomposition problem, and has been carefully studied in the field of compressed sensing. Recent findings have led to a method called basis pursuit, also known as Least Absolute Shrinkage and Selection Operator (LASSO), as a numerically reliable sparsity-based approach. Although the atomic decomposition problem is generally NP-hard, it has been shown that basis pursuit may provide exact solutions under certain assumptions. This has led to an extensive study of signals with sparse representation in different domains, providing a new general insight into signal processing. This thesis further investigates the role of sparsity-based techniques, especially basis pursuit, for solving parameter estimation ...

Panahi, Ashkan — Chalmers University of Technology


Speech Enhancement Using Nonnegative Matrix Factorization and Hidden Markov Models

Reducing interference noise in a noisy speech recording has been a challenging task for many years yet has a variety of applications, for example, in handsfree mobile communications, in speech recognition, and in hearing aids. Traditional single-channel noise reduction schemes, such as Wiener filtering, do not work satisfactorily in the presence of non-stationary background noise. Alternatively, supervised approaches, where the noise type is known in advance, lead to higher-quality enhanced speech signals. This dissertation proposes supervised and unsupervised single-channel noise reduction algorithms. We consider two classes of methods for this purpose: approaches based on nonnegative matrix factorization (NMF) and methods based on hidden Markov models (HMM). The contributions of this dissertation can be divided into three main (overlapping) parts. First, we propose NMF-based enhancement approaches that use temporal dependencies of the speech signals. In a standard NMF, the important temporal ...

Mohammadiha, Nasser — KTH Royal Institute of Technology


Non-Intrusive Speech Intelligibility Prediction

The ability to communicate through speech is important for social interaction. We rely on the ability to communicate with each other even in noisy conditions. Ideally, the speech is easy to understand but this is not always the case, if the speech is degraded, e.g., due to background noise, distortion or hearing impairment. One of the most important factors to consider in relation to such degradations is speech intelligibility, which is a measure of how easy or difficult it is to understand the speech. In this thesis, the focus is on the topic of speech intelligibility prediction. The thesis consists of an introduction to the field of speech intelligibility prediction and a collection of scientific papers. The introduction provides a background to the challenges with speech communication in noisy conditions, followed by an introduction to how speech is produced and ...

Sørensen, Charlotte — Aalborg University


Advances in DFT-Based Single-Microphone Speech Enhancement

The interest in the field of speech enhancement emerges from the increased usage of digital speech processing applications like mobile telephony, digital hearing aids and human-machine communication systems in our daily life. The trend to make these applications mobile increases the variety of potential sources for quality degradation. Speech enhancement methods can be used to increase the quality of these speech processing devices and make them more robust under noisy conditions. The name "speech enhancement" refers to a large group of methods that are all meant to improve certain quality aspects of these devices. Examples of speech enhancement algorithms are echo control, bandwidth extension, packet loss concealment and noise reduction. In this thesis we focus on single-microphone additive noise reduction and aim at methods that work in the discrete Fourier transform (DFT) domain. The main objective of the presented research ...

Hendriks, Richard Christian — Delft University of Technology


Bayesian methods for sparse and low-rank matrix problems

Many scientific and engineering problems require us to process measurements and data in order to extract information. Since we base decisions on information, it is important to design accurate and efficient processing algorithms. This is often done by modeling the signal of interest and the noise in the problem. One type of modeling is Compressed Sensing, where the signal has a sparse or low-rank representation. In this thesis we study different approaches to designing algorithms for sparse and low-rank problems. Greedy methods are fast methods for sparse problems which iteratively detects and estimates the non-zero components. By modeling the detection problem as an array processing problem and a Bayesian filtering problem, we improve the detection accuracy. Bayesian methods approximate the sparsity by probability distributions which are iteratively modified. We show one approach to making the Bayesian method the Relevance Vector ...

Sundin, Martin — Department of Signal Processing, Royal Institute of Technology KTH

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.