Pre-processing of Speech Signals for Robust Parameter Estimation (2023)
Fundamental Frequency and Direction-of-Arrival Estimation for Multichannel Speech Enhancement
Audio systems receive the speech signals of interest usually in the presence of noise. The noise has profound impacts on the quality and intelligibility of the speech signals, and it is therefore clear that the noisy signals must be cleaned up before being played back, stored, or analyzed. We can estimate the speech signal of interest from the noisy signals using a priori knowledge about it. A human speech signal is broadband and consists of both voiced and unvoiced parts. The voiced part is quasi-periodic with a time-varying fundamental frequency (or pitch as it is commonly referred to). We consider the periodic signals basically as the sum of harmonics. Therefore, we can pass the noisy signals through bandpass filters centered at the frequencies of the harmonics to enhance the signal. In addition, although the frequencies of the harmonics are the ...
Karimian-Azari, Sam — Aalborg University
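The idea in the abstract above, treating voiced speech as a sum of harmonics and keeping only what sits at the harmonic frequencies, can be illustrated with a small sketch. This is not the thesis's method: instead of explicit bandpass filters it uses a least-squares projection onto a harmonic basis, which acts like a bank of very narrow filters at the harmonics. The fundamental frequency is assumed known, and the function names, sampling rate, and test signal are all hypothetical.

```python
import numpy as np

def harmonic_basis(n_samples, f0_hz, fs_hz, n_harmonics):
    """Cosine and sine columns at the first n_harmonics multiples of f0_hz."""
    t = np.arange(n_samples) / fs_hz
    k = np.arange(1, n_harmonics + 1)
    phase = 2 * np.pi * f0_hz * np.outer(t, k)
    return np.hstack([np.cos(phase), np.sin(phase)])

def enhance_harmonic(noisy, f0_hz, fs_hz, n_harmonics):
    """Project one frame onto the harmonic subspace (assumes f0_hz is known)."""
    Z = harmonic_basis(len(noisy), f0_hz, fs_hz, n_harmonics)
    coeffs, *_ = np.linalg.lstsq(Z, noisy, rcond=None)
    return Z @ coeffs

def snr_db(clean, estimate):
    """Signal-to-noise ratio of an estimate relative to the clean signal."""
    err = clean - estimate
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))

# Hypothetical 30 ms frame: five harmonics of 150 Hz sampled at 8 kHz.
rng = np.random.default_rng(0)
fs_hz, f0_hz, n_harmonics = 8000.0, 150.0, 5
t = np.arange(240) / fs_hz
clean = sum(np.cos(2 * np.pi * f0_hz * k * t + 0.3 * k) / k
            for k in range(1, n_harmonics + 1))
noisy = clean + 0.5 * rng.standard_normal(t.size)
enhanced = enhance_harmonic(noisy, f0_hz, fs_hz, n_harmonics)

print(f"SNR before: {snr_db(clean, noisy):.1f} dB")
print(f"SNR after:  {snr_db(clean, enhanced):.1f} dB")
```

In practice the fundamental frequency has to be estimated from the noisy data and varies over time, which is the problem the thesis addresses. The sketch only shows why a good estimate makes the enhancement step straightforward.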
Enhancement of Speech Signals - with a Focus on Voiced Speech Models
The topic of this thesis is speech enhancement with a focus on models of voiced speech. Speech is divided into two subcategories dependent on the characteristics of the signal. One part is the voiced speech, the other is the unvoiced. In this thesis, we primarily focus on the voiced speech parts and utilise the structure of the signal in relation to speech enhancement. The basis for the models is the harmonic model which is a very often used model for voiced speech because it describes periodic signals perfectly. First, we consider the problem of non-stationarity in the speech signal. The speech signal changes its characteristics continuously over time whereas most speech analysis and enhancement methods assume stationarity within 20-30 ms. We propose to change the model to allow the fundamental frequency to vary linearly over time by introducing a chirp ...
Nørholm, Sidsel Marie — Aalborg University
Wavelet Analysis For Robust Speech Processing and Applications
In this work, we study the application of wavelet analysis for robust speech processing. Reliable time-scale features (TS) which characterize the relevant phonetic classes such as voiced (V), unvoiced (UV), silence (S), mixed-excitation, and stop sounds are extracted. By training neural and Bayesian networks, the classification rates provided by only 7 TS features are mostly similar to the ones obtained by 13 MFCC features. The TS features are further enhanced to design a reliable and low-complexity V/UV/S classifier. Quantile filtering and slope tracking are used for deriving adaptive thresholds. A robust voice activity detector is then built and used as a pre-processing stage to improve the performance of a speaker verification system. Based on wavelet shrinkage, a statistical wavelet filtering (SWF) method is designed for speech enhancement. Non-stationary and colored noise is handled by employing quantile filtering and time-frequency adaptive ...
Pham, Van Tuan — Graz University of Technology
Oscillator-plus-Noise Modeling of Speech Signals
In this thesis we examine the autonomous oscillator model for synthesis of speech signals. The contributions comprise an analysis of realizations and training methods for the nonlinear function used in the oscillator model, the combination of the oscillator model with inverse filtering, both significantly increasing the number of `successfully' re-synthesized speech signals, and the introduction of a new technique suitable for the re-generation of the noise-like signal component in speech signals. Nonlinear function models are compared in a one-dimensional modeling task regarding their presupposition for adequate re-synthesis of speech signals, in particular considering stability. The considerations also comprise the structure of the nonlinear functions, with the aspect of the possible interpolation between models for different speech sounds. Both regarding stability of the oscillator and the premiss of a nonlinear function structure that may be pre-defined, RBF networks are found a ...
Rank, Erhard — Vienna University of Technology
Kernel PCA and Pre-Image Iterations for Speech Enhancement
In this thesis, we present novel methods to enhance speech corrupted by noise. All methods are based on the processing of complex-valued spectral data. First, kernel principal component analysis (PCA) for speech enhancement is proposed. Subsequently, a simplification of kernel PCA, called pre-image iterations (PI), is derived. This method computes enhanced feature vectors iteratively by linear combination of noisy feature vectors. The weighting for the linear combination is found by a kernel function that measures the similarity between the feature vectors. The kernel variance is a key parameter for the degree of de-noising and has to be set according to the signal-to-noise ratio (SNR). Initially, PI were proposed for speech corrupted by additive white Gaussian noise. To be independent of knowledge about the SNR and to generalize to other stationary noise types, PI are extended by automatic determination of the ...
Leitner, Christina — Graz University of Technology
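The pre-image iterations described in the abstract above, where each enhanced feature vector is an iteratively updated, kernel-weighted linear combination of the noisy feature vectors, can be sketched as follows. This is one plausible reading of that description, not the thesis's implementation: the Gaussian kernel, the normalization of the weights, the fixed iteration count, and all names and values are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(z, X, variance):
    """Similarity between vector z and every row of X (assumed Gaussian kernel)."""
    sq_dist = np.sum(np.abs(X - z) ** 2, axis=1)  # abs() handles complex data
    return np.exp(-sq_dist / variance)

def pre_image_iterations(X, variance, n_iter=10):
    """Replace each row with a kernel-weighted combination of the noisy rows."""
    Z = X.copy()
    for _ in range(n_iter):
        for j in range(Z.shape[0]):
            w = gaussian_kernel(Z[j], X, variance)
            Z[j] = (w @ X) / np.sum(w)  # normalized linear combination
    return Z

# Hypothetical data: 20 noisy copies of one complex-valued feature vector.
rng = np.random.default_rng(1)
clean = np.tile(np.array([1.0 + 1.0j, -0.5j, 2.0, 0.3 - 0.7j]), (20, 1))
noise = 0.3 * (rng.standard_normal(clean.shape)
               + 1j * rng.standard_normal(clean.shape))
noisy = clean + noise
enhanced = pre_image_iterations(noisy, variance=2.0)

mse_before = np.mean(np.abs(noisy - clean) ** 2)
mse_after = np.mean(np.abs(enhanced - clean) ** 2)
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.4f}")
```

The `variance` argument plays the role of the kernel variance mentioned in the abstract: larger values average over more vectors and de-noise more strongly, which is why it has to be matched to the SNR.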
Enhancement of Periodic Signals: with Application to Speech Signals
The topic of this thesis is the enhancement of noisy, periodic signals with application to speech signals. Generally speaking, enhancement methods can be divided into signal- and noise-driven methods. In this thesis, we focus on the signal-driven approach by employing relevant signal parameters for the enhancement of periodic signals. The enhancement problem consists of two major subproblems: the estimation of relevant parameters or statistics, and the actual noise reduction of the observed signal. We consider both of these subproblems. First, we consider the problem of estimating signal parameters relevant to the enhancement of periodic signals. The fundamental frequency is one example of such a parameter. Furthermore, in multichannel scenarios, the direction-of-arrival of the periodic sources onto an array of sensors is another parameter of relevance. We propose methods for the estimation of the fundamental frequency that have benefits compared to ...
Jensen, Jesper Rindom — Aalborg University
Speech dereverberation in noisy environments using time-frequency domain signal models
Reverberation is the sum of reflected sound waves and is present in any conventional room. Speech communication devices such as mobile phones in hands-free mode, tablets, smart TVs, teleconferencing systems, hearing aids, voice-controlled systems, etc. use one or more microphones to pick up the desired speech signals. When the microphones are not in the proximity of the desired source, strong reverberation and noise can degrade the signal quality at the microphones and can impair the intelligibility and the performance of automatic speech recognizers. Therefore, it is a highly demanded task to process the microphone signals such that reverberation and noise are reduced. The process of reducing or removing reverberation from recorded signals is called dereverberation. As dereverberation is usually a completely blind problem, where the only available information are the microphone signals, and as the acoustic scenario can be non-stationary, ...
Braun, Sebastian — Friedrich-Alexander Universität Erlangen-Nürnberg
Speech Modeling and Robust Estimation for Diagnosis of Parkinson's Disease
According to the Parkinson’s Foundation, more than 10 million people worldwide suffer from Parkinson’s disease (PD). The common symptoms are tremor, muscle rigidity and slowness of movement. There is no cure available currently, but clinical intervention can help alleviate the symptoms significantly. Recently, it has been found that PD can be detected and telemonitored by voice signals, such as sustained phonation /a/. However, the voice-based PD detector suffers from severe performance degradation in adverse environments, such as noise, reverberation and nonlinear distortion, which are common in uncontrolled settings. In this thesis, we focus on deriving speech modeling and robust estimation algorithms capable of improving the PD detection accuracy in adverse environments. Robust estimation algorithms using parametric modeling of voice signals are proposed. We present both segment-wise and sample-wise robust pitch tracking algorithms using the harmonic model. ...
Shi, Liming — Aalborg University
Deep Learning-based Speaker Verification In Real Conditions
Smart applications like speaker verification have become essential in verifying the user's identity for availing of personal assistants or online banking services based on the user's voice characteristics. However, far-field or distant speaker verification is constantly affected by surrounding noises which can severely distort the speech signal. Moreover, speech signals propagating in long-range get reflected by various objects in the surrounding area, which creates reverberation and further degrades the signal quality. This PhD thesis explores deep learning-based multichannel speech enhancement techniques to improve the performance of speaker verification systems in real conditions. Multichannel speech enhancement aims to enhance distorted speech using multiple microphones. It has become crucial to many smart devices, which are flexible and convenient for speech applications. Three novel approaches are proposed to improve the robustness of speaker verification systems in noisy and reverberated conditions. Firstly, we integrate ...
Dowerah, Sandipana — Universite de Lorraine, CNRS, Inria, Loria
Extraction of efficient and characteristic features of multidimensional time series
In numerous signal processing applications one disposes of multiple probes, delivering simultaneously information about one or multiple observed processes. The resulting multidimensional time series are often highly redundant and may contain stochastic contributions. The perception of the useful information becomes therefore very difficult and sometimes impossible. Thus, the major issue of concern of this thesis resides in the development of novel algorithms for the extraction of the salient and characteristic features of multidimensional time series. The proposed algorithms are based on parametric signal processing, namely we assume that the features of the experimental data can be represented efficiently by a specific model. We present a global framework for the selection of a specific model out of the large span of techniques proposed in the literature. For the selection of the model classes we use, in addition to prior knowledge about ...
Vetter, Rolf — Swiss Federal Institute of Technology
The problem of segregating a sound source of interest from an acoustic background has been extensively studied due to applications in hearing prostheses, robust speech/speaker recognition and audio information retrieval. Computational auditory scene analysis (CASA) approaches the segregation problem by utilizing grouping cues involved in the perceptual organization of sound by human listeners. Binaural processing, where input signals resemble those that enter the two ears, is of particular interest in the CASA field. The dominant approach to binaural segregation has been to derive spatially selective filters in order to enhance the signal in a direction of interest. As such, the problems of sound localization and sound segregation are closely tied. While spatial filtering has been widely utilized, substantial performance degradation is incurred in reverberant environments and more fundamentally, segregation cannot be performed without sufficient spatial separation between sources. This dissertation ...
Woodruff, John — The Ohio State University
Adaptive filtering techniques for noise reduction and acoustic feedback cancellation in hearing aids
Understanding speech in noise and the occurrence of acoustic feedback belong to the major problems of current hearing aid users. Hence, an urgent demand exists for efficient and well-working digital signal processing algorithms that offer a solution to these issues. In this thesis we develop adaptive filtering techniques for noise reduction and acoustic feedback cancellation. Thanks to the availability of low power digital signal processors, these algorithms can be integrated in a hearing aid. Because of the ongoing miniaturization in the hearing aid industry and the growing tendency towards multi-microphone hearing aids, robustness against imperfections such as microphone mismatch, has become a major issue in the design of a noise reduction algorithm. In this thesis we propose multimicrophone noise reduction techniques that are based on multi-channel Wiener filtering (MWF). Theoretical and experimental analysis demonstrate that these MWF-based techniques are less ...
Spriet, Ann — Katholieke Universiteit Leuven
Sparse Modeling Heuristics for Parameter Estimation - Applications in Statistical Signal Processing
This thesis examines sparse statistical modeling on a range of applications in audio modeling, audio localization, DNA sequencing, and spectroscopy. In the examined cases, the resulting estimation problems are computationally cumbersome, both because one often lacks model order knowledge for this form of problem and because of the high dimensionality of the parameter spaces, which typically also yields optimization problems with numerous local minima. In this thesis, these problems are treated using sparse modeling heuristics, with the resulting criteria being solved using convex relaxations, inspired by disciplined convex programming ideas, to maintain tractability. The contributions to audio modeling and estimation focus on the estimation of the fundamental frequency of harmonically related sinusoidal signals, which is a commonly used model for, e.g., voiced speech or tonal audio. We examine both the problems of estimating multiple audio sources ...
Adalbjörnsson, Stefan Ingi — Lund University
Some Parametric Methods of Speech Processing
Parametric modelling of speech signals finds its use in various speech processing applications. Recently, publications concerning sinusoidal speech modelling have increasingly appeared in the scientific literature. The thesis is mainly devoted to the sinusoidal model with harmonically related component sine waves, i.e. the harmonic model. The main objective is to find new approaches to synthetic speech quality improvement. A novel method for speech spectrum envelope determination is introduced. This method uses a staircase envelope considering the spectral behaviour in voiced as well as unvoiced speech frames. The staircase envelope is smoothed by a weighted moving average. The determined envelope is parametrized using an autoregressive (AR) model or cepstral coefficients. It has been shown that the new method is of most importance for high-pitch speakers. Besides, new methods or modifications of known methods can be found in pitch synchronization, AR model order selection ...
Pribilova, Anna — Slovak University of Technology
Speech Watermarking and Air Traffic Control
Air traffic control (ATC) voice radio communication between aircraft pilots and controllers is subject to technical and functional constraints owing to the legacy radio system currently in use worldwide. This thesis investigates the embedding of digital side information, so called watermarks, into speech signals. Applied to the ATC voice radio, a watermarking system could overcome existing limitations, and ultimately increase safety, security and efficiency in ATC. In contrast to conventional watermarking methods, this field of application allows embedding of the data in perceptually irrelevant signal components. We show that the resulting theoretical watermark capacity far exceeds the capacity of conventional watermarking channels. Based on this finding, we present a general purpose blind speech watermarking algorithm that embeds watermark data in the phase of non-voiced speech segments by replacing the excitation signal of an autoregressive signal representation. Our implementation embeds the ...
Hofbauer, Konrad — Graz University