Some Parametric Methods of Speech Processing

Parametric modelling of speech signals finds its use in various speech processing applications. Recently, publications concerning sinusoidal speech modelling have increasingly appeared in the scientific literature. The thesis is mainly devoted to the sinusoidal model with harmonically related component sine waves, i.e. the harmonic model. The main objective is to find new approaches to synthetic speech quality improvement. A novel method for speech spectrum envelope determination is introduced. This method uses a staircase envelope considering the spectral behaviour in voiced as well as unvoiced speech frames. The staircase envelope is smoothed by a weighted moving average. The determined envelope is parametrized using an autoregressive (AR) model or cepstral coefficients. It has been shown that the new method is of most importance for high-pitched speakers. Besides, new methods or modifications of known methods can be found in pitch synchronization, AR model order selection ...

Pribilova, Anna — Slovak University of Technology


Pre-processing of Speech Signals for Robust Parameter Estimation

The topic of this thesis is methods of pre-processing speech signals for robust estimation of model parameters in models of these signals. Here, there is a special focus on the situation where the desired signal is contaminated by colored noise. In order to estimate the speech signal, or its voiced and unvoiced components, from a noisy observation, it is important to have robust estimators that can handle colored and non-stationary noise. Two important aspects are investigated. The first one is a robust estimation of the speech signal parameters, such as the fundamental frequency, which is required in many contexts. For this purpose, fast estimation methods based on a simple white Gaussian noise (WGN) assumption are often used. To keep using those methods, the noisy signal can be pre-processed using a filter. If the colored noise is modelled as an autoregressive ...

Esquivel Jaramillo, Alfredo — Aalborg University


Predictive modelling and deep learning for quantifying human health

Machine learning and deep learning techniques have emerged as powerful tools for addressing complex challenges across diverse domains. These methodologies are powerful because they extract patterns and insights from large and complex datasets, automate decision-making processes, and continuously improve over time. They enable us to observe and quantify patterns in data that a normal human would not be able to capture, leading to deeper insights and more accurate predictions. This dissertation presents two research papers that leverage these methodologies to tackle distinct yet interconnected problems in neuroimaging and computer vision for the quantification of human health. The first investigation, "Age prediction using resting-state functional MRI," addresses the challenge of understanding brain aging. By employing the Least Absolute Shrinkage and Selection Operator (LASSO) on resting-state functional MRI (rsfMRI) data, we identify the most predictive correlations related to brain age. Our study, ...

Chang Jose — National Cheng Kung University


Enhancement of Speech Signals - with a Focus on Voiced Speech Models

The topic of this thesis is speech enhancement with a focus on models of voiced speech. Speech is divided into two subcategories dependent on the characteristics of the signal. One part is the voiced speech, the other is the unvoiced. In this thesis, we primarily focus on the voiced speech parts and utilise the structure of the signal in relation to speech enhancement. The basis for the models is the harmonic model which is a very often used model for voiced speech because it describes periodic signals perfectly. First, we consider the problem of non-stationarity in the speech signal. The speech signal changes its characteristics continuously over time whereas most speech analysis and enhancement methods assume stationarity within 20-30 ms. We propose to change the model to allow the fundamental frequency to vary linearly over time by introducing a chirp ...

Nørholm, Sidsel Marie — Aalborg University
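
The idea of letting the fundamental frequency vary linearly within a frame can be illustrated with a short synthesis sketch. This is only an illustration under stated assumptions, not the thesis's own formulation: it assumes the commonly used real harmonic chirp form x(n) = sum over l of A_l cos(l (omega0 n + k n^2 / 2) + phi_l), and the function name `harmonic_chirp` and its parameters are hypothetical.

```python
import math


def harmonic_chirp(n_samples, omega0, chirp_rate, amplitudes, phases=None):
    """Synthesize a real harmonic chirp signal (illustrative sketch).

    omega0     : fundamental frequency at n = 0, in radians per sample
    chirp_rate : linear change of the fundamental, in radians per sample^2
    amplitudes : one amplitude per harmonic (its length is the model order)
    phases     : optional initial phase per harmonic, in radians
    """
    if phases is None:
        phases = [0.0] * len(amplitudes)
    signal = []
    for n in range(n_samples):
        # Phase of the fundamental: the running integral of the
        # instantaneous frequency (omega0 + chirp_rate * n).
        theta = omega0 * n + 0.5 * chirp_rate * n * n
        # Harmonic l has l times the phase of the fundamental.
        sample = sum(
            a * math.cos(l * theta + p)
            for l, (a, p) in enumerate(zip(amplitudes, phases), start=1)
        )
        signal.append(sample)
    return signal
```

Setting `chirp_rate` to zero reduces the sketch to the ordinary harmonic model, which is stationary within the frame.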


Fundamental Frequency and Direction-of-Arrival Estimation for Multichannel Speech Enhancement

Audio systems receive the speech signals of interest usually in the presence of noise. The noise has profound impacts on the quality and intelligibility of the speech signals, and it is therefore clear that the noisy signals must be cleaned up before being played back, stored, or analyzed. We can estimate the speech signal of interest from the noisy signals using a priori knowledge about it. A human speech signal is broadband and consists of both voiced and unvoiced parts. The voiced part is quasi-periodic with a time-varying fundamental frequency (or pitch as it is commonly referred to). We consider the periodic signals basically as the sum of harmonics. Therefore, we can pass the noisy signals through bandpass filters centered at the frequencies of the harmonics to enhance the signal. In addition, although the frequencies of the harmonics are the ...

Karimian-Azari, Sam — Aalborg University


Robust Estimation and Model Order Selection for Signal Processing

In this thesis, advanced robust estimation methodologies for signal processing are developed and analyzed. The developed methodologies solve problems concerning multi-sensor data, robust model selection as well as robustness for dependent data. The work has been applied to solve practical signal processing problems in different areas of biomedical and array signal processing. In particular, for univariate independent data, a robust criterion is presented to select the model order with an application to corneal-height data modeling. The proposed criterion overcomes some limitations of existing robust criteria. For real-world data, it selects the radial model order of the Zernike polynomial of the corneal topography map in accordance with clinical expectations, even if the measurement conditions for the videokeratoscopy, which is the state-of-the-art method to collect corneal-height data, are poor. For multi-sensor data, robust model order selection criteria are proposed and applied ...

Muma, Michael — Technische Universität Darmstadt


Machine Learning-Aided Monitoring and Prediction of Respiratory and Neurodegenerative Diseases Using Wearables

This thesis focuses on wearables for health status monitoring, covering applications aimed at emergency solutions to the COVID-19 pandemic and an aging society. The methods of ambient assisted living (AAL) are presented for the neurodegenerative Parkinson’s disease (PD), facilitating ‘aging in place’ thanks to machine learning and wearables as mHealth solutions. Furthermore, the approaches using machine learning and wearables are discussed for early-stage COVID-19 detection, with encouraging accuracy. Firstly, a publicly available dataset containing COVID-19, influenza, and healthy control data was reused for research purposes. The solution presented in this thesis treats the task as a classification problem and outperformed the state-of-the-art methods, whereas the original paper introduced only anomaly detection and did not report the specificity of the created models. The proposed model in the thesis for early detection of COVID-19 achieved 78 % for the k-NN classifier. Moreover, a ...

Justyna Skibińska — Brno University of Technology & Tampere University


Sparse Modeling Heuristics for Parameter Estimation - Applications in Statistical Signal Processing

This thesis examines sparse statistical modeling on a range of applications in audio modeling, audio localization, DNA sequencing, and spectroscopy. In the examined cases, the resulting estimation problems are computationally cumbersome, both because one often lacks knowledge of the model order for this form of problem and because of the high dimensionality of the parameter spaces, which typically also yields optimization problems with numerous local minima. In this thesis, these problems are treated using sparse modeling heuristics, with the resulting criteria being solved using convex relaxations, inspired by disciplined convex programming ideas, to maintain tractability. The contributions to audio modeling and estimation focus on the estimation of the fundamental frequency of harmonically related sinusoidal signals, which is a commonly used model for, e.g., voiced speech or tonal audio. We examine both the problems of estimating multiple audio sources ...

Adalbjörnsson, Stefan Ingi — Lund University


Wavelet Analysis For Robust Speech Processing and Applications

In this work, we study the application of wavelet analysis for robust speech processing. Reliable time-scale features (TS) which characterize the relevant phonetic classes such as voiced (V), unvoiced (UV), silence (S), mixed-excitation, and stop sounds are extracted. By training neural and Bayesian networks, the classification rates provided by only 7 TS features are mostly similar to the ones obtained by 13 MFCC features. The TS features are further enhanced to design a reliable and low-complexity V/UV/S classifier. Quantile filtering and slope tracking are used for deriving adaptive thresholds. A robust voice activity detector is then built and used as a pre-processing stage to improve the performance of a speaker verification system. Based on wavelet shrinkage, a statistical wavelet filtering (SWF) method is designed for speech enhancement. Non-stationary and colored noise is handled by employing quantile filtering and time-frequency adaptive ...

Pham, Van Tuan — Graz University of Technology


Joint Source-Cryptographic-Channel Coding for Real-Time Secure Voice Communications on Voice Channels

The growing risk of privacy violation and espionage associated with the rapid spread of mobile communications has renewed interest in the original concept of sending encrypted voice as an audio signal over arbitrary voice channels. The usual methods used for encrypted data transmission over analog telephony turned out to be inadequate for modern vocal links (cellular networks, VoIP) equipped with voice compression, voice activity detection, and adaptive noise suppression algorithms. The limited available bandwidth, nonlinear channel distortion, and signal fading motivate the investigation of a dedicated, joint approach for speech encoding and encryption adapted to modern noisy voice channels. This thesis aims to develop, analyze, and validate secure and efficient schemes for real-time speech encryption and transmission via modern voice channels. In addition to speech encryption, this study covers the security and operational aspects of the whole voice communication system, as this ...

Krasnowski, Piotr — Université Côte d'Azur


Probabilistic Model-Based Multiple Pitch Tracking of Speech

Multiple pitch tracking of speech is an important task for the segregation of multiple speakers in a single-channel recording. In this thesis, a probabilistic model-based approach for estimation and tracking of multiple pitch trajectories is proposed. A probabilistic model that captures pitch-dependent characteristics of the single-speaker short-time spectrum is obtained a priori from clean speech data. The resulting speaker model, which is based on Gaussian mixture models, can be trained either in a speaker independent (SI) or a speaker dependent (SD) fashion. Speaker models are then combined using an interaction model to obtain a probabilistic description of the observed speech mixture. A factorial hidden Markov model is applied for tracking the pitch trajectories of multiple speakers over time. The probabilistic model-based approach is capable of explicitly incorporating timbral information and all associated uncertainties of spectral structure into the model. While ...

Wohlmayr, Michael — Graz University of Technology


Advances in Glottal Analysis and its Applications

From artificial voices in GPS to automatic dictation systems, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking efficiency enhancement with simultaneous cost savings, the market of speech technology is forecast to be especially promising in the coming years. The present thesis deals with advances in glottal analysis in order to incorporate new techniques within speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, called the glottal flow, is expected to exhibit a relevant complementarity. Unfortunately, glottal analysis from speech recordings requires specific complex processing operations, which explains why it has been generally avoided. The main goal of this thesis is to provide new advances in glottal analysis ...

Drugman, Thomas — Universite de Mons


New approaches for EEG signal processing: Artifact EOG removal by ICA-RLS scheme and Tracks extraction method

Localizing the bioelectric phenomena originating from the cerebral cortex and evoked by auditory and somatosensory stimuli is a clear objective both for understanding how the brain works and for recognizing different pathologies. Diseases such as Parkinson's, Alzheimer's, schizophrenia and epilepsy are intensively studied to find a cure or accurate diagnosis. Epilepsy is considered among the most prevalent disorders of neurological origin. The recurrent and sudden incidence of seizures can lead to dangerous and possibly life-threatening situations. Since disturbance of consciousness and sudden loss of motor control often occur without any warning, the ability to predict epileptic seizures would reduce patients' anxiety, thus considerably improving quality of life and safety. The common procedure for epileptic seizure detection is based on brain activity monitoring via electroencephalogram (EEG) data. This process consumes a lot of time, especially in the case of long ...

Carlos Guerrero-Mosquera — University Carlos III of Madrid


Some Contributions to Music Signal Processing and to Mono-Microphone Blind Audio Source Separation

For humans, sound is valuable mostly for its meaning. The voice is spoken language, music, artistic intent. Its physiological functioning is highly developed, as well as our understanding of the underlying process. It is a challenge to replicate this analysis using a computer: in many aspects, its capabilities do not match those of human beings when it comes to recognizing speech or musical instruments from sound, to name a few. In this thesis, two problems are investigated: source separation and music processing. The first part investigates source separation using only one microphone. The problem of source separation arises when several audio sources are present at the same moment, mixed together and acquired by some sensors (one in our case). In this kind of situation it is natural for a human to separate and to recognize ...

Schutz, Antony — EURECOM/Mobile


Accelerating Monte Carlo methods for Bayesian inference in dynamical models

Making decisions and predictions from noisy observations are two important and challenging problems in many areas of society. Some examples of applications are recommendation systems for online shopping and streaming services, connecting genes with certain diseases and modelling climate change. In this thesis, we make use of Bayesian statistics to construct probabilistic models given prior information and historical data, which can be used for decision support and predictions. The main obstacle with this approach is that it often results in mathematical problems lacking analytical solutions. To cope with this, we make use of statistical simulation algorithms known as Monte Carlo methods to approximate the intractable solution. These methods enjoy well-understood statistical properties but are often computationally prohibitive to employ. The main contribution of this thesis is the exploration of different strategies for accelerating inference methods based on sequential Monte Carlo ...

Dahlin, Johan — Linköping University
