The Bionic Electro-Larynx Speech System - Challenges, Investigations, and Solutions (2015)
Identifying the target speaker in hearing aid applications is an essential ingredient to improve speech intelligibility. Although several speech enhancement algorithms are available to reduce background noise or to perform source separation in multi-speaker scenarios, their performance depends on correctly identifying the target speaker to be enhanced. Recent advances in electroencephalography (EEG) have shown that it is possible to identify the target speaker which the listener is attending to using single-trial EEG-based auditory attention decoding (AAD) methods. However, in realistic acoustic environments the AAD performance is influenced by undesired disturbances such as interfering speakers, noise and reverberation. In addition, it is important for real-world hearing aid applications to close the AAD loop by presenting on-line auditory feedback. This thesis deals with the problem of identifying and enhancing the target speaker in realistic acoustic environments based on decoding the auditory attention ...
Aroudi, Ali — University of Oldenburg, Germany
Speech Enhancement for Disordered and Substitution Voices
This thesis presents methods to enhance the speech of patients with voice disorders or with substitution voices. The first method enhances speech of patients with laryngeal neoplasm. The enhancement enables a reduction of pitch and a strengthening of the harmonics of voiced segments as well as decreasing the perceived speaking effort. The need for reliable pitch mark determination on disordered and substitution voices led to the implementation of a state-space based algorithm. Its performance is comparable to a state-of-the art pitch detection algorithm but does not require post processing. A subsequent part of the thesis deals with alaryngeal speech, with a focus on Electro-Larynx (EL) speech. After investigating an EL speech production model, which takes into account the common source of the speech signal and the directly radiated EL (DREL) sound, a solution to suppress the direct sound is based ...
Hagmuller, Martin — Graz University of Technology
Sparsity in Linear Predictive Coding of Speech
This thesis deals with developing improved modeling methods for speech and audio processing based on the recent developments in sparse signal representation. In particular, this work is motivated by the need to address some of the limitations of the well-known linear prediction (LP) based all-pole models currently applied in many modern speech and audio processing systems. In the first part of this thesis, we introduce \emph{Sparse Linear Prediction}, a set of speech processing tools created by introducing sparsity constraints into the LP framework. This approach defines predictors that look for a sparse residual rather than a minimum variance one, with direct applications to coding but also consistent with the speech production model of voiced speech, where the excitation of the all-pole filter is model as an impulse train. Introducing sparsity in the LP framework, will also bring to develop the ...
Giacobello, Daniele — Aalborg University
Advances in Glottal Analysis and its Applications
From artificial voices in GPS to automatic systems of dictation, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking for efficiency enhancement with simultaneous cost saving, the market of speech technology is forecast to be especially promising in the next years. The present thesis deals with advances in glottal analysis in order to incorporate new techniques within speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, and called glottal flow, is expected to exhibit a relevant complementarity. Unfortunately, glottal analysis from speech recordings requires specific complex processing operations, which explains why it has been generally avoided. The main goal of this thesis is to provide new advances in glottal analysis ...
Drugman, Thomas — Universite de Mons
Oscillator-plus-Noise Modeling of Speech Signals
In this thesis we examine the autonomous oscillator model for synthesis of speech signals. The contributions comprise an analysis of realizations and training methods for the nonlinear function used in the oscillator model, the combination of the oscillator model with inverse filtering, both significantly increasing the number of `successfully' re-synthesized speech signals, and the introduction of a new technique suitable for the re-generation of the noise-like signal component in speech signals. Nonlinear function models are compared in a one-dimensional modeling task regarding their presupposition for adequate re-synthesis of speech signals, in particular considering stability. The considerations also comprise the structure of the nonlinear functions, with the aspect of the possible interpolation between models for different speech sounds. Both regarding stability of the oscillator and the premiss of a nonlinear function structure that may be pre-defined, RBF networks are found a ...
Rank, Erhard — Vienna University of Technology
Statistical Parametric Speech Synthesis Based on the Degree of Articulation
Nowadays, speech synthesis is part of various daily life applications. The ultimate goal of such technologies consists in extending the possibilities of interaction with the machine, in order to get closer to human-like communications. However, current state-of-the-art systems often lack of realism: although high-quality speech synthesis can be produced by many researchers and companies around the world, synthetic voices are generally perceived as hyperarticulated. In any case, their degree of articulation is fixed once and for all. The present thesis falls within the more general quest for enriching expressivity in speech synthesis. The main idea consists in improving statistical parametric speech synthesis, whose most famous example is Hidden Markov Model (HMM) based speech synthesis, by introducing a control of the articulation degree, so as to enable synthesizers to automatically adapt their way of speaking to the contextual situation, like humans ...
Picart, Benjamin — Université de Mons (UMONS)
Realtime and Accurate Musical Control of Expression in Voice Synthesis
In the early days of speech synthesis research, understanding voice production has attracted the attention of scientists with the goal of producing intelligible speech. Later, the need to produce more natural voices led researchers to use prerecorded voice databases, containing speech units, reassembled by a concatenation algorithm. With the outgrowth of computer capacities, the length of units increased, going from diphones to non-uniform units, in the so-called unit selection framework, using a strategy referred to as 'take the best, modify the least'. Today the new challenge in voice synthesis is the production of expressive speech or singing. The mainstream solution to this problem is based on the “there is no data like more data” paradigm: emotionspecific databases are recorded and emotion-specific units are segmented. In this thesis, we propose to restart the expressive speech synthesis problem, from its original voice ...
D' Alessandro, N. — Universite de Mons
This thesis presents a system for the interpretation of natural speech which serves as input module for a spoken dialog system. It carries out the task of extracting application-specific pieces of information from the user utterance in order to pass them to the control module of the dialog system. By following the approach of integrating speech recognition and speech interpretation, the system is able to determine the spoken word sequence together with the hierarchical utterance structure that is necessary for the extraction of information directly from the recorded speech signal. The efficient implementation of the underlying decoder is based on the powerful tool of weighted finite state transducers (WFSTs). This tool allows to compile all involved knowledge sources into an optimized network representation of the search space which is constructed dynamically during the ongoing decoding process. In addition to the ...
Lieb, Robert — Technische Universität München
Non-intrusive Quality Evaluation of Speech Processed in Noisy and Reverberant Environments
In many speech applications such as hands-free telephony or voice-controlled home assistants, the distance between the user and the recording microphones can be relatively large. In such a far-field scenario, the recorded microphone signals are typically corrupted by noise and reverberation, which may severely degrade the performance of speech recognition systems and reduce intelligibility and quality of speech in communication applications. In order to limit these effects, speech enhancement algorithms are typically applied. The main objective of this thesis is to develop novel speech enhancement algorithms for noisy and reverberant environments and signal-based measures to evaluate these algorithms, focusing on solutions that are applicable in realistic scenarios. First, we propose a single-channel speech enhancement algorithm for joint noise and reverberation reduction. The proposed algorithm uses a spectral gain to enhance the input signal, where the gain is computed using a ...
Cauchi, Benjamin — University of Oldenburg
Performative Statistical Parametric Speech Synthesis Applied To Interactive Designs
This dissertation introduces interactive designs in the context of statistical parametric synthesis. The objective is to develop methods and designs that enrich the Human-Computer Interaction by enabling computers (or other devices) to have more expressive and adjustable voices. First, we tackle the problem of interactive controls and present a novel method for performative HMM-based synthesis (pHTS). Second, we apply interpolation methods, initially developed for the traditional HMM-based speech synthesis system, in the interactive framework of pHTS. Third, we integrate articulatory control in our interactive approach. Fourth, we present a collection of interactive applications based on our work. Finally, we unify our research into an open source library, Mage. To our current knowledge Mage is the first system for interactive programming of HMM-based synthesis that allows realtime manipulation of all speech production levels. It has been used also in cases that ...
Astrinaki, Maria — University of Mons
Speech Modeling and Robust Estimation for Diagnosis of Parkinson's Disease
According to the Parkinson’s Foundation, more than 10 million people world- wide suffer from Parkinson’s disease (PD). The common symptoms are tremor, muscle rigidity and slowness of movement. There is no cure available cur- rently, but clinical intervention can help alleviate the symptoms significantly. Recently, it has been found that PD can be detected and telemonitored by voice signals, such as sustained phonation /a/. However, the voiced-based PD detector suffers from severe performance degradation in adverse envi- ronments, such as noise, reverberation and nonlinear distortion, which are common in uncontrolled settings. In this thesis, we focus on deriving speech modeling and robust estima- tion algorithms capable of improving the PD detection accuracy in adverse environments. Robust estimation algorithms using parametric modeling of voice signals are proposed. We present both segment-wise and sample-wise robust pitch tracking algorithms using the harmonic model. ...
Shi, Liming — Aalborg University
This thesis deals with several open problems in acoustic echo cancellation and acoustic feedback control. Our main goal has been to develop solutions that provide a high performance and sound quality, and behave in a robust way in realistic conditions. This can be achieved by departing from the traditional ad-hoc methods, and instead deriving theoretically well-founded solutions, based on results from parameter estimation and system identification. In the development of these solutions, the computational efficiency has permanently been taken into account as a design constraint, in that the complexity increase compared to the state-of-the-art solutions should not exceed 50 % of the original complexity. In the context of acoustic echo cancellation, we have investigated the problems of double-talk robustness, acoustic echo path undermodeling, and poor excitation. The two former problems have been tackled by including adaptive decorrelation filters in the ...
van Waterschoot, Toon — Katholieke Universiteit Leuven
An ever-increasing demand for higher mobility, capacity and reliability, together with a definitive compromise with sustainability, are the hallmarks of mobile and wireless communications systems nowadays. Under these premises, smart antenna devices -capable of sensing the electromagnetic environment and suitably adapting its radiation features- are correspondingly called to play a crucial role. In this sense, today's wireless standards consider multiple-antenna techniques in order to exploit space diversity, spatial multiplexing and beamforming to achieve better levels of reliability and capacity. Such advantages, however, are obtained at the expense of increased system complexity which may be unaffordable in terms of size and energy efficiency. Consequently, some technical challenges remain to develop the adequate antenna technologies capable of supporting the aforementioned features in a limited physical space that the mobility demand dictates. The concept of time-modulated array (TMA) is a feasible multi-antenna technique ...
Maneiro-Catoria, Roberto — University of A Coruña
Wavelet Analysis For Robust Speech Processing and Applications
In this work, we study the application of wavelet analysis for robust speech processing. Reliable time-scale features (TS) which characterize the relevant phonetic classes such as voiced (V), unvoiced (UV), silence (S), mixed-excitation, and stop sounds are extracted. By training neural and Bayesian networks, the classification rates provided by only 7 TS features are mostly similar to the ones obtained by 13 MFCC features. The TS features are further enhanced to design a reliable and low-complexity V/UV/S classifier. Quantile filtering and slope tracking are used for deriving adaptive thresholds. A robust voice activity detector is then built and used as a pre-processing stage to improve the performance of a speaker verification system. Based on wavelet shrinkage, a statistical wavelet filtering (SWF) method is designed for speech enhancement. Non-stationary and colored noise is handled by employing quantile filtering and time-frequency adaptive ...
Pham, Van Tuan — Graz University of Technology
Speech Watermarking and Air Traffic Control
Air traffic control (ATC) voice radio communication between aircraft pilots and controllers is subject to technical and functional constraints owing to the legacy radio system currently in use worldwide. This thesis investigates the embedding of digital side information, so called watermarks, into speech signals. Applied to the ATC voice radio, a watermarking system could overcome existing limitations, and ultimately increase safety, security and efficiency in ATC. In contrast to conventional watermarking methods, this field of application allows embedding of the data in perceptually irrelevant signal components. We show that the resulting theoretical watermark capacity far exceeds the capacity of conventional watermarking channels. Based on this finding, we present a general purpose blind speech watermarking algorithm that embeds watermark data in the phase of non-voiced speech segments by replacing the excitation signal of an autoregressive signal representation. Our implementation embeds the ...
Hofbauer, Konrad — Graz University
The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.
The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.