Zeros of the z-transform (ZZT) representation and chirp group delay processing for the analysis of source and filter characteristics of speech signals (2005)
Glottal Source Estimation and Automatic Detection of Dysphonic Speakers
Of all biomedical signals, speech is among the most complex, since it is produced and received by humans. The extraction and analysis of the information conveyed by this signal are the basis of many applications, including the topics discussed in this thesis: the estimation of the glottal source and the automatic detection of voice pathologies. In the first part of the thesis, after a presentation of existing methods for the estimation of the glottal source, the focus turns to the occurrence of irregular glottal source estimates when the representation based on the Zeros of the Z-Transform (ZZT) is used. As this method is sensitive to the location of the analysis window, it is proposed to regularize the estimation by shifting the analysis window around its initial location. The best shift is found by using a dynamic ...
Dubuisson, Thomas — University of Mons
Realtime and Accurate Musical Control of Expression in Voice Synthesis
In the early days of speech synthesis research, understanding voice production attracted the attention of scientists with the goal of producing intelligible speech. Later, the need to produce more natural voices led researchers to use prerecorded voice databases, containing speech units, reassembled by a concatenation algorithm. With the growth of computing capacity, the length of units increased, going from diphones to non-uniform units, in the so-called unit selection framework, using a strategy referred to as 'take the best, modify the least'. Today the new challenge in voice synthesis is the production of expressive speech or singing. The mainstream solution to this problem is based on the “there is no data like more data” paradigm: emotion-specific databases are recorded and emotion-specific units are segmented. In this thesis, we propose to restart the expressive speech synthesis problem, from its original voice ...
D'Alessandro, N. — Universite de Mons
Advances in Glottal Analysis and its Applications
From artificial voices in GPS to automatic dictation systems, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking greater efficiency together with cost savings, the market of speech technology is forecast to be especially promising in the next years. The present thesis deals with advances in glottal analysis in order to incorporate new techniques within speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, called the glottal flow, is expected to provide relevant complementary information. Unfortunately, glottal analysis from speech recordings requires specific complex processing operations, which explains why it has been generally avoided. The main goal of this thesis is to provide new advances in glottal analysis ...
Drugman, Thomas — Universite de Mons
Glottal-Synchronous Speech Processing
Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up ...
Thomas, Mark — Imperial College London
Radial Basis Function Network Robust Learning Algorithms in Computer Vision Applications
This thesis introduces new learning algorithms for Radial Basis Function (RBF) networks. An RBF network is a feed-forward two-layer neural network used for functional approximation or pattern classification applications. The proposed training algorithms are based on robust statistics. Their theoretical performance has been assessed and compared with that of classical algorithms for training RBF networks. The applications of RBF networks described in this thesis consist of simultaneously modeling moving object segmentation and optical flow estimation in image sequences and 3-D image modeling and segmentation. A Bayesian classifier model is used for the representation of the image sequence and 3-D images. This employs an energy-based description of the probability functions involved. The energy functions are represented by RBF networks whose inputs are various features drawn from the images and whose outputs are objects. The hidden units embed kernel functions. Each kernel ...
Bors, Adrian G. — Aristotle University of Thessaloniki
Multimedia consumer electronics are nowadays everywhere, from teleconferencing, hands-free communications and in-car communications to smart TV applications and more. We are living in a world of telecommunication where ideal scenarios for implementing these applications are hard to find. Instead, practical implementations typically bring many problems associated with each real-life scenario. This thesis mainly focuses on two of these problems, namely, acoustic echo and acoustic feedback. On the one hand, acoustic echo cancellation (AEC) is widely used in mobile and hands-free telephony where the existence of echoes degrades the intelligibility and listening comfort. On the other hand, acoustic feedback limits the maximum amplification that can be applied in, e.g., in-car communications or in conferencing systems, before howling, caused by instability, appears. Even though AEC and acoustic feedback cancellation (AFC) are functional in many applications, there are still open issues. This means that ...
Gil-Cacho, Jose Manuel — KU Leuven
Robust Speech Recognition: Analysis and Equalization of Lombard Effect in Czech Corpora
When exposed to noise, speakers will modify the way they speak in an effort to maintain intelligible communication. This process, which is referred to as the Lombard effect (LE), involves a combination of both conscious and subconscious articulatory adjustment. Speech production variations due to LE can cause considerable degradation in automatic speech recognition (ASR) since they introduce a mismatch between parameters of the speech to be recognized and the ASR system’s acoustic models, which are usually trained on neutral speech. The main objective of this thesis is to analyze the impact of LE on speech production and to propose methods that increase ASR system performance in LE. All presented experiments were conducted on the Czech spoken language, yet the proposed concepts are assumed to be applicable to other languages. The first part of the thesis focuses on the design and acquisition of a ...
Boril, Hynek — Czech Technical University in Prague
The separation of independent sources from mixed observed data is a fundamental and challenging signal processing problem. In many practical situations, one or more desired signals need to be recovered from the mixtures only. A typical example is speech recordings made in an acoustic environment in the presence of background noise and/or competing speakers. Other examples include EEG signals, passive sonar applications and cross-talk in data communications. The audio signal separation problem is sometimes referred to as The Cocktail Party Problem. When several people in the same room are conversing at the same time, it is remarkable that a person is able to choose to concentrate on one of the speakers and listen to his or her speech flow unimpeded. This ability, usually referred to as the binaural cocktail party effect, results in part from binaural (two-eared) hearing. In contrast, ...
Chan, Dominic C. B. — University of Cambridge
Advances in DFT-Based Single-Microphone Speech Enhancement
The interest in the field of speech enhancement emerges from the increased usage of digital speech processing applications like mobile telephony, digital hearing aids and human-machine communication systems in our daily life. The trend to make these applications mobile increases the variety of potential sources for quality degradation. Speech enhancement methods can be used to increase the quality of these speech processing devices and make them more robust under noisy conditions. The name "speech enhancement" refers to a large group of methods that are all meant to improve certain quality aspects of these devices. Examples of speech enhancement algorithms are echo control, bandwidth extension, packet loss concealment and noise reduction. In this thesis we focus on single-microphone additive noise reduction and aim at methods that work in the discrete Fourier transform (DFT) domain. The main objective of the presented research ...
Hendriks, Richard Christian — Delft University of Technology
Spike train discrimination and analysis in neural and surface electromyography (sEMG) applications
The term "spike" is used to describe a short-time event that is the result of the activity of its source. Spikes can be seen in different signal modalities. In these modalities, often more than one source generates spikes. Classification algorithms can be used to group similar spikes, ideally spikes from the same source. This work examines the classification of spikes generated from neurons and muscles. When each detected spike is assigned to its source, the spike trains of these sources can provide information on complex brain network functioning, muscle disorders, and other applications. During the past several decades, there were many attempts to create and improve spike classification algorithms. No matter how advanced these methods are today, errors in classification cannot be avoided. Therefore, methods that would determine and improve reliability of classification are very desirable. In this work, it ...
Gligorijevic, Ivan — KU Leuven
Source-Filter Model Based Single Channel Speech Separation
In a natural acoustic environment, multiple sources are usually active at the same time. The task of source separation is the estimation of individual source signals from this complex mixture. The challenge of single channel source separation (SCSS) is to recover more than one source from a single observation. Basically, SCSS can be divided into methods that try to mimic the human auditory system and model-based methods, which find a probabilistic representation of the individual sources and employ this prior knowledge for inference. This thesis presents several strategies for the separation of two speech utterances mixed into a single channel and is structured in four parts: The first part reviews factorial models in model-based SCSS and introduces the soft-binary mask for signal reconstruction. This mask shows improved performance compared to the soft and the binary masks in automatic speech recognition ...
Stark, Michael — Graz University of Technology
Extraction of efficient and characteristic features of multidimensional time series
In numerous signal processing applications, multiple probes are available, simultaneously delivering information about one or multiple observed processes. The resulting multidimensional time series are often highly redundant and may contain stochastic contributions. The perception of the useful information therefore becomes very difficult and sometimes impossible. Thus, the major concern of this thesis is the development of novel algorithms for the extraction of the salient and characteristic features of multidimensional time series. The proposed algorithms are based on parametric signal processing; namely, we assume that the features of the experimental data can be represented efficiently by a specific model. We present a global framework for the selection of a specific model out of the large span of techniques proposed in the literature. For the selection of the model classes we use, in addition to prior knowledge about ...
Vetter, Rolf — Swiss Federal Institute of Technology
Signal Processing in Phase-Domain All-Digital Phase-Locked Loops
The implementation of wireless transceivers on a single chip in a single technology requires digital realizations of traditional analog building blocks such as phase-locked loops (PLLs). All-digital PLLs (ADPLLs) utilize the zero crossings of signals instead of their amplitudes to realize the frequency synthesizer entirely in digital CMOS technology. This thesis analyzes ADPLLs and highlights the system-level signal processing aspects. A z-domain model and a mixed-signal model are used to develop signal processing algorithms, to perform high-level simulations, and to evaluate the performance of ADPLLs. The impact of imperfections on the output phase noise spectrum is analytically described and compared to event-driven simulation outcomes. Oscillator noise, frequency quantization noise with sigma-delta noise shaping, and reference clock jitter raise the output phase noise level, whereas phase quantization and injection pulling manifest themselves as spurs in the output phase noise spectrum. Furthermore, ...
Mendel, Stefan — Graz University of Technology
Particle Methods for Bayesian Multi-Object Tracking and Parameter Estimation
In this thesis, a number of improvements are established for specific methods that utilize sequential Monte Carlo (SMC), also known as particle filtering (PF), techniques. The first problem is the Bayesian multi-target tracking (MTT) problem, for which we propose the use of non-parametric Bayesian models that are based on time-varying extensions of Dirichlet process (DP) models. The second problem studied in this thesis is an important application area for the proposed DP-based MTT method: the tracking of vocal tract resonance frequencies of speech signals. Lastly, we investigate the SMC-based parameter estimation problem for nonlinear non-Gaussian state space models, in which we provide a performance improvement for the path density based methods by utilizing regularization techniques.
Ozkan, Emre — Middle East Technical University
Cross-Lingual Voice Conversion
Cross-lingual voice conversion refers to the automatic transformation of a source speaker’s voice to a target speaker’s voice in a language that the target speaker cannot speak. It involves a set of statistical analysis, pattern recognition, machine learning, and signal processing techniques. This study focuses on the problems related to cross-lingual voice conversion by discussing open research questions, presenting new methods, and performing comparisons with the state-of-the-art techniques. In the training stage, a Phonetic Hidden Markov Model based automatic segmentation and alignment method is developed for cross-lingual applications which support text-independent and text-dependent modes. The vocal tract transformation function is estimated in more detail using weighted speech frame mapping. By adjusting the weights, similarity to the target voice and output quality can be balanced depending on the requirements of the cross-lingual voice conversion application. A context-matching algorithm is developed to reduce ...
Turk, Oytun — Bogazici University