An Investigation of Nonlinear Speech Synthesis and Pitch Modification Techniques (1999)
Nonlinear analysis of speech from a synthesis perspective
With the emergence of nonlinear dynamical systems analysis over recent years it has become clear that conventional time domain and frequency domain approaches to speech synthesis may be far from optimal. Using state space reconstructions of the time domain speech signal it is, at least in theory, possible to investigate a number of invariant geometrical measures for the underlying system which give a more thorough understanding of the dynamics of the system and therefore the form that any model should take. This thesis introduces a number of nonlinear dynamical analysis tools which are then applied to a database of vowels to extract the underlying invariant geometrical properties. The results of this analysis are then applied, using ideas taken from nonlinear dynamics, to the problem of speech synthesis and a novel synthesis technique is described and demonstrated. The tools used for ...
Banbrook, Mike — University Of Edinburgh
Oscillator-plus-Noise Modeling of Speech Signals
In this thesis we examine the autonomous oscillator model for synthesis of speech signals. The contributions comprise an analysis of realizations and training methods for the nonlinear function used in the oscillator model, the combination of the oscillator model with inverse filtering, both significantly increasing the number of `successfully' re-synthesized speech signals, and the introduction of a new technique suitable for the re-generation of the noise-like signal component in speech signals. Nonlinear function models are compared in a one-dimensional modeling task regarding their presupposition for adequate re-synthesis of speech signals, in particular considering stability. The considerations also comprise the structure of the nonlinear functions, with the aspect of the possible interpolation between models for different speech sounds. Both regarding stability of the oscillator and the premiss of a nonlinear function structure that may be pre-defined, RBF networks are found a ...
Rank, Erhard — Vienna University of Technology
Speech Enhancement for Disordered and Substitution Voices
This thesis presents methods to enhance the speech of patients with voice disorders or with substitution voices. The first method enhances speech of patients with laryngeal neoplasm. The enhancement enables a reduction of pitch and a strengthening of the harmonics of voiced segments as well as decreasing the perceived speaking effort. The need for reliable pitch mark determination on disordered and substitution voices led to the implementation of a state-space based algorithm. Its performance is comparable to a state-of-the art pitch detection algorithm but does not require post processing. A subsequent part of the thesis deals with alaryngeal speech, with a focus on Electro-Larynx (EL) speech. After investigating an EL speech production model, which takes into account the common source of the speech signal and the directly radiated EL (DREL) sound, a solution to suppress the direct sound is based ...
Hagmuller, Martin — Graz University of Technology
Realtime and Accurate Musical Control of Expression in Voice Synthesis
In the early days of speech synthesis research, understanding voice production has attracted the attention of scientists with the goal of producing intelligible speech. Later, the need to produce more natural voices led researchers to use prerecorded voice databases, containing speech units, reassembled by a concatenation algorithm. With the outgrowth of computer capacities, the length of units increased, going from diphones to non-uniform units, in the so-called unit selection framework, using a strategy referred to as 'take the best, modify the least'. Today the new challenge in voice synthesis is the production of expressive speech or singing. The mainstream solution to this problem is based on the “there is no data like more data” paradigm: emotionspecific databases are recorded and emotion-specific units are segmented. In this thesis, we propose to restart the expressive speech synthesis problem, from its original voice ...
D' Alessandro, N. — Universite de Mons
Some Parametric Methods of Speech Processing
Parametric modelling of speech signals finds its use in various speech processing applications. Recently, publications concerning sinusoidal speech modelling have been increasingly appeared in scientific literature. The thesis is mainly devoted to the sinusoidal model with harmonically related component sine waves, i.e. the harmonic model. The main objective is to find new approaches to synthetic speech quality improvement. A novel method for speech spectrum envelope determination is introduced. This method uses a staircase envelope considering the spectral behaviour in voiced as well as unvoiced speech frames. The staircase envelope is smoothed by weighted moving average. The determined envelope is parametrized using autoregressive (AR) model or cepstral coefficients. It has been shown that the new method is of most importance in high-pitch speakers. Besides, new methods or modifications of known methods can be found in pitch synchronization, AR model order selection ...
Pribilova, Anna — Slovak University of Technology
The growing risk of privacy violation and espionage associated with the rapid spread of mobile communications renewed interest in the original concept of sending encrypted voice as audio signal over arbitrary voice channels. The usual methods used for encrypted data transmission over analog telephony turned out to be inadequate for modern vocal links (cellular networks, VoIP) equipped with voice compression, voice activity detection, and adaptive noise suppression algorithms. The limited available bandwidth, nonlinear channel distortion, and signal fadings motivate the investigation of a dedicated, joint approach for speech encoding and encryption adapted to modern noisy voice channels. This thesis aims to develop, analyze, and validate secure and efficient schemes for real-time speech encryption and transmission via modern voice channels. In addition to speech encryption, this study covers the security and operational aspects of the whole voice communication system, as this ...
Krasnowski, Piotr — Université Côte d'Azur
Advances in Glottal Analysis and its Applications
From artificial voices in GPS to automatic systems of dictation, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking for efficiency enhancement with simultaneous cost saving, the market of speech technology is forecast to be especially promising in the next years. The present thesis deals with advances in glottal analysis in order to incorporate new techniques within speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, and called glottal flow, is expected to exhibit a relevant complementarity. Unfortunately, glottal analysis from speech recordings requires specific complex processing operations, which explains why it has been generally avoided. The main goal of this thesis is to provide new advances in glottal analysis ...
Drugman, Thomas — Universite de Mons
The use of High-Order Sparse Linear Prediction for the Restoration of Archived Audio
Since the invention of Gramophone by Thomas Edison in 1877, vast amounts of cultural, entertainment, educational and historical audio recordings have been recorded and stored throughout the world. Through natural aging and improper storage, the recorded signal degrades and loses its information in terms of quality and intelligibility. Degradation of audio signals is considered as any unwanted modification to the audio signal after it has been recorded. There are different degradations affecting recorded signals on analog storage media. The degradations that are often encountered are clicks, hiss and ‘Wow and Flutter’. Several researches have been conducted in restoring degraded audio recordings. Most of the methods rely on some prior information of the underlying data and the degradation process. The success of these methods heavily depends on the prior information available. When such information is not available, a model of the ...
Dufera, Bisrat Derebssa — School of Electrical and Computer Engineering, Addis Ababa Institute of Technology, Addis Ababa University
Statistical Parametric Speech Synthesis Based on the Degree of Articulation
Nowadays, speech synthesis is part of various daily life applications. The ultimate goal of such technologies consists in extending the possibilities of interaction with the machine, in order to get closer to human-like communications. However, current state-of-the-art systems often lack of realism: although high-quality speech synthesis can be produced by many researchers and companies around the world, synthetic voices are generally perceived as hyperarticulated. In any case, their degree of articulation is fixed once and for all. The present thesis falls within the more general quest for enriching expressivity in speech synthesis. The main idea consists in improving statistical parametric speech synthesis, whose most famous example is Hidden Markov Model (HMM) based speech synthesis, by introducing a control of the articulation degree, so as to enable synthesizers to automatically adapt their way of speaking to the contextual situation, like humans ...
Picart, Benjamin — Université de Mons (UMONS)
Bispectral Analysis of Speech Signals
Techniques which utilise a signal’s Higher Order Statistics (HOS) can reveal information about non-Gaussian signals and nonlinearities which cannot be obtained using conventional (second-order) techniques. This information may be useful in speech processing because it may provide clues about how to construct new models of speech production which are better than existing models. There has been a recent surge of interest in the application of HOS techniques to speech processing, but this has been handicapped by a lack of understanding of what the HOS properties of speech signals are. Without this understanding the HOS information which is in speech signals can not be efficiently utilised. This thesis describes an investigation into the use of HOS techniques, in particular the third-order frequency domain measure called the bispectrum, to speech signals Several issues relating to bispectral speech analysis are addressed; including nonlinearity ...
Fackrell, Justin W. A. — University Of Edinburgh
Glottal Source Estimation and Automatic Detection of Dysphonic Speakers
Among all the biomedical signals, speech is among the most complex ones since it is produced and received by humans. The extraction and the analysis of the information conveyed by this signal are the basis of many applications, including the topics discussed in this thesis: the estimation of the glottal source and the automatic detection of voice pathologies. In the first part of the thesis, after a presentation of existing methods for the estimation of the glottal source, a focus is made on the occurence of irregular glottal source estimations when the representation based on the Zeros of the Z-Transform (ZZT) is concerned. As this method is sensitive to the location of the analysis window, it is proposed to regularize the estimation by shifting the analysis window around its initial location. The best shift is found by using a dynamic ...
Dubuisson, Thomas — University of Mons
When the deaf listen to music. Pitch perception with cochlear implants
Cochlear implants (CI) are surgically implanted hearing aids that provide auditory sensations to deaf people through direct electrical stimulation of the auditory nerve. Although relatively good speech understanding can be achieved by implanted subjects, pitch perception by CI subjects is about 50 times worse than observed for normal-hearing (NH) persons. Pitch is, however, important for intonation, music, speech understanding in tonal languages, and for separating multiple simultaneous sound sources. The major goal of this work is to improve pitch perception by CI subjects. In CI subjects two fundamental mechanisms are used for pitch perception: place pitch and temporal pitch. Our results show that place pitch is correlated to the sound¢s brightness because place pitch sensation is related to the centroid of the excitation pattern along the cochlea. The slopes of the excitation pattern determine place pitch sensitivity. Our results also ...
Laneau, Johan — Katholieke Universiteit Leuven
Speech Modeling and Robust Estimation for Diagnosis of Parkinson's Disease
According to the Parkinson’s Foundation, more than 10 million people world- wide suffer from Parkinson’s disease (PD). The common symptoms are tremor, muscle rigidity and slowness of movement. There is no cure available cur- rently, but clinical intervention can help alleviate the symptoms significantly. Recently, it has been found that PD can be detected and telemonitored by voice signals, such as sustained phonation /a/. However, the voiced-based PD detector suffers from severe performance degradation in adverse envi- ronments, such as noise, reverberation and nonlinear distortion, which are common in uncontrolled settings. In this thesis, we focus on deriving speech modeling and robust estima- tion algorithms capable of improving the PD detection accuracy in adverse environments. Robust estimation algorithms using parametric modeling of voice signals are proposed. We present both segment-wise and sample-wise robust pitch tracking algorithms using the harmonic model. ...
Shi, Liming — Aalborg University
Performative Statistical Parametric Speech Synthesis Applied To Interactive Designs
This dissertation introduces interactive designs in the context of statistical parametric synthesis. The objective is to develop methods and designs that enrich the Human-Computer Interaction by enabling computers (or other devices) to have more expressive and adjustable voices. First, we tackle the problem of interactive controls and present a novel method for performative HMM-based synthesis (pHTS). Second, we apply interpolation methods, initially developed for the traditional HMM-based speech synthesis system, in the interactive framework of pHTS. Third, we integrate articulatory control in our interactive approach. Fourth, we present a collection of interactive applications based on our work. Finally, we unify our research into an open source library, Mage. To our current knowledge Mage is the first system for interactive programming of HMM-based synthesis that allows realtime manipulation of all speech production levels. It has been used also in cases that ...
Astrinaki, Maria — University of Mons
Cross-Lingual Voice Conversion
Cross-lingual voice conversion refers to the automatic transformation of a source speaker’s voice to a target speaker’s voice in a language that the target speaker can not speak. It involves a set of statistical analysis, pattern recognition, machine learning, and signal processing techniques. This study focuses on the problems related to cross-lingual voice conversion by discussing open research questions, presenting new methods, and performing comparisons with the state-of-the-art techniques. In the training stage, a Phonetic Hidden Markov Model based automatic segmentation and alignment method is developed for cross-lingual applications which support textindependent and text-dependent modes. Vocal tract transformation function is estimated using weighted speech frame mapping in more detail. Adjusting the weights, similarity to target voice and output quality can be balanced depending on the requirements of the cross- lingual voice conversion application. A context-matching algorithm is developed to reduce ...
Turk, Oytun — Bogazici University
The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.
The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.