Glottal Source Estimation and Automatic Detection of Dysphonic Speakers

Among all the biomedical signals, speech is among the most complex ones since it is produced and received by humans. The extraction and the analysis of the information conveyed by this signal are the basis of many applications, including the topics discussed in this thesis: the estimation of the glottal source and the automatic detection of voice pathologies. In the first part of the thesis, after a presentation of existing methods for the estimation of the glottal source, the focus is on the occurrence of irregular glottal source estimations when the representation based on the Zeros of the Z-Transform (ZZT) is used. As this method is sensitive to the location of the analysis window, it is proposed to regularize the estimation by shifting the analysis window around its initial location. The best shift is found by using a dynamic ...

Dubuisson, Thomas — University of Mons


Oscillator-plus-Noise Modeling of Speech Signals

In this thesis we examine the autonomous oscillator model for synthesis of speech signals. The contributions comprise an analysis of realizations and training methods for the nonlinear function used in the oscillator model, the combination of the oscillator model with inverse filtering, both significantly increasing the number of 'successfully' re-synthesized speech signals, and the introduction of a new technique suitable for the re-generation of the noise-like signal component in speech signals. Nonlinear function models are compared in a one-dimensional modeling task regarding their suitability for adequate re-synthesis of speech signals, in particular considering stability. The considerations also comprise the structure of the nonlinear functions, with the aspect of the possible interpolation between models for different speech sounds. Both regarding stability of the oscillator and the premise of a nonlinear function structure that may be pre-defined, RBF networks are found a ...

Rank, Erhard — Vienna University of Technology


Advances in Glottal Analysis and its Applications

From artificial voices in GPS to automatic dictation systems, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking efficiency gains together with cost savings, the market of speech technology is forecast to be especially promising in the next years. The present thesis deals with advances in glottal analysis in order to incorporate new techniques within speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, called the glottal flow, is expected to provide relevant complementary information. Unfortunately, glottal analysis from speech recordings requires specific complex processing operations, which explains why it has been generally avoided. The main goal of this thesis is to provide new advances in glottal analysis ...

Drugman, Thomas — Universite de Mons


EEG-Biofeedback and Epilepsy: Concept, Methodology and Tools for (Neuro)therapy Planning and Objective Evaluation

Objective diagnosis and therapy evaluation are still challenging tasks for many neurological disorders. This is highly related to the diversity of cases and the variety of treatment modalities available. Especially in the case of epilepsy, which is a complex disorder not well-explained at the biochemical and physiological levels, there is the need for investigations for novel features, which can be extracted and quantified from electrophysiological signals in clinical practice. Neurotherapy is a complementary treatment applied in various disorders of the central nervous system, including epilepsy. The method is subsumed under behavioral medicine and is considered an operant conditioning in psychological terms. Although the application areas of this promising unconventional approach are rapidly increasing, the method is strongly debated, since the neurophysiological underpinnings of the process are not yet well understood. Therefore, verification of the efficacy of the treatment is one ...

Kirlangic, Mehmet Eylem — Technische Universitaet Ilmenau


Joint Source-Cryptographic-Channel Coding for Real-Time Secure Voice Communications on Voice Channels

The growing risk of privacy violation and espionage associated with the rapid spread of mobile communications renewed interest in the original concept of sending encrypted voice as an audio signal over arbitrary voice channels. The usual methods used for encrypted data transmission over analog telephony turned out to be inadequate for modern vocal links (cellular networks, VoIP) equipped with voice compression, voice activity detection, and adaptive noise suppression algorithms. The limited available bandwidth, nonlinear channel distortion, and signal fading motivate the investigation of a dedicated, joint approach for speech encoding and encryption adapted to modern noisy voice channels. This thesis aims to develop, analyze, and validate secure and efficient schemes for real-time speech encryption and transmission via modern voice channels. In addition to speech encryption, this study covers the security and operational aspects of the whole voice communication system, as this ...

Krasnowski, Piotr — Université Côte d'Azur


Glottal-Synchronous Speech Processing

Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up ...

Thomas, Mark — Imperial College London
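
As a rough illustration of the glottal-synchronous framing described in the abstract above, the sketch below cuts frames that start at one GCI and end a fixed number of glottal cycles later. It assumes the GCI sample indices are already available (for example from an EGG-based detector). The function name, the two-cycle frame length, and the Hann taper are illustrative choices, not details taken from the thesis.

```python
import numpy as np

def glottal_synchronous_frames(signal, gci_samples, periods=2):
    """Cut frames spanning `periods` consecutive glottal cycles.

    Hypothetical helper: each frame starts at one GCI and ends at the
    GCI `periods` cycles later, so frame length follows the local pitch
    period instead of being fixed in advance.
    """
    signal = np.asarray(signal, dtype=float)
    gcis = np.asarray(gci_samples, dtype=int)
    frames = []
    for i in range(len(gcis) - periods):
        start, stop = gcis[i], gcis[i + periods]
        # Taper each frame with a Hann window of matching length.
        frames.append(signal[start:stop] * np.hanning(stop - start))
    return frames

# Usage with synthetic data: a 100 Hz tone at 8 kHz has one cycle every 80 samples.
fs = 8000
tone = np.sin(2 * np.pi * 100 * np.arange(400) / fs)
frames = glottal_synchronous_frames(tone, [0, 80, 160, 240, 320], periods=2)
```

With fixed-length framing, the same signal would be cut at arbitrary points in the glottal cycle; here every frame boundary coincides with a GCI.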


Realtime and Accurate Musical Control of Expression in Voice Synthesis

In the early days of speech synthesis research, understanding voice production attracted the attention of scientists with the goal of producing intelligible speech. Later, the need to produce more natural voices led researchers to use prerecorded voice databases, containing speech units, reassembled by a concatenation algorithm. With the growth of computing capacity, the length of units increased, going from diphones to non-uniform units, in the so-called unit selection framework, using a strategy referred to as 'take the best, modify the least'. Today the new challenge in voice synthesis is the production of expressive speech or singing. The mainstream solution to this problem is based on the “there is no data like more data” paradigm: emotion-specific databases are recorded and emotion-specific units are segmented. In this thesis, we propose to restart the expressive speech synthesis problem, from its original voice ...

D'Alessandro, N. — Universite de Mons


Gait Analysis in Unconstrained Environments

Gait can be defined as an individual's manner of walking. Its analysis can provide significant information about their identity and health, opening a wide range of possibilities in the fields of biometric recognition and medical diagnosis. In the field of biometrics, the use of gait to perform recognition can provide advantages, such as acquisition from a distance and without the cooperation of the individual being observed. In the field of medicine, gait analysis can be used to detect or assess the development of different gait-related pathologies. It can also be used to assess neurological or systemic disorders, as their effects are reflected in the individual's gait. This thesis focuses on performing gait analysis in unconstrained environments, using a single 2D camera. This can be a challenging task due to the lack of depth information and self-occlusions in a 2D ...

Tanmay Tulsidas Verlekar — UNIVERSIDADE DE LISBOA, INSTITUTO SUPERIOR TÉCNICO


Automated quantification of preterm brain maturation using electroencephalography

Around 10 percent of all human births are premature, which means that annually about 15 million babies are born before 37 completed weeks of gestation. About one third of the admissions to the Neonatal Intensive Care Unit (NICU) come from this patient group. Due to complications, 1 million babies die from premature delivery, and it is therefore the most important cause of neonatal death. In general, premature and immature babies have a high risk for neurological abnormalities by maturation in extra-uterine life. Even though improved health care has increased the survival chances of these neonates, they are sensitive to brain damage and, consequently, neurocognitive disabilities. Nowadays, critical information about brain development can be extracted from the electroencephalogram (EEG). Clinical experts visually assess evolving EEG characteristics over both short and long periods to evaluate maturation of patients at risk and, ...

Koolen, Ninah — KU Leuven


Speech Enhancement for Disordered and Substitution Voices

This thesis presents methods to enhance the speech of patients with voice disorders or with substitution voices. The first method enhances speech of patients with laryngeal neoplasm. The enhancement enables a reduction of pitch and a strengthening of the harmonics of voiced segments as well as decreasing the perceived speaking effort. The need for reliable pitch mark determination on disordered and substitution voices led to the implementation of a state-space based algorithm. Its performance is comparable to that of a state-of-the-art pitch detection algorithm, but it does not require post-processing. A subsequent part of the thesis deals with alaryngeal speech, with a focus on Electro-Larynx (EL) speech. After investigating an EL speech production model, which takes into account the common source of the speech signal and the directly radiated EL (DREL) sound, a solution to suppress the direct sound is based ...

Hagmuller, Martin — Graz University of Technology


Acoustic Event Detection: Feature, Evaluation and Dataset Design

It takes more time to think of a silent scene, action or event than to find one that emanates sound. Not only speaking or playing music but almost everything that happens is accompanied by or results in one or more sounds mixed together. This makes acoustic event detection (AED) one of the most researched topics in audio signal processing nowadays, and it will probably not see a decline anywhere in the near future. This is due to the thirst for understanding and digitally abstracting more and more events in life via the enormous amount of recorded audio through thousands of applications in our daily routine. But it is also a result of two intrinsic properties of audio: it does not need a direct line of sight to be perceived and is less intrusive to record when compared to image or video. Many applications such ...

Mina Mounir — KU Leuven, ESAT STADIUS


Acoustic sensor network geometry calibration and applications

In the modern world, we are increasingly surrounded by computation devices with communication links and one or more microphones. Such devices are, for example, smartphones, tablets, laptops or hearing aids. These devices can work together as nodes in an acoustic sensor network (ASN). Such networks are a growing platform that opens the possibility for many practical applications. ASN-based speech enhancement, source localization, and event detection can be applied for teleconferencing, camera control, automation, or assisted living. For these kinds of applications, the awareness of auditory objects and their spatial positioning are key properties. In order to provide these two kinds of information, novel methods have been developed in this thesis. Information on the type of auditory objects is provided by a novel real-time sound classification method. Information on the position of human speakers is provided by a novel localization ...

Plinge, Axel — TU Dortmund University


Advanced equalization techniques for DMT-based systems

Digital subscriber line (DSL) technology is one of the fastest growing broadband internet access media. Whereas asymmetric DSL (ADSL) already offers data rates of a few megabits per second, next-generation ADSL2+ and VDSL promise even higher bit rates to support so-called triple play (high-quality video, voice and high-speed data). The use of a large bandwidth over the phone line (up to 12 MHz for VDSL) induces impairments, such as severe channel distortion, echo, narrow-band radiofrequency interference (RFI) and crosstalk from other DSL systems. DSL communication makes use of so-called discrete multitone (DMT) modulation, supplemented with advanced digital signal processing algorithms, to tackle these impairments and serve a maximum number of customers. In this thesis, we focus on channel equalization and RFI mitigation algorithms that outperform existing algorithms in terms of bit rate. DMT equalization is typically done by means of ...

Vanbleu, Koen — Katholieke Universiteit Leuven


Integrating monaural and binaural cues for sound localization and segregation in reverberant environments

The problem of segregating a sound source of interest from an acoustic background has been extensively studied due to applications in hearing prostheses, robust speech/speaker recognition and audio information retrieval. Computational auditory scene analysis (CASA) approaches the segregation problem by utilizing grouping cues involved in the perceptual organization of sound by human listeners. Binaural processing, where input signals resemble those that enter the two ears, is of particular interest in the CASA field. The dominant approach to binaural segregation has been to derive spatially selective filters in order to enhance the signal in a direction of interest. As such, the problems of sound localization and sound segregation are closely tied. While spatial filtering has been widely utilized, substantial performance degradation is incurred in reverberant environments and more fundamentally, segregation cannot be performed without sufficient spatial separation between sources. This dissertation ...

Woodruff, John — The Ohio State University


Speech Modeling and Robust Estimation for Diagnosis of Parkinson's Disease

According to the Parkinson’s Foundation, more than 10 million people worldwide suffer from Parkinson’s disease (PD). The common symptoms are tremor, muscle rigidity and slowness of movement. There is no cure available currently, but clinical intervention can help alleviate the symptoms significantly. Recently, it has been found that PD can be detected and telemonitored by voice signals, such as sustained phonation /a/. However, the voice-based PD detector suffers from severe performance degradation in adverse environments, such as noise, reverberation and nonlinear distortion, which are common in uncontrolled settings. In this thesis, we focus on deriving speech modeling and robust estimation algorithms capable of improving the PD detection accuracy in adverse environments. Robust estimation algorithms using parametric modeling of voice signals are proposed. We present both segment-wise and sample-wise robust pitch tracking algorithms using the harmonic model. ...

Shi, Liming — Aalborg University
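
To make the harmonic model mentioned in the abstract above more concrete, the sketch below estimates pitch for a single segment by a plain least-squares grid search: for each candidate fundamental, the frame is projected onto sines and cosines at its first few harmonics, and the candidate that explains the most energy is kept. This is a textbook harmonic-model estimator, not the robust algorithms proposed in the thesis. The function name, the number of harmonics, and the candidate grid are assumptions made for illustration.

```python
import numpy as np

def harmonic_pitch_estimate(frame, fs, f0_grid, num_harmonics=5):
    """Pick the candidate f0 whose harmonic basis best fits the frame.

    Hypothetical helper: fits the frame by least squares with cos/sin
    terms at k * f0 (k = 1..num_harmonics) and returns the f0 whose fit
    captures the most signal energy.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    best_f0, best_energy = None, -np.inf
    for f0 in f0_grid:
        # Angular frequencies (radians per sample) of the harmonics.
        omegas = 2 * np.pi * f0 * np.arange(1, num_harmonics + 1) / fs
        phase = np.outer(n, omegas)
        basis = np.hstack([np.cos(phase), np.sin(phase)])
        coeffs, *_ = np.linalg.lstsq(basis, frame, rcond=None)
        energy = np.sum((basis @ coeffs) ** 2)
        if energy > best_energy:
            best_f0, best_energy = f0, energy
    return best_f0

# Usage: a synthetic 50 ms voiced-like frame with a 150 Hz fundamental.
fs = 8000
n = np.arange(400)
frame = sum(np.cos(2 * np.pi * k * 150 * n / fs) / k for k in range(1, 6))
estimate = harmonic_pitch_estimate(frame, fs, np.arange(80, 400, 1.0))
```

The grid search is done independently per segment, so it has no temporal smoothing and no protection against noise or reverberation; those are the conditions the thesis targets.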
