Similar: Bispectral Analysis of Speech Signals

Single-Microphone Multi-Frame Speech Enhancement Exploiting Speech Interframe Correlation

Speech communication devices such as hearing aids or mobile phones are often used in acoustically challenging situations, where the desired speech signal is affected by undesired background noise. Since in these situations speech quality and speech intelligibility may be degraded, speech enhancement algorithms are required to suppress the undesired background noise, while preserving the desired speech signal. In this thesis, we focus on single-microphone speech enhancement algorithms in the short-time Fourier transform domain, more particularly on multi-frame algorithms that aim at exploiting speech correlation across time-frames. In principle, exploiting the speech interframe correlation makes it possible to suppress the undesired background noise, while keeping speech distortion low. Existing single-microphone multi-frame speech enhancement algorithms, such as the multi-frame minimum variance distortionless response (MFMVDR) filter and the multi-frame minimum power distortionless response (MFMPDR) filter, depend on the normalized speech correlation vector, which is ...

Dörte Fischer — University of Oldenburg, Germany

Nonlinear Noise Cancellation

Noise or interference is often assumed to be a random process. Conventional linear filtering, control or prediction techniques are used to cancel or reduce the noise. However, some noise processes have been shown to be nonlinear and deterministic. These nonlinear deterministic noise processes appear to be random when analysed with second order statistics. As nonlinear processes are widespread in nature it may be beneficial to exploit the coherence of the nonlinear deterministic noise with nonlinear filtering techniques. The nonlinear deterministic noise processes used in this thesis are generated from nonlinear difference or differential equations which are derived from real world scenarios. Analysis tools from the theory of nonlinear dynamics are used to determine an appropriate sampling rate of the nonlinear deterministic noise processes and their embedding dimensions. Nonlinear models, such as the Volterra series filter and the radial basis function ...

Strauch, Paul E. — University Of Edinburgh

Nonlinear analysis of speech from a synthesis perspective

With the emergence of nonlinear dynamical systems analysis over recent years it has become clear that conventional time domain and frequency domain approaches to speech synthesis may be far from optimal. Using state space reconstructions of the time domain speech signal it is, at least in theory, possible to investigate a number of invariant geometrical measures for the underlying system which give a more thorough understanding of the dynamics of the system and therefore the form that any model should take. This thesis introduces a number of nonlinear dynamical analysis tools which are then applied to a database of vowels to extract the underlying invariant geometrical properties. The results of this analysis are then applied, using ideas taken from nonlinear dynamics, to the problem of speech synthesis and a novel synthesis technique is described and demonstrated. The tools used for ...

Banbrook, Mike — University Of Edinburgh

Advances in Glottal Analysis and its Applications

From artificial voices in GPS to automatic systems of dictation, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking for efficiency enhancement with simultaneous cost saving, the market of speech technology is forecast to be especially promising in the next years. The present thesis deals with advances in glottal analysis in order to incorporate new techniques within speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, and called glottal flow, is expected to exhibit a relevant complementarity. Unfortunately, glottal analysis from speech recordings requires specific complex processing operations, which explains why it has been generally avoided. The main goal of this thesis is to provide new advances in glottal analysis ...

Drugman, Thomas — Universite de Mons

Prediction and Optimization of Speech Intelligibility in Adverse Conditions

In digital speech-communication systems like mobile phones, public address systems and hearing aids, conveying the message is one of the most important goals. This can be challenging since the intelligibility of the speech may be harmed at various stages before, during and after the transmission process from sender to receiver. Causes which create such adverse conditions include background noise, an unreliable internet connection during a Skype conversation or a hearing impairment of the receiver. To overcome this, many speech-communication systems include speech processing algorithms to compensate for these signal degradations like noise reduction. To determine the effect on speech intelligibility of these signal processing based solutions, the speech signal has to be evaluated by means of a listening test with human listeners. However, such tests are costly and time consuming. As an alternative, reliable and fast machine-driven intelligibility predictors are ...

Taal, Cees — Delft University of Technology

Speech Enhancement Using Data-Driven Concepts

Speech communication frequently suffers from transmitted background noises. Numerous speech enhancement algorithms have thus been proposed to obtain a speech signal with a reduced amount of background noise and better speech quality. In most cases they are analytically derived as spectral weighting rules for given error criteria along with statistical models of the speech and noise spectra. However, as these spectral distributions are indeed not easy to be measured and modeled, such algorithms achieve in practice only a suboptimal performance. In the development of state-of-the-art algorithms, speech and noise training data is commonly exploited for the statistical modeling of the respective spectral distributions. In this thesis, the training data is directly applied to train data-driven speech enhancement algorithms, avoiding any modeling of the spectral distributions. Two applications are proposed: (1) A set of spectral weighting rules is trained from noise ...

Suhadi — Technische Universität Braunschweig

High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion

This Ph.D. thesis focuses on developing a system for high-quality speech synthesis and voice conversion. Vocoder-based speech analysis, manipulation, and synthesis plays a crucial role in various kinds of statistical parametric speech research. Although there are vocoding methods which yield close to natural synthesized speech, they are typically computationally expensive, and are thus not suitable for real-time implementation, especially in embedded environments. Therefore, there is a need for simple and computationally feasible digital signal processing algorithms for generating high-quality and natural-sounding synthesized speech. In this dissertation, I propose a solution to extract optimal acoustic features and a new waveform generator to achieve higher sound quality and conversion accuracy by applying advances in deep learning. The approach remains computationally efficient. This challenge resulted in five thesis groups, which are briefly summarized below. I introduce firstly a new method to shape the ...

Al-Radhi Mohammed Salah — Budapest University of Technology and Economics

Modelling context in automatic speech recognition

Speech is at the core of human communication. Speaking and listening come so naturally to us that we do not have to think about them at all. The underlying cognitive processes are very rapid and almost completely subconscious. It is hard, if not impossible, not to understand speech. For computers, on the other hand, recognising speech is a daunting task. They have to deal with a large number of different voices (influenced, among other things, by emotion, moods and fatigue), the acoustic properties of different environments, dialects, a huge vocabulary and an unlimited creativity of speakers to combine words and to break the rules of grammar. Almost all existing automatic speech recognisers use statistics over speech sounds (what is the probability that a piece of audio is an a-sound?) and statistics over word combinations to deal with this complexity. The ...

Wiggers, Pascal — Delft University of Technology

Adaptation of statistical models for single channel source separation. Application to voice / music separation in songs

Single channel source separation is a fairly recent problem of constantly growing interest in the scientific world. However, this problem is still very far from being solved, and moreover, it cannot be solved in full generality. Indeed, since this problem is highly underdetermined, the main difficulty is that very strong knowledge about the sources is required to be able to separate them. For a large class of existing separation methods, this knowledge is expressed by statistical source models, notably Gaussian Mixture Models (GMM), which are learned from some training examples. The subject of this work is to study the separation methods based on statistical models in general, and then to apply them to the particular problem of separating singing voice from background music in mono recordings of songs. It can be very useful to propose some satisfactory ...

OZEROV, Alexey — University of Rennes 1

Non-linear Spatial Filtering for Multi-channel Speech Enhancement

A large part of human speech communication takes place in noisy environments and is supported by technical devices. For example, a hearing-impaired person might use a hearing aid to take part in a conversation in a busy restaurant. These devices, but also telecommunication in noisy environments or voice-controlled assistants, make use of speech enhancement and separation algorithms that improve the quality and intelligibility of speech by separating speakers and suppressing background noise as well as other unwanted effects such as reverberation. If the devices are equipped with more than one microphone, which is very common nowadays, then multi-channel speech enhancement approaches can leverage spatial information in addition to single-channel tempo-spectral information to perform the task. Traditionally, linear spatial filters, so-called beamformers, have been employed to suppress the signal components from other than the target direction and thereby enhance the desired ...

Tesch, Kristina — Universität Hamburg

Discrete Quadratic Time-Frequency Distributions: Definition, Computation, and a Newborn Electroencephalogram Application

Most signal processing methods were developed for continuous signals. Digital devices, such as the computer, process only discrete signals. This dissertation proposes new techniques to accurately define and efficiently implement an important signal processing method, the time-frequency distribution (TFD), using discrete signals. The TFD represents a signal in the joint time-frequency domain. Because these distributions are a function of both time and frequency they, unlike traditional signal processing methods, can display frequency content that changes over time. TFDs have been used successfully in many signal processing applications as almost all real-world signals have time-varying frequency content. Although TFDs are well defined for continuous signals, defining and computing a TFD for discrete signals is problematic. This work overcomes these problems by making contributions to the definition, computation, and application of discrete TFDs. The first contribution is a new discrete definition of TFDs. A ...

O' Toole, John M. — University of Queensland

Heart rate variability : linear and nonlinear analysis with applications in human physiology

Cardiovascular diseases are a growing problem in today’s society. The World Health Organization (WHO) reported that these diseases make up about 30% of total global deaths and that heart diseases have no geographic, gender or socioeconomic boundaries. Therefore, detecting cardiac irregularities at an early stage and providing correct treatment are very important. However, this requires a good physiological understanding of the cardiovascular system. The heart is stimulated electrically by the brain via the autonomic nervous system, where sympathetic and vagal pathways are always interacting and modulating heart rate. Continuous monitoring of the heart activity is obtained by means of an electrocardiogram (ECG). Studying the fluctuations of heart beat intervals over time reveals a lot of information and is called heart rate variability (HRV) analysis. A reduction of HRV has been reported in several cardiological and noncardiological diseases. Moreover, HRV also has a prognostic ...

Vandeput, Steven — KU Leuven

Glottal Source Estimation and Automatic Detection of Dysphonic Speakers

Among all the biomedical signals, speech is among the most complex ones since it is produced and received by humans. The extraction and the analysis of the information conveyed by this signal are the basis of many applications, including the topics discussed in this thesis: the estimation of the glottal source and the automatic detection of voice pathologies. In the first part of the thesis, after a presentation of existing methods for the estimation of the glottal source, the focus is on the occurrence of irregular glottal source estimations when the representation based on the Zeros of the Z-Transform (ZZT) is used. As this method is sensitive to the location of the analysis window, it is proposed to regularize the estimation by shifting the analysis window around its initial location. The best shift is found by using a dynamic ...

Dubuisson, Thomas — University of Mons

Audiovisual Speech Synthesis Based on Hidden Markov Models

In this dissertation, new methods for audiovisual speech synthesis using Hidden Markov Models (HMMs) are presented and their properties are investigated. The problem of audiovisual speech synthesis is to computationally generate both audible speech as well as a matching facial animation or video (a “visual speech signal”) for any given input text. This results in “talking heads” that can read any text to a user, with applications ranging from virtual agents in human-computer interaction to characters in animated films and computer games. For recording and playback of facial motion, an optical marker-based facial motion capturing hardware system and 3D animation software are employed, which represent the state of the art in the animation industry. For modeling the acoustic and motion parameters of the synchronously recorded speech data, an existing HMM-based acoustic speech synthesis framework has been extended to the visual ...

Schabus, Dietmar — Graz University of Technology, Signal Processing and Speech Communication Laboratory

Subspace-based quantification of magnetic resonance spectroscopy data using biochemical prior knowledge

Nowadays, Nuclear Magnetic Resonance (NMR) is widely used in oncology as a non-invasive diagnostic tool in order to detect the presence of tumor regions in the human body. An application of NMR is Magnetic Resonance Imaging, which is applied in routine clinical practice to localize tumors and determine their size. Magnetic Resonance Imaging is able to provide an initial diagnosis, but its ability to delineate anatomical and pathological information is significantly improved by its combination with another NMR application, namely Magnetic Resonance Spectroscopy. The latter reveals information on the biochemical profile of tissues, thereby allowing clinicians and radiologists to identify in a non-invasive way the different tissue types characterizing the sample under investigation, and to study the biochemical changes underlying a pathological situation. In particular, an NMR application exists which provides spatial as well as biochemical information. This application is called ...

Laudadio, Teresa — Katholieke Universiteit Leuven

Bispectral Analysis of Speech Signals (1996)