Bispectral Analysis of Speech Signals (1996)
Noise or interference is often assumed to be a random process. Conventional linear filtering, control or prediction techniques are used to cancel or reduce the noise. However, some noise processes have been shown to be nonlinear and deterministic. These nonlinear deterministic noise processes appear to be random when analysed with second order statistics. As nonlinear processes are widespread in nature it may be beneficial to exploit the coherence of the nonlinear deterministic noise with nonlinear filtering techniques. The nonlinear deterministic noise processes used in this thesis are generated from nonlinear difference or differential equations which are derived from real world scenarios. Analysis tools from the theory of nonlinear dynamics are used to determine an appropriate sampling rate of the nonlinear deterministic noise processes and their embedding dimensions. Nonlinear models, such as the Volterra series filter and the radial basis function ...
Strauch, Paul E. — University Of Edinburgh
Nonlinear analysis of speech from a synthesis perspective
With the emergence of nonlinear dynamical systems analysis over recent years it has become clear that conventional time domain and frequency domain approaches to speech synthesis may be far from optimal. Using state space reconstructions of the time domain speech signal it is, at least in theory, possible to investigate a number of invariant geometrical measures for the underlying system which give a more thorough understanding of the dynamics of the system and therefore the form that any model should take. This thesis introduces a number of nonlinear dynamical analysis tools which are then applied to a database of vowels to extract the underlying invariant geometrical properties. The results of this analysis are then applied, using ideas taken from nonlinear dynamics, to the problem of speech synthesis and a novel synthesis technique is described and demonstrated. The tools used for ...
Banbrook, Mike — University Of Edinburgh
Advances in Glottal Analysis and its Applications
From artificial voices in GPS to automatic systems of dictation, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking for efficiency enhancement with simultaneous cost saving, the market of speech technology is forecast to be especially promising in the next years. The present thesis deals with advances in glottal analysis in order to incorporate new techniques within speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, and called glottal flow, is expected to exhibit a relevant complementarity. Unfortunately, glottal analysis from speech recordings requires specific complex processing operations, which explains why it has been generally avoided. The main goal of this thesis is to provide new advances in glottal analysis ...
Drugman, Thomas — Universite de Mons
Prediction and Optimization of Speech Intelligibility in Adverse Conditions
In digital speech-communication systems like mobile phones, public address systems and hearing aids, conveying the message is one of the most important goals. This can be challenging since the intelligibility of the speech may be harmed at various stages before, during and after the transmission process from sender to receiver. Causes which create such adverse conditions include background noise, an unreliable internet connection during a Skype conversation or a hearing impairment of the receiver. To overcome this, many speech-communication systems include speech processing algorithms to compensate for these signal degradations like noise reduction. To determine the effect on speech intelligibility of these signal processing based solutions, the speech signal has to be evaluated by means of a listening test with human listeners. However, such tests are costly and time consuming. As an alternative, reliable and fast machine-driven intelligibility predictors are ...
Taal, Cees — Delft University of Technology
Speech Enhancement Using Data-Driven Concepts
Speech communication frequently suffers from transmitted background noises. Numerous speech enhancement algorithms have thus been proposed to obtain a speech signal with a reduced amount of background noise and better speech quality. In most cases they are analytically derived as spectral weighting rules for given error criteria along with statistical models of the speech and noise spectra. However, as these spectral distributions are indeed not easy to be measured and modeled, such algorithms achieve in practice only a suboptimal performance. In the development of state-of-the-art algorithms, speech and noise training data is commonly exploited for the statistical modeling of the respective spectral distributions. In this thesis, the training data is directly applied to train data-driven speech enhancement algorithms, avoiding any modeling of the spectral distributions. Two applications are proposed: (1) A set of spectral weighting rules is trained from noise ...
Suhadi — Technische Universität Braunschweig
Modelling context in automatic speech recognition
Speech is at the core of human communication. Speaking and listing comes so natural to us that we do not have to think about it at all. The underlying cognitive processes are very rapid and almost completely subconscious. It is hard, if not impossible not to understand speech. For computers on the other hand, recognising speech is a daunting task. It has to deal with a large number of different voices "influenced, among other things, by emotion, moods and fatigue" the acoustic properties of different environments, dialects, a huge vocabulary and an unlimited creativity of speakers to combine words and to break the rules of grammar. Almost all existing automatic speech recognisers use statistics over speech sounds "what is the probability that a piece of audio is an a-sound" and statistics over word combinations to deal with this complexity. The ...
Wiggers, Pascal — Delft University of Technology
Single channel source separation is a quite recent problem of constantly growing interest in the scientific world. However, this problem is still very far to be solved, and even more, it cannot be solved in all its generality. Indeed, since this problem is highly underdetermined, the main difficulty is that a very strong knowledge about the sources is required to be able to separate them. For a grand class of existing separation methods, this knowledge is expressed by statistical source models, notably Gaussian Mixture Models (GMM), which are learned from some training examples. The subject of this work is to study the separation methods based on statistical models in general, and then to apply them to the particular problem of separating singing voice from background music in mono recordings of songs. It can be very useful to propose some satisfactory ...
OZEROV, Alexey — University of Rennes 1
Non-linear Spatial Filtering for Multi-channel Speech Enhancement
A large part of human speech communication takes place in noisy environments and is supported by technical devices. For example, a hearing-impaired person might use a hearing aid to take part in a conversation in a busy restaurant. These devices, but also telecommunication in noisy environments or voiced-controlled assistants, make use of speech enhancement and separation algorithms that improve the quality and intelligibility of speech by separating speakers and suppressing background noise as well as other unwanted effects such as reverberation. If the devices are equipped with more than one microphone, which is very common nowadays, then multi-channel speech enhancement approaches can leverage spatial information in addition to single-channel tempo-spectral information to perform the task. Traditionally, linear spatial filters, so-called beamformers, have been employed to suppress the signal components from other than the target direction and thereby enhance the desired ...
Tesch, Kristina — Universität Hamburg
Most signal processing methods were developed for continuous signals. Digital devices, such as the computer, process only discrete signals. This dissertation proposes new techniques to accurately define and efficiently implement an important signal processing method---the time--frequency distribution (TFD)---using discrete signals. The TFD represents a signal in the joint time--frequency domain. Because these distributions are a function of both time and frequency they, unlike traditional signal processing methods, can display frequency content that changes over time. TFDs have been used successfully in many signal processing applications as almost all real-world signals have time-varying frequency content. Although TFDs are well defined for continuous signals, defining and computing a TFD for discrete signals is problematic. This work overcomes these problems by making contributions to the definition, computation, and application of discrete TFDs. The first contribution is a new discrete definition of TFDs. A ...
O' Toole, John M. — University of Queensland
Heart rate variability : linear and nonlinear analysis with applications in human physiology
Cardiovascular diseases are a growing problem in today’s society. The World Health Organization (WHO) reported that these diseases make up about 30% of total global deaths and that heart diseases have no geographic, gender or socioeconomic boundaries. Therefore, detecting cardiac irregularities early-stage and a correct treatment are very important. However, this requires a good physiological understanding of the cardiovascular system. The heart is stimulated electrically by the brain via the autonomic nervous system, where sympathetic and vagal pathways are always interacting and modulating heart rate. Continuous monitoring of the heart activity is obtained by means of an ElectroCardioGram (ECG). Studying the fluctuations of heart beat intervals over time reveals a lot of information and is called heart rate variability (HRV) analysis. A reduction of HRV has been reported in several cardiological and noncardiological diseases. Moreover, HRV also has a prognostic ...
Vandeput, Steven — KU Leuven
Glottal Source Estimation and Automatic Detection of Dysphonic Speakers
Among all the biomedical signals, speech is among the most complex ones since it is produced and received by humans. The extraction and the analysis of the information conveyed by this signal are the basis of many applications, including the topics discussed in this thesis: the estimation of the glottal source and the automatic detection of voice pathologies. In the first part of the thesis, after a presentation of existing methods for the estimation of the glottal source, a focus is made on the occurence of irregular glottal source estimations when the representation based on the Zeros of the Z-Transform (ZZT) is concerned. As this method is sensitive to the location of the analysis window, it is proposed to regularize the estimation by shifting the analysis window around its initial location. The best shift is found by using a dynamic ...
Dubuisson, Thomas — University of Mons
Acoustic sensor network geometry calibration and applications
In the modern world, we are increasingly surrounded by computation devices with communication links and one or more microphones. Such devices are, for example, smartphones, tablets, laptops or hearing aids. These devices can work together as nodes in an acoustic sensor network (ASN). Such networks are a growing platform that opens the possibility for many practical applications. ASN based speech enhancement, source localization, and event detection can be applied for teleconferencing, camera control, automation, or assisted living. For this kind of applications, the awareness of auditory objects and their spatial positioning are key properties. In order to provide these two kinds of information, novel methods have been developed in this thesis. Information on the type of auditory objects is provided by a novel real-time sound classification method. Information on the position of human speakers is provided by a novel localization ...
Plinge, Axel — TU Dortmund University
Nowadays, Nuclear Magnetic Resonance (NMR) is widely used in oncology as a non-invasive diagnostic tool in order to detect the presence of tumor regions in the human body. An application of NMR is Magnetic Resonance Imaging, which is applied in routine clinical practice to localize tumors and determine their size. Magnetic Resonance Imaging is able to provide an initial diagnosis, but its ability to delineate anatomical and pathological information is significantly improved by its combination with another NMR application, namely Magnetic Resonance Spectroscopy. The latter reveals information on the biochemical profile tissues, thereby allowing clinicians and radiologists to identify in a non{invasive way the different tissue types characterizing the sample under investigation, and to study the biochemical changes underlying a pathological situation. In particular, an NMR application exists which provides spatial as well as biochemical information. This application is called ...
Laudadio, Teresa — Katholieke Universiteit Leuven
Information Fusion for Improved Motion Estimation
Motion Estimation is an important research field with many commercial applications including surveillance, navigation, robotics, and image compression. As a result, the field has received a great deal of attention and there exist a wide variety of Motion Estimation techniques which are often specialised for particular problems. The relative performance of these techniques, in terms of both accuracy and of computational requirements, is often found to be data dependent, and no single technique is known to outperform all others for all applications under all conditions. Information Fusion strategies seek to combine the results of different classifiers or sensors to give results of a better quality for a given problem than can be achieved by any single technique alone. Information Fusion has been shown to be of benefit to a number of applications including remote sensing, personal identity recognition, target detection, ...
Peacock, Andrew Mark — University Of Edinburgh
Audio Visual Speech Enhancement
This thesis presents a novel approach to speech enhancement by exploiting the bimodality of speech production and the correlation that exists between audio and visual speech information. An analysis into the correlation of a range of audio and visual features reveals significant correlation to exist between visual speech features and audio filterbank features. The amount of correlation was also found to be greater when the correlation is analysed with individual phonemes rather than across all phonemes. This led to building a Gaussian Mixture Model (GMM) that is capable of estimating filterbank features from visual features. Phoneme-specific GMMs gave lower filterbank estimation errors and a phoneme transcription is decoded using audio-visual Hidden Markov Model (HMM). Clean filterbank estimates along with mean noise estimates were then utilised to construct visually-derived Wiener filters that are able to enhance noisy speech. The mean noise ...
Almajai, Ibrahim — University of East Anglia
The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.
The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.