Nonlinear analysis of speech from a synthesis perspective (1996)
An Investigation of Nonlinear Speech Synthesis and Pitch Modification Techniques
Speech synthesis technology plays an important role in many aspects of man-machine interaction, particularly in telephony applications. In order to be widely accepted, the synthesised speech quality should be as human-like as possible. This thesis investigates novel techniques for the speech signal generation stage in a speech synthesiser, based on concepts from nonlinear dynamical theory. It focuses on natural-sounding synthesis for voiced speech, coupled with the ability to generate the sound at the required pitch. The one-dimensional voiced speech time-domain signals are embedded into an appropriate higher dimensional space, using Takens’ method of delays. These reconstructed state space representations have approximately the same dynamical properties as the original speech generating system and are thus effective models. A new technique for marking epoch points in voiced speech that operates in the state space domain is proposed. Using the fact that one ...
Mann, Iain — University of Edinburgh
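As an illustration of the method of delays mentioned in the abstract above, the following is a minimal sketch rather than the thesis's own implementation. The function name `delay_embed`, the embedding dimension, the lag, and the sinusoidal test signal are all assumptions chosen for the example; real voiced speech would need these parameters chosen from the data.

```python
import math


def delay_embed(signal, dim, lag):
    """Build delay vectors (x[n], x[n+lag], ..., x[n+(dim-1)*lag]).

    Hypothetical helper: one point in the reconstructed state space
    per valid starting sample n.
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    span = (dim - 1) * lag  # samples covered by one delay vector
    if len(signal) <= span:
        raise ValueError("signal too short for this dim and lag")
    return [
        tuple(signal[n + k * lag] for k in range(dim))
        for n in range(len(signal) - span)
    ]


# Toy stand-in for a voiced frame (assumption): 100 Hz sinusoid at 8 kHz.
fs, f0 = 8000, 100
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]

# A lag of 20 samples is a quarter of the 80-sample period, so the
# embedded sinusoid traces a closed loop in the 3-dimensional space.
points = delay_embed(x, dim=3, lag=20)
```

Each tuple in `points` is one state vector; for a periodic signal the trajectory closes on itself once per pitch period.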
Oscillator-plus-Noise Modeling of Speech Signals
In this thesis we examine the autonomous oscillator model for synthesis of speech signals. The contributions comprise an analysis of realizations and training methods for the nonlinear function used in the oscillator model, the combination of the oscillator model with inverse filtering, both significantly increasing the number of 'successfully' re-synthesized speech signals, and the introduction of a new technique suitable for the re-generation of the noise-like signal component in speech signals. Nonlinear function models are compared in a one-dimensional modeling task regarding their presupposition for adequate re-synthesis of speech signals, in particular considering stability. The considerations also comprise the structure of the nonlinear functions, with the aspect of the possible interpolation between models for different speech sounds. Both regarding stability of the oscillator and the premise of a nonlinear function structure that may be pre-defined, RBF networks are found a ...
Rank, Erhard — Vienna University of Technology
Statistical Parametric Speech Synthesis Based on the Degree of Articulation
Nowadays, speech synthesis is part of various daily life applications. The ultimate goal of such technologies consists in extending the possibilities of interaction with the machine, in order to get closer to human-like communications. However, current state-of-the-art systems often lack realism: although high-quality speech synthesis can be produced by many researchers and companies around the world, synthetic voices are generally perceived as hyperarticulated. In any case, their degree of articulation is fixed once and for all. The present thesis falls within the more general quest for enriching expressivity in speech synthesis. The main idea consists in improving statistical parametric speech synthesis, whose most famous example is Hidden Markov Model (HMM) based speech synthesis, by introducing a control of the articulation degree, so as to enable synthesizers to automatically adapt their way of speaking to the contextual situation, like humans ...
Picart, Benjamin — Université de Mons (UMONS)
Nonlinear processing of non-Gaussian stochastic and chaotic deterministic time series
It is often assumed that interference or noise signals are Gaussian stochastic processes. Gaussian noise models are appealing as they usually result in noise suppression algorithms that are simple: i.e. linear and closed form. However, such linear techniques may be sub-optimal when the noise process is either a non-Gaussian stochastic process or a chaotic deterministic process. In the event of encountering such noise processes, improvements in noise suppression, relative to the performance of linear methods, may be achievable using nonlinear signal processing techniques. The application of interest for this thesis is maritime surveillance radar, where the main source of interference, termed sea clutter, is widely accepted to be a non-Gaussian stochastic process at high resolutions and/or at low grazing angles. However, evidence has been presented during the last decade which suggests that sea clutter may be better modelled as a ...
Cowper, Mark — University of Edinburgh
Forensic Evaluation of the Evidence Using Automatic Speaker Recognition Systems
This Thesis is focused on the use of automatic speaker recognition systems for forensic identification, in what is called forensic automatic speaker recognition. More generally, forensic identification aims at individualization, defined as the certainty of distinguishing an object or person from any other in a given population. This objective is followed by the analysis of the forensic evidence, understood as the comparison between two samples of material, such as glass, blood, speech, etc. An automatic speaker recognition system can be used in order to perform such comparison between some recovered speech material of questioned origin (e.g., an incriminating wire-tapping) and some control speech material coming from a suspect (e.g., recordings acquired in police facilities). However, the evaluation of such evidence is not a trivial issue at all. In fact, the debate about the presentation of forensic evidence in a court ...
Ramos, Daniel — Universidad Autonoma de Madrid
Sigma Delta Modulation Of A Chaotic Signal
Sigma delta modulation has become a widespread method of analogue to digital conversion; however, its operation has not been completely defined. The majority of the analysis carried out on the circuit has been from a linear standpoint, with non-linear analysis hinting at hidden complexities in the modulator’s operation. The sigma delta modulator itself is a non-linear system consisting, as it does, of a number of integrators and a one bit quantiser in a feedback loop. This configuration can be generalised as a non-linearity within a feedback path, which is a classic route to chaotic behaviour. This initially raises the prospect that a sigma delta modulator may be capable of chaotic modes of operation with a non-chaotic input. In fact, the problem does not arise and we show why not. To facilitate this investigation, a set of differential equations is formulated ...
Ushaw, Gary — University of Edinburgh
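The feedback structure described in the abstract above (integrators plus a one bit quantiser in a loop) can be sketched in its simplest form as a first-order, discrete-time modulator. This is an illustrative assumption, not the model used in the thesis, which formulates differential equations; the function name `sigma_delta_first_order` and the constant test input are hypothetical.

```python
def sigma_delta_first_order(samples):
    """First-order sigma delta loop: one integrator, one-bit quantiser.

    Assumes input samples lie in [-1, 1]. Returns a list of +1.0 / -1.0
    values whose local average tracks the input.
    """
    integrator = 0.0
    feedback = 0.0  # previous quantiser output, fed back to the input
    bits = []
    for x in samples:
        # Integrate the error between the input and the fed-back bit.
        integrator += x - feedback
        # One-bit quantiser: the non-linearity inside the feedback path.
        feedback = 1.0 if integrator >= 0.0 else -1.0
        bits.append(feedback)
    return bits


# Constant input of 0.5 (assumption chosen for illustration).
bits = sigma_delta_first_order([0.5] * 1000)
mean = sum(bits) / len(bits)
```

Averaging the bit stream recovers an estimate of the input level, which is the basis of the decimation stage in a practical converter.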
Extraction of efficient and characteristic features of multidimensional time series
In numerous signal processing applications, multiple probes are available that simultaneously deliver information about one or multiple observed processes. The resulting multidimensional time series are often highly redundant and may contain stochastic contributions. The perception of the useful information therefore becomes very difficult and sometimes impossible. Thus, the major concern of this thesis is the development of novel algorithms for the extraction of the salient and characteristic features of multidimensional time series. The proposed algorithms are based on parametric signal processing; namely, we assume that the features of the experimental data can be represented efficiently by a specific model. We present a global framework for the selection of a specific model out of the large span of techniques proposed in the literature. For the selection of the model classes we use, in addition to prior knowledge about ...
Vetter, Rolf — Swiss Federal Institute of Technology
Noise or interference is often assumed to be a random process. Conventional linear filtering, control or prediction techniques are used to cancel or reduce the noise. However, some noise processes have been shown to be nonlinear and deterministic. These nonlinear deterministic noise processes appear to be random when analysed with second order statistics. As nonlinear processes are widespread in nature it may be beneficial to exploit the coherence of the nonlinear deterministic noise with nonlinear filtering techniques. The nonlinear deterministic noise processes used in this thesis are generated from nonlinear difference or differential equations which are derived from real world scenarios. Analysis tools from the theory of nonlinear dynamics are used to determine an appropriate sampling rate of the nonlinear deterministic noise processes and their embedding dimensions. Nonlinear models, such as the Volterra series filter and the radial basis function ...
Strauch, Paul E. — University of Edinburgh
Bispectral Analysis of Speech Signals
Techniques which utilise a signal’s Higher Order Statistics (HOS) can reveal information about non-Gaussian signals and nonlinearities which cannot be obtained using conventional (second-order) techniques. This information may be useful in speech processing because it may provide clues about how to construct new models of speech production which are better than existing models. There has been a recent surge of interest in the application of HOS techniques to speech processing, but this has been handicapped by a lack of understanding of what the HOS properties of speech signals are. Without this understanding the HOS information which is in speech signals cannot be efficiently utilised. This thesis describes an investigation into the application of HOS techniques, in particular the third-order frequency domain measure called the bispectrum, to speech signals. Several issues relating to bispectral speech analysis are addressed, including nonlinearity ...
Fackrell, Justin W. A. — University of Edinburgh
Cognitive Models for Acoustic and Audiovisual Sound Source Localization
Sound source localization algorithms have a long research history in the field of digital signal processing. Many common applications like intelligent personal assistants, teleconferencing systems and methods for technical diagnosis in acoustics require an accurate localization of sound sources in the environment. However, dynamic environments entail a particular challenge for these systems. For instance, voice controlled smart home applications, where the speaker, as well as potential noise sources, are moving within the room, are a typical example of dynamic environments. Classical sound source localization systems only have limited capabilities to deal with dynamic acoustic scenarios. In this thesis, three novel approaches to sound source localization that extend existing classical methods will be presented. The first system is proposed in the context of audiovisual source localization. Determining the position of sound sources in adverse acoustic conditions can be improved by including ...
Schymura, Christopher — Ruhr University Bochum
Analysis and Enhancement of Multiactuator Panels for Wave Field Synthesis
This thesis addresses the development and enhancement of Multiactuator Panels (MAPs) with emphasis on the application to Wave Field Synthesis (WFS) reproduction. MAPs can be used alternatively to dynamic loudspeaker arrays for WFS with added benefits. However, since MAPs are panels of finite extent, excited mechanically on several points, there are structural and geometric issues that must be addressed to guarantee that all exciters are acting evenly to form an effective loudspeaker array for WFS. This aim is addressed by means of a methodology for the analysis of sound field radiation in the space-time domain that has been proposed and validated in this thesis. This research has produced a number of key conclusions. The proposed method analyzes aliasing artifacts in a graphical representation showing the distribution of radiated energy over space. In a comparative study between MAPs of different dimensions ...
Pueo, Basilio — Technical University of Valencia
Realtime and Accurate Musical Control of Expression in Voice Synthesis
In the early days of speech synthesis research, understanding voice production attracted the attention of scientists with the goal of producing intelligible speech. Later, the need to produce more natural voices led researchers to use prerecorded voice databases, containing speech units, reassembled by a concatenation algorithm. With the outgrowth of computer capacities, the length of units increased, going from diphones to non-uniform units, in the so-called unit selection framework, using a strategy referred to as 'take the best, modify the least'. Today the new challenge in voice synthesis is the production of expressive speech or singing. The mainstream solution to this problem is based on the “there is no data like more data” paradigm: emotion-specific databases are recorded and emotion-specific units are segmented. In this thesis, we propose to restart the expressive speech synthesis problem, from its original voice ...
D'Alessandro, N. — Université de Mons
Gaussian Process Modelling for Audio Signals
Audio signals are characterised and perceived based on how their spectral make-up changes with time. Uncovering the behaviour of latent spectral components is at the heart of many real-world applications involving sound, but is a highly ill-posed task given the infinite number of ways any signal can be decomposed. This motivates the use of prior knowledge and a probabilistic modelling paradigm that can characterise uncertainty. This thesis studies the application of Gaussian processes to audio, which offer a principled non-parametric way to specify probability distributions over functions whilst also encoding prior knowledge. Along the way we consider what prior knowledge we have about sound, the way it behaves, and the way it is perceived, and write down these assumptions in the form of probabilistic models. We show how Bayesian time-frequency analysis can be reformulated as a spectral mixture Gaussian process, ...
Wilkinson, William — Queen Mary University of London
Cosparse regularization of physics-driven inverse problems
Inverse problems related to physical processes are of great importance in practically every field related to signal processing, such as tomography, acoustics, wireless communications, medical and radar imaging, to name only a few. At the same time, many of these problems are quite challenging due to their ill-posed nature. On the other hand, signals originating from physical phenomena are often governed by laws expressible through linear Partial Differential Equations (PDE), or equivalently, integral equations and the associated Green’s functions. In addition, these phenomena are usually induced by sparse singularities, appearing as sources or sinks of a vector field. In this thesis we primarily investigate the coupling of such physical laws with a prior assumption on the sparse origin of a physical process. This gives rise to a “dual” regularization concept, formulated either as sparse analysis (cosparse), yielded by a PDE ...
Kitić, Srđan — Université de Rennes 1
Interpretable Machine Learning for Machine Listening
Recent years have witnessed a significant interest in interpretable machine learning (IML) research that develops techniques to analyse machine learning (ML) models. Understanding ML models is essential to gain trust in their predictions and to improve datasets, model architectures and training techniques. The majority of effort in IML research has been in analysing models that classify images or structured data and comparatively less work exists that analyses models for other domains. This research focuses on developing novel IML methods and on extending existing methods to understand machine listening models that analyse audio. In particular, this thesis reports the results of three studies that apply three different IML methods to analyse five singing voice detection (SVD) models that predict singing voice activity in musical audio excerpts. The first study introduces SoundLIME (SLIME), a method to generate temporal, spectral or time-frequency explanations ...
Mishra, Saumitra — Queen Mary University of London