Adaptation of statistical models for single channel source separation. Application to voice / music separation in songs (2006)
The problem of signal separation is a very broad and fundamental one. A powerful paradigm within which signal separation can be achieved is the assumption that the signals/sources are statistically independent of one another. This is known as Independent Component Analysis (ICA). In this thesis, the theoretical aspects and derivation of ICA are examined, from which disparate approaches to signal separation are drawn together in a unifying framework. This is followed by a review of signal separation techniques based on ICA. Output decorrelation methods based on second-order statistics are employed to tackle the challenging problem of separating convolutively mixed signals, mainly in the context of audio source separation and the Cocktail Party Problem. Various optimisation techniques are devised to implement second-order signal separation of both artificially mixed signals and real mixtures. A study of the advantages and ...
Ahmed, Alijah — University of Cambridge
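The abstract above refers to separation by output decorrelation using second-order statistics. The AMUSE-style routine below is a minimal sketch of that idea. It assumes the much simpler instantaneous mixing case, not the convolutive case treated in the thesis. The routine whitens the mixture and then diagonalizes a single symmetrized time-lagged covariance matrix. The function name `amuse`, the lag and the test signals are illustrative assumptions, not the thesis's method.

```python
import numpy as np

def amuse(x, lag=1):
    """Sketch: separate an instantaneous mixture x (channels x samples) using
    second-order statistics only, via whitening followed by eigendecomposition
    of a symmetrized time-lagged covariance (AMUSE-style)."""
    x = x - x.mean(axis=1, keepdims=True)
    n = x.shape[1]
    # Whitening matrix from the zero-lag covariance
    c0 = x @ x.T / n
    d, e = np.linalg.eigh(c0)
    z = (e @ np.diag(1.0 / np.sqrt(d)) @ e.T) @ x
    # Symmetrized lagged covariance of the whitened data
    c_lag = z[:, lag:] @ z[:, :-lag].T / (n - lag)
    c_lag = 0.5 * (c_lag + c_lag.T)
    # Its eigenvectors give the remaining rotation that decorrelates the outputs
    _, v = np.linalg.eigh(c_lag)
    return v.T @ z

# Illustrative test signals with clearly different lag-1 autocorrelations
t = np.arange(5000)
s = np.vstack([np.sin(2 * np.pi * 0.01 * t), np.sin(2 * np.pi * 0.13 * t + 0.5)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # assumed mixing matrix
y = amuse(A @ s)
```

Recovery holds only up to permutation and scaling. It also requires the sources to have distinct autocorrelations at the chosen lag.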
Some Contributions to Music Signal Processing and to Mono-Microphone Blind Audio Source Separation
For humans, sound is valuable mostly for its meaning: the voice conveys spoken language, and music conveys artistic intent. Its physiological functioning is highly developed, as is our understanding of the underlying process. Replicating this analysis on a computer is a challenge: in many respects, a computer's capabilities do not match those of human beings when it comes to recognizing speech or musical instruments from sound, to name a few. In this thesis, two problems are investigated: source separation and music processing. The first part investigates source separation using only one microphone. The source separation problem arises when several audio sources are active at the same time, mixed together, and acquired by sensors (only one in our case). In this kind of situation, it is natural for a human to separate and recognize ...
Schutz, Antony — Eurecom/Mobile
A Computational Framework for Sound Segregation in Music Signals
Music is built from sound, ultimately resulting from an elaborate interaction between the sound-generating properties of physical objects (i.e., musical instruments) and the sound perception abilities of the human auditory system. Humans, even without any formal music training, are typically able to extract, almost unconsciously, a great amount of relevant information from a musical signal. Features such as the beat of a musical piece, the main melody of a complex musical arrangement, the sound sources and events occurring in a complex musical mixture, the song structure (e.g., verse, chorus, bridge) and the musical genre of a piece are just some examples of the knowledge that a naive listener can commonly extract just by listening to a musical piece. In order to do so, the human auditory system uses a variety of cues ...
Martins, Luis Gustavo — Universidade do Porto
Bayesian Approaches in Image Source Separation
In this thesis, a general solution to the component separation problem in images is introduced. Unlike most existing work, it models the spatial dependencies of images in the separation process using Markov random fields (MRFs). In the MRF model, a Cauchy density is used for the gradient images. We provide a general Bayesian framework for estimating the parameters of this model. Due to the intractability of the problem, we resort to numerical solutions for the joint maximization of the a posteriori distribution of the sources, the mixing matrix, and the noise variances. Four different numerical methods are proposed. In the first method, the difficulty of working analytically with general Gibbs distributions of MRFs is overcome by using an approximate density. In this approach, the Gibbs distribution is modelled by the product of directional Gaussians. The ...
Kayabol, Koray — Istanbul University
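The sketch below makes the MRF prior concrete under a simplified, assumed form of the model. It places independent Cauchy densities with one shared scale gamma on the horizontal and vertical first differences. Under that assumption, the negative log-prior of an image is, up to a constant, a sum of log(1 + (d/gamma)^2) terms over all differences d. The function name and scale parameter are assumptions for illustration, not the exact model used in the thesis.

```python
import numpy as np

def cauchy_gradient_energy(img, gamma=1.0):
    """Sketch: negative log of an unnormalized MRF prior in which horizontal
    and vertical first differences follow a Cauchy(0, gamma) density."""
    dx = np.diff(img, axis=1)  # horizontal gradient image
    dy = np.diff(img, axis=0)  # vertical gradient image
    # -log Cauchy(d; 0, gamma) equals log(1 + (d / gamma)^2) up to a constant
    return np.sum(np.log1p((dx / gamma) ** 2)) + np.sum(np.log1p((dy / gamma) ** 2))

rng = np.random.default_rng(0)
smooth = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))     # gentle horizontal ramp
noisy = smooth + rng.normal(0.0, 0.5, smooth.shape)       # same ramp plus noise
e_smooth = cauchy_gradient_energy(smooth)
e_noisy = cauchy_gradient_energy(noisy)
```

The penalty grows only logarithmically with the gradient magnitude. A sharp edge therefore costs far less than it would under a Gaussian (quadratic) prior, which is one reason heavy-tailed densities are used for gradient images.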
The separation of independent sources from mixed observed data is a fundamental and challenging signal processing problem. In many practical situations, one or more desired signals need to be recovered from the mixtures only. A typical example is speech recordings made in an acoustic environment in the presence of background noise and/or competing speakers. Other examples include EEG signals, passive sonar applications and cross-talk in data communications. The audio signal separation problem is sometimes referred to as The Cocktail Party Problem. When several people in the same room are conversing at the same time, it is remarkable that a person is able to choose to concentrate on one of the speakers and listen to his or her speech flow unimpeded. This ability, usually referred to as the binaural cocktail party effect, results in part from binaural (two-eared) hearing. In contrast, ...
Chan, Dominic C. B. — University of Cambridge
Speech Enhancement Algorithms for Audiological Applications
Improving speech intelligibility is a long-standing problem that remains open. The recent boom of applications such as hands-free communications and automatic speech recognition systems, together with the ever-increasing demands of the hearing-impaired community, has given a decisive impulse to research in this area. This PhD thesis focuses on speech enhancement for audiological applications. Most of the research conducted in this thesis has focused on improving speech intelligibility in hearing aids, considering the variety of restrictions and limitations imposed by this type of device. The combination of source separation techniques and spatial filtering with machine learning and evolutionary computation has produced novel algorithms, which are included in this thesis. The thesis is divided into two main parts. The first one contains a preliminary study of the problem and a ...
Ayllón, David — Universidad de Alcalá
Sequential Bayesian Modeling of non-stationary signals
are involved until the development of Sequential Monte Carlo techniques, also known as particle filters. In particle filtering, the problem is expressed in terms of state-space equations in which the linearity and Gaussianity requirements of Kalman filtering are relaxed. Nevertheless, information about the functional form of the state variations is still required. In this thesis, we present a general solution for cases where these variations are unknown and the process distributions cannot be expressed by any closed-form probability density function. Here, we propose a novel modeling scheme that is as unified as possible, so as to cover all these problems. We then analyze the performance of our unifying particle filtering methodology on the modeling of non-stationary alpha-stable processes. It is well known that the probability density functions of these processes cannot be expressed in closed form, except for ...
Gencaga, Deniz — Bogazici University
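The bootstrap (sampling-importance-resampling) particle filter below sketches the generic technique this abstract builds on. It is not the thesis's unifying methodology. The sketch assumes a scalar random-walk state with Gaussian process noise and Cauchy measurement noise. The Cauchy is the alpha = 1 member of the stable family and one of the few members with a closed-form density, which keeps the likelihood easy to evaluate. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def bootstrap_particle_filter(y, n_particles=1000, q=0.1, gamma=0.5, seed=0):
    """Sketch of a bootstrap (SIR) particle filter for the assumed model
        x_t = x_{t-1} + v_t,  v_t ~ N(0, q^2)
        y_t = x_t + w_t,      w_t ~ Cauchy(0, gamma)
    Returns the weighted-mean estimate of x_t at each time step."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)  # assumed prior on x_0
    estimates = np.empty(len(y))
    for t, yt in enumerate(y):
        # Propagate through the state equation (proposal = transition prior)
        particles = particles + rng.normal(0.0, q, n_particles)
        # Weight by the Cauchy likelihood p(y_t | x_t), in log form for stability
        log_w = -np.log1p(((yt - particles) / gamma) ** 2)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        estimates[t] = np.dot(w, particles)
        # Multinomial resampling to fight weight degeneracy
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return estimates

rng = np.random.default_rng(1)
T = 200
x_true = np.cumsum(rng.normal(0.0, 0.1, T))
y = x_true + 0.5 * rng.standard_cauchy(T)  # heavy-tailed observations
x_hat = bootstrap_particle_filter(y)
```

A Kalman filter would be badly misled by the Cauchy outliers. The particle filter needs only pointwise likelihood evaluations, so it handles them naturally.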
This thesis is concerned with three closely related problems. The first one is called Multiple-Input Multiple-Output (MIMO) Instantaneous Blind Identification, which we denote by MIBI. In this problem a number of mutually statistically independent source signals are mixed by a MIMO instantaneous mixing system and only the mixed signals are observed, i.e. both the mixing system and the original sources are unknown or ‘blind’. The goal of MIBI is to identify the MIMO system from the observed mixtures of the source signals only. The second problem is called Instantaneous Blind Signal Separation (IBSS) and deals with recovering mutually statistically independent source signals from their observed instantaneous mixtures only. The observation model and assumptions on the signals and mixing system are the same as those of MIBI. However, the main purpose of IBSS is the estimation of the source signals, whereas ...
van de Laar, Jakob — TU Eindhoven
Deep neural networks for source separation and noise-robust speech recognition
This thesis addresses the problem of multichannel audio source separation by exploiting deep neural networks (DNNs). We build upon the classical expectation-maximization (EM) based source separation framework employing a multichannel Gaussian model, in which the sources are characterized by their power spectral densities and their source spatial covariance matrices. We explore and optimize the use of DNNs for estimating these spectral and spatial parameters. Employing the estimated source parameters, we then derive a time-varying multichannel Wiener filter for the separation of each source. We extensively study the impact of various design choices for the spectral and spatial DNNs. We consider different cost functions, time-frequency representations, architectures, and training data sizes. Those cost functions notably include a newly proposed task-oriented signal-to-distortion ratio cost function for spectral DNNs. Furthermore, we present a weighted spatial parameter estimation formula, which generalizes the corresponding exact ...
Nugraha, Aditya Arie — Université de Lorraine
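The multichannel Gaussian model gives the Wiener filter mentioned in the abstract a simple form. Assume estimates of the source power spectral densities v_j(f, n) and spatial covariance matrices R_j(f) are available. The mixture covariance is then Sigma_x = sum_j v_j R_j, and the spatial image of source j is estimated as v_j R_j Sigma_x^{-1} x. The sketch below applies this at a single time-frequency bin, with the parameters assumed to be given rather than estimated by DNNs as in the thesis. Function and variable names are illustrative.

```python
import numpy as np

def multichannel_wiener(x, psds, spatial_covs):
    """Sketch: estimate source spatial images at one time-frequency bin.
    x: (I,) complex mixture; psds: (J,) nonnegative v_j;
    spatial_covs: (J, I, I) Hermitian positive-definite R_j."""
    # Mixture covariance under the multichannel Gaussian model
    sigma_x = np.einsum("j,jab->ab", psds, spatial_covs)
    sigma_x_inv = np.linalg.inv(sigma_x)
    images = []
    for v, r in zip(psds, spatial_covs):
        w = v * r @ sigma_x_inv  # W_j = v_j R_j Sigma_x^{-1}
        images.append(w @ x)
    return np.array(images)

rng = np.random.default_rng(0)
I, J = 2, 3  # channels, sources
b = rng.normal(size=(J, I, I)) + 1j * rng.normal(size=(J, I, I))
R = b @ np.conj(np.transpose(b, (0, 2, 1))) + 1e-3 * np.eye(I)  # Hermitian PD
v = rng.uniform(0.1, 2.0, J)
x = rng.normal(size=I) + 1j * rng.normal(size=I)
y = multichannel_wiener(x, v, R)
```

The per-source filters sum to the identity matrix, so the estimated images always add up to the observed mixture.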
MIMO instantaneous blind identification and separation based on arbitrary order
This thesis is concerned with three closely related problems. The first one is called Multiple-Input Multiple-Output (MIMO) Instantaneous Blind Identification, which we denote by MIBI. In this problem a number of mutually statistically independent source signals are mixed by a MIMO instantaneous mixing system and only the mixed signals are observed, i.e. both the mixing system and the original sources are unknown or ‘blind’. The goal of MIBI is to identify the MIMO system from the observed mixtures of the source signals only. The second problem is called Instantaneous Blind Signal Separation (IBSS) and deals with recovering mutually statistically independent source signals from their observed instantaneous mixtures only. The observation model and assumptions on the signals and mixing system are the same as those of MIBI. However, the main purpose of IBSS is the estimation of the source signals, whereas ...
van de Laar, Jakob — TU Eindhoven
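The sketch below illustrates separation from higher-order statistics in the instantaneous model x = A s described above. It uses the well-known kurtosis-based one-unit FastICA fixed-point iteration, a standard technique swapped in for illustration rather than the arbitrary-order method of this thesis. It whitens the mixture and extracts a single source, up to sign and scale. Names, sample size, mixing matrix and source distributions are assumptions.

```python
import numpy as np

def fastica_one_unit(x, n_iter=200, seed=0):
    """Sketch: extract one source from an instantaneous mixture x
    (channels x samples) with the kurtosis-based one-unit FastICA fixed point."""
    rng = np.random.default_rng(seed)
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    z = (e @ np.diag(d ** -0.5) @ e.T) @ x  # whitened observations
    w = rng.normal(size=z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        # Fourth-order (kurtosis) fixed point: w <- E[z (w^T z)^3] - 3 w
        w_new = (z * (w @ z) ** 3).mean(axis=1) - 3 * w
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1) < 1e-10  # ignore sign flips
        w = w_new
        if converged:
            break
    return w @ z

rng = np.random.default_rng(1)
n = 20000
s = np.vstack([
    rng.uniform(-np.sqrt(3), np.sqrt(3), n),  # sub-Gaussian, unit variance
    rng.laplace(0.0, 1 / np.sqrt(2), n),      # super-Gaussian, unit variance
])
A = np.array([[1.0, 0.5], [0.3, 1.0]])  # assumed mixing matrix
y = fastica_one_unit(A @ s)
```

Identifying A itself (the MIBI problem) would require extracting all components, for example by repeating the iteration under a decorrelation constraint. That step is not shown here.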
Source-Filter Model Based Single Channel Speech Separation
In a natural acoustic environment, multiple sources are usually active at the same time. The task of source separation is the estimation of individual source signals from this complex mixture. The challenge of single channel source separation (SCSS) is to recover more than one source from a single observation. Broadly, SCSS methods can be divided into those that try to mimic the human auditory system and model-based methods, which find a probabilistic representation of the individual sources and employ this prior knowledge for inference. This thesis presents several strategies for the separation of two speech utterances mixed into a single channel and is structured in four parts: The first part reviews factorial models in model-based SCSS and introduces the soft-binary mask for signal reconstruction. This mask shows improved performance compared to the soft and the binary masks in automatic speech recognition ...
Stark, Michael — Graz University of Technology
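The binary and soft masks against which this abstract compares its soft-binary mask can be sketched as follows. The sketch assumes oracle magnitude spectrograms of the two sources are available, as in evaluation settings. The soft mask is a power-ratio (Wiener-like) mask with an assumed exponent p = 2. The thesis's soft-binary mask is not reproduced, because the abstract does not define it. Function names are illustrative.

```python
import numpy as np

def binary_mask(s1_mag, s2_mag):
    """Binary mask for source 1: 1 in bins where source 1 dominates, else 0."""
    return (s1_mag > s2_mag).astype(float)

def soft_mask(s1_mag, s2_mag, p=2, eps=1e-12):
    """Ratio (soft) mask for source 1; p = 2 gives a Wiener-like power ratio."""
    a, b = s1_mag ** p, s2_mag ** p
    return a / (a + b + eps)

rng = np.random.default_rng(0)
s1 = rng.uniform(0.1, 1.0, (4, 5))  # stand-in magnitude spectrogram, source 1
s2 = rng.uniform(0.1, 1.0, (4, 5))  # stand-in magnitude spectrogram, source 2
m_bin = binary_mask(s1, s2)
m_soft = soft_mask(s1, s2)
```

In practice, each mask is multiplied element-wise with the mixture STFT, and the result is inverted back to the time domain.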
State and Parameter Estimation for Dynamic Systems: Some Investigations
This dissertation presents the outcome of investigations aimed at developing improved state and ‘combined state and parameter’ estimation algorithms for nonlinear signal models in situations where complete knowledge of the process and/or measurement noise covariances is not available. Variants of “adaptive nonlinear estimators” capable of providing satisfactory estimation results in the face of unknown noise covariances are proposed. The proposed adaptive nonlinear estimators incorporate adaptation algorithms through which they can, implicitly or explicitly, estimate unknown noise covariances along with the states and parameters. The adaptation algorithms are derived mathematically using different adaptation methods, including Maximum Likelihood Estimation (MLE), the Covariance Matching method, and the Maximum a Posteriori (MAP) method. The adaptive nonlinear estimators proposed in this dissertation are formulated with the help of a general framework for adaptive nonlinear ...
Dey, Aritro — Jadavpur University
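Covariance matching, one of the adaptation methods named above, can be illustrated in its simplest form. The sketch is a linear, scalar random-walk Kalman filter in which the unknown measurement-noise variance R is re-estimated by matching the sample variance of recent innovations to its model value P + R. It shows the general principle only and is not one of the nonlinear estimators proposed in the dissertation. The window length, variance floor and all names are assumptions.

```python
import numpy as np
from collections import deque

def adaptive_kf_covariance_matching(y, q, r0, window=200):
    """Sketch: scalar random-walk Kalman filter whose unknown measurement-noise
    variance R is re-estimated by covariance matching,
        R_hat = mean(e_t^2) - mean(P_t|t-1),
    over a sliding window of innovations e_t."""
    x, p, r = 0.0, 1.0, r0
    innov_sq, p_pred_hist = deque(maxlen=window), deque(maxlen=window)
    x_est, r_est = np.empty(len(y)), np.empty(len(y))
    for t, yt in enumerate(y):
        p_pred = p + q            # predict (random walk: predicted state = x)
        e = yt - x                # innovation
        innov_sq.append(e * e)
        p_pred_hist.append(p_pred)
        if len(innov_sq) == window:
            # Match the sample innovation variance to its model value P + R
            r = max(np.mean(innov_sq) - np.mean(p_pred_hist), 1e-6)
        k = p_pred / (p_pred + r)  # Kalman gain
        x = x + k * e
        p = (1.0 - k) * p_pred
        x_est[t], r_est[t] = x, r
    return x_est, r_est

rng = np.random.default_rng(0)
T, q_true, r_true = 3000, 0.01, 1.0
x_true = np.cumsum(rng.normal(0.0, np.sqrt(q_true), T))
y = x_true + rng.normal(0.0, np.sqrt(r_true), T)
x_hat, r_hat = adaptive_kf_covariance_matching(y, q=q_true, r0=10.0)  # poor initial R
```

The window length trades tracking speed for estimation variance. Shorter windows react faster to changing noise levels but give noisier R estimates.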
This thesis concentrates on a major problem within audio signal processing, the separation of source signals from musical mixtures when only a single mixture channel is available. Source separation is the process by which signals that correspond to distinct sources are identified in a signal mixture and extracted from it. Producing multiple entities from a single one is an extremely underdetermined task, so additional prior information can assist in setting appropriate constraints on the solution set. The approach proposed uses prior information such that: (1) it can potentially be applied successfully to a large variety of musical mixtures, and (2) it requires minimal user intervention and no prior learning/training procedures (i.e., it is an unsupervised process). This system can be useful for applications such as remixing, creative effects, restoration and for archiving musical material for internet delivery, amongst others. Here, ...
Siamantas, Georgios — University of York
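The abstract is cut off before naming its algorithm, so the following only illustrates one widely used unsupervised approach to single-channel music separation. It factors a magnitude spectrogram with non-negative matrix factorization (NMF) using Lee-Seung multiplicative updates. Ratio masks then split the mixture into per-component images. Whether the thesis uses NMF is not stated. The function names, rank and iteration count are assumptions.

```python
import numpy as np

def nmf(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Sketch: factor a nonnegative spectrogram V (freq x time) as W @ H with
    Lee-Seung multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    f, t = v.shape
    w = rng.uniform(0.1, 1.0, (f, rank))
    h = rng.uniform(0.1, 1.0, (rank, t))
    errors = []
    for _ in range(n_iter):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
        errors.append(np.linalg.norm(v - w @ h))
    return w, h, errors

def component_images(v, w, h, eps=1e-12):
    """Split V into per-component spectrograms with ratio (Wiener-like) masks."""
    full = w @ h + eps
    return np.array([np.outer(w[:, k], h[k]) / full * v for k in range(w.shape[1])])

rng = np.random.default_rng(1)
v = rng.uniform(0, 1, (20, 2)) @ rng.uniform(0, 1, (2, 50))  # synthetic rank-2 "spectrogram"
w, h, errors = nmf(v, rank=2)
images = component_images(v, w, h)
```

In a real task, v would be the magnitude STFT of the mixture. The NMF components would still have to be grouped into sources before resynthesis.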
Realtime and Accurate Musical Control of Expression in Voice Synthesis
In the early days of speech synthesis research, understanding voice production attracted the attention of scientists aiming to produce intelligible speech. Later, the need to produce more natural voices led researchers to use prerecorded voice databases containing speech units, reassembled by a concatenation algorithm. As computing capacity grew, the length of units increased, going from diphones to non-uniform units, in the so-called unit selection framework, using a strategy referred to as 'take the best, modify the least'. Today the new challenge in voice synthesis is the production of expressive speech or singing. The mainstream solution to this problem is based on the “there is no data like more data” paradigm: emotion-specific databases are recorded and emotion-specific units are segmented. In this thesis, we propose to restart the expressive speech synthesis problem, from its original voice ...
D'Alessandro, N. — Université de Mons
Bayesian Compressed Sensing using Alpha-Stable Distributions
Over the last decades, information has been gathered and processed at an explosive rate. This raises a very important issue: how to describe the information content of a given source signal, or an ensemble of source signals, effectively and precisely, so that it can be stored, processed, or transmitted within the limitations and capabilities of various digital devices. For decades, one of the fundamental principles of signal processing has been the Nyquist-Shannon sampling theorem, which states that the minimum sampling rate needed to reconstruct a band-limited signal without error is dictated by its bandwidth. However, there are many everyday cases in which sampling at the Nyquist rate produces too much data, thus demanding increased processing power and storage. A mathematical theory that emerged ...
Tzagkarakis, George — University of Crete
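The compressed sensing setting in this title can be illustrated with a small recovery example. A k-sparse vector x is measured as y = Phi x with far fewer measurements than unknowns and then reconstructed. The sketch uses Orthogonal Matching Pursuit, a standard greedy solver swapped in for illustration. It is not the Bayesian alpha-stable method of the thesis, and the dimensions and names are assumptions.

```python
import numpy as np

def omp(phi, y, k):
    """Sketch: Orthogonal Matching Pursuit, recovering a k-sparse x from y = phi @ x."""
    residual = y.copy()
    support = []
    x = np.zeros(phi.shape[1])
    for _ in range(k):
        # Select the column most correlated with the current residual
        j = int(np.argmax(np.abs(phi.T @ residual)))
        support.append(j)
        # Least-squares fit on the selected support, then update the residual
        coef, *_ = np.linalg.lstsq(phi[:, support], y, rcond=None)
        residual = y - phi[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(0)
n, m, k = 128, 64, 3  # unknowns, measurements, sparsity
phi = rng.normal(size=(m, n)) / np.sqrt(m)  # random Gaussian sensing matrix
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
y = phi @ x_true  # 64 measurements instead of 128 samples
x_hat = omp(phi, y, k)
```

With noiseless measurements and a sparsity this low relative to m, greedy recovery is expected to be exact with high probability. Bayesian approaches such as the one in the thesis target noisier, heavy-tailed settings.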