A Computational Framework for Sound Segregation in Music Signals

Music is built from sound, ultimately resulting from an elaborate interaction between the sound-generating properties of physical objects (i.e., musical instruments) and the sound perception abilities of the human auditory system. Humans, even without any formal music training, are typically able to extract, almost unconsciously, a great amount of relevant information from a musical signal. Features such as the beat of a musical piece, the main melody of a complex musical arrangement, the sound sources and events occurring in a complex musical mixture, the song structure (e.g., verse, chorus, bridge) and the musical genre of a piece are just some examples of the knowledge that a naive listener can commonly extract just by listening to a musical piece. In order to do so, the human auditory system uses a variety of cues ...

Martins, Luis Gustavo — Universidade do Porto


Adaptation of statistical models for single channel source separation. Application to voice / music separation in songs

Single channel source separation is a fairly recent problem of constantly growing interest in the scientific community. However, this problem is still far from being solved and, moreover, it cannot be solved in its full generality. Indeed, since the problem is highly underdetermined, the main difficulty is that very strong knowledge about the sources is required to separate them. For a large class of existing separation methods, this knowledge is expressed by statistical source models, notably Gaussian Mixture Models (GMM), which are learned from training examples. The subject of this work is to study separation methods based on statistical models in general, and then to apply them to the particular problem of separating the singing voice from the background music in mono recordings of songs. It can be very useful to propose some satisfactory ...

OZEROV, Alexey — University of Rennes 1
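In GMM-based separation of the kind described in Ozerov's abstract, once the relevant Gaussian components are known, each source estimate reduces to a Wiener filter built from the models' power spectral densities (PSDs). A minimal sketch of that per-bin filtering step, assuming the voice and music PSDs for the current frame are already given; the function name and its inputs are hypothetical, and GMM training and state inference are not shown:

```python
def wiener_separate(mixture, psd_voice, psd_music, eps=1e-12):
    """Split one STFT frame of a mono mixture into voice and music estimates.

    mixture: complex STFT coefficients of the mixture, one per frequency bin.
    psd_voice, psd_music: non-negative PSDs assumed for each source in this
    frame (hypothetically taken from the selected GMM components).
    Returns (voice, music) as lists of complex coefficients.
    """
    voice, music = [], []
    for x, pv, pm in zip(mixture, psd_voice, psd_music):
        gain = pv / (pv + pm + eps)  # Wiener gain, always in [0, 1]
        voice.append(gain * x)
        music.append((1.0 - gain) * x)  # complementary gain for the music
    return voice, music
```

Because the music estimate uses the complementary gain, the two estimates always add back up to the mixture. In a full GMM-based estimator, such filters are typically combined across component pairs, weighted by their posterior probabilities; that part is omitted here.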


Pitch-informed solo and accompaniment separation

This thesis addresses the development of a system for pitch-informed solo and accompaniment separation capable of separating main instruments from music accompaniment regardless of the musical genre of the track or the type of music accompaniment. For the solo instrument, only pitched monophonic instruments were considered, in a single-channel scenario where no panning or spatial location information is available. In the proposed method, pitch information is used as an initial stage of a sinusoidal modeling approach that attempts to estimate the spectral information of the solo instrument from a given audio mixture. Instead of estimating the solo instrument on a frame-by-frame basis, the proposed method gathers information from tone objects to perform separation. Tone-based processing allowed the inclusion of novel processing stages for attack refinement, transient interference reduction, common amplitude modulation (CAM) of tone objects, and for better ...

Cano Cerón, Estefanía — Ilmenau University of Technology


Sound Source Separation in Monaural Music Signals

Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or one-channel, music recordings. We concentrate on separation methods in which the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing the common properties of real-world sound sources, which are their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here uses unsupervised learning, and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is modeled ...

Virtanen, Tuomas — Tampere University of Technology
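Virtanen's abstract notes that most unsupervised separation algorithms rest on a linear instantaneous signal model. In the common non-negative form of that model, a magnitude spectrogram X (frequency bins x frames) is approximated as X ≈ WH, where the columns of W are fixed basis spectra and H holds their time-varying gains. A minimal pure-Python sketch, assuming the squared Euclidean cost and the Lee and Seung multiplicative updates (one standard choice; the thesis's own cost functions and update rules may differ):

```python
import random


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(x, n_components, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix x (freq x frames) as x ~= w @ h.

    Multiplicative updates for the squared Euclidean cost keep both
    w (basis spectra) and h (per-frame gains) non-negative.
    """
    rng = random.Random(seed)
    n_freq, n_frames = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(n_components)] for _ in range(n_freq)]
    h = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(n_components)]
    for _ in range(n_iter):
        # h <- h * (W^T X) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_frames)]
             for i in range(n_components)]
        # w <- w * (X H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_components)]
             for i in range(n_freq)]
    return w, h


def reconstruction_error(x, w, h):
    """Squared Euclidean distance between x and w @ h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb))
```

Each component k contributes the rank-one term formed by column k of W and row k of H; a separated source is usually resynthesized by grouping such terms and masking the mixture spectrogram with them.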


Some Contributions to Music Signal Processing and to Mono-Microphone Blind Audio Source Separation

For humans, sound is valuable mostly for its meaning: the voice carries spoken language, and music carries artistic intent. The physiology of human hearing is highly developed, as is our understanding of the underlying processes. It is a challenge to replicate this analysis using a computer: in many respects, its capabilities do not match those of human beings when it comes to, for example, speech or musical instrument recognition from sound. In this thesis, two problems are investigated: source separation and music processing. The first part investigates source separation using only one microphone. The source separation problem arises when several audio sources are active at the same time, mixed together, and acquired by some sensors (one in our case). In this kind of situation, it is natural for a human to separate and to recognize ...

Schutz, Antony — Eurecome/Mobile


From Blind to Semi-Blind Acoustic Source Separation based on Independent Component Analysis

Typical acoustic scenes consist of multiple superimposed sources, where some of them represent desired signals, but often many of them are undesired sources, e.g., interferers or noise. Hence, source separation and extraction, i.e., the estimation of the desired source signals based on observed mixtures, is one of the central problems in audio signal processing. A promising class of approaches to address such problems is based on Independent Component Analysis (ICA), an unsupervised machine learning technique. These methods enjoyed a lot of attention from the research community due to the small number of assumptions that have to be made about the considered problem. Furthermore, the resulting generalization ability to unseen acoustic conditions, their mathematical rigor and the simplicity of resulting algorithms have been appreciated by many researchers working in audio signal processing. However, knowledge about the acoustic scenario is often available ...

Brendel, Andreas — Friedrich-Alexander-Universität Erlangen-Nürnberg


Application of Sound Source Separation Methods to Advanced Spatial Audio Systems

This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats such as WFS. This is because WFS needs the original source signals to be available in order to accurately synthesize the acoustic field inside an extended listening area. Thus, object-based mixing is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to recover the original signals. Therefore, SSS algorithms can be applied to existing two-channel mixtures to ...

Cobos, Maximo — Universidad Politecnica de Valencia


Group-Sparse Regression - With Applications in Spectral Analysis and Audio Signal Processing

This doctoral thesis focuses on sparse regression, a statistical modeling tool for selecting valuable predictors in underdetermined linear models. By imposing different constraints on the structure of the variable vector in the regression problem, one obtains estimates which have sparse supports, i.e., where only a few of the elements in the variable vector have non-zero values. The thesis collects six papers which, to varying extents, deal with the applications, implementations, modifications, translations, and other analyses of such problems. Sparse regression is often used to approximate additive models with intricate, non-linear, non-smooth or otherwise problematic functions, by creating an underdetermined model consisting of candidate values for these functions, and linear response variables which select among the candidates. Sparse regression is therefore a widely used tool in applications such as image processing, audio processing, and seismological and biomedical modeling, but is ...

Kronvall, Ted — Lund University
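One standard instance of the group-sparse constraints mentioned in Kronvall's abstract is the group lasso, which minimizes 0.5 * ||y - A x||^2 + lam * sum_g ||x_g||_2 and tends to zero out whole groups of coefficients at once. A minimal pure-Python sketch using proximal gradient descent (ISTA) with block soft-thresholding; the function name, the fixed iteration count, and the conservative step size are assumptions for illustration, not the specific estimators developed in the thesis:

```python
import math


def group_lasso_ista(a, y, groups, lam, n_iter=500):
    """Proximal-gradient sketch for min_x 0.5*||y - A x||^2 + lam * sum_g ||x_g||_2.

    a: list of rows (m x n); y: length-m list;
    groups: list of index lists that partition range(n).
    """
    m, n = len(a), len(a[0])
    # ||A||_F^2 upper-bounds the Lipschitz constant of the gradient,
    # so 1 / ||A||_F^2 is a safe (if conservative) step size.
    step = 1.0 / sum(v * v for row in a for v in row)
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [sum(a[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(a[i][j] * residual[i] for i in range(m)) for j in range(n)]
        z = [x[j] - step * grad[j] for j in range(n)]
        # Block soft-thresholding: shrink each group's norm by step * lam,
        # setting the whole group to zero when its norm is below that value.
        for g in groups:
            norm = math.sqrt(sum(z[j] ** 2 for j in g))
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            for j in g:
                x[j] = scale * z[j]
    return x
```

With A equal to the identity, the solution is known in closed form (each group y_g is scaled by max(0, 1 - lam / ||y_g||)), which makes a convenient sanity check for the iteration.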


Reverse Audio Engineering for Active Listening and Other Applications

This work deals with the problem of reverse audio engineering for active listening. The format under consideration corresponds to the audio CD. The musical content is viewed as the result of a concatenation of the composition, the recording, the mixing, and the mastering. The inversion of the last two stages constitutes the core of the problem at hand. The audio signal is treated as a post-nonlinear mixture. Thus, the mixture is “decompressed” before being “decomposed” into audio tracks. The problem is tackled in an informed context: the inversion is accompanied by information that is specific to the content production. In this manner, the quality of the inversion is significantly improved. The information is reduced in size by means of quantization and coding methods, and by exploiting some facts from psychoacoustics. The proposed methods are applicable in real time and have a ...

Gorlow, Stanislaw — Université Bordeaux 1


Audio motif detection for guided source separation. Application to movie soundtracks.

In audio signal processing, source separation consists in recovering the different audio sources that compose a given observed audio mixture. There are many techniques to estimate these sources, and the more information about them is taken into account, the more likely the separation is to succeed. One way to incorporate information on sources is to use a reference signal, which gives a first approximation of the source. This thesis aims to explore the theoretical and applied aspects of reference-guided source separation. The proposed approach, called SPotted REference based Separation (SPORES), explores the particular case where the references are obtained automatically by motif spotting, i.e., by a search for similar content. Such an approach is useful for content with a certain redundancy or when a large database is available. Fortunately, the current context often puts us ...

Souviraà-Labastie, Nathan — Université de Rennes 1


Learning from structured EEG and fMRI data supporting the diagnosis of epilepsy

Epilepsy is a neurological condition that manifests in epileptic seizures as a result of an abnormal, synchronous activity of a large group of neurons. Depending on the affected brain regions, seizures produce various severe clinical symptoms. Epilepsy cannot be cured and in many cases is not controlled by medication either. Surgical resection of the region responsible for generating the epileptic seizures might offer remedy for these patients. Electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) measure the changes of brain activity in time over different locations of the brain. As such, they provide valuable information on the nature, the timing and the spatial origin of the epileptic activity. Unfortunately, both techniques also record activity from other brain sources and from artefacts. Hence, EEG and fMRI signals are characterised by a low signal-to-noise ratio. Data quality and the vast amount ...

Hunyadi, Borbála — KU Leuven


Signal Separation of Musical Instruments

This thesis presents techniques for the modelling of musical signals, with particular regard to monophonic and polyphonic pitch estimation. Musical signals are modelled as a set of notes, each comprising a set of harmonically related sinusoids. A hierarchical model is presented that is very general and applicable to any signal that can be decomposed as the sum of basis functions. Parameter estimation is posed within a Bayesian framework, allowing for the incorporation of prior information about model parameters. The resulting posterior distribution is of variable dimension, and so reversible jump MCMC simulation techniques are employed for the parameter estimation task. The extension of the model to time-varying signals with high posterior correlations between model parameters is described. The parameters and hyperparameters of several frames of data are estimated jointly to achieve a more robust detection. A general model for the ...

Walmsley, Paul Joseph — University of Cambridge
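Walmsley's abstract models a note as a set of harmonically related sinusoids, i.e., a sum of basis functions. The thesis estimates the parameters with reversible jump MCMC in a Bayesian framework, which the sketch below does not reproduce. It only illustrates the note model itself: conditional on a known fundamental frequency f0, the harmonic model is linear in the sinusoid coefficients, so ordinary least squares recovers each harmonic's amplitude. The function names are hypothetical:

```python
import math


def harmonic_basis(f0, n_harmonics, n_samples, fs):
    """Columns cos(2*pi*m*f0*t/fs) and sin(2*pi*m*f0*t/fs) for m = 1..n_harmonics."""
    cols = []
    for m in range(1, n_harmonics + 1):
        w = 2 * math.pi * m * f0 / fs
        cols.append([math.cos(w * t) for t in range(n_samples)])
        cols.append([math.sin(w * t) for t in range(n_samples)])
    return cols


def solve(mat, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def fit_harmonic_amplitudes(y, f0, n_harmonics, fs):
    """Least-squares amplitude of each harmonic of a note with known f0."""
    cols = harmonic_basis(f0, n_harmonics, len(y), fs)
    # Normal equations: (B^T B) c = B^T y, with the basis vectors as columns of B.
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    proj = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    coef = solve(gram, proj)
    # Each cos/sin pair encodes one harmonic with arbitrary phase.
    return [math.hypot(coef[2 * m], coef[2 * m + 1]) for m in range(n_harmonics)]
```

Pairing a cosine and a sine per harmonic keeps the problem linear even though each harmonic's phase is unknown; unknown f0 or an unknown number of harmonics is what makes the full problem non-linear and variable-dimensional.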


Sound Event Detection by Exploring Audio Sequence Modelling

Everyday sounds in real-world environments are a powerful source of information by which humans can interact with their environments. Humans can infer what is happening around them by listening to everyday sounds. At the same time, it is a challenging task for a computer algorithm in a smart device to automatically recognise, understand, and interpret everyday sounds. Sound event detection (SED) is the process of transcribing an audio recording into sound event tags with onset and offset time values. This involves classification and segmentation of sound events in the given audio recording. SED has numerous applications in everyday life which include security and surveillance, automation, healthcare monitoring, multimedia information retrieval, and assisted living technologies. SED is to everyday sounds what automatic speech recognition (ASR) is to speech and automatic music transcription (AMT) is to music. The fundamental questions in designing ...

Pankajakshan, Arjun — Queen Mary University of London
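Pankajakshan's abstract defines sound event detection as transcribing a recording into event tags with onset and offset times. A minimal sketch of the final step of a typical frame-based system, assuming a model has already produced per-frame probabilities for one class and that a fixed threshold (a hypothetical parameter here) binarizes them:

```python
def probabilities_to_events(probs, threshold, hop_seconds, label):
    """Turn frame-level probabilities for one class into (label, onset, offset) tuples.

    Frame i is taken to cover [i * hop_seconds, (i + 1) * hop_seconds).
    """
    events = []
    onset_frame = None
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and onset_frame is None:
            onset_frame = i  # an event starts at this frame
        elif not active and onset_frame is not None:
            events.append((label, onset_frame * hop_seconds, i * hop_seconds))
            onset_frame = None
    if onset_frame is not None:  # event still active at the end of the recording
        events.append((label, onset_frame * hop_seconds, len(probs) * hop_seconds))
    return events
```

Practical systems often smooth the frame decisions (for example with median filtering or minimum-duration rules) before this step; that post-processing is omitted here.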


Time-domain music source separation for choirs and ensembles

Music source separation is the task of separating musical sources from an audio mixture. It has various direct applications, including automatic karaoke generation, enhancing musical recordings, and 3D-audio upmixing, but it also has implications for other downstream music information retrieval tasks such as multi-instrument transcription. However, the majority of research has focused on fixed-stem separation of vocals, drums, and bass stems. While such models have highlighted the capabilities of source separation using deep learning, their applicability is limited to very few use cases. Such models are unable to separate most other instruments due to insufficient training data. Moreover, class-based separation inherently leaves such models unable to separate monotimbral mixtures. This thesis focuses on separating musical sources without requiring timbral distinction among the sources. Preliminary attempts focus on the separation of vocal harmonies from choral ensembles using ...

Sarkar, Saurjya — Queen Mary University of London


Model-Based Deep Speech Enhancement for Improved Interpretability and Robustness

Technology advancements profoundly impact numerous aspects of life, including how we communicate and interact. For instance, hearing aids enable hearing-impaired or elderly people to participate comfortably in daily conversations; telecommunications equipment lifts distance constraints, enabling people to communicate remotely; smart machines are developed to interact with humans by understanding and responding to their instructions. These applications involve speech-based interaction not only between humans but also between humans and machines. However, the microphones mounted on these technical devices can capture both target speech and interfering sounds, posing challenges to the reliability of speech communication in noisy environments. For example, distorted speech signals may reduce communication fluency among participants during teleconferencing. Additionally, noise interference can negatively affect the speech recognition and understanding modules of a voice-controlled machine. This calls for speech enhancement algorithms to extract clean speech and suppress undesired interfering signals, ...

Fang, Huajian — University of Hamburg
