Signal Processing and Graph Theory Techniques for Sound Source Separation

Source separation aims to identify and separate the sources in a given mixture. In music source separation, the sources are typically musical instruments and the mixture is a recorded track. When there is little or no prior information about the sources or recording conditions, a major goal is to exploit the inherent characteristics of the sources to differentiate and separate them. This thesis is concerned with methods for doing so, introducing novel approaches based on signal processing and graph theory techniques.

Kernel Additive Modelling (KAM) is a popular music source separation framework, as it is flexible, computationally efficient and requires no training data. The main idea behind KAM is that one can use the inherent repetitions of musical signals to reconstruct a source by defining a proximity kernel. KAM employs robust statistics for the separation, whose success ultimately depends on the kernel's ability to identify similar instances of a source in the presence of other overlapping sources. In existing KAM approaches, the kernel design is rather rudimentary and its simplicity is limiting. In this thesis, we investigate the current kernel and propose novel extensions that boost its performance without losing interpretability, flexibility or efficiency. We then explore the inherent graph structure in KAM, leading to the first unsupervised method to optimise the sole parameter in the framework. Following this perspective, we further investigate graph representations, introducing visibility graphs to magnitude spectra. We present a novel visibility graph-based representation with valuable properties for audio. Finally, we propose the first method to compute visibility graphs on-line, broadening the relevance of this thesis to generic time series analysis.
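
As a rough illustration of the KAM idea described above, the following Python sketch assumes a k-nearest-neighbour proximity kernel over spectrogram frames and uses the median as the robust statistic. The function name `knn_median_estimate`, the Euclidean frame distance and the parameter `k` are illustrative assumptions, not the implementation used in the thesis.

```python
import numpy as np


def knn_median_estimate(mag, k):
    """Estimate a repeating source from a magnitude spectrogram.

    mag: array of shape (n_freq, n_frames), non-negative magnitudes.
    k:   number of similar frames used per estimate (hypothetical
         stand-in for the kernel parameter).
    """
    frames = mag.T  # one row per time frame

    # Pairwise squared Euclidean distances between frames
    # (an assumed similarity measure).
    sq_norms = np.sum(frames ** 2, axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * frames @ frames.T

    # Proximity kernel: indices of the k most similar frames for each frame.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Robust statistic: per-bin median over the selected frames, which
    # suppresses components that do not repeat across them.
    estimate = np.median(frames[neighbours], axis=1)

    # The estimated source magnitude should not exceed the mixture.
    estimate = np.minimum(estimate, frames)
    return estimate.T
```

In this sketch, a component that repeats across the selected frames survives the median, while a component present in only a minority of them is treated as an outlier and removed. The quality of the estimate therefore hinges on whether the kernel picks frames in which the target source really is similar.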

Publication Year: 2020
Author: Delia Fano Yela
Supervisors: Mark Sandler and Dan Stowell
Institution: Queen Mary University of London
Keywords: audio; signal processing; source separation; graph theory; unsupervised; spectral representation; visibility graphs; kernel additive modelling; non-negative matrix factorization