Some Contributions to Music Signal Processing and to Mono-Microphone Blind Audio Source Separation

For humans, sound is valuable mostly for its meaning: the voice carries spoken language, music, artistic intent. Our physiological hearing apparatus is highly developed, as is our understanding of the underlying processes. Replicating this analysis with a computer is a challenge: in many respects, machine capabilities do not match those of human beings when it comes, for example, to recognizing speech or musical instruments from sound alone. In this thesis, two problems are investigated: source separation and musical processing.

The first part investigates source separation using only one microphone. The source separation problem arises when several audio sources are active at the same time, mixed together, and acquired by a set of sensors (a single one in our case). In this kind of situation it is natural for a human to separate and recognize the individual speakers. This problem, known as the cocktail party problem, has received a lot of attention but is still open. In this part we present two algorithms for separating the speakers. Since we work with only one observation, no spatial information can be used and a model of the sources is needed. We use a parametric model to constrain the solution: a mixture is modeled as a sum of autoregressive (AR) sources plus additive white noise, and each source is itself modeled as a cascade of two AR models with different correlation lengths. The first algorithm is adaptive: for a non-stationary signal it is natural to track the variation of the signal over time. The second algorithm works on consecutive frames of short duration. The procedure is split into two stages: first the source parameters are estimated on a frame, then a non-iterative separation algorithm is applied; finally, the estimated parameters are used to initialize the next analyzed frame.

The second part deals with musical processing and is composed of several annexes.
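As a rough illustration of the signal model used in the first part, the sketch below simulates a single-microphone mixture of two sources, each generated by a cascade of two AR (all-pole) filters with different correlation lengths. All filter coefficients, lags, and lengths here are hypothetical placeholders, not the values used in the thesis.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
n = 16000  # number of samples (hypothetical)

def ar_source(short_ar, long_ar, n):
    """One source: white noise shaped by a cascade of two all-pole (AR)
    filters -- a short-term one (spectral envelope) and a long-term one
    (pitch-like periodicity), i.e. two different correlation lengths."""
    e = rng.standard_normal(n)          # white excitation
    x = lfilter([1.0], short_ar, e)     # short-term AR filter
    x = lfilter([1.0], long_ar, x)      # long-term AR filter
    return x

# Hypothetical AR coefficient vectors [1, a1, a2, ...] for two sources;
# the long-term filters have a single delayed tap (lags 100 and 80).
s1 = ar_source([1.0, -0.6, 0.2], [1.0] + [0.0] * 99 + [-0.7], n)
s2 = ar_source([1.0, 0.3, -0.4], [1.0] + [0.0] * 79 + [-0.6], n)

noise = 0.01 * rng.standard_normal(n)   # additive white observation noise
mixture = s1 + s2 + noise               # the single-microphone observation
```

The separation algorithms of the thesis then work in the opposite direction: given only `mixture`, they estimate the AR parameters of each source and recover `s1` and `s2`.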
The task we investigate is related to automatic music transcription, the process of analyzing the content of a song in order to generate a music score. But music cannot be reduced to a succession of notes, and an accurate transcriber should also be able to detect other performance characteristics, such as interpretation effects. The tools built for automatic transcription can also be used pedagogically, so that a student can improve his performance with the help of software; this means that the software should be able to detect some of the performer's flaws. In this part, we first collect several samples of interpretation effects and performance defects, then build tools for detecting the presence (or absence) of each considered effect. Another problem in music transcription, called the octave problem, appears when a note and its octave are played together. Since the octave has twice the frequency of the note, the two share the same periodicity and their spectral partials overlap perfectly, which makes detection laborious. We propose an energy criterion based on estimating the energies of the odd and even partials of the chord. Finally, the last chapter describes an audio-video simulator specialized for writing guitar tablature instead of standard notation.
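A minimal sketch of the odd/even-partial energy idea behind the octave criterion: when the upper octave (at twice the fundamental) is present, it reinforces only the even partials of the lower note, so the even-to-odd energy ratio rises. The function, its parameters, and the toy signal below are illustrative assumptions, not the thesis's actual detector.

```python
import numpy as np

def odd_even_partial_energy(x, fs, f0, n_partials=10, bw=2):
    """Sum windowed-spectrum energy in narrow bands around the odd and
    even harmonics of f0 (bw bins on each side)."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    e_odd = e_even = 0.0
    for k in range(1, n_partials + 1):
        band = np.abs(freqs - k * f0) < bw * fs / len(x)
        energy = spec[band].sum()
        if k % 2:
            e_odd += energy
        else:
            e_even += energy
    return e_odd, e_even

# Toy example: a harmonic note at 220 Hz, alone vs. mixed with its octave.
fs = 8000
t = np.arange(0, 1, 1 / fs)
note = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 6))
octave = sum(np.sin(2 * np.pi * 440 * k * t) / k for k in range(1, 3))

e_odd_a, e_even_a = odd_even_partial_energy(note, fs, 220)
e_odd_b, e_even_b = odd_even_partial_energy(note + octave, fs, 220)
# The octave's partials land exactly on the even partials of the note,
# so the even/odd energy ratio increases when the octave is present.
```

In a real detector this ratio would be compared against the typical even/odd balance of the instrument when the note is played alone.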

File Type: pdf
File Size: 21 MB
Publication Year: 2010
Author: Schutz, Antony
Supervisor: Dirk Slock
Institution: Eurecom / Mobile Communications
Keywords: Source Separation, Autoregressive Model, Ornamentation, Adaptive Filtering, Spectral Analysis