Structured Sparse Decompositions: Application to the Object Representation of Music
The amount of digital music available both on the Internet and in each listener's personal collection has grown considerably over the past ten years. Organizing and accessing this amount of data requires additional information, such as artist, album and song names, musical genre, tempo, mood or other symbolic or semantic attributes. Automatic music indexing has thus become a challenging research area. While some tasks are now handled correctly for certain types of music, such as automatic genre classification for stereotypical music, musical instrument recognition on solo performances and tempo extraction, others are more difficult to perform. For example, automatic transcription of polyphonic signals and instrument ensemble recognition are still limited to particular cases. The goal of our study is not to obtain a perfect transcription of the signals and an exact classification of all the instruments involved, but rather to build an object representation of the signal that exhibits useful features of music signals by representing them as sound objects. To achieve this goal, we employ sparse representations of the signal. This recent research area deals with the approximation of signals by waveforms (atoms) belonging to dictionaries. The main topics are the construction of dictionaries adapted to the analyzed signals, and the design of algorithms that decompose the signal optimally and efficiently. In the present work, dictionaries linked to instrumental sources have been built: we define an Instrument-Specific Harmonic atom as a sum of Gabor atoms representing the partials of a note, whose amplitude vectors belong to a set learnt from annotated sources.
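As an illustration only, the construction of a harmonic atom as a weighted sum of Gabor atoms can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: the function names `gabor_atom` and `harmonic_atom`, the Gaussian window width, and the amplitude vector are all hypothetical, and a real amplitude vector would be learnt from annotated instrument recordings.

```python
import numpy as np

def gabor_atom(n_samples, freq, sample_rate, phase=0.0):
    """Unit-norm Gabor atom: a Gaussian-windowed sinusoid at `freq` Hz."""
    t = np.arange(n_samples)
    center = (n_samples - 1) / 2.0
    sigma = n_samples / 6.0  # assumed window width, chosen for illustration
    window = np.exp(-0.5 * ((t - center) / sigma) ** 2)
    atom = window * np.cos(2.0 * np.pi * freq * t / sample_rate + phase)
    return atom / np.linalg.norm(atom)

def harmonic_atom(n_samples, f0, sample_rate, partial_amplitudes):
    """Weighted sum of Gabor atoms at integer multiples of f0, renormalized."""
    atom = np.zeros(n_samples)
    for m, amp in enumerate(partial_amplitudes, start=1):
        atom += amp * gabor_atom(n_samples, m * f0, sample_rate)
    return atom / np.linalg.norm(atom)

# Hypothetical amplitude vector for four partials (not learnt from data).
amplitudes = np.array([1.0, 0.5, 0.25, 0.125])
atom = harmonic_atom(1024, 440.0, 16000.0, amplitudes)
```

In this sketch the partials sit at exact integer multiples of the fundamental frequency, which corresponds to the strictly harmonic case.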
Several variants of these atoms have been defined to better model structures that depart from strict harmonicity: one takes frequency modulations into account, and another introduces an inharmonicity parameter that models the partial positions of slightly inharmonic instruments such as the piano. These atoms can also be defined for stereo signals with an additional panning (panpot) parameter. We also introduce molecules, subsets of atoms that model long structures such as entire musical notes. We then present algorithms that extract these atoms and molecules efficiently from audio signals. We use the Matching Pursuit algorithm, which we adapt to extract the aforementioned signal structures. The atom extraction algorithms optimize the atom parameters after a coarse estimate on a grid. The molecular algorithms are based on a path search solved with dynamic programming. Finally, we show how the signal models and the developed algorithms yield useful representations for music indexing. We evaluate their performance for pitch estimation and musical instrument recognition on solo phrases, for which the results are comparable to those of state-of-the-art algorithms. The identification of instrument ensembles has also been addressed for monophonic and stereophonic signals. An extremely low bit-rate coder (1 to 4 kbit/s) has also been implemented, with encouraging preliminary results.
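For readers unfamiliar with Matching Pursuit, the standard greedy iteration can be sketched as below. This is the plain textbook algorithm over a small fixed dictionary, not the adapted version described in this work: it includes no parameter refinement after the grid search and no molecular path search. The toy dictionary of Hann-windowed sinusoids and the function name `matching_pursuit` are assumptions made for illustration.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter):
    """Greedy decomposition of `signal` over unit-norm atoms.

    `dictionary` has shape (n_atoms, n_samples), one unit-norm atom per row.
    Returns the list of (atom index, coefficient) pairs and the final residual.
    """
    residual = np.asarray(signal, dtype=float).copy()
    selected = []
    for _ in range(n_iter):
        # Correlate the residual with every atom and keep the best match.
        correlations = dictionary @ residual
        k = int(np.argmax(np.abs(correlations)))
        coeff = correlations[k]
        # Subtract the contribution of the selected atom from the residual.
        residual -= coeff * dictionary[k]
        selected.append((k, coeff))
    return selected, residual

# Toy dictionary: Hann-windowed sinusoids at three normalized frequencies.
n = 256
t = np.arange(n)
window = np.hanning(n)
freqs = [0.05, 0.1, 0.2]  # cycles per sample, chosen for illustration
dictionary = np.array([window * np.cos(2.0 * np.pi * f * t) for f in freqs])
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

# Synthetic signal dominated by the second atom.
signal = 3.0 * dictionary[1] + 0.5 * dictionary[2]
selected, residual = matching_pursuit(signal, dictionary, n_iter=2)
```

Each iteration removes the energy of the best-correlated atom, so the residual norm never increases. In the setting described above, the correlation step would presumably run over harmonic atoms rather than isolated sinusoids.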
