Abstract

When listening to music, some humans can easily recognize which instruments play at what time or when a new musical segment starts, but cannot describe exactly how they do this. To automatically describe particular aspects of a music piece, be it for an academic interest in emulating human perception or for practical applications, we thus cannot directly replicate the steps taken by a human. We can, however, exploit the fact that humans can easily annotate examples, and optimize a generic function to reproduce these annotations. In this thesis, I explore solving different music perception tasks with deep learning, a recent branch of machine learning that optimizes functions of many stacked nonlinear operations – referred ...

Keywords: machine learning, deep learning, multilayer perceptron, convolutional neural network, music information retrieval, music detection, speech detection, vocal activity detection, sequence labelling, music similarity estimation, onset detection, event detection, boundary detection, structural segmentation, music segmentation, singing voice detection, data augmentation, weak labels


Schlüter, Jan
Department of Computational Perception, Johannes Kepler University Linz
Publication Year
Upload Date
Oct. 5, 2018
