Décompositions Parcimonieuses Structurées: Application à la représentation objet de la musique

The amount of digital music available both on the Internet and to each listener has risen considerably over about the last ten years. Organizing and accessing this amount of data requires additional information, such as artist, album and song names, musical genre, tempo, mood or other symbolic or semantic attributes. Automatic music indexing has thus become a challenging research area. While some tasks are now handled correctly for certain types of music, such as automatic genre classification for stereotypical music, musical instrument recognition on solo performances and tempo extraction, others are more difficult to perform. For example, automatic transcription of polyphonic signals and instrument ensemble recognition are still limited to some particular cases. The goal of our study is not to obtain a perfect transcription of the signals and an exact classification of all the instruments ...

Leveau, Pierre — Universite Pierre et Marie Curie, Telecom ParisTech


Adaptive Sparse Coding and Dictionary Selection

Sparse coding is the approximation or representation of signals with the minimum number of coefficients using an overcomplete set of elementary functions. This kind of approximation or representation has found numerous applications in source separation, denoising, coding and compressed sensing. The adaptation of the sparse approximation framework to the signal coding problem is investigated in this thesis. Open problems are the selection of appropriate models and their orders, coefficient quantization and the sparse approximation method. Some of these questions are addressed in this thesis and novel methods are developed. Because almost all recent communication and storage systems are digital, an easy method to compute quantized sparse approximations is introduced in the first part. The model selection problem is investigated next. The linear model can be adapted to better fit a given signal class. It can also be designed based on some a priori information ...

Yaghoobi, Mehrdad — University of Edinburgh
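
As an illustration of the sparse approximation idea described in the abstract above, the following is a minimal sketch of matching pursuit, a standard greedy method. It is not the method developed in the thesis. The function name `matching_pursuit`, the toy dictionary `atoms` and the test signal are assumptions made for this example, and the atoms are assumed to have unit norm.

```python
import math


def matching_pursuit(signal, atoms, n_coeffs):
    """Greedy sparse approximation over a dictionary of unit-norm atoms.

    Returns a dict {atom index: coefficient} and the final residual.
    """
    residual = list(signal)
    coeffs = {}
    for _ in range(n_coeffs):
        # Inner product of the current residual with every atom.
        corrs = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        # Select the atom most correlated with the residual.
        k = max(range(len(atoms)), key=lambda i: abs(corrs[i]))
        coeffs[k] = coeffs.get(k, 0.0) + corrs[k]
        # Remove the selected component from the residual.
        residual = [r - corrs[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


# Hypothetical overcomplete dictionary in R^2: three atoms for two dimensions.
s = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]

# A signal that is exactly twice the third atom, so one coefficient suffices.
signal = [2.0 * s, 2.0 * s]
coeffs, residual = matching_pursuit(signal, atoms, n_coeffs=1)
residual_norm = math.sqrt(sum(r * r for r in residual))
```

With the orthonormal basis alone the same signal would need two coefficients. The extra diagonal atom is what lets a single coefficient represent it, which is the point of using an overcomplete set.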


Parameter Estimation -in sparsity we trust

This thesis is based on nine papers, all concerned with parameter estimation. The thesis aims at solving problems related to real-world applications such as spectroscopy, DNA sequencing, and audio processing, using sparse modeling heuristics. For the problems considered in this thesis, one is concerned not only with finding the parameters in the signal model, but also with determining the number of signal components present in the measurements. In recent years, developments in sparse modeling have allowed for methods that jointly estimate the parameters in the model and the model order. Based on these achievements, the approach often taken in this thesis is as follows. First, a parametric model of the considered signal is derived, containing different parameters that capture the important characteristics of the signal. When the signal model has been determined, an optimization problem is formed aimed at finding ...

Swärd, Johan — Lund University


Novel texture synthesis methods and their application to image prediction and image inpainting

This thesis presents novel exemplar-based texture synthesis methods for image prediction (i.e., predictive coding) and image inpainting problems. The main contributions of this study can also be seen as extensions to simple template matching; however, the texture synthesis problem here is well formulated in an optimization framework with different constraints. The image prediction problem has first been put into a sparse representation framework by approximating the template with a sparsity constraint. The proposed sparse prediction method with locally adaptive dictionaries has been shown to give better performance when compared to static waveform (such as DCT) dictionaries, and also to the template matching method. The image prediction problem has later been placed into an online dictionary learning framework by adapting conventional dictionary learning approaches for image prediction. The experimental observations show a better performance when compared to H.264/AVC intra and sparse prediction. ...

Turkan, Mehmet — INRIA-Rennes, France


Parallel Dictionary Learning Algorithms for Sparse Representations

Sparse representations are used intensively in signal processing applications such as image coding, denoising, echo channel modeling, compression, classification and many others. Recent research has shown encouraging results when the sparse signals are created through the use of a learned dictionary. The current study focuses on finding new methods and algorithms, in parallel form where possible, for obtaining sparse representations of signals with improved dictionaries that lead to better performance in both representation error and execution time. We attack the general dictionary learning problem by first investigating and proposing new solutions for the sparse representation stage and then moving on to the dictionary update stage, where we propose a new parallel update strategy. Lastly, we study the effect of the representation algorithms on the dictionary update method. We also investigate dictionary learning solutions where the dictionary has a specific form. ...

Irofti, Paul — Politehnica University of Bucharest


Contributions to signal analysis and processing using compressed sensing techniques

Chapter 2 contains a short introduction to the fundamentals of compressed sensing theory, which is the larger context of this thesis. We start by introducing the key concepts of sparsity and sparse representations of signals. We discuss the central problem of compressed sensing, i.e. how to adequately recover sparse signals from a small number of measurements, as well as the multiple formulations of the reconstruction problem. A large part of the chapter is devoted to some of the most important conditions necessary and/or sufficient to guarantee accurate recovery. The aim is to introduce the reader to the basic results, without the burden of detailed proofs. In addition, we also present a few of the popular reconstruction and optimization algorithms that we use throughout the thesis. Chapter 3 presents an alternative sparsity model known as analysis sparsity, that offers similar recovery ...

Cleju, Nicolae — "Gheorghe Asachi" Technical University of Iasi


Toward sparse and geometry adapted video approximations

Video signals are sequences of natural images, where images are often modeled as piecewise-smooth signals. Hence, video can be seen as a 3D piecewise-smooth signal made of piecewise-smooth regions that move through time. Based on the piecewise-smooth model and on related theoretical work on rate-distortion performance of wavelet and oracle based coding schemes, one can better analyze the appropriate coding strategies that adaptive video codecs need to implement in order to be efficient. Efficient video representations for coding purposes require the use of adaptive signal decompositions able to capture appropriately the structure and redundancy appearing in video signals. Adaptivity needs to be such that it allows for proper modeling of signals in order to represent these with the lowest possible coding cost. Video is a very structured signal with high geometric content. This includes temporal geometry (normally represented by motion ...

Divorra Escoda, Oscar — EPFL / Signal Processing Institute


Interpretable Machine Learning for Machine Listening

Recent years have witnessed a significant interest in interpretable machine learning (IML) research that develops techniques to analyse machine learning (ML) models. Understanding ML models is essential to gain trust in their predictions and to improve datasets, model architectures and training techniques. The majority of effort in IML research has been in analysing models that classify images or structured data and comparatively less work exists that analyses models for other domains. This research focuses on developing novel IML methods and on extending existing methods to understand machine listening models that analyse audio. In particular, this thesis reports the results of three studies that apply three different IML methods to analyse five singing voice detection (SVD) models that predict singing voice activity in musical audio excerpts. The first study introduces SoundLIME (SLIME), a method to generate temporal, spectral or time-frequency explanations ...

Mishra, Saumitra — Queen Mary University of London


An Investigation of Nonlinear Speech Synthesis and Pitch Modification Techniques

Speech synthesis technology plays an important role in many aspects of man–machine interaction, particularly in telephony applications. In order to be widely accepted, the synthesised speech quality should be as human–like as possible. This thesis investigates novel techniques for the speech signal generation stage in a speech synthesiser, based on concepts from nonlinear dynamical theory. It focuses on natural–sounding synthesis for voiced speech, coupled with the ability to generate the sound at the required pitch. The one–dimensional voiced speech time–domain signals are embedded into an appropriate higher dimensional space, using Takens’ method of delays. These reconstructed state space representations have approximately the same dynamical properties as the original speech generating system and are thus effective models. A new technique for marking epoch points in voiced speech that operates in the state space domain is proposed. Using the fact that one ...

Mann, Iain — University Of Edinburgh
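
The delay embedding mentioned in the abstract above can be sketched in a few lines. This is a generic illustration of the method of delays, not the thesis's implementation. The function name `delay_embed`, the embedding dimension `dim` and the delay `tau` (in samples) are assumptions chosen for the example.

```python
def delay_embed(x, dim, tau):
    """Embed a 1-D sequence into dim-dimensional delay vectors.

    Vector n is [x[n], x[n + tau], ..., x[n + (dim - 1) * tau]].
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_vectors = len(x) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("sequence too short for this dim and tau")
    return [[x[n + j * tau] for j in range(dim)] for n in range(n_vectors)]


# Toy sequence standing in for a sampled voiced-speech waveform.
samples = list(range(10))
vectors = delay_embed(samples, dim=3, tau=2)
```

Each row of `vectors` is one point of the reconstructed state space trajectory. In practice `dim` and `tau` have to be chosen for the signal at hand, and the values here are only placeholders.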


Sparsity Models for Signals: Theory and Applications

Many signal and image processing applications have benefited remarkably from the theory of sparse representations. In its classical form this theory models a signal as having a sparse representation under a given dictionary -- this is referred to as the "Synthesis Model". In this work we focus on greedy methods for the problem of recovering a signal from a set of deteriorated linear measurements. We consider four different sparsity frameworks that extend the aforementioned synthesis model: (i) the cosparse analysis model; (ii) the signal space paradigm; (iii) the transform domain strategy; and (iv) the sparse Poisson noise model. Our algorithms of interest in the first part of the work are the greedy-like schemes: CoSaMP, subspace pursuit (SP), iterative hard thresholding (IHT) and hard thresholding pursuit (HTP). It has been shown for the synthesis model that these can achieve a stable recovery ...

Giryes, Raja — Technion
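
Of the greedy-like schemes named in the abstract above, iterative hard thresholding is the simplest to write down. The following is a minimal sketch of the textbook synthesis-model iteration, x <- H_k(x + step * A^T (y - A x)), and not of the extensions studied in the thesis. The function name `iht`, the toy measurement matrix `A`, the step size and the iteration count are assumptions made for this example.

```python
import math


def iht(A, y, k, step, n_iter):
    """Iterative hard thresholding for y = A x with a k-sparse x.

    A is a list of rows; returns the estimate after n_iter iterations.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual of the current estimate: r = y - A x.
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Gradient step direction: g = A^T r.
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        z = [x[j] + step * g[j] for j in range(n)]
        # Hard thresholding: keep the k largest-magnitude entries.
        order = sorted(range(n), key=lambda j: abs(z[j]), reverse=True)
        keep = set(order[:k])
        x = [z[j] if j in keep else 0.0 for j in range(n)]
    return x


# Hypothetical 2x3 measurement matrix with unit-norm columns.
s = 1.0 / math.sqrt(2.0)
A = [[1.0, 0.0, s],
     [0.0, 1.0, s]]

# Measurements of the 1-sparse vector [2, 0, 0].
y = [2.0, 0.0]

# The largest eigenvalue of A A^T is 2 here, so step = 0.5 is a safe choice.
x_hat = iht(A, y, k=1, step=0.5, n_iter=60)
```

For this toy problem the error on the first coefficient halves at every iteration, so the estimate approaches [2, 0, 0]. General recovery guarantees depend on properties of the measurement matrix that this sketch does not check.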


Sparse Modeling Heuristics for Parameter Estimation - Applications in Statistical Signal Processing

This thesis examines sparse statistical modeling on a range of applications in audio modeling, audio localization, DNA sequencing, and spectroscopy. In the examined cases, the resulting estimation problems are computationally cumbersome, both because one often lacks knowledge of the model order for this form of problem and because of the high dimensionality of the parameter spaces, which typically also yields optimization problems with numerous local minima. In this thesis, these problems are treated using sparse modeling heuristics, with the resulting criteria being solved using convex relaxations, inspired by disciplined convex programming ideas, to maintain tractability. The contributions to audio modeling and estimation focus on the estimation of the fundamental frequency of harmonically related sinusoidal signals, which is a commonly used model for, e.g., voiced speech or tonal audio. We examine both the problems of estimating multiple audio sources ...

Adalbjörnsson, Stefan Ingi — Lund University


Audio-visual processing and content management techniques, for the study of (human) bioacoustics phenomena

The present doctoral thesis aims towards the development of new long-term, multi-channel, audio-visual processing techniques for the analysis of bioacoustics phenomena. The effort is focused on the study of the physiology of the gastrointestinal system, aiming at the support of medical research for the discovery of gastrointestinal motility patterns and the diagnosis of functional disorders. The term "processing" in this case is quite broad, incorporating the procedures of signal processing, content description, manipulation and analysis, that are applied to all the recorded bioacoustics signals, the auxiliary audio-visual surveillance information (for the monitoring of experiments and the subjects' status), and the extracted audio-video sequences describing the abdominal sound-field alterations. The thesis outline is as follows. The main objective of the thesis, which is the technological support of medical research, is presented in the first chapter. A quick problem definition is initially ...

Dimoulas, Charalampos — Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece


Probabilistic Model-Based Multiple Pitch Tracking of Speech

Multiple pitch tracking of speech is an important task for the segregation of multiple speakers in a single-channel recording. In this thesis, a probabilistic model-based approach for estimation and tracking of multiple pitch trajectories is proposed. A probabilistic model that captures pitch-dependent characteristics of the single-speaker short-time spectrum is obtained a priori from clean speech data. The resulting speaker model, which is based on Gaussian mixture models, can be trained either in a speaker independent (SI) or a speaker dependent (SD) fashion. Speaker models are then combined using an interaction model to obtain a probabilistic description of the observed speech mixture. A factorial hidden Markov model is applied for tracking the pitch trajectories of multiple speakers over time. The probabilistic model-based approach is capable of explicitly incorporating timbral information and all associated uncertainties of spectral structure into the model. While ...

Wohlmayr, Michael — Graz University of Technology


Acoustic Event Detection: Feature, Evaluation and Dataset Design

It takes more time to think of a silent scene, action or event than to find one that emanates sound. Not only speaking or playing music but almost everything that happens is accompanied by or results in one or more sounds mixed together. This makes acoustic event detection (AED) one of the most researched topics in audio signal processing nowadays, and it will probably not see a decline in the near future. This is due to the thirst for understanding and digitally abstracting more and more events in life via the enormous amount of audio recorded through thousands of applications in our daily routine. But it is also a result of two intrinsic properties of audio: it doesn’t need a direct line of sight to be perceived and is less intrusive to record when compared to image or video. Many applications such ...

Mounir, Mina — KU Leuven, ESAT STADIUS


Discrete-time speech processing with application to emotion recognition

The subject of this PhD thesis is the efficient and robust processing and analysis of audio recordings derived from a call center. The thesis comprises two parts. The first part is dedicated to dialogue/non-dialogue detection and to speaker segmentation. The systems that are developed are prerequisites for detecting (i) the audio segments that actually contain a dialogue between the system and the call center customer and (ii) the change points between the system and the customer. This way the volume of the audio recordings that need to be processed is significantly reduced, while the system is automated. To detect the presence of a dialogue several systems are developed. This is the first effort found in the international literature in which the audio channel is exclusively exploited. Also, it is the first time that the speaker utterance ...

Kotti, Margarita — Aristotle University of Thessaloniki
