Sparsity in Linear Predictive Coding of Speech (2010)
Group-Sparse Regression - With Applications in Spectral Analysis and Audio Signal Processing
This doctorate thesis focuses on sparse regression, a statistical modeling tool for selecting valuable predictors in underdetermined linear models. By imposing different constraints on the structure of the variable vector in the regression problem, one obtains estimates which have sparse supports, i.e., where only a few of the elements in the response variable have non-zero values. The thesis collects six papers which, to a varying extent, deals with the applications, implementations, modifications, translations, and other analysis of such problems. Sparse regression is often used to approximate additive models with intricate, non-linear, non-smooth or otherwise problematic functions, by creating an underdetermined model consisting of candidate values for these functions, and linear response variables which selects among the candidates. Sparse regression is therefore a widely used tool in applications such as, e.g., image processing, audio processing, seismological and biomedical modeling, but is ...
Kronvall, Ted — Lund University
The use of High-Order Sparse Linear Prediction for the Restoration of Archived Audio
Since the invention of Gramophone by Thomas Edison in 1877, vast amounts of cultural, entertainment, educational and historical audio recordings have been recorded and stored throughout the world. Through natural aging and improper storage, the recorded signal degrades and loses its information in terms of quality and intelligibility. Degradation of audio signals is considered as any unwanted modification to the audio signal after it has been recorded. There are different degradations affecting recorded signals on analog storage media. The degradations that are often encountered are clicks, hiss and ‘Wow and Flutter’. Several researches have been conducted in restoring degraded audio recordings. Most of the methods rely on some prior information of the underlying data and the degradation process. The success of these methods heavily depends on the prior information available. When such information is not available, a model of the ...
Dufera, Bisrat Derebssa — School of Electrical and Computer Engineering, Addis Ababa Institute of Technology, Addis Ababa University
Advances in Perceptual Stereo Audio Coding Using Linear Prediction Techniques
A wide range of techniques for coding a single-channel speech and audio signal has been developed over the last few decades. In addition to pure redundancy reduction, sophisticated source and receiver models have been considered for reducing the bit-rate. Traditionally, speech and audio coders are based on different principles and thus each of them offers certain advantages. With the advent of high capacity channels, networks, and storage systems, the bit-rate versus quality compromise will no longer be the major issue; instead, attributes like low-delay, scalability, computational complexity, and error concealments in packet-oriented networks are expected to be the major selling factors. Typical audio coders such as MP3 and AAC are based on subband or transform coding techniques that are not easily reconcilable with a low-delay requirement. The reasons for their inherently longer delay are the relatively long band splitting filters ...
Biswas, Arijit — Technische Universiteit Eindhoven
Oscillator-plus-Noise Modeling of Speech Signals
In this thesis we examine the autonomous oscillator model for synthesis of speech signals. The contributions comprise an analysis of realizations and training methods for the nonlinear function used in the oscillator model, the combination of the oscillator model with inverse filtering, both significantly increasing the number of `successfully' re-synthesized speech signals, and the introduction of a new technique suitable for the re-generation of the noise-like signal component in speech signals. Nonlinear function models are compared in a one-dimensional modeling task regarding their presupposition for adequate re-synthesis of speech signals, in particular considering stability. The considerations also comprise the structure of the nonlinear functions, with the aspect of the possible interpolation between models for different speech sounds. Both regarding stability of the oscillator and the premiss of a nonlinear function structure that may be pre-defined, RBF networks are found a ...
Rank, Erhard — Vienna University of Technology
Adaptive Sparse Coding and Dictionary Selection
The sparse coding is approximation/representation of signals with the minimum number of coefficients using an overcomplete set of elementary functions. This kind of approximations/ representations has found numerous applications in source separation, denoising, coding and compressed sensing. The adaptation of the sparse approximation framework to the coding problem of signals is investigated in this thesis. Open problems are the selection of appropriate models and their orders, coefficient quantization and sparse approximation method. Some of these questions are addressed in this thesis and novel methods developed. Because almost all recent communication and storage systems are digital, an easy method to compute quantized sparse approximations is introduced in the first part. The model selection problem is investigated next. The linear model can be adapted to better fit a given signal class. It can also be designed based on some a priori information ...
Yaghoobi, Mehrdad — University of Edinburgh
Spatio-Temporal Speech Enhancement in Adverse Acoustic Conditions
Never before has speech been captured as often by electronic devices equipped with one or multiple microphones, serving a variety of applications. It is the key aspect in digital telephony, hearing devices, and voice-driven human-to-machine interaction. When speech is recorded, the microphones also capture a variety of further, undesired sound components due to adverse acoustic conditions. Interfering speech, background noise and reverberation, i.e. the persistence of sound in a room after excitation caused by a multitude of reflections on the room enclosure, are detrimental to the quality and intelligibility of target speech as well as the performance of automatic speech recognition. Hence, speech enhancement aiming at estimating the early target-speech component, which contains the direct component and early reflections, is crucial to nearly all speech-related applications presently available. In this thesis, we compare, propose and evaluate existing and novel approaches ...
Dietzen, Thomas — KU Leuven
Sparse approximation and dictionary learning with applications to audio signals
Over-complete transforms have recently become the focus of a wide wealth of research in signal processing, machine learning, statistics and related fields. Their great modelling flexibility allows to find sparse representations and approximations of data that in turn prove to be very efficient in a wide range of applications. Sparse models express signals as linear combinations of a few basis functions called atoms taken from a so-called dictionary. Finding the optimal dictionary from a set of training signals of a given class is the objective of dictionary learning and the main focus of this thesis. The experimental evidence presented here focuses on the processing of audio signals, and the role of sparse algorithms in audio applications is accordingly highlighted. The first main contribution of this thesis is the development of a pitch-synchronous transform where the frame-by-frame analysis of audio data ...
Barchiesi, Daniele — Queen Mary University of London
This thesis deals with several open problems in acoustic echo cancellation and acoustic feedback control. Our main goal has been to develop solutions that provide a high performance and sound quality, and behave in a robust way in realistic conditions. This can be achieved by departing from the traditional ad-hoc methods, and instead deriving theoretically well-founded solutions, based on results from parameter estimation and system identification. In the development of these solutions, the computational efficiency has permanently been taken into account as a design constraint, in that the complexity increase compared to the state-of-the-art solutions should not exceed 50 % of the original complexity. In the context of acoustic echo cancellation, we have investigated the problems of double-talk robustness, acoustic echo path undermodeling, and poor excitation. The two former problems have been tackled by including adaptive decorrelation filters in the ...
van Waterschoot, Toon — Katholieke Universiteit Leuven
Efficient Perceptual Audio Coding Using Cosine and Sine Modulated Lapped Transforms
The increasing number of simultaneous input and output channels utilized in immersive audio configurations primarily in broadcasting applications has renewed industrial requirements for efficient audio coding schemes with low bit-rate and complexity. This thesis presents a comprehensive review and extension of conventional approaches for perceptual coding of arbitrary multichannel audio signals. Particular emphasis is given to use cases ranging from two-channel stereophonic to six-channel 5.1-surround setups with or without the application-specific constraint of low algorithmic coding latency. Conventional perceptual audio codecs share six common algorithmic components, all of which are examined extensively in this thesis. The first is a signal-adaptive filterbank, constructed using instances of the real-valued modified discrete cosine transform (MDCT), to obtain spectral representations of successive portions of the incoming discrete time signal. Within this MDCT spectral domain, various intra- and inter-channel optimizations, most of which are of ...
Helmrich, Christian R. — Friedrich-Alexander-Universität Erlangen-Nürnberg
Efficient parametric modeling, identification and equalization of room acoustics
Room acoustic signal enhancement (RASE) applications, such as digital equalization, acoustic echo and feedback cancellation, which are commonly found in communication devices and audio equipment, aim at processing the acoustic signals with the final goal of improving the perceived sound quality in rooms. In order to do so, signal processing algorithms require the acoustic response of the room to be represented by means of parametric models and to be identified from the input and output signals of the room acoustic system. In particular, a good model should be both accurate, thus capturing those features of room acoustics that are physically and perceptually most relevant, and efficient, so that it can be implemented as a digital filter and used in practical signal processing tasks. This thesis addresses the fundamental question in room acoustic signal processing concerning the appropriateness of different parametric ...
Vairetti, Giacomo — KU Leuven
Optimization of Coding of AR Sources for Transmission Across Channels with Loss
Source coding concerns the representation of information in a source signal using as few bits as possible. In the case of lossy source coding, it is the encoding of a source signal using the fewest possible bits at a given distortion or, at the lowest possible distortion given a specified bit rate. Channel coding is usually applied in combination with source coding to ensure reliable transmission of the (source coded) information at the maximal rate across a channel given the properties of this channel. In this thesis, we consider the coding of auto-regressive (AR) sources which are sources that can be modeled as auto-regressive processes. The coding of AR sources lends itself to linear predictive coding. We address the problem of joint source/channel coding in the setting of linear predictive coding of AR sources. We consider channels in which individual ...
Arildsen, Thomas — Aalborg University
Advances in Glottal Analysis and its Applications
From artificial voices in GPS to automatic systems of dictation, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking for efficiency enhancement with simultaneous cost saving, the market of speech technology is forecast to be especially promising in the next years. The present thesis deals with advances in glottal analysis in order to incorporate new techniques within speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, and called glottal flow, is expected to exhibit a relevant complementarity. Unfortunately, glottal analysis from speech recordings requires specific complex processing operations, which explains why it has been generally avoided. The main goal of this thesis is to provide new advances in glottal analysis ...
Drugman, Thomas — Universite de Mons
Toward sparse and geometry adapted video approximations
Video signals are sequences of natural images, where images are often modeled as piecewise-smooth signals. Hence, video can be seen as a 3D piecewise-smooth signal made of piecewise-smooth regions that move through time. Based on the piecewise-smooth model and on related theoretical work on rate-distortion performance of wavelet and oracle based coding schemes, one can better analyze the appropriate coding strategies that adaptive video codecs need to implement in order to be efficient. Efficient video representations for coding purposes require the use of adaptive signal decompositions able to capture appropriately the structure and redundancy appearing in video signals. Adaptivity needs to be such that it allows for proper modeling of signals in order to represent these with the lowest possible coding cost. Video is a very structured signal with high geometric content. This includes temporal geometry (normally represented by motion ...
Divorra Escoda, Oscar — EPFL / Signal Processing Institute
Sparsity Models for Signals: Theory and Applications
Many signal and image processing applications have benefited remarkably from the theory of sparse representations. In its classical form this theory models signal as having a sparse representation under a given dictionary -- this is referred to as the "Synthesis Model". In this work we focus on greedy methods for the problem of recovering a signal from a set of deteriorated linear measurements. We consider four different sparsity frameworks that extend the aforementioned synthesis model: (i) The cosparse analysis model; (ii) the signal space paradigm; (iii) the transform domain strategy; and (iv) the sparse Poisson noise model. Our algorithms of interest in the first part of the work are the greedy-like schemes: CoSaMP, subspace pursuit (SP), iterative hard thresholding (IHT) and hard thresholding pursuit (HTP). It has been shown for the synthesis model that these can achieve a stable recovery ...
Giryes, Raja — Technion
The Bionic Electro-Larynx Speech System - Challenges, Investigations, and Solutions
Humans without larynx need to use a substitution voice to re-obtain speech. The electro-larynx (EL) is a widely used device but is known for its unnatural and monotonic speech quality. Previous research tackled these problems, but until now no significant improvements could be reported. The EL speech system is a complex system including hardware (artificial excitation source or sound transducer) and software (control and generation of the artificial excitation signal). It is not enough to consider one separated problem, but all aspects of the EL speech system need to be taken into account. In this thesis we would like to push forward the boundaries of the conventional EL device towards a new bionic electro-larynx speech system. We formulate two overall scenarios: a closed-loop scenario, where EL speech is excited and simultaneously recorded using an EL speech system, and the artificial ...
Fuchs, Anna Katharina — Graz University of Technology, Signal Processing and Speech Communication Laboratory
The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.
The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.