Sparsity in Linear Predictive Coding of Speech

This thesis deals with developing improved modeling methods for speech and audio processing based on recent developments in sparse signal representation. In particular, this work is motivated by the need to address some of the limitations of the well-known linear prediction (LP) based all-pole models currently applied in many modern speech and audio processing systems. In the first part of this thesis, we introduce Sparse Linear Prediction, a set of speech processing tools created by introducing sparsity constraints into the LP framework. This approach defines predictors that look for a sparse residual rather than a minimum-variance one. Such predictors have direct applications to coding and are also consistent with the speech production model of voiced speech, where the excitation of the all-pole filter is modeled as an impulse train. Introducing sparsity in the LP framework also leads to the concept of high-order sparse predictors. By efficiently modeling the spectral envelope and the harmonic components with very few coefficients, these predictors have direct applications in both speech and audio processing. In speech processing, they provide a joint estimation of the short-term and long-term predictors; in audio processing, they provide an efficient model of the different tonal components of monophonic and polyphonic signals. A thorough analysis of the modeling properties and coding applications of the several sparse predictors introduced is given throughout the thesis. The second part of the thesis deals with introducing sparsity directly into the Linear Prediction Analysis-by-Synthesis (LPAS) speech coding paradigm. We first propose a novel method to look for a sparse approximate excitation using a Compressed Sensing formulation, which allows for a fast and efficient estimation of the sparse excitation.
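The contrast between a minimum-variance and a sparse residual can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation: the 1-norm of the residual is minimized with iteratively reweighted least squares (IRLS), swapped in here for a general convex solver, and the optional `gamma * ||a||_1` term is our assumed form of the coefficient-sparsity penalty behind high-order sparse predictors. The function names (`lagged_matrix`, `lp_l2`, `lp_l1_irls`, `allpole_synth`) and the toy signal (an impulse train with a 40-sample period driving a two-pole filter) are hypothetical.

```python
import numpy as np


def lagged_matrix(x, order):
    """Covariance-method setup: predict x[n] from x[n-1], ..., x[n-order]
    for n = order, ..., N-1. Returns the regressor matrix and target vector."""
    N = len(x)
    X = np.column_stack([x[order - k:N - k] for k in range(1, order + 1)])
    return X, x[order:]


def lp_l2(x, order):
    """Conventional LP: minimize the 2-norm (energy) of the residual."""
    X, t = lagged_matrix(x, order)
    a, *_ = np.linalg.lstsq(X, t, rcond=None)
    return a


def lp_l1_irls(x, order, gamma=0.0, iters=50, eps=1e-6):
    """Sparse LP sketch: approximately minimize ||t - X a||_1 + gamma * ||a||_1
    with IRLS. Each |v| is replaced by w * v**2, where w = 1 / max(|v|, eps)
    is computed from the previous iterate (assumed solver, not the thesis's)."""
    X, t = lagged_matrix(x, order)
    a = lp_l2(x, order)  # start from the minimum-variance solution
    for _ in range(iters):
        # Scaling rows by sqrt(w) turns the weighted problem into plain least squares.
        sw_r = 1.0 / np.sqrt(np.maximum(np.abs(t - X @ a), eps))
        rows, rhs = [X * sw_r[:, None]], [t * sw_r]
        if gamma > 0:
            # Extra rows sqrt(gamma * w_a) * a_k ~ 0 penalize nonzero coefficients.
            sw_a = np.sqrt(gamma / np.maximum(np.abs(a), eps))
            rows.append(np.diag(sw_a))
            rhs.append(np.zeros(order))
        a, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return a


def residual(x, a):
    """Prediction residual t - X a for the predictor a."""
    X, t = lagged_matrix(x, len(a))
    return t - X @ a


def allpole_synth(e, a):
    """Toy voiced-speech model: y[n] = e[n] + sum_k a[k-1] * y[n-k]."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        past = sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y[n] = e[n] + past
    return y


# Impulse-train excitation (period 40 samples) through a stable two-pole filter.
a_true = np.array([1.6, -0.9])
e = np.zeros(320)
e[::40] = 1.0
x = allpole_synth(e, a_true)

a2, a1 = lp_l2(x, 2), lp_l1_irls(x, 2)
print("L2 LP:", a2, " L1 LP:", a1)
print("residual 1-norm  L2: %.3f  L1: %.3f"
      % (np.abs(residual(x, a2)).sum(), np.abs(residual(x, a1)).sum()))
```

On this toy signal the 1-norm predictor should land very close to `a_true`, leaving a residual concentrated at the pitch pulses, whereas the 2-norm predictor achieves a lower residual energy at the cost of a less sparse residual.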
Furthermore, in traditional LPAS coding, LP is used as a first step to decorrelate a segment of speech, and its parameters are found in an open-loop configuration, while the excitation is found in a closed-loop configuration under certain constraints. The difference between the true prediction residual and its approximated version creates a mismatch that can significantly increase the distortion of the reconstructed speech. To cope with this problem, we define a novel re-estimation procedure that adapts the predictor coefficients to the given sparse excitation, balancing the two representations in the context of speech coding. The compact parametric representation of a speech segment given by the sparse linear predictors, together with the re-estimation procedure, is analyzed in the context of frame-independent coding for speech communications over packet networks. Finally, we consider the application of the high-order sparse predictors and the sparse residual to provide a common framework for frame-independent coding of speech and audio: a novel scheme for VoIP that can also carry music and mixed audio content.
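The re-estimation idea can be sketched under a simplifying assumption: with the sparse excitation `e_hat` held fixed, the coefficients are refit by minimizing the equation error `||x[n] - sum_k a_k x[n-k] - e_hat[n]||_2`, which is linear in the coefficients and solvable by least squares. This is not necessarily the thesis's procedure (which may, for instance, work with a synthesis-domain error), and `keep_largest` is a hypothetical placeholder for the closed-loop excitation search; all function names and the toy signal are illustrative.

```python
import numpy as np


def lagged_matrix(x, order):
    """Rows [x[n-1], ..., x[n-order]] for n = order, ..., N-1, plus targets x[n]."""
    N = len(x)
    X = np.column_stack([x[order - k:N - k] for k in range(1, order + 1)])
    return X, x[order:]


def open_loop_lp(x, order):
    """Open-loop step: minimum-variance LP coefficients."""
    X, t = lagged_matrix(x, order)
    return np.linalg.lstsq(X, t, rcond=None)[0]


def keep_largest(r, k):
    """Hypothetical stand-in for a closed-loop sparse excitation search:
    keep the k largest-magnitude residual samples and zero the rest."""
    e = np.zeros_like(r)
    idx = np.argsort(np.abs(r))[-k:]
    e[idx] = r[idx]
    return e


def reestimate(x, e_hat, order):
    """Refit the predictor to a fixed excitation e_hat (aligned with the
    targets x[order:]) by least squares on the equation error."""
    X, t = lagged_matrix(x, order)
    return np.linalg.lstsq(X, t - e_hat, rcond=None)[0]


def mismatch(x, a, e_hat):
    """Energy of the gap between the prediction residual of a and e_hat."""
    X, t = lagged_matrix(x, len(a))
    return float(np.sum((t - X @ a - e_hat) ** 2))


# Toy voiced segment: noisy impulse train (period 40) through a two-pole filter.
rng = np.random.default_rng(0)
a_true = np.array([1.6, -0.9])
exc = 0.05 * rng.standard_normal(320)
exc[::40] += 1.0
x = np.zeros(320)
for n in range(320):
    x[n] = exc[n] + sum(a_true[k - 1] * x[n - k] for k in (1, 2) if n - k >= 0)

order = 10
a_ol = open_loop_lp(x, order)
X, t = lagged_matrix(x, order)
e_hat = keep_largest(t - X @ a_ol, k=8)  # sparse approximate excitation
a_re = reestimate(x, e_hat, order)
print("mismatch before: %.4f  after: %.4f"
      % (mismatch(x, a_ol, e_hat), mismatch(x, a_re, e_hat)))
```

By construction the mismatch after re-estimation is no larger than before, since the open-loop coefficients are one of the candidates the least-squares refit considers; whether the distortion of the reconstructed speech also decreases depends on the actual coder, and the two steps could be alternated.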

File Type: pdf
File Size: 2 MB
Publication Year: 2010
Author: Giacobello, Daniele
Supervisors: Mads Græsbøll Christensen, Søren Holdt Jensen, Marc Moonen
Institution: Aalborg University
Keywords: sparsity, linear prediction, compressed sensing, speech and audio analysis, speech coding