Advances in Perceptual Stereo Audio Coding Using Linear Prediction Techniques

A wide range of techniques for coding single-channel speech and audio signals has been developed over the last few decades. In addition to pure redundancy reduction, sophisticated source and receiver models have been considered for reducing the bit-rate. Traditionally, speech and audio coders are based on different principles, and each offers certain advantages. With the advent of high-capacity channels, networks, and storage systems, the trade-off between bit-rate and quality will no longer be the major issue; instead, attributes such as low delay, scalability, computational complexity, and error concealment in packet-oriented networks are expected to become the major selling points. Typical audio coders such as MP3 and AAC are based on subband or transform coding techniques that are not easily reconciled with a low-delay requirement. Their inherently longer delay stems from the relatively long band-splitting filters needed to perform requantization under the control of a psychoacoustic model, as well as from the buffering required to smooth out variations in the bit-rate. Speech coders, on the other hand, typically use linear predictive coding, which is compatible with low delay, scalability, error concealment, and low computational complexity. Since predictive coding can achieve a very low encoding/decoding delay with essentially no loss of compression performance, we selected Linear Prediction (LP) as our starting point. However, several issues must be resolved to make LP an adequate and attractive tool for audio coding. These arise from the fundamental differences between speech and audio signals: speech signals are typically band-limited, mono, and produced by a single source, whereas audio signals are typically multi-channel, broadband, and produced by several instruments (sources).
These differences raise several fundamental issues. One is choosing an appropriate multi-channel linear prediction system such that the essential single-channel LP properties carry over to the generalized case. Another is that LP in speech coding is closely tied to a source model, which is inadequate for audio because multiple sources are present. The source model therefore has to be replaced by a receiver model, namely the psychoacoustic model used in standard audio coders. This, together with the higher bandwidth, means that an LP system for audio coding tends to become rather complex. This thesis addresses these issues. Its most important contributions are a proposal for the ‘best’ generalization of the single-channel LP system to stereo and multi-channel linear prediction, complexity reductions for Laguerre-based linear prediction systems, a quantization scheme for stereo linear prediction parameters, and the concept of perceptually biased linear prediction. The thesis thereby contributes to the field of low-delay, low-complexity audio coding by means of linear prediction.
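
To make the low-delay argument for LP concrete, the following is a minimal single-channel sketch in Python; it is not taken from the thesis, and all function names are hypothetical. The predictor coefficients for one frame are estimated with the autocorrelation method and the Levinson-Durbin recursion; the encoder's prediction-error filter and the decoder's synthesis filter then run sample by sample, each output depending only on the current and past samples. The filters themselves add no look-ahead; any remaining delay depends on how the coefficients are obtained (forward-adaptively from a buffered frame, or backward-adaptively from past decoded samples). Quantization of the residual and of the coefficients is deliberately left out.

```python
import math


def autocorrelation(x, order):
    """Biased (unnormalized) autocorrelation r[0..order] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations with the Levinson-Durbin recursion.

    Returns predictor coefficients [a_1, ..., a_p] for
    x_hat[n] = sum_k a_k * x[n - k], plus the final prediction-error power.
    """
    a = [0.0] * (order + 1)          # a[0] unused; a[k] multiplies x[n - k]
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:               # silent frame or already perfectly predicted
            break
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        kappa = acc / err            # reflection coefficient
        prev = a[:]
        a[i] = kappa
        for k in range(1, i):
            a[k] = prev[k] - kappa * prev[i - k]
        err *= 1.0 - kappa * kappa
    return a[1:], err


def analysis_filter(x, a):
    """Encoder side: causal prediction-error filter, one sample at a time."""
    e = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        e.append(x[n] - pred)
    return e


def synthesis_filter(e, a):
    """Decoder side: all-pole inverse filter, again using only past outputs."""
    y = []
    for n in range(len(e)):
        pred = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y.append(e[n] + pred)
    return y


# Usage: a 400-sample sinusoid is almost perfectly predictable with order 2.
x = [math.sin(0.3 * n) for n in range(400)]
a, _ = levinson_durbin(autocorrelation(x, 2), 2)
e = analysis_filter(x, a)
y = synthesis_filter(e, a)
```

For this sinusoid the order-2 residual carries only a small fraction of the signal energy, and the synthesis filter reproduces the input up to floating-point rounding, since the autocorrelation method yields a stable synthesis filter.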

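The abstract mentions Laguerre-based linear prediction for wideband audio. The thesis's own Laguerre structure and its complexity reductions are not reproduced here; the sketch below only illustrates the closely related, generic idea of frequency-warped LP, in which each unit delay z^-1 is replaced by a first-order all-pass section A(z) = (z^-1 - λ)/(1 - λz^-1), the same all-pass chain that underlies Laguerre filters. The warping parameter `lam` (λ) and the function names are assumptions chosen for illustration. The resulting warped autocorrelation values can be passed to an ordinary Levinson-Durbin recursion to obtain warped predictor coefficients.

```python
def allpass(x, lam):
    """First-order all-pass A(z) = (z^-1 - lam) / (1 - lam z^-1), zero initial state.

    Difference equation: y[n] = -lam * x[n] + x[n-1] + lam * y[n-1].
    """
    y = []
    prev_in = 0.0
    prev_out = 0.0
    for v in x:
        out = -lam * v + prev_in + lam * prev_out
        y.append(out)
        prev_in, prev_out = v, out
    return y


def warped_autocorrelation(x, order, lam):
    """r_w[k] = sum_n x[n] * (A^k x)[n] for k = 0..order.

    Each extra lag passes the previous branch through one more all-pass
    section instead of delaying it by one sample.
    """
    r = [sum(v * v for v in x)]
    d = list(x)
    for _ in range(order):
        d = allpass(d, lam)
        r.append(sum(u * v for u, v in zip(x, d)))
    return r
```

With `lam = 0` the all-pass section reduces to a plain one-sample delay and the ordinary autocorrelation is recovered. A positive `lam` increases frequency resolution at low frequencies, which is what makes warping attractive for perceptually motivated audio coding; the price is the extra filtering per lag, which hints at why complexity becomes a concern.
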
Publication Year: 2007
Author: Biswas, Arijit
Supervisors: R.J. Sluijter, A.G. Kohlrausch, A.C. den Brinker
Institution: Technische Universiteit Eindhoven
Keywords: audio coding, linear predictive coding, signal processing, speech coding.