Rate-Distortion Optimal Time-Frequency Decompositions for MDCT-based Audio Coding (2006)
Advances in Perceptual Stereo Audio Coding Using Linear Prediction Techniques
A wide range of techniques for coding single-channel speech and audio signals has been developed over the last few decades. In addition to pure redundancy reduction, sophisticated source and receiver models have been considered for reducing the bit-rate. Traditionally, speech and audio coders are based on different principles, and each of them offers certain advantages. With the advent of high-capacity channels, networks, and storage systems, the bit-rate versus quality compromise will no longer be the major issue; instead, attributes like low delay, scalability, computational complexity, and error concealment in packet-oriented networks are expected to be the major selling factors. Typical audio coders such as MP3 and AAC are based on subband or transform coding techniques that are not easily reconcilable with a low-delay requirement. The reasons for their inherently longer delay are the relatively long band splitting filters ...
Biswas, Arijit — Technische Universiteit Eindhoven
Toward sparse and geometry adapted video approximations
Video signals are sequences of natural images, and images are often modeled as piecewise-smooth signals. Hence, video can be seen as a 3D piecewise-smooth signal made of piecewise-smooth regions that move through time. Based on the piecewise-smooth model and on related theoretical work on the rate-distortion performance of wavelet and oracle-based coding schemes, one can better analyze the coding strategies that adaptive video codecs need to implement in order to be efficient. Efficient video representations for coding purposes require adaptive signal decompositions able to appropriately capture the structure and redundancy of video signals. The adaptivity must allow the signal to be modeled properly, so that it can be represented at the lowest possible coding cost. Video is a highly structured signal with rich geometric content. This includes temporal geometry (normally represented by motion ...
Divorra Escoda, Oscar — EPFL / Signal Processing Institute
Contributions to Improved Hard- and Soft-Decision Decoding in Speech and Audio Codecs
Source coding is an essential part of digital communications. In error-prone transmission conditions, bit errors may still occur even with the help of channel coding, which normally introduces delay. Single bit errors can result in significant distortions. Therefore, a robust source decoder is desired for adverse transmission conditions. Compared to traditional hard-decision (HD) decoding and error concealment, soft-decision (SD) decoding offers higher robustness by exploiting the residual redundancy of the source and utilizing bit-wise channel reliability information. Moreover, the quantization codebook index can be mapped either to a fixed number of bits using fixed-length (FL) codes or to a variable number of bits using variable-length (VL) codes. The codebook entries can be either fixed over time or time-variant. However, a fixed scalar quantization codebook yields the same performance for correlated and uncorrelated processes. This thesis aims to improve ...
Han, Sai — Technische Universität Braunschweig
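The advantage of soft-decision over hard-decision decoding can be illustrated with a minimal Python sketch. It assumes BPSK transmission over an AWGN channel with known noise level `sigma`, a fixed-length natural binary mapping of codebook indices, and known a priori index probabilities; the function names are hypothetical. Only the non-uniform index distribution is exploited here, not the inter-frame correlation or time-variant codebooks considered in the thesis. The soft decoder returns the MMSE estimate, i.e. the codebook entries weighted by their a posteriori probabilities.

```python
import math

def bit_likelihood(y, bit, sigma):
    """Unnormalized Gaussian likelihood p(y | bit) for BPSK (bit 0 -> +1, bit 1 -> -1).

    The constant 1/sqrt(2*pi*sigma^2) is dropped because it cancels in the posterior.
    """
    s = 1.0 if bit == 0 else -1.0
    return math.exp(-((y - s) ** 2) / (2.0 * sigma ** 2))

def index_bits(i, n_bits):
    """Fixed-length natural binary code of index i, most significant bit first."""
    return [(i >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]

def sd_decode(received, codebook, priors, sigma):
    """Soft-decision (MMSE) estimate: sum_i c_i * P(i | received), memoryless channel assumed."""
    n_bits = len(received)
    weights = []
    for i, prior in enumerate(priors):
        lik = 1.0
        for y, b in zip(received, index_bits(i, n_bits)):
            lik *= bit_likelihood(y, b, sigma)
        weights.append(prior * lik)
    total = sum(weights)
    posteriors = [w / total for w in weights]
    estimate = sum(c * p for c, p in zip(codebook, posteriors))
    return estimate, posteriors

def hd_decode(received, codebook):
    """Hard-decision decoding: slice each sample to a bit, then look up the codebook entry."""
    index = 0
    for y in received:
        index = (index << 1) | (1 if y < 0 else 0)
    return codebook[index]
```

With an unreliable bit (a received value near zero), `hd_decode` commits to a single entry, whereas `sd_decode` blends the candidate entries according to their posteriors, which limits the damage of a wrong decision.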
Geometric Distortion in Image and Video Watermarking. Robustness and Perceptual Quality Impact
The main focus of this thesis is the problem of geometric distortion in image and video watermarking. We discuss two aspects of the geometric distortion problem, namely watermark desynchronization and perceptual quality assessment. Furthermore, this thesis discusses the challenges of watermarking data compressed at low bit-rates. The main contributions of this thesis are: a watermarking algorithm suitable for low bit-rate video has been proposed; two different approaches have been proposed to deal with the watermark desynchronization problem; and a novel approach has been proposed to quantify the perceptual quality impact of geometric distortion.
Setyawan, Iwan — Delft University of Technology
Efficient Perceptual Audio Coding Using Cosine and Sine Modulated Lapped Transforms
The increasing number of simultaneous input and output channels utilized in immersive audio configurations, primarily in broadcasting applications, has renewed industrial requirements for efficient audio coding schemes with low bit-rate and complexity. This thesis presents a comprehensive review and extension of conventional approaches for perceptual coding of arbitrary multichannel audio signals. Particular emphasis is given to use cases ranging from two-channel stereophonic to six-channel 5.1-surround setups, with or without the application-specific constraint of low algorithmic coding latency. Conventional perceptual audio codecs share six common algorithmic components, all of which are examined extensively in this thesis. The first is a signal-adaptive filterbank, constructed using instances of the real-valued modified discrete cosine transform (MDCT), to obtain spectral representations of successive portions of the incoming discrete-time signal. Within this MDCT spectral domain, various intra- and inter-channel optimizations, most of which are of ...
Helmrich, Christian R. — Friedrich-Alexander-Universität Erlangen-Nürnberg
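The time-domain aliasing cancellation (TDAC) that makes an MDCT filterbank critically sampled yet perfectly reconstructing can be checked numerically. The sketch below assumes a fixed frame length, a sine window satisfying the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1, and a direct O(N^2) matrix implementation; it omits the signal-adaptive window switching and the sine-modulated (MDST) counterpart examined in the thesis, and all function names are hypothetical. With a 2/N scaling of the windowed inverse transform, overlap-adding the inverse frames reconstructs the input exactly, although each frame yields only N coefficients for 2N input samples.

```python
import numpy as np

def mdct_basis(N):
    """Cosine kernel cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)), shape (N, 2N)."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))

def sine_window(N):
    """Symmetric sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    n = np.arange(2 * N)
    return np.sin(np.pi * (n + 0.5) / (2 * N))

def mdct(block, window, basis):
    """Windowed MDCT: 2N samples in, N coefficients out."""
    return basis @ (window * block)

def imdct(coeffs, window, basis):
    """Windowed IMDCT with the 2/N scaling required for overlap-add reconstruction."""
    N = basis.shape[0]
    return window * (2.0 / N) * (basis.T @ coeffs)

def analysis_synthesis(x, N):
    """Frame x with hop N, transform each 2N-sample frame, invert, and overlap-add."""
    if len(x) % N:
        raise ValueError("len(x) must be a multiple of N")
    basis, w = mdct_basis(N), sine_window(N)
    # Pad N zeros on both sides so every input sample is covered by two frames.
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])
    out = np.zeros_like(padded)
    coeffs = []
    for start in range(0, len(padded) - N, N):
        X = mdct(padded[start:start + 2 * N], w, basis)
        coeffs.append(X)
        out[start:start + 2 * N] += imdct(X, w, basis)
    return np.array(coeffs), out[N:N + len(x)]
```

In a codec, quantization would be applied to the rows of `coeffs` before the inverse transform; without quantization the output matches the input up to floating-point error.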
Audio Watermarking, Steganalysis Using Audio Quality Metrics, and Robust Audio Hashing
We propose a technique for detecting the very presence of hidden messages in an audio object. The detector is based on the characteristics of the denoised residuals of the audio file. Our proposal rests on the presupposition that a hidden message in a cover object leaves statistical evidence that can be detected with audio distortion measures. The distortions caused by the hidden message are measured in terms of objective and perceptual quality metrics. The detector discriminates between cover and stego files using a selected subset of features and an SVM classifier. We have evaluated the detection performance of the proposed steganalysis technique against well-known watermarking and steganographic methods. We present novel and robust audio fingerprinting techniques based on a summarization of the time-frequency spectral characteristics of an audio object. The perceptual hash ...
Ozer, Hamza — Bogazici University
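A fingerprint that summarizes time-frequency spectral characteristics can be sketched by keeping only the signs of band-energy differences across frequency and time. This follows the widely known Haitsma-Kalker style of fingerprint rather than the method of the thesis; the frame length, hop, number of bands, and linear band spacing are illustrative assumptions (practical systems typically use logarithmically spaced bands), and `fingerprint` is a hypothetical name. Because only signs are kept, the bits do not change under a global gain.

```python
import numpy as np

def fingerprint(x, frame=256, hop=128, n_bands=9):
    """Binary fingerprint from signs of band-energy differences (Haitsma-Kalker style).

    Returns a boolean array of shape (n_frames - 1, n_bands - 1).
    """
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    # Linearly spaced band edges over the rfft bins (an illustrative simplification).
    edges = np.linspace(0, frame // 2 + 1, n_bands + 1).astype(int)
    energies = np.empty((n_frames, n_bands))
    for t in range(n_frames):
        seg = x[t * hop : t * hop + frame]
        power = np.abs(np.fft.rfft(window * seg)) ** 2
        for b in range(n_bands):
            energies[t, b] = power[edges[b]:edges[b + 1]].sum()
    # Energy difference between adjacent bands, per frame.
    band_diff = energies[:, :-1] - energies[:, 1:]
    # Bit is 1 where that difference increases from one frame to the next.
    return (band_diff[1:] - band_diff[:-1]) > 0
```

Two clips can then be compared by the fraction of differing bits (the bit error rate) between their fingerprints.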
Exploiting Correlation Noise Modeling in Wyner-Ziv Video Coding
Wyner-Ziv (WZ) video coding is a particular case of distributed video coding, a new video coding paradigm based on the Slepian-Wolf and Wyner-Ziv theorems that exploits the source correlation mainly at the decoder, rather than only at the encoder as in predictive video coding. Therefore, this new coding paradigm may provide a flexible allocation of complexity between the encoder and the decoder, as well as in-built channel error robustness, which are interesting features for emerging applications such as low-power video surveillance and visual sensor networks, among others. Although some progress has been made in the last eight years, the rate-distortion performance of WZ video coding is still far from the maximum performance attained with predictive video coding. The compression efficiency of WZ video coding depends critically on the capability to model the correlation noise between the original information at the encoder and its estimation generated ...
Brites, Catarina — Instituto Superior Tecnico (IST)
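A common model in the Wyner-Ziv literature for the correlation noise between the original and its side-information estimate is a Laplacian density f(r) = (alpha/2) exp(-alpha |r|), whose variance is 2/alpha^2. The sketch below fits alpha from the variance of a residual. It assumes such a residual is available; a real decoder has no access to the original and must estimate the residual, for instance from motion-compensated reference frames. The function names are hypothetical and do not represent the estimators of the thesis.

```python
import numpy as np

def laplacian_alpha(residual):
    """Fit alpha of f(r) = (alpha/2) exp(-alpha |r|) by matching the variance 2 / alpha^2."""
    var = np.var(residual)
    return np.sqrt(2.0 / var)

def laplacian_pdf(r, alpha):
    """Laplacian correlation-noise density evaluated at r."""
    return 0.5 * alpha * np.exp(-alpha * np.abs(r))
```

The fitted density can then supply the conditional probabilities that the Slepian-Wolf decoder needs; a larger alpha means side information closer to the original and fewer parity bits to request.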
Synthetic test patterns and compression artefact distortion metrics for image codecs
This thesis presents a test methodology framework for assessing the spatial-domain compression artefacts produced by image and intra-frame coded video codecs. Few researchers have studied this broad range of artefacts. A taxonomy of image and video compression artefacts is proposed, based on the point of origin of each artefact in the image communication model. This thesis presents an objective evaluation, made using synthetic test patterns, of the distortions known as artefacts that arise from image and intra-frame coded video compression. The American National Standards Institute document ANSI T1 801 qualitatively defines blockiness, blur, and ringing artefacts. These definitions have been augmented with quantitative definitions in conjunction with the proposed test patterns. A test and measurement environment is proposed in which the codec under test is exercised using a portfolio of test patterns. The test patterns are designed to highlight the artefact ...
Punchihewa, Amal — Massey University, New Zealand
Multiple-description lattice vector quantization
In this thesis we construct and analyze index-assignment based multiple-description coding schemes.
Ostergaard, Jan — Delft University of Technology
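The trade-off between side and central distortion in multiple-description coding can be illustrated with two staggered uniform scalar quantizers, a much simpler scheme than the lattice vector quantizers with index assignment studied in the thesis. Each description uses step `step`, the second offset by half a step. Either description alone bounds the error by step/2, and both together localize the value to a cell of width step/2, bounding the error by step/4. The function names are hypothetical.

```python
import math

def md_encode(x, step):
    """Two descriptions from two uniform quantizers offset by half a step."""
    q1 = math.floor(x / step)          # cell [q1*step, (q1+1)*step)
    q2 = math.floor(x / step + 0.5)    # cell [(q2-0.5)*step, (q2+0.5)*step)
    return q1, q2

def side_decode(index, step, description):
    """Reconstruct from one description: midpoint of that quantizer's cell."""
    offset = 0.5 if description == 1 else 0.0
    return (index + offset) * step

def central_decode(q1, q2, step):
    """Reconstruct from both descriptions: midpoint of the intersection of the two cells."""
    lo = max(q1 * step, (q2 - 0.5) * step)
    hi = min((q1 + 1) * step, (q2 + 0.5) * step)
    return 0.5 * (lo + hi)
```

The price is redundancy: the two descriptions together cost roughly twice the rate of one coarse quantizer, whereas a single quantizer with step step/2 would need only about one extra bit per sample for the same central accuracy.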
Sparsity in Linear Predictive Coding of Speech
This thesis deals with developing improved modeling methods for speech and audio processing based on recent developments in sparse signal representation. In particular, this work is motivated by the need to address some of the limitations of the well-known linear prediction (LP) based all-pole models currently applied in many modern speech and audio processing systems. In the first part of this thesis, we introduce Sparse Linear Prediction, a set of speech processing tools created by introducing sparsity constraints into the LP framework. This approach defines predictors that look for a sparse residual rather than a minimum-variance one, with direct applications to coding, and is also consistent with the speech production model of voiced speech, where the excitation of the all-pole filter is modeled as an impulse train. Introducing sparsity into the LP framework will also lead to the development of the ...
Giacobello, Daniele — Aalborg University
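The difference between a minimum-variance and a sparse residual can be sketched by replacing the least-squares (L2) criterion of covariance-method LP with an L1 criterion. Here the L1 problem is approximated by iteratively reweighted least squares, a generic substitute chosen for brevity; the formulation and solver used in the thesis may differ, and the function names are hypothetical. For a synthetic voiced-like signal (an impulse train driving an all-pole filter), the L1 predictor should recover the filter and leave a residual close to the impulse train.

```python
import numpy as np

def prediction_matrix(x, order):
    """Rows [x[n-1], ..., x[n-order]] and targets x[n] for n = order..len(x)-1."""
    rows = [x[n - order:n][::-1] for n in range(order, len(x))]
    return np.array(rows), x[order:]

def lp_l2(x, order):
    """Minimum-variance predictor: least squares on the prediction error."""
    A, t = prediction_matrix(x, order)
    a, *_ = np.linalg.lstsq(A, t, rcond=None)
    return a

def lp_l1(x, order, iters=50, eps=1e-6):
    """Approximate min ||t - A a||_1 by iteratively reweighted least squares."""
    A, t = prediction_matrix(x, order)
    a = lp_l2(x, order)                      # start from the L2 solution
    for _ in range(iters):
        r = t - A @ a
        w = 1.0 / np.maximum(np.abs(r), eps)  # weight 1/|r| turns sum w r^2 into ~ sum |r|
        sw = np.sqrt(w)
        a, *_ = np.linalg.lstsq(A * sw[:, None], t * sw, rcond=None)
    return a

def residual(x, a):
    """Prediction residual x[n] - sum_i a[i] x[n-1-i]."""
    A, t = prediction_matrix(x, len(a))
    return t - A @ a
```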
Distributed Source Coding. Tools and Applications to Video Compression
Distributed source coding is a technique that allows several correlated sources to be compressed without any cooperation between the encoders and without rate loss, provided that decoding is joint. Motivated by this principle, distributed video coding has emerged: it exploits the correlation between consecutive video frames at the decoder, which greatly simplifies the encoder. The first part of our contributions in this thesis presents the asymmetric coding of non-uniform binary sources. We analyze the coding of non-uniform Bernoulli sources and of hidden Markov sources. For both sources, we first show that exploiting the source distribution at the decoder clearly increases the decoding capability of a given channel code. For the binary symmetric channel modeling the correlation between the sources, we propose a tool to estimate its parameter, thanks to an ...
Toto-Zarasoa, Velotiaray — INRIA Rennes-Bretagne Atlantique, Universite de Rennes 1
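The way a channel code is used for asymmetric Slepian-Wolf coding can be shown with the classic textbook toy example based on the (7,4) Hamming code. The encoder sends only the 3-bit syndrome of a 7-bit source word x; the decoder, holding side information y that differs from x in at most one bit, recovers x exactly. The thesis works with stronger channel codes and with non-uniform and hidden Markov sources; the at-most-one-bit correlation model and the function names here are assumptions for illustration.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code: column j is the 3-bit binary form of j+1.
H = np.array([[((j + 1) >> b) & 1 for j in range(7)] for b in (2, 1, 0)], dtype=np.uint8)

def sw_encode(x):
    """Encoder: transmit only the 3-bit syndrome of the 7-bit source word x (7 -> 3 bits)."""
    return H @ x % 2

def sw_decode(syndrome, y):
    """Decoder: recover x from its syndrome and side information y (assumes d(x, y) <= 1)."""
    diff = (H @ y + syndrome) % 2                       # equals H (x xor y)
    pos = int(diff[0]) * 4 + int(diff[1]) * 2 + int(diff[2])  # 0 means x == y
    x_hat = y.copy()
    if pos:
        x_hat[pos - 1] ^= 1                             # flip the bit where x and y differ
    return x_hat
```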
Traditional and Scalable Coding Techniques for Video Compression
In recent years, the use of digital video has steadily increased. Since uncompressed digital video requires a very large amount of data, lossy source coding techniques are usually employed in digital video systems to compress that information and make it more suitable for storage and transmission. Source coding algorithms for video compression can be grouped into two broad classes: traditional and scalable techniques. The goal of traditional video coders is to maximize the compression efficiency for a given amount of compressed data. The goal of scalable video coding is instead to provide a scalable representation of the source, such that subsets of it optimally describe the same video source at reduced resolution in the temporal, spatial, and/or quality domain. This thesis is focused on the ...
Cappellari, Lorenzo — University of Padova
Multiple Description Coding for Path Diversity Video Streaming
In current heterogeneous communication environments, the great variety of multimedia systems and applications, combined with the fast evolution of networking architectures and topologies, gives rise to new research problems related to the various elements of the communication chain. These include the ever-present problem in video communications of coping with transmission errors and losses. In this context, video streaming with path diversity has emerged as a novel communication framework, involving different technological fields and posing several research challenges. The research work carried out in this thesis is a contribution to robust video coding and adaptation techniques in the field of Multiple Description Coding (MDC) for multipath video streaming. The thesis starts with a thorough study of MDC and its theoretical basis, followed by a description of the most important practical implementation aspects currently available in the literature. ...
Correia, Pedro Daniel Frazão — University of Coimbra
Optimization of Coding of AR Sources for Transmission Across Channels with Loss
Source coding concerns the representation of the information in a source signal using as few bits as possible. In the case of lossy source coding, it is the encoding of a source signal using the fewest possible bits at a given distortion, or at the lowest possible distortion for a specified bit rate. Channel coding is usually applied in combination with source coding to ensure reliable transmission of the (source-coded) information at the maximal rate across a channel, given the properties of that channel. In this thesis, we consider the coding of autoregressive (AR) sources, i.e., sources that can be modeled as autoregressive processes. The coding of AR sources lends itself to linear predictive coding. We address the problem of joint source/channel coding in the setting of linear predictive coding of AR sources. We consider channels in which individual ...
Arildsen, Thomas — Aalborg University
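On the source side, linear predictive coding of an AR source starts from predictor coefficients obtained from the autocorrelation (Yule-Walker) equations, which the Levinson-Durbin recursion solves in O(p^2) operations. The sketch below covers only that step, not the joint source/channel design the thesis addresses; the function name and the sign convention x_hat[n] = sum_i a[i] x[n-1-i] are assumptions. Given the exact autocorrelation of an AR process, the recursion returns its coefficients and the innovation (prediction error) power.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for x_hat[n] = sum_i a[i] x[n-1-i].

    r: autocorrelation values r[0..order]. Returns (a, prediction_error_power).
    """
    a = np.zeros(order)
    err = r[0]
    for m in range(order):
        # Reflection coefficient of stage m+1.
        acc = r[m + 1] - np.dot(a[:m], r[m:0:-1])
        k = acc / err
        a_prev = a[:m].copy()
        a[:m] = a_prev - k * a_prev[::-1]
        a[m] = k
        err *= (1.0 - k * k)   # error power shrinks at every stage
    return a, err
```

The prediction gain 10*log10(r[0] / err) indicates how much a linear predictive coder can save over coding the samples directly.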
A flexible scalable video coding framework with adaptive spatio-temporal decompositions
The work presented in this thesis covers topics that extend the scalability functionalities in video coding and improve the compression performance. Two main novel approaches are presented, each targeting a different part of the scalable video coding (SVC) architecture: a motion-adaptive wavelet transform based on the lifting implementation of the wavelet transform, and the design of a flexible framework for generalised spatio-temporal decomposition. The motion-adaptive wavelet transform is based on the newly introduced concept of a connectivity-map. The connectivity-map describes the underlying irregular structure of regularly sampled data. To enable a scalable representation of the connectivity-map, the corresponding analysis and synthesis operations have been derived. These are then employed to define a joint wavelet connectivity-map decomposition that serves as an adaptive alternative to the conventional wavelet decomposition. To demonstrate its applicability, the presented decomposition scheme is used in the proposed SVC framework, ...
Sprljan, Nikola — Queen Mary University of London
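The lifting structure underlying such adaptive transforms can be sketched with one level of the Haar wavelet: split into even and odd samples, predict the odd samples from the even ones, then update the even samples. Undoing the steps in reverse order reconstructs the input exactly whatever the predictor is, which is what permits adaptive (for example motion-compensated) predict steps. The connectivity-map machinery of the thesis is not modeled; the input is assumed to have even length, and the function names are hypothetical.

```python
import numpy as np

def haar_lift_forward(x):
    """One Haar level via lifting: split, predict, update (len(x) must be even)."""
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    detail = odd - even              # predict each odd sample from its even neighbour
    approx = even + 0.5 * detail     # update: approx becomes the pairwise mean
    return approx, detail

def haar_lift_inverse(approx, detail):
    """Undo the lifting steps in reverse order and merge even/odd samples."""
    even = approx - 0.5 * detail
    odd = detail + even
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x
```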