Efficient Perceptual Audio Coding Using Cosine and Sine Modulated Lapped Transforms

The increasing number of simultaneous input and output channels utilized in immersive audio configurations primarily in broadcasting applications has renewed industrial requirements for efficient audio coding schemes with low bit-rate and complexity. This thesis presents a comprehensive review and extension of conventional approaches for perceptual coding of arbitrary multichannel audio signals. Particular emphasis is given to use cases ranging from two-channel stereophonic to six-channel 5.1-surround setups with or without the application-specific constraint of low algorithmic coding latency. Conventional perceptual audio codecs share six common algorithmic components, all of which are examined extensively in this thesis. The first is a signal-adaptive filterbank, constructed using instances of the real-valued modified discrete cosine transform (MDCT), to obtain spectral representations of successive portions of the incoming discrete time signal. Within this MDCT spectral domain, various intra- and inter-channel optimizations, most of which are of ...

Helmrich, Christian R. — Friedrich-Alexander-Universität Erlangen-Nürnberg


Contributions to Improved Hard- and Soft-Decision Decoding in Speech and Audio Codecs

Source coding is an essential part in digital communications. In error-prone transmission conditions, even with the help of channel coding, which normally introduces delay, bit errors may still occur. Single bit errors can result in significant distortions. Therefore, a robust source decoder is desired for adverse transmission conditions. Compared to the traditional hard-decision (HD) decoding and error concealment, soft-decision (SD) decoding offers a higher robustness by exploiting the source residual redundancy and utilizing the bit-wise channel reliability information. Moreover, the quantization codebook index can be either mapped to a fixed number of bits using fixed-length (FL) codes, or a variable number of bits employing variable-length (VL) codes. The codebook entry can be either fixed over time or time-variant. However, using a fixed scalar quantization codebook leads to the same performance for correlated and uncorrelated processes. This thesis aims to improve ...

Han, Sai — Technische Universität Braunschweig


Spatio-Temporal Speech Enhancement in Adverse Acoustic Conditions

Never before has speech been captured as often by electronic devices equipped with one or multiple microphones, serving a variety of applications. It is the key aspect in digital telephony, hearing devices, and voice-driven human-to-machine interaction. When speech is recorded, the microphones also capture a variety of further, undesired sound components due to adverse acoustic conditions. Interfering speech, background noise and reverberation, i.e. the persistence of sound in a room after excitation caused by a multitude of reflections on the room enclosure, are detrimental to the quality and intelligibility of target speech as well as the performance of automatic speech recognition. Hence, speech enhancement aiming at estimating the early target-speech component, which contains the direct component and early reflections, is crucial to nearly all speech-related applications presently available. In this thesis, we compare, propose and evaluate existing and novel approaches ...

Dietzen, Thomas — KU Leuven


Reverse Audio Engineering for Active Listening and Other Applications

This work deals with the problem of reverse audio engineering for active listening. The format under consideration corresponds to the audio CD. The musical content is viewed as the result of a concatenation of the composition, the recording, the mixing, and the mastering. The inversion of the two latter stages constitutes the core of the problem at hand. The audio signal is treated as a post-nonlinear mixture. Thus, the mixture is “decompressed” before being “decomposed” into audio tracks. The problem is tackled in an informed context: The inversion is accompanied by information which is specific to the content production. In this manner, the quality of the inversion is significantly improved. The information is reduced in size by the use of quantification and coding methods, and some facts on psychoacoustics. The proposed methods are applicable in real time and have a ...

Gorlow, Stanislaw — Université Bordeaux 1


Speech derereverberation in noisy environments using time-frequency domain signal models

Reverberation is the sum of reflected sound waves and is present in any conventional room. Speech communication devices such as mobile phones in hands-free mode, tablets, smart TVs, teleconferencing systems, hearing aids, voice-controlled systems, etc. use one or more microphones to pick up the desired speech signals. When the microphones are not in the proximity of the desired source, strong reverberation and noise can degrade the signal quality at the microphones and can impair the intelligibility and the performance of automatic speech recognizers. Therefore, it is a highly demanded task to process the microphone signals such that reverberation and noise are reduced. The process of reducing or removing reverberation from recorded signals is called dereverberation. As dereverberation is usually a completely blind problem, where the only available information are the microphone signals, and as the acoustic scenario can be non-stationary, ...

Braun, Sebastian — Friedrich-Alexander Universität Erlangen-Nürnberg


Sparsity in Linear Predictive Coding of Speech

This thesis deals with developing improved modeling methods for speech and audio processing based on the recent developments in sparse signal representation. In particular, this work is motivated by the need to address some of the limitations of the well-known linear prediction (LP) based all-pole models currently applied in many modern speech and audio processing systems. In the first part of this thesis, we introduce \emph{Sparse Linear Prediction}, a set of speech processing tools created by introducing sparsity constraints into the LP framework. This approach defines predictors that look for a sparse residual rather than a minimum variance one, with direct applications to coding but also consistent with the speech production model of voiced speech, where the excitation of the all-pole filter is model as an impulse train. Introducing sparsity in the LP framework, will also bring to develop the ...

Giacobello, Daniele — Aalborg University


Sparse Multi-Channel Linear Prediction for Blind Speech Dereverberation

In many speech communication applications, such as hands-free telephony and hearing aids, the microphones are located at a distance from the speaker. Therefore, in addition to the desired speech signal, the microphone signals typically contain undesired reverberation and noise, caused by acoustic reflections and undesired sound sources. Since these disturbances tend to degrade the quality of speech communication, decrease speech intelligibility and negatively affect speech recognition, efficient dereverberation and denoising methods are required. This thesis deals with blind dereverberation methods, not requiring any knowledge about the room impulse responses between the speaker and the microphones. More specifically, we propose a general framework for blind speech dereverberation based on multi-channel linear prediction (MCLP) and exploiting sparsity of the speech signal in the time-frequency domain.

Jukić, Ante — University of Oldenburg


Distributed Signal Processing for Binaural Hearing Aids

Over the last centuries, hearing aids have evolved from crude and bulky horn-shaped instruments to lightweight and almost invisible digital signal processing devices. While most of the research has focused on the design of monaural apparatus, the use of a wireless link has been recently advocated to enable data transfer between hearing aids such as to obtain a binaural system. The availability of a wireless link offers brand new perspectives but also poses great technical challenges. It requires the design of novel signal processing schemes that address the restricted communication bitrates, processing delays and power consumption limitations imposed by wireless hearing aids. The goal of this dissertation is to address these issues at both a theoretical and a practical level. We start by taking a distributed source coding view on the problem of binaural noise reduction. The proposed analysis allows ...

Roy, Olivier — EPFL


Non-Intrusive Speech Intelligibility Prediction

The ability to communicate through speech is important for social interaction. We rely on the ability to communicate with each other even in noisy conditions. Ideally, the speech is easy to understand but this is not always the case, if the speech is degraded, e.g., due to background noise, distortion or hearing impairment. One of the most important factors to consider in relation to such degradations is speech intelligibility, which is a measure of how easy or difficult it is to understand the speech. In this thesis, the focus is on the topic of speech intelligibility prediction. The thesis consists of an introduction to the field of speech intelligibility prediction and a collection of scientific papers. The introduction provides a background to the challenges with speech communication in noisy conditions, followed by an introduction to how speech is produced and ...

Sørensen, Charlotte — Aalborg University


Non-linear Spatial Filtering for Multi-channel Speech Enhancement

A large part of human speech communication takes place in noisy environments and is supported by technical devices. For example, a hearing-impaired person might use a hearing aid to take part in a conversation in a busy restaurant. These devices, but also telecommunication in noisy environments or voiced-controlled assistants, make use of speech enhancement and separation algorithms that improve the quality and intelligibility of speech by separating speakers and suppressing background noise as well as other unwanted effects such as reverberation. If the devices are equipped with more than one microphone, which is very common nowadays, then multi-channel speech enhancement approaches can leverage spatial information in addition to single-channel tempo-spectral information to perform the task. Traditionally, linear spatial filters, so-called beamformers, have been employed to suppress the signal components from other than the target direction and thereby enhance the desired ...

Tesch, Kristina — Universität Hamburg


Optimization of Coding of AR Sources for Transmission Across Channels with Loss

Source coding concerns the representation of information in a source signal using as few bits as possible. In the case of lossy source coding, it is the encoding of a source signal using the fewest possible bits at a given distortion or, at the lowest possible distortion given a specified bit rate. Channel coding is usually applied in combination with source coding to ensure reliable transmission of the (source coded) information at the maximal rate across a channel given the properties of this channel. In this thesis, we consider the coding of auto-regressive (AR) sources which are sources that can be modeled as auto-regressive processes. The coding of AR sources lends itself to linear predictive coding. We address the problem of joint source/channel coding in the setting of linear predictive coding of AR sources. We consider channels in which individual ...

Arildsen, Thomas — Aalborg University


Study and optimization of multi-antenna systems associated with multicarrier modulations

Since several years, multi-antenna systems are foreseen as a potential solution for increasing the throughput of future wireless communication systems. The aim of this thesis is to study and to improve the transmitter and receiver's techniques of these MIMO (Multiple Input Multiple Output) systems in the context of a multi-carrier transmission. On the one hand, the OFDM (Orthogonal Frequency Division Multiplex) modulation, which transform a frequency selective channel into multiple non frequency selective channels, is particularly well adapted to the conception of MIMO receivers with low complexity. On the other hand, two techniques allowing to improve the exploitation of frequential and/or temporal diversities are associated with OFDM, namely linear precoding (LP-OFDM) and CDMA in a MC-CDMA (Multicarrier Code division Multiplex Access) scheme. We have associated LP-OFDM and MC-CDMA with two MIMO techniques which require no channel state information at the ...

LE NIR, Vincent — INSA de Rennes


Dynamic Scheme Selection in Image Coding

This thesis deals with the coding of images with multiple coding schemes and their dynamic selection. In our society of information highways, electronic communication is taking everyday a bigger place in our lives. The number of transmitted images is also increasing everyday. Therefore, research on image compression is still an active area. However, the current trend is to add several functionalities to the compression scheme such as progressiveness for more comfortable browsing of web-sites or databases. Classical image coding schemes have a rigid structure. They usually process an image as a whole and treat the pixels as a simple signal with no particular characteristics. Second generation schemes use the concept of objects in an image, and introduce a model of the human visual system in the design of the coding scheme. Dynamic coding schemes, as their name tells us, make ...

Fleury, Pascal — Swiss Federal Institute of Technology


Speech Enhancement Algorithms for Audiological Applications

The improvement of speech intelligibility is a traditional problem which still remains open and unsolved. The recent boom of applications such as hands-free communi- cations or automatic speech recognition systems and the ever-increasing demands of the hearing-impaired community have given a definitive impulse to the research in this area. This PhD thesis is focused on speech enhancement for audiological applications. Most of the research conducted in this thesis has been focused on the improvement of speech intelligibility in hearing aids, considering the variety of restrictions and limitations imposed by this type of devices. The combination of source separation techniques and spatial filtering with machine learning and evolutionary computation has originated novel and interesting algorithms which are included in this thesis. The thesis is divided in two main parts. The first one contains a preliminary study of the problem and a ...

Ayllón, David — Universidad de Alcalá


Adaptive filtering algorithms for acoustic echo cancellation and acoustic feedback control in speech communication applications

Multimedia consumer electronics are nowadays everywhere from teleconferencing, hands-free communications, in-car communications to smart TV applications and more. We are living in a world of telecommunication where ideal scenarios for implementing these applications are hard to find. Instead, practical implementations typically bring many problems associated to each real-life scenario. This thesis mainly focuses on two of these problems, namely, acoustic echo and acoustic feedback. On the one hand, acoustic echo cancellation (AEC) is widely used in mobile and hands-free telephony where the existence of echoes degrades the intelligibility and listening comfort. On the other hand, acoustic feedback limits the maximum amplification that can be applied in, e.g., in-car communications or in conferencing systems, before howling due to instability, appears. Even though AEC and acoustic feedback cancellation (AFC) are functional in many applications, there are still open issues. This means that ...

Gil-Cacho, Jose Manuel — KU Leuven

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.