Mixed structural models for 3D audio in virtual environments

In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


Application of Sound Source Separation Methods to Advanced Spatial Audio Systems

This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats, such as WFS. This is due to the fact that WFS needs the original source signals to be available, in order to accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mixing is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to ...

Cobos, Maximo — Universidad Politecnica de Valencia


A Computational Framework for Sound Segregation in Music Signals

Music is built from sound, ultimately resulting from an elaborate interaction between the sound-generating properties of physical objects (i.e. music instruments) and the sound perception abilities of the human auditory system. Humans, even without any kind of formal music training, are typically able to ex- tract, almost unconsciously, a great amount of relevant information from a musical signal. Features such as the beat of a musical piece, the main melody of a complex musical ar- rangement, the sound sources and events occurring in a complex musical mixture, the song structure (e.g. verse, chorus, bridge) and the musical genre of a piece, are just some examples of the level of knowledge that a naive listener is commonly able to extract just from listening to a musical piece. In order to do so, the human auditory system uses a variety of cues ...

Martins, Luis Gustavo — Universidade do Porto


An iterative, residual-based approach to unsupervised musical source separation in single-channel mixtures

This thesis concentrates on a major problem within audio signal processing, the separation of source signals from musical mixtures when only a single mixture channel is available. Source separation is the process by which signals that correspond to distinct sources are identified in a signal mixture and extracted from it. Producing multiple entities from a single one is an extremely underdetermined task, so additional prior information can assist in setting appropriate constraints on the solution set. The approach proposed uses prior information such that: (1) it can potentially be applied successfully to a large variety of musical mixtures, and (2) it requires minimal user intervention and no prior learning/training procedures (i.e., it is an unsupervised process). This system can be useful for applications such as remixing, creative effects, restoration and for archiving musical material for internet delivery, amongst others. Here, ...

Siamantas, Georgios — University of York


Solving inverse problems in room acoustics using physical models, sparse regularization and numerical optimization

Reverberation consists of a complex acoustic phenomenon that occurs inside rooms. Many audio signal processing methods, addressing source localization, signal enhancement and other tasks, often assume absence of reverberation. Consequently, reverberant environments are considered challenging as state-ofthe-art methods can perform poorly. The acoustics of a room can be described using a variety of mathematical models, among which, physical models are the most complete and accurate. The use of physical models in audio signal processing methods is often non-trivial since it can lead to ill-posed inverse problems. These inverse problems require proper regularization to achieve meaningful results and involve the solution of computationally intensive large-scale optimization problems. Recently, however, sparse regularization has been applied successfully to inverse problems arising in different scientific areas. The increased computational power of modern computers and the development of new efficient optimization algorithms makes it possible ...

Antonello, Niccolò — KU Leuven


Dialogue Enhancement and Personalization - Contributions to Quality Assessment and Control

The production and delivery of audio for television involve many creative and technical challenges. One of them is concerned with the level balance between the foreground speech (also referred to as dialogue) and the background elements, e.g., music, sound effects, and ambient sounds. Background elements are fundamental for the narrative and for creating an engaging atmosphere, but they can mask the dialogue, which the audience wishes to follow in a comfortable way. Very different individual factors of the people in the audience clash with the creative freedom of the content creators. As a result, service providers receive regular complaints about difficulties in understanding the dialogue because of too loud background sounds. While this has been a known issue for at least three decades, works analyzing the problem and up-to-date statics were scarce before the contributions in this work. Enabling the ...

Torcoli, Matteo — Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)


Interpretable Machine Learning for Machine Listening

Recent years have witnessed a significant interest in interpretable machine learning (IML) research that develops techniques to analyse machine learning (ML) models. Understanding ML models is essential to gain trust in their predictions and to improve datasets, model architectures and training techniques. The majority of effort in IML research has been in analysing models that classify images or structured data and comparatively less work exists that analyses models for other domains. This research focuses on developing novel IML methods and on extending existing methods to understand machine listening models that analyse audio. In particular, this thesis reports the results of three studies that apply three different IML methods to analyse five singing voice detection (SVD) models that predict singing voice activity in musical audio excerpts. The first study introduces SoundLIME (SLIME), a method to generate temporal, spectral or time-frequency explanations ...

Mishra, Saumitra — Queen Mary University of London


Blind Signal Separation

The separation of independent sources from mixed observed data is a fundamental and challenging signal processing problem. In many practical situations, one or more desired signals need to be recovered from the mixtures only. A typical example is speech recordings made in an acoustic environment in the presence of background noise and/or competing speakers. Other examples include EEG signals, passive sonar applications and cross-talk in data communications. The audio signal separation problem is sometimes referred to as The Cocktail Party Problem. When several people in the same room are conversing at the same time, it is remarkable that a person is able to choose to concentrate on one of the speakers and listen to his or her speech flow unimpeded. This ability, usually referred to as the binaural cocktail party effect, results in part from binaural (two-eared) hearing. In contrast, ...

Chan, Dominic C. B. — University of Cambridge


Pitch-informed solo and accompaniment separation

This thesis addresses the development of a system for pitch-informed solo and accompaniment separation capable of separating main instruments from music accompaniment regardless of the musical genre of the track, or type of music accompaniment. For the solo instrument, only pitched monophonic instruments were considered in a single-channel scenario where no panning or spatial location information is available. In the proposed method, pitch information is used as an initial stage of a sinusoidal modeling approach that attempts to estimate the spectral information of the solo instrument from a given audio mixture. Instead of estimating the solo instrument on a frame by frame basis, the proposed method gathers information of tone objects to perform separation. Tone-based processing allowed the inclusion of novel processing stages for attack re nement, transient interference reduction, common amplitude modulation (CAM) of tone objects, and for better ...

Cano Cerón, Estefanía — Ilmenau University of Technology


Cost functions for acoustic filters estimations in reverberant mixtures

This work is focused on the processing of multichannel and multisource audio signals. From an audio mixture of several audio sources recorded in a reverberant room, we wish to es- timate the acoustic responses (a.k.a. mixing filters) between the sources and the microphones. To solve this inverse problem one need to take into account additional hypotheses on the nature of the acoustic responses. Our approach consists in first identifying mathematically the neces- sary hypotheses on the acoustic responses for their estimation and then building cost functions and algorithms to effectively estimate them. First, we considered the case where the source signals are known. We developed a method to estimate the acoustic responses based on a convex regularization which exploits both the temporal sparsity of the filters and the exponentially decaying envelope. Real-world experi- ments confirmed the effectiveness of this method ...

Benichoux, Alexis — Université Rennes I


Algorithmic Analysis of Complex Audio Scenes

In this thesis, we examine the problem of algorithmic analysis of complex audio scenes with a special emphasis on natural audio scenes. One of the driving goals behind this work is to develop tools for monitoring the presence of animals in areas of interest based on their vocalisations. This task, which often occurs in the evaluation of nature conservation measures, leads to a number of subproblems in audio scene analysis. In order to develop and evaluate pattern recognition algorithms for animal sounds, a representative collection of such sounds is necessary. Building such a collection is beyond the scope of a single researcher and we therefore use data from the Animal Sound Archive of the Humboldt University of Berlin. Although a large portion of well annotated recordings from this archive has been available in digital form, little infrastructure for searching and ...

Bardeli, Rolf — University of Bonn


Sparse Pulsed Auditory Representations For Speech and Audio Coding

Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features most relevant to the human listener for coding applications. This thesis deals with the approach of `coding in the perceptual domain' and is based on an invertible auditory model that provides a pulsed auditory representation of the input speech or audio signal. It is natural for pulsed signal representations to encode only the non-zero samples by specifying their positions as side information. For the considered auditory representation, the number of pulses and, therefore, the amount of side information is too high for an efficient encoding at a relatively low bit rate. The focus of this work is to `sparsify' the pulsed signal representation, i.e., to remove its perceptual irrelevance and its redundancy, to obtain a compact signal representation, which facilitates ...

Christian Feldbauer — Graz University of Technology


Digital Forensic Techniques for Splicing Detection in Multimedia Contents

Visual and audio contents always played a key role in communications, because of their immediacy and presumed objectivity. This has become even more true in the digital era, and today it is common to have multimedia contents stand as proof of events. Digital contents, however, are also very easy to manipulate, thus calling for analysis methods devoted to uncover their processing history. Multimedia forensics is the science trying to answer questions about the past of a given image, audio or video file, questions like “which was the recording device?", or “is the content authentic?". In particular, authenticity assessment is a crucial task in many contexts, and it usually consists in determining whether the investigated object has been artificially created by splicing together different contents. In this thesis we address the problem of splicing detection in the three main media: image, ...

Fontani, Marco — Dept. of Information Engineering and Mathematics, University of Siena


Audio-visual processing and content management techniques, for the study of (human) bioacoustics phenomena

The present doctoral thesis aims towards the development of new long-term, multi-channel, audio-visual processing techniques for the analysis of bioacoustics phenomena. The effort is focused on the study of the physiology of the gastrointestinal system, aiming at the support of medical research for the discovery of gastrointestinal motility patterns and the diagnosis of functional disorders. The term "processing" in this case is quite broad, incorporating the procedures of signal processing, content description, manipulation and analysis, that are applied to all the recorded bioacoustics signals, the auxiliary audio-visual surveillance information (for the monitoring of experiments and the subjects' status), and the extracted audio-video sequences describing the abdominal sound-field alterations. The thesis outline is as follows. The main objective of the thesis, which is the technological support of medical research, is presented in the first chapter. A quick problem definition is initially ...

Dimoulas, Charalampos — Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece


The use of High-Order Sparse Linear Prediction for the Restoration of Archived Audio

Since the invention of Gramophone by Thomas Edison in 1877, vast amounts of cultural, entertainment, educational and historical audio recordings have been recorded and stored throughout the world. Through natural aging and improper storage, the recorded signal degrades and loses its information in terms of quality and intelligibility. Degradation of audio signals is considered as any unwanted modification to the audio signal after it has been recorded. There are different degradations affecting recorded signals on analog storage media. The degradations that are often encountered are clicks, hiss and ‘Wow and Flutter’. Several researches have been conducted in restoring degraded audio recordings. Most of the methods rely on some prior information of the underlying data and the degradation process. The success of these methods heavily depends on the prior information available. When such information is not available, a model of the ...

Dufera, Bisrat Derebssa — School of Electrical and Computer Engineering, Addis Ababa Institute of Technology, Addis Ababa University

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.