Mixed structural models for 3D audio in virtual environments

In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


An iterative, residual-based approach to unsupervised musical source separation in single-channel mixtures

This thesis concentrates on a major problem within audio signal processing, the separation of source signals from musical mixtures when only a single mixture channel is available. Source separation is the process by which signals that correspond to distinct sources are identified in a signal mixture and extracted from it. Producing multiple entities from a single one is an extremely underdetermined task, so additional prior information can assist in setting appropriate constraints on the solution set. The approach proposed uses prior information such that: (1) it can potentially be applied successfully to a large variety of musical mixtures, and (2) it requires minimal user intervention and no prior learning/training procedures (i.e., it is an unsupervised process). This system can be useful for applications such as remixing, creative effects, restoration and for archiving musical material for internet delivery, amongst others. Here, ...

Siamantas, Georgios — University of York


Reverse Audio Engineering for Active Listening and Other Applications

This work deals with the problem of reverse audio engineering for active listening. The format under consideration corresponds to the audio CD. The musical content is viewed as the result of a concatenation of the composition, the recording, the mixing, and the mastering. The inversion of the two latter stages constitutes the core of the problem at hand. The audio signal is treated as a post-nonlinear mixture. Thus, the mixture is “decompressed” before being “decomposed” into audio tracks. The problem is tackled in an informed context: The inversion is accompanied by information which is specific to the content production. In this manner, the quality of the inversion is significantly improved. The information is reduced in size by the use of quantification and coding methods, and some facts on psychoacoustics. The proposed methods are applicable in real time and have a ...

Gorlow, Stanislaw — Université Bordeaux 1


Dialogue Enhancement and Personalization - Contributions to Quality Assessment and Control

The production and delivery of audio for television involve many creative and technical challenges. One of them is concerned with the level balance between the foreground speech (also referred to as dialogue) and the background elements, e.g., music, sound effects, and ambient sounds. Background elements are fundamental for the narrative and for creating an engaging atmosphere, but they can mask the dialogue, which the audience wishes to follow in a comfortable way. Very different individual factors of the people in the audience clash with the creative freedom of the content creators. As a result, service providers receive regular complaints about difficulties in understanding the dialogue because of too loud background sounds. While this has been a known issue for at least three decades, works analyzing the problem and up-to-date statics were scarce before the contributions in this work. Enabling the ...

Torcoli, Matteo — Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)


A Computational Framework for Sound Segregation in Music Signals

Music is built from sound, ultimately resulting from an elaborate interaction between the sound-generating properties of physical objects (i.e. music instruments) and the sound perception abilities of the human auditory system. Humans, even without any kind of formal music training, are typically able to ex- tract, almost unconsciously, a great amount of relevant information from a musical signal. Features such as the beat of a musical piece, the main melody of a complex musical ar- rangement, the sound sources and events occurring in a complex musical mixture, the song structure (e.g. verse, chorus, bridge) and the musical genre of a piece, are just some examples of the level of knowledge that a naive listener is commonly able to extract just from listening to a musical piece. In order to do so, the human auditory system uses a variety of cues ...

Martins, Luis Gustavo — Universidade do Porto


Information-Theoretic Measures of Predictability for Music Content Analysis

This thesis is concerned with determining similarity in musical audio, for the purpose of applications in music content analysis. With the aim of determining similarity, we consider the problem of representing temporal structure in music. To represent temporal structure, we propose to compute information-theoretic measures of predictability in sequences. We apply our measures to track-wise representations obtained from musical audio; thereafter we consider the obtained measures predictors of musical similarity. We demonstrate that our approach benefits music content analysis tasks based on musical similarity. For the intermediate-specificity task of cover song identification, we compare contrasting discrete-valued and continuous-valued measures of pairwise predictability between sequences. In the discrete case, we devise a method for computing the normalised compression distance (NCD) which accounts for correlation between sequences. We observe that our measure improves average performance over NCD, for sequential compression algorithms. In ...

Foster, Peter — Queen Mary University of London


Application of Sound Source Separation Methods to Advanced Spatial Audio Systems

This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats, such as WFS. This is due to the fact that WFS needs the original source signals to be available, in order to accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mixing is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to ...

Cobos, Maximo — Universidad Politecnica de Valencia


Blind Signal Separation

The separation of independent sources from mixed observed data is a fundamental and challenging signal processing problem. In many practical situations, one or more desired signals need to be recovered from the mixtures only. A typical example is speech recordings made in an acoustic environment in the presence of background noise and/or competing speakers. Other examples include EEG signals, passive sonar applications and cross-talk in data communications. The audio signal separation problem is sometimes referred to as The Cocktail Party Problem. When several people in the same room are conversing at the same time, it is remarkable that a person is able to choose to concentrate on one of the speakers and listen to his or her speech flow unimpeded. This ability, usually referred to as the binaural cocktail party effect, results in part from binaural (two-eared) hearing. In contrast, ...

Chan, Dominic C. B. — University of Cambridge


Sound Source Separation in Monaural Music Signals

Sound source separation refers to the task of estimating the signals produced by individual sound sources from a complex acoustic mixture. It has several applications, since monophonic signals can be processed more efficiently and flexibly than polyphonic mixtures. This thesis deals with the separation of monaural, or, one-channel music recordings. We concentrate on separation methods, where the sources to be separated are not known beforehand. Instead, the separation is enabled by utilizing the common properties of real-world sound sources, which are their continuity, sparseness, and repetition in time and frequency, and their harmonic spectral structures. One of the separation approaches taken here use unsupervised learning and the other uses model-based inference based on sinusoidal modeling. Most of the existing unsupervised separation algorithms are based on a linear instantaneous signal model, where each frame of the input mixture signal is modeled ...

Virtanen, Tuomas — Tampere University of Technology


Automatic Detection, Classification and Restoration of Defects in Historical Images

Historical photos are significant attestations of the inheritance of the past. Since Photography is an art that is more than 150 years old, more and more diffuse are the photographic archives all over the world. Nevertheless, time and bad preservation corrupts physical supports, and many important historical documents risk to be ruined and their content lost. Therefore solutions must be implemented to preserve their state and to recover damaged information. This PhD thesis proposes a general methodology, and several applicative solutions, to handle these problems, by means of digitization and digital restoration. The purpose is to create a useful tool to support non-expert users in the restoration process of damaged historical images. The content of this thesis is outlined as follows: Chapter 1 gives an overview on the problems related to management and preservation of cultural repositories, and discusses about ...

Mazzola, Giuseppe — Università degli studi di Palermo - Dipartimento di Ingegneria Informatica


Audio-visual processing and content management techniques, for the study of (human) bioacoustics phenomena

The present doctoral thesis aims towards the development of new long-term, multi-channel, audio-visual processing techniques for the analysis of bioacoustics phenomena. The effort is focused on the study of the physiology of the gastrointestinal system, aiming at the support of medical research for the discovery of gastrointestinal motility patterns and the diagnosis of functional disorders. The term "processing" in this case is quite broad, incorporating the procedures of signal processing, content description, manipulation and analysis, that are applied to all the recorded bioacoustics signals, the auxiliary audio-visual surveillance information (for the monitoring of experiments and the subjects' status), and the extracted audio-video sequences describing the abdominal sound-field alterations. The thesis outline is as follows. The main objective of the thesis, which is the technological support of medical research, is presented in the first chapter. A quick problem definition is initially ...

Dimoulas, Charalampos — Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece


Algorithmic Analysis of Complex Audio Scenes

In this thesis, we examine the problem of algorithmic analysis of complex audio scenes with a special emphasis on natural audio scenes. One of the driving goals behind this work is to develop tools for monitoring the presence of animals in areas of interest based on their vocalisations. This task, which often occurs in the evaluation of nature conservation measures, leads to a number of subproblems in audio scene analysis. In order to develop and evaluate pattern recognition algorithms for animal sounds, a representative collection of such sounds is necessary. Building such a collection is beyond the scope of a single researcher and we therefore use data from the Animal Sound Archive of the Humboldt University of Berlin. Although a large portion of well annotated recordings from this archive has been available in digital form, little infrastructure for searching and ...

Bardeli, Rolf — University of Bonn


Adaptation of statistical models for single channel source separation. Application to voice / music separation in songs

Single channel source separation is a quite recent problem of constantly growing interest in the scientific world. However, this problem is still very far to be solved, and even more, it cannot be solved in all its generality. Indeed, since this problem is highly underdetermined, the main difficulty is that a very strong knowledge about the sources is required to be able to separate them. For a grand class of existing separation methods, this knowledge is expressed by statistical source models, notably Gaussian Mixture Models (GMM), which are learned from some training examples. The subject of this work is to study the separation methods based on statistical models in general, and then to apply them to the particular problem of separating singing voice from background music in mono recordings of songs. It can be very useful to propose some satisfactory ...

OZEROV, Alexey — University of Rennes 1


Distributed Caching Methods in Small Cell Networks

This thesis explores one of the key enablers of 5G wireless networks leveraging small cell network deployments, namely proactive caching. Endowed with predictive capabilities and harnessing recent developments in storage, context-awareness and social networks, peak traffic demands can be substantially reduced by proactively serving predictable user demands, via caching at base stations and users' devices. In order to show the effectiveness of proactive caching techniques, we tackle the problem from two different perspectives, namely theoretical and practical ones. In the first part of this thesis, we use tools from stochastic geometry to model and analyse the theoretical gains of caching at base stations. In particular, we focus on 1) single-tier networks where small base stations with limited storage are deployed, 2) multi-tier networks with limited backhaul, and) multi-tier clustered networks with two different topologies, namely coverage-aided and capacity-aided deployments. Therein, ...

Bastug, Ejder — CentraleSupélec, Université Paris-Saclay


Understanding and Assessing Quality of Experience in Immersive Communications

eXtended Reality (XR) technology, also called Mixed Reality (MR), is in constant development and improvement in terms of hardware and software to offer relevant experiences to users. One of the advances in XR has been the introduction of real visual information in the virtual environment, offering a more natural interaction with the scene and a greater acceptance of technology. Another advance has been achieved with the representation of the scene through a video that covers the entire environment, called 360-degree or omnidirectional video. These videos are acquired by cameras with omnidirectional lenses that cover the 360-degrees of the scene and are generally viewed by users through a head-tracked Head Mounted Display (HMD). Users only visualize a subset of the 360-degree scene, called viewport, which changes with the variations of the viewing direction of the users, determined by the movements of ...

Orduna, Marta — Universidad Politécnica de Madrid

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.