Mixed structural models for 3D audio in virtual environments (2014)
Audio Signal Processing for Binaural Reproduction with Improved Spatial Perception
Binaural technology aims to reproduce three-dimensional auditory scenes with a high level of realism by providing the auditory display with spatial hearing information. This technology has various applications in virtual acoustics, architectural acoustics, telecommunication and auditory science. One key element in binaural technology is the actual binaural signals, produced by filtering a sound-field with free-field head related transfer functions (HRTFs). With the increased popularity of spherical microphone arrays for sound-field recording, methods have been developed for rendering binaural signals from these recordings. The use of spherical arrays naturally leads to processing methods that are formulated in the spherical harmonics (SH) domain. For accurate SH representation, high-order functions, of both the sound-field and the HRTF, are required. However, a limited number of microphones, on one hand, and challenges in acquiring high resolution individual HRTFs, on the other hand, impose limitations on ...
Ben-Hur, Zamir — Ben-Gurion University of the Negev
Synthetic reproduction of head-related transfer functions by using microphone arrays
Spatial hearing for human listeners is based on the interaural as well as on the monaural analysis of the signals arriving at both ears, enabling the listeners to assign certain spatial components to these signals. This spatial aspect gets lost when the signals are reproduced via headphones without considering the acoustical influence of the head and torso, i.e. head-related transfer function (HRTFs). A common procedure to take into account spatial aspects in a binaural reproduction is to use so-called artificial heads. Artificial heads are replicas of a human head and torso with average anthropometric geometries and built-in microphones in the ears. Although, the signals recorded with artificial heads contain relevant spatial aspects, binaural recordings using artificial heads often suffer from front-back confusions and the perception of the sound source being inside the head (internalization). These shortcomings can be attributed to ...
Rasumow, Eugen — University of Oldenburg
The ability of humans to perceive sound spatially is based on binaural hearing, i.e. on signals arriving at the two ears which supply the listener with important spatial and spectral cues. The aim of binaural technology is to capture and reproduce the sound field in such a way that these cues are preserved. A well-known drawback of using artificial heads for this aim is that they exhibit different anthropometrical measures compared to individual listeners. When playing back the recorded signals over headphones, the non-individual design of artificial heads may lead to localization ambiguities such as front-back reversals and perception inside the head. Moreover, it is hardly possible to achieve dynamic signal playback, accounting for the listener's head movements. As an alternative, it has been proposed to use a Virtual Artificial Head (VAH), which is a microphone array where spectral weights ...
Mina Fallahi — University of Oldenburg, Germany
SPACE-TIME PARAMETRIC APPROACH TO EXTENDED AUDIO REALITY (SP-EAR)
The term extended reality refers to all possible interactions between real and virtual (computed generated) elements and environments. The extended reality field is rapidly growing, primarily through augmented and virtual reality applications. The former allows users to bring digital elements into the real world, while the latter lets us experience and interact with an entirely virtual environment. While currently extended reality implementations primarily focus on the visual domain, we cannot underestimate the impact of auditory perception in order to provide a fully immersive experience. As a matter of fact, effective handling of the acoustic content is able to enrich the engagement of users. We refer to Extended Audio Reality (EAR) as the subset of extended reality operations related to the audio domain. In this thesis, we propose a parametric approach to EAR conceived in order to provide an effective and ...
Pezzoli Mirco — Politecnico di Milano
Cognitive Models for Acoustic and Audiovisual Sound Source Localization
Sound source localization algorithms have a long research history in the field of digital signal processing. Many common applications like intelligent personal assistants, teleconferencing systems and methods for technical diagnosis in acoustics require an accurate localization of sound sources in the environment. However, dynamic environments entail a particular challenge for these systems. For instance, voice controlled smart home applications, where the speaker, as well as potential noise sources, are moving within the room, are a typical example of dynamic environments. Classical sound source localization systems only have limited capabilities to deal with dynamic acoustic scenarios. In this thesis, three novel approaches to sound source localization that extend existing classical methods will be presented. The first system is proposed in the context of audiovisual source localization. Determining the position of sound sources in adverse acoustic conditions can be improved by including ...
Schymura, Christopher — Ruhr University Bochum
The problem of segregating a sound source of interest from an acoustic background has been extensively studied due to applications in hearing prostheses, robust speech/speaker recognition and audio information retrieval. Computational auditory scene analysis (CASA) approaches the segregation problem by utilizing grouping cues involved in the perceptual organization of sound by human listeners. Binaural processing, where input signals resemble those that enter the two ears, is of particular interest in the CASA field. The dominant approach to binaural segregation has been to derive spatially selective filters in order to enhance the signal in a direction of interest. As such, the problems of sound localization and sound segregation are closely tied. While spatial filtering has been widely utilized, substantial performance degradation is incurred in reverberant environments and more fundamentally, segregation cannot be performed without sufficient spatial separation between sources. This dissertation ...
Woodruff, John — The Ohio State University
Computational models of expressive gesture in multimedia systems
This thesis focuses on the development of paradigms and techniques for the design and implementation of multimodal interactive systems, mainly for performing arts applications. The work addresses research issues in the fields of human-computer interaction, multimedia systems, and sound and music computing. The thesis is divided into two parts. In the first one, after a short review of the state-of-the-art, the focus moves on the definition of environments in which novel forms of technology-integrated artistic performances can take place. These are distributed active mixed reality environments in which information at different layers of abstraction is conveyed mainly non-verbally through expressive gestures. Expressive gesture is therefore defined and the internal structure of a virtual observer able to process it (and inhabiting the proposed environments) is described in a multimodal perspective. The definition of the structure of the environments, of the virtual ...
Volpe, Gualtiero — University of Genova
Visual ear detection and recognition in unconstrained environments
Automatic ear recognition systems have seen increased interest over recent years due to multiple desirable characteristics. Ear images used in such systems can typically be extracted from profile head shots or video footage. The acquisition procedure is contactless and non-intrusive, and it also does not depend on the cooperation of the subjects. In this regard, ear recognition technology shares similarities with other image-based biometric modalities. Another appealing property of ear biometrics is its distinctiveness. Recent studies even empirically validated existing conjectures that certain features of the ear are distinct for identical twins. This fact has significant implications for security-related applications and puts ear images on a par with epigenetic biometric modalities, such as the iris. Ear images can also supplement other biometric modalities in automatic recognition systems and provide identity cues when other information is unreliable or even unavailable. In ...
Emeršič, Žiga — University of Ljubljana, Faculty of Computer and Information Science
Performance Improvement of Multichannel Audio by Graphics Processing Units
Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, what is considered as Immersive Audio Schemes. In this phenomenon, several acoustic effects are involved: 3D spatial sound, room compensation, crosstalk cancelation, sound source localization, among others. However, high computing capacity is required to achieve any of these effects in a real large-scale system, what represents a considerable limitation for real-time applications. The increase of the computational capacity has been historically linked to the number of transistors in a chip. However, nowadays the improvements in the computational capacity are mainly given by increasing the number of processing units, i.e expanding parallelism in computing. This is the case of the Graphics Processing Units ...
Belloch, Jose A. — Universitat Politècnica de València
Design and Evaluation of Feedback Control Algorithms for Implantable Hearing Devices
Using a hearing device is one of the most successful approaches to partially restore the degraded functionality of an impaired auditory system. However, due to the complex structure of the human auditory system, hearing impairment can manifest itself in different ways and, therefore, its compensation can be achieved through different classes of hearing devices. Although the majority of hearing devices consists of conventional hearing aids (HAs), several other classes of hearing devices have been developed. For instance, bone-conduction devices (BCDs) and cochlear implants (CIs) have successfully been used for more than thirty years. More recently, other classes of implantable devices have been developed such as middle ear implants (MEIs), implantable BCDs, and direct acoustic cochlear implants (DACIs). Most of these different classes of hearing devices rely on a sound processor running different algorithms able to compensate for the hearing impairment. ...
Bernardi, Giuliano — KU Leuven
Due to their decreased ability to understand speech hearing impaired may have difficulties to interact in social groups, especially when several people are talking simultaneously. Fortunately, in the last decades hearing aids have evolved from simple sound amplifiers to modern digital devices with complex functionalities including noise reduction algorithms, which are crucial to improve speech understanding in background noise for hearing-impaired persons. Since many hearing aid users are fitted with two hearing aids, so-called binaural hearing aids have been developed, which exchange data and signals through a wireless link such that the processing in both hearing aids can be synchronized. In addition to reducing noise and limiting speech distortion, another important objective of noise reduction algorithms in binaural hearing aids is the preservation of the listener’s impression of the acoustical scene, in order to exploit the binaural hearing advantage and ...
Marquardt, Daniel — University of Oldenburg, Germany
Parametric spatial audio processing utilising compact microphone arrays
This dissertation focuses on the development of novel parametric spatial audio techniques using compact microphone arrays. Compact arrays are of special interest since they can be adapted to fit in portable devices, opening the possibility of exploiting the potential of immersive spatial audio algorithms in our daily lives. The techniques developed in this thesis consider the use of signal processing algorithms adapted for human listeners, thus exploiting the capabilities and limitations of human spatial hearing. The findings of this research are in the following three areas of spatial audio processing: directional filtering, spatial audio reproduction, and direction of arrival estimation. In directional filtering, two novel algorithms have been developed based on the cross-pattern coherence (CroPaC). The method essentially exploits the directional response of two different types of beamformers by using their cross-spectrum to estimate a soft masker. The soft masker ...
Delikaris-Manias, Symeon — Aalto University
3D motion capture by computer vision and virtual rendering
Networked 3D virtual environments allow multiple users to interact with each other over the Internet. Users can share some sense of telepresence by remotely animating an avatar that represents them. However, avatar control may be tedious and still render user gestures poorly. This work aims at animating a user‟s avatar from real time 3D motion capture by monoscopic computer vision, thus allowing virtual telepresence to anyone using a personal computer with a webcam. The approach followed consists of registering a 3D articulated upper-body model to a video sequence. This involves searching iteratively for the best match between features extracted from the 3D model and from the image. A two-step registration process matches regions and then edges. The first contribution of this thesis is a method of allocating computing iterations under real-time constrain that achieves optimal robustness and accuracy. The major ...
Gomez Jauregui, David Antonio — Telecom SudParis
Cloning with gesture expressivity
Virtual environments allow human beings to be represented by virtual humans or avatars. Users can share a sense of virtual presence is the avatar looks like the real human it represents. This classically involves turning the avatar into a clone with the real human’s appearance and voice. However, the possibility of cloning the gesture expressivity of a real person has received little attention so far. Gesture expressivity combines the style and mood of a person. Expressivity parameters have been defined in earlier works for animating embodied conversational agents. In this work, we focus on expressivity in wrist motion. First, we propose algorithms to estimate three expressivity parameters from captured wrist 3D trajectories: repetition, spatial extent and temporal extent. Then, we conducted perceptual study through a user survey the relevance of expressivity for recognizing individual human. We have animated a virtual ...
Rajagopal, Manoj kumar — Telecom Sudparis
The separation of independent sources from mixed observed data is a fundamental and challenging signal processing problem. In many practical situations, one or more desired signals need to be recovered from the mixtures only. A typical example is speech recordings made in an acoustic environment in the presence of background noise and/or competing speakers. Other examples include EEG signals, passive sonar applications and cross-talk in data communications. The audio signal separation problem is sometimes referred to as The Cocktail Party Problem. When several people in the same room are conversing at the same time, it is remarkable that a person is able to choose to concentrate on one of the speakers and listen to his or her speech flow unimpeded. This ability, usually referred to as the binaural cocktail party effect, results in part from binaural (two-eared) hearing. In contrast, ...
Chan, Dominic C. B. — University of Cambridge
The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.
The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.