SPACE-TIME PARAMETRIC APPROACH TO EXTENDED AUDIO REALITY (SP-EAR)
The term extended reality refers to all possible interactions between real and virtual (computer-generated) elements and environments. The extended reality field is rapidly growing, primarily through augmented and virtual reality applications. The former allows users to bring digital elements into the real world, while the latter lets us experience and interact with an entirely virtual environment. While current extended reality implementations focus primarily on the visual domain, the impact of auditory perception cannot be underestimated when aiming for a fully immersive experience: effective handling of the acoustic content enriches user engagement. We refer to Extended Audio Reality (EAR) as the subset of extended reality operations related to the audio domain. In this thesis, we propose a parametric approach to EAR, conceived to provide an effective and intuitive framework for the implementation of EAR applications. The main challenges of EAR concern the processing of real sound fields and the rendering of virtual acoustic sources (VSs); hence, EAR requires the development of properly designed sound field representations. Two main paradigms for sound field representation are present in the literature: parametric and non-parametric. The former describes the acoustic field assuming a signal model governed by a few meaningful parameters, e.g., the source signal and location, while the latter relies on solutions of the wave equation, providing accurate results at the cost of higher complexity and lower model interpretability. In the context of EAR, parametric models therefore represent an appealing approach: they provide a compressed and intuitive description of the sound field, which promotes the integration of VSs through the parameters of the model and their manipulation. 
Here, we introduce a novel parametric model for sound field representation based on a few parameters. This model allows both the navigation and the manipulation of a recorded sound scene. The main feature of the proposed solution is the modeling of the acoustic source directivity, integrated among the parameters of the representation. The directivity is a function describing the spatial properties of the source sound radiation. Sound sources typically present a directional acoustic emission imposed by their physical characteristics, and the source directivity thus influences our perception of the acoustic scene. The integration of the directivity is therefore a fundamental aspect of providing a more natural and immersive EAR, enhancing the user experience. In order to analyze the sound field, we adopt spatially distributed acoustic sensors. This configuration allows us to evaluate the acoustic field from different observation points and to estimate the parameters required by the proposed representation. Subsequently, we exploit the estimated parameters in a sound field reconstruction technique that enables six-degrees-of-freedom interaction (virtual navigation) with the sound field. Conveniently, the parameters adopted for describing the acoustic sources can also characterize a VS; therefore, we can seamlessly implement EAR within the same parametric representation. Here, the inclusion of the source directivity in the model is appealing since it allows the accurate rendering of VSs, including their directional characteristics. Hence, we can push the real-virtual interaction further by implementing VS replicas of actual acoustic sources. A VS replica mimics the spatial sound radiation of the source through the VS directivity parameters, which can, for instance, be estimated from measurements on the real source. 
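To make the role of the directivity parameter concrete, the following is a minimal sketch of parametric source rendering: a source signal is delayed and attenuated according to the source-listener distance, and scaled by a direction-dependent directivity gain. The function name `render_source` and the directivity callable are hypothetical illustrations, not the representation developed in this thesis.

```python
import numpy as np

def render_source(signal, src_pos, listener_pos, directivity, fs=48000, c=343.0):
    """Render a mono source at a listener position with a simple parametric
    model: propagation delay, 1/r attenuation, and a directivity gain
    evaluated at the source-to-listener direction (illustrative sketch)."""
    r_vec = np.asarray(listener_pos, dtype=float) - np.asarray(src_pos, dtype=float)
    r = np.linalg.norm(r_vec)
    delay = int(round(fs * r / c))                 # propagation delay in samples
    gain = directivity(r_vec / r) / max(r, 1e-6)   # directivity times 1/r decay
    out = np.zeros(len(signal) + delay)
    out[delay:] = gain * np.asarray(signal, dtype=float)
    return out
```

For example, with a cardioid directivity facing the +x axis, a listener behind the source receives (ideally) no direct sound, while a listener in front receives the full signal after the propagation delay.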
Conversely, we can rely on fully simulated acoustic sources, e.g., employing Finite Element Method (FEM) simulations, from which the VS parameters are derived. It follows that an accurate estimation, prediction, and analysis of the directivity of VSs is fundamental to obtaining an effective EAR. In this thesis, we studied the VS implementation through a case study, focusing on the VS implementation of violins. Since violins present a peculiar directional radiation characteristic, we need to carefully analyze and model their directivity in order to provide an accurate VS implementation. Regarding the analysis of the violin directivity, we can outline different solutions according to their invasiveness. In the first place, one can perform measurements directly on a played violin. During our collaboration with the Museo del Violino in Cremona (Italy), we had the unique opportunity to measure, for the first time, a relevant number of valuable historical violins made by renowned masters of the Cremonese school, such as Antonio Stradivari, and played by professional violinists. From the acquired data, we derived a compressed representation of the violin directivity pattern based on the spherical harmonics expansion. Besides the VS modeling, the adopted representation allowed us to study and characterize the directivity patterns of the instruments, giving insights into their directional behavior. Although the measurement of played instruments provides an analysis scenario closer to actual listening conditions, it might not be applicable to particularly fragile instruments. Less invasive techniques, such as nearfield acoustic holography (NAH), can be employed when conventional measurements cannot be carried out. It is known that the acoustic radiation of vibrating objects, such as violins, is determined by their dynamical behavior; hence, from the knowledge of the vibration velocity field, we can estimate the directivity of the source. 
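The idea behind the spherical-harmonics compression mentioned above can be sketched as a least-squares projection of directivity gains, measured over a grid of directions, onto a truncated real spherical-harmonics basis. The sketch below is limited to order 1 (four coefficients) and uses a Fibonacci lattice as a hypothetical measurement grid; it is an illustration of the principle, not the expansion order or grid used for the violin measurements.

```python
import numpy as np

def real_sh_order1(dirs):
    """Real spherical harmonics up to order 1 at unit vectors dirs (N x 3).
    Returns the N x 4 basis matrix [Y_0^0, Y_1^-1, Y_1^0, Y_1^1]."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    c0 = 0.5 * np.sqrt(1.0 / np.pi)
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    return np.stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x], axis=1)

def fit_directivity(dirs, gains):
    """Least-squares projection of measured directivity gains onto the
    order-1 basis: a compressed 4-coefficient representation."""
    basis = real_sh_order1(dirs)
    coeffs, *_ = np.linalg.lstsq(basis, gains, rcond=None)
    return coeffs

# Hypothetical measurement grid: Fibonacci lattice of 200 directions
n = 200
i = np.arange(n)
z = 1.0 - 2.0 * (i + 0.5) / n
phi = np.pi * (1.0 + 5.0 ** 0.5) * i
dirs = np.stack([np.sqrt(1 - z**2) * np.cos(phi),
                 np.sqrt(1 - z**2) * np.sin(phi), z], axis=1)

# A cardioid facing +x is exactly order-1, so four coefficients suffice
gains = 0.5 * (1.0 + dirs[:, 0])
coeffs = fit_directivity(dirs, gains)
recon = real_sh_order1(dirs) @ coeffs
```

Truncating the expansion at a given order is what yields the compression: the full set of measured gains is replaced by a handful of coefficients from which the pattern can be re-evaluated in any direction.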
NAH allows the contactless estimation of the velocity field of a vibrating source from acoustic pressure measurements taken in its proximity. Here, we introduce a novel NAH technique based on deep learning. In particular, we propose a convolutional neural network (CNN) with an autoencoder-inspired structure to estimate the velocity field of both rectangular and violin plates. Alternatively, simulations allow us to predict the directivity of a source by relying on the FEM simulation of its vibroacoustic behavior. This approach minimizes the invasiveness at the cost of reduced accuracy caused by the inherent approximations of the simulated model. An effective violin simulation requires a 3D model of the instrument geometry and the mechanical parameters of its material. Unfortunately, we can typically only acquire the outer surface of existing instruments. Therefore, we developed a practical technique for reconstructing the 3D model of violin plates, starting from outer surface scans and sparse thickness measurements taken at reference points. Furthermore, as regards the estimation of the material mechanical parameters, we proposed the estimation of the Young's modulus, a fundamental parameter for mechanical simulations, from the sound wave velocity of wood. The developed technique estimates the sound wave velocity from the responses of the wood to an impulsive excitation in a rake receiver fashion; from the estimated velocity, the Young's modulus is then indirectly derived. Lastly, we propose an EAR proof of concept through which we showcase the benefits of the proposed parametric approach. We present an EAR scenario in which two VSs, a VS replica of a prestigious violin and a simulated generic model of the instrument, are virtually co-located in a real sound scene together with actual sound sources. 
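The indirect derivation of the Young's modulus from the sound wave velocity can be illustrated with the thin-rod relation c = sqrt(E/ρ), i.e., E = ρc². The sketch below estimates the wave speed from the time of flight of an impulsive excitation between two measurement points; the numerical values are assumed, typical-order-of-magnitude figures for spruce, not measured data from this thesis.

```python
def wave_speed(distance_m, dt_s):
    """Sound wave speed along the grain from the time-of-flight of an
    impulsive excitation between two measurement points."""
    return distance_m / dt_s

def youngs_modulus(rho, c):
    """Thin-rod approximation: longitudinal speed c = sqrt(E / rho),
    hence E = rho * c**2 (in Pa for SI inputs)."""
    return rho * c ** 2

# Assumed illustrative values: two sensors 0.25 m apart register the
# impulse 50 microseconds apart; spruce density ~400 kg/m^3.
c = wave_speed(0.25, 50e-6)      # ~5000 m/s along the grain
E = youngs_modulus(400.0, c)     # ~1e10 Pa = 10 GPa
```

The thin-rod formula is the simplest of several possible relations; for an orthotropic material such as wood, a separate modulus (and wave speed) applies along and across the grain.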
The results offer a glimpse of the potential of EAR, showing that the proposed parametric approach is able to provide a seamless blend of real and virtual sound elements. Hence, we envision that the proposed solutions will pave the way for the development of parametric EAR frameworks for extended reality applications.
