Synthetic reproduction of head-related transfer functions by using microphone arrays
Spatial hearing in human listeners is based on both interaural and monaural analysis of the signals arriving at the two ears, which enables listeners to assign spatial attributes to these signals. This spatial information is lost when the signals are reproduced via headphones without considering the acoustical influence of the head and torso, i.e. the head-related transfer functions (HRTFs). A common way to account for spatial aspects in binaural reproduction is to use so-called artificial heads. Artificial heads are replicas of a human head and torso with average anthropometric geometries and built-in microphones in the ears. Although the signals recorded with artificial heads contain relevant spatial information, such binaural recordings often suffer from front-back confusions and from the perception of the sound source being inside the head (internalization). These shortcomings can be attributed to the missing individualization of the binaural recordings in a static scenario without visual cues. Alternatively, the desired frequency-dependent directivity pattern of individual HRTFs can be synthesized by using a microphone array with individually optimized filter coefficients (referred to as a virtual artificial head, VAH), which is the main goal of this thesis. The main advantages of a VAH are the possibility of adjusting the filter coefficients to the HRTFs of different listeners (individualization) and to different look directions (orientation), the possibility of employing head tracking in the reproduction stage, and better flexibility and manageability due to the smaller size and weight of the device.
This thesis deals with the individual aspects of human spatial hearing, with the measurement and perception of the associated HRTFs, and with the synthesis of the associated directivity patterns by using microphone arrays. It is thematically subdivided into three parts: (1) the optimization of the beamformer filter coefficients to synthesize the desired directivity patterns, (2) the imperceptible simplification of individual HRTFs prior to the optimization in order to synthesize only the perceptually relevant aspects of the HRTFs, and (3) the evaluation of the resulting VAH synthesis in comparison to binaural recordings using traditional artificial heads.
In the first part of this thesis, a mathematically motivated method for deriving appropriate microphone topologies for HRTF synthesis using a VAH is introduced. In a subsequent study, different regularization strategies for improving the robustness of the VAH synthesis against errors in the microphone characteristics are presented and numerically evaluated. It is shown to be advantageous for the regularization to take all directions into account and to adapt the bandwidth of the optimization and regularization to the frequency grouping of the human auditory system.
In the second part of this thesis, it is examined to what extent individual HRTFs may be smoothed without causing a detectable perceptual difference compared to a chosen reference condition (binaural reproduction with head-related impulse responses (HRIRs) truncated to approximately 12 ms). The main motivation for this investigation is to synthesize only the perceptually relevant aspects of individual HRTFs and hence to improve the accuracy and robustness of the synthesis. It turns out that individual HRIRs may be truncated to approximately 6 ms in the time domain, and that the individual phase response of the HRTFs may be substituted by a linear phase response for frequencies f ≥ 1 kHz. Based on these findings, the complex-valued HRTFs can be smoothed in relative bandwidths after substituting the original phase by a linear phase at higher frequencies. The bandwidth of this complex-valued smoothing can be increased up to 1/5 octave without yielding a detectable difference.
Furthermore, it is shown that spatial notches in the frequency-dependent directivity pattern do not need to be retained in detail if they are less than 29 dB below the maximum value. It is found that such an imperceptible smoothing of the HRTFs prior to optimizing the beamformer filter coefficients improves the VAH synthesis.
In the third part of this thesis, it is shown that the perceptual evaluation of the VAH synthesis depends on factors such as the chosen regularization, the microphone array used, and the associated sensor noise. In general, microphone arrays with lower sensor noise yield better properties for the synthesis. In a subsequent study, the individual VAH synthesis and the binaural reproduction using traditional artificial heads are perceptually evaluated in listening experiments in comparison to free-field presentation. It is found that individuality plays an important role when evaluating binaural reproductions. On average, the VAH synthesis results in good to excellent perceptual ratings for explicitly considered directions, and these ratings are mostly better than those for traditional artificial heads. For intermediate directions, i.e. directions not explicitly considered in the optimization, the perceptual ratings range between fair and good and are roughly at the level of (or, in terms of overall performance, slightly better than) the best ratings associated with a traditional artificial head. In summary, the perceptual ratings confirm the validity of synthesizing HRTFs using the VAH and emphasize the advantages associated with individualized binaural reproduction.
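To illustrate the kind of optimization referred to in the first part, the following minimal sketch computes filter coefficients for a single frequency by regularized least squares. The abstract does not state the cost function or the regularization strategies actually used, so everything here is an assumption for illustration: a generic Tikhonov-regularized least-squares fit, an idealized line array of omnidirectional microphones in the free field, a made-up cardioid-like target standing in for a measured HRTF, and hypothetical function names.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def steering_matrix(mic_positions, angles_deg, freq_hz):
    """Plane-wave responses of ideal omnidirectional microphones on a line.

    Rows correspond to directions, columns to microphones (assumed model).
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return [[cmath.exp(1j * k * x * math.cos(math.radians(a)))
             for x in mic_positions] for a in angles_deg]


def solve_linear(matrix, rhs):
    """Solve a small complex linear system by Gaussian elimination."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def regularized_weights(A, desired, mu):
    """Minimize ||A w - desired||^2 + mu * ||w||^2 via the normal equations."""
    n = len(A[0])
    gram = [[sum(row[i].conjugate() * row[j] for row in A)
             + (mu if i == j else 0.0) for j in range(n)] for i in range(n)]
    rhs = [sum(row[i].conjugate() * d for row, d in zip(A, desired))
           for i in range(n)]
    return solve_linear(gram, rhs)


def synthesize(A, w):
    """Directivity pattern produced by the weights w (one value per direction)."""
    return [sum(a * wi for a, wi in zip(row, w)) for row in A]


def squared_norm(w):
    """Sum of squared magnitudes of the weights."""
    return sum(abs(wi) ** 2 for wi in w)


def squared_error(A, w, desired):
    """Squared deviation between synthesized and desired pattern."""
    return sum(abs(y - d) ** 2 for y, d in zip(synthesize(A, w), desired))


# Illustrative setup: 4 microphones, 24 directions, one frequency.
angles = list(range(0, 360, 15))
A = steering_matrix([0.0, 0.02, 0.04, 0.06], angles, 2000.0)
# Made-up cardioid-like target, NOT a measured HRTF.
target = [complex(0.5 * (1.0 + math.cos(math.radians(a)))) for a in angles]

for mu in (1e-3, 1e-1, 1.0):
    w = regularized_weights(A, target, mu)
    print(mu, squared_error(A, w, target), squared_norm(w))
```

In this sketch, a larger regularization parameter mu trades synthesis accuracy for smaller filter coefficients, which is the usual way such a penalty limits the sensitivity to deviations in the microphone characteristics. A full design would repeat the fit per frequency and, following the findings summarized above, could group frequencies according to the auditory frequency resolution.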
