When the deaf listen to music. Pitch perception with cochlear implants

Cochlear implants (CI) are surgically implanted hearing aids that provide auditory sensations to deaf people through direct electrical stimulation of the auditory nerve. Although relatively good speech understanding can be achieved by implanted subjects, pitch perception by CI subjects is about 50 times worse than observed for normal-hearing (NH) persons. Pitch is, however, important for intonation, music, speech understanding in tonal languages, and for separating multiple simultaneous sound sources. The major goal of this work is to improve pitch perception by CI subjects. In CI subjects two fundamental mechanisms are used for pitch perception: place pitch and temporal pitch. Our results show that place pitch is correlated to the sound¢s brightness because place pitch sensation is related to the centroid of the excitation pattern along the cochlea. The slopes of the excitation pattern determine place pitch sensitivity. Our results also ...

Laneau, Johan — Katholieke Universiteit Leuven


Prediction and Optimization of Speech Intelligibility in Adverse Conditions

In digital speech-communication systems like mobile phones, public address systems and hearing aids, conveying the message is one of the most important goals. This can be challenging since the intelligibility of the speech may be harmed at various stages before, during and after the transmission process from sender to receiver. Causes which create such adverse conditions include background noise, an unreliable internet connection during a Skype conversation or a hearing impairment of the receiver. To overcome this, many speech-communication systems include speech processing algorithms to compensate for these signal degradations like noise reduction. To determine the effect on speech intelligibility of these signal processing based solutions, the speech signal has to be evaluated by means of a listening test with human listeners. However, such tests are costly and time consuming. As an alternative, reliable and fast machine-driven intelligibility predictors are ...

Taal, Cees — Delft University of Technology


Realtime and Accurate Musical Control of Expression in Voice Synthesis

In the early days of speech synthesis research, understanding voice production has attracted the attention of scientists with the goal of producing intelligible speech. Later, the need to produce more natural voices led researchers to use prerecorded voice databases, containing speech units, reassembled by a concatenation algorithm. With the outgrowth of computer capacities, the length of units increased, going from diphones to non-uniform units, in the so-called unit selection framework, using a strategy referred to as 'take the best, modify the least'. Today the new challenge in voice synthesis is the production of expressive speech or singing. The mainstream solution to this problem is based on the “there is no data like more data” paradigm: emotionspecific databases are recorded and emotion-specific units are segmented. In this thesis, we propose to restart the expressive speech synthesis problem, from its original voice ...

D' Alessandro, N. — Universite de Mons


Design and Evaluation of Feedback Control Algorithms for Implantable Hearing Devices

Using a hearing device is one of the most successful approaches to partially restore the degraded functionality of an impaired auditory system. However, due to the complex structure of the human auditory system, hearing impairment can manifest itself in different ways and, therefore, its compensation can be achieved through different classes of hearing devices. Although the majority of hearing devices consists of conventional hearing aids (HAs), several other classes of hearing devices have been developed. For instance, bone-conduction devices (BCDs) and cochlear implants (CIs) have successfully been used for more than thirty years. More recently, other classes of implantable devices have been developed such as middle ear implants (MEIs), implantable BCDs, and direct acoustic cochlear implants (DACIs). Most of these different classes of hearing devices rely on a sound processor running different algorithms able to compensate for the hearing impairment. ...

Bernardi, Giuliano — KU Leuven


Cochlear implant artifact suppression in EEG measurements

Cochlear implants (CIs) aim to restore hearing in severely to profoundly deaf adults, children and infants. Electrically evoked auditory steady-state responses (EASSRs) are neural responses to continuous modulated pulse trains, and can be objectively detected at the modulation frequency in the electro-encephalogram (EEG). EASSRs provide a number of advantages over other objective measures, because frequency-specific stimuli are used, because targeted brain areas can be studied, depending on the chosen stimulation parameters, and because they can objectively be detected using statistical methods. EASSRs can potentially be used to determine appropriate stimulation levels during CI fitting, without behavioral input from the subjects. Furthermore, speech understanding in noise varies greatly between CI subjects. EASSRs lend themselves well to study the underlying causes of this variability, such as the integrity of the electrode-neuron interface or changes in the auditory cortex following deafness and following ...

Deprez, Hanne — KU Leuven


Design and evaluation of noise reduction techniques for binaural hearing aids

One of the main complaints of hearing aid users is their degraded speech understanding in noisy environments. Modern hearing aids therefore include noise reduction techniques. These techniques are typically designed for a monaural application, i.e. in a single device. However, the majority of hearing aid users currently have hearing aids at both ears in a so-called bilateral fitting, as it is widely accepted that this leads to a better speech understanding and user satisfaction. Unfortunately, the independent signal processing (in particular the noise reduction) in a bilateral fitting can destroy the so-called binaural cues, namely the interaural time and level differences (ITDs and ILDs) which are used to localize sound sources in the horizontal plane. A recent technological advance are so-called binaural hearing aids, where a wireless link allows for the exchange of data (or even microphone signals) between the ...

Cornelis, Bram — KU Leuven


Advances in Perceptual Stereo Audio Coding Using Linear Prediction Techniques

A wide range of techniques for coding a single-channel speech and audio signal has been developed over the last few decades. In addition to pure redundancy reduction, sophisticated source and receiver models have been considered for reducing the bit-rate. Traditionally, speech and audio coders are based on different principles and thus each of them offers certain advantages. With the advent of high capacity channels, networks, and storage systems, the bit-rate versus quality compromise will no longer be the major issue; instead, attributes like low-delay, scalability, computational complexity, and error concealments in packet-oriented networks are expected to be the major selling factors. Typical audio coders such as MP3 and AAC are based on subband or transform coding techniques that are not easily reconcilable with a low-delay requirement. The reasons for their inherently longer delay are the relatively long band splitting filters ...

Biswas, Arijit — Technische Universiteit Eindhoven


Mixed structural models for 3D audio in virtual environments

In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


Deep Learning for Event Detection, Sequence Labelling and Similarity Estimation in Music Signals

When listening to music, some humans can easily recognize which instruments play at what time or when a new musical segment starts, but cannot describe exactly how they do this. To automatically describe particular aspects of a music piece – be it for an academic interest in emulating human perception, or for practical applications –, we can thus not directly replicate the steps taken by a human. We can, however, exploit that humans can easily annotate examples, and optimize a generic function to reproduce these annotations. In this thesis, I explore solving different music perception tasks with deep learning, a recent branch of machine learning that optimizes functions of many stacked nonlinear operations – referred to as deep neural networks – and promises to obtain better results or require less domain knowledge than more traditional techniques. In particular, I employ ...

Schlüter, Jan — Department of Computational Perception, Johannes Kepler University Linz


Some Contributions to Music Signal Processing and to Mono-Microphone Blind Audio Source Separation

For humans, the sound is valuable mostly for its meaning. The voice is spoken language, music, artistic intent. Its physiological functioning is highly developed, as well as our understanding of the underlying process. It is a challenge to replicate this analysis using a computer: in many aspects, its capabilities do not match those of human beings when it comes to speech or instruments music recognition from the sound, to name a few. In this thesis, two problems are investigated: the source separation and the musical processing. The first part investigates the source separation using only one Microphone. The problem of sources separation arises when several audio sources are present at the same moment, mixed together and acquired by some sensors (one in our case). In this kind of situation it is natural for a human to separate and to recognize ...

Schutz, Antony — Eurecome/Mobile


Advances in Glottal Analysis and its Applications

From artificial voices in GPS to automatic systems of dictation, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking for efficiency enhancement with simultaneous cost saving, the market of speech technology is forecast to be especially promising in the next years. The present thesis deals with advances in glottal analysis in order to incorporate new techniques within speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, and called glottal flow, is expected to exhibit a relevant complementarity. Unfortunately, glottal analysis from speech recordings requires specific complex processing operations, which explains why it has been generally avoided. The main goal of this thesis is to provide new advances in glottal analysis ...

Drugman, Thomas — Universite de Mons


A multimicrophone approach to speech processing in a smart-room environment

Recent advances in computer technology and speech and language processing have made possible that some new ways of person-machine communication and computer assistance to human activities start to appear feasible. Concretely, the interest on the development of new challenging applications in indoor environments equipped with multiple multimodal sensors, also known as smart-rooms, has considerably grown. In general, it is well-known that the quality of speech signals captured by microphones that can be located several meters away from the speakers is severely distorted by acoustic noise and room reverberation. In the context of the development of hands-free speech applications in smart-room environments, the use of obtrusive sensors like close-talking microphones is usually not allowed, and consequently, speech technologies must operate on the basis of distant-talking recordings. In such conditions, speech technologies that usually perform reasonably well in free of noise and ...

Abad, Alberto — Universitat Politecnica de Catalunya


Efficient Perceptual Audio Coding Using Cosine and Sine Modulated Lapped Transforms

The increasing number of simultaneous input and output channels utilized in immersive audio configurations primarily in broadcasting applications has renewed industrial requirements for efficient audio coding schemes with low bit-rate and complexity. This thesis presents a comprehensive review and extension of conventional approaches for perceptual coding of arbitrary multichannel audio signals. Particular emphasis is given to use cases ranging from two-channel stereophonic to six-channel 5.1-surround setups with or without the application-specific constraint of low algorithmic coding latency. Conventional perceptual audio codecs share six common algorithmic components, all of which are examined extensively in this thesis. The first is a signal-adaptive filterbank, constructed using instances of the real-valued modified discrete cosine transform (MDCT), to obtain spectral representations of successive portions of the incoming discrete time signal. Within this MDCT spectral domain, various intra- and inter-channel optimizations, most of which are of ...

Helmrich, Christian R. — Friedrich-Alexander-Universität Erlangen-Nürnberg


Pitch-informed solo and accompaniment separation

This thesis addresses the development of a system for pitch-informed solo and accompaniment separation capable of separating main instruments from music accompaniment regardless of the musical genre of the track, or type of music accompaniment. For the solo instrument, only pitched monophonic instruments were considered in a single-channel scenario where no panning or spatial location information is available. In the proposed method, pitch information is used as an initial stage of a sinusoidal modeling approach that attempts to estimate the spectral information of the solo instrument from a given audio mixture. Instead of estimating the solo instrument on a frame by frame basis, the proposed method gathers information of tone objects to perform separation. Tone-based processing allowed the inclusion of novel processing stages for attack re nement, transient interference reduction, common amplitude modulation (CAM) of tone objects, and for better ...

Cano Cerón, Estefanía — Ilmenau University of Technology


Robust Speech Recognition: Analysis and Equalization of Lombard Effect in Czech Corpora

When exposed to noise, speakers will modify the way they speak in an effort to maintain intelligible communication. This process, which is referred to as Lombard effect (LE), involves a combination of both conscious and subconscious articulatory adjustment. Speech production variations due to LE can cause considerable degradation in automatic speech recognition (ASR) since they introduce a mismatch between parameters of the speech to be recognized and the ASR system’s acoustic models, which are usually trained on neutral speech. The main objective of this thesis is to analyze the impact of LE on speech production and to propose methods that increase ASR system performance in LE. All presented experiments were conducted on the Czech spoken language, yet, the proposed concepts are assumed applicable to other languages. The first part of the thesis focuses on the design and acquisition of a ...

Boril, Hynek — Czech Technical University in Prague

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.