Speech Watermarking and Air Traffic Control

Air traffic control (ATC) voice radio communication between aircraft pilots and controllers is subject to technical and functional constraints owing to the legacy radio system currently in use worldwide. This thesis investigates the embedding of digital side information, so-called watermarks, into speech signals. Applied to the ATC voice radio, a watermarking system could overcome existing limitations, and ultimately increase safety, security and efficiency in ATC. In contrast to conventional watermarking methods, this field of application allows embedding of the data in perceptually irrelevant signal components. We show that the resulting theoretical watermark capacity far exceeds the capacity of conventional watermarking channels. Based on this finding, we present a general-purpose blind speech watermarking algorithm that embeds watermark data in the phase of non-voiced speech segments by replacing the excitation signal of an autoregressive signal representation. Our implementation embeds the ...

Hofbauer, Konrad — Graz University


Informed spatial filters for speech enhancement

In modern devices which provide hands-free speech capturing functionality, such as hands-free communication kits and voice-controlled devices, the received speech signal at the microphones is corrupted by background noise, interfering speech signals, and room reverberation. In many practical situations, the microphones are not necessarily located near the desired source, and hence, the ratio of the desired speech power to the power of the background noise, the interfering speech, and the reverberation at the microphones can be very low, often around or even below 0 dB. In such situations, the comfort of human-to-human communication, as well as the accuracy of automatic speech recognisers for voice-controlled applications, can be significantly degraded. Therefore, effective speech enhancement algorithms are required to process the microphone signals before transmitting them to the far-end side for communication, or before feeding them into a speech recognition ...

Taseska, Maja — Friedrich-Alexander Universität Erlangen-Nürnberg


Realtime and Accurate Musical Control of Expression in Voice Synthesis

In the early days of speech synthesis research, understanding voice production attracted the attention of scientists with the goal of producing intelligible speech. Later, the need to produce more natural voices led researchers to use prerecorded voice databases, containing speech units, reassembled by a concatenation algorithm. With the growth of computing capacity, the length of units increased, going from diphones to non-uniform units, in the so-called unit selection framework, using a strategy referred to as 'take the best, modify the least'. Today the new challenge in voice synthesis is the production of expressive speech or singing. The mainstream solution to this problem is based on the “there is no data like more data” paradigm: emotion-specific databases are recorded and emotion-specific units are segmented. In this thesis, we propose to restart the expressive speech synthesis problem, from its original voice ...

D'Alessandro, N. — Universite de Mons


Microphone arrays for imaging of aerospace noise sources

With the continuous growth in demand for air traffic and wind turbines, the noise emissions they generate are becoming an increasingly important issue. To reduce their noise levels, it is essential to obtain accurate information about all the sound sources present. Phased microphone arrays and acoustic imaging methods allow for the estimation of the location and strength of sound sources. Experiments with these devices are one of the main approaches in the current research in aeroacoustics, along with computational simulations or noise prediction models. This thesis presents a detailed literature review on the most common aerospace noise sources, challenges in aeroacoustic measurements, and the acoustic imaging methods typically used to overcome them. Practical recommendations are provided for selecting the appropriate imaging technique depending on the type of experiment. New integration techniques for distributed sound sources, such as leading- or trailing-edge ...

Merino-Martinez, Roberto — Delft University of Technology


Enhancement of Speech Signals - with a Focus on Voiced Speech Models

The topic of this thesis is speech enhancement with a focus on models of voiced speech. Speech is divided into two subcategories depending on the characteristics of the signal. One part is the voiced speech, the other is the unvoiced. In this thesis, we primarily focus on the voiced speech parts and utilise the structure of the signal in relation to speech enhancement. The basis for the models is the harmonic model, which is very commonly used for voiced speech because it describes periodic signals perfectly. First, we consider the problem of non-stationarity in the speech signal. The speech signal changes its characteristics continuously over time, whereas most speech analysis and enhancement methods assume stationarity within 20-30 ms. We propose to change the model to allow the fundamental frequency to vary linearly over time by introducing a chirp ...

Nørholm, Sidsel Marie — Aalborg University


Measurement-based Performance Evaluation of WiMAX and HSDPA

In this work, a realistic physical layer performance evaluation of High Speed Downlink Packet Access (HSDPA) as well as IEEE 802.16-2004, commonly referred to as Worldwide Interoperability for Microwave Access (WiMAX), is provided. The performance evaluation is carried out in two measurement campaigns that took place in an alpine and an urban environment. Both WiMAX and HSDPA use adaptive modulation and coding to adapt the channel coding rate and the size of the symbol alphabet to the current channel conditions. Additionally, both systems allow for multiple transmit and multiple receive antennas to increase the spectral efficiency and the reliability of the transmission. While WiMAX utilizes multiple transmit antennas by simple Alamouti space-time coding, HSDPA implements a closed-loop system with channel adaptive spatial precoding. The necessary quantized channel information is fed back from the user equipment to the base station. The ...

Mehlfuehrer, Christian — Vienna University of Technology


Speech dereverberation in noisy environments using time-frequency domain signal models

Reverberation is the sum of reflected sound waves and is present in any conventional room. Speech communication devices such as mobile phones in hands-free mode, tablets, smart TVs, teleconferencing systems, hearing aids, voice-controlled systems, etc. use one or more microphones to pick up the desired speech signals. When the microphones are not in the proximity of the desired source, strong reverberation and noise can degrade the signal quality at the microphones and can impair the intelligibility and the performance of automatic speech recognizers. Therefore, there is a strong demand for processing the microphone signals such that reverberation and noise are reduced. The process of reducing or removing reverberation from recorded signals is called dereverberation. As dereverberation is usually a completely blind problem, where the only available information is the microphone signals, and as the acoustic scenario can be non-stationary, ...

Braun, Sebastian — Friedrich-Alexander Universität Erlangen-Nürnberg


Speech Enhancement Algorithms for Audiological Applications

The improvement of speech intelligibility is a traditional problem which still remains open and unsolved. The recent boom of applications such as hands-free communications or automatic speech recognition systems and the ever-increasing demands of the hearing-impaired community have given a definitive impulse to the research in this area. This PhD thesis is focused on speech enhancement for audiological applications. Most of the research conducted in this thesis has been focused on the improvement of speech intelligibility in hearing aids, considering the variety of restrictions and limitations imposed by this type of device. The combination of source separation techniques and spatial filtering with machine learning and evolutionary computation has given rise to novel and interesting algorithms which are included in this thesis. The thesis is divided into two main parts. The first one contains a preliminary study of the problem and a ...

Ayllón, David — Universidad de Alcalá


Advances in Glottal Analysis and its Applications

From artificial voices in GPS to automatic dictation systems, from voice-based identity verification to voice pathology detection, speech processing applications are nowadays omnipresent in our daily life. By offering solutions to companies seeking efficiency gains with simultaneous cost savings, the market of speech technology is forecast to be especially promising in the next years. The present thesis deals with advances in glottal analysis in order to incorporate new techniques within speech processing applications. While current systems are usually based on information related to the vocal tract configuration, the airflow passing through the vocal folds, called the glottal flow, is expected to provide relevant complementary information. Unfortunately, glottal analysis from speech recordings requires specific complex processing operations, which explains why it has been generally avoided. The main goal of this thesis is to provide new advances in glottal analysis ...

Drugman, Thomas — Universite de Mons


The Removal of Environmental Noise in Cellular Communications by Perceptual Techniques

This thesis describes the application of a perceptually based spectral subtraction algorithm for the enhancement of speech corrupted by non-stationary noise. Through examination of speech enhancement techniques, explanations are given for the choice of magnitude spectral subtraction and how the human auditory system can be modelled for frequency domain speech enhancement. It is found that the cochlea provides the mechanical speech enhancement in the auditory system, through the use of masking. Frequency masking is used in spectral subtraction to improve the algorithm execution time and to shape the enhancement process, making it sound natural to the ear. A new technique for estimation of background noise is presented, which operates during speech sections as well as pauses. This uses two microphones placed on opposite ends of the cellular handset. Using these, the algorithm determines whether the signal is speech, or noise, by ...

Tuffy, Mark — University of Edinburgh


Coordination Strategies for Interference Management in MIMO Dense Cellular Networks

The envisioned rapid and exponential increase of wireless data traffic demand in the next years imposes rethinking current wireless cellular networks due to the scarcity of the available spectrum. In this regard, three main drivers are considered to increase the capacity of today's most advanced (4G systems) and future (5G systems and beyond) cellular networks: i) use more bandwidth (more Hz) through spectral aggregation, ii) enhance the spectral efficiency per base station (BS) (more bits/s/Hz/BS) by using multiple antennas at BSs and users (i.e. MIMO systems), and iii) increase the density of BSs (more BSs/km²) through a dense and heterogeneous deployment (known as dense heterogeneous cellular networks). We focus on the last two drivers. First, the use of multi-antenna systems allows exploiting the spatial dimension for several purposes: improving the capacity of a conventional point-to-point wireless link, increasing the number ...

Lagen, Sandra — Universitat Politecnica de Catalunya


Prediction and Optimization of Speech Intelligibility in Adverse Conditions

In digital speech-communication systems like mobile phones, public address systems and hearing aids, conveying the message is one of the most important goals. This can be challenging since the intelligibility of the speech may be harmed at various stages before, during and after the transmission process from sender to receiver. Causes which create such adverse conditions include background noise, an unreliable internet connection during a Skype conversation or a hearing impairment of the receiver. To overcome this, many speech-communication systems include speech processing algorithms to compensate for these signal degradations like noise reduction. To determine the effect on speech intelligibility of these signal processing based solutions, the speech signal has to be evaluated by means of a listening test with human listeners. However, such tests are costly and time consuming. As an alternative, reliable and fast machine-driven intelligibility predictors are ...

Taal, Cees — Delft University of Technology


Discrete-time speech processing with application to emotion recognition

The subject of this PhD thesis is the efficient and robust processing and analysis of the audio recordings that are derived from a call center. The thesis comprises two parts. The first part is dedicated to dialogue/non-dialogue detection and to speaker segmentation. The systems that are developed are prerequisite for detecting (i) the audio segments that actually contain a dialogue between the system and the call center customer and (ii) the change points between the system and the customer. This way the volume of the audio recordings that need to be processed is significantly reduced, while the system is automated. To detect the presence of a dialogue several systems are developed. This is the first effort found in the international literature in which the audio channel is exclusively exploited. Also, it is the first time that the speaker utterance ...

Kotti, Margarita — Aristotle University of Thessaloniki


Speech Enhancement for Disordered and Substitution Voices

This thesis presents methods to enhance the speech of patients with voice disorders or with substitution voices. The first method enhances speech of patients with laryngeal neoplasm. The enhancement enables a reduction of pitch and a strengthening of the harmonics of voiced segments as well as decreasing the perceived speaking effort. The need for reliable pitch mark determination on disordered and substitution voices led to the implementation of a state-space based algorithm. Its performance is comparable to a state-of-the-art pitch detection algorithm, but it does not require post-processing. A subsequent part of the thesis deals with alaryngeal speech, with a focus on Electro-Larynx (EL) speech. After investigating an EL speech production model, which takes into account the common source of the speech signal and the directly radiated EL (DREL) sound, a solution to suppress the direct sound is based ...

Hagmuller, Martin — Graz University of Technology


Quality Aspects of Packet-Based Interactive Speech Communication

Voice-over-Internet Protocol (VoIP) technology provides the transmission of speech over packet-based networks. The transition from circuit-switched to packet-switched networks introduces two major quality impairments: packet loss and end-to-end delay. This thesis shows that the incorporation of packets that were damaged by bit errors reduces the effective packet loss rate, and thus improves the speech quality as perceived by the user. Moreover, this thesis addresses the impact of transmission delay on conversational interactivity and on the perceived speech quality. In order to study the structure and interactivity of conversations, the framework of Parametric Conversation Analysis (P-CA) is introduced and three metrics for conversational interactivity are defined. The investigation of five conversation scenarios based on subjective quality tests has shown that only highly structured scenarios result in high conversational interactivity. The speaker alternation rate has turned out to represent a simple and ...

Hammer, Florian — Graz University of Technology
