Speech Watermarking and Air Traffic Control

Air traffic control (ATC) voice radio communication between aircraft pilots and controllers is subject to technical and functional constraints owing to the legacy radio system currently in use worldwide. This thesis investigates the embedding of digital side information, so-called watermarks, into speech signals. Applied to the ATC voice radio, a watermarking system could overcome existing limitations, and ultimately increase safety, security and efficiency in ATC. In contrast to conventional watermarking methods, this field of application allows embedding of the data in perceptually irrelevant signal components. We show that the resulting theoretical watermark capacity far exceeds the capacity of conventional watermarking channels. Based on this finding, we present a general purpose blind speech watermarking algorithm that embeds watermark data in the phase of non-voiced speech segments by replacing the excitation signal of an autoregressive signal representation. Our implementation embeds the ...

Hofbauer, Konrad — Graz University


Informed spatial filters for speech enhancement

In modern devices which provide hands-free speech capturing functionality, such as hands-free communication kits and voice-controlled devices, the received speech signal at the microphones is corrupted by background noise, interfering speech signals, and room reverberation. In many practical situations, the microphones are not necessarily located near the desired source, and hence, the ratio of the desired speech power to the power of the background noise, the interfering speech, and the reverberation at the microphones can be very low, often around or even below 0 dB. In such situations, the comfort of human-to-human communication, as well as the accuracy of automatic speech recognisers for voice-controlled applications can be significantly degraded. Therefore, effective speech enhancement algorithms are required to process the microphone signals before transmitting them to the far-end side for communication, or before feeding them into a speech recognition ...

Taseska, Maja — Friedrich-Alexander Universität Erlangen-Nürnberg


Network-Based Ionospheric Gradient Monitoring to Support Ground Based Augmentation Systems

The Ground Based Augmentation System (GBAS) is a local-area, airport-based augmentation of Global Navigation Satellite Systems (GNSS) that provides precision approach guidance for aircraft. It enhances GNSS performance in terms of integrity, continuity, accuracy, and availability by providing differential corrections and integrity information to aircraft users. Differential corrections enable the aircraft to correct spatially correlated errors, improving its position estimation. Integrity parameters enable it to bound the residual position errors, ensuring safety of the operation. Additionally, a GBAS ground station continuously monitors and excludes the satellites affected by any system failure to guarantee system integrity and safety. Among the error sources of GNSS positioning, the ionosphere is the largest and most unpredictable. Under abnormal ionospheric conditions, large ionospheric gradients may produce a significant difference between the ionospheric delay observed by the GBAS reference station and the aircraft on approach. Such ...

Caamaño Albuerne, María — Universitat Politècnica de Catalunya


Non-Intrusive Speech Intelligibility Prediction

The ability to communicate through speech is important for social interaction. We rely on the ability to communicate with each other even in noisy conditions. Ideally, speech is easy to understand, but this is not always the case when the speech is degraded, e.g., due to background noise, distortion or hearing impairment. One of the most important factors to consider in relation to such degradations is speech intelligibility, which is a measure of how easy or difficult it is to understand the speech. In this thesis, the focus is on the topic of speech intelligibility prediction. The thesis consists of an introduction to the field of speech intelligibility prediction and a collection of scientific papers. The introduction provides a background to the challenges with speech communication in noisy conditions, followed by an introduction to how speech is produced and ...

Sørensen, Charlotte — Aalborg University


Dialogue Enhancement and Personalization - Contributions to Quality Assessment and Control

The production and delivery of audio for television involve many creative and technical challenges. One of them is concerned with the level balance between the foreground speech (also referred to as dialogue) and the background elements, e.g., music, sound effects, and ambient sounds. Background elements are fundamental for the narrative and for creating an engaging atmosphere, but they can mask the dialogue, which the audience wishes to follow in a comfortable way. Very different individual factors of the people in the audience clash with the creative freedom of the content creators. As a result, service providers receive regular complaints about difficulties in understanding the dialogue because of too loud background sounds. While this has been a known issue for at least three decades, works analyzing the problem and up-to-date statistics were scarce before the contributions in this work. Enabling the ...

Torcoli, Matteo — Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)


Realtime and Accurate Musical Control of Expression in Voice Synthesis

In the early days of speech synthesis research, understanding voice production attracted the attention of scientists with the goal of producing intelligible speech. Later, the need to produce more natural voices led researchers to use prerecorded voice databases, containing speech units, reassembled by a concatenation algorithm. With the outgrowth of computer capacities, the length of units increased, going from diphones to non-uniform units, in the so-called unit selection framework, using a strategy referred to as 'take the best, modify the least'. Today the new challenge in voice synthesis is the production of expressive speech or singing. The mainstream solution to this problem is based on the “there is no data like more data” paradigm: emotion-specific databases are recorded and emotion-specific units are segmented. In this thesis, we propose to restart the expressive speech synthesis problem, from its original voice ...

D'Alessandro, N. — Université de Mons


Joint Source-Cryptographic-Channel Coding for Real-Time Secure Voice Communications on Voice Channels

The growing risk of privacy violation and espionage associated with the rapid spread of mobile communications renewed interest in the original concept of sending encrypted voice as an audio signal over arbitrary voice channels. The usual methods used for encrypted data transmission over analog telephony turned out to be inadequate for modern vocal links (cellular networks, VoIP) equipped with voice compression, voice activity detection, and adaptive noise suppression algorithms. The limited available bandwidth, nonlinear channel distortion, and signal fading motivate the investigation of a dedicated, joint approach for speech encoding and encryption adapted to modern noisy voice channels. This thesis aims to develop, analyze, and validate secure and efficient schemes for real-time speech encryption and transmission via modern voice channels. In addition to speech encryption, this study covers the security and operational aspects of the whole voice communication system, as this ...

Krasnowski, Piotr — Université Côte d'Azur


Measurement-based Performance Evaluation of WiMAX and HSDPA

In this work, a realistic physical layer performance evaluation of High Speed Downlink Packet Access (HSDPA) as well as IEEE 802.16-2004, commonly referred to as Worldwide Interoperability for Microwave Access (WiMAX), is provided. The performance evaluation is carried out in two measurement campaigns that took place in an alpine and an urban environment. Both WiMAX and HSDPA use adaptive modulation and coding to adapt the channel coding rate and the size of the symbol alphabet to the current channel conditions. Additionally, both systems allow for multiple transmit and multiple receive antennas to increase the spectral efficiency and the reliability of the transmission. While WiMAX utilizes multiple transmit antennas by simple Alamouti space-time coding, HSDPA implements a closed-loop system with channel adaptive spatial precoding. The necessary quantized channel information is fed back from the user equipment to the base station. The ...

Mehlfuehrer, Christian — Vienna University of Technology


Spatio-Temporal Speech Enhancement in Adverse Acoustic Conditions

Never before has speech been captured as often by electronic devices equipped with one or multiple microphones, serving a variety of applications. It is the key aspect in digital telephony, hearing devices, and voice-driven human-to-machine interaction. When speech is recorded, the microphones also capture a variety of further, undesired sound components due to adverse acoustic conditions. Interfering speech, background noise and reverberation, i.e. the persistence of sound in a room after excitation caused by a multitude of reflections on the room enclosure, are detrimental to the quality and intelligibility of target speech as well as the performance of automatic speech recognition. Hence, speech enhancement aiming at estimating the early target-speech component, which contains the direct component and early reflections, is crucial to nearly all speech-related applications presently available. In this thesis, we compare, propose and evaluate existing and novel approaches ...

Dietzen, Thomas — KU Leuven


Microphone arrays for imaging of aerospace noise sources

With the continuous growth in demand for air traffic and wind turbines, the noise emissions they generate are becoming an increasingly important issue. To reduce their noise levels, it is essential to obtain accurate information about all the sound sources present. Phased microphone arrays and acoustic imaging methods allow for the estimation of the location and strength of sound sources. Experiments with these devices are one of the main approaches in the current research in aeroacoustics, along with computational simulations or noise prediction models. This thesis presents a detailed literature review on the most common aerospace noise sources, challenges in aeroacoustic measurements, and the acoustic imaging methods typically used to overcome them. Practical recommendations are provided for selecting the appropriate imaging technique depending on the type of experiment. New integration techniques for distributed sound sources, such as leading- or trailing-edge ...

Merino-Martinez, Roberto — Delft University of Technology


Non-intrusive Quality Evaluation of Speech Processed in Noisy and Reverberant Environments

In many speech applications such as hands-free telephony or voice-controlled home assistants, the distance between the user and the recording microphones can be relatively large. In such a far-field scenario, the recorded microphone signals are typically corrupted by noise and reverberation, which may severely degrade the performance of speech recognition systems and reduce intelligibility and quality of speech in communication applications. In order to limit these effects, speech enhancement algorithms are typically applied. The main objective of this thesis is to develop novel speech enhancement algorithms for noisy and reverberant environments and signal-based measures to evaluate these algorithms, focusing on solutions that are applicable in realistic scenarios. First, we propose a single-channel speech enhancement algorithm for joint noise and reverberation reduction. The proposed algorithm uses a spectral gain to enhance the input signal, where the gain is computed using a ...

Cauchi, Benjamin — University of Oldenburg


Enhancement of Speech Signals - with a Focus on Voiced Speech Models

The topic of this thesis is speech enhancement with a focus on models of voiced speech. Speech is divided into two subcategories depending on the characteristics of the signal. One part is the voiced speech, the other is the unvoiced. In this thesis, we primarily focus on the voiced speech parts and utilise the structure of the signal in relation to speech enhancement. The basis for the models is the harmonic model, which is a very commonly used model for voiced speech because it describes periodic signals perfectly. First, we consider the problem of non-stationarity in the speech signal. The speech signal changes its characteristics continuously over time whereas most speech analysis and enhancement methods assume stationarity within 20-30 ms. We propose to change the model to allow the fundamental frequency to vary linearly over time by introducing a chirp ...

Nørholm, Sidsel Marie — Aalborg University


Coordination Strategies for Interference Management in MIMO Dense Cellular Networks

The envisioned rapid and exponential increase of wireless data traffic demand in the next years imposes rethinking current wireless cellular networks due to the scarcity of the available spectrum. In this regard, three main drivers are considered to increase the capacity of today's most advanced (4G systems) and future (5G systems and beyond) cellular networks: i) use more bandwidth (more Hz) through spectral aggregation, ii) enhance the spectral efficiency per base station (BS) (more bits/s/Hz/BS) by using multiple antennas at BSs and users (i.e. MIMO systems), and iii) increase the density of BSs (more BSs/km²) through a dense and heterogeneous deployment (known as dense heterogeneous cellular networks). We focus on the last two drivers. First, the use of multi-antenna systems allows exploiting the spatial dimension for several purposes: improving the capacity of a conventional point-to-point wireless link, increasing the number ...

Lagen, Sandra — Universitat Politècnica de Catalunya


OFDM Air-Interface Design for Multimedia Communications

The aim of this dissertation is the investigation of the key issues encountered in the development of wideband radio air-interfaces. Orthogonal frequency-division multiplexing (OFDM) is considered as the enabling technology for transmitting data at extremely high rates over time-dispersive radio channels. OFDM is a transmission scheme that splits up the data stream, sending the data symbols simultaneously at a drastically reduced symbol rate over a set of parallel sub-carriers. The first part of this thesis deals with the modeling of the time-dispersive and frequency-selective radio channel, utilizing second order Gaussian stochastic processes. A novel channel measurement technique is developed, in which the RMS delay spread of the channel is estimated from the level-crossing rate of the frequency-selective channel transfer function. This method enables the empirical channel characterization utilizing simplified non-coherent measurements of the received power versus frequency. Air-interface and multiple ...

Witrisal, Klaus — Delft University of Technology


Acoustic echo reduction for multiple loudspeakers and microphones: Complexity reduction and convergence enhancement

Modern devices such as mobile phones, tablets or smart speakers are commonly equipped with several loudspeakers and microphones. If, for instance, one employs such a device for hands-free communication applications, the signals that are reproduced by the loudspeakers are propagated through the room and are inevitably acquired by the microphones. If no processing is applied, the participants in the far-end room receive delayed reverberated replicas of their own voice, which strongly degrades both speech intelligibility and user comfort. To prevent these so-called acoustic echoes from being transmitted back to the far-end room, acoustic echo cancelers are commonly employed. The latter make use of adaptive filtering techniques to identify the propagation paths between loudspeakers and microphones. The estimated propagation paths are then employed to compute acoustic echo estimates, which are finally subtracted from the signals acquired by the microphones. In ...

Luis Valero, Maria — International Audio Laboratories Erlangen
