Statistical Parametric Speech Synthesis Based on the Degree of Articulation

Nowadays, speech synthesis is part of various daily life applications. The ultimate goal of such technologies consists in extending the possibilities of interaction with the machine, in order to get closer to human-like communications. However, current state-of-the-art systems often lack of realism: although high-quality speech synthesis can be produced by many researchers and companies around the world, synthetic voices are generally perceived as hyperarticulated. In any case, their degree of articulation is fixed once and for all. The present thesis falls within the more general quest for enriching expressivity in speech synthesis. The main idea consists in improving statistical parametric speech synthesis, whose most famous example is Hidden Markov Model (HMM) based speech synthesis, by introducing a control of the articulation degree, so as to enable synthesizers to automatically adapt their way of speaking to the contextual situation, like humans ...

Picart, Benjamin — Université de Mons (UMONS)


Speech Enhancement Using Nonnegative Matrix Factorization and Hidden Markov Models

Reducing interference noise in a noisy speech recording has been a challenging task for many years yet has a variety of applications, for example, in handsfree mobile communications, in speech recognition, and in hearing aids. Traditional single-channel noise reduction schemes, such as Wiener filtering, do not work satisfactorily in the presence of non-stationary background noise. Alternatively, supervised approaches, where the noise type is known in advance, lead to higher-quality enhanced speech signals. This dissertation proposes supervised and unsupervised single-channel noise reduction algorithms. We consider two classes of methods for this purpose: approaches based on nonnegative matrix factorization (NMF) and methods based on hidden Markov models (HMM). The contributions of this dissertation can be divided into three main (overlapping) parts. First, we propose NMF-based enhancement approaches that use temporal dependencies of the speech signals. In a standard NMF, the important temporal ...

Mohammadiha, Nasser — KTH Royal Institute of Technology


Informed spatial filters for speech enhancement

In modern devices which provide hands-free speech capturing functionality, such as hands-free communication kits and voice-controlled devices, the received speech signal at the microphones is corrupted by background noise, interfering speech signals, and room reverberation. In many practical situations, the microphones are not necessarily located near the desired source, and hence, the ratio of the desired speech power to the power of the background noise, the interfering speech, and the reverberation at the microphones can be very low, often around or even below 0 dB. In such situations, the comfort of human-to-human communication, as well as the accuracy of automatic speech recognisers for voice-controlled applications can be signi cantly degraded. Therefore, e ffective speech enhancement algorithms are required to process the microphone signals before transmitting them to the far-end side for communication, or before feeding them into a speech recognition ...

Taseska, Maja — Friedrich-Alexander Universität Erlangen-Nürnberg


Realtime and Accurate Musical Control of Expression in Voice Synthesis

In the early days of speech synthesis research, understanding voice production has attracted the attention of scientists with the goal of producing intelligible speech. Later, the need to produce more natural voices led researchers to use prerecorded voice databases, containing speech units, reassembled by a concatenation algorithm. With the outgrowth of computer capacities, the length of units increased, going from diphones to non-uniform units, in the so-called unit selection framework, using a strategy referred to as 'take the best, modify the least'. Today the new challenge in voice synthesis is the production of expressive speech or singing. The mainstream solution to this problem is based on the “there is no data like more data” paradigm: emotionspecific databases are recorded and emotion-specific units are segmented. In this thesis, we propose to restart the expressive speech synthesis problem, from its original voice ...

D' Alessandro, N. — Universite de Mons


Enhancement of Periodic Signals: with Application to Speech Signals

The topic of this thesis is the enhancement of noisy, periodic signals with application to speech signals. Generally speaking, enhancement methods can be divided into signal- and noise-driven methods. In this thesis, we focus on the signal-driven approach by employing relevant signal parameters for the enhancement of periodic signals. The enhancement problem consists of two major subproblems: the estimation of relevant parameters or statistics, and the actual noise reduction of the observed signal. We consider both of these subproblems. First, we consider the problem of estimating signal parameters relevant to the enhancement of periodic signals. The fundamental frequency is one example of such a parameter. Furthermore, in multichannel scenarios, the direction-of-arrival of the periodic sources onto an array of sensors is another parameter of relevance. We propose methods for the estimation of the fundamental frequency that have benefits compared to ...

Jensen, Jesper Rindom — Aalborg University


Models and Software Realization of Russian Speech Recognition based on Morphemic Analysis

Above 20% European citizens speak in Russian therefore the task of automatic recognition of Russian continuous speech has a key significance. The main problems of ASR are connected with the complex mechanism of Russian word-formation. Totally there exist above 3 million diverse valid word-forms that is very large vocabulary ASR task. The thesis presents the novel HMM-based ASR model of Russian that has morphemic levels of speech and language representation. The model includes the developed methods for decomposition of the word vocabulary into morphemes and acoustical and statistical language modelling at the training stage and the method for word synthesis at the last stage of speech decoding. The presented results of application of the ASR model for voice access to the Yellow Pages directory have shown the essential improvement (above 75%) of the real-time factor saving acceptable word recognition rate ...

Karpov, Alexey — St.Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences


Adaptive Noise Cancelation in Speech Signals

Today, adaptive algorithms represent one of the most frequently used computational tools for the processing of digital speech signals. This work investigates and analyzes the properties of adaptive algorithms in speech communication applications where rigorous conditions apply, such as noise and echo cancelation. Like other theses in this field do, it tries to tackle the ever-lasting problem of computational complexity vs. rate of convergence. It introduces some new adaptive methods that stem from the existing algorithms as well as a novel concept which has been entitled Optimal Step-Size (OSS). In the first part of the thesis we investigate some well-known, widely used adaptive techniques such as the Normalized Least Mean Squares (NLMS) and the Recursive Least Mean Squares (RLS). In spite of the fact that the NLMS and the RLS belong to the "simplest" principles, as far as complexity is ...

Malenovsky, Vladimir — Department of Telecommunications, Brno University of Technology, Czech Republic


Design and evaluation of digital signal processing algorithms for acoustic feedback and echo cancellation

This thesis deals with several open problems in acoustic echo cancellation and acoustic feedback control. Our main goal has been to develop solutions that provide a high performance and sound quality, and behave in a robust way in realistic conditions. This can be achieved by departing from the traditional ad-hoc methods, and instead deriving theoretically well-founded solutions, based on results from parameter estimation and system identification. In the development of these solutions, the computational efficiency has permanently been taken into account as a design constraint, in that the complexity increase compared to the state-of-the-art solutions should not exceed 50 % of the original complexity. In the context of acoustic echo cancellation, we have investigated the problems of double-talk robustness, acoustic echo path undermodeling, and poor excitation. The two former problems have been tackled by including adaptive decorrelation filters in the ...

van Waterschoot, Toon — Katholieke Universiteit Leuven


Vulnerabilities and Attack Protection in Security Systems Based on Biometric Recognition

Absolute security does not exist: given funding, willpower and the proper technology, every security system can be compromised. However, the objective of the security community should be to develop such applications that the funding, the will, and the resources needed by the attacker to crack the system prevent him from attempting to do so. This Thesis is focused on the vulnerability assessment of biometric systems. Although being relatively young compared to other mature and long-used security technologies, biometrics have emerged in the last decade as a pushing alternative for applications where automatic recognition of people is needed. Certainly, biometrics are very attractive and useful for the final user: forget about PINs and passwords, you are your own key. However, we cannot forget that as any technology aimed to provide a security service, biometric systems are exposed to external attacks which ...

Javier Galbally — Universidad Autonoma de Madrid


Iterative Joint Source-Channel Coding Techniques for Single and Multiterminal Sources in Communication Networks

In a communication system it results undoubtedly of great interest to compress the information generated by the data sources to its most elementary representation, so that the amount of power necessary for reliable communications can be reduced. It is often the case that the redundancy shown by a wide variety of information sources can be modelled by taking into account the probabilistic dependance among consecutive source symbols rather than the probabilistic distribution of a single symbol. These sources are commonly referred to as single or multiterminal sources "with memory" being the memory, in this latter case, the existing temporal correlation among the consecutive symbol vectors generated by the multiterminal source. It is well known that, when the source has memory, the average amount of information per source symbol is given by the entropy rate, which is lower than its entropy ...

Del Ser, Javier — University of Navarra (TECNUN)


Adaptive filtering algorithms for acoustic echo cancellation and acoustic feedback control in speech communication applications

Multimedia consumer electronics are nowadays everywhere from teleconferencing, hands-free communications, in-car communications to smart TV applications and more. We are living in a world of telecommunication where ideal scenarios for implementing these applications are hard to find. Instead, practical implementations typically bring many problems associated to each real-life scenario. This thesis mainly focuses on two of these problems, namely, acoustic echo and acoustic feedback. On the one hand, acoustic echo cancellation (AEC) is widely used in mobile and hands-free telephony where the existence of echoes degrades the intelligibility and listening comfort. On the other hand, acoustic feedback limits the maximum amplification that can be applied in, e.g., in-car communications or in conferencing systems, before howling due to instability, appears. Even though AEC and acoustic feedback cancellation (AFC) are functional in many applications, there are still open issues. This means that ...

Gil-Cacho, Jose Manuel — KU Leuven


Facial Soft Biometrics: Methods, Applications and Solutions

This dissertation studies soft biometrics traits, their applicability in different security and commercial scenarios, as well as related usability aspects. We place the emphasis on human facial soft biometric traits which constitute the set of physical, adhered or behavioral human characteristics that can partially differentiate, classify and identify humans. Such traits, which include characteristics like age, gender, skin and eye color, the presence of glasses, moustache or beard, inherit several advantages such as ease of acquisition, as well as a natural compatibility with how humans perceive their surroundings. Specifically, soft biometric traits are compatible with the human process of classifying and recalling our environment, a process which involves constructions of hierarchical structures of different refined traits. This thesis explores these traits, and their application in soft biometric systems (SBSs), and specifically focuses on how such systems can achieve different goals ...

Dantcheva, Antitza — EURECOM / Telecom ParisTech


Sparsity Models for Signals: Theory and Applications

Many signal and image processing applications have benefited remarkably from the theory of sparse representations. In its classical form this theory models signal as having a sparse representation under a given dictionary -- this is referred to as the "Synthesis Model". In this work we focus on greedy methods for the problem of recovering a signal from a set of deteriorated linear measurements. We consider four different sparsity frameworks that extend the aforementioned synthesis model: (i) The cosparse analysis model; (ii) the signal space paradigm; (iii) the transform domain strategy; and (iv) the sparse Poisson noise model. Our algorithms of interest in the first part of the work are the greedy-like schemes: CoSaMP, subspace pursuit (SP), iterative hard thresholding (IHT) and hard thresholding pursuit (HTP). It has been shown for the synthesis model that these can achieve a stable recovery ...

Giryes, Raja — Technion


Broadband angle of arrival estimation using polynomial matrix decompositions

This thesis is concerned with the problem of broadband angle of arrival (AoA) estimation for sensor arrays. There is a rich theory of narrowband solutions to the AoA problem, which typically involves the covariance matrix of the received data and matrix factorisations such as the eigenvalue decomposition (EVD) to reach optimality in various senses. For broadband arrays, such as found in sonar, acoustics or other applications where signals do not fulfil the narrowband assumption, working with phase shifts between different signals — as sufficient in the narrowband case — does not suffice and explicit lags need to be taken into account. The required space-time covariance matrix of the data now has a lag dimension, and classical solutions such as those based on the EVD are no longer directly applicable. There are a number of existing broadband AoA techniques, which are ...

Alrmah, Mohamed Abubaker — University of Strathclyde


Constrained Non-negative Matrix Factorization for Vocabulary Acquisition from Continuous Speech

One desideratum in designing cognitive robots is autonomous learning of communication skills, just like humans. The primary step towards this goal is vocabulary acquisition. Being different from the training procedures of the state-of-the-art automatic speech recognition (ASR) systems, vocabulary acquisition cannot rely on prior knowledge of language in the same way. Like what infants do, the acquisition process should be data-driven with multi-level abstraction and coupled with multi-modal inputs. To avoid lengthy training efforts in a word-by-word interactive learning process, a clever learning agent should be able to acquire vocabularies from continuous speech automatically. The work presented in this thesis is entitled \emph{Constrained Non-negative Matrix Factorization for Vocabulary Acquisition from Continuous Speech}. Enlightened by the extensively studied techniques in ASR, we design computational models to discover and represent vocabularies from continuous speech with little prior knowledge of the language to ...

Sun, Meng — Katholieke Universiteit Leuven

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.