Discrete-time speech processing with application to emotion recognition

The subject of this PhD thesis is the efficient and robust processing and analysis of audio recordings derived from a call center. The thesis comprises two parts. The first part is dedicated to dialogue/non-dialogue detection and to speaker segmentation. The systems developed are prerequisites for detecting (i) the audio segments that actually contain a dialogue between the system and the call center customer and (ii) the change points between the system and the customer. In this way, the volume of audio recordings that needs to be processed is significantly reduced, while the processing is automated. Several systems are developed to detect the presence of a dialogue. This is the first effort in the international literature in which the audio channel is exploited exclusively. Also, it is the first time that the speaker utterance ...

Kotti, Margarita — Aristotle University of Thessaloniki
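The abstract mentions speaker segmentation, i.e. finding the change points between the system and the customer. The excerpt does not say which detector the thesis uses; a common baseline for this task is the ΔBIC test, which compares one Gaussian for a window of feature frames against two Gaussians split at a candidate point. The sketch below assumes the frames are rows of a NumPy array (for example MFCC vectors), full-covariance Gaussians and a penalty weight `lam = 1`; the function names are hypothetical.

```python
import numpy as np


def delta_bic(X, t, lam=1.0):
    """Delta-BIC for splitting the frames X (N x d) at index t.

    Positive values favour two Gaussians, i.e. a speaker change at t.
    """
    N, d = X.shape

    def logdet_cov(Y):
        # Log-determinant of the maximum-likelihood full covariance of Y
        cov = np.atleast_2d(np.cov(Y, rowvar=False, bias=True))
        return np.linalg.slogdet(cov)[1]

    # Penalty for the extra Gaussian: d means plus d(d+1)/2 covariance entries
    penalty = 0.5 * (d + d * (d + 1) / 2) * np.log(N)
    return (0.5 * N * logdet_cov(X)
            - 0.5 * t * logdet_cov(X[:t])
            - 0.5 * (N - t) * logdet_cov(X[t:])
            - lam * penalty)


def find_change(X, margin=20, lam=1.0):
    """Return the split index with the largest Delta-BIC, or None if no split scores above 0."""
    candidates = range(margin, len(X) - margin)
    scores = [delta_bic(X, t, lam) for t in candidates]
    best = int(np.argmax(scores))
    return candidates[best] if scores[best] > 0 else None
```

The `margin` keeps enough frames on each side of a candidate split for a stable covariance estimate; in practice the test is run over sliding windows of a longer recording.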


Audio-visual processing and content management techniques, for the study of (human) bioacoustics phenomena

The present doctoral thesis aims at the development of new long-term, multi-channel, audio-visual processing techniques for the analysis of bioacoustics phenomena. The effort is focused on the study of the physiology of the gastrointestinal system, aiming at the support of medical research for the discovery of gastrointestinal motility patterns and the diagnosis of functional disorders. The term "processing" in this case is quite broad, incorporating the procedures of signal processing, content description, manipulation and analysis, that are applied to all the recorded bioacoustics signals, the auxiliary audio-visual surveillance information (for the monitoring of experiments and the subjects' status), and the extracted audio-video sequences describing the abdominal sound-field alterations. The thesis outline is as follows. The main objective of the thesis, which is the technological support of medical research, is presented in the first chapter. A quick problem definition is initially ...

Dimoulas, Charalampos — Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece


Bayesian methods for sparse and low-rank matrix problems

Many scientific and engineering problems require us to process measurements and data in order to extract information. Since we base decisions on information, it is important to design accurate and efficient processing algorithms. This is often done by modeling the signal of interest and the noise in the problem. One type of modeling is Compressed Sensing, where the signal has a sparse or low-rank representation. In this thesis we study different approaches to designing algorithms for sparse and low-rank problems. Greedy methods are fast algorithms for sparse problems that iteratively detect and estimate the non-zero components. By modeling the detection problem as an array processing problem and a Bayesian filtering problem, we improve the detection accuracy. Bayesian methods approximate the sparsity by probability distributions which are iteratively modified. We show one approach to making the Bayesian method the Relevance Vector ...

Sundin, Martin — Department of Signal Processing, Royal Institute of Technology KTH
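The greedy "detect, then estimate" loop described in this abstract can be illustrated with orthogonal matching pursuit (OMP), a standard greedy method. This is a minimal sketch of plain OMP, not the improved detectors developed in the thesis; it assumes a real-valued measurement matrix `A` with unit-norm columns and a known sparsity level `k`.

```python
import numpy as np


def omp(A, y, k):
    """Orthogonal matching pursuit: greedy k-sparse approximation of y = A x.

    Assumes the columns of A have unit norm.
    """
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        # Detection: pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Estimation: least-squares refit of all selected coefficients
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```

With a 60 x 128 Gaussian matrix and a 3-sparse vector, `omp(A, A @ x_true, k=3)` typically returns `x_true` up to rounding; the detection step is the part the thesis reports improving via array-processing and Bayesian-filtering models.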


Towards Automatic Extraction of Harmony Information from Music Signals

In this thesis we address the subject of automatic extraction of harmony information from audio recordings. We focus on chord symbol recognition and methods for evaluating algorithms designed to perform that task. We present a novel six-dimensional model for equal-tempered pitch space based on concepts from neo-Riemannian music theory. This model is employed as the basis of a harmonic change detection function which we use to improve the performance of a chord recognition algorithm. We develop a machine-readable text syntax for chord symbols and present a hand-labelled chord transcription collection of 180 Beatles songs annotated using this syntax. This collection has been made publicly available and is already widely used for evaluation purposes in the research community. We also introduce methods for comparing chord symbols which we subsequently use for analysing the statistics of the transcription collection. ...

Harte, Christopher — Queen Mary, University of London
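To make the idea of a six-dimensional pitch-space model driving a harmonic change detection function (HCDF) concrete, the sketch below projects each 12-bin chroma frame onto three circles (fifths, minor thirds, major thirds) to get a 6-D "tonal centroid", then measures how far the centroid moves between neighbouring frames. The angles and radii in `CIRCLES` are assumptions chosen so that each circle closes after 12, 4 and 3 steps respectively; the thesis's exact definition and any smoothing it applies may differ.

```python
import math

# Assumed (angle step per semitone, radius) for the circles of fifths,
# minor thirds and major thirds; these are illustrative choices.
CIRCLES = [(7 * math.pi / 6, 1.0), (3 * math.pi / 2, 1.0), (2 * math.pi / 3, 0.5)]


def tonal_centroid(chroma):
    """Map a 12-bin chroma vector to a 6-D tonal centroid."""
    total = sum(abs(c) for c in chroma)
    if total == 0:
        return [0.0] * 6
    centroid = []
    for step, radius in CIRCLES:
        s = sum(radius * math.sin(l * step) * c for l, c in enumerate(chroma))
        co = sum(radius * math.cos(l * step) * c for l, c in enumerate(chroma))
        centroid += [s / total, co / total]
    return centroid


def hcdf(chroma_frames):
    """Harmonic change at frame n: distance between the centroids of frames n-1 and n+1."""
    cents = [tonal_centroid(c) for c in chroma_frames]
    return [math.dist(cents[n + 1], cents[n - 1]) for n in range(1, len(cents) - 1)]
```

For four frames of a C major triad followed by four of G major, the output is zero within each chord and peaks around the boundary, which is where a chord recognizer would look for a new segment.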


Statistical-dynamical channel modeling of outdoor optical wireless links

The growing need for Earth observation and monitoring systems has stimulated considerable interest in free-space optical wireless (FSO) systems, because these applications require very high bandwidth. However, terrestrial FSO links are severely affected by weather conditions: to a large extent by dense fog and to a lesser extent by rain and snow. The proper deployment of FSO technology requires a better understanding of the free-space channel transmission characteristics, as these have a major influence on link properties such as availability, reliability and quality of service. This thesis provides new insight into fog microphysics, its characterization and fog attenuation modeling. A comprehensive analysis of the measured fog attenuations is presented, built around a comparison of the attenuations recorded at Graz (Austria), Milan (Italy), Nice (France) and Prague (Czech Republic). It was observed that fog attenuations in radiation fog ...

Awan, Muhammad Saleem — Graz University of Technology
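For context on what "fog attenuation modeling" computes, the sketch below implements the widely used visibility-based empirical model of Kim et al., which is not necessarily the model proposed in this thesis. It estimates the extinction coefficient from visibility `V` via the Koschmieder relation (3.91/V, 2 % contrast threshold at 550 nm), scales it with wavelength through a visibility-dependent exponent `q`, and converts it to dB/km. The boundary conventions at the breakpoints are an assumption.

```python
import math


def kim_q(visibility_km):
    """Wavelength exponent q of the Kim model (visibility in km)."""
    v = visibility_km
    if v > 50:
        return 1.6
    if v > 6:
        return 1.3
    if v > 1:
        return 0.16 * v + 0.34
    if v > 0.5:
        return v - 0.5
    return 0.0  # dense fog: no wavelength dependence


def fog_attenuation_db_per_km(visibility_km, wavelength_nm):
    """Specific fog attenuation in dB/km from visibility and wavelength."""
    # Extinction coefficient in 1/km (Koschmieder relation, scaled by wavelength)
    beta = 3.91 / visibility_km * (wavelength_nm / 550.0) ** (-kim_q(visibility_km))
    # 10*log10(e) ~= 4.343 converts 1/km to dB/km
    return 10 * math.log10(math.e) * beta
```

For dense fog with 200 m visibility this gives roughly 85 dB/km at 1550 nm, which illustrates why dense fog dominates FSO link availability; measured data such as those analysed in the thesis are needed to check where such empirical models break down.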


Probabilistic modeling for sensor fusion with inertial measurements

In recent years, inertial sensors have undergone major developments. The quality of their measurements has improved while their cost has decreased, leading to an increase in availability. They can be found in stand-alone sensor units, so-called inertial measurement units, but are nowadays also present in, for instance, any modern smartphone, Wii controllers and virtual reality headsets. The term inertial sensor refers to the combination of accelerometers and gyroscopes. These measure the external specific force and the angular velocity, respectively. Integration of their measurements provides information about the sensor’s position and orientation. However, the position and orientation estimates obtained by simple integration suffer from drift and are therefore only accurate on a short time scale. In order to improve these estimates, we combine the inertial sensors with additional sensors and models. To combine these different sources of information, also ...

Kok, Manon — Linköping University
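The drift of "simple integration" mentioned in this abstract can be shown with a few lines of dead reckoning. This sketch assumes a stationary, one-dimensional sensor whose only error is a constant bias; it illustrates the problem motivating sensor fusion, not the probabilistic models of the thesis.

```python
def integrate_gyro(rates, dt, bias=0.0):
    """Integrate angular-velocity samples (rad/s) into an angle (rad)."""
    angle, out = 0.0, []
    for w in rates:
        angle += (w + bias) * dt  # Euler step; the bias is integrated too
        out.append(angle)
    return out


def double_integrate(acc, dt, bias=0.0):
    """Integrate acceleration samples (m/s^2) twice into a position (m)."""
    vel = pos = 0.0
    out = []
    for a in acc:
        vel += (a + bias) * dt    # velocity error grows linearly in time
        pos += vel * dt           # position error grows quadratically in time
        out.append(pos)
    return out
```

With a 0.01 m/s² accelerometer bias sampled at 100 Hz, a sensor lying still appears to move about 18 m after one minute (0.5 · 0.01 · 60²), while a 0.001 rad/s gyroscope bias gives a 0.06 rad orientation error; this is why additional sensors and models are needed beyond short time scales.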


Antenna Arrays for Multipath and Interference Mitigation in GNSS Receivers

This thesis deals with the synchronization of one or several replicas of a known signal received in a scenario with multipath propagation and directional interference. A connecting theme along this work is the systematic application of the maximum likelihood (ML) principle together with a signal model in which the spatial signatures are unstructured and the noise term is Gaussian-distributed with an unknown correlation matrix. This last assumption is key in obtaining estimators that are capable of mitigating the disturbing signals that exhibit a certain structure, and this is achieved without resorting to the estimation of the parameters of those signals. On the other hand, the assumption of unstructured spatial signatures is interesting from a practical standpoint and facilitates the estimation problem since the estimates of these signatures can be obtained in closed form. This constitutes a first step towards ...

Seco-Granados, Gonzalo — Universitat Politecnica de Catalunya


Contributions to signal analysis and processing using compressed sensing techniques

Chapter 2 contains a short introduction to the fundamentals of compressed sensing theory, which is the larger context of this thesis. We start by introducing the key concepts of sparsity and sparse representations of signals. We discuss the central problem of compressed sensing, i.e. how to adequately recover sparse signals from a small number of measurements, as well as the multiple formulations of the reconstruction problem. A large part of the chapter is devoted to some of the most important conditions necessary and/or sufficient to guarantee accurate recovery. The aim is to introduce the reader to the basic results, without the burden of detailed proofs. In addition, we also present a few of the popular reconstruction and optimization algorithms that we use throughout the thesis. Chapter 3 presents an alternative sparsity model known as analysis sparsity, that offers similar recovery ...

Cleju, Nicolae — "Gheorghe Asachi" Technical University of Iasi
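The excerpt does not name the reconstruction algorithms used in the thesis. As one standard example of solving an ℓ1 formulation of the reconstruction problem, the sketch below implements the iterative soft-thresholding algorithm (ISTA) for min over x of ½‖y − Ax‖² + λ‖x‖₁, assuming a real-valued matrix `A` and a fixed iteration count.

```python
import numpy as np


def ista(A, y, lam, n_iter=500):
    """Iterative soft-thresholding for min_x 0.5*||y - A x||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L      # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x
```

With `A` equal to the identity the problem decouples and the result is simply `y` soft-thresholded by `lam`, which is a convenient sanity check; for underdetermined `A` the step size 1/L guarantees convergence, though often slowly.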


Variational Sparse Bayesian Learning: Centralized and Distributed Processing

In this thesis we investigate centralized and distributed variants of sparse Bayesian learning (SBL), an effective probabilistic regression method used in machine learning. Since inference in an SBL model is not tractable in closed form, approximations are needed. We focus on the variational Bayesian approximation, as opposed to others used in the literature, for three reasons: First, it is a flexible general framework for approximate Bayesian inference that estimates probability densities including point estimates as a special case. Second, it has guaranteed convergence properties. And third, it is a deterministic approximation concept that is even applicable for high dimensional problems where non-deterministic sampling methods may be prohibitive. We resolve some inconsistencies in the literature concerning other SBL approximation techniques with regard to a proper Bayesian treatment and the incorporation of a highly desirable property, namely scale invariance. More specifically, ...

Buchgraber, Thomas — Graz University of Technology
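A minimal centralized variational SBL iteration, for readers unfamiliar with the model, is sketched below. It assumes a linear model y = Φw + e with a *known* noise precision `beta` (a simplification; the thesis treats fuller models, scale invariance and distributed variants), a Gaussian prior on each weight with precision αᵢ, and Gamma(a, b) hyperpriors on the αᵢ. The mean-field updates alternate between the Gaussian q(w) and the Gamma q(αᵢ), keeping only the mean of the latter.

```python
import numpy as np


def vb_sbl(Phi, y, beta, a=1e-6, b=1e-6, n_iter=200):
    """Mean-field variational SBL with a known noise precision beta (assumption)."""
    n = Phi.shape[1]
    alpha = np.ones(n)                     # E[alpha_i], prior weight precisions
    PtP, Pty = Phi.T @ Phi, Phi.T @ y
    for _ in range(n_iter):
        # q(w) = N(mu, Sigma)
        Sigma = np.linalg.inv(beta * PtP + np.diag(alpha))
        mu = beta * Sigma @ Pty
        # q(alpha_i) = Gamma(a + 1/2, b + E[w_i^2] / 2); keep its mean
        alpha = (a + 0.5) / (b + 0.5 * (mu ** 2 + np.diag(Sigma)))
    return mu, alpha
```

Irrelevant weights receive very large precisions and are shrunk towards zero, which is the sparsity mechanism of SBL; the `alpha` values returned can be thresholded to prune basis functions.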


Automatic Speaker Characterization; Identification of Gender, Age, Language and Accent from Speech Signals

Speech signals carry important information about a speaker such as age, gender, language, accent and emotional/psychological state. Automatic recognition of speaker characteristics has a wide range of commercial, medical and forensic applications such as interactive voice response systems, service customization, natural human-machine interaction, recognizing the type of pathology of speakers, and directing the forensic investigation process. This research aims to develop accurate methods and tools to identify different physical characteristics of the speakers. Owing to the lack of suitable databases, our experiments cover, among all speaker characteristics, gender recognition, age estimation, language recognition and accent/dialect identification. However, similar approaches and techniques can be applied to identify other characteristics such as emotional/psychological state. For speaker characterization, we first convert variable-duration speech signals into fixed-dimensional vectors suitable for classification/regression algorithms. This is performed by fitting a probability density function to acoustic ...

Bahari, Mohamad Hasan — KU Leuven
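The abstract is cut off before naming the density model used to turn variable-duration speech into fixed-dimensional vectors. One common way to do this, assumed here, is a GMM mean supervector: MAP-adapt the means of a pre-trained diagonal-covariance GMM (a "universal background model", whose `weights`, `means` and `variances` are hypothetical inputs) to an utterance's frames and concatenate the adapted means. The relevance factor of 16 is a conventional choice, not taken from the thesis.

```python
import numpy as np


def gmm_mean_supervector(frames, weights, means, variances, relevance=16.0):
    """Fixed-length vector (C*d) from a variable-length (T x d) array of frames."""
    X = np.asarray(frames, dtype=float)
    # Log-likelihood of each frame under each diagonal Gaussian component: (T, C)
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)   # stabilise before exp
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)           # responsibilities gamma_tc
    n = post.sum(axis=0)                              # zeroth-order statistics (C,)
    F = post.T @ X                                    # first-order statistics (C, d)
    # MAP adaptation: interpolate between the data mean and the prior mean
    adapted = (F + relevance * means) / (n + relevance)[:, None]
    return adapted.ravel()
```

Utterances of 37 or 400 frames both map to vectors of length C·d, which can then be fed to any classifier or regressor; components that see little data stay close to the background means.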


Fitting maximum-entropy models on large sample spaces

This thesis investigates the iterative application of Monte Carlo methods to the problem of parameter estimation for models of maximum entropy, minimum divergence, and maximum likelihood among the class of exponential-family densities. It describes a suite of tools for applying such models to large domains in which exact computation is not practically possible. The first result is a derivation of estimators for the Lagrange dual of the entropy and its gradient using importance sampling from a measure on the same probability space or its image under the transformation induced by the canonical sufficient statistic. This yields two benefits. One is the flexibility to choose an auxiliary distribution for sampling that reduces the standard error of the estimates for a given sample size. The other is the opportunity to re-weight a fixed sample iteratively to reduce the computational burden for each ...

Schofield, Edward — Imperial College London
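The importance-sampling idea in this abstract, estimating the Lagrange dual and its gradient from a fixed sample that is re-weighted for each new parameter value, can be sketched for a model p_θ(x) ∝ p₀(x) exp(θ f(x)). For brevity this sketch assumes a single scalar feature `f`, a target expectation `target`, precomputed log-ratios log p₀(x)/q(x) for the sample drawn from the auxiliary distribution q, and plain gradient descent; the thesis's estimators and optimizer may differ.

```python
import math


def is_dual_and_grad(theta, sample, log_p0_over_q, f, target):
    """Importance-sampling estimates of the dual log Z(theta) - theta*K and its gradient.

    `sample` is drawn once from the auxiliary distribution q and reused for every theta.
    """
    log_w = [lr + theta * f(x) for x, lr in zip(sample, log_p0_over_q)]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]          # stabilised importance weights
    s = sum(w)
    log_z = m + math.log(s / len(sample))           # log of the mean weight
    mean_f = sum(wi * f(x) for wi, x in zip(w, sample)) / s  # self-normalised E_theta[f]
    return log_z - theta * target, mean_f - target


def fit(sample, log_p0_over_q, f, target, step=0.1, n_iter=500):
    """Minimise the estimated dual by gradient descent, re-weighting the same sample."""
    theta = 0.0
    for _ in range(n_iter):
        _, grad = is_dual_and_grad(theta, sample, log_p0_over_q, f, target)
        theta -= step * grad
    return theta
```

At the optimum the gradient vanishes, i.e. the re-weighted sample reproduces the target expectation of `f`; only the weights change between iterations, so no new samples are drawn.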


Biological Image Analysis

In biological research, images are extensively used to monitor growth, dynamics and changes in biological specimens, such as cells or plants. Many of these images are used solely for observation or are manually annotated by an expert. In this dissertation we discuss several methods to automate the annotation and analysis of bio-images. Two large clusters of methods have been investigated and developed. A first set of methods focuses on the automatic delineation of relevant objects in bio-images, such as individual cells in microscopic images. Since these methods should be useful for many different applications, e.g. to detect and delineate different objects (cells, plants, leaves, ...) in different types of images (different types of microscopes, regular colour photographs, ...), the methods should be easy to adjust. Therefore we developed a methodology relying on probability theory, where all required parameters can easily ...

De Vylder, Jonas — Ghent University


Performance Analysis and Algorithm Design for Distributed Transmit Beamforming

Wireless sensor networks have been one of the major research topics in recent years because of their great potential for a wide range of applications. In some application scenarios, sensor nodes need to report sensing data to a far-field destination, which cannot be achieved with traditional transmission techniques. Owing to the energy limitations and hardware constraints of sensor nodes, distributed transmit beamforming is considered an attractive candidate for long-range communication in such scenarios, as it can reduce the energy requirement of each sensor node and extend the communication range. However, unlike conventional beamforming, which is performed by a centralized antenna array, distributed beamforming is performed by a virtual antenna array composed of randomly located sensor nodes, each of which has an independent oscillator. Sensor nodes have to coordinate with each other and adjust their transmitted signals to collaboratively ...

Song, Shuo — University of Edinburgh
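Why nodes with independent oscillators must coordinate their phases can be seen from the received power of N unit-amplitude carriers: it is N² when the phases are aligned at the destination, but only about N on average when they are random. The sketch below shows this, together with a well-known one-bit feedback scheme in which nodes randomly perturb their phases and keep a perturbation only if the receiver reports higher power. The scheme, the perturbation size `delta` and the idealized noiseless channel are assumptions for illustration, not the algorithms analysed in the thesis.

```python
import cmath
import random


def received_power(phases):
    """Power of the sum of unit-amplitude carriers with the given phases (rad)."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) ** 2


def one_bit_feedback(channel_phases, n_iter=2000, delta=0.1, seed=0):
    """Align carriers at the receiver using only 'better / not better' feedback."""
    rng = random.Random(seed)
    tx = [0.0] * len(channel_phases)               # transmit phase adjustments
    best = received_power([c + t for c, t in zip(channel_phases, tx)])
    for _ in range(n_iter):
        trial = [t + rng.uniform(-delta, delta) for t in tx]  # random perturbation
        power = received_power([c + t for c, t in zip(channel_phases, trial)])
        if power > best:                           # one bit of feedback: keep if improved
            tx, best = trial, power
    return best
```

For ten nodes with random initial phases, the returned power approaches the coherent maximum of 100, compared with an expected value of about 10 without coordination.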


Distributed Localization and Tracking of Acoustic Sources

Localization, separation and tracking of acoustic sources are age-old challenges that many animals and humans solve intuitively, sometimes with impressive accuracy. Artificial methods have been developed for various applications and conditions. Most of these methods are centralized, meaning that all signals are processed together to produce the estimates. Distributed sensor networks are becoming more realistic as technology advances in the fields of nanotechnology, micro-electro-mechanical systems (MEMS) and communication. A distributed sensor network comprises scattered nodes, which are autonomous, self-powered modules with sensing, actuation and communication capabilities. A variety of layouts and connectivity graphs are used. Distributed sensor networks have a broad range of applications, which can be categorized into ecology, military, environmental monitoring, medical, security and surveillance. In this dissertation we develop algorithms for distributed sensor networks ...

Dorfan, Yuval — Bar Ilan University


Acoustic sensor network geometry calibration and applications

In the modern world, we are increasingly surrounded by computing devices with communication links and one or more microphones, for example smartphones, tablets, laptops or hearing aids. These devices can work together as nodes in an acoustic sensor network (ASN). Such networks are a growing platform that opens the possibility for many practical applications. ASN-based speech enhancement, source localization, and event detection can be applied for teleconferencing, camera control, automation, or assisted living. For such applications, awareness of auditory objects and their spatial positions is key. In order to provide these two kinds of information, novel methods have been developed in this thesis. Information on the type of auditory objects is provided by a novel real-time sound classification method. Information on the position of human speakers is provided by a novel localization ...

Plinge, Axel — TU Dortmund University
