Audio-visual processing and content management techniques, for the study of (human) bioacoustics phenomena (2006)
Mixed structural models for 3D audio in virtual environments
In the world of information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflected, e.g., in the development of complexity-hiding services that let people personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. To achieve this, realistic multimodal models of our environment are needed, in particular models that accurately describe the acoustics of the environment and communication through the auditory modality. Examples of currently active research directions and application areas include 3DTV and the future internet, 3D visual-sound scene coding, transmission and reconstruction, and teleconferencing systems, to name but ...
Geronazzo, Michele — University of Padova
Detection of epileptic seizures based on video and accelerometer recordings
Epilepsy is one of the most common neurological diseases, especially in children. Although the majority of patients (70%-75%) can be treated through medication or surgery, a significant group of patients cannot. For this latter group, it is advisable to follow the evolution of the disease. This can be done through long-term automatic monitoring, which gives an objective measure of the number of seizures that the patient has, for example during the night. In addition, there is less supervision overnight, so parents or caregivers can miss some seizures. In severe seizures, however, it is sometimes necessary to intervene to avoid dangerous situations during or after the seizure (e.g. the danger of suffocation caused by vomiting or a position that obstructs breathing, or the risk of injury during violent movements), and to comfort ...
Cuppens, Kris — Katholieke Universiteit Leuven
Localizing the bioelectric phenomena that originate in the cerebral cortex and are evoked by auditory and somatosensory stimuli is a clear objective, both to understand how the brain works and to recognize different pathologies. Diseases such as Parkinson's, Alzheimer's, schizophrenia and epilepsy are intensively studied to find a cure or an accurate diagnosis. Epilepsy is considered the most prevalent of the disorders of neurological origin. The recurrent and sudden incidence of seizures can lead to dangerous and possibly life-threatening situations. Since disturbance of consciousness and sudden loss of motor control often occur without any warning, the ability to predict epileptic seizures would reduce patients' anxiety, thus considerably improving their quality of life and safety. The common procedure for epileptic seizure detection is based on monitoring brain activity via electroencephalogram (EEG) data. This process consumes a lot of time, especially in the case of long ...
Guerrero-Mosquera, Carlos — University Carlos III of Madrid
Cognitive Models for Acoustic and Audiovisual Sound Source Localization
Sound source localization algorithms have a long research history in the field of digital signal processing. Many common applications, such as intelligent personal assistants, teleconferencing systems and methods for technical diagnosis in acoustics, require an accurate localization of sound sources in the environment. However, dynamic environments pose a particular challenge for these systems. Voice-controlled smart home applications, for instance, where the speaker as well as potential noise sources move within the room, are a typical example of dynamic environments. Classical sound source localization systems have only limited capabilities to deal with dynamic acoustic scenarios. In this thesis, three novel approaches to sound source localization that extend existing classical methods are presented. The first system is proposed in the context of audiovisual source localization. Determining the position of sound sources in adverse acoustic conditions can be improved by including ...
Schymura, Christopher — Ruhr University Bochum
Algorithmic Analysis of Complex Audio Scenes
In this thesis, we examine the problem of algorithmic analysis of complex audio scenes with a special emphasis on natural audio scenes. One of the driving goals behind this work is to develop tools for monitoring the presence of animals in areas of interest based on their vocalisations. This task, which often occurs in the evaluation of nature conservation measures, leads to a number of subproblems in audio scene analysis. In order to develop and evaluate pattern recognition algorithms for animal sounds, a representative collection of such sounds is necessary. Building such a collection is beyond the scope of a single researcher, and we therefore use data from the Animal Sound Archive of the Humboldt University of Berlin. Although a large portion of well-annotated recordings from this archive has been available in digital form, little infrastructure for searching and ...
Bardeli, Rolf — University of Bonn
Mining the ECG: Algorithms and Applications
This research focuses on the development of algorithms to extract diagnostic information from the ECG signal, which can be used to improve automatic detection systems and home monitoring solutions. In the first part of this work, a generically applicable algorithm for model selection in kernel principal component analysis is presented, which was inspired by the derivation of respiratory information from the ECG signal. This method not only solves a problem in biomedical signal processing but, more importantly, offers a solution to a long-standing problem in the field of machine learning. Next, a methodology to quantify the level of contamination in a segment of ECG is proposed. This level is used to detect artifacts and to improve the performance of different classifiers by removing these artifacts from the training set. Furthermore, an evaluation of three different methodologies to compute the ECG-derived ...
Varon, Carolina — KU Leuven
Denoising and Features Extraction of ECG Signals using Unbiased FIR Estimation Techniques
Electrocardiogram (ECG) signals carry fundamental information that medical experts use to make decisions about heart diseases. Therefore, in past decades the scientific community has made great efforts to develop methods, using different strategies, for extracting heartbeat features from ECG records with the highest accuracy and efficiency. Noise and artifacts induced by external factors, however, make it difficult to learn the specific patterns of ECG signals that play an important role in finding abnormalities. Filtering techniques such as the unbiased finite impulse response (UFIR) filtering approach promise better results. Aiming to extract the features with the highest accuracy, in this dissertation we have designed the adaptive UFIR filter and smoother and applied them to ECG signals. We also compared the proposed technique with traditional methods such as UFIR predictors, standard filters (e.g. low-pass filter), ...
Lastre Dominguez, Carlos Mauricio — Universidad de Guanajuato
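The UFIR approach named in the entry above can be illustrated with its simplest case. The following is a minimal sketch, assuming a ramp (first-degree polynomial) signal model and a fixed horizon N. It is the standard batch UFIR filter for that model, not the adaptive filter and smoother developed in the thesis, and the function names are hypothetical.

```python
# Illustrative sketch only: a fixed-horizon ramp UFIR filter.
# Function names are hypothetical, not taken from the thesis.

def ufir_ramp_gains(n: int) -> list[float]:
    """Gains h[k] applied to the sample k steps in the past (k = 0 .. n-1).

    For a ramp model these gains equal a least-squares line fit over the last
    n samples evaluated at the newest sample, so they sum to 1 and reproduce
    any noiseless ramp exactly (the "unbiased" property).
    """
    if n < 2:
        raise ValueError("horizon n must be at least 2 for a ramp model")
    return [2.0 * (2 * n - 1 - 3 * k) / (n * (n + 1)) for k in range(n)]


def ufir_filter(x: list[float], n: int) -> list[float]:
    """Filter a sequence with the ramp UFIR gains; no noise statistics are needed."""
    h = ufir_ramp_gains(n)
    y = []
    for i in range(len(x)):
        if i < n - 1:
            # Not enough history yet: pass the raw sample through (a simple choice).
            y.append(float(x[i]))
        else:
            y.append(sum(h[k] * x[i - k] for k in range(n)))
    return y
```

The horizon N is the only tuning parameter. A longer horizon averages out more noise but follows rapid changes in the signal less closely, which is the trade-off an adaptive UFIR scheme would need to manage.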
Digital Forensic Techniques for Splicing Detection in Multimedia Contents
Visual and audio contents have always played a key role in communications because of their immediacy and presumed objectivity. This has become even more true in the digital era, and today it is common for multimedia contents to stand as proof of events. Digital contents, however, are also very easy to manipulate, which calls for analysis methods devoted to uncovering their processing history. Multimedia forensics is the science that tries to answer questions about the past of a given image, audio or video file, questions like “which was the recording device?” or “is the content authentic?”. In particular, authenticity assessment is a crucial task in many contexts, and it usually consists of determining whether the investigated object has been artificially created by splicing together different contents. In this thesis we address the problem of splicing detection in the three main media: image, ...
Fontani, Marco — Dept. of Information Engineering and Mathematics, University of Siena
Perceptually-Based Signal Features for Environmental Sound Classification
This thesis addresses the problem of automatically classifying environmental sounds, i.e., any non-speech or non-music sounds that can be found in the environment. Broadly speaking, two main processes are needed to perform such classification: signal feature extraction, which composes representative sound patterns, and the machine learning technique that classifies those patterns. The main focus of this research is on the former, studying relevant signal features that optimally represent the sound characteristics since, according to several references, this is a key issue for attaining robust recognition. This type of audio signal differs in many ways from speech or music signals, so specific features should be determined and adapted to its own characteristics. In this sense, new signal features, inspired by the human auditory system and the human perception of sound, are proposed to improve ...
Valero, Xavier — La Salle-Universitat Ramon Llull
Modulation Spectrum Analysis for Noisy Electrocardiogram Signal Processing and Applications
Advances in wearable electrocardiogram (ECG) monitoring devices have allowed for new cardiovascular applications to emerge beyond diagnostics, such as stress and fatigue detection, athletic performance assessment, sleep disorder characterization, mood recognition, activity surveillance, biometrics, and fitness tracking, to name a few. Such devices, however, are prone to artifacts, particularly due to movement, thus hampering heart rate and heart rate variability measurement and posing a serious threat to cardiac monitoring applications. To address these issues, this thesis proposes the use of a spectro-temporal signal representation called “modulation spectrum”, which is shown to accurately separate cardiac and noise components from the ECG signals, thus opening doors for noise-robust ECG signal processing tools and applications. First, an innovative ECG quality index based on the modulation spectral signal representation is proposed. The representation quantifies the rate-of-change of ECG spectral components, which are shown to ...
Tobon Vallejo, Diana Patricia — INRS-EMT
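As a rough illustration of the representation named in the entry above, the sketch below computes a modulation spectrum as a short-time magnitude spectrum followed by a second Fourier transform across frames. It assumes rectangular windows and a naive DFT, and the function names are hypothetical; the frame sizes, windows and frequency groupings used in the thesis are not given here.

```python
# Illustrative sketch only: a generic modulation spectrum (STFT magnitude,
# then a DFT across frames). Function names are hypothetical.
import cmath
import math


def dft_magnitude(x: list[float]) -> list[float]:
    """Magnitude of the naive DFT of x (O(N^2); fine for a short illustration)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def modulation_spectrum(x: list[float], frame: int, hop: int) -> list[list[float]]:
    """Return ms[f][m]: magnitude at acoustic bin f and modulation bin m.

    Step 1: short-time magnitude spectra |X(t, f)| over frames (rectangular window).
    Step 2: for each acoustic bin f, a second DFT across the frame index t
            measures how fast that spectral component changes over time.
    """
    frames = [x[s:s + frame] for s in range(0, len(x) - frame + 1, hop)]
    if not frames:
        raise ValueError("signal shorter than one frame")
    stft = [dft_magnitude(fr) for fr in frames]  # stft[t][f]
    n_bins = frame // 2 + 1                      # keep non-negative frequencies
    return [dft_magnitude([stft[t][f] for t in range(len(frames))])
            for f in range(n_bins)]
```

A spectral component with constant magnitude puts all of its modulation energy in modulation bin 0, while components that fluctuate from frame to frame spread energy into higher modulation bins.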
Digital Processing Based Solutions for Life Science Engineering Recognition Problems
The field of Life Science Engineering (LSE) is rapidly expanding and predicted to grow strongly in the next decades. It covers food and medical research, plant and pest research, and environmental research. In each research area, engineers try to find equations that model a certain life science problem. Once these are found, they research different numerical techniques to solve for the unknown variables of these equations. Afterwards, improvements to the solution are sought by adopting more accurate conventional techniques or by developing novel algorithms. In particular, signal and image processing techniques are widely used to solve those LSE problems that require pattern recognition. However, due to the continuous evolution of life science problems and their nature, these solution techniques cannot cover all aspects and therefore demand further enhancement and improvement. The thesis presents numerical algorithms of digital signal and image processing to ...
Hussein, Walid — Technische Universität München
Sound Event Detection by Exploring Audio Sequence Modelling
Everyday sounds in real-world environments are a powerful source of information by which humans can interact with their environments. Humans can infer what is happening around them by listening to everyday sounds. At the same time, it is a challenging task for a computer algorithm in a smart device to automatically recognise, understand, and interpret everyday sounds. Sound event detection (SED) is the process of transcribing an audio recording into sound event tags with onset and offset time values. This involves classification and segmentation of sound events in the given audio recording. SED has numerous applications in everyday life which include security and surveillance, automation, healthcare monitoring, multimedia information retrieval, and assisted living technologies. SED is to everyday sounds what automatic speech recognition (ASR) is to speech and automatic music transcription (AMT) is to music. The fundamental questions in designing ...
Pankajakshan, Arjun — Queen Mary University of London
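The output format described in the entry above, event tags with onset and offset times, can be made concrete with a small post-processing step. This is a minimal sketch, assuming a classifier has already produced a binary active/inactive decision per frame for one sound class at a fixed hop; the class label, hop size and function name are hypothetical, and this is a generic step rather than the sequence models explored in the thesis.

```python
# Illustrative sketch only: turning frame-wise decisions into (label, onset, offset)
# events. Names are hypothetical, not taken from the thesis.
from typing import NamedTuple


class SoundEvent(NamedTuple):
    label: str
    onset: float   # seconds
    offset: float  # seconds


def activity_to_events(activity: list[int], label: str, hop_s: float) -> list[SoundEvent]:
    """Convert a frame-wise 0/1 activity sequence for one class into events.

    Frame i is assumed to cover [i * hop_s, (i + 1) * hop_s).
    """
    events = []
    start = None
    for i, active in enumerate(activity):
        if active and start is None:
            start = i                      # rising edge: event onset
        elif not active and start is not None:
            events.append(SoundEvent(label, start * hop_s, i * hop_s))
            start = None                   # falling edge: event offset
    if start is not None:                  # event still active at the end
        events.append(SoundEvent(label, start * hop_s, len(activity) * hop_s))
    return events
```

For example, `activity_to_events([0, 1, 1, 0, 1], "dog_bark", 0.5)` yields one event from 0.5 s to 1.5 s and another from 2.0 s to 2.5 s.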
Audio Visual Speech Enhancement
This thesis presents a novel approach to speech enhancement by exploiting the bimodality of speech production and the correlation that exists between audio and visual speech information. An analysis of the correlation between a range of audio and visual features reveals significant correlation between visual speech features and audio filterbank features. The amount of correlation was also found to be greater when analysed within individual phonemes rather than across all phonemes. This led to building a Gaussian Mixture Model (GMM) that is capable of estimating filterbank features from visual features. Phoneme-specific GMMs gave lower filterbank estimation errors, with the phoneme transcription decoded using an audio-visual Hidden Markov Model (HMM). Clean filterbank estimates along with mean noise estimates were then utilised to construct visually-derived Wiener filters that are able to enhance noisy speech. The mean noise ...
Almajai, Ibrahim — University of East Anglia
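The last processing step in the entry above, building Wiener filters from clean filterbank estimates and noise estimates, reduces per channel to the classic gain S / (S + N). This is a minimal sketch, assuming both estimates are given as powers in the same filterbank channels; how the thesis obtains the visual estimates (the phoneme-specific GMMs) and maps channel gains back onto the noisy spectrum is not shown, and the function name is hypothetical.

```python
# Illustrative sketch only: per-channel Wiener gains from power estimates.
# The function name is hypothetical, not taken from the thesis.

def wiener_gains(clean_est: list[float], noise_est: list[float]) -> list[float]:
    """Per-channel Wiener gain G = S / (S + N).

    clean_est: estimated clean-speech power in each filterbank channel
    noise_est: estimated noise power in the same channels
    """
    if len(clean_est) != len(noise_est):
        raise ValueError("channel counts differ")
    gains = []
    for s, n in zip(clean_est, noise_est):
        total = s + n
        # Guard against empty channels: no estimated power means no gain.
        gains.append(s / total if total > 0 else 0.0)
    return gains
```

Channels where the clean-speech estimate dominates get a gain close to 1, and channels dominated by noise are attenuated toward 0; the gains would then scale the noisy signal in the corresponding channels.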
Acoustic sensor network geometry calibration and applications
In the modern world, we are increasingly surrounded by computing devices with communication links and one or more microphones. Such devices are, for example, smartphones, tablets, laptops or hearing aids. These devices can work together as nodes in an acoustic sensor network (ASN). Such networks are a growing platform that opens the possibility for many practical applications. ASN-based speech enhancement, source localization, and event detection can be applied to teleconferencing, camera control, automation, or assisted living. For these kinds of applications, the awareness of auditory objects and their spatial positions are key properties. In order to provide these two kinds of information, novel methods have been developed in this thesis. Information on the type of auditory objects is provided by a novel real-time sound classification method. Information on the position of human speakers is provided by a novel localization ...
Plinge, Axel — TU Dortmund University
Wavelet Analysis For Robust Speech Processing and Applications
In this work, we study the application of wavelet analysis to robust speech processing. Reliable time-scale (TS) features, which characterize the relevant phonetic classes such as voiced (V), unvoiced (UV), silence (S), mixed-excitation, and stop sounds, are extracted. When neural and Bayesian networks are trained, the classification rates obtained with only 7 TS features are mostly similar to those obtained with 13 MFCC features. The TS features are further enhanced to design a reliable and low-complexity V/UV/S classifier. Quantile filtering and slope tracking are used for deriving adaptive thresholds. A robust voice activity detector is then built and used as a pre-processing stage to improve the performance of a speaker verification system. Based on wavelet shrinkage, a statistical wavelet filtering (SWF) method is designed for speech enhancement. Non-stationary and colored noise is handled by employing quantile filtering and time-frequency adaptive ...
Pham, Van Tuan — Graz University of Technology
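The wavelet-shrinkage principle behind the SWF method in the entry above can be sketched in a few lines: transform, shrink small detail coefficients toward zero, and invert. This is a minimal sketch, assuming a single-level Haar transform and a fixed soft threshold t. The thesis's method instead adapts its thresholds to non-stationary and colored noise via quantile filtering and time-frequency adaptation, which is not reproduced here, and all names are hypothetical.

```python
# Illustrative sketch only: single-level Haar wavelet shrinkage with a fixed
# soft threshold. Names are hypothetical, not taken from the thesis.
import math


def haar_forward(x: list[float]) -> tuple[list[float], list[float]]:
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    if len(x) % 2:
        raise ValueError("length must be even")
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx: list[float], detail: list[float]) -> list[float]:
    """Invert haar_forward exactly (up to rounding)."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def soft_threshold(c: float, t: float) -> float:
    """Shrink a coefficient toward zero by t; values within [-t, t] become 0."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def shrink_denoise(x: list[float], t: float) -> list[float]:
    """Keep approximation coefficients, soft-threshold the detail coefficients."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, [soft_threshold(d, t) for d in detail])
```

With t = 0 the input is reconstructed unchanged; as t grows, small detail coefficients (where broadband noise tends to concentrate) are zeroed and the output approaches the pairwise averages of the input.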