Algorithmic Analysis of Complex Audio Scenes

In this thesis, we examine the problem of algorithmic analysis of complex audio scenes with a special emphasis on natural audio scenes. One of the driving goals behind this work is to develop tools for monitoring the presence of animals in areas of interest based on their vocalisations. This task, which often occurs in the evaluation of nature conservation measures, leads to a number of subproblems in audio scene analysis. In order to develop and evaluate pattern recognition algorithms for animal sounds, a representative collection of such sounds is necessary. Building such a collection is beyond the scope of a single researcher and we therefore use data from the Animal Sound Archive of the Humboldt University of Berlin. Although a large portion of well annotated recordings from this archive has been available in digital form, little infrastructure for searching and ...

Bardeli, Rolf — University of Bonn


Speech recognition in noisy conditions using missing feature approach

The research in this thesis addresses the problem of automatic speech recognition in noisy environments. Automatic speech recognition systems obtain acceptable performances in noise-free conditions but these performances degrade dramatically in the presence of additive noise. This is mainly due to the mismatch between the training and the noisy operating conditions. In the time-frequency representation of the noisy speech signal, some of the clean speech features are masked by noise. In this case the clean speech features cannot be correctly estimated from the noisy speech and therefore they are considered as missing or unreliable. In order to improve the performance of speech recognition systems in additive noise conditions, special attention should be paid to the problems of detection and compensation of these unreliable features. This thesis is concerned with the problem of missing features applied to automatic speaker-independent speech recognition. ...

Renevey, Philippe — Swiss Federal Institute of Technology


Motion Analysis and Modeling for Activity Recognition and 3-D Animation based on Geometrical and Video Processing Algorithms

The analysis of audiovisual data aims at extracting high-level information equivalent to that which a human can extract. It is considered a fundamental problem that remains unsolved in its general form. Even though the inverse problem, audiovisual (sound and animation) synthesis, is judged easier than the former, it also remains unsolved. The systematic research on these problems yields solutions that constitute the basis for a great number of continuously developing applications. In this thesis, we examine the two aforementioned fundamental problems. We propose algorithms and models of analysis and synthesis of articulated motion and undulatory (snake) locomotion, using data from video sequences. The goal of this research is multilevel information extraction from video, such as object tracking and activity recognition, and 3-D animation synthesis in virtual environments based on the results of the analysis. An ...

Panagiotakis, Costas — University of Crete


Audio-visual processing and content management techniques, for the study of (human) bioacoustics phenomena

The present doctoral thesis aims towards the development of new long-term, multi-channel, audio-visual processing techniques for the analysis of bioacoustics phenomena. The effort is focused on the study of the physiology of the gastrointestinal system, aiming at the support of medical research for the discovery of gastrointestinal motility patterns and the diagnosis of functional disorders. The term "processing" in this case is quite broad, incorporating the procedures of signal processing, content description, manipulation and analysis, that are applied to all the recorded bioacoustics signals, the auxiliary audio-visual surveillance information (for the monitoring of experiments and the subjects' status), and the extracted audio-video sequences describing the abdominal sound-field alterations. The thesis outline is as follows. The main objective of the thesis, which is the technological support of medical research, is presented in the first chapter. A quick problem definition is initially ...

Dimoulas, Charalampos — Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece


Video Processing for Remote Respiration Monitoring

Monitoring of vital signs is a key tool in medical diagnostics to assess the onset and the evolution of several diseases. Among fundamental vital parameters, such as the heart rate, blood pressure and body temperature, the Respiratory Rate (RR) plays an important role. For this reason, respiration needs to be carefully monitored in order to detect potential signs or events indicating possible changes of health conditions. Monitoring of the respiration is generally carried out in hospital and clinical environments by the use of expensive devices with several sensors connected to the patient's body. A new research trend, in order to reduce healthcare service costs and make monitoring of vital signs more comfortable, is the development of low-cost systems which may allow remote and contactless monitoring; in such a context, an appealing method is to rely on video processing-based solutions. In ...

Alinovi, Davide — University of Parma


Tissue Characterisation from Intravascular Ultrasound using Texture Analysis

Intravascular ultrasound has, over the past decade, significantly changed the clinical diagnosis and therapeutic strategy of coronary and vascular disease assessment, as it not only allows visualisation of the vessel lumen, but gives a unique view of the pathophysiologic structure of the artery wall. This information is currently unavailable from the universally accepted instrument for artery assessment, angiography, which has on several occasions had its diagnostic accuracy questioned. With intravascular ultrasound, there is the potential to categorise diseased arterial tissue belonging to distinct pathological groups which can ultimately aid in the understanding of individual lesions as well as making a significant contribution to treatment choice and management of cardiac patients. The high resolution image information offered by intravascular ultrasound provides excellent cross-sectional views of coronary artery disease at the level of the disease process itself. This information can be used ...

Nailon, William Henry — University of Edinburgh


Fire Detection Algorithms Using Multimodal Signal and Image Analysis

Dynamic textures are common in natural scenes. Examples of dynamic textures in video include fire, smoke, clouds, volatile organic compound (VOC) plumes in infra-red (IR) videos, trees in the wind, sea and ocean waves, etc. Researchers have extensively studied 2-D textures and related problems in the fields of image processing and computer vision. On the other hand, there is very little research on dynamic texture detection in video. In this dissertation, signal and image processing methods developed for detection of a specific set of dynamic textures are presented. Signal and image processing methods are developed for the detection of flames and smoke in open and large spaces with a range of up to 30 m to the camera in visible-range and infra-red (IR) video. Smoke is semi-transparent at the early stages of fire. Edges present in image frames with smoke start losing their sharpness ...

Toreyin, Behcet Ugur — Bilkent University


Face recognition, a landmarks tale

Face recognition is a technology that appeals to the imagination of many people. This is particularly reflected in the popularity of science-fiction films and forensic detective series such as CSI, CSI New York, CSI Miami, Bones and NCIS. Although these series tend to be set in the present, their application of face recognition should be considered science-fiction. The successes are not, or at least not yet, realistic. This does not mean, however, that it does not, or will never, work. On the contrary, face recognition is used in places where the user does not need or want to cooperate, for example entry to stadiums or stations, or the detection of double entries into databases. Another important reason to use face recognition is that it can be a user-friendly form of biometric security. Face recognition works reliably and robustly when there is little ...

Beumer, Gert M. — University of Twente


Parametric and non-parametric approaches for multisensor data fusion

Multisensor data fusion technology combines data and information from multiple sensors to achieve improved accuracies and better inference about the environment than could be achieved by the use of a single sensor alone. In this dissertation, we propose parametric and nonparametric multisensor data fusion algorithms with a broad range of applications. Image registration is a vital first step in fusing sensor data. Among the wide range of registration techniques that have been developed for various applications, mutual information based registration algorithms have been accepted as one of the most accurate and robust methods. Inspired by the mutual information based approaches, we propose to use the joint Rényi entropy as the dissimilarity metric between images. Since the Rényi entropy of an image can be estimated with the length of the minimum spanning tree over the corresponding graph, the proposed information-theoretic registration ...

Ma, Bing — University of Michigan


Optimized Merging of Search-Coil and Fluxgate Data for the Magnetospheric Multiscale Mission

The main objective of the Magnetospheric Multiscale (MMS) mission is to characterize fine-scale structures in the Earth’s magnetotail and magnetopause. These dynamic structures traverse the MMS spacecraft formation at high speed and generate magnetic field signatures that cross the sensitive frequency bands of both search-coil and fluxgate magnetometers. An improved understanding of these events is only possible by combining data from both instrument types for magnetospheric event analysis. This combination is done using a model-based sensor fusion approach that merges data from both instrument types to a virtual instrument with flat gain curve, linear phase and known timing properties as well as the highest sensitivity and lowest noise floor. The generation of the underlying instrument models requires precise knowledge of the instrument frequency responses and timing. This knowledge was obtained in a dedicated end-to-end measurement campaign using a purpose-built magnetic ...

Fischer, David — Signal Processing and Speech Communication Laboratory, TU Graz; Space Research Institute Graz, Austrian Academy of Sciences


Camera based motion estimation and recognition for human-computer interaction

Communicating with mobile devices has become an unavoidable part of our daily life. Unfortunately, the current user interface designs are mostly taken directly from desktop computers. This has resulted in devices that are sometimes hard to use. Since more processing power and new sensing technologies are already available, there is a possibility to develop systems to communicate through different modalities. This thesis proposes some novel computer vision approaches, including head tracking, object motion analysis and device ego-motion estimation, to allow efficient interaction with mobile devices. For head tracking, two new methods have been developed. The first method detects a face region and facial features by employing skin detection, morphology, and a geometrical face model. The second method, designed especially for mobile use, detects the face and eyes using local texture features. In both cases, Kalman filtering is applied to estimate ...

Hannuksela, Jari — University of Oulu


Monitoring Infants by Automatic Video Processing

This work has, as its objective, the development of non-invasive and low-cost systems for monitoring and automatically diagnosing specific neonatal diseases by means of the analysis of suitable video signals. We focus on monitoring infants potentially at risk of diseases characterized by the presence or absence of rhythmic movements of one or more body parts. Seizures and respiratory diseases are specifically considered, but the approach is general. Seizures are defined as sudden neurological and behavioural alterations. They are age-dependent phenomena and the most common sign of central nervous system dysfunction. Neonatal seizures have onset within the 28th day of life in newborns at term and within the 44th week of conceptional age in preterm infants. Their main causes are hypoxic-ischaemic encephalopathy, intracranial haemorrhage, and sepsis. Studies indicate an incidence rate of neonatal seizures of 2‰ of live births, 11‰ for preterm ...

Cattani, Luca — University of Parma (Italy)


Visual ear detection and recognition in unconstrained environments

Automatic ear recognition systems have seen increased interest over recent years due to multiple desirable characteristics. Ear images used in such systems can typically be extracted from profile head shots or video footage. The acquisition procedure is contactless and non-intrusive, and it also does not depend on the cooperation of the subjects. In this regard, ear recognition technology shares similarities with other image-based biometric modalities. Another appealing property of ear biometrics is its distinctiveness. Recent studies even empirically validated existing conjectures that certain features of the ear are distinct for identical twins. This fact has significant implications for security-related applications and puts ear images on a par with epigenetic biometric modalities, such as the iris. Ear images can also supplement other biometric modalities in automatic recognition systems and provide identity cues when other information is unreliable or even unavailable. In ...

Emeršič, Žiga — University of Ljubljana, Faculty of Computer and Information Science


Acoustic Event Detection: Feature, Evaluation and Dataset Design

It takes more time to think of a silent scene, action or event than to find one that emanates sound. Not only speaking or playing music but almost everything that happens is accompanied by, or results in, one or more sounds mixed together. This makes acoustic event detection (AED) one of the most researched topics in audio signal processing nowadays, and it will probably not see a decline in the near future. This is due to the thirst for understanding and digitally abstracting more and more events in life via the enormous amount of audio recorded by thousands of applications in our daily routine. But it is also a result of two intrinsic properties of audio: it does not need a direct line of sight to be perceived, and it is less intrusive to record than image or video. Many applications such ...

Mounir, Mina — KU Leuven, ESAT STADIUS


A Comparison of Different Approaches to Target Differentiation with Sonar

This study compares the performances of different classification schemes and fusion techniques for target differentiation and localization of commonly encountered features in indoor robot environments using sonar sensing. Differentiation of such features is of interest for intelligent systems in a variety of applications such as system control based on acoustic signal detection and identification, map-building, navigation, obstacle avoidance, and target tracking. The classification schemes employed include the target differentiation algorithm developed by Ayrulu and Barshan, statistical pattern recognition techniques, fuzzy c-means clustering algorithm, and artificial neural networks. The fusion techniques used are Dempster-Shafer evidential reasoning and different voting schemes. To solve the consistency problem arising in simple majority voting, different voting schemes including preference ordering and reliability measures are proposed and verified experimentally. To improve the performance of neural network classifiers, different input signal representations, two different training algorithms, and ...

Ayrulu-Erdem, Birsel — Bilkent University
