An Attention Model and its Application in Man-Made Scene Interpretation (2010)
Radial Basis Function Network Robust Learning Algorithms in Computer Vision Applications
This thesis introduces new learning algorithms for Radial Basis Function (RBF) networks. An RBF network is a feed-forward two-layer neural network used for function approximation or pattern classification. The proposed training algorithms are based on robust statistics. Their theoretical performance has been assessed and compared with that of classical algorithms for training RBF networks. The applications of RBF networks described in this thesis consist of simultaneously modeling moving object segmentation and optical flow estimation in image sequences, and 3-D image modeling and segmentation. A Bayesian classifier model is used for the representation of the image sequence and 3-D images. This employs an energy-based description of the probability functions involved. The energy functions are represented by RBF networks whose inputs are various features drawn from the images and whose outputs are objects. The hidden units embed kernel functions. Each kernel ...
Bors, Adrian G. — Aristotle University of Thessaloniki
Contributions to Human Motion Modeling and Recognition using Non-intrusive Wearable Sensors
This thesis contributes to motion characterization through inertial and physiological signals captured by wearable devices and analyzed using signal processing and deep learning techniques. This research leverages the possibilities of motion analysis for three main applications: to know what physical activity a person is performing (Human Activity Recognition), to identify who is performing that motion (user identification) or know how the movement is being performed (motor anomaly detection). Most previous research has addressed human motion modeling using invasive sensors in contact with the user or intrusive sensors that modify the user’s behavior while performing an action (cameras or microphones). In this sense, wearable devices such as smartphones and smartwatches can collect motion signals from users during their daily lives in a less invasive or intrusive way. Recently, there has been an exponential increase in research focused on inertial-signal processing to ...
Gil-Martín, Manuel — Universidad Politécnica de Madrid
Deep Learning Techniques for Visual Counting
The explosion of Deep Learning (DL) added a boost to the already rapidly developing field of Computer Vision to such a point that vision-based tasks are now parts of our everyday lives. Applications such as image classification, photo stylization, or face recognition are nowadays pervasive, as evidenced by the advent of modern systems trivially integrated into mobile applications. In this thesis, we investigated and enhanced the visual counting task, which automatically estimates the number of objects in still images or video frames. Recently, due to the growing interest in it, several Convolutional Neural Network (CNN)-based solutions have been suggested by the scientific community. These artificial neural networks, inspired by the organization of the animal visual cortex, provide a way to automatically learn effective representations from raw visual data and can be successfully employed to address typical challenges characterizing this task, ...
Ciampi Luca — University of Pisa
Discrete-time speech processing with application to emotion recognition
The subject of this PhD thesis is the efficient and robust processing and analysis of audio recordings derived from a call center. The thesis comprises two parts. The first part is dedicated to dialogue/non-dialogue detection and to speaker segmentation. The systems developed are prerequisites for detecting (i) the audio segments that actually contain a dialogue between the system and the call center customer and (ii) the change points between the system and the customer. This way the volume of audio recordings that need to be processed is significantly reduced, while the system is automated. To detect the presence of a dialogue, several systems are developed. This is the first effort found in the international literature in which the audio channel is exclusively exploited. Also, it is the first time that the speaker utterance ...
Kotti, Margarita — Aristotle University of Thessaloniki
Machine vision applies computer vision to industry and manufacturing in order to control or analyze a process or activity. A typical application of machine vision is the inspection of produced goods such as electronic devices, automobiles, food and pharmaceuticals. Machine vision systems form their judgement based on specially designed image processing software, so image processing is crucial for their accuracy. The food industry is among the industries that make extensive use of image processing for the inspection of produce. Fruits and vegetables vary greatly in physical appearance. The numerous defect types found on apples, as well as the high natural variability of their skin color, bring apple fruits into the center of our interest. Traditionally, inspection of apple fruits is performed by human experts. However, automation of this process is necessary to reduce error, variation, fatigue and cost due to human experts as well as to increase ...
Unay, Devrim — Universite de Mons
Camera based motion estimation and recognition for human-computer interaction
Communicating with mobile devices has become an unavoidable part of our daily life. Unfortunately, the current user interface designs are mostly taken directly from desktop computers. This has resulted in devices that are sometimes hard to use. Since more processing power and new sensing technologies are already available, there is a possibility to develop systems to communicate through different modalities. This thesis proposes some novel computer vision approaches, including head tracking, object motion analysis and device ego-motion estimation, to allow efficient interaction with mobile devices. For head tracking, two new methods have been developed. The first method detects a face region and facial features by employing skin detection, morphology, and a geometrical face model. The second method, designed especially for mobile use, detects the face and eyes using local texture features. In both cases, Kalman filtering is applied to estimate ...
Hannuksela, Jari — University of Oulu
Vision models and quality metrics for image processing applications
Optimizing the performance of digital imaging systems with respect to the capture, display, storage and transmission of visual information represents one of the biggest challenges in the field of image and video processing. Taking into account the way humans perceive visual information can be greatly beneficial for this task. To achieve this, it is necessary to understand and model the human visual system, which is also the principal goal of this thesis. Computational models for different aspects of the visual system are developed, which can be used in a wide variety of image and video processing applications. The proposed models and metrics are shown to be consistent with human perception. The focus of this work is visual quality assessment. A perceptual distortion metric (PDM) for the evaluation of video quality is presented. It is based on a model of the ...
Winkler, Stefan — Swiss Federal Institute of Technology
Computational Attention: Towards attentive computers
Consciously or unconsciously, humans always pay attention to a wide variety of stimuli. Attention is part of daily life and it is the first step to understanding. The proposed thesis deals with a computational approach to the human attentional mechanism and with its possible applications mainly in the field of computer vision. In a first stage, the text introduces a rarity-based three-level attention model handling monodimensional signals as well as images or video sequences. The concept of attention is defined as the transformation of a huge acquired unstructured data set into a smaller structured one while preserving the information: the attentional mechanism turns rough data into intelligence. Afterwards, several applications are described in the fields of machine vision, signal coding and enhancement, medical imaging, event detection and so on. These applications not only show the applicability of the proposed computational ...
Mancas, Matei — University of Mons (UMONS)
Constrained Non-negative Matrix Factorization for Vocabulary Acquisition from Continuous Speech
One desideratum in designing cognitive robots is autonomous learning of communication skills, just like humans. The primary step towards this goal is vocabulary acquisition. Unlike the training procedures of state-of-the-art automatic speech recognition (ASR) systems, vocabulary acquisition cannot rely on prior knowledge of language in the same way. As with infants, the acquisition process should be data-driven with multi-level abstraction and coupled with multi-modal inputs. To avoid lengthy training efforts in a word-by-word interactive learning process, a clever learning agent should be able to acquire vocabularies from continuous speech automatically. The work presented in this thesis is entitled Constrained Non-negative Matrix Factorization for Vocabulary Acquisition from Continuous Speech. Inspired by the extensively studied techniques in ASR, we design computational models to discover and represent vocabularies from continuous speech with little prior knowledge of the language to ...
Sun, Meng — Katholieke Universiteit Leuven
Optimization of Video Streaming over 3G Networks
Video streaming over cellular networks has been made possible in recent years by better performing video codecs and wireless cellular networks oriented to data transmission. The interaction between two heterogeneous worlds, the telecommunication infrastructure and the video coding software, calls for advanced optimization mechanisms. The actors involved in the optimization process are the cellular system's access network, UMTS and HSDPA, the wireless transmission channel and the final user equipped with a mobile device capable of decoding video sequences. The knowledge and characterization of each of the building blocks allow the optimization of each element to the specific needs of the others. This doctoral thesis discusses three main contributions. In the first part, the effects of transmission errors on video streams are analyzed. Incorrectly received video packets are usually discarded by the lower layers and not ...
Superiori, Luca — Vienna University of Technology
An adaptive edge-enhanced correlation based robust and real-time visual tracking framework, and two machine vision systems based on the framework are proposed. The visual tracking algorithm can track any object of interest in a video acquired from a stationary or moving camera. It can handle the real-world problems, such as noise, clutter, occlusion, uneven illumination, varying appearance, orientation, scale, and velocity of the maneuvering object, and object fading and obscuration in low contrast video at various zoom levels. The proposed machine vision systems are an active camera tracking system and a vision based system for a UGV (unmanned ground vehicle) to handle a road intersection. The core of the proposed visual tracking framework is an Edge Enhanced Back-propagation neural-network Controlled Fast Normalized Correlation (EE-BCFNC), which makes the object localization stage efficient and robust to noise, object fading, obscuration, and uneven ...
Ahmed, Javed — Electrical (Telecom.) Engineering Department, National University of Sciences and Technology, Rawalpindi, Pakistan.
Computational Attention: Modelisation and Application to Audio and Image Processing
Consciously or unconsciously, humans always pay attention to a wide variety of stimuli. Attention is part of daily life and it is the first step to understanding. The proposed thesis deals with a computational approach to the human attentional mechanism and with its possible applications mainly in the field of computer vision. In a first stage, the text introduces a rarity-based three-level attention model handling monodimensional signals as well as images or video sequences. The concept of attention is defined as the transformation of a huge acquired unstructured data set into a smaller structured one while preserving the information: the attentional mechanism turns rough data into intelligence. Afterwards, several applications are described in the fields of machine vision, signal coding and enhancement, medical imaging, event detection and so on. These applications not only show the applicability of the proposed computational ...
Mancas, Matei — Universite de Mons
Fire Detection Algorithms Using Multimodal Signal and Image Analysis
Dynamic textures are common in natural scenes. Examples of dynamic textures in video include fire, smoke, clouds, volatile organic compound (VOC) plumes in infra-red (IR) videos, trees in the wind, sea and ocean waves, etc. Researchers have extensively studied 2-D textures and related problems in the fields of image processing and computer vision. On the other hand, there is very little research on dynamic texture detection in video. In this dissertation, signal and image processing methods developed for detection of a specific set of dynamic textures are presented. Signal and image processing methods are developed for the detection of flames and smoke in open and large spaces with a range of up to 30 m to the camera in visible-range and infra-red (IR) video. Smoke is semi-transparent at the early stages of fire. Edges present in image frames with smoke start losing their sharpness ...
Toreyin, Behcet Ugur — Bilkent University
Direct Pore-based Identification For Fingerprint Matching Process
The fingerprint is considered one of the most crucial scientific tools in solving criminal cases. This biometric feature is composed of unique and distinctive patterns found on the fingertips of each individual. With advancing technology and progress in forensic sciences, fingerprint analysis plays a vital role in forensic investigations and the analysis of evidence at crime scenes. The fingerprint patterns of each individual start to develop in the early stages of life and never change thereafter. This fact makes fingerprints an exceptional means of identification. In criminal cases, fingerprint analysis is used to decipher traces, evidence, and clues at crime scenes. These analyses not only provide insights into how a crime was committed but also assist in identifying the culprits or individuals involved. Computer-based fingerprint identification systems yield faster and more accurate results compared to traditional methods, making fingerprint comparisons in large databases ...
Vedat DELICAN, PhD — Istanbul Technical University
A Robust Face Recognition Algorithm for Real-World Applications
Face recognition is one of the most challenging problems of computer vision and pattern recognition. The difficulty in face recognition arises mainly from facial appearance variations caused by factors such as expression, illumination, partial face occlusion, and the time gap between training and testing data capture. Moreover, the performance of face recognition algorithms depends heavily on a prior facial feature localization step. That is, face images need to be aligned very well before they are fed into a face recognition algorithm, which requires precise facial feature localization. This thesis addresses these two main problems (facial appearance variations due to changes in expression, illumination, occlusion, and time gap, and imprecise face alignment due to mislocalized facial features) in order to accomplish its goal of building a generic face recognition algorithm that can function reliably under real-world conditions. The proposed face recognition algorithm ...
Ekenel, Hazim Kemal — University of Karlsruhe