A Robust Face Recognition Algorithm for Real-World Applications

Face recognition is one of the most challenging problems in computer vision and pattern recognition. The difficulty arises mainly from variations in facial appearance caused by factors such as expression, illumination, partial face occlusion, and the time gap between training and testing data capture. Moreover, the performance of face recognition algorithms depends heavily on the preceding facial feature localization step: face images must be aligned accurately before they are fed into a face recognition algorithm, which requires precise facial feature localization. This thesis addresses these two main problems (facial appearance variations due to changes in expression, illumination, occlusion, and time gap, and imprecise face alignment due to mislocalized facial features) in order to accomplish its goal of building a generic face recognition algorithm that functions reliably under real-world conditions. The proposed face recognition algorithm ...

Ekenel, Hazim Kemal — University of Karlsruhe
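
The abstract above ties recognition accuracy to aligning faces from localized facial features. The following is a minimal sketch of one common alignment step, not the thesis's method. It computes a 2-D similarity transform (rotation, uniform scale, translation) that maps two detected eye centers onto fixed canonical positions. The canonical coordinates and the function names `eye_alignment_transform` and `apply_transform` are hypothetical illustrations.

```python
def eye_alignment_transform(left_eye, right_eye,
                            canon_left=(30.0, 40.0), canon_right=(70.0, 40.0)):
    """Return (a, b, tx, ty) of the similarity transform
        x' = a*x - b*y + tx,   y' = b*x + a*y + ty
    mapping the detected eye centers onto canonical positions.
    The canonical positions are hypothetical defaults."""
    # A 2-D similarity transform is z' = s*z + t when points are complex numbers
    src_l, src_r = complex(*left_eye), complex(*right_eye)
    dst_l, dst_r = complex(*canon_left), complex(*canon_right)
    s = (dst_r - dst_l) / (src_r - src_l)   # rotation and uniform scale
    t = dst_l - s * src_l                   # translation
    return s.real, s.imag, t.real, t.imag


def apply_transform(params, point):
    """Map one (x, y) point with the similarity transform parameters."""
    a, b, tx, ty = params
    x, y = point
    return (a * x - b * y + tx, b * x + a * y + ty)
```

The whole warp is derived from only two points. An error of a few pixels in either eye location therefore rotates and rescales the entire face, which is why mislocalized facial features degrade recognition.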


Video Sequence Analysis for Content Description, Summarization and Content-Based Retrieval

The main research area of this Ph.D. thesis is video sequence processing and analysis for the description and indexing of visual content. Its objective is to contribute to the development of a computational system capable of object-based segmentation of audiovisual material, automatic content description, summarization for preview and browsing, and content-based retrieval. The thesis consists of four parts. The first part introduces video sequence analysis, segmentation and object extraction based on color, motion, and depth field. A fusion technique is proposed that combines the individual cue segmentations and allows reliable identification of semantic objects. The second part addresses automatic description and annotation of visual content by means of feature vectors; summarization, implemented by optimal selection of a limited set of key frames and shots; and content-based search and retrieval. In the third part, the problem of ...

Avrithis, Yannis — National Technical University of Athens


Camera based motion estimation and recognition for human-computer interaction

Communicating with mobile devices has become an unavoidable part of our daily life. Unfortunately, current user interface designs are mostly taken directly from desktop computers, which has resulted in devices that are sometimes hard to use. Since more processing power and new sensing technologies are already available, it is possible to develop systems that communicate through different modalities. This thesis proposes several novel computer vision approaches, including head tracking, object motion analysis and device ego-motion estimation, to enable efficient interaction with mobile devices. For head tracking, two new methods have been developed. The first method detects the face region and facial features by employing skin detection, morphology, and a geometrical face model. The second method, designed especially for mobile use, detects the face and eyes using local texture features. In both cases, Kalman filtering is applied to estimate ...

Hannuksela, Jari — University of Oulu
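
The head-tracking abstract above applies Kalman filtering to the detections, but the state model it uses is not given in the excerpt. The following is a minimal sketch, assuming a constant-velocity model for a single image coordinate of the detected face. The noise parameters `q` and `r`, the initial covariance, and the function name `kalman_track` are all assumptions.

```python
def kalman_track(measurements, dt=1.0, q=1e-2, r=4.0):
    """Constant-velocity Kalman filter for one image coordinate.
    State x = [position, velocity]; only the position is measured.
    q (process noise added to both state variables) and r (measurement
    noise variance) are assumed values, not taken from the thesis."""
    x = [measurements[0], 0.0]           # start at the first detection, at rest
    P = [[r, 0.0], [0.0, 1.0]]           # assumed initial covariance
    estimates = []
    for z in measurements:
        # Predict: x = F x and P = F P F^T + Q, with F = [[1, dt], [0, 1]]
        x = [x[0] + dt * x[1], x[1]]
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + q
        # Update with measurement matrix H = [1, 0]
        s = p00 + r                      # innovation variance
        k0, k1 = p00 / s, p10 / s        # Kalman gain
        y = z - x[0]                     # innovation
        x = [x[0] + k0 * y, x[1] + k1 * y]
        P = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
        estimates.append(x[0])
    return estimates
```

If you feed in the jittery per-frame x-coordinates of a detected face, the filter returns a smoothed trajectory. A larger `r` makes the filter trust the detector less and the motion model more.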


Pointwise shape-adaptive DCT image filtering and signal-dependent noise estimation

When an image is acquired by a digital imaging sensor, it is always degraded by some noise. This leads to two basic questions: What are the main characteristics of this noise? How can it be removed? These questions correspond to two key problems in signal processing: noise estimation and noise removal (so-called denoising). This thesis addresses both problems and provides a number of original and effective contributions to their solution. The first part of the thesis introduces a novel image denoising algorithm based on the low-complexity Shape-Adaptive Discrete Cosine Transform (SA-DCT). Because the transform is applied on spatially adaptive supports, the filtered images are of high quality, with clean edges and without disturbing artifacts. We further present extensions of this approach to image deblurring, deringing and deblocking, as well as to color image filtering. For all these applications, ...

Foi, Alessandro — Tampere University of Technology
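
SA-DCT denoising shrinks transform coefficients computed on adaptively shaped supports. The following shows only the underlying principle: hard-thresholding the coefficients of an orthonormal 1-D DCT-II over a fixed block. It deliberately omits the pointwise shape adaptation, which is the thesis's actual contribution. The threshold value and the function names are assumptions.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows


def dct_hard_threshold(block, threshold):
    """Denoise a 1-D block by zeroing small AC coefficients (DC is always kept)."""
    n = len(block)
    c = dct2_matrix(n)
    coeffs = [sum(c[k][i] * block[i] for i in range(n)) for k in range(n)]
    kept = [v if (k == 0 or abs(v) >= threshold) else 0.0
            for k, v in enumerate(coeffs)]
    # The transform is orthonormal, so its inverse is its transpose
    return [sum(c[k][i] * kept[k] for k in range(n)) for i in range(n)]
```

With a threshold of zero, the block is reconstructed exactly. Low-amplitude noise on a flat region, which spreads into small AC coefficients, is removed. Image denoising applies the same idea in 2-D. SA-DCT additionally fits each support to the local image structure so that edges are not smeared.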


Robust Speech Recognition: Analysis and Equalization of Lombard Effect in Czech Corpora

When exposed to noise, speakers modify the way they speak in an effort to maintain intelligible communication. This process, referred to as the Lombard effect (LE), involves a combination of conscious and subconscious articulatory adjustments. Speech production variations due to LE can cause considerable degradation in automatic speech recognition (ASR), since they introduce a mismatch between the parameters of the speech to be recognized and the ASR system’s acoustic models, which are usually trained on neutral speech. The main objective of this thesis is to analyze the impact of LE on speech production and to propose methods that increase ASR performance under LE. All experiments were conducted on spoken Czech, but the proposed concepts are assumed to be applicable to other languages. The first part of the thesis focuses on the design and acquisition of a ...

Boril, Hynek — Czech Technical University in Prague


Decision threshold estimation and model quality evaluation techniques for speaker verification

The number of biometric applications has increased considerably in the last few years. In this context, automatic person recognition based on physical traits such as fingerprints, face, voice or iris plays an important role. Users increasingly demand this type of application, and the technology appears to be mature. People look for security, low cost and accuracy, but at the same time many other factors related to biometric applications are growing in importance. Intrusiveness is undoubtedly a key factor when deciding which biometric to use for a given application. This is where the suitability of speaker recognition becomes clear: voice is the natural way of communicating, can be used remotely, and has a low cost. Automatic speaker recognition is commonly used in telephone applications, although it can also be used in ...

Rodriguez Saeta, Javier — Universitat Politecnica de Catalunya


Large-Scale Light Field Capture and Reconstruction

This thesis discusses approaches and techniques to convert Sparsely-Sampled Light Fields (SSLFs) into Densely-Sampled Light Fields (DSLFs), which can be used for visualization on 3DTV and Virtual Reality (VR) devices. As an example, a movable 1D large-scale light field acquisition system for capturing SSLFs in real-world environments is evaluated. This system consists of 24 sparsely placed RGB cameras and two Kinect V2 sensors. The real-world SSLF data captured with this setup can be leveraged to reconstruct real-world DSLFs. To this end, three challenging problems must be solved for this system: (i) how to estimate the rigid transformation from the coordinate system of a Kinect V2 to the coordinate system of an RGB camera; (ii) how to register the two Kinect V2 sensors with a large displacement; (iii) how to reconstruct a DSLF from an SSLF with moderate and large disparity ranges. ...

Gao, Yuan — Department of Computer Science, Kiel University
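
Problem (i) in the abstract above is estimating the rigid transformation from a Kinect V2 coordinate system to an RGB camera coordinate system. The following is a minimal sketch of the standard least-squares solution (the Kabsch algorithm, via SVD). It assumes corresponding 3-D points are already available in both frames, for example from a calibration target. The abstract does not say that the thesis solves the problem this way, and the function name is hypothetical.

```python
import numpy as np


def estimate_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst ~= R @ src + t.
    src, dst: (N, 3) arrays of corresponding points, N >= 3, not collinear."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t
```

Suppose points measured by the Kinect are passed as `src` and the same points expressed in the RGB camera frame as `dst`. The returned `R` and `t` then map any further Kinect point `p` into the camera frame as `R @ p + t`.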


Automatic Signature and Graphical Password Verification: Discriminant Features and New Application Scenarios

The proliferation of handheld devices such as smartphones and tablets brings a new scenario for biometric authentication, in particular for automatic signature verification. Research on signature verification has traditionally been carried out using signatures acquired on digitizing tablets or Tablet-PCs. This PhD Thesis addresses the problem of user authentication on handheld devices using handwritten signatures and graphical passwords based on free-form doodles, as well as the effects of biometric aging on signatures. The Thesis aims to analyze: (i) the effects of mobile conditions on signature and doodle verification, (ii) the most distinctive features in mobile conditions, extracted from the pen or fingertip trajectory, (iii) how different similarity computation (i.e., matching) algorithms behave with signatures and graphical passwords captured in mobile conditions, and (iv) the impact of aging on signature features and verification ...

Martinez-Diaz, Marcos — Universidad Autonoma de Madrid
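
Item (iii) of the abstract above compares similarity computation (matching) algorithms, but the excerpt does not name them. Dynamic Time Warping (DTW) is one widely used matcher for online signature trajectories, and it is assumed here only as an illustration. The following sketch compares two 1-D feature sequences, such as the x-coordinate of the pen or fingertip over time. The function name is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D feature sequences,
    using the absolute difference as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # only a advances
                                 cost[i][j - 1],      # only b advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Because the alignment may repeat samples, a slower rendition of the same stroke scores zero. A point-by-point Euclidean distance could not even compare two sequences of different lengths.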


Density-based shape descriptors and similarity learning for 3D object retrieval

Next-generation search engines will enable query formulations other than text, relying on visual information encoded in terms of images and shapes. 3D search technology, in particular, targets specialized application domains ranging from computer-aided design and manufacturing to cultural heritage archival and presentation. Content-based retrieval research aims at developing search engines that allow users to perform a query by similarity of content. This thesis deals with two fundamental problems in content-based 3D object retrieval: (1) How should a 3D shape be described to obtain a reliable representation for the subsequent task of similarity search? (2) How can the search process be supervised to learn inter-shape similarities for more effective and semantic retrieval? Concerning the first problem, we develop a novel 3D shape description scheme based on the probability density of multivariate local surface features. We constructively obtain local characterizations of 3D ...

Akgul, Ceyhun Burak — Bogazici University and Telecom ParisTech
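
The description scheme in the abstract above is based on the probability density of multivariate local surface features. The following is a minimal sketch of that idea. It assumes a Gaussian kernel density estimate, evaluated at a fixed set of target points shared by all shapes, so that the vector of density values becomes the shape descriptor. The features, target points, bandwidth, L1 comparison, and function names are all illustrative assumptions.

```python
import math


def density_descriptor(features, targets, bandwidth=0.5):
    """Gaussian KDE of local surface features evaluated at fixed target points.
    features: list of d-dimensional feature vectors sampled from one 3-D shape.
    targets:  list of d-dimensional points shared by all shapes.
    Returns one density value per target point (the descriptor)."""
    d = len(targets[0])
    norm = (2.0 * math.pi * bandwidth ** 2) ** (-d / 2.0) / len(features)
    descriptor = []
    for t in targets:
        total = 0.0
        for f in features:
            sq = sum((fi - ti) ** 2 for fi, ti in zip(f, t))
            total += math.exp(-sq / (2.0 * bandwidth ** 2))
        descriptor.append(norm * total)
    return descriptor


def l1_distance(u, v):
    """Dissimilarity between two descriptors, used to rank retrieval results."""
    return sum(abs(x - y) for x, y in zip(u, v))
```

Every shape is summarized at the same target points, so descriptors have a fixed length and can be compared directly. This holds even though each mesh yields a different number of local feature samples.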


Techniques for improving the performance of distributed video coding

Distributed Video Coding (DVC) is a recently proposed paradigm in video communication that is well suited to emerging applications such as wireless video surveillance, multimedia sensor networks, wireless PC cameras, and mobile camera phones. These applications require low-complexity encoding, while possibly affording high-complexity decoding. DVC presents several advantages. First, the complexity can be distributed between the encoder and the decoder. Second, DVC is robust to errors, since it uses a channel code. In DVC, Side Information (SI) is estimated at the decoder, using the available decoded frames, and used for the decoding and reconstruction of other frames. In this Ph.D. thesis, we propose new techniques to improve the quality of the SI. First, successive refinement of the SI is performed after each decoded DCT band, using a Partially Decoded Wyner-Ziv Frame (PDWZF), along with the ...

Abou-Elailah, Abdalbassir — Telecom ParisTech
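
In DVC, as the abstract above explains, the decoder estimates the SI for a frame from frames it has already decoded. The following sketch shows the simplest possible baseline, a pixel-wise average of the previous and next decoded key frames. It also includes a PSNR function for scoring SI quality against the true frame. Practical DVC decoders commonly use motion-compensated interpolation instead, and the thesis's refinement techniques go well beyond this. The frame representation (lists of pixel rows) and the function names are assumptions.

```python
import math


def average_side_information(prev_frame, next_frame):
    """Baseline SI: pixel-wise mean of the two neighbouring decoded key frames."""
    return [[(p + n) / 2.0 for p, n in zip(row_p, row_n)]
            for row_p, row_n in zip(prev_frame, next_frame)]


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized frames."""
    se = [(r - e) ** 2
          for row_r, row_e in zip(reference, estimate)
          for r, e in zip(row_r, row_e)]
    mse = sum(se) / len(se)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

A higher SI PSNR means fewer parity bits must be requested to correct the SI into the decoded frame. For this reason, improving SI quality directly improves DVC performance.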


Automated Face Recognition from Low-resolution Imagery

Recently, significant advances in automated face recognition have been achieved using computer vision, machine learning, and deep learning methodologies. However, despite claims of super-human performance of face recognition algorithms on select key benchmark tasks, several open problems remain that preclude the general replacement of human face recognition work with automated systems. State-of-the-art automated face recognition systems based on deep learning achieve high accuracy when the face images from which they must recognize subjects are of sufficiently high quality. However, low image resolution remains one of the principal obstacles to face recognition systems, and their performance in the low-resolution regime is decidedly below human capabilities. In this PhD thesis, we present a systematic study of modern automated face recognition systems in the presence of various forms of image degradation. Based on our ...

Grm, Klemen — University of Ljubljana


Content-based search and browsing in semantic multimedia retrieval

Growth in storage capacity has led to large digital video repositories and has complicated the discovery of specific information without laborious manual annotation of data. The research focuses on creating a retrieval system that is ultimately independent of manual work. To retrieve relevant content, the semantic gap between the searcher's information need and the content data has to be overcome using content-based technology. The semantic gap consists of two distinct elements: the ambiguity of the true information need and the equivocalness of digital video data. The research problem of this thesis is: which computational content-based models for retrieval increase the effectiveness of semantic retrieval of digital video? The hypothesis is that semantic search performance can be improved using pattern recognition, data abstraction and clustering techniques jointly with human interaction through manually created queries and visual browsing. The results of this ...

Rautiainen, Mika — University of Oulu


Human-Centered Content-Based Image Retrieval

Images that lack (suitable) annotations cannot be retrieved through (traditional) Information Retrieval (IR) techniques. Access to such collections can be achieved by applying computer vision techniques to the IR problem, an approach known as Content-Based Image Retrieval (CBIR). In contrast with most purely technological approaches, the thesis Human-Centered Content-Based Image Retrieval approaches the problem from a human/user-centered perspective. Psychophysical experiments were conducted in which people were asked to categorize colors. The data gathered from these experiments were fed to a Fast Exact Euclidean Distance (FEED) transform (Schouten & Van den Broek, 2004), which enabled the segmentation of color space based on human perception (Van den Broek et al., 2008). This unique color space segmentation was exploited for texture analysis and image segmentation, and subsequently for full-featured CBIR. In addition, a unique CBIR-benchmark was developed (Van ...

van den Broek, Egon L. — Radboud University Nijmegen
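
The abstract above feeds the categorization data to a Fast Exact Euclidean Distance (FEED) transform to segment color space. The following sketch is not FEED. It computes the same kind of output, an exact Euclidean distance transform, by brute force on a small 2-D grid, and it is suitable only for illustration. It also records the label of the nearest seed, which is one assumed way labeled samples can partition a space. The grid, the seed labels, and the function name are hypothetical.

```python
import math


def brute_force_edt(grid_shape, seeds):
    """Exact Euclidean distance transform by brute force (illustration only;
    FEED computes exact distances far more efficiently).
    grid_shape: (rows, cols); seeds: dict mapping (r, c) -> label.
    Returns (dist, label): distance to, and label of, the nearest seed."""
    rows, cols = grid_shape
    dist = [[math.inf] * cols for _ in range(rows)]
    label = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for (sr, sc), lab in seeds.items():
                d = math.hypot(r - sr, c - sc)
                if d < dist[r][c]:          # ties keep the first seed found
                    dist[r][c], label[r][c] = d, lab
    return dist, label
```

The cost here grows with the number of cells times the number of seeds. Avoiding that cost while keeping the distances exact is what motivates a dedicated transform such as FEED.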


Natural-Scene Text Understanding

Whether in color camera-based images or in low-resolution thumbnails, inherent degradations such as complex backgrounds, artistic fonts, uneven lighting or unsatisfactory resolution must be taken into account. In order to circumvent or correct them, studies of image formation and degradation sources led to overcoming overly constrained definitions of color spaces. Hence, the selective metric text extraction attempts to combine magnitude and directional processing of colors in an unsupervised framework. Text extraction from the background is simultaneously linked to the subsequent steps of character segmentation and recognition. This intermingled chain mainly aims at combining the color, intensity and spatial information of pixels for robustness and accuracy. Each of these features addresses different issues: the first is used for text extraction, and the latter two for recovering the initial separation between characters through log-Gabor filtering. In order to reach higher quality results, pre- and ...

Mancas-Thillou, Celine — Universite de Mons
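
The abstract above uses log-Gabor filtering to recover the separation between characters. A log-Gabor filter is defined in the frequency domain as a Gaussian on a logarithmic frequency axis:

G(f) = exp(-(ln(f/f0))^2 / (2 (ln(σ/f0))^2))

It has no response at zero frequency. The following sketch evaluates the 1-D transfer function. The centre frequency f0 and the ratio σ/f0 are assumed example values, since the excerpt does not state how the thesis tunes them, and the function name is hypothetical.

```python
import math


def log_gabor_response(f, f0=0.1, sigma_ratio=0.55):
    """1-D log-Gabor transfer function at frequency f (cycles/pixel).
    f0: centre frequency; sigma_ratio: sigma/f0, which sets the bandwidth.
    Both defaults are illustrative assumptions."""
    if f <= 0.0:
        return 0.0          # a log-Gabor filter has no DC component
    return math.exp(-(math.log(f / f0) ** 2) / (2.0 * math.log(sigma_ratio) ** 2))
```

The response peaks at f0 and is symmetric on a logarithmic scale, so f0/2 and 2·f0 are attenuated equally. Having no DC term makes the filter insensitive to the overall brightness of a text region.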


Fire Detection Algorithms Using Multimodal Signal and Image Analysis

Dynamic textures are common in natural scenes. Examples of dynamic textures in video include fire, smoke, clouds, volatile organic compound (VOC) plumes in infra-red (IR) videos, trees in the wind, sea and ocean waves, etc. Researchers have extensively studied 2-D textures and related problems in the fields of image processing and computer vision. On the other hand, there is very little research on dynamic texture detection in video. This dissertation presents signal and image processing methods developed for the detection of a specific set of dynamic textures. In particular, methods are developed for the detection of flames and smoke in open and large spaces, at distances of up to 30 m from the camera, in visible-range and IR video. Smoke is semi-transparent at the early stages of fire. Edges present in image frames with smoke start losing their sharpness ...

Toreyin, Behcet Ugur — Bilkent University
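
The abstract above notes that edges start losing their sharpness when semi-transparent smoke covers a scene. The following sketch shows one assumed way to quantify this. It compares the high-frequency (detail) energy of a single-level 2-D Haar wavelet decomposition of the current frame with that of a background frame, treating a marked drop as a sign of softened edges. The choice of the Haar wavelet, the `ratio` threshold, and the function names are hypothetical.

```python
def haar_detail_energy(img):
    """Energy of the single-level 2-D Haar detail subbands.
    img: list of rows (grayscale values) with even height and width."""
    energy = 0.0
    for r in range(0, len(img), 2):
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            lh = (a + b - d - e) / 2.0   # change between the two rows
            hl = (a - b + d - e) / 2.0   # change between the two columns
            hh = (a - b - d + e) / 2.0   # diagonal change
            energy += lh * lh + hl * hl + hh * hh
    return energy


def edges_softened(background, frame, ratio=0.5):
    """Flag a frame whose detail energy fell below `ratio` times the background's.
    The ratio is a hypothetical threshold."""
    bg = haar_detail_energy(background)
    return bg > 0 and haar_detail_energy(frame) < ratio * bg
```

Smoke blends the scene toward a uniform gray, which shrinks the pixel differences measured by the detail subbands. In practice, a cue like this would be computed per region and combined with other cues rather than used alone.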
