Spatiotonal Adaptivity in Super-Resolution of Under-sampled Image Sequences

This thesis concerns the use of spatial and tonal adaptivity in improving the resolution of aliased image sequences under scene or camera motion. Each of the five content chapters focuses on a different subtopic of super-resolution: image registration (chapter 2), image fusion (chapters 3 and 4), super-resolution restoration (chapter 5), and super-resolution synthesis (chapter 6). Chapter 2 derives the Cramér-Rao lower bound of image registration and shows that iterative gradient-based estimators achieve this performance limit. Chapter 3 presents an algorithm for image fusion of irregularly sampled and uncertain data using robust normalized convolution. The size and shape of the fusion kernel are adapted to local curvilinear structures in the image. Each data sample is assigned an intensity-related certainty value to limit the influence of outliers. Chapter 4 presents two fast implementations of the signal-adaptive bilateral filter. The xy-separable implementation filters ...
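The signal-adaptive bilateral filter combines a spatial (domain) kernel with a tonal (range) kernel; the xy-separable idea applies a 1-D bilateral pass along rows and then along columns of the result. The sketch below is a generic illustration under assumed Gaussian kernels and border replication, not the thesis implementation:

```python
import numpy as np

def bilateral_1d(signal, radius=2, sigma_s=1.0, sigma_r=0.1):
    """One 1-D bilateral pass: weights combine spatial and tonal proximity."""
    n = len(signal)
    out = np.empty(n)
    offsets = np.arange(-radius, radius + 1)
    spatial = np.exp(-0.5 * (offsets / sigma_s) ** 2)
    for i in range(n):
        idx = np.clip(i + offsets, 0, n - 1)  # replicate borders
        tonal = np.exp(-0.5 * ((signal[idx] - signal[i]) / sigma_r) ** 2)
        w = spatial * tonal
        out[i] = np.sum(w * signal[idx]) / np.sum(w)
    return out

def separable_bilateral(image, **kw):
    """Approximate 2-D bilateral filter: filter rows, then columns."""
    rows = np.array([bilateral_1d(r, **kw) for r in image])
    return np.array([bilateral_1d(c, **kw) for c in rows.T]).T
```

The separable pass only approximates the full 2-D bilateral filter, but it cuts the cost per pixel from O(r²) to O(r) kernel evaluations, which is what makes separable implementations attractive for speed.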

Pham, Tuan Q. — Delft University of Technology


Automated Face Recognition from Low-resolution Imagery

Recently, significant advances in the field of automated face recognition have been achieved using computer vision, machine learning, and deep learning methodologies. However, despite claims of super-human performance of face recognition algorithms on select key benchmark tasks, several open problems still preclude the general replacement of human face recognition work with automated systems. State-of-the-art automated face recognition systems based on deep learning methods achieve high accuracy when the face images to be recognized are of sufficiently high quality. However, low image resolution remains one of the principal obstacles to face recognition systems, and their performance in the low-resolution regime is decidedly below human capabilities. In this PhD thesis, we present a systematic study of modern automated face recognition systems in the presence of image degradation in various forms. Based on our ...

Grm, Klemen — University of Ljubljana


Large-Scale Light Field Capture and Reconstruction

This thesis discusses approaches and techniques to convert Sparsely-Sampled Light Fields (SSLFs) into Densely-Sampled Light Fields (DSLFs), which can be used for visualization on 3DTV and Virtual Reality (VR) devices. As an example, a movable 1D large-scale light field acquisition system for capturing SSLFs in real-world environments is evaluated. This system consists of 24 sparsely placed RGB cameras and two Kinect V2 sensors. The real-world SSLF data captured with this setup can be leveraged to reconstruct real-world DSLFs. To this end, three challenging problems need to be solved for this system: (i) how to estimate the rigid transformation from the coordinate system of a Kinect V2 to the coordinate system of an RGB camera; (ii) how to register the two Kinect V2 sensors with a large displacement; (iii) how to reconstruct a DSLF from an SSLF with moderate and large disparity ranges. ...

Gao, Yuan — Department of Computer Science, Kiel University


Natural-Scene Text Understanding

Whether in color camera-based images or in low-resolution thumbnails, inherent degradations, such as complex backgrounds, artistic fonts, uneven lighting or unsatisfactory resolution, must be taken into account. To circumvent or correct them, we study image formation and degradation sources, which leads us beyond overly constrained definitions of color spaces. Hence, selective metric text extraction attempts to combine magnitude and directional processing of colors in an unsupervised framework. Text extraction from the background is closely linked to the subsequent steps of character segmentation and recognition. This intermingled chain mainly aims at combining color, intensity and spatial information of pixels for robustness and accuracy. Each of these features addresses different issues: the first one text extraction, and the latter two the recovery of the initial separation between characters through log-Gabor filtering. In order to reach higher-quality results, pre- and ...

Mancas-Thillou, Céline — Université de Mons


Combining anatomical and spectral information to enhance MRSI resolution and quantification: Application to Multiple Sclerosis

Multiple sclerosis is a progressive autoimmune disease that affects young adults. Magnetic resonance (MR) imaging has become an integral part of monitoring multiple sclerosis. Conventional MR imaging sequences such as fluid attenuated inversion recovery imaging have high spatial resolution, and can visualise the presence of focal white matter brain lesions in multiple sclerosis. Manual delineation of these lesions on conventional MR images is time-consuming and suffers from intra- and inter-rater variability. Among the advanced MR imaging techniques, MR spectroscopic imaging can offer complementary information on lesion characterisation compared to conventional MR images. However, MR spectroscopic images have low spatial resolution. Therefore, the aim of this thesis is to automatically segment multiple sclerosis lesions on conventional MR images and use the information from high-resolution conventional MR images to enhance the resolution of MR spectroscopic images. Automatic single time ...

Jain, Saurabh — KU Leuven


Bayesian Fusion of Multi-band Images: A Powerful Tool for Super-resolution

Hyperspectral (HS) imaging, which consists of acquiring the same scene in several hundreds of contiguous spectral bands (a three-dimensional data cube), has opened a new range of relevant applications, such as target detection [MS02], classification [C.-03] and spectral unmixing [BDPD+12]. However, while HS sensors provide abundant spectral information, their spatial resolution is generally more limited. Thus, fusing the HS image with other highly resolved images of the same scene, such as multispectral (MS) or panchromatic (PAN) images, is an interesting problem. The problem of fusing a high spectral and low spatial resolution image with an auxiliary image of higher spatial but lower spectral resolution, also known as multi-resolution image fusion, has been explored for many years [AMV+11]. From an application point of view, this problem is also important, as motivated by recent national programs, e.g., the Japanese next-generation space-borne ...

Wei, Qi — University of Toulouse


Single-pixel imaging: development and applications of adaptive methods

Single-pixel imaging is a recent paradigm that allows the acquisition of images at reasonably low cost by exploiting hardware compression of the data. The architecture of a single-pixel camera consists of only two elements: a spatial light modulator and a single-point detector. The key idea is to measure, at the detector, the projection (i.e., the inner product) of the scene under view (the image) onto some patterns. Post-processing of a sequence of measurements obtained with different patterns allows restoration of the desired image. Single-pixel imaging has several advantages, which are of interest for different applications, especially in the biomedical field. In particular, a time-resolved single-pixel imaging system benefits fluorescence lifetime sensing. Such a set-up can be coupled to a spectrometer to supplement the lifetime with spectral information. However, the main limitation of single-pixel imaging is the speed ...
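The measurement model described above (inner products of the scene with a sequence of patterns, then post-processing to restore the image) can be illustrated with a minimal sketch. The choice of Hadamard patterns and the 8×8 scene size are illustrative assumptions, not details from the thesis:

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

# Hypothetical 8x8 scene, flattened to a vector of N = 64 pixels.
rng = np.random.default_rng(0)
scene = rng.random(64)

P = hadamard(64)                       # one modulator pattern per row
measurements = P @ scene               # each entry = <pattern, scene> at the detector
recovered = (P.T @ measurements) / 64  # orthogonality: P.T @ P = 64 * I
```

With a full orthogonal pattern set the restoration is a single transpose multiply; hardware compression comes from acquiring only a subset of the patterns and solving the resulting underdetermined problem, which is where the adaptive methods of the thesis come in.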

Rousset, Florian — University of Lyon - Politecnico di Milano


3D motion capture by computer vision and virtual rendering

Networked 3D virtual environments allow multiple users to interact with each other over the Internet. Users can share some sense of telepresence by remotely animating an avatar that represents them. However, avatar control may be tedious, and still renders user gestures poorly. This work aims at animating a user's avatar from real-time 3D motion capture by monoscopic computer vision, thus allowing virtual telepresence to anyone using a personal computer with a webcam. The approach followed consists of registering a 3D articulated upper-body model to a video sequence. This involves searching iteratively for the best match between features extracted from the 3D model and from the image. A two-step registration process matches regions and then edges. The first contribution of this thesis is a method of allocating computing iterations under real-time constraints that achieves optimal robustness and accuracy. The major ...

Gomez Jauregui, David Antonio — Telecom SudParis


On some aspects of inverse problems in image processing

This work is concerned with two image-processing problems, image deconvolution with incomplete observations and data fusion of spectral images, and with some of the algorithms that are used to solve these and related problems. In image-deconvolution problems, the diagonalization of the blurring operator by means of the discrete Fourier transform usually yields very large speedups. When there are incomplete observations (e.g., in the case of unknown boundaries), standard deconvolution techniques normally involve non-diagonalizable operators, resulting in rather slow methods, or, otherwise, use inexact convolution models, resulting in the occurrence of artifacts in the enhanced images. We propose a new deconvolution framework for images with incomplete observations that allows one to work with diagonalizable convolution operators, and therefore is very fast. The framework is also an efficient, high-quality alternative to existing methods of dealing with the image boundaries, such as edge ...
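The DFT diagonalization referred to above can be sketched as a Tikhonov-regularized Fourier-domain solve. This is the standard circular-boundary baseline that the thesis improves upon for incomplete observations, not the authors' proposed framework:

```python
import numpy as np

def fft_deconvolve(blurred, psf, reg=1e-3):
    """Tikhonov-regularized deconvolution under circular boundary conditions.
    A circulant blur operator is diagonalized by the DFT, so the normal
    equations reduce to a pointwise division in the Fourier domain."""
    H = np.fft.fft2(psf, s=blurred.shape)        # transfer function
    G = np.fft.fft2(blurred)
    F = np.conj(H) * G / (np.abs(H) ** 2 + reg)  # (H*H + reg I)^-1 H* g
    return np.real(np.fft.ifft2(F))

# Demo: circularly blur a random image with a 3x3 box PSF, then invert.
rng = np.random.default_rng(1)
sharp = rng.random((16, 16))
psf = np.zeros((16, 16))
psf[:3, :3] = 1.0 / 9.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) * np.fft.fft2(psf)))
restored = fft_deconvolve(blurred, psf, reg=1e-8)
```

When the boundary pixels are not actually observed, this circular model is inexact and produces the boundary artifacts the abstract mentions; the thesis's contribution is to keep the diagonalizable (hence fast) operator while handling such incomplete observations correctly.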

Simões, Miguel — Universidade de Lisboa, Instituto Superior Técnico & Université Grenoble Alpes


Non-rigid Registration-based Data-driven 3D Facial Action Unit Detection

Automated analysis of facial expressions has been an active area of study due to its potential applications not only in intelligent human-computer interfaces but also in human facial behavior research. To advance automatic expression analysis, this thesis proposes and empirically supports two hypotheses: (i) 3D face data is a better data modality than conventional 2D camera images, not only because it is much less disturbed by illumination and head pose effects but also because it captures true facial surface information. (ii) It is possible to perform detailed face registration without resorting to any face modeling. This means that data-driven methods in automatic expression analysis can compensate for confounding effects such as pose and physiognomy differences, and can process facial features more effectively, without suffering the drawbacks of model-driven analysis. Our study is based upon the Facial Action Coding System (FACS), as this paradigm ...

Savran, Arman — Boğaziçi University


Parametric and non-parametric approaches for multisensor data fusion

Multisensor data fusion technology combines data and information from multiple sensors to achieve improved accuracies and better inference about the environment than could be achieved by the use of a single sensor alone. In this dissertation, we propose parametric and nonparametric multisensor data fusion algorithms with a broad range of applications. Image registration is a vital first step in fusing sensor data. Among the wide range of registration techniques that have been developed for various applications, mutual information based registration algorithms have been accepted as one of the most accurate and robust methods. Inspired by the mutual information based approaches, we propose to use the joint Rényi entropy as the dissimilarity metric between images. Since the Rényi entropy of an image can be estimated with the length of the minimum spanning tree over the corresponding graph, the proposed information-theoretic registration ...
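The minimum-spanning-tree idea above can be sketched as follows: corresponding pixel intensities of two images form 2-D joint samples, and a well-aligned pair has a concentrated joint distribution, which shows up as a short Euclidean MST. The helper names are hypothetical, and the normalization constants that turn MST length into a Rényi entropy estimate are omitted:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree over the
    points; up to known constants this estimates the Renyi entropy of the
    sample distribution."""
    return minimum_spanning_tree(squareform(pdist(points))).sum()

def joint_mst_dissimilarity(img_a, img_b):
    """Stack corresponding intensities as 2-D joint samples and measure
    their MST length: shorter = more concentrated = better aligned."""
    pts = np.column_stack([img_a.ravel(), img_b.ravel()])
    return mst_length(pts)

# Demo: a shifted copy decorrelates the joint samples and lengthens the MST.
rng = np.random.default_rng(2)
img = rng.random((8, 8))
aligned = joint_mst_dissimilarity(img, img)
misaligned = joint_mst_dissimilarity(img, np.roll(img, 3, axis=1))
```

A registration algorithm would minimize this dissimilarity over candidate transformations; the graph-based estimator avoids the explicit joint-histogram density estimation that plug-in mutual information approaches require.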

Ma, Bing — University of Michigan


Motion detection and human recognition in video sequences

This thesis is concerned with the design of a complete framework that allows the real-time recognition of humans in a video stream acquired by a static camera. For each stage of the processing chain, which takes as input the raw images of the stream and eventually outputs the identity of the persons, we propose an original algorithm. The first algorithm is a background subtraction technique named ViBe. The purpose of ViBe is to detect the parts of the images that contain moving objects. The second algorithm determines which moving objects correspond to individuals. The third algorithm allows the recognition of the detected individuals from their gait. Our background subtraction algorithm, ViBe, uses a collection of samples to model the history of each pixel. The current value of a pixel is classified by comparison with the closest samples that belong to ...
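The per-pixel sample model of ViBe can be sketched as below; the parameter values (20 samples, radius 20, 2 required matches, subsampling factor 16) are typical published ViBe settings, and the vectorized NumPy layout is an illustrative assumption:

```python
import numpy as np

N_SAMPLES, RADIUS, MIN_MATCHES, PHI = 20, 20, 2, 16  # typical ViBe settings

def classify(frame, samples):
    """Foreground mask: a pixel is background when at least MIN_MATCHES of
    its stored samples lie within RADIUS of the current value."""
    matches = np.sum(np.abs(samples - frame[..., None]) < RADIUS, axis=-1)
    return matches < MIN_MATCHES  # True = foreground

def update(frame, samples, foreground, rng):
    """Conservative, random-subsampled update: each background pixel, with
    probability 1/PHI, overwrites one of its own samples with its new value."""
    h, w, n = samples.shape
    lucky = (~foreground) & (rng.random((h, w)) < 1.0 / PHI)
    slot = rng.integers(0, n, size=(h, w))
    ys, xs = np.nonzero(lucky)
    samples[ys, xs, slot[ys, xs]] = frame[ys, xs]

# Demo: a static background model flags only the pixel that changed.
rng = np.random.default_rng(3)
background = np.full((4, 4), 100.0)
samples = background[..., None] + rng.normal(0.0, 2.0, (4, 4, N_SAMPLES))
frame = background.copy()
frame[0, 0] = 200.0  # a moving object covers this pixel
mask = classify(frame, samples)
update(frame, samples, mask, rng)
```

The random, conservative update (only background pixels refresh their own model) is what lets the sample history adapt to gradual illumination change without absorbing foreground objects.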

Barnich, Olivier — University of Liège


Camera based motion estimation and recognition for human-computer interaction

Communicating with mobile devices has become an unavoidable part of our daily life. Unfortunately, current user interface designs are mostly taken directly from desktop computers. This has resulted in devices that are sometimes hard to use. Since more processing power and new sensing technologies are already available, it has become possible to develop systems that communicate through different modalities. This thesis proposes some novel computer vision approaches, including head tracking, object motion analysis and device ego-motion estimation, to allow efficient interaction with mobile devices. For head tracking, two new methods have been developed. The first method detects a face region and facial features by employing skin detection, morphology, and a geometrical face model. The second method, designed especially for mobile use, detects the face and eyes using local texture features. In both cases, Kalman filtering is applied to estimate ...
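The Kalman filtering mentioned for both head-tracking methods can be illustrated with a generic constant-velocity filter over an observed 2-D feature position; the state layout and the noise covariances are textbook assumptions, not values from the thesis:

```python
import numpy as np

# State: (x, y, vx, vy); only the (x, y) feature position is observed.
dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)  # constant-velocity transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # measurement picks out position
Q = 0.01 * np.eye(4)                       # process noise (assumed value)
R = np.eye(2)                              # measurement noise (assumed value)

def kalman_step(x, P, z):
    """One predict/update cycle for a new position measurement z."""
    x, P = F @ x, F @ P @ F.T + Q       # predict
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (z - H @ x)             # correct with the measurement
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Demo: track a point moving at a constant (2, 1) pixels per frame.
x, P = np.zeros(4), 100.0 * np.eye(4)
for t in range(1, 30):
    x, P = kalman_step(x, P, np.array([2.0 * t, 1.0 * t]))
```

Even though only positions are measured, the filter recovers the velocity through the cross-covariance built up by the transition model, which is what makes the predicted position useful for limiting the search region in the next frame.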

Hannuksela, Jari — University of Oulu


Distributed Compressed Representation of Correlated Image Sets

Vision sensor networks and video cameras find widespread usage in several applications that rely on effective representation of scenes or analysis of 3D information. These systems usually acquire multiple images of the same 3D scene from different viewpoints or at different time instants. Therefore, these images are generally correlated through the displacement of scene objects. Compression techniques have to exploit this correlation in order to communicate the 3D scene information efficiently. Instead of joint encoding, which requires communication between the cameras, in this thesis we concentrate on distributed representation, where the captured images are encoded independently but decoded jointly to exploit the correlation between images. One of the most important and challenging tasks lies in estimating the underlying correlation from the compressed correlated images for effective reconstruction or analysis in the joint decoder. This thesis focuses on developing efficient ...

Thirumalai, Vijayaraghavan — EPFL, Switzerland


Video person recognition strategies using head motion and facial appearance

In this doctoral dissertation, we principally explore the use of the temporal information available in video sequences for person and gender recognition; in particular, we focus on the analysis of head and facial motion and their potential application as biometric identifiers. We also investigate how to exploit as much video information as possible for automatic recognition; more precisely, we examine the possibility of integrating head and mouth motion information with facial appearance into a multimodal biometric system, and we study the extraction of novel spatio-temporal facial features for recognition. We initially present a person recognition system that exploits unconstrained head motion information, extracted by tracking a few facial landmarks in the image plane. In particular, we detail how each video sequence is first pre-processed by semi-automatically detecting the face, and then automatically tracking the facial landmarks over ...

Matta, Federico — Eurécom / Multimedia communications
