Super-Resolution Image Reconstruction Using Non-Linear Filtering Techniques

Super-resolution (SR) is a filtering technique that combines a sequence of under-sampled and degraded low-resolution images to produce an image at a higher resolution. The reconstruction takes advantage of the additional spatio-temporal data available in the sequence of images portraying the same scene. The fundamental problem addressed in super-resolution is a typical example of an inverse problem, wherein multiple low-resolution (LR)images are used to solve for the original high-resolution (HR) image. Super-resolution has already proved useful in many practical cases where multiple frames of the same scene can be obtained, including medical applications, satellite imaging and astronomical observatories. The application of super resolution filtering in consumer cameras and mobile devices shall be possible in the future, especially that the computational and memory resources in these devices are increasing all the time. For that goal, several research problems need to be ...

Trimeche, Mejdi — Tampere University of Technology


Deep learning for semantic description of visual human traits

The recent progress in artificial neural networks (rebranded as “deep learning”) has significantly boosted the state-of-the-art in numerous domains of computer vision offering an opportunity to approach the problems which were hardly solvable with conventional machine learning. Thus, in the frame of this PhD study, we explore how deep learning techniques can help in the analysis of one the most basic and essential semantic traits revealed by a human face, namely, gender and age. In particular, two complementary problem settings are considered: (1) gender/age prediction from given face images, and (2) synthesis and editing of human faces with the required gender/age attributes. Convolutional Neural Network (CNN) has currently become a standard model for image-based object recognition in general, and therefore, is a natural choice for addressing the first of these two problems. However, our preliminary studies have shown that the ...

Antipov, Grigory — Télécom ParisTech (Eurecom)


General Approaches for Solving Inverse Problems with Arbitrary Signal Models

Ill-posed inverse problems appear in many signal and image processing applications, such as deblurring, super-resolution and compressed sensing. The common approach to address them is to design a specific algorithm, or recently, a specific deep neural network, for each problem. Both signal processing and machine learning tactics have drawbacks: traditional reconstruction strategies exhibit limited performance for complex signals, such as natural images, due to the hardness of their mathematical modeling; while modern works that circumvent signal modeling by training deep convolutional neural networks (CNNs) suffer from a huge performance drop when the observation model used in training is inexact. In this work, we develop and analyze reconstruction algorithms that are not restricted to a specific signal model and are able to handle different observation models. Our main contributions include: (a) We generalize the popular sparsity-based CoSaMP algorithm to any signal ...

Tirer, Tom — Tel Aviv University


Spatiotonal Adaptivity in Super-Resolution of under-sampled Image Sequences

This thesis concerns the use of spatial and tonal adaptivity in improving the resolution of aliased image sequences under scene or camera motion. Each of the five content chapters focuses on a different subtopic of super-resolution: image registration (chapter 2), image fusion (chapter 3 and 4), super-resolution restoration (chapter 5), and super-resolution synthesis (chapter 6). Chapter 2 derives the Cramer-Rao lower bound of image registration and shows that iterative gradient-based estimators achieve this performance limit. Chapter 3 presents an algorithm for image fusion of irregularly sampled and uncertain data using robust normalized convolution. The size and shape of the fusion kernel is adapted to local curvilinear structures in the image. Each data sample is assigned an intensity-related certainty value to limit the influence of outliers. Chapter 4 presents two fast implementations of the signal-adaptive bilateral filter. The xy-separable implementation filters ...

Pham, Tuan Q. — Delft University of Technology


Non-rigid Registration-based Data-driven 3D Facial Action Unit Detection

Automated analysis of facial expressions has been an active area of study due to its potential applications not only for intelligent human-computer interfaces but also for human facial behavior research. To advance automatic expression analysis, this thesis proposes and empirically proves two hypotheses: (i) 3D face data is a better data modality than conventional 2D camera images, not only for being much less disturbed by illumination and head pose effects but also for capturing true facial surface information. (ii) It is possible to perform detailed face registration without resorting to any face modeling. This means that data-driven methods in automatic expression analysis can compensate for the confounding effects like pose and physiognomy differences, and can process facial features more effectively, without suffering the drawbacks of model-driven analysis. Our study is based upon Facial Action Coding System (FACS) as this paradigm ...

Savran, Arman — Bogazici University


Combining anatomical and spectral information to enhance MRSI resolution and quantification: Application to Multiple Sclerosis

Multiple sclerosis is a progressive autoimmune disease that a˙ects young adults. Magnetic resonance (MR) imaging has become an integral part in monitoring multiple sclerosis disease. Conventional MR imaging sequences such as fluid attenuated inversion recovery imaging have high spatial resolution, and can visualise the presence of focal white matter brain lesions in multiple sclerosis disease. Manual delineation of these lesions on conventional MR images is time consuming and su˙ers from intra and inter-rater variability. Among the advanced MR imaging techniques, MR spectroscopic imaging can o˙er complementary information on lesion characterisation compared to conventional MR images. However, MR spectroscopic images have low spatial resolution. Therefore, the aim of this thesis is to automatically segment multiple sclerosis lesions on conventional MR images and use the information from high-resolution conventional MR images to enhance the resolution of MR spectroscopic images. Automatic single time ...

Jain, Saurabh — KU Leuven


Visual ear detection and recognition in unconstrained environments

Automatic ear recognition systems have seen increased interest over recent years due to multiple desirable characteristics. Ear images used in such systems can typically be extracted from profile head shots or video footage. The acquisition procedure is contactless and non-intrusive, and it also does not depend on the cooperation of the subjects. In this regard, ear recognition technology shares similarities with other image-based biometric modalities. Another appealing property of ear biometrics is its distinctiveness. Recent studies even empirically validated existing conjectures that certain features of the ear are distinct for identical twins. This fact has significant implications for security-related applications and puts ear images on a par with epigenetic biometric modalities, such as the iris. Ear images can also supplement other biometric modalities in automatic recognition systems and provide identity cues when other information is unreliable or even unavailable. In ...

Emeršič, Žiga — University of Ljubljana, Faculty of Computer and Information Science


Large-Scale Light Field Capture and Reconstruction

This thesis discusses approaches and techniques to convert Sparsely-Sampled Light Fields (SSLFs) into Densely-Sampled Light Fields (DSLFs), which can be used for visualization on 3DTV and Virtual Reality (VR) devices. Exemplarily, a movable 1D large-scale light field acquisition system for capturing SSLFs in real-world environments is evaluated. This system consists of 24 sparsely placed RGB cameras and two Kinect V2 sensors. The real-world SSLF data captured with this setup can be leveraged to reconstruct real-world DSLFs. To this end, three challenging problems require to be solved for this system: (i) how to estimate the rigid transformation from the coordinate system of a Kinect V2 to the coordinate system of an RGB camera; (ii) how to register the two Kinect V2 sensors with a large displacement; (iii) how to reconstruct a DSLF from a SSLF with moderate and large disparity ranges. ...

Gao, Yuan — Department of Computer Science, Kiel University


Contributions to Human Motion Modeling and Recognition using Non-intrusive Wearable Sensors

This thesis contributes to motion characterization through inertial and physiological signals captured by wearable devices and analyzed using signal processing and deep learning techniques. This research leverages the possibilities of motion analysis for three main applications: to know what physical activity a person is performing (Human Activity Recognition), to identify who is performing that motion (user identification) or know how the movement is being performed (motor anomaly detection). Most previous research has addressed human motion modeling using invasive sensors in contact with the user or intrusive sensors that modify the user’s behavior while performing an action (cameras or microphones). In this sense, wearable devices such as smartphones and smartwatches can collect motion signals from users during their daily lives in a less invasive or intrusive way. Recently, there has been an exponential increase in research focused on inertial-signal processing to ...

Gil-Martín, Manuel — Universidad Politécnica de Madrid


Predictive modelling and deep learning for quantifying human health

Machine learning and deep learning techniques have emerged as powerful tools for addressing complex challenges across diverse domains. These methodologies are powerful because they extract patterns and insights from large and complex datasets, automate decision-making processes, and continuously improve over time. They enable us to observe and quantify patterns in data that a normal human would not be able to capture, leading to deeper insights and more accurate predictions. This dissertation presents two research papers that leverage these methodologies to tackle distinct yet interconnected problems in neuroimaging and computer vision for the quantification of human health. The first investigation, "Age prediction using resting-state functional MRI," addresses the challenge of understanding brain aging. By employing the Least Absolute Shrinkage and Selection Operator (LASSO) on resting-state functional MRI (rsfMRI) data, we identify the most predictive correlations related to brain age. Our study, ...

Chang Jose — National Cheng Kung University


Deep Learning Techniques for Visual Counting

The explosion of Deep Learning (DL) added a boost to the already rapidly developing field of Computer Vision to such a point that vision-based tasks are now parts of our everyday lives. Applications such as image classification, photo stylization, or face recognition are nowadays pervasive, as evidenced by the advent of modern systems trivially integrated into mobile applications. In this thesis, we investigated and enhanced the visual counting task, which automatically estimates the number of objects in still images or video frames. Recently, due to the growing interest in it, several Convolutional Neural Network (CNN)-based solutions have been suggested by the scientific community. These artificial neural networks, inspired by the organization of the animal visual cortex, provide a way to automatically learn effective representations from raw visual data and can be successfully employed to address typical challenges characterizing this task, ...

Ciampi Luca — University of Pisa


Biologically Inspired 3D Face Recognition

Face recognition has been an active area of study for both computer vision and image processing communities, not only for biometrics but also for human-computer interaction applications. The purpose of the present work is to evaluate the existing 3D face recognition techniques and seek biologically motivated methods to improve them. We especially look at findings in psychophysics and cognitive science for insights. We propose a biologically motivated computational model, and focus on the earlier stages of the model, whose performance is critical for the later stages. Our emphasis is on automatic localization of facial features. We first propose a strong unsupervised learning algorithm for flexible and automatic training of Gaussian mixture models and use it in a novel feature-based algorithm for facial fiducial point localization. We also propose a novel structural correction algorithm to evaluate the quality of landmarking and ...

Salah, Albert Ali — Bogazici University


Guitar Tablature Generation with Deep Learning

The burgeoning of deep learning-based music generation has overlooked the potential of symbolic representations tailored for fretted instruments. Guitar tablatures offer an advantageous approach to represent prescriptive information about music performance, often missing from standard MIDI representations. This dissertation tackles a gap in symbolic music generation by developing models that predict both musical structures and expressive guitar performance techniques. We first present DadaGP, a dataset comprising over 25k songs converted from the Guitar Pro tablature format to a dedicated token format suiting sequence models such as the Transformer. To establish a benchmark, we first introduce a baseline unconditional model for guitar tablature generation, by training a Transformer-XL architecture on the DadaGP dataset. We explored various architecture configurations and experimented with two different tokenisation approaches. Delving into controllability of the generative process, we introduce methods for manipulating the output's instrumentation (inst-CTRL) ...

Sarmento, Pedro — Queen Mary University of London


Good Features to Correlate for Visual Tracking

Estimating object motion is one of the key components of video processing and the first step in applications which require video representation. Visual object tracking is one way of extracting this component, and it is one of the major problems in the field of computer vision. Numerous discriminative and generative machine learning approaches have been employed to solve this problem. Recently, correlation filter based (CFB) approaches have been popular due to their computational efficiency and notable performances on benchmark datasets. The ultimate goal of CFB approaches is to find a filter (i.e., template) which can produce high correlation outputs around the actual object location and low correlation outputs around the locations that are far from the object. Nevertheless, CFB visual tracking methods suffer from many challenges, such as occlusion, abrupt appearance changes, fast motion and object deformation. The main reasons ...

Gundogdu, Erhan — Middle East Technical University


Light Field Based Biometric Recognition and Presentation Attack Detection

In a world where security issues have been gaining explosive importance, face and ear recognition systems have attracted increasing attention in multiple application areas, ranging from forensics and surveillance to commerce and entertainment. While the recognition performance has been steadily improving, there are still challenging recognition scenarios and conditions, notably when facing large variations in the biometric data characteristics. Additionally, the widespread use of face and ear recognition solutions raises new security concerns, making the robustness against presentation attacks a very active field of research. Lenslet light field cameras have recently come into prominence as they are able to also capture the intensity of the light rays coming from multiple directions, thus offering a richer representation of the visual scene, notably spatio-angular information. To take benefit of this richer representation, light field cameras have recently been successfully applied, not only ...

Alireza Sepas-Moghaddam — Instituto Superior Técnico, University of Lisbon

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.