Light Field Processing for Immersive Systems (2025)
Large-Scale Light Field Capture and Reconstruction
This thesis discusses approaches and techniques to convert Sparsely-Sampled Light Fields (SSLFs) into Densely-Sampled Light Fields (DSLFs), which can be used for visualization on 3DTV and Virtual Reality (VR) devices. As an example, a movable 1D large-scale light field acquisition system for capturing SSLFs in real-world environments is evaluated. This system consists of 24 sparsely placed RGB cameras and two Kinect V2 sensors. The real-world SSLF data captured with this setup can be leveraged to reconstruct real-world DSLFs. To this end, three challenging problems must be solved for this system: (i) how to estimate the rigid transformation from the coordinate system of a Kinect V2 to the coordinate system of an RGB camera; (ii) how to register the two Kinect V2 sensors with a large displacement; (iii) how to reconstruct a DSLF from an SSLF with moderate and large disparity ranges. ...
Gao, Yuan — Department of Computer Science, Kiel University
Light Field Based Biometric Recognition and Presentation Attack Detection
In a world where security issues have been gaining explosive importance, face and ear recognition systems have attracted increasing attention in multiple application areas, ranging from forensics and surveillance to commerce and entertainment. While the recognition performance has been steadily improving, there are still challenging recognition scenarios and conditions, notably when facing large variations in the biometric data characteristics. Additionally, the widespread use of face and ear recognition solutions raises new security concerns, making the robustness against presentation attacks a very active field of research. Lenslet light field cameras have recently come into prominence as they are also able to capture the intensity of the light rays coming from multiple directions, thus offering a richer representation of the visual scene, notably spatio-angular information. To benefit from this richer representation, light field cameras have recently been successfully applied, not only ...
Alireza Sepas-Moghaddam — Instituto Superior Técnico, University of Lisbon
Single-pixel imaging: development and applications of adaptive methods
Single-pixel imaging is a recent paradigm that allows the acquisition of images at reasonably low cost by exploiting hardware compression of the data. The architecture of a single-pixel camera consists of only two elements: a spatial light modulator and a single-point detector. The key idea is to measure the projection at the detector (i.e., the inner product) of the scene under view (the image) with some patterns. Post-processing a sequence of measurements obtained with different patterns allows the desired image to be restored. Single-pixel imaging has several advantages, which are of interest for different applications, especially in the biomedical field. In particular, a time-resolved single-pixel imaging system benefits fluorescence lifetime sensing. Such a set-up can be coupled to a spectrometer, to supplement the lifetime with spectral information. However, the main limitation of single-pixel imaging is the speed ...
Rousset, Florian — University of Lyon - Politecnico di Milan
Mixed structural models for 3D audio in virtual environments
In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...
Geronazzo, Michele — University of Padova
Real Time Stereo to Multi-view Video Conversion
A novel and efficient methodology is presented for the conversion of stereo to multi-view video in order to address the 3D content requirements for the next generation 3D-TVs and auto-stereoscopic multi-view displays. There are two main algorithmic blocks in such a conversion system: stereo matching and virtual view rendering, which enable extraction of 3D information from stereo video and synthesis of nonexistent virtual views, respectively. In the intermediate steps of these functional blocks, a novel edge-preserving filter is proposed that recursively constructs connected support regions for each pixel among color-wise similar neighboring pixels. The proposed recursive update structure eliminates the pre-defined window dependency of the conventional approaches, providing complete content adaptability with quite low computational complexity. Based on extensive tests, it is observed that the proposed filtering technique yields better or competitive results against some leading techniques in the literature. The ...
Cigla, Cevahir — Middle East Technical University
Planar 3D Scene Representations for Depth Compression
The recent invasion of stereoscopic 3D television technologies is expected to be followed by autostereoscopic and holographic technologies. Glasses-free multiple stereoscopic pair displaying capabilities of these technologies will advance the 3D experience. The prospective 3D format to create the multiple views for such displays is the Multiview Video plus Depth (MVD) format based on Depth Image Based Rendering (DIBR) techniques. The depth modality of the MVD format is an active research area whose main objective is to develop DIBR-friendly, efficient compression methods. As part of this research, the thesis proposes novel 3D planar-based depth representations. The planar approximation of the stereo depth images is formulated as an energy-based co-segmentation problem by a Markov Random Field model. The energy terms of this problem are designed to mimic the rate-distortion tradeoff for a depth compression application. A heuristic algorithm is developed ...
Özkalaycı, Burak Oğuz — Middle East Technical University
Gliomas represent about 80% of all malignant primary brain tumors. Despite recent advancements in glioma research, patient outcome remains poor. The 5-year survival rate of the most common and most malignant subtype, i.e. glioblastoma, is about 5%. Magnetic resonance imaging (MRI) has become the imaging modality of choice in the management of brain tumor patients. Conventional MRI (cMRI) provides excellent soft tissue contrast without exposing the patient to potentially harmful ionizing radiation. Over the past decade, advanced MRI modalities, such as perfusion-weighted imaging (PWI), diffusion-weighted imaging (DWI) and magnetic resonance spectroscopic imaging (MRSI) have gained interest in the clinical field, and their added value regarding brain tumor diagnosis, treatment planning and follow-up has been recognized. Tumor segmentation involves the imaging-based delineation of a tumor and its subcompartments. In gliomas, segmentation plays an important role in treatment planning as well ...
Sauwen, Nicolas — KU Leuven
The spectral signatures of the materials contained in hyperspectral images, also called endmembers (EMs), can be significantly affected by variations in atmospheric, illumination or environmental conditions typically occurring within an image. Traditional spectral unmixing (SU) algorithms neglect the spectral variability of the endmembers, which propagates significant mismodeling errors throughout the whole unmixing process and compromises the quality of the estimated abundances. Therefore, significant effort has recently been dedicated to mitigating the effects of spectral variability in SU. However, many challenges still remain in how to best explore a priori information about the problem in order to improve the quality, the robustness and the efficiency of SU algorithms that account for spectral variability. In this thesis, new strategies are developed to address spectral variability in SU. First, an (over)-segmentation-based multiscale regularization strategy is proposed to explore spatial information about the abundance ...
Borsoi, Ricardo Augusto — Université Côte d'Azur; Federal University of Santa Catarina
Radial Basis Function Network Robust Learning Algorithms in Computer Vision Applications
This thesis introduces new learning algorithms for Radial Basis Function (RBF) networks. An RBF network is a feed-forward two-layer neural network used for functional approximation or pattern classification applications. The proposed training algorithms are based on robust statistics. Their theoretical performance has been assessed and compared with that of classical algorithms for training RBF networks. The applications of RBF networks described in this thesis consist of simultaneously modeling moving object segmentation and optical flow estimation in image sequences and 3-D image modeling and segmentation. A Bayesian classifier model is used for the representation of the image sequence and 3-D images. This employs an energy based description of the probability functions involved. The energy functions are represented by RBF networks whose inputs are various features drawn from the images and whose outputs are objects. The hidden units embed kernel functions. Each kernel ...
Bors, Adrian G. — Aristotle University of Thessaloniki
Stereoscopic depth map estimation and coding techniques for multiview video systems
The dissertation deals with the problems of stereoscopic depth estimation and coding in multiview video systems, which are vital for the development of next generation three-dimensional television. The depth estimation algorithms known from the literature, along with their theoretical foundations, are discussed. The problem of estimating depth maps with high quality, expressed by means of accuracy, precision and temporal consistency, has been stated. Next, original solutions have been proposed. The author has proposed a novel, theoretically founded approach to depth estimation which employs the Maximum A posteriori Probability (MAP) rule for modeling the cost function used in optimization algorithms. The proposal has been presented along with a method for estimating the parameters of such a model. In order to attain that, an analysis of the noise existing in multiview video and a study of inter-view correlation of corresponding samples of pictures have been ...
Stankiewicz, Olgierd — Poznan University of Technology
Point Cloud Quality Assessment
Nowadays, richer 3D visual representation formats are emerging, notably light fields and point clouds. These formats enable new applications in many usage domains, notably virtual and augmented reality, geographical information systems, immersive communications, and cultural heritage. Recently, following major improvements in 3D visual data acquisition, there is an increasing interest in point-based visual representation, which models real-world objects as a cloud of sampled points on their surfaces. A point cloud is a 3D representation model where the real visual world is represented by a set of 3D coordinates (the geometry) over the objects with some additional attributes such as color and normals. With the advances in 3D acquisition systems, it is now possible to capture a realistic point cloud to represent a visual scene with a very high resolution. These point clouds may have up to billions of points and, thus, ...
Javaheri, Alireza — Instituto Superior Técnico - University of Lisbon
Vision models and quality metrics for image processing applications
Optimizing the performance of digital imaging systems with respect to the capture, display, storage and transmission of visual information represents one of the biggest challenges in the field of image and video processing. Taking into account the way humans perceive visual information can be greatly beneficial for this task. To achieve this, it is necessary to understand and model the human visual system, which is also the principal goal of this thesis. Computational models for different aspects of the visual system are developed, which can be used in a wide variety of image and video processing applications. The proposed models and metrics are shown to be consistent with human perception. The focus of this work is visual quality assessment. A perceptual distortion metric (PDM) for the evaluation of video quality is presented. It is based on a model of the ...
Winkler, Stefan — Swiss Federal Institute of Technology
Digital Audio Processing Methods for Voice Pathology Detection
Voice pathology is a diverse field that includes various disorders affecting vocal quality and production. Using audio machine learning for voice pathology classification represents an innovative approach to diagnosing a wide range of voice disorders. Despite extensive research in this area, there remains a significant gap in the development of classifiers and their ability to adapt and generalize effectively. This thesis aims to address this gap by contributing new insights and methods. This research provides a comprehensive exploration of automatic voice pathology classification, focusing on challenges such as data limitations and the potential of integrating multiple modalities to enhance diagnostic accuracy and adaptability. To achieve generalization capabilities and enhance the flexibility of the classifier across diverse types of voice disorders, this research explores various datasets and pathology types comprehensively. It covers a broad range of voice disorders, including functional dysphonia, ...
Ioanna Miliaresi — University of Piraeus
Cognitive Models for Acoustic and Audiovisual Sound Source Localization
Sound source localization algorithms have a long research history in the field of digital signal processing. Many common applications like intelligent personal assistants, teleconferencing systems and methods for technical diagnosis in acoustics require an accurate localization of sound sources in the environment. However, dynamic environments entail a particular challenge for these systems. For instance, voice controlled smart home applications, where the speaker, as well as potential noise sources, are moving within the room, are a typical example of dynamic environments. Classical sound source localization systems only have limited capabilities to deal with dynamic acoustic scenarios. In this thesis, three novel approaches to sound source localization that extend existing classical methods will be presented. The first system is proposed in the context of audiovisual source localization. Determining the position of sound sources in adverse acoustic conditions can be improved by including ...
Schymura, Christopher — Ruhr University Bochum
On-board Processing for an Infrared Observatory
During the past two decades, image compression has developed from a mostly academic Rate-Distortion (R-D) field, into a highly commercial business. Various lossless and lossy image coding techniques have been developed. This thesis represents an interdisciplinary work between the field of astronomy and digital image processing and brings new aspects into both of the fields. In fact, image compression had its beginning in an American space program for efficient data storage. The goal of this research work is to recognize and develop new methods for space observatories and software tools to incorporate compression in space astronomy standards. While the astronomers benefit from new objective processing and analysis methods and improved efficiency and quality, for technicians a new field of application and research is opened. For validation of the processing results, the case of InfraRed (IR) astronomy has been specifically analyzed. ...
Belbachir, Ahmed Nabil — Vienna University of Technology