Planar 3D Scene Representations for Depth Compression

The recent invasion of stereoscopic 3D television technologies is expected to be followed by autostereoscopic and holographic technologies. The ability of these technologies to display multiple stereoscopic pairs without glasses will advance the 3D experience. The prospective 3D format to create the multiple views for such displays is the Multiview Video plus Depth (MVD) format, based on Depth Image Based Rendering (DIBR) techniques. The depth modality of the MVD format is an active research area whose main objective is to develop efficient, DIBR-friendly compression methods. As part of this research, the thesis proposes novel 3D planar-based depth representations. The planar approximation of the stereo depth images is formulated as an energy-based co-segmentation problem by a Markov Random Field model. The energy terms of this problem are designed to mimic the rate-distortion tradeoff for a depth compression application. A heuristic algorithm is developed ...

Özkalaycı, Burak Oğuz — Middle East Technical University


Toward sparse and geometry adapted video approximations

Video signals are sequences of natural images, where images are often modeled as piecewise-smooth signals. Hence, video can be seen as a 3D piecewise-smooth signal made of piecewise-smooth regions that move through time. Based on the piecewise-smooth model and on related theoretical work on rate-distortion performance of wavelet and oracle based coding schemes, one can better analyze the appropriate coding strategies that adaptive video codecs need to implement in order to be efficient. Efficient video representations for coding purposes require the use of adaptive signal decompositions able to capture appropriately the structure and redundancy appearing in video signals. Adaptivity needs to be such that it allows for proper modeling of signals in order to represent these with the lowest possible coding cost. Video is a very structured signal with high geometric content. This includes temporal geometry (normally represented by motion ...

Divorra Escoda, Oscar — EPFL / Signal Processing Institute


WATERMARKING FOR 3D REPRESENTATIONS

In this thesis, a number of novel watermarking techniques for different 3D representations are presented. A novel watermarking method is proposed for mono-view video, which might be interpreted as the basic implicit representation of 3D scenes. The proposed method solves the common flickering problem in existing video watermarking schemes by adjusting the watermark strength with respect to the temporal contrast thresholds of the human visual system (HVS), which define the maximum invisible distortions in the temporal direction. The experimental results indicate that the proposed method gives better results in both objective and subjective measures, compared to some recognized methods in the literature. The watermarking techniques for the geometry and image based representations of 3D scenes, denoted as 3D watermarking, are examined and classified into three groups, as 3D-3D, 3D-2D and 2D-2D watermarking, in which the pair of symbols ...

Koz, Alper — Middle East Technical University, Department of Electrical and Electronics Engineering


Distributed Compressed Representation of Correlated Image Sets

Vision sensor networks and video cameras find widespread usage in several applications that rely on effective representation of scenes or analysis of 3D information. These systems usually acquire multiple images of the same 3D scene from different viewpoints or at different time instants. Therefore, these images are generally correlated through displacement of scene objects. Compression techniques have to exploit this correlation in order to efficiently communicate the 3D scene information. Instead of joint encoding that requires communication between the cameras, in this thesis we concentrate on distributed representation, where the captured images are encoded independently, but decoded jointly to exploit the correlation between images. One of the most important and challenging tasks lies in estimating the underlying correlation from the compressed correlated images for effective reconstruction or analysis in the joint decoder. This thesis focuses on developing efficient ...

Thirumalai, Vijayaraghavan — EPFL, Switzerland


Dynamic Scheme Selection in Image Coding

This thesis deals with the coding of images with multiple coding schemes and their dynamic selection. In our society of information highways, electronic communication is taking a bigger place in our lives every day. The number of transmitted images is also increasing every day. Therefore, research on image compression is still an active area. However, the current trend is to add several functionalities to the compression scheme such as progressiveness for more comfortable browsing of web-sites or databases. Classical image coding schemes have a rigid structure. They usually process an image as a whole and treat the pixels as a simple signal with no particular characteristics. Second generation schemes use the concept of objects in an image, and introduce a model of the human visual system in the design of the coding scheme. Dynamic coding schemes, as their name tells us, make ...

Fleury, Pascal — Swiss Federal Institute of Technology


3D motion capture by computer vision and virtual rendering

Networked 3D virtual environments allow multiple users to interact with each other over the Internet. Users can share some sense of telepresence by remotely animating an avatar that represents them. However, avatar control may be tedious and still render user gestures poorly. This work aims at animating a user's avatar from real-time 3D motion capture by monoscopic computer vision, thus allowing virtual telepresence to anyone using a personal computer with a webcam. The approach followed consists of registering a 3D articulated upper-body model to a video sequence. This involves searching iteratively for the best match between features extracted from the 3D model and from the image. A two-step registration process matches regions and then edges. The first contribution of this thesis is a method of allocating computing iterations under real-time constraints that achieves optimal robustness and accuracy. The major ...

Gomez Jauregui, David Antonio — Telecom SudParis


Adaptive Nonlocal Signal Restoration and Enhancement Techniques for High-Dimensional Data

The large number of practical applications involving digital images has motivated a significant interest towards restoration solutions that improve the visual quality of the data under the presence of various acquisition and compression artifacts. Digital images are the results of an acquisition process based on the measurement of a physical quantity of interest incident upon an imaging sensor over a specified period of time. The quantity of interest depends on the targeted imaging application. Common imaging sensors measure the number of photons impinging over a dense grid of photodetectors in order to produce an image similar to what is perceived by the human visual system. Different applications focus on the part of the electromagnetic spectrum not visible by the human visual system, and thus require different sensing technologies to form the image. In all cases, even with the advance of ...

Maggioni, Matteo — Tampere University of Technology


Motion Analysis and Modeling for Activity Recognition and 3-D Animation based on Geometrical and Video Processing Algorithms

The analysis of audiovisual data aims at extracting high level information, equivalent to that which can be extracted by a human. It is considered a fundamental problem that is unsolved in its general form. Even though the inverse problem, audiovisual (sound and animation) synthesis, is judged easier than analysis, it also remains unsolved. The systematic research on these problems yields solutions that constitute the basis for a great number of continuously developing applications. In this thesis, we examine the two aforementioned fundamental problems. We propose algorithms and models of analysis and synthesis of articulated motion and undulatory (snake) locomotion, using data from video sequences. The goal of this research is multilevel information extraction from video, such as object tracking and activity recognition, and 3-D animation synthesis in virtual environments based on the results of the analysis. An ...

Panagiotakis, Costas — University of Crete


A flexible scalable video coding framework with adaptive spatio-temporal decompositions

The work presented in this thesis covers topics that extend the scalability functionalities in video coding and improve the compression performance. Two main novel approaches are presented, each targeting a different part of the scalable video coding (SVC) architecture: motion adaptive wavelet transform based on the wavelet transform in lifting implementation, and a design of a flexible framework for generalised spatio-temporal decomposition. Motion adaptive wavelet transform is based on the newly introduced concept of connectivity-map. The connectivity-map describes the underlying irregular structure of regularly sampled data. To enable a scalable representation of the connectivity-map, the corresponding analysis and synthesis operations have been derived. These are then employed to define a joint wavelet connectivity-map decomposition that serves as an adaptive alternative to the conventional wavelet decomposition. To demonstrate its applicability, the presented decomposition scheme is used in the proposed SVC framework, ...

Sprljan, Nikola — Queen Mary University of London


Techniques for improving the performance of distributed video coding

Distributed Video Coding (DVC) is a recently proposed paradigm in video communication, which is well suited to emerging applications such as wireless video surveillance, multimedia sensor networks, wireless PC cameras, and mobile camera phones. These applications require low complexity encoding, while possibly affording high complexity decoding. DVC presents several advantages: First, the complexity can be distributed between the encoder and the decoder. Second, DVC is robust to errors, since it uses a channel code. In DVC, Side Information (SI) is estimated at the decoder, using the available decoded frames, and used for the decoding and reconstruction of other frames. In this Ph.D. thesis, we propose new techniques in order to improve the quality of the SI. First, successive refinement of the SI is performed after each decoded DCT band, using a Partially Decoded WZF (PDWZF), along with the ...

Abou-Elailah, Abdalbassir — Telecom Paristech


Super-Resolution Image Reconstruction Using Non-Linear Filtering Techniques

Super-resolution (SR) is a filtering technique that combines a sequence of under-sampled and degraded low-resolution images to produce an image at a higher resolution. The reconstruction takes advantage of the additional spatio-temporal data available in the sequence of images portraying the same scene. The fundamental problem addressed in super-resolution is a typical example of an inverse problem, wherein multiple low-resolution (LR) images are used to solve for the original high-resolution (HR) image. Super-resolution has already proved useful in many practical cases where multiple frames of the same scene can be obtained, including medical applications, satellite imaging and astronomical observatories. The application of super-resolution filtering in consumer cameras and mobile devices should become possible in the future, especially as the computational and memory resources in these devices are increasing all the time. For that goal, several research problems need to be ...

Trimeche, Mejdi — Tampere University of Technology


Radial Basis Function Network Robust Learning Algorithms in Computer Vision Applications

This thesis introduces new learning algorithms for Radial Basis Function (RBF) networks. An RBF network is a feed-forward two-layer neural network used for functional approximation or pattern classification applications. The proposed training algorithms are based on robust statistics. Their theoretical performance has been assessed and compared with that of classical algorithms for training RBF networks. The applications of RBF networks described in this thesis consist of simultaneously modeling moving object segmentation and optical flow estimation in image sequences and 3-D image modeling and segmentation. A Bayesian classifier model is used for the representation of the image sequence and 3-D images. This employs an energy based description of the probability functions involved. The energy functions are represented by RBF networks whose inputs are various features drawn from the images and whose outputs are objects. The hidden units embed kernel functions. Each kernel ...

Bors, Adrian G. — Aristotle University of Thessaloniki


Video Sequence Analysis for Content Description, Summarization and Content-Based Retrieval

The main research area of this Ph.D. thesis is video sequence processing and analysis for description and indexing of visual content. Its objective is to contribute to the development of a computational system with the capabilities of object-based segmentation of audiovisual material, automatic content description, summarization for preview and browsing, as well as content-based retrieval. The thesis consists of four parts. The first introduces video sequence analysis, segmentation and object extraction based on color, motion, and depth field. A fusion technique is proposed that combines individual cue segmentations and allows for reliable identification of semantic objects. The second part refers to automatic description and annotation of the visual content by means of feature vectors, summarization, implemented by optimal selection of a limited set of key frames and shots, and content-based search and retrieval. In the third part, the problem of ...

Avrithis, Yannis — National Technical University of Athens


Motion Estimation and Compensation of Video Sequences using Affine Transforms

Motion estimation and compensation is of great importance for the compression of video sequences. In this dissertation a motion estimation/compensation approach based on a non-overlapping connected mesh of triangles is proposed. To manipulate the triangles within the connected mesh or ‘rubber sheet’ structure, affine transforms are used, which allow many different types of motion to be accurately modelled. Another advantage of this structure is that the non-overlapping triangles do not generate the typical artefacts associated with the current block based standards when operating at very low bitrates. The initial motion estimation/compensation algorithms investigated implement a full search method which updates one vertex at a time, matching sets of triangles between adjacent frames. Although the prediction performance is good, the resulting computational load is high. This issue is addressed by deriving gradient-based algorithms which are found to be between one ...

Bradshaw, David Benedict — University of Cambridge


Large-Scale Light Field Capture and Reconstruction

This thesis discusses approaches and techniques to convert Sparsely-Sampled Light Fields (SSLFs) into Densely-Sampled Light Fields (DSLFs), which can be used for visualization on 3DTV and Virtual Reality (VR) devices. As an example, a movable 1D large-scale light field acquisition system for capturing SSLFs in real-world environments is evaluated. This system consists of 24 sparsely placed RGB cameras and two Kinect V2 sensors. The real-world SSLF data captured with this setup can be leveraged to reconstruct real-world DSLFs. To this end, three challenging problems need to be solved for this system: (i) how to estimate the rigid transformation from the coordinate system of a Kinect V2 to the coordinate system of an RGB camera; (ii) how to register the two Kinect V2 sensors with a large displacement; (iii) how to reconstruct a DSLF from an SSLF with moderate and large disparity ranges. ...

Gao, Yuan — Department of Computer Science, Kiel University
