Real Time Stereo to Multi-view Video Conversion

A novel and efficient methodology is presented for the conversion of stereo to multi-view video in order to address the 3D content requirements of next-generation 3D-TVs and auto-stereoscopic multi-view displays. There are two main algorithmic blocks in such a conversion system: stereo matching and virtual view rendering, which enable extraction of 3D information from stereo video and synthesis of nonexistent virtual views, respectively. In the intermediate steps of these functional blocks, a novel edge-preserving filter is proposed that recursively constructs connected support regions for each pixel among color-wise similar neighboring pixels. The proposed recursive update structure eliminates the pre-defined window dependency of conventional approaches, providing complete content adaptability with quite low computational complexity. Based on extensive tests, it is observed that the proposed filtering technique yields better or competitive results against some leading techniques in the literature. The ...

Cigla, Cevahir — Middle East Technical University


A floating polygon soup representation for 3D video

This thesis presents a new representation called floating polygon soup for applications like 3DTV and FTV (Free Viewpoint Television). The polygon soup is designed for compactness, compression efficiency, and view synthesis quality. The polygons are stored in 2D, with depth values at each corner. They are not necessarily connected to each other and can be deformed (or floated) w.r.t. viewpoints and time. Starting from multi-view video plus depth (MVD), the construction proceeds in two steps: quadtree decomposition and multi-view redundancy reduction. It results in a compact set of polygons replacing the depth maps while preserving depth discontinuities and geometric details. Next, compression efficiency and view-synthesis quality are evaluated. Classical methods such as inpainting and post-processing are implemented and adapted to the polygon soup. A new compression method is proposed. It exploits the quadtree structure and uses spatial prediction. ...

Colleu, Thomas — INRIA Rennes Bretagne Atlantique / Orange Labs / IETR


WATERMARKING FOR 3D REPRESENTATIONS

In this thesis, a number of novel watermarking techniques for different 3D representations are presented. A novel watermarking method is proposed for mono-view video, which might be interpreted as the basic implicit representation of 3D scenes. The proposed method solves the common flickering problem in existing video watermarking schemes by adjusting the watermark strength with respect to the temporal contrast thresholds of the human visual system (HVS), which define the maximum invisible distortions in the temporal direction. The experimental results indicate that the proposed method gives better results in both objective and subjective measures, compared to some recognized methods in the literature. The watermarking techniques for the geometry and image based representations of 3D scenes, denoted as 3D watermarking, are examined and classified into three groups, as 3D-3D, 3D-2D and 2D-2D watermarking, in which the pair of symbols ...

Koz, Alper — Middle East Technical University, Department of Electrical and Electronics Engineering


Stereoscopic depth map estimation and coding techniques for multiview video systems

The dissertation deals with the problems of stereoscopic depth estimation and coding in multiview video systems, which are vital for the development of next-generation three-dimensional television. The depth estimation algorithms known from the literature, along with their theoretical foundations, are discussed. The problem of estimating high-quality depth maps, where quality is expressed in terms of accuracy, precision and temporal consistency, is stated. Next, original solutions are proposed. The author proposes a novel, theoretically founded approach to depth estimation which employs the Maximum A posteriori Probability (MAP) rule for modeling the cost function used in optimization algorithms. The proposal is presented along with a method for estimating the parameters of such a model. In order to attain that, an analysis of the noise existing in multiview video and a study of inter-view correlation of corresponding samples of pictures have been ...

Stankiewicz, Olgierd — Poznan University of Technology


Efficient representation, generation and compression of digital holograms

Digital holography is a discipline of science that measures or reconstructs the wavefield of light by means of interference. The wavefield encodes three-dimensional information, which has many applications, such as interferometry, microscopy, non-destructive testing and data storage. Moreover, digital holography is emerging as a display technology. Holograms can recreate the wavefield of a 3D object, thereby reproducing all depth cues for all viewpoints, unlike current stereoscopic 3D displays. At high quality, the appearance of an object on a holographic display system becomes indistinguishable from a real one. High-quality holograms need large volumes of data to be represented, approaching resolutions of billions of pixels. For holographic videos, the data rates needed for transmitting and encoding of the raw holograms quickly become unfeasible with currently available hardware. Efficient generation and coding of holograms will be of utmost importance for future holographic displays. ...

Blinder, David — Vrije Universiteit Brussel


Distributed Compressed Representation of Correlated Image Sets

Vision sensor networks and video cameras find widespread usage in several applications that rely on effective representation of scenes or analysis of 3D information. These systems usually acquire multiple images of the same 3D scene from different viewpoints or at different time instants. Therefore, these images are generally correlated through displacement of scene objects. Efficient compression techniques have to exploit this correlation in order to efficiently communicate the 3D scene information. Instead of joint encoding that requires communication between the cameras, in this thesis we concentrate on distributed representation, where the captured images are encoded independently, but decoded jointly to exploit the correlation between images. One of the most important and challenging tasks lies in the estimation of the underlying correlation from the compressed correlated images for effective reconstruction or analysis in the joint decoder. This thesis focuses on developing efficient ...

Thirumalai, Vijayaraghavan — EPFL, Switzerland


PRIORITIZED 3D SCENE RECONSTRUCTION AND RATE-DISTORTION

In this dissertation, a novel scheme performing 3D reconstruction of a scene from a 2D video sequence is presented. To this aim, first, the trajectories of the salient features in the scene are determined as a sequence of displacements via the Kanade-Lucas-Tomasi tracker and a Kalman filter. Then, a tentative camera trajectory with respect to a metric reference reconstruction is estimated. All frame pairs are ordered with respect to their amenability to 3D reconstruction by a metric that utilizes the baseline distances and the number of tracked correspondences between the frames. The ordered frame pairs are processed via a sequential structure-from-motion algorithm to estimate the sparse structure and camera matrices. The metric and the associated reconstruction algorithm are shown to outperform their counterparts in the literature via experiments. Finally, a mesh-based, rate-distortion efficient representation is constructed through a novel procedure ...

Imre, Evren — Middle East Technical University, Department of Electrical and Electronics Engineering


Mixed structural models for 3D audio in virtual environments

In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


Multiple Objective Optimization for Video Streaming

In this thesis, we propose Multiple Objective Optimization (MOO) frameworks for efficient video streaming. Firstly, we introduce pre-roll delay-distortion optimization (DDO) for uninterrupted content-adaptive video streaming over low capacity, constant bitrate (CBR) channels using MOO. Content analysis is used to divide the input video into shots with assigned relevance levels. The video is adaptively encoded and streamed, aiming at minimum pre-roll delay and distortion with the optimal spatial and temporal resolutions and quantization parameters for each shot. With buffer and distortion constraints, the bitrate of unimportant shots is reduced to achieve an acceptable quality in important shots. Secondly, we introduce a cross-layer optimized video rate adaptation and scheduling scheme to achieve maximum "application layer" Quality-of-Service (QoS), maximum video throughput (video seconds per transmission slot), and QoS fairness for wireless video streaming. Using the MOO framework, these objectives are jointly optimized such ...

Ozcelebi, Tanir — Koc University


3D motion capture by computer vision and virtual rendering

Networked 3D virtual environments allow multiple users to interact with each other over the Internet. Users can share some sense of telepresence by remotely animating an avatar that represents them. However, avatar control may be tedious and still render user gestures poorly. This work aims at animating a user's avatar from real-time 3D motion capture by monoscopic computer vision, thus allowing virtual telepresence to anyone using a personal computer with a webcam. The approach followed consists of registering a 3D articulated upper-body model to a video sequence. This involves searching iteratively for the best match between features extracted from the 3D model and from the image. A two-step registration process matches regions and then edges. The first contribution of this thesis is a method of allocating computing iterations under real-time constraints that achieves optimal robustness and accuracy. The major ...

Gomez Jauregui, David Antonio — Telecom SudParis


Regularized estimation of fractal attributes by convex minimization for texture segmentation: joint variational formulations, fast proximal algorithms and unsupervised selection of regularization para

In this doctoral thesis, several scale-free texture segmentation procedures based on two fractal attributes, the Hölder exponent, measuring the local regularity of a texture, and local variance, are proposed. A piecewise homogeneous fractal texture model is built, along with a synthesis procedure, providing images composed of the aggregation of fractal texture patches with known attributes and segmentation. This synthesis procedure is used to evaluate the performance of the proposed methods. A first method, based on the Total Variation regularization of a noisy estimate of local regularity, is illustrated and refined thanks to a post-processing step consisting of an iterative thresholding and resulting in a segmentation. After evidencing the limitations of this first approach, two segmentation methods, with either "free" or "co-located" contours, are built, taking into account jointly the local regularity and the local variance. These two procedures are formulated as convex nonsmooth functional minimization problems. We ...

Pascal, Barbara — École Normale Supérieure de Lyon


Multi-Sensor Integration for Indoor 3D Reconstruction

Outdoor maps and navigation information delivered by modern services and technologies like Google Maps and Garmin navigators have revolutionized the lifestyle of many people. Motivated by the desire for similar navigation systems for indoor usage from consumers, advertisers, emergency rescuers/responders, etc., many indoor environments such as shopping malls, museums, casinos, airports, transit stations, offices, and schools need to be mapped. Typically, the environment is first reconstructed by capturing many point clouds from various stations and defining their spatial relationships. Currently, there is a lack of an accurate, rigorous, and speedy method for relating point clouds in indoor, urban, satellite-denied environments. This thesis presents a novel and automatic way for fusing calibrated point clouds obtained using a terrestrial laser scanner and the Microsoft Kinect by integrating them with a low-cost inertial measurement unit. The developed system, titled the Scannect, is the ...

Chow, Jacky — University of Calgary


Camera based motion estimation and recognition for human-computer interaction

Communicating with mobile devices has become an unavoidable part of our daily life. Unfortunately, the current user interface designs are mostly taken directly from desktop computers. This has resulted in devices that are sometimes hard to use. Since more processing power and new sensing technologies are already available, there is a possibility to develop systems to communicate through different modalities. This thesis proposes some novel computer vision approaches, including head tracking, object motion analysis and device ego-motion estimation, to allow efficient interaction with mobile devices. For head tracking, two new methods have been developed. The first method detects a face region and facial features by employing skin detection, morphology, and a geometrical face model. The second method, designed especially for mobile use, detects the face and eyes using local texture features. In both cases, Kalman filtering is applied to estimate ...

Hannuksela, Jari — University of Oulu


Vision models and quality metrics for image processing applications

Optimizing the performance of digital imaging systems with respect to the capture, display, storage and transmission of visual information represents one of the biggest challenges in the field of image and video processing. Taking into account the way humans perceive visual information can be greatly beneficial for this task. To achieve this, it is necessary to understand and model the human visual system, which is also the principal goal of this thesis. Computational models for different aspects of the visual system are developed, which can be used in a wide variety of image and video processing applications. The proposed models and metrics are shown to be consistent with human perception. The focus of this work is visual quality assessment. A perceptual distortion metric (PDM) for the evaluation of video quality is presented. It is based on a model of the ...

Winkler, Stefan — Swiss Federal Institute of Technology


Contributions to the 3D city modeling: 3D polyhedral building model reconstruction from aerial images and 3D facade modeling from terrestrial 3D point cloud and images

The aim of this work is to advance research on 3D building modeling. In particular, aerial-based 3D building reconstruction has been a well-developed research topic since 1990. However, further research is necessary, since current approaches for massive 3D building reconstruction (although efficient) still encounter problems in generalization, coherency, and accuracy. Besides, the recent developments of street acquisition systems such as Mobile Mapping Systems open new perspectives for improvements in building modeling, in the sense that the terrestrial data (very dense and accurate) can be exploited more effectively (in comparison to the aerial investigation) to enrich the building models at facade level (e.g., geometry, texturing). Hence, aerial and terrestrial based building modeling approaches are individually proposed. At aerial level, we describe a direct and featureless approach for simple polyhedral building reconstruction from a set of ...

Hammoudi, Karim — Université Paris-Est, Saint-Mandé, France
