Mixed structural models for 3D audio in virtual environments

In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


Planar 3D Scene Representations for Depth Compression

The recent invasion of stereoscopic 3D television technologies is expected to be followed by autostereoscopic and holographic technologies. Glasses-free multiple stereoscopic pair displaying capabilities of these technologies will advance the 3D experience. The prospective 3D format to create the multiple views for such displays is Multiview Video plus Depth (MVD) format based on the Depth Image Based Rendering (DIBR) techniques. The depth modality of the MVD format is an active research area whose main objective is to develop DIBR friendly efficient compression methods. As a part this research, the thesis proposes novel 3D planar-based depth representations. The planar approximation of the stereo depth images is formulated as an energy-based co-segmentation problem by a Markov Random Field model. The energy terms of this problem are designed to mimic the rate-distortion tradeoff for a depth compression application. A heuristic algorithm is developed ...

Özkalaycı, Burak Oğuz — Middle East Technical University


WATERMARKING FOR 3D REPRESENTATIONS

In this thesis, a number of novel watermarking techniques for different 3D representations are presented. A novel watermarking method is proposed for the mono-view video, which might be interpreted as the basic implicit representation of 3D scenes. The proposed method solves the common flickering problem in the existing video watermarking schemes by means of adjusting the watermark strength with respect to temporal contrast thresholds of human visual system (HVS), which define the maximum invisible distortions in the temporal direction. The experimental results indicate that the proposed method gives better results in both objective and subjective measures, compared to some recognized methods in the literature. The watermarking techniques for the geometry and image based representations of 3D scenes, denoted as 3D watermarking, are examined and classified into three groups, as 3D-3D, 3D-2D and 2D-2D watermarking, in which the pair of symbols ...

Koz, Alper — Middle East Technical University, Department of Electrical and Electronics Engineering


Vision models and quality metrics for image processing applications

Optimizing the performance of digital imaging systems with respect to the capture, display, storage and transmission of visual information represents one of the biggest challenges in the field of image and video processing. Taking into account the way humans perceive visual information can be greatly beneficial for this task. To achieve this, it is necessary to understand and model the human visual system, which is also the principal goal of this thesis. Computational models for different aspects of the visual system are developed, which can be used in a wide variety of image and video processing applications. The proposed models and metrics are shown to be consistent with human perception. The focus of this work is visual quality assessment. A perceptual distortion metric (PDM) for the evaluation of video quality is presented. It is based on a model of the ...

Winkler, Stefan — Swiss Federal Institute of Technology


Dynamic Scheme Selection in Image Coding

This thesis deals with the coding of images with multiple coding schemes and their dynamic selection. In our society of information highways, electronic communication is taking everyday a bigger place in our lives. The number of transmitted images is also increasing everyday. Therefore, research on image compression is still an active area. However, the current trend is to add several functionalities to the compression scheme such as progressiveness for more comfortable browsing of web-sites or databases. Classical image coding schemes have a rigid structure. They usually process an image as a whole and treat the pixels as a simple signal with no particular characteristics. Second generation schemes use the concept of objects in an image, and introduce a model of the human visual system in the design of the coding scheme. Dynamic coding schemes, as their name tells us, make ...

Fleury, Pascal — Swiss Federal Institute of Technology


Density-based shape descriptors and similarity learning for 3D object retrieval

Next generation search engines will enable query formulations, other than text, relying on visual information encoded in terms of images and shapes. The 3D search technology, in particular, targets specialized application domains ranging from computer aided-design and manufacturing to cultural heritage archival and presentation. Content-based retrieval research aims at developing search engines that would allow users to perform a query by similarity of content. This thesis deals with two fundamentals problems in content-based 3D object retrieval: (1) How to describe a 3D shape to obtain a reliable representative for the subsequent task of similarity search? (2) How to supervise the search process to learn inter-shape similarities for more effective and semantic retrieval? Concerning the first problem, we develop a novel 3D shape description scheme based on probability density of multivariate local surface features. We constructively obtain local characterizations of 3D ...

Akgul, Ceyhun Burak — Bogazici University and Telecom ParisTech


No-Reference Image and Video Quality Assessment

Image and video quality assessment has become an increasingly important subject in digital video coding and transmission scenarios, such as digital television. In this context, a special interest has been put on no-reference objective quality assessment metrics, since they are suitable for real-time quality monitoring once the video delivery system is settled. This Thesis proposes new no-reference quality assessment metrics for images and video. The main goal of the proposed techniques is to estimate the quality of lossy DCT-based encoded video. The proposed metrics share the same key idea: based on elements extracted from the bitstream of the encoded images or video arriving at the point where quality assessment has to be performed, an estimate of the quantization error associated to each DCT coefficient is obtained. Those estimates are perceptually weighted and combined in order to obtain a quality score ...

Brandão, Tomás — Technical University of Lisbon


Facial Soft Biometrics: Methods, Applications and Solutions

This dissertation studies soft biometrics traits, their applicability in different security and commercial scenarios, as well as related usability aspects. We place the emphasis on human facial soft biometric traits which constitute the set of physical, adhered or behavioral human characteristics that can partially differentiate, classify and identify humans. Such traits, which include characteristics like age, gender, skin and eye color, the presence of glasses, moustache or beard, inherit several advantages such as ease of acquisition, as well as a natural compatibility with how humans perceive their surroundings. Specifically, soft biometric traits are compatible with the human process of classifying and recalling our environment, a process which involves constructions of hierarchical structures of different refined traits. This thesis explores these traits, and their application in soft biometric systems (SBSs), and specifically focuses on how such systems can achieve different goals ...

Dantcheva, Antitza — EURECOM / Telecom ParisTech


Dealing with Variability Factors and Its Application to Biometrics at a Distance

This Thesis is focused on dealing with the variability factors in biometric recognition and applications of biometrics at a distance. In particular, this PhD Thesis explores the problem of variability factors assessment and how to deal with them by the incorporation of soft biometrics information in order to improve person recognition systems working at a distance. The proposed methods supported by experimental results show the benefits of adapting the system considering the variability of the sample at hand. Although being relatively young compared to other mature and long-used security technologies, biometrics have emerged in the last decade as a pushing alternative for applications where automatic recognition of people is needed. Certainly, biometrics are very attractive and useful for video surveillance systems at a distance, widely distributed in our lifes, and for the final user: forget about PINs and passwords, you ...

Tome, Pedro — Universidad Autónoma de Madrid


Efficient Perceptual Audio Coding Using Cosine and Sine Modulated Lapped Transforms

The increasing number of simultaneous input and output channels utilized in immersive audio configurations primarily in broadcasting applications has renewed industrial requirements for efficient audio coding schemes with low bit-rate and complexity. This thesis presents a comprehensive review and extension of conventional approaches for perceptual coding of arbitrary multichannel audio signals. Particular emphasis is given to use cases ranging from two-channel stereophonic to six-channel 5.1-surround setups with or without the application-specific constraint of low algorithmic coding latency. Conventional perceptual audio codecs share six common algorithmic components, all of which are examined extensively in this thesis. The first is a signal-adaptive filterbank, constructed using instances of the real-valued modified discrete cosine transform (MDCT), to obtain spectral representations of successive portions of the incoming discrete time signal. Within this MDCT spectral domain, various intra- and inter-channel optimizations, most of which are of ...

Helmrich, Christian R. — Friedrich-Alexander-Universität Erlangen-Nürnberg


Distributed Compressed Representation of Correlated Image Sets

Vision sensor networks and video cameras find widespread usage in several applications that rely on effective representation of scenes or analysis of 3D information. These systems usually acquire multiple images of the same 3D scene from different viewpoints or at different time instants. Therefore, these images are generally correlated through displacement of scene objects. Efficient compression techniques have to exploit this correlation in order to efficiently communicate the 3D scene information. Instead of joint encoding that requires communication between the cameras, in this thesis we concentrate on distributed representation, where the captured images are encoded independently, but decoded jointly to exploit the correlation between images. One of the most important and challenging tasks relies in estimation of the underlying correlation from the compressed correlated images for effective reconstruction or analysis in the joint decoder. This thesis focuses on developing efficient ...

Thirumalai, Vijayaraghavan — EPFL, Switzerland


Contributions to the 3D city modeling: 3D polyhedral building model reconstruction from aerial images and 3D facade modeling from terrestrial 3D point cloud and images

The aim of this work is to develop research on 3D building modeling. In particular, the research in aerial-based 3D building reconstruction is a topic very developed since 1990. However, it is necessary to pursue the research since the actual approaches for 3D massive building reconstruction (although efficient) still encounter problems in generalization, coherency, accuracy. Besides, the recent developments of street acquisition systems such as Mobile Mapping Systems open new perspectives for improvements in building modeling in the sense that the terrestrial data (very dense and accurate) can be exploited with more performance (in comparison to the aerial investigation) to enrich the building models at facade level (e.g., geometry, texturing). Hence, aerial and terrestrial based building modeling approaches are individually proposed. At aerial level, we describe a direct and featureless approach for simple polyhedral building reconstruction from a set of ...

Hammoudi Karim — Université Paris-Est, Saint-Mandé, France


Modeling Perceived Quality for Imaging Applications

People of all generations are making more and more use of digital imaging systems in their daily lives. The image content rendered by these digital imaging systems largely differs in perceived quality depending on the system and its applications. To be able to optimize the experience of viewers of this content understanding and modeling perceived image quality is essential. Research on modeling image quality in a full-reference framework --- where the original content can be used as a reference --- is well established in literature. In many current applications, however, the perceived image quality needs to be modeled in a no-reference framework at real-time. As a consequence, the model needs to quantitatively predict perceived quality of a degraded image without being able to compare it to its original version, and has to achieve this with limited computational complexity in order ...

Liu, Hantao — Delft University of Technology


Synthetic test patterns and compression artefact distortion metrics for image codecs

This thesis presents a framework of test methodology to assess spatial domain compression artefacts produced by image and intra-frame coded video codecs. Few researchers have studied this broad range of artefacts. A taxonomy of image and video compression artefacts is proposed. This is based on the point of origin of the artefact in the image communication model. This thesis presents objective evaluation of distortions known as artefacts due to image and intra-frame coded video compression made using synthetic test patterns. The American National Standard Institute document ANSI T1 801 qualitatively defines blockiness, blur and ringing artefacts. These definitions have been augmented with quantitative definitions in conjunction with test patterns proposed. A test and measurement environment is proposed in which the codec under test is exercised using a portfolio of test patterns. The test patterns are designed to highlight the artefact ...

Punchihewa, Amal — Massey University, New Zealand


Toward sparse and geometry adapted video approximations

Video signals are sequences of natural images, where images are often modeled as piecewise-smooth signals. Hence, video can be seen as a 3D piecewise-smooth signal made of piecewise-smooth regions that move through time. Based on the piecewise-smooth model and on related theoretical work on rate-distortion performance of wavelet and oracle based coding schemes, one can better analyze the appropriate coding strategies that adaptive video codecs need to implement in order to be efficient. Efficient video representations for coding purposes require the use of adaptive signal decompositions able to capture appropriately the structure and redundancy appearing in video signals. Adaptivity needs to be such that it allows for proper modeling of signals in order to represent these with the lowest possible coding cost. Video is a very structured signal with high geometric content. This includes temporal geometry (normally represented by motion ...

Divorra Escoda, Oscar — EPFL / Signal Processing Institute

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.