Vision models and quality metrics for image processing applications

Optimizing the performance of digital imaging systems with respect to the capture, display, storage and transmission of visual information represents one of the biggest challenges in the field of image and video processing. Taking into account the way humans perceive visual information can be greatly beneficial for this task. To achieve this, it is necessary to understand and model the human visual system, which is also the principal goal of this thesis. Computational models for different aspects of the visual system are developed, which can be used in a wide variety of image and video processing applications. The proposed models and metrics are shown to be consistent with human perception. The focus of this work is visual quality assessment. A perceptual distortion metric (PDM) for the evaluation of video quality is presented. It is based on a model of the ...

Winkler, Stefan — Swiss Federal Institute of Technology


Integration of human color vision models into high quality image compression

Strong academic and commercial interest in image compression has resulted in a number of sophisticated compression techniques. Some of these techniques have evolved into international standards such as JPEG. However, the widespread success of JPEG has slowed the rate of innovation in such standards. Even most recent techniques, such as those proposed in the JPEG2000 standard, do not show significantly improved compression performance; rather they increase the bitstream functionality. Nevertheless, the manifold of multimedia applications demands for further improvements in compression quality. The problem of stagnating compression quality can be overcome by exploiting the limitations of the human visual system (HVS) for compression purposes. To do so, commonly used distortion metrics such as mean-square error (MSE) are replaced by an HVS-model-based quality metric. Thus, the "visual" quality is optimized. Due to the tremendous complexity of the physiological structures involved in ...

Nadenau, Marcus J. — Swiss Federal Institute of Technology


Dynamic Scheme Selection in Image Coding

This thesis deals with the coding of images with multiple coding schemes and their dynamic selection. In our society of information highways, electronic communication is taking everyday a bigger place in our lives. The number of transmitted images is also increasing everyday. Therefore, research on image compression is still an active area. However, the current trend is to add several functionalities to the compression scheme such as progressiveness for more comfortable browsing of web-sites or databases. Classical image coding schemes have a rigid structure. They usually process an image as a whole and treat the pixels as a simple signal with no particular characteristics. Second generation schemes use the concept of objects in an image, and introduce a model of the human visual system in the design of the coding scheme. Dynamic coding schemes, as their name tells us, make ...

Fleury, Pascal — Swiss Federal Institute of Technology


Dialogue Enhancement and Personalization - Contributions to Quality Assessment and Control

The production and delivery of audio for television involve many creative and technical challenges. One of them is concerned with the level balance between the foreground speech (also referred to as dialogue) and the background elements, e.g., music, sound effects, and ambient sounds. Background elements are fundamental for the narrative and for creating an engaging atmosphere, but they can mask the dialogue, which the audience wishes to follow in a comfortable way. Very different individual factors of the people in the audience clash with the creative freedom of the content creators. As a result, service providers receive regular complaints about difficulties in understanding the dialogue because of too loud background sounds. While this has been a known issue for at least three decades, works analyzing the problem and up-to-date statics were scarce before the contributions in this work. Enabling the ...

Torcoli, Matteo — Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)


No-Reference Image and Video Quality Assessment

Image and video quality assessment has become an increasingly important subject in digital video coding and transmission scenarios, such as digital television. In this context, a special interest has been put on no-reference objective quality assessment metrics, since they are suitable for real-time quality monitoring once the video delivery system is settled. This Thesis proposes new no-reference quality assessment metrics for images and video. The main goal of the proposed techniques is to estimate the quality of lossy DCT-based encoded video. The proposed metrics share the same key idea: based on elements extracted from the bitstream of the encoded images or video arriving at the point where quality assessment has to be performed, an estimate of the quantization error associated to each DCT coefficient is obtained. Those estimates are perceptually weighted and combined in order to obtain a quality score ...

Brandão, Tomás — Technical University of Lisbon


Synthetic test patterns and compression artefact distortion metrics for image codecs

This thesis presents a framework of test methodology to assess spatial domain compression artefacts produced by image and intra-frame coded video codecs. Few researchers have studied this broad range of artefacts. A taxonomy of image and video compression artefacts is proposed. This is based on the point of origin of the artefact in the image communication model. This thesis presents objective evaluation of distortions known as artefacts due to image and intra-frame coded video compression made using synthetic test patterns. The American National Standard Institute document ANSI T1 801 qualitatively defines blockiness, blur and ringing artefacts. These definitions have been augmented with quantitative definitions in conjunction with test patterns proposed. A test and measurement environment is proposed in which the codec under test is exercised using a portfolio of test patterns. The test patterns are designed to highlight the artefact ...

Punchihewa, Amal — Massey University, New Zealand


An Attention Model and its Application in Man-Made Scene Interpretation

The ultimate aim of research into computer vision is designing a system which interprets its surrounding environment in a similar way the human can do effortlessly. However, the state of technology is far from achieving such a goal. In this thesis different components of a computer vision system that are designed for the task of interpreting man-made scenes, in particular images of buildings, are described. The flow of information in the proposed system is bottom-up i.e., the image is first segmented into its meaningful components and subsequently the regions are labelled using a contextual classifier. Starting from simple observations concerning the human vision system and the gestalt laws of human perception, like the law of 'good (simple) shape' and 'perceptual grouping', a blob detector is developed, that identifies components in a 2D image. These components are convex regions of interest, ...

Jahangiri, Mohammad — Imperial College London


Understanding and Assessing Quality of Experience in Immersive Communications

eXtended Reality (XR) technology, also called Mixed Reality (MR), is in constant development and improvement in terms of hardware and software to offer relevant experiences to users. One of the advances in XR has been the introduction of real visual information in the virtual environment, offering a more natural interaction with the scene and a greater acceptance of technology. Another advance has been achieved with the representation of the scene through a video that covers the entire environment, called 360-degree or omnidirectional video. These videos are acquired by cameras with omnidirectional lenses that cover the 360-degrees of the scene and are generally viewed by users through a head-tracked Head Mounted Display (HMD). Users only visualize a subset of the 360-degree scene, called viewport, which changes with the variations of the viewing direction of the users, determined by the movements of ...

Orduna, Marta — Universidad Politécnica de Madrid


Point Cloud Quality Assessment

Nowadays, richer 3D visual representation formats are emerging, notably light fields and point clouds. These formats enable new applications in many usage domains, notably virtual and augmented reality, geographical information systems, immersive communications, and cultural heritage. Recently, following major improvements in 3D visual data acquisition, there is an increasing interest in point-based visual representation, which models real-world objects as a cloud of sampled points on their surfaces. Point cloud is a 3D representation model where the real visual world is represented by a set of 3D coordinates (the geometry) over the objects with some additional attributes such as color and normals. With the advances in 3D acquisition systems, it is now possible to capture a realistic point cloud to represent a visual scene with a very high resolution. These point clouds may have up to billions of points and, thus, ...

Javaheri, Alireza — Instituto Superior Técnico - University of Lisbon


Video Quality Estimation for Mobile Video Streaming

For the provisioning of video streaming services it is essential to provide a required level of customer satisfaction, given by the perceived video stream quality. It is therefore important to choose the compression parameters as well as the network settings so that they maximize the end-user quality. Due to video compression improvements of the newest video coding standard H.264/AVC, video streaming for low bit and frame rates is possible while preserving its perceptual quality. This is especially suitable for video applications in 3G wireless networks. Mobile video streaming is characterized by low resolutions and low bitrates. The commonly used resolutions are Quarter Common Intermediate Format (QCIF,176x144 pixels) for cell phones, Common Intermediate Format (CIF, 352x288 pixels) and Standard Interchange Format (SIF or QVGA, 320x240 pixels) for data-cards and palmtops (PDA). The mandatory codec for Universal Mobile Telecommunications System (UMTS) streaming ...

Ries, Michal — Vienna University of Technology


Adaptive Nonlocal Signal Restoration and Enhancement Techniques for High-Dimensional Data

The large number of practical applications involving digital images has motivated a significant interest towards restoration solutions that improve the visual quality of the data under the presence of various acquisition and compression artifacts. Digital images are the results of an acquisition process based on the measurement of a physical quantity of interest incident upon an imaging sensor over a specified period of time. The quantity of interest depends on the targeted imaging application. Common imaging sensors measure the number of photons impinging over a dense grid of photodetectors in order to produce an image similar to what is perceived by the human visual system. Different applications focus on the part of the electromagnetic spectrum not visible by the human visual system, and thus require different sensing technologies to form the image. In all cases, even with the advance of ...

Maggioni, Matteo — Tampere University of Technology


Embedded Optimization Algorithms for Perceptual Enhancement of Audio Signals

This thesis investigates the design and evaluation of an embedded optimization framework for the perceptual enhancement of audio signals which are degraded by linear and/or nonlinear distortion. In general, audio signal enhancement has the goal to improve the perceived audio quality, speech intelligibility, or another desired perceptual attribute of the distorted audio signal by applying a real-time digital signal processing algorithm. In the designed embedded optimization framework, the audio signal enhancement problem under consideration is formulated and solved as a per-frame numerical optimization problem, allowing to compute the enhanced audio signal frame that is optimal according to a desired perceptual attribute. The first stage of the embedded optimization framework consists in the formulation of the per-frame optimization problem aimed at maximally enhancing the desired perceptual attribute, by explicitly incorporating a suitable model of human sound perception. The second stage of ...

Defraene, Bruno — KU Leuven


Facial Soft Biometrics: Methods, Applications and Solutions

This dissertation studies soft biometrics traits, their applicability in different security and commercial scenarios, as well as related usability aspects. We place the emphasis on human facial soft biometric traits which constitute the set of physical, adhered or behavioral human characteristics that can partially differentiate, classify and identify humans. Such traits, which include characteristics like age, gender, skin and eye color, the presence of glasses, moustache or beard, inherit several advantages such as ease of acquisition, as well as a natural compatibility with how humans perceive their surroundings. Specifically, soft biometric traits are compatible with the human process of classifying and recalling our environment, a process which involves constructions of hierarchical structures of different refined traits. This thesis explores these traits, and their application in soft biometric systems (SBSs), and specifically focuses on how such systems can achieve different goals ...

Dantcheva, Antitza — EURECOM / Telecom ParisTech


Mixed structural models for 3D audio in virtual environments

In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


Human-Centered Content-Based Image Retrieval

Retrieval of images that lack a (suitable) annotations cannot be achieved through (traditional) Information Retrieval (IR) techniques. Access through such collections can be achieved through the application of computer vision techniques on the IR problem, which is baptized Content-Based Image Retrieval (CBIR). In contrast with most purely technological approaches, the thesis Human-Centered Content-Based Image Retrieval approaches the problem from a human/user centered perspective. Psychophysical experiments were conducted in which people were asked to categorize colors. The data gathered from these experiments was fed to a Fast Exact Euclidean Distance (FEED) transform (Schouten & Van den Broek, 2004), which enabled the segmentation of color space based on human perception (Van den Broek et al., 2008). This unique color space segementation was exploited for texture analysis and image segmentation, and subsequently for full-featured CBIR. In addition, a unique CBIR-benchmark was developed (Van ...

van den Broek, Egon L. — Radboud University Nijmegen

The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.

The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.