Understanding and Assessing Quality of Experience in Immersive Communications

eXtended Reality (XR) technology, also called Mixed Reality (MR), is in constant development and improvement in terms of hardware and software to offer relevant experiences to users. One of the advances in XR has been the introduction of real visual information in the virtual environment, offering a more natural interaction with the scene and a greater acceptance of technology. Another advance has been achieved with the representation of the scene through a video that covers the entire environment, called 360-degree or omnidirectional video. These videos are acquired by cameras with omnidirectional lenses that cover the full 360 degrees of the scene and are generally viewed by users through a head-tracked Head Mounted Display (HMD). Users only visualize a subset of the 360-degree scene, called the viewport, which changes with the variations of the viewing direction of the users, determined by the movements of ...

Orduna, Marta — Universidad Politécnica de Madrid


Multiple Objective Optimization for Video Streaming

In this thesis, we propose Multiple Objective Optimization (MOO) frameworks for efficient video streaming. Firstly, we introduce pre-roll delay-distortion optimization (DDO) for uninterrupted content-adaptive video streaming over low-capacity, constant bitrate (CBR) channels using MOO. Content analysis is used to divide the input video into shots with assigned relevance levels. The video is adaptively encoded and streamed, aiming for minimum pre-roll delay and distortion with the optimal spatial and temporal resolutions and quantization parameters for each shot. With buffer and distortion constraints, the bitrate of unimportant shots is reduced to achieve an acceptable quality in important shots. Secondly, we introduce a cross-layer optimized video rate adaptation and scheduling scheme to achieve maximum "application layer" Quality-of-Service (QoS), maximum video throughput (video seconds per transmission slot), and QoS fairness for wireless video streaming. Using the MOO framework, these objectives are jointly optimized such ...

Ozcelebi, Tanir — Koc University


Point Cloud Quality Assessment

Nowadays, richer 3D visual representation formats are emerging, notably light fields and point clouds. These formats enable new applications in many usage domains, notably virtual and augmented reality, geographical information systems, immersive communications, and cultural heritage. Recently, following major improvements in 3D visual data acquisition, there is an increasing interest in point-based visual representation, which models real-world objects as a cloud of sampled points on their surfaces. Point cloud is a 3D representation model where the real visual world is represented by a set of 3D coordinates (the geometry) over the objects with some additional attributes such as color and normals. With the advances in 3D acquisition systems, it is now possible to capture a realistic point cloud to represent a visual scene with a very high resolution. These point clouds may have up to billions of points and, thus, ...

Javaheri, Alireza — Instituto Superior Técnico - University of Lisbon


Video Quality Estimation for Mobile Video Streaming

For the provisioning of video streaming services it is essential to provide a required level of customer satisfaction, given by the perceived video stream quality. It is therefore important to choose the compression parameters as well as the network settings so that they maximize the end-user quality. Due to video compression improvements of the newest video coding standard H.264/AVC, video streaming for low bit and frame rates is possible while preserving its perceptual quality. This is especially suitable for video applications in 3G wireless networks. Mobile video streaming is characterized by low resolutions and low bitrates. The commonly used resolutions are Quarter Common Intermediate Format (QCIF, 176x144 pixels) for cell phones, Common Intermediate Format (CIF, 352x288 pixels) and Standard Interchange Format (SIF or QVGA, 320x240 pixels) for data-cards and palmtops (PDA). The mandatory codec for Universal Mobile Telecommunications System (UMTS) streaming ...

Ries, Michal — Vienna University of Technology


Dialogue Enhancement and Personalization - Contributions to Quality Assessment and Control

The production and delivery of audio for television involve many creative and technical challenges. One of them is concerned with the level balance between the foreground speech (also referred to as dialogue) and the background elements, e.g., music, sound effects, and ambient sounds. Background elements are fundamental for the narrative and for creating an engaging atmosphere, but they can mask the dialogue, which the audience wishes to follow in a comfortable way. Very different individual factors of the people in the audience clash with the creative freedom of the content creators. As a result, service providers receive regular complaints about difficulties in understanding the dialogue because of too loud background sounds. While this has been a known issue for at least three decades, works analyzing the problem and up-to-date statistics were scarce before the contributions in this work. Enabling the ...

Torcoli, Matteo — Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)


Mixed structural models for 3D audio in virtual environments

In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


Synthetic test patterns and compression artefact distortion metrics for image codecs

This thesis presents a framework of test methodology to assess spatial domain compression artefacts produced by image and intra-frame coded video codecs. Few researchers have studied this broad range of artefacts. A taxonomy of image and video compression artefacts is proposed. This is based on the point of origin of the artefact in the image communication model. This thesis presents objective evaluation of distortions known as artefacts due to image and intra-frame coded video compression made using synthetic test patterns. The American National Standard Institute document ANSI T1 801 qualitatively defines blockiness, blur and ringing artefacts. These definitions have been augmented with quantitative definitions in conjunction with test patterns proposed. A test and measurement environment is proposed in which the codec under test is exercised using a portfolio of test patterns. The test patterns are designed to highlight the artefact ...

Punchihewa, Amal — Massey University, New Zealand


Quality Aspects of Packet-Based Interactive Speech Communication

Voice-over-Internet Protocol (VoIP) technology provides the transmission of speech over packet-based networks. The transition from circuit-switched to packet-switched networks introduces two major quality impairments: packet loss and end-to-end delay. This thesis shows that the incorporation of packets that were damaged by bit errors reduces the effective packet loss rate, and thus improves the speech quality as perceived by the user. Moreover, this thesis addresses the impact of transmission delay on conversational interactivity and on the perceived speech quality. In order to study the structure and interactivity of conversations, the framework of Parametric Conversation Analysis (P-CA) is introduced and three metrics for conversational interactivity are defined. The investigation of five conversation scenarios based on subjective quality tests has shown that only highly structured scenarios result in high conversational interactivity. The speaker alternation rate has turned out to represent a simple and ...

Hammer, Florian — Graz University of Technology


Vision models and quality metrics for image processing applications

Optimizing the performance of digital imaging systems with respect to the capture, display, storage and transmission of visual information represents one of the biggest challenges in the field of image and video processing. Taking into account the way humans perceive visual information can be greatly beneficial for this task. To achieve this, it is necessary to understand and model the human visual system, which is also the principal goal of this thesis. Computational models for different aspects of the visual system are developed, which can be used in a wide variety of image and video processing applications. The proposed models and metrics are shown to be consistent with human perception. The focus of this work is visual quality assessment. A perceptual distortion metric (PDM) for the evaluation of video quality is presented. It is based on a model of the ...

Winkler, Stefan — Swiss Federal Institute of Technology


Adaptive media streaming over multipath networks

With the latest developments in video coding technology and fast deployment of end-user broadband internet connections, real-time media applications become increasingly interesting for both private users and businesses. However, the internet remains a best-effort service network unable to guarantee the stringent requirements of the media application, in terms of high, constant bandwidth, low packet loss rate and transmission delay. Therefore, efficient adaptation mechanisms must be derived in order to bridge the application requirements with the transport medium characteristics. Lately, different network architectures, e.g., peer-to-peer networks, content distribution networks, parallel wireless services, emerge as potential solutions for reducing the cost of communication or infrastructure, and possibly improve the application performance. In this thesis, we start from the path diversity characteristic of these architectures, in order to build a new framework, specific for media streaming in multipath networks. Within this framework we ...

Jurca, Dan — EPFL/ITS, Lausanne, Switzerland


Robust and multiresolution video delivery : From H.26x to Matching pursuit based technologies

With the joint development of networking and digital coding technologies, multimedia and more particularly video services are clearly becoming one of the major consumers of the new information networks. The rapid growth of the Internet and computer industry however results in a very heterogeneous infrastructure commonly overloaded. Video service providers have nevertheless to offer to their clients the best possible quality according to their respective capabilities and communication channel status. The Quality of Service is not only influenced by the compression artifacts, but also by unavoidable packet losses. Hence, the packet video stream has clearly to fulfill possibly contradictory requirements, that are coding efficiency and robustness to data loss. The first contribution of this thesis is the complete modeling of the video Quality of Service (QoS) in standard and more particularly MPEG-2 applications. The performance of Forward Error Control (FEC) ...

Frossard, Pascal — Swiss Federal Institute of Technology


Melody Extraction from Polyphonic Music Signals

Music was the first mass-market industry to be completely restructured by digital technology, and today we can have access to thousands of tracks stored locally on our smartphone and millions of tracks through cloud-based music services. Given the vast quantity of music at our fingertips, we now require novel ways of describing, indexing, searching and interacting with musical content. In this thesis we focus on a technology that opens the door to a wide range of such applications: automatically estimating the pitch sequence of the melody directly from the audio signal of a polyphonic music recording, also referred to as melody extraction. Whilst identifying the pitch of the melody is something human listeners can do quite well, doing this automatically is highly challenging. We present a novel method for melody extraction based on the tracking and characterisation of the pitch ...

Salamon, Justin — Universitat Pompeu Fabra


Signal and Spectrum Coordination for Next Generation DSL Networks

The ability to easily exchange and access data has transformed the way we work, study, inform and entertain ourselves. In particular, the Internet has had an effect on people’s lives in the past two decades that is profound. Profound as this effect may be, people seem not to grow tired of it. On the contrary: as of today, the Internet revolution is far from over. The thirst for bigger amounts of data at higher speeds and ubiquitous connectivity seems not to abate. This thirst for more, faster and better quality data is both a huge challenge and a huge opportunity for the broadband access industry. The opportunity lies in the fact that, as of the end of 2012, there were 600 million subscribers to broadband services around the world. Plus, even though the market is already enormous, it still has ...

Moraes, Rodrigo B. — KU Leuven


Dealing with Variability Factors and Its Application to Biometrics at a Distance

This Thesis is focused on dealing with the variability factors in biometric recognition and applications of biometrics at a distance. In particular, this PhD Thesis explores the problem of variability factors assessment and how to deal with them by the incorporation of soft biometrics information in order to improve person recognition systems working at a distance. The proposed methods supported by experimental results show the benefits of adapting the system considering the variability of the sample at hand. Although relatively young compared to other mature and long-used security technologies, biometrics have emerged in the last decade as a pushing alternative for applications where automatic recognition of people is needed. Certainly, biometrics are very attractive and useful for video surveillance systems at a distance, widely distributed in our lives, and for the final user: forget about PINs and passwords, you ...

Tome, Pedro — Universidad Autónoma de Madrid


Facial Soft Biometrics: Methods, Applications and Solutions

This dissertation studies soft biometric traits, their applicability in different security and commercial scenarios, as well as related usability aspects. We place the emphasis on human facial soft biometric traits which constitute the set of physical, adhered or behavioral human characteristics that can partially differentiate, classify and identify humans. Such traits, which include characteristics like age, gender, skin and eye color, the presence of glasses, moustache or beard, inherit several advantages such as ease of acquisition, as well as a natural compatibility with how humans perceive their surroundings. Specifically, soft biometric traits are compatible with the human process of classifying and recalling our environment, a process which involves constructions of hierarchical structures of different refined traits. This thesis explores these traits, and their application in soft biometric systems (SBSs), and specifically focuses on how such systems can achieve different goals ...

Dantcheva, Antitza — EURECOM / Telecom ParisTech
