Understanding and Assessing Quality of Experience in Immersive Communications (2023)
Mixed structural models for 3D audio in virtual environments
In the world of information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflected, e.g., in the development of complexity-hiding services that let people personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and the future internet; 3D visual-sound scene coding, transmission, and reconstruction; and teleconferencing systems, to name but ...
Geronazzo, Michele — University of Padova
Quality of Experience Evaluation Methodology via Crowdsourcing
Provisioning of digital video services is a difficult task, as it is hard to estimate optimal settings of video parameters under given transmission constraints while maximizing the overall end-user quality. With Internet streaming services becoming part of our everyday life, end-to-end optimization of such systems is important. On one hand, great effort is devoted to subjective or objective evaluation of end-user perception. High-quality audiovisual perception at minimal cost of the provided service is one of the main interests of network providers. On the other hand, subjective evaluations to determine the best video and audio configurations are often conducted in controlled test laboratory environments, which have little to do with the real environments in which consumers enjoy such content. Unfortunately, no serious attempts have been made to take into account interactions between quality of the content and ...
Gardlo, Bruno — University of Zilina
Point Cloud Quality Assessment
Nowadays, richer 3D visual representation formats are emerging, notably light fields and point clouds. These formats enable new applications in many usage domains, notably virtual and augmented reality, geographical information systems, immersive communications, and cultural heritage. Recently, following major improvements in 3D visual data acquisition, there has been increasing interest in point-based visual representation, which models real-world objects as a cloud of sampled points on their surfaces. A point cloud is a 3D representation model in which the real visual world is represented by a set of 3D coordinates (the geometry) over the objects, together with additional attributes such as color and normals. With the advances in 3D acquisition systems, it is now possible to capture a realistic point cloud that represents a visual scene at very high resolution. These point clouds may have up to billions of points and, thus, ...
Javaheri, Alireza — Instituto Superior Técnico - University of Lisbon
Dialogue Enhancement and Personalization - Contributions to Quality Assessment and Control
The production and delivery of audio for television involve many creative and technical challenges. One of them concerns the level balance between the foreground speech (also referred to as dialogue) and the background elements, e.g., music, sound effects, and ambient sounds. Background elements are fundamental for the narrative and for creating an engaging atmosphere, but they can mask the dialogue, which the audience wishes to follow comfortably. Widely varying individual factors among audience members clash with the creative freedom of the content creators. As a result, service providers receive regular complaints about difficulties in understanding the dialogue because of overly loud background sounds. While this has been a known issue for at least three decades, works analyzing the problem and up-to-date statistics were scarce before the contributions in this work. Enabling the ...
Torcoli, Matteo — Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Vision models and quality metrics for image processing applications
Optimizing the performance of digital imaging systems with respect to the capture, display, storage and transmission of visual information represents one of the biggest challenges in the field of image and video processing. Taking into account the way humans perceive visual information can be greatly beneficial for this task. To achieve this, it is necessary to understand and model the human visual system, which is also the principal goal of this thesis. Computational models for different aspects of the visual system are developed, which can be used in a wide variety of image and video processing applications. The proposed models and metrics are shown to be consistent with human perception. The focus of this work is visual quality assessment. A perceptual distortion metric (PDM) for the evaluation of video quality is presented. It is based on a model of the ...
Winkler, Stefan — Swiss Federal Institute of Technology
System-Level Modeling and Optimization of MIMO HSDPA Networks
Interaction between the Medium Access Control (MAC) layer and the physical-layer routines is one of the basic concepts of modern wireless networks. Physical-layer-dependent resource allocation and scheduling guarantee efficient network utilization. Accordingly, classical link-level analyses, which focus only on the physical layer, are no longer sufficient for optimum transceiver structure and algorithm development. This thesis presents the development and application of a system-level description suitable for the downlink of Multiple-Input Multiple-Output (MIMO) enhanced High-Speed Downlink Packet Access (HSDPA), with particular focus on the Double Transmit Antenna Array (D-TxAA) transmission mode. The system-level model allows for investigating and evaluating transmission systems and algorithms in the context of cellular networks. Two separate models are proposed to obtain a complete system-level description: (i) a link-quality model, analytically describing the MIMO HSDPA link quality in a so-called equivalent fading parameter structure, and (ii) a link-performance model, ...
Wrulich, Martin — Vienna University of Technology
Facial Soft Biometrics: Methods, Applications and Solutions
This dissertation studies soft biometric traits, their applicability in different security and commercial scenarios, and related usability aspects. We place the emphasis on human facial soft biometric traits, which constitute the set of physical, adhered, or behavioral human characteristics that can partially differentiate, classify, and identify humans. Such traits, which include characteristics like age, gender, skin and eye color, and the presence of glasses, a moustache, or a beard, have several inherent advantages, such as ease of acquisition and a natural compatibility with how humans perceive their surroundings. Specifically, soft biometric traits are compatible with the human process of classifying and recalling our environment, a process that involves constructing hierarchical structures of traits at different levels of refinement. This thesis explores these traits and their application in soft biometric systems (SBSs), and specifically focuses on how such systems can achieve different goals ...
Dantcheva, Antitza — EURECOM / Telecom ParisTech
Dealing with Variability Factors and Its Application to Biometrics at a Distance
This Thesis is focused on dealing with the variability factors in biometric recognition and applications of biometrics at a distance. In particular, this PhD Thesis explores the problem of variability factor assessment and how to deal with these factors by incorporating soft biometric information in order to improve person recognition systems working at a distance. The proposed methods, supported by experimental results, show the benefits of adapting the system to the variability of the sample at hand. Although relatively young compared to other mature and long-used security technologies, biometrics have emerged in the last decade as a compelling alternative for applications where automatic recognition of people is needed. Certainly, biometrics are very attractive and useful both for video surveillance systems at a distance, which are widespread in our lives, and for the final user: forget about PINs and passwords, you ...
Tome, Pedro — Universidad Autónoma de Madrid
Computational models of expressive gesture in multimedia systems
This thesis focuses on the development of paradigms and techniques for the design and implementation of multimodal interactive systems, mainly for performing arts applications. The work addresses research issues in the fields of human-computer interaction, multimedia systems, and sound and music computing. The thesis is divided into two parts. In the first, after a short review of the state of the art, the focus moves to the definition of environments in which novel forms of technology-integrated artistic performances can take place. These are distributed active mixed reality environments in which information at different layers of abstraction is conveyed mainly non-verbally through expressive gestures. Expressive gesture is therefore defined, and the internal structure of a virtual observer able to process it (and inhabiting the proposed environments) is described from a multimodal perspective. The definition of the structure of the environments, of the virtual ...
Volpe, Gualtiero — University of Genova
This work considers a Broadcast Channel (BC) system, where the transmitter is equipped with multiple antennas and each user at the receiver side may have one or more antennas. Depending on the number of antennas at the receiver side, such a system is known as Multiple-User Multiple-Input Single-Output (MU-MISO), for single-antenna users, or Multiple-User Multiple-Input Multiple-Output (MU-MIMO), for multi-antenna users. This model is suitable for current wireless communication systems. Regarding the direction of the data flow, we differentiate between the downlink channel, or BC, and the uplink channel, or Multiple Access Channel (MAC). In the BC the signals are sent from the Base Station (BS) to the users, whereas in the MAC the information is sent from the users to the BS. In this work we focus on the BC, where the BS applies linear precoding to take advantage of the multiple antennas. The ...
González-Coma, José Pablo — University of A Coruña
Adaptive Algorithms for Intelligent Acoustic Interfaces
Modern speech communications are evolving in a new direction that involves users in a more perceptive way. That is the immersive experience, which may be considered the “last mile” problem of telecommunications. One of the main features of immersive communications is distant talking, i.e., hands-free (in the broad sense) speech communication without body-worn or tethered microphones, which takes place in a multisource environment where interfering signals may degrade the communication quality and the intelligibility of the desired speech source. In order to preserve speech quality, intelligent acoustic interfaces may be used. An intelligent acoustic interface may comprise multiple microphones and loudspeakers, and its peculiarity is that it models the acoustic channel in order to adapt to user requirements and to environmental conditions. This is why intelligent acoustic interfaces are based on adaptive filtering algorithms. The acoustic path ...
Comminiello, Danilo — Sapienza University of Rome
Content-based search and browsing in semantic multimedia retrieval
Growth in storage capacity has led to large digital video repositories and has made it harder to discover specific information without laborious manual annotation of the data. The research focuses on creating a retrieval system that is ultimately independent of manual work. To retrieve relevant content, the semantic gap between the searcher's information need and the content data has to be overcome using content-based technology. The semantic gap consists of two distinct elements: the ambiguity of the true information need and the equivocalness of digital video data. The research problem of this thesis is: which computational content-based models for retrieval increase the effectiveness of the semantic retrieval of digital video? The hypothesis is that semantic search performance can be improved by using pattern recognition, data abstraction, and clustering techniques jointly with human interaction through manually created queries and visual browsing. The results of this ...
Rautiainen, Mika — University of Oulu
Multiple Objective Optimization for Video Streaming
In this thesis, we propose Multiple Objective Optimization (MOO) frameworks for efficient video streaming. First, we introduce pre-roll delay-distortion optimization (DDO) for uninterrupted content-adaptive video streaming over low-capacity, constant bitrate (CBR) channels using MOO. Content analysis is used to divide the input video into shots with assigned relevance levels. The video is adaptively encoded and streamed, aiming at minimum pre-roll delay and distortion, with the optimal spatial and temporal resolutions and quantization parameters for each shot. Under buffer and distortion constraints, the bitrate of unimportant shots is reduced to achieve acceptable quality in important shots. Second, we introduce a cross-layer optimized video rate adaptation and scheduling scheme to achieve maximum "application layer" Quality-of-Service (QoS), maximum video throughput (video seconds per transmission slot), and QoS fairness for wireless video streaming. Using the MOO framework, these objectives are jointly optimized such ...
Ozcelebi, Tanir — Koc University
Modeling Perceived Quality for Imaging Applications
People of all generations are making more and more use of digital imaging systems in their daily lives. The image content rendered by these digital imaging systems differs greatly in perceived quality depending on the system and its applications. To optimize the experience of viewers of this content, understanding and modeling perceived image quality is essential. Research on modeling image quality in a full-reference framework, where the original content can be used as a reference, is well established in the literature. In many current applications, however, the perceived image quality needs to be modeled in a no-reference framework in real time. As a consequence, the model needs to quantitatively predict the perceived quality of a degraded image without being able to compare it to its original version, and has to achieve this with limited computational complexity in order ...
Liu, Hantao — Delft University of Technology
Video Content Analysis by Active Learning
Advances in compression techniques, the decreasing cost of storage, and high-speed transmission have changed the way videos are created, stored, and distributed. As a consequence, videos are now being used in many application areas. The increase in the amount of video data deployed and used in today's applications not only reveals the importance of video as a multimedia data type but has also led to the need for efficient management of video data. This need paved the way for new research areas, such as indexing and retrieval of video with respect to its spatio-temporal, visual, and semantic content. This thesis presents work towards a unified framework for semi-automated video indexing and interactive retrieval. To create an efficient index, a set of representative key frames is selected that captures and encapsulates the entire video content. This is achieved by, first, segmenting the video into its constituent ...
Camara Chavez, Guillermo — Federal University of Minas Gerais