Understanding and Assessing Quality of Experience in Immersive Communications
eXtended Reality (XR) technology, also called Mixed Reality (MR), is in constant development and improvement in terms of hardware and software to offer relevant experiences to users. One of the advances in XR has been the introduction of real visual information into the virtual environment, which offers a more natural interaction with the scene and a greater acceptance of the technology. Another advance has been the representation of the scene through a video that covers the entire environment, called 360-degree or omnidirectional video. These videos are acquired by cameras with omnidirectional lenses that cover all 360 degrees of the scene, and they are generally viewed through a head-tracked Head Mounted Display (HMD). Users only see a subset of the 360-degree scene, called the viewport, which changes as their viewing direction changes with the movements of their head. This thesis goes one step further and considers real-time 360-degree video communication for teleconferencing purposes. We envision that this kind of communication will become mainstream within the next couple of decades. Our aim is to research the technology that could make this possible and to design a proper assessment methodology that scales to massive usage. To increase the use of immersive communications, it is necessary to guarantee an acceptable Quality of Experience (QoE), defined as the degree of delight or annoyance of the user with an application or service. On this basis, this thesis presents cross-sectional research that includes the assessment of both technical and socioemotional aspects in the 360-degree video communication paradigm. The research follows an evolutionary approach, modifying different conditions of the reference configuration of a 360-degree video communication prototype to understand the challenges of XR technologies in terms of QoE assessment.
Starting from video quality as a significant factor impacting QoE, we validate the Video Multimethod Assessment Fusion (VMAF) objective metric, designed and developed by Netflix for 2D content, on 360-degree video, which saves time and resources. To evaluate video quality in subjective assessments, we validate the Single Stimulus Discrete Quality Evaluation (SSDQE) methodology, which can be used with long-duration content and therefore allows a narrative. We then show that SSDQE allows the simultaneous evaluation of socioemotional and technical aspects, increasing the ecological validity of the experiments. The immersive communication system is mainly explored from the perspective of the remote user, with conclusions drawn on both low-level aspects (e.g., the possibility of seeing one's own hands, or of using the touchpad or the handheld controller to interact with the virtual environment) and high-level aspects of the scenario (e.g., the acquisition perspective). By conducting assessments based on both simulated and interactive communications, we have obtained valuable insights. Furthermore, what we have learned about the design of experiments is summarized as a best-practices guide for developers and researchers. Owing to the cross-disciplinary nature of the research, the guidelines are proposed within a common framework for two of the main viewpoints in QoE assessment: the telecommunications and human-computer interaction areas. The use case of tele-education is analyzed, including a video analysis module that detects events of interest around the 360-degree scene and notifies the remote students of them, helping to guide their attention. Students showed high acceptance of both the notifications and the system as a solution for tele-education. Additionally, we provide a database of 360-degree videos of real lessons with annotated events of interest, which is publicly available for training machine learning algorithms and for subjective assessments.
This thesis contributes to the understanding of the immersive communications paradigm, so that these communications can continue to be developed and evaluated until they become a reality in society.
