Dialogue Enhancement and Personalization - Contributions to Quality Assessment and Control (2023)
Quality of Experience Evaluation Methodology via Crowdsourcing
Provisioning digital video services is difficult because it is hard to estimate the optimal settings of video parameters that maximize overall end-user quality under given transmission constraints. With Internet streaming services becoming part of our everyday life, end-to-end optimization of such systems is important. On one hand, considerable effort goes into subjective and objective evaluation of end-user perception. High-quality audiovisual perception at minimal service cost is one of the main interests of network providers. On the other hand, subjective evaluations to determine the best video and audio configurations are often conducted in controlled laboratory environments, which have little to do with the real environments in which consumers enjoy such content. Unfortunately, no serious attempts have been made to take into account interactions between quality of the content and ...
Gardlo, Bruno — University of Zilina
The present doctoral thesis aims towards the development of new long-term, multi-channel, audio-visual processing techniques for the analysis of bioacoustics phenomena. The effort is focused on the study of the physiology of the gastrointestinal system, aiming at the support of medical research for the discovery of gastrointestinal motility patterns and the diagnosis of functional disorders. The term "processing" in this case is quite broad, incorporating the procedures of signal processing, content description, manipulation and analysis, that are applied to all the recorded bioacoustics signals, the auxiliary audio-visual surveillance information (for the monitoring of experiments and the subjects' status), and the extracted audio-video sequences describing the abdominal sound-field alterations. The thesis outline is as follows. The main objective of the thesis, which is the technological support of medical research, is presented in the first chapter. A quick problem definition is initially ...
Dimoulas, Charalampos — Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
Understanding and Assessing Quality of Experience in Immersive Communications
eXtended Reality (XR) technology, also called Mixed Reality (MR), is in constant development and improvement in terms of hardware and software to offer relevant experiences to users. One of the advances in XR has been the introduction of real visual information in the virtual environment, offering a more natural interaction with the scene and a greater acceptance of technology. Another advance has been achieved with the representation of the scene through a video that covers the entire environment, called 360-degree or omnidirectional video. These videos are acquired by cameras with omnidirectional lenses that cover the 360-degrees of the scene and are generally viewed by users through a head-tracked Head Mounted Display (HMD). Users only visualize a subset of the 360-degree scene, called viewport, which changes with the variations of the viewing direction of the users, determined by the movements of ...
Orduna, Marta — Universidad Politécnica de Madrid
Mixed structural models for 3D audio in virtual environments
In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...
Geronazzo, Michele — University of Padova
Application of Sound Source Separation Methods to Advanced Spatial Audio Systems
This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats, such as WFS. This is due to the fact that WFS needs the original source signals to be available, in order to accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mixing is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to ...
Cobos, Maximo — Universidad Politecnica de Valencia
Dealing with Variability Factors and Its Application to Biometrics at a Distance
This Thesis is focused on dealing with the variability factors in biometric recognition and applications of biometrics at a distance. In particular, this PhD Thesis explores the problem of assessing variability factors and how to deal with them by incorporating soft biometrics information in order to improve person recognition systems working at a distance. The proposed methods, supported by experimental results, show the benefits of adapting the system to the variability of the sample at hand. Although relatively young compared to other mature and long-used security technologies, biometrics have emerged in the last decade as a compelling alternative for applications where automatic recognition of people is needed. Certainly, biometrics are very attractive and useful for video surveillance systems at a distance, widely distributed in our lives, and for the final user: forget about PINs and passwords, you ...
Tome, Pedro — Universidad Autónoma de Madrid
Best Signal Selection with Automatic Delay Compensation in VoIP Environment
In recent decades, air traffic has grown steadily worldwide, connecting more and more places. At the same time, the need to manage all flights correctly and securely has increased. Air traffic authorities imposed and updated several standards for the air traffic management (ATM) system, keeping pace with the growing traffic flow. To achieve this, special voice communication systems (VCS) were developed. They ensure communication between the pilots and the operators in the ground control centers. When a communication is initiated between the aircraft’s pilot and the ground air traffic control operator, various systems are used. The pilot speaks through the aircraft’s radio station and the signal is received by several ground radio stations. Then, the signal from each ground radio station arrives over a different path at the control center. Here one of the received ...
Marinescu, Radu-Sebastian — University Politehnica of Bucharest
Contributions to Human Motion Modeling and Recognition using Non-intrusive Wearable Sensors
This thesis contributes to motion characterization through inertial and physiological signals captured by wearable devices and analyzed using signal processing and deep learning techniques. This research leverages the possibilities of motion analysis for three main applications: to know what physical activity a person is performing (Human Activity Recognition), to identify who is performing that motion (user identification) or know how the movement is being performed (motor anomaly detection). Most previous research has addressed human motion modeling using invasive sensors in contact with the user or intrusive sensors that modify the user’s behavior while performing an action (cameras or microphones). In this sense, wearable devices such as smartphones and smartwatches can collect motion signals from users during their daily lives in a less invasive or intrusive way. Recently, there has been an exponential increase in research focused on inertial-signal processing to ...
Gil-Martín, Manuel — Universidad Politécnica de Madrid
An analysis of the ergonomic quality of the current standards for the visual display quality leads to a number of recommendations for the development of new international standards: separation for different types of users, especially display designers, purchasers, and end users; independence of display technology to allow comparison; modular construction with several quality grades to allow benchmarking for different types of applications; and a test method for the end user standard that can be performed at the place of work, to take into account the effects of wear and drift of components and to be able to correct suboptimal configurations. The separate parameters that exert influence on the image quality of a broad category of images in the context of use, and their mutual coherence within the cycle of evaluation and adaptation of image quality are presented in the "Image ...
Besuijen, Jacobus — Delft University of Technology
This thesis concentrates on a major problem within audio signal processing, the separation of source signals from musical mixtures when only a single mixture channel is available. Source separation is the process by which signals that correspond to distinct sources are identified in a signal mixture and extracted from it. Producing multiple entities from a single one is an extremely underdetermined task, so additional prior information can assist in setting appropriate constraints on the solution set. The approach proposed uses prior information such that: (1) it can potentially be applied successfully to a large variety of musical mixtures, and (2) it requires minimal user intervention and no prior learning/training procedures (i.e., it is an unsupervised process). This system can be useful for applications such as remixing, creative effects, restoration and for archiving musical material for internet delivery, amongst others. Here, ...
Siamantas, Georgios — University of York
Filter Optimization for Personal Sound Zones Systems
Personal Sound Zones (PSZ) systems deliver different sounds to a number of listeners sharing an acoustic space through the use of loudspeakers together with signal processing techniques. These systems have attracted a lot of attention in recent years because of the wide range of applications that would benefit from the generation of individual listening zones, e.g., domestic or automotive audio applications. A key aspect of PSZ systems, at least for low and mid frequencies, is the optimization of the filters used to process the sound signals. Different algorithms have been proposed in the literature for computing those filters, each exhibiting some advantages and disadvantages. In this work, the state-of-the-art algorithms for PSZ systems are reviewed, and their performance in a reverberant environment is evaluated. Aspects such as the acoustic isolation between zones, the reproduction error, the energy of the filters, ...
Molés Cases, Vicent — Universitat Politecnica de Valencia
Machine vision applies computer vision to industry and manufacturing in order to control or analyze a process or activity. A typical application of machine vision is the inspection of produced goods such as electronic devices, automobiles, food and pharmaceuticals. Machine vision systems form their judgement using specially designed image processing software, so image processing is crucial to their accuracy. The food industry is among the industries that make extensive use of image processing for the inspection of produce. Fruits and vegetables vary greatly in physical appearance. The numerous defect types found on apples, as well as the high natural variability of their skin color, bring apple fruits into the center of our interest. Traditionally, apple fruits are inspected by human experts. However, automating this process is necessary to reduce the error, variation, fatigue and cost associated with human experts as well as to increase ...
Unay, Devrim — Universite de Mons
Robust and multiresolution video delivery: From H.26x to Matching pursuit based technologies
With the joint development of networking and digital coding technologies multimedia and more particularly video services are clearly becoming one of the major consumers of the new information networks. The rapid growth of the Internet and computer industry however results in a very heterogeneous infrastructure commonly overloaded. Video service providers have nevertheless to offer to their clients the best possible quality according to their respective capabilities and communication channel status. The Quality of Service is not only influenced by the compression artifacts, but also by unavoidable packet losses. Hence, the packet video stream has clearly to fulfill possibly contradictory requirements, that are coding efficiency and robustness to data loss. The first contribution of this thesis is the complete modeling of the video Quality of Service (QoS) in standard and more particularly MPEG-2 applications. The performance of Forward Error Control (FEC) ...
Frossard, Pascal — Swiss Federal Institute of Technology
Coordination Strategies for Interference Management in MIMO Dense Cellular Networks
The envisioned rapid and exponential increase of wireless data traffic demand in the next years imposes rethinking current wireless cellular networks due to the scarcity of the available spectrum. In this regard, three main drivers are considered to increase the capacity of today's most advanced (4G systems) and future (5G systems and beyond) cellular networks: i) use more bandwidth (more Hz) through spectral aggregation, ii) enhance the spectral efficiency per base station (BS) (more bits/s/Hz/BS) by using multiple antennas at BSs and users (i.e. MIMO systems), and iii) increase the density of BSs (more BSs/km²) through a dense and heterogeneous deployment (known as dense heterogeneous cellular networks). We focus on the last two drivers. First, the use of multi-antenna systems allows exploiting the spatial dimension for several purposes: improving the capacity of a conventional point-to-point wireless link, increasing the number ...
Lagen, Sandra — Universitat Politecnica de Catalunya
Discrete-time speech processing with application to emotion recognition
The subject of this PhD thesis is the efficient and robust processing and analysis of audio recordings derived from a call center. The thesis comprises two parts. The first part is dedicated to dialogue/non-dialogue detection and to speaker segmentation. The systems that are developed are a prerequisite for detecting (i) the audio segments that actually contain a dialogue between the system and the call center customer and (ii) the change points between the system and the customer. This way the volume of the audio recordings that need to be processed is significantly reduced, while the system is automated. To detect the presence of a dialogue several systems are developed. This is the first effort found in the international literature in which the audio channel is exclusively exploited. Also, it is the first time that the speaker utterance ...
Kotti, Margarita — Aristotle University of Thessaloniki