Cloning with gesture expressivity

Virtual environments allow human beings to be represented by virtual humans or avatars. Users can share a sense of virtual presence if the avatar looks like the real human it represents. This classically involves turning the avatar into a clone with the real human’s appearance and voice. However, the possibility of cloning the gesture expressivity of a real person has received little attention so far. Gesture expressivity combines the style and mood of a person. Expressivity parameters have been defined in earlier works for animating embodied conversational agents. In this work, we focus on expressivity in wrist motion. First, we propose algorithms to estimate three expressivity parameters from captured 3D wrist trajectories: repetition, spatial extent and temporal extent. Then, we conducted a perceptual study, through a user survey, of the relevance of expressivity for recognizing individual humans. We have animated a virtual ...

Rajagopal, Manoj Kumar — Telecom SudParis


Mixed structural models for 3D audio in virtual environments

In the world of information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, for example by developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction, and teleconferencing systems, to name but ...

Geronazzo, Michele — University of Padova


Real Time Stereo to Multi-view Video Conversion

A novel and efficient methodology is presented for the conversion of stereo to multi-view video in order to address the 3D content requirements for the next generation 3D-TVs and auto-stereoscopic multi-view displays. There are two main algorithmic blocks in such a conversion system: stereo matching and virtual view rendering, which enable extraction of 3D information from stereo video and synthesis of nonexistent virtual views, respectively. In the intermediate steps of these functional blocks, a novel edge-preserving filter is proposed that recursively constructs connected support regions for each pixel among color-wise similar neighboring pixels. The proposed recursive update structure eliminates the pre-defined window dependency of the conventional approaches, providing complete content adaptability with quite low computational complexity. Based on extensive tests, it is observed that the proposed filtering technique yields better or competitive results against some leading techniques in the literature. The ...

Cigla, Cevahir — Middle East Technical University


Motion Analysis and Modeling for Activity Recognition and 3-D Animation based on Geometrical and Video Processing Algorithms

The analysis of audiovisual data aims at extracting high-level information, equivalent to what a human can extract. It is considered a fundamental problem that is unsolved in its general form. Even though the inverse problem, audiovisual (sound and animation) synthesis, is judged easier than the former, it also remains unsolved. The systematic research on these problems yields solutions that constitute the basis for a great number of continuously developing applications. In this thesis, we examine the two aforementioned fundamental problems. We propose algorithms and models of analysis and synthesis of articulated motion and undulatory (snake) locomotion, using data from video sequences. The goal of this research is multilevel information extraction from video, such as object tracking and activity recognition, and 3-D animation synthesis in virtual environments based on the results of the analysis. An ...

Panagiotakis, Costas — University of Crete


Computational models of expressive gesture in multimedia systems

This thesis focuses on the development of paradigms and techniques for the design and implementation of multimodal interactive systems, mainly for performing arts applications. The work addresses research issues in the fields of human-computer interaction, multimedia systems, and sound and music computing. The thesis is divided into two parts. In the first one, after a short review of the state of the art, the focus moves to the definition of environments in which novel forms of technology-integrated artistic performances can take place. These are distributed active mixed reality environments in which information at different layers of abstraction is conveyed mainly non-verbally through expressive gestures. Expressive gesture is therefore defined and the internal structure of a virtual observer able to process it (and inhabiting the proposed environments) is described in a multimodal perspective. The definition of the structure of the environments, of the virtual ...

Volpe, Gualtiero — University of Genova


Adaptive Edge-Enhanced Correlation Based Robust and Real-Time Visual Tracking Framework and Its Deployment in Machine Vision Systems

An adaptive edge-enhanced correlation based robust and real-time visual tracking framework, and two machine vision systems based on the framework, are proposed. The visual tracking algorithm can track any object of interest in a video acquired from a stationary or moving camera. It can handle real-world problems such as noise, clutter, occlusion, uneven illumination, varying appearance, orientation, scale, and velocity of the maneuvering object, and object fading and obscuration in low contrast video at various zoom levels. The proposed machine vision systems are an active camera tracking system and a vision based system for a UGV (unmanned ground vehicle) to handle a road intersection. The core of the proposed visual tracking framework is an Edge Enhanced Back-propagation neural-network Controlled Fast Normalized Correlation (EE-BCFNC), which makes the object localization stage efficient and robust to noise, object fading, obscuration, and uneven ...

Ahmed, Javed — Electrical (Telecom.) Engineering Department, National University of Sciences and Technology, Rawalpindi, Pakistan.


Camera based motion estimation and recognition for human-computer interaction

Communicating with mobile devices has become an unavoidable part of our daily life. Unfortunately, the current user interface designs are mostly taken directly from desktop computers. This has resulted in devices that are sometimes hard to use. Since more processing power and new sensing technologies are already available, there is a possibility to develop systems to communicate through different modalities. This thesis proposes some novel computer vision approaches, including head tracking, object motion analysis and device ego-motion estimation, to allow efficient interaction with mobile devices. For head tracking, two new methods have been developed. The first method detects a face region and facial features by employing skin detection, morphology, and a geometrical face model. The second method, designed especially for mobile use, detects the face and eyes using local texture features. In both cases, Kalman filtering is applied to estimate ...

Hannuksela, Jari — University of Oulu


Device-to-Device Wireless Communications

Device-to-Device (D2D) communication is one of the important proposed solutions to increase capacity, offload traffic, and improve energy efficiency in next generation cellular networks. D2D communication is direct communication between two users without using the cellular network infrastructure. Despite the large expected benefits of D2D in terms of capacity, the coexistence of D2D and cellular networks in the same spectrum creates new challenges in interference management and network design. To limit the interference, power control schemes on cellular networks and D2D networks are typically adopted. Even though power control is introduced to limit the interference level, it does not prevent cellular and D2D users from experiencing coverage limitation when sharing the same radio resources. Therefore, the design of such networks requires the availability of suitable methods able to properly model the effect of interference ...

Alhalabi, Ashraf S.A. — Università degli Studi di Bologna


Contributions to Human Motion Modeling and Recognition using Non-intrusive Wearable Sensors

This thesis contributes to motion characterization through inertial and physiological signals captured by wearable devices and analyzed using signal processing and deep learning techniques. This research leverages the possibilities of motion analysis for three main applications: to know what physical activity a person is performing (Human Activity Recognition), to identify who is performing that motion (user identification) or know how the movement is being performed (motor anomaly detection). Most previous research has addressed human motion modeling using invasive sensors in contact with the user or intrusive sensors that modify the user’s behavior while performing an action (cameras or microphones). In this sense, wearable devices such as smartphones and smartwatches can collect motion signals from users during their daily lives in a less invasive or intrusive way. Recently, there has been an exponential increase in research focused on inertial-signal processing to ...

Gil-Martín, Manuel — Universidad Politécnica de Madrid


Deep Learning Techniques for Visual Counting

The explosion of Deep Learning (DL) added a boost to the already rapidly developing field of Computer Vision to such a point that vision-based tasks are now parts of our everyday lives. Applications such as image classification, photo stylization, or face recognition are nowadays pervasive, as evidenced by the advent of modern systems trivially integrated into mobile applications. In this thesis, we investigated and enhanced the visual counting task, which automatically estimates the number of objects in still images or video frames. Recently, due to the growing interest in it, several Convolutional Neural Network (CNN)-based solutions have been suggested by the scientific community. These artificial neural networks, inspired by the organization of the animal visual cortex, provide a way to automatically learn effective representations from raw visual data and can be successfully employed to address typical challenges characterizing this task, ...

Ciampi, Luca — University of Pisa


A Robust Face Recognition Algorithm for Real-World Applications

Face recognition is one of the most challenging problems of computer vision and pattern recognition. The difficulty in face recognition arises mainly from facial appearance variations caused by factors such as expression, illumination, partial face occlusion, and time gap between training and testing data capture. Moreover, the performance of face recognition algorithms heavily depends on the prior facial feature localization step. That is, face images need to be aligned very well before they are fed into a face recognition algorithm, which requires precise facial feature localization. This thesis addresses these two main problems (facial appearance variations due to changes in expression, illumination, occlusion, and time gap; and imprecise face alignment due to mislocalized facial features) in order to accomplish its goal of building a generic face recognition algorithm that can function reliably under real-world conditions. The proposed face recognition algorithm ...

Ekenel, Hazim Kemal — University of Karlsruhe


A floating polygon soup representation for 3D video

This thesis presents a new representation called floating polygon soup for applications like 3DTV and FTV (Free Viewpoint Television). The polygon soup is designed for compactness, compression efficiency, and view synthesis quality. The polygons are stored in 2D, with depth values at each corner. They are not necessarily connected to each other and can be deformed (or floated) with respect to viewpoints and time. Starting from multi-view video plus depth (MVD), the construction proceeds in two steps: quadtree decomposition and multi-view redundancy reduction. It results in a compact set of polygons replacing the depth maps while preserving depth discontinuities and geometric details. Next, compression efficiency and view-synthesis quality are evaluated. Classical methods such as inpainting and post-processing are implemented and adapted to the polygon soup. A new compression method is proposed. It exploits the quadtree structure and uses spatial prediction. ...

Colleu, Thomas — INRIA Rennes Bretagne Atlantique / Orange Labs / IETR


Multi-Sensor Integration for Indoor 3D Reconstruction

Outdoor maps and navigation information delivered by modern services and technologies like Google Maps and Garmin navigators have revolutionized the lifestyle of many people. Motivated by the desire for similar navigation systems for indoor usage from consumers, advertisers, emergency rescuers/responders, etc., many indoor environments such as shopping malls, museums, casinos, airports, transit stations, offices, and schools need to be mapped. Typically, the environment is first reconstructed by capturing many point clouds from various stations and defining their spatial relationships. Currently, there is a lack of an accurate, rigorous, and speedy method for relating point clouds in indoor, urban, satellite-denied environments. This thesis presents a novel and automatic way for fusing calibrated point clouds obtained using a terrestrial laser scanner and the Microsoft Kinect by integrating them with a low-cost inertial measurement unit. The developed system, titled the Scannect, is the ...

Chow, Jacky — University of Calgary


Reduced-Complexity Adaptive Filtering Techniques for Communications Applications

Adaptive filtering algorithms are powerful signal processing tools with widespread use in numerous engineering applications. Computational complexity is a key factor in determining the optimal implementation as well as the real-time performance of adaptive signal processors. To minimize the hardware and/or software resources required for implementing an adaptive filtering algorithm, it is desirable to reduce its computational complexity as much as possible without any significant sacrifice of performance. This thesis comprises a collection of thirteen peer-reviewed published works together with integrating material. The works share a common unifying theme, which is to devise new low-complexity adaptive filtering algorithms for communications and, more generally, signal processing applications. The main contributions are the new adaptive filtering algorithms, channel equalization techniques, and theoretical analyses listed below under four categories: 1) adaptive system identification • affine projection ...

Arablouei, Reza — University of South Australia


Prioritized 3D Scene Reconstruction and Rate-Distortion

In this dissertation, a novel scheme performing 3D reconstruction of a scene from a 2D video sequence is presented. To this end, first, the trajectories of the salient features in the scene are determined as a sequence of displacements via a Kanade-Lucas-Tomasi tracker and a Kalman filter. Then, a tentative camera trajectory with respect to a metric reference reconstruction is estimated. All frame pairs are ordered with respect to their amenability to 3D reconstruction by a metric that utilizes the baseline distances and the number of tracked correspondences between the frames. The ordered frame pairs are processed via a sequential structure-from-motion algorithm to estimate the sparse structure and camera matrices. Experiments show that the metric and the associated reconstruction algorithm outperform their counterparts in the literature. Finally, a mesh-based, rate-distortion efficient representation is constructed through a novel procedure ...

Imre, Evren — Middle East Technical University, Department of Electrical and Electronics Engineering
