Content-based search and browsing in semantic multimedia retrieval

Growth in storage capacity has led to large digital video repositories and complicated the discovery of specific information without the laborious manual annotation of data. The research focuses on creating a retrieval system that is ultimately independent of manual work. To retrieve relevant content, the semantic gap between the searcher's information need and the content data has to be overcome using content-based technology. The semantic gap consists of two distinct elements: the ambiguity of the true information need and the equivocalness of digital video data. The research problem of this thesis is: what computational content-based models for retrieval increase the effectiveness of the semantic retrieval of digital video? The hypothesis is that semantic search performance can be improved using pattern recognition, data abstraction and clustering techniques jointly with human interaction through manually created queries and visual browsing. The results of this ...

Rautiainen, Mika — University of Oulu


Video Content Analysis by Active Learning

Advances in compression techniques, decreasing cost of storage, and high-speed transmission have facilitated the way videos are created, stored and distributed. As a consequence, videos are now being used in many application areas. The increase in the amount of video data deployed and used in today's applications not only reveals the importance of video as a multimedia data type, but has also led to the requirement of efficient management of video data. This management paved the way for new research areas, such as indexing and retrieval of video with respect to their spatio-temporal, visual and semantic contents. This thesis presents work towards a unified framework for semi-automated video indexing and interactive retrieval. To create an efficient index, a set of representative key frames is selected which captures and encapsulates the entire video content. This is achieved by, firstly, segmenting the video into its constituent ...

Camara Chavez, Guillermo — Federal University of Minas Gerais


Audio-visual processing and content management techniques, for the study of (human) bioacoustics phenomena

The present doctoral thesis aims towards the development of new long-term, multi-channel, audio-visual processing techniques for the analysis of bioacoustics phenomena. The effort is focused on the study of the physiology of the gastrointestinal system, aiming at the support of medical research for the discovery of gastrointestinal motility patterns and the diagnosis of functional disorders. The term "processing" in this case is quite broad, incorporating the procedures of signal processing, content description, manipulation and analysis, that are applied to all the recorded bioacoustics signals, the auxiliary audio-visual surveillance information (for the monitoring of experiments and the subjects' status), and the extracted audio-video sequences describing the abdominal sound-field alterations. The thesis outline is as follows. The main objective of the thesis, which is the technological support of medical research, is presented in the first chapter. A quick problem definition is initially ...

Dimoulas, Charalampos — Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece


Highly Efficient Low-Level Feature Extraction For Video Representation And Retrieval

Witnessing the omnipresence of ever more complex yet so intuitive digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Therefore, the third generation of Content Based Video Indexing and Retrieval systems faces the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from the compressed-domain features and the robust scalable analysis in the ...

Calic, Janko — Queen Mary University of London


Understanding and Assessing Quality of Experience in Immersive Communications

eXtended Reality (XR) technology, also called Mixed Reality (MR), is in constant development and improvement in terms of hardware and software to offer relevant experiences to users. One of the advances in XR has been the introduction of real visual information in the virtual environment, offering a more natural interaction with the scene and a greater acceptance of the technology. Another advance has been achieved with the representation of the scene through a video that covers the entire environment, called 360-degree or omnidirectional video. These videos are acquired by cameras with omnidirectional lenses that cover the full 360 degrees of the scene and are generally viewed by users through a head-tracked Head Mounted Display (HMD). Users only visualize a subset of the 360-degree scene, called the viewport, which changes with the variations of the viewing direction of the users, determined by the movements of ...

Orduna, Marta — Universidad Politécnica de Madrid


Density-based shape descriptors and similarity learning for 3D object retrieval

Next generation search engines will enable query formulations, other than text, relying on visual information encoded in terms of images and shapes. The 3D search technology, in particular, targets specialized application domains ranging from computer-aided design and manufacturing to cultural heritage archival and presentation. Content-based retrieval research aims at developing search engines that would allow users to perform a query by similarity of content. This thesis deals with two fundamental problems in content-based 3D object retrieval: (1) How to describe a 3D shape to obtain a reliable representative for the subsequent task of similarity search? (2) How to supervise the search process to learn inter-shape similarities for more effective and semantic retrieval? Concerning the first problem, we develop a novel 3D shape description scheme based on probability density of multivariate local surface features. We constructively obtain local characterizations of 3D ...

Akgul, Ceyhun Burak — Bogazici University and Telecom ParisTech


Theoretical aspects and real issues in an integrated multiradar system

In the last few years, Homeland Security (HS) has gained considerable interest in the research community. From a scientific point of view, it is a difficult task to provide a definition of this research area and to exactly draw up its boundaries. In fact, when we talk about security and surveillance, several problems and aspects must be considered. In particular, the following factors play a crucial role and define the complexity level of the considered application field: the number of potential threats can be high and uncertain; the threat detection and identification can be made more complicated by the use of camouflaging techniques; the monitored area is typically wide and it requires a large and heterogeneous sensor network; the surveillance operation is strongly related to the operational scenario, so that it is not possible to define a ...

Fortunati, Stefano — University of Pisa


Sound Event Detection by Exploring Audio Sequence Modelling

Everyday sounds in real-world environments are a powerful source of information by which humans can interact with their environments. Humans can infer what is happening around them by listening to everyday sounds. At the same time, it is a challenging task for a computer algorithm in a smart device to automatically recognise, understand, and interpret everyday sounds. Sound event detection (SED) is the process of transcribing an audio recording into sound event tags with onset and offset time values. This involves classification and segmentation of sound events in the given audio recording. SED has numerous applications in everyday life which include security and surveillance, automation, healthcare monitoring, multimedia information retrieval, and assisted living technologies. SED is to everyday sounds what automatic speech recognition (ASR) is to speech and automatic music transcription (AMT) is to music. The fundamental questions in designing ...

Pankajakshan, Arjun — Queen Mary University of London


Offline Signature Verification with User-Based and Global Classifiers of Local Features

Signature verification deals with the problem of distinguishing forged signatures of a user from his/her genuine signatures. The difficulty lies in identifying allowed variations in a user’s signatures, in the presence of high intra-class and low inter-class variability (the forgeries may be more similar to a user’s genuine signature, compared to his/her other genuine signatures). The problem can be seen as a non-rigid object matching problem where classes are very similar. In the field of biometrics, signature is considered a behavioral biometric and the problem possesses further difficulties compared to other modalities (e.g. fingerprints) due to the added issue of skilled forgeries. A novel offline (image-based) signature verification system is proposed in this thesis. In order to capture the signature’s stable parts and alleviate the difficulty of global matching, local features (histogram of oriented gradients, local binary patterns) are used, based ...

Yılmaz, Mustafa Berkay — Sabancı University


Acoustic Event Detection: Feature, Evaluation and Dataset Design

It takes more time to think of a silent scene, action or event than to find one that emanates sound. Not only speaking or playing music but almost everything that happens is accompanied by or results in one or more sounds mixed together. This makes acoustic event detection (AED) one of the most researched topics in audio signal processing nowadays and it will probably not see a decline anywhere in the near future. This is due to the thirst for understanding and digitally abstracting more and more events in life via the enormous amount of recorded audio through thousands of applications in our daily routine. But it is also a result of two intrinsic properties of audio: it doesn’t need a direct line of sight to be perceived and is less intrusive to record when compared to image or video. Many applications such ...

Mounir, Mina — KU Leuven, ESAT STADIUS


Automatic Detection, Classification and Restoration of Defects in Historical Images

Historical photos are significant attestations of the inheritance of the past. Since photography is an art that is more than 150 years old, photographic archives are increasingly widespread all over the world. Nevertheless, time and poor preservation corrupt physical supports, and many important historical documents risk being ruined and their content lost. Therefore, solutions must be implemented to preserve their state and to recover damaged information. This PhD thesis proposes a general methodology, and several applicative solutions, to handle these problems, by means of digitization and digital restoration. The purpose is to create a useful tool to support non-expert users in the restoration process of damaged historical images. The content of this thesis is outlined as follows: Chapter 1 gives an overview of the problems related to management and preservation of cultural repositories, and discusses ...

Mazzola, Giuseppe — Università degli studi di Palermo - Dipartimento di Ingegneria Informatica


New insights into Crowd Density Analysis in Video Surveillance Systems

Crowd analysis has recently emerged as an increasingly important problem for crowd monitoring and management in the visual surveillance community. In this thesis, our objectives are to address the problems of crowd density estimation and to investigate the usefulness of such estimation as additional information to other applications. Towards the first goal, we focus on the problems related to the estimation of the crowd density using low-level features in order to avert typical problems in detection of high-density crowds. We demonstrate in this dissertation that the proposed approaches perform better than the baseline methods, either for counting people, or alternatively for estimating the crowd level. Afterwards, we propose a novel approach, in which local information at the pixel level substitutes the overall crowd level or person count. It is based on modeling time-varying dynamics of the crowd density ...

Fradi, Hajer — TELECOM ParisTech


Automatic Analysis of Head and Facial Gestures in Video Streams

Automatic analysis of head gestures and facial expressions is a challenging research area and it has significant applications for intelligent human-computer interfaces. An important task is the automatic classification of non-verbal messages composed of facial signals where both facial expressions and head rotations are observed. This is a challenging task, because there is no definite grammar or code-book for mapping the non-verbal facial signals into a corresponding mental state. Furthermore, non-verbal facial signals and the observed emotions depend on personality, society, mood and also the context in which they are displayed or observed. This thesis mainly addresses the three desired tasks for an effective visual information based automatic face and head gesture (FHG) analyzer. First we develop a fully automatic, robust and accurate 17-point facial landmark localizer based on local appearance information and structural information of ...

Cinar Akakin, Hatice — Bogazici University


Direct Pore-based Identification For Fingerprint Matching Process

The fingerprint is considered one of the most crucial scientific tools in solving criminal cases. This biometric feature is composed of unique and distinctive patterns found on the fingertips of each individual. With advancing technology and progress in forensic sciences, fingerprint analysis plays a vital role in forensic investigations and the analysis of evidence at crime scenes. The fingerprint patterns of each individual start to develop in the early stages of life and never change thereafter. This fact makes fingerprints an exceptional means of identification. In criminal cases, fingerprint analysis is used to decipher traces, evidence, and clues at crime scenes. These analyses not only provide insights into how a crime was committed but also assist in identifying the culprits or individuals involved. Computer-based fingerprint identification systems yield faster and more accurate results compared to traditional methods, making fingerprint comparisons in large databases ...

Delican, Vedat — Istanbul Technical University


Music Language Models for Automatic Music Transcription

Much like natural language, music is highly structured, with strong priors on the likelihood of note sequences. In automatic speech recognition (ASR), these priors are called language models, which are used in addition to acoustic models and contribute greatly to the success of today's systems. However, in Automatic Music Transcription (AMT), ASR's musical equivalent, Music Language Models (MLMs) are rarely used. AMT can be defined as the process of extracting a symbolic representation from an audio signal, describing which notes were played at what time. In this thesis, we investigate the design of MLMs using recurrent neural networks (RNNs) and their use for AMT. We first look into MLM performance on a polyphonic prediction task. We observe that using musically-relevant timesteps results in desirable MLM behaviour, which is not reflected in usual evaluation metrics. We compare our model against benchmark ...

Ycart, Adrien — Queen Mary University of London
