Content-based search and browsing in semantic multimedia retrieval

Growth in storage capacity has led to large digital video repositories and complicated the discovery of specific information without the laborious manual annotation of data. The research focuses on creating a retrieval system that is ultimately independent of manual work. To retrieve relevant content, the semantic gap between the searcher's information need and the content data has to be overcome using content-based technology. The semantic gap consists of two distinct elements: the ambiguity of the true information need and the equivocalness of digital video data. The research problem of this thesis is: what computational content-based models for retrieval increase the effectiveness of the semantic retrieval of digital video? The hypothesis is that semantic search performance can be improved using pattern recognition, data abstraction and clustering techniques jointly with human interaction through manually created queries and visual browsing. The results of this ...

Rautiainen, Mika — University of Oulu


Video Content Analysis by Active Learning

Advances in compression techniques, decreasing cost of storage, and high-speed transmission have facilitated the way videos are created, stored and distributed. As a consequence, videos are now being used in many application areas. The increase in the amount of video data deployed and used in today's applications not only reveals the importance of video as a multimedia data type, but has also led to the requirement of efficient management of video data. This management paved the way for new research areas, such as indexing and retrieval of video with respect to their spatio-temporal, visual and semantic contents. This thesis presents work towards a unified framework for semi-automated video indexing and interactive retrieval. To create an efficient index, a set of representative key frames is selected which captures and encapsulates the entire video content. This is achieved by, firstly, segmenting the video into its constituent ...

Camara Chavez, Guillermo — Federal University of Minas Gerais


Audio-visual processing and content management techniques, for the study of (human) bioacoustics phenomena

The present doctoral thesis aims towards the development of new long-term, multi-channel, audio-visual processing techniques for the analysis of bioacoustics phenomena. The effort is focused on the study of the physiology of the gastrointestinal system, aiming at the support of medical research for the discovery of gastrointestinal motility patterns and the diagnosis of functional disorders. The term "processing" in this case is quite broad, incorporating the procedures of signal processing, content description, manipulation and analysis, that are applied to all the recorded bioacoustics signals, the auxiliary audio-visual surveillance information (for the monitoring of experiments and the subjects' status), and the extracted audio-video sequences describing the abdominal sound-field alterations. The thesis outline is as follows. The main objective of the thesis, which is the technological support of medical research, is presented in the first chapter. A quick problem definition is initially ...

Dimoulas, Charalampos — Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece


Highly Efficient Low-Level Feature Extraction For Video Representation And Retrieval

Witnessing the omnipresence of ever more complex yet so intuitive digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Therefore, the third generation of Content Based Video Indexing and Retrieval systems faces the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from the compressed-domain features and the robust scalable analysis in the ...

Calic, Janko — Queen Mary University of London


Density-based shape descriptors and similarity learning for 3D object retrieval

Next generation search engines will enable query formulations, other than text, relying on visual information encoded in terms of images and shapes. The 3D search technology, in particular, targets specialized application domains ranging from computer-aided design and manufacturing to cultural heritage archival and presentation. Content-based retrieval research aims at developing search engines that would allow users to perform a query by similarity of content. This thesis deals with two fundamental problems in content-based 3D object retrieval: (1) How to describe a 3D shape to obtain a reliable representative for the subsequent task of similarity search? (2) How to supervise the search process to learn inter-shape similarities for more effective and semantic retrieval? Concerning the first problem, we develop a novel 3D shape description scheme based on probability density of multivariate local surface features. We constructively obtain local characterizations of 3D ...

Akgul, Ceyhun Burak — Bogazici University and Telecom ParisTech


Theoretical aspects and real issues in an integrated multiradar system

In the last few years, Homeland Security (HS) has gained considerable interest in the research community. From a scientific point of view, it is a difficult task to provide a definition of this research area and to draw up its boundaries exactly. In fact, when we talk about security and surveillance, several problems and aspects must be considered. In particular, the following factors play a crucial role and define the complexity level of the considered application field: the number of potential threats can be high and uncertain; the threat detection and identification can be made more complicated by the use of camouflaging techniques; the monitored area is typically wide and it requires a large and heterogeneous sensor network; the surveillance operation is strongly related to the operational scenario, so that it is not possible to define a ...

Fortunati, Stefano — University of Pisa


Offline Signature Verification with User-Based and Global Classifiers of Local Features

Signature verification deals with the problem of identifying forged signatures of a user from his/her genuine signatures. The difficulty lies in identifying allowed variations in a user’s signatures, in the presence of high intra-class and low inter-class variability (the forgeries may be more similar to a user’s genuine signature, compared to his/her other genuine signatures). The problem can be seen as a non-rigid object matching problem where classes are very similar. In the field of biometrics, signature is considered a behavioral biometric and the problem poses further difficulties compared to other modalities (e.g. fingerprints) due to the added issue of skilled forgeries. A novel offline (image-based) signature verification system is proposed in this thesis. In order to capture the signature’s stable parts and alleviate the difficulty of global matching, local features (histogram of oriented gradients, local binary patterns) are used, based ...

Yılmaz, Mustafa Berkay — Sabancı University


New insights into Crowd Density Analysis in Video Surveillance Systems

Crowd analysis has recently emerged as an increasingly important problem for crowd monitoring and management in the visual surveillance community. In this thesis, our objectives are to address the problems of crowd density estimation and to investigate the usefulness of such estimation as additional information to other applications. Towards the first goal, we focus on the problems related to the estimation of the crowd density using low-level features in order to avert typical problems in the detection of high-density crowds. We demonstrate in this dissertation that the proposed approaches perform better than the baseline methods, either for counting people, or alternatively for estimating the crowd level. Afterwards, we propose a novel approach, in which local information at the pixel level substitutes the overall crowd level or person count. It is based on modeling time-varying dynamics of the crowd density ...

Fradi, Hajer — TELECOM ParisTech


Automatic Detection, Classification and Restoration of Defects in Historical Images

Historical photos are significant attestations of the inheritance of the past. Since photography is an art that is more than 150 years old, photographic archives are increasingly widespread all over the world. Nevertheless, time and poor preservation corrupt physical supports, and many important historical documents risk being ruined and their content lost. Therefore solutions must be implemented to preserve their state and to recover damaged information. This PhD thesis proposes a general methodology, and several applicative solutions, to handle these problems, by means of digitization and digital restoration. The purpose is to create a useful tool to support non-expert users in the restoration process of damaged historical images. The content of this thesis is outlined as follows: Chapter 1 gives an overview of the problems related to management and preservation of cultural repositories, and discusses ...

Mazzola, Giuseppe — Università degli studi di Palermo - Dipartimento di Ingegneria Informatica


Automatic Analysis of Head and Facial Gestures in Video Streams

Automatic analysis of head gestures and facial expressions is a challenging research area and it has significant applications for intelligent human-computer interfaces. An important task is the automatic classification of non-verbal messages composed of facial signals where both facial expressions and head rotations are observed. This is a challenging task, because there is no definite grammar or code-book for mapping the non-verbal facial signals into a corresponding mental state. Furthermore, non-verbal facial signals and the observed emotions depend on personality, society, mood and also the context in which they are displayed or observed. This thesis mainly addresses the three desired tasks for an effective visual information based automatic face and head gesture (FHG) analyzer. First we develop a fully automatic, robust and accurate 17-point facial landmark localizer based on local appearance information and structural information of ...

Cinar Akakin, Hatice — Bogazici University


Music Language Models for Automatic Music Transcription

Much like natural language, music is highly structured, with strong priors on the likelihood of note sequences. In automatic speech recognition (ASR), these priors are called language models, which are used in addition to acoustic models and contribute greatly to the success of today's systems. However, in Automatic Music Transcription (AMT), ASR's musical equivalent, Music Language Models (MLMs) are rarely used. AMT can be defined as the process of extracting a symbolic representation from an audio signal, describing which notes were played at what time. In this thesis, we investigate the design of MLMs using recurrent neural networks (RNNs) and their use for AMT. We first look into MLM performance on a polyphonic prediction task. We observe that using musically-relevant timesteps results in desirable MLM behaviour, which is not reflected in usual evaluation metrics. We compare our model against benchmark ...

Ycart, Adrien — Queen Mary University of London


Functional Neuroimaging Data Characterisation Via Tensor Representations

The growing interest in neuroimaging technologies generates a massive amount of biomedical data that exhibit high dimensionality. Tensor-based analysis of brain imaging data has by now been recognized as an effective approach exploiting its inherent multi-way nature. In particular, the advantages of tensorial over matrix-based methods have previously been demonstrated in the context of functional magnetic resonance imaging (fMRI) source localization, i.e., the identification of the regions of the brain which are activated at specific time instances. However, such methods can also become ineffective in realistic challenging scenarios, involving, e.g., strong noise and/or significant overlap among the activated regions. Moreover, they commonly rely on the assumption of an underlying multilinear model generating the data. In the first part of this thesis, we aimed at investigating the possible gains from exploiting the 3-dimensional nature of the brain images, through a higher-order tensorization ...

Chatzichristos, Christos — National and Kapodistrian University of Athens


Extended Bag-of-Words Formalism for Image Classification

Visual information, in the form of digital images and videos, has become so omnipresent in computer databases and repositories, that it can no longer be considered a “second class citizen”, eclipsed by textual information. In that scenario, image classification has become a critical task. In particular, the pursuit of automatic identification of complex semantic concepts represented in images, such as scenes or objects, has motivated researchers in areas as diverse as Information Retrieval, Computer Vision, Image Processing and Artificial Intelligence. Nevertheless, in contrast to text documents, whose words carry semantics, images consist of pixels that have no semantic information by themselves, making the task very challenging. In this dissertation, we have addressed the problem of representing images based on their visual information. Our aim is content-based concept detection in images and videos, with a novel representation that enriches the Bag-of-Words model. ...

Avila, Sandra Eliza Fontes — Universidade Federal de Minas Gerais, Université Pierre et Marie Curie


Deep Learning for Event Detection, Sequence Labelling and Similarity Estimation in Music Signals

When listening to music, some humans can easily recognize which instruments play at what time or when a new musical segment starts, but cannot describe exactly how they do this. To automatically describe particular aspects of a music piece – be it for an academic interest in emulating human perception, or for practical applications – we thus cannot directly replicate the steps taken by a human. We can, however, exploit the fact that humans can easily annotate examples, and optimize a generic function to reproduce these annotations. In this thesis, I explore solving different music perception tasks with deep learning, a recent branch of machine learning that optimizes functions of many stacked nonlinear operations – referred to as deep neural networks – and promises to obtain better results or require less domain knowledge than more traditional techniques. In particular, I employ ...

Schlüter, Jan — Department of Computational Perception, Johannes Kepler University Linz


Perceptually-Based Signal Features for Environmental Sound Classification

This thesis addresses the problem of automatically classifying environmental sounds, i.e., any non-speech or non-music sounds that can be found in the environment. Broadly speaking, two main processes are needed to perform such classification: the signal feature extraction so as to compose representative sound patterns and the machine learning technique that performs the classification of such patterns. The main focus of this research is put on the former, studying relevant signal features that optimally represent the sound characteristics since, according to several references, this is key to attaining robust recognition. This type of audio signal differs in many ways from speech or music signals, thus specific features should be determined and adapted to its own characteristics. In this sense, new signal features, inspired by the human auditory system and the human perception of sound, are proposed to improve ...

Valero, Xavier — La Salle-Universitat Ramon Llull
