Multimedia Content Analysis, Indexing and Summarization: A Perspective on Real-Life Use Cases (2010)
Video Content Analysis by Active Learning
Advances in compression techniques, decreasing cost of storage, and high-speed transmission have facilitated the way videos are created, stored and distributed. As a consequence, videos are now being used in many applications areas. The increase in the amount of video data deployed and used in today's applications reveals not only the importance as multimedia data type, but also led to the requirement of efficient management of video data. This management paved the way for new research areas, such as indexing and retrieval of video with respect to their spatio-temporal, visual and semantic contents. This thesis presents work towards a unified framework for semi-automated video indexing and interactive retrieval. To create an efficient index, a set of representative key frames are selected which capture and encapsulate the entire video content. This is achieved by, firstly, segmenting the video into its constituent ...
Camara Chavez, Guillermo — Federal University of Minas Gerais
Video Sequence Analysis for Content Description, Summarization and Content-Based Retrieval
The main research area of this Ph.D. thesis is video sequence processing and analysis for description and indexing of visual content. Its objective is to contribute in the development of a computational system with the capabilities of object-based segmentation of audiovisual material, automatic content description, summarization for preview and browsing, as well as content-based retrieval. The thesis consists of four parts. The first introduces video sequence analysis, segmentation and object extraction based on color, motion, and depth field. A fusion technique is proposed that combines individual cue segmentations and allows for reliable identification of semantic objects. The second part refers to automatic description and annotation of the visual content by means of feature vectors, summarization, implemented by optimal selection of a limited set of key frames and shots, and content-based search and retrieval. In the third part, the problem of ...
Avrithis, Yannis — National Technical University of Athens
Content-based search and browsing in semantic multimedia retrieval
Growth in storage capacity has led to large digital video repositories and complicated the discovery of specific information without the laborious manual annotation of data. The research focuses on creating a retrieval system that is ultimately independent of manual work. To retrieve relevant content, the semantic gap between the searcher's information need and the content data has to be overcome using content-based technology. Semantic gap constitutes of two distinct elements: the ambiguity of the true information need and the equivocalness of digital video data. The research problem of this thesis is: what computational content-based models for retrieval increase the effectiveness of the semantic retrieval of digital video? The hypothesis is that semantic search performance can be improved using pattern recognition, data abstraction and clustering techniques jointly with human interaction through manually created queries and visual browsing. The results of this ...
Rautiainen, Mika — University of Oulou
The present doctoral thesis aims towards the development of new long-term, multi-channel, audio-visual processing techniques for the analysis of bioacoustics phenomena. The effort is focused on the study of the physiology of the gastrointestinal system, aiming at the support of medical research for the discovery of gastrointestinal motility patterns and the diagnosis of functional disorders. The term "processing" in this case is quite broad, incorporating the procedures of signal processing, content description, manipulation and analysis, that are applied to all the recorded bioacoustics signals, the auxiliary audio-visual surveillance information (for the monitoring of experiments and the subjects' status), and the extracted audio-video sequences describing the abdominal sound-field alterations. The thesis outline is as follows. The main objective of the thesis, which is the technological support of medical research, is presented in the first chapter. A quick problem definition is initially ...
Dimoulas, Charalampos — Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
Visual ear detection and recognition in unconstrained environments
Automatic ear recognition systems have seen increased interest over recent years due to multiple desirable characteristics. Ear images used in such systems can typically be extracted from profile head shots or video footage. The acquisition procedure is contactless and non-intrusive, and it also does not depend on the cooperation of the subjects. In this regard, ear recognition technology shares similarities with other image-based biometric modalities. Another appealing property of ear biometrics is its distinctiveness. Recent studies even empirically validated existing conjectures that certain features of the ear are distinct for identical twins. This fact has significant implications for security-related applications and puts ear images on a par with epigenetic biometric modalities, such as the iris. Ear images can also supplement other biometric modalities in automatic recognition systems and provide identity cues when other information is unreliable or even unavailable. In ...
Emeršič, Žiga — University of Ljubljana, Faculty of Computer and Information Science
Density-based shape descriptors and similarity learning for 3D object retrieval
Next generation search engines will enable query formulations, other than text, relying on visual information encoded in terms of images and shapes. The 3D search technology, in particular, targets specialized application domains ranging from computer aided-design and manufacturing to cultural heritage archival and presentation. Content-based retrieval research aims at developing search engines that would allow users to perform a query by similarity of content. This thesis deals with two fundamentals problems in content-based 3D object retrieval: (1) How to describe a 3D shape to obtain a reliable representative for the subsequent task of similarity search? (2) How to supervise the search process to learn inter-shape similarities for more effective and semantic retrieval? Concerning the first problem, we develop a novel 3D shape description scheme based on probability density of multivariate local surface features. We constructively obtain local characterizations of 3D ...
Akgul, Ceyhun Burak — Bogazici University and Telecom ParisTech
Acoustic Event Detection: Feature, Evaluation and Dataset Design
It takes more time to think of a silent scene, action or event than finding one that emanates sound. Not only speaking or playing music but almost everything that happens is accompanied with or results in one or more sounds mixed together. This makes acoustic event detection (AED) one of the most researched topics in audio signal processing nowadays and it will probably not see a decline anywhere in the near future. This is due to the thirst for understanding and digitally abstracting more and more events in life via the enormous amount of recorded audio through thousands of applications in our daily routine. But it is also a result of two intrinsic properties of audio: it doesn’t need a direct sight to be perceived and is less intrusive to record when compared to image or video. Many applications such ...
Mina Mounir — KU Leuven, ESAT STADIUS
Deep Learning for Event Detection, Sequence Labelling and Similarity Estimation in Music Signals
When listening to music, some humans can easily recognize which instruments play at what time or when a new musical segment starts, but cannot describe exactly how they do this. To automatically describe particular aspects of a music piece – be it for an academic interest in emulating human perception, or for practical applications –, we can thus not directly replicate the steps taken by a human. We can, however, exploit that humans can easily annotate examples, and optimize a generic function to reproduce these annotations. In this thesis, I explore solving different music perception tasks with deep learning, a recent branch of machine learning that optimizes functions of many stacked nonlinear operations – referred to as deep neural networks – and promises to obtain better results or require less domain knowledge than more traditional techniques. In particular, I employ ...
Schlüter, Jan — Department of Computational Perception, Johannes Kepler University Linz
Highly Efficient Low-Level Feature Extraction For Video Representation And Retrieval
Witnessing the omnipresence of ever complex yet so intuitive digital video media, research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Therefore, the third generation of Content Based Video Indexing and Retrieval systems faces the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from the compressed-domain features and the robust scalable analysis in the ...
Calic, Janko — Queen Mary University of London
Indexation et Recherche de Video pour la Videosurveillance
The goal of this work is to propose a general approach for surveillance video indexing and retrieval. Based on the hypothesis that videos are preprocessed by an external video analysis module, this approach is composed of two phases : indexing phase and retrieval phase. In order to profit from the output of various video analysis modules, a general data model consisting of two main concepts, objects and events, is proposed. The indexing phase that aims at preparing data defined in the data model performs three tasks. Firstly, two new key blob detection methods in the object representation task choose for each detected object a set of key blobs associated with a weight. Secondly, the feature extraction task analyzes a number of visual and temporal features on detected objects. Finally, the indexing task computes attributes of the two concepts and stores ...
Thi-Lan, Le — INRIA, Sophia Antipolis
A statistical approach to motion estimation
Digital video technology has been characterized by a steady growth in the last decade. New applications like video e-mail, third generation mobile phone video communications, videoconferencing, video streaming on the web continuously push for further evolution of research in digital video coding. In order to be sent over the internet or even wireless networks, video information clearly needs compression to meet bandwidth requirements. Compression is mainly realized by exploiting the redundancy present in the data. A sequence of images contains an intrinsic, intuitive and simple idea of redundancy: two successive images are very similar. This simple concept is called temporal redundancy. The research of a proper scheme to exploit the temporal redundancy completely changes the scenario between compression of still pictures and sequence of images. It also represents the key for very high performances in image sequence coding when compared ...
Moschetti, Fulvio — Swiss Federal Institute of Technology
Generalised Bayesian Model Selection Using Reversible Jump Markov Chain Monte Carlo
The main objective of this thesis is to suggest a general Bayesian framework for model selection based on the reversible jump Markov chain Monte Carlo (RJMCMC) algorithm. In particular, we aim to reveal the undiscovered potentials of RJMCMC in model selection applications by exploiting the original formulation to explore spaces of di erent classes or structures and thus, to show that RJMCMC o ers a wider interpretation than just being a trans-dimensional model selection algorithm. The general practice is to use RJMCMC in a trans-dimensional framework e.g. in model estimation studies of linear time series, such as AR and ARMA and mixture processes, etc. In this thesis, we propose a new interpretation on RJMCMC which reveals the undiscovered potentials of the algorithm. This new interpretation, firstly, extends the classical trans-dimensional approach to a much wider meaning by exploring the spaces ...
Karakus, Oktay — Izmir Institute of Technology
Identification of versions of the same musical composition by processing audio descriptions
Automatically making sense of digital information, and specially of music digital documents, is an important problem our modern society is facing. In fact, there are still many tasks that, although being easily performed by humans, cannot be effectively performed by a computer. In this work we focus on one of such tasks: the identification of musical piece versions (alternate renditions of the same musical composition like cover songs, live recordings, remixes, etc.). In particular, we adopt a computational approach solely based on the information provided by the audio signal. We propose a system for version identification that is robust to the main musical changes between versions, including timbre, tempo, key and structure changes. Such a system exploits nonlinear time series analysis tools and standard methods for quantitative music description, and it does not make use of a specific modeling strategy ...
Serra, Joan — Universitat Pompeu Fabra
Mixed structural models for 3D audio in virtual environments
In the world of Information and communications technology (ICT), strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D-media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technologies by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality are required. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but ...
Geronazzo, Michele — University of Padova
Polarization and Index Modulations: a Theoretical and Practical Perspective
Radiocommunication systems have evolved significantly in recent years in order to meet present and future demands. Historically, time, frequency and more recently, spatial dimensions have been used to improve capacity and robustness. Paradoxically, radiocommunications that leverage the polarization dimension have not evolved at the same pace. In particular, these communications are widely used by satellites, where several streams are multiplexed in each orthogonal polarization. Current communication trends advocate for simplifying and unifying different frameworks in order to increase flexibility and address future needs. Due to this, systems that do not require channel information are progressively gaining traction, as they help to improve the overall quality of the network instead of that of specific users only. The search for new paradigms aimed at improving the quality of wireless communications is unstoppable. In order to increase the capacity of current communications systems, ...
Henarejos, Pol — Universitat Politècnica de Catalunya
The current layout is optimized for mobile phones. Page previews, thumbnails, and full abstracts will remain hidden until the browser window grows in width.
The current layout is optimized for tablet devices. Page previews and some thumbnails will remain hidden until the browser window grows in width.