Digital Forensic Techniques for Splicing Detection in Multimedia Contents

Visual and audio contents have always played a key role in communications, because of their immediacy and presumed objectivity. This has become even more true in the digital era, and today it is common to have multimedia contents stand as proof of events. Digital contents, however, are also very easy to manipulate, thus calling for analysis methods devoted to uncovering their processing history. Multimedia forensics is the science that tries to answer questions about the past of a given image, audio or video file, questions like “which was the recording device?” or “is the content authentic?”. In particular, authenticity assessment is a crucial task in many contexts, and it usually consists in determining whether the investigated object has been artificially created by splicing together different contents. In this thesis we address the problem of splicing detection in the three main media: image, ...

Fontani, Marco — Dept. of Information Engineering and Mathematics, University of Siena


Multimedia Content Analysis, Indexing and Summarization: A Perspective on Real-Life Use Cases

The problem of finding images, video clips and music, given time, place, interest and mood has kept an immense number of scientists and technology developers busy in the past twenty years. However, straightforward attempts to apply text-based search to non-textual data still seem to be the only viable solution. In spite of the numerous ideas proposed so far in the MIR (Multimedia Information Retrieval) research field, it is remarkable that hardly any significant success story, and in particular a commercially relevant one, has been reported. This thesis addresses the reasons that have prevented broad practical deployment of theories and algorithms for searching and retrieving content in multimedia data collections and proposes novel, generic and robust solutions. In particular, the thesis focuses on the problems that typically emerge when dealing with realistic use cases built around real-life systems, noisy data and ...

Naci, Suphi Umut — Delft University of Technology


Audio-visual processing and content management techniques, for the study of (human) bioacoustics phenomena

The present doctoral thesis aims towards the development of new long-term, multi-channel, audio-visual processing techniques for the analysis of bioacoustics phenomena. The effort is focused on the study of the physiology of the gastrointestinal system, aiming at the support of medical research for the discovery of gastrointestinal motility patterns and the diagnosis of functional disorders. The term "processing" in this case is quite broad, incorporating the procedures of signal processing, content description, manipulation and analysis, that are applied to all the recorded bioacoustics signals, the auxiliary audio-visual surveillance information (for the monitoring of experiments and the subjects' status), and the extracted audio-video sequences describing the abdominal sound-field alterations. The thesis outline is as follows. The main objective of the thesis, which is the technological support of medical research, is presented in the first chapter. A quick problem definition is initially ...

Dimoulas, Charalampos — Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece


Machine Learning Techniques for Image Forensics in Adversarial Setting

The use of machine learning for multimedia forensics is gaining more and more consensus, especially due to the amazing possibilities offered by modern machine learning techniques. By exploiting deep learning tools, new approaches have been proposed whose performance remarkably exceeds that achieved by state-of-the-art methods based on standard machine learning and model-based techniques. However, the inherent vulnerability and fragility of machine learning architectures pose new serious security threats, hindering the use of these tools in security-oriented applications, and, among them, multimedia forensics. The analysis of the security of machine learning-based techniques in the presence of an adversary attempting to impede the forensic analysis, and the development of new solutions capable of improving the security of such techniques, are therefore of primary importance and have recently marked the birth of a new discipline, named Adversarial Machine Learning. By focusing on Image Forensics and ...

Nowroozi, Ehsan — Dept. of Information Engineering and Mathematics, University of Siena


Sound Event Detection by Exploring Audio Sequence Modelling

Everyday sounds in real-world environments are a powerful source of information by which humans can interact with their environments. Humans can infer what is happening around them by listening to everyday sounds. At the same time, it is a challenging task for a computer algorithm in a smart device to automatically recognise, understand, and interpret everyday sounds. Sound event detection (SED) is the process of transcribing an audio recording into sound event tags with onset and offset time values. This involves classification and segmentation of sound events in the given audio recording. SED has numerous applications in everyday life which include security and surveillance, automation, healthcare monitoring, multimedia information retrieval, and assisted living technologies. SED is to everyday sounds what automatic speech recognition (ASR) is to speech and automatic music transcription (AMT) is to music. The fundamental questions in designing ...

Pankajakshan, Arjun — Queen Mary University of London


Algorithmic Analysis of Complex Audio Scenes

In this thesis, we examine the problem of algorithmic analysis of complex audio scenes with a special emphasis on natural audio scenes. One of the driving goals behind this work is to develop tools for monitoring the presence of animals in areas of interest based on their vocalisations. This task, which often occurs in the evaluation of nature conservation measures, leads to a number of subproblems in audio scene analysis. In order to develop and evaluate pattern recognition algorithms for animal sounds, a representative collection of such sounds is necessary. Building such a collection is beyond the scope of a single researcher and we therefore use data from the Animal Sound Archive of the Humboldt University of Berlin. Although a large portion of well annotated recordings from this archive has been available in digital form, little infrastructure for searching and ...

Bardeli, Rolf — University of Bonn


Highly Efficient Low-Level Feature Extraction For Video Representation And Retrieval

Witnessing the omnipresence of ever more complex yet so intuitive digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on the content and the rich semantics involved. Therefore, the third generation of Content Based Video Indexing and Retrieval systems faces the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This work focuses on the issues of efficiency and scalability in video indexing and retrieval to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on the prediction information extracted directly from the compressed-domain features and the robust scalable analysis in the ...

Calic, Janko — Queen Mary University of London


Audio motif detection for guided source separation. Application to movie soundtracks.

In audio signal processing, source separation consists in recovering the different audio sources that compose a given observed audio mixture. There are many techniques to estimate these sources, and the more information about them is taken into account, the more likely the separation is to be successful. One way to incorporate information on a source is to use a reference signal that gives a first approximation of this source. This thesis aims to explore the theoretical and applied aspects of reference guided source separation. The proposed approach, called SPotted REference based Separation (SPORES), explores the particular case where the references are obtained automatically by motif spotting, i.e., by a search of similar content. Such an approach is useful for contents with a certain redundancy or if a large database is available. Fortunately, the current context often puts us ...

Souviraà-Labastie, Nathan — Université de Rennes 1


Melody Extraction from Polyphonic Music Signals

Music was the first mass-market industry to be completely restructured by digital technology, and today we can have access to thousands of tracks stored locally on our smartphone and millions of tracks through cloud-based music services. Given the vast quantity of music at our fingertips, we now require novel ways of describing, indexing, searching and interacting with musical content. In this thesis we focus on a technology that opens the door to a wide range of such applications: automatically estimating the pitch sequence of the melody directly from the audio signal of a polyphonic music recording, also referred to as melody extraction. Whilst identifying the pitch of the melody is something human listeners can do quite well, doing this automatically is highly challenging. We present a novel method for melody extraction based on the tracking and characterisation of the pitch ...

Salamon, Justin — Universitat Pompeu Fabra


Content-based search and browsing in semantic multimedia retrieval

Growth in storage capacity has led to large digital video repositories and complicated the discovery of specific information without the laborious manual annotation of data. The research focuses on creating a retrieval system that is ultimately independent of manual work. To retrieve relevant content, the semantic gap between the searcher's information need and the content data has to be overcome using content-based technology. The semantic gap consists of two distinct elements: the ambiguity of the true information need and the equivocalness of digital video data. The research problem of this thesis is: what computational content-based models for retrieval increase the effectiveness of the semantic retrieval of digital video? The hypothesis is that semantic search performance can be improved using pattern recognition, data abstraction and clustering techniques jointly with human interaction through manually created queries and visual browsing. The results of this ...

Rautiainen, Mika — University of Oulu


Computational models of expressive gesture in multimedia systems

This thesis focuses on the development of paradigms and techniques for the design and implementation of multimodal interactive systems, mainly for performing arts applications. The work addresses research issues in the fields of human-computer interaction, multimedia systems, and sound and music computing. The thesis is divided into two parts. In the first one, after a short review of the state of the art, the focus moves to the definition of environments in which novel forms of technology-integrated artistic performances can take place. These are distributed active mixed reality environments in which information at different layers of abstraction is conveyed mainly non-verbally through expressive gestures. Expressive gesture is therefore defined and the internal structure of a virtual observer able to process it (and inhabiting the proposed environments) is described in a multimodal perspective. The definition of the structure of the environments, of the virtual ...

Volpe, Gualtiero — University of Genova


Video Content Analysis by Active Learning

Advances in compression techniques, decreasing cost of storage, and high-speed transmission have facilitated the way videos are created, stored and distributed. As a consequence, videos are now being used in many application areas. The increase in the amount of video data deployed and used in today's applications not only reveals the importance of video as a multimedia data type, but has also led to the requirement of efficient management of video data. This management paved the way for new research areas, such as indexing and retrieval of video with respect to their spatio-temporal, visual and semantic contents. This thesis presents work towards a unified framework for semi-automated video indexing and interactive retrieval. To create an efficient index, a set of representative key frames is selected which captures and encapsulates the entire video content. This is achieved by, firstly, segmenting the video into its constituent ...

Camara Chavez, Guillermo — Federal University of Minas Gerais


Video Sequence Analysis for Content Description, Summarization and Content-Based Retrieval

The main research area of this Ph.D. thesis is video sequence processing and analysis for description and indexing of visual content. Its objective is to contribute to the development of a computational system with the capabilities of object-based segmentation of audiovisual material, automatic content description, summarization for preview and browsing, as well as content-based retrieval. The thesis consists of four parts. The first introduces video sequence analysis, segmentation and object extraction based on color, motion, and depth field. A fusion technique is proposed that combines individual cue segmentations and allows for reliable identification of semantic objects. The second part refers to automatic description and annotation of the visual content by means of feature vectors, summarization, implemented by optimal selection of a limited set of key frames and shots, and content-based search and retrieval. In the third part, the problem of ...

Avrithis, Yannis — National Technical University of Athens


Identification of versions of the same musical composition by processing audio descriptions

Automatically making sense of digital information, and especially of digital music documents, is an important problem our modern society is facing. In fact, there are still many tasks that, although easily performed by humans, cannot be effectively performed by a computer. In this work we focus on one such task: the identification of musical piece versions (alternate renditions of the same musical composition like cover songs, live recordings, remixes, etc.). In particular, we adopt a computational approach solely based on the information provided by the audio signal. We propose a system for version identification that is robust to the main musical changes between versions, including timbre, tempo, key and structure changes. Such a system exploits nonlinear time series analysis tools and standard methods for quantitative music description, and it does not make use of a specific modeling strategy ...

Serra, Joan — Universitat Pompeu Fabra


Scattering Transform for Playing Technique Recognition

Playing techniques are expressive elements in music performances that carry important information about music expressivity and interpretation. When displaying playing techniques in the time-frequency domain, we observe that each has a distinctive spectro-temporal pattern. Based on the patterns of regularity, we group commonly-used playing techniques into two families: pitch modulation-based techniques (PMTs) and pitch evolution-based techniques (PETs). The former are periodic modulations that elaborate on stable pitches, including vibrato, tremolo, trill, and flutter-tongue; while the latter contain monotonic pitch changes, such as acciaccatura, portamento, and glissando. In this thesis, we present a general framework based on the scattering transform for playing technique recognition. We propose two variants of the scattering transform, the adaptive scattering and the direction-invariant joint scattering. The former provides highly-compact representations that are invariant to pitch transpositions for representing PMTs. The latter captures the spectro-temporal patterns exhibited ...

Wang, Changhong — Queen Mary University of London
