Multimedia Content Analysis, Indexing and Summarization: A Perspective on Real-Life Use Cases
The problem of finding images, video clips and music that match a given time, place, interest or mood has kept an immense number of scientists and technology developers busy over the past twenty years. However, applying text-based search directly to non-textual data still appears to be the only viable solution. In spite of the numerous ideas proposed so far in the Multimedia Information Retrieval (MIR) research field, it is remarkable that hardly any significant success stories, and commercially relevant ones in particular, have been reported. This thesis examines why theories and algorithms for searching and retrieving content in multimedia data collections have not been broadly deployed in practice, and proposes novel, generic and robust solutions. In particular, the thesis focuses on the problems that typically emerge in realistic use cases built around real-life systems, noisy data and highly unstructured, diverse content. Three tasks are selected to cover different MIR challenges: shot boundary detection, indexing of videos of live music content, and video summarization. All the proposed algorithms handle complex content, are generically applicable and operate in real time. Shot boundary detection exploits the spatiotemporal properties of the signal. For video indexing, a new feature set based on crossing-rate properties is introduced. Finally, video summarization relies largely on the audio modality, underlining the importance of selecting the right contributions from the different modalities.
