Video Sequence Analysis for Content Description, Summarization and Content-Based Retrieval
The main research area of this Ph.D. thesis is video sequence processing and analysis for the description and indexing of visual content. Its objective is to contribute to the development of a computational system capable of object-based segmentation of audiovisual material, automatic content description, summarization for preview and browsing, and content-based retrieval. The thesis consists of four parts. The first introduces video sequence analysis, segmentation, and object extraction based on color, motion, and depth field. A fusion technique is proposed that combines the segmentations obtained from individual cues and allows reliable identification of semantic objects. The second part addresses automatic description and annotation of visual content by means of feature vectors; summarization, implemented by optimal selection of a limited set of key frames and shots; and content-based search and retrieval. In the third part, the problem of object contour analysis and representation is examined, with application to shape-based object retrieval. An original contour normalization scheme is presented that permits shape representation invariant to a number of transformations without any loss of information. In the fourth part, a novel technique is proposed for temporal segmentation and parsing of broadcast news recordings into elementary story units, or news topics, using visual cues, based on an advanced algorithm for the detection of human faces. Finally, conclusions are drawn and a number of issues are proposed that could form the basis for future research.
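
As an illustration of summarization by selecting a limited set of key frames from per-frame feature vectors, the following is a minimal sketch only. It assumes a greedy farthest-point heuristic, which is a stand-in and not necessarily the optimal selection scheme developed in the thesis. The function name `select_key_frames`, the Euclidean distance, and the toy feature vectors are all hypothetical choices made for the example.

```python
import math


def select_key_frames(features, k):
    """Return the sorted indices of up to k key frames.

    Assumption: a greedy farthest-point heuristic over Euclidean distance,
    used here only to illustrate the idea of picking a small, diverse set
    of frames; it is not the thesis's selection method.
    """
    if not features or k <= 0:
        return []
    k = min(k, len(features))
    n = len(features)
    dim = len(features[0])

    # Seed with the frame closest to the mean feature vector.
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    first = min(range(n), key=lambda i: math.dist(features[i], mean))
    selected = [first]

    # min_dist[i] is the distance from frame i to its nearest key frame.
    min_dist = [math.dist(f, features[first]) for f in features]

    while len(selected) < k:
        # Pick the frame that is worst represented by the current key frames.
        nxt = max(range(n), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[nxt]))

    return sorted(selected)


# Toy example: five frames described by 2-D feature vectors (hypothetical data).
frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
key_frames = select_key_frames(frames, 3)
```

In this toy input the frames fall into three visually distinct groups, and the heuristic picks one representative from each. A real system would derive the feature vectors from color, motion, or other descriptors of each frame.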
