A statistical approach to motion estimation
Digital video technology has grown steadily over the last decade. New applications such as video e-mail, third-generation mobile phone video communications, videoconferencing, and video streaming on the web continuously push research in digital video coding forward. To be sent over the internet or over wireless networks, video information needs compression to meet bandwidth requirements. Compression is mainly achieved by exploiting the redundancy present in the data. A sequence of images contains a simple and intuitive form of redundancy: two successive images are very similar. This is called temporal redundancy. The search for a proper scheme to exploit temporal redundancy is what fundamentally distinguishes the compression of image sequences from the compression of still pictures. It is also the key to the very high performance of image sequence coding compared to still image coding. Motion estimation and compensation techniques have proven efficient at reducing temporal redundancy. The principle is as follows. The displacement of objects between successive frames is first estimated (motion estimation). The resulting motion information is then exploited for efficient interframe coding (motion compensation). Consequently, the motion information and the prediction error are transmitted instead of the frame itself. This dissertation first deals with the development of a methodology to estimate the motion field between two frames for video coding applications. Block matching techniques are generally used for motion estimation in video coding. In this context, the best solution in terms of quality is a full search algorithm, which evaluates every candidate displacement but at an enormous computational cost. 
Different sub-optimal solutions have been proposed in the literature, but a complete and global approach to the problem is still missing. This thesis proposes a global and exhaustive study of the motion estimation process in the framework of a general video coder. The approach proposed in this dissertation, contrary to many solutions proposed in the literature, is not based on particular hypotheses about the behavior of the correlation function used to determine the similarity between two blocks. Rather, it originates from the analysis of the motion field in generic sequences and proposes a statistical model for the motion vector field that is particularly suitable for block-based video coding. Another innovative aspect of the approach used in this work is that it considers the problem of motion estimation globally, not as a problem separated into different levels. The aim is to achieve a global view of how different approaches interact when each independently targets the reduction of computational complexity. It is therefore fundamental to understand at which level it is possible to operate (block, temporal redundancies, hierarchical approach) to obtain a prediction quality comparable to the exhaustive search while reducing the number of operations. This work also addresses the study of the parallelism present in the motion estimation and video coding processes. The various levels of parallelism (processor level, instruction level, data level) present in current general-purpose architectures are used to efficiently increase the performance of software video coders. Finally, a model for block subsampling is introduced and its impact on the quality of the overall estimation process is studied. The resulting subsampling algorithm allows the multimedia architectures of recent general-purpose processors to be fully exploited. 
It also provides important insight into the search for a tradeoff between fast search and fast matching motion estimation approaches.
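
As an illustration of the full search block matching and block subsampling discussed in this abstract, the following is a minimal sketch and not the method developed in the thesis. It assumes the sum of absolute differences (SAD) as the matching criterion and a regular-grid pixel subsampling controlled by a hypothetical `step` parameter; the function names `sad` and `full_search`, the block size, and the search range are all illustrative choices that do not come from the original text.

```python
def sad(cur, ref, bx, by, dx, dy, n, step=1):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy).
    step > 1 visits only every step-th pixel in each direction
    (an assumed, simple form of block subsampling)."""
    total = 0
    for y in range(0, n, step):
        for x in range(0, n, step):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=4, search=2, step=1):
    """Exhaustive search: evaluate every displacement in
    [-search, search] x [-search, search] that keeps the candidate block
    inside `ref`, and return (dx, dy, cost) with the lowest SAD.
    Ties are resolved in favor of the first candidate visited."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n, step)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic 12 x 12 frames: `ref` is a linear ramp, and `cur` shows the
# same content displaced by (dx, dy) = (2, 1), clamped at the borders.
SIZE = 12
ref = [[16 * y + x for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[min(y + 1, SIZE - 1)][min(x + 2, SIZE - 1)] for x in range(SIZE)]
       for y in range(SIZE)]

print(full_search(cur, ref, bx=4, by=4))          # all 16 pixels per block
print(full_search(cur, ref, bx=4, by=4, step=2))  # 4 pixels per block
```

With `search=2` the exhaustive search evaluates up to 25 candidate displacements per block, and each evaluation touches `n * n` pixels. Raising `step` reduces the cost of each match (fast matching), whereas skipping candidates would reduce the number of matches (fast search), which is the tradeoff the abstract refers to.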
