Compressed sensing and dimensionality reduction for unsupervised learning

This work aims to exploit compressive sensing paradigms to reduce the cost of statistical learning tasks. We first review the basics of compressive sensing and describe several statistical analysis tasks that rely on similar ideas. We then describe a framework for performing parameter estimation on probabilistic mixture models in the case where the training data are compressed into a fixed-size representation called a sketch. We formulate the estimation as a generalized inverse problem, for which we propose a greedy algorithm. We evaluate this framework and algorithm on an isotropic Gaussian mixture model. This proof of concept suggests the existence of theoretical recovery guarantees for sparse objects beyond the usual vector and matrix cases. We therefore study the generalization of stability results for linear inverse problems to general signal models encompassing both the standard cases and sparse mixtures of probability distributions, and we propose conditions under which recovery guarantees hold. Finally, we focus on approximate nearest neighbor search using small-dimensional vector signatures to reduce complexity. In the case where the considered distance is induced by a Mercer kernel, we propose performing an explicit embedding of the data into a Euclidean space, followed by a signature computation, which leads to a more precise search than KLSH, a standard technique for computing signatures for a kernel distance.
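The core of the sketching idea described above is that a dataset of arbitrary size is compressed into a fixed-size vector from which mixture parameters are later estimated. As a minimal illustration (not the thesis's actual implementation), the sketch can be taken to be the empirical characteristic function of the data sampled at a few random frequencies; the function name, sizes, and sampling scheme below are assumptions chosen for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)

def compute_sketch(X, Omega):
    """Empirical sketch of a dataset: the average of the complex
    exponentials exp(i <omega, x>) over all data points x, evaluated
    at each random frequency omega in Omega.
    X: (n, d) data matrix; Omega: (m, d) frequency matrix.
    Returns a complex vector of length m, independent of n."""
    return np.exp(1j * X @ Omega.T).mean(axis=0)

# Hypothetical example: n = 10_000 points from an isotropic Gaussian
# in dimension d = 2, sketched to m = 50 complex values.
n, d, m = 10_000, 2, 50
X = rng.normal(loc=1.5, scale=0.8, size=(n, d))
Omega = rng.normal(size=(m, d))   # random sampling frequencies
z = compute_sketch(X, Omega)
print(z.shape)                    # size depends only on m, not on n
```

For a Gaussian component, the sketch entries concentrate around the known characteristic function exp(i ω·μ − σ²‖ω‖²/2), which is what makes parameter estimation from the sketch alone possible in principle.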

File Type: pdf
File Size: 1 MB
Publication Year: 2014
Author: Bourrier, Anthony
Supervisors: Rémi Gribonval, Patrick Pérez
Institution: INRIA, Technicolor
Keywords: generalized compressed sensing and linear inverse problems, mixture model learning, nearest neighbor search.