Kreutz-Delgado, Kenneth
Coauthors(s): K. Kreutz-Delgado, B.D. Rao, K. Engan Electrical and Computer Engineering Jacobs School of Engineering University of California, San Diego and T.-W. Lee, T. Sejnowski Computational Neurobiology Laboratory The Salk Institute
UC San Diego
Electrical and Computer Engineering
Mail Code 0407 Room 2407 Bldg EB-U1 9500 Gilman Drive La Jolla, CA 92093-0407



Convex/Schur-Convex (CSC) Log-Priors and Sparse Coding

It has been noted by a variety of investigators in several different research domains (human vision, signal processing, ...) that a generative model appropriate for understanding sparse coding and independent component analysis (ICA) is given by the system of equations y = A x + n (1) where y is an observed signal vector, x is an unobserved ("blind") source vector, the columns of A form an overcomplete dictionary, and n is a measurement noise vector which is independent of the source. Equation (1) is interpreted within a Bayesian framework. The components, x[k], k = 1,...,n, of the source vector, x, are independent components according to this model when the statistical prior, p(x), factors into a product of marginal probabilities as p(x) = p(x[1])...p(x[n]). Within this ICA model an estimate of the source vector x is said to provide a "factorial code" of the measured signal y. This shows the relationship between ICA and factorial coding. It is generally believed that for x to provide a sparse coding of y, the prior p(x[k]) should be supergaussian. This fact has not been proven in a strictly rigorous manner, but has a highly plausible intuitiveness and has been demonstrated in a variety of simulations and applications. It serves as a useful "folk theorem" for obtaining sparse codes from the model (1). In this paper we make rigorous statements concerning when a maximum a posteriori (MAP) estimate of the source x will provide a sparse coding of the measured signal y. We do this for both the ICA *and* the "dependent component analysis" (DCA, i.e., non-factorizable) prior) cases in the low-noise limit. The requirement is that the log-prior, log(p(x)), be convex (equivalently, that -log(p(x)) be concave). We also argue that an additional desirable property is that log(p(x)) be Schur-convex (equivalently, -log(p(x) is Schur-concave). Within our framework, it can be rigorously shown that supergaussianess is *not* quite strong enough to yield sparse codes in the low-noise limit. However, in the high-noise limit when thresholding may be required to provide low error (between x and its MAP estimate) sparse codes, supergaussian, or even subgaussian, priors may be prefered. Whether or not these results hold for any specific proposed sparse coding algorithm to be found in the literature depends on the degree to which the algorithms is based on the assumed validity of the factorial model (1). As an example, we discuss the ICA algorithm of Bell and Sejnowski. We also briefly discuss the complexity differences between ICA and DCA for MAP algorithms based on the model (1).