In the previous section we showed that classification can be performed based upon the class-conditional probability densities. In order to make use of this fact, it is necessary to calculate, or at least estimate, $p(\mathbf{x} \mid \omega_i)$. While this may not be a difficult problem in low dimensions, it is certainly not simple to obtain accurate and fast estimates in high-dimensional spaces, especially when the number of training examples is limited. Therefore, we present in this section a method that uses principal components analysis to model probability densities in higher dimensions from a reasonably small set of training samples.
Figure 7: The probabilistic eigenspace approach
We assume that we have applied principal components analysis to a set of $N$-dimensional training examples belonging to one class $\omega_i$, so that we can now approximately describe these examples by their mean vector $\bar{\mathbf{x}}_i$ and a linear combination of $n$ basis vectors $\mathbf{e}_{i,1}, \dots, \mathbf{e}_{i,n}$, as in Section 3.4. We arrange these basis vectors as columns of a matrix $\mathbf{E}_i$; we omit the index $i$ from now on and assume that all of the following calculations are applied separately to each of the classes. Furthermore, let $\mathbf{y} = \mathbf{E}^T(\mathbf{x} - \bar{\mathbf{x}})$ be the coordinate vector of a point $\mathbf{x}$ with respect to the basis $\mathbf{E}$, and let $\bar{\mathbf{E}}$ be an orthonormal basis for the orthogonal complement of $\mathbf{E}$. With these definitions we can write: $$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{E}\mathbf{y} + \boldsymbol{\epsilon}.$$
This equation is illustrated in Figure 7. Here $\boldsymbol{\epsilon}$ is the reconstruction error vector, which can be described by the $(N-n)$-dimensional basis $\bar{\mathbf{E}}$. We are now going to model the reconstruction error as white Gaussian noise with variance $\sigma^2$, so that for a given $\mathbf{y}$ we obtain the following probability density: $$p(\mathbf{x} \mid \mathbf{y}, \omega) = \mathcal{N}(\mathbf{x};\, \bar{\mathbf{x}} + \mathbf{E}\mathbf{y},\, \sigma^2 \mathbf{I}_N). \quad (42)$$
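As a concrete illustration, the decomposition into eigenspace coordinates and a reconstruction error can be sketched in a few lines of NumPy. This is a toy example with made-up data; the variable names `E`, `y` and `eps` simply mirror the symbols used here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 50 samples in N = 10 dimensions (made-up data).
X = rng.normal(size=(50, 10))
n = 3                                # number of retained basis vectors

x_bar = X.mean(axis=0)               # mean vector of the class
# Principal components via SVD of the centered data; the columns of E
# span the n-dimensional eigenspace.
_, _, Vt = np.linalg.svd(X - x_bar, full_matrices=False)
E = Vt[:n].T                         # N x n orthonormal basis

x = X[0]                             # any sample
y = E.T @ (x - x_bar)                # coordinates in the eigenspace
eps = (x - x_bar) - E @ y            # reconstruction error vector

# eps lies in the orthogonal complement of span(E) ...
assert np.allclose(E.T @ eps, 0.0)
# ... so the squared norm splits into an in-space and an out-of-space part:
assert np.isclose(np.sum((x - x_bar) ** 2),
                  np.sum(y ** 2) + np.sum(eps ** 2))
```

The final assertion is the orthogonal decomposition that the derivation below relies on.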
In Equation (42), $\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$ denotes a multivariate Gaussian distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. Note also that the noise variance $\sigma^2$ should be subscripted by $i$ in order to establish a slightly more general result. The class-conditional density of $\mathbf{x}$ can be obtained by multiplying Equation (42) by $p(\mathbf{y}' \mid \omega)$ and integrating over $\mathbf{y}'$. Thus, $$p(\mathbf{x} \mid \omega) = \int p(\mathbf{x} \mid \mathbf{y}', \omega)\, p(\mathbf{y}' \mid \omega)\, d\mathbf{y}' = \frac{e^{-\|\boldsymbol{\epsilon}\|^2 / (2\sigma^2)}}{(2\pi\sigma^2)^{(N-n)/2}} \int \mathcal{N}(\mathbf{y}; \mathbf{y}', \sigma^2 \mathbf{I}_n)\, p(\mathbf{y}' \mid \omega)\, d\mathbf{y}'. \quad (43)$$
In the derivation above we used the fact that $$\|\mathbf{x} - \bar{\mathbf{x}} - \mathbf{E}\mathbf{y}'\|^2 = \|\mathbf{y} - \mathbf{y}'\|^2 + \|\boldsymbol{\epsilon}\|^2,$$
which holds as a consequence of the fact that $\mathbf{E}(\mathbf{y} - \mathbf{y}')$, $\boldsymbol{\epsilon}$ and $\mathbf{x} - \bar{\mathbf{x}} - \mathbf{E}\mathbf{y}'$ form a right triangle. In order to further simplify our result, we will now assume a particular density for $\mathbf{y}$, namely $p(\mathbf{y} \mid \omega) = \mathcal{N}(\mathbf{y}; \mathbf{0}, \boldsymbol{\Lambda})$ with $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$. That is, we assume that the projections of the samples are Gaussian distributed (in the eigenspace) with mean $\mathbf{0}$ and covariance $\boldsymbol{\Lambda}$. With this assumption, we can perform the integration in Equation (43) in closed form (refer to the proof in the appendix for details). Hence, the exact density of $\mathbf{x}$ is given by: $$p(\mathbf{x} \mid \omega) = \frac{e^{-\|\boldsymbol{\epsilon}\|^2 / (2\sigma^2)}}{(2\pi\sigma^2)^{(N-n)/2}} \prod_{j=1}^{n} \frac{e^{-y_j^2 / (2(\lambda_j + \sigma^2))}}{\sqrt{2\pi(\lambda_j + \sigma^2)}}. \quad (44)$$
For convenience we will now focus on the negative logarithm of this expression: $$-\ln p(\mathbf{x} \mid \omega) = \frac{\|\boldsymbol{\epsilon}\|^2}{2\sigma^2} + \frac{N-n}{2}\ln(2\pi\sigma^2) + \sum_{j=1}^{n} \left[ \frac{y_j^2}{2(\lambda_j + \sigma^2)} + \frac{1}{2}\ln\big(2\pi(\lambda_j + \sigma^2)\big) \right].$$
This can be simplified, by dropping the class-independent constant $\frac{N}{2}\ln 2\pi$ and scaling by two, to the following distance measure: $$d(\mathbf{x}, \omega) = \frac{\|\boldsymbol{\epsilon}\|^2}{\sigma^2} + (N-n)\ln\sigma^2 + \sum_{j=1}^{n} \left[ \frac{y_j^2}{\lambda_j + \sigma^2} + \ln(\lambda_j + \sigma^2) \right].$$
We now have at our disposal a distance measure that allows us to assign each object to its most likely class. Our method has two main advantages over the standard eigenspace approach, in which only the reconstruction error is taken into account: first, we make use of the probability distribution of the reconstruction error of each class; second, we also consider the distribution of the class members in the eigenspace.
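To make the procedure concrete, here is a minimal NumPy sketch of the resulting classifier. The helper names and the training data are our own for illustration, and the estimate of the noise variance as the average of the discarded eigenvalues is one common choice, not something prescribed above:

```python
import numpy as np

def fit_class(X, n):
    """Summarize one class: mean, n-dim eigenspace, eigenvalues, noise variance."""
    x_bar = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - x_bar, full_matrices=False)
    lam = s ** 2 / (len(X) - 1)       # eigenvalues of the sample covariance
    # Residual noise variance: average of the discarded eigenvalues
    # (an assumption; the text leaves the estimate of sigma^2 open).
    sigma2 = lam[n:].mean()
    return x_bar, Vt[:n].T, lam[:n], sigma2

def distance(x, x_bar, E, lam, sigma2):
    """Per-class distance measure: twice the negative log density, with the
    class-independent constant dropped."""
    N, n = E.shape
    y = E.T @ (x - x_bar)             # eigenspace coordinates
    eps2 = np.sum((x - x_bar) ** 2) - np.sum(y ** 2)   # squared residual
    return (eps2 / sigma2 + (N - n) * np.log(sigma2)
            + np.sum(y ** 2 / (lam + sigma2))
            + np.sum(np.log(lam + sigma2)))

# Two made-up classes with different means.
rng = np.random.default_rng(1)
A = rng.normal(loc=0.0, size=(100, 10))
B = rng.normal(loc=3.0, size=(100, 10))
models = [fit_class(A, n=3), fit_class(B, n=3)]

x = B[0]
label = np.argmin([distance(x, *m) for m in models])
assert label == 1                     # x is assigned to the class it came from
```

With more classes, one simply appends their models to the list; for equal priors, the arg-min over the per-class distances implements the maximum-likelihood assignment.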