In the previous section we showed that classification can be based upon the
class-conditional probability densities. In order to make use of this fact, it
is necessary to calculate, or at least estimate, these densities. While this
may not be a difficult problem in low dimensions, it is hard to obtain accurate
and fast estimates for high-dimensional spaces, especially when the number of
training examples is limited. Therefore we present in this section a method
that uses principal components analysis to model probability densities in
higher dimensions from a reasonably small set of training samples.
Figure 7: The probabilistic eigenspace approach
We assume that we have applied principal components analysis to a set of
$N$-dimensional training examples belonging to one class $\omega_i$, so that
we can approximately describe these examples by their mean vector
$\boldsymbol{\mu}_i$ and a linear combination of $n$ basis vectors
$\mathbf{e}_{i,1}, \dots, \mathbf{e}_{i,n}$, as in Section 3.4. We arrange
these basis vectors as columns of a matrix $E_i$. From now on we omit the
index $i$ and assume that all of the following calculations are applied
separately to each of the classes. Furthermore, let $\mathbf{y}$ be the
coordinate vector of a point $\mathbf{x}$ with respect to the basis $E$ and
let $\bar{E}$ be an orthonormal basis for the orthogonal complement of $E$.
With these definitions we can write:
\[
  \mathbf{x} = \boldsymbol{\mu} + E\mathbf{y} + \mathbf{r},
  \qquad \mathbf{r} = \bar{E}\bar{\mathbf{y}}
\]
This equation is illustrated in Figure 7.
Here $\mathbf{r}$ is the reconstruction error vector, which can be described
by its coordinate vector $\bar{\mathbf{y}}$ with respect to the
$(N-n)$-dimensional basis $\bar{E}$.
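The decomposition of a point into mean, eigenspace component and reconstruction error can be sketched numerically as follows. This is only an illustration under assumptions: the names `fit_eigenspace` and `decompose`, the synthetic data, and the use of a singular value decomposition to obtain both bases are choices made here and are not taken from the text.

```python
import numpy as np

def fit_eigenspace(X, n):
    """Return mean, eigenspace basis E (N x n) and complement basis E_bar (N x (N-n)).

    Hypothetical helper: the rows of X are the training examples of one class.
    """
    mu = X.mean(axis=0)
    # The right singular vectors of the centered data are the principal directions.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=True)
    E = Vt[:n].T        # leading n directions span the eigenspace
    E_bar = Vt[n:].T    # remaining directions span its orthogonal complement
    return mu, E, E_bar

def decompose(x, mu, E):
    """Split x into eigenspace coordinates y and reconstruction error r."""
    y = E.T @ (x - mu)      # coordinates with respect to the basis E
    r = x - mu - E @ y      # what the eigenspace cannot represent
    return y, r

# Synthetic 6-dimensional data with two dominant directions (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6)) * np.array([3.0, 2.0, 1.0, 0.3, 0.2, 0.1])

mu, E, E_bar = fit_eigenspace(X, n=2)
x = X[0]
y, r = decompose(x, mu, E)
y_bar = E_bar.T @ r         # coordinates of r in the complement basis

print("reconstruction exact:", np.allclose(mu + E @ y + r, x))
print("r orthogonal to E:   ", np.allclose(E.T @ r, 0.0))
```

The reconstruction error is orthogonal to the eigenspace by construction, which is why it can be expressed entirely in the complement basis.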
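We now model the reconstruction error as white Gaussian noise with variance
$\sigma^2$, so that for a given coordinate vector $\mathbf{y}$ we obtain the
following probability density:
\[
  p(\mathbf{x} \mid \mathbf{y})
  = \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu} + E\mathbf{y},\, \sigma^2 I)
  = (2\pi\sigma^2)^{-N/2}
    \exp\left(-\frac{\|\mathbf{x} - \boldsymbol{\mu} - E\mathbf{y}\|^2}{2\sigma^2}\right)
  \tag{42}
\]
Here, $\mathcal{N}(\mathbf{x};\, \mathbf{m},\, C)$ shall denote a
multivariate Gaussian distribution with mean vector $\mathbf{m}$ and
covariance matrix $C$. Note also that, strictly speaking, the noise variance
$\sigma^2$ should be subscripted by $i$, which yields a slightly more general
result. The class-conditional density of $\mathbf{x}$ can be obtained by
multiplying Equation (42) by $p(\mathbf{y})$ and integrating over
$\mathbf{y}$. Thus,
\[
  p(\mathbf{x})
  = \int p(\mathbf{x} \mid \mathbf{y})\, p(\mathbf{y})\, d\mathbf{y}
  = (2\pi\sigma^2)^{-N/2}
    \exp\left(-\frac{\|\mathbf{r}\|^2}{2\sigma^2}\right)
    \int \exp\left(-\frac{\|\hat{\mathbf{y}} - \mathbf{y}\|^2}{2\sigma^2}\right)
    p(\mathbf{y})\, d\mathbf{y}
  \tag{43}
\]
where $\hat{\mathbf{y}} = E^{T}(\mathbf{x} - \boldsymbol{\mu})$ is the
coordinate vector of the projection of $\mathbf{x}$ onto the eigenspace and
$\mathbf{r} = \mathbf{x} - \boldsymbol{\mu} - E\hat{\mathbf{y}}$ is its
reconstruction error. In the derivation above we used the fact that
\[
  \|\mathbf{x} - \boldsymbol{\mu} - E\mathbf{y}\|^2
  = \|\mathbf{r}\|^2 + \|\hat{\mathbf{y}} - \mathbf{y}\|^2
\]
which holds because the vectors
$\mathbf{x} - \boldsymbol{\mu} - E\mathbf{y}$, $\mathbf{r}$ and
$E(\hat{\mathbf{y}} - \mathbf{y})$ form a right triangle: $\mathbf{r}$ lies in
the orthogonal complement of $E$, and since the columns of $E$ are
orthonormal, $\|E(\hat{\mathbf{y}} - \mathbf{y})\| = \|\hat{\mathbf{y}} - \mathbf{y}\|$.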
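The right-triangle argument can be checked numerically. The following sketch assumes an orthonormal eigenspace basis `E`, a class mean `mu`, the projection coordinates `y_hat` of a point `x`, and an arbitrary coordinate vector `y`; all of these names and the random test values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 5, 2

# Random orthonormal basis: the first n columns of an orthogonal matrix.
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))
E = Q[:, :n]

mu = rng.normal(size=N)   # class mean (arbitrary here)
x = rng.normal(size=N)    # point to be evaluated

y_hat = E.T @ (x - mu)    # coordinates of the projection of x
r = x - mu - E @ y_hat    # reconstruction error, orthogonal to E

y = rng.normal(size=n)    # any other coordinate vector in the eigenspace

# Squared hypotenuse versus the sum of the squared legs.
lhs = np.sum((x - mu - E @ y) ** 2)
rhs = np.sum(r ** 2) + np.sum((y_hat - y) ** 2)

print("hypotenuse^2:", lhs)
print("legs^2 sum:  ", rhs)
```

Because the identity holds for every `y`, the factor depending only on the reconstruction error can be pulled out of the integral over the eigenspace coordinates.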
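In order to further simplify our result, we now assume a particular density
for $\mathbf{y}$, namely
$p(\mathbf{y}) = \mathcal{N}(\mathbf{y};\, \boldsymbol{\mu}_y,\, \Sigma_y)$.
That is, we assume the projections of the samples are Gaussian distributed (in
the eigenspace) with mean $\boldsymbol{\mu}_y$ and covariance $\Sigma_y$. With
this assumption, we can perform the integration in Equation (43) in closed
form (refer to the proof in the appendix for details).
Hence, the exact density of $\mathbf{x}$ is given by:
\[
  p(\mathbf{x})
  = \frac{\exp\left(-\frac{\|\mathbf{r}\|^2}{2\sigma^2}\right)}
         {(2\pi\sigma^2)^{(N-n)/2}}
    \cdot
    \frac{\exp\left(-\frac{1}{2}
          (\hat{\mathbf{y}} - \boldsymbol{\mu}_y)^{T}
          (\Sigma_y + \sigma^2 I)^{-1}
          (\hat{\mathbf{y}} - \boldsymbol{\mu}_y)\right)}
         {(2\pi)^{n/2}\, |\Sigma_y + \sigma^2 I|^{1/2}}
\]
For convenience we will now focus on the logarithm of this expression:
\[
  \ln p(\mathbf{x})
  = -\frac{\|\mathbf{r}\|^2}{2\sigma^2}
    - \frac{N-n}{2} \ln(2\pi\sigma^2)
    - \frac{1}{2} (\hat{\mathbf{y}} - \boldsymbol{\mu}_y)^{T}
      (\Sigma_y + \sigma^2 I)^{-1} (\hat{\mathbf{y}} - \boldsymbol{\mu}_y)
    - \frac{n}{2} \ln(2\pi)
    - \frac{1}{2} \ln |\Sigma_y + \sigma^2 I|
\]
which can be simplified to the following:
\[
  d(\mathbf{x}) = -2 \ln p(\mathbf{x})
  = \frac{\|\mathbf{r}\|^2}{\sigma^2}
    + (\hat{\mathbf{y}} - \boldsymbol{\mu}_y)^{T}
      \tilde{\Sigma}^{-1} (\hat{\mathbf{y}} - \boldsymbol{\mu}_y)
    + (N-n) \ln \sigma^2
    + \ln |\tilde{\Sigma}|
    + N \ln(2\pi)
\]
where $\tilde{\Sigma} = \Sigma_y + \sigma^2 I$,
$\hat{\mathbf{y}} = E^{T}(\mathbf{x} - \boldsymbol{\mu})$ is the coordinate
vector of the projection of $\mathbf{x}$ onto the eigenspace, and
$\mathbf{r} = \mathbf{x} - \boldsymbol{\mu} - E\hat{\mathbf{y}}$ is the
reconstruction error.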
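As a sanity check, the closed-form log-density can be compared with a direct evaluation of the $N$-dimensional Gaussian that results from combining white noise of variance `sigma2` with a Gaussian over the eigenspace coordinates. The sketch below assumes that this combined Gaussian has mean `mu + E @ mu_y` and covariance `E @ Sigma_y @ E.T + sigma2 * I`; the function names and the random parameters are hypothetical and serve only as an illustration.

```python
import numpy as np

def log_density(x, mu, E, mu_y, Sigma_y, sigma2):
    """Log-density split into reconstruction-error and eigenspace terms."""
    N, n = E.shape
    y_hat = E.T @ (x - mu)                  # projection coordinates
    r = x - mu - E @ y_hat                  # reconstruction error
    S = Sigma_y + sigma2 * np.eye(n)        # eigenspace covariance plus noise
    d = y_hat - mu_y
    _, logdet = np.linalg.slogdet(S)
    maha = d @ np.linalg.solve(S, d)        # Mahalanobis term in the eigenspace
    return -0.5 * (r @ r / sigma2 + maha
                   + (N - n) * np.log(sigma2) + logdet
                   + N * np.log(2.0 * np.pi))

def log_density_direct(x, mu, E, mu_y, Sigma_y, sigma2):
    """Same quantity from the full N x N covariance (assumed model, see lead-in)."""
    N = E.shape[0]
    C = E @ Sigma_y @ E.T + sigma2 * np.eye(N)
    d = x - (mu + E @ mu_y)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (d @ np.linalg.solve(C, d) + logdet + N * np.log(2.0 * np.pi))

# Arbitrary illustrative parameters.
rng = np.random.default_rng(2)
N, n = 6, 2
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))
E = Q[:, :n]
mu = rng.normal(size=N)
mu_y = rng.normal(size=n)
A = rng.normal(size=(n, n))
Sigma_y = A @ A.T + np.eye(n)               # symmetric positive definite
sigma2 = 0.3
x = rng.normal(size=N)

split = log_density(x, mu, E, mu_y, Sigma_y, sigma2)
direct = log_density_direct(x, mu, E, mu_y, Sigma_y, sigma2)
print(split, direct)
```

The split form only requires solving an $n \times n$ system, whereas the direct form works with the full $N \times N$ covariance, which is the practical reason for preferring it in high dimensions.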
We now have at our disposal a distance measure that allows us to assign each
object to its most likely class. Our method has two main advantages over the
standard eigenspace approach, where only the reconstruction error is taken
into account. First, we make use of the probability distribution of the
reconstruction error of each class. Second, we also consider the distribution
of the class members in the eigenspace.
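A minimal sketch of how such a distance measure might be used for classification is given below. Several choices in it are assumptions rather than part of the method described above: the noise variance of each class is estimated as the average of the discarded eigenvalues, the eigenspace mean and covariance are taken as the sample mean and covariance of the projected training examples, the classes are assumed to be equally likely, and the helper names `fit_class`, `distance` and `classify` as well as the synthetic data are hypothetical.

```python
import numpy as np

def fit_class(X, n):
    """Fit one class model from training examples given as rows of X."""
    mu = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(evals)[::-1]              # sort eigenvalues descending
    evals, evecs = evals[order], evecs[:, order]
    E = evecs[:, :n]
    Y = (X - mu) @ E                             # projections of the training set
    return {
        "mu": mu,
        "E": E,
        "mu_y": Y.mean(axis=0),
        "Sigma_y": np.atleast_2d(np.cov(Y, rowvar=False)),
        "sigma2": evals[n:].mean(),              # assumption: mean discarded eigenvalue
    }

def distance(x, m):
    """Minus twice the log-density of x under class model m."""
    N, n = m["E"].shape
    y_hat = m["E"].T @ (x - m["mu"])
    r = x - m["mu"] - m["E"] @ y_hat
    S = m["Sigma_y"] + m["sigma2"] * np.eye(n)
    d = y_hat - m["mu_y"]
    _, logdet = np.linalg.slogdet(S)
    return (r @ r / m["sigma2"] + d @ np.linalg.solve(S, d)
            + (N - n) * np.log(m["sigma2"]) + logdet + N * np.log(2.0 * np.pi))

def classify(models, x):
    """Index of the class with the smallest distance (equal priors assumed)."""
    return int(np.argmin([distance(x, m) for m in models]))

# Two synthetic, well-separated classes in six dimensions (illustrative only).
rng = np.random.default_rng(3)
scales = np.array([3.0, 2.0, 1.0, 0.3, 0.2, 0.1])
X_a = rng.normal(size=(200, 6)) * scales
X_b = rng.normal(size=(200, 6)) * scales + 5.0

models = [fit_class(X_a, n=2), fit_class(X_b, n=2)]
labels = np.array([0] * 200 + [1] * 200)
preds = np.array([classify(models, x) for x in np.vstack([X_a, X_b])])
accuracy = float(np.mean(preds == labels))
print("training accuracy:", accuracy)
```

Both terms contribute to the decision here: the reconstruction error term penalises points that lie far from a class's eigenspace, while the Mahalanobis term penalises points that project into an unlikely region of it.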