We can gain some insight into the uses and limitations of PCA by
considering the related task of least-squares line fitting.
The following discussion is based on the treatment of a related topic
in [BGW91].
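Given a density function $\rho(\mathbf{x})$, where
$\mathbf{x} = (x_1, x_2)^T \in \mathbb{R}^2$,
we would like to determine the equation for a line (confined to the
plane spanned by the $x_1$ and $x_2$ axes) which best approximates
$\rho$.
We will assume without loss of generality that the centroid of
$\rho$ is located at the origin, so that the best-fitting line passes
through the origin. Our goal can be stated as
follows: we wish to find a unit-norm vector $\mathbf{v}$ which
minimises the following integral:
\[
E(\mathbf{v}) = \int \left\| \mathbf{x} - (\mathbf{v}^T \mathbf{x})\,\mathbf{v} \right\|^2 \rho(\mathbf{x}) \, d\mathbf{x} .
\]
The quantity
$\left\| \mathbf{x} - (\mathbf{v}^T \mathbf{x})\,\mathbf{v} \right\|^2$
represents the squared error between each point in the plane (as
weighted by $\rho$) and the line through the origin in the direction
of $\mathbf{v}$.
Since projection onto $\mathbf{v}$ cannot increase the norm of
$\mathbf{x}$, the squared error equals
$\|\mathbf{x}\|^2 - (\mathbf{v}^T \mathbf{x})^2 \geq 0$, and the term
in $\|\mathbf{x}\|^2$ does not depend on $\mathbf{v}$. Minimising
$E(\mathbf{v})$ is therefore equivalent to maximising the integral
\[
\int (\mathbf{v}^T \mathbf{x})^2 \, \rho(\mathbf{x}) \, d\mathbf{x} .
\]
In other words, we wish to find a vector $\mathbf{v}$ onto which the
points in the plane, as weighted by $\rho$, have the largest possible
total squared projection. This goal is analogous to that of Equation
(25) for the case of a single eigenvector. Whereas the above
integrals involve a continuous density function, the summation in
Equation (25) incorporates the discrete ``density'' of
points in the plane into the enumerated vectors of the sum.
In this regard, PCA can be viewed as a form
of generalised line fitting to higher-dimensional distributions.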
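
The equivalence between minimising the squared error and maximising the
squared projection can be checked numerically on a discrete point set.
The following is a minimal sketch, not taken from the original text: the
synthetic data, the helper names `fit_line_direction` and
`squared_error`, and the use of NumPy are all assumptions made for
illustration. It takes the fitted direction `v` to be the leading
eigenvector of the scatter matrix of the centred points.

```python
import numpy as np

def fit_line_direction(points):
    """Return the unit vector v maximising sum((v . x_i)^2), plus the centred points."""
    centred = points - points.mean(axis=0)      # move the centroid to the origin
    scatter = centred.T @ centred               # 2x2 unnormalised covariance matrix
    eigvals, eigvecs = np.linalg.eigh(scatter)  # eigenvalues in ascending order
    return eigvecs[:, -1], centred              # leading eigenvector

def squared_error(centred, v):
    """Sum of squared perpendicular distances to the line through the origin along v."""
    residual = centred - np.outer(centred @ v, v)
    return float(np.sum(residual ** 2))

# Hypothetical test data: an elongated cloud around a known direction.
rng = np.random.default_rng(0)
direction = np.array([np.cos(0.5), np.sin(0.5)])
normal = np.array([-np.sin(0.5), np.cos(0.5)])
t = rng.normal(scale=3.0, size=500)
noise = rng.normal(scale=0.5, size=500)
points = np.outer(t, direction) + np.outer(noise, normal)

v, centred = fit_line_direction(points)

total = float(np.sum(centred ** 2))             # sum of squared norms
projected = float(np.sum((centred @ v) ** 2))   # quantity being maximised
error = squared_error(centred, v)               # quantity being minimised

print("fitted direction:", v)
print("error + projected - total:", error + projected - total)
```

Because the error and the squared projection always sum to the (fixed)
total squared norm, any direction that maximises one minimises the other.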
This comparison to the task of line fitting hints at some of the inherent
limitations of PCA-based approaches. Where the variations of a feature
do not lie in a linear subspace (for example, when they follow a curve),
a linear approximation may fit poorly. Moreover, determining membership
in a class based only on the distance to a particular eigenspace (e.g.
the line through the origin along the fitted direction in the above
line-fitting example) ignores any localisation
the training points might possess along the eigenspace. Thus it would
be advantageous in certain cases to adjust classical PCA so that it
incorporates a probabilistic characterisation of the clusters which give
rise to a particular eigenspace. Such a technique is described
in Section 3.6.
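
The localisation problem can be illustrated with a small numerical
sketch. This is not the technique of Section 3.6; it is only an
assumed toy example, with hypothetical data and helper names, showing
that a point lying on the fitted line but far from the training cluster
has a near-zero distance to the eigenspace. A simple in-subspace
measure (offset along the principal direction, in units of the training
standard deviation) is used here purely to expose the difference.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training cluster: spread along one axis, tight along the other.
train = np.column_stack([rng.normal(scale=1.0, size=400),
                         rng.normal(scale=0.1, size=400)])
mean = train.mean(axis=0)
centred = train - mean

# Principal direction and the variance of the training data along it.
eigvals, eigvecs = np.linalg.eigh(centred.T @ centred / len(train))
v = eigvecs[:, -1]
var_along = eigvals[-1]

def distance_from_eigenspace(x):
    """Perpendicular distance from x to the line through the mean along v."""
    d = x - mean
    return float(np.linalg.norm(d - (d @ v) * v))

def normalised_distance_along_eigenspace(x):
    """Offset of x along v, in units of training standard deviations."""
    return float(abs((x - mean) @ v) / np.sqrt(var_along))

near = mean + 0.5 * v    # on the line, inside the training cluster
far = mean + 50.0 * v    # on the line, far outside the training cluster

for name, x in (("near", near), ("far", far)):
    print(name,
          "distance from eigenspace:", distance_from_eigenspace(x),
          "normalised offset along eigenspace:",
          normalised_distance_along_eigenspace(x))
```

Both points are indistinguishable by distance to the eigenspace alone,
while the offset along the eigenspace separates them clearly.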