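In a probabilistic approach, where all available information is contained in the observation $x$, decision rules are in general based on the conditional probability $P(j \mid x)$ of each class $j$ given $x$. A common procedure is to define a cost function that is minimised by the optimal decision rule. In that sense we define $\lambda_{ij}$ to be the cost for taking a decision in favour of class $i$ when the correct decision is class $j$. The overall cost for a decision rule $r$ is then

$$C[r] = \int p(x) \sum_{i} \sum_{j} \lambda_{ij} \, P(r(x) = i \mid x) \, P(j \mid x) \, dx,$$

where $P(r(x) = i \mid x)$ is the probability that $r$ decides in favour of class $i$ for the observation $x$.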
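Because $p(x) \ge 0$, the overall cost can be minimised one observation at a time. As a minimal sketch (the helper names `conditional_cost` and `min_cost_decision` are hypothetical, and the numbers are illustrative only), each candidate class $i$ is scored by summing the cost of deciding $i$ against the posterior probability of every possible correct class $j$, and the cheapest class is chosen. The cost matrix is indexed as `cost[i][j]`, the cost of deciding $i$ when the correct class is $j$.

```python
from typing import Sequence


def conditional_cost(i: int, posteriors: Sequence[float],
                     cost: Sequence[Sequence[float]]) -> float:
    """Expected cost of deciding class i for one observation x:
    sum over the correct class j of cost[i][j] * P(j | x)."""
    return sum(cost[i][j] * p_j for j, p_j in enumerate(posteriors))


def min_cost_decision(posteriors: Sequence[float],
                      cost: Sequence[Sequence[float]]) -> int:
    """Return the class i with the smallest conditional cost.

    Since p(x) >= 0, choosing this class for every x separately
    also minimises the overall cost integrated over all x."""
    return min(range(len(cost)),
               key=lambda i: conditional_cost(i, posteriors, cost))


# Illustrative posteriors P(j | x) for one observation, two classes.
post = [0.6, 0.4]

# Zero-one cost: 0 for a correct decision, 1 for a false one.
zero_one = [[0, 1],
            [1, 0]]

# Asymmetric cost: deciding class 0 when class 1 is correct costs 5.
asymmetric = [[0, 5],
              [1, 0]]

print(min_cost_decision(post, zero_one))    # 0 (cost 0.4 vs 0.6)
print(min_cost_decision(post, asymmetric))  # 1 (cost 2.0 vs 0.6)
```

With zero-one costs the decision follows the larger posterior; with asymmetric costs the same posteriors can lead to the other class, which is why the cost function has to be fixed before the optimal rule can be derived.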
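We can simplify this equation with the assumption that the cost is zero for a correct decision and unity for a false decision, i.e. $\lambda_{ij} = 1 - \delta_{ij}$. Using $\sum_{i} P(r(x) = i \mid x) = 1$ and $\sum_{j} P(j \mid x) = 1$, we obtain

$$C[r] = \int p(x) \Bigl(1 - \sum_{j} P(r(x) = j \mid x) \, P(j \mid x)\Bigr) \, dx,$$

where $\delta_{ij}$ is the usual Kronecker delta ($\delta_{ij} = 1$ for $i = j$ and $0$ otherwise).
Furthermore, since we are assuming a deterministic rule, the probability that $r$ will choose a given class $j$ for $x$ can only be 0 or 1, i.e. $P(r(x) = j \mid x)$ simplifies to $\delta_{r(x)\,j}$, and the overall cost becomes

$$C[r] = \int p(x) \bigl(1 - P(r(x) \mid x)\bigr) \, dx.$$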
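To make the zero-one cost of a deterministic rule, $\int p(x)\,(1 - P(r(x) \mid x))\,dx$, concrete, the sketch below approximates it with a midpoint sum for a hypothetical one-dimensional problem: two equally likely classes whose class conditional densities are unit-variance Gaussians centred at 0 and 2. These distributions, the integration bounds and the helper names are assumptions for illustration, not part of the text. It compares the rule that picks the class of larger posterior with a rule whose threshold is deliberately shifted.

```python
import math


def gauss_pdf(x: float, mean: float, std: float) -> float:
    """Density of a one-dimensional normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical problem: two classes with equal priors and
# unit-variance Gaussian class conditional densities at 0 and 2.
PRIORS = (0.5, 0.5)
MEANS = (0.0, 2.0)


def joint(x: float, j: int) -> float:
    """p(x | j) * P(j), which equals p(x) * P(j | x)."""
    return gauss_pdf(x, MEANS[j], 1.0) * PRIORS[j]


def overall_cost(rule, lo: float = -10.0, hi: float = 12.0,
                 n: int = 20_000) -> float:
    """Midpoint-rule approximation of the zero-one cost
    integral of p(x) * (1 - P(rule(x) | x)) dx over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        p_x = joint(x, 0) + joint(x, 1)
        # p(x) * (1 - P(r(x) | x)) = p(x) - p(x | r(x)) * P(r(x))
        total += (p_x - joint(x, rule(x))) * dx
    return total


def posterior_rule(x: float) -> int:
    # argmax_j P(j | x); p(x) is shared by all classes, so compare p(x | j) P(j).
    return max((0, 1), key=lambda j: joint(x, j))


def shifted_rule(x: float) -> int:
    # A deliberately suboptimal threshold, for comparison only.
    return 0 if x < 1.5 else 1


print(round(overall_cost(posterior_rule), 3))  # about 0.159
print(round(overall_cost(shifted_rule), 3))    # about 0.188
```

For these densities the posterior rule is a threshold at $x = 1$, and its cost is the error probability $\Phi(-1) \approx 0.159$, where $\Phi$ is the standard normal distribution function; moving the threshold to 1.5 raises the cost to about 0.188.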
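Since $p(x) \ge 0$, the cost is minimised by making $P(r(x) \mid x)$ as large as possible for every $x$ separately, which gives the following rule:

$$r(x) = \operatorname*{arg\,max}_{j} P(j \mid x).$$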
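By applying Bayes' rule

$$P(j \mid x) = \frac{p(x \mid j) \, P(j)}{p(x)}$$

and the assumption that the prior probabilities $P(j)$ are equal for all classes (so that neither $P(j)$ nor $p(x)$ affects the maximisation over $j$), we obtain the equivalent rule

$$r(x) = \operatorname*{arg\,max}_{j} p(x \mid j),$$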
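The equivalence can be checked with a few lines of arithmetic. The sketch below turns class conditional densities $p(x \mid j)$, evaluated at a single observation, into posteriors via Bayes' rule and confirms that with equal priors the class of maximal posterior is the class of maximal density, while unequal priors can change the decision. The function names and the likelihood and prior values are hypothetical.

```python
from typing import List, Sequence


def posteriors(likelihoods: Sequence[float],
               priors: Sequence[float]) -> List[float]:
    """Bayes' rule: P(j | x) = p(x | j) P(j) / p(x),
    with p(x) = sum_k p(x | k) P(k)."""
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)
    return [v / evidence for v in joint]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical class conditional densities p(x | j) at one observation x.
likelihoods = [0.10, 0.35, 0.20]

equal_priors = [1 / 3, 1 / 3, 1 / 3]
skewed_priors = [0.7, 0.1, 0.2]

print(argmax(likelihoods))                             # 1
print(argmax(posteriors(likelihoods, equal_priors)))   # 1, same class
print(argmax(posteriors(likelihoods, skewed_priors)))  # 0, priors now matter
```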
which leads to the conclusion that we can construct an optimal classifier that simply maps a pattern $x$ to the class $j$ for which the class conditional probability density $p(x \mid j)$ is maximal.
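A minimal sketch of such a classifier follows, assuming (beyond what the text states) Gaussian class conditional densities with diagonal covariance; the class labels, parameters and helper names are made up for illustration. Log-densities are compared instead of densities: the logarithm is monotonic, so the same class is selected, and products of many small per-dimension factors no longer risk underflow.

```python
import math
from typing import Dict, Sequence, Tuple

# A class model: per-dimension means and standard deviations of an
# axis-aligned (diagonal-covariance) Gaussian density p(x | j).
Model = Tuple[Sequence[float], Sequence[float]]


def log_density(x: Sequence[float], model: Model) -> float:
    """log p(x | j) for a diagonal-covariance Gaussian."""
    means, stds = model
    return sum(
        -0.5 * ((xi - m) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
        for xi, m, s in zip(x, means, stds)
    )


def classify(x: Sequence[float], models: Dict[str, Model]) -> str:
    """Map pattern x to the class whose density p(x | j) is maximal.
    log is monotonic, so comparing log-densities picks the same class."""
    return max(models, key=lambda label: log_density(x, models[label]))


# Hypothetical, hand-picked class models for two-dimensional patterns.
models: Dict[str, Model] = {
    "a": ([0.0, 0.0], [1.0, 1.0]),
    "b": ([3.0, 3.0], [1.0, 1.0]),
    "c": ([0.0, 4.0], [0.5, 2.0]),
}

print(classify([0.1, -0.2], models))  # a
print(classify([2.9, 3.2], models))   # b
print(classify([0.0, 6.0], models))   # c
```

In practice the means and standard deviations would be estimated from training patterns of each class; this choice only matters insofar as the fitted densities approximate the true $p(x \mid j)$, on which the optimality argument rests.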