Given a dataset of 10,000 firms, one data pattern for one firm. Each
data pattern in the set has 100 variables (X1 ... X100).
These variables capture the information about the firm, such as retained
earning, working capital, debt ratio. There is an additional label Y for
each firm which specifies if the firm is bankrupted or not (0: bankrupt;
1: non-bankrupt)
Given another firm with 100 values in corresponding to the 100 variables
in the dataset, how do we decide if this firm is going to be bankrupted
or not?
Modeling Probability Density
For any variable in the dataset Xi,
(1) divided the 10,000 values of Xi into two groups according to Y=1
or Y=0;
(2) for each group, build a probability density distribution using
methods learned before.
i.e. get the PDF functions:
P(Xi | Y=1) and P(Xi |Y=0)
(3) calculate the probability based on the value of Xi of the new firm:
Xinew
P(Xi=Xinew |
Y=1) and P(Xi=Xinew | Y=0)
(4) classify based on the probability
if P(Xi=Xinew
| Y=1) > P(Xi=Xinew | Y=0), then the new firm is classified
as non-bankrupt;
if P(Xi=Xinew
| Y=1) < P(Xi=Xinew | Y=0), then the new firm is classified
as bankrupt.
The problem with the probability density approach is that estimating
the probability density is not very reliable. It is highly subjective and
very easy to incorporate errors.
Modeling Posterior Probability
This approach is based on the Bayesian theorem. For the given value of
X=Xnew, the conditional probabilities of each classes can be
calculated as
P(Y=1 | X=Xnew) = P(Y=1)*P(X=Xnew |Y=1) /[P(Y=1)*P(X=Xnew
|Y=1)+P(Y=0)*P(X=Xnew |Y=0) ]
Based on the conditional probabilities for each class, we can decide
which class this firm should belong to by picking up the class that has
the highest conditional probability.
Model Decision Boundary
Instead of building measuring function for the classes, this approach directly
partition the input parameter space (a 100 dimension space defined by the
100 variables) into different regions, each of which is in corresponding
to one class (bankrupt or non-bankrupt).
To determine the boundary, we need to assume the form of boundary function.
The traditional approach of linear discriminate analysis usually does not
work very well in the high-dimension space. Instead, we consider using
the sigmoid functions such as tanh(x). To determine the boundary, we also
need an objective function. One choice is MSE (mean square error). Another
is the entropy.
Neural network provides a very power tool to construct the high-dimension
boundary. A three layered network allows us to group input into subgroups
and take into consideration of non-linearity.