Classification

Classification assigns each data pattern a label that identifies which class the pattern belongs to. There are, in general, three types of classification approaches: the probability density approach, the posterior probability approach, and the decision boundary approach.

The Problem

We are given a dataset of 10,000 firms, with one data pattern per firm. Each data pattern has 100 variables (X1 ... X100) that capture information about the firm, such as retained earnings, working capital, and debt ratio. An additional label Y for each firm specifies whether the firm went bankrupt (0: bankrupt; 1: non-bankrupt).

Given another firm with 100 values corresponding to the 100 variables in the dataset, how do we decide whether this firm is going to go bankrupt?

Modeling Probability Density

For any variable Xi in the dataset,
(1) divide the 10,000 values of Xi into two groups according to Y=1 or Y=0;
(2) for each group, build a probability density distribution using the methods learned before,
        i.e. obtain the PDFs P(Xi | Y=1) and P(Xi | Y=0);
(3) evaluate both densities at the new firm's value of Xi, Xinew:
        P(Xi=Xinew | Y=1) and P(Xi=Xinew | Y=0);
(4) classify based on the densities:
        if  P(Xi=Xinew | Y=1) > P(Xi=Xinew | Y=0), then the new firm is classified as non-bankrupt;
        if  P(Xi=Xinew | Y=1) < P(Xi=Xinew | Y=0), then the new firm is classified as bankrupt.
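
Steps (1) to (4) can be sketched for a single variable as follows. This is a minimal illustration, not a definitive implementation: it assumes each group is well described by a Gaussian density (the notes do not fix a density estimation method), and the function names and the toy debt-ratio values are made up for the example.

```python
import math
import statistics


def fit_gaussian(values):
    """Estimate (mean, standard deviation) of a sample.

    Assumption: the group follows a normal distribution.
    """
    return statistics.mean(values), statistics.stdev(values)


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def classify_by_density(xi_values, labels, xi_new):
    """Return 1 (non-bankrupt) or 0 (bankrupt) for one variable Xi."""
    # (1) split the values of Xi by label
    group1 = [x for x, y in zip(xi_values, labels) if y == 1]
    group0 = [x for x, y in zip(xi_values, labels) if y == 0]
    # (2) estimate one density per group
    mean1, std1 = fit_gaussian(group1)
    mean0, std0 = fit_gaussian(group0)
    # (3) evaluate both densities at the new firm's value
    p1 = gaussian_pdf(xi_new, mean1, std1)
    p0 = gaussian_pdf(xi_new, mean0, std0)
    # (4) pick the class with the larger density
    #     (a tie falls to class 0 here, which is an arbitrary choice)
    return 1 if p1 > p0 else 0


# Toy data, invented for illustration: debt ratios of eight firms.
debt_ratio = [0.20, 0.30, 0.25, 0.35, 0.80, 0.90, 0.85, 0.75]
label = [1, 1, 1, 1, 0, 0, 0, 0]

print(classify_by_density(debt_ratio, label, 0.30))
print(classify_by_density(debt_ratio, label, 0.82))
```

In this toy set the non-bankrupt firms cluster around a low debt ratio, so a new value of 0.30 falls in class 1 and a new value of 0.82 falls in class 0.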

The problem with the probability density approach is that density estimates are often unreliable. The estimate depends on subjective choices, such as the assumed form of the distribution, and errors are easily introduced.

Modeling Posterior Probability

This approach is based on Bayes' theorem. For a given value X=Xnew, the conditional probability of each class can be calculated as
            P(Y=1 | X=Xnew) = P(Y=1)*P(X=Xnew |Y=1) /[P(Y=1)*P(X=Xnew |Y=1)+P(Y=0)*P(X=Xnew |Y=0) ]

Based on the conditional probabilities of the classes, we can decide which class this firm should belong to by picking the class with the highest conditional probability.
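
The formula above can be written as a short function. This is a sketch under stated assumptions: the function name is hypothetical, and the prior and the two density values are invented numbers rather than results from the bankruptcy dataset.

```python
def posterior_y1(prior1, like1, like0):
    """P(Y=1 | X=Xnew) from Bayes' theorem for two classes.

    prior1: P(Y=1); the prior of class 0 is 1 - prior1
    like1:  P(X=Xnew | Y=1)
    like0:  P(X=Xnew | Y=0)
    """
    prior0 = 1.0 - prior1
    evidence = prior1 * like1 + prior0 * like0
    return prior1 * like1 / evidence


# Hypothetical inputs: 95% of firms are non-bankrupt, and the density
# at Xnew is four times higher in the bankrupt group.
p1 = posterior_y1(prior1=0.95, like1=0.5, like0=2.0)
p0 = 1.0 - p1
predicted = 1 if p1 > p0 else 0
print(p1, predicted)
```

With these numbers the posterior for Y=1 is about 0.83, so the firm is classified as non-bankrupt even though the density alone favors the bankrupt class. The prior P(Y) is what separates this approach from a direct comparison of densities.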

Modeling Decision Boundary

Instead of building a measuring function for each class, this approach directly partitions the input parameter space (a 100-dimensional space defined by the 100 variables) into different regions, each of which corresponds to one class (bankrupt or non-bankrupt).

To determine the boundary, we need to assume a form for the boundary function. The traditional approach of linear discriminant analysis usually does not work very well in high-dimensional spaces. Instead, we consider using sigmoid functions such as tanh(x). To determine the boundary, we also need an objective function. One choice is MSE (mean squared error). Another is entropy.
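
As a rough illustration of the two objective functions, the sketch below scores a tanh-based boundary on a one-variable toy problem. Several things here are assumptions rather than part of the notes: the entropy objective is taken to be the binary cross-entropy, the tanh output is rescaled from (-1, 1) to (0, 1) so it can be compared with the 0/1 labels, and the weights and data are invented.

```python
import math


def predict(weights, bias, x):
    """Squash a linear score with tanh, then rescale to the range (0, 1)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 0.5 * (math.tanh(score) + 1.0)


def mse(preds, targets):
    """Mean squared error between predictions and 0/1 labels."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def cross_entropy(preds, targets, eps=1e-12):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for p, t in zip(preds, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


# Toy data, invented for illustration: one variable per firm.
X = [[0.2], [0.3], [0.8], [0.9]]
Y = [1, 1, 0, 0]

# Two candidate boundaries, both at x = 0.55 but with opposite orientation.
good = [predict([-4.0], 2.2, x) for x in X]
bad = [predict([4.0], -2.2, x) for x in X]

print(mse(good, Y), mse(bad, Y))
print(cross_entropy(good, Y), cross_entropy(bad, Y))
```

Both objectives are smaller for the boundary that puts each firm on the correct side, which is what a fitting procedure would exploit when searching for the weights.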

Neural networks provide a very powerful tool for constructing high-dimensional boundaries. A three-layer network allows us to group inputs into subgroups and to take non-linearity into account.
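
To make the role of the hidden layer concrete, here is a minimal forward pass through a three-layer network (input, hidden, output) with tanh units. It is only a sketch: the weights are hand-picked for illustration rather than learned, and the two-input XOR-style problem stands in for the 100-variable case because no single linear boundary can separate it.

```python
import math

# Hand-picked weights (an assumption for illustration). In practice they
# would be found by minimizing an objective function on the training data.
HIDDEN_WEIGHTS = [[5.0, 5.0], [5.0, 5.0]]
HIDDEN_BIASES = [-2.5, -7.5]
OUTPUT_WEIGHTS = [5.0, -5.0]
OUTPUT_BIAS = -5.0


def forward(x):
    """Input layer -> tanh hidden layer -> tanh output unit."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)
    ]
    score = sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS
    return math.tanh(score)


def classify(x):
    """Class 1 if the output is positive, class 0 otherwise."""
    return 1 if forward(x) > 0 else 0


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([classify(p) for p in points])
```

The first hidden unit responds when at least one input is on and the second only when both are on. The output unit combines these two subgroups, so the points (0, 1) and (1, 0) land in class 1 while (0, 0) and (1, 1) land in class 0.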