Handout 03 (January 25, 1999)

Data Mining in Finance

NYU/Stern: B20.3355 (IS) and B90.3355 (SOR)

URL: www.stern.nyu.edu/~aweigend/Teaching/DMF

Copyright: Andreas S. Weigend

 

Learning from data -- A simple example

Question

Every day the performance of a trading strategy comes in: made or lost money

Simple reformulation: you toss a coin every day, outcome H = heads

What is the probability of correctly predicting the bias of a coin?

Bayes rule

Want to know: prob(H|{D},I)

I = Information Set

{D} = D1, D2, … = the data

Given: prior = prob(H|I), the probability of H given only the information that we are dealing with a possibly strange coin

Update

Heads comes in… how to update?

Use Bayes rule

Gives the posterior probability distribution that includes the very last data point
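The update step can be sketched numerically. The following is a minimal illustration, not the method used in class: it assumes a flat prior over the bias x = prob(H), represents the distribution on a 101-point grid, and uses a made-up sequence of outcomes. The function names are hypothetical.

```python
# Grid-based Bayesian update for the bias x = prob(H) of a coin.
# Assumptions (not from the handout): flat prior, 101-point grid, toy data.

def make_grid(n_points=101):
    """Candidate values of the bias x between 0 and 1."""
    return [i / (n_points - 1) for i in range(n_points)]

def normalize(weights):
    """Rescale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def update(current, grid, outcome):
    """One application of Bayes rule: new posterior ~ likelihood * old posterior.

    outcome is "H" (likelihood x) or "T" (likelihood 1 - x).
    """
    likelihood = [x if outcome == "H" else 1.0 - x for x in grid]
    return normalize([l * p for l, p in zip(likelihood, current)])

grid = make_grid()
posterior = normalize([1.0] * len(grid))  # flat prior: every bias equally plausible

for outcome in "HHTH":  # hypothetical sequence of daily outcomes
    posterior = update(posterior, grid, outcome)

map_index = max(range(len(grid)), key=lambda i: posterior[i])
print("MAP estimate of the bias:", grid[map_index])
```

Each new outcome reuses the previous posterior as the prior, so the data can be processed one day at a time.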

Want to boil down posterior to a couple of numbers

1) Best estimate, e.g., the maximum: MAP (maximum a posteriori)

How to compute: set the first derivative to zero → x_0

2) Uncertainty, i.e., reliability of the best estimate

Take the logarithm, since it varies more slowly: L = log_e(prob(X|{D},I))

Taylor expansion
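A sketch of the expansion, assuming the posterior has a single smooth maximum at X = x_0 (standard textbook form, not copied from the lecture). The linear term vanishes because the first derivative is zero at x_0:

```latex
L(X) \approx L(x_0) + \frac{1}{2} \left. \frac{d^2 L}{dX^2} \right|_{x_0} (X - x_0)^2
\quad\Longrightarrow\quad
\mathrm{prob}(X \mid \{D\}, I) \approx A \exp\!\left[ \frac{1}{2} \left. \frac{d^2 L}{dX^2} \right|_{x_0} (X - x_0)^2 \right]
```

Since the second derivative is negative at a maximum, this is a Gaussian centered on x_0 with width

```latex
\sigma = \left( - \left. \frac{d^2 L}{dX^2} \right|_{x_0} \right)^{-1/2}
```

so the posterior is summarized as X = x_0 ± σ.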

Interpretation

Error bar = symmetric around the maximum

Confidence interval = shortest interval that encloses 95% of the area under the posterior

Solution for multimodal problems: mixture model
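The two summaries above can be computed for the coin example. This is a minimal sketch that assumes a flat prior, so that L = h log(x) + (n - h) log(1 - x) up to a constant for h heads in n tosses, and that 0 < h < n. The data values and function names are hypothetical.

```python
import math

def log_posterior(x, heads, tosses):
    """Log posterior up to a constant, assuming a flat prior."""
    return heads * math.log(x) + (tosses - heads) * math.log(1.0 - x)

def map_and_error_bar(heads, tosses):
    """MAP estimate x_0 and error bar sigma = (-L''(x_0)) ** -0.5.

    Requires 0 < heads < tosses so that the maximum is not at the boundary.
    """
    x0 = heads / tosses  # where the first derivative of L vanishes
    second_derivative = -heads / x0**2 - (tosses - heads) / (1.0 - x0)**2
    return x0, 1.0 / math.sqrt(-second_derivative)

def shortest_interval(heads, tosses, mass=0.95, n_points=2001):
    """Shortest interval holding `mass` of the posterior, found on a grid.

    Keeps the highest-density grid points until they hold the requested mass;
    for a single-peaked posterior these points form one interval.
    """
    grid = [(i + 0.5) / n_points for i in range(n_points)]  # avoids x = 0 and x = 1
    dens = [math.exp(log_posterior(x, heads, tosses)) for x in grid]
    total = sum(dens)
    order = sorted(range(n_points), key=lambda i: dens[i], reverse=True)
    kept, acc = [], 0.0
    for i in order:
        kept.append(i)
        acc += dens[i] / total
        if acc >= mass:
            break
    return grid[min(kept)], grid[max(kept)]

x0, sigma = map_and_error_bar(heads=30, tosses=100)  # hypothetical data
low, high = shortest_interval(heads=30, tosses=100)
print(f"MAP = {x0:.3f}, error bar = {sigma:.3f}")
print(f"95% interval = [{low:.3f}, {high:.3f}]")
```

For a multimodal posterior the kept grid points would split into several pieces, which is one way to see why a single error bar no longer summarizes it.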

Looking at data

How to characterize univariate data?

Simple introduction to MATLAB

Histogram

Kernel based methods

qq-plot

Outliers

Drawing a distribution means assuming a model
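The three views of univariate data listed above (histogram, kernel estimate, qq-plot) can be sketched in a few lines. The class used MATLAB; this sketch uses Python's standard library instead, with synthetic Gaussian numbers standing in for real returns. The bin range, bandwidth, and function names are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean, stdev

def histogram(data, n_bins, low, high):
    """Counts per equal-width bin on [low, high]; values outside are ignored."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for value in data:
        if low <= value <= high:
            index = min(int((value - low) / width), n_bins - 1)
            counts[index] += 1
    return counts

def kernel_density(x, data, bandwidth):
    """Gaussian kernel density estimate at x: an average of bumps centered on the data."""
    kernel = NormalDist(0.0, bandwidth)
    return sum(kernel.pdf(x - value) for value in data) / len(data)

def qq_points(data):
    """Pairs (theoretical normal quantile, sorted standardized observation).

    Points near the diagonal suggest a Gaussian model is adequate;
    points far off it in the tails flag heavy tails or outliers.
    """
    m, s = mean(data), stdev(data)
    n = len(data)
    standard = NormalDist()
    return [(standard.inv_cdf((i + 0.5) / n), (value - m) / s)
            for i, value in enumerate(sorted(data))]

random.seed(0)
returns = [random.gauss(0.0, 1.0) for _ in range(500)]  # synthetic stand-in for daily returns

print(histogram(returns, n_bins=8, low=-4.0, high=4.0))
print(kernel_density(0.0, returns, bandwidth=0.3))
print(qq_points(returns)[:3])
```

Both the bin width and the kernel bandwidth are modeling choices: changing either changes the picture of the same data.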

Classes of models

Eager vs lazy

Gaussian: eager (compute the parameters, then throw away the data)

Kernel: lazy (need to keep data)
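The eager/lazy distinction shows up directly in what each model has to store. A minimal sketch, with hypothetical class names and toy observations:

```python
from statistics import NormalDist, mean, stdev

class EagerGaussian:
    """Eager model: summarize the data by two numbers, then discard it."""
    def __init__(self, data):
        self.model = NormalDist(mean(data), stdev(data))  # the data itself is not stored

    def density(self, x):
        return self.model.pdf(x)

class LazyKernel:
    """Lazy model: no work up front; every query goes back to the stored data."""
    def __init__(self, data, bandwidth):
        self.data = list(data)
        self.kernel = NormalDist(0.0, bandwidth)

    def density(self, x):
        return sum(self.kernel.pdf(x - v) for v in self.data) / len(self.data)

data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # hypothetical observations
eager = EagerGaussian(data)
lazy = LazyKernel(data, bandwidth=0.5)
print(eager.density(0.0), lazy.density(0.0))
```

The eager model answers a query in constant time from its two parameters, while the cost of a lazy query grows with the number of stored points.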

The 7 steps of modeling


Preview of next class

Please bring an interesting graph (visual representation of quantitative data)

Learning and generalization

Model evaluation

Diebold Chapter 12, handed out in Class 2

Error sources

Books

Learning from Data: Concepts, Theory, and Methods

Vladimir S. Cherkassky and Filip M. Mulier

Hardcover: USD75 (442 pages)

John Wiley & Sons (1998)

ISBN: 0471154938

Predictive Data Mining

Sholom M. Weiss and Nitin Indurkhya

Paperback: USD40 (225 pages)

Morgan Kaufmann Publishers (1997)

ISBN: 1558604030

Neural Networks for Pattern Recognition

Christopher M. Bishop

Paperback: USD55 (482 pages)

Oxford University Press (1995)

ISBN: 0198538642