Handout 03 (January 25, 1999)
Data Mining in Finance
NYU/Stern: B20.3355 (IS) and B90.3355 (SOR)
URL: www.stern.nyu.edu/~aweigend/Teaching/DMF
Copyright: Andreas S. Weigend
Learning from data -- A simple example
Question
Every day the performance of a trading strategy comes in: made or lost money
Simple reformulation: you toss a coin every day, outcome H = heads
What is the probability of correctly predicting the bias of the coin?
Bayes rule
Want to know: prob(H|{D},I)
I = Information Set
{D} = D1, D2,… data
Given: prior = prob(H|I), the probability of H given only the information that we are dealing with a possibly strange coin
Update
Heads comes in… how to update?
Use Bayes rule
Gives the posterior probability distribution that includes the very last data point
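The update step above can be sketched numerically. This is a minimal illustration, not the method used in class: it assumes a flat prior, a grid of 101 candidate values for the probability of heads, and a made-up sequence of tosses. The names `update`, `grid`, and `tosses` are hypothetical.

```python
# Grid-based Bayesian update for the bias of a coin (illustrative sketch).
# Assumptions not in the handout: flat prior, 101-point grid, made-up tosses.

def update(posterior, grid, heads):
    """One Bayes-rule step: multiply by the likelihood of one toss, renormalize."""
    likelihood = [x if heads else 1.0 - x for x in grid]
    unnormalized = [p * l for p, l in zip(posterior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

grid = [i / 100 for i in range(101)]        # candidate values for prob(heads)
posterior = [1.0 / len(grid)] * len(grid)   # flat prior: every bias equally plausible

tosses = [True, True, False, True, True, True, False, True]  # hypothetical outcomes
for outcome in tosses:
    # Yesterday's posterior serves as today's prior.
    posterior = update(posterior, grid, outcome)

map_index = max(range(len(grid)), key=lambda i: posterior[i])
print("most probable bias:", grid[map_index])
```

Each pass through the loop folds in one data point, so the posterior after the last toss includes the very last observation.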
Want to boil down posterior to a couple of numbers
1) Best estimate, e.g., maximum: MAP (maximum a posteriori)
How to compute: first derivative = 0 → x_0
2) Uncertainty, i.e., reliability of best estimate
Take the logarithm, since it varies more slowly: L = log_e(prob(X|{D},I))
Taylor expansion
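A worked version of the two steps above for the coin may help. It rests on assumptions that are not in the handout: a flat prior, R heads observed in N tosses, and x denoting the unknown probability of heads. The first-order term of the expansion drops out because the first derivative is zero at x_0.

```latex
\begin{align*}
\mathrm{prob}(x \mid \{D\}, I) &\propto x^{R}(1-x)^{N-R} \\
L(x) &= \text{const} + R \ln x + (N-R)\ln(1-x) \\
\left.\frac{dL}{dx}\right|_{x_0} &= \frac{R}{x_0} - \frac{N-R}{1-x_0} = 0
  \;\Rightarrow\; x_0 = \frac{R}{N} \\
L(x) &\approx L(x_0) + \tfrac{1}{2}\left.\frac{d^2L}{dx^2}\right|_{x_0}(x-x_0)^2 \\
\left.\frac{d^2L}{dx^2}\right|_{x_0} &= -\frac{N}{x_0(1-x_0)}
  \;\Rightarrow\; \sigma = \left(-\left.\frac{d^2L}{dx^2}\right|_{x_0}\right)^{-1/2}
  = \sqrt{\frac{x_0(1-x_0)}{N}}
\end{align*}
```

Exponentiating the quadratic approximation gives a Gaussian centered at x_0 with width σ, which is why the summary is usually quoted as x_0 ± σ.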
Interpretation
Error bar = symmetric around the maximum
Confidence interval = shortest interval that encloses 95% of the area under the posterior
Solution for multimodal problems: mixture model
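The shortest interval enclosing 95% of the posterior can be found by a simple search on a grid. The sketch below assumes a flat prior and 6 heads in 8 tosses (made-up numbers), and the function `shortest_interval` is a hypothetical helper, not something from the course materials.

```python
# Shortest interval containing 95% of the posterior mass, on a grid (sketch).
# Assumptions not in the handout: flat prior, 6 heads in 8 tosses, 1001 grid points.

def shortest_interval(grid, probs, mass=0.95):
    """Return (low, high) of the narrowest contiguous grid window holding `mass`."""
    best = None
    right = 0
    window = 0.0  # probability currently inside grid[left:right]
    for left in range(len(grid)):
        # Grow the window to the right until it holds enough probability.
        while right < len(grid) and window < mass:
            window += probs[right]
            right += 1
        if window < mass:
            break  # no window starting here or later can reach `mass`
        width = grid[right - 1] - grid[left]
        if best is None or width < best[0]:
            best = (width, grid[left], grid[right - 1])
        window -= probs[left]  # slide the left edge forward
    return best[1], best[2]

grid = [i / 1000 for i in range(1001)]
heads, tails = 6, 2
unnormalized = [x ** heads * (1.0 - x) ** tails for x in grid]
total = sum(unnormalized)
probs = [u / total for u in unnormalized]

low, high = shortest_interval(grid, probs)
print(f"shortest 95% interval: [{low:.3f}, {high:.3f}]")
```

For this skewed posterior the interval extends further below the maximum at 0.75 than above it, unlike the symmetric error bar. The search looks only at contiguous windows; for a multimodal posterior the shortest region need not be a single interval.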
Looking at data
How to characterize univariate data?
Simple introduction to MATLAB
Histogram
Kernel based methods
qq-plot
Outliers
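The three views listed above can be sketched in a few lines. The example is written in Python with only the standard library rather than MATLAB, and everything in it is an assumption for illustration: the data are simulated (200 standard-normal draws plus one planted outlier), and `histogram`, `kernel_density`, and `qq_points` are hypothetical helper names.

```python
import random
from statistics import NormalDist

random.seed(0)
# Hypothetical data: 200 draws from a standard normal plus one planted outlier.
data = [random.gauss(0.0, 1.0) for _ in range(200)] + [8.0]

def histogram(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - low) / width), n_bins - 1)  # maximum goes in the last bin
        counts[index] += 1
    return counts

def kernel_density(x, values, bandwidth):
    """Gaussian kernel estimate of the density at x: one bump per data point."""
    kernel = NormalDist(0.0, bandwidth)
    return sum(kernel.pdf(x - v) for v in values) / len(values)

def qq_points(values):
    """Pair each sorted value with the matching standard-normal quantile."""
    n = len(values)
    standard = NormalDist()
    theoretical = [standard.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(values)))

counts = histogram(data, n_bins=20)
density_at_zero = kernel_density(0.0, data, bandwidth=0.4)
pairs = qq_points(data)

print(counts)
print(round(density_at_zero, 3))
print(pairs[-1])  # the outlier sits far above its theoretical quantile
```

The bin count and the bandwidth are free choices that change the picture. In the qq-plot pairs, points near the diagonal agree with the normal reference, while the planted outlier falls well off it.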
Drawing a distribution means assuming a model
Classes of models
Eager vs lazy
Gaussian: eager (compute and throw away data)
Kernel: lazy (need to keep data)
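The eager/lazy distinction can be made concrete with a small sketch. It assumes simulated data and a fixed kernel bandwidth of 0.3, neither of which comes from the handout, and `lazy_density` is a hypothetical helper name.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(500)]  # hypothetical sample

# Eager: fit once, keep two numbers (mean and standard deviation);
# after this line the data are no longer needed to answer queries.
eager_model = NormalDist(mean(data), stdev(data))

# Lazy: no fitting step; every query goes back to the stored data.
def lazy_density(x, stored, bandwidth=0.3):
    """Gaussian kernel density estimate at x, computed from the stored data."""
    kernel = NormalDist(0.0, bandwidth)
    return sum(kernel.pdf(x - v) for v in stored) / len(stored)

x = 0.5
eager_value = eager_model.pdf(x)
lazy_value = lazy_density(x, data)
print("eager:", eager_value)
print("lazy: ", lazy_value)
```

The eager model answers a query in constant time from two stored parameters. The lazy estimate costs one kernel evaluation per stored data point on every query, but it does not commit to a Gaussian shape.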
The 7 steps of modeling
Preview of next class
Please bring an interesting graph (visual representation of quantitative data)
Learning and generalization
Model evaluation
Diebold Chapter 12, handed out in Class 2
Error sources
Books
Learning from Data: Concepts, Theory, and Methods
Vladimir S. Cherkassky and Filip M. Mulier
Hardcover: USD75 (442 pages)
John Wiley & Sons (1998)
ISBN: 0471154938
Predictive Data Mining
Sholom M. Weiss and Nitin Indurkhya
Paperback: USD40 (225 pages)
Morgan Kaufmann Publishers (1997)
ISBN: 1558604030
Neural Networks for Pattern Recognition
Christopher M. Bishop
Paperback: USD55 (482 pages)
Oxford University Press (1995)
ISBN: 0198538642