Notes from class on May 3, 1999 -- ASW
Basic belief
Modeling = Learning from data
Emphasize process
No true model
No hypotheses, but out-of-sample performance
No first principles, but trade-offs
Noise vs. nonstationarity trade-off
Any interesting problem is noisy
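The noise/nonstationarity trade-off can be made concrete through the choice of a window length. The sketch below is a hypothetical illustration, not taken from the class. A level that jumps once is observed with Gaussian noise, and trailing moving averages of different lengths are used to estimate it. The process, the window sizes, and the function name `moving_average_error` are all assumptions.

```python
import random

def moving_average_error(window, series, level, start):
    """Mean squared error of a trailing moving average as an estimate of the true level.

    Evaluated from index `start` onward so that every window size is compared
    on the same time points (requires start >= window - 1).
    """
    errors = []
    for t in range(start, len(series)):
        estimate = sum(series[t - window + 1 : t + 1]) / window
        errors.append((estimate - level[t]) ** 2)
    return sum(errors) / len(errors)

# Hypothetical process: level jumps from 0 to 5 at t = 200, observed with Gaussian noise (sigma = 2).
rng = random.Random(0)
n, shift_at = 400, 200
level = [0.0 if t < shift_at else 5.0 for t in range(n)]
series = [mu + rng.gauss(0.0, 2.0) for mu in level]

windows = [1, 10, 150]
errs = {w: moving_average_error(w, series, level, start=max(windows)) for w in windows}
for w in windows:
    print(f"window={w:4d}  MSE={errs[w]:.2f}")
```

On a process like this, a one-step window mostly reproduces the noise. A very long window averages the noise away but lags behind the jump, and an intermediate window does better than either. The best length depends on how noisy and how nonstationary the data are.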
Main problems and goals
Description →
Prediction →
Decision
Evaluation
Always out-of-sample
Suite of performance measures
(Including information-theoretic ones)
(KL distance as one measure)
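As one information-theoretic measure, the Kullback-Leibler (KL) divergence between an observed and a predicted discrete distribution can be computed as follows. This is a minimal sketch. The three-outcome example (down/flat/up days) is hypothetical. KL is not symmetric, so it is not a distance in the strict metric sense.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i), in nats, for discrete distributions.

    Terms with p_i == 0 contribute 0; if q_i == 0 where p_i > 0 the divergence is infinite.
    """
    if abs(sum(p) - 1.0) > 1e-9 or abs(sum(q) - 1.0) > 1e-9:
        raise ValueError("p and q must each sum to 1")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i <= 0.0:
                return math.inf
            total += p_i * math.log(p_i / q_i)
    return total

# Hypothetical example: observed frequencies of down/flat/up days vs. a model's predicted probabilities.
observed = [0.3, 0.2, 0.5]
predicted = [0.25, 0.25, 0.5]
print(kl_divergence(observed, predicted))
print(kl_divergence(predicted, observed))  # generally differs: KL is asymmetric
```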
Risk
Motivate: model both the risk of the trade and what part of your company is at stake
Different kinds: implementation, model, sampling, market, liquidity, default, and credit risk
Risk-adjusted returns
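The notes do not say which risk-adjusted measure was meant. The Sharpe ratio is one common choice and is sketched here as an assumption. The 252 trading days per year and the two return series are hypothetical.

```python
import math
import statistics

def annualized_sharpe(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return divided by its standard deviation.

    `returns` are simple per-period returns (e.g. daily). Scaling by
    sqrt(periods_per_year) assumes roughly independent, identically distributed returns.
    """
    excess = [r - risk_free_per_period for r in returns]
    sd = statistics.stdev(excess)  # sample standard deviation
    if sd == 0.0:
        raise ValueError("zero volatility: Sharpe ratio undefined")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)

# Two hypothetical strategies with the same mean daily return but different volatility.
calm = [0.001, 0.002, 0.000, 0.001, 0.001]
wild = [0.020, -0.015, 0.010, -0.010, 0.000]
print(annualized_sharpe(calm), annualized_sharpe(wild))
```

Both series have the same mean daily return, yet the more volatile one has a far lower Sharpe ratio.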
Probabilistic
Motivate: financial data are noisy
When to give up / what is a good null model?
Milestones? Performance benchmarks
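One way to make "when to give up" operational is to require that a model beat a simple null model out-of-sample. In this hypothetical sketch, the data are a pure random walk and the null model predicts no change. The candidate model is an AR(1) on price changes, fitted on the first half of the data only. All of these choices are assumptions for illustration.

```python
import random

def mse(errors):
    return sum(e * e for e in errors) / len(errors)

# Hypothetical data: a pure random walk, so nothing is predictable beyond "no change".
rng = random.Random(42)
prices = [0.0]
for _ in range(1000):
    prices.append(prices[-1] + rng.gauss(0.0, 1.0))
changes = [b - a for a, b in zip(prices, prices[1:])]  # changes[t] = prices[t+1] - prices[t]

split = len(changes) // 2
train, test = changes[:split], changes[split:]

# Candidate model: next change = phi * previous change (AR(1), no intercept),
# with phi fitted by least squares on the training half only.
x, y = train[:-1], train[1:]
phi = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Out-of-sample one-step forecast errors on the test half.
model_errors = [test[t] - phi * test[t - 1] for t in range(1, len(test))]
null_errors = [test[t] for t in range(1, len(test))]  # null model predicts zero change

print(f"phi={phi:.3f}  model MSE={mse(model_errors):.3f}  null MSE={mse(null_errors):.3f}")
```

On a random walk, the fitted coefficient comes out close to zero, and the model's out-of-sample error is essentially that of the null model. That is the signal that there is nothing to exploit.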
Robust
Motivate: need semi-automatic methods
Sensitivity
Automatic handling of potential outliers
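A common robust alternative to the mean and standard deviation for flagging potential outliers is the median combined with the median absolute deviation (MAD). The sketch below assumes a robust z-score threshold of 3.5, which is a conventional but arbitrary choice. The list of daily returns is hypothetical.

```python
import statistics

def flag_outliers(values, threshold=3.5):
    """Flag points whose robust z-score exceeds `threshold`.

    The robust z-score uses the median and the median absolute deviation (MAD),
    scaled by 1.4826 so that it matches the standard deviation for Gaussian data.
    Unlike mean and standard deviation, these barely move when outliers are present.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0.0:
        return [v != med for v in values]
    scale = 1.4826 * mad
    return [abs(v - med) / scale > threshold for v in values]

# Hypothetical daily returns with one suspicious value.
returns = [0.01, -0.02, 0.015, 0.0, -0.01, 0.5, 0.005]
print(flag_outliers(returns))

# Classical z-score of the same point, for comparison.
mean, sd = statistics.mean(returns), statistics.stdev(returns)
print("classical z of largest point:", (max(returns) - mean) / sd)
```

With only seven points, the classical z-score of the extreme value cannot exceed (n - 1)/sqrt(n), about 2.27, so a 3-sigma rule misses it. The robust score flags it clearly.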
Understanding data (through building models and analyzing the models)
Style analysis
Clustering
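The notes do not name a clustering method. k-means (Lloyd's algorithm) is sketched here as one standard choice, in plain Python with hypothetical 2-D points. The function name and parameters are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternately assign each point to its nearest centroid
    and move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: recompute centroids; keep the old one if a cluster is empty.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```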
Main tools
Thinking / Creativity: visualization, think about alternative null models
Data generating processes
Surrogate data
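One simple kind of surrogate data keeps the values of a series but shuffles their order. This preserves the marginal distribution and destroys any temporal structure. Comparing a statistic on the real series with its distribution over surrogates gives a null model for "no serial dependence". The AR(1) data, the lag-1 autocorrelation statistic, and the count of 200 surrogates are assumptions for this sketch. Phase-randomized (Fourier) surrogates, which also preserve the power spectrum, are a common alternative.

```python
import random

def lag1_autocorrelation(xs):
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[t] - mean) * (xs[t + 1] - mean) for t in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

def shuffle_surrogates(xs, count, rng):
    """Surrogate series with the same values (hence the same marginal distribution)
    but random temporal order, i.e. the null hypothesis of no serial dependence."""
    surrogates = []
    for _ in range(count):
        s = list(xs)
        rng.shuffle(s)
        surrogates.append(s)
    return surrogates

rng = random.Random(1)
# Hypothetical data: an AR(1) process x_t = 0.5 * x_{t-1} + noise, which has real serial structure.
xs = [0.0]
for _ in range(499):
    xs.append(0.5 * xs[-1] + rng.gauss(0.0, 1.0))

observed = lag1_autocorrelation(xs)
null_stats = [lag1_autocorrelation(s) for s in shuffle_surrogates(xs, 200, rng)]
# Fraction of surrogates at least as extreme as the data (with the usual +1 correction).
p_value = (1 + sum(s >= observed for s in null_stats)) / (1 + len(null_stats))
print(f"observed r1={observed:.3f}, surrogate p-value ~ {p_value:.3f}")
```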
Tools from info theory
Learning →
Conditioning
Find subset of variables
Preprocessing: Finding a good representation
Build a model, expressed in terms of these variables, that reduces the error
Noise →
Distributions
Traditional: summary statistics
We have outliers: need trade-offs, no perfect model, robust modeling
Bootstrap
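Below is a minimal percentile-bootstrap sketch. It resamples the data with replacement, recomputes the statistic on each resample, and reads off quantiles of the replicates. The sample, the number of resamples, and the 95% level are assumptions. The percentile method is only one of several ways to build a bootstrap interval.

```python
import random
import statistics

def bootstrap_ci(data, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(data).

    Draws n_resamples samples of size len(data) with replacement, recomputes the
    statistic on each, and returns the alpha/2 and 1 - alpha/2 empirical quantiles.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo = replicates[int((alpha / 2) * n_resamples)]
    hi = replicates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical sample of 30 observations.
rng = random.Random(3)
sample = [rng.gauss(0.0, 1.0) for _ in range(30)]

lo, hi = bootstrap_ci(sample, statistics.mean)
med_lo, med_hi = bootstrap_ci(sample, statistics.median)
print(f"mean={statistics.mean(sample):.3f}, 95% CI ~ ({lo:.3f}, {hi:.3f})")
print(f"median={statistics.median(sample):.3f}, 95% CI ~ ({med_lo:.3f}, {med_hi:.3f})")
```

The same function works for any statistic, such as the median above, without needing a closed-form standard error.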
Basic dimensions
Model classes
local vs global
lazy vs eager
Axes of comparison
computationally intensive
prone to overfitting?
incorporating new data
Nonlinear regression models: neural networks
Lots of data: makes it possible
Lots of noise: makes it hard, but also fun
Alternatives: Kernel regression
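Kernel regression is also an example of a local, lazy model: it stores the training data and does all the work at query time. The sketch uses the Nadaraya-Watson estimator with a Gaussian kernel. The bandwidth and the noisy-sine data are hypothetical.

```python
import math
import random

def kernel_regression(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate: a weighted average of the training targets,
    with Gaussian weights that decay with distance from the query point."""
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query point too far from all training data for this bandwidth")
    return sum(w * y for w, y in zip(weights, y_train)) / total

# Hypothetical data: sin(x) on [0, 6.2] plus Gaussian noise (sigma = 0.3).
rng = random.Random(0)
x_train = [i / 10 for i in range(63)]
y_train = [math.sin(x) + rng.gauss(0.0, 0.3) for x in x_train]

estimate = kernel_regression(x_train, y_train, math.pi / 2, bandwidth=0.3)
print(estimate)  # should be near sin(pi/2) = 1, pulled slightly down by smoothing
```

The bandwidth sets the trade-off. Small values follow the data closely, including its noise. Large values smooth more but blur real structure.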
Understanding models (built from real data)
Statistical framework
Maximum likelihood
Bayes rule
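Here are two small illustrations: the maximum-likelihood estimates for a Gaussian, and Bayes rule over a discrete set of hypotheses. The trading-rule example is hypothetical. It compares a "skilled" rule that is right 55% of the time with a "lucky" one that is right 50% of the time, assuming a 10% prior on skill and 60 hits out of 100.

```python
import math

def gaussian_mle(xs):
    """Maximum-likelihood estimates of a Gaussian's mean and variance.

    The ML variance divides by N, not N - 1, so it is biased low for small samples.
    """
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def bayes_posterior(priors, likelihoods):
    """Bayes rule for discrete hypotheses: P(h | D) = P(D | h) P(h) / sum over h' of P(D | h') P(h')."""
    joint = [p * lik for p, lik in zip(priors, likelihoods)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

def binomial_likelihood(hits, trials, p):
    """P(hits | trials, p) for independent trials with success probability p."""
    return math.comb(trials, hits) * p ** hits * (1 - p) ** (trials - hits)

print(gaussian_mle([1.0, 2.0, 3.0, 4.0]))  # (2.5, 1.25)

hypotheses = {"skilled": 0.55, "lucky": 0.50}
priors = [0.1, 0.9]
likelihoods = [binomial_likelihood(60, 100, p) for p in hypotheses.values()]
posterior = bayes_posterior(priors, likelihoods)
print(dict(zip(hypotheses, posterior)))
```

Under these assumptions, even 60 hits out of 100 only raises the posterior probability of skill from 10% to about one third.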
Algorithms
Gradient descent
EM
PCA
ICA
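Below is a plain-Python sketch of two of the listed algorithms. The first is batch gradient descent for a least-squares line fit. The second finds the first principal component of 2-D data by power iteration on the sample covariance matrix. The learning rate, step counts, function names, and data are assumptions. In practice, PCA would normally be done with a linear-algebra library.

```python
import math

def gradient_descent_linear(xs, ys, lr=0.1, steps=2000):
    """Fit y ~ w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of (1/n) * sum(residual^2) with respect to w and b.
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def first_principal_component(points, iterations=100):
    """Unit-length leading eigenvector of the 2-D sample covariance matrix (power iteration)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    v = (1.0, 0.0)  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize.
        u = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
        norm = math.hypot(u[0], u[1])
        v = (u[0] / norm, u[1] / norm)
    return v

# Hypothetical data: a noise-free line y = 3x + 1 on [0, 1].
xs = [i / 20 for i in range(21)]
ys = [3.0 * x + 1.0 for x in xs]
print(gradient_descent_linear(xs, ys))

# Hypothetical data: points close to the diagonal y = x with a small alternating offset.
points = [(float(i), i + (0.1 if i % 2 else -0.1)) for i in range(10)]
print(first_principal_component(points))
```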
Additional points
Software
Point-and-click vs. command line
Collection of data
Sources
Data structures
Role of domain knowledge
Knowing what the numbers mean
Understand structure
Here, avoid (usually wrong) “first principle” approaches
Reduce problems to pattern recognition