Notes from class on May 3, 1999 -- ASW

Basic belief

Modeling = Learning from data

Emphasize process

No true model

No hypotheses, but out-of-sample performance

No first principles, but trade-offs

Noise vs. nonstationarity trade-off

Any interesting problem is noisy

Main problems and goals

Description → Prediction → Decision

Evaluation

Always out-of-sample

Suite of performance measures

(Including information theoretic ones)

(KL distance as one measure)
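A minimal sketch of the KL measure mentioned above, assuming two discrete distributions over the same outcomes (for example, binned out-of-sample frequencies p versus model probabilities q). The function name and the handling of zero probabilities are illustrative choices, not something fixed by the notes. Note that KL is not symmetric, so it is not a distance in the strict sense.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same outcomes."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0.0:
            return math.inf  # the model rules out an outcome that does occur
        total += pi * math.log(pi / qi)
    return total

# Hypothetical example: observed frequencies vs. a model's predicted probabilities
observed = [0.5, 0.3, 0.2]
predicted = [0.4, 0.4, 0.2]
print(kl_divergence(observed, predicted))
```

It would sit next to the other measures in the suite, each computed on held-out data.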

Risk

Motivate: model both the risk of the trade and how much of your company is at stake

Different kinds of risk: implementation, model, sampling, market, liquidity, default, credit

Risk-adjusted returns
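The notes do not say which risk-adjusted measure was meant. As one common example, here is a sketch of the Sharpe ratio (mean excess return divided by the standard deviation of excess returns). The function name, the per-period risk-free rate parameter, and the sample returns are assumptions for illustration.

```python
import statistics

def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample std deviation."""
    excess = [r - risk_free for r in returns]
    spread = statistics.stdev(excess)  # sample standard deviation (n - 1)
    if spread == 0.0:
        raise ValueError("returns have zero variance; ratio is undefined")
    return statistics.mean(excess) / spread

# Hypothetical per-period returns of a strategy
returns = [0.01, 0.03, -0.01, 0.02]
print(sharpe_ratio(returns))
```

As with every measure here, it should be computed on out-of-sample returns.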

Probabilistic

Motivate: finance is noisy

When to give up / what is a good null model?

Milestones? Performance benchmarks

Robust

Motivate: need semi-automatic ways

Sensitivity

Automatic dealing with potential outliers

Understanding data (through building models and analyzing the models)

Style analysis

Clustering

Main tools

Thinking / Creativity: visualization, think about alternative null-models

Data generating processes

Surrogate data

Tools from info theory
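A sketch of the surrogate-data idea listed above, under one simple null model: shuffling a series keeps its distribution of values but destroys any temporal order. If a statistic of the real series (here lag-1 autocorrelation, chosen only for illustration) lies far outside what the shuffled surrogates produce, the null "no temporal structure" looks implausible. All function names and the sine-wave example are hypothetical.

```python
import math
import random

def lag1_autocorr(series):
    """Lag-1 autocorrelation of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    den = sum((x - mean) ** 2 for x in series)
    return num / den

def shuffled_surrogate(series, rng):
    """Same values as the input, in random order (null model: no time structure)."""
    surrogate = list(series)
    rng.shuffle(surrogate)
    return surrogate

def surrogate_p_value(series, n_surrogates=200, seed=0):
    """Fraction of surrogates whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = lag1_autocorr(series)
    exceed = sum(
        lag1_autocorr(shuffled_surrogate(series, rng)) >= observed
        for _ in range(n_surrogates)
    )
    return exceed / n_surrogates

# Hypothetical strongly structured series
series = [math.sin(0.2 * i) for i in range(100)]
print(lag1_autocorr(series), surrogate_p_value(series))
```

Other null models (for example, ones that preserve linear correlations) would need a different way of generating surrogates.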

Learning → Conditioning

Find subset of variables

Preprocessing: Finding a good representation

Build a model that reduces error when expressed in terms of these variables

Noise → Distributions

Traditional: summary statistics

We have outliers: need trade-offs, no perfect model, robust modeling

Bootstrap
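A minimal sketch of the bootstrap, assuming the simple percentile variant: resample the data with replacement many times, recompute the statistic each time, and read an interval off the sorted results. The median is used as the default statistic because it is robust to outliers; the function name, defaults, and sample data are illustrative assumptions.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(data) at level 1 - alpha."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        stat([rng.choice(data) for _ in range(n)])  # resample with replacement
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical sample with one outlier
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
print(bootstrap_ci(sample))
print(bootstrap_ci(sample, stat=statistics.mean))
```

Comparing the interval for the median with the one for the mean shows how much a single outlier moves a non-robust summary statistic.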

Basic dimensions

Model classes

local vs global

lazy vs eager

Axes of comparison

computer intensive

prone to overfitting?

incorporating new data

Nonlinear regression models: neural networks

Lots of data: makes it possible

Lots of noise: makes it hard, but also fun

Alternatives: Kernel regression
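A sketch of kernel regression as a local, lazy alternative: nothing is fitted in advance, and each prediction is a weighted average of the training targets, with weights that fall off with distance from the query point. This assumes the Nadaraya-Watson form with a Gaussian kernel and one input variable; the notes do not specify a variant, and the names and data below are hypothetical.

```python
import math

def kernel_regression(x_train, y_train, x_query, bandwidth=1.0):
    """Nadaraya-Watson estimate at x_query using a Gaussian kernel."""
    weights = [
        math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_train
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all training points for this bandwidth")
    return sum(w * y for w, y in zip(weights, y_train)) / total

# Hypothetical noisy observations
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.1, 0.9, 2.2, 2.8, 4.1]
print(kernel_regression(x_train, y_train, 2.5, bandwidth=0.5))
```

On the axes of comparison above: incorporating new data is trivial (append it), prediction cost grows with the size of the data set, and the bandwidth controls the overfitting trade-off (small bandwidth follows the noise, large bandwidth oversmooths).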

Understanding models (built from real data)

Statistical framework

Maximum likelihood

Bayes rule

Algorithms

Gradient descent

EM

PCA

ICA
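A sketch tying two of the items above together: under the assumption of Gaussian noise, maximum likelihood for a regression model amounts to minimizing mean squared error, and gradient descent is one way to do that minimization. The example fits a straight line y = a*x + b; the function name, learning rate, and step count are illustrative assumptions, and a neural network would use the same loop with more parameters.

```python
def fit_line_gd(xs, ys, lr=0.05, steps=2000):
    """Fit y = a*x + b by gradient descent on mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((a*x + b - y)^2) with respect to a and b
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Hypothetical noise-free data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(fit_line_gd(xs, ys))
```

The learning rate has to be small relative to the scale of the inputs, otherwise the updates diverge.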

Additional points

Software

Point-and-click vs command line

Collection of data

Sources

Data structures

Role of domain knowledge

Knowing what the numbers mean

Understand structure

Here, avoid (usually wrong) “first principles” approaches

Reduce problems to pattern recognition