1.    Review of last class

1.1.        Neural Network

For a set of samples (patterns) with input X and output Y, the relationship between input and output can be depicted as a network of nodes. The number of input nodes equals the dimension of X (the input dimension); the number of output nodes equals the dimension of Y (the output dimension). In the following cases, we consider one input dimension and one output dimension.

1.1.1.     Linear model

In a linear model, the input node X and the output node Y are directly connected by a line, representing the weight w.
    Y^ = w * X

1.1.2.     Polynomial “network”

For polynomial estimation, the input node X and the output node Y are linked through a set of middle nodes. Each middle node represents one polynomial term such as x, x^2, x^3, etc. Each middle node has an associated weight factor. All weight factors can be collected in a weight vector: W = {w1, w2, w3, ...}. With X denoting the vector of polynomial terms, Y^ = W * X.
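The polynomial “network” can be sketched in MATLAB as follows. This is only an illustration: the data, the choice of a cubic, and the names Phi, W and Yhat are assumptions, not part of the class material. Each column of Phi plays the role of one middle node, and the weights are obtained by least squares.

```matlab
% Sketch of the polynomial "network" (illustrative data and names).
x = linspace(-1, 1, 21)';            % one-dimensional input samples
y = 0.5*x - 2*x.^2 + 3*x.^3;         % noise-free cubic target (assumed)

Phi = [x, x.^2, x.^3];               % middle nodes: x, x^2, x^3
W = Phi \ y;                         % least-squares fit of the weights
Yhat = Phi * W;                      % output: weighted sum of the middle nodes

disp(W');                            % fitted weights, one per middle node
```

Because the output is linear in the weights, a single least-squares solve is enough; no iterative training is needed.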

1.1.3.     Neural network model

In a neural network, the inputs X and the output Y are connected through a set of so-called hidden units.

Each hidden unit takes the weighted sum of the inputs that are connected to it, and then “pipes” it through some nonlinearity.

The weight vectors can be represented in a weight matrix: W = {w1, w2, w3, ...}. Unlike in polynomial estimation, the meaning of the hidden units is not defined.

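Typical activation function:  h = tanh(w * x + wo)
This hyperbolic tangent activation function is bounded (between –1 and +1), monotonically increasing, and smooth.

Overall output: Y^ = Sum_j [W_j * tanh(w_j * x + wo_j)] + Wo
Here w_j and wo_j are the input weight and bias of hidden unit j, W_j is its output weight, and Wo is the output bias.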
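A minimal sketch of this forward pass, assuming the bias enters inside the tanh and each hidden unit carries its own output weight. The three hidden units and all weight values are arbitrary illustrative numbers.

```matlab
% Forward pass of a one-input, one-output network with tanh hidden units.
% All weight values below are arbitrary illustrative numbers.
x  = linspace(-3, 3, 7);        % 1 x N row of input samples
w  = [1.0; -0.5; 2.0];          % input-to-hidden weights, one per hidden unit
w0 = [0.0;  0.3; -1.0];         % hidden-unit biases
W  = [0.7, -1.2, 0.4];          % hidden-to-output weights
W0 = 0.1;                       % output bias

h    = tanh(w * x + repmat(w0, 1, numel(x)));   % H x N hidden activations
Yhat = W * h + W0;                              % 1 x N network output
```

Each row of h is the response of one hidden unit over all samples; the output is a weighted sum of these bounded, smooth curves.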

1.1.4.     Expansion vs mixture

·         Example of expansion: Taylor

In an expansion, it is possible to choose a metric (orthogonality) such that adding a new term (a higher power) does not change the lower terms (the previously computed weights or coefficients).
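The effect of orthogonality can be checked numerically. The sketch below is an illustration under assumptions (the target exp(x), the sample grid, and the use of a QR factorization to orthonormalize the monomials are choices made here, not taken from the class): with plain monomials, adding the cubic term changes the lower coefficients, whereas with an orthonormal basis they stay put.

```matlab
% Compare a monomial basis with an orthonormalized basis (illustrative data).
x = linspace(-1, 1, 50)';
y = exp(x);                          % target function (assumed for the demo)

A = [ones(size(x)), x, x.^2, x.^3];  % monomial basis up to the cubic term
[Q, R] = qr(A, 0);                   % columns of Q: orthonormal basis

% Monomial basis: refitting with one more term changes the lower coefficients.
c2 = A(:, 1:3) \ y;
c3 = A(:, 1:4) \ y;
changeMonomial = max(abs(c3(1:3) - c2));

% Orthonormal basis: each coefficient is a projection onto one basis vector,
% so it does not depend on how many terms are kept.
q2 = Q(:, 1:3)' * y;
q3 = Q(:, 1:4)' * y;
changeOrthogonal = max(abs(q3(1:3) - q2));

fprintf('change (monomial):   %g\n', changeMonomial);
fprintf('change (orthogonal): %g\n', changeOrthogonal);
```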

1.1.5.     Neural network vs. Polynomial estimation

                           Neural Network                      Polynomial

Parameters                 Hidden-unit weights enter           Weights enter after the
                           before the nonlinearity             nonlinearity (the powers of x)

Estimating the weights     “Training”                          “Fitting”

Meaning of middle nodes    Difficult                           Clear

1.1.6.     Two Spaces of Neural Network

·         1. Input - Output space

·          Mapping between input X and output Y

This is the s

·         2. Weight Space

·         Mapping between the weights of hidden units and the in-sample error

During network training, the weights are gradually adjusted to reduce the in-sample error.

Easiest way: local linearization, i.e., take a small step along the negative gradient (downhill) in the weight space
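The idea of taking small steps in weight space can be sketched as plain batch gradient descent on the mean squared in-sample error. Everything specific here is an assumption made for illustration: the sin(x) target, three tanh hidden units, the starting weights, the step size eta and the number of steps. Each step moves the weights against the gradient so that the error goes down.

```matlab
% Gradient descent in weight space for a small tanh network (sketch).
% Data, network size, step size and iteration count are illustrative.
x = linspace(-2, 2, 40);             % 1 x N inputs
y = sin(x);                          % 1 x N targets (assumed)
N = numel(x);

w  = [0.5; -0.8; 1.1];               % input-to-hidden weights (fixed start)
w0 = [0.1;  0.0; -0.2];              % hidden biases
W  = [0.3, -0.4, 0.2];               % hidden-to-output weights
W0 = 0;                              % output bias

eta    = 0.02;                       % step size
nsteps = 1000;
E = zeros(nsteps, 1);                % in-sample error before each update

for k = 1:nsteps
    % forward pass
    h    = tanh(w * x + repmat(w0, 1, N));   % H x N hidden activations
    yhat = W * h + W0;                       % 1 x N network output
    err  = yhat - y;                         % 1 x N residuals
    E(k) = mean(err.^2);                     % in-sample mean squared error

    % gradient of E with respect to each weight (chain rule)
    dW    = 2/N * (err * h');                % 1 x H
    dW0   = 2/N * sum(err);
    delta = (W' * err) .* (1 - h.^2);        % H x N, error passed back through tanh
    dw    = 2/N * (delta * x');              % H x 1
    dw0   = 2/N * sum(delta, 2);             % H x 1

    % small step against the gradient
    W  = W  - eta * dW;
    W0 = W0 - eta * dW0;
    w  = w  - eta * dw;
    w0 = w0 - eta * dw0;
end

fprintf('error at start: %g, at end: %g\n', E(1), E(end));
```

Plotting E against the step number shows the path taken downhill on the error surface; a step size that is too large can make the error oscillate instead of decrease.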

§         Demo: Approximating a yield curve with a neural network

2.    Discussion of Homework 3 (Bootstrap)

2.1.        Ingredients

2.1.1.     Computing Sharpe Ratio

function SR = sharpe(x,Rf)
%To compute annualized Sharpe ratio of daily log returns series x
%Optional argument: annual risk free interest rate (Rf)
%Note: we assume Rf to be constant
%If used in bootstrapping, then bootstrap x

N = length(x); %length of data set
T = 253; %number of trading days per year
if nargin < 2
   Rf = 0.05; %default for risk free interest rate
end

%NUMERATOR
%First make annual Rf daily:
dailyRf = (1+Rf)^(1/T) - 1;

%Then compute daily excess return:
excessRet = x - dailyRf;

%Now annualize by compounding these excess daily returns:
AnnExcessRet = (mean(excessRet) + 1) ^ T - 1;

%DENOMINATOR
%std(x) is the (daily) standard deviation since we use daily returns
%to annualize, need to multiply with sqrt(T) (scaling for standard deviations)
AnnStd = sqrt(T) * std(x);

SR = AnnExcessRet / AnnStd;

2.1.2.     Computing Maximum Draw Down

function MD = maxdrawdown(x)
%To compute maximum relative drawdown of return series x

%generate price time series from return series
P = exp(cumsum(x));

%Cumulative maximum price
N = length(P);
M = zeros(size(P)); %same orientation as P, so (M - P) works for row or column input
M(1) = P(1);
for i=2:N
   M(i) = max(M(i-1),P(i));
end

%drawdown for each step
drawdown = (M - P) ./ M;

%maximum drawdown
MD = max(drawdown);
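A small worked example, using a hand-picked hypothetical price path, makes the definition concrete. The price peaks at 1.21 and then falls to 0.605, so the maximum relative drawdown is (1.21 - 0.605) / 1.21 = 0.5. The steps are written inline here so the sketch runs on its own.

```matlab
% Worked example for the maximum drawdown (prices chosen by hand).
prices = [1.10; 1.21; 0.605; 0.90];   % hypothetical price path
x = diff(log([1; prices]));            % log returns, starting from a price of 1

P = exp(cumsum(x));                    % reconstructs the price path
M = zeros(size(P));                    % running maximum of the price
M(1) = P(1);
for i = 2:length(P)
    M(i) = max(M(i-1), P(i));
end
MD = max((M - P) ./ M);                % largest relative drop from a peak
```

The later recovery to 0.90 does not matter: only the deepest drop below the running maximum counts.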

2.1.3.     Computing Resampling

function [sample] = resample(nsamp)
% usage: [sample] = resample(nsamp)
% bootstrap sampling building block
% nsamp = sample size
% draw a new sample (indices 1..nsamp) with replacement
sample = fix(rand(nsamp,1) * nsamp) + 1;
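The ingredients can be combined into a bootstrap loop roughly as follows. This is a sketch under assumptions: the return series is synthetic, the number of replications and the variable names (sharpeFn, SRBoot, nboot) are made up for illustration, and the Sharpe ratio formula is written as an anonymous function so the block does not depend on separate files.

```matlab
% Bootstrap sketch: distribution of the annualized Sharpe ratio.
% The return series is synthetic; sizes and names are illustrative.
T  = 253;                               % trading days per year
Rf = 0.05;                              % annual risk-free rate
x  = 0.0005 + 0.01 * randn(1000, 1);    % synthetic daily log returns

% Same formula as in the Sharpe ratio function, written inline.
dailyRf  = (1 + Rf)^(1/T) - 1;
sharpeFn = @(r) ((mean(r - dailyRf) + 1)^T - 1) / (sqrt(T) * std(r));

nboot  = 2000;                          % number of bootstrap replications
nsamp  = length(x);
SRBoot = zeros(nboot, 1);
for b = 1:nboot
    idx = fix(rand(nsamp, 1) * nsamp) + 1;   % indices drawn with replacement
    SRBoot(b) = sharpeFn(x(idx));            % statistic on the resampled series
end

SR = sharpeFn(x);                       % Sharpe ratio of the original series
fprintf('SR = %.3f, bootstrap std = %.3f\n', SR, std(SRBoot));
```

The spread of SRBoot gives a feel for how uncertain the Sharpe ratio is for a series of this length. Resampling individual days like this ignores any dependence between consecutive returns.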

2.2.        Additional Analysis

2.2.1.     Cumulative Distribution

%Further analysis for HOMEWORK 3
%MATLAB SCRIPT

cd 'D:\My Documents\DMFChenggang'
load HW03Results.mat

%MDBoot contains 100k replications of max drawdown
x = MDBoot; %pick the data set to be analyzed

%x=[randn(1,10000) - 10   2 * randn(1,10000) + 10];

%%PDF
nbin = 200;
[nh,xh] = hist(x,nbin);
figure %new window, so that later plots do not overwrite this one
plot(xh,nh)
grid
xlabel('Value'); ylabel('PDF (distribution, counts)');

%%CDF
p = (0.5:1:(length(x)-0.5))/length(x); %percentile of each point
figure
plot(sort(x),p)
xlabel('Value'); ylabel('CDF');
grid;
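Once the sorted values and their percentiles are available, specific quantiles can be read off the empirical CDF by interpolation. The sketch below uses a synthetic skewed sample as a stand-in for MDBoot (an assumption, since the homework results file is not reproduced here) and picks the 5%, 50% and 95% points as an example.

```matlab
% Reading percentiles off the empirical CDF (synthetic stand-in for MDBoot).
x  = rand(1, 10000) .^ 2;                      % illustrative skewed sample
xs = sort(x);
p  = (0.5:1:(length(x)-0.5)) / length(x);      % percentile of each sorted point
q  = interp1(p, xs, [0.05 0.50 0.95]);         % 5%, 50%, 95% quantiles

fprintf('5%%: %.3f   median: %.3f   95%%: %.3f\n', q(1), q(2), q(3));
```

For a bootstrap distribution, the 5% and 95% points give a simple 90% interval for the statistic.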

2.2.2.     Quantile-quantile plot

%COMPARE TO NORMAL (via percentiles)
%(norminv and qqplot require the Statistics Toolbox)
p = (0.5:1:(length(x)-0.5))/length(x); %as before
figure
plot(mean(x) + std(x) * norminv(p),sort(x),'r.')
grid;
xlabel('Normal distribution'); ylabel('Data')
minx = min(x); maxx = max(x);
line([minx maxx],[minx maxx]); axis equal

%COMPARE TO NORMAL (via realizations)
figure
plot(sort(randn(size(x))),sort(x),'.')
xlabel('drawings from randn'); ylabel('Data')

%See also the MATLAB FUNCTION  qqplot
figure
qqplot(norminv(p),sort(x))

3.    Homework 4

Copy over the files given in http://www.stern.nyu.edu/~aweigend/DMFS99/Notes/Class07

Run the Matlab script yielddemo

You won’t need the polyfit part of the code, so you can delete that.

Vary the number of data points you generate (ndata) and vary the level of the noise you add (noise).

 

Add some code that computes an “out-of-sample” performance.

 

In one plot, show three curves for the out-of-sample performance.

The first curve corresponds to a noise level of  0.1, the second to 0.3, the third to 1.0.

In each case, evaluate the out-of-sample performance for 10, 30, 100, 300, and 1000 training data points.

Hand in the plot on a semilogarithmic axis (use semilogx instead of plot): the x-axis corresponds to the number of data points available.

The y-axis corresponds to the performance, measured as the squared error normalized by the squared error you obtain when using the mean of the training set as the prediction.
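The normalization can be sketched as follows. This is not the homework solution: the “true” curve, the data ranges, and the straight-line fit standing in for the trained network are all assumptions made so that the block runs on its own. Only the last three lines, the ratio of the two squared errors on the out-of-sample data, carry over.

```matlab
% Normalized out-of-sample squared error (sketch with stand-in data).
% The straight-line model stands in for whatever model yielddemo trains.
ndata = 100;  noise = 0.3;                 % illustrative settings
f = @(t) 1 - exp(-t);                      % hypothetical "true" curve
xtrain = 5 * rand(ndata, 1);  ytrain = f(xtrain) + noise * randn(ndata, 1);
xtest  = 5 * rand(1000, 1);   ytest  = f(xtest)  + noise * randn(1000, 1);

coef  = polyfit(xtrain, ytrain, 1);        % stand-in model: linear fit
ypred = polyval(coef, xtest);

errModel = sum((ytest - ypred).^2);            % squared error of the model
errMean  = sum((ytest - mean(ytrain)).^2);     % squared error of the mean predictor
normErr  = errModel / errMean;                 % below 1: better than predicting the mean
```

A value near 1 means the model does no better out of sample than simply predicting the training mean.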

4.    Class

Discussion of backpropagation (Handout: Chapter 10 from Kennedy et al.).