
Handout 01 (January 20, 1999)

Data Mining in Finance

NYU/Stern: B20.3355 (IS) and B90.3355 (SOR)

URL: www.stern.nyu.edu/~aweigend/Teaching/DMF

Copyright: Andreas S. Weigend

 

About the course Data Mining in Finance

 

The western world has become a place where data is constantly collected about us and our behavior: the places we visit, the phone calls we make, the goods we buy, the financial transactions we perform. Goldman Sachs keeps track of more than 1 million time series, ranging from stocks, bonds, futures and options to the holidays in Myanmar. The proprietary trading group at Morgan Stanley collects about 10 gigabytes of data every day. Travelers Group manages about 5 million customer accounts.

 

Such information-intensive organizations, in their current transformation from passive collectors to active explorers and exploiters of data, face serious challenges in working out how to benefit from this increasing access to information and how to learn about their markets, customers, suppliers, operations and internal business processes.

 

Responding to this challenge, and accompanied by much media hype, new data-driven modeling techniques such as neural networks, genetic programming and chaos theory have emerged in a number of communities in recent years. They typically try to discover previously unknown, valid, comprehensible and potentially useful knowledge from large data sets and hope to apply this knowledge to decision making. They build on fields such as artificial intelligence, statistics, pattern recognition, database management and information theory.

 

Despite this hype (or because of it?), these approaches, here referred to collectively as data mining techniques, are having a major impact in business, science, and society.

 

A significant number of Stern's graduate students have excellent technical, analytical, and computational backgrounds; many of them are expected to use modern data-driven modeling on a daily basis. Offering an under-the-hood view of data mining complements their business and management education and helps them stand out in the job market.

 

I obtained an NYU Curricular Development Challenge Grant to design and deliver a course located in the triangle formed by data, methods and theory, and problems and applications. Grounded in my own research and consulting, it focuses on current problems in finance such as managing risk and building and evaluating trading models.

 

This course takes a thoughtful and intellectually rigorous approach to the following goals:

(1) to cut through the hype surrounding these maturing data-driven methods and reveal their substance for applications in business and finance, providing a sober, even-handed view of what the various methods can and cannot do;

(2) to highlight the assumptions of different model classes, stressing the critical evaluation and comparison with established methods;

(3) to offer approaches for understanding and interpreting the results.

 

It tries to strike several balances:

(1) processes with results;

(2) questions with answers;

(3) concepts with examples;

(4) theory with skills; and

(5) the exploration of exciting new research with the rigor of the established disciplines, particularly important here, given the cutting-edge technological aspects.

 

It covers the following topics:

• Data snooping

• Data and their representation

• Bayesian learning

• Nonlinear regression (Neural networks)

• Risk and trading

• Density estimation: Conditional normal and conditional non-normal

• Component analysis (Principal components and independent components)

• Hidden Markov models

• Classification

• Style analysis

• Visualization

The detailed schedule for each session will be published on the web and distributed in class on January 27.

 

A course with such an exploratory character needs to draw on a variety of cognitive approaches:

(1) in-class demonstrations of software and paradoxes;

(2) computer assignments;

(3) paper-and-pencil exercises;

(4) a writing assignment that synthesizes the approaches to risk;

(5) the opportunity to participate in group projects in conjunction with major Wall Street firms;

(6) presentations by Wall Street practitioners giving their perspective on data mining in finance; and

(7) a full-day workshop on a Saturday (April 10) that takes interested students through the process of building a state-of-the-art, prediction-based portfolio management system. This workshop is given jointly with Georg Zimmermann, who directs the research and development group on data mining for finance at Siemens in Munich.

 

There is no textbook for this course yet. All material is published on the course website, http://www.stern.nyu.edu/~aweigend/Teaching/DMF/S99. I intend to expand the class notes into a book to be published by MIT Press.

 

I would like to thank Ed Melnick, Steve Figlewski, Vasant Dhar (who teaches a gentler and less technical course, Knowledge Systems in Organizations) and Sridar Shesdadri for their encouragement and help in integrating this course into the Stern curriculum.