MLSP 2008 Data Analysis Competition

Competition Updates

Last updated Sept. 24, 2008.

Results: The deadline has passed and the competition is now over. This year's winning algorithm was developed by Qingsong Peng and Qiang Ji, a collaboration between researchers at Rensselaer Polytechnic Institute and Shanghai Maritime University. Complete results can be found in the paper entitled "The Fourth Annual 2008 MLSP Competition," which will be published in the conference proceedings. We would like to thank all those who participated and we look forward to next year's competition.
New deadline: The deadline for the competition has been delayed a second time. The new deadline is May 30.
Version Number: Newer Matlab versions (and newer toolbox versions) are not always backwards compatible. Hence, an entrant's code may not initially work on the testing system. Testing will be performed using Matlab 2007b. In an effort to resolve any problems resulting from version incompatibility, the competition chairs will test any code that is sent to them starting one week prior to the deadline (this code does not need to be the final version).
Toolboxes: The testing system has the following toolboxes: Control System, Curve Fitting, Fuzzy Logic, System Identification, Image Processing, Neural Network, Optimization, Statistics, Symbolic Math, and Wavelet. These toolboxes are available; however, the competition chairs recommend that their use be kept to a minimum to avoid possible version compatibility issues and issues arising from method overloading.

Stock Market Prediction

Submission Deadline May 30, 2008

Goal: The goal is to maximize profits during six months of trading [previously this section erroneously stated that testing will occur over "one month of training"]. The researcher/research group producing the largest profit will be deemed the winner of the competition.
Training Data: The data may be downloaded from mlsp08TrainingData.mat. The training data, which are in Matlab format, consist of daily stock prices and monthly indices dating from Sept. 29, 2003 until Sept. 28, 2007. Stock data are given for a total of 2929 stocks. The value reported for each stock is the value at the moment the market closed. The volume of shares purchased/sold is also provided for each stock on a daily basis. The index data are given for a total of 41 indices (index data was obtained from …). Each variable in the data file is defined as follows,
closeval: Daily closing price of the 2929 stocks
volume: Daily volume of the 2929 stocks
dateval: Date associated with each column of "closeval" and "volume" using the "YYYYMMDD" format
ticker: Ticker associated with each stock
stockname: Name of each stock
index: Monthly value of the 41 indices
monthval: Date associated with each column of "index" using the "YYYYMM00" format (in all cases the day is arbitrarily set to 00)
indexname: Name of each index
Note: the "closeval" and "volume" matrices include numerous "NaN" entries. These entries correspond to invalid dates, e.g., before a stock enters the market or after it has exited the market.
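Because of these "NaN" entries, it can help to build a validity mask before computing any statistics. The following is a minimal sketch that uses a small made-up matrix in place of the real "closeval"; the values and the names "isListed", "firstValid", "tradableToday", and "meanClose" are illustrative assumptions and are not part of the data file. It avoids toolbox functions and the "~" output placeholder so that it should also run on older Matlab versions.

```matlab
% Toy stand-in for closeval: 3 stocks (rows) x 5 days (columns).
% NaN marks days on which a stock is not on the market.
closeval = [ NaN   NaN  10.0  10.5  11.0;    % enters the market on day 3
             20.0  20.5 21.0  NaN   NaN;     % exits the market after day 3
             30.0  30.2 30.1  30.4  30.3 ];  % listed on every day

isListed = ~isnan(closeval);                 % logical mask, same size as closeval

% Column index of the first valid day for each stock
% (max returns the position of the first maximum, i.e. the first 1)
[unused, firstValid] = max(double(isListed), [], 2);

% Stocks that can be traded on the most recent day
tradableToday = find(isListed(:, end));

% Mean closing price per stock, ignoring the NaN entries
prices = closeval;
prices(~isListed) = 0;
meanClose = sum(prices, 2) ./ sum(isListed, 2);
```

With the real data, the same mask could be computed directly from "closeval" after loading mlsp08TrainingData.mat.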
Eligibility: Anyone who has an interest in machine learning and access to Matlab.
Registration: Registration is not required. However, if you wish to receive important updates on the competition then please send an email to Kenneth Hild.
Testing Methodology: Stock and index data for dates that differ from the training data will be used to compare the different entries. Each entry will start with $100,000 (fictitious US dollars), which can be used to purchase/sell stocks. Once the submission deadline has passed, each entry (in the form of Matlab code) will be presented with the prices of 2929 stocks and 41 indices for 62 contiguous trading days (approximately 3 months). The Matlab code from each entry is expected to output the list of stocks to purchase/sell and the associated number of shares of each stock. This process will be repeated for 125 consecutive trading days (approximately 6 months). On the last day of trading all remaining securities will be liquidated (and all shorts covered) at current market value.

Only market orders are allowed, as opposed to limit orders, conditional orders, puts, and futures. Both (integer-valued) short and long positions are acceptable. The uncommitted cash reserves (total cash reserves minus a transaction fee minus the sum of all short values; see example below) and the price of the stock on the specified day will be used to determine the maximum number of shares that may be purchased/shorted/covered (as opposed to sold; the maximum number of shares that may be sold is limited by the number of shares currently owned). No calls will be enforced nor will any credit be extended (aside from the implicit credit that may occur after a stock is shorted when the price of that stock rises). A stock may not be sold and shorted in a single transaction. Likewise, covering a short and making a purchase requires two transactions. Orders to purchase/sell/cover/short a stock that does not exist on the specified date and purchase/short/cover orders that exceed an entrant's current uncommitted cash reserves will be charged a transaction fee, but will otherwise be ignored. The entrants are charged $10 for each (successful or attempted) purchase/sale/cover/short transaction.

The code used to test the entries may be downloaded from mlsp08Test.m and updateVariables.m. The function myFunction.m is to be supplied by each entrant. The actual test data used for the competition will not be made available to entrants. However, a sample function and sample data are provided below for debugging purposes. The variables used in these programs are defined as follows,
stockPrices: Daily stock price of the 2929 stocks (the current market value appears in the final column)
volumes: Daily volume of the 2929 stocks
indices: Monthly value of the 41 indices
cashReserves: Current (committed plus uncommitted) cash reserves
positions: The (integer) number of shares owned (positive values correspond to long positions and negative values correspond to short positions)
shortValue: Total value of each shorted stock
stockIndex: (M x 1) vector that denotes (using integer indices) each stock for the M desired transactions
shares: (M x 1) vector containing the (integer) number of shares to purchase/cover (positive values) or sell/short (negative values) for the M desired transactions
userDefined: An optional (single) variable (we suggest using a "struct" variable) that is entirely user-defined and may be used to pass previous decisions to the user supplied function, myFunction.m (which is responsible for properly initializing "userDefined" since it will initially be empty)
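To show how these variables fit together, here is a minimal sketch of a decision routine. The actual calling signature is dictated by mlsp08Test.m and is not reproduced on this page, so the function name "exampleStrategy" and its argument list are assumptions made purely for illustration. The trading rule (on the first call, buy the cheapest listed stock with at most half of the uncommitted cash; afterwards do nothing) is deliberately trivial and is not based on machine learning.

```matlab
function [stockIndex, shares, userDefined] = exampleStrategy( ...
    stockPrices, cashReserves, shortValue, userDefined)
% Hypothetical helper: the name and argument list are assumptions,
% not the interface required by mlsp08Test.m.

fee = 10;                                % transaction fee quoted in the rules

% userDefined is empty on the first call, so initialize it here
if isempty(userDefined)
    userDefined = struct('numCalls', 0);
end
userDefined.numCalls = userDefined.numCalls + 1;

stockIndex = zeros(0, 1);                % default: no transactions
shares     = zeros(0, 1);

if userDefined.numCalls > 1
    return;                              % only trade on the first call
end

currentPrice = stockPrices(:, end);      % current market value is the final column
listed = find(~isnan(currentPrice));     % skip stocks that do not exist today
if isempty(listed)
    return;
end

% Uncommitted cash: total reserves minus one fee minus the sum of all short values
uncommitted = cashReserves - fee - sum(shortValue);

% Cheapest listed stock, bought with at most half of the uncommitted cash
[minPrice, k] = min(currentPrice(listed));
numShares = floor(0.5 * uncommitted / minPrice);

if numShares >= 1
    stockIndex = listed(k);
    shares     = numShares;              % positive value = purchase
end
end
```

Keeping a call counter in "userDefined" is only one way to carry state between calls; any struct fields could be used in the same manner.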
Submission: A successful submission consists of (1) a list of the names of the researchers involved, (2) the name(s) of the researchers' host institutions, (3) a 1-3 paragraph description of the approach used, and (4) Matlab code, myFunction.m. The code should function correctly when called by the testing code (found above), should not read from or write to any drive, should not try to access the internet, must finish running in a reasonable time (on the order of minutes), and must consist of only regular uncompiled Matlab code (e.g., P-code, mex files, and compiled code are forbidden).
Note: we recommend testing the code prior to submission using the sample testdata.mat file found below.
Deadline: Submissions must be emailed to Kenneth Hild on or before May 30, 2008.
Publication: The first publication in the MLSP 2008 proceedings, which will be written by the competition chairs, will include a description of the competition, a brief description of the methodology used by each entrant, and the final results. The technical committee may, at its discretion, invite several entrants to write papers about the approach that they used in the competition. These papers will be published in a special issue related to the conference proceedings (although it is not currently known if there will be a special issue). The criteria used for selecting entrants include, but are not limited to, novelty (in terms of machine learning methodology) and performance.
Sample code: Sample code may be downloaded from myFunction.m (this code is not based on machine learning principles; it is included only to help potential entrants become acquainted with the testing procedure). Sample data may be downloaded from testdata.mat (this data file is generated using "stockPrices = closeval(:,end-185:end); volumes = volume(:,end-185:end); indices = index(:,end-8:end-1);"). To use the sample code, download all 3 M-files (mlsp08Test.m, updateVariables.m, and myFunction.m) and the data file (testdata.mat), place them in the current directory, and run "[cashReserves,totalM,totalInvalid] = mlsp08Test;" from the Matlab command line.
Example: The following trivial example is included to help explain the variables and the testing procedure. Suppose that there are only 7 total stocks. For this example we will assume that they are IBM, BAC, JNJ, NWA, MSFT, GOOG, and MMM. On Day n0 we have,
stockPrices(:,n0+123) = [103.31 31.23 81.09 22.72 35.21 508.76 77.12]';
cashReserves = 100000;
positions = [0 0 0 0 0 0 0]';
shortValue = [0 0 0 0 0 0 0]';
where "stockPrices" is defined in mlsp08Test.m (as opposed to myFunction.m). Based on this information we decide to short 210 shares of IBM, purchase 1600 shares of GOOG, purchase 367 shares of BAC, and purchase 592 shares of MSFT. This is described by,
stockIndex = [1 6 2 5]';
shares = [-210 1600 367 592]';
After Transaction #1 we have,
cashReserves = 100000 - 10 = 99990;
positions = [-210 0 0 0 0 0 0]';
shortValue = [21695.10 0 0 0 0 0 0]';
After Transaction #2 we have,
cashReserves = 99990 - 10 = 99980;
positions = [-210 0 0 0 0 0 0]';
shortValue = [21695.10 0 0 0 0 0 0]';
Notice that this transaction fails due to lack of uncommitted cash reserves, which equals 99990 - 10 - 21695.10 = 78284.90.
After Transaction #3 we have,
cashReserves = 99980 - 10 - 11461.41 = 88508.59;
positions = [-210 367 0 0 0 0 0]';
shortValue = [21695.10 0 0 0 0 0 0]';
After Transaction #4 we have,
cashReserves = 88508.59 - 10 - 20844.32 = 67654.27;
positions = [-210 367 0 0 592 0 0]';
shortValue = [21695.10 0 0 0 0 0 0]';
On Day n0+1 we have,
stockPrices(:,n0+1+123) = [108.29 32.33 83.05 21.47 32.52 504.71 77.18]';
We decide to sell/short 400 shares of BAC, cover 100 shares of IBM, and sell 592 shares of MSFT. This is described by,
stockIndex = [2 1 5]';
shares = [-400 100 -592]';
After Transaction #5 we have,
cashReserves = 67654.27 - 10 = 67644.27;
positions = [-210 367 0 0 592 0 0]';
shortValue = [21695.10 0 0 0 0 0 0]';
Notice that this transaction fails since we (1) own at least 1 share of BAC and (2) do not own at least 400 shares of BAC. Recall that sells and shorts must be performed in 2 separate transactions, i.e., stockIndex = [2 2]'; shares = [-367 -33]';.
After Transaction #6 we have,
cashReserves = 67644.27 - 10 + (100/210)*21695.10 - 100*108.29 = 67136.27;
positions = [-110 367 0 0 592 0 0]';
shortValue = [11364.10 0 0 0 0 0 0]';
Notice that cash reserves did not increase for the shorted shares until they were covered. Also recall that the number of shares that may be shorted is limited by the current uncommitted cash reserves. These 2 rules are enforced in order to prevent unbounded amounts of zero-interest loans. The first entry of the updated "shortValue" variable represents the value remaining after the specified number of shares are covered, i.e., 21695.10*(1 - 100/210) = 11364.10.
After Transaction #7 we have,
cashReserves = 67136.27 - 10 + 592*32.52 = 86378.11;
positions = [-110 367 0 0 0 0 0]';
shortValue = [11364.10 0 0 0 0 0 0]';
If this were the final day the remaining IBM shares would be automatically covered at market price and the remaining BAC shares would be automatically sold at market price.
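
The bookkeeping in this example can be replayed with a short script. The sketch below is an illustrative re-implementation of the rules as they are described on this page; it is not the official updateVariables.m, and the variable names "prices", "orders", "uncommitted", "released", and "numInvalid" are assumptions introduced here.

```matlab
% Replay of the worked example (illustrative, not the official test code)
fee = 10;
cashReserves = 100000;
positions  = zeros(7, 1);
shortValue = zeros(7, 1);

% Rows: IBM BAC JNJ NWA MSFT GOOG MMM; columns: Day n0 and Day n0+1
prices = [103.31 31.23 81.09 22.72 35.21 508.76 77.12;
          108.29 32.33 83.05 21.47 32.52 504.71 77.18]';

% Each row: [day, stockIndex, shares]
orders = [1 1 -210; 1 6 1600; 1 2 367; 1 5 592;
          2 2 -400; 2 1  100; 2 5 -592];

numInvalid = 0;
for t = 1:size(orders, 1)
    day = orders(t, 1);  i = orders(t, 2);  n = orders(t, 3);
    p = prices(i, day);
    value = abs(n) * p;
    uncommitted = cashReserves - fee - sum(shortValue);
    cashReserves = cashReserves - fee;     % fee applies even to failed orders
    if isnan(p)
        numInvalid = numInvalid + 1;       % stock does not exist on this day
    elseif n > 0 && positions(i) >= 0 && value <= uncommitted
        cashReserves = cashReserves - value;              % purchase
        positions(i) = positions(i) + n;
    elseif n > 0 && positions(i) < 0 && n <= -positions(i) && value <= uncommitted
        released = (n / -positions(i)) * shortValue(i);   % cover
        cashReserves = cashReserves + released - value;
        shortValue(i) = shortValue(i) - released;
        positions(i) = positions(i) + n;
    elseif n < 0 && positions(i) > 0 && -n <= positions(i)
        cashReserves = cashReserves + value;              % sell
        positions(i) = positions(i) + n;
    elseif n < 0 && positions(i) <= 0 && value <= uncommitted
        shortValue(i) = shortValue(i) + value;            % short
        positions(i) = positions(i) + n;
    else
        numInvalid = numInvalid + 1;       % order ignored, fee already charged
    end
end

fprintf('cashReserves = %.2f\n', cashReserves);
```

Under these assumptions the script ends with cashReserves = 86378.11, positions = [-110 367 0 0 0 0 0]', and shortValue(1) = 11364.10, matching the values after Transaction #7, with Transactions #2 and #5 counted as invalid.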

Resources: [1] Kjersti Aas and Xeni K. Dimakos, "Statistical modelling of financial time series: An introduction," Technical report SAMBA/08/04, Norwegian Computing Center Applied Research and Development, 8 March, 2004.
Aas and Dimakos state that it is more convenient to use arithmetic and geometric returns, as opposed to stock prices. Daily returns are defined by r(n) = (p(n)-p(n-1))/p(n-1), where p(n) is the stock price at time n.
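As a minimal sketch, the return formula can be applied to a whole price matrix at once. The toy matrix "p" and the names "r" and "g" are illustrative assumptions; the geometric return is taken here to be the log return, log(p(n)/p(n-1)), which is the common definition. "NaN" prices simply propagate to "NaN" returns.

```matlab
% Toy price matrix: 2 stocks (rows) x 4 days (columns); NaN = not listed
p = [100 110  99 NaN;
     NaN  50  55  66];

% Arithmetic return: r(n) = (p(n) - p(n-1)) / p(n-1)
r = (p(:, 2:end) - p(:, 1:end-1)) ./ p(:, 1:end-1);

% Geometric (log) return, assumed here to be log(p(n) / p(n-1))
g = log(p(:, 2:end) ./ p(:, 1:end-1));
```

Each result has one column fewer than "p", since the first day has no preceding price.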
[2] Zubin Jelveh, "How a Computer Knows What Many Managers Don't," New York Times, July 9, 2006.
"IF movies like "2001: A Space Odyssey" and "The Matrix" are any indication, humans are not comfortable with the idea of artificial intelligence controlling their fate. So why ever trust a computer model to run your investments? Because, in the real world, it seems to pay off. Many mutual funds that make their trades based on the recommendations of a proprietary computer model, known as quantitative or quant funds, have outperformed their benchmarks in the last three years. And investors have noticed..."
[3] Wikipedia description of a market order.
[4] Wikipedia description of short selling,
"The term 'short selling' ... [is used to denote] all those strategies which allow an investor to gain from the decline in price of a security. ... Short selling has been a target of ire since at least the eighteenth century when England banned it outright. It was perceived as a magnifying effect in the violent downturn in the Dutch tulip market in the seventeenth century."
[5] John Moody and Matthew Saffell, "Learning to Trade via Direct Reinforcement," IEEE Trans. on Neural Networks, Vol. 12, No. 4, pp. 875-889, July, 2001.
[6] M.A.H. Dempster, Tom W. Payne, Yazann Romahi, and G.W.P. Thompson, "Computational Learning Techniques for Intraday FX Trading Using Popular Technical Indicators," IEEE Trans. on Neural Networks, Vol. 12, No. 4, pp. 744-754, July, 2001.
[7] Ramazan Gencay and Min Qi, "Pricing and Hedging Derivative Securities with Neural Networks: Bayesian Regularization, Early Stopping, and Bagging," IEEE Trans. on Neural Networks, Vol. 12, No. 4, pp. 726-734, July, 2001.
[8] Halbert White and Jeffrey Racine, "Statistical Inference, The Bootstrap, and Neural-Network Modeling with Application to Foreign Exchange Rates," IEEE Trans. on Neural Networks, Vol. 12, No. 4, pp. 657-673, July, 2001.
[9] Andrew D. Back and Andreas S. Weigend, "A First Application of Independent Component Analysis to Extracting Structure from Stock Returns," International Journal of Neural Systems, Vol. 8, No. 4, pp. 473-484, August, 1997.