Abstract
In many applications, a large number of predictors, designed manually or trained automatically, are available to predict the same outcome. Much research has been devoted to algorithms that can effectively select or combine these predictors to generate a more accurate ensemble predictor. The collaborative training algorithms from attribute-distributed learning provide batch-processing solutions for scenarios in which the individual predictors are heterogeneous, taking different inputs and employing different models. In some applications, however, such as financial market prediction, an online approach is desirable. In this paper, a novel online algorithm is proposed, stemming from the collaborative training algorithms developed for attribute-distributed learning. It sequentially takes in new observations, simultaneously adjusts how the individual predictors are combined, and provides feedback to the individual predictors so that they can be retrained to achieve a better ensemble predictor in real time. The efficacy of this new algorithm is demonstrated by extensive simulations on both artificial and real data, particularly financial market data. A trading strategy constructed from the ensemble predictor shows strong performance when applied to financial market prediction.
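The abstract describes an ensemble that is updated sequentially: at each step it combines the individual predictions, observes the true outcome, and adjusts the combination weights. The following is a minimal sketch of such an online weighted combiner, using a stochastic-gradient update on the squared prediction error; the class name, learning rate, and update rule are illustrative assumptions, not the paper's exact algorithm (which also feeds errors back to retrain the individual predictors).

```python
import numpy as np

class OnlineEnsemble:
    """Hypothetical sketch of an online ensemble combiner.

    At each time step: form a weighted sum of the individual predictions,
    observe the true outcome, and update the combination weights by one
    stochastic-gradient step on the squared error.
    """

    def __init__(self, n_predictors, lr=0.05):
        # Start from equal weights over the individual predictors.
        self.w = np.ones(n_predictors) / n_predictors
        self.lr = lr

    def predict(self, preds):
        # Ensemble prediction: weighted sum of individual predictions.
        return float(np.dot(self.w, preds))

    def update(self, preds, y):
        # One online step: compute the error on the new observation and
        # descend the gradient of 0.5 * err**2 with respect to the weights.
        preds = np.asarray(preds, dtype=float)
        err = self.predict(preds) - y
        self.w -= self.lr * err * preds
        return err  # the error could also be fed back to retrain predictors
```

As a usage sketch: with two predictors, one reporting the target and one reporting its negative, repeated calls to `update` shift weight toward the informative predictor, so the ensemble prediction approaches the target.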
Notes
We use the absolute value of the errors to avoid extreme values that would significantly sway the curve: financial returns often have heavy-tailed distributions, and hence can contain large jumps.
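The effect described in this note can be seen numerically: when one large jump sits among otherwise small errors, it dominates the sum of squared errors even more strongly than the sum of absolute errors. The error values below are purely illustrative.

```python
import numpy as np

# Illustrative only: small daily errors plus one heavy-tailed jump.
errors = np.array([0.01, -0.02, 0.015, -0.01, 1.5])

# Fraction of the total error mass contributed by the single jump,
# under absolute errors vs. squared errors.
abs_share = np.abs(errors[-1]) / np.abs(errors).sum()
sq_share = errors[-1] ** 2 / (errors ** 2).sum()
# The jump's share is larger under squaring, i.e. squared errors are
# swayed more by extreme values than absolute errors.
```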
Acknowledgements
This research was supported in part by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-0939370, and in part by the Office of Naval Research Grant N00014-12-1-0767.
Cite this article
Zheng, H., Kulkarni, S.R. & Poor, H.V. A sequential predictor retraining algorithm and its application to market prediction. Ann Oper Res 208, 209–225 (2013). https://doi.org/10.1007/s10479-013-1396-2