Abstract
We present an application of inductive concept learning and interactive visualization techniques to a large-scale commercial data mining project. This paper focuses on the design and configuration of high-level optimization systems (wrappers) for relevance determination and constructive induction, and on integrating these wrappers with elicited knowledge on attribute relevance and synthesis. In particular, we discuss decision support issues for the application (cost prediction for automobile insurance markets in several states) and report experiments using D2K, a Java-based visual programming system for data mining and information visualization, and several commercial and research tools. We describe exploratory clustering, descriptive statistics, and supervised decision tree learning in this application, focusing on a parallel genetic algorithm (GA) system, Jenesis, which is used to implement relevance determination (attribute subset selection). Deployed on several high-performance network-of-workstation systems (Beowulf clusters), Jenesis achieves linear speedup, owing to a high degree of task parallelism. Its test set accuracy is significantly higher than that of decision tree inducers alone and is comparable to that of the best extant search-space-based wrappers.
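The core idea behind a GA wrapper for attribute subset selection can be sketched as follows. Each chromosome is a bit mask over the attributes; fitness is the validation performance of an inducer trained on the selected attributes; and because each fitness evaluation is independent, the inner loop parallelizes trivially (the task parallelism the abstract credits for Jenesis's linear speedup). This is a minimal illustrative sketch, not the Jenesis implementation: the toy `fitness` function, the synthetic `RELEVANT` set, and all parameter values are assumptions, standing in for the real wrapper's decision tree training and held-out scoring.

```python
import random

random.seed(0)

N_ATTRS = 10
RELEVANT = {0, 3, 7}  # hypothetical ground-truth relevant attributes

def fitness(mask):
    """Toy stand-in for the wrapper's inner loop. A real wrapper would
    train an inducer (e.g., a decision tree) on the selected attributes
    and score it on held-out data; here we simply reward selecting
    relevant attributes and lightly penalize irrelevant ones."""
    selected = {i for i, bit in enumerate(mask) if bit}
    return len(selected & RELEVANT) - 0.25 * len(selected - RELEVANT)

def crossover(a, b):
    # Single-point crossover on the bit masks.
    point = random.randrange(1, N_ATTRS)
    return a[:point] + b[point:]

def mutate(mask, rate=0.05):
    # Flip each bit independently with probability `rate`.
    return [bit ^ (random.random() < rate) for bit in mask]

def ga(pop_size=30, generations=40):
    pop = [[random.randint(0, 1) for _ in range(N_ATTRS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # In a parallel deployment, these fitness calls would be
        # farmed out across cluster nodes; here they run serially.
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # truncation selection + elitism
        children = [mutate(crossover(random.choice(elite),
                                     random.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

best = ga()
print(sorted(i for i, bit in enumerate(best) if bit))
```

Elitism (carrying the top half of the population forward unchanged) guarantees the best fitness found never regresses between generations, which keeps even this toy version stable.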
Cite this article
Hsu, W.H., Welge, M., Redman, T. et al. High-Performance Commercial Data Mining: A Multistrategy Machine Learning Application. Data Mining and Knowledge Discovery 6, 361–391 (2002). https://doi.org/10.1023/A:1016352221465