Abstract
Classification decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing and fraud detection. Highly parallel algorithms for constructing classification decision trees are desirable for handling large data sets in a reasonable amount of time. Algorithms for building classification decision trees have natural concurrency, but they are difficult to parallelize because of the inherently dynamic nature of the computation. In this paper, we present parallel formulations of a classification-decision-tree learning algorithm based on induction. We describe two basic parallel formulations: one based on the synchronous tree construction approach and the other on the partitioned tree construction approach. We discuss the advantages and disadvantages of these methods and propose a hybrid method that combines their strengths. We also analyze the computation and communication costs of the proposed hybrid method. Finally, experimental results on an IBM SP-2 demonstrate excellent speedups and scalability.
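To make the two formulations concrete, the following is a minimal Python sketch of sequential top-down decision-tree induction by information gain; it is an illustration, not the authors' implementation, and all names in it are hypothetical. The comments mark the two places the paper's formulations introduce parallelism: the per-node split evaluation (synchronous tree construction) and the recursion over child nodes (partitioned tree construction).

    from collections import Counter
    import math

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def best_split(records, labels, attributes):
        """Pick the attribute with the highest information gain.
        This per-node statistics step is what the synchronous formulation
        would compute cooperatively: each processor tallies counts over its
        local records, and the counts are combined across processors."""
        base = entropy(labels)
        best_attr, best_gain = None, 0.0
        for a in attributes:
            remainder = 0.0
            for v in set(r[a] for r in records):
                sub = [lab for r, lab in zip(records, labels) if r[a] == v]
                remainder += len(sub) / len(labels) * entropy(sub)
            gain = base - remainder
            if gain > best_gain:
                best_attr, best_gain = a, gain
        return best_attr

    def build_tree(records, labels, attributes):
        """Recursive induction. The partitioned formulation would hand the
        independent child subtrees built below this point to disjoint
        subsets of processors."""
        if len(set(labels)) == 1 or not attributes:
            return Counter(labels).most_common(1)[0][0]  # leaf: majority class
        attr = best_split(records, labels, attributes)
        if attr is None:
            return Counter(labels).most_common(1)[0][0]  # no useful split
        node = {}
        for v in set(r[attr] for r in records):
            idx = [i for i, r in enumerate(records) if r[attr] == v]
            node[(attr, v)] = build_tree([records[i] for i in idx],
                                         [labels[i] for i in idx],
                                         [a for a in attributes if a != attr])
        return node

    if __name__ == "__main__":
        # Toy example: predict "play" from the weather outlook.
        records = [{"outlook": "sunny"}, {"outlook": "rain"}, {"outlook": "sunny"}]
        labels = ["no", "yes", "no"]
        print(build_tree(records, labels, ["outlook"]))

The sketch also shows why the hybrid method is attractive: near the root, nodes are few and large, so cooperating on split statistics (synchronous) keeps all processors busy; deeper in the tree, nodes are many and small, so distributing whole subtrees (partitioned) avoids the per-node synchronization cost.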
Cite this article
Srivastava, A., Han, E.-H., Kumar, V. et al. Parallel Formulations of Decision-Tree Classification Algorithms. Data Mining and Knowledge Discovery 3, 237–261 (1999). https://doi.org/10.1023/A:1009832825273