Parallel induction algorithms for data mining

John Darlington¹,
Yi-ke Guo¹,
Janjao Sutiwaraphun¹ &
Hing Wing To¹

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1280)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

In the last decade, there has been an explosive growth in the generation and collection of data. Nonetheless, the quality of information inferred from this voluminous data has not been proportional to its size. One reason is that the computational complexity of the algorithms used to extract information from the data is normally proportional to the number of input data items, resulting in prohibitive execution times on large data sets. Parallelism is one solution to this problem. In this paper we present preliminary results of experiments in parallelising C4.5, a classification-rule learning system that uses decision trees as its model representation and that has served as a base model for investigating methods for parallelising induction algorithms. The experiments assess the potential for improving execution time by exploiting parallelism in the algorithm.
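The abstract does not say which parts of C4.5 were parallelised, so the following is only an illustrative sketch of one possible opportunity, not the authors' method. C4.5 chooses the test at each tree node using the gain ratio criterion, and the scores of the candidate attributes are independent of one another, so they could in principle be computed concurrently. The function names (`entropy`, `gain_ratio`, `best_attribute`), the restriction to categorical attributes, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, labels, attr):
    """Gain ratio of splitting on the categorical attribute at index `attr`."""
    # Group the class labels by the value the attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    total = len(labels)
    # Expected entropy of the labels after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attrs, workers=4):
    """Score every candidate attribute concurrently; return the best index.

    Threads keep the sketch self-contained. In CPython a process pool
    (hypothetical substitution) would be needed for a real CPU speed-up.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda a: gain_ratio(rows, labels, a), attrs))
    best = max(range(len(attrs)), key=scores.__getitem__)
    return attrs[best]


# Toy data (made up for illustration): attribute 0 separates the classes.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
labels = ["no", "no", "yes", "yes"]
chosen = best_attribute(rows, labels, [0, 1])
```

On the toy data, attribute 0 separates the two classes perfectly while attribute 1 carries no information, so `best_attribute` selects index 0. Other decompositions, such as building independent subtrees in parallel or partitioning the training records, are equally plausible readings of "parallelism in the algorithm".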

Author information

Authors and Affiliations

Department of Computing, Imperial College, SW7 2BZ, London, UK
John Darlington, Yi-ke Guo, Janjao Sutiwaraphun & Hing Wing To

Editor information

Xiaohui Liu, Paul Cohen, Michael Berthold

About this paper

Cite this paper

Darlington, J., Guo, Y.k., Sutiwaraphun, J., To, H.W. (1997). Parallel induction algorithms for data mining. In: Liu, X., Cohen, P., Berthold, M. (eds) Advances in Intelligent Data Analysis Reasoning about Data. IDA 1997. Lecture Notes in Computer Science, vol 1280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0052860

DOI: https://doi.org/10.1007/BFb0052860
Published: 19 May 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63346-4
Online ISBN: 978-3-540-69520-2
eBook Packages: Springer Book Archive
