10.5555/1808036.1808087
Article

Improved C4.5 algorithm for rule based classification

Published: 20 February 2010

Abstract

C4.5 is one of the most popular algorithms for rule-based classification. It has many useful empirical features, such as categorization of continuous numeric values and handling of missing values. However, in many cases it requires considerable processing time and yields a lower rate of correctly classified instances. Moreover, a large dataset may contain hundreds of attributes, and the most relevant of them must be selected for C4.5 to achieve higher accuracy. Choosing a suitable algorithm for efficient and accurate classification is also a difficult task. Our proposed method selects the most relevant attributes of a dataset, reducing the input space while simultaneously improving the performance of C4.5. The improvement is measured in terms of higher accuracy and lower computational complexity. We use information-theoretic entropy to identify the central attribute of a dataset, and then apply three correlation coefficient measures, namely Pearson's, Spearman's, and Kendall's, with respect to that central attribute. We conduct a comparative study of these three popular correlation measures on eight well-known data mining problems from the UCI (University of California, Irvine) data repository to choose the best method, and we compare the experimental results using box plots. The proposed method performs better in most of the individual experiments.
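The abstract does not spell out how entropy picks the central attribute or how the correlation scores become an attribute subset, so the following Python sketch shows one plausible reading rather than the authors' method. It assumes, purely for illustration, that the central attribute is the one with the highest information gain with respect to the class label, and that the remaining attributes are ranked by the absolute value of their Pearson, Spearman, or Kendall (tau-b) correlation with it, keeping the top k. The function names (`select_attributes`, `information_gain`, and so on), the top-k cut-off, and the numerically coded toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Drop in class entropy after partitioning rows by the values of one column."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def pearson(x, y):
    """Pearson product-moment correlation; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects the denominator for tied pairs."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) / 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else 0.0


MEASURES = {"pearson": pearson, "spearman": spearman, "kendall": kendall_tau_b}


def select_attributes(data, labels, k, measure="pearson"):
    """Hypothetical selection rule: the central attribute is the one with the
    highest information gain; keep it plus the k - 1 attributes whose absolute
    correlation with it is largest."""
    central = max(data, key=lambda name: information_gain(data[name], labels))
    corr = MEASURES[measure]
    others = [name for name in data if name != central]
    others.sort(key=lambda name: abs(corr(data[name], data[central])), reverse=True)
    return [central] + others[: k - 1]


if __name__ == "__main__":
    # Toy, numerically coded attributes; not data from the paper.
    data = {
        "a": [0, 0, 0, 1, 1, 1],
        "b": [0, 0, 1, 1, 1, 1],
        "c": [1, 1, 0, 0, 0, 1],
        "d": [2, 0, 1, 2, 0, 1],
    }
    labels = ["no", "no", "no", "yes", "yes", "yes"]
    for m in MEASURES:
        # On this toy data every measure keeps ['a', 'b'].
        print(m, select_attributes(data, labels, k=2, measure=m))
```

Switching `measure` ranks the same attributes under each of the three correlation coefficients, which loosely mirrors the comparative study described above. The pairwise Kendall loop is quadratic in the number of rows, which is acceptable for a sketch but would be slow on large datasets.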




      Information

      Published In

      Guide Proceedings
      AIKED'10: Proceedings of the 9th WSEAS international conference on Artificial intelligence, knowledge engineering and data bases
      February 2010
      394 pages
      ISBN: 9789604741540

      Publisher

      World Scientific and Engineering Academy and Society (WSEAS)

      Stevens Point, Wisconsin, United States


      Author Tags

      1. C4.5
      2. Kendall correlation
      3. Pearson's correlation
      4. Spearman correlation
      5. entropy



      Cited By

      • (2016) An Effective Framework with N-Client Transfer Dataset for Weather Prediction Using Data Mining Techniques. Proceedings of the International Conference on Informatics and Analytics, pp. 1-6. DOI: 10.1145/2980258.2982116. Online publication date: 25-Aug-2016.
      • (2016) Aid decision algorithms to estimate the risk in congenital heart surgery. Computer Methods and Programs in Biomedicine, vol. 126, issue C, pp. 118-127. DOI: 10.1016/j.cmpb.2015.12.021. Online publication date: 1-Apr-2016.
      • (2016) Combining RDR-based machine learning approach and human expert knowledge for phishing prediction. Proceedings of the 14th Pacific Rim International Conference on Trends in Artificial Intelligence, pp. 80-92. DOI: 10.1007/978-3-319-42911-3_7. Online publication date: 22-Aug-2016.
      • (2012) Predicting Friends and Foes in Signed Networks Using Inductive Inference and Social Balance Theory. Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), pp. 384-388. DOI: 10.1109/ASONAM.2012.69. Online publication date: 26-Aug-2012.
