Abstract
Recently, categorization methods based on association rules have received much attention. In general, association-based classification achieves high accuracy and good performance. However, classification accuracy drops rapidly when the distribution of feature words in the training set is uneven. This paper therefore proposes a text categorization algorithm, Weighted Association Rules Categorization (WARC). In this method, association rules are used to classify the training samples, and rule intensity is defined according to the number of misclassified training samples. Each strong rule is multiplied by a factor less than 1 to reduce its weight, while each weak rule is multiplied by a factor greater than 1 to increase its weight. The results show that this method can remarkably improve the accuracy of association classification algorithms by adjusting the rule weights.
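The weight-adjustment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact definition of rule intensity or the factor values, so everything specific here is an assumption. In particular, the sketch assumes that a rule counts as "strong" when it fires on a misclassified training sample and votes for the wrongly predicted class, and as "weak" when it fires on that sample and votes for the true class. The names `classify`, `adjust_weights`, `demote`, `promote`, and `rounds` are hypothetical.

```python
from collections import defaultdict


def classify(doc_terms, rules, weights):
    """Return the class with the largest summed weight of matching rules.

    A rule (antecedent, label) matches when every term of its antecedent
    occurs in the document. Returns None if no rule matches.
    """
    scores = defaultdict(float)
    for i, (antecedent, label) in enumerate(rules):
        if antecedent <= doc_terms:  # subset test on sets of terms
            scores[label] += weights[i]
    return max(scores, key=scores.get) if scores else None


def adjust_weights(train, rules, demote=0.5, promote=2.0, rounds=5):
    """Rescale rule weights using misclassified training samples.

    Assumption (not from the source): on each misclassified sample,
    matching rules that voted for the wrong predicted class are treated
    as too strong and multiplied by `demote` (< 1); matching rules that
    voted for the true class are treated as too weak and multiplied by
    `promote` (> 1). The factor values are illustrative only.
    """
    weights = [1.0] * len(rules)
    for _ in range(rounds):
        errors = 0
        for doc_terms, true_label in train:
            predicted = classify(doc_terms, rules, weights)
            if predicted == true_label:
                continue
            errors += 1
            for i, (antecedent, label) in enumerate(rules):
                if not antecedent <= doc_terms:
                    continue
                if label == predicted:
                    weights[i] *= demote
                elif label == true_label:
                    weights[i] *= promote
        if errors == 0:  # every training sample is classified correctly
            break
    return weights


# Toy data, invented purely for illustration.
rules = [
    (frozenset({"ball"}), "sports"),
    (frozenset({"dance"}), "arts"),
]
train = [
    (frozenset({"ball", "dance"}), "arts"),
    (frozenset({"ball", "goal"}), "sports"),
]
weights = adjust_weights(train, rules)
```

In this toy setup the frequent term "ball" initially dominates the first document; after adjustment its rule carries less weight than the rule for the rarer term "dance", which is the kind of correction for unevenly distributed feature words that the abstract describes.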
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, XY., Chen, Y., Li, RL., Hu, YF. (2005). An Improvement of Text Association Classification Using Rules Weights. In: Li, X., Wang, S., Dong, Z.Y. (eds) Advanced Data Mining and Applications. ADMA 2005. Lecture Notes in Computer Science, vol 3584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527503_43
DOI: https://doi.org/10.1007/11527503_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27894-8
Online ISBN: 978-3-540-31877-4
eBook Packages: Computer Science (R0)