Abstract
Recent studies reveal that associative classification can achieve higher accuracy than traditional approaches. Its main drawback is that it generates a huge number of rules, which makes it difficult to select a subset of rules for accurate classification. In this study, we propose a novel association-based approach especially suitable for text classification. The approach first builds a classifier through a 2-PS (Two-Phase) method. The first phase prunes rules locally: rules mined within each category are pruned by a sentence-level constraint, which makes the rules more semantically correlated and less redundant. In the second phase, all remaining rules are compared and selected from a global view: training examples from different categories are merged to evaluate these rules. Moreover, when labeling a new document, the multiple sentence-level appearances of a rule are taken into account. Experimental results on well-known text corpora show that our method can achieve higher accuracy than many well-known methods. In addition, the performance study shows that our method is quite efficient in comparison with other classification methods.
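The two phases and the sentence-aware labeling step can be illustrated with a minimal sketch. The abstract does not give the exact constraint, thresholds, or scoring function, so everything below is an assumption made for illustration: a rule is taken to satisfy the sentence-level constraint when all of its terms co-occur in one sentence, the global phase is approximated by a confidence threshold over the merged training set, and the names `sentence_hits`, `phase1_local_prune`, `phase2_global_select`, `classify`, `min_support`, and `min_confidence` are hypothetical, as is the toy data.

```python
from collections import defaultdict

# Assumed representation: a document is a list of sentences, a sentence is a
# set of terms, and a rule is a pair (frozenset_of_terms, category).

def sentence_hits(terms, doc):
    """Count the sentences of doc that contain every term of the rule."""
    return sum(1 for sentence in doc if terms <= sentence)

def phase1_local_prune(candidates, docs_by_cat, min_support):
    """Phase 1 (assumed form): keep a rule only if its terms co-occur inside
    a single sentence in at least min_support documents of its own category."""
    kept = []
    for terms, cat in candidates:
        support = sum(1 for doc in docs_by_cat[cat] if sentence_hits(terms, doc) > 0)
        if support >= min_support:
            kept.append((terms, cat))
    return kept

def phase2_global_select(rules, docs_by_cat, min_confidence):
    """Phase 2 (assumed form): evaluate each surviving rule on the merged
    training set and keep it if its confidence is at least min_confidence."""
    selected = {}
    for terms, cat in rules:
        matches = {c: sum(1 for d in docs if sentence_hits(terms, d) > 0)
                   for c, docs in docs_by_cat.items()}
        total = sum(matches.values())
        if total and matches[cat] / total >= min_confidence:
            selected[(terms, cat)] = matches[cat] / total
    return selected

def classify(doc, selected):
    """Score each category by confidence times the number of sentences in
    which the rule appears, so repeated sentence-level matches count more."""
    scores = defaultdict(float)
    for (terms, cat), conf in selected.items():
        scores[cat] += conf * sentence_hits(terms, doc)
    return max(scores, key=scores.get) if scores else None

# Toy training data (hypothetical).
train = {
    "grain": [
        [{"wheat", "export", "rise"}, {"price", "fall"}],
        [{"wheat", "export"}, {"farm", "report"}],
    ],
    "trade": [
        [{"wheat", "tariff"}, {"export", "deficit"}],
        [{"trade", "deficit"}, {"tariff", "talks"}],
    ],
}
candidates = [
    (frozenset({"wheat", "export"}), "grain"),
    (frozenset({"wheat", "price"}), "grain"),   # terms never share a sentence
    (frozenset({"tariff"}), "trade"),
    (frozenset({"export"}), "trade"),           # more frequent in "grain"
]

local_rules = phase1_local_prune(candidates, train, min_support=1)
selected = phase2_global_select(local_rules, train, min_confidence=0.6)
new_doc = [{"wheat", "export", "surge"}, {"export", "wheat", "record"}, {"tariff", "threat"}]
label = classify(new_doc, selected)
```

In this toy run the rule on "wheat" and "price" is dropped in the first phase because the two terms appear in the same document but never in the same sentence, and the rule on "export" for the "trade" category is dropped in the second phase because the merged training set shows it matching more often in "grain".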
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qian, T., Wang, Y., Long, H., Feng, J. (2005). 2-PS Based Associative Text Classification. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_37
DOI: https://doi.org/10.1007/11546849_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28558-8
Online ISBN: 978-3-540-31732-6
eBook Packages: Computer Science, Computer Science (R0)