An Incremental Document Clustering for the Large Document Database

  • Conference paper
Information Retrieval Technology (AIRS 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Abstract

With the growth of the internet and of computing, the amount of information available online is increasing rapidly, and much of it is managed in document form. Research into methods for managing large document collections effectively is therefore necessary. Document clustering groups documents by subject, classifying a set of documents according to their similarity to one another. Accordingly, document clustering can be used for browsing and searching documents, and it can increase search accuracy. This paper proposes an efficient incremental clustering algorithm for a document set that grows gradually. The incremental document clustering algorithm assigns a set of new documents to the legacy clusters that have been identified in advance. In addition, to improve clustering correctness, stop words are removed and the weight of each word is calculated by the proposed TF × NIDF function. The performance of the proposed method is analyzed through a series of experiments to identify its various characteristics.
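
The general idea described in the abstract (weight words, then assign each new document to an existing cluster) can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The abstract does not define NIDF, so the sketch substitutes a standard smoothed IDF. The stop-word list, the cosine-similarity measure, the centroid representation, the 0.2 threshold, and all function and class names are assumptions made for illustration only.

```python
import math
from collections import Counter

# Illustrative stop-word list (assumption; the paper's list is not given here).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def idf_table(docs):
    """Standard smoothed IDF, used as a stand-in for the paper's NIDF."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(tokenize(d)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def vectorize(text, idf):
    """Sparse TF x IDF vector as a dict; unseen words default to weight 1.0."""
    tf = Counter(tokenize(text))
    return {w: f * idf.get(w, 1.0) for w, f in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

class Cluster:
    """Keeps a running sum of member vectors so the centroid updates cheaply."""
    def __init__(self):
        self.total = {}
        self.size = 0

    def add(self, vec):
        for w, x in vec.items():
            self.total[w] = self.total.get(w, 0.0) + x
        self.size += 1

    def centroid(self):
        return {w: x / self.size for w, x in self.total.items()}

def assign(vec, clusters, threshold=0.2):
    """Add vec to the most similar existing cluster, or open a new cluster
    when no similarity reaches the threshold. Returns the cluster index."""
    best, best_sim = None, threshold
    for i, c in enumerate(clusters):
        sim = cosine(vec, c.centroid())
        if sim >= best_sim:
            best, best_sim = i, sim
    if best is None:
        clusters.append(Cluster())
        best = len(clusters) - 1
    clusters[best].add(vec)
    return best

# Usage: documents arrive one at a time and are placed incrementally.
docs = ["cats chase mice", "dogs chase cats",
        "stocks fell sharply", "markets and stocks rallied"]
idf = idf_table(docs)
clusters = []
labels = [assign(vectorize(d, idf), clusters) for d in docs]
print(labels)
```

Because each cluster stores only a running sum and a count, adding a document does not require revisiting earlier documents, which is the property that makes an assignment step of this kind incremental. In this sketch the IDF table is computed once from a fixed sample; how the weights are kept up to date as the collection grows is left open.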

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Joo, K.H., Lee, W.S. (2005). An Incremental Document Clustering for the Large Document Database. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_29

  • DOI: https://doi.org/10.1007/11562382_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29186-2

  • Online ISBN: 978-3-540-32001-2

  • eBook Packages: Computer Science, Computer Science (R0)
