DOI: 10.1145/2645791.2645812
research-article

Clustering Documents using the 3-Gram Graph Representation Model

Published: 02 October 2014

Abstract

In this paper we present an innovative document clustering method that uses the 3-gram graph representation model and reduces the problem of document clustering to graph partitioning. For the latter we employ the kernel k-means algorithm. We evaluated the proposed method on the Reuters-21578 test collection and compared its results with those of the Latent Dirichlet Allocation (LDA) algorithm. The results are encouraging: the 3-gram graph method achieves much better Recall and F1 score than LDA, but worse Precision. Further changes that are expected to improve the results are identified.
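
The abstract does not give implementation details, so the following Python is only a minimal sketch of the general idea, not the authors' implementation. It assumes character 3-grams, a co-occurrence window of 3, a value-similarity-style graph comparison, and a farthest-first seeding step; the function names (`ngram_graph`, `value_similarity`, `seed_labels`, `kernel_kmeans`) and the four sample sentences are all hypothetical.

```python
from collections import Counter

def ngram_graph(text, n=3, window=3):
    """Map each undirected edge between character n-grams to its co-occurrence count."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, g in enumerate(grams):
        # Link each n-gram to the next `window` n-grams (assumed window size).
        for h in grams[i + 1:i + 1 + window]:
            edges[tuple(sorted((g, h)))] += 1
    return edges

def value_similarity(g1, g2):
    """Sum of min/max weight ratios over shared edges, normalised by the larger graph."""
    if not g1 or not g2:
        return 0.0
    shared = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in g1.keys() & g2.keys())
    return shared / max(len(g1), len(g2))

def seed_labels(K, k):
    """Assumed initialisation: farthest-first seeds, then assign to the most similar seed."""
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(K)) if i not in seeds]
        # Next seed: the document whose best similarity to any current seed is lowest.
        seeds.append(min(candidates, key=lambda i: max(K[i][s] for s in seeds)))
    return [max(range(k), key=lambda c: K[i][seeds[c]]) for i in range(len(K))]

def kernel_kmeans(K, labels, k, max_iter=50):
    """Lloyd-style kernel k-means on a precomputed similarity matrix K."""
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Mean pairwise similarity inside each cluster (third term of the kernel distance).
        within = [sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                d = K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Hypothetical toy corpus: two sentences about oil prices, two about football.
docs = [
    "crude oil prices rose sharply",
    "oil prices rose again today",
    "the football match ended in a draw",
    "a late goal won the football match",
]
graphs = [ngram_graph(d) for d in docs]
K = [[value_similarity(a, b) for b in graphs] for a in graphs]
labels = kernel_kmeans(K, seed_labels(K, 2), k=2)
print(labels)
```

A similarity matrix built this way is not guaranteed to be positive semi-definite, so the iteration is capped at `max_iter` rather than relying on convergence. Kernel k-means is also sensitive to its starting partition, which is why the sketch seeds the clusters instead of assigning initial labels arbitrarily.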

Cited By

  • (2017) User Behavior and Application Modeling in Decentralized Edge Cloud Infrastructures. Economics of Grids, Clouds, Systems, and Services, 193-203. DOI: 10.1007/978-3-319-68066-8_15. Online publication date: 7-Oct-2017.
  • (2016) Sentiment Analysis using Word-Graphs. Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics, 1-9. DOI: 10.1145/2912845.2912863. Online publication date: 13-Jun-2016.

Published In

PCI '14: Proceedings of the 18th Panhellenic Conference on Informatics
October 2014
355 pages
ISBN: 9781450328975
DOI: 10.1145/2645791
  • General Chairs: Katsikas Sokratis, Hatzopoulos Michael, Apostolopoulos Theodoros, Anagnostopoulos Dimosthenis
  • Program Chairs: Carayiannis Elias, Varvarigou Theodora, Nikolaidou Mara

In-Cooperation

  • Greek Computer Society
  • University of Piraeus
  • National and Kapodistrian University of Athens
  • Athens University of Economics and Business

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. N-gram graph
  2. Text clustering
  3. Graph comparison
  4. Graph partitioning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PCI '14

Acceptance Rates

PCI '14 Paper Acceptance Rate 51 of 102 submissions, 50%;
Overall Acceptance Rate 190 of 390 submissions, 49%
