Fuzzy C-Means in Lower Dimensional Space for Topics Detection on Indonesian Online News

  • Conference paper
Data Mining and Big Data (DMBD 2019)

Abstract

Topic detection is one of the automated methods for textual data analysis. Fuzzy C-Means is a soft clustering-based method for topic detection. Textual data is usually high-dimensional, which causes Fuzzy C-Means to fail for topic detection. One approach to overcome this problem is to transform the textual data into a lower-dimensional space, identify the memberships of the textual data in clusters there, and use these memberships to generate topics from the high-dimensional textual data in the original space. In this paper, we apply Fuzzy C-Means in a lower-dimensional space for topic detection on Indonesian online news. Our simulations show that Fuzzy C-Means gives accuracies comparable to nonnegative matrix factorization and better than latent Dirichlet allocation, where accuracy is measured by topic interpretability in the form of coherence values.
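The pipeline described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: truncated SVD stands in for the transformation into the lower-dimensional space, the clustering is the standard Fuzzy C-Means update with fuzzifier m = 2, and topics are taken as membership-weighted centroids of the original term vectors. The function names, the toy vocabulary, and the term counts are hypothetical and chosen only for illustration.

```python
import numpy as np


def reduce_dimension(X, k):
    """Project documents (rows of X) onto the top-k singular directions.

    Assumption: truncated SVD is used as the lower-dimensional transform.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def fuzzy_c_means(Z, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM on the rows of Z; returns an (n_docs, c) membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((Z.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)           # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ Z) / W.sum(axis=0)[:, None]
        # squared distance from every document to every cluster center
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)              # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U


def topics_in_original_space(X, U, vocab, m=2.0, top_n=3):
    """Build topics as membership-weighted centroids of the original vectors.

    Assumption: the top_n heaviest terms of each centroid represent the topic.
    """
    W = U ** m
    centroids = (W.T @ X) / W.sum(axis=0)[:, None]
    order = np.argsort(-centroids, axis=1)[:, :top_n]
    return [[vocab[j] for j in row] for row in order]


# Hypothetical toy corpus: 6 documents over 6 terms, two obvious themes.
vocab = ["election", "vote", "party", "match", "goal", "league"]
X = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
], dtype=float)

Z = reduce_dimension(X, k=2)                    # cluster in 2 dimensions
memberships = fuzzy_c_means(Z, c=2)             # soft memberships per document
topics = topics_in_original_space(X, memberships, vocab)
print(topics)
```

Clustering runs only on the reduced matrix `Z`, while the topic words are read from the full term matrix `X`, which mirrors the separation between the two spaces described above. In practice `X` would typically be a TF-IDF matrix rather than raw counts.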

Acknowledgment

This work was supported by Universitas Indonesia under the PIT 9 2019 grant. Any opinions, findings, and conclusions or recommendations are the authors’ and do not necessarily reflect those of the sponsor.

Corresponding author

Correspondence to Hendri Murfi.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Nugraha, P., Rifky Yusdiansyah, M., Murfi, H. (2019). Fuzzy C-Means in Lower Dimensional Space for Topics Detection on Indonesian Online News. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2019. Communications in Computer and Information Science, vol 1071. Springer, Singapore. https://doi.org/10.1007/978-981-32-9563-6_28

  • DOI: https://doi.org/10.1007/978-981-32-9563-6_28

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-32-9562-9

  • Online ISBN: 978-981-32-9563-6

  • eBook Packages: Computer Science, Computer Science (R0)
