Abstract
Topic detection is one of the automated methods for textual data analysis. Fuzzy C-Means is a soft clustering method that can be used for topic detection. However, textual data is usually high dimensional, which causes Fuzzy C-Means to fail at topic detection. One approach to overcoming this problem is to transform the textual data into a lower dimensional space, identify the memberships of the textual data in clusters there, and use these memberships to generate topics from the high dimensional textual data in the original space. In this paper, we apply Fuzzy C-Means in a lower dimensional space for topic detection on Indonesian online news. Our simulations show that, in terms of topic interpretability measured by coherence values, Fuzzy C-Means achieves accuracies comparable to those of nonnegative matrix factorization and better than those of latent Dirichlet allocation.
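The three steps named in the abstract (reduce the dimension, cluster with Fuzzy C-Means, map the memberships back to the original space) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say which transformation is used, so the truncated SVD here is an assumption, as are the fuzzifier `m = 2`, the toy document-term matrix, and the hypothetical helper names `fuzzy_c_means` and `detect_topics`.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns memberships of shape (n_samples, n_clusters)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)           # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)           # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U

def detect_topics(A, n_topics, n_components, m=2.0, top_n=3):
    """A is a nonnegative documents-by-terms matrix (e.g. TF-IDF weights)."""
    # Step 1 (assumption): project documents with a truncated SVD.
    U_svd, s, _ = np.linalg.svd(A, full_matrices=False)
    Z = U_svd[:, :n_components] * s[:n_components]
    # Step 2: fuzzy memberships are computed in the reduced space.
    memberships = fuzzy_c_means(Z, n_topics, m=m)
    # Step 3: topics are membership-weighted centroids in the ORIGINAL term space.
    W = memberships ** m
    topics = (W.T @ A) / W.sum(axis=0)[:, None]
    top_terms = np.argsort(-topics, axis=1)[:, :top_n]
    return memberships, topics, top_terms

# Toy data (made up for illustration): two groups of documents
# that use disjoint vocabularies.
A = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 3, 1, 1],
    [0, 0, 0, 1, 3, 1],
    [0, 0, 0, 1, 1, 3],
], dtype=float)

memberships, topics, top_terms = detect_topics(A, n_topics=2, n_components=2)
print(np.round(memberships, 3))
print(top_terms)
```

Each row of `topics` is a weight vector over the full vocabulary, so the highest-weighted terms can be read off as the topic's keywords even though the clustering itself ran in only `n_components` dimensions.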
Acknowledgment
This work was supported by Universitas Indonesia under the PIT 9 2019 grant. Any opinions, findings, and conclusions or recommendations are the authors’ and do not necessarily reflect those of the sponsor.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nugraha, P., Rifky Yusdiansyah, M., Murfi, H. (2019). Fuzzy C-Means in Lower Dimensional Space for Topics Detection on Indonesian Online News. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2019. Communications in Computer and Information Science, vol 1071. Springer, Singapore. https://doi.org/10.1007/978-981-32-9563-6_28
DOI: https://doi.org/10.1007/978-981-32-9563-6_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9562-9
Online ISBN: 978-981-32-9563-6
eBook Packages: Computer Science (R0)