Topic Modeling Techniques for Text Mining Over a Large-Scale Scientific and Biomedical Text Corpus

Published: 29 April 2022

Abstract

Topic models extract the central themes from large-scale document collections efficiently, and topic modeling remains an active research area. This study compares state-of-the-art techniques: Latent Dirichlet Allocation (LDA), the Correlated Topic Model (CTM), the Hierarchical Dirichlet Process (HDP), Dirichlet Multinomial Regression (DMR), and the Hierarchical Pachinko Allocation (HPA) model. Abstracts of articles published over different periods were collected from the PubMed library using the keywords adolescence substance use and depression. Research on this subject is extensive, with thousands of articles available on PubMed, so extracting information from such a large collection is very time-consuming. The extracted text was used to fit the topic models, and the fitted models were evaluated using both likelihood-based and non-likelihood measures. The models are compared on log-likelihood and perplexity, and topic coherence measures are used to evaluate the quality of the topics.
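To make the two kinds of evaluation measure concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes perplexity is computed as the exponential of the negative per-token log-likelihood, and it uses the UMass formulation as one possible topic coherence measure; the abstract does not say which coherence measure was used. The function names and the toy corpus are hypothetical and serve only as illustration.

```python
import math


def perplexity(total_log_likelihood, n_tokens):
    """Likelihood-based measure: exp(-LL / N). Lower values are better."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return math.exp(-total_log_likelihood / n_tokens)


def umass_coherence(top_words, documents):
    """Non-likelihood measure (assumed here: UMass coherence).

    top_words: a topic's top words, ordered most probable first.
    documents: tokenized documents (lists of words).
    Sums log((D(w_i, w_j) + 1) / D(w_j)) over pairs with j < i, where D
    counts the documents containing all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # skip words that never occur in the corpus
            co_occurrence = doc_freq(top_words[i], top_words[j])
            score += math.log((co_occurrence + 1) / d_j)
    return score


# Hypothetical toy corpus, for illustration only.
toy_docs = [
    ["adolescent", "substance", "use", "depression"],
    ["adolescent", "depression", "treatment"],
    ["substance", "use", "alcohol"],
    ["topic", "model", "perplexity"],
]

coherent_topic = umass_coherence(["adolescent", "depression"], toy_docs)
incoherent_topic = umass_coherence(["adolescent", "perplexity"], toy_docs)
print(coherent_topic > incoherent_topic)
```

Words that co-occur in the same documents yield a higher (less negative) coherence score than words that never appear together, which is why coherence can serve as a proxy for topic quality independently of the model likelihood.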


Published In

International Journal of Ambient Computing and Intelligence, Volume 13, Issue 1
Nov 2022
590 pages
ISSN: 1941-6237
EISSN: 1941-6245

Publisher

IGI Global, United States

Author Tags

1. Correlated Topic Model
2. Information Extraction
3. Latent Dirichlet Allocation
4. Machine Learning
5. Text Mining
6. Topic Modeling
