
Topic Modeling Techniques for Text Mining Over a Large-Scale Scientific and Biomedical Text Corpus

Published: 29 April 2022

Abstract

Topic models are efficient at extracting central themes from large-scale document collections, and topic modeling is an active research area. State-of-the-art techniques, namely Latent Dirichlet Allocation (LDA), the Correlated Topic Model (CTM), the Hierarchical Dirichlet Process (HDP), Dirichlet Multinomial Regression (DMR), and the Hierarchical Pachinko Allocation (HPA) model, are considered for comparison. Abstracts of articles were collected over different periods from the PubMed library using the keywords adolescence substance use and depression. Much research has been done in this area, and thousands of related articles are available on PubMed. Because the collection is so large, extracting information from it is very time-consuming. The extracted text was used to fit the topic models, and the fitted models were evaluated using both likelihood and non-likelihood measures. The models are compared on log-likelihood and perplexity, and topic coherence measures are used to evaluate the quality of the topics.
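As a rough illustration of the evaluation measures named in the abstract, the sketch below computes perplexity from a total log-likelihood and one possible coherence score for a topic's top words. It is not the authors' code: the function names, the toy corpus, and the choice of the UMass coherence variant are assumptions made for illustration, since the abstract does not say which coherence measure was used.

```python
import math


def perplexity(total_log_likelihood, n_tokens):
    """Perplexity = exp(-LL / N), where LL is the natural-log likelihood
    of a corpus containing N tokens. Lower is better."""
    return math.exp(-total_log_likelihood / n_tokens)


def umass_coherence(top_words, documents):
    """UMass coherence for one topic (an assumed choice of measure).

    top_words: the topic's words, ordered from most to least probable.
    documents: tokenized documents (lists of words).
    Assumes every top word occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            joint = doc_freq(top_words[i], top_words[j])
            score += math.log((joint + 1) / doc_freq(top_words[j]))
    return score


# Hypothetical toy corpus, loosely themed on the abstract's keywords.
toy_docs = [
    ["adolescent", "depression", "substance"],
    ["depression", "treatment"],
    ["adolescent", "substance", "use"],
    ["cannabis", "use", "adolescent"],
]
toy_topic = ["adolescent", "substance", "depression"]

toy_coherence = umass_coherence(toy_topic, toy_docs)
# A model that assigns every token probability 1/50 has perplexity 50.
toy_perplexity = perplexity(-1000 * math.log(50), 1000)
```

Perplexity is a per-token rescaling of the log-likelihood, so the two likelihood measures rank models fitted to the same corpus identically. Coherence depends only on word co-occurrence in the documents, which is why it serves as a non-likelihood check on topic quality.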


Cited By

  • (2024) A Two-Stage Multi-Modal Multi-Label Emotion Recognition Decision System Based on GCN. International Journal of Decision Support System Technology, 16:1 (1-17). 10.4018/IJDSST.352398. Online publication date: 17-Sep-2024


    Published In

    International Journal of Ambient Computing and Intelligence, Volume 13, Issue 1
    Nov 2022
    590 pages
    ISSN: 1941-6237
    EISSN: 1941-6245

    Publisher

    IGI Global

    United States


    Author Tags

    1. Correlated Topic Model
    2. Information Extraction
    3. Latent Dirichlet Allocation
    4. Machine Learning
    5. Text Mining
    6. Topic Modeling

    Qualifiers

    • Article
