DOI: 10.1145/3508397.3564853

Learn2Sum: A New Approach to Unsupervised Text Summarization Based on Topic Modeling

Published: 08 December 2022

Abstract

Due to the enormous volume of data on the web, it is hard for users to retrieve effective and useful information in a reasonable time. It has therefore become necessary to generate brief summaries of large amounts of textual data tailored to the user profile. In this context, text summarization is used to identify the important information within text documents: it aims to produce shorter versions of the source text that retain only the relevant and salient information. In recent years, summarization techniques based on topic modeling have attracted considerable research interest, thanks to their ability to classify and understand large text corpora and to extract the important topics they contain. However, existing studies do not support personalization when generating summaries, because personalization requires knowing not only which documents are most helpful to a user, but also which topics and keywords are more or less related to that user's interests. Existing studies therefore lack adaptive user modeling for applications in the emerging areas of automatic summarization, topic modeling, and visualization. In this context, we propose a new approach to automated text summarization that is based on topic modeling and takes the user's profile into account: it semantically extracts the relevant topics of textual documents, summarizes the information according to the user's topics of interest, and finally visualizes the result as a hypergraph. Experiments have been conducted to measure the effectiveness of our solution against existing content-based summarization approaches, and the results show the superiority of our approach.
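The paper itself does not include code; for readers who want a concrete picture of the pipeline the abstract describes (topic extraction, scoring against a user's topic interests, extractive selection), the sketch below is a minimal, hypothetical illustration in Python, not the authors' Learn2Sum implementation. It uses scikit-learn's LDA, assumes the user profile is given as a weight vector over topics (`user_interest`), and omits the hypergraph visualization step; the function name `summarize` and the scoring scheme are assumptions made for illustration.

```python
# Minimal, hypothetical sketch of topic-model-driven, profile-aware extractive
# summarization (not the authors' Learn2Sum method). It fits LDA over the
# sentences, scores each sentence by how well its topic mixture matches a
# user-interest vector over topics, and keeps the top sentences in source order.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def summarize(sentences, user_interest, n_topics=5, top_k=3):
    # Bag-of-words representation of each sentence.
    bow = CountVectorizer(stop_words="english").fit_transform(sentences)

    # Per-sentence topic distributions, shape (len(sentences), n_topics).
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topics = lda.fit_transform(bow)

    # user_interest is a hypothetical profile: one weight per topic.
    scores = doc_topics @ np.asarray(user_interest)

    # Pick the best-scoring sentences, then restore their original order.
    chosen = sorted(np.argsort(scores)[::-1][:top_k])
    return " ".join(sentences[i] for i in chosen)
```

For example, with n_topics=5, a profile such as user_interest = [0.7, 0.1, 0.1, 0.05, 0.05] biases the summary toward sentences dominated by the first topic; the paper's approach additionally derives the profile from adaptive user modeling and visualizes the extracted topics through a hypergraph.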

Published In

MEDES '22: Proceedings of the 14th International Conference on Management of Digital EcoSystems
October 2022
172 pages
ISBN: 9781450392198
DOI: 10.1145/3508397
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. classification
  2. graph
  3. summarization
  4. text transformation
  5. topic modeling
  6. topics
  7. user profile

Qualifiers

  • Research-article

Conference

MEDES '22

Acceptance Rates

Overall Acceptance Rate 267 of 682 submissions, 39%
