DOI: 10.1145/3508397.3564853

Learn2Sum: A New Approach to Unsupervised Text Summarization Based on Topic Modeling

Published: 08 December 2022

Abstract

Due to the enormous volume of data on the web, it is hard for users to retrieve effective and useful information in a reasonable time. It has therefore become necessary to generate brief summaries of large amounts of textual data tailored to the user profile. In this context, text summarization is used to identify the important information within text documents: it aims to produce shorter versions of the source text that retain only the relevant and salient information. In recent years, summarization techniques based on topic modeling have attracted considerable research interest, thanks to their ability to classify and understand large text corpora and to extract the important topics they contain. However, existing studies do not support personalization when generating summaries, because personalization requires knowing not only which documents are most helpful to a user, but also which topics and keywords are more or less related to that user's interests. Existing studies therefore lack adaptive user modeling for applications in the emerging areas of automatic summarization, topic modeling, and visualization. In this context, we propose a new approach to automated text summarization that is based on topic modeling and takes the user's profile into account: it semantically extracts the relevant topics of textual documents, summarizes the information according to the user's topics of interest, and finally visualizes the result as a hypergraph. Experiments have been conducted to measure the effectiveness of our solution against existing content-based summarization approaches, and the results show the superiority of our approach.
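The paper itself does not include code; for readers who want a concrete picture of the pipeline the abstract describes (topic extraction, scoring against a user's topic interests, extractive selection), the sketch below is a minimal, hypothetical illustration in Python, not the authors' Learn2Sum implementation. It uses scikit-learn's LDA, assumes the user profile is given as a weight vector over topics (`user_interest`), and omits the hypergraph visualization step; the function name `summarize` and the scoring scheme are assumptions made for illustration.

```python
# Minimal, hypothetical sketch of topic-model-driven, profile-aware extractive
# summarization (not the authors' Learn2Sum method). It fits LDA over the
# sentences, scores each sentence by how well its topic mixture matches a
# user-interest vector over topics, and keeps the top sentences in source order.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def summarize(sentences, user_interest, n_topics=5, top_k=3):
    # Bag-of-words representation of each sentence.
    bow = CountVectorizer(stop_words="english").fit_transform(sentences)

    # Per-sentence topic distributions, shape (len(sentences), n_topics).
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topics = lda.fit_transform(bow)

    # user_interest is a hypothetical profile: one weight per topic.
    scores = doc_topics @ np.asarray(user_interest)

    # Pick the best-scoring sentences, then restore their original order.
    chosen = sorted(np.argsort(scores)[::-1][:top_k])
    return " ".join(sentences[i] for i in chosen)
```

For example, with n_topics=5, a profile such as user_interest = [0.7, 0.1, 0.1, 0.05, 0.05] biases the summary toward sentences dominated by the first topic; the paper's approach additionally derives the profile from adaptive user modeling and visualizes the extracted topics through a hypergraph.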

Published In

MEDES '22: Proceedings of the 14th International Conference on Management of Digital EcoSystems
October 2022
172 pages
ISBN: 9781450392198
DOI: 10.1145/3508397
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. classification
  2. graph
  3. summarization
  4. text transformation
  5. topic modeling
  6. topics
  7. user profile

Qualifiers

  • Research-article

Conference

MEDES '22

Acceptance Rates

Overall Acceptance Rate 267 of 682 submissions, 39%
