Extracting abstract and keywords from context for academic articles

Ahmet Anıl Müngen¹ &
Mehmet Kaya¹


Abstract

Thousands of academic studies are published worldwide every year. When researchers search for a topic, they skim abstracts and keywords first. In many disciplines, authors supply both in their publications. However, publications in some disciplines, such as the social sciences, often contain neither keywords nor an abstract, and older publications in any discipline may also lack them. Search engines for academic publications usually match queries against keywords, abstracts and titles, so a missing abstract or keyword list makes it harder to return accurate search results and prevents researchers from reviewing the publication quickly. This study proposes a method that generates keywords and an abstract from the text of an academic publication. Previous studies have generally addressed this problem with k-NN and fuzzy CCG methods, but they have not examined the structure of words or used semantic analysis. In this study, each publication is divided into sections such as the introduction, the methodology and the references. Each section is graded differently, so the score of a word depends on the section in which it appears. NLP methods are then used to analyze the text and phrases and to remove prepositions and conjunctions. From the resulting data, keywords are generated using TF-IDF and the abstract is generated using the TextRank method. In this way, the method produces more accurate and more contextually relevant keywords and abstracts. The proposed method was tested on the Sobiad Academic Database, which is used by 72 universities in Turkey and covers more than 250,000 academic publications. Experimental results were measured with precision and F-measure and were found to be promising compared with previous studies on keyword derivation and abstract generation.
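
The keyword step described in the abstract (section-dependent word scores combined with TF-IDF) can be illustrated with a minimal sketch. This is not the authors' implementation: the section weights, the stop list, the smoothed IDF formula and the toy corpus are all assumptions made for illustration, since the abstract does not state the actual values or the exact scoring formula.

```python
import math
import re
from collections import Counter

# Hypothetical section weights: the abstract says sections are graded
# differently but does not give the actual values.
SECTION_WEIGHTS = {"introduction": 1.5, "methodology": 1.0, "references": 0.2}

# Tiny illustrative stop list standing in for the preposition/conjunction filter.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to",
              "with", "or", "but", "between", "we"}


def tokenize(text):
    """Lowercase, keep ASCII alphabetic tokens, drop stop words.

    Turkish text would need a Unicode-aware tokenizer instead of [a-z]+.
    """
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def weighted_term_counts(sections):
    """Sum term occurrences, each scaled by the weight of its section."""
    counts = Counter()
    for name, text in sections.items():
        weight = SECTION_WEIGHTS.get(name, 1.0)
        for token in tokenize(text):
            counts[token] += weight
    return counts


def inverse_document_frequency(corpus):
    """Smoothed IDF over a corpus of documents (each a dict of section texts)."""
    doc_freq = Counter()
    for sections in corpus:
        doc_freq.update(set(tokenize(" ".join(sections.values()))))
    n_docs = len(corpus)
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def top_keywords(sections, corpus, k=5):
    """Rank the terms of one document by section-weighted TF times IDF."""
    idf = inverse_document_frequency(corpus)
    unseen_idf = math.log(1 + len(corpus)) + 1.0  # term absent from the corpus
    counts = weighted_term_counts(sections)
    scores = {t: c * idf.get(t, unseen_idf) for t, c in counts.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus, invented for illustration only.
doc = {
    "introduction": "Trade between China and Turkey",
    "methodology": "We analyse trade statistics",
    "references": "Statistics handbook statistics yearbook statistics",
}
other_a = {"introduction": "History of philosophy",
           "methodology": "We analyse historical statistics"}
other_b = {"introduction": "Social sciences and statistics",
           "methodology": "We analyse interviews"}

print(top_keywords(doc, [doc, other_a, other_b], k=3))
```

In this toy run, "statistics" occurs four times in the document but mostly in the low-weight references section and in every corpus document, so it ranks below "trade", "china" and "turkey". That is the effect section grading is meant to achieve.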


Notes

http://ieeexplore.ieee.org/Xplorehelp/#/overview-of-ieee-xplore/about-ieee-xplore (date of access 11 Oct 2017).
http://atif.sobiad.com/istatistik (date of access 11 Oct 2017).
https://stanfordnlp.github.io/CoreNLP/ (date of access 11 Oct 2017).
https://github.com/ahmetaa/zemberek-nlp (date of access 11 Oct 2017).


Acknowledgements

This study was supported by TUBITAK under Grant no: 116E889. We would like to thank Sobiad for sharing their data and services.

Author information

Authors and Affiliations

Department of Computer Engineering, Fırat University, 23119, Elazig, Turkey
Ahmet Anıl Müngen & Mehmet Kaya

Authors

Ahmet Anıl Müngen
Mehmet Kaya

Corresponding author

Correspondence to Mehmet Kaya.

Appendix 1: Detailed results of Sect. 4

Title	Transnational Intersections: Rethinking Social Sciences with Jeff Hearn
Author’s keywords	Transnationalism, Intersectionism, Social Sciences, Jeff Hearn, Gex
Keywords of the proposed method	Hale Borak Boratav, old/new discrimination, Social Sciences, Transnationalism
Author’s abstract	Jeff Hearn, who conducted studies on the common areas of different cultures described as “transnational” in this article, sought to oversee the possibilities and difficulties of doing different, new, work trails and doing science from the tradition of making traditional binaries with the emphasis on “intersectionality”. The concepts were examined by giving examples from several different groups. In these examples, both Hearn’s work and the conclusions he has made are discussed. As an addendum and complement, the concept of “Gex”—the intersection of gender and sex, is studied through a feminist critique, some of which are based on traditional dualities. Because in the world we live on, both the concepts or phenomena separated by categorical, which in theory have sharp boundaries in practice, are left to the foreground approaches of blurred, singularity and pluralism. This was emphasized by discussing immigration and immigration issues, especially in the life practices of Syrian refugees living in our country. For a new social science practice, the main purpose of the article is to underline the above concepts
Abstract of the proposed method	In this context Hearn opposes the work of masculinities with borders and categories and believes that this work area should be called the field of Critical Studies on Men and Masculinities (CSMM) with a wide perspective covering all the different forms of men and masculinity. He wants to rethink social science with Jeff Hearn. The result is of course a much larger change in the diversity of those coming from Syria, which is a small part of the most visible and underlying mass. Transnational work is an example of a work that can be considered to be at an initial level, although Hearn’s new kind of demon, driven by the emphasis of transnationalism, could certainly be more ambitious
Title	China Factor and Its Reflections to Turkey’s Foreign Trade In Trade
Author’s keywords	International Trade, being Asian, China and Turkey
Keywords of the proposed method	China, Energy, Economics, Trade, Dollar
Author’s abstract	In this study, in 2001 China’s accession to the World Trade Organization and the resulting development of the “China factor” posed nude aims to examine the effects of international trade and Turkey’s foreign trade. In this article, a reflection of global trends in the production and foreign trade, especially the “China factor” and Asian patients to be dealt with and that Turkey’s exports to China, imports and foreign trade situation has been investigated. As a result, global phenomenon affecting Turkey’s foreign trade, international trade issues and compliance efforts into the international trading system was discussed and some proposals have been made to solve the problem
Abstract of the proposed method	The most important factors for failure to perform systematic studies for our exporters in China and the Asia-Pacific region include: Southeast Asia and Pacific trade is very strong, to be surrounded by major global trading nations and blocks of China, China between Turkey and enough of absence and bilateral investment relations of private trade agreements is undeveloped. This study aims to examine the impact of China’s accession to the World Trade Organization in 2001, with China and emerging factor in the development of international trade and the creation of Turkey’s foreign trade
Title	The Idea of Time and History in the Philosophy of Ibni Khaldun
Author’s keywords	Ibn Khaldun, History Philosophy, Umran Science, History
Keywords of the proposed method	Umran Science, History, Natural Entity, Chronology
Author’s abstract	Ibn Khaldun has given a new dimension to the history of history and understanding of time that has reached him within the process of historical development and has made a great impact on the emergence of philosophy of history as a new discipline. While history until that time was chronologically composed of the transmission and narration of historical events, Ibn Khaldun emphasizes the necessity of conceiving social events in the cause-effect relation, as we will see in the western philosophy of the new science and historical examples that he called “Umran Ilmi”. Instead of examining history in chronological order, Ibn Khaldun suggests the principles that uncover it, the study of causes. According to the general widespread understanding in the world of Islam, one of the greatest contributions of history to human beings is its inclusion of a spiritual education element. In this sense it is important to consider to the history
Abstract of the proposed method	According to Ibni Haldun, nervousness is a gathering of people from a generation who must have a power, might and superiority and gather around an ideal. Ibni Haldun refers to a group of solidarity and social integration and an ideal, to interlock with the influence of various factors. Ibni Haldun, who thinks that his child is different in different ages and different groups, indicates that the same siblings come together to form a power union for certain goals and ideals. Thinking reveals irritability as a principle and as a matter of fact in the original form of bedouin umran In this respect, Ibn Khaldun’s comparison with his previous and later thinkers allows him to see the dimensions of his philosophy, rather than as a contribution to a better understanding of his thoughts. As a whole of his work, which is shaped according to the law determined by God, the knowledge obtained from the historical perspective in the understanding of history differs from the other sciences in terms of the understanding of the laws as it is only the object from the epistemological point of view
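
As a worked illustration of how precision and F-measure could be computed for the keyword rows above, the following sketch compares generated keywords against the author's keywords. It assumes exact matching after lowercasing and whitespace normalization; the matching criterion actually used in the paper is not stated in this excerpt, and the function names are hypothetical.

```python
def normalize(keyword):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(keyword.lower().split())


def precision_recall_f1(predicted, reference):
    """Exact-match precision, recall and F1 between two keyword lists."""
    pred = {normalize(k) for k in predicted}
    ref = {normalize(k) for k in reference}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Keyword rows of the third example in the table above (Ibn Khaldun article).
author_keywords = ["Ibn Khaldun", "History Philosophy", "Umran Science", "History"]
method_keywords = ["Umran Science", "History", "Natural Entity", "Chronology"]

p, r, f = precision_recall_f1(method_keywords, author_keywords)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
# prints "precision=0.50 recall=0.50 F1=0.50"
```

Two of the four generated keywords ("Umran Science" and "History") appear among the four author keywords, which gives precision, recall and F1 of 0.5 each under this exact-match assumption. A looser criterion, such as partial or stemmed matching, would give different values.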

About this article

Cite this article

Müngen, A.A., Kaya, M. Extracting abstract and keywords from context for academic articles. Soc. Netw. Anal. Min. 8, 45 (2018). https://doi.org/10.1007/s13278-018-0524-z

Received: 14 February 2018
Revised: 15 June 2018
Accepted: 16 June 2018
Published: 25 June 2018
DOI: https://doi.org/10.1007/s13278-018-0524-z
