Extracting abstract and keywords from context for academic articles

  • Original Article
Social Network Analysis and Mining

Abstract

Thousands of academic studies are published around the world every year. Researchers searching for a topic typically skim abstracts and keywords first. In many academic disciplines, authors supply keywords and an abstract with their publications. In some disciplines, however, such as the social sciences, publications often lack keywords and abstracts, and older publications in any discipline may lack them as well. Search engines for academic publications usually work by checking keywords, abstracts and titles, so a missing abstract or keyword list makes it harder to return accurate search results and prevents researchers from reviewing a publication quickly. This study proposes a method for generating keywords and an abstract from the text of an academic publication. Previous studies have generally used k-NN and fuzzy CCG methods to solve this problem; however, they did not examine the structure of words or use semantic analysis. In this study, each publication is divided into sections such as the references, the introduction and the methodology. Each section is weighted differently, so that a word's score depends on the section in which it appears. NLP methods are then used to analyze the text and phrases and to remove prepositions and conjunctions. The resulting data are used to generate keywords with TF–IDF and to generate the abstract with the TextRank method. As a result, the generated keywords and abstracts are more accurate and contextually relevant. The proposed method was tested on the Sobiad Academic Database, which is used by 72 universities in Turkey and covers more than 250,000 academic publications. Experimental results were measured with precision and F-measure and were found to be promising compared with previous studies on keyword derivation and abstract generation.
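
The pipeline outlined in the abstract (section-dependent word scores, removal of function words, TF–IDF for keywords, TextRank for sentence selection) might look roughly like the following minimal sketch. It is not the authors' implementation. The section weights, the stop-word list, the smoothed IDF formula and all function names are assumptions chosen for illustration. The sentence similarity follows the word-overlap measure commonly associated with TextRank.

```python
import math
import re
from collections import Counter

# Hypothetical values: the section weights and stop words used in the paper
# are not given in this text.
SECTION_WEIGHTS = {"title": 3.0, "introduction": 1.5, "methodology": 1.0, "references": 0.2}
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "on", "for", "to", "with", "by", "from"}


def tokenize(text):
    """Lowercase the text, split on non-letters and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def weighted_tf(sections):
    """Term frequency in which each occurrence counts with its section's weight."""
    tf = Counter()
    for name, text in sections.items():
        weight = SECTION_WEIGHTS.get(name, 1.0)
        for word in tokenize(text):
            tf[word] += weight
    return tf


def keywords(doc, corpus, top_k=5):
    """Rank the words of `doc` by section-weighted TF times a smoothed IDF."""
    vocab_per_doc = [set(weighted_tf(d)) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for word, tf in weighted_tf(doc).items():
        df = sum(word in vocab for vocab in vocab_per_doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # assumed smoothing
        scores[word] = tf * idf
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over a word-overlap similarity graph."""
    tokens = [set(tokenize(s)) for s in sentences]
    n = len(sentences)

    def similarity(i, j):
        # Very short sentences are skipped to avoid a zero denominator.
        if i == j or len(tokens[i]) < 2 or len(tokens[j]) < 2:
            return 0.0
        overlap = len(tokens[i] & tokens[j])
        return overlap / (math.log(len(tokens[i])) + math.log(len(tokens[j])))

    sim = [[similarity(i, j) for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(sim[j][i] * scores[j] / out_sum[j] for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores
```

In this sketch, a word that appears only in a low-weight section such as the references contributes little to the keyword ranking, and an extractive abstract would be formed by taking the highest-scoring sentences from `textrank` in their original order.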

Notes

  1. http://ieeexplore.ieee.org/Xplorehelp/#/overview-of-ieee-xplore/about-ieee-xplore (date of access 11 Oct 2017).

  2. http://atif.sobiad.com/istatistik (date of access 11 Oct 2017).

  3. https://stanfordnlp.github.io/CoreNLP/ (date of access 11 Oct 2017).

  4. https://github.com/ahmetaa/zemberek-nlp (date of access 11 Oct 2017).

Acknowledgements

This study was supported by TUBITAK under Grant no: 116E889. We would like to thank Sobiad for sharing their data and services.

Author information

Corresponding author

Correspondence to Mehmet Kaya.

Appendix 1: Detail results of Sect. 4

Title

Transnational Intersections: Rethinking Social Sciences with Jeff Hearn

Author’s keywords

Transnationalism, Intersectionism, Social Sciences, Jeff Hearn, Gex

Keywords of the proposed method

Hale Borak Boratav, old/new discrimination, Social Sciences, Transnationalism

Author’s abstract

Jeff Hearn, who conducted studies on the common areas of different cultures described as “transnational” in this article, sought to oversee the possibilities and difficulties of doing different, new, work trails and doing science from the tradition of making traditional binaries with the emphasis on “intersectionality”. The concepts were examined by giving examples from several different groups. In these examples, both Hearn’s work and the conclusions he has made are discussed. As an addendum and complement, the concept of “Gex”—the intersection of gender and sex, is studied through a feminist critique, some of which are based on traditional dualities. Because in the world we live on, both the concepts or phenomena separated by categorical, which in theory have sharp boundaries in practice, are left to the foreground approaches of blurred, singularity and pluralism. This was emphasized by discussing immigration and immigration issues, especially in the life practices of Syrian refugees living in our country. For a new social science practice, the main purpose of the article is to underline the above concepts

Abstract of the proposed method

In this context Hearn opposes the work of masculinities with borders and categories and believes that this work area should be called the field of Critical Studies on Men and Masculinities (CSMM) with a wide perspective covering all the different forms of men and masculinity. He wants to rethink social science with Jeff Hearn. The result is of course a much larger change in the diversity of those coming from Syria, which is a small part of the most visible and underlying mass. Transnational work is an example of a work that can be considered to be at an initial level, although Hearn’s new kind of demon, driven by the emphasis of transnationalism, could certainly be more ambitious

Title

China Factor and Its Reflections to Turkey’s Foreign Trade In Trade

Author’s keywords

International Trade, being Asian, China and Turkey

Keywords of the proposed method

China, Energy, Economics, Trade, Dollar

Author’s abstract

In this study, in 2001 China’s accession to the World Trade Organization and the resulting development of the “China factor” posed nude aims to examine the effects of international trade and Turkey’s foreign trade. In this article, a reflection of global trends in the production and foreign trade, especially the “China factor” and Asian patients to be dealt with and that Turkey’s exports to China, imports and foreign trade situation has been investigated. As a result, global phenomenon affecting Turkey’s foreign trade, international trade issues and compliance efforts into the international trading system was discussed and some proposals have been made to solve the problem

Abstract of the proposed method

The most important factors for failure to perform systematic studies for our exporters in China and the Asia-Pacific region include: Southeast Asia and Pacific trade is very strong, to be surrounded by major global trading nations and blocks of China, China between Turkey and enough of absence and bilateral investment relations of private trade agreements is undeveloped. This study aims to examine the impact of China’s accession to the World Trade Organization in 2001, with China and emerging factor in the development of international trade and the creation of Turkey’s foreign trade

Title

The Idea of Time and History in the Philosophy of Ibni Khaldun

Author’s keywords

Ibn Khaldun, History Philosophy, Umran Science, History

Keywords of the proposed method

Umran Science, History, Natural Entity, Chronology

Author’s abstract

Ibn Khaldun has given a new dimension to the history of history and understanding of time that has reached him within the process of historical development and has made a great impact on the emergence of philosophy of history as a new discipline. While history until that time was chronologically composed of the transmission and narration of historical events, Ibn Khaldun emphasizes the necessity of conceiving social events in the cause-effect relation, as we will see in the western philosophy of the new science and historical examples that he called “Umran Ilmi”. Instead of examining history in chronological order, Ibn Khaldun suggests the principles that uncover it, the study of causes. According to the general widespread understanding in the world of Islam, one of the greatest contributions of history to human beings is its inclusion of a spiritual education element. In this sense it is important to consider to the history

Abstract of the proposed method

According to Ibni Haldun, nervousness is a gathering of people from a generation who must have a power, might and superiority and gather around an ideal. Ibni Haldun refers to a group of solidarity and social integration and an ideal, to interlock with the influence of various factors. Ibni Haldun, who thinks that his child is different in different ages and different groups, indicates that the same siblings come together to form a power union for certain goals and ideals. Thinking reveals irritability as a principle and as a matter of fact in the original form of bedouin umran

 In this respect, Ibn Khaldun’s comparison with his previous and later thinkers allows him to see the dimensions of his philosophy, rather than as a contribution to a better understanding of his thoughts. As a whole of his work, which is shaped according to the law determined by God, the knowledge obtained from the historical perspective in the understanding of history differs from the other sciences in terms of the understanding of the laws as it is only the object from the epistemological point of view
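
The keyword pairs listed in this appendix can be compared with precision and F-measure, the metrics named in the abstract. The sketch below assumes exact matching after lowercasing, which may differ from the matching criterion used in the paper. The function name is hypothetical. The example lists are copied from the first entry of this appendix.

```python
def precision_recall_f1(predicted, reference):
    """Exact-match precision, recall and F1 between two keyword lists."""
    pred = {k.strip().lower() for k in predicted}
    ref = {k.strip().lower() for k in reference}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Keywords from the first appendix entry.
author = ["Transnationalism", "Intersectionism", "Social Sciences", "Jeff Hearn", "Gex"]
proposed = ["Hale Borak Boratav", "old/new discrimination", "Social Sciences", "Transnationalism"]

p, r, f = precision_recall_f1(proposed, author)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Under this exact-match assumption, two of the four generated keywords match the author's five, giving a precision of 0.5, a recall of 0.4 and an F1 of about 0.444 for this single entry. These figures only illustrate the calculation and are not results reported by the study.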

Cite this article

Müngen, A.A., Kaya, M. Extracting abstract and keywords from context for academic articles. Soc. Netw. Anal. Min. 8, 45 (2018). https://doi.org/10.1007/s13278-018-0524-z

  • DOI: https://doi.org/10.1007/s13278-018-0524-z
