DOI:10.1145/3323503.3349562
research-article

Semantic enrichment and exploration of open dataset tags

Published: 29 October 2019

Abstract

This paper proposes an approach for the semantic enrichment of dataset tags. Terms extracted from the dataset content are assigned as tags and associated with meaningful external resources, complementing the tags originally attributed to the dataset. An RDF summary graph is then generated to support dataset retrieval through exploration of the tag graph. The study is motivated by the need to improve dataset findability on Open Data Portals by generating a richer set of interlinked tags. The approach is divided into four main steps: cleaning; term extraction and ranking; linking to terms from associated ontologies or vocabularies; and summarization in graph form, which lets users explore tags and find other relevant datasets through tag connections. To support the process we developed the Relevant Tag Extractor (RTagE), a semi-automatic tool that extracts terms from a dataset, ranks them, and associates them with external resources. We exemplify the approach with datasets from a Web portal about the use of agrochemicals in agriculture, assigning enriched terms from the AGROVOC thesaurus as dataset tags.
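The four steps named in the abstract can be illustrated with a small sketch. This is not RTagE: the abstract does not say how terms are ranked or how AGROVOC is queried, so the sketch assumes raw term frequency for ranking and a tiny in-memory vocabulary for linking. The stop-word list, the `example.org` URIs, the `hasTag` predicate, and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop words and vocabulary (assumptions for illustration);
# a real run would look terms up in a thesaurus such as AGROVOC.
STOP_WORDS = {"the", "of", "in", "and", "a", "to", "on", "for", "by"}
VOCABULARY = {
    "pesticides": "http://example.org/vocab/pesticides",
    "soil": "http://example.org/vocab/soil",
}
HAS_TAG = "http://example.org/hasTag"  # hypothetical predicate


def clean(text):
    """Step 1: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def rank_terms(tokens, top_n=3):
    """Step 2: rank terms by raw frequency (a stand-in ranking method)."""
    return [term for term, _ in Counter(tokens).most_common(top_n)]


def link_terms(terms, vocabulary):
    """Step 3: keep only terms that match an external vocabulary entry."""
    return {t: vocabulary[t] for t in terms if t in vocabulary}


def summarize(dataset_links):
    """Step 4: build a summary graph as (dataset, predicate, tag) triples."""
    triples = []
    for dataset_uri, links in dataset_links.items():
        for tag_uri in sorted(links.values()):
            triples.append((dataset_uri, HAS_TAG, tag_uri))
    return triples


def related_datasets(triples, dataset_uri):
    """Explore the graph: other datasets sharing at least one tag."""
    tags = {o for s, _, o in triples if s == dataset_uri}
    return sorted({s for s, _, o in triples if o in tags and s != dataset_uri})


# Toy dataset contents (invented for the example).
contents = {
    "http://example.org/dataset/1": "Pesticides in soil: pesticides sales by region",
    "http://example.org/dataset/2": "Soil quality measurements",
}
links = {
    uri: link_terms(rank_terms(clean(text)), VOCABULARY)
    for uri, text in contents.items()
}
graph = summarize(links)
print(related_datasets(graph, "http://example.org/dataset/1"))
```

In this toy run the two datasets are connected through the shared "soil" tag, which is the kind of tag-to-dataset exploration the summary graph is meant to support.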


Cited By

  • (2020) Characterization and Analysis of Open Brazilian Judiciary Data. Proceedings of the Brazilian Symposium on Multimedia and the Web, 317-324. DOI:10.1145/3428658.3430979. Online publication date: 30-Nov-2020.


Published In

WebMedia '19: Proceedings of the 25th Brazilian Symposium on Multimedia and the Web
October 2019
537 pages
ISBN:9781450367639
DOI:10.1145/3323503

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. information retrieval
  2. keyword extraction
  3. open data
  4. semantic tag


Conference

WebMedia '19
WebMedia '19: Brazilian Symposium on Multimedia and the Web
October 29 - November 1, 2019
Rio de Janeiro, Brazil

Acceptance Rates

Overall Acceptance Rate 270 of 873 submissions, 31%
