DOI: 10.1145/2484712.2484715
research-article

LOP: capturing and linking open provenance on LOD cycle

Published: 23 June 2013

Abstract

The Web of Data has emerged as a means to expose, share, reuse, and connect information on the Web, identified by URIs and using RDF as a data model, following the Linked Data principles. However, the reuse of third-party data can be compromised without proper data quality assessment. In this context, important questions emerge: How can one trust published data and links? Which manipulation, modification, and integration operations were applied to the data before its publication? What is the nature of the comparisons or transformations applied to data during the interlinking process? In this scenario, provenance becomes a fundamental element. In this paper, we describe an approach for generating and capturing Linked Open Provenance (LOP) to support data quality and trustworthiness assessments. The approach covers the stages from preparation and format transformation of traditional data sources up to dataset publication and interlinking. The proposed architecture uses provenance agents, orchestrated by an ETL workflow, to collect provenance at any specified level and to link it with the corresponding data. We also describe a real use case in which the architecture was implemented to evaluate the proposal.
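
To make the idea of a provenance agent attached to an ETL workflow more concrete, the following is a minimal Python sketch and not the architecture described in the paper. It assumes a hypothetical `ProvenanceAgent` class that wraps each ETL step, identifies datasets by a content hash, and emits N-Triples-style lines. The W3C PROV-O property names (`prov:used`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`) are an assumption chosen for illustration, since the abstract does not say which vocabulary the authors use, and the `http://example.org/lop/` namespace, the step functions, and the sample rows are likewise invented.

```python
import hashlib
import json
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"   # W3C PROV-O namespace (assumed vocabulary)
BASE = "http://example.org/lop/"      # hypothetical namespace for this sketch


def entity_uri(data):
    """Derive a stable URI for a dataset from a hash of its content."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return f"{BASE}entity/{hashlib.sha256(payload).hexdigest()[:12]}"


class ProvenanceAgent:
    """Hypothetical agent: runs ETL steps and records one activity per run."""

    def __init__(self):
        self.triples = []
        self._runs = 0

    def _add(self, subject, predicate, obj, literal=False):
        # URIs go in angle brackets; literals are quoted via json.dumps.
        rendered = json.dumps(obj) if literal else f"<{obj}>"
        self.triples.append(f"<{subject}> <{predicate}> {rendered} .")

    def run_step(self, name, func, data):
        """Execute one step and link its output entity to its input entity."""
        self._runs += 1
        activity = f"{BASE}activity/{self._runs}-{name}"
        started = datetime.now(timezone.utc).isoformat()
        result = func(data)
        ended = datetime.now(timezone.utc).isoformat()
        source, target = entity_uri(data), entity_uri(result)
        self._add(activity, PROV + "used", source)
        self._add(target, PROV + "wasGeneratedBy", activity)
        self._add(target, PROV + "wasDerivedFrom", source)
        self._add(activity, PROV + "startedAtTime", started, literal=True)
        self._add(activity, PROV + "endedAtTime", ended, literal=True)
        return result


def clean(rows):
    """Example step: trim whitespace from the 'name' field."""
    return [{**row, "name": row["name"].strip()} for row in rows]


def dedupe(rows):
    """Example step: keep the first row for each distinct name."""
    seen, kept = set(), []
    for row in rows:
        if row["name"] not in seen:
            seen.add(row["name"])
            kept.append(row)
    return kept


agent = ProvenanceAgent()
raw = [{"name": " Rio "}, {"name": "Rio"}]
cleaned = agent.run_step("clean", clean, raw)
final = agent.run_step("dedupe", dedupe, cleaned)

for line in agent.triples:
    print(line)
```

Because each dataset URI is derived from its content, the provenance statements can be linked to the data they describe without changing the ETL steps themselves. A real deployment would more likely use an RDF library and the identifiers of the published dataset rather than hashes.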




Published In

SWIM '13: Proceedings of the Fifth Workshop on Semantic Web Information Management
June 2013
50 pages
ISBN:9781450321945
DOI:10.1145/2484712
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ETL
  2. data quality
  3. interoperability
  4. linked data
  5. linked open data
  6. provenance

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'13

Cited By

  • (2021) A Vocabulary for Describing Mapping Quality Assessment, Refinement and Validation. 2021 IEEE 15th International Conference on Semantic Computing (ICSC), 425-430. DOI: 10.1109/ICSC50631.2021.00076. Online publication date: Jan-2021.
  • (2020) Provenance-Aware Knowledge Representation: A Survey of Data Models and Contextualized Knowledge Graphs. Data Science and Engineering. DOI: 10.1007/s41019-020-00118-0. Online publication date: 8-May-2020.
  • (2019) Enhancing Open Government Data With Data Provenance. Proceedings of the 11th International Conference on Management of Digital EcoSystems, 142-149. DOI: 10.1145/3297662.3365791. Online publication date: 12-Nov-2019.
  • (2016) Towards multi-user provenance tracking of visual analysis workflows over multiple applications. Proceedings of the EuroVis Workshop on Reproducibility, Verification, and Validation in Visualization, 23-27. DOI: 10.5555/3057025.3057034. Online publication date: 6-Jun-2016.
  • (2016) Semantic Enrichment for Local Search Engine using Linked Open Data. Proceedings of the 25th International Conference Companion on World Wide Web, 631-634. DOI: 10.1145/2872518.2890481. Online publication date: 11-Apr-2016.
  • (2014) Collecting cloud provenance metadata with Matriohska. Proceedings of the 29th Annual ACM Symposium on Applied Computing, 351-356. DOI: 10.1145/2554850.2555066. Online publication date: 24-Mar-2014.
