Towards scalable RDF graph analytics on MapReduce

Published: 26 April 2010

Abstract

To exploit the growing amount of RDF data in decision-making, there is an increasing demand for analytics-style processing of such data. RDF data is modeled as a labeled graph that represents a collection of binary relations (triples). In this context, analytical queries can be interpreted as consisting of three main constructs: pattern matching, grouping, and aggregation. Unlike in traditional OLAP systems, where data is suitably organized, these queries require several join operations to reassemble the triples into the n-ary relations relevant to the given query. MapReduce-based parallel processing systems such as Pig have been successful at processing analytical workloads at scale. However, these systems offer only relational-algebra-style operators, which require an iterative n-tuple reassembly process in which intermediate results must be materialized. This leads to high I/O costs that negatively impact performance. In this paper, we propose UDFs that (i) refactor analytical processing on RDF graphs in a way that enables more parallelized processing and (ii) perform look-ahead processing to reduce the cost of subsequent operators in the query execution plan. These functions have been integrated into the Pig Latin function library, and the experimental results show up to 50% improvement in execution times for certain classes of queries. An important impact of this work is that it could serve as the foundation for additional physical operators in systems such as Pig for more efficient graph processing.
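
The three constructs named in the abstract can be illustrated with a minimal in-memory sketch. It is not the paper's UDFs and not Pig Latin. The toy triples, the property names (`product`, `price`, `vendor`), and the helper functions are all hypothetical, chosen only to show how binary relations are reassembled into n-tuples by joins before grouping and aggregation can run.

```python
from collections import defaultdict

# Hypothetical toy data: each RDF triple is (subject, property, object).
TRIPLES = [
    ("offer1", "product", "p1"),
    ("offer1", "vendor", "v1"),
    ("offer1", "price", 10.0),
    ("offer2", "product", "p1"),
    ("offer2", "vendor", "v2"),
    ("offer2", "price", 14.0),
    ("offer3", "product", "p2"),
    ("offer3", "vendor", "v1"),
    ("offer3", "price", 7.0),
    ("offer4", "product", "p2"),  # no price triple: dropped by the join
]


def select(triples, prop):
    """Return the binary relation (subject, object) for one property."""
    return [(s, o) for s, p, o in triples if p == prop]


def join_on_subject(rows, pairs):
    """Extend each n-tuple in rows (subject in field 0) with matching objects."""
    index = defaultdict(list)
    for s, o in pairs:
        index[s].append(o)
    return [row + (o,) for row in rows for o in index.get(row[0], [])]


def average_price_per_product(triples):
    # Pattern matching: one join per additional property in the pattern.
    rows = select(triples, "product")                        # (offer, product)
    rows = join_on_subject(rows, select(triples, "price"))   # (offer, product, price)

    # Grouping: collect prices under each product.
    groups = defaultdict(list)
    for _offer, product, price in rows:
        groups[product].append(price)

    # Aggregation: average price per group.
    return {product: sum(prices) / len(prices) for product, prices in groups.items()}


if __name__ == "__main__":
    print(average_price_per_product(TRIPLES))
```

In this sketch every extra property in the pattern costs one more call to `join_on_subject`, and each call produces a full intermediate result. That is the iterative reassembly the abstract describes; on a MapReduce platform those intermediate results would have to be materialized between steps.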



Published In

MDAC '10: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud
April 2010
53 pages
ISBN: 9781605589916
DOI: 10.1145/1779599
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. MapReduce
  2. Pig Latin
  3. RDF analytics

Qualifiers

  • Research-article

Conference

MDAC '10

Cited By

  • (2024) Unifying Faceted Search and Analytics over RDF Knowledge Graphs. Knowledge and Information Systems 66:7 (3921-3958). DOI: 10.1007/s10115-024-02076-9. Online publication date: 24-Mar-2024
  • (2023) A Brief Survey of Methods for Analytics over RDF Knowledge Graphs. Analytics 2:1 (55-74). DOI: 10.3390/analytics2010004. Online publication date: 17-Jan-2023
  • (2020) RDF Reasoning on Large Ontologies: A Study on Cultural Heritage and Wikidata. Artificial Intelligence Applications and Innovations (381-393). DOI: 10.1007/978-3-030-49161-1_32. Online publication date: 29-May-2020
  • (2018) Distributed RDF Query Processing. Linked Data (51-83). DOI: 10.1007/978-3-319-73515-3_4. Online publication date: 2-Mar-2018
  • (2017) Non-native RDF Storage Engines. Handbook of Big Data Technologies (339-364). DOI: 10.1007/978-3-319-49340-4_10. Online publication date: 26-Feb-2017
  • (2015) Design and Implementation of Data Management Scheme to Enable Efficient Analysis of Sensing Data. Proceedings of the 2015 IEEE International Conference on Autonomic Computing (319-324). DOI: 10.1109/ICAC.2015.58. Online publication date: 7-Jul-2015
  • (2015) Data Management Scheme to Enable Efficient Analysis of Sensing Data for Smart Community. Proceedings of the 2015 IEEE 39th Annual Computer Software and Applications Conference - Volume 03 (182-187). DOI: 10.1109/COMPSAC.2015.233. Online publication date: 1-Jul-2015
  • (2015) A Cloud-Based, Geospatial Linked Data Management System. Transactions on Large-Scale Data- and Knowledge-Centered Systems XX (59-89). DOI: 10.1007/978-3-662-46703-9_3. Online publication date: 18-Mar-2015
  • (2014) Classification of knowledge processing by MapReduce. 2014 4th International Symposium ISKO-Maghreb: Concepts and Tools for knowledge Management (ISKO-Maghreb) (1-8). DOI: 10.1109/ISKO-Maghreb.2014.7033463. Online publication date: Nov-2014
  • (2013) Efficient Query Processing of Semantic Data Using Graph Contraction on RDBMS. Proceedings of the 2013 International Conference on Signal-Image Technology & Internet-Based Systems (958-965). DOI: 10.1109/SITIS.2013.155. Online publication date: 2-Dec-2013
