Bisimulation reduction of big graphs on mapreduce

Y Luo, Y de Lange, GHL Fletcher, P De Bra… - Big Data: 29th British …, 2013 - Springer
Y Luo, Y de Lange, GHL Fletcher, P De Bra, J Hidders, Y Wu
Big Data: 29th British National Conference on Databases, BNCOD 2013, Oxford …, 2013Springer
Computing the bisimulation partition of a graph is a fundamental problem which plays a key
role in a wide range of basic applications. Intuitively, two nodes in a graph are bisimilar if
they share basic structural properties such as labeling and neighborhood topology. In data
management, reducing a graph under bisimulation equivalence is a crucial step, eg, for
indexing the graph for efficient query processing. Often, graphs of interest in the real world
are massive; examples include social networks and linked open data. For analytics on such …
Abstract
Computing the bisimulation partition of a graph is a fundamental problem which plays a key role in a wide range of basic applications. Intuitively, two nodes in a graph are bisimilar if they share basic structural properties such as labeling and neighborhood topology. In data management, reducing a graph under bisimulation equivalence is a crucial step, e.g., for indexing the graph for efficient query processing. Often, graphs of interest in the real world are massive; examples include social networks and linked open data. For analytics on such graphs, it is becoming increasingly infeasible to rely on in-memory or even I/O-efficient solutions. Hence, a trend in Big Data analytics is the use of distributed computing frameworks such as MapReduce. While there are both internal and external memory solutions for efficiently computing bisimulation, there is, to our knowledge, no effective MapReduce-based solution for bisimulation. Motivated by these observations we propose in this paper the first efficient MapReduce-based algorithm for computing the bisimulation partition of massive graphs. We also detail several optimizations for handling the data skew which often arises in real-world graphs. The results of an extensive empirical study are presented which demonstrate the effectiveness and scalability of our solution.
Springer
Showing the best result for this search. See all results