Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Correlation constraint shortest path over large multi-relation graphs

Published: 01 January 2019 Publication History

Abstract

Multi-relation graphs intuitively capture the heterogeneous correlations among real-world entities by allowing multiple types of relationships to be represented as entity-connecting edges, i.e., two entities could be correlated with more than one type of relationship. This is important in various applications such as social network analysis, ecology, and bio-informatics. Existing studies on these graphs usually consider an edge label constraint perspective, where each edge contains only one label and each edge is considered independently. For example, there are lines of research focusing on reachability between two vertices under a set of edge label constraints, or finding paths whose consecutive edge labels satisfy a user-specified logical expression. This is too restricted in real graphs, and in this work, we define a generic correlation constraint on multi-relation graphs from the perspective of vertex correlations, where a correlation can be defined recursively. Specifically, we formalize and investigate the shortest path problem over large multi-relation graphs in the presence of both necessity and denial constraints, which have various real applications. We show that it is nontrivial to apply conventional graph traversal algorithms (e.g., BFS or DFS) to address the challenge. To effectively reduce the search space, we propose a Hybrid Relation Encoding method, a.k.a. HyRE, to encode both topological and relation information in a compact way. We conduct extensive experiments over large real-world graphs to validate the effectiveness and efficiency of the proposed solution.

References

[1]
I. Abraham, D. Delling, A. V. Goldberg, and R. F. F. Werneck. Hierarchical hub labelings for shortest paths. In Proc. 20th Annual European Symp. on Algorithms, pages 24--35, 2012.
[2]
T. Akiba, Y. Iwata, and Y. Yoshida. Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 349--360. ACM, 2013.
[3]
H. V. N. Y. Angela Bonifati, George Fletcher. Querying graphs. In Synthesis Lectures on Data Management, Morgan Claypool, 2018.
[4]
R. Angles. The property graph database model. In Proceedings of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management, 2018.
[5]
C. Avery. Giraph: Large-scale graph processing infrastructure on hadoop. Proceedings of the Hadoop Summit. Santa Clara, 2011.
[6]
M. Baitaluk, X. Qian, S. Godbole, A. Raval, A. Ray, and A. Gupta. Pathsys: integrating molecular interaction graphs for systems biology. BMC Bioinformatics, 7(1):55, Feb 2006.
[7]
P. Barceló, C. A. Hurtado, L. Libkin, and P. T. Wood. Expressive languages for path queries over graph-structured data. In Proc. ACM SIGACT-SIGMOD Symp. on Principles of Database Systems, pages 3--14. ACM, 2010.
[8]
C. L. Barrett, K. R. Bisset, R. Jacob, G. Konjevod, and M. V. Marathe. Classical and contemporary shortest path problems in road networks: Implementation and experimental analysis of the TRANSIMS router. In Proc. 10th Annual European Symp. on Algorithms, pages 126--138, 2002.
[9]
H. Bast, S. Funke, D. Matijevic, P. Sanders, and D. Schultes. Proc. 9th workshop on algorithm eng. and experiments. In Proceedings of the Nine Workshop on Algorithm Engineering and Experiments, 2007.
[10]
S. Beamer, K. Asanovic, and D. A. Patterson. Direction-optimizing breadth-first search. Scientific Programming, 21(3--4):137--148, 2013.
[11]
F. Bonchi, A. Gionis, F. Gullo, and A. Ukkonen. Distance oracles in edge-labeled graphs. In Proc. 17th Int. Conf. on Extending Database Technology, pages 547--558, 2014.
[12]
D. Calvanese, G. D. Giacomo, M. Lenzerini, and M. Y. Vardi. Containment of conjunctive regular path queries with inverse. In KR, pages 176--185. Morgan Kaufmann, 2000.
[13]
A. Cardillo, J. Gómez-Gardeñes, M. Zanin, M. Romance, D. Papo, F. del Pozo, and S. Boccaletti. Emergence of network features from multiplexity. CoRR, abs/1212.2153, 2012.
[14]
M. Coscia, G. Rossetti, D. Pennacchioli, D. Ceccarelli, and F. Giannotti. You know because i know: a multidimensional network approach to human resources problem. In Advances in Social Networks Analysis and Mining, pages 434--441, 2013.
[15]
J. Dibbelt, T. Pajor, and D. Wagner. User-constrained multimodal route planning. ACM J. Experimental Algorithmics, 19(1), 2014.
[16]
E. W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269--271, 1959.
[17]
W. Fan, J. Li, S. Ma, N. Tang, and Y. Wu. Adding regular expressions to graph reachability and pattern queries. Front. Comput. Sci., 6(3):313--338, 2012.
[18]
D. Florescu, A. Y. Levy, and D. Suciu. Query containment for conjunctive queries with regular expressions. In Proc. ACM SIGACT-SIGMOD Symp. on Principles of Database Systems, pages 139--148. ACM Press, 1998.
[19]
R. Geisberger, M. N. Rice, P. Sanders, and V. J. Tsotras. Route planning with flexible edge restrictions. ACM J. Experimental Algorithmics, 17(1), 2012.
[20]
R. Geisberger, P. Sanders, D. Schultes, and D. Delling. Contraction hierarchies: Faster and simpler hierarchical routing in road networks. In Proc. Int. Workshop on Experimental and Efficient Algorithms, pages 319--333, 2008.
[21]
A. Gubichev, S. Bedathur, S. Seufert, and G. Weikum. Fast and accurate estimation of shortest paths in large graphs. In Proc. 19th ACM Int. Conf. on Information and Knowledge Management, pages 499--508. ACM, 2010.
[22]
P. E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Systems Science and Cybernetics, 4(2): 100--107, 1968.
[23]
M. S. Hassan, W. G. Aref, and A. M. Aly. Graph indexing for shortest-path finding over dynamic sub-graphs. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 1183--1197, 2016.
[24]
J. Håstad. Clique is hard to approximate within n<sup>1-epsilon</sup>. In FOCS, pages 627--636. IEEE Computer Society, 1996.
[25]
Janusgraph: Distributed graph database. http://janusgraph.org/, 2018.
[26]
A. Kanterakis, D. Kafetzopoulos, V. Moustakis, and G. Potamias. Mining gene expression profiles and gene regulatory networks: Identification of phenotype-specific molecular mechanisms. In Proc. Artificial Intelligence: Theories, Models and Applications, 5th Hellenic Conf. on AI, pages 97--109, 2008.
[27]
D. Lasalle and G. Karypis. Multi-threaded graph partitioning. In Proc. 27th IEEE Int. Parallel & Distributed Processing Symp., pages 225--236, 2013.
[28]
Y. Liang and P. Zhao. Similarity search in graph databases: A multi-layered indexing approach. In Proc. 33st Int. Conf. on Data Engineering, pages 783--794, 2017.
[29]
S. Ma, K. Feng, J. Li, H. Wang, G. Cong, and J. Huai. Proxies for shortest path and distance queries. IEEE Trans. Knowl. and Data Eng., 28(7):1835--1850, 2016.
[30]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 135--146. ACM, 2010.
[31]
M. Mao, J. Lu, G. Zhang, and J. Zhang. Multirelational social recommendations via multigraph ranking. IEEE Trans. Cybernetics, 47(12):4049--4061, 2017.
[32]
J. J. McAuley and J. Leskovec. Image labeling on a network: Using social-network metadata for image classification. In ECCV (4), volume 7575 of Lecture Notes in Computer Science, pages 828--841. Springer, 2012.
[33]
A. O. Mendelzon and P. T. Wood. Finding regular simple paths in graph databases. SIAM J. on Comput., 24(6):1235--1258, 1995.
[34]
Microsoft academic graph. https://www.openacademic.ai/oag/, 2015.
[35]
V. Rastogi, A. Machanavajjhala, L. Chitnis, and A. D. Sarma. Finding connected components in map-reduce in logarithmic rounds. In Proc. 29th Int. Conf. on Data Engineering, pages 50--61, 2013.
[36]
M. N. Rice and V. J. Tsotras. Graph indexing of road networks for shortest path queries with label restrictions. PVLDB, 4(2):69--80, 2010.
[37]
H. Samet, J. Sankaranarayanan, and H. Alborzi. Scalable network distance browsing in spatial databases. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 43--54, 2008.
[38]
A. Sealfon. Shortest paths and distances with differential privacy. In Proc. 35nd ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, pages 29--41, 2016.
[39]
P. Selmer, A. Poulovassilis, and P. T. Wood. Implementing flexible operators for regular path queries. In Proc. 18th Int. Conf. on Extending Database Technology, pages 149--156, 2015.
[40]
T. Shulong, B. Jiajun, C. Chun, and H. Xiaofei. Using rich social media information for music recommendation via hypergraph model. In ACM Trans. Multimedia Comp., Comm., and Appl., volume 7S, pages 213--237. 2011.
[41]
M. Stella, C. S. Andreazzi, S. Selakovic, A. Goudarzi, and A. Antonioni. Parasite spreading in spatial ecological multiplex networks. J. Complex Networks, 5(3):486--511, 2017.
[42]
D. Szklarczyk, A. Franceschini, S. Wyder, K. Forslund, D. Heller, J. Huerta-Cepas, M. Simonovic, A. Roth, A. Santos, K. P. Tsafou, M. Kuhn, P. Bork, L. J. Jensen, and C. von Mering. String v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Research, 43(D1):D447, 2014.
[43]
D. Szklarczyk, J. H. Morris, H. Cook, M. Kuhn, S. Wyder, M. Simonovic, A. Santos, N. T. Doncheva, A. Roth, P. Bork, L. J. Jensen, and C. von Mering. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Research, 45(Database-Issue):D362--D368, 2017.
[44]
L. Tang, X. Wang, and H. Liu. Community detection via heterogeneous interaction analysis. Data Min. Knowl. Discov., 25(1):1--33, 2012.
[45]
The neo4j graph platform. https://neo4j.com/, 2018.
[46]
L. D. J. Valstar, G. H. L. Fletcher, and Y. Yoshida. Landmark indexing for evaluation of label-constrained reachability queries. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 345--358, 2017.
[47]
S. Wang, Y. Tsai, T. Hong, and H. Kao. k-anonymization of multiple shortest paths. J. Soft Comput., 21(15):4215--4226, 2017.
[48]
S. Wang, Z. Tsai, T. Hong, and I. Ting. Anonymizing shortest paths on social network graphs. In Proc. 3rd Int. Conf. on Intell. Information and Database Syst., pages 129--136, 2011.
[49]
F. Wei. Tedi: Efficient shortest path query answering on graphs. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 99--110, 2010.
[50]
F. Wei. TEDI: efficient shortest path query answering on graphs. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 99--110, 2010.
[51]
D. Yan, J. Cheng, Y. Lu, and W. Ng. Blogel: A block-centric framework for distributed computation on real-world graphs. PVLDB, 7(14):1981--1992, 2014.

Cited By

View all
  • (2024)Efficient Betweenness Centrality Computation over Large Heterogeneous Information NetworksProceedings of the VLDB Endowment10.14778/3681954.368200617:11(3360-3372)Online publication date: 30-Aug-2024
  • (2022)Efficiently Answering Minimum Reachable Label Set Queries in Edge-Labeled GraphsProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557593(4585-4589)Online publication date: 17-Oct-2022
  • (2022)Multi-relation Graph SummarizationACM Transactions on Knowledge Discovery from Data10.1145/349456116:5(1-30)Online publication date: 9-Mar-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 12, Issue 5
January 2019
163 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 January 2019
Published in PVLDB Volume 12, Issue 5

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)15
  • Downloads (Last 6 weeks)2
Reflects downloads up to 13 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient Betweenness Centrality Computation over Large Heterogeneous Information NetworksProceedings of the VLDB Endowment10.14778/3681954.368200617:11(3360-3372)Online publication date: 30-Aug-2024
  • (2022)Efficiently Answering Minimum Reachable Label Set Queries in Edge-Labeled GraphsProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557593(4585-4589)Online publication date: 17-Oct-2022
  • (2022)Multi-relation Graph SummarizationACM Transactions on Knowledge Discovery from Data10.1145/349456116:5(1-30)Online publication date: 9-Mar-2022
  • (2022)Answering reachability and K-reach queries on large graphs with label constraintsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-021-00695-031:1(101-127)Online publication date: 1-Jan-2022
  • (2021)Accelerating Set Intersections over Graphs by Reducing-MergingProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining10.1145/3447548.3467219(2349-2359)Online publication date: 14-Aug-2021
  • (2020)Answering billion-scale label-constrained reachability queries within microsecondProceedings of the VLDB Endowment10.14778/3380750.338075313:6(812-825)Online publication date: 11-Mar-2020

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media