Abstract
Graph data keep growing over time in real life. The ever-growing volume of dynamic graph data demands efficient techniques for incremental graph computation. However, incremental graph algorithms are challenging to develop. Existing approaches usually require users to manually design nontrivial incremental operators, or to choose among memoization strategies tailored to specific types of computation, limiting their usability and generality. In light of these challenges, we propose \(\textsf{Ingress}\), an automated system for incremental graph processing. \(\textsf{Ingress}\) is able to deduce the incremental counterpart of a batch vertex-centric algorithm, without requiring users to redesign logic or data structures. Underlying \(\textsf{Ingress}\) is an automated incrementalization framework equipped with four different memoization policies, to support all kinds of vertex-centric computations with optimized memory utilization. We identify sufficient conditions for the applicability of these policies, and \(\textsf{Ingress}\) automatically chooses the best-fit policy for a given algorithm by verifying these conditions. Beyond its ease of use and generality, \(\textsf{Ingress}\) outperforms state-of-the-art incremental graph systems by \(12.14\times \) on average (up to \(49.23\times \)) in efficiency.
Notes
This is essentially the VAD-Reset approach in [50].
References
Acar, U.A.: Self-adjusting computation. Ph.D. thesis, CMU (2005)
Baluja, S., Seth, R., Sivakumar, D., Jing, Y., Yagnik, J., Kumar, S., Ravichandran, D., Aly, M.: Video suggestion and discovery for youtube: taking random walks through the view graph. In: WWW, pp. 895–904 (2008)
Bang-Jensen, J., Gutin, G.Z.: Digraphs: Theory, Algorithms and Applications, 2nd edn. Springer, London (2009)
Cai, Y., Giarrusso, P.G., Rendel, T., Ostermann, K.: A theory of changes for higher-order languages: incrementalizing \(\lambda \)-calculi by static differentiation. In: PLDI, pp. 145–155 (2014)
Chang, X., Liu, X., Wen, J., Li, S., Fang, Y., Song, L., Qi, Y.: Continuous-time dynamic graph learning via neural interaction processes. In: CIKM, pp. 145–154 (2020)
Europe-OSM. https://www.cise.ufl.edu/research/sparse/matrices/DIMACS10/europe_osm.html (2010)
Fan, W., Hu, C., Tian, C.: Incremental graph computations: Doable and undoable. In: SIGMOD, pp. 155–169 (2017)
Fan, W., Liu, M., Tian, C., Xu, R., Zhou, J.: Incrementalization of graph partitioning algorithms. PVLDB 13(8), 1261–1274 (2020)
Fan, W., Tian, C., Xu, R., Yin, Q., Yu, W., Zhou, J.: Incrementalizing graph algorithms. In: SIGMOD, pp. 459–471 (2022)
Fan, W., Xu, J., Wu, Y., Yu, W., Jiang, J., Zheng, Z., Zhang, B., Cao, Y., Tian, C.: Parallelizing sequential graph computations. In: SIGMOD, pp. 495–510 (2017)
Feng, G., Ma, Z., Li, D., Chen, S., Zhu, X., Han, W., Chen, W.: Risgraph: A real-time streaming system for evolving graphs to support sub-millisecond per-update analysis at millions ops/s. In: SIGMOD, pp. 513–527 (2021)
Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34(3), 596–615 (1987)
Gong, S., Tian, C., Yin, Q., Yu, W., Zhang, Y., Geng, L., Yu, S., Yu, G., Zhou, J.: Automating incremental graph processing with flexible memoization. PVLDB 14(9), 1613–1625 (2021)
Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: distributed graph-parallel computation on natural graphs. In: OSDI, pp. 17–30 (2012)
Grädel, E., Kolaitis, P.G., Vardi, M.Y.: On the decision problem for two-variable first-order logic. Bull. Symb. Log. 3(1), 53–69 (1997)
Guan, Z., Wu, J., Zhang, Q., Singh, A.K., Yan, X.: Assessing and ranking structural correlations in graphs. In: SIGMOD, pp. 937–948 (2011)
Hamilton, W.L., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: NIPS (2017)
Hammer, M.A., Khoo, Y.P., Hicks, M., Foster, J.S.: Adapton: composable, demand-driven incremental computation. In: PLDI, pp. 156–166 (2014)
Holm, J., de Lichtenberg, K., Thorup, M.: Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity. J. ACM 48(4), 723–760 (2001)
Jeh, G., Widom, J.: Simrank: a measure of structural-context similarity. In: KDD, pp. 538–543 (2002)
Jiang, X., Xu, C., Yin, X., Zhao, Z., Gupta, R.: Tripoline: generalized incremental graph processing via graph triangle inequality. In: EuroSys, pp. 17–32 (2021)
Katz, L.: A new status index derived from sociometric analysis. Psychometrika 18(1), 39–43 (1953)
Kim, K., Seo, I., Han, W., Lee, J., Hong, S., Chafi, H., Shin, H., Jeong, G.: Turboflux: a fast continuous subgraph matching system for streaming graph data. In: SIGMOD, pp. 411–426 (2018)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: ICLR (2017)
Li, R., Yu, J.X., Mao, R.: Efficient core maintenance in large dynamic graphs. TKDE 26(10), 2453–2465 (2014)
Liu, Y.A.: Efficiency by incrementalization: an introduction. High. Order Symb. Comput. 13(4), 289–313 (2000)
Luo, X., Liu, L., Yang, Y., Bo, L., Cao, Y., Wu, J., Li, Q., Yang, K., Zhu, K.Q.: Alicoco: Alibaba e-commerce cognitive concept net. In: SIGMOD, pp. 313–327 (2020)
Libgrape-Lite. https://github.com/alibaba/libgrape-lite
Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD, pp. 135–146 (2010)
Mariappan, M., Che, J., Vora, K.: Dzig: Sparsity-aware incremental processing of streaming graphs. In: EuroSys, pp. 83–98 (2021)
Mariappan, M., Vora, K.: Graphbolt: Dependency-driven synchronous processing of streaming graphs. In: EuroSys, pp. 1–16 (2019)
Matijasevič, Y.V.: Diophantine representation of recursively enumerable predicates. In: Studies in Logic and the Foundations of Mathematics, vol. 63, pp. 171–177 (1971)
McCune, R.R., Weninger, T., Madey, G.: Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Comput. Surv. 48(2), 1–39 (2015)
McGregor, A., Vorotnikova, S., Vu, H.T.: Better algorithms for counting triangles in data streams. In: PODS, pp. 401–411 (2016)
McSherry, F., Murray, D.G., Isaacs, R., Isard, M.: Differential dataflow. In: CIDR (2013)
de Moura, L.M., Bjørner, N.: Z3: an efficient SMT solver. In: TACAS, pp. 337–340 (2008)
Murray, D.G., McSherry, F., Isaacs, R., Isard, M., Barham, P., Abadi, M.: Naiad: a timely dataflow system. In: SOSP, pp. 439–455 (2013)
Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web. Technical report, Stanford InfoLab (1999)
Pearl, J.: Reverend Bayes on inference engines: a distributed hierarchical approach. In: AAAI, pp. 129–138 (1982)
Rossi, R.A., Ahmed, N.K.: The network data repository with interactive graph analytics and visualization. In: AAAI (2015). http://networkrepository.com
Road-USA-Graph. https://www.cise.ufl.edu/research/sparse/matrices/DIMACS10/road_usa.html (2011)
Schieber, B., Vishkin, U.: On finding lowest common ancestors: simplification and parallelization. SIAM J. Comput. 17(6), 1253–1262 (1988)
Sengupta, D., Sundaram, N., Zhu, X., Willke, T.L., Young, J., Wolf, M., Schwan, K.: Graphin: an online high performance incremental graph processing framework. In: Euro-Par, pp. 319–333 (2016)
Shi, X., Cui, B., Shao, Y., Tong, Y.: Tornado: a system for real-time iterative analysis over evolving data. In: SIGMOD, pp. 417–430 (2016)
Sukhbaatar, S., Szlam, A., Fergus, R.: Learning multiagent communication with backpropagation. In: NIPS (2016)
Size of Wikipedia (2020). https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From “think like a vertex” to “think like a graph”. PVLDB 7(3), 193–204 (2013)
UK-2005. https://www.cise.ufl.edu/research/sparse/matrices/LAW/uk-2005.html (2005)
Valiant, L.G.: A bridging model for parallel computation. CACM 33(8), 103–111 (1990)
Vora, K., Gupta, R., Xu, G.: Kickstarter: Fast and accurate computations on streaming graphs via trimmed approximations. In: ASPLOS, pp. 237–251 (2017)
Wang, Q., Zhang, Y., Wang, H., Geng, L., Lee, R., Zhang, X., Yu, G.: Automating incremental and asynchronous evaluation for recursive aggregate data processing. In: SIGMOD, pp. 2439–2454 (2020)
Xu, S., Zhang, H., Neubig, G., Dai, W., Kim, J.K., Deng, Z., Ho, Q., Yang, G., Xing, E.P.: Cavs: an efficient runtime system for dynamic neural networks. In: ATC, pp. 937–950 (2018)
Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. In: ICDM, pp. 1–8 (2015)
Zakian, T.A.K., Capelli, L.A.R., Hu, Z.: Incrementalization of vertex-centric programs. In: IPDPS, pp. 1019–1029 (2019)
Zhang, Y., Gao, Q., Gao, L., Wang, C.: Priter: a distributed framework for prioritized iterative computations. In: SOCC, pp. 1–14 (2011)
Zhang, Y., Gao, Q., Gao, L., Wang, C.: Maiter: an asynchronous graph processing framework for delta-based accumulative iterative computation. TPDS 25(8), 2091–2100 (2013)
Zhao, J., Yang, Y., Zhang, Y., Liao, X., Gu, L., He, L., He, B., Jin, H., Liu, H., Jiang, X., Yu, H.: Tdgraph: a topology-driven accelerator for high-performance streaming graph processing. In: ISCA, pp. 116–129 (2022)
Zheng, L., Li, Z., Li, J., Li, Z., Gao, J.: Addgraph: anomaly detection in dynamic graph using attention-based temporal GCN. In: IJCAI (2019)
Zhu, X., Chen, W., Zheng, W., Ma, X.: Gemini: A computation-centric distributed graph processing system. In: OSDI, pp. 301–316 (2016)
Acknowledgements
This work is supported by the National Natural Science Foundation of China (62202301, 62072082, U2241212, 62202088, 62137001, 62302027), the 111 Project (B16009), the Fundamental Research Funds for the Central Universities (N2216012, N2216015), and a research grant from the Alibaba Innovative Research (AIR) Program.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix
A1. Proof of Lemma 1
Proof
Following the computation process of an \(\textsf{MF}\)-applicable \(\mathcal {A}\) as shown in Eq. (4), we have that
In the above equations, lines (a) and (b) are due to the property (C2) as defined in Sect. 4.1; line (c) follows from a simple induction; and line (d) is true because of Eq. (4) and the property (C3). \(\square \)
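To make the accumulative pattern behind Lemma 1 concrete, here is a minimal Python sketch (not \(\textsf{Ingress}\)'s actual code; the toy graph, damping factor, and function names are our own) of a memoization-free, delta-based computation in the style of Eq. (4): states are folded with a commutative and associative \({\mathcal {U}}\) (here, addition), and messages are propagated by an edge-wise \(\mathcal {G}\) that depends only on the incoming message, mirroring properties (C2) and (C3).

```python
d = 0.85
graph = {0: [1, 2], 1: [2], 2: [0]}  # hypothetical toy digraph

def run(graph, rounds=100):
    x = {v: 0.0 for v in graph}        # X^0: initial states
    m = {v: 1 - d for v in graph}      # M^0: initial messages
    for _ in range(rounds):
        # U folds received messages into states; here U is addition
        x = {v: x[v] + m[v] for v in graph}
        nm = {v: 0.0 for v in graph}
        # G transforms each message edge-wise, independent of x (C2/C3)
        for v, outs in graph.items():
            for w in outs:
                nm[w] += d * m[v] / len(outs)
        m = nm
    return x
```

Because x only ever accumulates the successive message generations, the final state equals \({\mathcal {U}}\) folded over \(\mathbb {X}^0\) and \(\bigcup _{\ell } \mathcal {G}^\ell (\mathbb {M}^0)\), which is exactly the identity Lemma 1 asserts.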
A2. Proof of Lemma 2
Proof
The incremental \(\textsf{MF}\)-applicable \(\mathcal {A}\) on \(G\oplus \varDelta G\) starts from the previous computation status (see Algorithm 1). By Lemma 1, we have that \(\hat{\mathbb {X}}^0=\mathbb {X}^k={\mathcal {U}}\left( \mathbb {X}^0\cup \bigcup _{\ell =0}^{k-1} \mathcal {G}^\ell (\mathbb {M}^0)\right) \). It remains to verify \(\hat{\mathbb {M}}^{0}\). Let \(\mathbb {M}_v = \bigcup _{\ell =0}^{k-1} m_v^{\ell }\) be the messages received by v in the computation over G. Observe the following.
(1) Algorithm 1 generates two sets of messages, \(M_v^{+}\) and \(M_v^{-}\), in the case that the edges incident to v have evolved, i.e., \(\mathcal {G}(\mathbb {M}_{v}) \ne \hat{\mathcal {G}}(\mathbb {M}_{v})\). By Algorithm 1, we have that \({\mathcal {U}}\left( \hat{\mathcal {G}}(\mathbb {M}_{v}) \ominus \mathcal {G}(\mathbb {M}_{v})\right) = {\mathcal {U}}(M_v^{+}\cup M_v^{-})\).
(2) If the edges related to v are not evolved, we have that \(\mathcal {G}(\mathbb {M}_v) = \hat{\mathcal {G}}(\mathbb {M}_v)\). By (C1) and (C2), it holds that:
That is, the messages in \(\hat{\mathcal {G}}(\mathbb {M}_v)\) and \(\mathcal {G}(\mathbb {M}_v)\) can be canceled out w.r.t. function \({\mathcal {U}}\). Thus there is no need to generate these messages in incremental computation over \(G\oplus \varDelta G\).
According to Algorithm 1, the initial messages \(\hat{\mathbb {M}}^{0}\) for the incremental computation consist of \({\mathcal {U}}(M_v^{+}\cup M_v^{-})\) for the vertices with evolved edges. By the above analysis, vertices without evolved edges would contribute nothing even if the corresponding messages were generated. As a consequence, we can express \(\hat{\mathbb {M}}^{0}\) as follows:
\(\square \)
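The cancellation argument of Lemma 2 can be illustrated with the same additive toy computation (again a hedged sketch with hypothetical names, not the system's implementation): only vertices whose edges evolved emit corrections \(M_v^{+}\) and \(M_v^{-}\); all other old and new messages cancel under \({\mathcal {U}}\), so the incremental run may resume directly from the converged states.

```python
d = 0.85

def batch(graph, rounds=200):
    """Delta-based accumulative ranking, as in the batch run over G."""
    x = {v: 0.0 for v in graph}
    m = {v: 1 - d for v in graph}
    for _ in range(rounds):
        x = {v: x[v] + m[v] for v in graph}
        nm = {v: 0.0 for v in graph}
        for v, outs in graph.items():
            for w in outs:
                nm[w] += d * m[v] / len(outs)
        m = nm
    return x

def incremental(old_graph, new_graph, x, rounds=200):
    """Resume from converged states x, seeding only M+ / M- corrections."""
    # Since U is +, the total mass v ever received equals x[v], so v's
    # old per-edge traffic was d*x[v]/old_deg and the new one is
    # d*x[v]/new_deg; their difference is exactly M+ together with M-.
    m = {v: 0.0 for v in new_graph}
    for v in new_graph:
        old_outs, new_outs = old_graph.get(v, []), new_graph[v]
        if old_outs == new_outs:
            continue                       # unchanged edges cancel out
        for w in old_outs:                 # M-: retract old contributions
            m[w] -= d * x[v] / len(old_outs)
        for w in new_outs:                 # M+: inject new contributions
            m[w] += d * x[v] / len(new_outs)
    x = dict(x)
    for _ in range(rounds):                # same loop as batch, on the new graph
        x = {v: x[v] + m[v] for v in new_graph}
        nm = {v: 0.0 for v in new_graph}
        for v, outs in new_graph.items():
            for w in outs:
                nm[w] += d * m[v] / len(outs)
        m = nm
    return x
```

For this additive \({\mathcal {U}}\), starting from the old fixed point and propagating only the boundary corrections through the updated graph lands on the new fixed point, matching a from-scratch run.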
A3. Proof of Lemma 3
Proof
Since the batch algorithm \(\mathcal {A}\) is \(\textsf{MP}\)-applicable, i.e., condition (C4) holds, each state \(x_v^k\) of an unreset vertex v in \(\mathbb {X}_N^k\) is determined by a single effective message generated when running \(\mathcal {A}\) on G, denoted \(m_v^c\); hence \(x_v^k = m_v^c\). Observe that the right-hand side of Eq. (8) is the result of executing \(\mathcal {A}\) over the reserved graph \(G_\mathcal {R}\) (see Lemma 1). Thus Eq. (8) holds if, for each unreset vertex v, the effective message \(m_v^c\) coincides with the effective message generated for v when invoking \(\mathcal {A}\) over \(G_\mathcal {R}\). To establish this precondition, it suffices to prove that \(m_v^c\) is transmitted from another unreset vertex \(v'\) and that \((v', v)\) is not an edge deleted in \(\varDelta G\); this holds because the computation of \(\mathcal {A}\) strictly follows the propagation of messages, and can be formally verified by induction on the rounds of the computation.
We next prove, by contradiction, that each effective message \(m_v^c\) is sent from another unreset vertex via an edge that has not been deleted. Assume for contradiction that (i) a reset vertex \(v'\) sends \(m_v^c\) to the unreset v, or (ii) \(m_v^c\) is transmitted along a deleted edge \((v',v)\). In case (i), if \(v'\) has been reset, then v must also be a reset vertex, since Algorithm 2 propagates \(\bot \) along the paths formed by all effective messages, including \(m_v^c\). In case (ii), if \((v',v)\) is a deleted edge, then the state of v should also have been reset, since the deleted \((v',v)\) initiates a cancellation message in Algorithm 2. Either case leads to a contradiction, and the correctness of Eq. (8) follows. \(\square \)
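The reset-and-recompute mechanism that Lemma 3 reasons about can be sketched for a concrete \(\textsf{MP}\)-applicable algorithm, SSSP, in the spirit of KickStarter [56] (a simplified illustration with hypothetical helper names, not Algorithm 2 itself): each vertex remembers which in-neighbor supplied its effective message, a deletion propagates the reset value \(\bot \) (here \(\infty \)) transitively along those dependencies, and computation restarts from the unreset frontier.

```python
import math

def sssp(graph, src, dist=None, parent=None):
    # graph: {v: {w: weight}}; resumes from given dist/parent if supplied.
    dist = dist or {v: math.inf for v in graph}
    parent = parent or {v: None for v in graph}
    if src is not None:
        dist[src] = 0.0
    changed = True
    while changed:                           # Bellman-Ford-style relaxation
        changed = False
        for v, outs in graph.items():
            for w, wt in outs.items():
                if dist[v] + wt < dist[w]:   # new effective message wins
                    dist[w], parent[w] = dist[v] + wt, v
                    changed = True
    return dist, parent

def delete_edge(graph, dist, parent, u, v):
    del graph[u][v]
    # Propagate bottom along effective-message paths: reset v if its
    # effective message arrived over (u, v), then every dependent vertex.
    reset = {v} if parent[v] == u else set()
    frontier = set(reset)
    while frontier:
        frontier = {w for w in graph
                    if parent[w] in frontier and w not in reset}
        reset |= frontier
    for w in reset:
        dist[w], parent[w] = math.inf, None
    # Unreset vertices keep their states; only the reset part recomputes.
    return sssp(graph, None, dist, parent)
```

Unreset states remain valid after the deletion precisely because their effective-message paths avoid the deleted edge, which is the invariant established in the proof.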
A4. Proof of Lemma 4
Proof
After messages \(\bot \) have been propagated in Algorithm 2, the initial \(\hat{\mathbb {X}}^0\) consists of the current states of both reset and unreset vertices. By Lemma 3 and the definition of the changed graph \(G_\mathcal {C}\), we have that \(\hat{\mathbb {X}}^0 = {\mathcal {U}}\big (\mathbb {X}_\mathcal {R}^0 \cup \bigcup ^{k-1}_{\ell =0}\mathcal {G}_\mathcal {R}^\ell (\mathbb {M}_\mathcal {R}^0) \big ) \cup \mathbb {X}_\mathcal {C}^0 = {\mathcal {U}}\left( \mathbb {X}_\mathcal {R}^0 \cup \bigcup ^{k}_{\ell =0}\mathbb {M}_\mathcal {R}^\ell \right) \cup \mathbb {X}_\mathcal {C}^0 \).
We next analyze the initial compensation messages \(\hat{\mathbb {M}}^{0}\) initiated by Algorithm 2. From its operations, each initial compensation message is sent either from an unreset vertex in \(V_\mathcal {R}\) or from a reset vertex in \(V_\mathcal {C}\). In light of this, we denote by \(\hat{\mathbb {M}}^r\) and \(\hat{\mathbb {M}}^c\) the collections of initial compensation messages sent from unreset and reset vertices, respectively.
Suppose that an unreset vertex v sends an initial compensation message to \(v'\). By the logic of Algorithm 2 (line 4), \(v'\) must be a reset vertex, or \((v, v')\) is an evolved edge; in either case, edge \((v, v')\) cannot appear in the reserved graph \(G_\mathcal {R}\). Recall that \(M_v^+\) denotes the initial compensation messages sent from v. Then for each unreset vertex \(v \in V_\mathcal {R}\), \(M_v^+ = {\mathcal {U}}\circ \hat{\mathcal {G}}(x_v^k) {\setminus } {\mathcal {U}}\circ \mathcal {G}_\mathcal {R}(x_v^k)\). This expression also indicates that no message is actually created at an unreset v if none of v's adjacent edges has evolved or touches a reset vertex. Based on this observation and Lemma 3, we have \(\hat{\mathbb {M}}^r = \bigcup ^{k}_{\ell =0}\left( {\mathcal {U}}\circ \hat{\mathcal {G}}( \mathbb {M}^\ell _\mathcal {R}){\setminus } {\mathcal {U}}\circ \mathcal {G}_\mathcal {R}(\mathbb {M}^\ell _\mathcal {R})\right) \).
Combining this with the fact that \(\hat{\mathbb {M}}^c {=} \mathbb {M}_\mathcal {C}^0\), i.e., the initial messages constructed at reset vertices are the same as their counterparts created in the changed graph, we have \(\hat{\mathbb {M}}^0 = \mathbb {M}^0_\mathcal {C}\cup \bigcup ^{k}_{\ell =0}\left( {\mathcal {U}}\circ \hat{\mathcal {G}}( \mathbb {M}^\ell _\mathcal {R}){\setminus } {\mathcal {U}}\circ \mathcal {G}_\mathcal {R}(\mathbb {M}^\ell _\mathcal {R})\right) \). \(\square \)
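Finally, the compensation messages of Lemma 4 can be illustrated with min-label connected components, another \(\textsf{MP}\)-applicable computation (a self-contained toy sketch, not \(\textsf{Ingress}\)'s code): after an update, the restart is seeded only by \(\mathbb {M}^0_\mathcal {C}\) at changed vertices plus the states that unreset vertices re-send across edges outside the reserved graph.

```python
def wcc(adj, label, seeds):
    # adj: undirected adjacency {v: set of neighbors}; resumes from
    # `label`, propagating min labels outward from `seeds` only.
    frontier = set(seeds)
    while frontier:
        nxt = set()
        for v in frontier:
            for w in adj[v]:
                if label[v] < label[w]:   # U = min over received labels
                    label[w] = label[v]
                    nxt.add(w)
        frontier = nxt
    return label

adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}  # two components
label = {v: v for v in adj}
label = wcc(adj, label, seeds=adj.keys())          # batch run on G
# Insert edge (2, 3): only the endpoints have evolved edges, so they
# alone seed the restart; all other vertices lie in the reserved graph
# and send no compensation messages.
adj[2].add(3); adj[3].add(2)
label = wcc(adj, label, seeds={2, 3})
```

Here only vertices 2 and 3 have evolved edges, so they alone seed \(\hat{\mathbb {M}}^0\); an edge deletion would additionally require the reset step of Lemma 3 before seeding.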
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gong, S., Tian, C., Yin, Q. et al. Ingress: an automated incremental graph processing system. The VLDB Journal 33, 781–806 (2024). https://doi.org/10.1007/s00778-024-00838-z