Maintaining acyclic foreign-key joins under updates
Proceedings of the 2020 ACM SIGMOD International Conference on Management of …, 2020•dl.acm.org
A large number of analytical queries (e.g., all 22 queries in the TPC-H benchmark) are based on acyclic foreign-key joins. In this paper, we study the problem of incrementally maintaining the query results of these joins under updates, i.e., insertions and deletions of tuples in any of the relations. Prior work has shown that this problem is inherently hard, requiring at least $\Omega(|db|^{1/2-\varepsilon})$ time per update, where $|db|$ is the size of the database and $\varepsilon > 0$ can be any small constant. However, this negative result holds only on adversarially constructed update sequences; most real-world update sequences are "nice", nowhere near these worst-case scenarios. We introduce a measure $\lambda$, which we call the enclosureness of the update sequence, to characterize its intrinsic difficulty more precisely. We present an algorithm that maintains the query results of any acyclic foreign-key join in $O(\lambda)$ amortized time on any update sequence whose enclosureness is $\lambda$. This is complemented by a lower bound of $\Omega(\lambda^{1-\varepsilon})$, showing that our algorithm is essentially optimal with respect to $\lambda$. Next, using this algorithm as the core component, we show how all 22 queries in the TPC-H benchmark can be supported in $\tilde{O}(\lambda)$ time. Finally, based on the algorithms developed, we built a continuous query processing system on top of Flink, and experimental results show that our system outperforms previous ones significantly.
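To make the problem statement concrete, the following is a minimal sketch of naive delta-based maintenance for a single two-relation foreign-key join. It is not the algorithm from the paper, and it does not compute the enclosureness measure, which the abstract does not define. The class `FKJoinView`, the `orders`/`customers` schema, and all method names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class FKJoinView:
    """Illustrative only: maintains orders JOIN customers ON cust_id.

    `orders.cust_id` is assumed to be a foreign key referencing the
    primary key `customers.cust_id`. Orders may arrive before the
    customer they reference (a dangling tuple that joins later).
    """

    def __init__(self):
        self.customers = {}                      # cust_id -> name
        self.orders = {}                         # order_id -> cust_id
        self.orders_by_cust = defaultdict(set)   # cust_id -> {order_id}
        self.result = set()                      # (order_id, cust_id, name)

    def insert_customer(self, cust_id, name):
        self.customers[cust_id] = name
        # Every order already referencing this key starts to join,
        # so the cost is proportional to the number of such orders.
        for order_id in self.orders_by_cust[cust_id]:
            self.result.add((order_id, cust_id, name))

    def delete_customer(self, cust_id):
        name = self.customers.pop(cust_id)
        # Every order referencing this key stops joining.
        for order_id in self.orders_by_cust[cust_id]:
            self.result.discard((order_id, cust_id, name))

    def insert_order(self, order_id, cust_id):
        self.orders[order_id] = cust_id
        self.orders_by_cust[cust_id].add(order_id)
        # At most one matching customer exists, so this is one lookup.
        if cust_id in self.customers:
            self.result.add((order_id, cust_id, self.customers[cust_id]))

    def delete_order(self, order_id):
        cust_id = self.orders.pop(order_id)
        self.orders_by_cust[cust_id].discard(order_id)
        if cust_id in self.customers:
            self.result.discard((order_id, cust_id, self.customers[cust_id]))


view = FKJoinView()
view.insert_order(1, "c1")         # dangling: no customer yet
view.insert_customer("c1", "Ada")  # order 1 now joins
view.insert_order(2, "c1")
view.delete_order(1)
```

In this sketch, updates on the foreign-key side cost a constant number of dictionary operations, while updates on the primary-key side cost time proportional to the number of referencing tuples. How expensive a whole update sequence is therefore depends on the order in which tuples arrive and leave, which is the kind of sequence-dependent difficulty the abstract attributes to $\lambda$.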