Nothing Special   »   [go: up one dir, main page]

Skip to main content
Log in

Gossip-based visibility control for high-performance geo-distributed transactions

  • Special Issue Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Providing ACID transactions under conflicts across globally distributed data is the Everest of transaction processing protocols. Transaction processing in this scenario is particularly costly due to the high latency of cross-continent network links, which inflates concurrency control and data replication overheads. To mitigate the problem, we introduce Ocean Vista—a novel distributed protocol that guarantees strict serializability. We observe that concurrency control and replication address different aspects of resolving the visibility of transactions, and we address both concerns using a multi-version protocol that tracks visibility using version watermarks and arrives at correct visibility decisions using efficient gossip. Gossiping the watermarks enables asynchronous transaction processing and acknowledging transaction visibility in batches in the concurrency control and replication protocols, which improves efficiency under high cross-data center network delays. In particular, Ocean Vista can access conflicting transactions in parallel and supports efficient write-quorum/read-one access using one round trip in the common case. We demonstrate experimentally in a multi-data center cloud environment that our design outperforms a leading distributed transaction processing engine (TAPIR) more than tenfold in terms of peak throughput, albeit at the cost of additional latency for gossip and a more restricted transaction model. The latency penalty is generally bounded by one wide area network (WAN) round trip time (RTT), and in the best case (i.e., under light load) our system nearly breaks even with TAPIR by committing transactions in around one WAN RTT.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Notes

  1. Super-quorums are defined as in FastPaxos [22].

  2. An alternative expression is using a super-quorum size \(2f + 1\) out of \(3f + 1\) replicas.

References

  1. Balakrishnan, M., Malkhi, D., Davis, J.D., Prabhakaran, V., Wei, M., Wobber, T.: CORFU: a distributed shared log. ACM Trans. Comput. Syst. 31(4), 10:1–10:24 (2013)

    Article  Google Scholar 

  2. Bernstein, P., Reid, C., Das, S.: Hyder—a transactional record manager for shared flash. In: Proceedings of the 5th Biennial Conference on Innovative Data Systems Research, CIDR’11, pp. 9–20 (2011)

  3. Bernstein, P.A., Das, S., Ding, B., Pilman, M.: Optimizing optimistic concurrency control for tree-structured, log-structured databases. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD’15, pp. 1295–1309 (2015)

  4. Bernstein, P.A., Hadzilacos, V., Goodman, N.: Concurrency Control and Recovery in Database Systems. Addison-Wesley Longman Publishing Co. Inc, Boston, MA (1987)

    Google Scholar 

  5. Bernstein, P.A., Shipman, D.W., Wong, W.S.: Formal aspects of serializability in database concurrency control. IEEE Trans. Softw. Eng. 5(3), 203–216 (1979)

    Article  MathSciNet  Google Scholar 

  6. Chockler, G.V., Keidar, I., Vitenberg, R.: Group communication specifications: a comprehensive study. ACM Comput. Surv. 33(4), 427–469 (2001)

    Article  Google Scholar 

  7. Corbett, J.C., Dean, J., Epstein, M., Fikes, A., Frost, C., Furman, J.J., Ghemawat, S., Gubarev, A., Heiser, C., Hochschild, P., Hsieh, W., Kanthak, S., Kogan, E., Li, H., Lloyd, A., Melnik, S., Mwaura, D., Nagle, D., Quinlan, S., Rao, R., Rolig, L., Saito, Y., Szymaniak, M., Taylor, C., Wang, R., Woodford, D.: Spanner: Google’s globally distributed database. ACM Trans. Comput. Syst. 31(3), 8:1–8:22 (2013)

  8. Dey, A., Fekete, A., Nambiar, R., Röhm, U.: YCSB+T: Benchmarking web-scale transactional databases. In: IEEE 30th International Conference on Data Engineering Workshops, pp. 223–230 (2014)

  9. Didona, D., Guerraoui, R., Wang, J., Zwaenepoel, W.: Causal consistency and latency optimality: friend or foe? PVLDB 11(11), 1618–1632 (2018)

    Google Scholar 

  10. Facebook: fbthrift. https://github.com/facebook/fbthrift

  11. Faleiro, J.M., Abadi, D.J.: Rethinking serializable multiversion concurrency control. PVLDB 8(11), 1190–1201 (2015)

    Google Scholar 

  12. Faleiro, J.M., Abadi, D.J., Hellerstein, J.M.: High performance transactions via early write visibility. PVLDB 10(5), 613–624 (2017)

    Google Scholar 

  13. Fan, H., Golab, W.: Scalable transaction processing using functors. In: Proceedings of the 38th IEEE International Conference on Distributed Computing Systems, ICDCS’18, pp. 1004–1016 (2018)

  14. Fan, H., Golab, W.: Ocean vista: gossip-based visibility control for speedy geo-distributed transactions. PVLDB 12(11), 1471–1484 (2019)

    Google Scholar 

  15. Goel, A.K., Pound, J., Auch, N., Bumbulis, P., MacLean, S., Färber, F., Gropengiesser, F., Mathis, C., Bodner, T., Lehner, W.: Towards scalable real-time analytics: an architecture for scale-out of OLxP workloads. PVLDB 8(12), 1716–1727 (2015)

    Google Scholar 

  16. Herlihy, M.P., Wing, J.M.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12(3), 463–492 (1990)

    Article  Google Scholar 

  17. Hunt, P., Konar, M., Junqueira, F.P., Reed, B.: ZooKeeper: Wait-free coordination for internet-scale systems. In: Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, USENIXATC’10, pp. 11–11 (2010)

  18. Kallman, R., Kimura, H., Natkins, J., Pavlo, A., Rasin, A., Zdonik, S., Jones, E.P.C., Madden, S., Stonebraker, M., Zhang, Y., Hugg, J., Abadi, D.J.: H-store: a high-performance, distributed main memory transaction processing system. PVLDB 1(2), 1496–1499 (2008)

    Google Scholar 

  19. Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., Lewin, D.: Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In: Proceedings of the 29th Annual ACM Symposium on Theory of Computing, STOC’97, pp. 654–663 (1997)

  20. Kraska, T., Pang, G., Franklin, M.J., Madden, S., Fekete, A.: MDCC: Multi-data center consistency. In: Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys’13, pp. 113–126 (2013)

  21. Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133–169 (1998)

    Article  Google Scholar 

  22. Lamport, L.: Fast Paxos. Distrib. Comput. 19, 79–103 (2006)

    Article  Google Scholar 

  23. Leau, C.: Spring Data Redis—Retwis-J. https://docs.spring.io/springdata/ data-keyvalue/examples/retwisj/current/ (2013)

  24. Levandoski, J.J., Lomet, D.B., Sengupta, S.: The Bw-Tree: A B-tree for new hardware platforms. In: Proceedings of the 2013 IEEE International Conference on Data Engineering, ICDE’13, pp. 302–313 (2013)

  25. Lloyd, W., Freedman, M.J., Kaminsky, M., Andersen, D.G.: Stronger semantics for low-latency geo-replicated storage. In: Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, NSDI’13, pp. 313–328 (2013)

  26. Lockerman, J., Faleiro, J.M., Kim, J., Sankaran, S., Abadi, D.J., Aspnes, J., Sen, S., Balakrishnan, M.: The fuzzylog: A partially ordered shared log. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI’18, pp. 357–372 (2018)

  27. Mu, S., Nelson, L., Lloyd, W., Li, J.: Consolidating concurrency control and consensus for commits under conflicts. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI’16, pp. 517–532 (2016)

  28. Ongaro, D., Ousterhout, J.: In search of an understandable consensus algorithm. In: Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference, USENIXATC’14, pp. 305–320 (2014)

  29. Papadimitriou, C.H.: The serializability of concurrent database updates. J. ACM 26(4), 631–653 (1979)

    Article  MathSciNet  Google Scholar 

  30. Ren, K., Li, D., Abadi, D.J.: SLOG: serializable, low-latency, geo-replicated transactions. PVLDB 12(11), 1747–1761 (2019)

    Google Scholar 

  31. Ren, K., Thomson, A., Abadi, D.J.: An evaluation of the advantages and disadvantages of deterministic database systems. PVLDB 7(10), 821–832 (2014)

    Google Scholar 

  32. Sovran, Y., Power, R., Aguilera, M.K., Li, J.: Transactional storage for geo-replicated systems. In: Proceedings of the 23rd ACM Symposium on Operating Systems Principles, SOSP’11, pp. 385–400 (2011)

  33. Thomson, A., Diamond, T., Weng, S.C., Ren, K., Shao, P., Abadi, D.J.: Calvin: Fast distributed transactions for partitioned database systems. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD’12, pp. 1–12 (2012)

  34. UWSysLab: TAPIR implementation. https://github.com/UWSysLab/tapir (2018)

  35. Verbitski, A., Gupta, A., Saha, D., Brahmadesam, M., Gupta, K., Mittal, R., Krishnamurthy, S., Maurice, S., Kharatishvili, T., Bao, X.: Amazon Aurora: Design considerations for high throughput cloud-native relational databases. In: Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD’17, pp. 1041–1052 (2017)

  36. Verbitski, A., Gupta, A., Saha, D., Corey, J., Gupta, K., Brahmadesam, M., Mittal, R., Krishnamurthy, S., Maurice, S., Kharatishvilli, T., Bao, X.: Amazon Aurora: On avoiding distributed consensus for I/Os, commits, and membership changes. In: Proceedings of the 2018 International Conference on Management of Data, SIGMOD’18, pp. 789–796 (2018)

  37. Yan, X., Yang, L., Zhang, H., Lin, X.C., Wong, B., Salem, K., Brecht, T.: Carousel: Low-latency transaction processing for globally-distributed data. In: Proceedings of the 2018 International Conference on Management of Data, SIGMOD’18, pp. 231–243 (2018)

  38. Zhang, I., Sharma, N.K., Szekeres, A., Krishnamurthy, A., Ports, D.R.K.: Building consistent transactions with inconsistent replication. ACM Trans. Comput. Syst. 35(4), 12:1–12:37 (2018)

    Article  Google Scholar 

  39. Zhang, Y., Power, R., Zhou, S., Sovran, Y., Aguilera, M.K., Li, J.: Transaction chains: achieving serializability with low latency in geo-distributed storage systems. In: Proceedings of the 24th ACM symposium on operating systems principles, SOSP’13, pp. 276–291 (2013)

Download references

Acknowledgements

We are grateful to the anonymous reviewers from both PVLDB and VLDBJ and to Ken Salem, for their helpful feedback on this research. Wojciech Golab is supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada and by Ripple.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hua Fan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Fan, H., Golab, W. Gossip-based visibility control for high-performance geo-distributed transactions. The VLDB Journal 30, 93–114 (2021). https://doi.org/10.1007/s00778-020-00626-5

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-020-00626-5

Keywords

Navigation