Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/3026877.3026892acmotherconferencesArticle/Chapter ViewAbstractPublication PagesosdiConference Proceedingsconference-collections
Article

FaSST: fast, scalable and simple distributed transactions with two-sided (RDMA) datagram RPCs

Published: 02 November 2016 Publication History

Abstract

FaSST is an RDMA-based system that provides distributed in-memory transactions with serializability and durability. Existing RDMA-based transaction processing systems use one-sided RDMA primitives for their ability to bypass the remote CPU. This design choice brings several drawbacks. First, the limited flexibility of one-sided RDMA reduces performance and increases software complexity when designing distributed data stores. Second, deep-rooted technical limitations of RDMA hardware limit scalability in large clusters. FaSST eschews one-sided RDMA for fast RPCs using two-sided unreliable datagrams, which we show drop packets extremely rarely on modern RDMA networks. This approach provides better performance, scalability, and simplicity, without requiring expensive reliability mechanisms in software. In comparison with published numbers, FaSST outperforms FaRM on the TATP benchmark by almost 2x while using close to half the hardware resources, and it outperforms DrTM+R on the SmallBank benchmark by around 1.7x without making data locality assumptions.

References

[1]
Private communication with FaRM's authors.
[2]
Mellanox Connect-IB product brief. http: //www.mellanox.com/related-docs/prod_adapter_cards/PB_Connect-IB.pdf, 2015.
[3]
Mellanox OFED for Linux user manual. http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_User_Manual_v2.2-1.0.1.pdf, 2015.
[4]
B. W. Barrett, R. Brightwell, S. Hemmert, K. Pedretti, K. Wheeler, K. Underwood, R. Riesen, A. B. Maccabe, and T. Hudson. The Portals 4.0 network programming interface november 14, 2012 draft.
[5]
C. Binnig, A. Crotty, A. Galakatos, T. Kraska, and E. Zamanian. The end of slow networks: It's time for a redesign. In Proc. VLDB, New Delhi, India, Aug. 2016.
[6]
M. S. Birrittella, M. Debbage, R. Huggahalli, J. Kunz, T. Lovett, T. Rimmer, K. D. Underwood, and R. C. Zak. Intel Omni-path architecture: Enabling scalable, high performance fabrics. In Proceedings of the 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, 2015.
[7]
Y. Chen, X. Wei, J. Shi, R. Chen, and H. Chen. Fast and general distributed transactions using RDMA and HTM. In Proc. 11th ACM European Conference on Computer Systems (EuroSys), Apr. 2016.
[8]
D. Crupnicoff, M. Kagan, A. Shahar, N. Bloch, and H. Chapman. Dynamically-connected transport service, May 19 2011. URL https://www.google.com/patents/US20110116512. US Patent App. 12/621,523.
[9]
G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In Proc. 21st ACM Symposium on Operating Systems Principles (SOSP), Stevenson, WA, Oct. 2007.
[10]
S. Derradji, T. Palfer-Sollier, J.-P. Panziera, A. Poudes, and F. W. Atos. The BXI interconnect architecture. In Proceedings of the 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, 2015.
[11]
A. Dragojevic, D. Narayanan, O. Hodson, and M. Castro. FaRM: Fast remote memory. In Proc. 11th USENIX NSDI, Seattle, WA, Apr. 2014.
[12]
A. Dragojevic, D. Narayanan, E. B. Nightingale, M. Renzelmann, A. Shamis, A. Badam, and M. Castro. No compromises: Distributed transactions with consistency, availability, and performance. In Proc. 25th ACM Symposium on Operating Systems Principles (SOSP), Monterey, CA, Oct. 2015.
[13]
D. Dunning, G. Regnier, G. McAlpine, D. Cameron, B. Shubert, F. Berry, A. M. Merritt, E. Gronke, and C. Dodd. The virtual interface architecture. IEEE Micro, pages 66-76, 1998.
[14]
G. Gibson, G. Grider, A. Jacobson, and W. Lloyd. PRObE: A Thousand-Node Experimental Cluster for Computer Systems Research.
[15]
A. Kalia, M. Kaminsky, and D. G. Andersen. Using RDMA efficiently for key-value services. In Proc. ACM SIGCOMM, Chicago, IL, Aug. 2014.
[16]
A. Kalia, M. Kaminsky, and D. G. Andersen. Design guidelines for high-performance RDMA systems. In Proc. USENIX Annual Technical Conference, Denver, CO, June 2016.
[17]
L. Lamport, D. Malkhi, and L. Zhou. Vertical Paxos and primary-backup replication. Technical report, Microsoft Research, 2009.
[18]
H. Lim, D. Han, D. G. Andersen, and M. Kaminsky. MICA: A holistic approach to fast in-memory key-value storage. In Proc. 11th USENIX NSDI, Seattle, WA, Apr. 2014.
[19]
Y. Mao, E. Kohler, and R. T. Morris. Cache craftiness for fast multicore key-value storage. In Proc. 7th ACM European Conference on Computer Systems (EuroSys), Bern, Switzerland, Apr. 2012.
[20]
C. Mitchell, Y. Geng, and J. Li. Using one-sided RDMA reads to build a fast, CPU-efficient key-value store. In Proc. USENIX Annual Technical Conference, San Jose, CA, June 2013.
[21]
C. Mitchell, K. Montgomery, L. Nelson, S. Sen, and J. Li. Balancing CPU and network in the cell distributed B-Tree store. In Proc. USENIX Annual Technical Conference, Denver, CO, June 2016.
[22]
J. Nelson, B. Holt, B. Myers, P. Briggs, L. Ceze, S. Kahan, and M. Oskin. Latency-tolerant software distributed shared memory. In Proc. USENIX Annual Technical Conference, Santa Clara, CA, June 2015.
[23]
S. Raikin, L. Liss, A. Shachar, N. Bloch, and M. Kagan. Remote transactional memory, 2015. US Patent App. 20150269116.
[24]
J. W. Stamos and F. Cristian. Coordinator log transaction execution protocol. Distrib. Parallel Databases, 1(4):383-408, Oct. 1993. ISSN 0926- 8782. URL http://dx.doi.org/10.1007/BF01264014.
[25]
A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi. Calvin: Fast distributed transactions for partitioned database systems. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, May 2012.
[26]
TPC-C. TPC benchmark C. http://www.tpc.org/tpcc/, 2010.
[27]
S. Tu, W. Zheng, E. Kohler, B. Liskov, and S. Madden. Speedy transactions in multicore in-memory databases. In Proc. 24th ACM Symposium on Operating Systems Principles (SOSP), Farmington, PA, Nov. 2013.
[28]
X. Wei, J. Shi, Y. Chen, R. Chen, and H. Chen. Fast in-memory transaction processing using RDMA and HTM. In Proc. 25th ACM Symposium on Operating Systems Principles (SOSP), Monterey, CA, Oct. 2015.
[29]
B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold, M. Hibler, C. Barb, and A. Joglekar. An integrated experimental environment for distributed systems and networks. In Proc. 5th USENIX OSDI, pages 255-270, Boston, MA, Dec. 2002.
[30]
H. Zhang, D. G. Andersen, A. Pavlo, M. Kaminsky, L. Ma, and R. Shen. Reducing the storage overhead of main-memory OLTP databases with hybrid indexes. In Proc. ACM SIGMOD, San Francisco, USA, June 2016.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
OSDI'16: Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation
November 2016
786 pages
ISBN:9781931971331

Sponsors

  • VMware
  • NetApp
  • Google Inc.
  • Microsoft: Microsoft
  • Facebook: Facebook

In-Cooperation

Publisher

USENIX Association

United States

Publication History

Published: 02 November 2016

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)TeRMProceedings of the 22nd USENIX Conference on File and Storage Technologies10.5555/3650697.3650698(1-16)Online publication date: 27-Feb-2024
  • (2024)DEX: Scalable Range Indexing on Disaggregated MemoryProceedings of the VLDB Endowment10.14778/3675034.367505017:10(2603-2616)Online publication date: 1-Jun-2024
  • (2024)Lupin: Tolerating Partial Failures in a CXL PodProceedings of the 2nd Workshop on Disruptive Memory Systems10.1145/3698783.3699377(41-50)Online publication date: 3-Nov-2024
  • (2024)Scythe: A Low-latency RDMA-enabled Distributed Transaction System for Disaggregated MemoryACM Transactions on Architecture and Code Optimization10.1145/366600421:3(1-26)Online publication date: 27-May-2024
  • (2024)Unanimous 2PC: Fault-tolerant Distributed Transactions Can be Fast and SimpleProceedings of the 11th Workshop on Principles and Practice of Consistency for Distributed Data10.1145/3642976.3653035(44-57)Online publication date: 22-Apr-2024
  • (2024)Zero-sided RDMA: Network-driven Data Shuffling for Disaggregated Heterogeneous Cloud DBMSsProceedings of the ACM on Management of Data10.1145/36392912:1(1-28)Online publication date: 26-Mar-2024
  • (2024)Exploring the Asynchrony of Slow Memory Filesystem with EasyIOProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3629586(624-640)Online publication date: 22-Apr-2024
  • (2024)Serialization/Deserialization-free State Transfer in Serverless WorkflowsProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3629568(132-147)Online publication date: 22-Apr-2024
  • (2023)FUSEEProceedings of the 21st USENIX Conference on File and Storage Technologies10.5555/3585938.3585944(81-97)Online publication date: 21-Feb-2023
  • (2023)Efficient Distributed Transaction Processing in Heterogeneous NetworksProceedings of the VLDB Endowment10.14778/3583140.358315316:6(1372-1385)Online publication date: 20-Apr-2023
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media