Article

FaSST: fast, scalable and simple distributed transactions with two-sided (RDMA) datagram RPCs

Authors:

Anuj Kalia,

Michael Kaminsky,

David G. Andersen

OSDI'16: Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation

Pages 185 - 201

Published: 02 November 2016

Abstract

FaSST is an RDMA-based system that provides distributed in-memory transactions with serializability and durability. Existing RDMA-based transaction processing systems use one-sided RDMA primitives for their ability to bypass the remote CPU. This design choice brings several drawbacks. First, the limited flexibility of one-sided RDMA reduces performance and increases software complexity when designing distributed data stores. Second, deep-rooted technical limitations of RDMA hardware limit scalability in large clusters. FaSST eschews one-sided RDMA for fast RPCs using two-sided unreliable datagrams, which we show drop packets extremely rarely on modern RDMA networks. This approach provides better performance, scalability, and simplicity, without requiring expensive reliability mechanisms in software. In comparison with published numbers, FaSST outperforms FaRM on the TATP benchmark by almost 2x while using close to half the hardware resources, and it outperforms DrTM+R on the SmallBank benchmark by around 1.7x without making data locality assumptions.
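The abstract's central design point is request/response RPCs carried by two-sided unreliable datagrams, with no software reliability layer on top. Ordinary UDP sockets can illustrate that idea loosely. This is an analogy only. It is not FaSST's implementation and does not use RDMA verbs or InfiniBand unreliable-datagram queue pairs. It assumes Python's standard `socket` and `threading` modules as a stand-in, and the names `STORE`, `serve_one`, `rpc_call`, and `demo` are hypothetical. In this sketch a lost datagram is never retried. Instead, it surfaces as a timeout that the caller must treat as a failure.

```python
import socket
import threading

# Analogy sketch (assumption): UDP over loopback stands in for RDMA
# unreliable datagrams. This is not FaSST code.

# Hypothetical server-side state: a tiny in-memory key-value table.
STORE = {b"k1": b"v1"}


def serve_one(sock: socket.socket) -> None:
    """Receive one request datagram and answer with one response datagram."""
    key, client_addr = sock.recvfrom(2048)  # the whole request is the key
    sock.sendto(STORE.get(key, b""), client_addr)  # empty reply if key is absent


def rpc_call(server_addr, key: bytes, timeout_s: float = 1.0) -> bytes:
    """Send one request and block for its response.

    There is deliberately no retransmission: if either datagram is lost,
    recvfrom raises a timeout error (socket.timeout / TimeoutError), which
    this sketch simply propagates to the caller as a failure.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        sock.sendto(key, server_addr)
        reply, _ = sock.recvfrom(2048)
        return reply


def demo(key: bytes = b"k1") -> bytes:
    """Run one request/response round trip over the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # let the OS pick a free port
    server.settimeout(2.0)  # keep the worker thread from blocking forever
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    try:
        return rpc_call(server.getsockname(), key)
    finally:
        worker.join()
        server.close()


if __name__ == "__main__":
    print(demo())  # expected: b'v1'
```

The client neither retransmits nor tracks sequence numbers, so each call is just one send and one receive. That simplicity is only reasonable when datagram loss is extremely rare, as the abstract reports for modern RDMA networks.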

Information

Published In

OSDI'16: Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation

November 2016

786 pages

ISBN:9781931971331

Program Chairs:
Kimberly Keeton, Hewlett Packard Labs
Timothy Roscoe, ETH Zurich

Sponsors

VMware
NetApp
Google Inc.
Microsoft
Facebook

In-Cooperation

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

USENIX Association

United States


Bibliometrics

Total Citations: 74

Total Downloads: 0

Downloads (last 12 months): 0

Downloads (last 6 weeks): 0

Reflects downloads up to 12 Nov 2024

Citations

Cited By

Yang Z, Wang Q, Liao X, Lu Y, Huang K, Shu J, Ma X, Won Y (2024). TeRM. Proceedings of the 22nd USENIX Conference on File and Storage Technologies, pp. 1-16. DOI: 10.5555/3650697.3650698. Online publication date: 27-Feb-2024.
https://dl.acm.org/doi/10.5555/3650697.3650698

Lu B, Huang K, Liang C, Wang T, Lo E (2024). DEX: Scalable Range Indexing on Disaggregated Memory. Proceedings of the VLDB Endowment 17(10), pp. 2603-2616. DOI: 10.14778/3675034.3675050. Online publication date: 1-Jun-2024.
https://dl.acm.org/doi/10.14778/3675034.3675050

Zhu Z, Ni N, Huang Y, Sun Y, Jia Z, Kim N, Witchel E (2024). Lupin: Tolerating Partial Failures in a CXL Pod. Proceedings of the 2nd Workshop on Disruptive Memory Systems, pp. 41-50. DOI: 10.1145/3698783.3699377. Online publication date: 3-Nov-2024.
https://dl.acm.org/doi/10.1145/3698783.3699377

Lu K, Zhao S, Shan H, Wei Q, Li G, Wan J, Yao T, Wu H, Wang D (2024). Scythe: A Low-latency RDMA-enabled Distributed Transaction System for Disaggregated Memory. ACM Transactions on Architecture and Code Optimization 21(3), pp. 1-26. DOI: 10.1145/3666004. Online publication date: 27-May-2024.
https://dl.acm.org/doi/10.1145/3666004

Jensen C, Howard H, Katsarakis A, Mortier R (2024). Unanimous 2PC: Fault-tolerant Distributed Transactions Can be Fast and Simple. Proceedings of the 11th Workshop on Principles and Practice of Consistency for Distributed Data, pp. 44-57. DOI: 10.1145/3642976.3653035. Online publication date: 22-Apr-2024.
https://dl.acm.org/doi/10.1145/3642976.3653035

Jasny M, Thostrup L, Tamimi S, Koch A, István Z, Binnig C (2024). Zero-sided RDMA: Network-driven Data Shuffling for Disaggregated Heterogeneous Cloud DBMSs. Proceedings of the ACM on Management of Data 2(1), pp. 1-28. DOI: 10.1145/3639291. Online publication date: 26-Mar-2024.
https://dl.acm.org/doi/10.1145/3639291

Zhu B, Chen Y, Shu J (2024). Exploring the Asynchrony of Slow Memory Filesystem with EasyIO. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 624-640. DOI: 10.1145/3627703.3629586. Online publication date: 22-Apr-2024.
https://dl.acm.org/doi/10.1145/3627703.3629586

Lu F, Wei X, Huang Z, Chen R, Wu M, Chen H (2024). Serialization/Deserialization-free State Transfer in Serverless Workflows. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 132-147. DOI: 10.1145/3627703.3629568. Online publication date: 22-Apr-2024.
https://dl.acm.org/doi/10.1145/3627703.3629568

Shen J, Zuo P, Luo X, Yang T, Su Y, Zhou Y, Lyu M, Naor D, Goel A (2023). FUSEE. Proceedings of the 21st USENIX Conference on File and Storage Technologies, pp. 81-97. DOI: 10.5555/3585938.3585944. Online publication date: 21-Feb-2023.
https://dl.acm.org/doi/10.5555/3585938.3585944

Zhang Q, Li J, Zhao H, Xu Q, Lu W, Xiao J, Han F, Yang C, Du X (2023). Efficient Distributed Transaction Processing in Heterogeneous Networks. Proceedings of the VLDB Endowment 16(6), pp. 1372-1385. DOI: 10.14778/3583140.3583153. Online publication date: 20-Apr-2023.
https://dl.acm.org/doi/10.14778/3583140.3583153