DOI: 10.5555/2616448.2616486

FaRM: fast remote memory

Published: 02 April 2014

Abstract

We describe the design and implementation of FaRM, a new main-memory distributed computing platform that exploits RDMA to improve both latency and throughput by an order of magnitude relative to state-of-the-art main-memory systems that use TCP/IP. FaRM exposes the memory of machines in the cluster as a shared address space. Applications can use transactions to allocate, read, write, and free objects in the address space with location transparency. We expect this simple programming model to be sufficient for most application code. FaRM provides two mechanisms to improve performance where required: lock-free reads over RDMA, and support for collocating objects and function shipping to enable the use of efficient single-machine transactions. FaRM uses RDMA both to directly access data in the shared address space and for fast messaging, and it is carefully tuned for the best RDMA performance. We used FaRM to build a key-value store and a graph store similar to Facebook's. Both perform well: for example, a 20-machine cluster can perform 167 million key-value lookups per second with a latency of 31µs.
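
The programming model summarized in the abstract (transactions that allocate, read, write, and free objects in a location-transparent address space) can be illustrated with a small single-process toy. This is only a sketch under stated assumptions and not FaRM's API: the names `Cluster`, `Transaction`, and `Conflict` are hypothetical, the "machines" are plain dictionaries, and the commit step uses a simple version check on read objects as a stand-in for whatever protocol the real system uses.

```python
# Toy model only: every name here is hypothetical and nothing touches RDMA.

_FREED = object()  # marker for "this transaction frees the object"


class Conflict(Exception):
    """Raised at commit when an object read by the transaction has changed."""


class Cluster:
    def __init__(self, num_machines):
        # One dict per simulated machine: address -> (version, value).
        self.stores = [{} for _ in range(num_machines)]
        self.next_addr = 0

    def store_for(self, addr):
        # Placement is hidden from callers, which is the "location
        # transparency" part of the model. Modulo placement is an assumption.
        return self.stores[addr % len(self.stores)]

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, cluster):
        self.cluster = cluster
        self.read_versions = {}  # addr -> version observed by read()
        self.writes = {}         # addr -> buffered new value or _FREED

    def alloc(self, value):
        addr = self.cluster.next_addr
        self.cluster.next_addr += 1
        self.writes[addr] = value  # becomes visible only at commit
        return addr

    def read(self, addr):
        if addr in self.writes:  # read your own buffered write
            value = self.writes[addr]
            if value is _FREED:
                raise KeyError(addr)
            return value
        version, value = self.cluster.store_for(addr)[addr]
        self.read_versions[addr] = version
        return value

    def write(self, addr, value):
        self.writes[addr] = value

    def free(self, addr):
        self.writes[addr] = _FREED

    def commit(self):
        # Validate: every object we read must still be at the version we saw.
        for addr, seen in self.read_versions.items():
            current = self.cluster.store_for(addr).get(addr)
            if current is None or current[0] != seen:
                raise Conflict(addr)
        # Apply buffered updates, bumping the version of each written object.
        for addr, value in self.writes.items():
            store = self.cluster.store_for(addr)
            if value is _FREED:
                store.pop(addr, None)
            else:
                version = store.get(addr, (0, None))[0]
                store[addr] = (version + 1, value)
```

In this toy, application code only ever handles opaque addresses: it calls `cluster.begin()`, uses `alloc`, `read`, `write`, and `free` on the returned transaction, and calls `commit()`. If two transactions read the same object and both try to update it, the second commit raises `Conflict` and would need to be retried. Blind writes are not validated, which is a simplification of the sketch.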

Published In

NSDI'14: Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
April 2014
546 pages
ISBN: 9781931971096

Sponsors

  • USENIX Assoc

Publisher

USENIX Association

United States
