research-article

Thin servers with smart pipes: designing SoC accelerators for memcached

Published: 23 June 2013

Abstract

Distributed in-memory key-value stores, such as memcached, are central to the scalability of modern internet services. Current deployments use commodity servers with high-end processors. However, given the cost-sensitivity of internet services and the recent proliferation of volume low-power System-on-Chip (SoC) designs, we see an opportunity for alternative architectures. We undertake a detailed characterization of memcached to reveal performance and power inefficiencies. Our study considers both high-performance and low-power CPUs and NICs across a variety of carefully designed benchmarks that exercise the range of memcached behavior. We discover that, regardless of CPU microarchitecture, memcached execution is remarkably inefficient, saturating neither network links nor available memory bandwidth. Instead, we find that performance is typically limited by the per-packet processing overheads in the NIC and OS kernel: long code paths limit CPU performance due to poor branch predictability and instruction fetch bottlenecks.
Our insights suggest that neither high-performance nor low-power cores provide a satisfactory power-performance trade-off, and point to a need for tighter integration of the network interface. Hence, we argue for an alternate architecture, Thin Servers with Smart Pipes (TSSP), for cost-effective high-performance memcached deployment. TSSP couples an embedded-class low-power core to a memcached accelerator that can process GET requests entirely in hardware, offloading both network handling and data lookup. We demonstrate the potential benefits of our TSSP architecture through an FPGA prototyping platform, and show the potential for a 6X-16X power-performance improvement over conventional server baselines.
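
The division of labor described above, where GET requests are served by the accelerator and everything else falls back to the low-power core, can be illustrated with a small software model. This is only a sketch under stated assumptions and not the paper's hardware design: the class name `ThinServerModel`, its counters, and the use of the memcached text protocol over an in-process dictionary are all hypothetical choices made for illustration, and item flags and expiry are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class ThinServerModel:
    """Toy software model (hypothetical) of a GET fast path plus a fallback core."""

    store: dict = field(default_factory=dict)
    fast_path_hits: int = 0   # requests answered by the modeled accelerator
    core_requests: int = 0    # requests handed to the modeled low-power core

    def handle(self, request: bytes) -> bytes:
        """Dispatch one memcached text-protocol request."""
        line, _, rest = request.partition(b"\r\n")
        parts = line.split()
        # Single-key GETs take the fast path; everything else goes to the core.
        if len(parts) == 2 and parts[0] == b"get":
            self.fast_path_hits += 1
            return self._accelerated_get(parts[1])
        self.core_requests += 1
        return self._core_handle(parts, rest)

    def _accelerated_get(self, key: bytes) -> bytes:
        """Model of the offloaded lookup: hash-table probe plus response framing."""
        value = self.store.get(key)
        if value is None:
            return b"END\r\n"  # miss: empty result set
        # Flags are always reported as 0 in this sketch (assumption).
        header = b"VALUE " + key + b" 0 " + str(len(value)).encode() + b"\r\n"
        return header + value + b"\r\nEND\r\n"

    def _core_handle(self, parts: list, rest: bytes) -> bytes:
        """Model of the software path; only 'set' is implemented here."""
        if len(parts) == 5 and parts[0] == b"set":
            key, size = parts[1], int(parts[4])
            self.store[key] = rest[:size]
            return b"STORED\r\n"
        return b"ERROR\r\n"


server = ThinServerModel()
server.handle(b"set k 0 0 5\r\nhello\r\n")   # handled by the modeled core
reply = server.handle(b"get k\r\n")          # handled by the modeled fast path
```

In this model the two counters make the split visible: a read-heavy request mix would leave the modeled core mostly idle, which is the intuition behind pairing a small core with a GET accelerator.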


Published In

ACM SIGARCH Computer Architecture News  Volume 41, Issue 3
ISCA '13
June 2013
666 pages
ISSN:0163-5964
DOI:10.1145/2508148

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013
686 pages
ISBN:9781450320795
DOI:10.1145/2485922
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 June 2013
Published in SIGARCH Volume 41, Issue 3
