Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3472456.3472475acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicppConference Proceedingsconference-collections
research-article

A Universal Construction to implement Concurrent Data Structure for NUMA-muticore

Published: 05 October 2021 Publication History

Abstract

Universal constructions are attractive as they can turn a sequential implementation of any data structure into a concurrent implementation. However, existing universal constructions have limitations, such as imposing high copying overhead, or poor scalability on NUMA systems mainly due to their lack of NUMA-aware design principles. To overcome these limitations, this paper introduces CR, a universal construction that provides highly scalable updates on NUMA systems while offering fast read-side performance. CR achieves NUMA-awareness by utilizing delegation within a NUMA node and a global shared log to maintain the consistency of replicas of data structures across nodes. Using CR does not require expertise in concurrent data structure design. Our evaluation shows that CR has up to 11.2 times better performance compared to a state-of-the-art universal construction CX on our tested sequential data structures. To demonstrate the effectiveness and applicability of CR, we have applied CR to an in-memory database system. The database shows up to 18.1 times better performance compared to the original version.

References

[1]
[1] N. Shavit and D. Touitou. Software Transactional Memory. PODC’ 97.
[2]
[2] Jaeho Kim, Ajit Mathew, Sanidhya Kashyap, Madhava Krishnan Ramanathan, and Changwoo Min. 2019. MV-RLU: Scaling Read-Log-Update with Multi-Versioning. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 779–792.
[3]
[3] Alexander Matveev, Nir Shavit, Pascal Felber, and Patrick Marlier. 2015. Read-log-update: a lightweight synchronization mechanism for concurrent programming. In Proceedings of the 25th ACM Symposium on Operating Systems Principles. ACM, 168–183.
[4]
[4] Paul E McKenney and John D Slingwine. 1998. Read-copy update: Using execution history to solve concurrency problems. In Parallel and Distributed Computing and Systems. 509–518.
[5]
[5] Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, and Marcos K. Aguilera. 2017. Black-box Concurrent Data Structures for NUMA Architectures. In Proceedings of the 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, Xi’an, China, 207–221.
[6]
[6] Sepideh Roghanchi, Jakob Eriksson, and Nilanjana Basu. 2017. Ffwd: Delegation is (Much) Faster Than You Think. In Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP). ACM, Shanghai, China, 342–358.
[7]
[7] Seongjae Park, Paul E. McKenney, Laurent Dufour, Heon Y. Yeom. 2020. An HTM-based update-side synchronization for RCU on NUMA systems. In Proceedings of the 15th European Conference on Computer Systems (EuroSys).
[8]
[8] Sepideh Roghanchi, Jakob Eriksson, and Nilanjana Basu. 2017. Ffwd: Delegation is (Much) Faster Than You Think. In Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP). ACM, Shanghai, China, 342–358.
[9]
[9] C. Cascaval, C. Blundell, M. Michael, H. W. Cain, P. Wu, S. Chiras, and S. Chatterjee. Software Transactional Memory: Why Is It Only a Research Toy? ACM Queue ’08.
[10]
[10] M. Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems (TOPLAS), 13:124– 149, 1991.
[11]
[11] M. Herlihy. A methodology for implementing highly concurrent data objects. ACM Transactions on Programming Languages and Systems (TOPLAS), 15:745–770, 1993.
[12]
[12] L. Lamport. Specifying concurrent program modules. ACM Transactions on Programming Languages and Systems (TOPLAS), 5:190–222, 1983.
[13]
[13] M. M. Michael. High performance dynamic lock-free hash tables and list-based sets. In Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, pages 73–82. ACM, 2002.
[14]
[14] C. Purcell and T. Harris. Non-blocking hashtables with open addressing. In International Symposium on Distributed Computing, pages 108–121. Springer, 2005.
[15]
[15] H. Sundell and P. Tsigas. Fast and lock-free concurrent priority queues for multi-thread systems. In Parallel and Distributed Processing Symposium, 2003. Proceedings. International, pages 11–pp. IEEE, 2003.
[16]
[16] J. D. Valois. Lock-free data structures. 1996.
[17]
[17] T. Brown, A. Kogan, Y. Lev, and V. Luchangco. Investigating the performance of hardware transactions on a multi-socket machine. In ACM Symposium on Parallelism in Algorithms and Architectures, pages 121–132, July 2016.
[18]
[18] M. Herlihy and J. E. B. Moss. Transactional memory: Architectural support for lock-free data structures. ACM SIGARCH Computer Architecture News, 21(2):289–300, May 1993.
[19]
[19] M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2008.
[20]
[20] Paul E. McKenney and Aravinda Prasad. 2015. Some more details on Read-Log-Update. (2015). https://lwn.net/Articles/667720/.
[21]
[21] Panagiota Fatourou and Nikolaos D. Kallimanis. 2014. Highly-Efficient Wait-Free Synchronization. Theory Comput. Syst. 55, 3 (2014), 475–520. https://doi.org/10.1007/s00224-013-9491-y
[22]
[22] Maurice Herlihy. 1992. A Methodology for Implementing Highly Concurrent Data Objects (Abstract). Operating Systems Review 26, 2 (1992), 12. https://doi.org/10.1145/142111.964613
[23]
[23] Maurice Herlihy. 1991. Wait-Free Synchronization. ACM Trans. Program. Lang. Syst. 13, 1 (1991), 124–149. https://doi.org/10.1145/114005. 102808
[24]
[24] S. Boyd-Wickizer, M. F. Kaashoek, R. Morris, and N. Zeldovich. OpLog: a library for scaling update-heavy data structures. Technical Report TR-2014-019, MIT CSAIL, Sept. 2014.
[25]
[25] Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Ake Larson, Pravin Mittal, Ryan Stonecipher, Nitin Verma, and Mike Zwilling. 2013. Hekaton: SQL Server’s Memory-optimized OLTP Engine. In Proceedings of the 2013 ACM SIGMOD/PODS Conference. ACM, New York, USA, 1243–1254.
[26]
[26] Andreia Correia, Pedro Ramalhete, and Pascal Felber. 2020. A Wait-Free Universal Construct for Large Objects. In Proceedings of the 25rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’20).
[27]
[27] Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. 2010. Flat Combining and the Synchronization-parallelism Tradeoff. In Proceedings of the ACM symposium on Parallelism in algorithms and architectures (SPAA). ACM, Thira, Santorini, Greece, 355–364.
[28]
[28] Andreia Correia and Pedro Ramalhete. 2018. Strong Trylocks for Reader-Writer Locks. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’18). Association for Computing Machinery, New York, NY, USA, 387–388. https://doi.org/10.1145/3178487.3178519
[29]
[29] Irina Calciu, Dave Dice, Yossi Lev, Victor Luchangco, Virendra J. Marathe, and Nir Shavit. 2013. NUMA-Aware Reader-Writer Locks. PPoPP 2013 (2013).
[30]
[30]J.-P. Lozi, F. David, G. Thomas, J. Lawall, and G. Muller. Fast and Portable Locking for Multicore Architectures. ACM Trans. Comput. Syst., 33(4):13:1–13:62, Jan. 2016.
[31]
[31] Rachid Guerraoui and Vasileios Trigonakis. 2016. Optimistic Concurrency with OPTIK. In Proceedings of the 21st ACM Symposium on Principles and Practice of Parallel Programming (PPoPP). ACM, Barcelona, Spain, 18:1–18:12.
[32]
[32] Maurice Herlihy and Jeannette M. Wing. 1990. Linearizability: A Correctness Condition for Concurrent Objects. ACM Trans. Program. Lang. Syst. 12, 3 (1990), 463–492. https://doi.org/10.1145/78969.78972
[33]
[33] T. David, R. Guerraoui, and V. Trigonakis. Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask. SOSP ’13.
[34]
[34] FAL Labs. 2011. Kyoto Cabinet: a straightforward implementation of DBM. http://fallabs.com/kyotocabinet/.
[35]
[35] Dave Dice, Alex Kogan, Yossi Lev, Timothy Merrifield, and Mark Moir. 2014. Adaptive integration of hardware and software lock elision techniques. In Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures. ACM, 188–197.
[36]
[36] Mingzhe Zhang, Haibo Chen, Luwei Cheng, Francis CM Lau, and Cho-Li Wang. 2017. Scalable Adaptive NUMA-Aware Lock. IEEE Transactions on Parallel and Distributed Systems 28, 6 (2017), 1754-1769.
[37]
[37] Dmitry Vyukov. Distributed Reader-Writer Mutex. http://www.1024cores.net/home/lock-free-algorithms/ reader-writer-problem/distributed-reader-writer-mutex.
[38]
[38] M. Balakrishnan, D. Malkhi, J. P. Davis, V. Prabhakaran, M. Wei, and T. Wobber. CORFU: A distributed shared log. ACM Transactions on Computer Systems, 31(4), Dec. 2013.
[39]
[39] D. Molka, D. Hackenberg, R. Schöne, and W. E. Nagel. Cache Coherence Protocol and Memory Performance of the Intel Haswell-EP Architecture. In Proceedings of the 44th International Conference on Parallel Processing, ICPP ’ 15, pages 739–748, Beijing, China, 2015.

Cited By

View all
  • (2022)Scalable NUMA-aware persistent B+-tree for non-volatile memory devicesCluster Computing10.1007/s10586-022-03766-126:5(2865-2881)Online publication date: 17-Nov-2022

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
August 2021
927 pages
ISBN:9781450390682
DOI:10.1145/3472456
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 October 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. NUMA multicore
  2. concurrent data structure
  3. synchronization
  4. universal construction

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2021

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)36
  • Downloads (Last 6 weeks)4
Reflects downloads up to 23 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Scalable NUMA-aware persistent B+-tree for non-volatile memory devicesCluster Computing10.1007/s10586-022-03766-126:5(2865-2881)Online publication date: 17-Nov-2022

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media