A Universal Construction to implement Concurrent Data Structure for NUMA-multicore

Authors:

Kai Chen

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing

Article No.: 74, Pages 1 - 11

https://doi.org/10.1145/3472456.3472475

Published: 05 October 2021

Abstract

Universal constructions are attractive because they can turn a sequential implementation of any data structure into a concurrent one. However, existing universal constructions have limitations, such as high copying overhead or poor scalability on NUMA systems, mainly due to their lack of NUMA-aware design principles. To overcome these limitations, this paper introduces CR, a universal construction that provides highly scalable updates on NUMA systems while offering fast read-side performance. CR achieves NUMA-awareness by using delegation within a NUMA node and a global shared log to keep the replicas of a data structure consistent across nodes. Using CR does not require expertise in concurrent data structure design. Our evaluation shows that CR performs up to 11.2 times better than CX, a state-of-the-art universal construction, on our tested sequential data structures. To demonstrate the effectiveness and applicability of CR, we have applied it to an in-memory database system, which shows up to 18.1 times better performance than the original version.
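The abstract names two ingredients: per-node replicas of a sequential data structure, and a global shared log that keeps those replicas consistent. The sketch below illustrates only that general idea and is not CR's actual algorithm, which the abstract does not describe. The names `SharedLog` and `Replica` are hypothetical, and a plain per-replica lock stands in for node-local delegation.

```python
import threading


class SharedLog:
    """Append-only log of update operations, shared by every replica (hypothetical)."""

    def __init__(self):
        self._entries = []
        self._lock = threading.Lock()

    def append(self, op):
        # Appending fixes the global order in which all replicas apply updates.
        with self._lock:
            self._entries.append(op)

    def read_from(self, start):
        # Return a snapshot of the entries at positions >= start.
        with self._lock:
            return self._entries[start:]


class Replica:
    """One copy of a sequential object, conceptually one per NUMA node (hypothetical)."""

    def __init__(self, factory, log):
        self._obj = factory()           # unmodified sequential data structure
        self._log = log
        self._applied = 0               # number of log entries already replayed
        self._lock = threading.Lock()   # assumption: stands in for node-local delegation

    def _catch_up(self):
        # Replay, in log order, every update this replica has not yet applied.
        for name, args in self._log.read_from(self._applied):
            getattr(self._obj, name)(*args)
            self._applied += 1

    def update(self, name, *args):
        # Publish the operation globally, then bring the local copy up to date.
        with self._lock:
            self._log.append((name, args))
            self._catch_up()

    def read(self, fn):
        # Catch up first so the read observes all previously logged updates.
        with self._lock:
            self._catch_up()
            return fn(self._obj)


log = SharedLog()
node0 = Replica(dict, log)
node1 = Replica(dict, log)
node0.update("__setitem__", "a", 1)   # logged by node 0
node1.update("__setitem__", "b", 2)   # logged by node 1
print(node0.read(lambda d: sorted(d.items())))
```

Because every replica replays the same log in the same order, the two dictionaries converge even though each update was issued on a different replica. A real NUMA-aware design would additionally pin threads and memory to nodes and batch requests through a per-node combiner or server thread, which this sketch leaves out.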

Published In

ICPP '21: Proceedings of the 50th International Conference on Parallel Processing

August 2021

927 pages

ISBN: 9781450390682

DOI: 10.1145/3472456

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPP 2021: 50th International Conference on Parallel Processing

August 9 - 12, 2021

Lemont, IL, USA

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
