Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

A stealing mechanism for delegation methods

Published: 01 October 2021 Publication History

Abstract

Modern multi-core architectures exhibit non-uniform memory access (NUMA) behavior, where access by a core to data cached locally on a NUMA node is much faster than access to data cached on a remote node. Prior work has shown that on the NUMA multi-core system, delegation is the highly efficient solution to address the generally poor performance of locking methods on highly contended locks due to delegating the execution of critical section to one thread, reducing the movement of shared data between cores. However, we observe that delegation methods exhibit sub-par single-thread performance due mainly to the overheads of the communication between the server and client threads and request array traversal. To address this problem, this paper proposes a stealing mechanism that under no contention the clients can directly enter the critical section by enabling lock stealing. Meanwhile, under contention it employs delegation protocol by disabling lock stealing. Furthermore, we apply stealing mechanism to the state-of-the-art delegation methods: FFWD and RCL. The evaluation shows that delegation augmented with lock stealing can significantly improve the performance in the non-contended case, while matching the performance of their original counterparts in the contended circumstances.

References

[1]
Anderson TE The performance of spin lock alternatives for shared memory multiprocessors IEEE Trans Parallel Distrib Syst 1990 1 1 6-16
[2]
Baumann A, Barham P, Dagand PE, Tyshasta H, Rebecca I, Simon P, Timothy R, Adrian S, Akhilesh S (2009) The multi-kernel: a new OS architecture for scalable multicore systems. The ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP 09), pp. 29-44
[3]
Baumann A, Peter S, Adrian S, Akhilesh S, Timothy R, Paul B, Rebecca I (2009) Your computer is already a distributed system. Why Isn’t Your OS? Workshop on Workstation Operating Systems /workshop on Hot Topics in Operating Systems (HotOS 09), pp. 12-12
[4]
Bienia C (2011) Benchmarking modern multiprocessors. Ph.D. Dissertation. Princeton University, Princeton, NJ
[5]
Boyd-Wickizer S, Clements AT, Mao Y, Pesterev A, Kaashoek MF, Morris RT, Zeldovich N (2010) An analysis of Linux scalability to many cores. In: Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association, Vancouver, Canada, pp. 1-16
[6]
Boyd-Wickizer S, Kaashoek MF, Morris R, Zeldovich N (2012) Non-scalable locks are dangerous. In: Proceedings of the Linux Symposium. Ottawa, Canada
[7]
Chabbi M, Fagan M, Mellor-Crummey J (2015) High performance locks for multi-level NUMA systems. In: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pp. 215-226
[8]
Chabbi M, Mellor-Crummey J (2016) Contention-conscious, locality-preserving locks. In: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)
[9]
Craig T (1993) Building FIFO and priority queuing spin locks from atomic swap, Univ. Washington, Seattle, WA, USA, Tech. Rep. 93-02-02
[10]
Dave Dice, Virendra J. Marathe, and Nir Shavit. Flat-combining NUMA locks. In Proceedings of the Twenty-third Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 65-74, 2011
[11]
David T, Guerraoui R, Trigonakis V (2013) Everything you always wanted to know about synchronization but were afraid to ask. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles(SOSP 13), pp. 33-48
[12]
Dice D (2017) Malthusian locks. In: Proceedings of the Twelfth European Conference on Computer Systems (EuroSys), pp. 314-327
[13]
Dice D, Marathe VJ, and Shavit N Lock cohorting: a general technique for designing NUMA locks ACM Trans Parallel Comput 2015 1 2 13:1-13:42
[14]
Dice D, Kogan A (2019) Compact NUMA-aware Locks. In: Proceedings of the Fourteenth EuroSys Conference 2019 (EuroSys’19). ACM, New York, NY, USA, Article 12, 15 pages
[15]
Dogan H, Hijaz F, Ahmad M, Kahne B, Wilson P, Khan O (2017). Accelerating graph and machine learning workloads using a shared memory multicore architecture with auxiliary support for in-hardware explicit messaging. In Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International. IEEE, pp. 254-264
[16]
Eyerman S, Eeckhout L (2010) Modeling critical sections in Amdahl’s law and its implications for multicore design. In: Proceedings of the Annual International Symposium on Computer Architecture (ISCA), pp. 362-370
[17]
Fatourou P, Kallimanis ND (2012) Revisiting the combining synchronization technique. In: Proceedings of the 17th ACM Symposium on Principles and Practice of Parallel Programming (PPOPP), pp. 257-266, New Orleans, LA
[18]
Guerraoui R, Guiroux H, Lachaize R, Quma V, and Trigonakis V Lock-unlock: Is that all? a pragmatic analysis of locking in software systems ACM Trans Comput Syst 2019 36 1 1:1-1:149
[19]
Guerraoui R, Trigonakis V (2016) Optimistic concurrency with OPTIK. In: Proceedings of the 21st ACM Symposium on Principles and Practice of Parallel Programming (PPoPP). ACM, Barcelona, Spain, pp. 18:1-18:12
[20]
Guiroux H, Lachaize R, Quéma V (2016) Multicore locks: the case is not closed yet. In: Proceedings of the USENIX Annual Technical Conference (ATC), pp. 649-662
[21]
Hackenberg D, Molka D, Nagel W (2009) Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems. ACM International Symposium on Microarchitecture(MICRO 09), pp. 413- 422
[22]
Hendler D, Incze I, Shavit N, Tzafrir M (2010) Flat combining and the synchronization-parallelism tradeoff. In: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures, pp. 355-364. ACM
[23]
He B, Scherer WN, Scott ML (2005) Preemption Adaptivity in Time-published Queue-based Spin Locks. In: Proceedings of the 12th International Conference on High Performance Computing (HiPC’05) (2005), Springer-Verlag
[24]
Kashyap S, Calciu I, Cheng X, Min C, Kim T (2019) Scalable and practical locking with shuffling. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP), pp. 586-599
[25]
Kashyap S, Min C, Kim T (2017) Scalable NUMA-aware Blocking Synchronization Primitives. In: Proceedings of the (2017) USENIX Annual Technical Conference (ATC). USENIX Association, Santa Clara, CA
[26]
Kim J, Mathew A, Kashyap S, Ramanathan MK, Min C (2019) MV-RLU: scaling read-log-update with multi-versioning. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp. 779-792
[27]
Klaftenegger D, Sagonas K, and Winblad K Queue delegation locking IEEE Trans Parallel Distrib Syst 2018 29 3 687-704
[28]
Liu T and Berger ED Sheriff: precise detection and automatic mitigation of false sharing ACM Sigplan Notices 2011 46 3-18
[29]
Lozi JP, David F, Thomas G, Lawall J, Muller G (2016) Fast and portable locking for multicore architectures. ACM Trans. Comput. Syst., 33(4):13:1-13:62
[30]
Lozi JP, David F, Thomas G, Lawall JL, Muller G et al. (2012) Remote core locking: Migrating critical-section execution to improve the performance of multithreaded applications. In: USENIX Annual Technical Conference, pp. 65-76
[31]
Magnusson PS, Landin A, Hagersten E (1994) Queue locks on cache coherent multiprocessors. In: Proceedings of the 8th International Parallel Processing Symposium, pages pp. 165-171
[32]
Mellor-Crummey JM and Scott ML Algorithms for scalable synchronization on shared-memory multiprocessors ACM Trans Comput Syst 1991
[35]
Nanavati M, Spear M, Taylor N, Rajagopalan S, Meyer DT, Aiello W, Warfield A (2013) Whose cache line is it anyway?: operating system support for live detection and repair of false sharing. In Proceedings of the 8th ACM European Conference on Computer Systems, pp. 141-154. ACM, 2013
[36]
Oyama Y, Taura K, Yonezawa A (1999) Executing parallel programs with synchronization bottlenecks efficiently, In: Proc. Int. Workshop Parallel Distrib. Comput. Symbolic Irregular Appl., pp. 182-204
[37]
Radovic Z, Hagersten E (2003) Hierarchical backoff locks for nonuniform communication architectures. In International Symposium on High- Performance Computer Architecture (HPCA), pp. 241-252
[38]
Ranger C, Raghuraman R, Penmetsa A, Bradski G, Kozyrakis C (2007) Evaluating mapreduce for multi-core and multiprocessor systems. In High Performance Computer Architecture. HPCA (2007) IEEE 13th International Symposium on, pp. 13-24. Ieee, 2007
[39]
Roghanchi S, Eriksson J, Basu N (2017) ffwd: delegation is (much) faster than you think. In: Proceedings of the 26th Symposium on Operating Systems Principles. ACM, pp. 342-358
[41]
Tang X, Zhai J, Qian X, Chen W (2019) pLock: a fast lock for architectures with explicit inter-core message passing. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp. 765-778
[42]
Victor Luchangco, Dan Nussbaum, and Nir Shavit. A hierarchical CLH queue lock. In Proceedings of the 12th International Conference on Parallel Processing (EuroPar), pages 801-810, 2006
[43]
Zhang M, Chen H, Cheng L, Lau FCM, and Wang CL Scalable Adaptive NUMA-Aware Lock IEEE Trans Parallel Distrib Syst 2017 28 6 1754-1769

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The Journal of Supercomputing
The Journal of Supercomputing  Volume 77, Issue 10
Oct 2021
1466 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 October 2021
Accepted: 25 February 2021

Author Tags

  1. Locks
  2. Delegation
  3. Synchronization
  4. Mutual exclusion
  5. NUMA multi-core

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 0
    Total Downloads
  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 02 Oct 2024

Other Metrics

Citations

View Options

View options

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media