Article

Remote core locking: migrating critical-section execution to improve the performance of multithreaded applications

Authors:

Jean-Pierre Lozi,

Gilles MullerAuthors Info & Claims

USENIX ATC'12: Proceedings of the 2012 USENIX conference on Annual Technical Conference

Page 6

Published: 13 June 2012 Publication History

Abstract

The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. In this paper, we propose a new lock algorithm, Remote Core Locking (RCL), that aims to improve the performance of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server core. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the core acquiring the lock because such data can typically remain in the server core's cache.

We have developed a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX locks into RCL locks. We have evaluated our approach on 18 applications: Memcached, Berkeley DB, the 9 applications of the SPLASH-2 benchmark suite and the 7 applications of the Phoenix2 benchmark suite. 10 of these applications, including Memcached and Berkeley DB, are unable to scale because of locks, and benefit from RCL. Using RCL locks, we get performance improvements of up to 2.6 times with respect to POSIX locks on Memcached, and up to 14 times with respect to Berkeley DB.

References

[1]

J. L. Abellán, J. Fernández, and M. E. Acacio. GLocks: Efficient support for highly-contended locks in many-core CMPs. In 25th IPDPS, 2011.

[2]

A. Agarwal and M. Cherian. Adaptive backoff synchronization techniques. In ISCA'89, pages 396-406, 1989.

[3]

D. F. Bacon, R. Konuru, C. Murthy, and M. Serrano. Thin locks: featherweight synchronization for Java. In PLDI'98, pages 258-268, 1998.

[4]

A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schuepbach, and A. Singhania. The multikernel: a new OS architecture for scalable multicore systems. In SOSP'09, pages 29-44, 2009.

[5]

L. Boguslavsky, K. Harzallah, A. Kreinen, K. Sevcik, and A. Vainshtein. Optimal strategies for spinning and blocking. Journal of Parallel and Distributed Computing, 1993.

[6]

A. Brodsky, F. Ellen, and P. Woelfel. Fully-adaptive algorithms for long-lived renaming. In DISC '06, pages 413-427, 2006.

[7]

E. M. Chaves Jr., P. Das, T. J. LeBlanc, B. D. Marsh, and M. L. Scott. Kernel-kernel communication in a shared-memory multiprocessor. Concurrency - Practice and Experience, 5(3):171-191, 1993.

[8]

T. S. Craig. Building FIFO and priority-queueing spin locks from atomic swap. Technical Report TR 93-02-02, Department of Computer Science, University of Washington, Feb. 2003.

[9]

U. Drepper and I. Molnar. Native POSIX thread library for Linux. Technical report, RedHat, 2003.

[10]

M. Fowler. Refactoring: Improving the Design of Existing Code. Addison Wesley, 1999.

[11]

B. Gamsa, O. Krieger, J. Appavoo, and M. Stumm. Tornado: Maximizing locality and concurrency in a shared memory multiprocessor operating system. In OSDI'99, pages 87-100.

[12]

B. He, W. N. Scherer III, and M. L. Scott. Preemption adaptivity in time-published queue-based spin locks. In 11th International Conference on High Performance Computing, 2005.

[13]

D. Hendler, I. Incze, N. Shavit, and M. Tzafrir. Flat combining and the synchronization-parallelism tradeoff. In SPAA'10, pages 355-364, 2010.

[14]

M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Elsevier, 2008.

[15]

F. R. Johnson, R. Stoica, A. Ailamaki, and T. C. Mowry. Decoupling contention management from scheduling. In ASPLOS'10, pages 117-128, 2010.

[16]

E. Ladan-Mozes and N. Shavit. An optimistic approach to lock-free FIFO queues. Distributed Computing, 20(5):323-341, 2008.

[17]

J. M. Mellor-Crummey and M. L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM TOCS, 9(1):21-65, 1991.

[18]

J. M. Mellor-Crummey and M. L. Scott. Synchronization without contention. In ASPLOS, pages 269-278. ACM, 1991.

[19]

E. B. Nightingale, O. Hodson, R. McIlroy, C. Hawblitzel, and G. Hunt. Helios: heterogeneous multiprocessing with satellite kernels. In SOSP'09, pages 221-234, 2009.

[20]

J. K. Ousterhout. Scheduling techniques for concurrent systems. In ICDCS '82, pages 22-30, 1982.

[21]

Y. Padioleau, J. Lawall, R. R. Hansen, and G. Muller. Documenting and automating collateral evolutions in Linux device drivers. In EuroSys, pages 247-260, 2008.

[22]

I. Pandis, R. Johnson, N. Hardavellas, and A. Ailamaki. Data-oriented transaction execution. Proceedings of the VLDB Endowment (PVLDB), 3(1):928-939, 2010.

[23]

A. Roy, S. Hand, and T. Harris. Exploring the limits of disjoint access parallelism. In HotPar'09, pages 8-8, 2009.

[24]

M. L. Scott and W. N. Scherer. Scalable queue-based spin locks with timeout. In PPoPP'01, pages 44-52, 2001.

[25]

O. Shalev and N. Shavit. Split-ordered lists: Lock-free extensible hash tables. JACM, 53(3):379-405, May 2006.

[26]

S. Sridharan, B. Keck, R. Murphy, S. Chandra, and P. Kogge. Thread migration to improve synchronization performance. In In Workshop on Operating System Interference in High Performance Applications, 2006.

[27]

M. A. Suleman, O. Mutlu, M. K. Qureshi, and Y. N. Patt. Accelerating critical section execution with asymmetric multi-core architectures. In ASPLOS, pages 253-264, 2009.

[28]

S. B. Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang. Corey: an operating system for many cores. In OSDI'08.

[29]

W. Xiong, S. Park, J. Zhang, Y. Zhou, and Z. Ma. Ad hoc synchronization considered harmful. In OSDI'10, pages 1-8, 2010.

[30]

K. Yotov, K. Pingali, and P. Stodghill. Automatic measurement of memory hierarchy parameters. In SIGMETRICS '05, pages 181-192, 2005.

Cited By

Yi ZYao YChen KGunawi HMa X(2021)FTSDProceedings of the 12th ACM SIGOPS Asia-Pacific Workshop on Systems10.1145/3476886.3477518(123-130)Online publication date: 24-Aug-2021
https://dl.acm.org/doi/10.1145/3476886.3477518
Devadas VCurtis-Maury M(2020)Scalable Coordination of Hierarchical ParallelismProceedings of the 49th International Conference on Parallel Processing10.1145/3404397.3404398(1-11)Online publication date: 17-Aug-2020
https://dl.acm.org/doi/10.1145/3404397.3404398
Chabbi MAmer ALiu X(2020)Efficient Abortable-locking Protocol for Multi-level NUMA SystemsACM Transactions on Parallel Computing10.1145/33997287:3(1-32)Online publication date: 10-Jul-2020
https://dl.acm.org/doi/10.1145/3399728
Show More Cited By

Remote core locking: migrating critical-section execution to improve the performance of multithreaded applications
1. General and reference
  1. Cross-computing tools and techniques
2. Social and professional topics
  1. Professional topics
    1. Computing profession

Recommendations

Locking Made Easy
Middleware '16: Proceedings of the 17th International Middleware Conference

A priori, locking seems easy: To protect shared data from concurrent accesses, it is sufficient to lock before accessing the data and unlock after. Nevertheless, making locking efficient requires fine-tuning (a) the granularity of locks and (b) the ...
Pre-analysis locking: a safe and deadlock free locking policy
VLDB '85: Proceedings of the 11th international conference on Very Large Data Bases - Volume 11

A safe and deadlock free lock policy is introduced, called pre-analysis locking. Pre-analysis locking is based on an efficient geometric algorithm which inserts lock and unlock operations into the transactions. Pre-analysis locking is the first safe and ...
Locking Timestamps versus Locking Objects
PODC '18: Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing

We present multiversion timestamp locking (MVTL), a new genre of multiversion concurrency control algorithms for serializable transactions. The key idea behind MVTL is simple: lock individual timestamps instead of locking objects. After presenting a ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Guide Proceedings

USENIX ATC'12: Proceedings of the 2012 USENIX conference on Annual Technical Conference

June 2012

41 pages

Publisher

USENIX Association

United States

Publication History

Published: 13 June 2012

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

54
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 23 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Yi ZYao YChen KGunawi HMa X(2021)FTSDProceedings of the 12th ACM SIGOPS Asia-Pacific Workshop on Systems10.1145/3476886.3477518(123-130)Online publication date: 24-Aug-2021
https://dl.acm.org/doi/10.1145/3476886.3477518
Devadas VCurtis-Maury M(2020)Scalable Coordination of Hierarchical ParallelismProceedings of the 49th International Conference on Parallel Processing10.1145/3404397.3404398(1-11)Online publication date: 17-Aug-2020
https://dl.acm.org/doi/10.1145/3404397.3404398
Chabbi MAmer ALiu X(2020)Efficient Abortable-locking Protocol for Multi-level NUMA SystemsACM Transactions on Parallel Computing10.1145/33997287:3(1-32)Online publication date: 10-Jul-2020
https://dl.acm.org/doi/10.1145/3399728
Patel YYang LArulraj LArpaci-Dusseau AArpaci-Dusseau RSwift MBilas AMagoutis KMarkatos EKostic DSeltzer M(2020)Avoiding scheduler subversion using scheduler-cooperative locksProceedings of the Fifteenth European Conference on Computer Systems10.1145/3342195.3387521(1-17)Online publication date: 15-Apr-2020
https://dl.acm.org/doi/10.1145/3342195.3387521
Guerraoui RGuiroux HLachaize RQuéma VTrigonakis V(2019)Lock–UnlockACM Transactions on Computer Systems10.1145/330150136:1(1-149)Online publication date: 14-Mar-2019
https://dl.acm.org/doi/10.1145/3301501
Dogan HAhmad MKahne BKhan O(2019)Accelerating Synchronization Using Moving Compute to Data Model at 1,000-core Multicore ScaleACM Transactions on Architecture and Code Optimization10.1145/330020816:1(1-27)Online publication date: 13-Feb-2019
https://dl.acm.org/doi/10.1145/3300208
Wang KFussell DLin CBahar IHerlihy MWitchel ELebeck A(2019)Fast Fine-Grained Global Synchronization on GPUsProceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3297858.3304055(793-806)Online publication date: 4-Apr-2019
https://dl.acm.org/doi/10.1145/3297858.3304055
Kim JMathew AKashyap SRamanathan MMin CBahar IHerlihy MWitchel ELebeck A(2019)MV-RLUProceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3297858.3304040(779-792)Online publication date: 4-Apr-2019
https://dl.acm.org/doi/10.1145/3297858.3304040
Tang XZhai JQian XChen WBahar IHerlihy MWitchel ELebeck A(2019)pLockProceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3297858.3304030(765-778)Online publication date: 4-Apr-2019
https://dl.acm.org/doi/10.1145/3297858.3304030
Reif SSchröder-Preikschat W(2018)A predictable synchronisation algorithmACM SIGPLAN Notices10.1145/3200691.317853353:1(415-416)Online publication date: 10-Feb-2018
https://dl.acm.org/doi/10.1145/3200691.3178533
Show More Cited By

View Options

View options

Media

Figures

Other

Tables

View Table of Contents