research-article

Decoupling contention management from scheduling

Authors:

F. Ryan Johnson,

Anastasia Ailamaki,

Todd C. MowryAuthors Info & Claims

ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems

Pages 117 - 128

https://doi.org/10.1145/1736020.1736035

Published: 13 March 2010 Publication History

Abstract

Many parallel applications exhibit unpredictable communication between threads, leading to contention for shared objects. The choice of contention management strategy impacts strongly the performance and scalability of these applications: spinning provides maximum performance but wastes significant processor resources, while blocking-based approaches conserve processor resources but introduce high overheads on the critical path of computation. Under situations of high or changing load, the operating system complicates matters further with arbitrary scheduling decisions which often preempt lock holders, leading to long serialization delays until the preempted thread resumes execution.

We observe that contention management is orthogonal to the problems of scheduling and load management and propose to decouple them so each may be solved independently and effectively. To this end, we propose a load control mechanism which manages the number of active threads in the system separately from any contention which may exist. By isolating contention management from damaging interactions with the OS scheduler, we combine the efficiency of spinning with the robustness of blocking. The proposed load control mechanism results in stable, high performance for both lightly and heavily loaded systems, requires no special privileges or modifications at the OS level, and can be implemented as a library which benefits existing code.

References

[1]

A. Agarwal and M. Cherian. Adaptive backoff synchronization techniques. In Proc. ISCA (1989), pp. 396--406.

Digital Library

[2]

T. Anderson, B. Bershad, E. Lazowska, and H. Levy. Scheduler activations: effective kernel support for the user-level management of parallelism. In ACM Transactions on Computer Systems (TOCS) 10,2 (Feb 1992), pp. 53--79.

Digital Library

[3]

N. Bartolini, G. Bongiovanni, Simone Silvestri. Self-* through self-learning: Overload control for distributed web systems. In International Journal of Computer and Telecommunications Networking 53,5 (Apr 2009), pp. 727-743.

Digital Library

[4]

C. Bienia, S. Kumar, J. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In Proc. PACT (Oct 2008). Source package available at http://parsec.cs.princeton.edu.

Digital Library

[5]

M. Blasgen, J. Gray, M. Mitoma, and T. Price. The convoy phenomenon. ACM SIGOPS Operating Systems Review 13,2 (1979) pp. 20--25.

Digital Library

[6]

L. Boguslavsky, K. Harzallah, A. Kreinen, K. Sevcik, A. Vainshtein. Optimal strategies for spinning and blocking. Journal of Parallel and Distributed Computing 21,2 (May 1994), pp. 246-254.

Digital Library

[7]

B. Cantrill, M. Shapiro, and A. Leventhal. 2004. Dynamic instrumentation of production systems. In Proc. Usenix Annual Technical Conference, 2004.

Digital Library

[8]

M. Carey, S. Krishnamurthi, and M. Livny. Load control for locking: the "half-and-half" approach. In Proc. Symposium on Principles of Database Systems (PODS) (Apr 1990).

Digital Library

[9]

J. Carlstrom and R. Rom. Application aware admission control and scheduling in web servers. In Proc. IEEE INFOCOM (2002).

[10]

D. Dice and N. Shavit. What really makes transactions fast? In Proc. ACM SIGPLAN Workshop on Languages, Compilers, and Hardware Support for Transactional Compouting (Transact) (Jun 2006).

[11]

A. Dragojevi, R. Guerraoui, and M. Kapalka. Dividing Transactional Memories by Zero. In Proc. Transact (2008, Salt Lake City, UT).

[12]

H. Franke, R. Russell, M. K. Fuss. Futexes and furwocks: Fast userlevel locking in linux. In Proc. 2002 Ottawa Linux Summit (2002).

[13]

G. Franklin, J. Powell, and A. Emami-Naeini. Feedback control of dynamic systems, 4th edition. Prentice Hall, NJ, USA.

Digital Library

[14]

A. Gupta, A. Tucker, and S. Urushibara. The impact of operating scheduling policies and synchronization methods on the performance of parallel applications. In Proc. ACM SIGMETRICS Conference on Measuring and Modeling Computer Systems (May 1991).

Digital Library

[15]

B. He, W. N. Scherer III, and M. L. Scott. Preemption adaptivity in time-published queue-based spin locks. In Proc. High Performance Computing (HiPC) (2005).

Digital Library

[16]

M. Herlihy. Wait-free synchronization. ACM Trans. on Programming Languages and Systems (TOPLAS) 13,1 (Jan 1991), pp. 124--149.

Digital Library

[17]

M. Herlihy and J. Moss. Transactional memory: architectural support for lock-free data structures. ACM SIGARCH Computer Architecture News 21, 2 (May 1993), pp. 289--300.

Digital Library

[18]

IBM. Telecom Application Transaction Processing (TATP) Benchmark Description. Available online at http://tatpbenchmark.sourceforge.net/TATP_Description.pdf.

[19]

R. Johnson, M. Athannassoulis, R. Stoica, and A. Ailamaki. A new look at the roles of spinning and blocking. In Proc. ACM SIGMOD Workshop on Data Management on New Hardware (DaMoN) (Jul 2009, Providence, RI).

Digital Library

[20]

R. Johnson, I. Pandis, A. Ailamaki, and B. Falsafi. Shore-MT: a scalable storage manager for the multicore era. In Proc EDBT'09 (Mar 2009, St.Petersburg). Source code and benchmark kit available at http://diaswww.epfl.ch/shore-mt/

Digital Library

[21]

R. Kalman. A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME -- Journal of Basic Engineering 82,D (1960), pp. 35--45.

[22]

P. Magnussen, A. Landin, and E. Hagersten. Queue locks on cache coherent multiprocessors. In Proc. International Symposium on Parallel Processing (Apr. 1994), pp. 165-171.

Digital Library

[23]

J. Mauro and R. McDougall. Solaris Internals: Core Kernel Components. Sun Microsystems Press (2001).

Digital Library

[24]

J. M. Mellor-Crummey, M. L. Scott, Algorithms for scalable synchronization on shared-memory multiprocessors, ACM TOCS 9,1 (Feb 1991), p.21--65.

Digital Library

[25]

A. Monkeberg, G. Weikum. Performance evaluation of an adaptive and robust load control method for the avoidance of data contention thrashing. In Proc. Very Large Databases (VLDB) (1992).

Digital Library

[26]

Nokia. Network Database Benchmark. Specification and reference implementation available online at http://hoslab.cs.helsinki.fi/homepages/ndbbenchmark/

[27]

J. K. Ousterhout. Scheduling techniques for concurrent systems. In Proc. Conf. on Dist. Computing Systems (1982).

[28]

I. Pandis, R. Johnson, N. Hardavellas, and A. Ailamaki. Data-Oriented Transaction Execution. To appear, Proc. of the VLDB Endowment 3,1 (Aug 2010).

Digital Library

[29]

D. P. Reed and R. K. Kanodia. Synchronization with Eventcounts and Sequencers. Communications of the ACM, 22(2):115--23 (Feb. 1979).

Digital Library

[30]

N. Shavit and D. Touitou. Software transactional memory. In Proc ACM Symposium on Principles of Distributed Computing (PODC) (Aug 1995), pp. 204--213.

Digital Library

[31]

Transaction Processing Council (TPC). TPC benchmark C (OLTP) standard specification, revision 5.9. Available online at http://www.tpc.org/tpcc/spec/tpcc_current.pdf.

[32]

M. Welsh, D. Culler, and E. Brewer. SEDA: an architecture for well-conditioned, scalable internet services. In Proc. Symposium on operating systems principles (SOSP) (Dec 2001).

Digital Library

[33]

S. Woo, M. Ohara, E. Torrie, J. Singh, and A. Gupta. The SPLASH-2 programs: characterization and methodological considerations. In Proc ISCA (Jun 1995), pp. 24-38

Digital Library

Cited By

Huang HRao JWu SJin HJiang HChe HWu XLaure EMarkidis SVerbanescu ALofstead G(2021)Towards Exploiting CPU Elasticity via Efficient Thread OversubscriptionProceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing10.1145/3431379.3460641(215-226)Online publication date: 21-Jun-2021
https://dl.acm.org/doi/10.1145/3431379.3460641
Adamus-Matuszyńska APolok G(2020)Decoupling in the perspective of responsibility11th International Scientific Conference “Business and Management 2020”10.3846/bm.2020.588Online publication date: 2020
https://doi.org/10.3846/bm.2020.588
Guerraoui RGuiroux HLachaize RQuéma VTrigonakis V(2019)Lock–UnlockACM Transactions on Computer Systems10.1145/330150136:1(1-149)Online publication date: 14-Mar-2019
https://dl.acm.org/doi/10.1145/3301501
Show More Cited By

Index Terms

Decoupling contention management from scheduling

Recommendations

Decoupling contention management from scheduling
ASPLOS '10

Many parallel applications exhibit unpredictable communication between threads, leading to contention for shared objects. The choice of contention management strategy impacts strongly the performance and scalability of these applications: spinning ...
Decoupling contention management from scheduling
ASPLOS '10

Many parallel applications exhibit unpredictable communication between threads, leading to contention for shared objects. The choice of contention management strategy impacts strongly the performance and scalability of these applications: spinning ...
Malthusian Locks
EuroSys '17: Proceedings of the Twelfth European Conference on Computer Systems

Applications running in modern multithreaded environments are sometimes overthreaded. The excess threads do not improve performance, and in fact may act to degrade performance via scalability collapse, which can manifest even when there are fewer ready ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems

March 2010

422 pages

ISBN:9781605588391

DOI:10.1145/1736020

General Chair:
James C. Hoe
Carnegie Mellon University, USA
,
Program Chair:
Vikram S. Adve
University of Illinois at Urbana-Champaign, USA

ACM SIGARCH Computer Architecture News Volume 38, Issue 1
ASPLOS '10
March 2010
399 pages
ISSN:0163-5964
DOI:10.1145/1735970
Issue’s Table of Contents
ACM SIGPLAN Notices Volume 45, Issue 3
ASPLOS '10
March 2010
399 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1735971
Issue’s Table of Contents

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 March 2010

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

ASPLOS '10

Sponsor:

ASPLOS '10: Architectural Support for Programming Languages and Operating Systems

March 13 - 17, 2010

Pennsylvania, Pittsburgh, USA

Acceptance Rates

ASPLOS XV Paper Acceptance Rate 32 of 181 submissions, 18%;

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

44
Total Citations
View Citations
1,227
Total Downloads

Downloads (Last 12 months)38
Downloads (Last 6 weeks)6

Reflects downloads up to 24 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Huang HRao JWu SJin HJiang HChe HWu XLaure EMarkidis SVerbanescu ALofstead G(2021)Towards Exploiting CPU Elasticity via Efficient Thread OversubscriptionProceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing10.1145/3431379.3460641(215-226)Online publication date: 21-Jun-2021
https://dl.acm.org/doi/10.1145/3431379.3460641
Adamus-Matuszyńska APolok G(2020)Decoupling in the perspective of responsibility11th International Scientific Conference “Business and Management 2020”10.3846/bm.2020.588Online publication date: 2020
https://doi.org/10.3846/bm.2020.588
Guerraoui RGuiroux HLachaize RQuéma VTrigonakis V(2019)Lock–UnlockACM Transactions on Computer Systems10.1145/330150136:1(1-149)Online publication date: 14-Mar-2019
https://dl.acm.org/doi/10.1145/3301501
Yi ZChen FYao Y(2019)A barrier optimization framework for NUMA multi‐core systemConcurrency and Computation: Practice and Experience10.1002/cpe.552732:5Online publication date: 21-Oct-2019
https://doi.org/10.1002/cpe.5527
Sriraman AWenisch TArpaci-Dusseau AVoelker G(2018)µtuneProceedings of the 13th USENIX conference on Operating Systems Design and Implementation10.5555/3291168.3291182(177-194)Online publication date: 8-Oct-2018
https://dl.acm.org/doi/10.5555/3291168.3291182
Dice D(2017)Malthusian LocksProceedings of the Twelfth European Conference on Computer Systems10.1145/3064176.3064203(314-327)Online publication date: 23-Apr-2017
https://dl.acm.org/doi/10.1145/3064176.3064203
Shan JDing XGehani N(2017)APPLES: Efficiently Handling Spin-lock Synchronization on Virtualized PlatformsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2016.262524928:7(1811-1824)Online publication date: 1-Jul-2017
https://doi.org/10.1109/TPDS.2016.2625249
Antić JChatzopoulos GGuerraoui RTrigonakis V(2016)Locking Made EasyProceedings of the 17th International Middleware Conference10.1145/2988336.2988357(1-14)Online publication date: 28-Nov-2016
https://dl.acm.org/doi/10.1145/2988336.2988357
Orosa LBruguera JAntelo E(2016)Asymmetric Allocation in a Shared Flexible Signature Module for Multicore ProcessorsThe Computer Journal10.1093/comjnl/bxw01059:10(1453-1469)Online publication date: 17-Mar-2016
https://doi.org/10.1093/comjnl/bxw010
Wu SPeng YJin H(2016)Time Donating Barrier for efficient task scheduling in competitive multicore systemsFuture Generation Computer Systems10.1016/j.future.2015.04.00554:C(469-477)Online publication date: 1-Jan-2016
https://dl.acm.org/doi/10.1016/j.future.2015.04.005
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents