Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1736020.1736035acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

Decoupling contention management from scheduling

Published: 13 March 2010 Publication History

Abstract

Many parallel applications exhibit unpredictable communication between threads, leading to contention for shared objects. The choice of contention management strategy impacts strongly the performance and scalability of these applications: spinning provides maximum performance but wastes significant processor resources, while blocking-based approaches conserve processor resources but introduce high overheads on the critical path of computation. Under situations of high or changing load, the operating system complicates matters further with arbitrary scheduling decisions which often preempt lock holders, leading to long serialization delays until the preempted thread resumes execution.
We observe that contention management is orthogonal to the problems of scheduling and load management and propose to decouple them so each may be solved independently and effectively. To this end, we propose a load control mechanism which manages the number of active threads in the system separately from any contention which may exist. By isolating contention management from damaging interactions with the OS scheduler, we combine the efficiency of spinning with the robustness of blocking. The proposed load control mechanism results in stable, high performance for both lightly and heavily loaded systems, requires no special privileges or modifications at the OS level, and can be implemented as a library which benefits existing code.

References

[1]
A. Agarwal and M. Cherian. Adaptive backoff synchronization techniques. In Proc. ISCA (1989), pp. 396--406.
[2]
T. Anderson, B. Bershad, E. Lazowska, and H. Levy. Scheduler activations: effective kernel support for the user-level management of parallelism. In ACM Transactions on Computer Systems (TOCS) 10,2 (Feb 1992), pp. 53--79.
[3]
N. Bartolini, G. Bongiovanni, Simone Silvestri. Self-* through self-learning: Overload control for distributed web systems. In International Journal of Computer and Telecommunications Networking 53,5 (Apr 2009), pp. 727-743.
[4]
C. Bienia, S. Kumar, J. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In Proc. PACT (Oct 2008). Source package available at http://parsec.cs.princeton.edu.
[5]
M. Blasgen, J. Gray, M. Mitoma, and T. Price. The convoy phenomenon. ACM SIGOPS Operating Systems Review 13,2 (1979) pp. 20--25.
[6]
L. Boguslavsky, K. Harzallah, A. Kreinen, K. Sevcik, A. Vainshtein. Optimal strategies for spinning and blocking. Journal of Parallel and Distributed Computing 21,2 (May 1994), pp. 246-254.
[7]
B. Cantrill, M. Shapiro, and A. Leventhal. 2004. Dynamic instrumentation of production systems. In Proc. Usenix Annual Technical Conference, 2004.
[8]
M. Carey, S. Krishnamurthi, and M. Livny. Load control for locking: the "half-and-half" approach. In Proc. Symposium on Principles of Database Systems (PODS) (Apr 1990).
[9]
J. Carlstrom and R. Rom. Application aware admission control and scheduling in web servers. In Proc. IEEE INFOCOM (2002).
[10]
D. Dice and N. Shavit. What really makes transactions fast? In Proc. ACM SIGPLAN Workshop on Languages, Compilers, and Hardware Support for Transactional Compouting (Transact) (Jun 2006).
[11]
A. Dragojevi, R. Guerraoui, and M. Kapalka. Dividing Transactional Memories by Zero. In Proc. Transact (2008, Salt Lake City, UT).
[12]
H. Franke, R. Russell, M. K. Fuss. Futexes and furwocks: Fast userlevel locking in linux. In Proc. 2002 Ottawa Linux Summit (2002).
[13]
G. Franklin, J. Powell, and A. Emami-Naeini. Feedback control of dynamic systems, 4th edition. Prentice Hall, NJ, USA.
[14]
A. Gupta, A. Tucker, and S. Urushibara. The impact of operating scheduling policies and synchronization methods on the performance of parallel applications. In Proc. ACM SIGMETRICS Conference on Measuring and Modeling Computer Systems (May 1991).
[15]
B. He, W. N. Scherer III, and M. L. Scott. Preemption adaptivity in time-published queue-based spin locks. In Proc. High Performance Computing (HiPC) (2005).
[16]
M. Herlihy. Wait-free synchronization. ACM Trans. on Programming Languages and Systems (TOPLAS) 13,1 (Jan 1991), pp. 124--149.
[17]
M. Herlihy and J. Moss. Transactional memory: architectural support for lock-free data structures. ACM SIGARCH Computer Architecture News 21, 2 (May 1993), pp. 289--300.
[18]
IBM. Telecom Application Transaction Processing (TATP) Benchmark Description. Available online at http://tatpbenchmark.sourceforge.net/TATP_Description.pdf.
[19]
R. Johnson, M. Athannassoulis, R. Stoica, and A. Ailamaki. A new look at the roles of spinning and blocking. In Proc. ACM SIGMOD Workshop on Data Management on New Hardware (DaMoN) (Jul 2009, Providence, RI).
[20]
R. Johnson, I. Pandis, A. Ailamaki, and B. Falsafi. Shore-MT: a scalable storage manager for the multicore era. In Proc EDBT'09 (Mar 2009, St.Petersburg). Source code and benchmark kit available at http://diaswww.epfl.ch/shore-mt/
[21]
R. Kalman. A New Approach to Linear Filtering and Prediction Problems. Transactions of the ASME -- Journal of Basic Engineering 82,D (1960), pp. 35--45.
[22]
P. Magnussen, A. Landin, and E. Hagersten. Queue locks on cache coherent multiprocessors. In Proc. International Symposium on Parallel Processing (Apr. 1994), pp. 165-171.
[23]
J. Mauro and R. McDougall. Solaris Internals: Core Kernel Components. Sun Microsystems Press (2001).
[24]
J. M. Mellor-Crummey, M. L. Scott, Algorithms for scalable synchronization on shared-memory multiprocessors, ACM TOCS 9,1 (Feb 1991), p.21--65.
[25]
A. Monkeberg, G. Weikum. Performance evaluation of an adaptive and robust load control method for the avoidance of data contention thrashing. In Proc. Very Large Databases (VLDB) (1992).
[26]
Nokia. Network Database Benchmark. Specification and reference implementation available online at http://hoslab.cs.helsinki.fi/homepages/ndbbenchmark/
[27]
J. K. Ousterhout. Scheduling techniques for concurrent systems. In Proc. Conf. on Dist. Computing Systems (1982).
[28]
I. Pandis, R. Johnson, N. Hardavellas, and A. Ailamaki. Data-Oriented Transaction Execution. To appear, Proc. of the VLDB Endowment 3,1 (Aug 2010).
[29]
D. P. Reed and R. K. Kanodia. Synchronization with Eventcounts and Sequencers. Communications of the ACM, 22(2):115--23 (Feb. 1979).
[30]
N. Shavit and D. Touitou. Software transactional memory. In Proc ACM Symposium on Principles of Distributed Computing (PODC) (Aug 1995), pp. 204--213.
[31]
Transaction Processing Council (TPC). TPC benchmark C (OLTP) standard specification, revision 5.9. Available online at http://www.tpc.org/tpcc/spec/tpcc_current.pdf.
[32]
M. Welsh, D. Culler, and E. Brewer. SEDA: an architecture for well-conditioned, scalable internet services. In Proc. Symposium on operating systems principles (SOSP) (Dec 2001).
[33]
S. Woo, M. Ohara, E. Torrie, J. Singh, and A. Gupta. The SPLASH-2 programs: characterization and methodological considerations. In Proc ISCA (Jun 1995), pp. 24-38

Cited By

View all
  • (2021)Towards Exploiting CPU Elasticity via Efficient Thread OversubscriptionProceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing10.1145/3431379.3460641(215-226)Online publication date: 21-Jun-2021
  • (2020)Decoupling in the perspective of responsibility11th International Scientific Conference “Business and Management 2020”10.3846/bm.2020.588Online publication date: 2020
  • (2019)Lock–UnlockACM Transactions on Computer Systems10.1145/330150136:1(1-149)Online publication date: 14-Mar-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems
March 2010
422 pages
ISBN:9781605588391
DOI:10.1145/1736020
  • General Chair:
  • James C. Hoe,
  • Program Chair:
  • Vikram S. Adve
  • cover image ACM SIGARCH Computer Architecture News
    ACM SIGARCH Computer Architecture News  Volume 38, Issue 1
    ASPLOS '10
    March 2010
    399 pages
    ISSN:0163-5964
    DOI:10.1145/1735970
    Issue’s Table of Contents
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 45, Issue 3
    ASPLOS '10
    March 2010
    399 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1735971
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 March 2010

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. blocking
  2. concurrency control
  3. contention
  4. load management
  5. multicore
  6. scheduling
  7. spinning
  8. threads

Qualifiers

  • Research-article

Conference

ASPLOS '10

Acceptance Rates

ASPLOS XV Paper Acceptance Rate 32 of 181 submissions, 18%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)38
  • Downloads (Last 6 weeks)6
Reflects downloads up to 24 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2021)Towards Exploiting CPU Elasticity via Efficient Thread OversubscriptionProceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing10.1145/3431379.3460641(215-226)Online publication date: 21-Jun-2021
  • (2020)Decoupling in the perspective of responsibility11th International Scientific Conference “Business and Management 2020”10.3846/bm.2020.588Online publication date: 2020
  • (2019)Lock–UnlockACM Transactions on Computer Systems10.1145/330150136:1(1-149)Online publication date: 14-Mar-2019
  • (2019)A barrier optimization framework for NUMA multi‐core systemConcurrency and Computation: Practice and Experience10.1002/cpe.552732:5Online publication date: 21-Oct-2019
  • (2018)µtuneProceedings of the 13th USENIX conference on Operating Systems Design and Implementation10.5555/3291168.3291182(177-194)Online publication date: 8-Oct-2018
  • (2017)Malthusian LocksProceedings of the Twelfth European Conference on Computer Systems10.1145/3064176.3064203(314-327)Online publication date: 23-Apr-2017
  • (2017)APPLES: Efficiently Handling Spin-lock Synchronization on Virtualized PlatformsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2016.262524928:7(1811-1824)Online publication date: 1-Jul-2017
  • (2016)Locking Made EasyProceedings of the 17th International Middleware Conference10.1145/2988336.2988357(1-14)Online publication date: 28-Nov-2016
  • (2016)Asymmetric Allocation in a Shared Flexible Signature Module for Multicore ProcessorsThe Computer Journal10.1093/comjnl/bxw01059:10(1453-1469)Online publication date: 17-Mar-2016
  • (2016)Time Donating Barrier for efficient task scheduling in competitive multicore systemsFuture Generation Computer Systems10.1016/j.future.2015.04.00554:C(469-477)Online publication date: 1-Jan-2016
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media