Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1454115.1454146acmconferencesArticle/Chapter ViewAbstractPublication PagespactConference Proceedingsconference-collections
research-article

Analysis and approximation of optimal co-scheduling on chip multiprocessors

Published: 25 October 2008 Publication History

Abstract

Cache sharing among processors is important for Chip Multiprocessors to reduce inter-thread latency, but also brings cache contention, degrading program performance considerably. Recent studies have shown that job co-scheduling can effectively alleviate the contention, but it remains an open question how to efficiently find optimal co-schedules. Solving the question is critical for determining the potential of a co-scheduling system. This paper presents a theoretical analysis of the complexity of co-scheduling, proving its NP-completeness. Furthermore, for a special case when there are two sharers per chip, we propose an algorithm that finds the optimal co-schedules in polynomial time. For more complex cases, we design and evaluate a sequence of approximation algorithms, among which, the hierarchical matching algorithm produces near-optimal schedules and shows good scalability. This study facilitates the evaluation of co-scheduling systems, as well as offers some techniques directly usable in proactive job co-scheduling.

References

[1]
J. R. Bulpin and I. A. Pratt. Hyper-threading aware process scheduling heuristics. In 2005 USENIX Annual Technical Conference, pages 103--106, 2005.
[2]
D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA), 2005.
[3]
W. Cook and A. Rohe. Computing minimum-weight perfect matchings. INFORMS Journal on Computing, 11:138--148, 1999.
[4]
P. Denning. Thrashing: Its causes and prevention. In Proceedings of the AFIPS 1968 Fall Joint Computer Conference, volume 33, pages 915--922, 1968.
[5]
M. DeVuyst, R. Kumar, and D. M. Tullsen. Exploiting unbalanced thread scheduling for energy and performance on a cmp of smt processors. In Proceedings of International Parallel and Distribute Processing Symposium (IPDPS), 2006.
[6]
J. Edmonds. Maximum matching and a polyhedron with 0,1-vertices. Journal of Research of the National Bureau of Standards B, 69B:125--130, 1965.
[7]
A. El-Moursy, R. Garg, D. H. Albonesi, and S. Dwarkadas. Compatible phase co-scheduling on a cmp of multi-threaded processors. In Proceedings of International Parallel and Distribute Processing Symposium (IPDPS), 2006.
[8]
A. Fedorova, M. Seltzer, C. Small, and D. Nussbaum. Performance of multithreaded chip multiprocessors and implications for operating system design. In USENIX Annual Technical Conference, 2005.
[9]
A. Fedorova, M. Seltzer, and M. D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2007.
[10]
H. Gabow and R. E. Tarjan. Faster scaling algorithms for general graph-matching problems. Journal of ACM, 38:815--853, 1991.
[11]
M. Garey and D. Johnson. Computers and Intractability. Feeman, San Francisco, CA, 1979.
[12]
L. R. Hsu, S. K. Reinhardt, R. Lyer, and S. Makineni. Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2006.
[13]
J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S. Keckler. A nuca substrate for flexible cmp cache sharing. In Proceedings of International Conference on Supercomputing, pages 31--40, 2005.
[14]
Y. Jiang and X. Shen. Exploration of the influence of program inputs on cmp co-scheduling. In European Conference on Parallel Computing (Euro-Par), August 2008.
[15]
R. Karp. Reducibility among combinatiorial problems. In R. Miller and J. Thatcher, editors, Complexity of Computer Computations, pages 85--103. Plenum Press, 1972.
[16]
S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2004.
[17]
R. Kumar, D. M. Tullsen, and N. P. Jouppi. Core architecture optimization for heterogeneous chip multiprocessors. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2006.
[18]
J. McCalpin. Memory bandwidth and machine balance in current high performance computers. IEEE TCCA Newsletter, 1995. http://www.cs.virginia.edu/stream.
[19]
P. Nagpurkar, M. Hind, C. Krintz, P. F. Sweeney, and V. Rajan. Online phase detection algorithms. In Proceedings of the International Symposium on Code Generation and Optimization, March 2006.
[20]
Nakijima and Pallipadi. Enhancements for hyperthreading technology in the operating system -- seeking the optimal scheduling. In Proceedings of USENIX Annual Technical Conference, 2002.
[21]
S. Parekh, S. Eggers, H. Levy, and J. Lo. Thread-sensitive scheduling for smt processors. Technical Report 2000-04-02, University of Washington, June 2000.
[22]
N. Rafique, W. Lim, and M. Thottethodi. Architectural support for operating system-driven cmp cache management. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2006.
[23]
A. Settle, J. L. Kihm, A. Janiszewski, and D. A. Connors. Architectural support for enhanced smt job scheduling. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, pages 63--73, 2004.
[24]
X. Shen and J. Shaw. Scalable implementation of efficient locality approximation. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing, 2008.
[25]
X. Shen, J. Shaw, B. Meeker, and C. Ding. Locality approximation using time. In Proceedings of the ACM SIGPLAN Conference on Principles of Programming Languages (POPL), 2007.
[26]
X. Shen, Y. Zhong, and C. Ding. Locality phase prediction. In Proceedings of the Eleventh International Conference on Architect ural Support for Programming Languages and Operating Systems (ASPLOS XI), Boston, MA, 2004.
[27]
T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically characterizing large scale program behavior. In Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, October 2002.
[28]
A. Snavely and D. Tullsen. Symbiotic jobscheduling for a simultaneous multithreading processor. In Proceedings of ASPLOS, 2000.
[29]
A. Snavely, D. Tullsen, and G. Voelker. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor. In Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, 2002.
[30]
H. Stone, J. Turek, and J. Wolf. Optimal partitioning of cache memory. IEEE Transactions on Computers, 41(9), 1992.
[31]
G. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2002.
[32]
N. Tuck and D. M. Tullsen. Initial observations of the simultaneous multithreading Pentium 4 processor. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques, New Orleans, Louisiana, September 2003.
[33]
X. Zhang, S. Dwarkadas, G. Folkmanis, and K. Shen. Processor hardware counter statistics as a first-class system resource. In Proceedings of the 11th Workshop on Hot Topics in Operating Systems, 2007.
[34]
Y. Zhong and W. Chang. Sampling-based program locality approximation. In Proceedings of the International Symposium on Memory Management, 2008.

Cited By

View all
  • (2023)Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach2023 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/CLUSTER52292.2023.00023(185-196)Online publication date: 31-Oct-2023
  • (2023)Efficient thread‐to‐core mapping alternatives for application‐level redundant multithreadingConcurrency and Computation: Practice and Experience10.1002/cpe.762235:24Online publication date: 18-Jan-2023
  • (2022)Scorpius: Proactive Code Preparation to Accelerate Function Startup2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS)10.1109/IWQoS54832.2022.9812868(1-10)Online publication date: 10-Jun-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques
October 2008
328 pages
ISBN:9781605582825
DOI:10.1145/1454115
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 October 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. CMP scheduling
  2. cache contention
  3. co-scheduling
  4. perfect matching

Qualifiers

  • Research-article

Conference

PACT '08
Sponsor:

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)26
  • Downloads (Last 6 weeks)2
Reflects downloads up to 19 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach2023 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/CLUSTER52292.2023.00023(185-196)Online publication date: 31-Oct-2023
  • (2023)Efficient thread‐to‐core mapping alternatives for application‐level redundant multithreadingConcurrency and Computation: Practice and Experience10.1002/cpe.762235:24Online publication date: 18-Jan-2023
  • (2022)Scorpius: Proactive Code Preparation to Accelerate Function Startup2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS)10.1109/IWQoS54832.2022.9812868(1-10)Online publication date: 10-Jun-2022
  • (2021)Octans: Optimal Placement of Service Function Chains in Many-Core SystemsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.306361332:9(2202-2215)Online publication date: 1-Sep-2021
  • (2021)Threads Scheduling and Load Balancing with Loop Iteration in Multicore Processors: a Case Study with OpenMP2021 3rd International Conference on Sustainable Technologies for Industry 4.0 (STI)10.1109/STI53101.2021.9732563(1-6)Online publication date: 18-Dec-2021
  • (2021)Interference-aware execution framework with Co-scheML on GPU clustersCluster Computing10.1007/s10586-021-03299-z26:5(2577-2589)Online publication date: 18-May-2021
  • (2021)Multi-core Processor Scheduling with Respect to Data Bus BandwidthAdvances in Optimization and Applications10.1007/978-3-030-65739-0_5(55-69)Online publication date: 18-Jan-2021
  • (2021)Mitigating execution unit contention in parallel applications using instruction‐aware mappingConcurrency and Computation: Practice and Experience10.1002/cpe.681935:17Online publication date: 30-Dec-2021
  • (2020)Data mining and image analysis using genetic programmingACM SIGAPP Applied Computing Review10.1145/3381307.338131119:4(40-49)Online publication date: 28-Jan-2020
  • (2020)Open-source tools and benchmarks for code-clone detectionACM SIGAPP Applied Computing Review10.1145/3381307.338131019:4(28-39)Online publication date: 28-Jan-2020
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media