DOI: 10.1145/1810085.1810113

An approach to resource-aware co-scheduling for CMPs

Published: 02 June 2010

Abstract

We develop real-time scheduling techniques that improve performance and energy efficiency for multiprogrammed workloads whose programs scale non-uniformly with increasing thread counts. Multithreaded programs generally deliver higher throughput than single-threaded programs on chip multiprocessors (CMPs), but the gains from adding threads diminish when threads contend for shared resources. We use analytic metrics to derive local search heuristics for creating efficient schedules for multiprogrammed, multithreaded workloads. Programs are allocated fewer cores than they request and are scheduled to space-share the CMP to improve global throughput. Our holistic approach attempts to co-schedule programs that complement each other with respect to shared resource consumption. We find that co-scheduling applications for performance and energy in a resource-aware manner achieves better results than solely targeting total throughput or concurrently co-scheduling all programs. Our schedulers improve the overall energy-delay product (E*D) by a factor of 1.5 over time-multiplexed gang scheduling.
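
The abstract does not spell out the heuristics themselves, so the following is only a minimal sketch of what a swap-based local search over co-schedules could look like. Everything in it is an assumption made for illustration: the per-program `demand` values (a hypothetical percentage of some shared resource such as memory bandwidth), the `capacity` limit, the overload-penalty cost function, and the pairwise-swap neighborhood. It is not the authors' scheduler or their analytic metrics.

```python
from itertools import combinations


def group_cost(group, demand, capacity):
    """Hypothetical penalty: how far a co-scheduled group exceeds shared capacity."""
    total = sum(demand[p] for p in group)
    return max(0, total - capacity)


def schedule_cost(groups, demand, capacity):
    """Total penalty over all groups that space-share the chip in turn."""
    return sum(group_cost(g, demand, capacity) for g in groups)


def local_search(groups, demand, capacity):
    """Swap programs between groups for as long as a swap lowers the total cost.

    Each accepted swap strictly reduces the cost, so the loop terminates.
    The result is a local optimum, not necessarily the best schedule.
    """
    groups = [list(g) for g in groups]  # work on a copy
    improved = True
    while improved:
        improved = False
        for a, b in combinations(range(len(groups)), 2):
            for i in range(len(groups[a])):
                for j in range(len(groups[b])):
                    before = schedule_cost(groups, demand, capacity)
                    groups[a][i], groups[b][j] = groups[b][j], groups[a][i]
                    if schedule_cost(groups, demand, capacity) < before:
                        improved = True
                    else:
                        # The swap did not help, so undo it.
                        groups[a][i], groups[b][j] = groups[b][j], groups[a][i]
    return groups


# Hypothetical demands, in percent of a shared resource (assumed values).
demand = {"A": 90, "B": 80, "C": 10, "D": 20}
start = [["A", "B"], ["C", "D"]]  # two resource-hungry programs run together
result = local_search(start, demand, capacity=100)
print(result, schedule_cost(result, demand, 100))
```

In this toy setup the starting schedule pairs the two most demanding programs, and the search moves toward groups that mix a demanding program with a light one, which is the "complementary resource consumption" idea in miniature. A real scheduler would need measured metrics (for example from performance counters) in place of the fixed `demand` table.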

Cited By

  • (2024) A Novel Priority Based Scheduler for Asymmetric Multi-core Edge Computing. Current Trends in Web Engineering, pages 7-18. DOI: 10.1007/978-3-031-50385-6_1. Online publication date: 4-Jan-2024
  • (2023) Resource scheduling techniques in cloud from a view of coordination: a holistic survey. Frontiers of Information Technology & Electronic Engineering, 24:1, pages 1-40. DOI: 10.1631/FITEE.2100298. Online publication date: 23-Jan-2023
  • (2023) Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach. 2023 IEEE International Conference on Cluster Computing (CLUSTER), pages 185-196. DOI: 10.1109/CLUSTER52292.2023.00023. Online publication date: 31-Oct-2023

Information & Contributors

Information

Published In

ICS '10: Proceedings of the 24th ACM International Conference on Supercomputing
June 2010
365 pages
ISBN: 9781450300186
DOI: 10.1145/1810085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CMP
  2. energy efficiency
  3. performance
  4. scheduling

Qualifiers

  • Research-article

Conference

ICS'10
ICS'10: International Conference on Supercomputing
June 2 - 4, 2010
Tsukuba, Ibaraki, Japan

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
