DOI: 10.1145/1272996.1273004

Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors

Published: 21 March 2007

Abstract

The major chip manufacturers have all introduced chip multiprocessing (CMP) and simultaneous multithreading (SMT) technology into their processing units. As a result, even low-end computing systems and game consoles have become shared memory multiprocessors with L1 and L2 cache sharing within a chip. Mid- and large-scale systems will have multiple processing chips and hence consist of an SMP-CMP-SMT configuration with non-uniform data sharing overheads. Current operating system schedulers are not aware of these new cache organizations, and as a result, distribute threads across processors in a way that causes many unnecessary, long-latency cross-chip cache accesses.
In this paper we describe the design and implementation of a scheme to schedule threads based on sharing patterns detected online using features of standard performance monitoring units (PMUs) available in today's processing units. The primary advantage of using the PMU infrastructure is that it is fine-grained (down to the cache line) and has relatively low overhead. We have implemented our scheme in Linux running on an 8-way Power5 SMP-CMP-SMT multi-processor. For commercial multithreaded server workloads (VolanoMark, SPECjbb, and RUBiS), we are able to demonstrate reductions in cross-chip cache accesses of up to 70%. These reductions lead to application-reported performance improvements of up to 7%.
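The abstract does not spell out how detected sharing patterns become placement decisions, so the following is only a minimal sketch of one way the idea could work, not the authors' implementation. It assumes each thread has a hypothetical "sharing signature" (a map from a sampled memory region to the number of remote cache accesses observed there, as PMU sampling might produce), groups threads whose signatures overlap, and then assigns each group to a single chip. The function names, the dot-product similarity, the greedy clustering, and the threshold are all illustrative assumptions.

```python
from collections import Counter


def similarity(sig_a, sig_b):
    """Dot product of two sparse signatures (region -> sample count)."""
    if len(sig_a) > len(sig_b):
        sig_a, sig_b = sig_b, sig_a  # iterate over the smaller map
    return sum(count * sig_b.get(region, 0) for region, count in sig_a.items())


def cluster_threads(signatures, threshold):
    """Greedy one-pass clustering (an assumed heuristic).

    Each thread joins the existing cluster it is most similar to, provided
    the similarity reaches `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # list of (aggregate signature, [thread ids])
    for tid in sorted(signatures):
        sig = signatures[tid]
        best, best_score = None, None
        for cluster in clusters:
            score = similarity(sig, cluster[0])
            if score >= threshold and (best is None or score > best_score):
                best, best_score = cluster, score
        if best is None:
            clusters.append((Counter(sig), [tid]))
        else:
            best[0].update(sig)  # Counter.update adds the counts
            best[1].append(tid)
    return [members for _, members in clusters]


def place_clusters(clusters, num_chips):
    """Put each cluster on the least-loaded chip, largest cluster first."""
    load = [0] * num_chips
    placement = {}
    for members in sorted(clusters, key=len, reverse=True):
        chip = load.index(min(load))
        for tid in members:
            placement[tid] = chip
        load[chip] += len(members)
    return placement


# Hypothetical samples: threads 1 and 2 touch the same regions, as do 3 and 4.
samples = {
    1: {0x100: 5, 0x140: 3},
    2: {0x100: 4, 0x140: 1},
    3: {0x900: 6},
    4: {0x900: 2, 0x940: 7},
}
groups = cluster_threads(samples, threshold=1)
placement = place_clusters(groups, num_chips=2)
```

In this sketch, threads that touch the same regions end up on the same chip, so their shared lines can be served from the on-chip cache rather than across chips. A real scheduler would also have to balance load within a chip and re-sample as sharing patterns change.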

Cited By

  • (2023) SLITS: Sparsity-Lightened Intelligent Thread Scheduling. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 7(1):1-23. DOI: 10.1145/3579436. Online publication date: 2-Mar-2023.
  • (2023) NUBA: Non-Uniform Bandwidth GPUs. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pages 544-559. DOI: 10.1145/3575693.3575745. Online publication date: 27-Jan-2023.
  • (2023) Selective Data Migration Between Locality Groups in NUMA Systems. Economics of Grids, Clouds, Systems, and Services, pages 143-147. DOI: 10.1007/978-3-031-29315-3_13. Online publication date: 31-Mar-2023.

Published In

EuroSys '07: Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
March 2007
431 pages
ISBN: 9781595936363
DOI: 10.1145/1272996

ACM SIGOPS Operating Systems Review, Volume 41, Issue 3
EuroSys '07 Conference Proceedings
June 2007
386 pages
ISSN: 0163-5980
DOI: 10.1145/1272998

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. CMP
2. SMP
3. SMT
4. affinity scheduling
5. cache behavior
6. cache locality
7. detecting sharing
8. hardware performance counters
9. hardware performance monitors
10. multithreading
11. performance monitoring unit
12. resource allocation
13. shared caches
14. sharing
15. simultaneous multithreading
16. single-chip multiprocessors
17. thread migration
18. thread placement
19. thread scheduling

Qualifiers

• Article

Conference

EuroSys07: EuroSys 2007 Conference
March 21 - 23, 2007
Lisbon, Portugal

Acceptance Rates

Overall Acceptance Rate: 241 of 1,308 submissions, 18%
