research-article

Characterizing the resource-sharing levels in the UltraSPARC T2 processor

Authors:

Vladimir Čakarević,

Petar Radojković,

Francisco J. Cazorla,

Mario Nemirovsky,

Mateo Valero

MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

Pages 481 - 492

https://doi.org/10.1145/1669112.1669173

Published: 12 December 2009

Abstract

Thread-level parallelism (TLP) has become a popular way to improve processor performance, overcoming the limits of extracting instruction-level parallelism. Each TLP paradigm, such as Simultaneous Multithreading or Chip-Multiprocessors, provides different benefits, which has motivated processor vendors to combine several TLP paradigms in each chip design. Even though most of these combined-TLP designs are homogeneous, they present different levels of hardware resource sharing, which complicates operating system scheduling and load balancing.

Commonly, processor designs provide two levels of resource sharing: Inter-core, in which only the highest levels of the cache hierarchy are shared, and Intra-core, in which most of the hardware resources of the core are shared. Recently, Sun Microsystems released the UltraSPARC T2, a processor with three levels of hardware resource sharing: InterCore, IntraCore, and IntraPipe. In this work, we provide the first characterization of a three-level resource-sharing processor, the UltraSPARC T2, and we show how multi-level resource sharing affects operating system design. We further identify the most critical hardware resources in the T2 and the characteristics of applications that are not sensitive to resource sharing. Finally, we present a case study in which we run a real multithreaded network application, showing that a resource-sharing-aware scheduler can improve system throughput by up to 55%.
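
The three sharing levels named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper. It relies on the publicly documented T2 topology (8 cores, 2 hardware pipelines per core, 4 strands per pipeline) and on an assumed linear numbering of strands from 0 to 63; the helper names `locate` and `sharing_level` are hypothetical, and a real operating system may number hardware contexts differently.

```python
# UltraSPARC T2 topology: 8 cores, 2 hardware pipelines per core,
# 4 strands (hardware contexts) per pipeline, 64 strands in total.
CORES = 8
PIPES_PER_CORE = 2
STRANDS_PER_PIPE = 4
STRANDS_PER_CORE = PIPES_PER_CORE * STRANDS_PER_PIPE


def locate(strand_id):
    """Return (core, pipe) for a strand.

    ASSUMPTION: strands are numbered linearly, pipeline by pipeline and
    core by core. This numbering is illustrative, not from the paper.
    """
    if not 0 <= strand_id < CORES * STRANDS_PER_CORE:
        raise ValueError(f"strand id out of range: {strand_id}")
    core = strand_id // STRANDS_PER_CORE
    pipe = (strand_id % STRANDS_PER_CORE) // STRANDS_PER_PIPE
    return core, pipe


def sharing_level(a, b):
    """Name the closest sharing level between two distinct strands."""
    core_a, pipe_a = locate(a)
    core_b, pipe_b = locate(b)
    if core_a != core_b:
        return "InterCore"   # different cores: only chip-level resources shared
    if pipe_a != pipe_b:
        return "IntraCore"   # same core, different pipelines
    return "IntraPipe"       # same pipeline: the tightest level of sharing


if __name__ == "__main__":
    for pair in [(0, 1), (0, 4), (0, 8)]:
        print(pair, sharing_level(*pair))
```

A resource-sharing-aware scheduler of the kind the abstract describes could use such a classification to decide which threads to co-locate on one pipeline and which to spread across cores; the placement policy itself is outside this sketch.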

Information & Contributors

Information

Published In

MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

December 2009

601 pages

ISBN:9781605587981

DOI:10.1145/1669112

General Chairs: David Albonesi (Cornell), Margaret Martonosi (Princeton)
Program Chairs: David August (Princeton/Parakinetics), José Martínez (Cornell)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS TG u-Arch

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Ministerio de Economía y Competitividad

Conference

Micro-42

Sponsor:

SIGMICRO

Micro-42: The 42nd Annual IEEE/ACM International Symposium on Microarchitecture

December 12 - 16, 2009

New York, New York

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
