Characterizing the resource-sharing levels in the UltraSPARC T2 processor

Published: 12 December 2009

Abstract

Thread-level parallelism (TLP) has become a popular way to improve processor performance, overcoming the limits of extracting instruction-level parallelism. Each TLP paradigm, such as Simultaneous Multithreading or Chip-Multiprocessors, provides different benefits, which has motivated processor vendors to combine several TLP paradigms in each chip design. Even though most of these combined-TLP designs are homogeneous, they present different levels of hardware resource sharing, which complicates operating system scheduling and load balancing.
Commonly, processor designs provide two levels of resource sharing: InterCore, in which only the highest levels of the cache hierarchy are shared, and IntraCore, in which most of the hardware resources of the core are shared. Recently, Sun Microsystems released the UltraSPARC T2, a processor with three levels of hardware resource sharing: InterCore, IntraCore, and IntraPipe. In this work, we provide the first characterization of a three-level resource-sharing processor, the UltraSPARC T2, and we show how multi-level resource sharing affects operating system design. We further identify the most critical hardware resources in the T2 and the characteristics of applications that are not sensitive to resource sharing. Finally, we present a case study in which we run a real multithreaded network application, showing that a resource-sharing-aware scheduler can improve system throughput by up to 55%.
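
For illustration only, the three sharing levels named in the abstract can be sketched as a function of where two hardware strands sit on the chip. This is a minimal Python sketch and is not taken from the paper. It assumes the publicly documented T2 organization of 8 cores, 2 pipelines per core, and 4 strands per pipeline, and it assumes a linear strand numbering (strands 0 to 3 in pipeline 0 of core 0, strands 4 to 7 in pipeline 1 of core 0, and so on). The names `locate` and `sharing_level` are hypothetical.

```python
from enum import Enum

# Assumed T2 organization: 8 cores x 2 pipelines x 4 strands = 64 strands.
CORES = 8
PIPES_PER_CORE = 2
STRANDS_PER_PIPE = 4
STRANDS_PER_CORE = PIPES_PER_CORE * STRANDS_PER_PIPE


class Sharing(Enum):
    INTER_CORE = "InterCore"  # different cores
    INTRA_CORE = "IntraCore"  # same core, different pipelines
    INTRA_PIPE = "IntraPipe"  # same core, same pipeline


def locate(strand: int) -> tuple[int, int]:
    """Map a strand id to (core, pipe) under the assumed linear numbering."""
    if not 0 <= strand < CORES * STRANDS_PER_CORE:
        raise ValueError(f"strand id out of range: {strand}")
    core = strand // STRANDS_PER_CORE
    pipe = (strand % STRANDS_PER_CORE) // STRANDS_PER_PIPE
    return core, pipe


def sharing_level(a: int, b: int) -> Sharing:
    """Classify the sharing level between two distinct strands."""
    if a == b:
        raise ValueError("a strand does not share resources with itself")
    core_a, pipe_a = locate(a)
    core_b, pipe_b = locate(b)
    if core_a != core_b:
        return Sharing.INTER_CORE
    if pipe_a != pipe_b:
        return Sharing.INTRA_CORE
    return Sharing.INTRA_PIPE
```

A scheduler that is aware of resource sharing could use a classification like this when deciding which software threads to co-locate, for example by keeping threads that contend for pipeline resources on different pipelines or cores.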

Published In

MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
December 2009
601 pages
ISBN: 9781605587981
DOI: 10.1145/1669112

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CMP
  2. CMT
  3. Sun Niagara T2
  4. job scheduling
  5. performance characterization
  6. simultaneous multithreading

Qualifiers

  • Research-article

Conference

Micro-42

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
