Article

Region-based hierarchical operation partitioning for multicluster processors

Authors:

Scott MahlkeAuthors Info & Claims

PLDI '03: Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation

Pages 300 - 311

https://doi.org/10.1145/781131.781165

Published: 09 May 2003 Publication History

Abstract

Clustered architectures are a solution to the bottleneck of centralized register files in superscalar and VLIW processors. The main challenge associated with clustered architectures is compiler support to effectively partition operations across the available resources on each cluster. In this work, we present a novel technique for clustering operations based on graph partitioning methods. Our approach incorporates new methods of assigning weights to nodes and edges within the dataflow graph to guide the partitioner. Nodes are assigned weights to reflect their resource usage within a cluster, while a slack distribution method intelligently assigns weights to edges to reflect the cost of inserting moves across clusters. A multilevel graph partitioning algorithm, which globally divides a dataflow graph into multiple parts in a hierarchical manner, uses these weights to efficiently generate estimates for the quality of partitions. We found that our algorithm was able to achieve an average of 20% improvement in DSP kernels and 5% improvement in SPECint2000 for a four-cluster architecture.

References

[1]

A. Aletà, J. Codina, J. Sánchez, and A. González. Graph-partitioning based instruction scheduling for clustered processors. In Proceedings of the 34th Annual International Symposium on Microarchitecture, Dec. 2001.

Digital Library

[2]

A. Aletà, J. Codina, J. Sánchez, A. González, and D. Kaeli. Exploiting pseudo-schedules to guide data dependence graph partitioning. In Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques, Sept. 2002.

Digital Library

[3]

A. Capitanio, N. Dutt, and A. Nicolau. Partitioned register files for VLIWs: A preliminary analysis of tradeoffs. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 103--114, Dec. 1992.

Digital Library

[4]

J. Codina, J. Sánchez, and A. González. URACAM: A unified register allocation, cluster assignment and modulo scheduling approach. In Proceedings of the 34th Annual International Symposium on Microarchitecture, Dec. 2001.

[5]

G. Desoli. Instruction assignment for clustered VLIW DSP compilers: A new approach. Technical Report HPL-98-13, Hewlett-Packard Laboratories, Feb. 1998.

[6]

J. Ellis. Bulldog: A Compiler for VLIW Architectures. MIT Press, Cambridge, MA, 1985.

Digital Library

[7]

P. Faraboschi, G. Desoli, and J. Fisher. Clustered instruction-level parallel processors. Technical Report HPL-98-204, Hewlett-Packard Laboratories, Dec. 1998.

[8]

K. Farkas, P. Chow, N. Jouppi, and Z. Vranesic. The multicluster architecture: Reducing cycle time through partitioning. In Proceedings of the 30th Annual International Symposium on Microarchitecture, Dec. 1997.

Digital Library

[9]

B. Fields, R. Bodík, and M. D. Hill. Slack: Maximizing performance under technological constraints. In Proceedings of the 29th Annual International Symposium on Computer Architecture, May 2002.

Digital Library

[10]

J. Fisher. Very long instruction word architectures and the ELI-52. In Proceedings of the 10th Annual International Symposium on Computer Architecture, pages 140--150, June 13--17, 1983.

Digital Library

[11]

R. Hank, W. Hwu, and B. Rau. Region-based compilation: An introduction and motivation. In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 158--168, Nov. 1995.

Digital Library

[12]

B. Hendrickson and R. Leland. The Chaco User's Guide. Sandia National Laboratories, July 1995.

[13]

J. Hiser, S. Carr, and P. Sweany. Global register partitioning. In Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques, pages 13--23, Oct. 2000.

Digital Library

[14]

K. Kailas, K. Ebcioglu, and A. Agrawala. CARS: A new code generation framework for clustered ILP processors. In Proceeding of the 2001 International Conference on High Performance Computer Architecture, pages 133--142, Feb. 2001.

Digital Library

[15]

G. Karypis and V. Kumar. Metis: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes and Computing Fill-Reducing Orderings of Sparse Matrices. University of Minnesota, Sept. 1998.

[16]

G. Krishnamurthy, E. Granston, and E. Stotzer. Affinity-based cluster assignment for unrolled loops. In Proceedings of the 2002 International Conference on Supercomputing, pages 107--116, June 2002.

Digital Library

[17]

V. Lapinskii, M. Jacome, and G. de~Veciana. High-quality operation binding for clustered VLIW datapaths. In Proceedings of the 2001 Design Automation Conference, June 2001.

Digital Library

[18]

W. Lee, D. Puppin, S. Swenson, and S. Amarasinghe. Convergent scheduling. In Proceedings of the 35th Annual International Symposium on Microarchitecture, Nov. 2002.

Digital Library

[19]

R. Leupers. Instruction scheduling for clustered VLIW DSPs. In Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques, Oct. 2000.

Digital Library

[20]

J. Liou and M. Palis. A new heuristic for scheduling parallel programs on multiprocessor. In Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, pages 358--365, Oct. 1998.

Digital Library

[21]

P. Lowney et~al. The Multiflow Trace Scheduling compiler. The Journal of Supercomputing, 7(1-2):51--142, 1993.

Digital Library

[22]

E. Nystrom and A. E. Eichenberger. Effective cluster assignment for modulo scheduling. In Proceedings of the 31th Annual International Symposium on Microarchitecture, pages 103--114, Nov. 1998.

Digital Library

[23]

E. Özer, S. Banerjia, and T. Conte. Unified assign and schedule: A new approach to scheduling for clustered register file microarchitectures. In Proceedings of the 31th Annual International Symposium on Microarchitecture, pages 308--315, Nov. 1998.

Digital Library

[24]

B. Rau. Iterative modulo scheduling: An algorithm for software pipelining loops. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 63--74, Nov. 1994.

Digital Library

[25]

Trimaran. An infrastructure for research in ILP. http://www.trimaran.org.

[26]

T. Yang and A. Gerasoulis. DSC: Scheduling parallel tasks on an unbounded number of processors. IEEE Transactions on Parallel and Distributed Systems, 1994.

Digital Library

Cited By

Curtis-Maury MDevadas VFang VKulkarni AKeeton KRoscoe T(2016)To waffinity and beyondProceedings of the 12th USENIX conference on Operating Systems Design and Implementation10.5555/3026877.3026910(419-434)Online publication date: 2-Nov-2016
https://dl.acm.org/doi/10.5555/3026877.3026910
Tang WBrewer F(2014)A hierarchical framework to enhance scalability and performance of scheduling and mapping algorithmsProceedings of the 2014 Electronic System Level Synthesis Conference (ESLsyn)10.1109/ESLsyn.2014.6850384(1-6)Online publication date: May-2014
https://doi.org/10.1109/ESLsyn.2014.6850384
Beg MBeek P(2013)A constraint programming approach for integrated spatial and temporal scheduling for clustered architecturesACM Transactions on Embedded Computing Systems10.1145/251247013:1(1-23)Online publication date: 5-Sep-2013
https://dl.acm.org/doi/10.1145/2512470
Show More Cited By

Index Terms

Region-based hierarchical operation partitioning for multicluster processors
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Very long instruction word
    2. Serial architectures
      1. Complex instruction set computing
      2. Reduced instruction set computing
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Retargetable compilers
      2. Source code generation

Recommendations

Region-based hierarchical operation partitioning for multicluster processors

Clustered architectures are a solution to the bottleneck of centralized register files in superscalar and VLIW processors. The main challenge associated with clustered architectures is compiler support to effectively partition operations across the ...
An evaluation of speculative instruction execution on simultaneous multithreaded processors

Modern superscalar processors rely heavily on speculative execution for performance. For example, our measurements show that on a 6-issue superscalar, 93% of committed instructions for SPECINT95 are speculative. Without speculation, processor resources ...
Evaluating Register Allocation and Instruction Scheduling Techniques in Out-Of-Order Issue Processors
PACT '99: Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques

The phase ordering of register allocation and instruction scheduling in a compiler and their integration have been well studied for in-order issue and VLIW processors. In this paper we study this problem in the context of out-of-order issue processors. ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PLDI '03: Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation

June 2003

360 pages

ISBN:1581136625

DOI:10.1145/781131

General Chair:
Ron Cytron
Washington University, USA
,
Program Chair:
Rajiv Gupta
University of Arizona, USA

ACM SIGPLAN Notices Volume 38, Issue 5
May 2003
349 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/780822
Issue’s Table of Contents

Copyright © 2003 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 May 2003

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Conference

PLDI03

Sponsor:

PLDI03: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 9 - 11, 2003

California, San Diego, USA

Acceptance Rates

PLDI '03 Paper Acceptance Rate 28 of 131 submissions, 21%;

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

57
Total Citations
View Citations
773
Total Downloads

Downloads (Last 12 months)6
Downloads (Last 6 weeks)1

Reflects downloads up to 14 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Curtis-Maury MDevadas VFang VKulkarni AKeeton KRoscoe T(2016)To waffinity and beyondProceedings of the 12th USENIX conference on Operating Systems Design and Implementation10.5555/3026877.3026910(419-434)Online publication date: 2-Nov-2016
https://dl.acm.org/doi/10.5555/3026877.3026910
Tang WBrewer F(2014)A hierarchical framework to enhance scalability and performance of scheduling and mapping algorithmsProceedings of the 2014 Electronic System Level Synthesis Conference (ESLsyn)10.1109/ESLsyn.2014.6850384(1-6)Online publication date: May-2014
https://doi.org/10.1109/ESLsyn.2014.6850384
Beg MBeek P(2013)A constraint programming approach for integrated spatial and temporal scheduling for clustered architecturesACM Transactions on Embedded Computing Systems10.1145/251247013:1(1-23)Online publication date: 5-Sep-2013
https://dl.acm.org/doi/10.1145/2512470
Khudia DMahlke S(2013)Low cost control flow protection using abstract control signaturesACM SIGPLAN Notices10.1145/2499369.246556848:5(3-12)Online publication date: 20-Jun-2013
https://dl.acm.org/doi/10.1145/2499369.2465568
Khudia DMahlke SFranke BXue J(2013)Low cost control flow protection using abstract control signaturesProceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems10.1145/2491899.2465568(3-12)Online publication date: 20-Jun-2013
https://dl.acm.org/doi/10.1145/2491899.2465568
Khudia DMahlke SFranke BXue J(2013)Low cost control flow protection using abstract control signaturesProceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems10.1145/2465554.2465568(3-12)Online publication date: 20-Jun-2013
https://dl.acm.org/doi/10.1145/2465554.2465568
Park YSeo SPark HCho HMahlke S(2012)SIMD defragmenterACM SIGPLAN Notices10.1145/2248487.215101447:4(363-374)Online publication date: 3-Mar-2012
https://dl.acm.org/doi/10.1145/2248487.2151014
Park YSeo SPark HCho HMahlke S(2012)SIMD defragmenterACM SIGARCH Computer Architecture News10.1145/2189750.215101440:1(363-374)Online publication date: 3-Mar-2012
https://dl.acm.org/doi/10.1145/2189750.2151014
Park YSeo SPark HCho HMahlke SHarris TScott M(2012)SIMD defragmenterProceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems10.1145/2150976.2151014(363-374)Online publication date: 3-Mar-2012
https://dl.acm.org/doi/10.1145/2150976.2151014
Xu DZheng Q(2012)An Address-Based Compiling Optimization for FFT on Multi-cluster DSPProceedings of the 2012 Fifth International Symposium on Parallel Architectures, Algorithms and Programming10.1109/PAAP.2012.17(60-64)Online publication date: 17-Dec-2012
https://dl.acm.org/doi/10.1109/PAAP.2012.17
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents