Optimization for the Intel® Itanium® architecture register stack

Published: 23 March 2003

Abstract

The Intel® Itanium® architecture contains a number of innovative compiler-controllable features designed to exploit instruction-level parallelism. New code generation and optimization techniques are critical to applying these features to improve processor performance. For instance, the Itanium® architecture provides a compiler-controllable virtual register stack to reduce the penalty of memory accesses associated with procedure calls. The Itanium® Register Stack Engine (RSE) transparently manages the register stack, saving and restoring physical registers to and from memory as needed. Existing code generation techniques for the register stack aggressively allocate virtual registers without regard to the register pressure on different control-flow paths. As a result, applications with large data sets may stress the RSE and incur substantial execution delays from the high number of register saves and restores. Because the Itanium® architecture is built around Explicitly Parallel Instruction Computing (EPIC) concepts, solutions for increasing register stack efficiency favor code generation techniques over hardware approaches.
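
To make the cost described in the abstract concrete, the following is a minimal toy model, not the paper's method, of how frame sizes along a call chain translate into RSE traffic. It assumes 96 physical stacked registers (r32 to r127), treats each procedure as requesting one fixed-size frame, and ignores the overlap between a caller's output registers and a callee's input registers. The function name `simulate` and the frame sizes are hypothetical, chosen only for illustration.

```python
PHYSICAL_STACKED_REGS = 96  # assumption: r32-r127 are the stacked registers


def simulate(frame_sizes):
    """Return (spills, fills) for one nested call chain in a toy RSE model.

    frame_sizes lists the registers each procedure allocates, outermost
    first. The chain is entered in order and then fully unwound.
    """
    frames = []    # sizes of the active frames, outermost first
    total = 0      # registers owned by all active frames
    spilled = 0    # registers of the oldest frames currently in memory
    spills = 0
    fills = 0

    # Call phase: each new frame may push the oldest registers out to memory.
    for size in frame_sizes:
        if not 0 <= size <= PHYSICAL_STACKED_REGS:
            raise ValueError("frame must fit in the physical register file")
        frames.append(size)
        total += size
        overflow = (total - spilled) - PHYSICAL_STACKED_REGS
        if overflow > 0:
            spilled += overflow
            spills += overflow

    # Return phase: the caller's frame must be resident again after a return.
    while frames:
        total -= frames.pop()
        needed = frames[-1] if frames else 0
        resident = total - spilled
        if resident < needed:
            missing = needed - resident
            fills += missing
            spilled -= missing

    return spills, fills


if __name__ == "__main__":
    # Every procedure allocates its worst-case frame.
    print(simulate([40, 40, 40]))
    # Hypothetical smaller frames on the same path, e.g. sized per path.
    print(simulate([40, 20, 30]))
```

In this sketch the first chain needs 120 registers and overflows the 96 available, so registers are spilled on the way down and filled on the way back, while the second chain needs only 90 and causes no RSE traffic. Real RSE behavior depends on details the model omits.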

Cited By

  • (2006) Prematerialization. Proceedings of the 15th international conference on Parallel architectures and compilation techniques, pages 285-294. DOI: 10.1145/1152154.1152197. Online publication date: 16-Sep-2006
  • (2004) Compiler Optimizations for Transaction Processing Workloads on Itanium® Linux Systems. Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, pages 294-303. DOI: 10.1109/MICRO.2004.11. Online publication date: 4-Dec-2004
  • (2003) Inter-procedural stacked register allocation for itanium® like architecture. Proceedings of the 17th annual international conference on Supercomputing, pages 215-225. DOI: 10.1145/782814.782844. Online publication date: 23-Jun-2003

Published In

CGO '03: Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
March 2003
349 pages
ISBN:076951913X

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

CGO03
Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%
