
Speculative precomputation: long-range prefetching of delinquent loads

Published: 01 May 2001

Abstract

This paper explores Speculative Precomputation, a technique that uses idle thread contexts in a multithreaded architecture to improve the performance of single-threaded applications. It attacks program stalls caused by data cache misses by precomputing future memory accesses in available thread contexts and prefetching the data they reference. The technique is evaluated by simulating the performance of a research processor based on the Itanium™ ISA that supports Simultaneous Multithreading. Two primary forms of Speculative Precomputation are evaluated. If only the non-speculative thread spawns speculative threads, performance gains of up to 30% are achieved when assuming ideal hardware. However, this speedup drops considerably under more realistic hardware assumptions. Permitting speculative threads to directly spawn additional speculative threads reduces the overhead associated with spawning threads and enables significantly more aggressive speculation, overcoming this limitation. Even with realistic costs for spawning threads, speedups as high as 169% are achieved, with an average speedup of 76%.
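The difference between the two forms of Speculative Precomputation can be illustrated with a toy cycle-count model. This is a sketch under stated assumptions, not the paper's simulator: the latencies (`MISS`, `HIT`, `WORK`, `SLICE`), the prefetch `distance`, the spawn costs and every function name are hypothetical. The loop is assumed to have one delinquent load per iteration whose address a slice can compute cheaply, and the model ignores the finite number of thread contexts. In `basic_sp` the non-speculative thread spawns every slice and is assumed to be charged the spawn cost each time; in `chaining_sp` each speculative thread spawns its successor, so the main thread pays that cost only once.

```python
# Toy cycle-count model of Speculative Precomputation (SP).
# All constants and names are hypothetical; this is not the paper's simulator.

MISS = 200   # assumed cycles for a delinquent load that misses in the cache
HIT = 2      # assumed cycles for a load whose data has already been prefetched
WORK = 50    # assumed cycles of other work per loop iteration
SLICE = 4    # assumed cycles for a slice to compute one delinquent address


def load_cost(t, ready_at):
    """Latency the main thread sees for a load issued at cycle t.

    ready_at is the cycle at which a prefetch of that address completes,
    or None if nothing prefetched it. In this model a late prefetch never
    costs more than an ordinary miss.
    """
    if ready_at is None:
        return MISS
    return min(MISS, max(HIT, ready_at - t))


def no_sp(n):
    """Baseline: every one of the n delinquent loads misses."""
    return n * (MISS + WORK)


def basic_sp(n, spawn_cost, distance=4):
    """Only the non-speculative thread spawns speculative threads.

    At iteration i the main thread spawns a slice that prefetches the load
    of iteration i + distance, and is charged spawn_cost cycles every time.
    """
    ready = {}
    t = 0
    for i in range(n):
        t += spawn_cost                          # main thread pays for each spawn
        ready[i + distance] = t + SLICE + MISS   # slice computes address, prefetches
        t += load_cost(t, ready.get(i)) + WORK
    return t


def chaining_sp(n, spawn_cost):
    """Speculative threads spawn further speculative threads (chaining).

    The main thread spawns the first speculative thread once. The thread for
    iteration k computes its address, issues a non-blocking prefetch, and then
    spawns the thread for iteration k + 1, so later spawn costs are paid off
    the main thread's critical path.
    """
    ready = {}
    s = spawn_cost                         # first speculative thread starts here
    for k in range(n):
        ready[k] = s + SLICE + MISS        # prefetch for iteration k completes
        s += SLICE + spawn_cost            # successor starts after this spawn
    t = spawn_cost                         # the single spawn the main thread pays
    for i in range(n):
        t += load_cost(t, ready[i]) + WORK
    return t


if __name__ == "__main__":
    n = 1000
    base = no_sp(n)
    for label, cycles in [
        ("basic SP, ideal spawn (0 cycles)", basic_sp(n, 0)),
        ("basic SP, spawn = 40 cycles", basic_sp(n, 40)),
        ("chaining SP, spawn = 40 cycles", chaining_sp(n, 40)),
    ]:
        print(f"{label}: {base / cycles:.2f}x faster than no SP")
```

With these assumed numbers, raising the spawn cost from 0 to 40 cycles noticeably reduces the speedup of `basic_sp`, while `chaining_sp` with the same cost still hides nearly every miss, which mirrors the qualitative trend described in the abstract. Because the chain in this model never stops, it can run arbitrarily far ahead of the main thread; a real design would be bounded by the number of free thread contexts.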

Cited By

  • (2024) Tyche: An Efficient and General Prefetcher for Indirect Memory Accesses. ACM Transactions on Architecture and Code Optimization 21:2 (1-26). DOI: 10.1145/3641853. Online publication date: 22-Jan-2024
  • (2022) Trends in Computing and Memory Technologies. Emerging Computing: From Devices to Systems (3-11). DOI: 10.1007/978-981-16-7487-7_1. Online publication date: 9-Jul-2022
  • (2021) Vector runahead. Proceedings of the 48th Annual International Symposium on Computer Architecture (195-208). DOI: 10.1109/ISCA52012.2021.00024. Online publication date: 14-Jun-2021
Information

Published In

ACM SIGARCH Computer Architecture News  Volume 29, Issue 2
Special Issue: Proceedings of the 28th annual international symposium on Computer architecture (ISCA '01)
May 2001
262 pages
ISSN:0163-5964
DOI:10.1145/384285
  • ACM Conferences
    ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture
    June 2001
    289 pages
    ISBN:0769511627
    DOI:10.1145/379240

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2001
Published in SIGARCH Volume 29, Issue 2

Qualifiers

  • Article

Article Metrics

  • Downloads (last 12 months): 31
  • Downloads (last 6 weeks): 2
Reflects downloads up to 12 Nov 2024

Citations
  • (2018) Parallel Precomputation with Input Value Prediction for Model Predictive Control Systems. IEICE Transactions on Information and Systems E101.D:12 (2864-2877). DOI: 10.1587/transinf.2018PAP0003. Online publication date: 1-Dec-2018
  • (2015) IMP. Proceedings of the 48th International Symposium on Microarchitecture (178-190). DOI: 10.1145/2830772.2830807. Online publication date: 5-Dec-2015
  • (2014) Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications. International Journal of Parallel Programming 42:2 (365-382). DOI: 10.1007/s10766-013-0253-x. Online publication date: 1-Apr-2014
  • (2013) Multithreading Architecture. Synthesis Lectures on Computer Architecture 8:1 (1-109). DOI: 10.2200/S00458ED1V01Y201212CAC021. Online publication date: 15-Jan-2013
  • (2010) Optimistic Parallelism Based on Speculative Asynchronous Messages Passing. Proceedings of the International Symposium on Parallel and Distributed Processing with Applications (382-391). DOI: 10.1109/ISPA.2010.43. Online publication date: 6-Sep-2010
  • (2009) Exploiting Speculative TLP in Recursive Programs by Dynamic Thread Prediction. Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009 (78-93). DOI: 10.1007/978-3-642-00722-4_7. Online publication date: 27-Mar-2009
  • (2008) Prefetching irregular references for software cache on cell. Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization (155-164). DOI: 10.1145/1356058.1356079. Online publication date: 6-Apr-2008