Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3307650.3322207acmconferencesArticle/Chapter ViewAbstractPublication PagesiscaConference Proceedingsconference-collections
research-article

Perceptron-based prefetch filtering

Published: 22 June 2019 Publication History

Abstract

Hardware prefetching is an effective technique for hiding cache miss latencies in modern processor designs. Prefetcher performance can be characterized by two main metrics that are generally at odds with one another: coverage, the fraction of baseline cache misses which the prefetcher brings into the cache; and accuracy, the fraction of prefetches which are ultimately used. An overly aggressive prefetcher may improve coverage at the cost of reduced accuracy. Thus, performance may be harmed by this over-aggressiveness because many resources are wasted, including cache capacity and bandwidth. An ideal prefetcher would have both high coverage and accuracy.
In this paper, we introduce Perceptron-based Prefetch Filtering (PPF) as a way to increase the coverage of the prefetches generated by an underlying prefetcher without negatively impacting accuracy. PPF enables more aggressive tuning of the underlying prefetcher, leading to increased coverage by filtering out the growing numbers of inaccurate prefetches such an aggressive tuning implies. We also explore a range of features to use to train PPF's perceptron layer to identify inaccurate prefetches. PPF improves performance on a memory-intensive subset of the SPEC CPU 2017 benchmarks by 3.78% for a single-core configuration, and by 11.4% for a 4-core configuration, compared to the underlying prefetcher alone.

References

[1]
W. A. Wulf and S. A. McKee, "Hitting the memory wall: Implications of the obvious," SIGARCH Comput. Archit. News, vol. 23, pp. 20--24, Mar. 1995.
[2]
J. Kim, S. H. Pugsley, P. V. Gratz, A. L. N. Reddy, C. Wilkerson, and Z. Chishti, "Path confidence based lookahead prefetching," in 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1--12, Oct 2016.
[3]
N. D. E. Jerger, E. L. Hill, and M. H. Lipasti, "Friendly fire: understanding the effects of multiprocessor prefetches," in 2006 IEEE International Symposium on Performance Analysis of Systems and Software, pp. 177--188, March 2006.
[4]
L. Peled, S. Mannor, U. Weiser, and Y. Etsion, "Semantic locality and context-based prefetching using reinforcement learning," in 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), pp. 285--297, June 2015.
[5]
S. Liao, T. Hung, D. Nguyen, C. Chou, C. Tu, and H. Zhou, "Machine learning-based prefetch optimization for data center applications," in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1--10, Nov 2009.
[6]
N. P. Jouppi, "Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers," in Proceedings of the 17th Annual International Symposium on Computer Architecture, ISCA '90, (New York, NY, USA), pp. 364--373, ACM, 1990.
[7]
A. J. Smith, "Sequential program prefetching in memory hierarchies," Computer, vol. 11, pp. 7--21, Dec. 1978.
[8]
J.-L. Baer and T.-F. Chen, "An effective on-chip preloading scheme to reduce data access penalty," in Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, Supercomputing '91, (New York, NY, USA), pp. 176--186, ACM, 1991.
[9]
J.-L. Baer and T.-F. Chen, "Effective hardware-based data prefetching for high-performance processors," IEEE Trans. Comput., vol. 44, pp. 609--623, May 1995.
[10]
T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, and A. Moshovos, "Making address-correlated prefetching practical," IEEE Micro, vol. 30, pp. 50--59, Jan 2010.
[11]
Y. Ishii, M. Inaba, and K. Hiraki, "Access map pattern matching for data cache prefetch," in Proceedings of the 23rd International Conference on Supercomputing, ICS '09, (New York, NY, USA), pp. 499--500, ACM, 2009.
[12]
C. F. Chen, S. . Yang, B. Falsafi, and A. Moshovos, "Accurate and complexity-effective spatial pattern prediction," in 10th International Symposium on High Performance Computer Architecture (HPCA'04), pp. 276--287, Feb 2004.
[13]
S. Somogyi, T. F. Wenisch, A. Ailamaki, B. Falsafi, and A. Moshovos, "Spatial memory streaming," in Proceedings of the 33rd Annual International Symposium on Computer Architecture, ISCA '06, (Washington, DC, USA), pp. 252--263, IEEE Computer Society, 2006.
[14]
M. Ferdman, T. F. Wenisch, A. Ailamaki, B. Falsafi, and A. Moshovos, "Temporal instruction fetch streaming," in Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 41, (Washington, DC, USA), pp. 1--10, IEEE Computer Society, 2008.
[15]
T. F. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, and A. Moshovos, "Practical off-chip meta-data for temporal memory streaming," in 2009 IEEE 15th International Symposium on High Performance Computer Architecture, pp. 79--90, Feb 2009.
[16]
S. Somogyi, T. F. Wenisch, A. Ailamaki, and B. Falsafi, "Spatio-temporal memory streaming," in Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA '09, (New York, NY, USA), pp. 69--80, ACM, 2009.
[17]
S. Somogyi, T. F. Wenisch, M. Ferdman, and B. Falsafi, "Spatial memory streaming," J. Instruction-Level Parallelism, vol. 13, 2011.
[18]
D. Kadjo, J. Kim, P. Sharma, R. Panda, P. Gratz, and D. Jiménez, "B-fetch: Branch prediction directed prefetching for chip-multiprocessors," in 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 623--634, Dec 2014.
[19]
L. M. AlBarakat, P. V. Gratz, and D. A. Jiménez, "Mtb-fetch: Multithreading aware hardware prefetching for chip multiprocessors," IEEE Computer Architecture Letters, vol. 17, pp. 175--178, July 2018.
[20]
D. A. Jiménez and C. Lin, "Dynamic branch prediction with perceptrons," in Proceedings of the 7th International Symposium on High Performance Computer Architecture (HPCA-7), pp. 197--206, 2001.
[21]
D. Tarjan and K. Skadron, "Merging path and gshare indexing in perceptron branch prediction," ACM Trans. Archit. Code Optim., vol. 2, pp. 280--300, Sept. 2005.
[22]
E. Teran, Z. Wang, and D. A. Jiménez, "Perceptron learning for reuse prediction," in The 49th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-49, (Piscataway, NJ, USA), pp. 2:1--2:12, IEEE Press, 2016.
[23]
D. A. Jiménez and E. Teran, "Multiperspective reuse prediction," in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-50 '17, (New York, NY, USA), pp. 436--448, ACM, 2017.
[24]
J. A. Joao, O. Mutlu, C. J. Lee, R. Cohn, Y. N. Patt, and H. Kim, "Virtual program counter (vpc) prediction: Very low cost indirect branch prediction using conditional branch prediction hardware," IEEE Transactions on Computers, vol. 58, pp. 1153--1170, 12 2008.
[25]
"The champsim simulator." https://github.com/ChampSim/ChampSim.
[26]
S. H. Pugsley, A. R. Alameldeen, C. Wilkerson, and H. Kim, "The 2nd data prefetching championship (dpc-2)." http://comparch-conf.gatech.edu/dpc2/.
[27]
"The 2nd cache replacement championship (crc-2)."
[28]
"Standard performance evaluation corporation cpu2017 benchmark suite." http://www.spec.org/cpu2017/.
[29]
E. Perelman, G. Hamerly, M. Van Biesbrouck, T. Sherwood, and B. Calder, "Using simpoint for accurate and efficient simulation," in Proceedings of the 2003 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '03, (New York, NY, USA), pp. 318--319, ACM, 2003.
[30]
"Standard performance evaluation corporation cpu2006 benchmark suite." http://www.spec.org/cpu2006/.
[31]
M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A. D. Popescu, A. Ailamaki, and B. Falsafi, "Clearing the clouds: A study of emerging scale-out workloads on modern hardware," in Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII, (New York, NY, USA), pp. 37--48, ACM, 2012.
[32]
Y. Ishii, M. Inaba, and K. Hiraki, "Unified memory optimizing architecture: Memory subsystem control with a unified predictor," in Proceedings of the 26th ACM International Conference on Supercomputing, ICS '12, (NY, USA), pp. 267--278, ACM, 2012.
[33]
S. H. Pugsley, Z. Chishti, C. Wilkerson, P. Chuang, R. L. Scott, A. Jaleel, S. Lu, K. Chow, and R. Balasubramonian, "Sandbox prefetching: Safe run-time evaluation of aggressive prefetchers," in 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), pp. 626--637, Feb 2014.
[34]
P. Michaud, "Best-offset hardware prefetching," in 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 469--480, March 2016.
[35]
M. Shevgoor, S. Koladiya, R. Balasubramonian, C. Wilkerson, S. H. Pugsley, and Z. Chishti, "Efficiently prefetching complex address patterns," in 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 141--152, Dec 2015.
[36]
J. Kim, E. Teran, P. V. Gratz, D. A. Jiménez, S. H. Pugsley, and C. Wilkerson, "Kill the program counter: Reconstructing program behavior in the processor cache hierarchy," in Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '17, (New York, NY, USA), pp. 737--749, ACM, 2017.
[37]
C. Wu, A. Jaleel, M. Martonosi, S. C. Steely, and J. Emer, "Pacman: Prefetch-aware cache management for high performance caching," in 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 442--453, Dec 2011.
[38]
V. Seshadri, S. Yedkar, H. Xin, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry, "Mitigating prefetcher-caused pollution using informed caching policies for prefetched blocks," ACM Trans. Archit. Code Optim., vol. 11, pp. 51:1--51:22, Jan. 2015.
[39]
V. Seshadri, O. Mutlu, M. A. Kozuch, and T. C. Mowry, "The evicted-address filter: A unified mechanism to address both cache pollution and thrashing," in Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, PACT '12, (New York, NY, USA), pp. 355--366, ACM, 2012.
[40]
A. Jain and C. Lin, "Rethinking belady's algorithm to accommodate prefetching," in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), pp. 110--123, June 2018.
[41]
E. Ebrahimi, O. Mutlu, C. J. Lee, and Y. N. Patt, "Coordinated control of multiple prefetchers in multi-core systems," in Proceedings of the 42Nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, (New York, NY, USA), pp. 316--326, ACM, 2009.
[42]
M. Hashemi, K. Swersky, J. A. Smith, G. Ayers, H. Litz, J. Chang, C. Kozyrakis, and P. Ranganathan, "Learning memory access patterns," CoRR, vol. abs/1803.02329, 2018.
[43]
H. Wang and Z. Luo, "Data cache prefetching with perceptron learning," CoRR, vol. abs/1712.00905, 2017.
[44]
S. M. Khan, Y. Tian, and D. A. Jiménez, "Sampling dead block prediction for last-level caches," in Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO '43, (Washington, DC, USA), pp. 175--186, IEEE Computer Society, 2010.

Cited By

View all
  • (2024)Exploiting Vector Code Semantics for Efficient Data Cache PrefetchingProceedings of the 38th ACM International Conference on Supercomputing10.1145/3650200.3656635(98-109)Online publication date: 30-May-2024
  • (2024)Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00090(1202-1216)Online publication date: 29-Jun-2024
  • (2024)A New Formulation of Neural Data Prefetching2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00088(1173-1187)Online publication date: 29-Jun-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture
June 2019
849 pages
ISBN:9781450366694
DOI:10.1145/3307650
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 June 2019

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Conference

ISCA '19
Sponsor:

Acceptance Rates

ISCA '19 Paper Acceptance Rate 62 of 365 submissions, 17%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%

Upcoming Conference

ISCA '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)218
  • Downloads (Last 6 weeks)23
Reflects downloads up to 02 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Exploiting Vector Code Semantics for Efficient Data Cache PrefetchingProceedings of the 38th ACM International Conference on Supercomputing10.1145/3650200.3656635(98-109)Online publication date: 30-May-2024
  • (2024)Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00090(1202-1216)Online publication date: 29-Jun-2024
  • (2024)A New Formulation of Neural Data Prefetching2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00088(1173-1187)Online publication date: 29-Jun-2024
  • (2024)A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00046(528-542)Online publication date: 2-Mar-2024
  • (2024)Rowhammer Cache: A Last-Level Cache for Low-Overhead Rowhammer Tracking2024 IEEE International Symposium on Hardware Oriented Security and Trust (HOST)10.1109/HOST55342.2024.10545410(349-360)Online publication date: 6-May-2024
  • (2024)Accelerating Graph Analytics Using Attention-Based Data PrefetcherSN Computer Science10.1007/s42979-024-02989-w5:5Online publication date: 13-Jun-2024
  • (2024)RL-CoPref: a reinforcement learning-based coordinated prefetching controller for multiple prefetchersThe Journal of Supercomputing10.1007/s11227-024-05938-980:9(13001-13026)Online publication date: 1-Jun-2024
  • (2023)CluMP: Clustered Markov Chain for Storage I/O PrefetchElectronics10.3390/electronics1215329312:15(3293)Online publication date: 31-Jul-2023
  • (2023)Micro-Armed Bandit: Lightweight & Reusable Reinforcement Learning for Microarchitecture Decision-MakingProceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3613424.3623780(698-713)Online publication date: 28-Oct-2023
  • (2023)CLIP: Load Criticality based Data Prefetching for Bandwidth-constrained Many-core SystemsProceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3613424.3614245(714-727)Online publication date: 28-Oct-2023
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media