Perceptron-based prefetch filtering

Authors:

Daniel A. Jiménez

ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture

Pages 1 - 13

https://doi.org/10.1145/3307650.3322207

Published: 22 June 2019

Abstract

Hardware prefetching is an effective technique for hiding cache miss latencies in modern processor designs. Prefetcher performance can be characterized by two main metrics that are generally at odds with one another: coverage, the fraction of baseline cache misses that the prefetcher brings into the cache; and accuracy, the fraction of prefetches that are ultimately used. An overly aggressive prefetcher may improve coverage at the cost of reduced accuracy. This over-aggressiveness can harm performance because it wastes resources, including cache capacity and bandwidth. An ideal prefetcher would have both high coverage and high accuracy.
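The two metrics defined above can be written as simple ratios. The following is a minimal sketch, assuming the definitions given in the abstract; the function names and the counter values are invented for illustration and are not measurements from the paper.

```python
def coverage(covered_misses: int, baseline_misses: int) -> float:
    """Fraction of baseline cache misses that the prefetcher brought into the cache."""
    if baseline_misses == 0:
        return 0.0
    return covered_misses / baseline_misses


def accuracy(useful_prefetches: int, issued_prefetches: int) -> float:
    """Fraction of issued prefetches that were ultimately used."""
    if issued_prefetches == 0:
        return 0.0
    return useful_prefetches / issued_prefetches


# Hypothetical counters for the same workload with 1000 baseline misses.
# Conservative tuning: few prefetches, most of them used.
conservative = (coverage(400, 1000), accuracy(400, 500))
# Aggressive tuning: more misses covered, but many prefetches go unused.
aggressive = (coverage(700, 1000), accuracy(700, 2000))

print("conservative: coverage=%.2f accuracy=%.2f" % conservative)
print("aggressive:   coverage=%.2f accuracy=%.2f" % aggressive)
```

With these made-up numbers the aggressive setting covers more misses but wastes most of its prefetches, which is the tension between coverage and accuracy that the abstract describes.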

In this paper, we introduce Perceptron-based Prefetch Filtering (PPF) as a way to increase the coverage of the prefetches generated by an underlying prefetcher without negatively impacting accuracy. PPF enables more aggressive tuning of the underlying prefetcher, leading to increased coverage by filtering out the growing numbers of inaccurate prefetches such an aggressive tuning implies. We also explore a range of features to use to train PPF's perceptron layer to identify inaccurate prefetches. PPF improves performance on a memory-intensive subset of the SPEC CPU 2017 benchmarks by 3.78% for a single-core configuration, and by 11.4% for a 4-core configuration, compared to the underlying prefetcher alone.
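The abstract does not give PPF's internal design, so the following is only a generic sketch of how a perceptron can filter prefetch candidates, written in the style of perceptron branch predictors. The class name, the three features, the table size, and all thresholds are assumptions chosen for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


class PerceptronFilter:
    """Toy prefetch filter: one weight table per feature (all sizes are assumptions)."""

    def __init__(self, num_features: int, table_size: int = 1024,
                 accept_threshold: int = 0, train_margin: int = 8,
                 weight_limit: int = 15) -> None:
        self.table_size = table_size
        self.accept_threshold = accept_threshold
        self.train_margin = train_margin
        self.weight_limit = weight_limit
        self.tables: List[List[int]] = [[0] * table_size for _ in range(num_features)]

    def score(self, features: Sequence[int]) -> int:
        # Each feature selects one weight from its own table; the sum is the confidence.
        return sum(t[f % self.table_size] for t, f in zip(self.tables, features))

    def accept(self, features: Sequence[int]) -> bool:
        # Let the candidate prefetch through only if the summed weights are high enough.
        return self.score(features) >= self.accept_threshold

    def train(self, features: Sequence[int], was_useful: bool) -> None:
        total = self.score(features)
        predicted_useful = total >= self.accept_threshold
        # Skip the update when the prediction was correct and confident.
        if predicted_useful == was_useful and abs(total) >= self.train_margin:
            return
        step = 1 if was_useful else -1
        for table, f in zip(self.tables, features):
            i = f % self.table_size
            # Saturating update keeps every weight within a small fixed range.
            table[i] = max(-self.weight_limit, min(self.weight_limit, table[i] + step))


# Hypothetical feature tuples: (trigger PC, page offset, address delta).
filt = PerceptronFilter(num_features=3)
useful_pattern = (0x400123, 12, 1)
useless_pattern = (0x400456, 40, 7)
for _ in range(10):
    filt.train(useful_pattern, was_useful=True)    # prefetch was later demanded
    filt.train(useless_pattern, was_useful=False)  # prefetch was evicted unused

print(filt.accept(useful_pattern), filt.accept(useless_pattern))
```

In this sketch the underlying prefetcher would propose candidates freely and the filter would drop those whose features have a history of going unused, which is the division of labor the abstract outlines. How usefulness feedback is collected in hardware is outside what the abstract states.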


Published In

ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture

June 2019

849 pages

ISBN: 9781450366694

DOI: 10.1145/3307650

General Chair: Srilatha (Bobbie) Manne (Microsoft)

Program Chairs: Hillery Hunter (IBM), Erik Altman (IBM Research)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

In-Cooperation

IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISCA '19: The 46th Annual International Symposium on Computer Architecture

Sponsor: SIGARCH

June 22 - 26, 2019

Phoenix, Arizona

Acceptance Rates

ISCA '19 Paper Acceptance Rate 62 of 365 submissions, 17%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%
