Wang J, Panda R and John L. (2018). SelSMaP. ACM Transactions on Architecture and Code Optimization. 15:4. (1-21). Online publication date: 31-Dec-2019.

Nori A, Gaur J, Rai S, Subramoney S and Wang H. Criticality aware tiered cache hierarchy. Proceedings of the 45th Annual International Symposium on Computer Architecture. (96-109).

Shevgoor M, Koladiya S, Balasubramonian R, Wilkerson C, Pugsley S and Chishti Z. Efficiently prefetching complex address patterns. Proceedings of the 48th International Symposium on Microarchitecture. (141-152).

https://doi.org/10.1145/2830772.2830793

Chen Y and Liu Y. Dual-addressing memory architecture for two-dimensional memory access patterns. Proceedings of the Conference on Design, Automation and Test in Europe. (71-76).

/doi/10.5555/2485288.2485308

Grannaes M, Jahre M and Natvig L. Multi-level hardware prefetching using low complexity delta correlating prediction tables with partial matching. Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers. (247-261).

https://doi.org/10.1007/978-3-642-11515-8_19

Zhang Z, Kulkarni A, Ma X and Zhou Y. Memory resource allocation for file system prefetching. Proceedings of the 4th ACM European conference on Computer systems. (75-88).

https://doi.org/10.1145/1519065.1519075

Ramos L, Briz J, Ibáñez P and Viñals V. Low-Cost Adaptive Data Prefetching. Proceedings of the 14th international Euro-Par conference on Parallel Processing. (327-336).

https://doi.org/10.1007/978-3-540-85451-7_36

Gill B and Bathen L. (2007). Optimal multistream sequential prefetching in a shared cache. ACM Transactions on Storage. 3:3. (10-es). Online publication date: 1-Oct-2007.

https://doi.org/10.1145/1288783.1288789

Malkowski K, Lee I, Raghavan P and Irwin M. Conjugate gradient sparse solvers. Proceedings of the 20th international conference on Parallel and distributed processing. (297-297).

/doi/10.5555/1898699.1898822

Chuang P, Hsiao Y and Chiu Y. (2004). An Efficient Value Predictor Dynamically Using Loop and Locality Properties. The Journal of Supercomputing. 30:1. (19-36). Online publication date: 1-Oct-2004.

https://doi.org/10.1023/B:SUPE.0000032779.88101.24

Katsinis C. (2004). Merging, sorting and matrix operations on the SOME-bus multiprocessor architecture. Future Generation Computer Systems. 20:4. (643-661). Online publication date: 1-May-2004.

https://doi.org/10.1016/S0167-739X(03)00129-8

Dubois M. Fighting the memory wall with assisted execution. Proceedings of the 1st conference on Computing frontiers. (168-180).

https://doi.org/10.1145/977091.977116

Luk C, Muth R, Patil H, Weiss R, Lowney P and Cohn R. Profile-guided post-link stride prefetching. Proceedings of the 16th international conference on Supercomputing. (167-178).

https://doi.org/10.1145/514191.514217

Wu Y. Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching. Proceedings of the ACM SIGPLAN 2002 conference on Programming language design and implementation. (210-221).

https://doi.org/10.1145/512529.512555

Wu Y. (2002). Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching. ACM SIGPLAN Notices. 37:5. (210-221). Online publication date: 17-May-2002.

https://doi.org/10.1145/543552.512555

Lim H and Yew P. (2001). Efficient Integration of Compiler-Directed Cache Coherence and Data Prefetching. Journal of Parallel and Distributed Computing. 61:12. (1775-1802). Online publication date: 1-Dec-2001.

https://doi.org/10.1006/jpdc.2001.1784

Hariprakash G, Achutharaman R and Omondi A. DSTRIDE. Proceedings of the 6th Australasian conference on Computer systems architecture. (62-70).

/doi/10.5555/545596.545604

Hariprakash G, Achutharaman R and Omondi A. (2001). DSTRIDE. Australian Computer Science Communications. 23:4. (62-70). Online publication date: 15-Jan-2001.

/doi/10.5555/545615.545604

Chang C, Chen T and Sheu J. (2000). Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers. The Journal of Supercomputing. 17:2. (187-204). Online publication date: 1-Sep-2000.

https://doi.org/10.1023/A:1008134522009

Saulsbury A, Dahlgren F and Stenström P. Recency-based TLB preloading. Proceedings of the 27th annual international symposium on Computer architecture. (117-127).

https://doi.org/10.1145/339647.339666

Saulsbury A, Dahlgren F and Stenström P. (2000). Recency-based TLB preloading. ACM SIGARCH Computer Architecture News. 28:2. (117-127). Online publication date: 1-May-2000.

https://doi.org/10.1145/342001.339666

Chi C, Yuan J and Cheung C. Cyclic dependence based data reference prediction. Proceedings of the 13th international conference on Supercomputing. (127-134).

https://doi.org/10.1145/305138.305186

Dahlgren F, Dubois M and Stenström P. (1998). Performance Evaluation and Cost Analysis of Cache Protocol Extensions for Shared-Memory Multiprocessors. IEEE Transactions on Computers. 47:10. (1041-1055). Online publication date: 1-Oct-1998.

https://doi.org/10.1109/12.729785

Lim H and Yew P. (1998). Maintaining Cache Coherence through Compiler-Directed Data Prefetching. Journal of Parallel and Distributed Computing. 53:2. (144-173). Online publication date: 15-Sep-1998.

https://doi.org/10.1006/jpdc.1998.1480

Chi C and Cheung C. Hardware-driven prefetching for pointer data references. Proceedings of the 12th international conference on Supercomputing. (377-384).

https://doi.org/10.1145/277830.277924

Wang K and Franklin M. Highly accurate data value prediction using hybrid predictors. Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture. (281-290).

/doi/10.5555/266800.266827

Parsons E, Brorsson M and Sevcik K. (1997). Predicting the performance of distributed virtual shared-memory applications. IBM Systems Journal. 36:4. (527-549). Online publication date: 1-Oct-1997.

https://doi.org/10.1147/sj.364.0527

Manjikian N. Combining Loop Fusion with Prefetching on Shared-memory Multiprocessors. Proceedings of the International Conference on Parallel Processing.

/doi/10.5555/645533.656505

Grahn H and Stenström P. Relative Performance of Hardware and Software-Only Directory Protocols Under Latency Tolerating and Reducing Techniques. Proceedings of the 11th International Symposium on Parallel Processing.

/doi/10.5555/645607.661344