  • Wang J, Panda R and John L. (2018). SelSMaP. ACM Transactions on Architecture and Code Optimization. 15:4. (1-21). Online publication date: 31-Dec-2019.

    https://doi.org/10.1145/3274650

  • Nori A, Gaur J, Rai S, Subramoney S and Wang H. Criticality aware tiered cache hierarchy. Proceedings of the 45th Annual International Symposium on Computer Architecture. (96-109).

    https://doi.org/10.1109/ISCA.2018.00019

  • Shevgoor M, Koladiya S, Balasubramonian R, Wilkerson C, Pugsley S and Chishti Z. Efficiently prefetching complex address patterns. Proceedings of the 48th International Symposium on Microarchitecture. (141-152).

    https://doi.org/10.1145/2830772.2830793

  • Chen Y and Liu Y. Dual-addressing memory architecture for two-dimensional memory access patterns. Proceedings of the Conference on Design, Automation and Test in Europe. (71-76).

    /doi/10.5555/2485288.2485308

  • Grannaes M, Jahre M and Natvig L. Multi-level hardware prefetching using low complexity delta correlating prediction tables with partial matching. Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers. (247-261).

    https://doi.org/10.1007/978-3-642-11515-8_19

  • Zhang Z, Kulkarni A, Ma X and Zhou Y. Memory resource allocation for file system prefetching. Proceedings of the 4th ACM European conference on Computer systems. (75-88).

    https://doi.org/10.1145/1519065.1519075

  • Ramos L, Briz J, Ibáñez P and Viñals V. Low-Cost Adaptive Data Prefetching. Proceedings of the 14th international Euro-Par conference on Parallel Processing. (327-336).

    https://doi.org/10.1007/978-3-540-85451-7_36

  • Gill B and Bathen L. (2007). Optimal multistream sequential prefetching in a shared cache. ACM Transactions on Storage. 3:3. (10-es). Online publication date: 1-Oct-2007.

    https://doi.org/10.1145/1288783.1288789

  • Malkowski K, Lee I, Raghavan P and Irwin M. Conjugate gradient sparse solvers. Proceedings of the 20th international conference on Parallel and distributed processing. (297-297).

    /doi/10.5555/1898699.1898822

  • Chuang P, Hsiao Y and Chiu Y. (2004). An Efficient Value Predictor Dynamically Using Loop and Locality Properties. The Journal of Supercomputing. 30:1. (19-36). Online publication date: 1-Oct-2004.

    https://doi.org/10.1023/B:SUPE.0000032779.88101.24

  • Katsinis C. (2004). Merging, sorting and matrix operations on the SOME-bus multiprocessor architecture. Future Generation Computer Systems. 20:4. (643-661). Online publication date: 1-May-2004.

    https://doi.org/10.1016/S0167-739X(03)00129-8

  • Dubois M. Fighting the memory wall with assisted execution. Proceedings of the 1st conference on Computing frontiers. (168-180).

    https://doi.org/10.1145/977091.977116

  • Luk C, Muth R, Patil H, Weiss R, Lowney P and Cohn R. Profile-guided post-link stride prefetching. Proceedings of the 16th international conference on Supercomputing. (167-178).

    https://doi.org/10.1145/514191.514217

  • Wu Y. Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching. Proceedings of the ACM SIGPLAN 2002 conference on Programming language design and implementation. (210-221).

    https://doi.org/10.1145/512529.512555

  • Wu Y. (2002). Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching. ACM SIGPLAN Notices. 37:5. (210-221). Online publication date: 17-May-2002.

    https://doi.org/10.1145/543552.512555

  • Lim H and Yew P. (2001). Efficient Integration of Compiler-Directed Cache Coherence and Data Prefetching. Journal of Parallel and Distributed Computing. 61:12. (1775-1802). Online publication date: 1-Dec-2001.

    https://doi.org/10.1006/jpdc.2001.1784

  • Hariprakash G, Achutharaman R and Omondi A. DSTRIDE. Proceedings of the 6th Australasian conference on Computer systems architecture. (62-70).

    /doi/10.5555/545596.545604

  • Hariprakash G, Achutharaman R and Omondi A. (2001). DSTRIDE. Australian Computer Science Communications. 23:4. (62-70). Online publication date: 15-Jan-2001.

    /doi/10.5555/545615.545604

  • Chang C, Chen T and Sheu J. (2000). Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers. The Journal of Supercomputing. 17:2. (187-204). Online publication date: 1-Sep-2000.

    https://doi.org/10.1023/A:1008134522009

  • Saulsbury A, Dahlgren F and Stenström P. Recency-based TLB preloading. Proceedings of the 27th annual international symposium on Computer architecture. (117-127).

    https://doi.org/10.1145/339647.339666

  • Saulsbury A, Dahlgren F and Stenström P. (2000). Recency-based TLB preloading. ACM SIGARCH Computer Architecture News. 28:2. (117-127). Online publication date: 1-May-2000.

    https://doi.org/10.1145/342001.339666

  • Chi C, Yuan J and Cheung C. Cyclic dependence based data reference prediction. Proceedings of the 13th international conference on Supercomputing. (127-134).

    https://doi.org/10.1145/305138.305186

  • Dahlgren F, Dubois M and Stenström P. (1998). Performance Evaluation and Cost Analysis of Cache Protocol Extensions for Shared-Memory Multiprocessors. IEEE Transactions on Computers. 47:10. (1041-1055). Online publication date: 1-Oct-1998.

    https://doi.org/10.1109/12.729785

  • Lim H and Yew P. (1998). Maintaining Cache Coherence through Compiler-Directed Data Prefetching. Journal of Parallel and Distributed Computing. 53:2. (144-173). Online publication date: 15-Sep-1998.

    https://doi.org/10.1006/jpdc.1998.1480

  • Chi C and Cheung C. Hardware-driven prefetching for pointer data references. Proceedings of the 12th international conference on Supercomputing. (377-384).

    https://doi.org/10.1145/277830.277924

  • Wang K and Franklin M. Highly accurate data value prediction using hybrid predictors. Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture. (281-290).

    /doi/10.5555/266800.266827

  • Parsons E, Brorsson M and Sevcik K. (1997). Predicting the performance of distributed virtual shared-memory applications. IBM Systems Journal. 36:4. (527-549). Online publication date: 1-Oct-1997.

    https://doi.org/10.1147/sj.364.0527

  • Manjikian N. Combining Loop Fusion with Prefetching on Shared-memory Multiprocessors. Proceedings of the International Conference on Parallel Processing.

    /doi/10.5555/645533.656505

  • Grahn H and Stenström P. Relative Performance of Hardware and Software-Only Directory Protocols Under Latency Tolerating and Reducing Techniques. Proceedings of the 11th International Symposium on Parallel Processing.

    /doi/10.5555/645607.661344