Roelandts J, Naithani A, Ainsworth S, Jones T and Eeckhout L. (2024). Scalar Vector Runahead. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). 10.1109/MICRO61859.2024.00101. 979-8-3503-5057-9. (1367-1381).

https://ieeexplore.ieee.org/document/10764499/

Schwedock B and Beckmann N. (2024). Leviathan: A Unified System for General-Purpose Near-Data Computing. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). 10.1109/MICRO61859.2024.00095. 979-8-3503-5057-9. (1278-1294).

https://ieeexplore.ieee.org/document/10764520/

Lee H and Sanchez D. (2024). Terminus: A Programmable Accelerator for Read and Update Operations on Sparse Data Structures. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). 10.1109/MICRO61859.2024.00092. 979-8-3503-5057-9. (1233-1246).

https://ieeexplore.ieee.org/document/10764666/

Pal A, Desai K, Chatterjee R and San Miguel J. (2024). Camouflage: Utility-Aware Obfuscation for Accurate Simulation of Sensitive Program Traces. ACM Transactions on Architecture and Code Optimization. 21:2. (1-23). Online publication date: 30-Jun-2024.

https://doi.org/10.1145/3650110

Ainsworth S and Mukhanov L. (2024). Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 10.1109/ISCA59077.2024.00090. 979-8-3503-2658-1. (1202-1216).

https://ieeexplore.ieee.org/document/10609579/

Bera R, Ranganathan A, Rakshit J, Mahto S, Nori A, Gaur J, Olgun A, Kanellopoulos K, Sadrosadati M, Subramoney S and Mutlu O. (2024). Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 10.1109/ISCA59077.2024.00017. 979-8-3503-2658-1. (88-102).

https://ieeexplore.ieee.org/document/10609589/

Jamet A, Vavouliotis G, Jiménez D, Alvarez L and Casas M. (2024). Practically Tackling Memory Bottlenecks of Graph-Processing Workloads. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 10.1109/IPDPS57955.2024.00096. 979-8-3503-8711-7. (1034-1045).

https://ieeexplore.ieee.org/document/10579233/

Jain A, Lin H, Villavieja C, Kasikci B, Kennelly C, Hashemi M and Ranganathan P. Limoncello: Prefetchers for Scale. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. (577-590).

https://doi.org/10.1145/3620666.3651373

Zhang X, Liu C, Ni J, Cheng Y, Zhang L, Li H and Li X. PDG: A Prefetcher for Dynamic Graph Updating. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 10.1109/TCAD.2023.3335880. 43:4. (1246-1259).

https://ieeexplore.ieee.org/document/10327765/

Schrick N and Hawrylak P. (2024). Application-Level Checkpoint/Restart for Large-Scale Attack and Compliance Graphs. SoutheastCon 2024. 10.1109/SoutheastCon52093.2024.10500065. 979-8-3503-1710-7. (1450-1455).

https://ieeexplore.ieee.org/document/10500065/

Jamet A, Vavouliotis G, Jiménez D, Alvarez L and Casas M. (2024). A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA57654.2024.00046. 979-8-3503-9313-2. (528-542).

https://ieeexplore.ieee.org/document/10476485/

Fu G, Xia T, Luo Z, Chen R, Zhao W and Ren P. (2024). Differential-Matching Prefetcher for Indirect Memory Access. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA57654.2024.00040. 979-8-3503-9313-2. (439-453).

https://ieeexplore.ieee.org/document/10476460/

Chou Y, Nowicki T and Aamodt T. Treelet Prefetching For Ray Tracing. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. (742-755).

https://doi.org/10.1145/3613424.3614288

Siracusa M, Soria-Pardos V, Sgherzi F, Randall J, Joseph D, Moretó Planas M and Armejach A. A Tensor Marshaling Unit for Sparse Tensor Algebra on General-Purpose Processors. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. (1332-1346).

https://doi.org/10.1145/3613424.3614284

Naithani A, Roelandts J, Ainsworth S, Jones T and Eeckhout L. Decoupled Vector Runahead. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. (17-31).

https://doi.org/10.1145/3613424.3614255

Ocalan B and Ozturk O. (2023). Utilizing Prefetch Buffers for Iterative Graph Applications. 2023 26th Euromicro Conference on Digital System Design (DSD). 10.1109/DSD60849.2023.00057. 979-8-3503-4419-6. (359-365).

https://ieeexplore.ieee.org/document/10456778/

Khojasteh H and Tabatabaei H. (2023). A Survey on the Proposed Architectures for Efficient Execution of Irregular Applications Using Pipeline Parallelism. 2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE). 10.1109/CSCE60160.2023.00342. 979-8-3503-2759-5. (2080-2087).

https://ieeexplore.ieee.org/document/10487332/

Yang Y, Li R, Shi Q, Li X, Hu G, Li X and Yuan M. (2023). SGDP: A Stream-Graph Neural Network Based Data Prefetcher. 2023 International Joint Conference on Neural Networks (IJCNN). 10.1109/IJCNN54540.2023.10191927. 978-1-6654-8867-9. (1-8).

https://ieeexplore.ieee.org/document/10191927/

Manocha A, Aragon J and Martonosi M. Graphfire: Synergizing Fetch, Insertion, and Replacement Policies for Graph Analytics. IEEE Transactions on Computers. 10.1109/TC.2022.3157525. 72:1. (291-304).

https://ieeexplore.ieee.org/document/9730090/

Deng J, Fu X, Zhang B, Wang J, Zhang P and Xie X. (2022). Graph_CC: Accelerator of Connected Component Search in Graph Computing. 2022 7th International Conference on Integrated Circuits and Microsystems (ICICM). 10.1109/ICICM56102.2022.10011381. 978-1-6654-6043-9. (441-447).

https://ieeexplore.ieee.org/document/10011381/

Pronold J, Jordan J, Wylie B, Kitayama I, Diesmann M and Kunkel S. (2022). Routing brain traffic through the von Neumann bottleneck. Parallel Computing. 113:C. Online publication date: 1-Oct-2022.

https://doi.org/10.1016/j.parco.2022.102952

Wu Q, Ekanayake A, Li R, Beard J and John L. SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems. Proceedings of the 51st International Conference on Parallel Processing. (1-12).

https://doi.org/10.1145/3545008.3545044

Vijaykumar N, Olgun A, Kanellopoulos K, Bostanci F, Hassan H, Lotfi M, Gibbons P and Mutlu O. (2022). MetaSys: A Practical Open-source Metadata Management System to Implement and Evaluate Cross-layer Optimizations. ACM Transactions on Architecture and Code Optimization. 19:2. (1-29). Online publication date: 30-Jun-2022.

https://doi.org/10.1145/3505250

Talati N, Ye H, Yang Y, Belayneh L, Chen K, Blaauw D, Mudge T and Dreslinski R. NDMiner. Proceedings of the 49th Annual International Symposium on Computer Architecture. (146-159).

https://doi.org/10.1145/3470496.3527437

Orenes-Vera M, Manocha A, Balkind J, Gao F, Aragón J, Wentzlaff D and Martonosi M. Tiny but mighty. Proceedings of the 49th Annual International Symposium on Computer Architecture. (817-830).

https://doi.org/10.1145/3470496.3527400

Vicarte J, Flanders M, Paccagnella R, Garrett-Grossman G, Morrison A, Fletcher C and Kohlbrenner D. (2022). Augury: Using Data Memory-Dependent Prefetchers to Leak Data at Rest. 2022 IEEE Symposium on Security and Privacy (SP). 10.1109/SP46214.2022.9833570. 978-1-6654-1316-9. (1491-1505).

https://ieeexplore.ieee.org/document/9833570/

Jalili M and Erez M. (2022). Reducing Load Latency with Cache Level Prediction. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA53966.2022.00054. 978-1-6654-2027-3. (648-661).

https://ieeexplore.ieee.org/document/9773263/

Wang Q, Zheng L, Yuan J, Huang Y, Yao P, Gui C, Hu A, Liao X and Jin H. (2022). Hardware-Accelerated Hypergraph Processing with Chain-Driven Scheduling. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA53966.2022.00022. 978-1-6654-2027-3. (184-198).

https://ieeexplore.ieee.org/document/9773270/

Jamilan S, Khan T, Ayers G, Kasikci B and Litz H. APT-GET. Proceedings of the Seventeenth European Conference on Computer Systems. (747-764).

https://doi.org/10.1145/3492321.3519583

Talati N, Jin D, Ye H, Brahmakshatriya A, Dasika G, Amarasinghe S, Mudge T, Koutra D and Dreslinski R. (2021). A Deep Dive Into Understanding The Random Walk-Based Temporal Graph Learning. 2021 IEEE International Symposium on Workload Characterization (IISWC). 10.1109/IISWC53511.2021.00019. 978-1-6654-4173-5. (87-100).

https://ieeexplore.ieee.org/document/9668298/

Basak A, Qu Z, Lin J, Alameldeen A, Chishti Z, Ding Y and Xie Y. Improving Streaming Graph Processing Performance using Input Knowledge. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. (1036-1050).

https://doi.org/10.1145/3466752.3480096

Yang Y, Emer J and Sanchez D. SpZip. Proceedings of the 48th Annual International Symposium on Computer Architecture. (1069-1082).

https://doi.org/10.1109/ISCA52012.2021.00087

Vicarte J, Shome P, Nayak N, Trippel C, Morrison A, Kohlbrenner D and Fletcher C. Opening pandora's box. Proceedings of the 48th Annual International Symposium on Computer Architecture. (347-360).

https://doi.org/10.1109/ISCA52012.2021.00035

Naithani A, Ainsworth S, Jones T and Eeckhout L. Vector runahead. Proceedings of the 48th Annual International Symposium on Computer Architecture. (195-208).

https://doi.org/10.1109/ISCA52012.2021.00024

Barredo A, Armejach A, Beard J and Moreto M. PLANAR. Proceedings of the 35th ACM International Conference on Supercomputing. (164-176).

https://doi.org/10.1145/3447818.3460368

Balaji V, Crago N, Jaleel A and Lucia B. (2021). P-OPT: Practical Optimal Cache Replacement for Graph Analytics. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA51647.2021.00062. 978-1-6654-2235-2. (668-681).

https://ieeexplore.ieee.org/document/9407090/

Talati N, May K, Behroozi A, Yang Y, Kaszyk K, Vasiladiotis C, Verma T, Li L, Nguyen B, Sun J, Morton J, Ahmadi A, Austin T, O'Boyle M, Mahlke S, Mudge T and Dreslinski R. (2021). Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA51647.2021.00061. 978-1-6654-2235-2. (654-667).

https://ieeexplore.ieee.org/document/9407222/

Zhang Y, Liao X, Jin H, He L, He B, Liu H and Gu L. (2021). DepGraph: A Dependency-Driven Accelerator for Efficient Iterative Graph Processing. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA51647.2021.00039. 978-1-6654-2235-2. (371-384).

https://ieeexplore.ieee.org/document/9407071/

Choi S, Kim J and Kim S. (2021). Adaptive Granularity Based Last-Level Cache Prefetching Method with eDRAM Prefetch Buffer for Graph Processing Applications. Applied Sciences. 10.3390/app11030991. 11:3. (991).

https://www.mdpi.com/2076-3417/11/3/991

Oliveira G, Gomez-Luna J, Orosa L, Ghose S, Vijaykumar N, Fernandez I, Sadrosadati M and Mutlu O. DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks. IEEE Access. 10.1109/ACCESS.2021.3110993. 9. (134457-134502).

https://ieeexplore.ieee.org/document/9530719/

Nguyen Q and Sanchez D. (2020). Pipette: Improving Core Utilization on Irregular Applications through Intra-Core Pipeline Parallelism. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 10.1109/MICRO50266.2020.00056. 978-1-7281-7383-2. (596-608).

https://ieeexplore.ieee.org/document/9251856/

Basak A, Lin J, Lorica R, Xie X, Chishti Z, Alameldeen A and Xie Y. (2020). SAGA-Bench: Software and Hardware Characterization of Streaming Graph Analytics Workloads. 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 10.1109/ISPASS48437.2020.00012. 978-1-7281-4798-7. (12-23).

https://ieeexplore.ieee.org/document/9238598/

Faldu P, Diamond J and Grot B. (2020). Domain-Specialized Cache Management for Graph Analytics. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). 10.1109/HPCA47549.2020.00028. 978-1-7281-6149-5. (234-248).

https://ieeexplore.ieee.org/document/9065556/

Mukkara A, Beckmann N and Sanchez D. PHI. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. (1009-1022).

https://doi.org/10.1145/3352460.3358254

Lee E, Kim J, Lim K, Noh S and Seo J. Pre-select static caching and neighborhood ordering for BFS-like algorithms on disk-based graph engines. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference. (459-473).

/doi/10.5555/3358807.3358846

Liu W, Liu H, Liao X, Jin H and Zhang Y. NGraph: Parallel Graph Processing in Hybrid Memory Systems. IEEE Access. 10.1109/ACCESS.2019.2931058. 7. (103517-103529).

https://ieeexplore.ieee.org/document/8772041/

Zhang D, Ma X, Thomson M and Chiou D. (2018). Minnow. ACM SIGPLAN Notices. 53:2. (593-607). Online publication date: 30-Nov-2018.

https://doi.org/10.1145/3296957.3173197

Ainsworth S and Jones T. (2018). An Event-Triggered Programmable Prefetcher for Irregular Workloads. ACM SIGPLAN Notices. 53:2. (578-592). Online publication date: 30-Nov-2018.

https://doi.org/10.1145/3296957.3173189

Qian C, Childers B, Huang L, Guo H and Wang Z. (2018). CGAcc: A Compressed Sparse Row Representation-Based BFS Graph Traversal Accelerator on Hybrid Memory Cube. Electronics. 10.3390/electronics7110307. 7:11. (307).

https://www.mdpi.com/2079-9292/7/11/307

Mukkara A, Beckmann N, Abeydeera M, Ma X and Sanchez D. Exploiting locality in graph analytics through hardware-accelerated traversal scheduling. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture. (1-14).

https://doi.org/10.1109/MICRO.2018.00010

Zhou M, Imani M, Gupta S and Rosing T. GAS. Proceedings of the International Symposium on Low Power Electronics and Design. (1-6).

https://doi.org/10.1145/3218603.3218631

Zhang D, Ma X, Thomson M and Chiou D. Minnow. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. (593-607).

https://doi.org/10.1145/3173162.3173197

Ainsworth S and Jones T. An Event-Triggered Programmable Prefetcher for Irregular Workloads. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. (578-592).

https://doi.org/10.1145/3173162.3173189

Michelogiannakis G and Shalf J. (2017). Last Level Collective Hardware Prefetching For Data-Parallel Applications. 2017 IEEE 24th International Conference on High Performance Computing (HiPC). 10.1109/HiPC.2017.00018. 978-1-5386-2293-3. (72-83).

http://ieeexplore.ieee.org/document/8287737/

Zhang D, Ma X and Chiou D. Worklist-Directed Prefetching. IEEE Computer Architecture Letters. 10.1109/LCA.2016.2627571. 16:2. (170-173).

http://ieeexplore.ieee.org/document/7740958/

Dong Y, Ye C, Liu H, Tang L, Liao X, Jin H, Chen C, Li Y and Wang Y. DTAP: Accelerating Strongly-Typed Programs with Data Type-Aware Hardware Prefetching. ACM Transactions on Architecture and Code Optimization. 0:0.

https://doi.org/10.1145/3701994