  • Roelandts J, Naithani A, Ainsworth S, Jones T and Eeckhout L. (2024). Scalar Vector Runahead. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). 10.1109/MICRO61859.2024.00101. 979-8-3503-5057-9. (1367-1381).

    https://ieeexplore.ieee.org/document/10764499/

  • Schwedock B and Beckmann N. (2024). Leviathan: A Unified System for General-Purpose Near-Data Computing. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). 10.1109/MICRO61859.2024.00095. 979-8-3503-5057-9. (1278-1294).

    https://ieeexplore.ieee.org/document/10764520/

  • Lee H and Sanchez D. (2024). Terminus: A Programmable Accelerator for Read and Update Operations on Sparse Data Structures. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). 10.1109/MICRO61859.2024.00092. 979-8-3503-5057-9. (1233-1246).

    https://ieeexplore.ieee.org/document/10764666/

  • Pal A, Desai K, Chatterjee R and San Miguel J. (2024). Camouflage: Utility-Aware Obfuscation for Accurate Simulation of Sensitive Program Traces. ACM Transactions on Architecture and Code Optimization. 21:2. (1-23). Online publication date: 30-Jun-2024.

    https://doi.org/10.1145/3650110

  • Ainsworth S and Mukhanov L. (2024). Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 10.1109/ISCA59077.2024.00090. 979-8-3503-2658-1. (1202-1216).

    https://ieeexplore.ieee.org/document/10609579/

  • Bera R, Ranganathan A, Rakshit J, Mahto S, Nori A, Gaur J, Olgun A, Kanellopoulos K, Sadrosadati M, Subramoney S and Mutlu O. (2024). Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 10.1109/ISCA59077.2024.00017. 979-8-3503-2658-1. (88-102).

    https://ieeexplore.ieee.org/document/10609589/

  • Jamet A, Vavouliotis G, Jiménez D, Alvarez L and Casas M. (2024). Practically Tackling Memory Bottlenecks of Graph-Processing Workloads. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 10.1109/IPDPS57955.2024.00096. 979-8-3503-8711-7. (1034-1045).

    https://ieeexplore.ieee.org/document/10579233/

  • Jain A, Lin H, Villavieja C, Kasikci B, Kennelly C, Hashemi M and Ranganathan P. Limoncello: Prefetchers for Scale. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. (577-590).

    https://doi.org/10.1145/3620666.3651373

  • Zhang X, Liu C, Ni J, Cheng Y, Zhang L, Li H and Li X. PDG: A Prefetcher for Dynamic Graph Updating. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 10.1109/TCAD.2023.3335880. 43:4. (1246-1259).

    https://ieeexplore.ieee.org/document/10327765/

  • Schrick N and Hawrylak P. (2024). Application-Level Checkpoint/Restart for Large-Scale Attack and Compliance Graphs. SoutheastCon 2024. 10.1109/SoutheastCon52093.2024.10500065. 979-8-3503-1710-7. (1450-1455).

    https://ieeexplore.ieee.org/document/10500065/

  • Jamet A, Vavouliotis G, Jiménez D, Alvarez L and Casas M. (2024). A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA57654.2024.00046. 979-8-3503-9313-2. (528-542).

    https://ieeexplore.ieee.org/document/10476485/

  • Fu G, Xia T, Luo Z, Chen R, Zhao W and Ren P. (2024). Differential-Matching Prefetcher for Indirect Memory Access. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA57654.2024.00040. 979-8-3503-9313-2. (439-453).

    https://ieeexplore.ieee.org/document/10476460/

  • Chou Y, Nowicki T and Aamodt T. Treelet Prefetching For Ray Tracing. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. (742-755).

    https://doi.org/10.1145/3613424.3614288

  • Siracusa M, Soria-Pardos V, Sgherzi F, Randall J, Joseph D, Moretó Planas M and Armejach A. A Tensor Marshaling Unit for Sparse Tensor Algebra on General-Purpose Processors. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. (1332-1346).

    https://doi.org/10.1145/3613424.3614284

  • Naithani A, Roelandts J, Ainsworth S, Jones T and Eeckhout L. Decoupled Vector Runahead. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. (17-31).

    https://doi.org/10.1145/3613424.3614255

  • Ocalan B and Ozturk O. (2023). Utilizing Prefetch Buffers for Iterative Graph Applications. 2023 26th Euromicro Conference on Digital System Design (DSD). 10.1109/DSD60849.2023.00057. 979-8-3503-4419-6. (359-365).

    https://ieeexplore.ieee.org/document/10456778/

  • Khojasteh H and Tabatabaei H. (2023). A Survey on the Proposed Architectures for Efficient Execution of Irregular Applications Using Pipeline Parallelism. 2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE). 10.1109/CSCE60160.2023.00342. 979-8-3503-2759-5. (2080-2087).

    https://ieeexplore.ieee.org/document/10487332/

  • Yang Y, Li R, Shi Q, Li X, Hu G, Li X and Yuan M. (2023). SGDP: A Stream-Graph Neural Network Based Data Prefetcher. 2023 International Joint Conference on Neural Networks (IJCNN). 10.1109/IJCNN54540.2023.10191927. 978-1-6654-8867-9. (1-8).

    https://ieeexplore.ieee.org/document/10191927/

  • Manocha A, Aragon J and Martonosi M. Graphfire: Synergizing Fetch, Insertion, and Replacement Policies for Graph Analytics. IEEE Transactions on Computers. 10.1109/TC.2022.3157525. 72:1. (291-304).

    https://ieeexplore.ieee.org/document/9730090/

  • Deng J, Fu X, Zhang B, Wang J, Zhang P and Xie X. (2022). Graph_CC: Accelerator of Connected Component Search in Graph Computing. 2022 7th International Conference on Integrated Circuits and Microsystems (ICICM). 10.1109/ICICM56102.2022.10011381. 978-1-6654-6043-9. (441-447).

    https://ieeexplore.ieee.org/document/10011381/

  • Pronold J, Jordan J, Wylie B, Kitayama I, Diesmann M and Kunkel S. (2022). Routing brain traffic through the von Neumann bottleneck. Parallel Computing. 113:C. Online publication date: 1-Oct-2022.

    https://doi.org/10.1016/j.parco.2022.102952

  • Wu Q, Ekanayake A, Li R, Beard J and John L. SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems. Proceedings of the 51st International Conference on Parallel Processing. (1-12).

    https://doi.org/10.1145/3545008.3545044

  • Vijaykumar N, Olgun A, Kanellopoulos K, Bostanci F, Hassan H, Lotfi M, Gibbons P and Mutlu O. (2022). MetaSys: A Practical Open-source Metadata Management System to Implement and Evaluate Cross-layer Optimizations. ACM Transactions on Architecture and Code Optimization. 19:2. (1-29). Online publication date: 30-Jun-2022.

    https://doi.org/10.1145/3505250

  • Talati N, Ye H, Yang Y, Belayneh L, Chen K, Blaauw D, Mudge T and Dreslinski R. NDMiner. Proceedings of the 49th Annual International Symposium on Computer Architecture. (146-159).

    https://doi.org/10.1145/3470496.3527437

  • Orenes-Vera M, Manocha A, Balkind J, Gao F, Aragón J, Wentzlaff D and Martonosi M. Tiny but mighty. Proceedings of the 49th Annual International Symposium on Computer Architecture. (817-830).

    https://doi.org/10.1145/3470496.3527400

  • Vicarte J, Flanders M, Paccagnella R, Garrett-Grossman G, Morrison A, Fletcher C and Kohlbrenner D. (2022). Augury: Using Data Memory-Dependent Prefetchers to Leak Data at Rest. 2022 IEEE Symposium on Security and Privacy (SP). 10.1109/SP46214.2022.9833570. 978-1-6654-1316-9. (1491-1505).

    https://ieeexplore.ieee.org/document/9833570/

  • Jalili M and Erez M. (2022). Reducing Load Latency with Cache Level Prediction. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA53966.2022.00054. 978-1-6654-2027-3. (648-661).

    https://ieeexplore.ieee.org/document/9773263/

  • Wang Q, Zheng L, Yuan J, Huang Y, Yao P, Gui C, Hu A, Liao X and Jin H. (2022). Hardware-Accelerated Hypergraph Processing with Chain-Driven Scheduling. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA53966.2022.00022. 978-1-6654-2027-3. (184-198).

    https://ieeexplore.ieee.org/document/9773270/

  • Jamilan S, Khan T, Ayers G, Kasikci B and Litz H. APT-GET. Proceedings of the Seventeenth European Conference on Computer Systems. (747-764).

    https://doi.org/10.1145/3492321.3519583

  • Talati N, Jin D, Ye H, Brahmakshatriya A, Dasika G, Amarasinghe S, Mudge T, Koutra D and Dreslinski R. (2021). A Deep Dive Into Understanding The Random Walk-Based Temporal Graph Learning. 2021 IEEE International Symposium on Workload Characterization (IISWC). 10.1109/IISWC53511.2021.00019. 978-1-6654-4173-5. (87-100).

    https://ieeexplore.ieee.org/document/9668298/

  • Basak A, Qu Z, Lin J, Alameldeen A, Chishti Z, Ding Y and Xie Y. Improving Streaming Graph Processing Performance using Input Knowledge. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. (1036-1050).

    https://doi.org/10.1145/3466752.3480096

  • Yang Y, Emer J and Sanchez D. SpZip. Proceedings of the 48th Annual International Symposium on Computer Architecture. (1069-1082).

    https://doi.org/10.1109/ISCA52012.2021.00087

  • Vicarte J, Shome P, Nayak N, Trippel C, Morrison A, Kohlbrenner D and Fletcher C. Opening pandora's box. Proceedings of the 48th Annual International Symposium on Computer Architecture. (347-360).

    https://doi.org/10.1109/ISCA52012.2021.00035

  • Naithani A, Ainsworth S, Jones T and Eeckhout L. Vector runahead. Proceedings of the 48th Annual International Symposium on Computer Architecture. (195-208).

    https://doi.org/10.1109/ISCA52012.2021.00024

  • Barredo A, Armejach A, Beard J and Moreto M. PLANAR. Proceedings of the 35th ACM International Conference on Supercomputing. (164-176).

    https://doi.org/10.1145/3447818.3460368

  • Balaji V, Crago N, Jaleel A and Lucia B. (2021). P-OPT: Practical Optimal Cache Replacement for Graph Analytics. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA51647.2021.00062. 978-1-6654-2235-2. (668-681).

    https://ieeexplore.ieee.org/document/9407090/

  • Talati N, May K, Behroozi A, Yang Y, Kaszyk K, Vasiladiotis C, Verma T, Li L, Nguyen B, Sun J, Morton J, Ahmadi A, Austin T, O'Boyle M, Mahlke S, Mudge T and Dreslinski R. (2021). Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA51647.2021.00061. 978-1-6654-2235-2. (654-667).

    https://ieeexplore.ieee.org/document/9407222/

  • Zhang Y, Liao X, Jin H, He L, He B, Liu H and Gu L. (2021). DepGraph: A Dependency-Driven Accelerator for Efficient Iterative Graph Processing. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA51647.2021.00039. 978-1-6654-2235-2. (371-384).

    https://ieeexplore.ieee.org/document/9407071/

  • Choi S, Kim J and Kim S. (2021). Adaptive Granularity Based Last-Level Cache Prefetching Method with eDRAM Prefetch Buffer for Graph Processing Applications. Applied Sciences. 10.3390/app11030991. 11:3. (991).

    https://www.mdpi.com/2076-3417/11/3/991

  • Oliveira G, Gomez-Luna J, Orosa L, Ghose S, Vijaykumar N, Fernandez I, Sadrosadati M and Mutlu O. DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks. IEEE Access. 10.1109/ACCESS.2021.3110993. 9. (134457-134502).

    https://ieeexplore.ieee.org/document/9530719/

  • Nguyen Q and Sanchez D. (2020). Pipette: Improving Core Utilization on Irregular Applications through Intra-Core Pipeline Parallelism. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 10.1109/MICRO50266.2020.00056. 978-1-7281-7383-2. (596-608).

    https://ieeexplore.ieee.org/document/9251856/

  • Basak A, Lin J, Lorica R, Xie X, Chishti Z, Alameldeen A and Xie Y. (2020). SAGA-Bench: Software and Hardware Characterization of Streaming Graph Analytics Workloads. 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 10.1109/ISPASS48437.2020.00012. 978-1-7281-4798-7. (12-23).

    https://ieeexplore.ieee.org/document/9238598/

  • Faldu P, Diamond J and Grot B. (2020). Domain-Specialized Cache Management for Graph Analytics. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). 10.1109/HPCA47549.2020.00028. 978-1-7281-6149-5. (234-248).

    https://ieeexplore.ieee.org/document/9065556/

  • Mukkara A, Beckmann N and Sanchez D. PHI. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. (1009-1022).

    https://doi.org/10.1145/3352460.3358254

  • Lee E, Kim J, Lim K, Noh S and Seo J. Pre-select static caching and neighborhood ordering for BFS-like algorithms on disk-based graph engines. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference. (459-473).

    https://dl.acm.org/doi/10.5555/3358807.3358846

  • Liu W, Liu H, Liao X, Jin H and Zhang Y. NGraph: Parallel Graph Processing in Hybrid Memory Systems. IEEE Access. 10.1109/ACCESS.2019.2931058. 7. (103517-103529).

    https://ieeexplore.ieee.org/document/8772041/

  • Zhang D, Ma X, Thomson M and Chiou D. (2018). Minnow. ACM SIGPLAN Notices. 53:2. (593-607). Online publication date: 30-Nov-2018.

    https://doi.org/10.1145/3296957.3173197

  • Ainsworth S and Jones T. (2018). An Event-Triggered Programmable Prefetcher for Irregular Workloads. ACM SIGPLAN Notices. 53:2. (578-592). Online publication date: 30-Nov-2018.

    https://doi.org/10.1145/3296957.3173189

  • Qian C, Childers B, Huang L, Guo H and Wang Z. (2018). CGAcc: A Compressed Sparse Row Representation-Based BFS Graph Traversal Accelerator on Hybrid Memory Cube. Electronics. 10.3390/electronics7110307. 7:11. (307).

    https://www.mdpi.com/2079-9292/7/11/307

  • Mukkara A, Beckmann N, Abeydeera M, Ma X and Sanchez D. Exploiting locality in graph analytics through hardware-accelerated traversal scheduling. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture. (1-14).

    https://doi.org/10.1109/MICRO.2018.00010

  • Zhou M, Imani M, Gupta S and Rosing T. GAS. Proceedings of the International Symposium on Low Power Electronics and Design. (1-6).

    https://doi.org/10.1145/3218603.3218631

  • Zhang D, Ma X, Thomson M and Chiou D. Minnow. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. (593-607).

    https://doi.org/10.1145/3173162.3173197

  • Ainsworth S and Jones T. An Event-Triggered Programmable Prefetcher for Irregular Workloads. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. (578-592).

    https://doi.org/10.1145/3173162.3173189

  • Michelogiannakis G and Shalf J. (2017). Last Level Collective Hardware Prefetching For Data-Parallel Applications. 2017 IEEE 24th International Conference on High Performance Computing (HiPC). 10.1109/HiPC.2017.00018. 978-1-5386-2293-3. (72-83).

    http://ieeexplore.ieee.org/document/8287737/

  • Zhang D, Ma X and Chiou D. Worklist-Directed Prefetching. IEEE Computer Architecture Letters. 10.1109/LCA.2016.2627571. 16:2. (170-173).

    http://ieeexplore.ieee.org/document/7740958/

  • Dong Y, Ye C, Liu H, Tang L, Liao X, Jin H, Chen C, Li Y and Wang Y. DTAP: Accelerating Strongly-Typed Programs with Data Type-Aware Hardware Prefetching. ACM Transactions on Architecture and Code Optimization. 0:0.

    https://doi.org/10.1145/3701994