ACM Transactions on Architecture and Code Optimization, Volume 15
Volume 15, Number 1, April 2018
- Hochan Lee, Mansureh S. Moghaddam, Dongkwan Suh, Bernhard Egger: Improving Energy Efficiency of Coarse-Grain Reconfigurable Arrays Through Modulo Schedule Compression/Decompression. 1:1-1:26
- Karthik Sangaiah, Michael Lui, Radhika Jagtap, Stephan Diestelhorst, Siddharth Nilakantan, Ankit More, Baris Taskin, Mark Hempstead: SynchroTrace: Synchronization-Aware Architecture-Agnostic Traces for Lightweight Multicore Simulation of CMP and HPC Workloads. 2:1-2:26
- Long Zheng, Xiaofei Liao, Hai Jin: Efficient and Scalable Graph Parallel Processing With Symbolic Execution. 3:1-3:25
- Jae-Eon Jo, Gyu-hyeon Lee, Hanhwi Jang, Jaewon Lee, Mohammadamin Ajdari, Jangwoo Kim: DiagSim: Systematically Diagnosing Simulators for Healthy Simulations. 4:1-4:27
- Sushant Kondguli, Michael C. Huang: A Case for a More Effective, Power-Efficient Turbo Boosting. 5:1-5:22
- Kuan-Chung Chen, Chung-Ho Chen: Enabling SIMT Execution Model on Homogeneous Multi-Core System. 6:1-6:26
- Mingzhe Zhang, King Tin Lam, Xin Yao, Cho-Li Wang: SIMPO: A Scalable In-Memory Persistent Object Framework Using NVRAM for Reliable Big Data Computing. 7:1-7:28
- Bobin Deng, Sriseshan Srikanth, Eric R. Hein, Thomas M. Conte, Erik DeBenedictis, Jeanine E. Cook, Michael P. Frank: Extending Moore's Law via Computationally Error-Tolerant Computing. 8:1-8:27
- Dave Dice, Maurice Herlihy, Alex Kogan: Improving Parallelism in Hardware Transactional Memory. 9:1-9:24
- Namhyung Kim, Junwhan Ahn, Kiyoung Choi, Daniel Sánchez, Donghoon Yoo, Soojung Ryu: Benzene: An Energy-Efficient Distributed Hybrid Cache Architecture for Manycore Systems. 10:1-10:23
- Yulong Ao, Chao Yang, Fangfang Liu, Wanwang Yin, Lijuan Jiang, Qiao Sun: Performance Optimization of the HPCG Benchmark on the Sunway TaihuLight Supercomputer. 11:1-11:20
- Saeed Rashidi, Majid Jalili, Hamid Sarbazi-Azad: Improving MLC PCM Performance through Relaxed Write and Read for Intermediate Resistance Levels. 12:1-12:31
- Wenlai Zhao, Haohuan Fu, Jiarui Fang, Weijie Zheng, Lin Gan, Guangwen Yang: Optimizing Convolutional Neural Networks on the Sunway TaihuLight Supercomputer. 13:1-13:26
- Dimitrios Mbakoyiannis, Othon Tomoutzoglou, George Kornaros: Energy-Performance Considerations for Data Offloading to FPGA-Based Accelerators Over PCIe. 14:1-14:24
- Zhen Lin, Michael Mantor, Huiyang Zhou: GPU Performance vs. Thread-Level Parallelism: Scalability Analysis and a Novel Way to Improve TLP. 15:1-15:21
- Oleksandr Zinenko, Stéphane Huot, Cédric Bastoul: Visual Program Manipulation in the Polyhedral Model. 16:1-16:25
Volume 15, Number 2, June 2018
- Mustafa M. Shihab, Jie Zhang, Myoungsoo Jung, Mahmut T. Kandemir: ReveNAND: A Fast-Drift-Aware Resilient 3D NAND Flash Design. 17:1-17:26
- Seyed Majid Zahedi, Songchun Fan, Benjamin C. Lee: Managing Heterogeneous Datacenters with Tokens. 18:1-18:23
- Miquel Pericàs: Elastic Places: An Adaptive Resource Manager for Scalable and Portable Performance. 19:1-19:26
- Matthew Benjamin Olson, Joseph T. Teague, Divyani Rao, Michael R. Jantz, Kshitij A. Doshi, Prasad A. Kulkarni: Cross-Layer Memory Management to Improve DRAM Energy Efficiency. 20:1-20:27
- Davide Zoni, Luca Colombo, William Fornaciari: DarkCache: Energy-Performance Optimization of Tiled Multi-Cores by Adaptively Power-Gating LLC Banks. 21:1-21:26
- Yang Zhang, Dan Feng, Wei Tong, Yu Hua, Jingning Liu, Zhipeng Tan, Chengning Wang, Bing Wu, Zheng Li, Gaoxiang Xu: CACF: A Novel Circuit Architecture Co-optimization Framework for Improving Performance, Reliability and Energy of ReRAM-based Main Memory System. 22:1-22:26
- Nicolai Stawinoga, Tony Field: Predictable Thread Coarsening. 23:1-23:26
- Probir Roy, Shuaiwen Leon Song, Sriram Krishnamoorthy, Abhinav Vishnu, Dipanjan Sengupta, Xu Liu: NUMA-Caffe: NUMA-Aware Deep Learning Neural Networks. 24:1-24:26
- Ahsen Ejaz, Vassilios Papaefstathiou, Ioannis Sourdis: DDRNoC: Dual Data-Rate Network-on-Chip. 25:1-25:24
- Ying Cai, Yulong Ao, Chao Yang, Wenjing Ma, Haitao Zhao: Extreme-Scale High-Order WENO Simulations of 3-D Detonation Wave with 10 Million Cores. 26:1-26:21
Volume 15, Number 3, October 2018
- Yannis Sfakianakis, Christos Kozanitis, Christos Kozyrakis, Angelos Bilas: QuMan: Profile-based Improvement of Cluster Utilization. 27:1-27:25
- Engin Kayraklioglu, Michael P. Ferguson, Tarek A. El-Ghazawi: LAPPS: Locality-Aware Productive Prefetching Support for PGAS. 28:1-28:26
- Akrem Benatia, Weixing Ji, Yizhuo Wang, Feng Shi: BestSF: A Sparse Meta-Format for Optimizing SpMV on GPU. 29:1-29:27
- Pierre Michaud: An Alternative TAGE-like Conditional Branch Predictor. 30:1-30:23
- James Garland, David Gregg: Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing. 31:1-31:24
- Hyojong Kim, Ramyad Hadidi, Lifeng Nai, Hyesoon Kim, Nuwan Jayasena, Yasuko Eckert, Onur Kayiran, Gabriel H. Loh: CODA: Enabling Co-location of Computation and Data for Multiple GPU Systems. 32:1-32:23
- Madhavan Manivannan, Miquel Pericàs, Vassilis Papaefstathiou, Per Stenström: Global Dead-Block Management for Task-Parallel Programs. 33:1-33:25
- Roman Gareev, Tobias Grosser, Michael Kruse: High-Performance Generalized Tensor Operations: A Compiler-Oriented Approach. 34:1-34:27
- Hervé Yviquel, Lauro Cruz, Guido Araujo: Cluster Programming using the OpenMP Accelerator Model. 35:1-35:23
- Mohammad Khavari Tavana, Amir Kavyan Ziabari, David R. Kaeli: Block Cooperation: Advancing Lifetime of Resistive Memories by Increasing Utilization of Error Correcting Codes. 36:1-36:26
- Hai Jin, Bo Liu, Wenbin Jiang, Yang Ma, Xuanhua Shi, Bingsheng He, Shaofeng Zhao: Layer-Centric Memory Reuse and Data Migration for Extreme-Scale Deep Learning on Many-Core Architectures. 37:1-37:26
- Dani Voitsechov, Arslan Zulfiqar, Mark Stephenson, Mark Gebhart, Stephen W. Keckler: Software-Directed Techniques for Improved GPU Register File Utilization. 38:1-38:23
- Huanxin Lin, Cho-Li Wang, Hongyuan Liu: On-GPU Thread-Data Remapping for Branch Divergence Reduction. 39:1-39:24
Volume 15, Number 4, January 2019
- Stefan Kronawitter, Christian Lengauer: Polyhedral Search Space Exploration in the ExaStencils Code Generator. 40:1-40:25
- Jingheng Xu, Haohuan Fu, Wen Shi, Lin Gan, Yuxuan Li, Wayne Luk, Guangwen Yang: Performance Tuning and Analysis for Stencil-Based Applications on POWER8 Processor. 41:1-41:25
- Jiajun Wang, Reena Panda, Lizy K. John: SelSMaP: A Selective Stride Masking Prefetching Scheme. 42:1-42:21
- Xing Su, Xiangke Liao, Hao Jiang, Canqun Yang, Jingling Xue: SCP: Shared Cache Partitioning for High-Performance GEMM. 43:1-43:21
- Fernando Magno Quintão Pereira, Guilherme V. Leobas, Abdoulaye Gamatié: Static Prediction of Silent Stores. 44:1-44:26
- Neal Clayton Crago, Mark Stephenson, Stephen W. Keckler: Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs. 45:1-45:23
- Feng Zhang, Jingling Xue: Poker: Permutation-Based SIMD Execution of Intensive Tree Search by Path Encoding. 46:1-46:28
- Nicolas Belleville, Damien Couroussé, Karine Heydemann, Henri-Pierre Charles: Automated Software Protection for the Masses Against Side-Channel Attacks. 47:1-47:27
- Chao Yu, Yuebin Bai, Qingxiao Sun, Hailong Yang: Improving Thread-level Parallelism in GPUs Through Expanding Register File to Scratchpad Memory. 48:1-48:24
- Lois Orosa, Rodolfo Azevedo, Onur Mutlu: AVPP: Address-first Value-next Predictor with Value Prefetching for Improving the Efficiency of Load Value Prediction. 49:1-49:30
- Jun Zhang, Rui Hou, Wei Song, Sally A. McKee, Zhen Jia, Chen Zheng, Mingyu Chen, Lixin Zhang, Dan Meng: RAGuard: An Efficient and User-Transparent Hardware Mechanism against ROP Attacks. 50:1-50:21
- Ping Wang, Luke McHale, Paul V. Gratz, Alex Sprintson: GenMatcher: A Generic Clustering-Based Arbitrary Matching Framework. 51:1-51:22
- Ding-Yong Hong, Jan-Jan Wu, Yu-Ping Liu, Sheng-Yu Fu, Wei-Chung Hsu: Processor-Tracing Guided Region Formation in Dynamic Binary Translation. 52:1-52:25
- Yu Wang, Victor Lee, Gu-Yeon Wei, David M. Brooks: Predicting New Workload or CPU Performance by Analyzing Public Datasets. 53:1-53:21
- Hyukwoo Park, SungKook Kim, Jung-Geun Park, Soo-Mook Moon: Reusing the Optimized Code for JavaScript Ahead-of-Time Compilation. 54:1-54:20
- Han Zhao, Quan Chen, Yuxian Qiu, Ming Wu, Yao Shen, Jingwen Leng, Chao Li, Minyi Guo: Bandwidth and Locality Aware Task-stealing for Manycore Architectures with Bandwidth-Asymmetric Memory. 55:1-55:26
- Stefan Ganser, Armin Größlinger, Norbert Siegmund, Sven Apel, Christian Lengauer: Speeding up Iterative Polyhedral Schedule Optimization with Surrogate Performance Models. 56:1-56:27
- Song Wu, Fang Zhou, Xiang Gao, Hai Jin, Jinglei Ren: Dual-Page Checkpointing: An Architectural Approach to Efficient Data Persistence for In-Memory Applications. 57:1-57:27
- Mohsen Kiani, Amir Rajabzadeh: Efficient Cache Performance Modeling in GPUs Using Reuse Distance Analysis. 58:1-58:24
- Thomas Debrunner, Sajad Saeedi, Paul H. J. Kelly: AUKE: Automatic Kernel Code Generation for an Analogue SIMD Focal-Plane Sensor-Processor Array. 59:1-59:26
- You Zhou, Fei Wu, Zhonghai Lu, Xubin He, Ping Huang, Changsheng Xie: SCORE: A Novel Scheme to Efficiently Cache Overlong ECCs in NAND Flash Memory. 60:1-60:25
- Francisco J. Andújar, Salvador Coll, Marina Alonso, Pedro López, Juan-Miguel Martínez: POWAR: Power-Aware Routing in HPC Networks with On/Off Links. 61:1-61:22
- Rahim Mammadli, Felix Wolf, Ali Jannesari: The Art of Getting Deep Neural Networks in Shape. 62:1-62:21
- Stavros Tzilis, Pedro Trancoso, Ioannis Sourdis: Energy-Efficient Runtime Management of Heterogeneous Multicores using Online Projection. 63:1-63:26
- Matthew Kay Fei Lee, Yingnan Cui, Thannirmalai Somu, Tao Luo, Jun Zhou, Wai Teng Tang, Weng-Fai Wong, Rick Siow Mong Goh: A System-Level Simulator for RRAM-Based Neuromorphic Computing Chips. 64:1-64:24
- Evangelos Vasilakis, Vassilis Papaefstathiou, Pedro Trancoso, Ioannis Sourdis: Decoupled Fused Cache: Fusing a Decoupled LLC with a DRAM Cache. 65:1-65:23
- Peter Pirkelbauer, Amalee Wilson, Christina L. Peterson, Damian Dechev: Blaze-Tasks: A Framework for Computing Parallel Reductions over Tasks. 66:1-66:25
- Yukinori Sato, Tomoya Yuki, Toshio Endo: An Autotuning Framework for Scalable Execution of Tiled Code via Iterative Polyhedral Compilation. 67:1-67:23
- S. Kazem Shekofteh, Hamid Noori, Mahmoud Naghibzadeh, Hadi Sadoghi Yazdi, Holger Fröning: Metric Selection for GPU Kernel Classification. 68:1-68:27
- Angelos Bilas: List of 2018 Distinguished Reviewers ACM TACO. 69:1