31st ICS 2017: Chicago, IL, USA
- William D. Gropp, Pete Beckman, Zhiyuan Li, Francisco J. Cazorla:
Proceedings of the International Conference on Supercomputing, ICS 2017, Chicago, IL, USA, June 14-16, 2017. ACM 2017, ISBN 978-1-4503-5020-4
Automata and tree-mining optimization
- Marziyeh Nourian, Xiang Wang, Xiaodong Yu, Wu-chun Feng, Michela Becchi:
Demystifying automata processing: GPUs, FPGAs or Micron's AP? 1:1-1:11
- Junqiao Qiu, Zhijia Zhao, Bo Wu, Abhinav Vishnu, Shuaiwen Leon Song:
Enabling scalability-sensitive speculative parallelization for FSM computations. 2:1-2:10
- Nikhil Hegde, Jianqiao Liu, Milind Kulkarni:
SPIRIT: a framework for creating distributed recursive tree applications. 3:1-3:11
- Elaheh Sadredini, Reza Rahimi, Ke Wang, Kevin Skadron:
Frequent subtree mining on the automata processor: challenges and opportunities. 4:1-4:11
GPUs - part 1
- Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs. 5:1-5:10
- Kyung Hoon Kim, Rahul Boyapati, Jiayi Huang, Yuho Jin, Ki Hwan Yum, Eun Jung Kim:
Packet coalescing exploiting data redundancy in GPGPU architectures. 6:1-6:10
- Andreas Derler, Rhaleb Zayer, Hans-Peter Seidel, Markus Steinberger:
Dynamic scheduling for efficient hierarchical sparse matrix operations on the GPU. 7:1-7:10
Compilation techniques
- Aleksandar Zlateski, H. Sebastian Seung:
Compile-time optimized and statically scheduled N-D convnet primitives for multi-core and many-core (Xeon Phi) CPUs. 8:1-8:10
- Ehsan Totoni, Todd A. Anderson, Tatiana Shpeisman:
HPAT: high performance analytics with scripting ease-of-use. 9:1-9:10
- Diogo Nunes Sampaio, Louis-Noël Pouchet, Fabrice Rastello:
Simplification and runtime resolution of data dependence constraints for loop transformations. 10:1-10:11
- Suyash Gupta, Rahul Shrivastava, V. Krishna Nandivada:
Optimizing recursive task parallel programs. 11:1-11:11
GPUs - part 2
- Kaixi Hou, Weifeng Liu, Hao Wang, Wu-chun Feng:
Fast segmented sort on GPUs. 12:1-12:10
- Markus Steinberger, Rhaleb Zayer, Hans-Peter Seidel:
Globally homogeneous, locally adaptive sparse matrix-vector multiplication on the GPU. 13:1-13:11
- Rakshith Kunchum, Ankur Chaudhry, Aravind Sukumaran-Rajam, Qingpeng Niu, Israt Nisa, P. Sadayappan:
On improving performance of sparse matrix-matrix multiplication on GPUs. 14:1-14:11
- Keren Zhou, Guangming Tan, Xiuxia Zhang, Chaowei Wang, Ninghui Sun:
A performance analysis framework for exploiting GPU microarchitectural capability. 15:1-15:10
Application load imbalance, task and data mapping
- Jiawen Sun, Hans Vandierendonck, Dimitrios S. Nikolopoulos:
GraphGrind: addressing load imbalance of graph partitioning. 16:1-16:10
- Juan J. Galvez, Nikhil Jain, Laxmikant V. Kalé:
Automatic topology mapping of diverse large-scale parallel applications. 17:1-17:10
- Seongdae Yu, Seongbeom Park, Woongki Baek:
Design and implementation of bandwidth-aware memory placement and migration policies for heterogeneous memory systems. 18:1-18:10
Hardware design
- Xi-Yue Xiang, Wentao Shi, Saugata Ghose, Lu Peng, Onur Mutlu, Nian-Feng Tzeng:
Carpool: a bufferless on-chip network supporting adaptive multicast and hotspot alleviation. 19:1-19:11
- J. Rubén Titos Gil, Antonio Flores, Ricardo Fernández Pascual, Alberto Ros, Manuel E. Acacio:
Way-combining directory: an adaptive and scalable low-cost coherence directory. 20:1-20:10
Runtimes and algorithms for parallel-application performance and reliability support
- Sicong Zhuang, Marc Casas:
Iteration-fusing conjugate gradient. 21:1-21:10
- Antonio J. Peña, Vicenç Beltran, Carsten Clauss, Thomas Moschny:
Supporting automatic recovery in offloaded distributed programming models through MPI-3 techniques. 22:1-22:10
- Aurangzeb, Rudolf Eigenmann:
HiPA: history-based piecewise approximation for functions. 23:1-23:10
Data aggregation and hardware/software co-design approaches
- Peng Jiang, Gagan Agrawal:
Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation. 24:1-24:11
- Joao P. L. de Carvalho, Guido Araujo, Alexandro Baldassin:
Revisiting phased transactional memory. 25:1-25:10
- Haikun Liu, Yujie Chen, Xiaofei Liao, Hai Jin, Bingsheng He, Long Zheng, Rentong Guo:
Hardware/software cooperative caching for hybrid DRAM/NVM memory architectures. 26:1-26:10
- Xuanhua Shi, Ming Li, Wei Liu, Hai Jin, Chen Yu, Yong Chen:
SSDUP: a traffic-aware ssd burst buffer for HPC systems. 27:1-27:10
- Cristobal Ortega, Miquel Moretó, Marc Casas, Ramon Bertran, Alper Buyuktosunoglu, Alexandre E. Eichenberger, Pradip Bose:
libPRISM: an intelligent adaptation of prefetch and SMT levels. 28:1-28:10