27th ICS 2013: Eugene, OR, USA
- Allen D. Malony, Mario Nemirovsky, Samuel P. Midkiff:
International Conference on Supercomputing, ICS'13, Eugene, OR, USA - June 10 - 14, 2013. ACM 2013, ISBN 978-1-4503-2130-3
Keynote address
- Bob Blainey:
Business meets supercomputing: keynote talk. 1-2
DSLs and semantic based compilation 1
- Andrew Stone, Michelle Mills Strout:
Abstractions to separate concerns in semi-regular grids. 3-12
- Thomas Henretty, Richard Veras, Franz Franchetti, Louis-Noël Pouchet, J. Ramanujam, P. Sadayappan:
A stencil compiler for short-vector SIMD architectures. 13-24
- Chenyang Liu, Muhammad Hasan Jamal, Milind Kulkarni, Arun Prakash, Vijay S. Pai:
Exploiting domain knowledge to optimize parallel computational mechanics codes. 25-36
Tools and performance debugging
- José-María Arnau, Joan-Manuel Parcerisa, Polychronis Xekalakis:
TEAPOT: a toolset for evaluating performance, power and image quality on mobile graphics systems. 37-46
- Chang-Seo Park, Koushik Sen, Costin Iancu:
Scaling data race detection for partitioned global address space programs. 47-58
- Xing Wu, Frank Mueller:
Elastic and scalable tracing and accurate replay of non-deterministic events. 59-68
- Xu Liu, John M. Mellor-Crummey, Michael W. Fagan:
A new approach for performance analysis of openMP programs. 69-80
Memory and storage
- Kun Fang, Zhichun Zhu:
Conservative row activation to improve memory power efficiency. 81-90
- Sangyeun Cho, Chanik Park, Hyunok Oh, Sungchan Kim, Youngmin Yi, Gregory R. Ganger:
Active disk meets flash: a case for intelligent SSDs. 91-102
- Myoungsoo Jung, John Shalf, Mahmut T. Kandemir:
Design of a large-scale storage-class RRAM system. 103-114
- Ju-Young Jung, Sangyeun Cho:
Memorage: emerging persistent RAM based malleable main memory and storage architecture. 115-126
Keynote address
- Steven L. Teig:
Function, latency, bandwidth, power: towards a better computer. 127-128
Communication and heterogeneous systems
- Michail Alvanos, Montse Farreras, Ettore Tiotto, José Nelson Amaral, Xavier Martorell:
Improving communication in PGAS environments: static and dynamic coalescing in UPC. 129-138
- Bogdan Prisacari, Germán Rodríguez, Cyriel Minkenberg, Torsten Hoefler:
Bandwidth-optimal all-to-all exchanges in fat tree networks. 139-148
- Klaus Kofler, Ivan Grasso, Biagio Cosenza, Thomas Fahringer:
An automatic input-sensitive approach for heterogeneous task partitioning. 149-160
- Ivan Grasso, Simone Pellegrini, Biagio Cosenza, Thomas Fahringer:
LibWater: heterogeneous distributed computing made easy. 161-172
Architecture 1
- Tapasya Patki, David K. Lowenthal, Barry Rountree, Martin Schulz, Bronis R. de Supinski:
Exploring hardware overprovisioning in power-constrained, high performance computing. 173-182
- Ramakrishnan Rajamony, Mark W. Stephenson, William Evan Speight:
The power 775 architecture at scale. 183-192
- Ruisheng Wang, Lizhong Chen, Timothy Mark Pinkston:
Bubble coloring: avoiding routing- and protocol-induced deadlocks with minimal virtual channel requirement. 193-202
- Keith D. Underwood, Eric Borch, John Sizer, Timothy Stremcha, Michael Strom:
Evaluating on-die interconnects for a 4 TB/s router. 203-212
Algorithms
- Matthew Badin, Paolo D'Alberto, Lubomir Bic, Michael B. Dillencourt, Alexandru Nicolau:
Improving numerical accuracy for non-negative matrix multiplication on GPUs using recursive algorithms. 213-222
- Azzam Haidar, Mark Gates, Stanimire Tomov, Jack J. Dongarra:
Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication. 223-232
- Panagiotis A. Foteinos, Nikos Chrisochoides:
High quality real-time image-to-mesh conversion for finite element simulations. 233-242
Architecture 2
- Komal Jothi, Haitham Akkary:
Tuning the continual flow pipeline architecture. 243-252
- Konstantinos Koukos, David Black-Schaffer, Vasileios Spiliopoulos, Stefanos Kaxiras:
Towards more efficient execution: a decoupled access-execute approach. 253-262
- Souad Koliai, Zakaria Bendifallah, Mathieu Tribalat, Cédric Valensi, Jean-Thomas Acquaviva, William Jalby:
Quantifying performance bottleneck cost through differential analysis. 263-272
Irregular algorithms
- Xing Liu, Mikhail Smelyanskiy, Edmond Chow, Pradeep Dubey:
Efficient sparse matrix-vector multiplication on x86-based many-core processors. 273-282
- Nicholas Gerard Edmonds, Jeremiah Willcock, Andrew Lumsdaine:
Expressing graph algorithms using generalized active messages. 283-292
- Hari Sundar, Dhairya Malhotra, George Biros:
HykSort: a new variant of hypercube quicksort on distributed memory architectures. 293-302
Memory
- Gabriel Marin, Collin McCurdy, Jeffrey S. Vetter:
Diagnosis and optimization of application prefetching performance. 303-312
- Changhui Lin, Vijay Nagarajan, Rajiv Gupta:
Address-aware fences. 313-324
- Vassilis Papaefstathiou, Manolis Katevenis, Dimitrios S. Nikolopoulos, Dionisios N. Pnevmatikatos:
Prefetching and cache management using task lifetimes. 325-334
Keynote address
- James E. Smith:
The role of computer designers in reverse-engineering the brain. 335-336
Runtime techniques
- Srinath Sridharan, Gagan Gupta, Gurindar S. Sohi:
Holistic run-time parallelism management for time and energy efficiency. 337-348
- R. Vasudevan, Sathish S. Vadhiyar, Laxmikant V. Kalé:
G-Charm: an adaptive runtime system for message-driven parallel applications on hybrid systems. 349-358
- Javier Bueno, Xavier Martorell, Rosa M. Badia, Eduard Ayguadé, Jesús Labarta:
Implementing OmpSs support for regions of data in architectures with multiple address spaces. 359-368
- Michael O. Lam, Jeffrey K. Hollingsworth, Bronis R. de Supinski, Matthew P. LeGendre:
Automatically adapting programs for mixed-precision floating-point computation. 369-378
Order in the house
- Pablo Prieto, Valentin Puente, José-Ángel Gregorio:
CMP off-chip bandwidth scheduling guided by instruction criticality. 379-388
- Wolfgang Frings, Dong H. Ahn, Matthew P. LeGendre, Todd Gamblin, Bronis R. de Supinski, Felix Wolf:
Massively parallel loading. 389-398
- Khaled Hamidouche, Sreeram Potluri, Hari Subramoni, Krishna Chaitanya Kandalla, Dhabaleswar K. Panda:
MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand. 399-408
GPUs
- Xin Huo, Sriram Krishnamoorthy, Gagan Agrawal:
Efficient scheduling of recursive control flow on GPUs. 409-420
- Nabeel AlSaber, Milind Kulkarni:
SemCache: semantics-aware caching for efficient GPU offloading. 421-432
- Ping Xiang, Yi Yang, Mike Mantor, Norm Rubin, Lisa R. Hsu, Huiyang Zhou:
Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement. 433-442
- Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann:
Scaling large-data computations on multi-GPU accelerators. 443-454
Posters
- Sriram Aananthakrishnan, Greg Bronevetsky, Ganesh Gopalakrishnan:
Hybrid approach for data-flow analysis of MPI programs. 455-456
- Michail Alvanos, Gabriel Tanase, Montse Farreras, Ettore Tiotto, José Nelson Amaral, Xavier Martorell:
Improving performance of all-to-all communication through loop scheduling in PGAS environments. 457-458
- Madhur Amilkanthwar, Shankar Balachandran:
CUPL: a compile-time uncoalesced memory access pattern locator for CUDA. 459-460
- Weiwei Chen, Ewa Deelman, Rizos Sakellariou:
Imbalance optimization in scientific workflows. 461-462
- Catalin Bogdan Ciobanu, Dionisios N. Pnevmatikatos, Kyprianos D. Papadimitriou, Georgi Nedeltchev Gaydadjiev:
FASTER run-time reconfiguration management. 463-464
- Hadrien A. Clarke, Antoine Trouvé, Kazuaki J. Murakami:
MAD7: a memory architecture simulator targeted at design space exploration. 465-466
- Truong Vinh Truong Duy, Taisuke Ozaki:
A decomposition method with minimal communication volume for parallelization of multi-dimensional FFTs. 467-468
- Truong Vinh Truong Duy, Taisuke Ozaki:
A massively parallel domain decomposition method for large-scale DFT electronic structure calculations. 469-470
- Panagiotis A. Foteinos, Daming Feng, Andrey N. Chernikov, Nikos Chrisochoides:
Multi-layered unstructured mesh generation. 471-472
- Justin A. Hogan, Raymond J. Weber, Brock J. LaMeres, Todd Kaiser:
Network-on-chip for a partially reconfigurable FPGA system. 473-474
- Saurabh Jha, Tejaswi Agarwal, B. Rajesh Kanna:
Exploiting data parallelism in the yConvex hypergraph algorithm for image representation using GPGPUs. 475-476
- Tao Jiang, Lele Zhang, Rui Hou, Yi Zhang, Qianlong Zhang, Lin Chai, Jing Han, Wuxiong Zhang, Cong Wang, Lixin Zhang:
The ARMv8 simulator. 477-478
- Erik Keever, James N. Imamura:
Imogen: a parallel 3D fluid and MHD code for GPUs. 479-480
- Min Li, Sushil Mantri, Pin Zhou, Ali Raza Butt:
SMIO: I/O similarity aware virtual machine management in virtual desktop environments. 481-482
- David Ozog, Sameer Shende, Allen D. Malony, Jeff R. Hammond, James Dinan, Pavan Balaji:
Inspector/executor load balancing algorithms for block-sparse tensor contractions. 483-484
- Swaroop Pophale, Tony Curtis, Barbara M. Chapman:
Improving performance of openSHMEM reference library by portable PE mapping technique. 485-486
- Sonish Shrestha:
Using platform-independent data locality analysis to predict cache performance on abstract hardware platforms. 487-488
- Tyler Sorensen, Ganesh Gopalakrishnan, Vinod Grover:
Towards shared memory consistency models for GPUs. 489-490
- Alejandro Valero, Julio Sahuquillo, Salvador Petit, José Duato:
Exploiting reuse information to reduce refresh energy in on-chip eDRAM caches. 491-492
- Cong Wang, Tao Jiang, Rui Hou:
V-OpenCL: a method to use remote GPGPU. 493-494
- Raymond J. Weber, Justin A. Hogan, Brock J. LaMeres, Todd Kaiser:
Power efficiency in a partially reconfigurable multiprocessor system. 495-496