[Front cover]
Presents the front cover or splash screen of the proceedings record.
[Title page i]
Presents the title page of the proceedings record.
[Title page iii]
Presents the title page of the proceedings record.
Message from the General Co-chairs and Vice-General Co-chairs
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
Message from the Program Chair
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
Message from the Steering Chair
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
HiPC 2015 Committees
Provides a listing of current committee members and society officers.
HiPC 2015 Technical Program
Provides a schedule of conference events and a listing of which papers were presented in each session.
Scale-out Beyond Map-Reduce
Until recently, data was gathered for well-defined objectives such as auditing, forensics, reporting and line-of-business operations; now, exploratory and predictive analysis is becoming ubiquitous, and the default increasingly is to capture and store ...
Which Verification for Soft Error Detection?
Many methods are available to detect silent errors in high-performance computing (HPC) applications. Each comes with a given cost and recall (fraction of all errors that are actually detected). The main contribution of this paper is to characterize the ...
Throughput Regulation in Shared Memory Multicore Processors
Performance scaling is now synonymous with scaling the number of cores. One of the consequences of this shift is the increasing difficulty of designing processors with predictable and controllable performance. To address this challenge this paper ...
Application Taxonomy via Algorithmic Commonality for Domain-Specific Architecture Design
In this paper, we propose an approach to application taxonomy from the perspective of algorithmic commonality. The taxonomy exploits algorithm-inherent characterization to imply a categorization of domain-specific architecture in the initial phase of ...
FlexCore: A Reconfigurable Processor Supporting Flexible, Dynamic Morphing
In the realm of desktop and server class processors, the prevailing trend is to use out-of-order superscalar cores that exploit the hidden instruction-level parallelism in a program. In superscalar designs, the performance (as measured by the IPC, ...
High Efficiency Generalized Parallel Counters for Xilinx FPGAs
Generalized Parallel Counters (GPCs) are frequently used in constructing high speed compressor trees. Prior work on GPC synthesis using FPGAs has focused on utilizing the fast carry chain and mapping the logic onto LUTs. This mapping is not optimal in ...
2QW-Clock: An Efficient SSD Buffer Management Algorithm
Modern solid state disks (SSDs) have an SDRAM buffer, which is used to store the data and mapping entries that are likely to be used in the near future. Managing this buffer efficiently is important for improving SSD performance. Flash read and write speeds have ...
Task-Based Multifrontal QR Solver for GPU-Accelerated Multicore Architectures
Recent studies have shown the potential of task-based programming paradigms for implementing robust, scalable sparse direct solvers for modern computing platforms. Yet, designing task flows that efficiently exploit heterogeneous architectures remains ...
Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices
Sparse matrix vector multiplication (SpMV) is an important linear algebra primitive. Recent research has focused on improving the performance of SpMV on GPUs when using compressed sparse row (CSR), the most frequently used matrix storage format on CPUs. ...
On the Resilience of Parallel Sparse Hybrid Solvers
As the computational power of high performance computing (HPC) systems continues to increase by using a huge number of CPU cores or specialized processing units, extreme-scale applications are increasingly prone to faults. Consequently, the HPC ...
New Tridiagonal Systems Solvers on GPU Architectures
Modern GPUs (Graphics Processing Units) offer very high computing power at relatively low cost. Nevertheless, designing efficient algorithms for the GPUs usually requires additional time and effort, even for experienced programmers. On the other hand, ...
A Stable Parallel Algorithm for Diagonally Dominant Tridiagonal Linear Systems
In this work, we present a stable parallel algorithm based on WZ factorization for solving diagonally dominant tridiagonal linear systems of algebraic equations, using a divide-and-conquer approach. Existence results are given and the backward error ...
Optimizing Approximate Weighted Matching on Nvidia Kepler K40
Matching is a fundamental graph problem with numerous applications in science and engineering. While algorithms for computing optimal matchings are difficult to parallelize, approximation algorithms on the other hand generally compute high quality ...
Improving Communication Throughput by Multipath Load Balancing on Blue Gene/Q
- Huy Bui,
- Preeti Malakar,
- Venkatram Vishwanath,
- Todd S. Munson,
- Eun-Sung Jung,
- Andrew Johnson,
- Michael E. Papka,
- Jason Leigh
Achievable networking performance of applications in a supercomputer depends on the exact combination of the communication patterns of the applications and the routing algorithms used by the supercomputer. In order to achieve the highest networking ...
Dynamic Adaptation for Elastic System Services Using Virtual Servers
A vast majority of legacy runtime systems and middleware prevalent in cluster and supercomputing environments are static in nature. Due to the rising scale and complexity of high-performance computing systems, the static nature of systems software would ...
Understanding the Performance Benefit of Asynchronous Data Transfers in OpenCL Programs Executing on Media Processors
In this work, we study the performance benefits of using asynchronous data transfers in OpenCL programs executing on media processors. Asynchronous data transfers are typically implemented by use of Direct Memory Access (DMA) engines that can be ...
Hardware-Transactional-Memory Based Speculative Parallel Discrete Event Simulation of Very Fine Grain Models
This article presents an innovative runtime support for speculative parallel processing of discrete event simulation models on multi-core architectures, which exploits Hardware-Transactional-Memory (HTM) facilities for the purpose of state ...
Towards Practical Page Placement for a Green Memory Manager
Increased performance demand of modern applications has resulted in large memory modules and higher performance processors in computing systems. Power consumption becomes an important aspect when these resources go underutilized in a running system, ...
Efficient Barrier Implementation on the POWER8 Processor
POWER8 is a new generation of POWER processor capable of 8-way simultaneous multi-threading per core. High-performance computing capabilities, such as a high degree of instruction-level and thread-level parallelism, are integrated with a deep memory ...
Compilers and the Future of High Performance Computing
Compiler technology has enabled the software advances of the last sixty years. It has given us machine-independent programming and improved productivity by automatically handling a number of issues, such as instruction selection and register allocation. ...
On Accelerating Concurrent PCA Computations for Financial Risk Applications
Principal component analysis (PCA) is a widely used mathematical technique for dimensionality reduction that works by identifying a smaller number of linearly uncorrelated variables (principal components) to explain the variation found in a data set. ...