- research-article, November 2008
369 Tflop/s molecular dynamics simulations on the Roadrunner general-purpose heterogeneous supercomputer
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 64, Pages 1–10
We present timing and performance numbers for a short-range parallel molecular dynamics (MD) code, SPaSM, that has been rewritten for the heterogeneous Roadrunner supercomputer. Each Roadrunner compute node consists of two AMD Opteron dual-core ...
- research-article, November 2008
0.374 Pflop/s trillion-particle kinetic modeling of laser plasma interaction on Roadrunner
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 63, Pages 1–11
We demonstrate the outstanding performance and scalability of the VPIC kinetic plasma modeling code on the heterogeneous IBM Roadrunner supercomputer at Los Alamos National Laboratory. VPIC is a three-dimensional, relativistic, electromagnetic, particle-...
- research-article, November 2008
Prefetch throttling and data pinning for improving performance of shared caches
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 59, Pages 1–12
In this paper, we (i) quantify the impact of compiler-directed I/O prefetching on shared caches at I/O nodes. The experimental data collected shows that while I/O prefetching brings some benefits, its effectiveness reduces significantly as the number of ...
- research-article, November 2008
Global trees: a framework for linked data structures on distributed memory parallel systems
- D. Brian Larkins,
- James Dinan,
- Sriram Krishnamoorthy,
- Srinivasan Parthasarathy,
- Atanas Rountev,
- P. Sadayappan
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 57, Pages 1–13
This paper describes the Global Trees (GT) system that provides a multi-layered interface to a global address space view of distributed tree data structures, while providing scalable performance on distributed memory systems. The Global Trees system ...
- research-article, November 2008
Server-storage virtualization: integration and load balancing in data centers
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 53, Pages 1–12
We describe the design of an agile data center with integrated server and storage virtualization technologies. Such data centers form a key building block for new cloud computing architectures. We also show how to leverage this integrated agility for ...
- research-article, November 2008
Analysis of application heartbeats: learning structural and temporal features in time series data for identification of performance problems
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 52, Pages 1–12
Grids promote new modes of scientific collaboration and discovery by connecting distributed instruments, data and computing facilities. Because many resources are shared, application performance can vary widely and unexpectedly. We describe a novel ...
- research-article, November 2008
The cost of doing science on the cloud: the Montage example
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 50, Pages 1–12
Utility grids such as the Amazon EC2 cloud and Amazon S3 offer computational and storage resources that can be used on-demand for a fee by compute- and data-intensive applications. The cost of running an application on such a cloud depends on the compute,...
- research-article, November 2008
Scalable load-balance measurement for SPMD codes
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 46, Pages 1–12
Good load balance is crucial on very large parallel systems, but the most sophisticated algorithms introduce dynamic imbalances through adaptation in domain decomposition or use of adaptive solvers. To observe and diagnose imbalance, developers need ...
- research-article, November 2008
BitDew: a programmable environment for large-scale data management and distribution
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 45, Pages 1–12
Desktop Grids use the computing, network and storage resources from idle desktop PCs distributed over multiple LANs or the Internet to compute a large variety of resource-demanding distributed applications. While these applications need to access, ...
- research-article, November 2008
Proactive process-level live migration in HPC environments
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 43, Pages 1–12
As the number of nodes in high-performance computing environments keeps increasing, faults are becoming commonplace. Reactive fault tolerance (FT) often does not scale due to massive I/O requirements and relies on manual job resubmission. This work ...
- research-article, November 2008
A dynamic scheduler for balancing HPC applications
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 41, Pages 1–12
Load imbalance causes significant performance degradation in High Performance Computing applications. In our previous work we showed that load imbalance can be alleviated by modern MT processors that provide mechanisms for controlling the allocation of ...
- research-article, November 2008
PAM: a novel performance/power aware meta-scheduler for multi-core systems
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 39, Pages 1–12
Sharing resources such as caches and main memory bandwidth in multi-core systems requires a more sophisticated scheduling scheme. PAM is a low-overhead, user-level meta-scheduler which does not require any hardware or software changes. In particular, it ...
- research-article, November 2008
An adaptive cut-off for task parallelism
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 36, Pages 1–11
In task parallel languages, an important factor for achieving good performance is the use of a cut-off technique to reduce the number of tasks created. Using a cut-off to avoid an excessive number of tasks helps the runtime system to reduce the total ...
- research-article, November 2008
Massively parallel genomic sequence search on the Blue Gene/P architecture
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 33, Pages 1–11
This paper presents our first experiences in mapping and optimizing genomic sequence search onto the massively parallel IBM Blue Gene/P (BG/P) platform. Specifically, we performed our work on mpiBLAST, a parallel sequence-search code that has been ...
- research-article, November 2008
High-radix crossbar switches enabled by proximity communication
- Hans Eberle,
- Pedro J. Garcia,
- José Flich,
- José Duato,
- Robert Drost,
- Nils Gura,
- David Hopkins,
- Wladek Olesinski
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 32, Pages 1–12
We describe a novel way to implement high-radix crossbar switches. Our work is enabled by a new chip interconnect technology called Proximity Communication (PxC) that offers unparalleled chip IO density. First, we show how a crossbar architecture is ...
- research-article, November 2008
Extending CC-NUMA systems to support write update optimizations
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 30, Pages 1–12
Processor stalls and protocol messages caused by coherence misses limit the performance of shared memory applications. Modern multiprocessors employ write-invalidate coherence protocols, which induce read misses to ensure consistency. Previous research ...
- research-article, November 2008
A novel migration-based NUCA design for chip multiprocessors
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 28, Pages 1–12
Chip Multiprocessors (CMPs) and Non-Uniform Cache Architectures (NUCAs) represent two emerging trends in computer architecture. Targeting future CMP-based systems with NUCA-type L2 caches, this paper proposes a novel data migration algorithm for ...
- research-article, November 2008
Applying double auctions for scheduling of workflows on the Grid
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 27, Pages 1–11
Grid economy models have long been considered a promising alternative to classical Grid resource management, due to their dynamic and decentralized nature, and because the financial valuation of resources and services is inherent in any such ...
- research-article, November 2008
SMARTMAP: operating system support for efficient data sharing among processes on a multi-core processor
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 25, Pages 1–12
This paper describes SMARTMAP, an operating system technique that implements fixed offset virtual memory addressing. SMARTMAP allows the application processes on a multi-core processor to directly access each other's memory without the overhead of ...
- research-article, November 2008
Nimrod/K: towards massively parallel dynamic grid workflows
SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Article No.: 24, Pages 1–11
A challenge for Grid computing is the difficulty in developing software that is parallel, distributed and highly dynamic. Whilst there have been many general-purpose mechanisms developed over the years, Grid programming still remains a low-level, error ...