SPCS: Vol 26, No 3-4

Volume 26, Issue 3-4, June 2011

Publisher:

Springer-Verlag
Berlin, Heidelberg

ISSN: 1865-2034

article

ISC'11 research paper sessions proceedings

Pages 143–144, https://doi.org/10.1007/s00450-011-0174-0

article

Astrophysical particle simulations with large custom GPU clusters on three continents

Pages 145–151, https://doi.org/10.1007/s00450-011-0173-1

We present direct astrophysical N-body simulations with up to six million bodies using our parallel MPI-CUDA code on large GPU clusters in Beijing, Berkeley, and Heidelberg, with different kinds of GPU hardware. The clusters are linked in the ...

article

Optimized HPL for AMD GPU and multi-core CPU usage

Pages 153–164, https://doi.org/10.1007/s00450-011-0161-5

The installation of the LOEWE-CSC ( http://csc.uni-frankfurt.de/csc/__ __51 ) supercomputer at the Goethe University in Frankfurt led to the development of a Linpack which can fully utilize the installed AMD Cypress GPUs. At its core, a fast DGEMM for ...

article

Simulation of bevel gear cutting with GPGPUs--performance and productivity

Pages 165–174, https://doi.org/10.1007/s00450-011-0158-0

The desire for general purpose computation on graphics processing units caused the advance of new programming paradigms, e.g. OpenCL C/C++, CUDA C or the PGI Accelerator Model. In this paper, we apply these programming approaches to the software ...

article

Predictive analysis of a hydrodynamics application on large-scale CMP clusters

Pages 175–185, https://doi.org/10.1007/s00450-011-0164-2

We present the development of a predictive performance model for the high-performance computing code Hydra, a hydrodynamics benchmark developed and maintained by the United Kingdom Atomic Weapons Establishment (AWE). The developed model elucidates the ...

article

Shared-memory, distributed-memory, and mixed-mode parallelisation of a CFD simulation code

Pages 187–195, https://doi.org/10.1007/s00450-011-0162-4

This paper presents some different approaches to the parallelisation of a harmonic balance Navier-Stokes solver for unsteady aerodynamics. Such simulation codes can require very large amounts of computational resource for realistic simulations, and ...

article

Wavelet-based adaptive multi-resolution solver on heterogeneous parallel architecture for computational fluid dynamics

Pages 197–203, https://doi.org/10.1007/s00450-011-0167-z

For the efficient simulation of fluid flows governed by a wide range of scales, a wavelet-based adaptive multi-resolution solver on heterogeneous parallel architectures is proposed for computational fluid dynamics. Both data- and task-based parallelisms ...

article

Automatic code generation and tuning for stencil kernels on modern shared memory architectures

Pages 205–210, https://doi.org/10.1007/s00450-011-0160-6

In this paper, we present Patus, a code generation and auto-tuning framework for stencil computations targeted at multi- and manycore processors, such as multicore CPUs and graphics processing units. Patus, which stands for "Parallel Autotuned S ...

article

Designing and dynamically load balancing hybrid LU for multi/many-core

Pages 211–220, https://doi.org/10.1007/s00450-011-0169-x

Designing high-performance LU factorization for modern hybrid multi/many-core systems requires highly-tuned BLAS subroutines, hiding communication latency and balancing the load across devices of variable processing capabilities. In this paper we show ...

article

Scalable parallel AMG on ccNUMA machines with OpenMP

Pages 221–228, https://doi.org/10.1007/s00450-011-0159-z

In many numerical simulation codes the backbone of the application covers the solution of linear systems of equations. Often, being created via a discretization of differential equations, the corresponding matrices are very sparse. One popular way to ...

article

Unbalanced tree search on a manycore system using the GPI programming model

Pages 229–236, https://doi.org/10.1007/s00450-011-0163-3

Recent developments in computer architectures are progressing towards systems with large core counts (manycore), which expose more parallelism to applications. Some applications, termed irregular and unbalanced, demand a dynamic and asynchronous ...

article

High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT

Pages 237–246, https://doi.org/10.1007/s00450-011-0170-4

Three-dimensional FFT is an important component of many scientific computing applications ranging from fluid dynamics, to astrophysics and molecular dynamics. P3DFFT is a widely used three-dimensional FFT package. It uses the Message Passing Interface (...

article

Mapping communication layouts to network hardware characteristics on massive-scale Blue Gene systems

Pages 247–256, https://doi.org/10.1007/s00450-011-0168-y

For parallel applications running on high-end computing systems, which processes of an application get launched on which processing cores is typically determined at application launch time without any information about the application characteristics. ...

article

MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters

Pages 257–266, https://doi.org/10.1007/s00450-011-0171-3

Data parallel architectures, such as General Purpose Graphics Units (GPGPUs), have seen a tremendous rise in their application for High End Computing. However, data movement in and out of GPGPUs remains the biggest hurdle to overall performance and ...

article

The development of Mellanox/NVIDIA GPUDirect over InfiniBand--a new model for GPU to GPU communications

Pages 267–273, https://doi.org/10.1007/s00450-011-0157-1

The usage and adoption of General Purpose GPUs (GPGPU) in HPC systems is increasing due to the unparalleled performance advantage of the GPUs and the ability to fulfill the ever-increasing demands for floating-point operations. While the GPU can ...

article

A system level view of Petascale I/O on IBM Blue Gene/P

Pages 275–283, https://doi.org/10.1007/s00450-011-0154-4

Petascale supercomputers rely on highly efficient Petascale I/O subsystems. This work describes the tuning and scaling behavior of the GPFS parallel file system on JUGENE, the largest IBM Blue Gene/P installation worldwide and the first PetaFlop/s HPC ...

article

Baler: deterministic, lossless log message clustering tool

Pages 285–295, https://doi.org/10.1007/s00450-011-0155-3

The rate of failures in HPC systems continues to increase as the number of components comprising the systems increases. System logs are one of the valuable information sources that can be used to analyze system failures and their root causes. However, ...

article

Fault oblivious high performance computing with dynamic task replication and substitution

Pages 297–305, https://doi.org/10.1007/s00450-011-0156-2

Traditional parallel programming techniques will suffer rapid deterioration of performance scaling with growing platform size, as the work of coping with increasingly frequent failures dominates over useful computation. To address this challenge, we ...

article

Ultra low latency market data feed on IBM PowerEN™

Pages 307–315, https://doi.org/10.1007/s00450-011-0166-0

Financial Market IT solutions increasingly depend on ultra low latency message processing and target microsecond latencies in order to provide traders with a competitive advantage over their peers. Some solutions are available on the market, ranging ...

article

A system architecture supporting high-performance and cloud computing in an academic consortium environment

Pages 317–324, https://doi.org/10.1007/s00450-011-0172-2

The University of Colorado (CU) and the National Center for Atmospheric Research (NCAR) have been deploying complementary and federated resources supporting computational science in the Western United States since 2004. This activity has expanded to ...

article

Experiments with the Fresh Breeze tree-based memory model

Pages 325–337, https://doi.org/10.1007/s00450-011-0165-1

The Fresh Breeze memory model and system architecture is proposed as an approach to achieving significant improvements in massively parallel computation by supporting fine-grain management of memory and processing resources and utilizing a global shared ...

Computer Science - Research and Development