I welcome you all to New York City and to the 2006 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'06). The conference is being held at Columbia University, which has graciously allowed us to use its facilities. In addition, we are excited to have the conference co-located with the 4th International Symposium on Code Generation and Optimization (CGO-4), and we hope to leverage the synergies between the two conference themes.

One important change is that, starting this year, PPoPP will be held annually. It is widely expected that the coming broad availability of multi-threaded and multi-core processors will drive major advances in parallel programming. The PPoPP Steering Committee and the Organizing Committee feel that PPoPP is uniquely positioned to capture the exciting new ideas that will flourish in this area, and that a yearly conference will serve this opportunity better.

At the conference, I am looking forward to exciting discussions with my colleagues on cutting-edge parallel programming research. I am also looking forward to all the amenities that New York City provides. In particular, our Local Arrangements Co-Chair, Calin Cascaval, has organized a dinner and theater evening in the Theater District. This is something you will not want to miss.
Proceeding Downloads
Parallel programming and code selection in Fortress
As part of the DARPA program for High Productivity Computing Systems, the Programming Language Research Group at Sun Microsystems Laboratories is developing Fortress, a language intended to support large-scale scientific computation with the same level ...
Collective communication on architectures that support simultaneous communication over multiple links
Traditional collective communication algorithms are designed with the assumption that a node can communicate with only one other node at a time. On new parallel architectures such as the IBM Blue Gene/L, a node can communicate with multiple nodes ...
Performance evaluation of adaptive MPI
Processor virtualization via migratable objects is a powerful technique that enables the runtime system to carry out intelligent adaptive optimizations like dynamic resource management. Charm++ is an early language/system that supports migratable ...
Mobile MPI programs in computational grids
Utility computing is becoming a popular way of exploiting the potential of computational grids. In utility computing, users are provided with computational power in a transparent manner similar to the way in which electrical utilities supply power to ...
RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits
Message Passing Interface (MPI) is a popular parallel programming model for scientific applications. Most high-performance MPI implementations use a rendezvous protocol for the efficient transfer of large messages. This protocol can be designed using either ...
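The protocol choice is internal to the MPI library, but the transfers it governs look like this at the user level. A minimal sketch, assuming the message is large enough that the library switches from its eager path to rendezvous (the threshold is implementation-dependent; the buffer size here is illustrative):

    #include <mpi.h>
    #include <stdlib.h>

    #define N (1 << 22)   /* 4M doubles: large enough that most MPI
                             libraries use a rendezvous protocol */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        double *buf = malloc(N * sizeof(double));

        if (rank == 0) {
            /* For large messages the library typically sends a small
               request first and moves the data only after the receiver
               has posted a matching buffer. */
            MPI_Send(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }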
Global-view abstractions for user-defined reductions and scans
Since APL, reductions and scans have been recognized as powerful programming concepts. These concepts, which abstract an accumulation loop (reduction) and an update loop (scan), have efficient parallel implementations based on the parallel prefix algorithm. ...
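As a reminder of the two patterns, here is a minimal sequential sketch (not the paper's global-view abstractions; the array contents are illustrative): a reduction folds a sequence into one value, while a scan keeps every running total.

    #include <stdio.h>

    #define N 8

    int main(void) {
        double a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
        double sum = 0.0;   /* reduction: fold all elements into one value */
        double prefix[N];   /* scan: one running total per element */

        /* Reduction (accumulation loop). With OpenMP this parallelizes as:
           #pragma omp parallel for reduction(+:sum) */
        for (int i = 0; i < N; i++)
            sum += a[i];

        /* Inclusive scan (update loop): prefix[i] = a[0] + ... + a[i].
           A parallel version would use the parallel prefix algorithm
           mentioned in the abstract. */
        prefix[0] = a[0];
        for (int i = 1; i < N; i++)
            prefix[i] = prefix[i - 1] + a[i];

        printf("sum = %g, last prefix = %g\n", sum, prefix[N - 1]);
        return 0;
    }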
Programming for parallelism and locality with hierarchically tiled arrays
- Ganesh Bikshandi,
- Jia Guo,
- Daniel Hoeflinger,
- Gheorghe Almasi,
- Basilio B. Fraguela,
- María J. Garzarán,
- David Padua,
- Christoph von Praun
Tiling has proven to be an effective mechanism to develop high performance implementations of algorithms. Tiling can be used to organize computations so that communication costs in parallel programs are reduced and locality in sequential codes or ...
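As background, a standard loop-tiling sketch (not the paper's hierarchically tiled arrays; the matrix size N and tile edge T are illustrative, and T is assumed to divide N): blocking a matrix multiply keeps a small working set of each operand cache-resident.

    #define N 512   /* matrix dimension (illustrative) */
    #define T 64    /* tile edge; assumed to divide N and fit in cache */

    /* C += A * B, all N x N row-major. The three outer loops walk over
       tiles; the three inner loops multiply one tile while its working
       set stays in cache. */
    void matmul_tiled(const double *A, const double *B, double *C) {
        for (int ii = 0; ii < N; ii += T)
            for (int kk = 0; kk < N; kk += T)
                for (int jj = 0; jj < N; jj += T)
                    for (int i = ii; i < ii + T; i++)
                        for (int k = kk; k < kk + T; k++)
                            for (int j = jj; j < jj + T; j++)
                                C[i*N + j] += A[i*N + k] * B[k*N + j];
    }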
Parallel programming in modern web search engines
When a Search Engine responds to your query, thousands of machines from around the world have cooperated to produce your result. With a global reach of hundreds of millions of users, Search Engines are arguably the most commonly used massively parallel ...
Performance characterization of molecular dynamics techniques for biomolecular simulations
Large-scale simulations and computational modeling using molecular dynamics (MD) continue to make significant impacts in the field of biology. It is well known that simulations of biological events at native time and length scales require computing ...
On-line automated performance diagnosis on thousands of processes
Performance analysis tools are critical for the effective use of large parallel computing resources, but existing tools have failed to address three problems that limit their scalability: (1) management and processing of the volume of performance data ...
A case study in top-down performance estimation for a large-scale parallel application
This work presents a general methodology for estimating the performance of an HPC workload when running on a future hardware architecture. Further, it demonstrates the methodology by estimating the performance of a significant scientific application -- ...
Hardware profile-guided automatic page placement for ccNUMA systems
Cache coherent non-uniform memory architectures (ccNUMA) constitute an important class of high-performance computing platforms. Contemporary ccNUMA systems, such as the SGI Altix, have a large number of nodes, where each node consists of a small number ...
Adaptive scheduling with parallelism feedback
Multiprocessor scheduling in a shared multiprogramming environment is often structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level task scheduler schedules the work of a job on the allotted ...
Predicting bounds on queuing delay for batch-scheduled parallel machines
Most space-sharing parallel computers presently operated by high-performance computing centers use batch-queuing systems to manage processor allocation. In many cases, users wishing to use these batch-queued resources have accounts at multiple sites and ...
Optimizing irregular shared-memory applications for distributed-memory systems
In prior work, we have proposed techniques to extend the ease of shared-memory parallel programming to distributed-memory platforms by automatic translation of OpenMP programs to MPI. In the case of irregular applications, the performance of this ...
Proving correctness of highly-concurrent linearisable objects
We study a family of implementations for linked lists using fine-grain synchronisation. This approach enables greater concurrency, but correctness is a greater challenge than for classical, coarse-grain synchronisation. Our examples are demonstrative of ...
Accurate and efficient runtime detection of atomicity errors in concurrent programs
Atomicity is an important correctness condition for concurrent systems. Informally, atomicity is the property that every concurrent execution of a set of transactions is equivalent to some serial execution of the same transactions. In multi-threaded ...
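The classic illustration, as a minimal pthreads sketch (not the paper's detection technique): without the lock below, each thread's read-modify-write of the shared counter can interleave with the other's, so the execution is equivalent to no serial order and increments are lost.

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            /* Without the lock, "counter++" is a non-atomic
               read-modify-write and updates can be lost. */
            pthread_mutex_lock(&lock);
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);  /* 200000 with the lock */
        return 0;
    }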
Scalable synchronous queues
We present two new nonblocking and contention-free implementations of synchronous queues: concurrent transfer channels in which producers wait for consumers just as consumers wait for producers. Our implementations extend our previous work in dual ...
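To illustrate the hand-off semantics only (the paper's contribution is nonblocking, contention-free implementations; this is a deliberately coarse-grain blocking sketch, assumed to serve one producer and one consumer): a put blocks until the matching take completes, and vice versa.

    #include <pthread.h>
    #include <stdbool.h>

    /* Single-slot synchronous channel: sq_put() blocks until a taker has
       removed the item; sq_take() blocks until a putter supplies one. */
    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        void *item;
        bool  full;   /* an item is waiting to be taken */
    } sync_queue;

    void sq_init(sync_queue *q) {
        pthread_mutex_init(&q->lock, NULL);
        pthread_cond_init(&q->cond, NULL);
        q->item = NULL;
        q->full = false;
    }

    void sq_put(sync_queue *q, void *item) {
        pthread_mutex_lock(&q->lock);
        q->item = item;
        q->full = true;
        pthread_cond_broadcast(&q->cond);  /* wake the waiting consumer */
        while (q->full)                    /* block until the take happens */
            pthread_cond_wait(&q->cond, &q->lock);
        pthread_mutex_unlock(&q->lock);
    }

    void *sq_take(sync_queue *q) {
        pthread_mutex_lock(&q->lock);
        while (!q->full)                   /* block until a producer arrives */
            pthread_cond_wait(&q->cond, &q->lock);
        void *item = q->item;
        q->full = false;
        pthread_cond_broadcast(&q->cond);  /* release the waiting producer */
        pthread_mutex_unlock(&q->lock);
        return item;
    }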
POSH: a TLS compiler that exploits program structure
As multi-core architectures with Thread-Level Speculation (TLS) are becoming better understood, it is important to focus on TLS compilation. TLS compilers are interesting in that, while they do not need to fully prove the independence of concurrent ...
High-performance IPv6 forwarding algorithm for multi-core and multithreaded network processor
IP forwarding is one of the main bottlenecks in Internet backbone routers, as it requires performing the longest-prefix match at 10Gbps speed or higher. IPv6 forwarding further exacerbates the situation because its search space is quadrupled. We propose ...
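For background, a minimal binary-trie sketch of longest-prefix match (not the paper's algorithm; real routers typically use multi-bit strides or hardware assists): the lookup walks the trie one address bit at a time and remembers the last node that carried a route.

    #include <stddef.h>

    /* Binary trie node: one address bit is consumed per level;
       next_hop >= 0 marks a route ending at this node. */
    typedef struct node {
        struct node *child[2];
        int next_hop;   /* -1 if no route ends here */
    } node;

    /* addr points to the address bytes (16 of them for IPv6);
       the last route seen on the path is the longest matching prefix. */
    int lpm_lookup(const node *root, const unsigned char *addr,
                   int addr_bits) {
        int best = -1;
        const node *n = root;
        for (int i = 0; n != NULL && i < addr_bits; i++) {
            if (n->next_hop >= 0)
                best = n->next_hop;
            int bit = (addr[i / 8] >> (7 - i % 8)) & 1;
            n = n->child[bit];
        }
        if (n != NULL && n->next_hop >= 0)
            best = n->next_hop;
        return best;   /* -1 means no matching prefix */
    }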
"MAMA!": a memory allocator for multithreaded architectures
While the high-performance computing world is dominated by distributed memory computer systems, applications that require random access into large shared data structures continue to motivate development of ever larger shared-memory parallel computers ...
McRT-STM: a high performance software transactional memory system for a multi-core runtime
Applications need to become more concurrent to take advantage of the increased computational power provided by chip-level multiprocessing. Programmers have traditionally managed this concurrency using locks (mutex-based synchronization). Unfortunately, ...
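The programming-model contrast can be sketched with GCC's transactional-memory extension (compile with -fgnu-tm), which is a different system from McRT-STM and is used here only to show the shape of the two styles:

    #include <pthread.h>

    static int balance_a, balance_b;
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    /* Traditional style: the programmer names and manages the lock. */
    void transfer_locked(int amount) {
        pthread_mutex_lock(&m);
        balance_a -= amount;
        balance_b += amount;
        pthread_mutex_unlock(&m);
    }

    /* Transactional style: the runtime makes the block atomic. */
    void transfer_tm(int amount) {
        __transaction_atomic {
            balance_a -= amount;
            balance_b += amount;
        }
    }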
Exploiting distributed version concurrency in a transactional memory cluster
We investigate a transactional memory runtime system providing scaling and strong consistency, i.e., 1-copy serializability on commodity clusters for both distributed scientific applications and database applications. We introduce a novel page-level ...
Hybrid transactional memory
High performance parallel programs are currently difficult to write and debug. One major source of difficulty is protecting concurrent accesses to shared data with an appropriate synchronization mechanism. Locks are the most common mechanism but they ...
Fast and transparent recovery for continuous availability of cluster-based servers
Recently there has been renewed interest in building reliable servers that support continuous application operation. Besides maintaining system state consistent after a failure, one of the main challenges in achieving continuous operation is to provide ...
Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster
Recently, the high-performance computing community has realized that power is a performance-limiting factor. One reason for this is that supercomputing centers have limited power capacity and machines are starting to hit that limit. In addition, the ...
Teaching parallel computing to science faculty: best practices and common pitfalls
In 2002, we first brought High Performance Computing (HPC) methods to the college classroom as a way to enrich Computational Science education. Through the years, we have continued to facilitate college faculty in science, technology, engineering, and ...
- Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming