I welcome you all to New York City and to the 2006 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'06). The conference is being held at Columbia University, which has graciously allowed us to use its facilities. In addition, we are excited to have the conference co-located with the 4th International Symposium on Code Generation and Optimization (CGO-4), and we hope to leverage the synergies between the two conference themes.

One important change is that, starting this year, PPoPP will be held annually. It is widely expected that the coming broad availability of multi-threaded and multi-core processors will drive major advances in parallel programming. The PPoPP Steering Committee and the Organizing Committee feel that PPoPP is uniquely positioned to capture the exciting new ideas that will flourish in this area, and that a yearly conference will serve this opportunity better.

At the conference, I am looking forward to exciting discussions with my colleagues on cutting-edge parallel programming research. I am also looking forward to all the amenities that New York City provides. In particular, our Local Arrangements Co-Chair, Calin Cascaval, has organized a dinner and theater evening in the Theater District. This is something you will not want to miss.
Proceeding Downloads
Parallel programming and code selection in Fortress
As part of the DARPA program for High Productivity Computing Systems, the Programming Language Research Group at Sun Microsystems Laboratories is developing Fortress, a language intended to support large-scale scientific computation with the same level ...
Collective communication on architectures that support simultaneous communication over multiple links
Traditional collective communication algorithms are designed with the assumption that a node can communicate with only one other node at a time. On new parallel architectures such as the IBM Blue Gene/L, a node can communicate with multiple nodes ...
Performance evaluation of adaptive MPI
Processor virtualization via migratable objects is a powerful technique that enables the runtime system to carry out intelligent adaptive optimizations like dynamic resource management. Charm++ is an early language/system that supports migratable ...
Mobile MPI programs in computational grids
Utility computing is becoming a popular way of exploiting the potential of computational grids. In utility computing, users are provided with computational power in a transparent manner similar to the way in which electrical utilities supply power to ...
RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits
Message Passing Interface (MPI) is a popular parallel programming model for scientific applications. Most high-performance MPI implementations use a rendezvous protocol for the efficient transfer of large messages. This protocol can be designed using either ...
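The protocol choice is internal to the MPI library, but the transfers it governs look like this at the user level. A minimal sketch, assuming the message is large enough that the library switches from its eager path to rendezvous (the threshold is implementation-dependent; the buffer size here is illustrative):

    #include <mpi.h>
    #include <stdlib.h>

    #define N (1 << 22)   /* 4M doubles: large enough that most MPI
                             libraries use a rendezvous protocol */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        double *buf = malloc(N * sizeof(double));

        if (rank == 0) {
            /* For large messages the library typically sends a small
               request first and moves the data only after the receiver
               has posted a matching buffer. */
            MPI_Send(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }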
Global-view abstractions for user-defined reductions and scans
Since APL, reductions and scans have been recognized as powerful programming concepts. These concepts, which abstract an accumulation loop (reduction) and an update loop (scan), have efficient parallel implementations based on the parallel prefix algorithm. ...
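As a reminder of the two patterns, here is a minimal sequential sketch (not the paper's global-view abstractions; the array contents are illustrative): a reduction folds a sequence into one value, while a scan keeps every running total.

    #include <stdio.h>

    #define N 8

    int main(void) {
        double a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
        double sum = 0.0;   /* reduction: fold all elements into one value */
        double prefix[N];   /* scan: one running total per element */

        /* Reduction (accumulation loop). With OpenMP this parallelizes as:
           #pragma omp parallel for reduction(+:sum) */
        for (int i = 0; i < N; i++)
            sum += a[i];

        /* Inclusive scan (update loop): prefix[i] = a[0] + ... + a[i].
           A parallel version would use the parallel prefix algorithm
           mentioned in the abstract. */
        prefix[0] = a[0];
        for (int i = 1; i < N; i++)
            prefix[i] = prefix[i - 1] + a[i];

        printf("sum = %g, last prefix = %g\n", sum, prefix[N - 1]);
        return 0;
    }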
Programming for parallelism and locality with hierarchically tiled arrays
- Ganesh Bikshandi,
- Jia Guo,
- Daniel Hoeflinger,
- Gheorghe Almasi,
- Basilio B. Fraguela,
- María J. Garzarán,
- David Padua,
- Christoph von Praun
Tiling has proven to be an effective mechanism to develop high performance implementations of algorithms. Tiling can be used to organize computations so that communication costs in parallel programs are reduced and locality in sequential codes or ...
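As background, a standard loop-tiling sketch (not the paper's hierarchically tiled arrays; the matrix size N and tile edge T are illustrative, and T is assumed to divide N): blocking a matrix multiply keeps a small working set of each operand cache-resident.

    #define N 512   /* matrix dimension (illustrative) */
    #define T 64    /* tile edge; assumed to divide N and fit in cache */

    /* C += A * B, all N x N row-major. The three outer loops walk over
       tiles; the three inner loops multiply one tile while its working
       set stays in cache. */
    void matmul_tiled(const double *A, const double *B, double *C) {
        for (int ii = 0; ii < N; ii += T)
            for (int kk = 0; kk < N; kk += T)
                for (int jj = 0; jj < N; jj += T)
                    for (int i = ii; i < ii + T; i++)
                        for (int k = kk; k < kk + T; k++)
                            for (int j = jj; j < jj + T; j++)
                                C[i*N + j] += A[i*N + k] * B[k*N + j];
    }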
Parallel programming in modern web search engines
When a Search Engine responds to your query, thousands of machines from around the world have cooperated to produce your result. With a global reach of hundreds of millions of users, Search Engines are arguably the most commonly used massively parallel ...
Performance characterization of molecular dynamics techniques for biomolecular simulations
Large-scale simulations and computational modeling using molecular dynamics (MD) continue to make significant impacts in the field of biology. It is well known that simulations of biological events at native time and length scales require computing ...
On-line automated performance diagnosis on thousands of processes
Performance analysis tools are critical for the effective use of large parallel computing resources, but existing tools have failed to address three problems that limit their scalability: (1) management and processing of the volume of performance data ...
A case study in top-down performance estimation for a large-scale parallel application
This work presents a general methodology for estimating the performance of an HPC workload when running on a future hardware architecture. Further, it demonstrates the methodology by estimating the performance of a significant scientific application -- ...
Hardware profile-guided automatic page placement for ccNUMA systems
Cache coherent non-uniform memory architectures (ccNUMA) constitute an important class of high-performance computing platforms. Contemporary ccNUMA systems, such as the SGI Altix, have a large number of nodes, where each node consists of a small number ...
Adaptive scheduling with parallelism feedback
Multiprocessor scheduling in a shared multiprogramming environment is often structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level task scheduler schedules the work of a job on the allotted ...
Predicting bounds on queuing delay for batch-scheduled parallel machines
Most space-sharing parallel computers presently operated by high-performance computing centers use batch-queuing systems to manage processor allocation. In many cases, users wishing to use these batch-queued resources have accounts at multiple sites and ...
Optimizing irregular shared-memory applications for distributed-memory systems
In prior work, we have proposed techniques to extend the ease of shared-memory parallel programming to distributed-memory platforms by automatic translation of OpenMP programs to MPI. In the case of irregular applications, the performance of this ...
Proving correctness of highly-concurrent linearisable objects
We study a family of implementations for linked lists using fine-grain synchronisation. This approach enables greater concurrency, but correctness is a greater challenge than for classical, coarse-grain synchronisation. Our examples are demonstrative of ...
Accurate and efficient runtime detection of atomicity errors in concurrent programs
Atomicity is an important correctness condition for concurrent systems. Informally, atomicity is the property that every concurrent execution of a set of transactions is equivalent to some serial execution of the same transactions. In multi-threaded ...
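The classic illustration, as a minimal pthreads sketch (not the paper's detection technique): without the lock below, each thread's read-modify-write of the shared counter can interleave with the other's, so the execution is equivalent to no serial order and increments are lost.

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            /* Without the lock, "counter++" is a non-atomic
               read-modify-write and updates can be lost. */
            pthread_mutex_lock(&lock);
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);  /* 200000 with the lock */
        return 0;
    }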
Scalable synchronous queues
We present two new nonblocking and contention-free implementations of synchronous queues: concurrent transfer channels in which producers wait for consumers just as consumers wait for producers. Our implementations extend our previous work in dual ...
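To illustrate the hand-off semantics only (the paper's contribution is nonblocking, contention-free implementations; this is a deliberately coarse-grain blocking sketch, assumed to serve one producer and one consumer): a put blocks until the matching take completes, and vice versa.

    #include <pthread.h>
    #include <stdbool.h>

    /* Single-slot synchronous channel: sq_put() blocks until a taker has
       removed the item; sq_take() blocks until a putter supplies one. */
    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        void *item;
        bool  full;   /* an item is waiting to be taken */
    } sync_queue;

    void sq_init(sync_queue *q) {
        pthread_mutex_init(&q->lock, NULL);
        pthread_cond_init(&q->cond, NULL);
        q->item = NULL;
        q->full = false;
    }

    void sq_put(sync_queue *q, void *item) {
        pthread_mutex_lock(&q->lock);
        q->item = item;
        q->full = true;
        pthread_cond_broadcast(&q->cond);  /* wake the waiting consumer */
        while (q->full)                    /* block until the take happens */
            pthread_cond_wait(&q->cond, &q->lock);
        pthread_mutex_unlock(&q->lock);
    }

    void *sq_take(sync_queue *q) {
        pthread_mutex_lock(&q->lock);
        while (!q->full)                   /* block until a producer arrives */
            pthread_cond_wait(&q->cond, &q->lock);
        void *item = q->item;
        q->full = false;
        pthread_cond_broadcast(&q->cond);  /* release the waiting producer */
        pthread_mutex_unlock(&q->lock);
        return item;
    }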
POSH: a TLS compiler that exploits program structure
As multi-core architectures with Thread-Level Speculation (TLS) are becoming better understood, it is important to focus on TLS compilation. TLS compilers are interesting in that, while they do not need to fully prove the independence of concurrent ...
High-performance IPv6 forwarding algorithm for multi-core and multithreaded network processor
IP forwarding is one of the main bottlenecks in Internet backbone routers, as it requires performing the longest-prefix match at 10Gbps speed or higher. IPv6 forwarding further exacerbates the situation because its search space is quadrupled. We propose ...
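For background, a minimal binary-trie sketch of longest-prefix match (not the paper's algorithm; real routers typically use multi-bit strides or hardware assists): the lookup walks the trie one address bit at a time and remembers the last node that carried a route.

    #include <stddef.h>

    /* Binary trie node: one address bit is consumed per level;
       next_hop >= 0 marks a route ending at this node. */
    typedef struct node {
        struct node *child[2];
        int next_hop;   /* -1 if no route ends here */
    } node;

    /* addr points to the address bytes (16 of them for IPv6);
       the last route seen on the path is the longest matching prefix. */
    int lpm_lookup(const node *root, const unsigned char *addr,
                   int addr_bits) {
        int best = -1;
        const node *n = root;
        for (int i = 0; n != NULL && i < addr_bits; i++) {
            if (n->next_hop >= 0)
                best = n->next_hop;
            int bit = (addr[i / 8] >> (7 - i % 8)) & 1;
            n = n->child[bit];
        }
        if (n != NULL && n->next_hop >= 0)
            best = n->next_hop;
        return best;   /* -1 means no matching prefix */
    }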
"MAMA!": a memory allocator for multithreaded architectures
While the high-performance computing world is dominated by distributed memory computer systems, applications that require random access into large shared data structures continue to motivate development of ever larger shared-memory parallel computers ...
McRT-STM: a high performance software transactional memory system for a multi-core runtime
Applications need to become more concurrent to take advantage of the increased computational power provided by chip-level multiprocessing. Programmers have traditionally managed this concurrency using locks (mutex-based synchronization). Unfortunately, ...
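The programming-model contrast can be sketched with GCC's transactional-memory extension (compile with -fgnu-tm), which is a different system from McRT-STM and is used here only to show the shape of the two styles:

    #include <pthread.h>

    static int balance_a, balance_b;
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    /* Traditional style: the programmer names and manages the lock. */
    void transfer_locked(int amount) {
        pthread_mutex_lock(&m);
        balance_a -= amount;
        balance_b += amount;
        pthread_mutex_unlock(&m);
    }

    /* Transactional style: the runtime makes the block atomic. */
    void transfer_tm(int amount) {
        __transaction_atomic {
            balance_a -= amount;
            balance_b += amount;
        }
    }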
Exploiting distributed version concurrency in a transactional memory cluster
We investigate a transactional memory runtime system providing scaling and strong consistency, i.e., 1-copy serializability on commodity clusters for both distributed scientific applications and database applications. We introduce a novel page-level ...
Hybrid transactional memory
High performance parallel programs are currently difficult to write and debug. One major source of difficulty is protecting concurrent accesses to shared data with an appropriate synchronization mechanism. Locks are the most common mechanism but they ...
Fast and transparent recovery for continuous availability of cluster-based servers
Recently there has been renewed interest in building reliable servers that support continuous application operation. Besides maintaining system state consistent after a failure, one of the main challenges in achieving continuous operation is to provide ...
Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster
Recently, the high-performance computing community has realized that power is a performance-limiting factor. One reason for this is that supercomputing centers have limited power capacity and machines are starting to hit that limit. In addition, the ...
Teaching parallel computing to science faculty: best practices and common pitfalls
In 2002, we first brought High Performance Computing (HPC) methods to the college classroom as a way to enrich Computational Science education. Through the years, we have continued to facilitate college faculty in science, technology, engineering, and ...
- Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming