An open environment for building parallel programming systems
PRESTO is a set of tools for building parallel programming systems on shared-memory multiprocessors. PRESTO's goal is to provide a framework within which one can easily build efficient support for any of a wide variety of “models” of parallel ...
Experiences with Poker
- David Notkin,
- Lawrence Snyder,
- David Socha,
- Mary L. Bailey,
- Bruce Forstall,
- Kevin Gates,
- Ray Greenlaw,
- William G. Griswold,
- Thomas J. Holman,
- Richard Korry,
- Gemini Lasswell,
- Robert Mitchell,
- Philip A. Nelson
Experience from over five years of building nonshared memory parallel programs using the Poker Parallel Programming Environment has positioned us to evaluate our approach to defining and developing parallel programs. This paper presents the more ...
Non-intrusive and interactive profiling in Parasight
Debugging the performance of parallel applications is crucial for fully utilizing the potential of multiprocessor hardware. This paper describes profiling tools in Parasight, a programming environment that is geared towards non-intrusive performance ...
Program development for a systolic array
The primary objective of the Warp programming environment (WPE) is to simplify the use of Warp, a high-performance programmable linear systolic array connected to a general-purpose workstation host. WPE permits the development of distributed ...
Compiling Fortran 8x array features for the Connection Machine computer system
The Connection Machine® computer system supports a data parallel programming style, making it a natural target architecture for Fortran 8x array constructs. The Connection Machine Fortran compiler generates VAX code that performs scalar operations and ...
Compiling C* programs for a hypercube multicomputer
A data parallel language such as C* has a number of advantages over conventional hypercube programming languages. The algorithm design process is simpler, because (1) message passing is invisible, (2) race conditions are nonexistent, and (3) the data ...
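As a rough flavor of the data-parallel style this abstract describes (a hedged sketch in plain C, not actual C* syntax): each virtual processor owns one array element, every iteration writes only its own slot, so races cannot occur, and any inter-node communication on a hypercube would be generated by the compiler rather than written by the programmer.

```c
/* Hedged sketch of the data-parallel style (not C* syntax): each
 * "virtual processor" owns one array element. Because every iteration
 * writes only its own slot, there are no race conditions, and any
 * message passing needed to distribute the work would be inserted by
 * the compiler, invisibly to the programmer. */
#include <stdio.h>

#define N 8

int main(void) {
    double a[N], b[N];
    for (int i = 0; i < N; i++) b[i] = (double)i;

    /* Conceptually: "for all elements in parallel" */
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i] + 1.0;   /* independent per-element update */

    for (int i = 0; i < N; i++) printf("%g ", a[i]);
    printf("\n");
    return 0;
}
```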
Using data partitioning to implement a parallel assembler
A technique for implementing algorithms on a multiprocessor computer system is data partitioning, in which input data for a problem is partitioned among many processors that cooperate to solve the problem. This paper demonstrates that data partitioning ...
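A minimal sketch of the data-partitioning technique the abstract names, assuming a pthreads shared-memory setting (the names and the toy "assemble" step are hypothetical, not the paper's code): the input is split into contiguous blocks, each worker processes its own block independently, and the per-block results are combined afterward.

```c
/* Hedged sketch of data partitioning: split the input into contiguous
 * blocks, let each thread process its block independently, then combine
 * the per-block results. The summation stands in for per-block work. */
#include <pthread.h>
#include <stdio.h>

#define N        1000
#define NWORKERS 4

static int input[N];
static long partial[NWORKERS];

static void *worker(void *arg) {
    long id = (long)arg;
    int lo = (int)(id * N / NWORKERS);
    int hi = (int)((id + 1) * N / NWORKERS);
    long sum = 0;
    for (int i = lo; i < hi; i++)
        sum += input[i];           /* stand-in for "assemble my block" */
    partial[id] = sum;
    return NULL;
}

int main(void) {
    pthread_t tid[NWORKERS];
    for (int i = 0; i < N; i++) input[i] = 1;
    for (long w = 0; w < NWORKERS; w++)
        pthread_create(&tid[w], NULL, worker, (void *)w);
    long total = 0;
    for (long w = 0; w < NWORKERS; w++) {
        pthread_join(tid[w], NULL);
        total += partial[w];       /* combine per-block results */
    }
    printf("total = %ld\n", total);
    return 0;
}
```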
Automatic discovery of parallelism: a tool and an experiment (extended abstract)
This paper reports preliminary results from applying advanced techniques to the parallelization of sequential programs. Such techniques include interprocedural analysis and the identification of nested parallelism. These techniques have been proposed ...
Efficient interprocedural analysis for program parallelization and restructuring
An approach to efficient interprocedural analysis for program parallelization and restructuring is presented. Such analysis is needed for parallelizing loops which contain procedure calls. Our approach captures call effect on data dependencies by ...
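An illustrative fragment (hypothetical, not the paper's code) of the kind of loop such analysis targets: without interprocedural information, the call inside the loop forces a compiler to assume arbitrary side effects and keep the loop serial; a call-effect summary showing the callee writes only its own element proves the iterations independent.

```c
/* Illustrative fragment: without interprocedural analysis, the call to
 * update() forces the compiler to assume arbitrary side effects and
 * keep the loop serial. A call-effect summary showing update() writes
 * only a[i] proves the iterations independent, so the loop can be
 * parallelized. */
#include <stdio.h>

static void update(double *a, int i) {
    a[i] = a[i] * 0.5 + 1.0;       /* touches only element i */
}

int main(void) {
    double a[4] = {2.0, 4.0, 6.0, 8.0};
    for (int i = 0; i < 4; i++)    /* parallelizable once call effects are known */
        update(a, i);
    for (int i = 0; i < 4; i++) printf("%g ", a[i]);
    printf("\n");
    return 0;
}
```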
Restructuring Lisp programs for concurrent execution
This paper describes the techniques that the program transformation system CURARE uses to restructure Lisp programs for concurrent execution in multiprocessor Lisp systems and discusses the problems inherent in producing concurrent programs in a ...
Qlisp: experience and new directions
Qlisp, a dialect of Common Lisp, has been proposed as a multiprocessing programming language which is suitable for studying the styles of parallel programming at the medium-grain level. An initial version of Qlisp has been implemented on a ...
Parallel discrete-event simulation of FCFS stochastic queueing networks
Physical systems are inherently parallel; intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure ...
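A minimal sketch of the invariant such synchronization must preserve (a hypothetical toy, not the paper's simulator): events are handled in nondecreasing timestamp order, so a parallel simulator may only let a logical process handle an event once no event with a smaller timestamp can still arrive.

```c
/* Toy event loop showing the ordering invariant a parallel
 * discrete-event simulator must preserve: always process the pending
 * event with the smallest timestamp next. */
#include <stdio.h>

struct event { double time; int kind; };

int main(void) {
    struct event pending[] = { {3.0, 1}, {1.0, 0}, {2.0, 1} };
    int n = sizeof pending / sizeof pending[0];

    for (int done = 0; done < n; done++) {
        int min = -1;
        for (int i = 0; i < n; i++)           /* pick earliest unprocessed */
            if (pending[i].time >= 0 &&
                (min < 0 || pending[i].time < pending[min].time))
                min = i;
        printf("t=%.1f kind=%d\n", pending[min].time, pending[min].kind);
        pending[min].time = -1.0;             /* mark processed */
    }
    return 0;
}
```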
The parallel decomposition and implementation of an integrated circuit global router
Better quality automatic layout of integrated circuits can be obtained by combining the placement and routing phases so that routing is used as the cost function for placement optimization. Conventional routers are too slow to make this feasible, and so ...
Soar/PSM-E: investigating match parallelism in a learning production system
Soar is an attempt to realize a set of hypotheses on the nature of general intelligence within a single system. Soar uses a production system (rule-based system) to encode its knowledge base. Its learning mechanism, chunking, adds productions ...
Large-scale parallel programming: experience with the BBN Butterfly parallel processor
For three years, members of the Computer Science Department at the University of Rochester have used a collection of BBN Butterfly™ Parallel Processors to conduct research in parallel systems and applications. For most of that time, Rochester's 128-node ...
Applications experience with Linda
We describe three experiments using C-Linda to write parallel codes. The first involves assessing the similarity of DNA sequences. The results demonstrate Linda's flexibility—Linda solutions are presented that work well at two quite different levels of ...
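A hedged analogue of Linda's task-bag style written with pthreads (this is not C-Linda code; in Linda, tuple-space operations like out() and in() would play the role of the put and get steps here): a master fills a bag of tasks and workers repeatedly withdraw one, at whatever grain the tasks define.

```c
/* Pthreads analogue of the Linda task-bag pattern (not Linda code):
 * the master deposits tasks, workers withdraw and process them. */
#include <pthread.h>
#include <stdio.h>

#define NTASKS   16
#define NWORKERS 4

static int bag[NTASKS];
static int next_task = 0;
static pthread_mutex_t bag_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&bag_lock);
        int t = (next_task < NTASKS) ? bag[next_task++] : -1; /* like in("task", ?t) */
        pthread_mutex_unlock(&bag_lock);
        if (t < 0) return NULL;
        printf("processed task %d\n", t);
    }
}

int main(void) {
    pthread_t tid[NWORKERS];
    for (int i = 0; i < NTASKS; i++) bag[i] = i;  /* like out("task", i) */
    for (int w = 0; w < NWORKERS; w++)
        pthread_create(&tid[w], NULL, worker, NULL);
    for (int w = 0; w < NWORKERS; w++)
        pthread_join(tid[w], NULL);
    return 0;
}
```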
On the implementation of applicative languages on shared-memory, MIMD multiprocessors
This paper presents the performance of a set of algorithms written in SISAL [MSA*85] and run on multiprocessor Sequent, DEC, and Cray computers. We describe our current runtime system and discuss its implementation on each machine. We indicate where our ...
Characterizing the synchronization behavior of parallel programs
Contention for synchronization locks and delays waiting for synchronization events can substantially increase the running time of a parallel program. This makes it important to characterize the synchronization behavior of programs and to provide ...
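One simple way to characterize lock contention (a hypothetical instrumentation sketch, not the paper's tool) is to time each acquisition and accumulate the wait; locks with long accumulated waits under load dominate a program's synchronization behavior.

```c
/* Hypothetical instrumentation sketch: time each lock acquisition to
 * characterize contention on a particular lock. */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static double wait_ns = 0.0;     /* total time spent waiting for `lock` */

static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

static void timed_lock(void) {
    double t0 = now_ns();
    pthread_mutex_lock(&lock);
    wait_ns += now_ns() - t0;    /* safe: updated while holding the lock */
}

int main(void) {
    timed_lock();
    pthread_mutex_unlock(&lock);
    printf("waited %.0f ns\n", wait_ns);
    return 0;
}
```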
Exploiting variable grain parallelism at runtime
Currently, almost all parallel implementations of programs fix the granularity at which parallelism is exploited at design time. Depending on the application structure and the parallel hardware structure, the programmer decides to exploit parallelism at ...
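A hedged sketch of what a runtime (rather than design-time) grain decision can look like, with illustrative names: the chunk size handed to each task is computed from the actual problem size and worker count when the program runs, instead of being fixed by the programmer in advance.

```c
/* Illustrative runtime grain selection: compute the task size from the
 * actual problem size and worker count at execution time. */
#include <stdio.h>

static int choose_grain(int n, int workers) {
    int grain = n / (4 * workers);   /* aim for a few chunks per worker */
    return grain > 1 ? grain : 1;    /* never below one element */
}

int main(void) {
    int n = 10000, workers = 8;
    int grain = choose_grain(n, workers);
    printf("n=%d workers=%d -> grain=%d (%d tasks)\n",
           n, workers, grain, (n + grain - 1) / grain);
    return 0;
}
```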
Communication-sensitive heuristics and algorithms for mapping compilers
The mapping problem arises when parallel algorithms are implemented on parallel machines. When the number of processes exceeds the number of available processing elements, the mapping problem includes the contraction problem. In this paper, we identify ...
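An illustrative sketch of contraction (hypothetical, not the paper's heuristics): fold a ring of P communicating processes onto N < P processors and count the edges that cross processor boundaries. A communication-sensitive block contraction keeps ring neighbors together, so it cuts far fewer edges than a cyclic assignment.

```c
/* Contraction example: map a ring of P processes onto N processors and
 * count cross-processor communication edges for two mappings. */
#include <stdio.h>

#define P 16
#define N 4

static int cross_edges(const int map[P]) {
    int cut = 0;
    for (int i = 0; i < P; i++)
        if (map[i] != map[(i + 1) % P]) cut++;  /* ring edge i -> i+1 */
    return cut;
}

int main(void) {
    int block[P], cyclic[P];
    for (int i = 0; i < P; i++) {
        block[i]  = i / (P / N);  /* neighbors share a processor */
        cyclic[i] = i % N;        /* neighbors always split apart */
    }
    printf("block cut = %d, cyclic cut = %d\n",
           cross_edges(block), cross_edges(cyclic));
    return 0;
}
```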
Compile-time techniques for efficient utilization of parallel memories
The partitioning of shared memory into a number of memory modules is an approach to achieve high memory bandwidth for parallel processors. Memory access conflicts can occur when several processors simultaneously request data from the same memory module. ...
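A worked example of the conflict the abstract describes (illustrative numbers): with M interleaved modules and module = address mod M, a vector accessed with stride s touches only M / gcd(M, s) distinct modules, so stride 4 against 8 modules hits just two of them and serializes the accesses.

```c
/* Module-conflict example: print which of M interleaved memory modules
 * each element of a strided access pattern falls into. */
#include <stdio.h>

#define M 8   /* number of memory modules */

int main(void) {
    int strides[] = {1, 4};
    for (int k = 0; k < 2; k++) {
        int s = strides[k];
        printf("stride %d: modules", s);
        for (int i = 0; i < 8; i++)
            printf(" %d", (i * s) % M);   /* module holding element i */
        printf("\n");
    }
    return 0;
}
```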