Parallel processing on dynamic resources with CARMI
Abstract
In every production parallel processing environment, the set of resources potentially available to an application fluctuates due to changes in the load on the system. This is true for clusters of workstations, which are an increasingly popular platform for parallel computing. Today's parallel programming environments have largely succeeded in making the communication aspect of parallel programming much easier, but they have not provided adequate resource management services, which are needed to adapt to such changes in availability. To fill this need, we have developed CARMI, a resource management system aimed at allowing a parallel application to make use of all available computing power. CARMI permits an application to grow as new resources become available and to shrink when resources are reclaimed. Building upon CARMI, we have also developed WoDi, which provides a simple interface for writing master-workers programs in a dynamic resource environment. Both CARMI and WoDi are operational and have been used on a pool of more than 200 workstations managed by the Condor batch system. Experience with the two systems has shown them to be easy to use and capable of providing large numbers of cycles to parallel applications, even in a real-life production environment in which no resources are dedicated to parallel processing.
Springer