The Cilkview scalability analyzer

Y He, CE Leiserson, WM Leiserson - … of the twenty-second annual ACM …, 2010 - dl.acm.org
Proceedings of the twenty-second annual ACM symposium on Parallelism in …, 2010
The Cilkview scalability analyzer is a software tool for profiling, estimating scalability, and benchmarking multithreaded Cilk++ applications. Cilkview monitors logical parallelism during an instrumented execution of the Cilk++ application on a single processing core. As Cilkview executes, it analyzes logical dependencies within the computation to determine its work and span (critical-path length). These metrics allow Cilkview to estimate parallelism and predict how the application will scale with the number of processing cores. In addition, Cilkview analyzes scheduling overhead using the concept of a "burdened dag," which allows it to diagnose performance problems in the application due to an insufficient grain size of parallel subcomputations.
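To make the work/span terminology concrete, the following is a minimal Python sketch, not Cilkview's implementation. It models a computation as a series-parallel tree and computes work (total cost), span (longest dependency chain), and parallelism (work divided by span). The classes `Leaf`, `Series`, and `Parallel`, the `burden` parameter, and the rule of charging one fixed burden per parallel region are all assumptions made for illustration; the abstract does not specify how the burdened dag assigns costs. The upper bound follows from the standard work and span laws, and the lower estimate substitutes the burdened span into the usual greedy-scheduling bound, which is likewise an assumption here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    cost: int  # serial cost of a strand, in arbitrary units (e.g. instructions)


@dataclass
class Series:
    parts: List["Node"]  # parts run one after another


@dataclass
class Parallel:
    parts: List["Node"]  # parts may run concurrently


Node = Union[Leaf, Series, Parallel]


def work(node: Node) -> int:
    """Total cost of all strands: the running time on one core."""
    if isinstance(node, Leaf):
        return node.cost
    return sum(work(p) for p in node.parts)


def span(node: Node, burden: int = 0) -> int:
    """Length of the longest dependency chain (critical path).

    With burden > 0, each parallel region is charged an extra fixed cost.
    This is a hypothetical stand-in for scheduling overhead, not the
    tool's actual burdening rule.
    """
    if isinstance(node, Leaf):
        return node.cost
    if isinstance(node, Series):
        return sum(span(p, burden) for p in node.parts)
    return burden + max(span(p, burden) for p in node.parts)


def parallelism(node: Node) -> float:
    """Work divided by span: the maximum possible speedup."""
    return work(node) / span(node)


def speedup_upper(node: Node, cores: int) -> float:
    """Work law and span law: speedup cannot exceed cores or parallelism."""
    return min(cores, parallelism(node))


def speedup_lower_estimate(node: Node, cores: int, burden: int) -> float:
    """Greedy-scheduling style estimate using the burdened span (assumed form)."""
    t1 = work(node)
    return t1 / (t1 / cores + span(node, burden))


# A small example: serial setup, four parallel strands, serial teardown.
prog = Series([
    Leaf(10),
    Parallel([Leaf(100), Leaf(100), Leaf(100), Leaf(100)]),
    Leaf(10),
])

if __name__ == "__main__":
    print("work:", work(prog))
    print("span:", span(prog))
    print("burdened span:", span(prog, burden=20))
    print("parallelism:", parallelism(prog))
    print("upper bound on 4 cores:", speedup_upper(prog, 4))
    print("lower estimate on 4 cores:", speedup_lower_estimate(prog, 4, burden=20))
```

In this toy model, shrinking the leaves inside the parallel region while keeping the burden fixed makes the burdened span dominate, which mirrors the grain-size problem the abstract describes.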
Cilkview employs the Pin dynamic-instrumentation framework to collect metrics during a serial execution of the application code. It operates directly on the optimized code rather than on a debug version. Metadata embedded by the Cilk++ compiler in the binary executable identifies the parallel control constructs in the executing application. This approach introduces little or no overhead to the program binary in normal runs.
Cilkview can perform real-time scalability benchmarking automatically, producing gnuplot-compatible output that allows developers to compare an application's performance with the tool's predictions. If the program performs beneath the range of expectation, the programmer can be confident in seeking a cause such as insufficient memory bandwidth, false sharing, or contention rather than inadequate parallelism or insufficient grain size.
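The comparison between measured performance and predicted range can be sketched as follows. This is a hypothetical illustration, not Cilkview's output format or decision logic: the function names, the column layout, and the bound formulas (upper bound from the work and span laws, lower estimate from a greedy-scheduling bound with a burdened span) are assumptions. The only property relied on for plotting is that gnuplot reads whitespace-separated columns and ignores lines beginning with `#`.

```python
def speedup_range(work, span, burdened_span, cores):
    """Return an assumed (lower, upper) predicted speedup range for a core count."""
    upper = min(cores, work / span)                 # work law and span law
    lower = work / (work / cores + burdened_span)   # assumed burdened estimate
    return lower, upper


def diagnose(measured, lower, upper):
    """Classify a measured speedup against the predicted range."""
    if measured < lower:
        # Parallelism and grain size look adequate, so look elsewhere:
        # memory bandwidth, false sharing, or contention.
        return "below range: suspect memory bandwidth, false sharing, or contention"
    if measured > upper:
        return "above range: check the measurement or the model"
    return "within range"


def gnuplot_rows(work, span, burdened_span, measured_by_cores):
    """Format predictions and measurements as gnuplot-readable columns."""
    lines = ["# cores lower upper measured"]
    for cores in sorted(measured_by_cores):
        lower, upper = speedup_range(work, span, burdened_span, cores)
        lines.append(
            f"{cores} {lower:.3f} {upper:.3f} {measured_by_cores[cores]:.3f}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    measured = {1: 1.0, 2: 1.6, 4: 1.2}
    print(gnuplot_rows(420, 120, 140, measured))
    low, high = speedup_range(420, 120, 140, 4)
    print(diagnose(measured[4], low, high))
```

Saving the printed table to a file lets gnuplot plot the measured column against the two predicted curves.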