Abstract
Sort-last parallel rendering is widely used. Recent GPU developments mean that a PC equipped with multiple GPUs is a viable alternative to a high-cost supercomputer: the Fermi GPU architecture supports unified virtual addressing, which provides a foundation for non-uniform memory access (NUMA) on multi-GPU platforms. Such hardware changes require users to reconsider parallel rendering algorithms. In this paper, we thoroughly investigate the NUMA-aware image compositing problem; image compositing is the key final stage in sort-last parallel rendering. Based on the proven radix-k strategy, we identify an optimal compositing algorithm that takes advantage of the NUMA architecture of the multi-GPU platform. We quantitatively analyze different compositing modes for practical image compositing, taking into account peer-to-peer communication costs between GPUs. Our experiments on various datasets show that our image compositing method is very fast: an image of a few megapixels can be composited in about 10 ms by eight GPUs.
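To illustrate the radix-k strategy that the abstract builds on, the following is a minimal CPU-side simulation sketch in Python. It is not the paper's GPU algorithm: it does not model NUMA placement or peer-to-peer transfer costs, and the names `over`, `radix_k_composite`, and `ks` are hypothetical, chosen only for this example. It assumes ranks are sorted front to back, pixels are premultiplied (color, alpha) pairs, and the per-round group sizes in `ks` multiply to the number of ranks.

```python
from math import prod


def over(front, back):
    """Porter-Duff 'over' for premultiplied (color, alpha) pixels."""
    fc, fa = front
    bc, ba = back
    return (fc + (1.0 - fa) * bc, fa + (1.0 - fa) * ba)


def radix_k_composite(images, ks):
    """Simulate radix-k compositing (illustrative sketch only).

    images[r] is the full-size partial image of rank r, ranks front to back.
    ks[i] is the group size in round i; prod(ks) must equal len(images).
    Returns one (lo, hi, pixels) tuple per rank: the final piece it owns.
    """
    p = len(images)
    assert prod(ks) == p, "group sizes must multiply to the rank count"
    n = len(images[0])
    # Every rank starts out owning the whole pixel range [0, n).
    state = [(0, n, list(img)) for img in images]
    stride = 1
    for k in ks:
        new_state = [None] * p
        for r in range(p):
            digit = (r // stride) % k          # position of r inside its group
            base = r - digit * stride
            group = [base + j * stride for j in range(k)]  # front-to-back order
            lo, hi, _ = state[r]               # region shared by the whole group
            size = hi - lo
            a = lo + size * digit // k         # this rank keeps the digit-th part
            b = lo + size * (digit + 1) // k
            piece = None
            for g in group:                    # "receive" that part from each member
                glo, _, gpix = state[g]
                part = gpix[a - glo:b - glo]
                if piece is None:
                    piece = part
                else:
                    piece = [over(f, q) for f, q in zip(piece, part)]
            new_state[r] = (a, b, piece)
        state = new_state
        stride *= k
    return state
```

In this sketch, `ks = [8]` for eight ranks corresponds to a single direct-send round and `ks = [2, 2, 2]` to binary swap; intermediate choices such as `ks = [2, 4]` are the configurations that a radix-k analysis would compare.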
Acknowledgements
We would like to thank the anonymous reviewers for their comments. We would also like to thank Hongbin Zhuo of the College of Science at the National University of Defense Technology for providing us with the electromagnetic volume data.
This work is supported by the National Basic Research Program of China (No. 2009CB723803), the National Science Foundation Program of China (Nos. 61103084, 61272334, 61170157, and 61272009), and the Research Funding Program of the National University of Defense Technology.
Cite this article
Wang, P., Cheng, Z., Martin, R. et al. NUMA-aware image compositing on multi-GPU platform. Vis Comput 29, 639–649 (2013). https://doi.org/10.1007/s00371-013-0803-7
DOI: https://doi.org/10.1007/s00371-013-0803-7