NUMA-aware image compositing on multi-GPU platform

Pan Wang¹, Zhiquan Cheng¹, Ralph Martin², Huahai Liu¹, Xun Cai¹ & Sikun Li¹

Abstract

Sort-last parallel rendering is widely used. Recent GPU developments make a PC equipped with multiple GPUs a viable alternative to a high-cost supercomputer: the Fermi GPU architecture supports unified virtual addressing, providing a foundation for non-uniform memory access (NUMA) on multi-GPU platforms. Such hardware changes require parallel rendering algorithms to be reconsidered. In this paper, we thoroughly investigate NUMA-aware image compositing, the key final stage of sort-last parallel rendering. Building on the proven radix-k strategy, we identify an optimal compositing algorithm that takes advantage of the NUMA architecture of the multi-GPU platform. We quantitatively analyze different image compositing modes for practical use, taking into account peer-to-peer communication costs between GPUs. Our experiments on various datasets show that our image compositing method is very fast: an image of a few megapixels can be composited in about 10 ms by eight GPUs.
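For readers unfamiliar with the radix-k strategy mentioned above, the following is a minimal CPU-only sketch of the generic scheme, not the NUMA-aware GPU algorithm of the paper. It assumes a one-dimensional image of premultiplied (color, alpha) pixels, ranks ordered front to back, and a factorization k_values whose product equals the number of ranks. The names over and radix_k_composite are hypothetical and chosen only for illustration.

```python
from math import prod


def over(front, back):
    """Porter-Duff 'over' for premultiplied (color, alpha) pixels."""
    fc, fa = front
    bc, ba = back
    return (fc + (1.0 - fa) * bc, fa + (1.0 - fa) * ba)


def radix_k_composite(images, k_values):
    """Simulate generic radix-k compositing on the CPU (illustrative only).

    images[r] is the full image rendered by rank r; rank 0 is frontmost.
    Returns {rank: (start, end, pixels)}: the final slice owned by each rank.
    """
    num_procs = len(images)
    if prod(k_values) != num_procs:
        raise ValueError("product of k_values must equal the number of ranks")
    width = len(images[0])
    regions = [(0, width)] * num_procs      # pixel range currently held
    data = [list(img) for img in images]    # pixels of that range
    stride = 1
    for k in k_values:
        new_regions = [None] * num_procs
        new_data = [None] * num_procs
        for rank in range(num_procs):
            # Ranks in one group differ only in the current radix digit,
            # so they all hold the same pixel range at this point.
            digit = (rank // stride) % k
            group = [rank + (j - digit) * stride for j in range(k)]
            start, end = regions[rank]
            size = end - start
            lo = start + digit * size // k
            hi = start + (digit + 1) * size // k
            # Direct send inside the group: gather sub-piece [lo, hi) from
            # every member and composite in ascending rank (depth) order.
            piece = None
            for member in group:
                part = data[member][lo - start:hi - start]
                if piece is None:
                    piece = part
                else:
                    piece = [over(f, b) for f, b in zip(piece, part)]
            new_regions[rank] = (lo, hi)
            new_data[rank] = piece
        regions, data = new_regions, new_data
        stride *= k
    return {r: (regions[r][0], regions[r][1], data[r]) for r in range(num_procs)}
```

With eight ranks, k_values = [2, 2, 2] corresponds to binary swap and k_values = [8] to direct send; intermediate choices such as [4, 2] trade the number of rounds against the number of messages per round. On a real multi-GPU platform the gather step would be a peer-to-peer copy between devices, whose cost the sketch ignores.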

Acknowledgements

We would like to thank the anonymous reviewers for their comments. We would also like to thank Hongbin Zhuo of the College of Science at the National University of Defense Technology for providing us with the electromagnetic volume data.

This work is supported by the National Basic Research Program (No. 2009CB723803), the National Science Foundation Program of China (Nos. 61103084, 61272334, 61170157 and 61272009) and the Research Funding Program of the National University of Defense Technology.

Author information

Authors and Affiliations

School of Computer Science, National University of Defense Technology, Hunan, China
Pan Wang, Zhiquan Cheng, Huahai Liu, Xun Cai & Sikun Li
School of Computer Science & Informatics, Cardiff University, Cardiff, UK
Ralph Martin

Corresponding author

Correspondence to Zhiquan Cheng.

About this article

Cite this article

Wang, P., Cheng, Z., Martin, R. et al. NUMA-aware image compositing on multi-GPU platform. Vis Comput 29, 639–649 (2013). https://doi.org/10.1007/s00371-013-0803-7

Published: 26 April 2013
Issue Date: June 2013
DOI: https://doi.org/10.1007/s00371-013-0803-7
