Research article
DOI: 10.1145/3466752.3480097

Intersection Prediction for Accelerated GPU Ray Tracing

Published: 17 October 2021

Abstract

Ray tracing has long been used in motion pictures to generate photorealistic images, while faster raster-based shading techniques have been preferred for video games to meet real-time requirements. However, recent Graphics Processing Units (GPUs) incorporate hardware accelerator units designed for ray tracing. These accelerator units target the process of traversing the hierarchical tree data structures used to test for ray-object intersections. Distinct rays that follow similar paths through these structures execute many redundant ray-box intersection tests. We propose a ray intersection predictor that speculatively elides these redundant operations and proceeds directly to test the primitives the ray is likely to intersect. A key aspect of our predictor strategy is identifying hash functions that preserve enough spatial information to identify redundant traversals. We explore how to integrate our ray prediction strategy into existing GPU pipelines, and we improve the predictor's effectiveness by predicting nodes higher in the tree and by regrouping and scheduling traversal operations in a low-cost, judicious manner. On a mobile-class GPU with a ray tracing accelerator unit, we find that adding a 5.5KB predictor per streaming multiprocessor improves performance on ambient occlusion workloads by a geometric mean of 26%.




Published In

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
October 2021
1322 pages
ISBN: 9781450385572
DOI: 10.1145/3466752
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. GPU
  2. graphics
  3. hardware accelerator
  4. ray tracing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MICRO '21

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 278
  • Downloads (last 6 weeks): 38

Reflects downloads up to 24 Nov 2024.

Cited By

  • (2024) RTOD: Efficient Outlier Detection With Ray Tracing Cores. IEEE Transactions on Knowledge and Data Engineering 36:12, 9192–9204. DOI: 10.1109/TKDE.2024.3453901. Online publication date: Dec 2024.
  • (2024) Zatel: Sample Complexity-Aware Scale-Model Simulation for Ray Tracing. 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 156–166. DOI: 10.1109/ISPASS61541.2024.00024. Online publication date: 5 May 2024.
  • (2023) Visualizing Query Traversals Over Bounding Volume Hierarchies Using Treemaps. 2023 IEEE Visualization and Visual Analytics (VIS), 51–55. DOI: 10.1109/VIS54172.2023.00019. Online publication date: 21 Oct 2023.
  • (2023) LumiBench: A Benchmark Suite for Hardware Ray Tracing. 2023 IEEE International Symposium on Workload Characterization (IISWC), 1–14. DOI: 10.1109/IISWC59245.2023.00011. Online publication date: 1 Oct 2023.
  • (2023) Optimization strategies for GPUs: an overview of architectural approaches. International Journal of Parallel, Emergent and Distributed Systems 38:2, 140–154. DOI: 10.1080/17445760.2023.2173752. Online publication date: 5 Feb 2023.
  • (2023) Cluster-aware scheduling in multitasking GPUs. Real-Time Systems 60:1, 1–23. DOI: 10.1007/s11241-023-09409-x. Online publication date: 22 Nov 2023.
  • (2022) RT Engine: An Efficient Hardware Architecture for Ray Tracing. Applied Sciences 12:19, 9599. DOI: 10.3390/app12199599. Online publication date: 24 Sep 2022.
  • (2022) Vulkan-Sim: A GPU Architecture Simulator for Ray Tracing. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, 263–281. DOI: 10.1109/MICRO56248.2022.00027. Online publication date: 1 Oct 2022.
