Dual streaming for hardware-accelerated ray tracing

Authors:

Konstantin Shkurko,

Erik Brunvand

HPG '17: Proceedings of High Performance Graphics

Article No.: 12, Pages 1 - 11

https://doi.org/10.1145/3105762.3105771

Published: 28 July 2017

Abstract

Hardware acceleration for ray tracing has been a topic of great interest in computer graphics. However, even with proposed custom hardware, the inherent irregularity in the memory access pattern of ray tracing has limited its performance, compared with rasterization on commercial GPUs. We provide a different approach to hardware-accelerated ray tracing, beginning with modifying the order of rendering operations, inspired by the streaming character of rasterization. Our dual streaming approach organizes the memory access of ray tracing into two predictable data streams. The predictability of these streams allows perfect prefetching and makes the memory access pattern an excellent match for the behavior of DRAM memory systems. By reformulating ray tracing as fully predictable streams of rays and of geometry we alleviate many long-standing problems of high-performance ray tracing and expose new opportunities for future research. Therefore, we also include extensive discussions of potential avenues for future research aimed at improving the performance of hardware-accelerated ray tracing using dual streaming.
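
To make the idea of two predictable streams concrete, here is a minimal toy sketch. It is not the architecture proposed in the paper. It assumes a 1D scene whose primitives are points grouped into spatially ordered segments, rays that all travel toward +x, and one queue of waiting rays per segment; the function name `dual_stream_trace` and the data layout are hypothetical. Each segment is fetched exactly once in a fixed order (the scene stream), and the rays queued for it are drained front to back (the ray stream), so both access orders are known before processing starts.

```python
from collections import deque


def dual_stream_trace(segments, ray_origins):
    """Toy 1D illustration of streaming a scene and its rays (hypothetical model).

    segments: list of (lo, hi, points), sorted by lo, covering [lo, hi).
    ray_origins: list of x positions; every ray travels toward +x.
    Returns (hits, loads): hits[i] is the nearest point at or beyond
    ray i's origin (None on a miss); loads is the order segments were fetched.
    """
    queues = [deque() for _ in segments]
    hits = [None] * len(ray_origins)

    # Place each ray in the queue of the first segment it can reach.
    for ray_id, origin in enumerate(ray_origins):
        for seg_id, (_lo, hi, _points) in enumerate(segments):
            if origin < hi:
                queues[seg_id].append((ray_id, origin))
                break

    loads = []
    # Scene stream: every segment is fetched once, in a fixed order.
    for seg_id, (_lo, _hi, points) in enumerate(segments):
        loads.append(seg_id)
        queue = queues[seg_id]
        # Ray stream: drain this segment's queue sequentially.
        while queue:
            ray_id, origin = queue.popleft()
            candidates = [p for p in points if p >= origin]
            if candidates:
                hits[ray_id] = min(candidates)
            elif seg_id + 1 < len(segments):
                # Miss in this segment: forward the ray to the next one.
                queues[seg_id + 1].append((ray_id, origin))
    return hits, loads


example_segments = [(0, 10, [2, 7]), (10, 20, []), (20, 30, [25])]
example_hits, example_loads = dual_stream_trace(example_segments, [1, 8, 12, 26])
# example_hits  -> [2, 25, 25, None]
# example_loads -> [0, 1, 2]
```

In this sketch no ray ever triggers a fetch of scene data on demand: the scene is read front to back once, which is the property that would allow prefetching in a real memory system.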

Index Terms

Dual streaming for hardware-accelerated ray tracing
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
2. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Graphics processors
    2. Rendering
      1. Ray tracing

Published In

HPG '17: Proceedings of High Performance Graphics

July 2017

180 pages

ISBN:9781450351010

DOI:10.1145/3105762

General Chairs: Morgan McGuire (Williams College), Anjul Patney (NVIDIA)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
EUROGRAPHICS: The European Association for Computer Graphics

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

raytracing hardware

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

HPG '17

HPG '17: High-Performance Graphics

July 28 - 30, 2017

Los Angeles, California

Acceptance Rates

Overall Acceptance Rate 15 of 44 submissions, 34%

Article Metrics

Total Citations: 16
Total Downloads: 1,030
Downloads (Last 12 months): 235
Downloads (Last 6 weeks): 20

Reflects downloads up to 20 Nov 2024

Cited By

Feng Y, Lin W, Liu Z, Leng J, Guo M, Zhao H, Hou X, Zhao J, Zhu Y. (2024). Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture. ACM Transactions on Architecture and Code Optimization 21:4 (1-25). Online publication date: 20-Nov-2024.
https://dl.acm.org/doi/10.1145/3689340
Yan R, Su Y, Guo H, Lü Y, Wang J, Xiao N, Shen L, Wang Y, Huang L. (2024). MPRTA: An Efficient Multilevel Parallel Mobile Accelerator for High-Performance Ray Tracing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 32:2 (396-400). Online publication date: Feb-2024.
https://doi.org/10.1109/TVLSI.2023.3334711
Feng Y, Liu Z, Leng J, Guo M, Zhu Y. (2024). Cicero: Addressing Algorithmic and Architectural Bottlenecks in Neural Rendering by Radiance Warping and Memory Optimizations. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (1293-1308). Online publication date: 29-Jun-2024.
https://doi.org/10.1109/ISCA59077.2024.00096
Chou Y, Nowicki T, Aamodt T. (2023). Treelet Prefetching For Ray Tracing. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (742-755). Online publication date: 28-Oct-2023.
https://dl.acm.org/doi/10.1145/3613424.3614288
Yan R, Huang L, Guo H, Lü Y, Yang L, Xiao N, Wang Y, Shen L, Lan M. (2022). RT Engine: An Efficient Hardware Architecture for Ray Tracing. Applied Sciences 12:19 (9599). Online publication date: 24-Sep-2022.
https://doi.org/10.3390/app12199599
Zhu Y, Lee J, Agrawal K, Spear M. (2022). RTNN. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (76-89). Online publication date: 2-Apr-2022.
https://dl.acm.org/doi/10.1145/3503221.3508409
Vasiou E, Shkurko K, Brunvand E, Yuksel C. (2022). Mach-RT: A Many Chip Architecture for High Performance Ray Tracing. IEEE Transactions on Visualization and Computer Graphics 28:3 (1585-1596). Online publication date: 1-Mar-2022.
https://doi.org/10.1109/TVCG.2020.3021048
Yan R, Huang L, Guo H, Lü Y, Yang L, Xiao N, Shen L, Wang Y. (2022). RTA: an Efficient SIMD Architecture for Ray Tracing. 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys) (43-50). Online publication date: Dec-2022.
https://doi.org/10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00040
Liu L, Chang W, Demoullin F, Chou Y, Saed M, Pankratz D, Nowicki T, Aamodt T. (2021). Intersection Prediction for Accelerated GPU Ray Tracing. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture (709-723). Online publication date: 18-Oct-2021.
https://dl.acm.org/doi/10.1145/3466752.3480097
Meister D, Ogaki S, Benthin C, Doyle M, Guthe M, Bittner J. (2021). A Survey on Bounding Volume Hierarchies for Ray Tracing. Computer Graphics Forum 40:2 (683-712). Online publication date: 4-Jun-2021.
https://doi.org/10.1111/cgf.142662
