Megakernel vs Wavefront GPU Path Tracing
2026-05-26 • Graphics
Graphics • Hardware Architecture • Performance
AI summary
The authors compared two methods for rendering images by tracing rays of light on a graphics card: forward path tracing (PT) and wavefront path tracing (WPT). In their implementation, WPT is about 16% faster, which they attribute to better cache locality. Neither method reaches the maximum throughput of any GPU unit, which points to memory latency, communication, and synchronization as the limiting factors. The authors also discuss algorithmic improvements and future work toward real-time use.
GPU • Ray tracing • Path tracing • Wavefront path tracing • Forward path tracing • Cache locality • Megakernel • Synchronization • Memory latency • NVIDIA Nsight Graphics
Authors
Rafael Padilla, Kyle Webster, Austin Kim
Abstract
Over the last decade, advances in GPU hardware have been driven in large part by the demands of real-time graphics, culminating in dedicated hardware ray tracing cores (RT cores). These units accelerate ray-scene intersection queries directly in hardware, making physically based ray tracing algorithms increasingly practical for interactive applications. This paper compares and analyzes the performance of two ray-based rendering algorithms: forward path tracing (PT) and wavefront path tracing (WPT). GPU-based PT computes the color of each pixel by having each thread trace a single path to completion, which naturally leads to a megakernel approach, while WPT maintains state buffers between specialized kernel invocations so that each path stage is traced for many paths simultaneously. We find that WPT affords a ~16% speedup over PT in our implementation. By analyzing traces from NVIDIA Nsight Graphics, we attribute this speedup to WPT's improved cache locality compared to PT. We also find that our implementation does not achieve maximum throughput on any GPU unit, suggesting that communication and memory latency, as well as synchronization, are the limiting factors. Finally, we discuss potential algorithmic improvements and future work toward real-time path tracing implementations for practical applications.
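The structural difference between the two approaches described in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' implementation: `PathState`, `toy_hash`, `bounce`, and both `render_*` functions are hypothetical names, and the "shading" is a toy stand-in for real intersection and material code. The sketch only shows control flow (one path run to completion versus all active paths advanced one stage at a time through a persistent state buffer). It says nothing about cache locality or GPU timing.

```python
from dataclasses import dataclass

MAX_DEPTH = 8  # assumed bounce limit for this toy example


@dataclass
class PathState:
    """Per-path state; in a wavefront design this lives in a buffer between kernels."""
    pixel: int
    throughput: float = 1.0
    radiance: float = 0.0
    depth: int = 0
    alive: bool = True


def toy_hash(pixel: int, depth: int) -> float:
    """Deterministic value in [0, 1); stands in for a real random sampler."""
    x = (pixel * 73856093) ^ (depth * 19349663)
    x = (x * 2654435761) & 0xFFFFFFFF
    return x / 2**32


def bounce(state: PathState) -> None:
    """One path stage: a toy 'intersect and shade' step, not a real renderer."""
    u = toy_hash(state.pixel, state.depth)
    if u < 0.3:
        # Pretend the path hit an emitter: accumulate light and terminate.
        state.radiance += state.throughput
        state.alive = False
    else:
        # Pretend the path hit a surface: attenuate and continue.
        state.throughput *= 0.5 + 0.5 * u
        state.depth += 1
        if state.depth >= MAX_DEPTH:
            state.alive = False


def render_megakernel(num_pixels: int) -> list[float]:
    """Megakernel style: each 'thread' traces its own path to completion."""
    image = [0.0] * num_pixels
    for pixel in range(num_pixels):  # stands in for one GPU thread per pixel
        state = PathState(pixel)
        while state.alive:
            bounce(state)
        image[pixel] = state.radiance
    return image


def render_wavefront(num_pixels: int) -> list[float]:
    """Wavefront style: every active path advances one stage per 'kernel launch'."""
    states = [PathState(p) for p in range(num_pixels)]  # persistent state buffer
    active = list(range(num_pixels))  # queue of paths still in flight
    while active:
        for i in active:  # stands in for one specialized kernel invocation
            bounce(states[i])
        # Compact the queue so the next launch only touches live paths.
        active = [i for i in active if states[i].alive]
    return [s.radiance for s in states]
```

Both functions perform the same per-path operations in the same order, so they produce identical images; only the loop nesting differs. On a real GPU, that reordering is what lets a wavefront renderer group similar work together, at the cost of reading and writing the state buffer between kernels.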