CPU-Based Parallel Ray Tracer
Abhishek Chugh, Chenxi Liu
During the first week, we implemented a sequential surface area heuristic (SAH) based bounding volume hierarchy (BVH). The sequential BVVH was almost completely based on the implementation in the PBRT book. Besides that, we read several papers on parallel BVH construction and acquired appropriate scenes for testing. We have been conducting our experiments on "Happy buddha", a test scene with over 1M triangles.
Next, we explored different approaches to maximally exploit parallel computation in BVH construction. The first approach we tried was to use different threads to do work for different nodes. This only constraint for this "Tree-level parallelism" is the fact that we can work on a node only if parent node has been built. We used OpenMP to exploit parallelism with multiple threads. Though this provided us decent results (3.3x using 6 threads on GHC 3K machines), a major factor limiting speed up was the lack of nodes available initially. This was significant limitation because this led to threads being idle for about 20% of time during the BVH build.
Apart from the tree level parallelism, there are opportunities parallelism for work inside a node. This "Node-level parallelism" involves building bounding boxes and calculating the costs of split boundaries in parallel. Thus, the benefits of this parallelism is prominent only when the number of primitives handled by the node is high. Thus, we use a hybrid solution. Initially, all available threads work on building nodes sequentially while exploiting the Node-level parallelism. When number of nodes available for work is greater than the number of available threads, we switch to using one thread per node and building multiple nodes in parallel. The work assignment is such that a thread works on nodes that are close to each other - to maintain good spatial locality. Using this combined approach we achieve a speed-up of 3.9x using 6 threads, on GHC 3K machines.
In the second week, we decided to use SIMD instructions for achieving even higher speed-up. For higher portability and because its ease of development, we use ISPC. After, including ISPC in the project and overcoming build issues, we implemented the Partition function (an operation inside one node) in ISPC. The overhead of current solution was analysed.
In our current implementation we saw a slow down after introducing ISPC. We attribute this to the multiple gather and scatter instructions required for accessing data in the form of Array of Structures(AOS). To achieve the best performance for SIMD, the data passed in should be in SOA (Struct Of Array) form. All vector type data in the current program should be re-wrapped into SOA. The modification may introduce some bugs. This issue should be settled before we move on to packet tracing because a packet is naturally mapped to SIMD vector lanes and wrong data structure would affect performance even more.
Apart from that, we haven't decided yet if we are going to flatten the BVH in parallel. A basic idea is to use Euler tour, prescan and pre-order number generation. However, we may not implement this since this flattening part only takes a small amount of time..
We have decided to work on packet tracing after finishing parallel BVH construction. We would like to demo real-time raytracing for smaller scenes and speed-up graphs for larger scenes.
One nice-to-have we mentioned in our write-up was on-demand tessellation of meshes. If we end things a little early, we will try to implement this in the last week.
||What We Plan To Do
|18-24, April||Implementing a wrapping class for ISPC SOA (first check header generated by ISPC compiler to see if there is anything helpful). Parallelizing bounding box construction with ISPC. Checking through sequential code for rendering and coming up with general design idea with packet tracing.|
|25-31, May||Implementing a packet class and experimenting with different strategies.|
|1-9, May||Fixing bugs, collecting data and writing Final Report. Give on-demand tessellation a try.|