After a few weeks of optimizing and a lot of good C++ advice from Ares Lagae, I consider my CPU-based voxel raycaster done. I now have a good framework to test some more interesting stuff on. It offers real-time performance, even for large voxel grids, and I've structured the code so I can easily store more information per voxel.
I toyed with the idea of moving to a CUDA implementation, but that would mean more work and optimizations that would get in the way of trying out new voxel representations (every new representation means a new memory layout, which in turn means a new way to store all that stuff compressed in textures, yada yada). Performance is not the goal here. I'm going to dive into some papers about normal filtering, in addition to Representing Appearance and Pre-filtering Subpixel Data in Sparse Voxel Octrees, a paper presented at High Performance Graphics 2012 in Paris (co-located with EGSR).
I'm also going to look into ways of preserving the normals from the original .ply models, perhaps by writing my own voxelizer that does a regional lookup and averages the normals of the original model around each voxel. That way, I'll finally be able to render some lit models as well :)
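One simple way to do such a lookup, sketched here in C++ under stated assumptions: bin each triangle into the voxel that contains its centroid, sum the area-weighted face normals per voxel, and normalize. This is a simplification (a triangle spanning several voxels only contributes to one of them), the mesh is assumed to be already scaled to voxel units, and all names (`Vec3`, `Triangle`, `averageNormalsPerVoxel`) are hypothetical. It needs C++14.

```cpp
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

// Minimal vector type (hypothetical; not taken from the raycaster's code).
struct Vec3 {
    double x = 0.0, y = 0.0, z = 0.0;
};

Vec3 operator+(const Vec3& a, const Vec3& b) { return Vec3{a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(const Vec3& a, const Vec3& b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return Vec3{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Triangle {
    Vec3 v0, v1, v2;  // vertex positions, assumed to be in voxel units already
};

using VoxelKey = std::tuple<int, int, int>;  // integer voxel coordinates

// Hypothetical helper: average (area-weighted) face normal per voxel,
// assigning each triangle to the voxel that contains its centroid.
std::map<VoxelKey, Vec3> averageNormalsPerVoxel(const std::vector<Triangle>& tris) {
    std::map<VoxelKey, Vec3> normals;
    for (const Triangle& t : tris) {
        // Unnormalized face normal: its length is twice the triangle's area,
        // so larger triangles weigh more in the average.
        Vec3 n = cross(t.v1 - t.v0, t.v2 - t.v0);
        Vec3 c3 = t.v0 + t.v1 + t.v2;  // three times the centroid
        VoxelKey key(static_cast<int>(std::floor(c3.x / 3.0)),
                     static_cast<int>(std::floor(c3.y / 3.0)),
                     static_cast<int>(std::floor(c3.z / 3.0)));
        Vec3& sum = normals[key];  // starts at (0, 0, 0) on first use
        sum = sum + n;
    }
    // Normalize each sum; a sum that cancels to zero (e.g. a thin wall) stays (0, 0, 0).
    for (auto& entry : normals) {
        Vec3& n = entry.second;
        double len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
        if (len > 0.0) n = Vec3{n.x / len, n.y / len, n.z / len};
    }
    return normals;
}
```

A voxel hit by one triangle facing +x and one facing +z of equal area ends up with a normal halfway between them, which is the smoothing effect the averaging is after.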
So here are some results from the current voxel raycaster. You can always browse the code at my WebSVN. The actual traversal algorithm, based on Revelles' paper from 2000, is located in TreeTraverser.cpp. I adapted it for iterative use with a stack.
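To illustrate what a stack-based (non-recursive) octree traversal looks like, here is a minimal C++ sketch. It is not Revelles' parametric algorithm, which derives each child's entry and exit parameters from the parent's values at the midplanes and uses a lookup table to pick the next child; it is a simpler stand-in that re-runs a slab test per child and sorts the hits. The `Node` layout, `traverse` and every other name here are hypothetical and not taken from TreeTraverser.cpp. It needs C++14.

```cpp
#include <algorithm>
#include <limits>
#include <utility>
#include <vector>

struct Vec3 {
    double x = 0.0, y = 0.0, z = 0.0;
};

struct Ray {
    Vec3 o;  // origin
    Vec3 d;  // direction (need not be normalized)
};

// Hypothetical node layout: child i covers the octant whose upper x/y/z half
// is selected by bits 0/1/2 of i; -1 marks an empty child.
struct Node {
    int children[8] = {-1, -1, -1, -1, -1, -1, -1, -1};
    bool leaf = false;  // true for a filled voxel
};

// Clips [t0, t1] against one axis slab; returns false if the interval becomes empty.
bool clipSlab(double o, double d, double lo, double hi, double& t0, double& t1) {
    if (d == 0.0) return o >= lo && o <= hi;  // parallel: inside the slab or never
    double ta = (lo - o) / d, tb = (hi - o) / d;
    if (ta > tb) std::swap(ta, tb);
    t0 = std::max(t0, ta);
    t1 = std::min(t1, tb);
    return t0 <= t1;
}

// Slab test of the ray (t >= 0) against the cube [bmin, bmin + size]^3.
bool hitCube(const Ray& r, const Vec3& bmin, double size, double& t0, double& t1) {
    t0 = 0.0;
    t1 = std::numeric_limits<double>::infinity();
    return clipSlab(r.o.x, r.d.x, bmin.x, bmin.x + size, t0, t1) &&
           clipSlab(r.o.y, r.d.y, bmin.y, bmin.y + size, t0, t1) &&
           clipSlab(r.o.z, r.d.z, bmin.z, bmin.z + size, t0, t1);
}

// Returns the index of the nearest filled leaf hit by the ray (or -1),
// writing its entry distance to tHit.
int traverse(const std::vector<Node>& nodes, int root, const Vec3& rootMin, double rootSize,
             const Ray& ray, double& tHit) {
    struct Entry { int node; Vec3 min; double size; double tEnter; };
    std::vector<Entry> stack;  // explicit stack instead of recursion
    double t0, t1;
    if (!hitCube(ray, rootMin, rootSize, t0, t1)) return -1;
    stack.push_back(Entry{root, rootMin, rootSize, t0});
    while (!stack.empty()) {
        Entry e = stack.back();
        stack.pop_back();
        const Node& n = nodes[e.node];
        // Children are visited nearest-first, so the first leaf popped is the closest hit.
        if (n.leaf) { tHit = e.tEnter; return e.node; }
        Entry hits[8];
        int count = 0;
        double half = e.size / 2.0;
        for (int i = 0; i < 8; ++i) {
            if (n.children[i] < 0) continue;  // empty space: never tested or visited
            Vec3 cmin{e.min.x + ((i & 1) ? half : 0.0),
                      e.min.y + ((i & 2) ? half : 0.0),
                      e.min.z + ((i & 4) ? half : 0.0)};
            double c0, c1;
            if (hitCube(ray, cmin, half, c0, c1)) hits[count++] = Entry{n.children[i], cmin, half, c0};
        }
        // Push far-to-near so the nearest child ends up on top of the stack.
        std::sort(hits, hits + count,
                  [](const Entry& a, const Entry& b) { return a.tEnter > b.tEnter; });
        for (int i = 0; i < count; ++i) stack.push_back(hits[i]);
    }
    return -1;
}
```

Skipping `-1` children without any intersection test is where the empty-space savings come from; Revelles' formulation additionally avoids recomputing the slab test for every child.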
- CPU Raycaster – I’m not drawing OpenGL/DX primitives (cubes) here.
- The Sparse Voxel Octree is built from a .binvox representation of the model, obtained with the Binvox Mesh Voxelizer. In addition, I use my own .moctree format, which stores the voxel data in Morton order. I discovered in Paris that this technique was presented at EGSR 2011 by Eric Tabellion in the paper Coherent Out-of-core Point-Based Global Illumination.
- Results were obtained on a Dell Precision T7500 at a resolution of 600×600 pixels. I use OpenMP to multi-thread the raytracing steps. As you can see, even though the number of voxels grows as O(n^3) with the grid resolution n, the render time only grows as O(n). This is due to the efficient skipping of empty space in a Sparse Voxel Octree.
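Morton order interleaves the bits of a voxel's x, y and z coordinates into a single key, so that sorting voxels by key places every aligned 2×2×2 block (and, recursively, every octree subtree) in one contiguous run. The C++ sketch below uses the common "magic number" bit-spreading trick for coordinates of up to 21 bits. It does not describe the actual .moctree file layout, and `splitBy3` and `mortonEncode` are hypothetical names.

```cpp
#include <cstdint>

// Spreads the lower 21 bits of v so that bit i moves to bit 3*i,
// leaving two zero bits between consecutive original bits.
std::uint64_t splitBy3(std::uint32_t v) {
    std::uint64_t x = v & 0x1fffffULL;  // 21 bits * 3 = 63 bits, fits in 64
    x = (x | x << 32) & 0x1f00000000ffffULL;
    x = (x | x << 16) & 0x1f0000ff0000ffULL;
    x = (x | x << 8) & 0x100f00f00f00f00fULL;
    x = (x | x << 4) & 0x10c30c30c30c30c3ULL;
    x = (x | x << 2) & 0x1249249249249249ULL;
    return x;
}

// Morton key with bits interleaved as ... z1 y1 x1 z0 y0 x0.
std::uint64_t mortonEncode(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return splitBy3(x) | (splitBy3(y) << 1) | (splitBy3(z) << 2);
}
```

For example, `mortonEncode(3, 5, 7)` yields 431. The lowest three bits of a key give the voxel's slot inside its 2×2×2 parent block, which is why a Morton-sorted stream can be turned into octree nodes bottom-up without random access.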
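Since every primary ray is independent, a common way to multi-thread a raycaster with OpenMP is a parallel loop over image rows. This is a minimal sketch assuming a thread-safe per-pixel function; `renderImage` and `shade` are hypothetical names, and the single-float-per-pixel image is a simplification.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical driver: calls shade(x, y) once per pixel, in parallel over rows.
// shade must be thread-safe (e.g. it only reads the shared octree).
template <typename ShadeFn>
std::vector<float> renderImage(int width, int height, ShadeFn shade) {
    std::vector<float> image(static_cast<std::size_t>(width) * height);
    // Rows that hit geometry cost more than empty rows, so a dynamic
    // schedule balances the load better than the default static split.
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Each iteration writes only its own pixel: no synchronization needed.
            image[static_cast<std::size_t>(y) * width + x] = shade(x, y);
        }
    }
    return image;
}
```

For a 600×600 frame one would call something like `renderImage(600, 600, castPrimaryRay)`, with `castPrimaryRay` a hypothetical ray-casting function. Compiled without OpenMP support (e.g. without `-fopenmp` on GCC), the pragma is ignored and the loop runs serially with the same result.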
On a side note, this is also the first time I've made heavy use of a C++ profiling tool: AMD CodeAnalyst. Even though I have an Intel CPU, it's free and still offers basic time-based profiling.