I’m currently rewriting my Sparse Voxel Octree builder (and the preceding voxelization pipeline) for a paper I’m writing, which I hope to submit to the HPG (High Performance Graphics) conference. The goal is a complete out-of-core pipeline that goes from a polygon mesh to an annotated SVO.
- Annotated, as in: with additional appearance data like normals, texture coordinates … per voxel.
- Out-of-core, as in: able to process data sets many times larger than available system memory. Data is effectively streamed in, and the voxelization / SVO building process needs only a single pass over it, so only a very limited amount of input data has to be in memory at any time.
- For the voxelization step, this is configurable: you can voxelize polygon meshes at any grid size you want, and allocating more memory to the process results in a faster completion time. But even on a machine with a ridiculously low amount of RAM, you can voxelize large polygon meshes.
The SVO builder is based on the method outlined here, but with several optimizations added along the way. After profiling the old implementation with AMD CodeAnalyst Performance Analyzer (now called AMD CodeXL, apparently), I noticed that a lot of runtime was spent adding empty nodes to the lowest buffer in the hierarchy (the only buffer that is touched directly: we’re building bottom-up).
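For context: the builder consumes voxels sorted by their Morton (Z-order) key, which interleaves the bits of the x, y, z grid coordinates so that sorted keys walk the grid in octree traversal order, 8 siblings at a time. A minimal sketch (function name and bit width are my own, not from the pipeline):

```python
def morton_encode(x, y, z, bits=10):
    """Interleave the bits of x, y, z into one Morton (Z-order) key.

    Bit i of x lands at key bit 3*i, of y at 3*i+1, of z at 3*i+2,
    so consecutive keys enumerate the 8 children of each octree node.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

# The 8 voxels of the first 2x2x2 block map to keys 0..7:
# morton_encode(0,0,0) == 0, morton_encode(1,1,1) == 7
```

This ordering is what makes bottom-up building possible with tiny buffers: every group of 8 consecutive keys is exactly one parent node’s worth of children.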
These empty nodes do need to be fed into the algorithm, since they effectively represent empty space in the SVO and bubble up the tree to mark large sections as “empty”. So no, just getting rid of them was not an option.
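The bubbling-up can be pictured with a toy buffer hierarchy (my own sketch, not the paper’s actual data structures): each level buffers up to 8 pending siblings, and whenever a group of 8 completes, it is collapsed into one parent at the level above. Eight empty children become a single empty parent.

```python
def make_buffers(levels):
    """One pending-sibling buffer per level; level 0 holds leaves."""
    return [[] for _ in range(levels + 1)]

def add_node(buffers, node):
    """Feed one node into level 0 and propagate full groups of 8 upward."""
    buffers[0].append(node)
    level = 0
    while level < len(buffers) - 1 and len(buffers[level]) == 8:
        group = buffers[level]
        buffers[level] = []
        if all(n == "empty" for n in group):
            parent = "empty"            # 8 empty children bubble up as one empty parent
        else:
            parent = ("node", tuple(group))  # a real builder would write this out here
        level += 1
        buffers[level].append(parent)

# 64 empty leaves collapse all the way up into a single empty grandparent:
b = make_buffers(2)
for _ in range(64):
    add_node(b, "empty")
# now b[2] == ["empty"] and the lower buffers are drained
```

This is also why feeding empty leaves one by one is wasteful: each one still walks through the lowest buffer, which is exactly where the profiler showed the time going.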
However, I found out that depending on the number of empty nodes you have to add (and you know this number, since it’s the difference between the sorted Morton codes and your internal Morton counter) and the fill status of the algorithm’s buffer hierarchy, you can often insert 8^x nodes (where x is anywhere between 0 and log2(grid length)) in one go, at a higher level, thus skipping 8^x costly add operations. This won’t result in any speedup when your voxel grid is not very sparse. But when voxelizing polygon meshes (not filling the inside of the mesh with voxels, but instead guaranteeing a 26-separability shell with this method), the fraction of filled voxels in the grid is often below 10%, so an efficient way to cover empty space brings great gains.
A quick and dirty implementation of this idea already led to promising results, with close to eightfold speedups when building octrees on large voxel grids. These timings were made using the Bunny voxel grids I made, which have a sparseness of 5–10%. The voxels contained opacity, color and normal information. The intermediate octree levels were generated from the given leaf data during building as well.
And since this is an out-of-core algorithm, all it needs is < 2 MB of private bytes for the algorithm itself … the rest is disk or network I/O (Morton-sorted voxels in, octree nodes and data out). I could probably do better I/O with some clever buffering, but I haven’t looked at that yet (it’s just a simple ifstream/ofstream at the moment). Here’s a procexp (Process Explorer) grab of a typical run:
Nice and small, yay! I cannot share code or details yet, but will of course do when the publication is accepted and published.
A lot of people have contacted me regarding Sparse Voxel Octrees in the last few months – thanks, keep it coming! I love discussing and pitching ideas.
Update: I did some further optimization of the file formats I’m using and of the I/O throughput. These graphs are again for the voxelized Bunny, sparseness ratio about 0.4%. Now even the larger octrees (2048^3) build in less than a minute. It turned out I was still doing some text-based I/O somewhere, which slowed down the whole process.