In these exciting modern times, people get a lot of mileage out of their depth buffers. Long gone are the days when we only used depth buffers for visibility and stenciling, as we now make use of the depth buffer to reconstruct the world-space or view-space position of our geometry at any given pixel. This can be a powerful performance optimization, since the alternative is to output position into a “fat” floating-point buffer. However, it’s important to realize that using the depth buffer in such unconventional ways can impose new precision requirements, since complex functions like lighting attenuation and shadowing will depend on the accuracy of the value stored in your depth buffer. This is particularly important if you’re using a hardware depth buffer for reconstructing position, since the z/w value stored in it will be non-linear with respect to the view-space z value of the pixel. If you’re not familiar with any of this, there’s a good overview here by Steve Baker. The basic gist of it is that z/w will increase very quickly as you move away from the near-clip plane of your perspective projection, and for much of the area viewable by your camera you will have values >= 0.9 in your depth buffer. Consequently you’ll end up with a lot of precision for geometry that’s close to your camera, and very little for geometry that’s way in the back. This article from codermind has some mostly-accurate graphs that visualize the problem.
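To put some rough numbers on that non-linearity, here is a minimal Python sketch. It assumes a D3D-style perspective projection that maps the near plane to 0 and the far plane to 1, and it uses near and far planes of 1 and 101 (the values used by the test scene later in this post). The helper name perspective_depth is made up for illustration.

```python
def perspective_depth(view_z, near, far):
    """z/w for a D3D-style perspective projection (0 at near, 1 at far)."""
    a = far / (far - near)
    b = -near * far / (far - near)
    # clip.z = a * view_z + b and clip.w = view_z, so z/w = a + b / view_z
    return a + b / view_z


if __name__ == "__main__":
    near, far = 1.0, 101.0
    for view_z in (1.0, 2.0, 5.0, 10.0, 20.0, 40.0, 60.0, 80.0, 100.0, 101.0):
        depth = perspective_depth(view_z, near, far)
        print(f"view z = {view_z:6.1f}   z/w = {depth:.6f}")
```

With these planes z/w crosses 0.9 at a view-space z of roughly 9.2, so over 90% of the visible range gets squeezed into the top tenth of the depth values.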
Recently I’ve been doing some research into different formats for storing depth, in order to get a solid idea of the amount of error I can expect. To do this I made a DirectX 11 app where I rendered a series of objects at various depths, and compared the position reconstructed from the depth buffer with a position interpolated from the vertex shader. This let me easily flip through different depth formats and visualize the associated error. Here’s a front view and a side view of the test scene:
The cylinders are placed at depths of 5, 20, 40, 60, 80, and 100. The near-clip plane was set to 1, and the far-clip was set to 101.
For an error metric, I calculated the difference between the reference position (interpolated view-space vertex position) and the position reconstructed from the stored depth, and normalized it by dividing by the distance to the far-clip plane. I also multiplied by 100, so that a fully red pixel represented a difference equivalent to 1% of the view distance. For a final output I put the shaded and lit scene in the top-left corner, the sampled depth in the top right, the error in the bottom left, and error * 100 in the bottom right.
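As a rough illustration of that metric, here is a small Python sketch. It assumes "difference" means the Euclidean distance between the two view-space positions, which is my reading rather than something taken from the test app's shader code. The function names and the sample positions are hypothetical.

```python
import math


def normalized_error(reference_pos, reconstructed_pos, far_clip):
    """Distance between two view-space positions as a fraction of the far-clip distance."""
    return math.dist(reference_pos, reconstructed_pos) / far_clip


def error_to_red(error, scale=100.0):
    """Scale the error for display and clamp to [0, 1]; 1.0 is a fully red pixel."""
    return min(max(error * scale, 0.0), 1.0)


if __name__ == "__main__":
    reference = (2.0, 1.0, 50.0)
    # Hypothetical reconstruction that is off by 0.505 units along z
    reconstructed = (2.0, 1.0, 50.505)
    err = normalized_error(reference, reconstructed, far_clip=101.0)
    print(f"error = {err:.4%} of view distance, red = {error_to_red(err):.2f}")
```

With a scale of 100, an error of 1% of the view distance (or anything larger) saturates to full red.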
For all formats marked “Linear Z”, the depth was calculated by taking view-space Z and dividing by the distance to the far-clip plane. Position was reconstructed using the method described here. For formats marked “Perspective Z/W”, the depth was calculated by interpolating the z and w components of the clip-space position and then dividing in the pixel shader. Position was reconstructed by first reconstructing view-space Z from Z/W using values derived from the projection matrix. For formats marked “1 – Perspective Z/W”, the near and far plane values were flipped when creating the perspective projection matrix. This effectively stores 1 – z/w in the depth buffer. More on that in #9.
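The three encodings can be sketched in a few lines of Python. This only covers the depth value itself (view-space Z in, view-space Z out), not the full position reconstruction, which also needs a per-pixel view ray. It assumes a D3D-style projection where the two relevant matrix entries are far / (far - near) and -near * far / (far - near). All function names here are made up for illustration.

```python
def projection_terms(near, far):
    """The two projection-matrix entries that determine z/w (D3D-style, assumed)."""
    a = far / (far - near)
    b = -near * far / (far - near)
    return a, b


def encode_linear(view_z, far):
    return view_z / far


def decode_linear(depth, far):
    return depth * far


def encode_perspective(view_z, near, far):
    a, b = projection_terms(near, far)
    return a + b / view_z  # (a * z + b) / z


def decode_perspective(depth, near, far):
    a, b = projection_terms(near, far)
    return b / (depth - a)  # inverts depth = a + b / z


if __name__ == "__main__":
    near, far = 1.0, 101.0
    for view_z in (5.0, 20.0, 40.0, 60.0, 80.0, 100.0):
        lin = encode_linear(view_z, far)
        zw = encode_perspective(view_z, near, far)
        # "1 - z/w": build the projection with near and far swapped
        rev = encode_perspective(view_z, far, near)
        back = decode_perspective(rev, far, near)
        print(f"z={view_z:5.1f}  linear={lin:.6f}  z/w={zw:.6f}  "
              f"1-z/w={rev:.6f}  decoded={back:.4f}")
```

Swapping the two plane values when calling encode_perspective gives exactly 1 - z/w, which is why no extra subtraction is needed in the shader for the flipped formats.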
So without further rambling, let’s look at some results:
1. Linear Z, 16-bit floating point
So things are not so good on our first try. We get significant errors along the entire visible range with this format, with the error increasing as we get towards the far-clip plane. This makes sense, considering that a floating-point value has more precision closer to 0.0 than it does closer to 1.0.
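To see where that error comes from, here is a small Python sketch that rounds linear depth to a 16-bit float and measures how far the decoded view-space Z drifts. It does the math in double precision and only quantizes the stored value, so it models the storage format rather than the GPU pipeline. The far plane of 101 and the sample depths match the test scene; the function names are hypothetical.

```python
import struct


def to_half(x):
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def linear_half_error(view_z, far):
    """Absolute view-space z error after storing view_z / far in a 16-bit float."""
    stored = to_half(view_z / far)
    return abs(stored * far - view_z)


if __name__ == "__main__":
    far = 101.0
    for view_z in (5.0, 20.0, 40.0, 60.0, 80.0, 100.0):
        err = linear_half_error(view_z, far)
        print(f"z={view_z:5.1f}  error={err:.5f} units  "
              f"({err / far:.4%} of view distance)")
```

A 16-bit float has 10 explicit mantissa bits, so the gap between representable values doubles at every power of two. In [0.5, 1) adjacent values are 2^-11 apart, while in [1/32, 1/16), where the cylinder at depth 5 lands, they are 2^-15 apart.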
2. Linear Z, 32-bit floating point
Ahh, much nicer. No visible error at all. It’s pretty clear that if you’re going to manually write depth to a render target, this is a good way to go. Storing into a 32-bit UINT would probably have even better results due to an even distribution of precision, but that format may not be available depending on your platform. In D3D11 you’d also have to add a tiny bit of packing/unpacking code since there’s no 32-bit UNORM format.
3. Linear Z, 16-bit UINT
For this image I output depth to a render target with the DXGI_FORMAT_R16_UNORM format. As you can see it still has errors, but they’re significantly decreased compared to 16-bit floating point. It seems to me that if you were going to restrict yourself to 16 bits for depth, this is the way to go.
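The difference between the two 16-bit formats can be sketched in Python by quantizing the same linear depth both ways. The UNORM conversion below (scale by 65535, round to nearest) is the usual definition but is written by hand here, not taken from the test app. As before, only the stored value is quantized, and the function names are made up.

```python
import struct


def to_half(x):
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_unorm16(x):
    """Quantize x in [0, 1] to one of 65536 evenly spaced steps and convert back."""
    x = min(max(x, 0.0), 1.0)
    return int(x * 65535.0 + 0.5) / 65535.0


if __name__ == "__main__":
    far = 101.0
    for view_z in (5.0, 20.0, 40.0, 60.0, 80.0, 100.0):
        depth = view_z / far
        half_err = abs(to_half(depth) - depth) * far
        unorm_err = abs(to_unorm16(depth) - depth) * far
        print(f"z={view_z:5.1f}  half error={half_err:.5f}  unorm16 error={unorm_err:.5f}")
```

The worst-case rounding error for UNORM16 is 0.5 / 65535 (about 7.6e-6) everywhere in the range, while a 16-bit float can be off by up to 2^-12 (about 2.4e-4) for values in [0.5, 1).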
4. Perspective Z/W, 16-bit floating point
This is easily the worst format out of everything I tested. You’re at a disadvantage right off the bat just from using 16 bits instead of 32, and you also compound that with the non-linear distribution of precision that occurs from storing perspective depth. Then on top of that, you’re encoding to floating point which gives you even worse precision for geometry that’s far from the camera. The results are not pretty…don’t use this!
5. Perspective Z/W, 32-bit floating point
This one isn’t so bad compared to using a 16-bit float, but there’s still error at higher depth values.
6. Perspective Z/W, 16-bit UINT
I used a normal render target for this in my test app, but it should be mostly equivalent to sampling from a 16-bit depth buffer. As you’d expect, there’s quite a bit of error once you move away from the near clip plane.
7. Perspective Z/W, 24-bit UINT
This is the most common depth buffer format, and in my sample app I actually sampled the hardware depth-stencil buffer created from the first rendering pass. Compared to some of the alternatives this really isn’t terrible, and a lot of people have shipped awesome-looking games with this format. The maximum error towards the back is ~0.005%. If the distance to your far plane is very high, the error can be pretty significant.
8. Position, 16-bit floating point
For this format, I just output view-space position straight to a DXGI_FORMAT_R16G16B16A16_FLOAT render target. The only thing this format has going for it is convenience and speed of reconstruction…all you have to do is sample and you have position. In terms of accuracy, the amount of error is pretty close to what you get from storing linear depth in a 16-bit float. All in all…it’s a pretty bad choice.
9. 1 – Z/W, 16-bit floating point
This is where things get a bit interesting. Earlier I mentioned how floating-point values have more precision closer to 0.0 than they do closer to 1.0. It turns out that if you flip your near and far planes so that you store 1 – z/w in the depth buffer, your two precision distribution issues will mostly cancel each other out. As far as I know this was first proposed by Humus in this Beyond3D thread. He later posted this short article, where he elaborated on some of the issues brought up in that thread. As you can see he’s quite right: flipping the clip planes gives significantly improved results. They’re still not great, but clearly we’re getting somewhere.
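The cancellation can be checked with a quick Python sketch that stores both z/w and 1 – z/w in a 16-bit float and compares the decoded view-space Z. It assumes a D3D-style projection and the test scene's planes of 1 and 101, and it quantizes only the stored value (the rest of the math is double precision). The function names are hypothetical.

```python
import struct


def to_half(x):
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def encode(view_z, near, far):
    a = far / (far - near)
    b = -near * far / (far - near)
    return a + b / view_z


def decode(depth, near, far):
    a = far / (far - near)
    b = -near * far / (far - near)
    return b / (depth - a)


def half_roundtrip_error(view_z, near, far):
    """View-space z error after storing the perspective depth in a 16-bit float."""
    return abs(decode(to_half(encode(view_z, near, far)), near, far) - view_z)


DEPTHS = (5.0, 20.0, 40.0, 60.0, 80.0, 100.0)

if __name__ == "__main__":
    for view_z in DEPTHS:
        standard = half_roundtrip_error(view_z, 1.0, 101.0)  # stores z/w
        flipped = half_roundtrip_error(view_z, 101.0, 1.0)   # stores 1 - z/w
        print(f"z={view_z:5.1f}  z/w error={standard:.5f}  1-z/w error={flipped:.5f}")
```

In the unflipped case, a view-space Z of 100 produces z/w = 0.9999, which rounds to exactly 1.0 in a 16-bit float and so decodes as the far plane. With the planes flipped, the same point is stored as a value near 0.0001, where the float has far finer spacing.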
10. 1 – Z/W, 32-bit floating point
With a 32-bit float, flipping the planes gives us results similar to what we got when storing linear z. Not bad! In D3D10/D3D11 you can even use this format for a depth-stencil buffer…as long as you’re willing to either give up stencil or use 64 bits for depth.
The one format I would have liked to add to this list is a 24-bit float depth-stencil format. This format is available on consoles, and is even exposed in D3D9 as D3DFMT_D24FS8. However according to the caps spreadsheet that comes with the DX SDK, only ATI 2000-series and up GPUs actually support this format. In D3D10/D3D11 there doesn’t even appear to be an equivalent DXGI format, unless I’m missing something.
If there are any other formats or optimizations out there that you think are worthwhile, please let me know so that I can add them to the test app! Also if you’d like to play around with the test app, I’ve uploaded the source and binaries here. The project uses my new sample framework, which I still consider to be work-in-progress. However if you have any comments about the framework please let me know. I haven’t put in the time to make the components totally separable, but if people are interested then I will take some time to clean things up a bit.
EDIT: I also started a thread here on gamedev.net, to try to get some discussion going on the subject. Feel free to weigh in!