D3D11 came with a whole bunch of new big-ticket features that received plenty of attention and publicity. Things like tessellation, compute shaders, and multithreaded command submission have been the subject of many presentations, discussions, and sample apps. However, D3D11 also came with a few other features that allow more “traditional” rendering approaches to benefit from the increased programmability of graphics hardware. Unfortunately most of them have gone relatively unnoticed, which isn’t surprising when you consider that most of them have little or no documentation (much like some of the cool stuff that came in D3D10.1). Not too long ago one of these neat tricks came to my attention by way of John Hable’s blog, which inspired me to dig around a bit and try out some of the other neat tricks I was missing out on. Quite a few are briefly described in this presentation from GDC. Here are a few of my favorites, in no particular order:
1. Conservative depth output: this is something you use for pixel shaders that manually output a depth value. Basically, rather than using SV_Depth, you use a variant that also specifies an inequality: SV_DepthGreaterEqual or SV_DepthLessEqual. The depth you output from the shader must then satisfy the inequality relative to the interpolated depth of the rasterized triangle (if it doesn’t, the depth value is clamped for you). This allows the GPU to still use early-z cull, since it can trivially reject pixels in cases where the depth test will always fail for the specified upper/lower bound. So for instance if you render a quad and output SV_DepthGreaterEqual with the usual less-than depth test, the GPU can cull pixels where the quad’s depth is greater than the depth buffer value. Don’t bother looking for this one in the documentation…it’s not in there.
2. SV_Coverage as an input: D3D10.1 added the ability to output to SV_Coverage in order to manually specify the MSAA coverage mask (which controls how the pixel shader output gets written to the subsamples). In D3D11 you can also take it as an input to your pixel shader to know which of the sample points passed the triangle coverage test. This is really handy for deferred rendering, since you’ll want to mark off edge pixels, as those are the only pixels that require you to sample all of the subsamples in the G-Buffer. In D3D10 you could do this with the centroid sampling trick, but it’s much nicer to just skip the intermediate step and get coverage directly. Plus the rules for centroid sampling are a little loosely defined, so I don’t really like relying on it.
3. Programmable interpolation: D3D10/D3D10.1 already had some modifiers you could use for pixel shader attributes that controlled how they were interpolated. For instance you had linear, noperspective, and centroid. In D3D11 you still have those, but you also have a series of EvaluateAttribute* intrinsics (EvaluateAttributeAtSample, EvaluateAttributeAtCentroid, and EvaluateAttributeSnapped) that allow you to evaluate an attribute using a specified interpolation mode. Probably the most useful of the bunch is EvaluateAttributeAtSample, which interpolates the attribute to the MSAA sample point for the specified index. The most obvious use case is selective supersampling…using that intrinsic you could evaluate your BRDF at each subsample location. You can also sample alpha-tested textures multiple times, effectively antialiasing the edges. I whipped up a little test case where I rendered a billboarded quad to an MSAA target, and in the pixel shader I did a simple ray-cast into a sphere located at the quad center. I took SV_Coverage as an input to determine if the pixel was an edge pixel (not all sample points were covered), and in that case I did a ray-cast per-sample using EvaluateAttributeAtSample to snap the interpolated view-space position to each sample point. This basically gives you selective super-sampling, so that you get anti-aliased edges without relying on rasterization. Cool stuff!
4. Read-only depth-stencil views: D3D10 let you bind depth-stencil buffers as shader resource views so that you could sample them in the pixel shader, but came with the restriction that you couldn’t have the same buffer bound to the pipeline as both a depth-stencil view and a shader resource view at the same time. That’s fine for SSAO or depth of field, but not so fine if you want to do deferred rendering and use depth-stencil culling. D3D10.1 added the ability to let you copy depth buffers asynchronously using the GPU, but that’s still not optimal. D3D11 finally makes everything right in the world by letting you create depth-stencil views with a read-only flag (D3D11_DSV_READ_ONLY_DEPTH and/or D3D11_DSV_READ_ONLY_STENCIL in the view description). So basically you can just create two DSVs for your depth buffer, and switch to the read-only one when you want to do depth readback.
5. Unordered access views for pixel shaders: UAVs are essentially buffers or textures that give you both random read access *and* random write access. They’re usually mentioned in the context of compute shaders, but they’re actually usable for pixel shaders too. I haven’t really dug into this use case, but it seems as though you could implement scatter or fully programmable blending.
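To make the SV_Coverage and per-sample evaluation ideas from items 2 and 3 a bit more concrete, here’s a rough CPU-side model in C++. It is not shader code and not the code from my test case: every name in it (fullCoverageMask, isEdgePixel, shadePixel, ShadeResult) is made up for illustration, shadeAtSample just stands in for “re-run the shading with attributes evaluated at sample i”, and averaging the covered samples is an assumption about how you’d combine the results.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// CPU-side model only. All names are hypothetical, not D3D11 or HLSL API.

// Mask with the low sampleCount bits set, i.e. "every sample point covered".
inline uint32_t fullCoverageMask(uint32_t sampleCount) {
    assert(sampleCount >= 1 && sampleCount <= 32);
    return sampleCount == 32 ? 0xFFFFFFFFu : ((1u << sampleCount) - 1u);
}

// Models the test on an SV_Coverage input: a pixel is an edge pixel when the
// triangle did not cover all of its sample points.
inline bool isEdgePixel(uint32_t coverage, uint32_t sampleCount) {
    const uint32_t full = fullCoverageMask(sampleCount);
    return (coverage & full) != full;
}

struct ShadeResult {
    float color;      // single-channel "color" to keep the model small
    int evaluations;  // how many times the shading function ran
};

// Selective supersampling: shade once for interior pixels, and once per
// covered sample for edge pixels (assumption: results are averaged).
inline ShadeResult shadePixel(uint32_t coverage, uint32_t sampleCount,
                              const std::function<float()>& shadeAtCenter,
                              const std::function<float(uint32_t)>& shadeAtSample) {
    if (!isEdgePixel(coverage, sampleCount)) {
        return { shadeAtCenter(), 1 };
    }
    float sum = 0.0f;
    int evaluations = 0;
    for (uint32_t i = 0; i < sampleCount; ++i) {
        if (coverage & (1u << i)) {
            sum += shadeAtSample(i);  // stands in for per-sample evaluation
            ++evaluations;
        }
    }
    return { evaluations > 0 ? sum / evaluations : 0.0f, evaluations };
}
```

With 4x MSAA, a coverage mask of 0xF takes the cheap path with a single evaluation, while a mask like 0x5 only pays for the two samples that were actually covered. The point of the sketch is just that the expensive path is limited to edge pixels.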
After doing some research, I came up with a quick sample app so that I could try out conservative depth output and see the performance results. I ended up basing it off the SoftParticles sample from the SDK, since depth sprites are probably the most obvious use case for that particular feature. Here are some numbers I got running on my machine (Radeon HD 5830) at 1280×720 resolution, with the particles covering most of the viewport:
Basic billboarding: 8.7ms
Depth output enabled: 11.76ms
Conservative depth enabled: 9.52ms
Soft particles w/depth output: 23.25ms
Soft particles w/conservative depth: 18.18ms
So overall it looks like it gets you a good part of the way back to the performance you get with no depth output: in the plain billboard case about 2.2ms of the roughly 3.1ms depth-output cost is recovered, and the soft particles case drops by about 5ms. That’s pretty nice (especially considering how easy it is to use). I also used a read-only depth-stencil buffer for the soft particles so that I could keep depth testing active.
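If you’re wondering why conservative depth claws back that time, the rule from item 1 in the list can be modeled on the CPU. The sketch below is C++ rather than HLSL, all of its names (DepthOutput, applyConservativeDepth, canRejectEarly, depthTestLess) are hypothetical, and it assumes a less-than depth test with depths normalized to [0, 1]. It isn’t a description of how any particular GPU implements early-z.

```cpp
#include <algorithm>
#include <cassert>

// CPU-side model only. All names are hypothetical, not D3D11 or HLSL API.

// Which inequality the pixel shader promises for its depth output.
// GreaterEqual models SV_DepthGreaterEqual, LessEqual models SV_DepthLessEqual.
enum class DepthOutput { GreaterEqual, LessEqual };

// The depth that actually reaches the depth test: the shader's value, clamped
// so that it satisfies the promised inequality against the interpolated depth.
inline float applyConservativeDepth(DepthOutput mode, float interpolated, float shaderDepth) {
    assert(interpolated >= 0.0f && interpolated <= 1.0f);  // assumed depth range
    return mode == DepthOutput::GreaterEqual ? std::max(shaderDepth, interpolated)
                                             : std::min(shaderDepth, interpolated);
}

// Assumed depth test: a pixel passes when its depth is less than the stored one.
inline bool depthTestLess(float depth, float stored) {
    return depth < stored;
}

// Can the pixel be thrown away before the shader runs?
// GreaterEqual: the final depth is at least the interpolated depth, so if the
// interpolated depth already fails the test, the final depth fails too.
// LessEqual: the final depth may end up anywhere below the interpolated depth,
// so under a less-than test the interpolated depth proves nothing.
inline bool canRejectEarly(DepthOutput mode, float interpolated, float stored) {
    return mode == DepthOutput::GreaterEqual && !depthTestLess(interpolated, stored);
}
```

For example, with an interpolated depth of 0.6 and 0.4 already in the depth buffer, a GreaterEqual pixel can be rejected up front: whatever the shader writes gets clamped to at least 0.6, which can never pass. With plain SV_Depth there is no such bound, so the shader has to run first.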
If you want to run it yourself or check out the code, I uploaded the code + binaries here: https://mynameismjp.files.wordpress.com/2010/11/depthsprites.zip
I’ll also leave you with a picture of my totally sweet fire/smoke effect. I should quit my job and become an effects artist.
6 thoughts on “Conservative Depth Output (and Other Lesser-Known D3D11 Features)”
Great post! It looks like the conservative depth is quite useful. I’ll have to look more into it.
I wrote a simple Compute Shader in SlimDX a couple months ago that used an Unordered Access View to draw primitives like lines and circles. UAVs are very powerful.
It’s located here, for anyone interested:
Pure awesome information.
Don’t forget about texture cube arrays being new! Jk – not really sure what those are useful for.
Oh and your smoke effect – don’t quit your day job =)
I agree, the “small” features are great: http://www.jshopf.com/blog/?p=166. Granted, I wrote that before there were DX11 drivers available.