Deferred Shadow Maps Sample

I've got a new sample ready. This one shows how you can defer shadow map calculations to a separate screen-space pass using a depth buffer. Check it out on Ziggyware!
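
As a rough illustration of what such a pass does for each screen pixel, here is a minimal Python sketch. It is not the sample's actual shader code. It assumes a linear depth value in [0, 1], a hypothetical per-pixel ray to the far clip plane, a combined view-to-light-clip matrix, and a shadow map stored as a small square 2D list; every name in it is made up for illustration.

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def reconstruct_view_pos(far_ray, linear_depth):
    """Scale the ray to the far plane by linear depth to get a view-space position."""
    return [c * linear_depth for c in far_ray]


def shadow_term(view_pos, view_to_light_clip, shadow_map, bias=0.005):
    """Return 1.0 if the point is lit, 0.0 if it is in shadow."""
    # Transform the reconstructed position into the light's clip space.
    x, y, z, w = mat_vec(view_to_light_clip, view_pos + [1.0])
    x, y, z = x / w, y / w, z / w

    # Map clip-space [-1, 1] to texture space [0, 1] (Y flipped, D3D style).
    u = x * 0.5 + 0.5
    v = -y * 0.5 + 0.5
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return 1.0  # assumption: points outside the shadow map count as lit

    # Nearest-texel lookup into the (hypothetical) square shadow map.
    size = len(shadow_map)
    col = min(int(u * size), size - 1)
    row = min(int(v * size), size - 1)

    # Standard biased depth comparison against the stored occluder depth.
    return 1.0 if z - bias <= shadow_map[row][col] else 0.0
```

The result of `shadow_term` would be written to a screen-sized occlusion target, which the main geometry pass then samples instead of doing its own shadow map lookups.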

7 comments

  1. Hi there!
    Quite advanced tutorials and samples you have here.
    I'm learning quite a lot from these, though I still don’t understand some parts.

    (Feel free to edit/trim/publish, it’s a bit long).

    I have several questions regarding this deferred shadow map sample:

    - I understand the benefit (program-wise) of detaching the shadows from the geometry render. Even turning them off in your example is a simple switch to a 1×1 white-cleared renderTarget! However, I don’t see a clear performance gain: in the pixel shader, each pixel has to be transformed to light space to calculate the texcoord used to sample the shadowMap. Is that really faster than doing it in the vertexShader during the geometry pass?
    - For 2, 3, or 4 shadows, 2/3/4 lightSpace transforms per pixel would be needed, am I right? (I still don’t quite get how the Crytek team renders four shadows at the same time, if that’s what they do.)
    - Is there a way to bypass that lightSpace transformation, like the viewSpace reconstruction from depth using a linear depth?
    - All the shadow map examples I have seen, and the one I have implemented, use a non-linear depth buffer. Is there a specific benefit/rule to that?
    - (Crazy idea, just a thought): Screen Space this and Screen Space that… Could it be possible to do “ScreenSpace Shadows”? With the limited geometry information stored in the depth, I believe it might be possible to create a shadowMap on the fly from the points recovered from the camera depthMap. It may work with lights that are not too perpendicular to the view, like an FPS flashlight or a top-down game with a fairly high sun. From your knowledge of the matter, is this possible?

    Thanks.
    Alejandro.

  2. Hi Alejandro.

    -The reason that a deferred shadow pass *can* be faster is that you'll be efficient with your quad usage. When you rasterize thousands of small triangles, you'll end up with a lot of pixel shader quads where fewer than 4 pixels are doing useful work, but if you render two triangles that cover the entire screen you'll be using all 4 in most cases. Taking the shadow calculations out of your main pixel shader for opaques can also lower your GPR usage, which will increase the number of threads you can have in flight. But like anything related to performance, it depends on the hardware and what else is going on in your renderer. I'd suggest doing some profiling if you want solid answers.

    -I don't think they do 4 shadows at once. It wouldn't make sense, because for any local spot/point lights you wouldn't want to do the shadow calculations for the entire screen. You would instead want to use a scissor rectangle or a bounding volume, much like you would with deferred rendering. Plus if you do it this way it lets you reuse your shadow map memory, since you only need one shadow map in memory at a time.

    -Non-linear depth typically isn't ideal, since it has a non-uniform distribution of precision. You end up with much of your precision dedicated to the area close to the near clip plane. So if you can help it, it's best to use a linear depth metric. Many people will use non-linear depth because they'll use hardware Z buffers for rendering their shadow maps, since this is generally quicker (most hardware can write Z-only at double speed) and also lets you use the vendor-specific hardware PCF or Fetch4 extensions. But of course in XNA you can't use these things, so you're stuck outputting to a floating-point texture, in which case you might as well store linear depth.

    -I'd imagine the FPS flashlight is really the only case that would work well for screen-space shadows, but shadow maps are already quite good for this case since you don't need to render many objects to the shadow map (narrow view frustum) and because you can usually get away with a low-res shadow map. You'd also probably need a decent number of samples to avoid artifacts, in which case you'll start to negate any performance benefits. But of course you don't know for sure until you try it. :P
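
To put rough numbers on the precision point about non-linear depth, the Python sketch below compares the depth a standard Direct3D-style perspective projection stores after the divide by W with a simple linear metric (view-space Z divided by the far clip distance). The function names and the near/far values are assumptions chosen only for illustration.

```python
def d3d_depth(z_view, near, far):
    """Post-divide depth of a D3D-style perspective projection (0 at near, 1 at far)."""
    return (far / (far - near)) * (1.0 - near / z_view)


def linear_depth(z_view, far):
    """Linear depth metric: view-space distance as a fraction of the far plane."""
    return z_view / far


near, far = 1.0, 1000.0
for z in (2.0, 10.0, 100.0, 1000.0):
    # Print how much of the [0, 1] range each metric has used up at distance z.
    print(z, round(d3d_depth(z, near, far), 4), round(linear_depth(z, far), 4))
```

With these values, the non-linear depth has already used more than 90% of its range by a distance of 10 units, while the linear metric has used 1%.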

  3. 1./2. Points taken!
    3. That’s quite a boost for the shadow map render and sampling (HW PCF or ATI’s Fetch4). Guess I’ll likely stick to linear shadow maps.
    By the way, I found this article (http://www.mvps.org/directx/articles/linear_z/linearz.htm) where they premultiply in the vertex shader: vPos.Z *= vPos.W / farClip. So when the hardware converts to homogeneous space (vPos / W), the Z component stays linear, since (Z * W) / W = Z. You might find it interesting; however, as you say, there's not much that can be done with a hardware depth buffer in XNA (regarding shadows). Maybe save texCoords and interpolants in pixelShader 3.0 by directly using the vPos semantic for the shadowMap generation?

    4. Maybe I could try it someday, just for the fun of it and understanding the manipulation of heightfields and the like.

    Thanks again Matt.

  4. Forgot something that I found interesting also.
    The farClip division can be done in the application, right into the camera’s viewProjection Matrix.
    So it only adds one multiply operation.
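
The arithmetic behind this linear-Z trick can be checked with a small Python sketch. It assumes a Direct3D-style perspective projection in which clip-space W equals view-space Z; the function names are hypothetical, and only the Z and W outputs of the projection are modelled.

```python
def project_z(z_view, near, far):
    """Z and W outputs of a standard D3D-style perspective projection."""
    q = far / (far - near)
    return q * z_view - q * near, z_view


def hardware_depth(z_clip, w_clip):
    """The divide by W that the hardware applies after the vertex shader."""
    return z_clip / w_clip


def linear_z_vertex_shader(z_view, near, far):
    """Premultiply Z by W / far so the hardware divide leaves a linear value."""
    z_clip, w_clip = project_z(z_view, near, far)
    return z_clip * w_clip / far, w_clip


def linear_z_prescaled(z_view, near, far):
    """Same result with the 1 / far folded into the matrix terms by the application."""
    q_over_far = (far / (far - near)) / far
    z_clip = q_over_far * z_view - q_over_far * near
    w_clip = z_view
    # Only a single multiply by W is left for the vertex shader.
    return z_clip * w_clip, w_clip
```

For example, with near = 1 and far = 101, a point at view-space Z = 51 gets a standard depth of about 0.99, while both linear variants give 0.5, which is (Z - near) / (far - near).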

  5. Sorry to revive an old thread… I just read elsewhere on the net that Crysis does 4 shadows by putting one light’s shadow into each of the color channels of a render target, and doing the shadows deferred in screen space. I want to get this working in WebGL; researching it now.
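
The channel-packing idea described in that last comment can be sketched in a few lines of Python. This only illustrates storing up to four per-light shadow terms in one RGBA value and selecting one back out per light; it makes no claim about how Crysis actually implements it, and all names are hypothetical.

```python
def pack_shadow_mask(shadow_terms):
    """Pack up to four per-light shadow terms (0 = shadowed, 1 = lit) into RGBA."""
    if len(shadow_terms) > 4:
        raise ValueError("an RGBA target only has room for four shadow terms")
    # Assumption: unused channels default to fully lit.
    padded = list(shadow_terms) + [1.0] * (4 - len(shadow_terms))
    return tuple(padded)


def light_shadow(mask, channel):
    """Select one light's term with a one-hot dot product, as a shader might."""
    selector = [1.0 if i == channel else 0.0 for i in range(4)]
    return sum(m * s for m, s in zip(mask, selector))


def shade(light_intensities, mask):
    """Sum each light's intensity, attenuated by its channel of the shadow mask."""
    return sum(
        intensity * light_shadow(mask, i)
        for i, intensity in enumerate(light_intensities)
    )
```

Each light would still need its own light-space transform and shadow map lookup when the mask is written; the packing only reduces the number of render targets the lighting pass has to read.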
