A friend of mine once told me that you could use “back in the habit” as the subtitle for any movie sequel. I think it works.
So a lot of people still seem to have trouble with this whole reconstructing-position-from-depth thing, judging by the emails I get and the threads I see on gamedev forums from people who read my earlier blog posts. I can’t say I blame them…it’s pretty tricky, and easy to screw up. So I’m going to take it from the beginning again and try to explain some of the actual math behind the code, in the hope that a more generalized approach will help people get through the little quirks they’ll have to deal with in their own implementations. And I’ll throw in some shader code too, for good measure. Oh, and I should mention that I’m going to do everything with a left-handed coordinate system, but it shouldn’t be hard to convert to right-handed.
Let’s start with the basics of a perspective projection. Every pixel on the screen has a direction vector associated with it. You can figure out this direction vector by using the screen-space XY position of the pixel to lerp between the positions of the frustum corners, subtracting the camera position, and normalizing (you don’t have to subtract the camera position if you’re doing this in view space, since the camera position is 0 there). If geometry is rasterized at that pixel position, it means the surface at that pixel lies somewhere along that vector. The distance along that vector will vary depending on how far the geometry is from the camera, but the direction is always the same. This should sound familiar to anyone who’s written a ray tracer before, because it’s the exact concept used for primary rays: for each pixel, get the direction from the camera to that pixel’s position on the image plane and check for intersections along it. What this ultimately means is that if we have the screen-space pixel position and the camera position, we can figure out the position of the triangle surface as long as we have the distance from the camera to that surface. Here’s an artfully-crafted diagram showing how this works:
With this in mind, you might be thinking that if we stored the distance from the camera to the triangle surface in a G-Buffer pass, it would be really easy to reconstruct position from it. And you’d be totally right. The basic steps go like this:
- In the pixel shader of the G-Buffer pass, calculate the distance from the camera to the surface being shaded and write it out to a distance texture
- In the vertex shader of the light pass, calculate the direction vector from the camera position to the vertex (we’ll call it the view ray).
- In the pixel shader, normalize the view ray vector
- Sample the distance texture to get the distance from the camera to the G-Buffer surface
- Multiply the sampled distance with the view ray
- Add the camera position
This is simple, cheap, and works in both view space and world space. In view space it’s a little cheaper and easier because the camera position is (0,0,0), so you can simplify the math. Plus the view ray is just the normalized view space position of the pixel. For a full-screen quad, you can get the view space position of the quad vertices either by directly mapping the verts to frustum corners, or by applying the inverse of your projection matrix. Then from there you can go back to world space if you want by applying the inverse of your view matrix (the camera world matrix). Here’s what the code might look like for doing it in world space (since people seem to like sticking to world space, despite the advantages of view space):
    // G-Buffer vertex shader
    // Calculate view space position of the vertex and pass it to the pixel shader
    output.PositionVS = mul(input.PositionOS, WorldViewMatrix).xyz;

    // G-Buffer pixel shader
    // Calculate the length of the view space position to get the distance from camera->surface
    output.Distance.x = length(input.PositionVS);

    // Light vertex shader

    #if PointLight || SpotLight
        // Calculate the world space position for a light volume
        float3 positionWS = mul(input.PositionOS, WorldMatrix);
    #elif DirectionalLight
        // Calculate the world space position for a full-screen quad
        // (assume input vertex coordinates are in [-1,1] post-projection space)
        float3 positionWS = mul(input.PositionOS, InvViewProjMatrix);
    #endif

    // Calculate the view ray
    output.ViewRay = positionWS - CameraPositionWS;

    // Light pixel shader
    // Normalize the view ray, and apply the original distance to reconstruct position
    float3 viewRay = normalize(input.ViewRay);
    float viewDistance = DistanceTexture.Sample(PointSampler, texCoord);
    float3 positionWS = CameraPositionWS + viewRay * viewDistance;
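As a side note, if you’d rather avoid the InvViewProjMatrix transform for the full-screen quad, you can use the other option I mentioned above and map the quad verts directly to frustum corners. Here’s a minimal sketch of one way to do that; FrustumCornersWS and input.VertexID aren’t part of the code above, they’re assumed to be an app-supplied array of the four far clip plane corner positions (in the same order as the quad’s vertices) and an SV_VertexID input:

    // Assumed constant buffer data: world space positions of the four far clip
    // plane corners, uploaded by the app in the same order as the quad vertices
    float3 FrustumCornersWS[4];

    // Full-screen quad vertex shader
    // Look up the corner for this vertex and form the view ray directly,
    // instead of transforming by InvViewProjMatrix
    output.ViewRay = FrustumCornersWS[input.VertexID] - CameraPositionWS;

The pixel shader stays exactly the same, since we still normalize the interpolated ray before scaling it by the sampled distance.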
Like I said, it’s a piece of cake, and I suspect that for a lot of people it’s efficient enough. But we’re not done yet, since we can still optimize things further if we stick to view space. We also may want to use a hardware depth buffer as opposed to manually storing a distance value. So let’s dig deeper. Here’s a diagram showing another way of looking at the problem:
This time, instead of using a normalized direction vector for the view ray, we extrapolate the ray all the way out until it intersects the far clip plane. When we do this, the position at the end of the view ray is at a known depth relative to the camera position and the direction the camera is looking (that depth is the far clip plane distance). In view space it means the view ray has a Z component equal to the far clip distance. Since the Z component is a known value, we no longer need to normalize the view ray vector. Instead we can multiply it by a value that scales it along the camera’s z-axis to get the final reconstructed position. In the case where Z = FarClipDistance, we want to scale by the ratio of the original surface depth to the far clip plane: in other words, the surface’s view space Z divided by the far clip distance. In code it looks like this:
    // G-Buffer vertex shader
    // Calculate view space position of the vertex and pass it to the pixel shader
    output.PositionVS = mul(input.PositionOS, WorldViewMatrix).xyz;

    // G-Buffer pixel shader
    // Divide view space Z by the far clip distance
    output.Depth.x = input.PositionVS.z / FarClipDistance;

    // Light vertex shader

    #if PointLight || SpotLight
        // Calculate the view space vertex position
        output.PositionVS = mul(input.PositionOS, WorldViewMatrix);
    #elif DirectionalLight
        // Calculate the view space vertex position (you can also just directly
        // map the vertex to a frustum corner to avoid the transform)
        output.PositionVS = mul(input.PositionOS, InvProjMatrix);
    #endif

    // Light pixel shader

    #if PointLight || SpotLight
        // Extrapolate the view space position to the far clip plane
        float3 viewRay = float3(input.PositionVS.xy * (FarClipDistance / input.PositionVS.z), FarClipDistance);
    #elif DirectionalLight
        // For a directional light, the vertices were already on the far clip plane
        // so we don't need to extrapolate
        float3 viewRay = input.PositionVS.xyz;
    #endif

    // Sample the depth and scale the view ray to reconstruct view space position
    float normalizedDepth = DepthTexture.Sample(PointSampler, texCoord).x;
    float3 positionVS = viewRay * normalizedDepth;
As you can see this is a bit cheaper, especially for the full-screen quad case. One thing to be aware of is that since we’re now storing a normalized depth, it’s always in the range [0,1]. This means you can store it in a normalized integer format (such as DXGI_FORMAT_R16_UNORM) without having to do any rescaling after you sample it. A floating point format will obviously handle it just fine as well.
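Also, since this version reconstructs a view space position, getting back to world space (like I mentioned earlier) is just one more transform. Here’s a quick sketch, assuming the app provides InvViewMatrix (the camera’s world matrix), which isn’t in the snippets above:

    // Transform the reconstructed view space position back to world space using
    // the inverse of the view matrix (a.k.a. the camera's world matrix).
    // InvViewMatrix is an assumed constant, not part of the code above.
    float3 positionWS = mul(float4(positionVS, 1.0f), InvViewMatrix).xyz;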
Now let’s say we want to sample a hardware depth buffer instead of writing out our own depth or distance value to the G-Buffer. This makes sense in a lot of cases, since you’re already spending the memory and bandwidth to fill the depth buffer, so you might as well make use of it. A hardware depth buffer stores the post-projection Z value divided by the post-projection W value, where W is equal to the view-space Z component of the surface position (for more information see this). That makes the value initially unsuitable for our needs, but fortunately it’s possible to recover the view-space Z from it using the parameters of the perspective projection. Once we do that, we could convert it to a normalized depth value and proceed as before, but that’s unnecessary: instead of extrapolating the view ray to the far clip plane, we can clamp it to the plane at Z = 1 and then scale it by the view-space Z directly, without having to manipulate it first. Here’s the code:
    // Light vertex shader

    #if PointLight || SpotLight
        // Calculate the view space vertex position
        output.PositionVS = mul(input.PositionOS, WorldViewMatrix);
    #elif DirectionalLight
        // For a directional light we can clamp in the vertex shader,
        // since we only interpolate in the XY direction
        float3 positionVS = mul(input.PositionOS, InvProjMatrix);
        output.ViewRay = float3(positionVS.xy / positionVS.z, 1.0f);
    #endif

    // Light pixel shader

    #if PointLight || SpotLight
        // Clamp the view space position to the plane at Z = 1
        float3 viewRay = float3(input.PositionVS.xy / input.PositionVS.z, 1.0f);
    #elif DirectionalLight
        // For a directional light we already clamped in the vertex shader
        float3 viewRay = input.ViewRay.xyz;
    #endif

    // Calculate our projection constants (you should of course do this
    // in the app code, I'm just showing how to do it)
    float ProjectionA = FarClipDistance / (FarClipDistance - NearClipDistance);
    float ProjectionB = (-FarClipDistance * NearClipDistance) / (FarClipDistance - NearClipDistance);

    // Sample the depth and convert to linear view space Z (assume it gets
    // sampled as a floating point value in the range [0,1])
    float depth = DepthTexture.Sample(PointSampler, texCoord).x;
    float linearDepth = ProjectionB / (depth - ProjectionA);
    float3 positionVS = viewRay * linearDepth;
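In case it’s not obvious where ProjectionA and ProjectionB come from, here’s the quick algebra. Assuming the standard left-handed D3D-style perspective projection (where the post-projection W is the view-space Z), the value that ends up in the depth buffer works out to:

$$ depth = \frac{z_{post}}{w_{post}} = \frac{f}{f - n} - \frac{f\,n}{(f - n)\,z_{view}} = ProjectionA + \frac{ProjectionB}{z_{view}} $$

where n and f are the near and far clip distances. Rearranging for the view-space Z gives z_view = ProjectionB / (depth - ProjectionA), which is exactly the linearDepth line in the pixel shader above.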
It’s also possible to use a hardware depth buffer with the first method, if you want to work in an arbitrary coordinate space. The trick is to project the view ray onto the camera’s z axis (AKA the camera’s forward vector or lookAt vector), and use that to figure out a proper scaling value. The light pixel shader goes something like this:
    // Normalize the view ray
    float3 viewRay = normalize(input.ViewRay);

    // Sample the depth buffer and convert it to linear depth
    float depth = DepthTexture.Sample(PointSampler, texCoord).x;
    float linearDepth = ProjectionB / (depth - ProjectionA);

    // Project the view ray onto the camera's z-axis
    float viewZDist = dot(EyeZAxis, viewRay);

    // Scale the view ray by the ratio of the linear z value to the projected view ray
    float3 positionWS = CameraPositionWS + viewRay * (linearDepth / viewZDist);
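One thing I glossed over: EyeZAxis is the camera’s world space forward direction, and it’s not something I defined in the code above. Here’s one possible way to get it, as an assumption rather than the only way, using the same row-vector conventions as the rest of the code (where the camera’s world matrix stores its basis vectors in its rows):

    // Assumed helper: grab the camera's forward (z) axis from its world matrix.
    // With row-vector math (mul(v, M)), rows 0/1/2 of the rotation part are the
    // right/up/forward axes, so the forward axis is row 2.
    float3 EyeZAxis = normalize(CameraWorldMatrix[2].xyz);

The reason the division works is that dot(EyeZAxis, viewRay) is how much view space depth you cover per unit of distance along the normalized ray, so dividing linearDepth by it gives the distance you need to travel along the ray to land on the surface.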
Alright, so I think that’s about it! If anything is unclear or if I made any mistakes, go ahead and let me know. For an actual working sample showing some of these techniques, you can have a look at the sample for my article on depth precision.
02/23/1985 – Fixed typo in view space volume reconstruction