New Tutorial: Using PIX With XNA

Ladies and gentlemen, I present you with the most epic of tutorials: Using PIX With XNA.  This 37-page monster teaches PIX for the XNA programmer, and includes an in-depth explanation of the XNA/D3D9 relationship as well as 6 exercises that show you how to solve common problems (full source code and XNA 3.1 projects included).  I sure hope somebody finds it useful…it took me forever to write this thing.

I originally intended to have this tutorial hosted on Ziggyware…in fact I finished this over a month ago and submitted it to Ziggy.  However, as you may or may not know, Ziggy has become the unfortunate target of scumbag hackers who have repeatedly hijacked his site in order to deploy malware.  The whole thing absolutely sucks…I really wish that those assholes had decided to hijack a site that wasn’t the most comprehensive collection of community-created XNA resources.  I hope Ziggy figures out a way to shake them and get the site up and running again…but it looks doubtful.  Honestly I don’t think I’d want to keep dealing with the kinds of problems he’s gone through.

Scintillating Snippets: Storing Normals Using Spherical Coordinates

Update:  n00body posted this link in the comments, which is way more in-depth than my post.  Check it out!

If you’ve ever implemented a deferred renderer, you know that one of the important points is keeping your G-Buffer small enough to be reasonable in terms of bandwidth and your number of render targets.  Thanks to that constant struggle between good and evil, people have come up with some reasonably clever approaches towards packing necessary attributes in your G-Buffer.  One of the more popular approaches is that whole storing depth and reconstructing position thing, and another is packing normals so that you only need 2 components instead of 3.

One of the simpler and more common approaches is to only store the X and Y components of your view-space normals and then assume Z is positive (or negative, depending on whether you’re using right-handed or left-handed coordinates).  As far as I know, this was first proposed here by Guerrilla Games. However there’s a problem with this approach, which is that you can’t always assume the sign of your Z component when you’re using a perspective projection! This might seem weird at first (heck it took a while for someone to demonstrate to me why this is the case), but I assure you it’s true.  Insomniac has some good pictures here demonstrating the errors that occur.  So this means that if we want to use this technique and avoid errors, we have to pack the sign of Z somewhere in our two values. This is a little nasty, and takes away a bit of precision from one of your other values.
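
For anyone who wants to see the sign flip with actual numbers, here is a small check in plain Python (a CPU-side sketch, not shader code).  It assumes a right-handed view space with the camera at the origin looking down -Z, and the point and normal are made-up values picked purely for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

# Hypothetical view-space surface point, about 27 degrees off the view
# axis, so it sits comfortably inside a typical perspective frustum.
point_vs = (0.5, 0.0, -1.0)

# Hypothetical unit normal with a NEGATIVE Z component, meaning it leans
# away from the camera along the view axis.
normal_vs = normalize((-0.9, 0.0, -0.3))

# The surface is front-facing when its normal points against the ray
# from the eye (the origin) to the point, i.e. dot(normal, point) < 0.
front_facing = dot(normal_vs, point_vs) < 0.0

# Rebuilding Z from X and Y while assuming it is positive gets the
# magnitude right but the sign wrong for this normal.
x, y, z = normal_vs
z_assumed = math.sqrt(max(0.0, 1.0 - x * x - y * y))

print(front_facing, z, z_assumed)
```

The surface is visible even though its normal has a negative Z, which is exactly the case the "assume Z is positive" reconstruction gets wrong.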

An alternative approach suggested to me a long time ago is to store the normal as a spherical coordinate.  Since a normal is always a unit vector with length = 1, you can (safely) assume that Rho = 1 and just store Theta and Phi.  Piece of cake!  All you have to do is implement the equations on the wiki page, take out the Rho’s, and you’ve got a two-component normal with excellent precision.

But wait, there’s more!  It turns out if you use some trig-fu, you can actually further optimize the conversions when Rho is equal to 1.  I was never actually good at simplifying equations with trig functions (I can do everything else, promise!) so I defer to the noble Pat Wilson who gave a quick rundown over in this thread.  Make sure you check out his set of screenshots that demonstrate the errors that occur from different normal storage options, so you can pick which method is right for you.

Also since this is Scintillating Snippets and it wouldn’t be much fun without a snippet, I’ll post the HLSL functions I use for encoding and decoding my normals.  Just remember, all of the credit goes to Mr. Wilson.  I just did the pilfering!

// Converts a normalized cartesian direction vector
// to spherical coordinates.
float2 CartesianToSpherical(float3 cartesian)
{
  float2 spherical;

  spherical.x = atan2(cartesian.y, cartesian.x) / 3.14159f;
  spherical.y = cartesian.z;

  return spherical * 0.5f + 0.5f;
}

// Converts a spherical coordinate to a normalized
// cartesian direction vector.
float3 SphericalToCartesian(float2 spherical)
{
  float2 sinCosTheta, sinCosPhi;

  spherical = spherical * 2.0f - 1.0f;
  sincos(spherical.x * 3.14159f, sinCosTheta.x, sinCosTheta.y);
  sinCosPhi = float2(sqrt(1.0 - spherical.y * spherical.y), spherical.y);

  return float3(sinCosTheta.y * sinCosPhi.x, sinCosTheta.x * sinCosPhi.x, sinCosPhi.y);    
}

Also keep in mind that these functions normalize the values to the range [0,1], so that you can store them in a regular fixed-point texture. If you’re using a floating point texture you can remove the division by PI if you wish (and the corresponding multiply by PI in the decode), as well as the “multiply by 0.5, add 0.5” at the end of the encode (and the matching “multiply by 2, subtract 1” in the decode).
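
If you want to sanity-check the math off the GPU, the two functions port to plain Python almost line for line.  This is only a sketch for verifying the round trip: the function names mirror the HLSL above, the test normal is an arbitrary unit vector, and the max() guard against a tiny negative value under the square root is my own addition.

```python
import math

def cartesian_to_spherical(n):
    """Encode a unit vector as two values in the range [0, 1]."""
    x, y, z = n
    theta = math.atan2(y, x) / math.pi          # angle, scaled to [-1, 1]
    return (theta * 0.5 + 0.5, z * 0.5 + 0.5)   # z is cos(phi), in [-1, 1]

def spherical_to_cartesian(s):
    """Decode the two stored values back into a unit vector."""
    theta = (s[0] * 2.0 - 1.0) * math.pi
    cos_phi = s[1] * 2.0 - 1.0
    # Guard added for this sketch: avoid sqrt of a tiny negative number.
    sin_phi = math.sqrt(max(0.0, 1.0 - cos_phi * cos_phi))
    return (math.cos(theta) * sin_phi, math.sin(theta) * sin_phi, cos_phi)

# Arbitrary unit-length test normal: 0.36 + 0.2304 + 0.4096 = 1
normal = (0.6, -0.48, -0.64)
encoded = cartesian_to_spherical(normal)
decoded = spherical_to_cartesian(encoded)

print(encoded)
print(decoded)
```

Note that the sign of Z survives the round trip, which is the whole point compared to storing only X and Y.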

Reconstructing Position From Depth, Continued

Picking up where I left off here

As I mentioned, you can also reconstruct a world-space position using the frustum ray technique.  The first step is that you need your frustum corners to be rotated so that they match the current orientation of your camera.  You can do this by transforming the frustum corners by a “camera world matrix”, which is a matrix representing the camera’s position and orientation in world-space.  If you don’t have this available you can just invert your view matrix.  Since we only need the rotation here, you can actually get that part by transposing the view matrix’s upper-left 3x3 (the rotation portion should be orthogonal unless you’re doing something really really weird).  I’ll demonstrate doing it right in the vertex shader for the sake of simplicity, but you’d probably want to do it ahead of time in your application code.

// Vertex shader for rendering a full-screen quad
void QuadVS (	in float3 in_vPositionOS		: POSITION,
		in float3 in_vTexCoordAndCornerIndex	: TEXCOORD0,
		out float4 out_vPositionCS		: POSITION,
		out float2 out_vTexCoord		: TEXCOORD0,
		out float3 out_vFrustumCornerWS		: TEXCOORD1	)
{
	// Offset the position by half a pixel to correctly
	// align texels to pixels. Only necessary for D3D9 or XNA
	out_vPositionCS.x = in_vPositionOS.x - (1.0f/g_vOcclusionTextureSize.x);
	out_vPositionCS.y = in_vPositionOS.y + (1.0f/g_vOcclusionTextureSize.y);
	out_vPositionCS.z = in_vPositionOS.z;
	out_vPositionCS.w = 1.0f;

	// Pass along the texture coordinate and the position
	// of the frustum corner in world-space.  This frustum corner
	// position is interpolated so that the pixel shader always
	// has a ray from camera->far-clip plane
	out_vTexCoord = in_vTexCoordAndCornerIndex.xy;
	float3 vFrustumCornerVS = g_vFrustumCornersVS[in_vTexCoordAndCornerIndex.z];
	// Rotate only: use the upper-left 3x3 of the camera world matrix
	out_vFrustumCornerWS = mul(vFrustumCornerVS, (float3x3)g_matCameraWorld);
}

So what we’ve done here is we’ve rotated (not translated, since vFrustumCornerVS is only a float3) the view-space frustum corner so that it now matches the camera’s orientation.  However it’s still centered around <0,0,0> and not the camera’s world-space position, so when we reconstruct position we’ll also add the camera’s world-space position:

// Pixel shader function for reconstructing world-space position
float3 WSPositionFromDepth(float2 vTexCoord, float3 vFrustumRayWS)
{
	float fPixelDepth = tex2D(DepthSampler, vTexCoord).r;
	return g_vCameraPosWS + fPixelDepth * vFrustumRayWS;
}
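
Here is a quick numeric check of that reconstruction in plain Python, as a sketch rather than anything from my engine.  It assumes a right-handed view space (camera looking down -Z), HLSL-style row vectors, and a made-up camera that is yawed 90 degrees.  The ray is computed directly for one pixel, which stands in for the value the interpolator would hand the pixel shader.

```python
def mul_vec_mat3(v, m):
    """Row vector times 3x3 matrix, like HLSL mul(v, (float3x3)m)."""
    return tuple(sum(v[r] * m[r][c] for r in range(3)) for c in range(3))

far_clip = 100.0

# Hypothetical camera: rows are the camera's axes expressed in world space.
camera_rotation = ((0.0, 0.0, -1.0),   # right
                   (0.0, 1.0, 0.0),    # up
                   (1.0, 0.0, 0.0))    # backward (+Z of view space)
camera_pos_ws = (10.0, 5.0, 3.0)

# A surface point in view space, and the world position we expect back.
position_vs = (2.0, 1.0, -50.0)
rotated = mul_vec_mat3(position_vs, camera_rotation)
expected_ws = tuple(r + c for r, c in zip(rotated, camera_pos_ws))

# What the depth pass would store: normalized linear view-space depth.
depth = -position_vs[2] / far_clip

# Ray from the camera through this pixel to the far-clip plane,
# rotated (not translated) into world space.
scale = far_clip / -position_vs[2]
ray_vs = tuple(c * scale for c in position_vs)
ray_ws = mul_vec_mat3(ray_vs, camera_rotation)

# Same math as WSPositionFromDepth: camera position + depth * ray.
reconstructed_ws = tuple(c + depth * r for c, r in zip(camera_pos_ws, ray_ws))

print(expected_ws, reconstructed_ws)
```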

And there it is. Easy peasy, lemon squeezy.

The other bit I hinted at was using this same technique with arbitrary geometry, for example the bounding volumes for a local light source.  For this we once again need a ray that points from the camera position through the pixel position to the far-clip plane.  We can do this in the pixel shader by using the view-space position of the pixel.

void VSBoundingVolume(  in float4 in_vPositionOS       : POSITION,
                        out float4 out_vPositionCS     : POSITION,
                        out float3 out_vPositionVS     : TEXCOORD0 )
{
    // The position is a float4 (w = 1) so that the translation
    // in the matrices is applied
    out_vPositionCS = mul(in_vPositionOS, g_matWorldViewProj);

    // Pass along the view-space vertex position to the pixel shader
    out_vPositionVS = mul(in_vPositionOS, g_matWorldView).xyz;
}

Then in our pixel shader, we calculate the ray and reconstruct position like this:

float3 VSPositionFromDepth(float2 vTexCoord, float3 vPositionVS)
{
    // Calculate the frustum ray using the view-space position.
    // g_fFarClip is the distance to the camera's far clipping plane.
    // Negating the Z component is only necessary for right-handed coordinates
    float3 vFrustumRayVS = vPositionVS.xyz * (g_fFarClip/-vPositionVS.z);
    return tex2D(DepthSampler, vTexCoord).x * vFrustumRayVS;
}
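
The thing that makes this work is that the pixel on the bounding volume and the scene surface behind it lie on the same ray from the camera.  Here is a small plain-Python sketch with made-up numbers, assuming right-handed view space and a far-clip distance of 100:

```python
far_clip = 100.0

# Hypothetical view-space position of a pixel on the light's bounding volume.
volume_position_vs = (1.0, 0.5, -25.0)

# Hypothetical scene surface seen through that pixel. It lies on the same
# ray from the camera (the origin), just twice as far away.
surface_position_vs = (2.0, 1.0, -50.0)

# What the depth pass stored for the surface: -z / far_clip.
stored_depth = -surface_position_vs[2] / far_clip

# Extend the bounding-volume position until it reaches the far-clip plane.
# The negation is for right-handed coordinates, where view-space Z is negative.
scale = far_clip / -volume_position_vs[2]
frustum_ray_vs = tuple(c * scale for c in volume_position_vs)

# Scaling the ray by the stored depth lands on the scene surface,
# not on the bounding volume.
reconstructed_vs = tuple(stored_depth * c for c in frustum_ray_vs)

print(frustum_ray_vs, reconstructed_vs)
```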

So there you go, I did your homework for you.  Now stop beating me up in the schoolyard!

EDIT: Fixed the code and explanation so that it actually works now!  Big thanks to Bill and Josh for pointing out the mistake.

There’s More Than One Way To Defer A Renderer

While the idea of deferred shading/deferred rendering isn’t quite as hot as it was a year or two ago (OMG, Killzone 2 uses deferred rendering!), it’s still a cool idea that gets discussed rather often.  People generally tend to be attracted to the way a “pure” deferred renderer neatly and cleanly separates your geometry from your lighting, as well as the idea of being able to throw lights everywhere in their scene.  However as anyone who’s done a little bit of research into the topic surely knows, it comes with a few drawbacks.  The main ones being that for MSAA you need to individually light all your subsamples (which isn’t doable in D3D9), and also that for non-opaque objects you have to use forward rendering anyway.

The neat thing about the concepts involved with deferred shading is that you’re not at all locked into the typical “render depth+normals+diffuse+specular to a fat G-Buffer and then shade” approach.  I’m not sure enough people are aware of this, and appreciate it.  For example, you can just defer your shadow map calculations to gain the related performance and organization benefits, and then use standard forward rendering techniques for everything else.  Or you can reconfigure the deferred lighting pipeline to gain back the ability to have multiple materials, or the ability to multisample without shading individual subsamples.  Surely there are even more possibilities!

Recently while working on my own game, I was grappling with the issue of having my engine support more local light sources in a scene.  I was using standard forward lighting with up to 3 lights per pass (which was fine), but I really wanted to keep my DrawPrimitives calls to a minimum (due to how painful they can be on the 360).  This was a problem since I’m aggressively batching my mesh rendering using instancing, and sorting instances by which light affects them would cause my batches to increase.  Thus, I was using 3 “global” light sources per frame.  This has obvious drawbacks.

While I was thinking over solutions, I considered the importance of smaller local lights that are relatively far away in the scene.  At further distances, it’s not necessarily too important to have “correct” lighting.  In fact, we basically just need something that’s the right color, makes the area brighter, and doesn’t shade surfaces facing away from the light source.  So I thought: “I already have view-space depth…if I can calculate view-space normals I can get what I want by using a deferred pass”.  So I did exactly this…and it didn’t work very well.  The problem was that even though you can calculate a view-space normal from a depth value by calculating the partial derivatives and taking a cross product, the normals you calculate aren’t smoothly interpolated between vertices.  So what you get is something that looks an awful lot like flat shading.  Ewwwwwwwwwwww.
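
To show why the derivative approach comes out flat, here is a plain-Python sketch that approximates the partial derivatives with neighbor differences on a single tilted triangle.  The plane, the sample spacing, and the cross product order (chosen so the normal faces the camera in right-handed view space) are all assumptions for this illustration.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

def surface_point(x, y):
    """Hypothetical flat triangle lying in the plane z = -10 + 0.5 * x."""
    return (x, y, -10.0 + 0.5 * x)

def normal_from_positions(x, y, step=0.1):
    center = surface_point(x, y)
    ddx = sub(surface_point(x + step, y), center)   # neighbor to the right
    ddy = sub(surface_point(x, y - step), center)   # neighbor below (screen down)
    return normalize(cross(ddy, ddx))

# Two different pixels on the same triangle give the same normal, no
# matter what the interpolated vertex normals would have been there.
normal_a = normal_from_positions(0.0, 0.0)
normal_b = normal_from_positions(1.0, 1.0)

print(normal_a, normal_b)
```

Every pixel on the triangle gets the face normal, so curved meshes end up looking faceted.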

This led to approach #2:  in the depth-only pass, render to a RGBA16F surface instead of a R32F surface and render out depth + view-space normals as interpolated from the vertex normals.  This worked much better!  The only remaining issue (aside from the fact that I just hard-code a diffuse albedo and specular albedo), is that normal-maps aren’t used.  However even with those problems the results are still decent, as long as surface colors are primarily determined by your forward rendering pass and the local lights are just “extra”.  Here’s screenshots of a test scene with forward rendering, and then with the point lights deferred:

The results are clearly not as good as a full forward pass when you have them side-by-side, but I think they’re probably good enough…especially if I only use this technique for lights that are small or far-away.  The trick is going to be transferring smoothly from deferred to forward, but that’s certainly doable.

One downside that came with this was that since I was just additively blending in the lights, I couldn’t use my beloved LogLuv encoding for HDR.  My next-best option on the 360 was to normalize R10G10B10A2 to a range greater than [0,1].  I ended up having to normalize to [0,8] to get the dynamic range I wanted, and unfortunately this can give some visible banding in certain cases.  An alternative I’ll have to explore is rendering just the point lights to an R10G10B10A2 buffer, and then sending this to my forward rendering pass to be sampled and added to the result.  If I did this I could also use the light prepass approach, and gain back material parameters and proper MSAA for the point lights.
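
For a rough feel of where the banding comes from, here is a back-of-the-envelope sketch in plain Python.  It assumes each 10-bit channel is a plain unsigned-normalized value with 1023 evenly spaced steps, which is a simplification and may not match every 10-bit format the hardware offers.

```python
def quantization_step(bits, max_value):
    """Smallest representable difference when [0, max_value] is spread
    evenly over an unsigned-normalized channel with the given bit count."""
    return max_value / (2 ** bits - 1)

def quantize(value, bits, max_value):
    """Round a value to the nearest representable level."""
    levels = 2 ** bits - 1
    clamped = min(max(value, 0.0), max_value)
    return round(clamped / max_value * levels) / levels * max_value

step_10bit_hdr = quantization_step(10, 8.0)   # 10 bits stretched over [0, 8]
step_10bit_ldr = quantization_step(10, 1.0)   # 10 bits over [0, 1]
step_8bit_ldr = quantization_step(8, 1.0)     # ordinary 8-bit channel

print(step_10bit_hdr, step_10bit_ldr, step_8bit_ldr)
print(quantize(0.5, 10, 8.0))
```

Under that assumption, stretching 10 bits over [0,8] leaves steps about twice as coarse as a regular 8-bit channel covering [0,1], so it isn’t too surprising that smooth gradients start to band.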

Anyway I’m not saying that what I’m doing is particularly interesting or useful, I’m just trying to demonstrate that there are many possibilities to explore.  It’s good to think outside the box every once in a while!

Scintillating Snippets: Reconstructing Position From Depth

There are times I wish I’d never responded to this thread over at GDnet, simply because of the constant stream of PM’s that I still get about it.  Wouldn’t it be nice if I could just pull out all the important bits, stick it on some blog, and then link everyone to it?  You’re right, it would be!

First things first: what am I talking about?  I’m talking about something that finds great use for deferred rendering: reconstructing the 3D position of a previously-rendered pixel (either in view-space or world-space) from a single depth value.  In practice, it’s really not terribly complicated.  You intrinsically know (or can figure out) the 2D position of any pixel when you’re shading it, which means that if you can sample a depth value you can get the whole 3D position.  However it’s still easy to get tripped up due to the fact that there are several ways to go about it, coupled with the fact that many beginners aren’t very proficient at debugging their shaders.

Let’s talk about the first way to do it: storing post-projection z/w, combining it with x/w and y/w, transforming by the inverse of the projection matrix, and dividing by w.  In HLSL it looks something like this…

// Depth pass vertex shader
output.vPositionCS = mul(input.vPositionOS, g_matWorldViewProj);
output.vDepthCS.xy = output.vPositionCS.zw;

// Depth pass pixel shader (output z/w)
return input.vDepthCS.x / input.vDepthCS.y;

// Function for converting depth to view-space position
// in deferred pixel shader pass.  vTexCoord is a texture
// coordinate for a full-screen quad, such that x=0 is the
// left of the screen, and y=0 is the top of the screen.
float3 VSPositionFromDepth(float2 vTexCoord)
{
    // Get the depth value for this pixel
    float z = tex2D(DepthSampler, vTexCoord).r;
    // Get x/w and y/w from the viewport position
    float x = vTexCoord.x * 2 - 1;
    float y = (1 - vTexCoord.y) * 2 - 1;
    float4 vProjectedPos = float4(x, y, z, 1.0f);
    // Transform by the inverse projection matrix
    float4 vPositionVS = mul(vProjectedPos, g_matInvProjection);  
    // Divide by w to get the view-space position
    return vPositionVS.xyz / vPositionVS.w;  
}

For many this is the preferred approach since it works with hardware depth buffers.  It also may seem natural to some: we get depth by projection, we get position by un-projecting.  But what if we don’t have access to a hardware depth buffer?  If you’re targeting the PC and D3D9, sampling from a depth buffer as if it were a texture is not straightforward since it requires driver hacks.  If you’re using XNA, it’s not possible at all since the framework generally attempts to maintain cross-platform compatibility between the PC and the Xbox 360.  In these cases, we can simply render out a depth buffer ourselves using the vertex and pixel shader bits I posted above.  But is this really a good idea?  z/w is non-linear, and most of the precision will be dedicated to areas very close to the near-clip plane.
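
To put rough numbers on that precision claim, here is a plain-Python sketch comparing z/w against plain linear depth.  It assumes a D3D-style perspective projection that maps the near plane to 0 and the far plane to 1, with made-up clip distances of 1 and 1000.

```python
def post_projection_depth(distance, near_clip, far_clip):
    """z/w for a D3D-style perspective projection, where distance is how
    far in front of the camera the point is (0 at near, 1 at far)."""
    return far_clip / (far_clip - near_clip) * (1.0 - near_clip / distance)

def linear_depth(distance, far_clip):
    """Normalized view-space depth: distance divided by the far-clip distance."""
    return distance / far_clip

near_clip, far_clip = 1.0, 1000.0
for distance in (1.0, 2.0, 10.0, 100.0, 1000.0):
    print(distance,
          round(post_projection_depth(distance, near_clip, far_clip), 4),
          round(linear_depth(distance, far_clip), 4))
```

With these values z/w has already passed 0.9 at a distance of 10, so roughly 90% of the stored range is spent on the first 1% of the view distance.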

A different approach would be to render out normalized view-space z as our depth.  Since it’s view-space it’s linear which means we get uniform precision distribution, and this also means we don’t need to bother with projection or unprojection to reconstruct position.  Instead we can take the approach of Crytek and multiply the depth value with a ray pointing from the camera to the far-clip plane.  In HLSL it goes something like this:

// Shaders for rendering linear depth
void DepthVS(   in float4 in_vPositionOS    : POSITION,
                out float4 out_vPositionCS  : POSITION,
                out float  out_fDepthVS     : TEXCOORD0    )
{    
    // Figure out the position of the vertex in
    // view space and clip space
    float4x4 matWorldView = mul(g_matWorld, g_matView);
    float4 vPositionVS = mul(in_vPositionOS, matWorldView);
    out_vPositionCS = mul(vPositionVS, g_matProj);
    out_fDepthVS = vPositionVS.z;
}

float4 DepthPS(in float in_fDepthVS : TEXCOORD0) : COLOR0
{
    // Negate and divide by distance to far-clip plane
    // (so that depth is in range [0,1])
    // This is for right-handed coordinate system,
    // for left-handed negating is not necessary.
    float fDepth = -in_fDepthVS/g_fFarClip;
    return float4(fDepth, 1.0f, 1.0f, 1.0f);
}

// Shaders for deferred pass where position is reconstructed

// Vertex shader for rendering a full-screen quad
void QuadVS (	in float3 in_vPositionOS		: POSITION,
		in float3 in_vTexCoordAndCornerIndex	: TEXCOORD0,
		out float4 out_vPositionCS		: POSITION,
		out float2 out_vTexCoord		: TEXCOORD0,
		out float3 out_vFrustumCornerVS		: TEXCOORD1	)
{
	// Offset the position by half a pixel to correctly
	// align texels to pixels. Only necessary for D3D9 or XNA
	out_vPositionCS.x = in_vPositionOS.x - (1.0f/g_vOcclusionTextureSize.x);
	out_vPositionCS.y = in_vPositionOS.y + (1.0f/g_vOcclusionTextureSize.y);
	out_vPositionCS.z = in_vPositionOS.z;
	out_vPositionCS.w = 1.0f;

	// Pass along the texture coordinate and the position
	// of the frustum corner in view-space.  This frustum corner
	// position is interpolated so that the pixel shader always
	// has a ray from camera->far-clip plane
	out_vTexCoord = in_vTexCoordAndCornerIndex.xy;
	out_vFrustumCornerVS = g_vFrustumCornersVS[in_vTexCoordAndCornerIndex.z];
}

// Pixel shader function for reconstructing view-space position
float3 VSPositionFromDepth(float2 vTexCoord, float3 vFrustumRayVS)
{
	float fPixelDepth = tex2D(DepthSampler, vTexCoord).r;
	return fPixelDepth * vFrustumRayVS;
}

As you can see the reconstruction is quite nice with linear depth: we only need a single multiply instead of the 4 MADD’s and a divide needed for unprojection.  If you’re curious on how to get the frustum corner positions I use, it’s rather easy with a little trig.  This tutorial walks you through it.  Or if you’re using XNA, there’s a super-convenient BoundingFrustum class that can take care of it for you.  My code for getting the positions looks something like this:

Matrix viewProjMatrix = viewMatrix * projMatrix;
BoundingFrustum frustum = new BoundingFrustum(viewProjMatrix);
frustum.GetCorners(frustumCornersWS);
Vector3.Transform(frustumCornersWS, ref viewMatrix, frustumCornersVS);
for (int i = 0; i < 4; i++)
    farFrustumCornersVS[i] = frustumCornersVS[i + 4];

The farFrustumCornersVS array is what I send to my vertex shader as shader constants. Then you just need to have an index in your quad vertices that tells you which vertex belongs to which corner (which you could also do with shader math, if you want).  Another approach would be to simply store the corner positions directly in the vertices as texCoord’s.
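
If you’d rather go the trig route, it boils down to something like the following plain-Python sketch.  The function name and the corner ordering are hypothetical (match the order to whatever corner index your quad vertices use), and it assumes a right-handed view space with a symmetric perspective frustum and a vertical field of view.

```python
import math

def far_frustum_corners_vs(fov_y_radians, aspect_ratio, far_clip):
    """View-space corners of the far-clip plane for a symmetric frustum."""
    half_height = far_clip * math.tan(fov_y_radians * 0.5)
    half_width = half_height * aspect_ratio
    return [
        (-half_width,  half_height, -far_clip),   # top-left
        ( half_width,  half_height, -far_clip),   # top-right
        ( half_width, -half_height, -far_clip),   # bottom-right
        (-half_width, -half_height, -far_clip),   # bottom-left
    ]

# Example: 90 degree vertical FOV, 2:1 aspect ratio, far-clip at 100 units.
corners = far_frustum_corners_vs(math.radians(90.0), 2.0, 100.0)
for corner in corners:
    print(corner)
```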

Extra Credit:  this technique can also be used to reconstruct world-space position, if that’s what you’re after.  All you need to do is rotate (not translate) your frustum corner positions by the inverse of your view matrix to get them back into world space.  Then when you multiply the interpolated ray with your depth value, you simply add the camera position to the value (ends up being a single MADD).

Extra-Extra Credit: you can use this technique with arbitrary geometry too, not just quads.  You just need to figure out a texture coordinate for each pixel, which you can do by either interpolating the clip-space position and dividing x and y by w, or by using the VPOS semantic.  Then for your frustum ray you just calculate the eye->vertex vector and scale it so that it points all the way back to the far-clip plane.
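
As a hint for the texture coordinate part, the clip-space route works out to something like this plain-Python sketch.  The function name is made up, and it leaves out the D3D9 half-pixel offset to keep the math visible.

```python
def clip_to_texcoord(clip_pos):
    """Map a clip-space position (x, y, z, w) to a texture coordinate with
    (0, 0) at the top-left of the screen and (1, 1) at the bottom-right."""
    x, y, _, w = clip_pos
    ndc_x = x / w               # -1 at the left edge, +1 at the right edge
    ndc_y = y / w               # -1 at the bottom edge, +1 at the top edge
    return (ndc_x * 0.5 + 0.5, -ndc_y * 0.5 + 0.5)

top_left = clip_to_texcoord((-5.0, 5.0, 1.0, 5.0))
center = clip_to_texcoord((0.0, 0.0, 1.0, 5.0))
lower_right_quadrant = clip_to_texcoord((2.5, -2.5, 1.0, 5.0))

print(top_left, center, lower_right_quadrant)
```

The Y flip is there because clip space has +Y pointing up the screen while texture coordinates have Y increasing downward.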

UPDATE:  Answers to extra credit questions here

Deferred Cascaded Shadow Maps

For my next sample I was planning on extending my deferred shadow maps sample to implement cascaded shadow maps.  I got an email asking about how to make the sample look decent with large viewing distances, which is exactly the problem CSM’s solve.  So I decided to bump up my plans a little early and get the code up and running.  It’ll be a while before I get the write-up finished, but until then feel free to play around with the code (PC and 360 projects included).

Deferred Shadow Maps Sample

Got a new sample ready, this one shows how you can defer shadow map calculations to a separate screen-space pass using a depth buffer.  Check it out on Ziggyware!


Teach Your Effect’s A New Trick

The Effects Framework is a pretty damn awesome tool.  However I’m afraid that’s not totally obvious to a lot of newbies, who either just don’t know what it can do or haven’t been exposed to some of the situations where Effects can really come in handy.

One neat thing Effects can do that isn’t obvious from the documentation or samples is auto-generate variants of shaders for you based on the value of uniform parameters.  For instance let’s take a common scenario: let’s say you have a shader for a model, and you need it to work for either a point light, a spot light, or a directional light one-at-a-time.  You might write your shader code like this:

int g_iLightType;

float4 ModelPixelShader(in PSInput input) : COLOR0
{
    float4 vColor;
    if (g_iLightType == LIGHT_TYPE_POINT)
        vColor = DoPointLighting(input);
    else if (g_iLightType == LIGHT_TYPE_SPOT)
        vColor = DoSpotLighting(input);
    else
        vColor = DoDirectionalLighting(input);

    return vColor;
}

Alright, so this works.  The app sets the g_iLightType shader parameter, and the right calculations get used. However is it optimal?  We’ve got these if statements in there, and maybe we’re not sure what they’ll get compiled into depending on the shader profile we’re targeting.  And maybe we’re not sure what the heck the driver is going to do once it gets the compiled shader.  Wouldn’t it be nice if we could avoid that?  Of course it would.  So let’s make some small changes:

float4 ModelPixelShader(in PSInput input, uniform int iLightType) : COLOR0
{
    float4 vColor;
    if (iLightType == LIGHT_TYPE_POINT)
        vColor = DoPointLighting(input);
    else if (iLightType == LIGHT_TYPE_SPOT)
        vColor = DoSpotLighting(input);
    else
        vColor = DoDirectionalLighting(input);

    return vColor;
}

technique PointLight
{
    pass p0
    {
        VertexShader = compile vs_2_0 ModelVertexShader();
        PixelShader = compile ps_2_0 ModelPixelShader(LIGHT_TYPE_POINT);       
    }
}

technique SpotLight
{
    pass p0
    {
        VertexShader = compile vs_2_0 ModelVertexShader();
        PixelShader = compile ps_2_0 ModelPixelShader(LIGHT_TYPE_SPOT);       
    }
}

technique DirectionalLight
{
    pass p0
    {
        VertexShader = compile vs_2_0 ModelVertexShader();
        PixelShader = compile ps_2_0 ModelPixelShader(LIGHT_TYPE_DIRECTIONAL);       
    }
}

Very similar, but one big difference: the HLSL code branches on a uniform int parameter to the pixel shader function, whose value is set in our technique declaration.  This means that the Effect knows that this parameter has a constant value for that entire technique, which allows it to generate a separate shader for each technique where the parameter is a constant and not a variable.  Since it’s a constant for each shader variant, no branching of any sort is necessary.  Now our app just picks the technique it wants for each light source it’s handling, rather than setting a shader parameter.

Now keep in mind that using separate shaders like this will have performance implications:  switching vertex or pixel shaders has an associated overhead, and if you auto-generate different variants like we did above you’ll be switching shaders more than if you used one big shader.  Whether or not it’s a performance win will depend on what you’re doing.  However, it’s always good to be aware of all the neat tricks your tools can pull off.