Reconstructing Position From Depth, Continued

Picking up where I left off here

As I mentioned, you can also reconstruct a world-space position using the frustum ray technique.  The first step is that you need your frustum corners to be rotated so that they match the current orientation of your camera.  You can do this by transforming the frustum corners by a “camera world matrix”, which is a matrix representing the camera’s position and orientation in world-space.  If you don’t have this available you can just invert your view matrix.  I’ll demonstrate doing it right in the vertex shader for the sake of simplicity, but you’d probably want to do it ahead of time in your application code.

// Vertex shader for rendering a full-screen quad
void QuadVS (	in float3 in_vPositionOS		: POSITION,
		in float3 in_vTexCoordAndCornerIndex	: TEXCOORD0,
		out float4 out_vPositionCS		: POSITION,
		out float2 out_vTexCoord		: TEXCOORD0,
		out float3 out_vFrustumCornerWS		: TEXCOORD1	)
{
	// Offset the position by half a pixel to correctly
	// align texels to pixels. Only necessary for D3D9 or XNA
	out_vPositionCS.x = in_vPositionOS.x - (1.0f/g_vOcclusionTextureSize.x);
	out_vPositionCS.y = in_vPositionOS.y + (1.0f/g_vOcclusionTextureSize.y);
	out_vPositionCS.z = in_vPositionOS.z;
	out_vPositionCS.w = 1.0f;

	// Pass along the texture coordinate and the position
	// of the frustum corner in world-space.  This frustum corner
        // position is interpolated so that the pixel shader always
        // has a ray from camera->far-clip plane
	out_vTexCoord = in_vTexCoordAndCornerIndex.xy;
	float3 vFrustumCornerVS = g_vFrustumCornersVS[in_vTexCoordAndCornerIndex.z];
        out_vFrustumCornerWS = mul(vFrustumCornerVS, g_matCameraWorld);
}

So what we’ve done here is rotate (not translate, since vFrustumCornerVS is only a float3) the view-space frustum corner so that it now matches the camera’s orientation.  However it’s still centered around <0,0,0> and not the camera’s world-space position, so when we reconstruct position we’ll also add the camera’s world-space position:

// Pixel shader function for reconstructing world-space position
float3 WSPositionFromDepth(float2 vTexCoord, float3 vFrustumRayWS)
{
	float fPixelDepth = tex2D(DepthSampler, vTexCoord).r;
	return g_vCameraPosWS + fPixelDepth * vFrustumRayWS;
}

And there it is. Easy peasy, lemon squeezy.
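If you'd like to sanity-check that math on the CPU before chasing it through a shader debugger, here's a minimal C# sketch.  It assumes System.Numerics stands in for your math library (same row-vector convention as XNA), and the class and method names are mine, not part of any sample code.

```csharp
using System;
using System.Numerics;

public static class WorldSpaceReconstruction
{
    // Rotates a view-space frustum ray into world space (no translation),
    // scales it by normalized linear depth, then offsets by the camera position.
    public static Vector3 Reconstruct(Matrix4x4 view, Vector3 frustumRayVS, float depth)
    {
        // The "camera world matrix" is simply the inverse of the view matrix.
        Matrix4x4.Invert(view, out Matrix4x4 cameraWorld);

        // TransformNormal uses only the upper 3x3 of the matrix, which for a
        // camera matrix is its rotation.
        Vector3 frustumRayWS = Vector3.TransformNormal(frustumRayVS, cameraWorld);

        // The camera's world-space position is the translation row.
        return cameraWorld.Translation + depth * frustumRayWS;
    }
}
```

Feed it a ray that ends on the far-clip plane and a depth of -z/farClip for some view-space point, and you should get that point's world-space position back.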

The other bit I hinted at was using this same technique with arbitrary geometry, for example the bounding volumes for a local light source.  For this we once again need a ray that points from the camera position through the pixel position to the far-clip plane.  We can do this in the pixel shader by using the view-space position of the pixel.

void VSBoundingVolume(  in float4 in_vPositionOS       : POSITION,
                        out float4 out_vPositionCS     : POSITION,
                        out float3 out_vPositionVS     : TEXCOORD0 )
{
    // The position is taken as a float4 (w = 1) so that the translation
    // part of the matrices is applied
    out_vPositionCS = mul(in_vPositionOS, g_matWorldViewProj);

    // Pass along the view-space vertex position to the pixel shader
    out_vPositionVS = mul(in_vPositionOS, g_matWorldView).xyz;
}

Then in our pixel shader, we calculate the ray and reconstruct position like this:

float3 VSPositionFromDepth(float2 vTexCoord, float3 vPositionVS)
{
    // Calculate the frustum ray using the view-space position.
    // g_fFarClip is the distance to the camera's far clipping plane.
    // Negating the Z component only necessary for right-handed coordinates
    float3 vFrustumRayVS = vPositionVS.xyz * (g_fFarClip/-vPositionVS.z);
    return tex2D(DepthSampler, vTexCoord).x * vFrustumRayVS;
}

So there you go, I did your homework for you.  Now stop beating me up in the schoolyard!
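For the skeptical, here's the same ray scaling as a small CPU-side C# sketch.  It assumes a right-handed view space (visible points have negative Z) and uses System.Numerics; the names are made up for illustration.

```csharp
using System;
using System.Numerics;

public static class FrustumRayFromPosition
{
    // Stretches a view-space position along its own direction until its
    // depth lands exactly on the far-clip plane.
    public static Vector3 FrustumRay(Vector3 positionVS, float farClip)
    {
        return positionVS * (farClip / -positionVS.Z);
    }

    // Multiplies the ray by the normalized linear depth sampled for the pixel.
    public static Vector3 Reconstruct(Vector3 positionVS, float farClip, float depth)
    {
        return depth * FrustumRay(positionVS, farClip);
    }
}
```

Note that positionVS is the position on the bounding volume, while depth belongs to whatever scene geometry was rendered at that pixel.  Both lie on the same ray from the camera, which is why this works.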

EDIT: Fixed the code and explanation so that it actually works now!  Big thanks to Bill and Josh for pointing out the mistake.

UPDATE: More position from depth goodness here

There’s More Than One Way To Defer A Renderer

While the idea of deferred shading/deferred rendering isn’t quite as hot as it was a year or two ago (OMG, Killzone 2 uses deferred rendering!), it’s still a cool idea that gets discussed rather often.  People generally tend to be attracted to the way a “pure” deferred renderer neatly and cleanly separates your geometry from your lighting, as well as the idea of being able to throw lights everywhere in their scene.  However as anyone who’s done a little bit of research into the topic surely knows, it comes with a few drawbacks.  The main ones are that for MSAA you need to individually light all your subsamples (which isn’t doable in D3D9), and that for non-opaque objects you have to use forward rendering anyway.

The neat thing about the concepts involved with deferred shading is that you’re not at all locked into the typical “render depth+normals+diffuse+specular to a fat G-Buffer and then shade” approach.  I’m not sure enough people are aware of this, and appreciate it.  For example, you can just defer your shadow map calculations to gain the related performance and organization benefits, and then use standard forward rendering techniques for everything else.  Or you can reconfigure the deferred lighting pipeline to gain back the ability to have multiple materials, or the ability to multisample without shading individual subsamples.  Surely there are even more possibilities!

Recently while working on my own game, I was grappling with the issue of having my engine support more local light sources in a scene.  I was using standard forward lighting with up to 3 lights per pass (which was fine), but I really wanted to keep my DrawPrimitives calls to a minimum (due to how painful they can be on the 360).  This was a problem since I’m aggressively batching my mesh rendering using instancing, and sorting instances by which light affects them would cause my batches to increase.  Thus, I was using 3 “global” light sources per frame.  This has obvious drawbacks.

While I was thinking over solutions, I considered the importance of smaller local lights that are relatively far away in the scene.  At further distances, it’s not necessarily too important to have “correct” lighting.  In fact, we basically just need something that’s the right color, makes the area brighter, and doesn’t shade surfaces facing away from the light source.  So I thought: “I already have view-space depth…if I can calculate view-space normals I can get what I want by using a deferred pass”.  So I did exactly this…and it didn’t work very well.  The problem was that even though you can calculate a view-space normal from a depth value by calculating the partial derivatives and taking a cross product, the normals you calculate aren’t smoothly interpolated between vertices.  So what you get is something that looks an awful lot like flat shading.  Ewwwwwwwwwwww.
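To see why the result looks flat-shaded, here's a rough C# sketch of the cross-product step, with neighboring reconstructed positions standing in for the shader's partial derivatives.  The names and the winding (chosen so the normal faces the camera in a right-handed view space) are my assumptions for illustration.

```csharp
using System;
using System.Numerics;

public static class NormalFromDepth
{
    // p is the reconstructed view-space position of a pixel; pRight and pDown
    // are the positions reconstructed for its right and lower neighbors.
    public static Vector3 Normal(Vector3 p, Vector3 pRight, Vector3 pDown)
    {
        // The two differences approximate the partial derivatives of position
        // across the screen; their cross product is the geometric face normal.
        return Vector3.Normalize(Vector3.Cross(pDown - p, pRight - p));
    }
}
```

Every pixel covered by the same triangle sits on the same plane, so every one of them produces the same normal.  That's exactly the faceted look you get from flat shading.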

This led to approach #2:  in the depth-only pass, render to a RGBA16F surface instead of a R32F surface and render out depth + view-space normals as interpolated from the vertex normals.  This worked much better!  The only remaining issue (aside from the fact that I just hard-code a diffuse albedo and specular albedo), is that normal-maps aren’t used.  However even with those problems the results are still decent, as long as surface colors are primarily determined by your forward rendering pass and the local lights are just “extra”.  Here’s screenshots of a test scene with forward rendering, and then with the point lights deferred:

The results are clearly not as good as a full forward pass when you have them side-by-side, but I think they’re probably good enough…especially if I only use this technique for lights that are small or far-away.  The trick is going to be transferring smoothly from deferred to forward, but that’s certainly doable.

One downside that came with this was that since I was just additively blending in the lights, I couldn’t use my beloved LogLuv encoding for HDR.  My next-best option on the 360 was to normalize R10G10B10A2 to a range greater than [0,1].  I ended up having to normalize to [0,8] to get the dynamic range I wanted, and unfortunately this can give some visible banding in certain cases.  An alternative I’ll have to explore is rendering just the point lights to an R10G10B10A2 buffer, and then sending this to my forward rendering pass to be sampled and added to the result.  If I did this I could also use the light prepass approach, and gain back material parameters and proper MSAA for the point lights.
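To put a number on that banding, here's a small C# sketch of what stretching a 10-bit channel over [0,8] does to the quantization step.  It's plain arithmetic rather than anything from my engine, and the names are hypothetical.

```csharp
using System;

public static class RangeEncoding
{
    public const float MaxValue = 8f;   // top of the normalized range
    public const int Levels = 1023;     // a 10-bit channel stores 0..1023

    // Maps an HDR value in [0, MaxValue] to the nearest 10-bit code.
    public static int Encode(float value)
    {
        float normalized = Math.Max(0f, Math.Min(1f, value / MaxValue));
        return (int)Math.Round(normalized * Levels);
    }

    // Maps a 10-bit code back to an HDR value.
    public static float Decode(int stored)
    {
        return stored / (float)Levels * MaxValue;
    }

    // Smallest representable difference between two stored values.
    public static float Step
    {
        get { return MaxValue / Levels; }
    }
}
```

The step works out to 8/1023, or roughly 0.0078, which is eight times coarser than the same format spread over [0,1].  In dark, smoothly varying areas that's where the bands show up.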

Anyway I’m not saying that what I’m doing is particularly interesting or useful, I’m just trying to demonstrate that there are many possibilities to explore.  It’s good to think outside the box every once in a while!

Scintillating Snippets: Reconstructing Position From Depth

There are times I wish I’d never responded to this thread over at GDnet, simply because of the constant stream of PM’s that I still get about it.  Wouldn’t it be nice if I could just pull out all the important bits, stick it on some blog, and then link everyone to it?  You’re right, it would be!

First things first: what am I talking about?  I’m talking about something that finds great use for deferred rendering: reconstructing the 3D position of a previously-rendered pixel (either in view-space or world-space) from a single depth value.  In practice, it’s really not terribly complicated.  You intrinsically know (or can figure out) the 2D position of any pixel when you’re shading it, which means that if you can sample a depth value you can get the whole 3D position.  However it’s still easy to get tripped up due to the fact that there are several ways to go about it, coupled with the fact that many beginners aren’t very proficient at debugging their shaders.

Let’s talk about the first way to do it: storing post-projection z/w, combining it with x/w and y/w, transforming by the inverse of the projection matrix, and dividing by w.  In HLSL it looks something like this…

// Depth pass vertex shader
output.vPositionCS = mul(input.vPositionOS, g_matWorldViewProj);
output.vDepthCS.xy = output.vPositionCS.zw;

// Depth pass pixel shader (output z/w)
return input.vDepthCS.x / input.vDepthCS.y;

// Function for converting depth to view-space position
// in deferred pixel shader pass.  vTexCoord is a texture
// coordinate for a full-screen quad, such that x=0 is the
// left of the screen, and y=0 is the top of the screen.
float3 VSPositionFromDepth(float2 vTexCoord)
{
    // Get the depth value for this pixel
    float z = tex2D(DepthSampler, vTexCoord).r;
    // Get x/w and y/w from the viewport position
    float x = vTexCoord.x * 2 - 1;
    float y = (1 - vTexCoord.y) * 2 - 1;
    float4 vProjectedPos = float4(x, y, z, 1.0f);
    // Transform by the inverse projection matrix
    float4 vPositionVS = mul(vProjectedPos, g_matInvProjection);  
    // Divide by w to get the view-space position
    return vPositionVS.xyz / vPositionVS.w;  
}

For many this is the preferred approach since it works with hardware depth buffers.  It also may seem natural to some: we get depth by projection, we get position by un-projecting.  But what if we don’t have access to a hardware depth buffer?  If you’re targeting the PC and D3D9, sampling from a depth buffer as if it were a texture is not straightforward since it requires driver hacks.  If you’re using XNA, it’s not possible at all since the framework generally attempts to maintain cross-platform compatibility between the PC and the Xbox 360.  In these cases, we can simply render out a depth buffer ourselves using the vertex and pixel shader bits I posted above.  But is this really a good idea?  z/w is non-linear, and most of the precision will be dedicated to areas very close to the near-clip plane.
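If you want to convince yourself that the un-projection round-trips before you debug it in a shader, here's a CPU-side C# sketch of the same steps.  It assumes System.Numerics (row vectors, like XNA and the HLSL above), and the class and method names are just for illustration.

```csharp
using System;
using System.Numerics;

public static class Unprojection
{
    // Rebuilds the projected position from a full-screen texture coordinate
    // and a stored z/w depth value, then transforms it back to view space.
    public static Vector3 ViewPositionFromDepth(Vector2 texCoord, float zOverW, Matrix4x4 invProjection)
    {
        // Texture coordinates run 0..1 with y pointing down; projected x/w and
        // y/w run -1..1 with y pointing up.
        float x = texCoord.X * 2f - 1f;
        float y = (1f - texCoord.Y) * 2f - 1f;
        Vector4 projected = new Vector4(x, y, zOverW, 1f);

        // Transform by the inverse projection matrix, then divide by w.
        Vector4 positionVS = Vector4.Transform(projected, invProjection);
        return new Vector3(positionVS.X, positionVS.Y, positionVS.Z) / positionVS.W;
    }
}
```

You'd get invProjection from Matrix4x4.Invert on your projection matrix.  Projecting a known view-space point, converting x/w and y/w to a texture coordinate, and passing it through this function should hand you back the original point (within floating-point error).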

A different approach would be to render out normalized view-space z as our depth.  Since it’s view-space it’s linear which means we get uniform precision distribution, and this also means we don’t need to bother with projection or unprojection to reconstruct position.  Instead we can take the approach of CryTek and multiply the depth value with a ray pointing from the camera to the far-clip plane.  In HLSL it goes something like this:

// Shaders for rendering linear depth
void DepthVS(   in float4 in_vPositionOS    : POSITION,
                out float4 out_vPositionCS  : POSITION,
                out float  out_fDepthVS     : TEXCOORD0    )
{    
    // Figure out the position of the vertex in
    // view space and clip space
    float4x4 matWorldView = mul(g_matWorld, g_matView);
    float4 vPositionVS = mul(in_vPositionOS, matWorldView);
    out_vPositionCS = mul(vPositionVS, g_matProj);
    out_fDepthVS = vPositionVS.z;
}

float4 DepthPS(in float in_fDepthVS : TEXCOORD0) : COLOR0
{
    // Negate and divide by distance to far-clip plane
    // (so that depth is in range [0,1])
    // This is for right-handed coordinate system,
    // for left-handed negating is not necessary.
    float fDepth = -in_fDepthVS/g_fFarClip;
    return float4(fDepth, 1.0f, 1.0f, 1.0f);
}

// Shaders for deferred pass where position is reconstructed

// Vertex shader for rendering a full-screen quad
void QuadVS (	in float3 in_vPositionOS		: POSITION,
		in float3 in_vTexCoordAndCornerIndex	: TEXCOORD0,
		out float4 out_vPositionCS		: POSITION,
		out float2 out_vTexCoord		: TEXCOORD0,
		out float3 out_vFrustumCornerVS		: TEXCOORD1	)
{
	// Offset the position by half a pixel to correctly
	// align texels to pixels. Only necessary for D3D9 or XNA
	out_vPositionCS.x = in_vPositionOS.x - (1.0f/g_vOcclusionTextureSize.x);
	out_vPositionCS.y = in_vPositionOS.y + (1.0f/g_vOcclusionTextureSize.y);
	out_vPositionCS.z = in_vPositionOS.z;
	out_vPositionCS.w = 1.0f;

	// Pass along the texture coordinate and the position
	// of the frustum corner in view-space.  This frustum corner
        // position is interpolated so that the pixel shader always
        // has a ray from camera->far-clip plane
	out_vTexCoord = in_vTexCoordAndCornerIndex.xy;
	out_vFrustumCornerVS = g_vFrustumCornersVS[in_vTexCoordAndCornerIndex.z];
}

// Pixel shader function for reconstructing view-space position
float3 VSPositionFromDepth(float2 vTexCoord, float3 vFrustumRayVS)
{
	float fPixelDepth = tex2D(DepthSampler, vTexCoord).r;
	return fPixelDepth * vFrustumRayVS;
}

As you can see the reconstruction is quite nice with linear depth: we only need a single multiply instead of the 4 MADD’s and a divide needed for unprojection.  If you’re curious about how to get the frustum corner positions I use, it’s rather easy with a little trig.  This tutorial walks you through it.  Or if you’re using XNA, there’s a super-convenient BoundingFrustum class that can take care of it for you.  My code for getting the positions looks something like this:

Matrix viewProjMatrix = viewMatrix * projMatrix;
BoundingFrustum frustum = new BoundingFrustum(viewProjMatrix);
frustum.GetCorners(frustumCornersWS);
Vector3.Transform(frustumCornersWS, ref viewMatrix, frustumCornersVS);
for (int i = 0; i < 4; i++)
    farFrustumCornersVS[i] = frustumCornersVS[i + 4];

The farFrustumCornersVS array is what I send to my vertex shader as shader constants. Then you just need to have an index in your quad vertices that tells you which vertex belongs to which corner (which you could also do with shader math, if you want).  Another approach would be to simply store the corner positions directly in the vertices as texCoord’s.
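If you're not using XNA, the “little trig” route for a symmetric perspective projection can be sketched like this in C#.  The corner order here is an arbitrary choice of mine (it is not claimed to match BoundingFrustum.GetCorners), so whatever index you store in your quad vertices has to agree with it.

```csharp
using System;
using System.Numerics;

public static class FrustumCorners
{
    // Returns the four far-plane corners in view space for a right-handed
    // camera looking down -Z.  fovY is the full vertical field of view in
    // radians, aspect is width / height.
    // Assumed order: top-left, top-right, bottom-left, bottom-right.
    public static Vector3[] FarCornersVS(float fovY, float aspect, float farClip)
    {
        float halfHeight = farClip * (float)Math.Tan(fovY * 0.5f);
        float halfWidth = halfHeight * aspect;

        return new[]
        {
            new Vector3(-halfWidth,  halfHeight, -farClip),
            new Vector3( halfWidth,  halfHeight, -farClip),
            new Vector3(-halfWidth, -halfHeight, -farClip),
            new Vector3( halfWidth, -halfHeight, -farClip),
        };
    }
}
```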

Extra Credit:  this technique can also be used to reconstruct world-space position, if that’s what you’re after.  All you need to do is rotate (not translate) your frustum corner positions by the inverse of your view matrix to get them back into world space.  Then when you multiply the interpolated ray with your depth value, you simply add the camera position to the value (ends up being a single MADD).

Extra-Extra Credit: you can use this technique with arbitrary geometry too, not just quads.  You just need to figure out a texture coordinate for each pixel, which you can do by either interpolating the clip-space position and dividing x and y by w, or by using the VPOS semantic.  Then for your frustum ray you just calculate the eye->vertex vector and scale it so that it points all the way back to the far-clip plane.
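The clip-space route for getting a texture coordinate boils down to a divide and a remap.  Here it is as a small C# sketch (System.Numerics, hypothetical names) so you can see the mapping on its own:

```csharp
using System;
using System.Numerics;

public static class ScreenTexCoord
{
    // Converts an interpolated clip-space position into a texture coordinate
    // where (0,0) is the top-left of the screen and (1,1) the bottom-right.
    public static Vector2 FromClipSpace(Vector4 positionCS)
    {
        // Perspective divide gives x and y in the range -1..1
        float ndcX = positionCS.X / positionCS.W;
        float ndcY = positionCS.Y / positionCS.W;

        // Remap to 0..1 and flip y, since texture v grows downward
        return new Vector2(ndcX * 0.5f + 0.5f, 0.5f - ndcY * 0.5f);
    }
}
```

On D3D9 or XNA you'd most likely also want to nudge the result by half a texel to keep texels and pixels aligned, same as with the full-screen quad.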

UPDATE:  Answers to extra credit questions here

UPDATE 2: More info here

Closing the comments for now, because I keep getting spam comments

Scintillating Snippets: Programatically Adding Content To A Content Project

One of the tools I made for my current project is a model editor.  Basically it can import in .fbx or .x models, and then you can apply my custom effects, set parameters, set textures, and then save it using my custom model format I named “.jsm” (it’s just XML…don’t tell anyone!).  Anyway one of the neat features I wanted it to have was the ability to add a model to my game’s Content project so that you wouldn’t have to manually do it through Visual Studio.  And since the Content Pipeline uses MSBuild, this is easy to do:

// Load up the content project
Engine.GlobalEngine.BinPath = System.Runtime.InteropServices.RuntimeEnvironment.GetRuntimeDirectory();
Project contentProject = new Project();
contentProject.Load(projectFileName);

// Add it
BuildItem newItem = contentProject.AddNewItem("Compile", "Models\\" + modelName + ".fbx");
newItem.SetMetadata("Link", "Models\\" + modelName + ".fbx");
newItem.SetMetadata("Name", modelName);
newItem.SetMetadata("Importer", "FbxImporter");
newItem.SetMetadata("Processor", "ModelProcessor");

// Save it
contentProject.Save(projectFileName);

This is of course the generic version and not the actual code I used, but you get the idea.  The “projectFileName” string should contain a path to your Content.contentproj file in your Content subfolder.  “modelName” would just be a name for your model, minus the extension.  What’s going on is pretty simple:  I load up the Content project using the Engine and Project classes found in Microsoft.Build.BuildEngine.  Then I create a new BuildItem for the model by calling AddNewItem, which also adds it to the Project.  The string I pass as the second argument contains the path to the model file relative to the .contentproj file.  The first bit of metadata specifies that I want to add the file as a link, not as a copy; its string specifies how the file shows up in the project hierarchy (AKA, how it will show up when you expand the Content node in Visual Studio).  The second bit of metadata is just a name associated with the file.  Then the third specifies the ContentImporter to use, and the fourth specifies the ContentProcessor to use.

Deferred Cascaded Shadow Maps

For my next sample I was planning on extending my deferred shadow maps sample to implement cascaded shadow maps.  I got an email asking about how to make the sample look decent with large viewing distances which is exactly the problem CSM’s solve.  So I decided to bump up my plans a little early and get the code up and running.  It’ll be a while before I get the write-up finished, but until then feel free to play around with code (PC and 360 projects included).

Profiling Events vs. Virtual Functions On The 360

Over the past week or so I’ve been completely reworking my collision system in order to better decouple it from other areas of code, and also make it more flexible.  One part I got stuck on for a bit was deciding on the mechanism to use for notifying owners of collision components when the component collides with something.  I narrowed it down to two options:

-notify owners via the ICollisionOwner interface I was using

OR

-use an Event

I was leaning more towards events because I felt their semantics naturally fit with the usage pattern I was working with.  If game entities want to be notified, they simply subscribe and they get notified.  This seemed cleaner and easier to understand than letting each collision component have some sort of “NotifyOwner” flag, and then call a virtual function if the flag was true.  However I was a little worried about performance…I hadn’t really used delegates on the 360 before and I wanted to make sure that the overhead wasn’t going to be something astronomical before proceeding.  So I set up a simple test harness that vaguely resembled how I was going to use events:

public delegate void EventDelegate(object sender, ref Vector3 parameter);

public class EventServer
{
    public event EventDelegate SomeEvent;

    public void RaiseEvent()
    {
        Vector3 param = new Vector3();

        if (SomeEvent != null)
            SomeEvent(this, ref param);

        //for (int i = 0; i < Handlers.Count; i++)
        //{
        //    if (Handlers[i].HandlesEvent)
        //        Handlers[i].HandleEventVirtual(this, ref param);
        //}
    }

    public List<IEventHandler> Handlers = new List<IEventHandler>();
}

public interface IEventHandler
{
    void HandleEventVirtual(object sender, ref Vector3 parameter);
    bool HandlesEvent
    {
        get;
    }
}

public class EventHandler : IEventHandler
{
    EventServer server;
    bool handleEvent;

    public EventHandler(EventServer server, bool handleEvent)
    {
        this.server = server;
        this.handleEvent = handleEvent;  

        if (handleEvent)
            server.SomeEvent += new EventDelegate(HandleEvent);
    }

    void HandleEvent(object sender, ref Vector3 parameter)
    {
        parameter.Y += 0.001f;
    }

    public virtual void HandleEventVirtual(object sender, ref Vector3 parameter)
    {
        parameter.X += 0.001f;
    }

    public bool HandlesEvent
    {
        get { return handleEvent; }
    }
}

public class EventHandler2 : EventHandler
{
    public EventHandler2(EventServer server, bool handleEvent)
        : base(server, handleEvent)
    {
    }

    public override void HandleEventVirtual(object sender, ref Vector3 parameter)
    {
        base.HandleEventVirtual(sender, ref parameter);
        parameter.Normalize();
    }
}

Pretty simple setup: a class that will dole out events to a collection of handlers, with a derivative of the handler class also being thrown in just to make sure the compiler doesn’t do anything funky that will prevent us from actually getting virtual functions.  To test events we leave it like this, to test virtual functions we comment out the event invocation and use the virtual function call instead.  Any .NET junkies might notice I’ve violated the guidelines for creating custom event handlers by not using an EventArgs derivative…the reason why is because EventArgs is a class, so creating a new instance would generate garbage every time the event fires.  And as we all know…the GC is not our friend on the Xbox.

I set it up to run with various amounts of event handlers distributed across various amounts of event servers.  I then set up the game class to fire off all the event servers in the Update function and use a Stopwatch to time how long it took.  I also averaged the timing results across 64 frames to smooth out the results.  This is what I got:

Handlers:Servers    Events (ticks)      Virtual functions (ticks)
50:1                9                   22
500:1               710                 220
5000:1              163000 (3.26ms)     2200
5000:10             18600               2200
5000:100            1000                2200
5000:1000           820                 2200

The table shows the EventHandler:EventServer ratio on the left, followed by the time taken for invocation (in ticks) when using events and when using virtual functions.  The first few results are pretty interesting:  the virtual function method scales linearly with the number of handlers we have, while the time required for firing events goes up far faster than linearly.  The bottom half of the results are even more interesting: the time taken goes way down as we start to distribute the handlers more evenly across servers.  In fact it goes down so much, it becomes quicker than virtual functions!  Crazy.

Anyway I had my answer: events would be fine with my setup.  I can’t foresee any reason why more than one handler would subscribe to the same collision component, and even if it did the overhead is basically minuscule for the numbers I’ll be working with.  But it’s always fun to experiment, right?
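In case you want to run this kind of experiment yourself, here's a rough C# sketch of the timing side: a Stopwatch wrapped around the work, averaged over the last 64 samples.  This is a reconstruction of the idea rather than my actual harness, and FrameTimer is a made-up name.

```csharp
using System;
using System.Diagnostics;

public class FrameTimer
{
    const int SampleCount = 64;

    readonly long[] samples = new long[SampleCount];
    readonly Stopwatch stopwatch = new Stopwatch();
    int next;     // index the next sample will be written to
    int filled;   // how many slots hold real samples so far

    // Times one invocation of the work and records the elapsed ticks.
    public void Measure(Action work)
    {
        stopwatch.Reset();
        stopwatch.Start();
        work();
        stopwatch.Stop();
        AddSample(stopwatch.ElapsedTicks);
    }

    // Stores a sample in a ring buffer, overwriting the oldest one.
    public void AddSample(long ticks)
    {
        samples[next] = ticks;
        next = (next + 1) % SampleCount;
        if (filled < SampleCount)
            filled++;
    }

    // Average over the samples recorded so far (at most the last 64).
    public double AverageTicks
    {
        get
        {
            if (filled == 0)
                return 0.0;

            long sum = 0;
            for (int i = 0; i < filled; i++)
                sum += samples[i];
            return (double)sum / filled;
        }
    }
}
```

In Update you'd call Measure once per frame with a delegate that fires all of your event servers.  Create that delegate once up front and reuse it, so the measurement itself isn't generating garbage.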

Deferred Shadow Maps Sample

Got a new sample ready, this one shows how you can defer shadow map calculations to a separate screen-space pass using a depth buffer.  Check it out on Ziggyware!


Teach Your Effect’s A New Trick

The Effects Framework is a pretty damn awesome tool.  However I’m afraid that’s not totally obvious to a lot of newbies, who either just don’t know what it can do or haven’t been exposed to some of the situations where Effects can really come in handy.

One neat thing Effects can do that isn’t obvious from the documentation or samples is auto-generate variants of shaders for you based on the value of uniform parameters.  For instance let’s take a common scenario: let’s say you have a shader for a model, and you need it to work for either a point light, a spot light, or a directional light one-at-a-time.  You might write your shader code like this:

int g_iLightType;

float4 ModelPixelShader(in PSInput input) : COLOR0
{
    float4 vColor;
    if (g_iLightType == LIGHT_TYPE_POINT)
        vColor = DoPointLighting(input);
    else if (g_iLightType == LIGHT_TYPE_SPOT)
        vColor = DoSpotLighting(input);
    else
        vColor = DoDirectionalLighting(input);

    return vColor;
}

Alright, so this works.  The app sets the g_iLightType shader parameter, and the right calculations get used.  However is it optimal?  We’ve got these if statements in there, and maybe we’re not sure what they’ll get compiled into depending on the shader profile we’re targeting.  And maybe we’re not sure what the heck the driver is going to do once it gets the compiled shader.  Wouldn’t it be nice if we could avoid that?  Of course it would.  So let’s make some small changes:

float4 ModelPixelShader(in PSInput input, uniform int iLightType) : COLOR0
{    
    float4 vColor;
    if (iLightType == LIGHT_TYPE_POINT)
        vColor = DoPointLighting(input);
    else if (iLightType == LIGHT_TYPE_SPOT)
        vColor = DoSpotLighting(input);
    else
        vColor = DoDirectionalLighting(input);

    return vColor;
}

technique PointLight
{
    pass p0
    {
        VertexShader = compile vs_2_0 ModelVertexShader();
        PixelShader = compile ps_2_0 ModelPixelShader(LIGHT_TYPE_POINT);        
    }
}

technique SpotLight
{
    pass p0
    {
        VertexShader = compile vs_2_0 ModelVertexShader();
        PixelShader = compile ps_2_0 ModelPixelShader(LIGHT_TYPE_SPOT);        
    }
}

technique DirectionalLight
{
    pass p0
    {
        VertexShader = compile vs_2_0 ModelVertexShader();
        PixelShader = compile ps_2_0 ModelPixelShader(LIGHT_TYPE_DIRECTIONAL);        
    }
}

Very similar, but one big difference: the HLSL code branches on a uniform int parameter to the pixel shader function, whose value is set in our technique declaration.  This means that the Effect knows that this parameter has a constant value for that entire technique, which allows it to generate a separate shader for each technique where the parameter is a constant and not a variable.  Since it’s a constant for each shader variant, no branching of any sort is necessary.  Now our app just picks the technique it wants for each light source it’s handling, rather than setting a shader parameter.

Now keep in mind that using separate shaders like this will have performance implications:  switching vertex or pixel shaders has an associated overhead, and if you auto-generate different variants like we did above you’ll be switching shaders more than if you used one big shader.  Whether or not it’s a performance win will depend on what you’re doing.  However, it’s always good to be aware of all the neat tricks your tools can pull off.

Fun With Compiled Content

EDIT:  I realized it was probably a much smarter idea to just zip up the code along with the designer code and upload it somewhere.  So here it is.

Wouldn’t it be neat to be able to have a dialog you could pop up that would show all the pre-compiled content of a certain Type, with it all listed in a nice tree showing the directory structure?  Of course it would!  Only a crazy person would think otherwise.  Well the good news is I already did this, so feel free to plunder the code for your own use.

public partial class ContentBrowser : Form
{

    private static Dictionary<string, TreeNode> contentTrees = new Dictionary<string, TreeNode>();
    private static string contentDirectory;
    private static ContentManager contentManager;

    private static string[] contentTypes = {    "Texture",
                                                "Texture2D",
                                                "Texture3D",
                                                "TextureCube",
                                                "SpriteFont",
                                                "Model",
                                                "Effect"    };

    /// <summary>
    /// Traverses the specified content directory for all loadable content, and stores
    /// it as static data for use when an instance of the Form is created.
    /// </summary>
    /// <param name="services">IServiceProvider implementation, contains IGraphicsDeviceService</param>
    /// <param name="contentDirectory">The content directory to traverse</param>
    /// <param name="ownerWindow">Owner window for the status dialog</param>
    public static void Initialize(ServiceContainer services, string contentDirectory, IWin32Window ownerWindow)
    {
        ContentBrowser.contentDirectory = contentDirectory;

        // Make a content manager
        contentManager = new ContentManager(services, contentDirectory);

        // Make a small progress dialog so the user knows something is going on
        Form notificationDialog = new Form();
        notificationDialog.FormBorderStyle = FormBorderStyle.FixedDialog;
        notificationDialog.Size = new Size(350, 150);
        notificationDialog.Text = "JSMapEditor";
        notificationDialog.StartPosition = FormStartPosition.CenterScreen;
        notificationDialog.ShowInTaskbar = false;
        notificationDialog.ShowIcon = false;
        notificationDialog.ControlBox = false;

        Label statusLabel = new Label();
        statusLabel.Size = new Size(200, 50);
        statusLabel.Location = new System.Drawing.Point(100, 50);
        statusLabel.Text = "Loading Content";
        notificationDialog.Controls.Add(statusLabel);            

        notificationDialog.Show(ownerWindow);

        // Do the content loading/enumeration on a worker thread so we
        // can keep pumping messages on this thread
        Stopwatch timer = new Stopwatch();
        timer.Start();
        int count = 0;
        long time = 0;
        long lastTime = 0;
        long loadTime = 0;
        Thread workerThread = new Thread(EnumerateContent);
        workerThread.Start();

        while (!workerThread.Join(0))
        {
            Application.DoEvents();

            time = timer.ElapsedMilliseconds;
            loadTime += time - lastTime;
            lastTime = time;

            if (loadTime > 300)
            {
                statusLabel.Text = "Loading Content";
                for (int i = 1; i <= count % 4; i++)
                    statusLabel.Text += ".";
                count++;
                loadTime -= 300;
            }
        }

        notificationDialog.Hide();

        // Dispose of the content
        contentManager.Dispose();
        contentManager = null;
        GC.Collect();
    }

    /// <summary>
    /// Enumerates all loadable content for types in contentTypes, and stores
    /// the resulting tree in contentTrees
    /// </summary>
    private static void EnumerateContent()
    {
        // Recursively build the content tree
        foreach (string contentType in contentTypes)
        {
            TreeNode rootNode = new TreeNode("Content\\");
            BuildContentTree(rootNode, contentManager.RootDirectory, contentType);
            contentTrees.Add(contentType, rootNode);
        }
    }

    /// <summary>
    /// Builds the tree by looking for acceptable content.  Recursively calls itself
    /// to traverse subdirectories
    /// </summary>
    /// <param name="parentNode">The TreeNode representing the current directory</param>
    /// <param name="directory">The current directory to traverse</param>
    /// <param name="contentType">The name of the content Type to look for</param>
    private static void BuildContentTree(TreeNode parentNode, string directory, string contentType)
    {
        // Find all the subdirectories, and recursively search them
        string[] subdirectories = Directory.GetDirectories(directory);
        foreach (string subdirectory in subdirectories)
        {
            string relativePath = subdirectory.Substring(directory.Length + 1);
            TreeNode directoryNode = new TreeNode(relativePath + "\\");
            BuildContentTree(directoryNode, subdirectory, contentType);
            if (directoryNode.Nodes.Count > 0)
                parentNode.Nodes.Add(directoryNode);
        }

        // Check out all the .xnb files, see if we can load them as the target type
        string[] contentFiles = Directory.GetFiles(directory, "*.xnb");
        foreach (string contentFile in contentFiles)
        {
            string loadName = Path.GetDirectoryName(contentFile) + "\\"
                                  + Path.GetFileNameWithoutExtension(contentFile);
            if (TryLoadContent(loadName, contentType))
            {
                TreeNode contentNode = new TreeNode(
                                         Path.GetFileNameWithoutExtension(contentFile));
                contentNode.Tag = loadName;
                parentNode.Nodes.Add(contentNode);
            }
        }

    }

    /// <summary>
    /// Checks whether the filename is valid by attempting to
    /// load it with the ContentManager.
    /// </summary>
    /// <param name="contentFile">The filename to check</param>
    /// <param name="contentType">The name of the content Type to check the content against</param>
    /// <returns>True if successful</returns>
    private static bool TryLoadContent(string contentFile, string contentType)
    {
        try
        {
            object content = contentManager.Load<object>(contentFile);
            // The content is acceptable only if its runtime type name matches
            return content.GetType().Name == contentType;
        }
        catch (ContentLoadException)
        {
            return false;
        }
    }

    private string selectedContentFile;

    public string SelectedContentFile
    {
        get { return selectedContentFile; }
    }

    /// <summary>
    /// Creates an instance of ContentBrowser
    /// </summary>
    /// <param name="value">The default content filename</param>
    /// <param name="contentType">The type of content to browse</param>
    public ContentBrowser(String value, Type contentType)
    {
        InitializeComponent();

        contentTree.Nodes.Add(contentTrees[contentType.Name]);
        contentTree.ExpandAll();

        selectedContentFile = value;
    }

    /// <summary>
    /// Called when the Form is closed
    /// </summary>
    /// <param name="e"></param>
    protected override void OnClosed(EventArgs e)
    {
        contentTree.Nodes.Clear();
        base.OnClosed(e);
    }

    /// <summary>
    /// Event handler for the TreeView's mouse clicks
    /// </summary>
    /// <param name="sender">contentTree</param>
    /// <param name="e">Event args</param>
    private void contentTree_NodeMouseClick(object sender, TreeNodeMouseClickEventArgs e)
    {
        // Tag != null means it's a content node
        if (e.Node.Tag != null)
        {
            selectedContentFile = (string)e.Node.Tag;
            selectedContentFile = selectedContentFile.Substring(contentDirectory.Length + 1);
        }
    }
}

Okay, so a few notes on usage…it uses the string array “contentTypes” to know which types to check for.  You should fill this out with whatever Types you’re loading from the ContentManager.  The static Initialize method should be called when your app starts up, and in any case it has to run before you can actually create an instance of ContentBrowser.  It shows a little loading dialog while it’s working, so you have something to show the user while it’s happening.  You could take that out if you wanted.

I also made a UITypeEditor so that you can have this dialog to set a property in a PropertyGrid:

/// <summary>
/// Used to allow the user to browse for content in the PropertyGrid
/// </summary>
/// <typeparam name="T">The Type of content to display in the ContentBrowser</typeparam>
public class ContentEditor<T> : UITypeEditor
{
    public override UITypeEditorEditStyle GetEditStyle(ITypeDescriptorContext context)
    {           
        return UITypeEditorEditStyle.Modal;
    }

    public override object EditValue(ITypeDescriptorContext context,
                                        IServiceProvider provider,
                                        object value)
    {
        IWindowsFormsEditorService editorService = null;

        if (provider != null)
        {
            editorService = provider.GetService(typeof(IWindowsFormsEditorService)) as IWindowsFormsEditorService;
        }

        if (editorService != null)
        {
            // Pop up our dialog
            ContentBrowser browser = new ContentBrowser((string)value, typeof(T));

            if (editorService.ShowDialog(browser) == DialogResult.OK)
                value = browser.SelectedContentFile;
        }

        return value;
    }
}

LogLuv Encoding for HDR

Okay so this marks the third time I’ve posted this blog entry somewhere.  What can I say…I like it!  I also think it’s something useful for just about anyone trying to do HDR on the 360 through XNA, and I’m hoping some people will stumble upon it.

Designing an effective and performant HDR implementation for my game’s engine was a step that was complicated a bit by a few of the quirks of running XNA on the Xbox 360.  As a quick refresher for those who aren’t experts on the subject, HDR is most commonly implemented by rendering the scene to a floating-point buffer and then performing a tone-mapping pass to bring the colors back into the visible range. Floating-point formats (like A16B16G16R16F, AKA HalfVector4) are used because their added precision and floating-point nature allow them to comfortably store linear RGB values in ranges beyond the [0,1] typically used for shader output to the backbuffer, which is crucial as HDR requires having data with a wide dynamic range. They’re also convenient, as this allows values to be stored in the same format they’re manipulated in the shaders. Newer GPUs also support full texture filtering and alpha-blending with fp surfaces, which prevents the need for special-case handling of things like non-opaque geometry. However, as with most things, what’s convenient is not always the best option. During planning, I came up with the following list of pros and cons for various types of HDR implementations:

Standard HDR, fp16 buffer
+Very easy to integrate (no special work needed for the shaders)
+Good precision
+Support for blending on SM3.0+ PC GPUs
+Allows for HDR bloom effects
-Double the bandwidth and storage requirements of R8G8B8A8
-Weak support for multi-sampling on SM3.0 GPUs (Nvidia NV40 and G70/G71 can’t do it)
-Hardware filtering not available on ATI SM2.0 and SM3.0 GPUs
-No blending on the Xbox 360
-Requires double space in framebuffer on the 360, which increases the number of tiles needed

HDR with tone-mapping applied directly in the pixel shader (Valve-style)
+Doesn’t require output to an HDR format, no floating-point or encoding required
+Multi-sampling and blending is supported, even on old hardware
-Can’t do HDR bloom, since only an LDR image is available for post-processing
-Luminance can’t be calculated directly, need to use fancy techniques to estimate it
-Increases shader complexity and combinations

HDR using an encoded format
+Allows for a standard tone-mapping chain
+Allows for HDR bloom effects
+Most formats offer a very wide dynamic range
+Same bandwidth and storage as LDR rendering
+Certain formats allow for multi-sampling and/or linear filtering with reasonable quality
-Alpha-blending usually isn’t an option, since the alpha-channel is used by most formats
-Linear filtering and multisampling usually aren’t mathematically correct, although often the results are “good enough”
-Additional shader math needed for format conversions
-Adds complexity to shaders
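
To put rough numbers on the storage bullets above, here is a back-of-the-envelope sketch in Python. The 1280x720 resolution, the 32-bit depth buffer, the absence of multisampling, and the 10 MiB EDRAM size are all assumptions made for illustration, and the tile count ignores alignment rules, so treat it as an estimate rather than what the hardware would actually allocate.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024   # assumed size of the 360's EDRAM
WIDTH, HEIGHT = 1280, 720        # assumed render resolution, no MSAA
DEPTH_BYTES_PER_PIXEL = 4        # assumed 32-bit depth/stencil buffer


def framebuffer_bytes(color_bytes_per_pixel):
    """Color plus depth storage for one full-resolution frame."""
    return WIDTH * HEIGHT * (color_bytes_per_pixel + DEPTH_BYTES_PER_PIXEL)


def tiles_needed(color_bytes_per_pixel):
    """Rough tile count: how many EDRAM-sized pieces the frame splits into."""
    return math.ceil(framebuffer_bytes(color_bytes_per_pixel) / EDRAM_BYTES)


ldr_tiles = tiles_needed(4)    # R8G8B8A8, or an encoded format stored in it
fp16_tiles = tiles_needed(8)   # A16B16G16R16F
```

Under these assumptions the 32-bit target fits in a single tile, while the fp16 target spills over into a second one.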

My early prototyping used a standard tone-mapping chain and I didn’t want to ditch that, nor did I want to move away from what I was comfortable with.  This pretty much eliminated the second option for me off the bat…although I was unlikely to choose it anyway due to its other drawbacks (having nice HDR bloom was something I felt was an important part of the look I wanted for my game, and in my opinion Valve’s method doesn’t do a great job of determining average luminance).  When I tried out the first method I found that it worked as well as it always did on the PC (I’ve used it before), but on the 360 it was another story.  I’m not sure why exactly, but for some reason it simply does not like the HalfVector4 format.  Performance was terrible, I couldn’t blend, I got all kinds of strange rendering artifacts (entire lines of pixels missing), and I’d get bizarre exceptions if I enabled multisampling. Loads of fun, let me tell you.

This left me with option #3.  I wasn’t a fan of this approach initially, as my original design plan called for things to be simple and straightforward whenever possible.  I didn’t really want to have two versions of my material shaders to support encoding, nor did I want to integrate decoding into the other parts of the pipeline that needed it.  But unfortunately, I wasn’t really left with any other options after I found there were no plans to bring support for the 360’s special fp10 backbuffer format to XNA (which would have conveniently solved my problems on the 360).  So, I started doing my research.  Naturally the first place I looked was actual released commercial games.  Why?  Because usually when a technique is used in a shipped game, it means it’s gone through its paces and has been determined to actually be feasible and practical in a game environment.  Which of course naturally led me to consider NAO32.

NAO32 is a format that gained some fame in the dev community when ex-Ninja Theory programmer Marco Salvi shared some details on the technique over on the Beyond3D forums.  Used in the game Heavenly Sword, it allowed for multisampling to be used in conjunction with HDR on a platform (PS3) whose GPU didn’t support multisampling of floating-point surfaces (the RSX is heavily based on Nvidia’s G70).  In this technique, color is stored in the LogLuv format using a standard R8G8B8A8 surface.  Two components are used to store X and Y at 8-bit precision, and the other two are used to store the log of luminance at 16-bit precision.  Having 16 bits for luminance allows for a wide dynamic range to be stored in this format, and storing the log of the luminance allows for linear filtering in multisampling or texture sampling.  Since he first explained it, other games have also used it, such as Naughty Dog’s Uncharted.  It’s likely that it’s been used in many other PS3 games as well.
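
The filtering claim is easier to see with numbers. Below is a small Python sketch (CPU-side, not shader code) that blends two luminance values 50/50 in log space, the way a bilinear fetch or a multisample resolve would blend the stored values. It ignores the 8-bit quantization and the split across two channels, and the function names are made up for illustration.

```python
import math


def encode_log_luminance(y):
    """Store luminance as its base-2 logarithm."""
    return math.log2(y)


def decode_log_luminance(stored):
    """Recover luminance from the stored logarithm."""
    return 2.0 ** stored


def filtered_luminance(y_a, y_b):
    """Average the two *stored* values, then decode the result."""
    blended = 0.5 * (encode_log_luminance(y_a) + encode_log_luminance(y_b))
    return decode_log_luminance(blended)


log_space_result = filtered_luminance(1.0, 16.0)   # geometric mean
linear_result = 0.5 * (1.0 + 16.0)                 # arithmetic mean
```

The log-space blend comes out as the geometric mean of the two inputs (4 for inputs 1 and 16) rather than the arithmetic mean (8.5).  That is why filtering a log-encoded format behaves sensibly but isn’t strictly correct.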

My actual shader implementation was helped along quite a bit by Christer Ericson’s blog post, which described how to derive optimized shader code for encoding RGB into the LogLuv format.  Using his code as a starting point, I came up with the following HLSL code for encoding and decoding:

// M matrix, for encoding
const static float3x3 M = float3x3(
    0.2209, 0.3390, 0.4184,
    0.1138, 0.6780, 0.7319,
    0.0102, 0.1130, 0.2969);

// Inverse M matrix, for decoding
const static float3x3 InverseM = float3x3(
    6.0013,    -2.700,    -1.7995,
    -1.332,    3.1029,    -5.7720,
    .3007,    -1.088,    5.6268);    

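float4 LogLuvEncode(in float3 vRGB)
{
    float4 vResult;
    // Transform RGB into the (X', Y, XYZ') space defined by M
    float3 Xp_Y_XYZp = mul(vRGB, M);
    // Clamp away from zero so the divide and log2 below stay finite
    Xp_Y_XYZp = max(Xp_Y_XYZp, float3(1e-6, 1e-6, 1e-6));
    // Chromaticity ratios go in the first two 8-bit channels
    vResult.xy = Xp_Y_XYZp.xy / Xp_Y_XYZp.z;
    // Log-encoded luminance, split across the other two channels:
    // w gets the fractional part, z gets the integer part scaled to [0,1]
    float Le = 2 * log2(Xp_Y_XYZp.y) + 127;
    vResult.w = frac(Le);
    vResult.z = (Le - (floor(vResult.w*255.0f))/255.0f)/255.0f;
    return vResult;
}

float3 LogLuvDecode(in float4 vLogLuv)
{
    // Recombine the two luminance channels
    float Le = vLogLuv.z * 255 + vLogLuv.w;
    float3 Xp_Y_XYZp;
    // Undo the log encoding to recover Y
    Xp_Y_XYZp.y = exp2((Le - 127) / 2);
    // Recover XYZ' and X' from the stored ratios
    Xp_Y_XYZp.z = Xp_Y_XYZp.y / vLogLuv.y;
    Xp_Y_XYZp.x = vLogLuv.x * Xp_Y_XYZp.z;
    // Back to RGB, clamping any slightly negative results
    float3 vRGB = mul(Xp_Y_XYZp, InverseM);
    return max(vRGB, 0);
}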
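
To sanity-check the math offline, the encode and decode functions can be mirrored on the CPU.  The following Python sketch follows the HLSL above step by step and adds a `quantize8` helper (a hypothetical name, not part of the shader) that simulates writing each channel to an R8G8B8A8 surface.  The round-to-nearest behavior in that helper is an assumption about how the render target stores values.

```python
import math

# Rows exactly as written in the HLSL (row-vector convention: v * M)
M = [(0.2209, 0.3390, 0.4184),
     (0.1138, 0.6780, 0.7319),
     (0.0102, 0.1130, 0.2969)]
INVERSE_M = [(6.0013, -2.700, -1.7995),
             (-1.332, 3.1029, -5.7720),
             (0.3007, -1.088, 5.6268)]


def mul(v, m):
    """Row vector times 3x3 matrix, like HLSL mul(v, m)."""
    return [sum(v[i] * m[i][j] for i in range(3)) for j in range(3)]


def quantize8(c):
    """Simulate storing one channel in 8 bits (assumed round-to-nearest)."""
    c = min(max(c, 0.0), 1.0)
    return math.floor(c * 255.0 + 0.5) / 255.0


def logluv_encode(rgb):
    xp, y, xyzp = (max(c, 1e-6) for c in mul(rgb, M))
    le = 2.0 * math.log2(y) + 127.0
    w = le - math.floor(le)                           # frac(Le)
    z = (le - math.floor(w * 255.0) / 255.0) / 255.0
    return [xp / xyzp, y / xyzp, z, w]


def logluv_decode(v):
    le = v[2] * 255.0 + v[3]
    y = 2.0 ** ((le - 127.0) / 2.0)
    xyzp = y / v[1]
    xp = v[0] * xyzp
    return [max(c, 0.0) for c in mul([xp, y, xyzp], INVERSE_M)]


def round_trip(rgb):
    """Encode, store in four 8-bit channels, then decode."""
    stored = [quantize8(c) for c in logluv_encode(rgb)]
    return logluv_decode(stored)


original = [4.0, 2.0, 0.5]
decoded = round_trip(original)
```

An HDR input like (4.0, 2.0, 0.5) should come back close to the original, with most of the error coming from the two 8-bit chromaticity channels rather than from the luminance.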

Once I had this implemented and had worked through a few small glitches, results were much improved in the 360 version of my game. Performance was much, much better, I could multi-sample again, and the results looked great. So while things didn’t exactly work out in an ideal way, I’m pleased enough with the results.

If you’re interested in this, be sure to check out my sample.
