What’s good on the menu, waiter?

I remember reading someone say on gamedev.net that at some point everyone tries to write their own UI system, and usually gets it wrong.  Apparently he’s right (or at least about the first part), because I’ve gone ahead and written a menu/UI system.  While it initially started out as part of the engine/framework I’ve been working on for my game, as I worked on it I decided it might be better off if I decoupled it from the rest of the engine components and made it a standalone library/editor package so that other people could make use of it.

While designing and implementing I had these goals in mind:

  • Keep it simple!  Make menu elements useful by default, but don’t cram in tons of functionality with limited use.  Just let them be flexible enough so that they can be customized for unusual cases.
  • Cross-platform, with a focus on Xbox 360.  Should look identical on PC and Xbox 360, and expose the same functionality regardless of input method.
  • Page-based layout. A few of the other GUI packages out there seem to be aimed at recreating WinForms using XNA…and I think that’s silly.  You don’t want sizeable windows for a game (or at least not most games), you want menus that are logically divided up into pages that you can switch between.
  • A PC-only editor application that lets you visually design your menus.   The core library should be aware of the fact that it can run in a designer, and provide support for this.
  • Free and open-source!

What I ended up with is the CPX Menu System.  It actually came out better than I expected…the editor is very stable and works pretty nicely.  It could use some more fancy features (like tools for lining up menu items), but it definitely WORKS and I’m happy about that.  As for the menu item types included in the library itself…it’s pretty bare-bones but you can still do a lot with them.  I mean personally for my game I wouldn’t really need a whole lot more than what I put in the sample app.

Probably the biggest weakness it has is that working with content is a bit awkward.  Early on I struggled a lot with trying to come up with a good way to handle it…and I don’t feel like I ever really came up with a killer solution.  As of right now the way it works is that the editor app itself does not build any content at runtime.  This isn’t so nice, since you have to have Content compiled ahead of time before you run the app.  The upside is that the editor doesn’t depend on the content pipeline assemblies at all, so you can run it on a PC that doesn’t have the full XNA GS install.  Probably the easiest way to manage content is to just add all of your menu content to the CPXMenu project’s Content project.  If you do that, then you will always have the content available for the editor and your game (assuming you’re always building the editor in VS and running it that way).  Otherwise you can tell the editor to look for content in a specific path whenever it loads a project.  This is what I did for the sample app: it has its own Content project with some custom textures, so I set the editor to look in the output folder for that project.

I guess that’s it for now…at some point I suppose I’ll announce it on Ziggyware.  Maybe after I add some documentation explaining how to use the damned thing.  In the meantime, here’s some screenshots of the sample app and the editor:

There’s More Than One Way To Defer A Renderer

While the idea of deferred shading/deferred rendering isn’t quite as hot as it was a year or two ago (OMG, Killzone 2 uses deferred rendering!), it’s still a cool idea that gets discussed rather often.  People generally tend to be attracted to the way a “pure” deferred renderer neatly and cleanly separates your geometry from your lighting, as well as the idea of being able to throw lights everywhere in their scene.  However as anyone who’s done a little bit of research into the topic surely knows, it comes with a few drawbacks.  The main ones are that for MSAA you need to individually light all your subsamples (which isn’t doable in D3D9), and that for non-opaque objects you have to use forward rendering anyway.

The neat thing about the concepts involved with deferred shading is that you’re not at all locked into the typical “render depth+normals+diffuse+specular to a fat G-Buffer and then shade” approach.  I’m not sure enough people are aware of this, or appreciate it.  For example, you can just defer your shadow map calculations to gain the related performance and organization benefits, and then use standard forward rendering techniques for everything else.  Or you can reconfigure the deferred lighting pipeline to gain back the ability to have multiple materials, or the ability to multisample without shading individual subsamples.  Surely there are even more possibilities!

Recently while working on my own game, I was grappling with the issue of having my engine support more local light sources in a scene.   I was using standard forward lighting with up to 3 lights per pass (which was fine), but I really wanted to keep my DrawPrimitives calls to a minimum (due to how painful they can be on the 360).  This was a problem since I’m aggressively batching my mesh rendering using instancing, and sorting instances by which light affects them would cause my batches to increase.  Thus, I was using 3 “global” light sources per frame.  This has obvious drawbacks.

While I was thinking over solutions, I considered the importance of smaller local lights that are relatively far away in the scene.  At further distances, it’s not necessarily too important to have “correct” lighting.  In fact, we basically just need something that’s the right color, makes the area brighter, and doesn’t shade surfaces facing away from the light source.  So I thought: “I already have view-space depth…if I can calculate view-space normals I can get what I want by using a deferred pass”.  So I did exactly this…and it didn’t work very well.  The problem was that even though you can calculate a view-space normal from a depth value by calculating the partial derivatives and taking a cross product, the normals you calculate aren’t smoothly interpolated between vertices.  So what you get is something that looks an awful lot like flat shading.  Ewwwwwwwwwwww.
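To see why normals derived from depth come out faceted, here is a minimal CPU-side sketch of the cross-product step.  It is not the shader from my engine: it uses System.Numerics.Vector3 as a stand-in for XNA’s Vector3, the names DepthNormals and NormalFromNeighbors are made up for illustration, and it assumes a right-handed view space with the camera looking down -Z.  In a real pixel shader the two difference vectors would come from ddx/ddy of the reconstructed view-space position.

```csharp
using System;
using System.Numerics;

public static class DepthNormals
{
    // Approximates a surface normal from a view-space position and the
    // positions reconstructed at the neighbouring pixels to the right and below.
    // The cross-product order is an assumption: it yields camera-facing normals
    // for a right-handed view space where screen "down" is -Y.
    public static Vector3 NormalFromNeighbors(Vector3 center, Vector3 right, Vector3 down)
    {
        Vector3 dx = right - center;   // plays the role of ddx(position)
        Vector3 dy = down - center;    // plays the role of ddy(position)
        return Vector3.Normalize(Vector3.Cross(dy, dx));
    }
}
```

Every pixel inside one triangle lies on the same plane, so dx and dy are the same everywhere on that triangle and so is the resulting normal.  That is exactly the flat-shaded look described above.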

This led to approach #2:  in the depth-only pass, render to a RGBA16F surface instead of a R32F surface and render out depth + view-space normals as interpolated from the vertex normals.  This worked much better!  The only remaining issue (aside from the fact that I just hard-code a diffuse albedo and specular albedo), is that normal-maps aren’t used.  However even with those problems the results are still decent, as long as surface colors are primarily determined by your forward rendering pass and the local lights are just “extra”.  Here’s screenshots of a test scene with forward rendering, and then with the point lights deferred:

The results are clearly not as good as a full forward pass when you have them side-by-side, but I think they’re probably good enough…especially if I only use this technique for lights that are small or far-away.  The trick is going to be transferring smoothly from deferred to forward, but that’s certainly doable.

One downside that came with this was that since I was just additively blending in the lights, I couldn’t use my beloved LogLuv encoding for HDR.  My next-best option on the 360 was to normalize R10G10B10A2 to a range greater than [0,1].  I ended up having to normalize to [0,8] to get the dynamic range I wanted, and unfortunately this can give some visible banding in certain cases.  An alternative I’ll have to explore is rendering just the point lights to an R10G10B10A2 buffer, and then sending this to my forward rendering pass to be sampled and added to the result.  If I did this I could also use the light prepass approach, and gain back material parameters and proper MSAA for the point lights.
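The banding is easy to explain with a little arithmetic.  The sketch below is only an illustration of the idea (the class and method names are hypothetical, and it models a single 10-bit color channel on the CPU rather than anything the GPU actually runs): stretching 10 bits over [0,8] makes each step eight times larger than it would be over [0,1].

```csharp
using System;

public static class ScaledRgb10
{
    public const float MaxValue = 8.0f;           // top of the normalized range
    public const int Levels = 1023;               // 10 bits per color channel
    public const float Step = MaxValue / Levels;  // gap between adjacent codes (about 0.0078)

    // Maps a linear HDR channel value in [0, MaxValue] to a 10-bit code.
    public static int Encode(float value)
    {
        float normalized = Math.Max(0f, Math.Min(1f, value / MaxValue));
        return (int)Math.Round(normalized * Levels);
    }

    // Maps a 10-bit code back to a linear HDR channel value.
    public static float Decode(int code)
    {
        return code / (float)Levels * MaxValue;
    }
}
```

With a [0,1] range the step would be 1/1023 (about 0.00098), so dark gradients in particular have far fewer distinct values to work with once the range is stretched to 8.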

Anyway I’m not saying that what I’m doing is particularly interesting or useful, I’m just trying to demonstrate that there are many possibilities to explore.  It’s good to think outside the box every once in a while!

Deferred Cascaded Shadow Maps

For my next sample I was planning on extending my deferred shadow maps sample to implement cascaded shadow maps.  I got an email asking how to make the sample look decent with large viewing distances, which is exactly the problem cascaded shadow maps solve.  So I decided to move my plans up a little and get the code up and running.  It’ll be a while before I get the write-up finished, but until then feel free to play around with the code (PC and 360 projects included).

Profiling Events vs. Virtual Functions On The 360

Over the past week or so I’ve been completely reworking my collision system in order to better decouple it from other areas of code, and also make it more flexible.  One part I got stuck on for a bit was deciding on the mechanism to use for notifying owners of collision components when the component collides with something.  I narrowed it down to two options:

-notify owners via the ICollisionOwner interface I was using

OR

-use an Event

I was leaning more towards events because I felt their semantics naturally fit with the usage pattern I was working with.  If game entities want to be notified, they simply subscribe and they get notified.  This seemed cleaner and easier to understand than letting each collision component have some sort of “NotifyOwner” flag, and then call a virtual function if the flag was true.  However I was a little worried about performance…I hadn’t really used delegates on the 360 before and I wanted to make sure that the overhead wasn’t going to be something astronomical before proceeding. So I set up a simple test harness that vaguely resembled how I was going to use events:

using System.Collections.Generic;   // List<T>
using Microsoft.Xna.Framework;      // Vector3

public delegate void EventDelegate(object sender, ref Vector3 parameter);

public class EventServer
{
    public event EventDelegate SomeEvent;

    public void RaiseEvent()
    {
        Vector3 param = new Vector3();

        if (SomeEvent != null)
            SomeEvent(this, ref param);

        //for (int i = 0; i < Handlers.Count; i++)
        //{
        //    if (Handlers[i].HandlesEvent)
        //        Handlers[i].HandleEventVirtual(this, ref param);
        //}
    }

    public List<IEventHandler> Handlers = new List<IEventHandler>();
}

public interface IEventHandler
{
    void HandleEventVirtual(object sender, ref Vector3 parameter);
    bool HandlesEvent
    {
        get;
    }
}

public class EventHandler : IEventHandler
{
    EventServer server;
    bool handleEvent;

    public EventHandler(EventServer server, bool handleEvent)
    {
        this.server = server;
        this.handleEvent = handleEvent;  

        if (handleEvent)
            server.SomeEvent += new EventDelegate(HandleEvent);
    }

    void HandleEvent(object sender, ref Vector3 parameter)
    {
        parameter.Y += 0.001f;
    }

    public virtual void HandleEventVirtual(object sender, ref Vector3 parameter)
    {
        parameter.X += 0.001f;
    }

    public bool HandlesEvent
    {
        get { return handleEvent; }
    }
}

public class EventHandler2 : EventHandler
{
    public EventHandler2(EventServer server, bool handleEvent)
        : base(server, handleEvent)
    {
    }

    public override void HandleEventVirtual(object sender, ref Vector3 parameter)
    {
        base.HandleEventVirtual(sender, ref parameter);
        parameter.Normalize();
    }
}

Pretty simple setup: a class that will dole out events to a collection of handlers, with a derivative of the handler class also being thrown in just to make sure the compiler doesn’t do anything funky that will prevent us from actually getting virtual functions.  To test events we leave it like this, to test virtual functions we comment out the event invocation and use the virtual function call instead.  Any .NET junkies might notice I’ve violated the guidelines for creating custom event handlers by not using an EventArgs derivative…the reason why is because EventArgs is a class, so creating a new instance would generate garbage every time the event fires.  And as we all know…the GC is not our friend on the Xbox.

I set it up to run with various amounts of event handlers distributed across various amounts of event servers.  I then set up the game class to fire off all the event servers in the Update function and use a Stopwatch to time how long it took.  I also averaged the timing results across 64 frames to smooth out the results.  This is what I got:
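The timing side of the harness can be sketched roughly like this.  It is a reconstruction under assumptions rather than my actual test code: AveragedTimer and its members are hypothetical names, and the window size (64 in my runs) is passed in by the caller.

```csharp
using System;
using System.Diagnostics;

public class AveragedTimer
{
    private readonly Stopwatch watch = new Stopwatch();  // reused to avoid per-frame garbage
    private readonly long[] samples;                      // ring buffer of recent timings
    private int count;
    private int next;

    public AveragedTimer(int windowSize)
    {
        samples = new long[windowSize];
    }

    // Times one invocation of the action and records the elapsed ticks.
    public long Measure(Action action)
    {
        watch.Reset();
        watch.Start();
        action();
        watch.Stop();
        long ticks = watch.ElapsedTicks;
        AddSample(ticks);
        return ticks;
    }

    // Stores a timing, overwriting the oldest one once the window is full.
    public void AddSample(long ticks)
    {
        samples[next] = ticks;
        next = (next + 1) % samples.Length;
        if (count < samples.Length) count++;
    }

    public double AverageTicks
    {
        get
        {
            if (count == 0) return 0.0;
            long sum = 0;
            for (int i = 0; i < count; i++) sum += samples[i];
            return (double)sum / count;
        }
    }

    // Stopwatch ticks are platform dependent, so convert using Stopwatch.Frequency.
    public double AverageMilliseconds
    {
        get { return AverageTicks * 1000.0 / Stopwatch.Frequency; }
    }
}
```

In Update you would create one AveragedTimer(64) up front and call Measure with a delegate that loops over the event servers and calls RaiseEvent on each, then read AverageTicks once the window has filled.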

Handlers:Servers    Events (ticks)       Virtual functions (ticks)
50:1                9                    22
500:1               710                  220
5000:1              163000 (3.26ms)      2200
5000:10             18600                2200
5000:100            1000                 2200
5000:1000           820                  2200

The table shows the EventHandler:EventServer ratio, followed by the time taken for invocation (in ticks) using events and then using virtual functions.  The first few results are pretty interesting:  the virtual function method scales linearly with the number of handlers we have, while the time required for firing events climbs much faster than linearly (each 10x increase in handlers costs roughly 80x and then 230x the time).   The bottom half of the results is even more interesting: the time taken goes way down as we start to distribute the handlers more evenly across servers.  In fact it goes down so much, it becomes quicker than virtual functions!  Crazy.

Anyway I had my answer: events would be fine with my setup.  I can’t foresee any reason why more than one handler would subscribe to the same collision component, and even if one did the overhead is basically minuscule for the numbers I’ll be working with.  But it’s always fun to experiment, right?

Deferred Shadow Maps Sample

Got a new sample ready.  This one shows how you can defer shadow map calculations to a separate screen-space pass using a depth buffer.  Check it out on Ziggyware!


LogLuv Encoding for HDR

Okay so this marks the third time I’ve posted this blog entry somewhere.  What can I say…I like it!  I also think it’s something useful for just about anyone trying to do HDR on the 360 through XNA, and I’m hoping some people will stumble upon it.

Designing an effective and performant HDR implementation for my game’s engine was a step that was complicated a bit by a few of the quirks of running XNA on the Xbox 360.  As a quick refresher for those who aren’t experts on the subject, HDR is most commonly implemented by rendering the scene to a floating-point buffer and then performing a tone-mapping pass to bring the colors back into the visible range. Floating-point formats (like A16B16G16R16F, AKA HalfVector4) are used because their added precision and floating-point nature allow them to comfortably store linear RGB values in ranges beyond the [0,1] typically used for shader output to the backbuffer, which is crucial as HDR requires having data with a wide dynamic range. They’re also convenient, as this allows values to be stored in the same format they’re manipulated in the shaders. Newer GPUs also support full texture filtering and alpha-blending with fp surfaces, which removes the need for special-case handling of things like non-opaque geometry. However as with most things, what’s convenient is not always the best option. During planning, I came up with the following list of pros and cons for various types of HDR implementations:

Standard HDR, fp16 buffer
+Very easy to integrate (no special work needed for the shaders)
+Good precision
+Support for blending on SM3.0+ PC GPU’s
+Allows for HDR bloom effects
-Double the bandwidth and storage requirements of R8G8B8A8
-Weak support for multi-sampling on SM3.0 GPU’s (Nvidia NV40 and G70/G71 can’t do it)
-Hardware filtering not available on ATI SM2.0 and SM3.0 GPU’s
-No blending on the Xbox 360
-Requires double space in framebuffer on the 360, which increases the number of tiles needed

HDR with tone-mapping applied directly in the pixel shader (Valve-style)
+Doesn’t require output to an HDR format, no floating-point or encoding required
+Multi-sampling and blending are supported, even on old hardware
-Can’t do HDR bloom, since only an LDR image is available for post-processing
-Luminance can’t be calculated directly, need to use fancy techniques to estimate it
-Increases shader complexity and combinations

HDR using an encoded format
+Allows for a standard tone-mapping chain
+Allows for HDR bloom effects
+Most formats offer a very wide dynamic range
+Same bandwidth and storage as LDR rendering
+Certain formats allow for multi-sampling and/or linear filtering with reasonable quality
-Alpha-blending usually isn’t an option, since the alpha-channel is used by most formats
-Linear filtering and multisampling usually aren’t mathematically correct, although often the results are “good enough”
-Additional shader math needed for format conversions
-Adds complexity to shaders
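
To put rough numbers on the “double space in framebuffer” point for the 360, here is a back-of-the-envelope sketch.  The figures are assumptions on my part: 10 MiB of eDRAM, a 32-bit depth buffer, and a simple ceiling division for the tile count (the real tiling rules have alignment constraints this ignores).  The class and method names are made up for illustration.

```csharp
using System;

public static class FramebufferBudget
{
    // Assumption: the Xbox 360 GPU has 10 MiB of eDRAM for render targets.
    public const long EdramBytes = 10L * 1024 * 1024;

    // Total bytes for one color target plus one depth target.
    public static long BytesNeeded(int width, int height, int colorBytesPerPixel,
                                   int depthBytesPerPixel, int samples)
    {
        return (long)width * height * samples * (colorBytesPerPixel + depthBytesPerPixel);
    }

    // Rough tile count: how many eDRAM-sized chunks the targets are split into.
    public static int TilesNeeded(long bytes)
    {
        return (int)((bytes + EdramBytes - 1) / EdramBytes);
    }
}
```

At 1280x720 without multisampling, R8G8B8A8 plus depth comes to 7,372,800 bytes and fits in a single tile, while an fp16 color target plus depth comes to 11,059,200 bytes and spills into a second tile.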

My early prototyping used a standard tone-mapping chain and I didn’t want to ditch that, nor did I want to move away from what I was comfortable with.  This pretty much eliminated the second option for me off the bat…although I was unlikely to choose it anyway due to its other drawbacks (having nice HDR bloom was something I felt was an important part of the look I wanted for my game, and in my opinion Valve’s method doesn’t do a great job of determining average luminance).  When I tried out the first method I found that it worked as well as it always did on the PC (I’ve used it before), but on the 360 it was another story.  I’m not sure why exactly, but for some reason it simply does not like the HalfVector4 format.  Performance was terrible, I couldn’t blend, I got all kinds of strange rendering artifacts (entire lines of pixels missing), and I’d get bizarre exceptions if I enabled multisampling. Loads of fun, let me tell you.

This left me with option #3.  I wasn’t a fan of this approach initially, as my original design plan called for things to be simple and straightforward whenever possible.  I didn’t really want to have two versions of my material shaders to support encoding, nor did I want to integrate decoding into the other parts of the pipeline that needed it.  But unfortunately, I wasn’t really left with any other options after I found there were no plans to bring support for the 360’s special fp10 backbuffer format to XNA (which would have conveniently solved my problems on the 360).  So, I started doing my research.  Naturally the first place I looked was actual released commercial games.  Why?  Because usually when a technique is used in a shipped game, it means it’s been put through its paces and has been determined to actually be feasible and practical in a game environment.  Which of course naturally led me to consider NAO32.

NAO32 is a format that gained some fame in the dev community when ex-Ninja Theory programmer Marco Salvi shared some details on the technique over on the Beyond3D forums.  Used in the game Heavenly Sword, it allowed for multisampling to be used in conjunction with HDR on a platform (PS3) whose GPU didn’t support multisampling of floating-point surfaces (the RSX is heavily based on Nvidia G70).  In this technique, color is stored in the LogLuv format using a standard R8G8B8A8 surface.  Two components are used to store X and Y at 8-bit precision, and the other two are used to store the log of luminance at 16-bit precision.  Having 16 bits for luminance allows for a wide dynamic range to be stored in this format, and storing the log of the luminance allows for linear filtering in multisampling or texture sampling.  Since he first explained it other games have also used it, such as Naughty Dog’s Uncharted.  It’s likely that it’s been used in many other PS3 games, as well.
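To get a feel for what 16 bits of log luminance buys you, here is a small CPU-side sketch of just the luminance part.  It borrows the scale and bias from the encoding shader in this post (Le = 2 * log2(Y) + 127) and splits Le into a whole byte and a fractional byte; the class and method names are hypothetical, and the rounding is my assumption about how the values land in an 8-bit surface.

```csharp
using System;

public static class LogLuminance
{
    // Splits Le = 2 * log2(Y) + 127 into an integer byte and a fractional byte.
    public static void Pack(double luminance, out byte whole, out byte fraction)
    {
        double le = 2.0 * Math.Log(luminance, 2.0) + 127.0;
        le = Math.Max(0.0, Math.Min(256.0, le));
        double floorLe = Math.Min(255.0, Math.Floor(le));
        whole = (byte)floorLe;
        fraction = (byte)Math.Round((le - floorLe) * 255.0);
    }

    // Rebuilds Le from the two bytes and undoes the log encoding.
    public static double Unpack(byte whole, byte fraction)
    {
        double le = whole + fraction / 255.0;
        return Math.Pow(2.0, (le - 127.0) / 2.0);
    }
}
```

With Le running from 0 to 256 the representable luminance spans about 2^-63.5 to 2^64.5, and adjacent codes differ by a factor of 2^(1/510), or roughly 0.14%.  That constant relative step is what makes the format hold up across such a wide range.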

My actual shader implementation was helped along quite a bit by Christer Ericson’s blog post, which described how to derive optimized shader code for encoding RGB into the LogLuv format.  Using his code as a starting point, I came up with the following HLSL code for encoding and decoding:

// M matrix, for encoding
const static float3x3 M = float3x3(
    0.2209, 0.3390, 0.4184,
    0.1138, 0.6780, 0.7319,
    0.0102, 0.1130, 0.2969);

// Inverse M matrix, for decoding
const static float3x3 InverseM = float3x3(
    6.0013,    -2.700,    -1.7995,
    -1.332,    3.1029,    -5.7720,
    .3007,    -1.088,    5.6268);    

float4 LogLuvEncode(in float3 vRGB)
{
    float4 vResult;
    float3 Xp_Y_XYZp = mul(vRGB, M);
    Xp_Y_XYZp = max(Xp_Y_XYZp, float3(1e-6, 1e-6, 1e-6));
    vResult.xy = Xp_Y_XYZp.xy / Xp_Y_XYZp.z;
    float Le = 2 * log2(Xp_Y_XYZp.y) + 127;
    vResult.w = frac(Le);
    vResult.z = (Le - (floor(vResult.w*255.0f))/255.0f)/255.0f;
    return vResult;
}

float3 LogLuvDecode(in float4 vLogLuv)
{
    float Le = vLogLuv.z * 255 + vLogLuv.w;
    float3 Xp_Y_XYZp;
    Xp_Y_XYZp.y = exp2((Le - 127) / 2);
    Xp_Y_XYZp.z = Xp_Y_XYZp.y / vLogLuv.y;
    Xp_Y_XYZp.x = vLogLuv.x * Xp_Y_XYZp.z;
    float3 vRGB = mul(Xp_Y_XYZp, InverseM);
    return max(vRGB, 0);
}
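
If you want to sanity-check the shader math without a GPU, a CPU port is handy.  The following is a rough C# transliteration of the two functions above, not code from my engine: it uses System.Numerics vectors as a stand-in for XNA’s types, the class name is made up, and Quantize8 is my assumption about how values are rounded when written to an R8G8B8A8 surface.

```csharp
using System;
using System.Numerics;

public static class LogLuvCpu
{
    // Rounds a [0,1] value to the nearest 8-bit level, mimicking a write to R8G8B8A8.
    static float Quantize8(float v)
    {
        float clamped = Math.Max(0f, Math.Min(1f, v));
        return (float)Math.Round(clamped * 255.0) / 255f;
    }

    public static Vector4 Encode(Vector3 rgb)
    {
        // Row vector times M, matching mul(vRGB, M) in the HLSL.
        float xp = rgb.X * 0.2209f + rgb.Y * 0.1138f + rgb.Z * 0.0102f;
        float y  = rgb.X * 0.3390f + rgb.Y * 0.6780f + rgb.Z * 0.1130f;
        float s  = rgb.X * 0.4184f + rgb.Y * 0.7319f + rgb.Z * 0.2969f;
        xp = Math.Max(xp, 1e-6f);
        y = Math.Max(y, 1e-6f);
        s = Math.Max(s, 1e-6f);

        float le = 2f * (float)Math.Log(y, 2.0) + 127f;
        float w = le - (float)Math.Floor(le);
        float z = (le - (float)Math.Floor(w * 255f) / 255f) / 255f;
        return new Vector4(Quantize8(xp / s), Quantize8(y / s), Quantize8(z), Quantize8(w));
    }

    public static Vector3 Decode(Vector4 logLuv)
    {
        float le = logLuv.Z * 255f + logLuv.W;
        float y = (float)Math.Pow(2.0, (le - 127f) / 2f);
        float s = y / logLuv.Y;
        float xp = logLuv.X * s;

        // Row vector times InverseM, matching mul(Xp_Y_XYZp, InverseM).
        float r = xp * 6.0013f  + y * -1.332f  + s * 0.3007f;
        float g = xp * -2.700f  + y * 3.1029f  + s * -1.088f;
        float b = xp * -1.7995f + y * -5.7720f + s * 5.6268f;
        return Vector3.Max(new Vector3(r, g, b), Vector3.Zero);
    }
}
```

Encoding a color and decoding it again should land close to the input, with small errors that come mostly from the 8-bit chromaticity channels.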

Once I had this implemented and had worked through a few small glitches, results were much improved in the 360 version of my game. Performance was much, much better, I could multi-sample again, and the results looked great. So while things didn’t exactly work out in an ideal way, I’m pleased enough with the results.

If you’re interested in this, be sure to check out my sample