HLSL User Defined Language for Notepad++

When it comes to writing shaders, Notepad++ is currently my editor of choice. The most recent release of Notepad++ added version 2.0 of their User Defined Language (UDL) system, which adds quite a few improvements. I’ve been using an HLSL UDL file that I downloaded from somewhere else for a while now, and I decided to upgrade it to the 2.0 format and also make it work better for SM5.0 profiles. I added all of the operators, keywords, types, attributes, system-value semantics, intrinsics, and methods, so they all get syntax highlighting now. I also stripped out all of the old pre-SM4.0 intrinsics and semantics, as well as the effect-specific keywords. I’ve exported it as an XML file and uploaded it to my Google Drive so that others can make use of it as well. To use it, you can either import the XML file from the UDL dialog (Language->Define your language), or you can replace your userDefineLang.xml file in the AppData\Notepad++ folder. Enjoy!

Experimenting with Reconstruction Filters for MSAA Resolve

Previous article in the series: A Quick Overview of MSAA

Despite having the flexibility to implement a custom resolve for MSAA, the “standard” box filter resolve is still commonly used in games. While the box filter works well enough, it has some characteristics that can be considered undesirable for a reconstruction filter. In particular, the box function has a discontinuity at its edge that results in an infinite frequency response (the frequency domain equivalent of a box function is the sinc function). This causes it to introduce postaliasing when used as a reconstruction filter, since the filter is unable to isolate the original copy of a signal’s spectrum. The primary advantage offered by such a resolve is that it’s cheap from a performance point of view, since only subsamples within a single pixel need to be considered when computing a resolved pixel value.

The question we now want to answer is “can we do better?” Offline renderers such as Pixar’s PRMan support a variety of filter types for antialiasing, so it stands to reason that we should at least explore the possibilities for real-time graphics. If we decide to forgo the “standard” resolve offered by ResolveSubresource and instead perform our own resolve using a pixel or compute shader that accesses the raw multisampled texture data, we are pretty much free to implement whatever reconstruction filter we’d like. So there is certainly no concern over lack of flexibility. Performance, however, is still an issue. Every GPU that I’ve run my code on will perform worse with a custom resolve, even when using a simple box filter with results that exactly match a standard resolve. Currently the performance delta seems to be worse on AMD hardware as opposed to Nvidia hardware. On top of that, there are additional costs for the increased texture samples required for wider filter kernels. Separable filtering can be used to reduce the number of samples required for wide filters, however you must have special considerations with the rotated grid sample patterns used by MSAA. Unfortunately I haven’t solved these problems yet, so for this sample I’m just going to focus on quality without too much regard for performance. Hopefully in a future article I can revisit this, and address the performance issues.
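To make the idea of a custom resolve a bit more concrete, here is a rough CPU-side sketch in Python. It is not the sample's shader code: the function names are made up for illustration, the images are plain nested lists of scalar values, and the subsample offsets are assumed to be the standard D3D 4x pattern expressed in pixel units. It loops over the neighboring pixels that fall under the filter footprint, weights each subsample by a radial filter function of its distance from the output pixel center, and normalizes by the sum of the weights.

```python
import math

# Assumed 4x MSAA subsample offsets from the pixel center, in pixel units
# (the standard D3D 4x pattern; other hardware/API patterns may differ).
SAMPLE_OFFSETS_4X = [(-0.125, -0.375), (0.375, -0.125),
                     (-0.375, 0.125), (0.125, 0.375)]

def box_filter(distance, filter_width):
    """Radial box (disc) filter: 1 inside half the filter width, else 0."""
    return 1.0 if distance <= 0.5 * filter_width else 0.0

def triangle_filter(distance, filter_width):
    """Radial triangle (cone) filter that falls to 0 at half the filter width."""
    return max(0.0, 1.0 - distance / (0.5 * filter_width))

def resolve_pixel(subsamples, px, py, filter_fn, filter_width):
    """Resolve one output pixel from a 2D grid of per-pixel subsample lists.

    subsamples[y][x] is a list of scalar values, one per subsample.
    """
    height, width = len(subsamples), len(subsamples[0])
    radius = int(math.ceil(0.5 * filter_width))  # neighboring pixels to visit
    total, weight_sum = 0.0, 0.0
    for y in range(max(0, py - radius), min(height, py + radius + 1)):
        for x in range(max(0, px - radius), min(width, px + radius + 1)):
            for (ox, oy), value in zip(SAMPLE_OFFSETS_4X, subsamples[y][x]):
                # Distance from the output pixel center to this subsample.
                dx = (x - px) + ox
                dy = (y - py) + oy
                w = filter_fn(math.hypot(dx, dy), filter_width)
                total += w * value
                weight_sum += w
    # Normalize so the weights sum to 1 and overall brightness is preserved.
    return total / weight_sum if weight_sum > 0.0 else 0.0
```

With `box_filter` and a filter width of 1.0, only the four subsamples of the output pixel itself receive a non-zero weight under these assumed offsets, so the result is a plain average. Wider filters pull in subsamples from neighboring pixels, which is where the extra texture fetches come from.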

At this point I feel that I should bring up TXAA. If you’re not familiar, TXAA is a library-supported antialiasing technique introduced for recent Nvidia Kepler-based GPU’s. There’s no public documentation as to exactly how it works, but Timothy Lottes has mentioned a few details here and there on his blog. From the info he’s given, it seems safe to assume that the MSAA resolve used by TXAA is something other than a box filter, and is using a filter width wider than a pixel. Based on these assumptions, you should be able to produce similar results with the framework that I’ve set up.

Implementation

The sample application renders my trusty tank scene in HDR to an fp16 render target with either 1x, 2x, 4x, or 8x MSAA enabled. Once the scene is rendered, a pixel shader is used to resolve the MSAA render target to a non-MSAA render target with one of ten available reconstruction filters. I implemented the following filters:

Box
Triangle
Gaussian
Blackman-Harris
Smoothstep (Hermite spline)
B-spline
Catmull-Rom
Mitchell
Generalized Cubic
Sinc

I started out by implementing some of the filters supported by PRMan, and using similar parameters for controlling the filtering. However I ended up deviating from the PRMan setup in order to make things more intuitive (in my opinion, at least). All filters except for the sinc filter were implemented such that their “natural range” of non-zero values is [-0.5, 0.5]. This deviates from the canonical filter widths for several of these filters, notably the cubic filters (which are normally defined for the [-2, 2] range). I then used a “filter width” parameter to inversely scale the inputs to the filtering functions. So for a filter width of 1.0, the filters all have a width equal to the size of a single resolved pixel. The one exception is the sinc filter, where I used the filter width to window the function rather than scaling the input value.

I should also note that I implemented all of the filters as radial filters, where the input is the screen-space distance from the output pixel center to the sample position. Typically filters for image scaling are used in separable passes where 1D filters are passed the X or Y sample distance. Because of this my “Box” filter is actually disc-shaped, but it produces very similar results. In fact for a filter width of 1.0 the results are identical to a “standard” box filter resolve.

The “Triangle” filter uses a standard triangle function, which can be considered a “cone” function when used as a radial filter. “Gaussian” uses a standard Gaussian function with a configurable sigma parameter, with the result windowed to [-0.5, 0.5]. The “Smoothstep” filter simply uses the smoothstep intrinsic available in HLSL, which implements a cubic Hermite spline. The “Generalized Cubic” filter is an implementation of the cubic function suggested by Mitchell and Netravali in their paper, with the B and C parameters being tweakable by the user. The “B-spline”, “Mitchell” and “Catmull-Rom” filters use this same function except with fixed values for B and C. “Sinc” is the standard sinc function, windowed to [-FilterWidth, FilterWidth] as mentioned previously.
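For reference, the Mitchell-Netravali cubic and the rescaling described above can be sketched as follows. This is an illustrative Python version, not the sample's HLSL. The (B, C) presets are the commonly cited values for each named filter, which I'm assuming match the ones used in the sample, and `scaled_cubic` is a hypothetical helper showing one way to map the [-0.5, 0.5] natural range onto the canonical [-2, 2] range.

```python
def cubic_filter(x, b, c):
    """Mitchell-Netravali cubic, defined on its canonical [-2, 2] range."""
    x = abs(x)
    if x < 1.0:
        y = (12 - 9 * b - 6 * c) * x**3 + (-18 + 12 * b + 6 * c) * x**2 + (6 - 2 * b)
    elif x < 2.0:
        y = ((-b - 6 * c) * x**3 + (6 * b + 30 * c) * x**2
             + (-12 * b - 48 * c) * x + (8 * b + 24 * c))
    else:
        y = 0.0
    return y / 6.0

# Commonly cited (B, C) pairs; assumed here, not taken from the sample code.
CUBIC_PRESETS = {
    "B-spline": (1.0, 0.0),
    "Catmull-Rom": (0.0, 0.5),
    "Mitchell": (1.0 / 3.0, 1.0 / 3.0),
}

def scaled_cubic(distance, filter_width, b, c):
    """Evaluate the cubic for a pixel-space distance and a filter width.

    At filter_width == 1.0 the non-zero range is [-0.5, 0.5], so the
    distance is multiplied by 4 to reach the canonical [-2, 2] range.
    """
    return cubic_filter(4.0 * distance / filter_width, b, c)
```

With these presets the B-spline never goes negative, while Catmull-Rom and Mitchell dip below zero between 1 and 2 on the canonical range. Those are the negative lobes that sharpen the result but can also cause ringing.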

To visualize the filtering function, I added a real-time 1D plot of the currently-selected filter function using the current filter width. I also added a plot of the 1D Fourier transform of the filter function (calculated with the help of the awesomely easy-to-integrate Kiss FFT library), so that you can also visualize the frequency response of the selected filter type. This can be useful for estimating the amount of postaliasing produced by a filter, as well as the attenuation of frequencies below the Nyquist rate (which results in blurring).
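If you'd like to reproduce that kind of frequency plot outside of the sample app, a naive discrete Fourier transform is enough for a small 1D kernel. The Python sketch below does not use Kiss FFT. It uses a plain O(N^2) DFT, and the sampling extent, sample count, and helper names are all arbitrary choices made for illustration.

```python
import cmath
import math

def sample_filter(filter_fn, filter_width, num_samples=64, extent=4.0):
    """Sample a 1D filter function on [-extent, extent) in pixel units."""
    step = 2.0 * extent / num_samples
    return [filter_fn(abs(-extent + i * step), filter_width)
            for i in range(num_samples)]

def dft_magnitude(samples):
    """Naive O(N^2) discrete Fourier transform; returns |X[k]| per bin."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples)))
            for k in range(n)]

def box(distance, filter_width):
    return 1.0 if distance <= 0.5 * filter_width else 0.0

def triangle(distance, filter_width):
    return max(0.0, 1.0 - distance / (0.5 * filter_width))

# Magnitude spectra of a 2-pixel-wide box and triangle filter.
box_spectrum = dft_magnitude(sample_filter(box, 2.0))
tri_spectrum = dft_magnitude(sample_filter(triangle, 2.0))
```

After normalizing each spectrum by its DC bin, the side lobes of the box filter come out noticeably larger than those of the triangle filter. That is the behavior you'd expect from a sinc-shaped response versus a squared-sinc-shaped one.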

After the resolve is performed, the result is fed into a standard post-processing chain. This phase includes average luminance calculation for auto-exposure, bloom, and HDR tone mapping. I added an option to tone map subsamples in a manner similar to Humus’s sample, so that the results can be compared to resolve prior to tone mapping. When this option is activated, the bloom and auto-exposure passes work with non-resolved MSAA textures since the output of the resolve no longer contains linear HDR values. Note that the resolve is still performed prior to post-processing, since I wanted to keep the resolve separate from the post-processing phase so that it was more visible. In production it would most likely be done after all post-processing, however you would still need the same considerations regarding working with non-resolved MSAA data.

Here’s a full list of all options that I implemented:

MSAA Mode – the number of MSAA samples to use for the primary render target (1x, 2x, 4x, or 8x)
Filter Type – the filtering function to use in the resolve step (supports all of the filters listed above)
Use Standard Resolve – when enabled, a “standard” box filter resolve is performed using ResolveSubresource
Tone Map Subsamples – when enabled, tone mapping is applied before the subsamples are resolved
Enable FXAA – enables or disables FXAA with high-quality PC settings
Render Triangle – renders a plain red triangle in the center of the screen

Bloom Exposure – an exposure (in log2 space) applied to HDR values in order to create the bloom source values
Bloom Magnitude – a multiplier for the bloom value that’s combined with the tone mapped result
Auto-Exposure Key Value – key value for controlling auto-exposure
Adaptation Rate – rate at which exposure is adapted over time
Roughness – roughness used for material specular calculations
Filter Size – the radius of the filter kernel (in pixels) used during the resolve step
Gaussian Sigma – the sigma parameter for the Gaussian function, used by the Gaussian filter mode
Cubic B – the “B” parameter to Mitchell’s generalized cubic function, used by the Generalized Cubic filter mode
Cubic C – the “C” parameter to Mitchell’s generalized cubic function, used by the Generalized Cubic filter mode
Magnification – magnification level for the final output (magnification is performed with point filtering)
Triangle Rotation Speed – the speed at which the red triangle (enabled by Render Triangle) is rotated

Results

The following table contains links to 1280×720 screenshots from my sample application using various filter types and filter widths. All screenshots use 4xMSAA, and perform the resolve in linear HDR space (bloom and tone mapping are performed after):

Box 1.0 2.0 3.0 4.0 5.0 6.0
Triangle 1.0 2.0 3.0 4.0 5.0 6.0
Gaussian 1.0 2.0 3.0 4.0 5.0 6.0
Blackman-Harris 1.0 2.0 3.0 4.0 5.0 6.0
Smoothstep 1.0 2.0 3.0 4.0 5.0 6.0
B-spline 1.0 2.0 3.0 4.0 5.0 6.0
Catmull-Rom 1.0 2.0 3.0 4.0 5.0 6.0
Mitchell 1.0 2.0 3.0 4.0 5.0 6.0
Sinc 1.0 2.0 3.0 4.0 5.0 6.0

This table contains similar screenshots, except that the tone mapping is performed prior to the resolve:

Box 1.0 2.0 3.0 4.0 5.0 6.0
Triangle 1.0 2.0 3.0 4.0 5.0 6.0
Gaussian 1.0 2.0 3.0 4.0 5.0 6.0
Blackman-Harris 1.0 2.0 3.0 4.0 5.0 6.0
Smoothstep 1.0 2.0 3.0 4.0 5.0 6.0
B-spline 1.0 2.0 3.0 4.0 5.0 6.0
Catmull-Rom 1.0 2.0 3.0 4.0 5.0 6.0
Mitchell 1.0 2.0 3.0 4.0 5.0 6.0
Sinc 1.0 2.0 3.0 4.0 5.0 6.0

As we take a close look at the images, the results shouldn’t be too surprising. For the most part, wider filter kernels tend to reduce aliasing while smaller filters preserve more high-frequency detail. Personally I find that the cubic spline filters with no negative lobes (smoothstep and B-spline) will produce the best results, with the best balance between aliasing and blurring occurring around the 2.0-3.0 range. Here is a magnified image showing the results of 4xMSAA with a standard 1-pixel-wide box filter, followed by the same image with a 3-pixel-wide B-spline filter:

4xMSAA with a “standard” 1-pixel-wide box filter, followed by 4xMSAA with a 3-pixel-wide B-spline filter

The aliasing is pretty significantly reduced on geometry edges with the B-spline filter, particularly edges with higher contrast. Here’s another pair of images that are magnified even further, so that you can see the edge quality:

Highly magnified images showing 4xMSAA with a 1-pixel-wide box filter, followed by a 3-pixel-wide B-spline filter

Here’s another set of images showing the results of a wider filter kernel on high-frequency details from normal maps and specular lighting:

4xMSAA with a 1-pixel-wide box filter, with a 2-pixel-wide B-spline filter, and with a 3-pixel-wide B-spline filter

As you can see in the images, a 2-pixel-wide B-spline filter is actually pretty good in terms of not attenuating details that are close to a pixel in size. A wider filter reduces aliasing even further, but I feel that a filter width of 2.0 is still an improvement over the quality offered by a “standard” resolve. So it’s probably a pretty good place to start if you want better quality, but you prefer a sharper output image. The other cubic filters with negative lobes (such as Catmull-Rom and Mitchell) will also produce a sharper result, however the negative lobes can produce undesirable artifacts if they’re too strong. This is especially true when filtering HDR values with high intensity, since they can have a strong effect on neighboring pixels. For this reason I think that Mitchell is a better option over Catmull-Rom, since Mitchell’s negative lobes are a bit less pronounced. The sinc filter is almost totally unappealing for an MSAA resolve, since the ringing artifacts that it produces are very prominent. Here are three images comparing a 4-pixel-wide Catmull-Rom filter, a 4-pixel-wide Mitchell filter, and a 6-pixel-wide sinc filter:

All of the above images used 4xMSAA, but a wider filter kernel can also work well for 2xMSAA. Here are some close-ups showing “normal” 2xMSAA vs. 2xMSAA with a B-spline filter vs. 4xMSAA with a B-spline filter:

2xMSAA with standard resolve, 2xMSAA with a 3-pixel-wide B-spline filter, and 4xMSAA with a 3-pixel-wide B-spline filter

To wrap things up, here are two close-ups showing the results of a wide B-spline filter applied before tone mapping, and the same image with tone mapping applied prior to filtering:

4xMSAA with a 3-pixel wide B-spline filter applied before tone mapping, and after tone mapping

These results are a little interesting since they illustrate the differences in the two different approaches. In the first image the filtering is performed with HDR values, so you get similar effects to applying DOF or motion blur in HDR where bright values can dominate their local neighborhood. The second image shows quite a different result, where the darker geometry actually ends up looking “thicker” against the bright blue sky. In general I don’t find that it produces a substantial improvement when you’re already using a wider filter kernel, or at least not enough to justify the extra effort and performance required to make it work with an HDR post-processing pipeline. However it does tend to play nicer with cubic filters that have negative lobes, since you’re not filtering HDR values with arbitrary intensity.

Conclusions

There are clearly a lot of options available to you if you choose to implement a custom MSAA resolve. I think there are some good opportunities here to do an even better job at reducing aliasing in games, and personally I’m of the opinion that it’s worth reducing the appearance of tiny pixel-wide details if it results in an overall cleaner image. Either way I don’t think that a box filter is the best choice no matter what your tastes are.

If you want to download the sample app with source code, it is available on my codeplex page. Feel free to download it and perform your own experiments, although I’d appreciate if you’d share your results and opinions!

Some quick notes about the sample code: I decided to use the newer DirectX headers and libraries from the Windows 8 SDK, so you’ll need to have it installed if you want to compile the project. I haven’t fully migrated to VS 2012 yet, so I’ve left the project in VS 2010 format so that it can be opened by either. I also overhauled a lot of my sample framework code, which includes a shader caching system that uses the totally-awesome MurmurHash to detect when a shader needs to be re-compiled. Using the new SDK also entailed ditching D3DX, so I’ve replaced the texture and mesh loading functionality that I was using with some of my own code combined with some code lifted from DirectXTK. One major downside I should mention is that I’m only supporting x64 for now, due to some annoyances with the SDK and redistributing D3DCompiler DLL’s. If anybody is still stuck on 32-bit Windows and really wants to run the sample, let me know and I’ll try to find some time to get it working.

A Quick Overview of MSAA

Previous article in the series: Applying Sampling Theory to Real-Time Graphics

MSAA can be a bit complicated, due to the fact that it affects nearly the entire rasterization pipeline used in GPU’s. It’s also complicated because really understanding why it works requires at least a basic understanding of signal processing and image resampling. With that in mind I wanted to provide a quick overview of how MSAA works on a GPU, in order to provide some background material for the following article where we’ll experiment with MSAA resolves. Like the previous article on signal processing, feel free to skip if you’re already an expert. Or better yet, read through it and correct my mistakes!

Rasterization Basics

A modern D3D11-capable GPU features hardware-supported rendering of point, line, and triangle primitives through rasterization. The rasterization pipeline on a GPU takes as input the vertices of the primitive being rendered, with vertex positions provided in the homogeneous clip space produced by transformation by some projection matrix. These positions are used to determine the set of pixels in the current render target where the primitive will be visible. This visible set is determined from two things: coverage, and occlusion. Coverage is determined by performing some test to determine if the primitive overlaps a given pixel. In GPU’s, coverage is calculated by testing if the primitive overlaps a single sample point located in the exact center of each pixel 1. The following image demonstrates this process for a single triangle:

Coverage being calculated for a rasterized triangle. The blue circles represent a grid of sample points, each located at the center of a pixel. The red circles represent sample points covered by the triangle.
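The coverage test itself can be sketched with edge functions. The Python below is only a toy CPU model. The function names are made up, the triangle is given directly in pixel coordinates rather than clip space, and it ignores the fill rules that real hardware uses to decide what happens when a sample lands exactly on an edge.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of the directed edge a->b the point p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def covers(tri, px, py):
    """True if point (px, py) is inside the triangle (either winding order)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    e0 = edge(x0, y0, x1, y1, px, py)
    e1 = edge(x1, y1, x2, y2, px, py)
    e2 = edge(x2, y2, x0, y0, px, py)
    return ((e0 >= 0 and e1 >= 0 and e2 >= 0) or
            (e0 <= 0 and e1 <= 0 and e2 <= 0))

def rasterize_coverage(tri, width, height):
    """Return a grid of booleans, sampling once at each pixel center."""
    return [[covers(tri, x + 0.5, y + 0.5) for x in range(width)]
            for y in range(height)]

# Example: a right triangle rasterized into a 4x4 pixel grid.
grid = rasterize_coverage([(0.0, 0.0), (4.5, 0.0), (0.0, 4.5)], 4, 4)
```

Each pixel gets a single yes/no answer based on its center point, which is why a non-MSAA triangle edge turns into a hard staircase.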

Occlusion tells us whether a pixel covered by a primitive is also covered by any other triangles, and is handled by z-buffering in GPU’s. A z-buffer, or depth buffer, stores the depth of the closest primitive relative to the camera at each pixel location. When a primitive is rasterized, its interpolated depth is compared against the value in the depth buffer to determine whether or not the pixel is occluded. If the depth test succeeds, the appropriate pixel in the depth buffer is updated with the new closest depth. One thing to note about the depth test is that while it is often shown as occurring after pixel shading, almost all modern hardware can execute some form of the depth test before shading occurs. This is done as an optimization, so that occluded pixels can skip pixel shading. GPU’s still support performing the depth test after pixel shading in order to handle certain cases where an early depth test would produce incorrect results. One such case is where the pixel shader manually specifies a depth value, since the depth of the primitive isn’t known until the pixel shader runs.
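As a minimal illustration of the depth test, here is a Python sketch. It assumes a "less than" comparison with smaller values being closer to the camera and a buffer cleared to 1.0. Those are common conventions but not the only ones, and the function names are hypothetical.

```python
def make_targets(width, height, clear_color=0.0, far_depth=1.0):
    """Create a color buffer and a depth buffer cleared to the far plane."""
    color = [[clear_color] * width for _ in range(height)]
    depth = [[far_depth] * width for _ in range(height)]
    return color, depth

def draw_fragment(depth_buffer, color_buffer, x, y, depth, color):
    """Write color only if this fragment is closer than what is stored."""
    if depth < depth_buffer[y][x]:
        depth_buffer[y][x] = depth   # new closest depth
        color_buffer[y][x] = color
        return True
    return False                     # occluded: nothing is written

color, depth = make_targets(2, 2)
first = draw_fragment(depth, color, 0, 0, 0.5, 1.0)   # closer than the clear value
second = draw_fragment(depth, color, 0, 0, 0.8, 2.0)  # behind the first fragment
```

Because the test in this sketch needs only the depth and not the final color, it can run before the color is computed. That is the early depth test optimization mentioned above.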

Together, coverage and occlusion tell us the visibility of a primitive. Since visibility can be defined as a 2D function of X and Y, we can treat it as a signal and define its behavior in terms of concepts from signal processing. For instance, since coverage and depth testing are performed at each pixel location in the render target, the visibility sampling rate is determined by the X and Y resolution of that render target. We should also note that triangles and lines will inherently have discontinuities, which means that the signal is not bandlimited and thus no sampling rate will be adequate to avoid aliasing in the general case.

Oversampling and Supersampling

While it’s generally impossible to completely avoid aliasing of an arbitrary signal with infinite frequency, we can still reduce the appearance of aliasing artifacts through a process known as oversampling. Oversampling is the process of sampling a signal at some rate that’s higher than our intended final output, and then reconstructing and resampling the signal again at the output sample rate. As you’ll recall from the first article, sampling at a higher rate causes the clones of a signal’s spectrum to be further apart. This results in less of the higher-frequency components leaking into the reconstructed version of the signal, which in the case of an image means a reduction in the appearance of aliasing artifacts.

When applied to graphics and 2D images we call this supersampling, often abbreviated as SSAA. Implementing it in a 3D rasterizer is trivial: render to some resolution higher than the screen, and then downsample to screen resolution using a reconstruction filter. The following image shows the results of various supersampling patterns applied to a rasterized triangle:

Supersampling applied to a rasterized triangle, using various sub-pixel patterns. Notice how aliasing is reduced as the sample rate increases, even though the number of pixels is the same in all cases. Image from Real-Time Rendering, 3rd Edition, A K Peters 2008
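In code, ordered-grid supersampling with a box filter downsample is only a few lines. This Python sketch uses a hypothetical `half_plane` function as a stand-in for "the scene", and evaluates it on a regular sub-pixel grid before averaging down to one value per output pixel.

```python
def supersample(shade, width, height, factor):
    """Evaluate shade(x, y) on a factor x factor grid per pixel, then average."""
    image = []
    for py in range(height):
        row = []
        for px in range(width):
            total = 0.0
            for sy in range(factor):
                for sx in range(factor):
                    # Sample at the center of each sub-pixel cell.
                    x = px + (sx + 0.5) / factor
                    y = py + (sy + 0.5) / factor
                    total += shade(x, y)
            row.append(total / (factor * factor))
        image.append(row)
    return image

def half_plane(x, y):
    """Hypothetical scene: white to the left of a slanted edge, black to the right."""
    return 1.0 if x < 1.0 + 0.25 * y else 0.0

no_aa = supersample(half_plane, 3, 1, 1)   # one sample per pixel
ssaa = supersample(half_plane, 3, 1, 4)    # 16 samples per pixel
```

With one sample per pixel every pixel is either fully white or fully black. With 16 samples, the pixel containing the edge ends up with an intermediate value proportional to how many of its samples landed on the white side. Note that `shade` is called for every sub-pixel sample, which is exactly the cost problem discussed next.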

The simplicity and effectiveness of supersampling resulted in it being offered as a driver option for many early GPU’s. The problem, however, is performance. When the resolution of the render target is increased, the sampling rate of visibility increases. However since the execution of the pixel shader is also tied to the resolution of the pixels, the pixel shading rate would also increase. This meant that any work performed in the pixel shader, such as lighting or texture fetches, would be performed at a higher rate and thus consume more resources. The same goes for bandwidth used when writing the results of the pixel shader to the render target, since the write (and blending, if enabled) is performed for each pixel. Memory consumption is also increased, since the render target and corresponding z buffer must be larger in size. Because of these adverse performance characteristics, supersampling was mostly relegated to a high-end feature for GPU’s with spare cycles to burn.

Supersampling Evolves into MSAA

So we’ve established that supersampling works in principle for reducing aliasing in 3D graphics, but that it’s also prohibitively expensive. In order to keep most of the benefit of supersampling without breaking the bank in terms of performance, we can observe that aliasing of the triangle visibility function (AKA geometric aliasing) only occurs at the edges of rasterized triangles. If we hopped into a time machine and traveled back to 2001, we would also observe that pixel shading mostly consists of texture fetches and thus doesn’t suffer from aliasing (due to mipmaps). These observations would lead us to conclude that geometric aliasing is the primary form of aliasing for games, and should be our main focus. This conclusion is what caused MSAA to be born.

In terms of rasterization, MSAA works in a similar manner to supersampling. The coverage and occlusion tests are both performed at higher-than-normal resolution, which is typically 2x through 8x. For coverage, the hardware implements this by having N sample points within a pixel, where N is the multisample rate. These samples are known as subsamples, since they are sub-pixel samples. The following image shows the subsample placement for a typical 4x MSAA rotated grid pattern:

Typical MSAA 4x Sample Pattern

The triangle is tested for coverage at each of the N sample points, essentially building a bitwise coverage mask representing the portion of the pixel covered by a triangle 2. For occlusion testing, the triangle depth is interpolated at each covered sample point and tested against the depth value in the z buffer. Since the depth test is performed for each subsample and not for each pixel, the size of the depth buffer must be augmented to store the additional depth values. In practice this means that the depth buffer will be N times the size of the non-MSAA case. So for 2xMSAA the depth buffer will be twice the size, for 4x it will be four times the size, and so on.

Where MSAA begins to differ from supersampling is when the pixel shader is executed. In the standard MSAA case, the pixel shader is not executed for each subsample. Instead, the pixel shader is executed only once for each pixel where the triangle covers at least one subsample. Or in other words, it is executed once for each pixel where the coverage mask is non-zero. At this point pixel shading occurs in the same manner as non-MSAA rendering: the vertex attributes are interpolated to the center of the pixel and used by the pixel shader to fetch textures and perform lighting calculations. This means that the pixel shader cost does not increase substantially when MSAA is enabled, which is the primary benefit of MSAA over supersampling.

Although we only execute the pixel shader once per covered pixel, it is not sufficient to store only one output value per pixel in the render target. We need the render target to support storing multiple samples, so that we can store the results from multiple triangles that may have partially covered a single pixel. Therefore an MSAA render target will have enough memory to store N subsamples for each pixel. This is conceptually similar to an MSAA z buffer, which also has enough memory to store N subsamples. Each subsample in the render target is mapped to one of the subsample points used during rasterization to determine coverage. When a pixel shader outputs its value, the value is only written to subsamples where both the coverage test and the depth test passed for that pixel. So if a triangle covers half the sample points in a 4x sample pattern, then half of the subsamples in the render target receive the pixel shader output value. Or if all of the sample points are covered, then all of the subsamples receive the output value. The following image demonstrates this concept:

Results from non-MSAA and 4x MSAA rendering when a triangle partially covers a pixel. Image from Real-Time Rendering, 3rd Edition
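Putting the last few paragraphs together, a toy model of MSAA rasterization might look like the Python below. It is a sketch under several simplifying assumptions: the subsample offsets are the standard D3D 4x pattern, each triangle has a single constant depth instead of a depth interpolated per subsample, the depth test runs before shading, and all of the names are made up for illustration.

```python
SAMPLE_OFFSETS_4X = [(-0.125, -0.375), (0.375, -0.125),
                     (-0.375, 0.125), (0.125, 0.375)]

def inside(tri, px, py):
    """Edge-function test for a point against a triangle (either winding)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    e0 = (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)
    e1 = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
    e2 = (x0 - x2) * (py - y2) - (y0 - y2) * (px - x2)
    return ((e0 >= 0 and e1 >= 0 and e2 >= 0) or
            (e0 <= 0 and e1 <= 0 and e2 <= 0))

def make_buffers(width, height, clear_color=0.0, clear_depth=1.0):
    """Color and depth buffers that store one value per subsample."""
    n = len(SAMPLE_OFFSETS_4X)
    color = [[[clear_color] * n for _ in range(width)] for _ in range(height)]
    depth = [[[clear_depth] * n for _ in range(width)] for _ in range(height)]
    return color, depth

def draw_triangle_msaa(tri, depth, shade, color_samples, depth_samples):
    """Rasterize one constant-depth triangle; returns pixel shader invocations."""
    invocations = 0
    for y, row in enumerate(color_samples):
        for x in range(len(row)):
            cx, cy = x + 0.5, y + 0.5
            # Coverage and depth are tested at every subsample position.
            passed = [i for i, (ox, oy) in enumerate(SAMPLE_OFFSETS_4X)
                      if inside(tri, cx + ox, cy + oy)
                      and depth < depth_samples[y][x][i]]
            if not passed:
                continue
            # The "pixel shader" runs once per pixel, at the pixel center.
            color = shade(cx, cy)
            invocations += 1
            # Its single output is replicated to every subsample that passed.
            for i in passed:
                color_samples[y][x][i] = color
                depth_samples[y][x][i] = depth
    return invocations
```

Drawing a triangle that covers only two of a pixel's four sample points writes the shaded value into just those two subsamples. A second triangle behind it can then fill in the remaining two, so a single pixel ends up holding results from both triangles.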

By using the coverage mask to determine which subsamples are updated, the end result is that a single pixel can end up storing the output from N different triangles that partially cover the same pixel. This effectively gives us the result we want, which is an oversampled form of triangle visibility. The following image, taken from the Direct3D 10 documentation[1], visually summarizes the rasterization process for the case of 4xMSAA:

An image from the D3D10 documentation detailing the results of rasterizing various primitives with 4xMSAA.

MSAA Resolve

As with supersampling, the oversampled signal must be resampled down to the output resolution before we can display it. With MSAA, this process is referred to as resolving the render target. In its earliest incarnations, the resolve process was carried out in fixed-function hardware on the GPU. The filter commonly used was a 1-pixel-wide box filter, which essentially equates to averaging all subsamples within a given pixel. Such a filter produces results such that fully-covered pixels end up with the same result as non-MSAA rendering, which could be considered either good or bad depending on how you look at it (good because you won’t unintentionally reduce details through blurring, bad because a box filter will introduce postaliasing). For pixels with triangle edges, you get a trademark gradient of color values with a number of steps equal to the number of sub-pixel samples. Take a look at the following image to see what this gradient looks like for various MSAA modes:

Trademark MSAA edge gradients resulting from reconstruction using box filtering
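The box filter resolve and the resulting edge gradient are easy to reproduce numerically. In this small Python sketch (function names are illustrative), `edge_gradient` lists the resolved value for every possible number of covered subsamples in a pixel that straddles a white-on-black edge.

```python
def box_resolve(subsamples):
    """Standard resolve: the unweighted average of a pixel's subsamples."""
    return sum(subsamples) / len(subsamples)

def edge_gradient(num_samples, foreground=1.0, background=0.0):
    """Resolved values for every possible number of covered subsamples."""
    return [box_resolve([foreground] * covered +
                        [background] * (num_samples - covered))
            for covered in range(num_samples + 1)]

gradient_2x = edge_gradient(2)   # [0.0, 0.5, 1.0]
gradient_4x = edge_gradient(4)   # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Going from background to foreground takes N equal steps for N subsamples, which is why higher MSAA modes produce smoother-looking edges with this filter.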

One notable exception to box filtering was Nvidia’s “Quincunx” AA, which was available as a driver option on their DX8 and DX9-era hardware (which includes the RSX used by the PS3). When enabled, it would use a 2-pixel-wide triangle filter centered on one of the samples in a 2x MSAA pattern. The “quincunx” name comes from the fact that the resolve process ends up using 5 subsamples that are arranged in the cross-shaped quincunx pattern. Since the quincunx resolve uses a wider reconstruction filter, aliasing is reduced compared to the standard box filter resolve. However, using a wider filter can also result in unwanted attenuation of higher frequencies. This can lead to a “blurred” look that appears to lack details, which is a complaint sometimes leveled against PS3 games that have used the feature. AMD later added a similar feature to their 3000 and 4000-series GPU’s called “Wide Tent” that also made use of a triangle filter with width greater than a pixel.

As GPU’s became more programmable and the API’s evolved to match them, we eventually gained the ability to perform the MSAA resolve in a custom shader instead of having to rely on an API function to do that. This is an ability we’re going to explore in the following article.

Compression

As we saw earlier, MSAA doesn’t actually improve on supersampling in terms of rasterization complexity or memory usage. At first glance we might conclude that the only advantage of MSAA is that pixel shader costs are reduced. However this isn’t actually true, since it’s also possible to improve bandwidth usage. Recall that the pixel shader is only executed once per pixel with MSAA. As a result, the same value is often written to all N subsamples of an MSAA render target. GPU hardware is able to exploit this by sending the pixel shader value coupled with another value indicating which subsamples should be written, which acts as a form of lossless compression. With such a compression scheme the bandwidth required to fill an MSAA render target can be significantly less than it would be for the supersampling case.

CSAA and EQAA

Since its introduction, the fundamentals of MSAA have not seen significant changes as graphics hardware has evolved. We already discussed the special resolve modes supported by the drivers for certain Nvidia and ATI/AMD hardware as well as the ability to arbitrarily access subsample data in an MSAA render target, which are two notable exceptions. A third exception has been Nvidia’s Coverage Sampling Antialiasing (CSAA)[2] modes supported by their DX10 and DX11 GPU’s. These modes seek to improve the quality/performance ratio of MSAA by decoupling the coverage of triangles within a pixel from the subsamples storing the value output by the pixel shader. The idea is that while subsamples have high storage cost since they store pixel shader outputs, the coverage can be stored as a compact bitmask. This is exploited by rasterizing at a certain subsample rate and storing coverage at that rate, but then storing the actual subsample values at a lower rate. As an example, the “8x” CSAA mode stored 8 coverage samples and 4 pixel shader output values. When performing the resolve, the coverage data is used to augment the quality of the results. Unfortunately Nvidia does not provide public documentation of this step, and so the specifics will not be discussed here. They also do not provide programmatic access to the coverage data in shaders, thus the data will only be used when performing a standard resolve through D3D or OpenGL functions.

AMD has introduced a very similar feature in their 6900 series GPU’s, which they’ve named EQAA[3]. Like Nvidia, the feature can be enabled through driver options or special MSAA quality modes but it cannot be used in custom resolves performed via shaders.

Working with HDR and Tone Mapping

Before HDR became popular in real-time graphics, we essentially rendered display-ready color values to our MSAA render target with only simple post-processing passes applied after the resolve. This meant that after resolving with a box filter, the resulting gradients along triangle edges would be perceptually smooth between neighboring pixels 3. However when HDR, exposure, and tone mapping are thrown into the mix there is no longer anything close to a linear relationship between the color rendered at each pixel and the perceived color displayed on the screen. As a result, you are no longer guaranteed to get the smooth gradient you would get when using a box filter to resolve LDR MSAA samples. This can seriously affect the output of the resolve, since it can end up appearing as if no MSAA is being used at all if there is extreme contrast on a geometry edge.

This strange phenomenon was first pointed out (to my knowledge) by Humus (Emil Persson), who created a sample[4] demonstrating it as well as a corresponding ShaderX6 article. In this same sample he also demonstrated an alternative approach to MSAA resolves, where he used a custom resolve to apply tone mapping to each subsample individually before filtering. His results were pretty striking, as you can see from these images (top is a typical resolve, bottom is a resolve after tone mapping):

HDR rendering with MSAA. The top image applies tone mapping after a standard MSAA resolve, while the bottom image applies tone mapping before the MSAA resolve.

It’s important to think about what it actually means to apply tone mapping before the resolve. Before tone mapping, we can actually consider ourselves to be working with values representing physical quantities of light within our simulation. Primarily, we’re dealing with the radiance of light reflecting off of a surface towards the eye. During the tone mapping phase, we attempt to convert from a physical quantity of light to a new value representing the color that should be displayed on the screen. What this means is that by changing where the resolve takes place, we’re actually oversampling a different signal! When resolving before tone mapping we’re oversampling the signal representing physical light being reflected towards the camera, and when resolving after tone mapping we’re oversampling the signal representing colors displayed on the screen. Therefore an important consideration we have to make is which signal we actually want to oversample. This directly ties into post-processing, since a modern game will typically have several post-processing effects needing to work with HDR radiance values rather than display colors. Thus we want to perform tone mapping as the last step in our post-processing chain. This presents a potential difficulty with the approach of tone mapping prior to resolve, since it means that all previous post-processing steps must work with a non-resolved MSAA buffer as an input and also produce an MSAA buffer as an output. This can obviously have serious memory and performance implications, depending on how the passes are implemented.
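To make the difference between the two orderings concrete, here is a minimal Python sketch for a single edge pixel with four subsamples. The Reinhard operator `x / (1 + x)` and the subsample values are assumptions chosen purely for illustration; they are not taken from Humus’s sample, and a real resolve would run in a shader over full color values.

```python
def reinhard(x):
    """Simple Reinhard tone mapping operator: maps [0, inf) to [0, 1)."""
    return x / (1.0 + x)


def resolve_then_tonemap(subsamples):
    """Box-filter the HDR subsamples, then tone map the averaged radiance."""
    average = sum(subsamples) / len(subsamples)
    return reinhard(average)


def tonemap_then_resolve(subsamples):
    """Tone map every HDR subsample, then box-filter the display values."""
    return sum(reinhard(s) for s in subsamples) / len(subsamples)


# Hypothetical 4x MSAA edge pixel: one subsample hits a very bright surface,
# the other three hit a dark background.
edge_pixel = [1000.0, 0.01, 0.01, 0.01]

standard = resolve_then_tonemap(edge_pixel)    # close to 1.0: the edge looks fully bright
per_sample = tonemap_then_resolve(edge_pixel)  # roughly 0.26: close to the 25% coverage

print(standard, per_sample)
```

With the standard ordering the single bright subsample dominates the average, so the pixel ends up nearly as bright as a fully covered one and the edge appears unfiltered. Tone mapping each subsample first gives a displayed value that tracks the coverage much more closely.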

MLAA and Other Post-Process AA Techniques

Morphological Anti-Aliasing is an anti-aliasing technique originally developed by Intel[5] that initiated a wave of performance-oriented AA solutions commonly referred to as post-process anti-aliasing. This name is due to the fact that they do not fundamentally alter the rendering/rasterization pipeline like MSAA does. Instead, they work with only a non-MSAA render target to produce their results. In this way these techniques are rather interesting, in that they do not actually rely on increasing the sampling rate in order to reduce aliasing. Instead, they use what could be considered an advanced reconstruction filter in order to approximate the results that you would get from oversampling. In the case of MLAA in particular, this reconstruction filter uses pattern-matching in an attempt to detect the edges of triangles. The pattern-matching relies on the fact that for a fixed sample pattern, common patterns of pixels will be produced by the rasterizer for a triangle edge. By examining the color of the local neighborhood of pixels, the algorithm is able to estimate where a triangle edge is located and also the orientation of the line making up that particular edge. The edge and color information is then enough to estimate an analytical description of that particular edge, which can be used to calculate the exact fraction of the pixel that will be covered by the triangle. This is very powerful if the edge was calculated correctly, since it eliminates the need for multiple sub-pixel coverage samples. In fact if the coverage amount is used to blend the triangle color with the color behind that triangle, the results will match the output of standard MSAA rendering with infinite subsamples! The following image shows some of the patterns used for edge detection, and the result after blending:

MLAA edge detection using pattern recognition (from MLAA: Efficiently Moving Antialiasing from the GPU to the CPU)

The major problems with MLAA and similar techniques occur when the algorithm does not accurately estimate the triangle edges. Looking at only a single frame, the resulting artifacts would be difficult or impossible to discern. However in a video stream the problems become apparent due to sub-pixel rotations of triangles that occur as the triangle or the camera moves in world space. Take a look at the following image:

Two different triangle edge orientations resulting in the same rasterization pattern

In this image, the blue line represents a triangle edge during one frame and the green line represents the same triangle edge in the following frame. The orientation of the edge relative to the pixels has changed, however in both cases only the leftmost pixel is marked as being “covered” by the rasterizer. Consequently the same pixel pattern (marked by the blue squares in the image) is produced by the rasterizer for both frames, and the MLAA algorithm detects the same edge pattern (denoted by the thick red line in the image). As the edge continues rotating, eventually it will cover the top-middle pixel’s sample point and that pixel will “turn on”. In the resulting video stream that pixel will appear to “pop on”, rather than smoothly transitioning from a non-covered state to a covered state. This is a trademark temporal artifact of geometric aliasing, and MLAA is incapable of reducing it. The artifact can be even more objectionable for thin or otherwise small geometry, where entire portions of the triangle will appear and disappear from frame to frame causing a “flickering” effect. MSAA and supersampling are able to reduce such artifacts due to the increased sampling rate used by the rasterizer, which results in several “intermediate” steps in the case of sub-pixel movement rather than pixels “popping” on and off. The following animated GIFs demonstrate this effect on a single rotating triangle (see footnote 4; click on the images if they’re not animating for you):

Two animations of a rotating triangle. The top image has FXAA enabled, which uses techniques similar to MLAA to reconstruct edges. The bottom image uses 4x MSAA, which supersamples the visibility test at the edges of the triangle. Notice how in the MSAA image pixels will transition through intermediate values as the triangle moves across the sub-pixel sample points. The FXAA image lacks this characteristic, despite producing smoother gradients along the edges.
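The “popping” behavior can be reproduced with a very small Python sketch. It sweeps a vertical edge across one pixel and counts how many sample points are covered, once with a single center sample and once with four subsamples. The subsample offsets are an assumption: they are meant to resemble a typical rotated-grid 4x pattern, not to match any particular GPU.

```python
def coverage(edge_x, samples):
    """Fraction of sample points lying to the left of a vertical edge at edge_x."""
    covered = sum(1 for (sx, sy) in samples if sx < edge_x)
    return covered / len(samples)


# One sample at the pixel center (no MSAA).
CENTER = [(0.5, 0.5)]

# Assumed rotated-grid 4x pattern, in pixel units within [0, 1).
ROTATED_GRID_4X = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

# Move the edge across the pixel in ten equal steps, as if over ten frames.
edge_positions = [i / 10 for i in range(11)]
single = [coverage(e, CENTER) for e in edge_positions]
msaa = [coverage(e, ROTATED_GRID_4X) for e in edge_positions]

print(single)  # only 0.0 and 1.0 ever appear: the pixel pops on
print(msaa)    # passes through 0.25, 0.5 and 0.75 on the way to 1.0
```

A post-process filter working from the single-sample result only ever sees the pixel as fully off or fully on, which is why it cannot recover the intermediate steps.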

Another potential issue with MLAA and similar algorithms is that they may fail to detect edges or detect “false” edges if only color information is used. In such cases the accuracy of the edge detection can be augmented by using a depth buffer and/or a normal buffer. Another potential issue is that the algorithm uses the color adjacent to a triangle as a proxy for the color behind the triangle, which could actually be different. However this tends to be non-objectionable in practice.

Footnotes

1. The rasterization process on a modern GPU can actually be quite a bit more complicated than this, but those details aren’t particularly relevant to the goals of this article.
2. This mask is directly available to ps_5_0 pixel shaders in DX11 via the SV_Coverage system value.
3. The gamma-space rendering commonly used in the days before HDR would actually produce gradients that weren’t completely smooth, although later GPU’s supported performing the resolve in linear space. Either way the results were at least pretty close to being perceptually smooth, at least compared to the results that can occur with HDR rendering.
4. These animations were captured from the sample application that I’m going to discuss in the next article. So if you’d like to see live results without compression, you can download the sample app from that article.

References

[1]http://msdn.microsoft.com/en-us/library/windows/desktop/cc627092%28v=vs.85%29.aspx
[2]http://www.nvidia.com/object/coverage-sampled-aa.html
[3]http://developer.amd.com/Resources/archive/ArchivedTools/gpu/radeon/assets/EQAA Modes for AMD HD 690 Series Cards.pdf
[4]http://www.humus.name/index.php?page=3D&ID=77
[5]MLAA: Efficiently Moving Antialiasing from the GPU to the CPU

Next article in the series: Experimenting with Reconstruction Filters for MSAA Resolve

Applying Sampling Theory To Real-Time Graphics

Previous article in the series: Signal Processing Primer

Computer graphics is a field that constantly deals with discrete sampling and reconstruction of signals, although you might not be aware of it yet. This article focuses on the ways in which sampling theory can be applied to some of the common tasks routinely performed in graphics and 3D rendering.

Image Scaling

The concepts of sampling theory are most easily applicable to graphics in the form of image scaling. An image, or bitmap, is typically the result of sampling a color signal at discrete XY sample points (pixels) that are evenly distributed on a 2D grid. To rescale it to a different number of pixels, we need to calculate a new color value at sample points that are different from the original pixel locations. In the previous article we mentioned that this process is known as resampling, and is also referred to as interpolation. Any graphics programmer should be familiar with the point (also known as nearest-neighbor) and linear (also known as bilinear) interpolation modes supported natively in GPU’s which are used when sampling textures. In case you’re not familiar, point filtering simply picks the closest texel to the sample point and uses that value. Bilinear filtering on the other hand picks the 4 closest texels, and linearly interpolates those values in the X and Y directions based on the location of the sample point relative to the texels. It turns out that these modes are both just implementations of a reconstruction filter, with point interpolation using a box function and linear interpolation using a triangle function. If you look back at the diagrams showing reconstruction with a box function and triangle function, you can actually see how the reconstructed signal resembles the visual result that you get when performing point and linear sampling. With the box function you end up getting a reconstructed value that’s “snapped” to the nearest original sample point, while with a triangle function you end up with straight lines connecting the sample points. If you’ve used point and linear filtering, you probably also intuitively understand that point filtering inherently results in more aliasing than linear filtering when resizing an image.
For reference, here’s an image showing the same rotated checkerboard pattern being resampled with a box filter and a triangle filter:

An image of a rotated checkerboard pattern being enlarged with a box filter (point filtering) and a triangle filter (bilinear filtering)
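The equivalence between these filtering modes and reconstruction filters is easy to check in one dimension. The following Python sketch resamples a short row of texels by convolving with a box or a triangle kernel. The function names and the clamp-to-edge addressing are assumptions made for the example; they are not part of any graphics API.

```python
import math


def box(x):
    """Box function of width 1.0: non-zero only within 0.5 of the origin."""
    return 1.0 if -0.5 <= x < 0.5 else 0.0


def triangle(x):
    """Triangle (tent) function of width 2.0: zero at distances >= 1."""
    return max(0.0, 1.0 - abs(x))


def resample(samples, position, kernel, radius):
    """Reconstruct the signal at a fractional position by applying a kernel."""
    lo = math.floor(position - radius)
    hi = math.ceil(position + radius)
    total = 0.0
    weight_sum = 0.0
    for i in range(lo, hi + 1):
        clamped = min(max(i, 0), len(samples) - 1)  # clamp-to-edge addressing
        w = kernel(position - i)
        total += w * samples[clamped]
        weight_sum += w
    return total / weight_sum


texels = [0.0, 10.0, 20.0, 30.0]

# Box kernel: the result snaps to the nearest texel (point filtering).
print(resample(texels, 1.25, box, 0.5))       # 10.0

# Triangle kernel: the result lies on the line between neighbors (linear filtering).
print(resample(texels, 1.25, triangle, 1.0))  # 12.5
```

Extending the triangle kernel to two dimensions as a product of an X and a Y kernel gives the familiar four-texel bilinear fetch.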

Knowing what we do about aliasing and reconstruction filters, we can now put some mathematical foundation behind what we intuitively knew all along. The box function’s frequency domain equivalent (the sinc function) falls off more slowly, with larger side lobes, than the triangle function’s frequency domain equivalent (the sinc² function), which results in significantly more postaliasing. Of course we should note that even though the triangle function might be considered among the “low end” of reconstruction filters in terms of quality, it is still attractive due to its low performance impact. Not only is the triangle function very cheap to evaluate at a given point in terms of ALU instructions, but more importantly the function evaluates to 0 for all distances greater than or equal to 1. This is important for performance, because it means that any pixels that are further than a distance of 1.0 from the resampled pixel location will not have to be considered. Ultimately this means that we only need to fetch a maximum of 4 pixels (in a 2×2 area) for linear filtering, which limits bandwidth usage and cache misses. For point filtering the situation is even better, since the box function hits zero at 0.5 (it has a width of 1.0) and thus we only need to fetch one pixel.

Outside of realtime 3D rendering, it is common to use cubic filters (also known as bicubic filters) as a higher-quality alternative to point and linear filters when scaling images. A cubic filter is not a single filtering function, but rather a family of filters that interpolate using a 3rd-order (cubic) polynomial. The use of such functions in image processing dates back to Hsieh Hou’s paper entitled “Cubic Splines for Image Interpolation and Digital Filtering”[1] which proposed using cubic B-splines as the basis for interpolation. Cubic splines are attractive for filtering because they can be used to create functions where the 1st derivative is continuous across the entire domain, which is known as being C1 continuous. Being C1 continuous also implies that the function is C0 continuous, which means that the 0th derivative is also continuous. So in other words, the function itself would have no visible discontinuities if you were to plot it. Remember that there is an inverse relationship between rate of change in the spatial domain and the frequency domain, therefore a smooth function without discontinuities is desirable for reducing postaliasing. A second reason that cubic splines are attractive is that the functions can be made to be zero-valued after a certain point, much like a box or triangle function. This means the filter will have a limited width, which is optimal from a performance point of view. Typically cubic filters use functions defined along the [-2, 2] range, which is double the width of a unit triangle function. Finally, a third reason for the attractiveness of cubic filters is that they can be made to produce acceptable results when applied as a separable filter. Separable filters can be applied independently in two passes along the X and Y dimensions, which reduces the number of neighboring pixels that need to be considered when applying the filter and thus improves the performance.

In 1988, Don Mitchell and Arun Netravali published a paper entitled Reconstruction Filters in Computer Graphics[2] which narrowed down the set of possible cubic filtering functions into a generalized form dependent on two parameters called B and C. This family of functions produces filtering functions that are always C1 continuous, and are normalized such that area under the curve is equal to one. The general form they devised is found below:

Generalized form for cubic filtering functions
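For readers who prefer code to a formula, here is the same generalized form written as a small Python function. The polynomial coefficients are transcribed from the Mitchell-Netravali paper as I understand it; the function name and the brute-force area check are only there for illustration.

```python
def cubic_filter(x, B, C):
    """Evaluate the two-parameter cubic filter family at distance x."""
    x = abs(x)
    if x < 1.0:
        y = (12 - 9 * B - 6 * C) * x ** 3 + (-18 + 12 * B + 6 * C) * x ** 2 + (6 - 2 * B)
    elif x < 2.0:
        y = ((-B - 6 * C) * x ** 3 + (6 * B + 30 * C) * x ** 2
             + (-12 * B - 48 * C) * x + (8 * B + 24 * C))
    else:
        y = 0.0  # the filter has a limited width of [-2, 2]
    return y / 6.0


def area_under_curve(B, C, steps=4000):
    """Midpoint-rule estimate of the integral over [-2, 2]."""
    dx = 4.0 / steps
    return sum(cubic_filter(-2.0 + (i + 0.5) * dx, B, C) for i in range(steps)) * dx


# A few named members of the family.
print(cubic_filter(0.0, 1.0, 0.0))        # cubic B-spline peak: 2/3
print(cubic_filter(0.0, 1 / 3, 1 / 3))    # Mitchell filter peak: 8/9
print(cubic_filter(1.0, 0.0, 0.5))        # Catmull-Rom is zero at neighboring samples
print(area_under_curve(1 / 3, 1 / 3))     # approximately 1.0 (normalized)
```

Note that with B = 0 the filter evaluates to 1 at the origin and 0 at the other integer positions, which means it passes through the original sample values. Members with B > 0 smooth the samples instead of interpolating them exactly.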

Below you can find graphs of some of the common curves in use by popular image processing software[3], as well as the result of using them to enlarge the rotated checkerboard pattern that we used earlier:

Common cubic filtering functions using Mitchell’s generalized form for cubic filtering. From top-left going clockwise: cubic(1, 0) AKA cubic B-spline, cubic(1/3, 1/3) AKA Mitchell filter, cubic(0, 0.75) AKA Photoshop bicubic filter, and cubic(0, 0.5) AKA Catmull-Rom spline

Cubic filters used to enlarge a rotated checkerboard pattern

One critical point touched upon in Mitchell’s paper is that the sinc function isn’t usually desirable for image scaling, since by nature the pixel structure of an image leads to discontinuities which result in unbounded frequencies. Therefore ideal reconstruction isn’t possible, and ringing artifacts will occur due to the Gibbs phenomenon. Ringing was identified by Schreiber and Troxel[4] as being one of four negative artifacts that can occur when using cubic filters, with the other three being aliasing, blurring and anisotropy effects. Blurring is recognized as the loss of detail due to too much attenuation of higher frequencies, and is often caused by a filter kernel that is too wide. Anisotropic effects are artifacts that occur due to applying the function as a separable filter, where the resulting 2D filtering function doesn’t end up being radially symmetrical.

Mitchell suggested that the purely frequency domain-focused techniques of filter design were insufficient for designing a filter that produces subjectively pleasing results to the human eye, and instead emphasized balancing the 4 previously-mentioned artifacts against the amount of postaliasing in order to design a high-quality filter for image scaling. He also suggested studying human perceptual response to certain artifacts in order to subjectively determine how objectionable they may be. For instance, Earl Brown[5] discovered that ringing from a single negative lobe can actually increase perceived sharpness in an image, and thus can be a desirable effect in certain scenarios. He also pointed out that ringing from multiple negative lobes, such as what you get from a sinc function, will always degrade quality. Here’s an image of our friend Rocko enlarged with a windowed sinc filter, as well as an image of a checkerboard pattern enlarged with the same filter:

Ringing from multiple lobes caused by enlargement with a windowed sinc filter

Ultimately, Mitchell segmented the domain of his B and C parameters into what he called “regions of dominant subjective behavior”. In other words, he determined which values of each parameter resulted in undesirable artifacts. In his paper he included the following chart showing which artifacts were associated with certain ranges of the B and C parameters:

A chart showing the dominant areas of negative artifacts for Mitchell’s generalized cubic function. From “Reconstruction Filters in Computer Graphics” [Mitchell 88]

Based on his analysis, Mitchell determined that (1/3, 1/3) produced the highest-quality results. For that reason, it is common to refer to the resulting function as a “Mitchell filter”. The following images show the results of using non-ideal parameters to enlarge Rocko, as well as the results from using Mitchell’s suggested parameters:

Undesirable artifacts caused by enlargement using cubic filtering. The top left image demonstrates anisotropy effects, the top right image demonstrates excessive blurring, and the bottom left demonstrates excessive ringing. The bottom right image uses a Mitchell filter, representing ideal results for a cubic filter. Note that these images have all been enlarged an extra 2x with point filtering after resizing with the cubic filter, so that the artifacts are easier to see.

Texture Mapping

Real-time 3D rendering via rasterization brings about its own particular issues related to aliasing, as well as specialized solutions for dealing with them. One such issue is aliasing resulting from resampling textures at runtime in order to map them to a triangle’s 2D projection in screen space, which I’ll refer to as texture aliasing. If we take the case of a 2D texture mapped to a quad that is perfectly perpendicular to the camera, texture sampling essentially boils down to a classic image scaling problem: we have a texture with some width and height, the quad is scaled to cover a grid of screen pixels with a different width and height, and the image must be resampled at the pixel locations where pixel shading occurs. We already mentioned in the previous section that 3D hardware is natively capable of applying “linear” filtering with a triangle function. Such filtering is sufficient for avoiding severe aliasing artifacts when upscaling or downscaling, although for downscaling this only holds true when downscaling by a factor <= 2.0. Linear filtering will also prevent aliasing when rotating an image, which is important in the context of 3D graphics since geometry will often be rotated arbitrarily relative to the camera. Like image scaling, rotation is really just a resampling problem and thus the same principles apply. The following image shows how the pixel shader sampling rate changes for a triangle as it’s scaled and rotated:

Pixel sampling rates for a triangle. Pixel shaders are executed at a grid of fixed locations in screen space (represented by the red dots in the image), thus the sampling rate for a texture depends on the position, orientation, and projection of a given triangle. The green triangle represents the larger blue triangle after being scaled and rotated, and thus has a lower sampling rate.

Mipmapping

When downscaling by a factor greater than 2, linear filtering leads to aliasing artifacts due to high-frequency components of the source image leaking into the downsampled version. This manifests as temporal artifacts, where the contents of the texture appear to flicker as a triangle moves relative to the camera. This problem is commonly dealt with in image processing by widening the filter kernel so that its width is equal to the size of the downscaled pixel. So for instance if downscaling from 100×100 to 25×25, the filter kernel would be greater than or equal in width to a 4×4 square of pixels in the original image. Unfortunately widening the filter kernel isn’t usually a suitable option for realtime rendering, since the number of memory accesses increases with O(N²) as the filter width increases. Because of this a technique known as mipmapping is used instead. As any graphics programmer should already know, mipmaps consist of a series of prefiltered versions of a 2D texture that were downsampled with a kernel that’s sufficiently wide to prevent aliasing. Typically these downsampled versions are generated for dimensions that are powers of two, so that each successive mipmap is half the width and height of the previous mipmap. The following image from Wikipedia shows an example of a typical mipmap chain for a texture:

An example of a texture with mipmaps. Each mip level is roughly half the size of the level before it. Image taken from Wikipedia.
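A mip chain like the one pictured can be sketched in a few lines of Python by repeatedly averaging 2×2 blocks. This is a minimal sketch that assumes a square, power-of-two, single-channel image stored as nested lists; real tools also deal with color channels, gamma, and non-power-of-two sizes.

```python
def downsample_2x(image):
    """Halve the width and height by averaging each 2x2 block (a box filter)."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(width // 2)
        ]
        for y in range(height // 2)
    ]


def build_mip_chain(image):
    """Generate every mip level, each one from the level directly above it."""
    chain = [image]
    while len(chain[-1]) > 1 and len(chain[-1][0]) > 1:
        chain.append(downsample_2x(chain[-1]))
    return chain


# A 4x4 test image with texel values 0..15.
base = [[float(y * 4 + x) for x in range(4)] for y in range(4)]
mips = build_mip_chain(base)

print([len(level) for level in mips])  # [4, 2, 1]
print(mips[1])                         # [[2.5, 4.5], [10.5, 12.5]]
print(mips[2])                         # [[7.5]], the mean of all 16 texels
```

Because each 2×2 average is itself an average of averages, the 1×1 level equals the mean of the whole 4×4 image, which is the same value a single 4×4 box filter over the base level would produce.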

A box function is commonly used for generating mipmaps, although it’s possible to use any suitable reconstruction filter when downscaling the source image. The generation is also commonly implemented recursively, so that each mip level is generated from the mip level preceding it. This makes the process computationally cheap, since a simple linear filter can be used at each stage in order to achieve the same results as a wide box filter applied to the highest-resolution image. At runtime the pixel shader selects the appropriate mip level by calculating the gradients of the texture coordinate used for sampling, which it does by comparing the texture coordinate used for one pixel to the texture coordinate used in the neighboring pixels of a 2×2 quad. These gradients, which are equal to the partial derivatives of the texture coordinates with respect to X and Y in screen space, are important because they tell us the relationship between a given 2D image and the rate at which we’ll sample that image in screen space. Smaller gradients mean that the sample points are close together, and thus we’re using a high sampling rate. Larger gradients result from the sample points being further apart, which we can interpret to mean that we’re using a low sampling rate. By examining these gradients we can calculate the highest-resolution mip level that would provide us with an image size less than or equal to our sampling rate. The following image shows a simple example of mip selection:

Using texture coordinate gradients to select a mip level for a 4×4 texture.

In the image, the two red rectangles represent texture-mapped quads of different sizes rasterized to a 2D grid of pixels. For the topmost quad, a value of 0.25 will be computed as the partial derivative for the U texture coordinate with respect to the X dimension, and the same value will be computed as the partial derivative for the V texture coordinate with respect to the Y dimension. The larger of the two gradients is then used to select the appropriate mip level based on the size of the texture. In this case, the length of the gradient will be 0.25 which means that the 0th (4×4) mip level will be selected. For the lower quad the size of the gradient is doubled, which means that the 1st mip level will be selected instead. Quality can be further improved through the use of trilinear filtering, which linearly interpolates between the results of bilinearly sampling the two closest mip levels based on the gradients. Doing so prevents visible seams on a surface at the points where a texture switches to the next mip level.
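The selection logic from this example can be written out as a short Python sketch. The function below is a simplified model rather than the exact computation any particular GPU performs: it scales the gradients by the texture size, takes the longer of the two, and uses the base-2 logarithm as the mip level. The function name and parameters are hypothetical.

```python
import math


def select_mip_level(du_dx, dv_dx, du_dy, dv_dy, texture_size, mip_count):
    """Return a fractional mip level from screen-space UV gradients."""
    # Length of each gradient, converted from UV units to texels per pixel.
    length_x = math.hypot(du_dx, dv_dx) * texture_size
    length_y = math.hypot(du_dy, dv_dy) * texture_size
    rho = max(length_x, length_y)  # the larger gradient drives the selection
    if rho <= 0.0:
        return 0.0
    lod = math.log2(rho)
    return min(max(lod, 0.0), float(mip_count - 1))


# Topmost quad from the example: gradients of 0.25 on a 4x4 texture.
print(select_mip_level(0.25, 0.0, 0.0, 0.25, 4, 3))  # 0.0, the 4x4 level

# Lower quad: the gradients are doubled.
print(select_mip_level(0.5, 0.0, 0.0, 0.5, 4, 3))    # 1.0, the 2x2 level

# In-between gradients give a fractional level, which trilinear filtering
# uses as the blend weight between the two nearest mip levels.
print(select_mip_level(0.375, 0.0, 0.0, 0.375, 4, 3))
```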

One problem that we run into with mipmapping is when an image needs to be downscaled more in one dimension than in the other. This situation is referred to as anisotropy, due to the differing sampling rates with respect to the U and V axes of the texture. This happens all of the time in 3D rendering, particularly when a texture is mapped to a ground plane that’s nearly parallel with the view direction. In such a case the plane will be projected such that the V gradients grow more quickly than the U gradients as distance from the camera increases, which equates to the sampling rate being lower along the V axis. When the gradient is larger for one axis than the other, 3D hardware will use the larger gradient for mip selection since using the smaller gradient would result in aliasing due to undersampling. However this has the undesired effect of over-filtering along the other axis, thus producing a “blurry” result that’s missing details. To help alleviate this problem, graphics hardware supports anisotropic filtering. When this mode is active, the hardware will take up to a certain number of “extra” texture samples along the axis with the larger gradient. This allows the hardware to “reduce” the maximum gradient, and thus use a higher-resolution mip level. The final result is equivalent to using a rectangular reconstruction filter in 2D space as opposed to a box filter. Visually such a filter will produce results such that aliasing is prevented, while details are still perceptible. The following images demonstrate anisotropic filtering on a textured plane:

A textured plane without anisotropic filtering, and the same plane with 16x anisotropic filtering. The light grey grid lines demonstrate the distribution of pixels, and thus the rate of pixel shading in screen space. The red lines show the U and V axes of the texture mapped to the plane. Notice the lack of details in the grain of the wood in the left image, due to over-filtering of the U axis in the lower-resolution mip levels.
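The effect of the extra samples on mip selection can be illustrated with a simplified Python model. It loosely follows the description in the OpenGL EXT_texture_filter_anisotropic extension, where the number of samples is the ratio of the two footprint lengths (capped by the maximum anisotropy) and the larger length is divided by that count before computing the mip level. Actual hardware implementations differ, so treat the names and the exact formula as assumptions.

```python
import math


def anisotropic_lod(major, minor, max_anisotropy):
    """Return (sample count, mip level) for a footprint measured in texels.

    major and minor are the lengths of the longer and shorter gradient,
    with major >= minor > 0.
    """
    samples = min(math.ceil(major / minor), max_anisotropy)
    lod = max(0.0, math.log2(major / samples))
    return samples, lod


# A footprint that is 16 texels long along one axis but only 1 texel wide.
print(anisotropic_lod(16.0, 1.0, 1))   # (1, 4.0): isotropic, forced down to a blurry mip
print(anisotropic_lod(16.0, 1.0, 4))   # (4, 2.0): four samples, two mip levels sharper
print(anisotropic_lod(16.0, 1.0, 16))  # (16, 0.0): sixteen samples from the full-resolution mip
```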

Geometric Aliasing

A second type of aliasing experienced in 3D rendering is known as geometric aliasing. When a 3D scene composed of triangles is rasterized, the visibility of those triangles is sampled at discrete locations typically located at the center of the screen pixels. Triangle visibility is just like any other signal in that there will be aliasing in the reconstructed signal when the sampling rate is inadequate (in this case the sampling rate is determined by the screen resolution). Unfortunately triangular data will always have discontinuities, which means the signal will never be bandlimited and thus no sampling rate can be high enough to prevent aliasing. In practice these artifacts manifest as the familiar jagged lines or “jaggies” commonly seen in games and other applications employing realtime graphics. The following image demonstrates how these aliasing artifacts occur from rasterizing a single triangle:

Geometric aliasing occurring from undersampling the visibility of a triangle. The green, jagged line represents the outline of the triangle seen on a display where pixels appear as squares of a uniform color.

Although we’ve already established that no sampling rate would allow us to perfectly reconstruct triangle visibility, it is possible to reduce aliasing artifacts with a process known as oversampling. Oversampling essentially boils down to sampling a signal at some rate higher than our intended output, and then using those sample points to reconstruct new sample points at the target sampling rate. In terms of 3D rendering this equates to rendering at some resolution higher than the output resolution, and then downscaling the resulting image to the display size. This process is known as supersampling, and it’s been in use in 3D graphics for a very long time. Unfortunately it’s an expensive option, since it requires not just rasterizing at a higher resolution but also shading pixels at a higher rate. Because of this, an optimized form of supersampling known as multi-sample antialiasing (abbreviated as MSAA) was developed specifically for combating geometric aliasing. We’ll discuss MSAA and geometric aliasing in more detail in the following article.
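As a rough illustration of oversampling, the following Python sketch samples the visibility of one triangle either once per pixel or on a regular 4×4 grid within each pixel, and then box-filters the results down to the output resolution. It is a toy software rasterizer written for this example; the edge-function test and the regular sample grid are assumptions and do not reflect how a GPU arranges its samples.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of the line a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def inside(tri, px, py):
    """True if the point is inside the triangle, for either winding order."""
    (ax, ay), (bx, by), (cx, cy) = tri
    w0 = edge(ax, ay, bx, by, px, py)
    w1 = edge(bx, by, cx, cy, px, py)
    w2 = edge(cx, cy, ax, ay, px, py)
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def rasterize(tri, width, height, factor=1):
    """Sample visibility on a factor x factor grid per pixel, then box-filter."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            hits = 0
            for sy in range(factor):
                for sx in range(factor):
                    px = x + (sx + 0.5) / factor
                    py = y + (sy + 0.5) / factor
                    hits += inside(tri, px, py)
            row.append(hits / (factor * factor))
        image.append(row)
    return image


triangle_vertices = [(0.0, 0.0), (8.0, 0.0), (0.0, 5.0)]
aliased = rasterize(triangle_vertices, 8, 6, factor=1)       # pixels are 0 or 1
supersampled = rasterize(triangle_vertices, 8, 6, factor=4)  # edge pixels get partial values

for row in supersampled:
    print(" ".join(f"{value:4.2f}" for value in row))
```

The single-sample image contains only fully covered or fully empty pixels, which is exactly the “jaggies” pattern. The supersampled image gives edge pixels fractional values that approximate how much of each pixel the triangle covers.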

Shader Aliasing

A third type of aliasing that’s common in modern 3D graphics is known as shader aliasing. Shader aliasing is similar to texture aliasing, in that it occurs due to the fact that the pixel shader sampling rate is fixed in screen space. However the distinction is that shader aliasing refers to undersampling of signals that are evaluated analytically in the pixel shader using mathematical formulas, as opposed to undersampling of a texture map. The most common and noticeable case of shader aliasing results from applying per-pixel specular lighting with low roughness values (high specular exponents for Phong and Blinn-Phong). Lower roughness values result in narrower lobes, which make the specular response into a higher-frequency signal and thus more prone to undersampling. The following image contains plots of the N dot H response of a Blinn-Phong BRDF with varying roughness, demonstrating that it becomes higher frequency for lower roughnesses:

N dot H response of a Blinn-Phong BRDF with various exponents. Note how the response becomes higher-frequency for higher exponents, which correspond to lower roughness values. Image from Real-Time Rendering, 3rd Edition, A K Peters 2008
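The narrowing of the lobe is easy to quantify. This Python sketch evaluates the unnormalized Blinn-Phong term (N dot H raised to the specular exponent) and computes the angle at which the lobe falls to half of its peak value. The helper names are made up for the example, and the normalization factor of a full BRDF is deliberately left out.

```python
import math


def blinn_phong_lobe(theta_degrees, exponent):
    """N dot H raised to the exponent, for a half vector theta away from N."""
    n_dot_h = max(0.0, math.cos(math.radians(theta_degrees)))
    return n_dot_h ** exponent


def half_width_degrees(exponent):
    """Angle at which the lobe drops to half of its peak value."""
    return math.degrees(math.acos(0.5 ** (1.0 / exponent)))


for exponent in (1, 16, 256):
    print(exponent, round(half_width_degrees(exponent), 2))
```

An exponent of 1 gives a half-width of 60 degrees, while the higher exponents squeeze the lobe into a much smaller range of angles. If the normal changes by more than that range between neighboring pixels, the highlight can be missed entirely in one frame and hit in the next.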

Shader aliasing is most likely to occur when normal maps are used, since they increase the frequency of the surface normal and consequently cause the specular response to vary rapidly across a surface. HDR rendering and physically-based shading models can compound the problem even further, since they allow for extremely intense specular highlights relative to the diffuse lighting response. This category of aliasing is perhaps the most difficult to solve, and as of yet there are no silver-bullet solutions. MSAA is almost entirely ineffective, since the pixel shading rate is not increased compared to the non-MSAA case. Supersampling is effective, but prohibitively expensive due to the increased shader and bandwidth costs required to shade and fill a larger render target. Emil Persson demonstrated a method of selectively supersampling the specular lighting inside the pixel shader[6], but this too can be expensive if the number of lights is high or if multiple normal maps need to be blended in order to compute the final surface normal.

A potential solution that has been steadily gaining some ground[7][8] is to modify the specular shading function itself based on normal variation. The theory behind this is that microfacet BRDF’s naturally represent micro-level variation along a surface, with the amount of variation being based on a roughness parameter. If we increase the roughness of a material as the normal map details become relatively smaller in screen space, we use the BRDF itself to account for the undersampling of the normal map/specular lighting response. Increasing roughness decreases the frequency of the resulting reflectance, which in turn reduces the appearance of artifacts. The following image contains an example of using this technique, with an image captured with 4x shader supersampling as a reference:

The topmost image shows an example of shader aliasing due to undersampling a high-frequency specular BRDF combined with a high-frequency normal map. The middle image shows the same scene with 4x shader supersampling applied. The bottom image shows the results of using a variant of CLEAN mapping to limit the frequency of the specular response.
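To give a feel for how roughness can be driven by normal variation, here is a Python sketch of one well-known technique in this family, Toksvig’s method. Note that this is not the CLEAN mapping variant used for the image above; it is swapped in here because it fits in a few lines. It relies on the fact that averaging diverging unit normals (as mipmapping a normal map does) produces a vector shorter than unit length, and uses that length to reduce a Blinn-Phong exponent.

```python
import math


def toksvig_exponent(avg_normal, exponent):
    """Reduce a Blinn-Phong exponent (> 0) using the length of a filtered normal."""
    length = math.sqrt(sum(c * c for c in avg_normal))
    length = min(length, 1.0)
    # Toksvig factor: 1.0 for a unit-length normal, smaller as normals diverge.
    factor = length / (length + exponent * (1.0 - length))
    return factor * exponent


def average(n0, n1):
    """Component-wise average of two normals, as a mip filter would compute."""
    return tuple((a + b) / 2.0 for a, b in zip(n0, n1))


angle = math.radians(30.0)
left = (math.sin(angle), 0.0, math.cos(angle))
right = (-math.sin(angle), 0.0, math.cos(angle))

print(toksvig_exponent((0.0, 0.0, 1.0), 64.0))       # flat area: exponent stays at 64
print(toksvig_exponent(average(left, right), 64.0))  # bumpy area: exponent drops sharply
```

Lowering the exponent widens the specular lobe, which is the same as increasing roughness in the terms used above. The highlight becomes broader and dimmer in areas where the normal map detail is smaller than a pixel, instead of sparkling from frame to frame.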

This approach (and others like it) can be considered to be part of a broader category of antialiasing techniques known as prefiltering. Prefiltering amounts to applying some sort of low-pass filter to a signal before sampling it, with the goal of ensuring that the signal’s bandwidth is less than half of the sampling rate. In a lot of cases this isn’t practical for graphics since we don’t have adequate information about what we’re sampling (for instance, we don’t know which triangle is visible for a pixel until we sample and rasterize the triangle). However in the case of specular aliasing from normal maps, the normal map contents are known ahead of time.

Temporal Aliasing

So far, we have discussed graphics in terms of sampling a 2D signal. However we’re often concerned with a third dimension, which is time. Whenever we’re rendering a video stream we’re also sampling in the time domain, since the signal will completely change as time advances. Therefore we must consider sampling along this dimension as well, and how it can produce aliasing.

In the case of video we are still using discrete samples, where each sample is a complete 2D image representing our scene at some period of time. This sampling is similar to our sampling in the spatial domain: there is some frequency of the signal we are sampling, and if we undersample that signal aliasing will occur. One classic example of temporal aliasing is the so-called “wagon-wheel effect”, which refers to the phenomenon where a rotating wheel may appear to rotate more slowly (or even backwards) when viewed in an undersampled video stream. This animated GIF from Wikipedia demonstrates the effect quite nicely:

A demonstration of the wagon-wheel effect that occurs due to temporal aliasing. In the animation the camera is moving to the right at a constant speed, yet the shapes appear to speed up, slow down, and even switch direction. Image taken from Wikipedia.
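The wagon-wheel effect can be worked out numerically. The sketch below is only an illustration (the function name and the frame rates are assumptions, not taken from any particular video system): it folds the true rotation frequency of a wheel into the range that the frame rate can represent, and returns the rate at which the wheel appears to rotate.

```python
def apparent_rotation_rate(true_rate_hz, frame_rate_hz, spokes=1):
    """Apparent rotation rate (revolutions per second) of a wheel with
    identical, evenly-spaced spokes when sampled at the given frame rate."""
    # The wheel looks identical every 1/spokes of a revolution, so the
    # visible pattern repeats at spokes * true_rate_hz.
    pattern_hz = spokes * true_rate_hz
    # Keep only the fractional part of the per-frame advance that lies
    # within roughly half a cycle of zero; whole cycles are invisible.
    cycles_per_frame = pattern_hz / frame_rate_hz
    aliased = cycles_per_frame - round(cycles_per_frame)
    return aliased * frame_rate_hz / spokes

slow = apparent_rotation_rate(1.0, 30.0)       # adequately sampled
backwards = apparent_rotation_rate(29.0, 30.0)  # undersampled
frozen = apparent_rotation_rate(30.0, 30.0)     # one full turn per frame
print(slow, backwards, frozen)
```

A wheel turning at 1 revolution per second is sampled adequately at 30 frames per second and appears to turn at its true rate. At 29 revolutions per second it appears to turn slowly backwards, and at exactly 30 it appears to stand still.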

In games, temporal sampling artifacts usually manifest as “jerky” movements and animations.  Increases in framerate correspond to an increase in sampling rate along the time domain, which allows for better sampling of faster-moving content. This is directly analogous to the improvements that are visible from increasing output resolution: more details are visible, and less aliasing is perceptible.

The most commonly-used anti-aliasing technique for temporal aliasing is motion blur. Motion blur actually refers to an effect visible in photography, which occurs due to the shutter of the camera being open for some non-zero amount of time. This produces a result quite different than what we produce in 3D rendering, where by default we get an image representing one infinitely-small period of time. To accurately simulate the effect, we could supersample in the time domain by rendering more frames than we output and applying a filter to the result. However this is prohibitively expensive just like spatial supersampling, and so approximations are used. The most common approach is to produce a per-pixel velocity buffer for the current frame, and then use that to approximate the result of oversampling with a blur that uses multiple texture samples from nearby pixels. Such an approach can be considered an example of an advanced reconstruction filter that uses information about the rate of change of a signal rather than additional samples in order to reconstruct an approximation of the original signal. Under certain conditions the results can be quite plausible, however in many cases noticeable artifacts can occur due to the lack of additional sample points. Most notably these artifacts will occur where the occlusion of a surface by another surface changes during a frame, since information about the occluded surface is typically not available to the post-process shader performing the reconstruction. The following image shows three screenshots of a model rotating about the camera’s z-axis: the model rendered with no motion blur, the model rendered with Morgan McGuire’s post-process motion blur technique[9] applied using 16 samples per pixel, and finally the model rendered with temporal supersampling enabled using 32 samples per frame:

A model rendered without motion blur, the same model rendered with post-process motion blur, and the same model rendered with temporal supersampling.
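Temporal supersampling itself is simple to express in code, even though it is expensive in practice. The following sketch uses a hypothetical one-dimensional "renderer" (a single lit pixel moving at a constant speed) and averages several sub-frame renders spread across the shutter interval, which amounts to a box filter in the time domain. All names and parameters here are assumptions made for the example.

```python
def render(time, width=16, speed=8.0):
    """Hypothetical renderer: a 1D image with a single lit pixel whose
    position moves at `speed` pixels per unit of time."""
    image = [0.0] * width
    image[int(speed * time) % width] = 1.0
    return image

def temporally_supersample(frame_start, shutter, subframes, **kwargs):
    """Average `subframes` renders spread evenly across the shutter
    interval (a box filter applied along the time axis)."""
    accum = None
    for i in range(subframes):
        # Sample at the middle of each sub-interval of the shutter time.
        t = frame_start + shutter * (i + 0.5) / subframes
        sub = render(t, **kwargs)
        accum = sub if accum is None else [a + s for a, s in zip(accum, sub)]
    return [a / subframes for a in accum]

sharp = render(0.0)
blurred = temporally_supersample(0.0, shutter=0.5, subframes=32)
print(sharp[:6])
print(blurred[:6])
```

The single render produces one fully lit pixel, while the supersampled result spreads the same total energy across the pixels that the object passed over while the shutter was open.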

References

[1] Hou, Hsei. Cubic Splines for Image Interpolation and Digital Filtering. IEEE Transactions on Acoustics, Speech, and Signal Processing. Vol. 26, Issue 6. December 1978.
[2] Mitchell, Don P. and Netravali, Arun N. Reconstruction Filters in Computer Graphics. SIGGRAPH ’88 Proceedings of the 15th annual conference on Computer graphics and interactive techniques.
[3] http://entropymine.com/imageworsener/bicubic/
[4] Schreiber, William F. Transformation Between Continuous and Discrete Representations of Images: A Perceptual Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence. Volume 7, Issue 2. March 1985.
[5] Brown, Earl F. Television: The Subjective Effects of Filter Ringing Transients. February, 1979.
[6] http://www.humus.name/index.php?page=3D&ID=64
[7] http://blog.selfshadow.com/2011/07/22/specular-showdown/
[8] http://advances.realtimerendering.com/s2012/index.html
[9] McGuire, Morgan. Hennessy, Padraic. Bukowski, Michael, and Osman, Brian. A Reconstruction Filter for Plausible Motion Blur. I3D 2012.

Next article in the series: A Quick Overview of MSAA

Signal Processing Primer

For a theoretical understanding of aliasing and anti-aliasing, we can turn to the fields of signal processing[1] and sampling theory[2]. This article will explain some of the basics of these two related fields in my own words, taking a more theoretical point of view. In the following article the concepts covered here will be used to analyze common aspects of real-time graphics, so that we can describe them in terms of signal processing. If you’d like some further reading, I’d recommend consulting chapter 7 of Physically Based Rendering[3], chapter 5 of Real-Time Rendering, 3rd Edition[4] or Principles of Digital Image Synthesis[5] (which is actually freely available for download).

As always, I’m more interested in the material being correct than I am in sounding like I’m smart. So if you see anything that you feel is incorrect or have any additional insights to share, please let me know in the comments!

Sampling Theory

Sampling theory deals with the process of taking some continuous signal that varies with one or more parameters, and sampling the signal at discrete values of those parameters. If you’re not familiar with signals and signal processing, you can think of a signal as some continuous function of any dimension that varies along its domain. To sample it, we then calculate that function’s value at certain points along the curve. Usually the points at which we sample are evenly-spaced apart, which we call uniform sampling. So for instance if we had a 1D signal defined as f(x) = x^2, we might sample it at x = 0, x = 1, x = 2, x = 3, and so on. This would give us our set of discrete samples, which in this case would be 0, 1, 4, 9, and continuing on in that fashion. Here’s an image showing our continuous function of f(x) = x^2, and the result of discretely sampling that function at integer values of x:

Discretely sampling a continuous function
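In code, uniform sampling is just evaluating the function at evenly-spaced positions. Here is a tiny Python sketch of the f(x) = x^2 example (the helper name `sample_uniform` is made up for illustration):

```python
def f(x):
    """The continuous signal from the example: f(x) = x^2."""
    return x ** 2

def sample_uniform(func, start, spacing, count):
    """Evaluate func at `count` evenly-spaced points beginning at `start`."""
    return [func(start + i * spacing) for i in range(count)]

samples = sample_uniform(f, start=0, spacing=1, count=4)
print(samples)  # [0, 1, 4, 9]
```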

Working with discrete samples has a lot of advantages. For instance, it allows us to store a representation of an arbitrary signal in memory by simply storing the sampled values in an array. It also allows us to perform operations on a signal by repeatedly applying the same operation in a loop for all sample points. But what happens when we need the value of a signal at a location that we didn’t sample at? In such cases, we can use a process known as reconstruction to derive an approximation of the original continuous function. With this approximation we can then discretely sample its values at a new set of sample points, which is referred to as resampling. You may also see this referred to as interpolation in cases where local discrete values are used to compute an “in between” value.

The actual process of reconstruction involves applying some sort of reconstruction filter to a set of discrete samples. Typically this filter is some function that is symmetrical about x=0, often with non-zero values only in a small region surrounding x=0. The following image contains a few functions commonly used as reconstruction filters:

Various functions used as reconstruction filters. Starting from the top left and moving clockwise: the box function, the triangle function, and the sinc function. Image from Real-Time Rendering, 3rd Edition, A K Peters 2008

The filter is applied using a process known as convolution. In the case of discrete sample points, a convolution implies multiplying the sample values with an instance of the filter function translated such that it is centered about the same point, and then summing the result from all sample points. If you’re having trouble understanding what this means, take a look at the following three images which show the result of convolution with three common reconstruction filters:

Discretely-sampled signals being reconstructed with a box function, a triangle function, and a sinc function. Images from Real-Time Rendering, 3rd Edition, A K Peters 2008
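The same process can be written as a short Python sketch. It assumes that sample i sits at integer position i, and it uses a box filter that is one sample wide and a triangle filter that is two samples wide. Those widths are my choice for the example, picked so that the box gives nearest-neighbor reconstruction and the triangle gives linear interpolation.

```python
def box(x):
    """Box filter: 1 within half a sample of the center, 0 elsewhere."""
    return 1.0 if -0.5 <= x < 0.5 else 0.0

def triangle(x):
    """Triangle (tent) filter with a total width of two samples."""
    return max(0.0, 1.0 - abs(x))

def reconstruct(samples, x, kernel):
    """Evaluate the reconstructed signal at position x by summing every
    sample multiplied by the kernel centered on that sample."""
    return sum(value * kernel(x - i) for i, value in enumerate(samples))

samples = [0.0, 1.0, 4.0, 9.0]  # f(x) = x^2 sampled at x = 0, 1, 2, 3
print(reconstruct(samples, 1.25, box))       # nearest sample
print(reconstruct(samples, 1.25, triangle))  # linear blend of two samples
```

Both kernels return the original sample value when x lands exactly on a sample point. They only differ in how they fill in the values in between.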

If you’ve ever written a full-screen Gaussian blur shader, then you’ve used convolution. Think about how you would write such a shader: you loop over nearby pixel values (sample points) in a texture, multiply each value by the Gaussian function evaluated using the distance from your output pixel, and sum the results. Evaluating a function using the distance to the sample point is equivalent to translating the filter function to the location of the sample point, although you may not have thought of it this way.
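For reference, here is roughly what that loop looks like when written out on the CPU for a single row of pixels. This is a simplified sketch rather than real shader code: the clamped edge handling, the default radius, and the normalization by the weight sum are all assumptions made for the example.

```python
import math

def gaussian(x, sigma):
    """Unnormalized Gaussian evaluated at distance x from the center."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

def gaussian_blur_1d(pixels, sigma=1.0, radius=3):
    """Blur a 1D row of pixels. Reads past the edge are clamped."""
    output = []
    for center in range(len(pixels)):
        total = 0.0
        weight_sum = 0.0
        for offset in range(-radius, radius + 1):
            index = min(max(center + offset, 0), len(pixels) - 1)
            # The filter is evaluated using the distance to the sample point.
            weight = gaussian(offset, sigma)
            total += pixels[index] * weight
            weight_sum += weight
        # Normalize so that a constant input stays constant.
        output.append(total / weight_sum)
    return output

row = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
blurred = gaussian_blur_1d(row)
print(blurred)
```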

Of the common reconstruction filters, the sinc filter is particularly interesting. This is because it is theoretically possible to use it to exactly reconstruct the original continuous signal, provided that the signal was adequately sampled. This is known as ideal reconstruction. To define what “adequately sampled” means for a continuous signal, we must now discuss aliasing.

Key Takeaway Points:

  • Continuous signals (functions) can be sampled at discrete sample points
  • An approximation of the original continuous signal can be reconstructed by applying a filter to the discrete sample values

Frequency and Aliasing

Signals are often described in terms of their frequency, which in rough terms describes how quickly a signal changes over its domain. In reality a signal is not composed of just one frequency, but can have an entire spectrum of frequencies. Mathematically a signal can be converted from its original representation (often referred to as the time domain or spatial domain, depending on the context) to its spectrum of frequencies (known as the frequency domain) using the Fourier transform. Once in the frequency domain, it can be determined if there is some maximum frequency where all frequencies above that have an intensity of zero. If such a maximum frequency exists, the signal is said to be bandlimited, which means we can determine the bandwidth of that signal. Depending on the context, the term “bandwidth” can be either the passband bandwidth or the baseband bandwidth. The passband bandwidth is equal to the maximum frequency minus the minimum frequency, while the baseband bandwidth simply refers to the maximum frequency. With sampling theory we are primarily concerned with the baseband bandwidth, because it is used to determine the Nyquist rate of the signal. The Nyquist rate is the minimum rate at which the signal should be sampled in order to prevent aliasing from occurring, and it is equal to 2 times the baseband bandwidth. The term “aliasing” refers to the fact that a signal can become indistinguishable from a lower-frequency signal when undersampled. The following image demonstrates this phenomenon with two sine waves of different frequencies, where the samples would be the same for either signal:

Aliasing of a sampled sine wave. Image from Wikipedia
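This is easy to verify numerically. In the sketch below (the frequencies and the sample rate are arbitrary values chosen for the example), a 1 Hz sine and an 11 Hz sine are both sampled at 10 samples per second. The 1 Hz signal has a Nyquist rate of 2 Hz and is adequately sampled, while the 11 Hz signal has a Nyquist rate of 22 Hz and is not.

```python
import math

def sample_sine(frequency_hz, sample_rate_hz, count):
    """Sample sin(2*pi*f*t) at t = n / sample_rate for n = 0..count-1."""
    return [math.sin(2.0 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(count)]

sample_rate = 10.0
low = sample_sine(1.0, sample_rate, 20)    # adequately sampled
high = sample_sine(11.0, sample_rate, 20)  # undersampled

# The largest difference between corresponding samples of the two signals.
max_difference = max(abs(a - b) for a, b in zip(low, high))
print(max_difference)
```

The two sets of samples agree to within floating-point error, so once the signals have been sampled there is no way to tell them apart.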

In practice, aliasing that occurs due to undersampling will result in errors in the reconstructed signal. So in other words, the signal you end up with will be different than the one you were originally sampling. For signals that are not bandlimited, there is no maximum frequency and thus there is no sampling rate that won’t result in aliasing after reconstruction.

To better understand how and why aliasing occurs, it can be helpful to look at things in the frequency domain. Let’s start with the plot of the frequency spectrum for an arbitrary signal:

Frequency spectrum of an arbitrary signal. Image from Wikipedia.

As we can see from the plot, there is a maximum frequency located at point “B”, meaning that the signal is bandlimited and has a bandwidth equal to B. When this signal is discretely sampled, an infinite number of “copies” of the signal will appear alongside the original at various points. Here’s an image illustrating how it looks:

Replicas of a signal’s frequency spectrum. Image from Wikipedia

The location of the signal duplicates is determined by the sampling rate, which is marked as “fs” in the plot. Since these duplicates are present, we must use a filter (the reconstruction filter) to remove these duplicates and leave us with only the frequency spectrum that was within the original signal’s bandwidth (referred to as the baseband frequencies). The obvious solution is to use a box function in the frequency domain, since a box function implies multiplying a certain range of values by 1 and all other values by 0. So if we were to use a box function that spans only the baseband frequencies (from -B to B), we would remove the duplicates while leaving the original signal intact. The following diagram shows how this might work:

A reconstruction filter is used to isolate the original copy of a signal’s spectrum. Image from Wikipedia

What’s important to keep in mind is that we typically need to apply our reconstruction filter in the spatial domain, and not in the frequency domain. This means that we need to use the spatial domain equivalent of a box function in frequency space, and it turns out that this is the previously-mentioned sinc function. By now it should make sense why the sinc function is called the ideal reconstruction filter, since it has the ability to leave certain frequency ranges untouched while completely filtering out other frequencies. For this same reason it is also common to refer to the frequency domain box function as the ideal low-pass filter.
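As a rough sketch of what sinc reconstruction looks like in practice, the Python below reconstructs a 1 Hz sine wave from samples taken at 10 samples per second. The true sinc filter has infinite extent, so the sum here is truncated to a fixed number of samples on either side of the evaluation point (the `half_width` parameter is an assumption of this example), which means the result is only approximate.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def sinc_reconstruct(sample_at, t, sample_rate, half_width=500):
    """Approximate the continuous signal at time t from its uniform
    samples, using a sinc filter truncated to 2 * half_width + 1 samples."""
    center = round(t * sample_rate)
    total = 0.0
    for n in range(center - half_width, center + half_width + 1):
        # Each sample is weighted by a sinc centered on that sample.
        total += sample_at(n / sample_rate) * sinc(t * sample_rate - n)
    return total

def signal(t):
    """A 1 Hz sine wave, which has a Nyquist rate of 2 Hz."""
    return math.sin(2.0 * math.pi * t)

estimate = sinc_reconstruct(signal, 0.123, sample_rate=10.0)
print(estimate, signal(0.123))
```

The reconstructed value lands very close to the true value of the signal at a point that was never sampled. The small remaining error comes from truncating the filter.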

Now let’s look at what happens when we don’t sample the signal at an adequate rate. As we saw earlier, the duplicates of a signal will appear at multiples of the sampling frequency. So the higher our sampling rate the further apart they will be, while the lower our sampling rate the closer they will be. Earlier we learned that the critical sampling rate for a signal is 2B, so let’s look at the plot of a signal that’s been sampled at a rate lower than 2B:

Inadequate sampling rate results in overlap of a signal’s replicas. Image from Wikipedia

Once we dip below the Nyquist rate of the signal, the duplicates begin to overlap in the frequency domain. After this happens it is no longer possible to isolate the original copy of the signal with a sinc filter, and thus we end up with aliasing. The bottom plot in the above image demonstrates what an alias of the original signal would look like. Since its frequency response is identical to that of the original signal, it is completely indistinguishable.

Key Takeaway Points:

  • Signals can be decomposed into a spectrum of frequencies, with the spectrum being tied to the rate of change of the signal
  • Signals with a maximum frequency are bandlimited
  • A signal’s Nyquist rate is equal to two times its maximum frequency, and this is the minimum sampling rate required to perfectly reconstruct a signal without aliasing
  • Signal reconstruction can be viewed as the process of removing “replicas” of a signal’s spectrum in the frequency domain

Reconstruction Filter Design

Aliasing that results from undersampling is referred to as prealiasing, since it occurs before reconstruction takes place. However it is also possible for artifacts to occur due to the reconstruction process itself. For instance, imagine if we used a box function that was too wide when applying a reconstruction filter. The result would look like this:

A wide reconstruction filter fails to isolate the original copy of a signal’s spectrum. Image from Wikipedia

With such a reconstruction we would still end up with artifacts in the reconstructed signal, even though it was adequately sampled.

As we’ve demonstrated, using the wrong size box function in the frequency domain is one way to adversely affect the quality of our reconstructed signal. However we’ve already mentioned that a variety of functions can be applied as a filter in the spatial domain, and these functions all have a frequency domain counterpart that differs from the box function that we previously discussed. With this in mind, we can reason that the choice in filter will affect how well we isolate the signal in the frequency domain, and that this will affect how much postaliasing is introduced into the reconstructed signal. Let’s look at some common filtering functions, and compare their plots with the plots of their frequency domain counterparts:

Box function -> Sinc Function

Triangle Function -> (Sinc Function)^2

Gaussian Function -> Gaussian Function

(Sinc Function)^2 -> Triangle Function

Sinc Function -> Box Function
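If you’d like to convince yourself of a couple of these pairs without plotting anything, the Fourier transform can be estimated with simple numerical integration. The sketch below assumes a box with a width of 1 and a triangle with a total width of 2 (both with a height of 1), and it relies on the fact that for real, symmetric functions only the cosine part of the transform is non-zero. The integration range and step count are arbitrary choices.

```python
import math

def box(x):
    """Unit box: width 1, height 1, centered at 0."""
    return 1.0 if abs(x) <= 0.5 else 0.0

def triangle(x):
    """Triangle with a total width of 2 and a height of 1."""
    return max(0.0, 1.0 - abs(x))

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x)."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def fourier_transform_even(func, frequency, lo=-4.0, hi=4.0, steps=8000):
    """Midpoint-rule estimate of the Fourier transform of a real, even
    function, for which only the cosine term contributes."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += func(x) * math.cos(2.0 * math.pi * frequency * x)
    return total * dx

for freq in (0.0, 0.5, 1.5):
    print(freq,
          fourier_transform_even(box, freq), sinc(freq),
          fourier_transform_even(triangle, freq), sinc(freq) ** 2)
```

For each frequency, the numerical transform of the box matches the sinc function, and the transform of the triangle matches the sinc function squared.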

By looking at the frequency domain counterpart of a spatial domain filter function, we can get a rough idea of how well it’s going to preserve the frequency range we’re interested in and filter out the extraneous copies of our signal’s spectrum. The field of filter design is primarily concerned with this process of analyzing a filter’s frequency domain spectrum, and using that to evaluate or approximate that filter’s overall performance. Looking at the spectrums plotted in the above images, we can see that the non-sinc functions will all attenuate the baseband frequencies in some way. For some functions we can also observe that the frequency domain equivalent has no maximum value above which all frequencies have a value of zero, which means that the frequency domain filter extends infinitely in both directions. This ultimately means that all of the infinite replicas of the signal’s spectrum will bleed into the reconstructed signal to some extent, which will cause aliasing.

One general pattern that we can observe from looking at the plots of the spatial domain filter functions and their frequency domain equivalents is that there is an inverse relationship between rate of change in one representation and its counterpart. For instance, have a look at the spatial domain box function. This function has a discontinuity at some value, resulting in infinite rate of change. Consequently its frequency domain counterpart is the sinc, which extends to infinity representing the infinite rate of change inherent in the box function’s discontinuity. By the same token a sinc function in the spatial domain equates to a box function in the frequency domain since the relationship is reciprocal. The Gaussian function is a special case, where the spatial domain and frequency domain counterparts are the same function. For this reason the Gaussian function represents the exact “midpoint” between smoothly-changing functions and sharply-changing functions in the spatial and frequency domains. Another important aspect of this relationship is that by making a filtering function “wider” (which can be achieved by dividing the input distance by some value greater than 1), the resulting frequency spectrum for that function will become more “narrow”. As an example, have a look at the spectrum of a “unit” box function with width of 1.0 compared with the spectrum of a box function with width of 4.0:

Frequency spectrum of a box function with width of 1.0

Frequency spectrum of a box function with width of 4.0

The graphs clearly show that as the filter kernel becomes wider, the magnitude of the lowest frequencies becomes higher. This is really just another manifestation of the behavior we noted earlier regarding rate of change in the spatial domain and the frequency domain, since “wider” functions will change more slowly over their domain and thus will have more low frequency components in their frequency spectrums.
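The same relationship shows up in the closed-form spectrum of a box function. Assuming a box with a height of 1 (the plots above may use a different normalization), its Fourier transform is the width multiplied by sinc(width * frequency). A short sketch:

```python
import math

def box_spectrum(frequency, width):
    """Fourier transform of a box with the given width and a height of 1:
    width * sinc(width * frequency)."""
    x = width * frequency
    if x == 0.0:
        return width
    return width * math.sin(math.pi * x) / (math.pi * x)

for width in (1.0, 4.0):
    dc = box_spectrum(0.0, width)
    first_zero = 1.0 / width  # the main lobe ends where width * frequency = 1
    print(f"width {width}: magnitude at zero frequency {dc}, "
          f"first zero crossing at frequency {first_zero}")
```

Making the box four times wider makes the zero-frequency magnitude four times larger and pulls the first zero crossing in from a frequency of 1.0 to 0.25, so the main lobe of the spectrum becomes four times narrower.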

One difficult aspect of filter design is that we often must not just consider the filter’s frequency domain representation, but we also must consider the effect that its spatial domain representation will have on the reconstructed signal. In particular, we must be careful with filters that have negative lobes, such as the sinc filter. Such filters can produce an effect known as ringing when applied to sharp discontinuities, where the reconstructed signal oscillates about the signal being sampled. Take a look at the plot of a square wave being reconstructed with a sinc filter:

Gibbs phenomenon resulting from a square wave being reconstructed with a sinc filter
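Ringing is also easy to reproduce numerically. The sketch below reconstructs a single sharp edge rather than a full square wave: a unit step sampled at the integers, where every sample before the edge is 0 and every sample from the edge onward is 1. Since the sinc filter never falls to zero, the sum is truncated to a fixed number of samples (the `terms` parameter is an assumption of this example), so the values are approximate.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x)."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct_step(t, terms=2000):
    """Sinc reconstruction of a unit step sampled at the integers. The
    samples are 0 for n < 0 and 1 for n >= 0, so only the samples at
    n >= 0 contribute; the sum is truncated to `terms` of them."""
    return sum(sinc(t - n) for n in range(terms))

undershoot = reconstruct_step(-1.5)  # between two samples that are both 0
overshoot = reconstruct_step(0.5)    # between two samples that are both 1
print(undershoot, overshoot)
```

Even though every sample is either 0 or 1, the reconstructed signal dips below 0 just before the edge and rises above 1 just after it. This is the negative lobes of the sinc filter at work.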

Key Takeaway Points:

  • Error resulting from inadequate sampling rate is known as prealiasing. Error introduced through poor filtering is known as postaliasing.
  • A filter’s ability to limit aliasing can be estimated by observing its frequency domain representation
  • A filter’s rate of change in the spatial domain is inversely related to its rate of change in the frequency domain
  • A filter’s spatial domain representation can also have an effect on the quality of the reconstructed signal, with the most notable effect being ringing occurring at discontinuities.

References

[1] http://en.wikipedia.org/wiki/Signal_processing
[2] http://en.wikipedia.org/wiki/Nyquist-Shannon_sampling_theorem
[3] Pharr, Matt and Humphreys, Greg. Physically Based Rendering – From Theory to Implementation, 2nd Edition.
[4] Akenine-Möller, Tomas, Haines, Eric, and Hoffman, Naty. Real-Time Rendering, 3rd Edition.
[5] Glassner, Andrew. Principles of Digital Image Synthesis.

Next article in the series:

Applying Sampling Theory To Real-Time Graphics

Upcoming Series on Signal Processing and MSAA

Aliasing is everywhere in graphics. Almost everything we do uses discrete sampling, which means almost everything can produce a variety of aliasing artifacts. The folks in the film industry have historically taken a “no aliasing allowed” stance in their work, but in real-time graphics we’re still producing games with more sparkling and shimmering than a glitzy prom dress. If we’re going to do anything about that problem, I think it’s important that we all try to have at least a basic understanding of signal processing. Signal processing is something that I had a lot of experience with during my previous life as an engineering student, but even with that experience it wasn’t always immediately obvious to me how some of the fundamentals applied to common aspects of 3D rendering. You might see people mentioning signal processing terminology when talking about some particular technique, but it can be difficult to use those small pieces to assemble the bigger picture.

Recently I was doing some experimenting with MSAA resolves, and read quite a bit of background material to refresh my knowledge. I started taking some notes, and after a few pages worth I decided to organize them a bit into a (lengthy) article that describes the basics of signal processing when it comes to graphics. Hopefully this material can be useful if you need a refresher yourself, or if you’ve yet to learn these basics at all. I’ve even gone through the trouble of listing a few of the most important points in bullet point form at the end of each section, so if you’re new to this I’d recommend at least skimming through those. After that I’ve also prepared some material on the basics of MSAA, since it can be a confusing topic.

After the articles I’m going to share the results of my experiments with custom MSAA resolves, along with a sample application. So if you’re already an expert (or you just really want to get to the code), then you’ll want to wait a bit so that you can skip ahead to the new material.

Article 1: Signal Processing Primer

Article 2: Applying Sampling Theory To Real-Time Graphics

Article 3: A Quick Overview of MSAA

Article 4: Experimenting with Reconstruction Filters for MSAA Resolve

OpenGL Insights

Some time ago Charles de Rousiers adapted my Bokeh Depth of Field sample to OpenGL, and we contributed it as a chapter to the recently-released OpenGL Insights. Bokeh is still an ongoing area of R&D for myself, and hopefully I’ll be able to share some more improvements and optimizations once my current project is announced or released.

There’s going to be an author meet-up/book signing at the CRC Press SIGGRAPH booth (#929) this Tuesday from 2-3PM, and I’ll most likely be stopping by. So if you want to talk about Bokeh (or anything else graphics-related), then feel free to stop by and chat me up!
