Depth shadow mapping: Dx9 depth-range gotchas

· by Steve · Read in about 10 min · (1998 Words)

Pretty much everyone wants to use texture shadows in their real-time scenes these days; since they are calculated entirely on the GPU they scale well with modern chipsets, they are capable of shadowing alpha-rejected materials correctly (both as casters and receivers), they can be extended relatively simply to have soft edges, a variable penumbra and opacity with distance, and all kinds of other nice features.

Depth-shadowmapping is the approach whereby you render the light-space depth (or some derivative thereof) of the shadow caster into a (typically floating point) shadow texture, then when rendering the main scene perform comparisons of the light-space depth of the pixel being rendered versus what is stored in that shadow texture. All pretty straightforward, and OGRE comes already set up with mechanisms to allow you to do that, in quite a variety of configurations. However, when people write their own shaders, I’ve often found that they come across a problem with the depth range that they store and access, particularly in Dx9, and don’t know why. I’ve seen clients come across this, and I thought a general post might be useful (I may migrate this to the wiki later).

Set-up

Most shadow techniques require only a simple depth in the shadow texture; others need something more, such as the squared depth used by VSM. Let’s assume for the moment that you’re using a simple 1-channel floating point shadow texture, set up something like this (simple 1-shadow texture set-up):

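For example, a minimal C++ sketch of that kind of configuration (mSceneMgr is assumed to be your scene manager, and the sizes and distances are just placeholders):

// Sketch: one single-channel floating point shadow texture, rendered with a
// custom caster material. Adjust the technique, texture size and far distance
// to suit your scene.
mSceneMgr->setShadowTechnique(Ogre::SHADOWTYPE_TEXTURE_ADDITIVE_INTEGRATED);
mSceneMgr->setShadowTextureCount(1);
mSceneMgr->setShadowTextureSize(1024);
mSceneMgr->setShadowTexturePixelFormat(Ogre::PF_FLOAT32_R);
mSceneMgr->setShadowTextureCasterMaterial("DepthShadowCaster");
mSceneMgr->setShadowTextureSelfShadow(true);
// Render back faces into the shadow texture (the default, and recommended).
mSceneMgr->setShadowCasterRenderBackFaces(true);
mSceneMgr->setShadowFarDistance(100);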

So, your ‘DepthShadowCaster’ global material is where you render objects from your light’s perspective into your floating point shadow texture (you can of course associate per-technique shadow caster alternates with individual materials if you want, such as coping with transparency, but we’ll skip that for now). Let’s look at a simple example (Cg), which most online articles will tend to reflect:

void caster_vp(
float4 position : POSITION,
out float4 outPos   : POSITION,
out float  outDepth : TEXCOORD0,
uniform float4x4 worldViewProj
)
{
    outPos = mul( worldViewProj, position );
    outDepth = outPos.z;
}
float4 caster_fp(
float depth  : TEXCOORD0) : COLOR
{
    return depth.xxxx;
}

Simple, eh? A super-simple set of receiver shaders might look like this:

void receiver_vp(
    float4 position : POSITION,
    out float4 outPos   : POSITION,
    out float4 shadowUV : TEXCOORD0,
    uniform float4x4 world,
    uniform float4x4 worldViewProj,
    uniform float4x4 texViewProj
)
{
    float4 worldPos = mul(world, position);
    shadowUV = mul(texViewProj, worldPos);
    outPos = mul(worldViewProj, position);
}
float4 receiver_fp(
    float4 shadowUV : TEXCOORD0,
    uniform sampler2D shadowTex : register(s0),
    uniform float4 sceneRange
    ) : COLOR
{
    shadowUV.xy = shadowUV.xy / shadowUV.w;
    float shadowDepth = tex2D(shadowTex, shadowUV.xy);
    if (shadowDepth < shadowUV.z)
    {
        return float4(0, 0, 0, 1);
    }
    else
    {
        return float4(1, 1, 1, 1);
    }
}

That just returns white if the object is unshadowed and black if it is shadowed - not exactly pretty but it proves the capability.

So Why Doesn’t It Work?

That’s a bit of a generalisation, but in many cases people will find this just doesn’t work as they’d expect - either no shadows, or large blocks of shadow where there should be none. There are several reasons why this can be the case, but the most common problem I’ve seen is to do with DirectX 9 and the clear colour of the shadow texture viewport.

You see, the problem is that DirectX 9 can only clear a viewport to a 32-bit ARGB colour - 8 bits per channel. When clearing a floating-point surface, it has to map this integer range onto a floating point range, and it effectively does that by dividing each channel of the clear colour by 255. This means it can’t clear floating point textures to any value higher than 1.0! When clearing the frame buffer for a shadow texture that stores depths, you need to initialise it to the highest depth value possible so that rendered objects will update it to be ‘closer’. If you’re storing raw, unscaled depth, that value needs to be the light’s attenuation range or some other far scene distance. You simply can’t do that in Dx9, so what you find is that your texture contains 1.0 everywhere that nothing was rendered. You might think this isn’t so bad, since provided at least one thing is rendered at any particular point, the floating point buffer will be right. That’s true, except that if you have any single-sided geometry (terrain or a ground plane), and you use the default ‘render back faces to shadow texture’ option (highly recommended to make biasing much simpler), you can have significant problems.

Note that the 1.0 clear limit does not exist on GL or Dx10 - on those render systems you can set any floating point values in your Ogre::ColourValue as the clear colour and they will be respected.
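For instance, on GL or Dx10 you could (as a sketch - this assumes the shadow textures have already been created, and that nothing else resets the viewport’s clear colour afterwards) push the clear value up to your raw far depth like this:

// Sketch only: set the first shadow texture's clear colour to a raw far depth.
// Only honoured on GL / Dx10; Dx9 will still clamp the cleared value to 1.0.
float farDepth = 100.0f; // e.g. the light attenuation range
Ogre::TexturePtr shadowTex = mSceneMgr->getShadowTexture(0);
Ogre::Viewport* vp = shadowTex->getBuffer()->getRenderTarget()->getViewport(0);
vp->setBackgroundColour(Ogre::ColourValue(farDepth, farDepth, farDepth, farDepth));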

So, how to deal with the 1.0 clear limit in Dx9, and write your shadowing system in a portable way? There are a number of approaches.

1. Store Clip Space Depth

The one that lots of people use without really realising why it works is to store the depth in the shadow texture as a homogeneous clip space value - i.e. divide the ‘z’ value by ‘w’ in the fragment program. I know I did this for a while based on examples without particularly asking why. It looks like this:

void caster_vp(
float4 position : POSITION,
out float4 outPos   : POSITION,
out float2  outDepth : TEXCOORD0,
uniform float4x4 worldViewProj
)
{
    outPos = mul( worldViewProj, position );
    outDepth = outPos.zw;
}
float4 caster_fp(
float2 depth  : TEXCOORD0) : COLOR
{
    float finalDepth = depth.x / depth.y;
    return finalDepth.xxxx;
}

Obviously you need to do the same thing in the receiver program - i.e. compare the sampled value against shadowUV.z / shadowUV.w rather than the raw shadowUV.z. This makes sure that no depth value exceeds 1.0f and eliminates the problem. However, there’s a downside to this technique, in that the depth is non-linear and therefore not very friendly for doing other kinds of calculations, such as variable penumbra widths (PCSS), depth-based fading and even simple biasing calculations. It’s not really a problem for small scenes, but it becomes more difficult as the ranges increase. So, if you need a robust solution for a large scene, this probably isn’t it.

2. Store Custom Scaled Depth

This is the technique I normally use. You still rescale the depth values so that they don’t exceed 1.0f, but you do it using a fixed, known divisor rather than a per-pixel one. This could be a single global value that you know no shadows will ever be cast beyond (since we’re dealing with floating point values, only the relative scale matters here), or the attenuation range of the associated light. The latter is bindable in OGRE using the ‘light_attenuation’ auto-parameter; however, note that this auto-param is only available to shadow caster shaders in the current OGRE trunk (Cthugha, 1.8 in waiting). Previous versions skipped all light autos when binding shadow caster shaders - you never perform actual lighting there - and that optimisation also prevented per-light attenuation values being passed through for scaling. So in 1.6 or earlier, use a fixed scale value which is the largest of your light attenuation ranges.

Since you know this divisor ahead of time, you can use it to rescale any other parameters you need to perform calculations in the shader in the same space, like PCSS or depth fading, and know that you’ll always be dealing with a linear, predictable calculation.
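As a concrete illustration, binding that divisor from C++ might look something like this (a sketch rather than a recipe; ‘depthRange’ and ‘lightAtt’ are made-up parameter names which your caster and receiver programs would declare and divide their depth values by):

// Sketch: feed a depth-scaling divisor to the caster fragment program
// (do the same for the receiver material).
Ogre::MaterialPtr casterMat =
    Ogre::MaterialManager::getSingleton().getByName("DepthShadowCaster");
Ogre::GpuProgramParametersSharedPtr params =
    casterMat->getTechnique(0)->getPass(0)->getFragmentProgramParameters();

// Option A: a fixed global divisor covering your largest light range
// (works in 1.6 and earlier).
params->setNamedConstant("depthRange", 100.0f);

// Option B (trunk / 1.8+): let OGRE supply the per-light attenuation instead;
// the x component of the light_attenuation auto-param holds the range.
params->setNamedAutoConstant("lightAtt",
    Ogre::GpuProgramParameters::ACT_LIGHT_ATTENUATION, 0);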

3. Render Front-Facing Polygons Into Shadow Texture

This tends to resolve the problems in most cases, since single-sided geometry renders into the shadow texture the same way as the main camera sees it and so overwrites most of the badly cleared areas (although not necessarily all, for example if you have casting disabled on some materials / objects), but the major downside is that you instantly have far more biasing problems. Not recommended.
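For reference, switching to front faces is a single call on the scene manager (same assumed mSceneMgr as above):

// Render front faces into the shadow texture instead of the default back faces.
mSceneMgr->setShadowCasterRenderBackFaces(false);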

4. Clear The Viewport With A Custom Quad

Finally, instead of a clear operation you could clear the buffers with a rendered quad, before any other geometry is rendered into the shadow texture. This effectively becomes your clear operation (so you should disable viewport clearing). The plane can be built quite easily with a 2D ManualObject - you just want to set the ‘z’ value to the highest possible far distance that will still be within the range (so, the light attenuation range minus a small delta value). You should place this ManualObject in the RENDER_QUEUE_BACKGROUND queue to make it render first, and then use a SceneManager::Listener to configure it - shadowTextureCasterPreViewProj will give you the light information you need to set up the geometry per light, and shadowTexturesUpdated will tell you when the shadow textures are done so you can hide the ManualObject again so it’s not rendered in your main passes.

Your material for this quad should have depth checking disabled and depth writing enabled, and use the regular shadow caster shader. You should probably also disable backface culling for it, so that you don’t have to think about facing (which may well be inverted for the shadow render to further mess with your head).
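To make that a little more concrete, here’s a sketch of how such a listener might hang together (the class name and the ‘DepthShadowCasterClearQuad’ material are invented for illustration, it assumes a spot light with a perspective shadow camera, and the exact Listener callback signatures vary a little between OGRE versions - older ones may also require empty overrides for the remaining callbacks):

class ShadowClearQuadListener : public Ogre::SceneManager::Listener
{
public:
    ShadowClearQuadListener(Ogre::SceneManager* sceneMgr) : mSceneMgr(sceneMgr)
    {
        // Unit quad in the XY plane; scaled and oriented per light below.
        // Leave castShadows at its default (true) so it gets drawn into the
        // shadow texture, and put it in RENDER_QUEUE_BACKGROUND so it renders
        // before any real casters.
        mQuad = mSceneMgr->createManualObject("ShadowClearQuad");
        mQuad->setRenderQueueGroup(Ogre::RENDER_QUEUE_BACKGROUND);
        mQuad->begin("DepthShadowCasterClearQuad",
            Ogre::RenderOperation::OT_TRIANGLE_STRIP);
        mQuad->position(-1, -1, 0);
        mQuad->position( 1, -1, 0);
        mQuad->position(-1,  1, 0);
        mQuad->position( 1,  1, 0);
        mQuad->end();
        mNode = mSceneMgr->getRootSceneNode()->createChildSceneNode();
        mNode->attachObject(mQuad);
        mNode->setVisible(false);
    }

    void shadowTextureCasterPreViewProj(Ogre::Light* light,
        Ogre::Camera* camera, size_t iteration)
    {
        // Place the quad just inside the light's range, facing the shadow
        // camera, and scale it to (at least) cover the light frustum.
        Ogre::Real depth = light->getAttenuationRange() - 1.0f; // small delta
        mNode->setPosition(camera->getDerivedPosition() +
            camera->getDerivedDirection() * depth);
        mNode->setOrientation(camera->getDerivedOrientation());
        Ogre::Real halfH = depth * Ogre::Math::Tan(camera->getFOVy() * 0.5f);
        mNode->setScale(halfH * camera->getAspectRatio(), halfH, 1);
        mNode->setVisible(true);
    }

    void shadowTexturesUpdated(size_t numberOfShadowTextures)
    {
        // Shadow textures are finished; hide the quad for the main passes.
        mNode->setVisible(false);
    }

private:
    Ogre::SceneManager* mSceneMgr;
    Ogre::ManualObject* mQuad;
    Ogre::SceneNode* mNode;
};

Register an instance with mSceneMgr->addListener() at start-up; the ‘DepthShadowCasterClearQuad’ material is just your regular caster material with depth checking off, depth writing on and culling disabled, as described above.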

This method is a little more fiddly, but has the advantage that you can deal with ‘raw’ depth information everywhere, with no rescaling required. This is handy for avoiding a few arithmetic instructions in your pixel shader. However, it does mean you’re executing your caster pixel shader for every texel on the shadow texture in addition to rendering casters - probably not a big deal since the shader is simple, but if your application is pixel-shader limited already, this may be more costly than the extra arithmetic instructions in the caster/receiver shaders using option 2 instead; you’d have to benchmark.

Conclusion

Options 2 and 4 are the paths I generally recommend - both of them should make your shadows behave predictably in all render systems as well as allowing you to use depth values more intuitively. Option 4 is particularly attractive if you want to do real-world unit calculations without incurring per-pixel scaling costs. There’s some inaccurate information on the web that says you must output values in the range [0,1] from a shadow shader - hence the homogeneous or fixed-range scaling that many sites recommend - but this is completely untrue assuming you’re using Shader Model 2 or above, which you really must be to be considering depth shadowmapping anyway. You can happily output the full floating point range in SM2, and indeed must be able to for HDR to work, of course. The reason it doesn’t generally work immediately when you try is the clear colour in Dx9.

I hope that this post helps someone!