SpatialGraph, SceneTree, RenderQueue - sound familiar?

Development, OGRE 7 Comments

I was quite gratified to read this post on Wolfgang Engel’s blog, in which he refers to some other posts discussing the recommended categorisation & nomenclature for the various stages / structures of scene rendering. If you read it and you’re an OGRE user, you’ll find them all rather familiar concepts, because OGRE has been based around these principles for years :)

“SpatialGraph: used for finding out what is visible and should be drawn. Should make culling fast”

That’s our SceneManager and its subclasses. One of the core tenets of my design for OGRE from day 1 was that the mechanism for performing fast culling should be customisable based on the scene type independent of the scene content, and therefore be able to be divorced from, but derived from, the primary scene structure (the SceneTree, see below). While other engines tended either to hardcode their culling strategy, or to retrofit it into the ‘main’ scene graph as custom node types (which tends to encourage the ‘parallel inheritance hierarchy’ anti-pattern, because you mostly specialise nodes for other reasons too, and also makes macro-level decisions difficult), specialised SceneManagers build separate and derived culling structures which are entirely theirs to own.

“SceneTree: used for hierarchical animations, e.g. skeletal animation or a sword held in a character’s hand”

That’s easy - that’s our Node and all its subclasses like Bone and SceneNode. It’s rare for an engine not to have this concept, but too many engines make these nodes do too much IMO - like holding material state, being derived for custom culling routines, etc. I’ve always liked the idea of keeping the focus of a class tight (the principle of cohesion); it tends to work better long-term.

“RenderQueue: is filled by the SpatialGraph. Renders visible stuff fast. It sorts sub arrays per key, each key holding data such as depth, shaderID etc.”

Yep, we have exactly this structure and unsurprisingly it’s called RenderQueue :) We have the concept of queue groups for user-customisable ‘fire breaks’ between rendered objects, and the ability to sort by pass, by distance, or via a custom RenderQueueInvocationSequence.
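
To make the ‘sort per key’ idea a bit more concrete, here is a minimal C++ sketch. It is not OGRE’s RenderQueue code: DrawItem, makeSortKey and sortQueue are hypothetical names, and the bit layout (queue group first, then shader ID, then quantised depth) is just one assumed arrangement that gives ‘fire breaks’ between groups and fewer shader switches within them.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical renderable record; not an OGRE type.
struct DrawItem
{
    std::uint8_t  queueGroup; // coarse 'fire break' between sets of objects
    std::uint16_t shaderId;   // stands in for pass / material state
    float         depth;      // distance from the camera
};

// Pack the fields into one integer so that a plain numeric sort orders by
// queue group first, then shader, then front-to-back depth.
std::uint64_t makeSortKey(const DrawItem& item, float maxDepth)
{
    float t = std::min(std::max(item.depth / maxDepth, 0.0f), 1.0f);
    std::uint64_t depthBits = static_cast<std::uint64_t>(t * 65535.0f);
    return (static_cast<std::uint64_t>(item.queueGroup) << 32)
         | (static_cast<std::uint64_t>(item.shaderId) << 16)
         | depthBits;
}

// Sort the visible items in place by their packed key.
void sortQueue(std::vector<DrawItem>& items, float maxDepth)
{
    std::sort(items.begin(), items.end(),
              [maxDepth](const DrawItem& a, const DrawItem& b)
              {
                  return makeSortKey(a, maxDepth) < makeSortKey(b, maxDepth);
              });
}
```

Changing which field occupies the high bits changes the sort priority, which is roughly what choosing between sorting by pass or by distance amounts to.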

It’s nice to read posts like this - makes me think we’ve been doing things the right way :)

Depth shadow mapping Dx9 depth-range gotchas

Development, OGRE 4 Comments

Pretty much everyone wants to use texture shadows in their real-time scenes these days; since they are calculated entirely on the GPU they scale well with modern chipsets, they are capable of shadowing alpha-rejected materials correctly (both as casters and receivers), they can be extended relatively simply to have soft edges, a variable penumbra and opacity with distance, and all kinds of other nice features.

Depth-shadowmapping is the approach whereby you render the light-space depth (or some derivative thereof) of the shadow caster into a (typically floating point) shadow texture, then when rendering the main scene perform comparisons of the light-space depth of the pixel being rendered versus what is stored in that shadow texture. All pretty straightforward, and OGRE comes already set up with mechanisms to allow you to do that, in quite a variety of configurations. However, when people write their own shaders, I’ve often found that they come across a problem with the depth range that they store and access, particularly in Dx9, and don’t know why. I’ve seen clients come across this, and I thought a general post might be useful (I may migrate this to the wiki later).

Set-up

Most shadow techniques only require a simple depth in the shadow texture; others need something more, such as the squared depth in VSM for example. Let’s assume for the moment that you’re using a simple 1-channel floating point shadow texture, set up something like this (simple 1-shadow texture set-up):

mSceneMgr->setShadowTextureCount(1);
mSceneMgr->setShadowTextureConfig(0, 1024, 1024, PF_FLOAT32_R);
mSceneMgr->setShadowTextureSelfShadow(true);
mSceneMgr->setShadowTextureCasterMaterial("DepthShadowCaster");

So, your ‘DepthShadowCaster’ global material is where you render objects from your light’s perspective into your floating point shadow texture (you can of course associate per-technique shadow caster alternates on individual materials if you want, such as coping with transparency, but we’ll skip that for now). Let’s look at a simple example (Cg), which most online articles will tend to reflect:

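// Caster vertex program: pass the light-space depth on to the fragment program
void caster_vp(
    float4 position : POSITION,
    out float4 outPos   : POSITION,
    out float  outDepth : TEXCOORD0,
    uniform float4x4 worldViewProj
)
{
    outPos = mul( worldViewProj, position );
    outDepth = outPos.z;
}

// Caster fragment program: write the raw, unscaled depth into the shadow texture
float4 caster_fp(
    float depth : TEXCOORD0) : COLOR
{
    return depth.xxxx;
}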

Simple, eh? A super-simple set of receiver shaders might look like this:

void receiver_vp(
    float4 position : POSITION,
    out float4 outPos   : POSITION,
    out float4 shadowUV : TEXCOORD0,
    uniform float4x4 world,
    uniform float4x4 worldViewProj,
    uniform float4x4 texViewProj
)
{
    float4 worldPos = mul(world, position);
    shadowUV = mul(texViewProj, worldPos);
    outPos = mul(worldViewProj, position);
}
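float4 receiver_fp(
    float4 shadowUV : TEXCOORD0,
    uniform sampler2D shadowTex : register(s0),
    uniform float4 sceneRange   // not used by this minimal version
) : COLOR
{
    // Project xy into shadow texture space; z stays undivided, matching what the caster stored
    shadowUV.xy = shadowUV.xy / shadowUV.w;
    // The depth lives in the first channel of the 1-channel float texture
    float shadowDepth = tex2D(shadowTex, shadowUV.xy).x;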
    if (shadowDepth < shadowUV.z)
    {
        return float4(0,0,0,1);
    }
    else
    {
        return float4(1, 1, 1, 1);
    }
}

That just returns white if the object is unshadowed and black if it is shadowed - not exactly pretty but it proves the capability.

So Why Doesn’t It Work?

That’s a bit of a generalisation, but in many cases people will find this just doesn’t work as they’d expect - either no shadows, or large blocks of shadow where there should be none. There are several reasons why this can be the case, but the most common problem I’ve seen is to do with DirectX 9 and the clear colour of the shadow texture viewport.

You see, the problem is that DirectX 9 can only clear a viewport to a 32-bit colour value, i.e. 8 bits per channel. When clearing a floating-point surface, it has to map this simple integer range onto a floating point range, and it effectively does it by dividing each channel of the clear colour by 255. This means it can’t clear floating point textures to any number higher than 1.0! When clearing the frame buffer for a shadow texture that stores depths, you need to initialise it to the highest depth value possible so that rendered objects will update it to be ‘closer’. If you’re storing raw unscaled depth, that value needs to be the light’s attenuation range or some other far scene distance. You simply can’t do that in Dx9, so what you find is that your texture contains all 1.0’s in initialised areas. You might think this isn’t so bad, since provided at least one thing is rendered at any particular point, the floating point buffer will be right. That’s true, except that if you have any single-sided geometry (terrain or a ground plane), and you use the default ‘render back faces to shadow texture’ option (highly recommended to make biasing much simpler), you can have significant problems: the single-sided surface has no back faces to render, so wherever nothing else covers it the texture keeps its 1.0 clear value, and any receiver whose depth is greater than 1.0 then tests as shadowed.
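
Here is a small C++ sketch of that failure mode, purely as a CPU-side illustration. quantiseClearChannel is a hypothetical model of the behaviour described above (squeeze the requested value into an 8-bit channel, then divide by 255); it is not a Direct3D call, and isShadowed simply mirrors the comparison in receiver_fp.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Assumed model of a Dx9-style clear: the requested value is stored as an
// 8-bit channel of a 32-bit colour, then expanded back to float by /255.
float quantiseClearChannel(float requested)
{
    float clamped = std::min(std::max(requested, 0.0f), 1.0f);
    std::uint8_t channel =
        static_cast<std::uint8_t>(std::lround(clamped * 255.0f));
    return static_cast<float>(channel) / 255.0f;
}

// Same test as the receiver shader: shadowed if the stored depth is closer
// to the light than the pixel being rendered.
bool isShadowed(float storedDepth, float receiverDepth)
{
    return storedDepth < receiverDepth;
}
```

Asking for a clear value of 500 (say, the light’s attenuation range) yields 1.0 under this model, so a ground pixel at depth 40 with nothing rendered above it compares as shadowed, whereas a true clear to 500 would leave it lit.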

Note that the 1.0 clear limit does not exist on GL or Dx10 - on those render systems you can set any floating point values in your Ogre::ColourValue as the clear colour and they will be respected.

So, how to deal with the 1.0 clear limit in Dx9, and write your shadowing system in a portable way? There are a number of approaches.

1. Store Clip Space Depth

The one that lots of people use without really realising why it works is to store the depth in the shadow texture as a homogeneous clip space value - i.e. divide the ‘z’ value by ‘w’ in the fragment program. I know I did this for a while based on examples without particularly asking why. It looks like this:

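// Caster vertex program: pass both z and w so the divide can happen per pixel
void caster_vp(
    float4 position : POSITION,
    out float4 outPos   : POSITION,
    out float2 outDepth : TEXCOORD0,
    uniform float4x4 worldViewProj
)
{
    outPos = mul( worldViewProj, position );
    outDepth = outPos.zw;
}

// Caster fragment program: store z / w, which never exceeds 1.0
float4 caster_fp(
    float2 depth : TEXCOORD0) : COLOR
{
    return depth.x / depth.y;
}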

Obviously you need to do the same thing in the receiver program. This makes sure that no depth value exceeds 1.0f and eliminates the problem. However, there’s a downside to this technique, in that the depth is non-linear and therefore not very friendly for doing other kinds of calculations, such as variable penumbra widths (PCSS), depth-based fading and even simple biasing calculations. It’s not really a problem for small scenes, but it becomes more difficult as the ranges increase. So, if you need a robust solution for a large scene, this probably isn’t it.
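
To see how non-linear the stored value is, here is a minimal C++ sketch. It assumes a perspective projection with a Direct3D-style [0,1] depth range, where clip-space w equals the view-space distance; clipSpaceDepth is a hypothetical helper, and the exact numbers will differ for other projection conventions (and an orthographic light projection would not show this effect).

```cpp
#include <cassert>

// z / w for an assumed D3D-style perspective projection:
//   clipZ = far / (far - near) * (viewDepth - near), clipW = viewDepth
double clipSpaceDepth(double viewDepth, double nearDist, double farDist)
{
    assert(nearDist > 0.0 && farDist > nearDist && viewDepth > 0.0);
    double clipZ = farDist / (farDist - nearDist) * (viewDepth - nearDist);
    double clipW = viewDepth;
    return clipZ / clipW;
}
```

With near = 1 and far = 1000, a caster just 10 units from the light already stores roughly 0.9, so the remaining 990 units of the scene share the last tenth of the range. A one-unit step near the light therefore changes the stored value thousands of times more than the same step at mid range, which is why a single fixed bias or penumbra width behaves so inconsistently in this space.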

2. Store Custom Scaled Depth

This is the technique I normally use. You still rescale the depth values so that they don’t exceed 1.0f, but you do it using a fixed, known divisor rather than a per-pixel one. This could be a single global value that you know no shadows will ever be cast beyond (since we’re dealing with floating point values, only the relative scale matters here), or the individual light attenuation range of the associated light. The latter is bindable in OGRE using the ‘light_attenuation’ auto-parameter; however, do note that this auto-param is only available to shadow caster shaders in the current OGRE trunk (Cthugha, 1.8 in waiting). Previous versions skipped all light autos during shadow caster shader binding, because you never perform actual lighting there, and this optimisation prevented per-light attenuation values for scaling being passed through too. So in 1.6 or earlier, use a fixed scale value which is the largest of your light attenuation ranges.

Since you know this divisor ahead of time, you can use it to rescale any other parameters you need to perform calculations in the shader in the same space, like PCSS or depth fading, and know that you’ll always be dealing with a linear, predictable calculation.
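
As a rough illustration of the arithmetic, here is a C++ sketch of what the caster and receiver would each do with a fixed divisor. The names (DepthScale, makeScale, toStored, receiverInShadow) are hypothetical and the code only mirrors the shader maths on the CPU; the world-unit bias parameter is my own addition to show another quantity being rescaled into the same space.

```cpp
#include <cassert>

// Fixed, known scale shared by caster and receiver (e.g. the largest light
// attenuation range). Hypothetical helper type, not part of OGRE.
struct DepthScale
{
    float invRange;
};

DepthScale makeScale(float maxShadowDistance)
{
    assert(maxShadowDistance > 0.0f);
    return DepthScale{ 1.0f / maxShadowDistance };
}

// Caster side: linear depth rescaled so nothing in range exceeds 1.0.
float toStored(const DepthScale& scale, float linearDepth)
{
    return linearDepth * scale.invRange;
}

// Receiver side: the bias is given in world units and rescaled with the same
// factor, so the comparison stays linear and predictable.
bool receiverInShadow(const DepthScale& scale, float storedDepth,
                      float receiverDepth, float worldBias)
{
    return storedDepth < (receiverDepth - worldBias) * scale.invRange;
}
```

With a range of 1000, a texel left at the 1.0 clear value now means ‘nothing closer than 1000 units’, so it no longer shadows a receiver at depth 400, while a caster stored at depth 250 still does.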

3. Render Front-Facing Polygons Into Shadow Texture

This tends to resolve the problems in most cases, since single-sided geometry renders into the shadow texture the same way as the main camera sees it and so overwrites most of the badly cleared areas (although not necessarily all, for example if you have casting disabled on some materials / objects), but the major downside is that you instantly have far more biasing problems. Not recommended.

4. Clear The Viewport With A Custom Quad

Finally, instead of a clear operation you could clear the buffers with a rendered quad, before any other geometry is rendered into the shadow texture. This effectively becomes your clear operation (so you should disable viewport clearing). The plane can be built quite easily with a 2D ManualObject - you just want to set the ‘z’ value to the highest possible far distance that will still be within the range (so, the light attenuation range minus a small delta value). You should place this ManualObject in the RENDER_QUEUE_BACKGROUND queue to make it render first, and then use a SceneManager::Listener to configure it - shadowTextureCasterPreViewProj will give you the light information you need to set up the geometry per light, and shadowTexturesUpdated will tell you when the shadow textures are done so you can hide the ManualObject again so it’s not rendered in your main passes.

Your material for this quad should have depth checking disabled and depth writing enabled, and use the regular shadow caster shader. You should probably also disable backface culling for it, so that you don’t have to think about facing (which may well be inverted for the shadow render to further mess with your head).

This method is a little more fiddly, but has the advantage that you can deal with ‘raw’ depth information everywhere, with no rescaling required. This is handy for avoiding a few arithmetic instructions in your pixel shader. However, it does mean you’re executing your caster pixel shader for every texel on the shadow texture in addition to rendering casters - probably not a big deal since the shader is simple, but if your application is pixel-shader limited already, this may be more costly than the extra arithmetic instructions in the caster/receiver shaders using option 2 instead; you’d have to benchmark.

Conclusion

Options 2 and 4 are the paths I generally recommend - both of them should make your shadows behave predictably in all rendersystems as well as allowing you to use depth values more intuitively. Option 4 is particularly attractive if you want to do real-world unit calculations without incurring per-pixel scaling costs. There’s some inaccurate information on the web that says you must output values in the range [0,1] from a shadow shader, hence the homogeneous or fixed-range scaling that many sites recommend, but this is completely untrue assuming you’re using Shader Model 2 or above - which you really must be to be considering depth shadowmapping anyway. You can happily output full floating point range in SM2, and indeed must be able to for HDR to work of course. The reason it doesn’t generally work immediately when you try is the clear colour in Dx9.

I hope that this post helps someone!

Wiimote head-tracking Ogre game

OGRE 4 Comments

I posted about Johnny Lee’s Wiimote head-tracking demos early this year, and everyone said how much they’d love to play a game that included that kind of control system. Well, students at Qantm College in Brisbane, Australia have done it with Ogre. “State of Rage” is an on-rails shooter with multiple Wiimotes to perform head tracking and gun aiming. This video is from a non-final version before they added ragdolls and a few other features. Still, I think the result is very cool, particularly the way the head tracking works:

Qantm are regular users of Ogre for their projects - they’ve been using it for the last 5 years now, and have an uncanny habit of winning the Indie Awards at Game Connect Asia Pacific, 4 times out of the last 5 years in fact. We’re rather proud of them :) State of Rage wasn’t actually the winner - that accolade went to one of the other projects, “Debug”. You can read more about them in the Ogre forum.

This would never happen on my watch

Business, OGRE, Open Source 6 Comments

I read with some interest Matt Asay’s blog on TWiki, and what has happened over there as the company associated with the open-source project has basically decided to ‘reorganise’ everything, it appears in order to make itself more attractive to venture capitalists.

To be honest, I really don’t understand the motivation at all. All open source projects live or die by the strength of their community, and to suddenly break from it in the interests of attracting investment is crazy. Personally, I’ve never wanted to take VC money if I can help it - I’d prefer to run a small, self-funded and organically growing ship that I can stay in control of, and which I can apply my own brand of ethics to in balance with the need to make a living. Balancing open source and commercial necessity (we all have to eat after all) is tough, and it’s very different to running a regular proprietary software business, so you really can’t apply the same rules without undermining the very basis of the business.

Unfortunately open source poster-child examples like Ubuntu don’t help in many ways. Ubuntu manages to do everything the ‘right’ open source way while still having gazillions of dollars to spend on premises, staff, servers etc - but that’s only because it’s backed by an interested billionaire who doesn’t really care how long it takes to turn a profit (and probably wouldn’t be too fussed if it never did). So in some ways Ubuntu makes it hard for others because it’s often held up as an example of how things should be done, when in fact almost no-one else can afford to do it that way, unless they can find a billionaire of their own. Perhaps that leads to cases like TWiki for some projects, where ‘Ubuntu envy’ leads them to chase investment, but to the detriment of the reason they exist in the first place.

All I can say is that this will never happen to Ogre while I’m in charge. I obviously have to seek commercial opportunities related to Ogre, but I have a very deep line in the sand drawn many moons ago that I will never cross. At times people have asked me ‘what would happen if Ogre got acquired?’ - and I have to patiently explain to them that even if I wanted that to happen (and I don’t), it’s actually not possible in a traditional sense, since my company doesn’t own all the code. It owns a lot of it for sure, but the rest is community-contributed and licensed by TKS based on the contributor agreements - which in our case don’t ask for copyright assignment, just permission to use & relicense. This means that no-one could come along and ‘buy it up’, or at least not in the traditional sense. They could buy the domain from me I suppose, and the rights that TKS has, but could not fundamentally change the licensing conditions without approaching the contributors for permission. It’s commercially restricting, but I also see it as a key factor in reassuring the community about the intentions of my company.

When it comes down to it, in many ways having this restriction there is unnecessary - even if I was able to ‘sell’ Ogre, or suddenly change the licensing, I would be stupid to do it, because it would immediately destroy the community, which is what has made it great. Someone would fork a new project from it, and with a bit of time, that would become the ‘standard’ version. There might be some opportunity to ‘milk’ the codebase with custom commercial versions for a little while, but it wouldn’t last. The whole idea is self-defeating in the medium to long-term, as TWiki.net will probably discover shortly.

I’ve talked about business models and open source before, and that it can be necessary for companies like mine to mix in some proprietary aspects sometimes (e.g. optional add-ons) to make ends meet. However, maintaining the absolute integrity of the central open source project and its community at all times is absolutely vital. That’s the heart of it, and any business destabilises that at its absolute peril.

New OgreSpeedTree media up

Business, OGRE 11 Comments

A few people asked for an OgreSpeedTree video with more varied scenes, and I’ve now uploaded one to the OgreSpeedTree section of the Torus Knot site. Just scroll down below the screenshots if you want to view the video.

I have a higher resolution & better quality version (this one is H.264 at 1Kb/s) but I’ve kept this one small for now to keep my bandwidth under control. Places like Vimeo don’t allow commercial advertising, and while before I could get away with claiming it was just in-development test output shared with enthusiasts only, this is really an advertisement video so I’m hosting it myself. I have enough bandwidth to spare unless something really goes bonkers (I think) - in case it does, does anyone know of any reasonably priced business media hosts (UK only), should I need something more than just upping my bandwidth allowance? I’ve seen a few dedicated streaming media hosts around but don’t have a view on how good they are.

OgreSpeedTree 1.0 entered RC1 a week ago and I haven’t had any reports of any issues, so I’m pretty much ready to stick a fork in it & declare it done for now. I’ve been improving OgreSpeedGrass this week, such as making the grass paging re-entrant so that new cells can be filled gradually to spread the buffer update overhead over many frames, which seems to have helped in busier scenes. That just needs a couple more utility functions for loading in grass distributions from tools, then that will be done too. Then, it’ll be time to get the marketing wheels moving…

8000 trees and 2.5M blades of grass? No problem.

Business, Development, OGRE 12 Comments

I’ve been crazily busy lately trying to get OgreSpeedTree to a fit state for a 1.0 release alongside other projects (such as Ogre of course), so I can really start promoting it. Being the kind of person I am, I find it hard to stop tinkering and perfecting and I can’t let something go out the door without being totally happy with it. The screenshots and videos so far have been good I think, but I’ve been polishing away and making it all just that bit better, and one element of that has been some additional optimisation.

Thanks to some improved batching, OgreSpeedTree is now running even faster than before, and most importantly it scales to larger forests even better than before too. Here’s a short video where I tested adding over 8,000 trees (from 5 different models, and each with different rotations / scales) to the scene, together with over 2.5 million blades of grass, each of which can be placed individually (I procedurally generated the distribution, but it could be done manually). Some of those tree models have multiple trees in them (the very close ‘clusters’ of 3/4 are actually one model), so in reality there are 12,000+ trees as far as the viewer is concerned.

Note that all the trees here are dynamically lit including normal mapping, and dynamic shadows are being cast through 3 shadow textures (PSSM). The LOD transitions are extremely hard to spot IMO too.

On my 9800 GX2 with a 2.66 dual-core, it runs consistently over 60fps (actually about 75fps most of the time). This is with a quite dense clustering of the trees too; if you spread the trees out a bit more you can easily double that. The LOD settings are quite high too; rein those in and your lower class cards should be able to easily handle this, and of course you have the option of dropping or scaling back the dynamic shadows if you’re pushed.

I’m happy :) Not that I’ve quite finished of course, I have a couple of things still to polish for 1.0, but it shouldn’t be long now.

OgreSpeedGrass

Business, OGRE 12 Comments

Next in the line of OgreSpeed* products, here’s a shot of OgreSpeedGrass.

It’s based on IDV’s SpeedGrass but I’ve rewritten a fair amount to make it work conveniently with Ogre, and also improved it somewhat - such as better wind effects and the completely dynamic lighting and shadowing you see there, which I think looks rather nice.

OgreSpeedGrass will be bundled with a yearly support agreement for OgreSpeedTree, in the same way that the original SpeedGrass is licensed. I’m not looking for any additional beta testers right now, but there will be an official 1.0 release of both these libraries by the end of the month; if you’d like to be notified when that happens, please email enquiries at torusknot dot com.

More shots are available in the Ogre gallery.

Edit: and here’s a video, for those who asked:

OGRE 2008 User Survey Results!

OGRE 4 Comments

I’ve just released a report summarising the results of the OGRE 2008 User Survey. Thanks to everyone who participated; we did in fact break the 1,000 responses mark, which was my goal when I decided to run the survey, and I think that’s a statistically respectable number to draw conclusions from.

I intend to give copies of this report to hardware and software companies I need to blag a bit of assistance from, so I think this will do the job. I’m also glad that some of the impressions I got from my work over the last few years were reinforced, such as that there are quite a lot of companies using OGRE for applications other than games.

I’ll write up a forum posting on it when I get a chance (probably tomorrow), but I don’t venture into the forum unless I have at least an hour or two free, otherwise I miss some updated threads when I have to rush off! Blog readers get an early peek.

Phew…

OGRE 7 Comments

Today has been totally bonkers, but I finally got at least a large part of the Ogre 1.6.0 RC1 release done. I finished all the straggling documentation updates, the source releases are up and the prebuilt SDK for VC8 is there too. I still have to do the VC7.1 SDK, the Mac OS X SDK and perhaps the VC9 SDK too (since I have a build of that locally now). Florian was having a few odd linker problems with MinGW which didn’t occur on Linux or OS X so that one might take a while longer to resolve, perhaps until RC2.

It’s worth all the effort though: the 1.6 Changelog is positively bursting with goodness, and I’m pretty sure I missed some less headline things in there anyway. 1.6 is definitely a worthy release.

As you can see, I even found time to make a new release logo :)

It’s going on midnight here now though so I’m going to finish up and have a cup of tea before going to bed. Perhaps there might be rest for the wicked after all! ;)

Mixing Open Source & Business - my take

Business, Open Source, Personal 17 Comments

Bruce Byfield wrote an interesting article (discovered via Matt ‘Alfresco’ Asay’s blog, which should be required reading for anyone in this field) about the sometimes unsteady alliance between open source and business that, on the whole, I agreed with - within a given context. I do think, however, that his context was weighted towards the larger players in the market that are fusing open source with business opportunities, and I wanted to share some of my experiences and conclusions from the perspective of a more individual player in the business.

Apologies for the length of this article, I had a lot to say :)

Read the rest of this entry »