Well, I finished the first pass of the HDR compositor. I wrote it in Cg to begin with, but in the end I converted it to HLSL and GLSL, because I found that the Cg compiler has a very annoying habit of burning through temporary registers far too quickly. For example, when I switched the ‘bloom’ passes to a 15-sample Gaussian blur, the Cg compiler decided to calculate all the offset sample coordinates into temporary registers first, before doing any of the texture lookups. This is no good, because ps_2_0 only guarantees 12 temporary registers, so this blows the limit really quickly. When compiling the exact same code in HLSL, the UV offset calculations and texture loads are interleaved, so far fewer temporary registers are live at once and the shader works. I also had lots of issues with Cg generating instruction sequences that blew the dependent texture read limit, just because I used a swizzle after the main loop, and that was even with the latest stable version, 1.4.1.
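To make the register arithmetic concrete, here is a deliberately simplified C++ model that counts the peak number of live temporaries for a 15-tap blur under the two instruction orderings described above. It is a sketch, not how Cg or the HLSL compiler actually allocate registers; the names Instr, hoisted, interleaved, peakTemps and fitsPs20 are hypothetical. The model assumes each offset coordinate occupies its own temporary until its texture fetch, each fetched sample occupies one until it is added to the sum, and the running sum holds one register throughout.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified (hypothetical) model of temp-register pressure in an N-tap blur.
enum class Op { ComputeUV, Fetch, Accumulate };

struct Instr {
    Op op;
    int tap;
};

// "Hoisted" order: compute every offset UV first, then fetch and accumulate.
std::vector<Instr> hoisted(int taps) {
    std::vector<Instr> s;
    for (int i = 0; i < taps; ++i) s.push_back({Op::ComputeUV, i});
    for (int i = 0; i < taps; ++i) {
        s.push_back({Op::Fetch, i});
        s.push_back({Op::Accumulate, i});
    }
    return s;
}

// "Interleaved" order: offset, fetch, accumulate, then move to the next tap.
std::vector<Instr> interleaved(int taps) {
    std::vector<Instr> s;
    for (int i = 0; i < taps; ++i) {
        s.push_back({Op::ComputeUV, i});
        s.push_back({Op::Fetch, i});
        s.push_back({Op::Accumulate, i});
    }
    return s;
}

// Peak number of simultaneously live temporaries for a given order.
int peakTemps(const std::vector<Instr>& s) {
    int live = 1;  // the running sum is live for the whole shader
    int peak = live;
    for (const Instr& in : s) {
        switch (in.op) {
            case Op::ComputeUV:  live += 1; break;  // a new UV temp is born
            case Op::Fetch:      break;             // UV temp dies, sample temp is born (net 0)
            case Op::Accumulate: live -= 1; break;  // sample temp is folded into the sum
        }
        assert(live >= 1);  // the sum register is never released
        peak = std::max(peak, live);
    }
    return peak;
}

// ps_2_0 guarantees 12 temporary registers (r0..r11).
constexpr int kPs20Temps = 12;

bool fitsPs20(const std::vector<Instr>& s) { return peakTemps(s) <= kPs20Temps; }
```

Under these assumptions, peakTemps(hoisted(15)) comes out at 16, over the 12 that ps_2_0 guarantees, while peakTemps(interleaved(15)) needs only 2, because each offset register is consumed by its fetch before the next one is created.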
Generally it yanked my chain enough that I decided writing it once and using it on both rendersystems wasn’t actually any faster, so I’m now using each rendersystem’s native high-level language instead (HLSL for Direct3D, GLSL for OpenGL). It was my first real play with GLSL, and I had a couple of niggles:
You have to use a separate file for every shader, no matter how trivial
You can’t initialise array variables on declaration
However, everything else about GLSL is nice. I particularly like the fact that the linkage between vertex and fragment programs isn’t done via fixed attribute bindings, but via a linker which hooks up the external variables by name. That’ll fit well with the next generation of cards.
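As a rough illustration of name-based linkage, the following C++ sketch pairs vertex-shader outputs with fragment-shader inputs by name and type, and lets the "linker" rather than the shader author choose the interpolator slot. It is a hypothetical model (Varying, LinkResult and linkVaryings are made-up names), not the real GL linker, and for simplicity it treats every declared fragment input as used.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: varyings are matched by name and type,
// not by fixed semantic slots such as TEXCOORD0 or COLOR0.
struct Varying {
    std::string name;
    std::string type;  // e.g. "vec2", "vec4"
};

struct LinkResult {
    bool ok = true;
    std::vector<std::string> errors;
    std::map<std::string, int> slots;  // varying name -> slot picked by the linker
};

LinkResult linkVaryings(const std::vector<Varying>& vsOut,
                        const std::vector<Varying>& fsIn) {
    LinkResult r;

    // Everything the vertex shader writes, indexed by name.
    std::map<std::string, std::string> produced;
    for (const Varying& v : vsOut) produced[v.name] = v.type;

    int nextSlot = 0;
    for (const Varying& v : fsIn) {
        auto it = produced.find(v.name);
        if (it == produced.end()) {
            r.ok = false;
            r.errors.push_back("fragment input '" + v.name + "' is not written by the vertex shader");
        } else if (it->second != v.type) {
            r.ok = false;
            r.errors.push_back("type mismatch for '" + v.name + "'");
        } else {
            // The linker, not the author, decides where each varying lives.
            r.slots[v.name] = nextSlot++;
        }
    }
    return r;
}
```

With a vertex shader writing uv (vec2) and color (vec4), and a fragment shader reading color then uv, linking succeeds and the slots are assigned in the order the fragment inputs are listed; neither shader ever names a slot.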
I like to test my programs for correctness outside the runtime environment, so I used a nice validation tool from 3DLabs to check my GLSL before plugging it into OGRE. I still have a really bad lock-up when using GL, but I was getting that with the Cg version too, so I’m not sure what’s going on there. That’s tonight’s job, along with coming up with a better scene for the compositor demo.