As much as I love using OS X, one of the double-edged swords is that the graphics driver updates are controlled by Apple. On the one hand, that’s a bonus because you have a better idea of what you’re dealing with out in the wild, and people get prompted to update their drivers (as part of the regular OS X auto-update). On the other hand, it’s a pain in the ass because the drivers tend to lag behind those from the GPU manufacturers and therefore have bugs the mainstream ones don’t.
I recently committed a patch from ‘hellcatv’, one of the more prolific Mac users in our community, to deal with a few driver bugs in some of the older PowerBooks, and also some quirks of the recent Intel GMA-based iMacs: stuff like choking on glCompressedTexSubImage2DARB for no good reason (i.e., forget uploading part of a DXT-compressed texture, it’s all or nothing). I’m indebted to him for testing on a huge range of Macs that I’d never have access to without spending a load of cash and filling up my home office with yet more surplus hardware (my wife would not entirely approve of either, methinks). One of the remaining problems we’ve had is that the OS X GLSL drivers on the recent NVIDIA-based MacBook Pros suffer from performance problems caused by vertex attribute aliasing, which other platforms do not.
Now, NVIDIA has always had a fixed set of vertex attribute assignments for the built-ins: gl_Vertex is 0, gl_Normal is 2, etc. If you used gl_Normal in a shader, but also bound a custom attribute (say, skeletal blend weights) to index 2, you’d get a performance drop because of the aliasing. That’s fine, so when we used custom attributes, we didn’t fix the indexes ourselves; we let the linker decide, taking into account what was actually used in the shader. We’d declare the attributes in the shader, and then after glLinkProgramARB, we’d ask the program object what indexes it had chosen for the custom attributes, then wire them up that way. The well-behaved NVIDIA drivers (Windows, Linux) would avoid clashing with any built-in attributes that had been referenced in the shader and we’d have a nice tight list of unique indexes, but on OS X, the driver would all too often assign custom attributes to built-in indexes that were in fact being used in the shader. Tsk, bad driver, no treats for you today.
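To make the clash concrete, here is a minimal sketch of the kind of check you could run after linking. It assumes the conventional NVIDIA slot numbers for the built-ins; the helper name findAliasedAttributes and the attribute names are hypothetical, and in practice the locations would be whatever glGetAttribLocationARB reports for each custom attribute:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Conventional NVIDIA slots for the fixed-function built-ins.
enum BuiltinSlot : int {
    SLOT_VERTEX = 0,           // gl_Vertex
    SLOT_NORMAL = 2,           // gl_Normal
    SLOT_COLOR = 3,            // gl_Color
    SLOT_SECONDARY_COLOR = 4,  // gl_SecondaryColor
    SLOT_FOG_COORD = 5,        // gl_FogCoord
    SLOT_TEXCOORD0 = 8         // gl_MultiTexCoord0..7 occupy slots 8..15
};

// Hypothetical helper: given the built-in slots a shader references and the
// locations the linker chose for the custom attributes, return the names of
// the custom attributes that landed on a slot already in use.
std::vector<std::string> findAliasedAttributes(
    const std::set<int>& builtinSlotsUsed,
    const std::map<std::string, int>& customLocations)
{
    std::vector<std::string> clashes;
    for (const auto& entry : customLocations) {
        // A location of -1 means the attribute is not active in the program.
        if (entry.second >= 0 && builtinSlotsUsed.count(entry.second) != 0) {
            clashes.push_back(entry.first);
        }
    }
    return clashes;
}
```

For a shader that uses gl_Vertex and gl_Normal, custom attributes linked at 1 and 2 would report a clash on the one at index 2, while 6 and 7 would report none.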
It’s been reported to Apple as a bug, but so far, no dice on the fix front, so I decided it was time to try to work around it. The first thing I tried was simply telling the driver at the pre-link stage that I wanted any occurrences of the custom vertex attributes we supported to be placed out of the way of any possible built-ins that might be used. So for example, I did this:
glBindAttribLocationARB(mGLHandle, 6, "blendWeights");
glBindAttribLocationARB(mGLHandle, 7, "blendIndices");
That seemed to take effect, and calling glGetAttribLocationARB after the link reflected that I was indeed getting indexes 6/7 bound, rather than the 1/2 that the driver kept picking before (bad, because I used gl_Normal in this shader which is index 2). However, despite the indexes being out of the way of anything else, the shader still performed really poorly. I tried a few other indexes, like 14 and 15 which overlap with the top 2 UVs but which are rarely used (you can’t exceed 15, at least on NVIDIA), but the result was the same.
Cue head-scratching. There should have been no aliasing problems anymore, yet the shader still performed like an asthmatic ant carrying some heavy shopping. So, the last thing I tried was going the whole hog and implementing custom attribute replacements for all of the built-ins, all at known, fixed indexes matching currently known hardware defaults and limitations.
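A minimal sketch of what such a fixed table might look like follows. The slot numbers for position, normal, colours and UVs follow the conventional NVIDIA layout; the attribute names, the choice of slots 1 and 7 for the blend data, 14 and 15 for tangent and binormal, and the bindFixedAttributes helper are assumptions for illustration, not the exact list from the patch. The callback stands in for glBindAttribLocationARB so the sketch does not depend on a GL context:

```cpp
#include <cstddef>
#include <functional>

struct FixedAttribute {
    const char* name;  // hypothetical attribute name declared in the shader
    unsigned index;    // fixed generic attribute index to bind it to
};

// Every input is a custom attribute at a known slot, so the shader no longer
// references gl_Vertex, gl_Normal and friends at all.
static const FixedAttribute kFixedAttributes[] = {
    {"vertex", 0},
    {"blendWeights", 1},      // assumption: the old "vertex weight" slot
    {"normal", 2},
    {"colour", 3},
    {"secondary_colour", 4},
    {"blendIndices", 7},      // assumption: 6 is avoided
    {"uv0", 8},
    {"uv1", 9},
    {"uv2", 10},
    {"uv3", 11},
    {"uv4", 12},
    {"uv5", 13},
    {"tangent", 14},          // assumption: takes the place of UV 6
    {"binormal", 15},         // assumption: takes the place of UV 7
};

// Hypothetical helper: calls bind(index, name) once per table entry.
void bindFixedAttributes(
    const std::function<void(unsigned, const char*)>& bind)
{
    for (const FixedAttribute& attribute : kFixedAttributes) {
        bind(attribute.index, attribute.name);
    }
}
```

In real use the callback would wrap glBindAttribLocationARB(mGLHandle, index, name), and the whole thing has to run before glLinkProgramARB for the bindings to take effect.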
And what do you know, that works. The skinning shader runs considerably better like that. It’s still not great (I suspect the Apple GLSL implementation is just not that good), but it’s at least 2-3 times faster than before. It kinda sucks to have had to do it that way: I really liked leaving it up to the driver to organise the attribute bindings and only using the ones I needed, because that’s more in the spirit of GLSL, but clearly being more rigid is more reliable. I know I could have packed the tangent into attribute 6 instead to save a UV entry, but using 6 still seemed to have some performance issues, so I’ve gone with the fixed bindings I would have used with ARB programs. In my experience it’s incredibly rare to need more than 5 UVs going into a vertex program anyway.
So, if you need a custom binding in GLSL and you want it to run well on a Mac, the way to go seems to be using custom attributes for everything in the vertex shader and fixing their indexes yourself.