And lo, it came to pass that several users of the green-headed abomination known as Ogre did throw up their hands and bemoan the most uncanny and unjust chasm that existed between the lands of D3D and OpenGL. For some of these poor souls, their torrid creations did speed – nay, rocket – on the one, whilst something of a laggard they became on the other. Much disquiet there was among the gathered hordes, and it did trouble those whose craft it was to deduce the matter’s origin. Toil day and night they did, across continents too vast for the eye to encompass – searching; searching.
In far-flung Asia did the first clue arise, and where the sun rose, so did the foul spectre of the cause – glMapBuffer.
Damn, I can’t keep that up any more. Anyway – the fact of the matter was that GL was underperforming in some demos and we didn’t know why. GL is typically a pain in the ass to performance test compared to D3D, since there are so few decent tools for it, at least free ones. gDEBugger is one option, of course, which I may consider in the future. But anyway, the cause, as originally discovered by genva, was glMapBuffer – as a GL function, it sucks.
We’ve used hardware buffers for years, were there when VBOs were brand new in the spec, and lived through all kinds of really crappy driver issues, some of which were performance related. One of these was about the only serious nVidia GL driver bug I think I’ve encountered – the same cannot of course be said for ATI’s GL drivers. Anyway, moving on … glMapBuffer is equivalent in purpose to D3D’s buffer locking methods, and that’s why we used it. The options are a little less flexible than D3D’s though, and specifying usage and locking modes is considerably less refined. However, we found a way that was good enough, or so we thought.
It turns out, though, that GL performs far more syncing when using glMapBuffer than you would expect. Even if we tell it that we don’t want to read the buffer back, and that we want to discard the whole buffer contents so a stall should not be necessary, it still seems to sync up something somewhere. So much so that just using glBufferSubData instead of performing a write-lock (i.e. writing your buffer data to a temporary main memory area and then uploading it) makes a huge performance difference where dynamic buffers are being used. We’d only used glBufferSubData when bulk-uploading from main memory, but it’s actually better to use it all the time. Grr.
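To make the "write to main memory, then upload" idea concrete, here is a minimal sketch. It is not Ogre's actual code: StagedBuffer and UploadFn are names I've made up for illustration, and the GL call is hidden behind a callback so the example stands on its own. In a real GL backend that callback would presumably bind the buffer and call glBufferSubData(target, offset, size, data).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical upload hook (not an Ogre or GL type). A GL backend would
// wrap something like this in it:
//   glBindBuffer(GL_ARRAY_BUFFER, vbo);
//   glBufferSubData(GL_ARRAY_BUFFER, offset, size, data);
using UploadFn =
    std::function<void(std::size_t offset, std::size_t size, const void* data)>;

// Hypothetical stand-in for a dynamic buffer that is rewritten regularly.
// Writes go to ordinary main memory; nothing is mapped.
class StagedBuffer {
public:
    StagedBuffer(std::size_t sizeBytes, UploadFn upload)
        : staging_(sizeBytes), upload_(std::move(upload)) {}

    // Plays the role of a discard write-lock: hands back main memory
    // rather than a pointer obtained from glMapBuffer.
    void* beginWrite() {
        assert(!writing_ && "beginWrite called twice");
        writing_ = true;
        return staging_.data();
    }

    // Plays the role of unlock: a single bulk upload of the whole buffer.
    void endWrite() {
        assert(writing_ && "endWrite without beginWrite");
        writing_ = false;
        upload_(0, staging_.size(), staging_.data());
    }

private:
    std::vector<unsigned char> staging_;
    UploadFn upload_;
    bool writing_ = false;
};
```

The caller fills the pointer from beginWrite() exactly as it would fill a mapped pointer, so the calling code doesn't have to change. The cost is an extra copy in main memory, which in our case was far cheaper than whatever glMapBuffer was syncing on.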
What I did was generalise things so that we now use a pool of ‘scratch’ memory when locking GL buffers. Provided there’s memory available (and there almost always will be, since it’s only used for active locks), a locked buffer grabs a chunk of memory from this pool to use instead of calling glMapBuffer, and at unlock, if the mode was a writable one, it uploads the data again in bulk with glBufferSubData. Personally I think that GL should be able to figure this out from the usage and lock modes just like D3D can, but clearly it can’t. No amount of fiddling with these modes and trying to follow suggested GL lock strategies helped, meaning that glMapBuffer is a pretty damn useless piece of API. The result of using glBufferSubData instead is up to a 50% improvement on demos using dynamic buffers, bringing the performance up to D3D level again.
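For the curious, the scratch pool idea can be sketched roughly like this. Treat it as an illustration under my own assumptions rather than the code that went into Ogre: ScratchPool, ScratchLockedBuffer and UploadFn are hypothetical names, the first-fit allocator is just the simplest thing that works, and the upload is a callback standing in for glBufferSubData.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical scratch pool: one fixed chunk of main memory, first-fit
// allocation, neighbouring free blocks merged on release.
class ScratchPool {
public:
    explicit ScratchPool(std::size_t bytes) : memory_(bytes) {
        blocks_.push_back(Block{0, bytes, false});
    }

    // Returns nullptr when nothing fits, so the caller can fall back to
    // another path (glMapBuffer, say).
    void* allocate(std::size_t size) {
        if (size == 0) return nullptr;
        size = (size + 15) & ~static_cast<std::size_t>(15);  // keep offsets 16-byte aligned
        for (std::size_t i = 0; i < blocks_.size(); ++i) {
            if (blocks_[i].used || blocks_[i].size < size) continue;
            const std::size_t offset = blocks_[i].offset;
            const std::size_t spare = blocks_[i].size - size;
            blocks_[i].size = size;
            blocks_[i].used = true;
            if (spare > 0)  // split off the unused tail as a new free block
                blocks_.insert(blocks_.begin() + i + 1, Block{offset + size, spare, false});
            return memory_.data() + offset;
        }
        return nullptr;
    }

    void deallocate(void* ptr) {
        const std::size_t offset =
            static_cast<std::size_t>(static_cast<unsigned char*>(ptr) - memory_.data());
        for (std::size_t i = 0; i < blocks_.size(); ++i) {
            if (!blocks_[i].used || blocks_[i].offset != offset) continue;
            blocks_[i].used = false;
            if (i + 1 < blocks_.size() && !blocks_[i + 1].used) {  // merge with next
                blocks_[i].size += blocks_[i + 1].size;
                blocks_.erase(blocks_.begin() + i + 1);
            }
            if (i > 0 && !blocks_[i - 1].used) {  // merge with previous
                blocks_[i - 1].size += blocks_[i].size;
                blocks_.erase(blocks_.begin() + i);
            }
            return;
        }
        assert(false && "pointer did not come from this pool");
    }

private:
    struct Block { std::size_t offset; std::size_t size; bool used; };
    std::vector<unsigned char> memory_;
    std::vector<Block> blocks_;
};

// Hypothetical upload hook; a GL backend would wrap glBufferSubData here.
using UploadFn =
    std::function<void(std::size_t offset, std::size_t size, const void* data)>;

class ScratchLockedBuffer {
public:
    ScratchLockedBuffer(ScratchPool& pool, UploadFn upload)
        : pool_(pool), upload_(std::move(upload)) {}

    // Returns scratch memory for the locked range, or nullptr if the pool is
    // full. A readable lock would also need the current contents copied in
    // first (e.g. with glGetBufferSubData); that is left out of this sketch.
    void* lock(std::size_t offset, std::size_t size, bool writable) {
        assert(scratch_ == nullptr && "buffer is already locked");
        scratch_ = pool_.allocate(size);
        offset_ = offset;
        size_ = size;
        writable_ = writable;
        return scratch_;
    }

    void unlock() {
        assert(scratch_ != nullptr && "unlock without a successful lock");
        if (writable_) upload_(offset_, size_, scratch_);  // one bulk upload
        pool_.deallocate(scratch_);
        scratch_ = nullptr;
    }

private:
    ScratchPool& pool_;
    UploadFn upload_;
    void* scratch_ = nullptr;
    std::size_t offset_ = 0, size_ = 0;
    bool writable_ = false;
};
```

Because scratch memory is only held between lock and unlock, a fairly small pool should cover most cases. When lock() returns nullptr the caller has to fall back to something else, which is what the "provided there's memory available" caveat above is about.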
I love the design of the GL API (most of the time – some of those extensions are pretty crazy) but jeez, there needs to be better documentation about this kind of thing. The D3D documentation at least tells you what strategies are most efficient in what situations and why (most of the time). GL’s extension specs are probably the most annoyingly written of any documentation you can come across – I can only imagine that at some point someone said "Hey, I know – let’s make our advanced API references read like the minutes of a meeting! How awesome will that be!". And for some reason no-one shot him on the spot. Missed opportunity. A collection of incremental and narrow-focussed technical discussion docs is woefully inadequate when it comes to giving you a holistic overview of an API, including what’s good and what’s not, leading to an awful lot of really annoying trial and error iteration. I hope the Khronos group sorts this out; the GL docs are still a hell of a tangled mess and one of the few things, other than driver quality, stopping GL being a serious challenge to D3D again – nVidia being one of the few vendors that seem to consistently get it right. GL’s lovely to learn the basics, but get beyond that and it can be such a pain in the arse sometimes.