Well, I had a bit of a rant about glMapBuffer yesterday on the blog, and I was lucky enough that an acquaintance at nVidia read it and sent me some tips from one of their GL driver gurus. Splendid.
Having pored over that, I found that I’d tried most of what he mentioned, but he made specific reference to the size of the updates being significant. This I hadn’t really experimented with, so I decided to revisit the whole situation today (apologies to my wife Marie again here, I know it’s the weekend but this really can’t wait).
And you know, size really does matter.
I played with a lot of different scenarios and came to the conclusion that when locking and updating less than about 32k of buffer data (on my 6800), glBufferSubData trumps glMapBuffer by a significant margin. The closer you get to that threshold, though, the less difference it makes, and once you get well above it, glMapBuffer starts to pull away again. With very, very large updates, glMapBuffer wins outright (provided you discard the current contents of the buffer, which in GL means calling glBufferData with a null pointer). So I’ve changed my implementation to use chunks of a scratch buffer and glBufferSubData when the locked area is smaller than 32k, and glMapBuffer otherwise. The result is that all of the demos are now faster in GL. After making that previous post I’d also discovered that a couple of demos had lost a small amount of speed by avoiding glMapBuffer; these were cases where rather large buffers were being locked in one go, and the new approach covers them too.
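To make the size-based dispatch concrete, here is a minimal sketch of the idea, not the actual Ogre code. Every name in it (`kMapThresholdBytes`, `chooseUploadPath`, `BufferBackend`, `upload`, `FakeBuffer`) is made up for illustration. The GL calls are hidden behind a small interface, with the call each method is assumed to wrap noted in a comment, so the logic can run without a GL context. The scratch buffer itself is not modelled; the caller's source pointer stands in for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Threshold from the post: roughly 32k on a GeForce 6800. It may well differ
// on other cards and drivers, so treat it as a tunable, not a constant of nature.
constexpr std::size_t kMapThresholdBytes = 32 * 1024;

enum class UploadPath { SubData, MapBuffer };

// Locks smaller than the threshold go through glBufferSubData, the rest are mapped.
constexpr UploadPath chooseUploadPath(std::size_t lockedBytes) {
    return lockedBytes < kMapThresholdBytes ? UploadPath::SubData
                                            : UploadPath::MapBuffer;
}

// Hypothetical wrapper around one bound buffer object. A real implementation
// would forward each method to the GL call named in its comment.
struct BufferBackend {
    virtual ~BufferBackend() = default;
    virtual void subData(std::size_t offset, std::size_t size,
                         const void* data) = 0;  // glBufferSubData
    virtual void discard() = 0;        // glBufferData(target, size, NULL, usage)
    virtual void* mapWriteOnly() = 0;  // glMapBuffer(target, GL_WRITE_ONLY)
    virtual void unmap() = 0;          // glUnmapBuffer(target)
};

// Copies `size` bytes from `src` into the buffer at `offset` and reports
// which path was taken.
UploadPath upload(BufferBackend& buf, std::size_t offset, const void* src,
                  std::size_t size, bool discardContents) {
    if (chooseUploadPath(size) == UploadPath::SubData) {
        buf.subData(offset, size, src);
        return UploadPath::SubData;
    }
    if (discardContents) buf.discard();  // only safe if the old contents are not needed
    void* mapped = buf.mapWriteOnly();
    if (mapped == nullptr) {             // mapping can fail, so fall back to a plain copy
        buf.subData(offset, size, src);
        return UploadPath::SubData;
    }
    std::memcpy(static_cast<char*>(mapped) + offset, src, size);
    buf.unmap();
    return UploadPath::MapBuffer;
}

// In-memory stand-in that records the calls it receives, so the sketch can be
// exercised without a GL context.
struct FakeBuffer : BufferBackend {
    std::vector<char> storage;
    std::vector<std::string> calls;
    explicit FakeBuffer(std::size_t bytes) : storage(bytes) {}
    void subData(std::size_t offset, std::size_t size, const void* data) override {
        assert(offset + size <= storage.size());
        std::memcpy(storage.data() + offset, data, size);
        calls.push_back("subData");
    }
    void discard() override { calls.push_back("discard"); }
    void* mapWriteOnly() override { calls.push_back("map"); return storage.data(); }
    void unmap() override { calls.push_back("unmap"); }
};
```

With a `FakeBuffer` of 64k, a 1k upload records a single `subData` call, while a 48k upload with `discardContents` set records `discard`, `map`, `unmap` in that order. Keeping the threshold in one named constant also makes it easy to retune if it turns out to vary per card.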
It’s still a bit odd. I personally would have thought that using glMapBuffer in write-only mode, having previously discarded the buffer by calling glBufferData(…, NULL), would be directly equivalent to using glBufferSubData. The usage mode is surely the same, i.e. you don’t want to read the current contents of the buffer and want to discard them, freeing the buffer from needing to be synced. But clearly not; most strange.
I’ve a little doubt at the back of my mind that this 32k threshold is rather unscientific, and I’m wondering if it might vary per card. I’m hoping to get some guidance from the guru on that.
But, it’s good that I’ve found a route which seems to perform better in all cases now.
Big thanks go to Kevin at nVidia for taking the time to read my blog and get some info from their driver team; it was much appreciated.
March 25th, 2007 at 10:52 pm
[...] What I don’t get is: D3D has had a buffer model that is simple to understand and actually works for, like, 6 years now! Why ARB_vertex_buffer_object guys couldn’t just copy that? The world would be a better place! No, instead they make a way to map only whole buffer; updating chunks is extra memory copy; there are confusing usage parameters (when should I use STREAM and when DYNAMIC?); performance costs are unclear (when is glBufferSubData faster than glMapBuffer?) etc. And in the end when an OpenGL noob like me tries to actually make them work – he can’t! It’s slow! [...]
June 21st, 2007 at 7:25 pm
[...] Finally, GL buffer objects are due to get some of the functionality we’ve taken for granted in Direct3D for years, as described in this article. Things like explicit write-only interaction modes and sub-region optimisation. About damn time is all I can say, it’s because of GL’s far too generic buffer object management that we have to bend over backwards and use esoteric scratch buffer thresholds in GL to get decent performance under varying buffer conditions. D3D may have one of the more butt-ugly APIs in the known world (mostly because it’s influenced by the similarly butt-ugly Win32 API), but at least it gives you the control when you need it in these areas. [...]
June 24th, 2007 at 8:44 am
Hi Steve and all
Yes, I think this was a good catch, and making sure different cards (or perhaps different drivers) are handled the right way is important.
Ogre doesn’t have any way to handle different driver/card implementations, does it?
Like a standard way?
June 24th, 2007 at 2:27 pm
Not at the moment. Our goal really is to try to make that unnecessary for the user; it’s nasty to have to be concerned about that at the application / content creator level. However, it might happen as part of the Summer of Code ‘hardware emulation’ project, which is being discussed right now.