Well, I had a bit of a rant about glMapBuffer yesterday on the blog, and I was lucky enough that an acquaintance at nVidia read it and sent me some tips from one of their GL driver gurus. Splendid.
Having pored over that, I found that I’d tried most of what he mentioned but he made specific reference to the size of the updates being significant. This I hadn’t really experimented with, so I decided to revisit the whole situation again today (apologies to my wife Marie again here, I know it’s the weekend but this really can’t wait ;)).
And you know, size really does matter. I played with a lot of different scenarios and came to the conclusion that when locking and updating less than about 32k of buffer data (on my 6800), glBufferSubData trumps glMapBuffer by a significant margin. The closer you get to that threshold though, the less difference it makes, and when you start getting well above that, glMapBuffer starts to pull away again. With very, very large updates, glMapBuffer wins outright (provided you discard the current contents of the buffer – in GL this means calling glBufferData with a null pointer). So, I’ve changed my implementation now to use chunks of a scratch buffer and glBufferSubData when the locked area is smaller than 32k, and glMapBuffer otherwise. The result is that all of the demos are now faster in GL, and I’d discovered after making that previous post that a couple of demos had lost a small amount of speed by avoiding glMapBuffer – these were cases where rather large buffers were being locked in one go.
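As a rough illustration of that size-based switch, here is a minimal C sketch. It is not the engine's actual code: `choose_upload_path`, `upload_region`, `SMALL_UPDATE_THRESHOLD` and the `USE_OPENGL` build flag are hypothetical names made up for this example, the `GL_DYNAMIC_DRAW` usage hint and the Linux-style header setup are assumptions, and the 32k figure is only what was measured on one 6800. The `src` pointer stands in for the chunk of scratch buffer that was handed out when the region was locked.

```c
#include <stddef.h>
#include <string.h>

/* Threshold from the post: measured on a single GeForce 6800, so treat it
   as a tunable guess rather than a constant that holds for every card. */
#define SMALL_UPDATE_THRESHOLD ((size_t)32 * 1024)

typedef enum { UPLOAD_SUB_DATA, UPLOAD_MAP_BUFFER } UploadPath;

/* Pick the update path purely from the size of the locked region:
   below the threshold use glBufferSubData, otherwise glMapBuffer. */
UploadPath choose_upload_path(size_t locked_bytes)
{
    return locked_bytes < SMALL_UPDATE_THRESHOLD ? UPLOAD_SUB_DATA
                                                 : UPLOAD_MAP_BUFFER;
}

#ifdef USE_OPENGL /* hypothetical flag; needs a current GL 1.5+ context */
#define GL_GLEXT_PROTOTYPES /* assumption: Mesa-style headers on Linux */
#include <GL/gl.h>
#include <GL/glext.h>

/* Copy locked_bytes from src into the buffer at offset.
   Returns 1 on success, 0 if mapping or unmapping failed. */
int upload_region(GLuint vbo, size_t buffer_bytes, size_t offset,
                  size_t locked_bytes, const void *src, int discard)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    if (choose_upload_path(locked_bytes) == UPLOAD_SUB_DATA) {
        /* Small update: hand the scratch data straight to the driver. */
        glBufferSubData(GL_ARRAY_BUFFER, (GLintptr)offset,
                        (GLsizeiptr)locked_bytes, src);
        return 1;
    }

    if (discard) {
        /* Discard the old contents by respecifying the store with NULL.
           Only valid when the caller rewrites everything it still needs. */
        glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)buffer_bytes, NULL,
                     GL_DYNAMIC_DRAW);
    }

    void *dst = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
    if (dst == NULL)
        return 0;
    memcpy((char *)dst + offset, src, locked_bytes);

    /* glUnmapBuffer returns GL_FALSE if the data store was corrupted. */
    return glUnmapBuffer(GL_ARRAY_BUFFER) == GL_TRUE;
}
#endif
```

Keeping the size decision in its own function means the threshold can be changed, or later made per-card, without touching the GL calls.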
It’s still a bit odd. I personally would have thought that using glMapBuffer in write-only mode, having previously discarded the buffer by calling glBufferData(…, NULL), would be directly equivalent to using glBufferSubData. The usage mode is surely the same, i.e. you don’t want to read the current contents of the buffer and are happy to discard them, which frees the buffer from needing to be synced. But clearly not, which is most strange.
I’ve a little doubt at the back of my mind that this 32k threshold is rather unscientific, and I’m wondering if it might vary per card; I’m hoping to get some guidance from the guru on that. But it’s good that I’ve found a route which seems to perform better in all cases now.
Big thanks go to Kevin at nVidia for taking the time to read my blog and get some info from their driver team. It was much appreciated.