Digging into Boost.Thread

· by Steve · Read in about 3 min · (560 Words)

Before I got any deeper into the background loading code, I thought I’d do a bit more digging on the performance of the approach I was planning to use, to make sure I wasn’t barking up the wrong tree. I was particularly concerned that the major synchronisation primitive Boost.Thread uses is called a mutex, and many of you will know that mutexes are expensive (on Windows anyway) since they involve a transition from user to kernel mode in all cases. However, this is Boost we’re talking about, and I couldn’t believe they would opt for a low-performing option without some way to control it.

A bit of perusing later and I confirmed that Boost.Thread is indeed smarter than that. It actually implements the ‘mutex’ and ‘recursive_mutex’ classes using critical sections on Windows rather than true mutexes. Internally there is an option to use full mutexes, but it would never be used without editing a couple of lines of code and rebuilding. Using critical sections has the advantage of being rather fast: if a critical section is not already locked, no user/kernel mode transition is made. If the critical section is already locked, a kernel mode call is made to make the thread wait - but that’s to be expected, since you want to give up the CPU time and get a single notification when the resource is freed.
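Just to illustrate what I mean (this isn’t my actual loader code, the queue and function names are made up), locking with Boost.Thread looks something like this - and on Windows that boost::mutex is a critical section under the hood, so the uncontended path never leaves user mode:

```cpp
#include <boost/thread/mutex.hpp>
#include <queue>
#include <string>

// Hypothetical shared queue of filenames for the background loader.
std::queue<std::string> g_pendingLoads;
boost::mutex            g_queueMutex;   // a CRITICAL_SECTION on Windows, not a kernel mutex

void queueLoad(const std::string& filename)
{
    // scoped_lock enters the critical section; if nobody else holds it,
    // this is just a few user-mode instructions - no kernel transition.
    boost::mutex::scoped_lock lock(g_queueMutex);
    g_pendingLoads.push(filename);
    // the lock is released automatically when 'lock' goes out of scope
}
```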

There are even faster alternatives to going through the OS critical section calls, particularly if you want to avoid a kernel mode switch even while a resource is locked. The typical example is the ‘spin lock’: instead of asking the OS to lock, you literally spin the CPU, polling a much simpler atomic flag via a test-and-set implemented in assembly (e.g. ‘lock xchg’ on Intel), which typically takes only a handful of cycles instead of the dozens required for an OS call. Trouble is, this requires per-platform assembler (even within the Intel family), and it’s only faster if the time spent spinning is less than the kernel mode switch would have taken under contention. Spinning is also pretty inappropriate on machines with more threads than CPUs when lock contention is relatively protracted, since it burns CPU time the whole while, which defeats the object of threading somewhat. Given that in our context we want to use threads (even on a single-CPU machine) to wait for file loading, which will certainly take longer than a kernel mode switch (I/O device waits are measured in millennia compared to even kernel waits), I’m thinking manual spin locks aren’t worth the hassle. They’re the fastest option for very, very fine-grained locking, but I don’t think they’re the most appropriate here.
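For the record, here’s roughly what a spin lock looks like. This is a portable modern-C++ sketch using std::atomic_flag rather than the hand-written per-platform assembler I mentioned, but the idea - spin on a test-and-set until it succeeds - is the same:

```cpp
#include <atomic>

// Minimal spin lock sketch. test_and_set compiles down to an atomic
// exchange (e.g. 'lock xchg' on x86), so an uncontended lock/unlock is
// only a handful of cycles - but a contended lock burns CPU while it spins.
class SpinLock
{
public:
    void lock()
    {
        // Keep trying until we are the thread that flipped the flag from
        // clear to set; everyone else spins here, never calling the kernel.
        while (flag_.test_and_set(std::memory_order_acquire))
        {
            // busy-wait: no kernel call, no yielding of the CPU
        }
    }

    void unlock()
    {
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```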

The other disadvantage to using critical sections is that they can only be used to synchronise threads within a single process. If you want to synchronise across processes then you’d have to use a full mutex or semaphore - but I have no need of that, so this is fine.
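For completeness, synchronising across processes on Windows means a real kernel object - the textbook pattern is a named mutex, something like the sketch below (the mutex name here is made up; again, not something I actually need):

```cpp
#include <windows.h>

// Sketch of cross-process locking with a named kernel mutex.
// Any process that creates/opens the same name shares the same lock.
void lockedAcrossProcesses()
{
    HANDLE hMutex = CreateMutexW(NULL, FALSE, L"Global\\MyAppResourceLock");
    if (hMutex == NULL)
        return; // creation failed

    // Always a kernel call, even when uncontended - hence the cost.
    if (WaitForSingleObject(hMutex, INFINITE) == WAIT_OBJECT_0)
    {
        // ... touch the shared resource ...
        ReleaseMutex(hMutex);
    }
    CloseHandle(hMutex);
}
```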

Later in the week I’ll get back to the implementation. It’s a social night tonight, and tomorrow I really have to find time to look at an FBO readback issue that someone’s been waiting on an answer for in the forum. Sigh.