OS X Localisation: incremental genstrings and UTF-8 files

by Steve

I ran into a couple of interesting issues while doing the first pass of writing the user-visible strings for a Cocoa app I’m working on (painfully slowly, as I learn the nuances of the environment), and I thought I’d share them. Full details are after the jump, since I’ve embedded a large script in the post.

The basic principle for text localisation on OS X is that, as on most systems, you externalise your user-visible strings into string tables and reference them by key in code - in this case using NSLocalizedString. Apple provide a tool called ‘genstrings’ which extracts all of these into a template strings file called Localizable.strings, which you can then populate per language. Localised files are kept in folders called en.lproj, fr.lproj etc, and helpfully they’re picked up by default when laid out like this. So far, so good.

There are a couple of practical issues though. Firstly, genstrings always overwrites its output file, which means that using it incrementally to add new strings once you’ve already populated the previous set - which is bound to be the normal case for most developers - isn’t possible out of the box. Luckily I found a nice little Python script which solves this problem by merging the new results into your existing files. I’ve added a custom target with a Run Script step to my Xcode project which uses a modified version of this script (see below) to update my strings files whenever I need to.
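To make the merge idea concrete, here is a minimal sketch of the principle, not the script I actually use. It assumes simple `"key" = "value";` entries, and the names `parse_strings` and `merge_strings` are made up for illustration. The fresh genstrings output is taken as the base, and any key that already has a value in the existing file keeps that value.

```python
import re

# Matches one `"key" = "value";` entry; backslash-escaped characters
# (such as \") are allowed inside either string.
ENTRY_RE = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=\s*"((?:[^"\\]|\\.)*)"\s*;')


def parse_strings(text):
    """Return {key: value} for every entry found in .strings-format text."""
    return {m.group(1): m.group(2) for m in ENTRY_RE.finditer(text)}


def merge_strings(fresh_text, existing_text):
    """Rewrite fresh genstrings output, reusing values that already exist.

    Comments and ordering come from the fresh output, so keys that are no
    longer referenced in code simply drop out of the result.
    """
    existing = parse_strings(existing_text)

    def keep_existing(match):
        key, value = match.group(1), match.group(2)
        # Prefer the value from the existing file; fall back to the template.
        return '"%s" = "%s";' % (key, existing.get(key, value))

    return ENTRY_RE.sub(keep_existing, fresh_text)
```

This sketch works on text that has already been decoded, and it would be confused by a comment that itself contains something shaped like an entry, so treat it as an illustration rather than a drop-in tool.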

The second problem is that genstrings creates UTF-16 encoded files, and there’s no way to alter this. The problem with UTF-16 files is that neither Mercurial nor Git likes them very much; both systems’ text/binary detection will classify them as binary, meaning you lose the ability to diff and merge these files in any useful way. It’s not a deal-breaker, but it’s inconvenient. Couple that with the fact that OS X will quite happily use UTF-8 encoded .strings files directly at run-time (although iPhone will not), and it seemed like something I should resolve. For the convenience of development, I modified the Python script (as shown below) to convert the output of genstrings to UTF-8 via iconv, meaning the files always get picked up as text in source control. If you’re deploying on iPhone, it’s trivial to write a small build script calling iconv again to convert back to UTF-16 for deployment.
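For illustration, the same conversion can be sketched in pure Python rather than by shelling out to iconv as my script does. This assumes the genstrings output starts with a byte-order mark, so that Python’s `utf-16` codec can work out the byte order. The `looks_binary` helper is only a rough stand-in for the sort of NUL-byte check I’m assuming the source control tools perform, and all the function names here are hypothetical.

```python
def utf16_to_utf8(data):
    """Convert UTF-16 bytes (assumed to start with a BOM) to UTF-8 bytes."""
    return data.decode("utf-16").encode("utf-8")


def looks_binary(data):
    """Rough stand-in for a source control heuristic: any NUL byte means binary."""
    return b"\x00" in data


def convert_file(src_path, dst_path):
    """Read a UTF-16 .strings file and write a UTF-8 copy of it."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(utf16_to_utf8(data))
```

Since ASCII characters in UTF-16 are each padded with a zero byte, a check like this flags almost any UTF-16 strings file, while the UTF-8 version passes as text. Going back the other way for an iPhone build is the mirror image: decode as UTF-8 and encode as UTF-16.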

Hopefully this is useful to someone else! I’m still very much learning on the Mac development side, so if there’s something I haven’t considered, please let me know.