OS X Localisation: incremental genstrings and UTF-8 files
May 18th, 2010 · Development, OS X
I came across a couple of interesting issues while doing the first pass of writing the user-visible strings for a Cocoa app I’m writing (painfully slowly, as I learn the nuances of the environment), and I thought I’d share them. The full details follow, including a fairly large script.
The basic principle for text localisation on OS X is that, like most systems, you externalise your user-visible strings in string tables and reference them by keyed aliases in code – in this case using NSLocalizedString. Apple provide a tool called ‘genstrings’ which extracts all these into a template strings file called Localizable.strings which you can then populate per language – localised files are kept in folders called en.lproj, fr.lproj etc and helpfully they’re picked up by default like this. So far, so good.
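As a rough illustration of the keyed lookup (a sketch of the idea only, not how Cocoa implements it), the string tables can be modelled as plain dictionaries, one per .lproj folder. The table contents and the `localized_string` helper are made up for this example; the one behaviour it borrows from NSLocalizedString is that a key with no translation comes back unchanged.

```python
# Hypothetical string tables, one per .lproj folder (illustrative data only).
TABLES = {
    'en': {'Open File': 'Open File', 'Quit': 'Quit'},
    'fr': {'Open File': 'Ouvrir le fichier'},
}


def localized_string(key, language, tables=TABLES):
    """Return the translation of `key` for `language`, or the key itself if missing."""
    return tables.get(language, {}).get(key, key)


print(localized_string('Open File', 'fr'))  # translated entry
print(localized_string('Quit', 'fr'))       # no French entry, so the key is returned
```

Falling back to the key is what makes a half-translated table usable during development: untranslated strings simply show up in the source language.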
There are a couple of practical issues though. Firstly, genstrings always overwrites its output file, so you can’t use it incrementally to add new strings once you’ve already populated the previous set, which is bound to be the normal case for most developers. Luckily I found a nice little Python script which solves this problem by merging the newly generated strings into your existing files. I’ve added a custom target with a Run Script step to my Xcode project which uses a modified version of this script (see below) to update my strings files whenever I need to.
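The merge itself boils down to a simple rule, sketched here on plain dictionaries rather than real .strings files. `merge_strings` is a hypothetical helper for illustration: it keeps the existing translation for every key genstrings still emits, takes new keys as generated, and drops keys that no longer appear in the source.

```python
def merge_strings(existing, generated):
    """Merge freshly generated entries with already translated ones.

    `existing` maps key -> translated value (the populated file);
    `generated` maps key -> default value (fresh genstrings output).
    """
    merged = {}
    for key, default_value in generated.items():
        # Prefer the translation we already have; otherwise use the new default.
        merged[key] = existing.get(key, default_value)
    return merged


existing = {'Open File': 'Ouvrir le fichier', 'Old Key': 'Ancienne clé'}
generated = {'Open File': 'Open File', 'Save': 'Save'}
merged = merge_strings(existing, generated)
print(merged)
```

Driving the loop from the generated set rather than the existing one is the design choice that matters: it is what lets stale keys fall away on their own.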
The second problem is that genstrings creates UTF-16 encoded files, and there’s no way to alter this. The trouble with UTF-16 is that neither Mercurial nor Git likes it very much; both systems’ text/binary detection will classify these files as binary, meaning you lose the ability to diff and merge them in any useful way. It’s not a deal-breaker, but it’s inconvenient. Couple that with the fact that OS X will quite happily use UTF-8 encoded .strings files directly at run-time (although iPhone will not), and it seemed like something I should resolve. For the convenience of development, I modified the Python script (as shown below) to convert the output of genstrings to UTF-8 via iconv, so the files always get picked up as text in source control. If you’re deploying on iPhone, it’s trivial to write a small build script that calls iconv again to convert back to UTF-16 for deployment.
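A small Python sketch of both halves of this, under two assumptions. First, that the binary classification comes from NUL bytes: ASCII text encoded as UTF-16 has one in every character, and both tools are commonly described as treating such content as binary, though that detail is not something verified here. Second, that a pure-Python re-encode is an acceptable stand-in for iconv; `convert_encoding` is a hypothetical helper, not part of the script.

```python
import io


def convert_encoding(src, dst, from_enc, to_enc):
    """Re-encode a text file, roughly what `iconv -f FROM -t TO src > dst` does."""
    with io.open(src, 'r', encoding=from_enc, newline='') as f:
        text = f.read()
    with io.open(dst, 'w', encoding=to_enc, newline='') as f:
        f.write(text)


sample = u'"Quit" = "Quitter";\n'
utf16 = sample.encode('utf-16')
utf8 = sample.encode('utf-8')

print(b'\x00' in utf16)  # True: every ASCII character carries a NUL byte
print(b'\x00' in utf8)   # False: plain ASCII bytes only
```

For an iPhone build step, something like `convert_encoding('fr.lproj/Localizable.strings', out_path, 'utf-8', 'utf-16')` would produce the UTF-16 copy, where `out_path` stands for wherever your build places the deployed file.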
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Localize.py - Incremental localization on XCode projects
# João Moreno 2009
# http://joaomoreno.com/
# Modified by Steve Streeting 2010 http://www.stevestreeting.com
# Changes
# - Use .strings files encoded as UTF-8
#   This is useful because Mercurial and Git treat UTF-16 as binary and can't
#   diff/merge them. For use on iPhone you can run an iconv script during build to
#   convert back to UTF-16 (Mac OS X will happily use UTF-8 .strings files).
# - Clean up .old and .new files once we're done

from sys import argv
from codecs import open  # shadows the built-in open() so files can be opened with an encoding
from re import compile
from copy import copy
import os

# A translation line: "key" = "value";
re_translation = compile(r'^"(.+)" = "(.+)";$')
# A comment that opens and closes on a single line: /* ... */
re_comment_single = compile(r'^/\*.*\*/$')
# First and last lines of a multi-line comment
re_comment_start = compile(r'^/\*.*$')
re_comment_end = compile(r'^.*\*/$')
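
class LocalizedString():
    """One entry: its leading comment lines plus the '"key" = "value";' line."""
    def __init__(self, comments, translation):
        self.comments, self.translation = comments, translation
        self.key, self.value = re_translation.match(self.translation).groups()

    def __unicode__(self):
        # Comment block, translation line, then a blank line between entries
        return u'%s%s\n' % (u''.join(self.comments), self.translation)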

class LocalizedFile():
    """An ordered collection of LocalizedString entries from one .strings file."""
    def __init__(self, fname=None, auto_read=False):
        self.fname = fname
        self.strings = []    # entries in file order
        self.strings_d = {}  # the same entries, indexed by key

        if auto_read:
            self.read_from_file(fname)

    def read_from_file(self, fname=None):
        fname = self.fname if fname is None else fname
        try:
            f = open(fname, encoding='utf_8', mode='r')
        except IOError:
            print('File %s does not exist.' % fname)
            exit(-1)

        line = f.readline()
        while line:
            # Collect the comment block, which may span several lines
            comments = [line]
            if not re_comment_single.match(line):
                while line and not re_comment_end.match(line):
                    line = f.readline()
                    comments.append(line)

            # The line after the comment must be the translation itself
            line = f.readline()
            if line and re_translation.match(line):
                translation = line
            else:
                raise Exception('invalid file')

            # Skip the blank lines between entries
            line = f.readline()
            while line and line == u'\n':
                line = f.readline()

            string = LocalizedString(comments, translation)
            self.strings.append(string)
            self.strings_d[string.key] = string

        f.close()

    def save_to_file(self, fname=None):
        fname = self.fname if fname is None else fname
        try:
            f = open(fname, encoding='utf_8', mode='w')
        except IOError:
            print('Couldn\'t open file %s.' % fname)
            exit(-1)

        for string in self.strings:
            f.write(string.__unicode__())

        f.close()

    def merge_with(self, new):
        """Return a new file holding the keys of `new`, reusing our translations."""
        merged = LocalizedFile()

        for string in new.strings:
            if string.key in self.strings_d:
                # Keep the existing translation but take the up-to-date comments
                new_string = copy(self.strings_d[string.key])
                new_string.comments = string.comments
                string = new_string

            merged.strings.append(string)
            merged.strings_d[string.key] = string

        return merged

def merge(merged_fname, old_fname, new_fname):
    try:
        old = LocalizedFile(old_fname, auto_read=True)
        new = LocalizedFile(new_fname, auto_read=True)
        merged = old.merge_with(new)
        merged.save_to_file(merged_fname)
    except Exception:
        print('Error: input files have invalid format.')

STRINGS_FILE = 'Localizable.strings'

def localize(path):
    # Every *.lproj directory in the project root is treated as a language
    languages = [name for name in os.listdir(path) if name.endswith('.lproj') and os.path.isdir(name)]

    for language in languages:
        original = merged = language + os.path.sep + STRINGS_FILE
        old = original + '.old'
        new = original + '.new'

        if os.path.isfile(original):
            # Existing translations: set them aside, regenerate, convert, then merge
            os.rename(original, old)
            os.system('genstrings -q -o "%s" `find . -name "*.m"`' % language)
            os.system('iconv -f UTF-16 -t UTF-8 "%s" > "%s"' % (original, new))
            merge(merged, old, new)
        else:
            # First run: generate the file and convert it to UTF-8
            os.system('genstrings -q -o "%s" `find . -name "*.m"`' % language)
            os.rename(original, old)
            os.system('iconv -f UTF-16 -t UTF-8 "%s" > "%s"' % (old, original))

        # Clean up the temporary files
        if os.path.isfile(old):
            os.remove(old)
        if os.path.isfile(new):
            os.remove(new)

if __name__ == '__main__':
    localize(os.getcwd())
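To see what the parser expects as input, the sketch below runs two of the script’s regular expressions over a small sample in the comment-then-entry layout that genstrings writes. The sample text and the `parse_entries` helper are made up for illustration, and unlike the script it only handles single-line comments.

```python
import re

# Same patterns as in the script above
re_translation = re.compile(r'^"(.+)" = "(.+)";$')
re_comment_single = re.compile(r'^/\*.*\*/$')

# Illustrative sample only; real files come from genstrings
sample = (
    '/* Title of the quit menu item */\n'
    '"Quit" = "Quitter";\n'
    '\n'
    '/* No comment provided by engineer. */\n'
    '"Open File" = "Ouvrir le fichier";\n'
)


def parse_entries(text):
    """Return (key, value, comment) tuples for single-line-comment entries."""
    entries = []
    comment = None
    for line in text.splitlines():
        if re_comment_single.match(line):
            comment = line
            continue
        match = re_translation.match(line)
        if match:
            entries.append((match.group(1), match.group(2), comment))
            comment = None
    return entries


for key, value, comment in parse_entries(sample):
    print('%s -> %s' % (key, value))
```

Note that `re_translation` requires a non-empty value, so an entry whose translation has been blanked out will not match.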
Hopefully this is useful to someone else! I’m still very much learning on the Mac development side, so if there’s something I haven’t considered, please let me know.








