Translation Nightmare

I just got a new bug titled "Very weird translation template, need comments in .pot file to clarify", and giggled to myself. I was wondering how long it would take for this bug to be filed. The problem is that most of the translatable strings in Tasks are pretty boring ("Tasks", "today", "Priority" and so on), and then all of a sudden the template goes a bit mental:

"^(?<task>.+) (?:by|due|on)? (?<month>\\w+) (?<day>\\d{1,2})(?:st|nd|rd|th)?$"

Apparently the average translator doesn't think that learning PCRE-style regular expressions, and reading the source that uses this string to understand how it is to be used, is appropriate. [note: this is sarcasm]

Maybe I should have added some translator comments to clarify exactly what I meant by this. These monster strings (all in koto-date-parser.c) are GRegex regular expressions which are used to parse the user's input to try and extract meaningful date information. To translate these strings you'll need to have a basic understanding of regular expressions: if you don't then skip them and hopefully someone who does will finish the translation. If you know regular expressions then translating these strings is easy, honest.

The golden rule is to never translate the names inside the markers which look like this: "(?<foo>". These markers identify portions of the input (such as task or month) and need to remain in English, although they can be moved around if required. The rest of the strings are translatable. I'll give an example using the French translation by Stéphane Raimbault. First, the string in English and a worked example:

"^(?<task>.+) (?:by|due|on)? (?<day>\\d{1,2})(?:st|nd|rd|th)? (?<month>\\w+)$"

First, we have a sequence of any characters identified as task, which magically expands to be as long as possible. After a space, this is optionally followed by one of the words "by", "due" or "on". After another space come one or two digits identified as day, optionally followed by "st", "nd", "rd" or "th". Finally, after one more space, comes a sequence of word characters which is identified as month. If the user had entered "pay bills on 2nd june" then task would be "pay bills", day would be "2", and month would be "june". Tasks can then turn "june" into a month number through other translations, and it now knows what date the user entered. In French, this translates as follows:

"^(?<task>.+) (?:pour|prévu|pour le)? (?<day>\\d{1,2})(?:er|e)? (?<month>\\w+)$"

See, I said it was easy! All I need now is a legion of translators who understand regular expressions enough to correctly translate the new Tasks... [this, again, is sarcasm] Luckily, plans are afoot to move the Tasks source to the GNOME Subversion server, so the full fury of the GNOME translation team can attack this.
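
For anyone who wants to check a translated pattern before submitting it, here is a minimal sketch in Python rather than the C and GRegex used by Tasks. It is not taken from koto-date-parser.c. The helper names (to_python_syntax, parse) are made up for this illustration, and the conversion step exists only because Python's re module writes named groups as (?P<name>...) instead of the (?<name>...) form that GRegex accepts. The French sample sentence is likewise only an assumed example input.

```python
import re

# Patterns as quoted in the post, written as raw strings so the C-style
# doubled backslashes are not needed.
ENGLISH_DAY_FIRST = r"^(?<task>.+) (?:by|due|on)? (?<day>\d{1,2})(?:st|nd|rd|th)? (?<month>\w+)$"
ENGLISH_MONTH_FIRST = r"^(?<task>.+) (?:by|due|on)? (?<month>\w+) (?<day>\d{1,2})(?:st|nd|rd|th)?$"
FRENCH_DAY_FIRST = r"^(?<task>.+) (?:pour|prévu|pour le)? (?<day>\d{1,2})(?:er|e)? (?<month>\w+)$"


def to_python_syntax(pattern):
    """Rewrite (?<name>...) groups as (?P<name>...) for Python's re module.

    Hypothetical helper for this sketch only; lookbehinds such as (?<=...)
    are left alone because they have no name followed by '>'.
    """
    return re.sub(r"\(\?<(\w+)>", r"(?P<\1>", pattern)


def parse(pattern, text):
    """Return the named portions of text as a dict, or None if no match."""
    match = re.match(to_python_syntax(pattern), text)
    if match is None:
        return None
    return match.groupdict()


# The worked example from the post.
print(parse(ENGLISH_DAY_FIRST, "pay bills on 2nd june"))
# {'task': 'pay bills', 'day': '2', 'month': 'june'}

# The markers can be moved around: month before day.
print(parse(ENGLISH_MONTH_FIRST, "pay bills by june 2nd"))
# {'task': 'pay bills', 'month': 'june', 'day': '2'}

# The French translation, tried on an assumed sample sentence.
print(parse(FRENCH_DAY_FIRST, "payer les factures pour le 2 juin"))
# {'task': 'payer les factures', 'day': '2', 'month': 'juin'}
```

Because the code only ever looks up the names task, day and month, a translation is free to reorder the groups and change the surrounding words, which is why those names have to stay in English.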

NP: Trailer Park, Beth Orton

21:17 Wednesday, 01 Oct 2008 [#] [computers] (9 comments)

Posted by ulrik at Wed Oct 1 21:32:53 2008:
No, it's not that easy. The translator-filed bug is very honest and simple and perfectly valid.

What you haven't considered is that most translations will need to change the structure of the regexp: What about different word order or date formats? You need good language + regexp skills!

What about a comment like
/* This is a pattern matching expression (regexp) to identify stanzas like 'due 3rd december' or 'by 14th march'. Please email email@example.com or ask at [link] for help in writing a regexp for your language */
Posted by Tomasz Dominikowski at Wed Oct 1 21:50:29 2008:
The average translator is not a programmer and should not be expected to be one. I've filed that bug precisely because of that.

"Apparently the average translator doesn't think that learning PCRE-style regular expressions, and reading the source that uses this string to understand how it is to be used, is appropriate."

How am I supposed to take that? That sounds condescending and borderline mocking.

It's great that you plan to move the project to GNOME SVN, but the thing is, I'm already there and out of 4 people working on the Polish team (yes, there's only 4 of us), thankfully one is a programmer, so maybe he'll be able to figure this out. As for me, I don't care anymore.
Posted by Ross at Wed Oct 1 22:09:43 2008:
Man, sarcasm gets totally stripped over the internet, doesn't it?

Ulrik: yes, I know it's not simple. If someone comes up with a code change which means I don't have to ask translators to translate regular expressions then I'll happily apply it, but I can't think of how it would work for exactly the reason you give: the order of the regular expression needs to be changed.

Tomasz: That wasn't intended to be condescending or mocking, it was sarcasm. I understand your concerns, which is why I wrote a long blog post and doubled the number of translator comments this evening instead of socialising with my wife. I know that the majority of translators may have at best a basic understanding of C and few can understand the regular expressions here. Sadly, I have regular expressions which need to be translated.

Please don't stop translating the rest of the strings in Tasks: your work in the past has been wonderfully prompt and efficient (which is why I got you an account on our svn server) and is much appreciated.

Again, this post was intended to be sarcastic and not insulting -- what I'm asking translators to do is way beyond what is usual and I know this.
Posted by Tomasz Dominikowski at Wed Oct 1 22:21:19 2008:
Yes, sarcasm indeed does get stripped over the Internet, that's why we invented emoticons :] I don't know you, Ross. Many FOSS developers are unfortunately sociopaths with Asperger's or plain autism. It's sad that I assume that everyone is, but it's a safe bet (again, unfortunately). From now on I'll know that you're just being sarcastic. I love being sarcastic myself, but it doesn't work that well over the Internet, does it :]
Posted by ulrik at Wed Oct 1 23:56:38 2008:
I was also reading the post literally I must confess.

I also appreciate how I read Tomasz's comments about OSS people being sociopaths without thinking they were sarcastic, then realized -- wait, they should be sarcastic -- then again thought they're probably not.
Posted by iain at Thu Oct 2 00:52:35 2008:
but ross is a sociopath.
I bet he wishes I used my blog so he could write insulting comments on it too. but I don't, so HA to you.
Posted by Erik at Thu Oct 2 07:10:38 2008:
Shouldn't this be available in the language tools? Can't you do this dynamically and use strings to translate like this:

translate
"Do this task on 17th of may"
"... on 1st of may"
"here are all the strings that can end a date e.g. 1st 2nd 3rd 4th 5th etc..."
Posted by Ross at Thu Oct 2 07:36:04 2008:
Erik: the first version did that.  It totally failed to work in Finnish and Polish.  If you can come up with a basic implementation which works when translated to Finnish and Polish then please do. :)
Posted by elcuco at Thu Oct 2 18:47:06 2008:
just for those who are serious:

Please don't do this. Not all languages work the same, and in some languages the order of the sentence will be completely different. Imagine that in some language the "by" should come before the "task", or other very weird combinations.

You cannot translate this, and not because of the regex (which BTW will make you want to put a nail into your eye if you mix it with Hebrew, for example). The linguistics are very different for Hebrew (in my example).
