Translation Nightmare
I just got a new bug titled "Very weird translation template, need comments in .pot file to clarify", and giggled to myself. I was wondering how long it would take for this bug to be filed. The problem is that whilst most of the translatable strings in Tasks are pretty boring ("Tasks", "today", "Priority" and so on), all of a sudden the template goes a bit mental:
"^(?<task>.+) (?:by|due|on)? (?<month>\\w+) (?<day>\\d{1,2})(?:st|nd|rd|th)?$"
Apparently the average translator doesn't think that learning PCRE-style regular expressions, and reading the source that uses this string to understand how it is to be used, is appropriate. [note: this is sarcasm]
Maybe I should have added some translator comments to clarify exactly what I meant by this. These monster strings (all in koto-date-parser.c) are GRegex regular expressions which are used to parse the user's input to try and extract meaningful date information. To translate these strings you'll need to have a basic understanding of regular expressions: if you don't then skip them and hopefully someone who does will finish the translation. If you know regular expressions then translating these strings is easy, honest.
The golden rule is to never translate the words which look like this: (?<foo>. These are markers which identify portions of the input (such as task or month) and need to remain in English, although they can be moved around if required. The rest of the strings are translatable. I'll give an example using the French translation by Stéphane Raimbault. First, the string in English and a worked example:
"^(?<task>.+) (?:by|due|on)? (?<day>\\d{1,2})(?:st|nd|rd|th)? (?<month>\\w+)$"
First, we have a sequence of any characters identified as task, which magically expands to be as many as possible. This is optionally followed by one of the words "by", "due" or "on". This is followed by one or two digits identified as day, optionally followed by "st", "nd", "rd" or "th". Finally there is a sequence of word characters which is identified as month. If the user had entered "pay bills on 2nd june" then task would be "pay bills", day would be "2", and month would be "june". Tasks can then turn "june" into a month number through other translations, and it now knows what date the user entered. In French, this translates as follows:
"^(?<task>.+) (?:pour|prévu|pour le)? (?<day>\\d{1,2})(?:er|e)? (?<month>\\w+)$"
See, I said it was easy! All I need now is a legion of translators who understand regular expressions enough to correctly translate the new Tasks... [this, again, is sarcasm] Luckily, plans are afoot to move the Tasks source to the GNOME Subversion server, so the full fury of the GNOME translation team can attack this.
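The English and French patterns above can be tried out with a minimal sketch in Python rather than the C that Tasks actually uses. Python spells named groups `(?P<name>...)` where GRegex accepts `(?<name>...)`, so the patterns are adapted accordingly. The helper name `parse_task` and the French sample sentence are invented for illustration and are not taken from koto-date-parser.c.

```python
import re

# English and French patterns from the post, adapted to Python's
# (?P<name>...) named-group syntax. GRegex accepts (?<name>...).
PATTERNS = {
    "en": r"^(?P<task>.+) (?:by|due|on)? (?P<day>\d{1,2})(?:st|nd|rd|th)? (?P<month>\w+)$",
    "fr": r"^(?P<task>.+) (?:pour|prévu|pour le)? (?P<day>\d{1,2})(?:er|e)? (?P<month>\w+)$",
}

def parse_task(text, lang):
    """Hypothetical helper: return the named groups as a dict, or None."""
    match = re.match(PATTERNS[lang], text)
    return match.groupdict() if match else None

print(parse_task("pay bills on 2nd june", "en"))
print(parse_task("payer les factures pour le 2 juin", "fr"))
```

The calling code only ever asks for groups by name, which is why the markers have to survive translation untouched while everything around them is free to change.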
NP: Trailer Park, Beth Orton
What you haven't considered is that most translations will need to change the structure of the regexp: What about different word order or date formats? You need good language + regexp skills!
What about a comment like
/* This is a pattern matching expression (regexp) to identify stanzas like 'due 3rd december' or 'by 14th march'. Please email email@example.com or ask at [link] for help in writing a regexp for your language */
"Apparently the average translator doesn't think that learning PCRE-style regular expressions, and reading the source that uses this string to understand how it is to be used, is appropriate."
How am I supposed to take that? That sounds condescending and borderline mocking.
It's great that you plan to move the project to GNOME SVN, but the thing is, I'm already there and out of 4 people working on the Polish team (yes, there's only 4 of us), thankfully one is a programmer, so maybe he'll be able to figure this out. As for me, I don't care anymore.
Ulrik: yes I know it's not simple. If someone comes up with a code change which means I don't have to ask translators to translate regular expressions then I'll happily apply it, but I can't think of how it would work for exactly the reason you give: the order of the regular expression needs to be changed.
Tomasz: That wasn't intended to be condescending or mocking, it was sarcasm. I understand your concerns, which is why I wrote a long blog post and doubled the number of translator comments this evening instead of socialising with my wife. I know that the majority of translators may have at best a basic understanding of C and few can understand the regular expressions here. Sadly, I have regular expressions which need to be translated.
Please don't stop translating the rest of the strings in tasks: your work in the past has been wonderfully prompt and efficient (which is why I got you an account on our svn server) and is much appreciated.
Again, this post was intended to be sarcastic and not insulting -- what I'm asking translators to do is way beyond what is usual and I know this.
I also appreciate how I read Tomasz's comments about OSS people being sociopaths without thinking they were sarcastic, then realizing -- wait, they should be sarcastic -- then again thinking they're probably not.
I bet he wishes I used my blog so he could write insulting comments on it too. But I don't, so HA to you.
translate
"Do this task on 17th of may"
"... on 1st of may"
"here are all the strings that can end a date e.g. 1st 2nd 3rd 4th 5th etc..."
Please don't do this. Not all languages work the same, and in some languages the order of the sentence will be completely different. Imagine that in some language the "by" should come before the "task", or various other weird combinations.
You cannot translate this, and not because of the regex (which, BTW, will make you want to put a nail into your eye if you mix it with Hebrew, for example). The linguistics are very different for Hebrew (in my example).
Another thing you have to consider is what to do if the translated version of the regular expression is broken. Will you get your variables out of it? Will your application crash? Do you want a translation to be able to do this? It is not clear what is best for your application, but you could even consider maintaining different versions in code for languages as people are able to contribute them.
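One possible answer to the broken-translation question is to check each translated pattern when it is loaded and fall back to the untranslated one if it fails. The sketch below is in Python and is only an illustration of that idea, not what Tasks does: the names `REQUIRED_GROUPS` and `compile_translated` are made up, and the pattern uses Python's `(?P<name>...)` spelling of named groups.

```python
import re

# The markers the calling code would look up by name (an assumption
# based on the day-first pattern shown in the post).
REQUIRED_GROUPS = {"task", "day", "month"}

ENGLISH = r"^(?P<task>.+) (?:by|due|on)? (?P<day>\d{1,2})(?:st|nd|rd|th)? (?P<month>\w+)$"

def compile_translated(translated, fallback=ENGLISH):
    """Return a compiled pattern, falling back if the translation is unusable."""
    try:
        pattern = re.compile(translated)
    except re.error:
        # The translation is not a valid regular expression at all.
        return re.compile(fallback)
    if not REQUIRED_GROUPS.issubset(pattern.groupindex):
        # A marker was renamed or dropped during translation.
        return re.compile(fallback)
    return pattern
```

With a check like this, a bad translation costs the user localised date parsing but cannot crash the application or leave a marker missing.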
Although I'm not sure what you want to do with the variables you extract with this regular expression, I'll still mostly encourage you to find a much simpler approach. What do I do if my "translation" of "(?:by|due|on)?" does not occur in one place, or the word order depends on these, or multiple word orders are possible? Does \\d support localised digits, such as those used by Arabic and Indian languages? How does the application handle it if the month names are not completely translated?
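Whether `\d` matches localised digits depends on the regular expression engine and the options it is compiled with, and nothing here says how GRegex is configured in Tasks. Purely as an illustration of what the question means, this is how Python's `re` module behaves; it should not be read as a statement about GRegex or PCRE.

```python
import re

ARABIC_INDIC_17 = "\u0661\u0667"  # the digits one and seven in Arabic-Indic script

# Python's default for str patterns: \d matches any Unicode decimal digit.
unicode_match = re.fullmatch(r"\d{1,2}", ARABIC_INDIC_17)

# With re.ASCII, \d is restricted to 0-9, so the same input is rejected.
ascii_match = re.fullmatch(r"\d{1,2}", ARABIC_INDIC_17, flags=re.ASCII)

# int() accepts Unicode decimal digits, so the captured day converts directly.
day = int(unicode_match.group(0)) if unicode_match else None
print(day, ascii_match)
```

The same pattern accepts or rejects the input depending on a single flag, so the answer for Tasks would have to be checked against the flags it actually passes to GRegex.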
Will it create usability problems if you encourage the user to use a format by specifying YYYY-MM-DD or similar? (Such a string is reasonably easy to translate.) Some programs ask translators to specify date formats to use and expect. Some of this you can even get directly from the locale.
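As a rough sketch of the format-string approach, assuming Python and a Unix-like system: the date format is either supplied as a plain string (the kind a translator could provide) or read from the locale via `locale.nl_langinfo(locale.D_FMT)`. The function name `parse_locale_date` is hypothetical.

```python
import locale
from datetime import datetime

def parse_locale_date(text, fmt=None):
    """Parse text with the given strptime format, or the locale's D_FMT."""
    if fmt is None:
        # nl_langinfo is only available on Unix-like platforms.
        fmt = locale.nl_langinfo(locale.D_FMT)
    return datetime.strptime(text, fmt).date()

# An explicit ISO-style format, the kind a translator could supply as a string.
print(parse_locale_date("2007-06-02", "%Y-%m-%d"))
```

A real program would call `locale.setlocale(locale.LC_TIME, "")` first so that the user's locale is in effect, and would need to cope with locale formats that `strptime` cannot parse.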
Good luck on finding a good solution!