On Sun, Nov 07, 2004 at 12:41:32AM +0000, Yves Rutschle wrote:
On Sun, Nov 07, 2004 at 12:37:57AM +0100, Martin Quinson wrote:
> But later, when I reviewed the code I discovered that the way it did split
> the sentences makes it very hard to use. "I <b>like</b> it" is split into 3
> msgids which have to be translated separately ("I", "like" and "it").
That's not actually the reason I found it useless as is: the
current CVS version slices paragraphs randomly (well, on 512-byte
boundaries or something like that), which means that
irrelevant changes in formatting anywhere in the file
fuzzify the entire file.
Ouch. Even worse than expected.
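To make the fuzzying effect concrete, here is a minimal Python sketch. It is not the Html.pm code: the 512-character chunk size only echoes the "512 bytes or something like that" figure above, and the function names and sample text are made up for illustration. It compares fixed-size slicing with paragraph-level slicing after one irrelevant formatting change (a single extra space at the top of the file).

```python
def fixed_chunks(text, size=512):
    """Cut text every `size` characters, ignoring its structure (assumed size)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def changed_ratio(old, new):
    """Fraction of the new chunks that do not appear among the old ones."""
    old_set = set(old)
    return sum(1 for chunk in new if chunk not in old_set) / len(new)


# Made-up sample document: 20 paragraphs separated by blank lines.
paragraphs = [f"Paragraph {n}: " + "lorem ipsum " * 20 for n in range(20)]
original = "\n\n".join(paragraphs)

# An irrelevant formatting change near the top: one extra leading space.
edited = " " + original

fixed_ratio = changed_ratio(fixed_chunks(original), fixed_chunks(edited))
para_ratio = changed_ratio(original.split("\n\n"), edited.split("\n\n"))

print(f"fixed-size chunks changed: {fixed_ratio:.0%}")
print(f"paragraph chunks changed:  {para_ratio:.0%}")
```

With fixed-size slicing every chunk after the edit shifts by one character, so every msgid changes. With paragraph-level slicing only the paragraph that was touched changes.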
> See http://po4a.alioth.debian.org/en/po4a.7.html#Why_not_to_split_on_ and
> http://graal.ens-lyon.fr/~mquinson/l10n.html#l2.2 for my point on this. I
> should have explained my point to Laurent before...
Yes, I have actually run into a couple of those problems
myself. While the splitting in 3 as in your above example
is, indeed, a bit confusing, I don't find that makes it
useless, and more to the point, I just don't think there is
any other good solution: the bottom line is, you want to
specify that something in that sentence is important, which
will need to be in a different msgid.
OK. I wanted to give this message the reply it deserves (with a long
argument to back up my point), but I lack the time to do so. I'll be
short. Check the URLs given above for more details.
The solution, I find, is to have the translator understand
the structure of the original text so she'll know to
translate:
"It's a <b>blue</b> car" => ("It's a", "blue", "car")
into:
"C'est une voiture <b>bleue</b>" => ("C'est une voiture", "bleue", " ")
And now, add this English sentence to your system: "it's a
<b>blue</b> horse".
You then have the following translations (one per line):
it's a -> "c'est un" or "c'est une" depending on the context, since horse is
          masculine in French
blue -> "bleu" or "bleue" (same issue)
car -> voiture
horse -> cheval
And now, add "it's a <b>small</b> car". This time, the issue is that in
French this adjective is placed before the noun, whereas the translation of
"blue" is placed after it.
How will you implement these context-dependent translations and the
reordering of sentence elements? My point is that splitting sentences is
*never* a viable solution.
If you think that such issues are rare and easy to deal with, type
man Locale::Maketext::TPJ13 in a terminal ;)
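Here is the same argument as a small Python sketch. The two catalogs and the function name are hypothetical, made up for this mail and not taken from any po4a module. The first catalog holds one entry per fragment, the second one entry per whole sentence with its tags.

```python
# Hypothetical fragment-level catalog: one translation per English fragment.
# Each entry was chosen while translating "It's a blue car".
fragments = {
    "It's a": "C'est une",  # feminine article, picked for "voiture"
    "blue": "bleue",        # feminine form, picked for "voiture"
    "small": "petite",
    "car": "voiture",
    "horse": "cheval",
}


def translate_by_fragments(parts):
    """Translate each fragment on its own, keeping the English word order."""
    return " ".join(fragments[part] for part in parts)


# Hypothetical sentence-level catalog: the translator sees the whole
# sentence, inline tags included, and is free to reorder and agree words.
sentences = {
    "It's a <b>blue</b> car": "C'est une voiture <b>bleue</b>",
    "It's a <b>blue</b> horse": "C'est un cheval <b>bleu</b>",
    "It's a <b>small</b> car": "C'est une <b>petite</b> voiture",
}

print(translate_by_fragments(["It's a", "blue", "horse"]))
print(sentences["It's a <b>blue</b> horse"])
```

The fragment catalog gives "C'est une bleue cheval", which has both the wrong gender and the wrong word order. A fragment catalog has only one slot per fragment, so it cannot hold "un" and "une" at once, and it has no way to say that "petite" goes before the noun while "bleue" goes after it.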
I guess an alternative would be to have a list of "small
formatting tags" (bold, italics, etc.) that do not actually
split at all, and appear in the msgid with the onus on the
translator to know enough HTML to know what to do with them,
so you'd have something like:
msgid "It's a <b>blue</b> car"
msgstr "C'est une voiture <b>bleue</b>"
That would have the advantage of providing the translator
with context information. In fact that goes a long way
towards your point of splitting at paragraph level :-)
That's actually fairly easily achievable: the list of
paragraph-marking tags is fairly small (<p>, <div>,
<h1,2,3,4,...>) and XHTML makes it mandatory for text to be
included in a block-level element of some sort.
That's exactly my point, indeed. You should split the translation on a
paragraph boundary because if you take bigger chunks, gettext and po editors
get clumsy. If you take smaller chunks, you run into endless issues about
context changing the meaning of the chunk.
You thus have to show some formatting tags to the translators. We do so in
all other modules. I don't see any better idea.
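As a rough illustration of paragraph-level splitting, here is a Python sketch built on the standard html.parser module. It is not how Html.pm or the XML module work. The BLOCK_TAGS set is an assumption based on the tags named above, and the class and function names are made up. Block-level tags delimit msgids, and every other tag is kept verbatim inside the msgid.

```python
from html.parser import HTMLParser

# Assumed list of paragraph-marking tags (the thread names <p>, <div>, <h1>...).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6"}


class ParagraphExtractor(HTMLParser):
    """Collect one msgid per block element, keeping inline tags in the text."""

    def __init__(self):
        super().__init__()
        self.msgids = []
        self._buffer = []

    def flush(self):
        # Collapse runs of whitespace and skip chunks that are empty.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.msgids.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        else:
            # Inline tag: keep it as written in the source.
            self._buffer.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        else:
            self._buffer.append(f"</{tag}>")

    def handle_data(self, data):
        self._buffer.append(data)


def extract_msgids(html):
    """Return the list of paragraph-level msgids found in `html`."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up any text left after the last block tag
    return parser.msgids


sample = "<h1>Cars</h1>\n<p>It's a <b>blue</b> car</p>\n<div>It's a <i>small</i>\n car</div>"
for msgid in extract_msgids(sample):
    print(f'msgid "{msgid}"')
```

The sketch ignores nested blocks, attributes worth translating, and quoting of special characters in the PO output. It is only meant to show that inline tags such as <b> stay inside the msgid, so the translator keeps the context.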
> Nowadays, this module should probably be reimplemented using Jordi's great
> work on XML-like formats.
I know next to nothing about XML; last time I saw some, I
thought it looked quite different from HTML. A quick read of
Jordi's module makes me think it's mostly an XML parser:
Html.pm relies on Gisle Aas' HTML parser, and it doesn't
seem to be very beneficial to change parsers just for fun;
It's not for fun, it's because the XML module does work and is designed to
allow the rapid development of other modules (no new code needed), whereas the
existing HTML module does not work.
Moreover, I'd be pleased to cut a dependency. I hate unjustified
dependencies, but that may be personal.
besides, Gisle's parser is supposed to be quite good at
handling broken HTML, which I doubt XML is very good at
(then again, helping bad HTML spread probably isn't good :)
That's a good argument to stick to this parser, then.
Thanks for your interest in po4a,
Mt.