On Thu, 12 Aug 2004, Danilo Piazzalunga wrote:
At 19:10 on Thursday, 12 August 2004, Martin Quinson wrote:
> Hello,
>
> I was admiring the new translation and trying to read it (without really
> speaking italian) when I discovered some encoding issues. It looks like
> UTF8 chars handled by a non utf-ready tool.
> In the HTML you can see things such as:
> "PerchÃ©" instead of "Perché"
Yes, it seems mpod2html assumes that the input document is in ISO-8859
and works with its eyes closed :(
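To illustrate the mechanism, here is a small Python sketch (an illustration only, not code from po4a or mpod2html) that takes the UTF-8 bytes of "Perché" and decodes them as ISO-8859-1, which is effectively what a charset-unaware tool does:

```python
def misdecode(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode the bytes as ISO-8859-1."""
    raw = text.encode("utf-8")       # "é" becomes the two bytes 0xC3 0xA9
    return raw.decode("iso-8859-1")  # each byte is read as its own character


def repair(garbled: str) -> str:
    """Undo the damage: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


garbled = misdecode("Perché")
print(garbled)          # PerchÃ©
print(repair(garbled))  # Perché
```

Every non-ASCII character takes two or more bytes in UTF-8, so each accented letter shows up as two or more Latin-1 characters.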
> After some investigation, it's possible that the corruption comes from the
> po file itself.
[...]
> it's even possible that the po file is clean but my less is broken.
The PO itself is clean. Recode doesn't complain. Try "LANG=C less <file>"
and you will see that the PO is really UTF-8.
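The same kind of check can be approximated in a few lines. The sketch below (Python; the helper names are made up for this example, and it assumes the PO is read as raw bytes) verifies that the bytes are well-formed UTF-8 and reports the charset declared in the PO header's Content-Type line:

```python
import re
from typing import Optional

# The PO header carries e.g. "Content-Type: text/plain; charset=UTF-8\n".
CHARSET_RE = re.compile(r"charset=([A-Za-z0-9_.:-]+)")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes form well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def declared_charset(data: bytes) -> Optional[str]:
    """Return the charset named in the PO header, or None if absent."""
    # Latin-1 maps every byte to a character, so this decode never fails;
    # the "charset=" keyword and its value are plain ASCII.
    match = CHARSET_RE.search(data.decode("iso-8859-1"))
    return match.group(1) if match else None


po = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=UTF-8\\n"\n'
    "\n"
    'msgid "Why"\n'
    'msgstr "Perché"\n'
).encode("utf-8")

print(is_valid_utf8(po), declared_charset(po))  # True UTF-8
```

A file that is valid UTF-8 and declares UTF-8 is clean; the garbling then has to happen in whatever tool reads it.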
Most likely, some tool dealing with manpages expects them to use the
ISO-8859-1 charset. I already had a similar experience: one UTF-8 page
looked fine when viewed directly (man ./foo.1), but when installed and
viewed with "man 1 foo" it showed the same problem.
The man pages also show the garbled UTF-8 sequences. I've tried both
installing them and opening the files directly, and both fail.
The files could be recoded to either ISO-8859-1 or ISO-8859-15, but the
real problem lies elsewhere.
The easiest (and fastest) solution would be (as you say) to recode the PO
file to the encoding the final documents should use (and maintain the
translation in the recoded PO).
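GNU gettext ships msgconv for this job (e.g. `msgconv --to-code=ISO-8859-15 it.po -o it.latin9.po`); a plain recode changes the bytes but leaves the header still saying charset=UTF-8, which then has to be fixed by hand. What the conversion has to do is sketched below in Python (an illustration, not msgconv's actual code; the function name is invented): re-encode the text and rewrite the charset declared in the header.

```python
import re

# Matches the declaration in "Content-Type: text/plain; charset=UTF-8\n".
CHARSET_RE = re.compile(r"charset=[A-Za-z0-9_.:-]+")


def recode_po(data: bytes, target: str) -> bytes:
    """Re-encode a UTF-8 PO file into `target` and update its header.

    Raises UnicodeEncodeError if a translation uses a character that the
    target charset cannot represent (strict mode: fail loudly, lose nothing).
    """
    text = data.decode("utf-8")
    # Rewrite only the first declaration, i.e. the one in the header entry.
    text = CHARSET_RE.sub("charset=" + target, text, count=1)
    return text.encode(target)


po = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=UTF-8\\n"\n'
    "\n"
    'msgid "Why"\n'
    'msgstr "Perché"\n'
).encode("utf-8")

latin9 = recode_po(po, "ISO-8859-15")
print(b"charset=ISO-8859-15" in latin9)  # True
print(b"Perch\xe9" in latin9)            # True
```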
But this is only a temporary solution. Once we get the first non-European
translations, this problem will come back.
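The limitation is easy to demonstrate: a single-byte ISO-8859 charset has room for at most 256 characters, so a non-European translation simply cannot be recoded into it. A quick Python check (the candidate list is chosen arbitrarily for the example):

```python
from typing import List, Sequence

CANDIDATES = ("ISO-8859-1", "ISO-8859-15", "UTF-8")


def encodings_that_fit(text: str, candidates: Sequence[str] = CANDIDATES) -> List[str]:
    """Return the candidate charsets able to represent every character of text."""
    fitting = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character has no code in this charset
        fitting.append(name)
    return fitting


print(encodings_that_fit("Perché"))    # ['ISO-8859-1', 'ISO-8859-15', 'UTF-8']
print(encodings_that_fit("Perché €"))  # ['ISO-8859-15', 'UTF-8']
print(encodings_that_fit("日本語"))     # ['UTF-8']
```

Only UTF-8 covers every case, which is why recoding the PO can only ever be a stopgap.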
I think the real issue is that the programs dealing with the translated
documents have no encoding support. The translation of the binaries (the
programs' own messages) seems to work well, although its PO is also in
UTF-8, because gettext converts the messages to the user's charset at run
time (gettext rocks ;)
An intermediate solution could be to add an option to po4a-translate to
specify the encoding of the output document. This would also be useful
for files gettextized with the po4a script: the generated PO could be in
UTF-8, since it mixes files of different formats and possibly different
encodings, while you may want some output files in other encodings.
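A rough sketch of what such an option might do internally, in Python. Everything here is hypothetical: the function names are invented and this is not how po4a (which is written in Perl) is structured. The idea is to decode the PO with the charset its header declares, work on Unicode text, and encode only when writing the output document.

```python
import re

# Charset value declared in the PO header's Content-Type line.
CHARSET_RE = re.compile(r"charset=([A-Za-z0-9_.:-]+)")


def decode_po(data: bytes) -> str:
    """Decode a PO file using the charset its header declares (UTF-8 if none)."""
    # Latin-1 maps every byte to a character, so this lookup decode never fails.
    match = CHARSET_RE.search(data.decode("iso-8859-1"))
    return data.decode(match.group(1) if match else "utf-8")


def encode_output(document: str, output_encoding: str) -> bytes:
    """Hypothetical final step: encode the translated document in the
    charset the user asked for.

    errors="strict" turns an unrepresentable character into an error
    instead of silently writing '?' or mojibake.
    """
    return document.encode(output_encoding, errors="strict")


po = '"Content-Type: text/plain; charset=ISO-8859-1\\n"\nmsgstr "Perché"\n'.encode("iso-8859-1")
text = decode_po(po)
print("Perché" in text)                  # True
doc = "Perché\n"                         # stand-in for the translated document
print(encode_output(doc, "ISO-8859-1"))  # b'Perch\xe9\n'
print(encode_output(doc, "UTF-8"))       # b'Perch\xc3\xa9\n'
```

Keeping everything as Unicode internally means the PO charset and the output charset become independent choices.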
Specifying an output encoding for each language would be very rudimentary
(but necessary?)...
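Per-language output encodings could be as simple as a lookup table with a UTF-8 default. The mapping below is purely illustrative (it is not a po4a config format, and the language/charset pairs are arbitrary); it only shows the shape of the information a config file would have to carry:

```python
from typing import Dict

# Hypothetical per-language output charsets; unlisted languages get UTF-8.
OUTPUT_ENCODINGS: Dict[str, str] = {
    "it": "ISO-8859-1",
    "fr": "ISO-8859-15",
}
DEFAULT_ENCODING = "UTF-8"


def output_encoding_for(lang: str, table: Dict[str, str] = OUTPUT_ENCODINGS) -> str:
    """Return the output charset configured for a language code."""
    return table.get(lang, DEFAULT_ENCODING)


for lang in ("it", "fr", "ja"):
    print(lang, output_encoding_for(lang))
# it ISO-8859-1
# fr ISO-8859-15
# ja UTF-8
```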
This would also lead to a redesign of the po4a config files, since we
would need to specify more information there. This could be done together
with the TransTractor redesign explained in the "Future directions"
section. Well, we should leave all this for a future release... but we
should begin thinking about it ;)
Regards,
Jordi Vilalta