On Thu, 12 Aug 2004, Danilo Piazzalunga wrote:
 At 19:10 on Thursday, 12 August 2004, Martin Quinson wrote:
> Hello,
>
> I was admiring the new translation and trying to read it (without really
> speaking Italian) when I discovered some encoding issues. It looks like
> UTF-8 characters handled by a tool that is not UTF-8-ready.
 In the HTML you can see things such as:
 "PerchÃ©" instead of "Perché"
Yes, it seems mpod2html blindly assumes that the input document is in
ISO-8859 :(
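
A minimal sketch of what seems to be happening, in Python (an illustration
only, not something po4a or the HTML converter actually runs): when the UTF-8
bytes of "Perché" are decoded as ISO-8859-1, the two-byte "é" becomes two
separate characters.

```python
# Illustration only: reproduce the suspected mis-decoding.
word = "Perch\u00e9"                        # "Perché"
utf8_bytes = word.encode("utf-8")           # é becomes the two bytes 0xC3 0xA9
garbled = utf8_bytes.decode("iso-8859-1")   # each byte read as one Latin-1 char
print(garbled)

# Undoing the wrong decoding recovers the original text.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired == word)
```
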
> After some investigation, it's possible that the corruption comes from the
> po file itself.
 [...]
> it's even possible that the po file is clean but my less is broken.
 The PO itself is clean. Recode doesn't complain. Try "LANG=C less <file>" and
 you will see that the PO is really UTF-8.
 Likely, some tool dealing with manpages expects them to use the ISO-8859-1
 charset. I already had a similar experience: one UTF-8 page looked fine when
 viewed directly (man ./foo.1), but when installed and viewed with "man 1 foo"
 it showed the same problem.
The man pages also show the raw UTF-8 byte sequences. I've tried both
installing them and opening the file directly, and both fail.
 The files could be recoded to either ISO-8859-1 or ISO-8859-15, but the real
 problem lies elsewhere.
The easiest (and fastest) solution would be (as you say) to recode the po
file to the encoding the final documents should be in (and maintain
the translation in the recoded po).
But this is only a temporary solution. As soon as we get the first
non-European translations, the problem will be back.
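
A rough sketch of that recoding step in Python, assuming the po header
declares its charset in the usual "Content-Type: ...; charset=..." line. The
function name recode_po is hypothetical and is not part of po4a or gettext.
It also shows why this only works for some languages: encoding fails as soon
as a character does not exist in the target charset.

```python
import re

def recode_po(data: bytes, src: str = "utf-8", dst: str = "iso-8859-1") -> bytes:
    """Recode the contents of a po file and update the charset in its header."""
    text = data.decode(src)
    # Rewrite the first charset declaration so the header matches the new bytes.
    text = re.sub(r"(charset=)[\w.:-]+",
                  lambda m: m.group(1) + dst.upper(), text, count=1)
    # Raises UnicodeEncodeError if dst cannot represent some character.
    return text.encode(dst)

sample = ('msgid ""\nmsgstr ""\n'
          '"Content-Type: text/plain; charset=UTF-8\\n"\n\n'
          'msgid "Why"\nmsgstr "Perch\u00e9"\n').encode("utf-8")
latin1 = recode_po(sample)

try:
    recode_po('msgstr "\u65e5\u672c\u8a9e"\n'.encode("utf-8"))
    non_european_ok = True
except UnicodeEncodeError:
    non_european_ok = False   # Japanese text does not fit in ISO-8859-1
```
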
I think the real issue is that the programs that deal with the translated
documents don't support encodings. The binary translation seems to work
well, although its po is also in UTF-8 (gettext rocks ;)
An intermediate solution could be to add an option to po4a-translate to
specify the encoding of the output document. This would also be useful
when translating files gettextized with the po4a script: the generated po
could be in UTF-8, since it mixes files of different formats and maybe
different encodings, while some output files are wanted in other encodings.
Specifying an output encoding for each language would be very rudimentary
(but necessary?)...
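
To make the idea concrete, here is a small Python sketch of a per-language
output encoding table. Everything in it is an assumption made for
illustration: the table, the language choices and the encode_output function
are hypothetical and do not correspond to any existing po4a option or config
syntax.

```python
# Hypothetical per-language table; po4a has no such setting today.
OUTPUT_ENCODINGS = {
    "it": "iso-8859-1",
    "pl": "iso-8859-2",
    "ja": "euc-jp",
}

def encode_output(translated: str, lang: str, default: str = "utf-8") -> bytes:
    """Encode a translated document in the charset configured for its language."""
    return translated.encode(OUTPUT_ENCODINGS.get(lang, default))

italian = encode_output("Perch\u00e9", "it")    # one byte for é
fallback = encode_output("Perch\u00e9", "xx")   # unknown language: UTF-8
```

Languages missing from the table fall back to UTF-8, so nothing is lost when
no encoding has been chosen for them.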
This would also lead to a redesign of the po4a config files, since we would
need to specify more information there. This could be done together with the
TransTractor redesign explained in the "Future directions" section. Well,
we should leave all this for a future release... but we should start
thinking about it ;)
Regards,
Jordi Vilalta