On Thu, Feb 17, 2005 at 12:43:39AM +0100, Jordi Vilalta wrote:
Is there a way to enter non-breaking spaces from the keyboard? Or did
they use some special software that differentiates both kinds of spaces?
Sometimes (on my box, it depends on the locale), 'Alt-Space' works. On
other systems, I've also seen some other combinations. With vim, you can use
a digraph: 'Ctrl-K' followed by 'NS'.
acheck may also be used to generate them in French.
I'm not using any PO editor (my fingers refuse to use anything other than
vim). In vim, 0xA0 can be differentiated (the default behavior depends on
the locale; in an ISO-8859-1 locale I think it is displayed as '| ', in
blue).
>What I propose is to keep the conversion of 0xA0 to '\ ' in post_trans,
>but remove the opposite conversion in pre_trans. Thus PO will be valid and
>translators will be able (at their will) to use 0xA0 in the msgid (and
>will have to set a correct charset in the header).
If I understood it well, it would be compatible with existing PO files,
but the newly created files would have "\ " instead of 0xA0? (I would like
this approach)
A sort of compatibility. The affected strings should become fuzzy. Not a
big deal, I think.
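To make the proposal concrete, here is a minimal sketch. It is Python rather
than po4a's Perl, and the two functions only borrow the pre_trans/post_trans
names; their bodies are my assumption about the intended behavior, not the
actual module code.

```python
NBSP = "\u00a0"      # the 0xA0 character discussed in this thread
ROFF_NBSP = "\\ "    # backslash followed by a space, as written in the source document

def pre_trans(text: str) -> str:
    # Document -> PO direction: under the proposal, "\ " is no longer
    # turned into 0xA0, so the msgid stays plain ASCII here.
    return text

def post_trans(text: str) -> str:
    # PO -> document direction: any 0xA0 typed by a translator is
    # still converted to "\ " in the generated document.
    return text.replace(NBSP, ROFF_NBSP)
```

With this, a msgstr written with 0xA0 and one written with "\ " both end up
with "\ " in the output, while newly extracted msgids keep "\ ".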
>Do you think we may keep the 0xA0 if the user specified an
>$self->{TT}{'file_in_charset'} = UTF-8 or latin-1
>(should we then check in_charset or out_charset ?)
It's the first time I've seen 0xA0, so I don't know much about it. I
see it as a strange character, hard to differentiate from the standard
space in classic editors (correct me if I'm wrong). Personally I prefer
having "\ " everywhere instead of 0xA0, regardless of the character
set.
Since it generates errors, I would also advocate for "\ ".
>I'm also asking this for the TeX module (there I'm doing translation of
>accented characters, i.e. \'e in the TeX file becomes é in the PO, which
>is then translated back to \'e in the TeX file).
This is a different case because it's easy to differentiate an 'e' from an
'é' (both visually when reading and when writing each one).
Apart from this, I'll try to have a look at this conversion, because it
introduces undetected non-ASCII characters (which should force the PO file
to be in UTF-8).
Do you mean that we could force the PO to become UTF-8 when these
conversions are performed?
'é' will also cause an error.
Supporting such translations could be useful for other languages (e.g.
XML, with é and other characters).
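As an illustration of the TeX round trip and of the charset problem it
raises, here is a rough sketch. Again this is Python, not the Perl of the
TeX module; the two-entry table and all function names are hypothetical,
and a real module would need to cover many more accents.

```python
# Hypothetical, deliberately tiny mapping from TeX accent sequences to characters.
TEX_TO_CHAR = {"\\'e": "é", "\\`e": "è"}

def tex_to_po(text: str) -> str:
    # TeX file -> PO: replace accent sequences with the accented character.
    for tex, char in TEX_TO_CHAR.items():
        text = text.replace(tex, char)
    return text

def po_to_tex(text: str) -> str:
    # PO -> TeX file: convert the accented characters back to TeX sequences.
    for tex, char in TEX_TO_CHAR.items():
        text = text.replace(char, tex)
    return text

def needs_non_ascii_charset(msgid: str) -> bool:
    # True when the conversion introduced non-ASCII characters, i.e. when
    # the PO header can no longer get away with a plain ASCII charset.
    return not msgid.isascii()
```

A check like the last function is one way the conversion could signal that
the PO file has to declare UTF-8 (or another suitable charset).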
Regards,
--
Nekral