On Tue, Aug 03, 2004 at 03:40:42PM -0700, Martin Quinson wrote:
[...]
> I'm ok with being pedantic here, too. This approach would fit me:
> For the master:
>  - if no encoding is specified, suppose it to be UTF-8
If you run "xgettext --from-code=UTF-8", no other charset can be used
for PO files, and translators may dislike being forced to use this
charset without any good reason.
I much prefer assuming ASCII by default.  (Then UTF-8 if a falback is
needed)
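Something along these lines (a Python sketch, only to illustrate the
detection order; the function name and messages are mine, this is not
po4a code):

    def detect_master_charset(raw: bytes) -> str:
        """Guess the charset of a master document, conservatively."""
        try:
            raw.decode("ascii")
            return "ascii"      # pure ASCII: safe whatever the charset
        except UnicodeDecodeError:
            pass
        try:
            raw.decode("utf-8")
            return "utf-8"      # valid UTF-8, assume that was meant
        except UnicodeDecodeError:
            # neither ASCII nor valid UTF-8: refuse to guess further
            raise SystemExit("cannot guess the encoding, please specify it")

A file passing the UTF-8 test could still be mis-detected (most
ISO-8859-1 texts fail it, but not all), hence the need for an explicit
setting anyway.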
>  - if it's not valid UTF-8, refuse to process until being told
>    what it is
> For translations:
>  - if not specified, suppose it's the same as the one in the
>    translated part of the PO file
There is a problem I did not think about before: a few English man
pages contain non-ASCII characters, like euro-test in Debian.  PO files
then have to be UTF-8 encoded, and generated man pages will also be
UTF-8 encoded, which is not the expected result, at least in Debian.
The easy solution is to use escape sequences (see groff_char(7))
instead of ISO-8859-1 characters, and to hope that a similar solution
is always available.  The documentation should then clearly state which
encodings can be used for original documents, depending on their format.
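For instance with a small substitution table (a rough sketch; only a
handful of characters are shown, groff_char(7) documents the real
glyph names):

    # Map a few ISO-8859-1 characters to groff escape sequences so
    # that the master document can stay pure ASCII.  The table uses
    # \uXXXX escapes so that this script is itself ASCII-only.
    GROFF_GLYPHS = {
        "\u00e9": r"\['e]",   # e acute
        "\u00e8": r"\[`e]",   # e grave
        "\u00e7": r"\[,c]",   # c cedilla
        "\u20ac": r"\[eu]",   # Euro sign
    }

    def asciify_for_groff(text: str) -> str:
        for char, escape in GROFF_GLYPHS.items():
            text = text.replace(char, escape)
        return text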
>  - could be cool if we could check that the encoding is not broken,
>    but I'm not sure whether it's even possible.
Double conversion from ISO-8859-1 to UTF-8 is a common error and seems
pretty hard to diagnose.
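A heuristic which catches many of these cases (a sketch; it can
misfire on text which is legitimately valid both ways): undo one
decoding step and see whether the bytes are still valid UTF-8:

    def looks_double_encoded(text: str) -> bool:
        """True if text looks like UTF-8 bytes which were wrongly
        decoded as ISO-8859-1 and then re-encoded as UTF-8."""
        if text.isascii():
            return False        # plain ASCII cannot be double-encoded
        try:
            raw = text.encode("iso-8859-1")  # undo one decoding step
        except UnicodeEncodeError:
            return False        # code points above 0xFF: not that error
        try:
            raw.decode("utf-8")
            return True         # valid UTF-8 again: suspicious
        except UnicodeDecodeError:
            return False

E.g. a double-converted "é" shows up as "Ã©", which this function
flags, while a correctly encoded "é" is left alone.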
>  - during gettextization, assume it's UTF-8 if no encoding is
>    provided, and whine for a proper setting if that's not the case
> For po files:
>  - msgid must be in UTF-8, no matter what happens.
>  - msgstr has to be in the encoding specified in the PO file headers.
No, msgids and msgstrs must share the same encoding, which is why UTF-8
is the only sane encoding if msgids contain non-ASCII characters.
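(The declared charset sits in the Content-Type line of the PO header
entry; a crude way to fish it out, assuming a well-formed header:

    import re

    def po_charset(po_text):
        """Return the charset declared in a PO header, or None."""
        match = re.search(r'charset=([-\w]+)', po_text)
        return match.group(1) if match else None

msgconv can then recode the msgstrs to UTF-8 so that the whole file
shares one encoding.)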
Denis