Re: [Po4a-devel]Some comments

Friday, 4 June 2004

Hello,

On Mon, 24 May 2004, Martin Quinson wrote:
...
 [...]
 On Fri, May 07, 2004 at 03:43:38PM +0200, Jordi Vilalta wrote:
 > > po4a skips the generation of msgid containing an entity only (or tags only).
 > > It will now issue a warning when such optimizations are done. Thanks for the
 > > report. [At least this is what I planned, but msgids containing spaces
 > > along with entities were not detected. This is also fixed]
 > 
 > Now it seems to skip this kind of msgids (the version I tried some days 
 > ago didn't), but it has an irregular behavior. I've done the following 
 > (meaningless) test:

 When I redo the test, I get what I expect:
 ====[/tmp/a]====
 <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
 <!ENTITY chap SYSTEM "chapter1.xml">
 <!ENTITY chap2 SYSTEM "chapter2.xml">
 <!ENTITY aaa "contens of aaa">
 <!ENTITY bbb "contens of bbb">
 <!ENTITY ccc "contens of ccc">
 ]>

 <book>
         &chap0;
         &chap;
         &chap2;
         &aaa;
         &chap3;
         &bbb;
         &chap;
         &ccc;
         &aaa;
 </book>
 ====[/tmp/chapter1.xml]====
 [content of chapt1]
 ====[/tmp/chapter2.xml]====
 [content of chapt2]
 ====[generated po file]====
 # SOME DESCRIPTIVE TITLE
 # Copyright (C) YEAR Free Software Foundation, Inc.
 # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
 # 
 #, fuzzy
 msgid ""
 msgstr ""
 "Project-Id-Version: PACKAGE VERSION\n"
 "POT-Creation-Date: 2004-05-24 14:10-0700\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language-Team: LANGUAGE <LL@li.org>\n"
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=CHARSET\n"
 "Content-Transfer-Encoding: ENCODING"

 # type: definition of entity &aaa;
 #, no-wrap
 msgid "contens of aaa"
 msgstr ""

 # type: definition of entity &bbb;
 #, no-wrap
 msgid "contens of bbb"
 msgstr ""

 # type: definition of entity &ccc;
 #, no-wrap
 msgid "contens of ccc"
 msgstr ""

 # type: <book></book>
 msgid ""
 "&chap0; [content of chapt1] [content of chapt2] &aaa; &chap3; &bbb; [content "
 "of chapt1] &ccc; &aaa;"
 msgstr ""
 ====[end of files]====

 The type line looks ok to me, and there is no reference line for entity
 definition. That way, it is not broken ;)

Well, the problem here was with the chapter?.xml files. With your files I 
get the same result as you, but when I change their content to:

<chapter><title>ch.1</title>
<para>content 1</para>
</chapter>

I get this (mad) output po file:

...
# type: <title></title>
#: a.xml:12 chapter2.xml:1
msgid "ch.1"
msgstr ""

# type: <para></para>
#: a.xml:12 chapter2.xml:1
msgid "content 1"
msgstr ""

# type: <title></title>
#: chapter1.xml:1
msgid "ch.2"
msgstr ""

# type: <para></para>
#: chapter1.xml:1
msgid "content 2"
msgstr ""

# type: </chapter><chapter>
#: chapter2.xml:1
msgid "&aaa; &chap3; &bbb;"
msgstr ""

# type: </chapter></book>
msgid "&ccc; &aaa;"
msgstr ""

It seems that the content of the included file is inserted and then parsed 
as part of the main file, which produces this behavior (and the wrong type 
lines). Also, I don't like how the content is substituted here:

"&chap0; [content of chapt1] [content of chapt2] &aaa; &chap3; &bbb; [content "
"of chapt1] &ccc; &aaa;"

As you see, the content of chapter1 appears twice (and so must be translated 
twice). Instead, I think inclusion entities should be treated like 
substitution entities: the content is translated once, and the references 
are left as they are. For example, &aaa; appears twice in this msgid, and 
its content is only translated once.
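
The proposal above can be sketched roughly as follows. This is only an illustration in Python, not po4a code: the function name `extract_msgids`, the `definitions` mapping, and the entity regex are all assumptions made for the example. Each known entity (inclusion or substitution) contributes its content once, and the body keeps its references untouched.

```python
import re

# Matches an entity reference such as &aaa; and captures its name.
ENTITY = re.compile(r"&([A-Za-z_][\w.-]*);")

def extract_msgids(body, definitions):
    """Hypothetical sketch: one msgid per known entity, plus the body as-is.

    body        -- text containing entity references
    definitions -- maps an entity name to its content (file content for
                   inclusion entities, replacement text for substitution ones)
    """
    msgids = []
    seen = set()
    for name in ENTITY.findall(body):
        # Unknown entities (no definition) are simply left in the body.
        if name in definitions and name not in seen:
            seen.add(name)
            msgids.append(definitions[name])
    # The body is emitted with its references intact, whitespace normalized.
    msgids.append(" ".join(body.split()))
    return msgids

body = "&chap0; &chap; &chap2; &aaa; &chap3; &bbb; &chap; &ccc; &aaa;"
definitions = {
    "chap": "[content of chapt1]",
    "chap2": "[content of chapt2]",
    "aaa": "contens of aaa",
    "bbb": "contens of bbb",
    "ccc": "contens of ccc",
}
for msgid in extract_msgids(body, definitions):
    print(msgid)
```

With this approach the content of chapter1 shows up in a single msgid even though &chap; is referenced twice.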

I've also tried to complicate it a little more: I put some tags into a 
substitution entity (as I do in real documents), and then the entity 
disappears from the generated po.

...

 > When looking at the contents of the msgids, it seems that it skips only the 
 > inclusion entities that it knows, and gives the "substitution" entities 
 > up:

 No, we substitute only inclusion entities, and never the substitution ones.
 This is exactly what I wanted, since expanding them would force the
 translator to update his work each time the &version; entity is updated,
 which is exactly contrary to the philosophy of this mechanism.

 > I think there are 2 alternative ways to treat these cases better:
 >   1) Exclude all entities-only messages (any number, known or unknown)
 >   2) Include the whole messages that have more than 1 entity (known or 
 >      unknown), because in some languages it may be interesting to change 
 >      the order of some of them.

 As reflected by the source code, the second option is the selected one.
 For the argument you give ;)

 > hmmm, now I was thinking about the standard entities that define special 
 > characters, as &acute; and I've seen that they're also excluded if there's 
 > something like <title>&Acute;</title>. Seeing this, I prefer not to 
 > exclude any entities. In some cases it can be a little annoying for the 
 > translators, but else, there could be some untranslatable strings.

 hmm. This example looks a bit artificial, doesn't it? Anyway. I added an
 'include-all' option to the module to disable those optimisations. 

 Passing options to modules is one of the novelties introduced in the CVS
 version. For example:
 po4a-gettextize -t sgml -o include-all -m bla.sgml -p bla.pot 

Interesting :)
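
A minimal sketch of the skipping rule as discussed in this thread, assuming that only messages made of a single entity (plus optional whitespace) are skipped, that messages with several entities are kept so translators can reorder them, and that 'include-all' turns the optimisation off. It is written in Python for illustration only; `is_skippable` and its `include_all` parameter are hypothetical names and not the module's actual code.

```python
import re

# Matches one entity reference such as &aaa; or &chap2;
ENTITY = re.compile(r"&[A-Za-z_][\w.-]*;")

def is_skippable(msgid, include_all=False):
    """Hypothetical rule: skip a message holding exactly one entity and nothing else."""
    if include_all:
        # Mirrors the idea of the 'include-all' option: never skip anything.
        return False
    entities = ENTITY.findall(msgid)
    # Whatever remains once the entity references are removed.
    rest = ENTITY.sub("", msgid).strip()
    return len(entities) == 1 and rest == ""

for sample in ["&aaa;", "  &chap;  ", "&aaa; &chap3; &bbb;", "see &aaa;"]:
    print(repr(sample), is_skippable(sample))
```

The tags-only case (such as <title>&Acute;</title>) is not covered by this sketch.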

[...]

Regards,

Jordi Vilalta
