[Po4a-devel]Call for a (La)TeX module
by Martin Quinson
Hello,
I think there is a dire need for a TeX module in po4a. This is the last
major documentation format missing from our panel, and people keep asking
about it. Nicolas (CCed) mailed Denis and me privately about it a week
ago, Nekral mailed me yesterday, and so on.
This format family would let us deal with texinfo documentation (i.e. all
GNU documentation), with book translators (like Nicolas) and maybe even with
the Python documentation (I trust Nekral on this one). And, more important
than all the rest, I'm tired of translating my presentations and articles
by hand :)
The problem that has kept me from doing this until now is that, like groff,
TeX is a programming language in which you can define new macros. It's even
worse than groff, since TeX authors actually do define macros (in groff, a
few people steal some classical macros here and there; most don't bother).
As with the other formats, po4a does not intend to become a full-featured
format interpreter (à la HeVeA). It just intends to parse the input and
split it into msgids. I have some ideas about how to do so, but it's really
impossible for me to start implementing a new module right now. So I'll
explain my plans here, and hope that someone will step in...
For [the rare] documents not defining any new macros and sticking to
unadulterated LaTeX, it should be rather easy to build a first prototype
by simply splitting on the boundaries between TeX's vertical and horizontal
modes.
- As usual (hello Yves), you need to distinguish between inline tags (oops,
macros), which you ignore (such as \textit, \footnotesize or $bla$), and
formatting ones, whose argument you translate (such as \section,
\subsubsection or $$bla$$).
- Translate the content of each environment separately.
- Some macros will need more complex handling, I'm sure.
- Translate each item (of an itemize and the like) separately.
- Naturally, translate each paragraph (separated by empty lines) separately.
- Ignore stuff like \medskip, since it is formatting only.
Hint: it's used in vertical mode. (If there is some \newpage, I guess
you're dead.)
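The splitting rules above can be sketched in a few lines. This is Python rather than the Perl po4a is written in, and everything in it is hypothetical: the function name split_msgids, the two category tables, and the simplification that a formatting macro sits alone on its line with no nested braces. Every macro not listed is treated as inline and simply stays in the msgid.

```python
import re

# Hypothetical category tables; a real module would need many more entries.
FORMATTING = {"section", "subsection", "subsubsection"}  # translate the argument
VERTICAL_ONLY = {"medskip", "bigskip", "smallskip"}      # spacing only: skip
# Any other macro (\textit, \footnotesize, $...$) is inline and stays in the text.

HEADING = re.compile(r"\\(\w+)\{([^{}]*)\}\s*$")  # \name{arg} alone on a line
BARE = re.compile(r"\\(\w+)\s*$")                 # \name alone on a line


def split_msgids(source):
    """Return one msgid per paragraph and per formatting-macro argument."""
    msgids, current = [], []

    def flush():
        # Close the paragraph being accumulated, if any.
        if current:
            msgids.append(" ".join(current))
            current.clear()

    for line in source.splitlines():
        stripped = line.strip()
        heading = HEADING.match(stripped)
        bare = BARE.match(stripped)
        if not stripped:                        # empty line: paragraph boundary
            flush()
        elif heading and heading.group(1) in FORMATTING:
            flush()
            msgids.append(heading.group(2))     # the argument is its own msgid
        elif bare and bare.group(1) in VERTICAL_ONLY:
            flush()                             # vertical-mode spacing, no text
        else:
            current.append(stripped)            # inline macros stay in place
    flush()
    return msgids
```

Splitting on empty lines and on whole-line macros is a crude stand-in for "vertical vs. horizontal mode", but it is enough for the unadulterated-LaTeX case described above.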
And so on and so forth. I believe in this approach for simple documents.
There are two main jobs here:
- Write a proper parser, which can detect macros, separate their arguments,
etc. This may be the more difficult part: TeX has \ and { all over the
place. You'll have to protect them, and come up with a usable way to
determine the } matching a given { (so that the text in between can be
treated as a macro argument).
Classical constructions (\item) should be dealt with there. Everything
else should be passed to a macro handler, just as in the man module.
- Read the LaTeX definitions and write the right handler for each macro.
There will be a lot of duplicated work if you don't do it as in the man
module (or come up with a better idea, of course).
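The brace-matching part of the first job might look like this. Again a Python sketch with a made-up function name (matching_brace); it skips the character after every backslash, which takes care of \{, \} and the \\ line break with a single rule, but it deliberately ignores % comments and verbatim environments.

```python
def matching_brace(text, open_pos):
    r"""Return the index of the '}' that closes the '{' at open_pos.

    Skipping the character after each backslash means \{ and \} are never
    counted, while \\ (a TeX line break) does not hide the brace after it.
    Comments (%) and verbatim environments are NOT handled in this sketch.
    """
    if text[open_pos] != "{":
        raise ValueError("open_pos must point at '{'")
    depth = 0
    i = open_pos
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # jump over the backslash and the character it escapes
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unbalanced braces")
```

With the closing index known, text[open_pos + 1:end] is the macro argument that a handler (as in the man module) would receive.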
Once this is done, you'll be able to deal with documents with no
\newcommand. For new definitions, I guess that the only viable idea is to
use specially formatted comments in the document (lines beginning with
'%po4a:'?) to explain which category each macro belongs to. You may even
want to allow the interpretation of Perl code embedded in the document, if
you're not concerned about security *at all*.
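For the '%po4a:' idea, the reader could be as small as this. Only the '%po4a:' prefix comes from the proposal above; the rest of the syntax (a category word, then the macro name), the category names and read_directives are assumptions for illustration. Since % starts a comment in TeX, such lines do not change the typeset document, and nothing here evaluates embedded code.

```python
import re

# Hypothetical syntax: "%po4a: <category> \<macro>", e.g. "%po4a: inline \myemph".
DIRECTIVE = re.compile(r"^%po4a:\s*(\w+)\s+\\(\w+)\s*$")
KNOWN_CATEGORIES = {"inline", "formatting", "ignore"}  # assumed category names


def read_directives(source):
    """Map macro name -> category, read from %po4a: comment lines."""
    categories = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = DIRECTIVE.match(line.strip())
        if not match:
            continue  # ordinary TeX line or unrelated comment
        category, macro = match.groups()
        if category not in KNOWN_CATEGORIES:
            raise ValueError(f"line {lineno}: unknown category {category!r}")
        categories[macro] = category
    return categories
```

The resulting table would simply extend the built-in categories before the document is split.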
If you want to give it a try, you're welcome, whoever you are. Just check
the documentation to see how po4a works.
man Locale::Po4a::TransTractor -> root of the project.
man Locale::Po4a::Sgml -> some ideas about the categories; file inclusion.
man Locale::Po4a::Man -> some ideas about the macro handler mechanism.
Then, mail us. Then start coding.
That's all I can think of at 6:30 am on a night train taking me to yet
another job interview...
Please comment/forward/react.
Mt.
20 years
[Po4a-devel]Nested blocks with Xml.pm
by Yves Rutschle
Hi all,
I started looking at using Xml.pm for XHTML last night.
Is there an example somewhere of how one would go about
processing nested blocks with Xml.pm? As it is, having
defined the 'block' tags and 'inline' tags, it would produce
only one msgid for, e.g.:
<div>
<h1>blah blah</h1>
<p>This is a paragraph</p>
</div>
because <div> created a block and Xml.pm doesn't descend
into blocks (as I understand it -- I may be confused).
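One way to get one msgid per innermost block is to treat a block that contains other blocks as a mere container and descend into it. The sketch below does that with Python's xml.etree.ElementTree; it is not how Xml.pm works, and the BLOCK tag set and the extract/inner_xml helpers are made up for the example. Text sitting directly in a container next to child blocks is ignored here.

```python
import xml.etree.ElementTree as ET

# Hypothetical block tags; every other tag is treated as inline.
BLOCK = {"div", "h1", "h2", "p", "ul", "li"}


def inner_xml(elem):
    """Content of elem with inline child tags kept as markup."""
    parts = [elem.text or ""]
    for child in elem:
        # ET.tostring() also serializes the child's tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).strip()


def extract(elem, msgids=None):
    """Depth-first walk: only blocks without block children become msgids."""
    if msgids is None:
        msgids = []
    block_children = [child for child in elem if child.tag in BLOCK]
    if elem.tag in BLOCK and not block_children:
        text = inner_xml(elem)
        if text:
            msgids.append(text)
    else:
        # A container such as <div>: descend instead of emitting one msgid.
        for child in block_children:
            extract(child, msgids)
    return msgids
```

On the <div> example above this yields two msgids, one for the heading and one for the paragraph, instead of a single one for the whole <div>.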
Y.
20 years
[Po4a-devel]Two small issues with Po.pm
by Nicolas François
Hello,
The first issue causes bad formatting of the po4a.7 and
Locale::Po4a::TransTractor manpages.
Here is an example:
http://po4a.alioth.debian.org/es/Locale/Po4a/TransTractor.html#DESCRIPCION
(at the end of this section, the "bone" is broken)
This is caused by a line ending with a \ in a no-wrap paragraph.
The Po unescape function replaces ([^\\])\\n with $1\n
I think it should instead replace "an even number of \ followed by \n",
thus s/([^\\](\\\\)*)\\n/$1\n/g;
(I'm attaching Po.pm.escape.patch.)
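The intended rule, "a \n counts only when preceded by an even number of backslashes", falls out naturally from a single left-to-right scan that consumes each escape as a pair. Below is a Python sketch (the names and the escape table are illustrative, not Po.pm's). As far as I can tell, a /g regex that starts with [^\\] can still miss a \n at the very start of the string or right after another \n, because [^\\] has to consume one character per match; a pairwise scan does not have that problem.

```python
import re

# A backslash followed by exactly one character is one escape sequence.
_ESCAPE = re.compile(r"\\(.)", re.DOTALL)

# Only the escapes needed for this example; anything else is left untouched.
_UNESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unescape(text):
    """Undo PO-style escaping in one left-to-right pass.

    re.sub() never rereads characters it has already consumed, so an escaped
    backslash is taken as a pair before the following 'n' is looked at.
    That is the "even number of backslashes" rule, without any lookbehind.
    """
    return _ESCAPE.sub(lambda m: _UNESCAPES.get(m.group(1), m.group(0)), text)
```

The case from the bug report is a line ending with an escaped backslash followed by an escaped newline: unescape(r"foo\\\n") gives "foo\" and then a real newline.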
If you think it is the right way to fix this issue, then I will also
change other similar regular expressions in this module (there are many
others).
The second issue is one of the problems detected by the WDIFF category of
the testsuite (some text is present in the original but not in the
translation, or vice versa).
If a man page contains something like this (in a wrapped paragraph):
\fP
foo
then the font stack may "delay" the font modifier, and po4a generates the
following paragraph: "\nfoo". The result of gettext for this msgid is
always an empty string.
I will fix the Man module so that the empty line will be pushed without
translation, but the Po module may need the second patch.
(I'm attaching Po.pm.defined_empty.patch and foo.1, which triggers this
issue.)
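The "push the empty line without translation" idea could be expressed as a guard in front of the catalog lookup. This is a hypothetical Python sketch: translate_paragraph and the dict-based catalog are assumptions, and the contents of Po.pm.defined_empty.patch are not reproduced here. It relies on two PO conventions: the empty msgid is reserved for the header entry, and an empty msgstr means "not translated yet".

```python
def translate_paragraph(catalog, msgid):
    """Hypothetical lookup guard: never send an empty paragraph to gettext."""
    # Whitespace-only paragraphs (such as a stray "\n" left behind by a
    # delayed font change) are pushed untouched instead of being looked up.
    if msgid.strip() == "":
        return msgid
    msgstr = catalog.get(msgid, "")
    # An empty msgstr means "untranslated" in PO files: keep the original.
    return msgstr if msgstr else msgid
```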
Both patches work, but I'm not confident in my Perl and would prefer that
you have a look at them.
Regards,
--
Nekral
20 years
[Po4a-devel]Re: [Po4a-commits] po4a/po/pod fr.po,1.40,1.41
by Martin Quinson
On Sun, Nov 21, 2004 at 10:39:28AM +0000, Nicolas FRANÇOIS wrote:
> Update of /cvsroot/po4a/po4a/po/pod
> In directory haydn:/tmp/cvs-serv17739/po/pod
>
> Modified Files:
> fr.po
> Log Message:
> It really needed a pass through a spell checker.
:) Sorry about that.
Maybe you could ask for a review on the Debian French l10n list. It's not
Debian-only, but it's used far more within Debian than outside it.
Bye, Mt.
20 years
[Po4a-devel][patch] Making Html.pm (slightly) better
by Yves Rutschle
Hi all,
It looks like my spare time has shrunk further, hence my
long silence. Martin's last comments, and my spending some
time reading Sgml.pm and Xml.pm, along with running into
harder files from my site, have me almost convinced that
Martin is right and Html.pm is going the wrong way.
Here is the patch to it that I currently use. This brings it
to a state in which it is useful for "simple" files (i.e.
files with simple paragraphs and little in-line formatting),
which I think is still useful for sites that contain a lot
of simple text (how-to's, for example, would be good
candidates if they used HTML as their primary format).
It doesn't change the fundamentals of how it works, so all of
Martin's objections still hold true. Rather, it fixes the
module's shortcomings:
* Paragraphs are now split along paragraph boundaries, instead
of at random 512-byte-aligned boundaries;
* title and alt attribute contents now create msgids.
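The title/alt part of the patch amounts to collecting those attribute values as extra msgids. Here is a minimal sketch with Python's html.parser (not Html.pm's Perl code; the class name and the attribute list are assumptions) that covers only that second point:

```python
from html.parser import HTMLParser


class AttrExtractor(HTMLParser):
    """Collect candidate msgids from title= and alt= attribute values."""

    TRANSLATABLE_ATTRS = ("title", "alt")  # assumed list, extend as needed

    def __init__(self):
        super().__init__()
        self.msgids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Missing or blank values carry nothing to translate.
            if name in self.TRANSLATABLE_ATTRS and value and value.strip():
                self.msgids.append(value)
```

HTMLParser's default handle_startendtag() calls handle_starttag(), so self-closing tags such as <img .../> are covered as well.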
I hope to get some time to play with Sgml.pm and Xml.pm soon
(because, let's face it, it's much more fun than actually
doing translations).
Y.
20 years, 1 month
[Po4a-devel][patch] Take 2: Making Html.pm (a little) better
by Yves Rutschle
Ok, here is a much bigger patch:
Thanks to Nekral for pointing out 2 problems:
- Those two spaces around the pushline() call were wrong,
but so was what I did. After further thinking, it's
actually obvious that you _cannot_ touch leading and
trailing spaces. They are now preserved, and all we do is
collapse multiple spaces.
- The title/alt attribute translation was indeed wrong. Now
the <img> tag is rewritten entirely.
Additional things:
- Tokens with no content are no longer translated (things
like <b>hello</b>, <i>world</i> would generate a msgid ", ").
- Test suite added. Thanks to Denis for making me do this
(it made me find potential problems) and to Jordi for
pointing out po4a-normalize.
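The two whitespace and emptiness rules from this patch can be sketched as small helpers. Python again, with made-up names; "no content" is approximated as "no letter or digit", which is an assumption and not necessarily the test Html.pm uses.

```python
import re


def normalize_spaces(text):
    """Collapse inner runs of whitespace; leave leading/trailing as is."""
    lead, body, trail = re.fullmatch(r"(\s*)(.*?)(\s*)", text, re.DOTALL).groups()
    return lead + re.sub(r"\s+", " ", body) + trail


def has_content(text):
    """True if the token holds at least one letter or digit (assumed rule)."""
    return re.search(r"\w", text) is not None
```

A token such as ", " between two inline elements fails has_content() and would be pushed unchanged instead of becoming a msgid.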
Enjoy!
Y. - damn, soon I'll run out of excuses for not translating
the site.
20 years, 1 month
[Po4a-devel][patch] make clean
by Yves Rutschle
At the moment, make clean doesn't remove po4a.log, which
upsets people who make patches on the test system. See
attached...
While I'm at it, I was sure that diff had an option
--dont-diff or something like that to avoid diffing specific
files (namely everything in CVS directories in this case),
yet I can't seem to find it in the man page. Did I dream
this one?
Y. - cleaner
20 years, 1 month
[Po4a-devel]Re: [Po4a-commits] po4a/lib/Locale/Po4a Man.pm,1.69,1.70
by Denis Barbier
On Sun, Nov 28, 2004 at 02:03:50PM +0000, Nicolas FRANÇOIS wrote:
> Update of /cvsroot/po4a/po4a/lib/Locale/Po4a
> In directory haydn:/tmp/cvs-serv18386/lib/Locale/Po4a
>
> Modified Files:
> Man.pm
> Log Message:
> Change "" to \(dq in quoted macro arguments. \(dq will be converted back to
> a single double quote for the translator.
Such constructs are also very good candidates for unit tests ;)
Denis
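Following Denis's suggestion, the "" to \(dq conversion lends itself to a round-trip test. This is a hypothetical Python sketch (the function names are made up, and this is not Man.pm's code); groff reads two consecutive double quotes inside a quoted macro argument as one literal quote, which is what the first helper relies on.

```python
DQ = r"\(dq"  # groff escape for a literal double quote


def normalize_quoted_arg(arg):
    """Inside a quoted macro argument, "" stands for one literal quote."""
    return arg.replace('""', DQ)


def for_translator(text):
    """The translator sees a plain double quote instead of \\(dq."""
    return text.replace(DQ, '"')


def from_translator(text):
    """Put the escape back so the quoted argument stays well-formed."""
    return text.replace('"', DQ)


def test_dq_roundtrip():
    source = 'Say ""hello"" twice'  # body of: .B "Say ""hello"" twice"
    internal = normalize_quoted_arg(source)
    assert internal == r"Say \(dqhello\(dq twice"
    assert for_translator(internal) == 'Say "hello" twice'
    assert from_translator(for_translator(internal)) == internal
```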
20 years, 1 month
[Po4a-devel]Re: [Po4a-commits] po4a/po/bin fr.po,1.37,1.38
by Nicolas François
Hello,
> Index: fr.po
> ===================================================================
> RCS file: /cvsroot/po4a/po4a/po/bin/fr.po,v
> retrieving revision 1.37
> retrieving revision 1.38
> diff -u -d -r1.37 -r1.38
> --- fr.po 6 Nov 2004 15:28:23 -0000 1.37
> +++ fr.po 23 Nov 2004 22:43:18 -0000 1.38
> @@ -734,7 +734,7 @@
>
> #: ../../po4a-updatepo:188
> msgid "po4a-updatepo can't take the input po from stdin."
> -msgstr "po4a-update ne peut lire le fichier po depuis l'entrée standard."
> +msgstr "po4a-updatepo ne peut lire le fichier po depuis l'entrée standard."
>
> #: ../../po4a-updatepo:200
> msgid "Parse input files... "
Jordi, the Spanish and Catalan translations have the same typo.
Regards,
--
Nekral
20 years, 1 month
[Po4a-devel]Presentation of the po4a project
by Martin Quinson
[please keep both lists in the loop]
Hello,
My attention was just brought to the Translate project. As a member of the
po4a team, I find it very interesting.
As the documentation at http://po4a.alioth.debian.org/ says, the goal of
the po4a (po for anything) project is to ease translation (and, more
interestingly, the maintenance of translations) using gettext tools in
areas where they were not expected, like documentation.
The Translate homepage on SourceForge says:
the Translate package brings the translation format tools closer to the
goal of providing a seamless method of converting the various translation
formats. Currently the tools manage the conversion of Mozilla and
OpenOffice to the Gettext format. Also included are tools to convert PO to
CSV to allow translators to use wordprocessors to perform the
translations.
At first glance, I'd say that the two projects have a lot of functionality
in common. The main difference is the targeted formats. For now, po4a has
working modules for SGML (both the DebianDoc and DocBook DTDs), XML (a bunch
of DTDs), groff (for man pages), POD (for the Perl documentation), and the
documentation strings of the 2.4.x kernels. Another difference is that po4a
is written in Perl while Translate is in Python. :)
Beyond that, I don't know exactly what to say. This mail is meant as a sort
of "hello".
For example, I would like to take the code of your converters for the Moz
and OO formats, and turn them into po4a modules. Is that OK with you guys?
What is the licence of that code? po4a is GPL; would it be OK for me to
relicense your code under the GPL?
I really hope that we will be able to coordinate our forces in the future...
Thanks a lot,
Martin Quinson.
20 years, 1 month