Hello,
I think there is a real need for a TeX module in po4a. This is the last
major documentation format missing from our panel, and people keep asking
about it. Nicolas (CCed) mailed Denis and me privately about it a week
ago, Nekral mailed me yesterday, and so on.
This format family may allow us to deal with texinfo documentation (i.e.
the GNU documentation), to help book translators (like Nicolas) and maybe
even to handle the Python documentation (I trust Nekral on this one). And,
more important than all the rest, I'm tired of translating my presentations
and articles manually :)
The problem that kept me away from doing this until now is that, like groff,
TeX is a programming language in which you can define new macros. It's even
worse, since TeX authors actually do define macros (in groff, a few authors
borrow some classical macros here and there, but most people don't bother).
As with the other formats, po4a does not intend to become a full-featured
format interpreter (à la HeVeA). It just intends to parse the input and
split it into msgids. I have some ideas about how to do so, but it's really
impossible for me to start a new module implementation right now. So I'll
explain my plans here, and hope that someone will step in...
For [the rare] documents not defining any new macros and sticking to
unadulterated LaTeX, it should be rather easy to build a first prototype
simply splitting at the boundaries between TeX's vertical and horizontal
modes.
- As usual (hello Yves), you need to distinguish between inline tags (oops,
macros), which you ignore (such as \textit, \footnotesize or $bla$), and
structural ones, for which you translate the argument (such as \section,
\subsubsection or $$bla$$).
- Translate the content of each environment separately.
- Some macros will need more complex handling, I'm sure.
- Translate each item (of an itemize and related environments) separately.
- Naturally, translate each paragraph (separated by empty lines) separately.
- Ignore things like \medskip, since they only affect the layout.
Hint: they are used in vertical mode. (If there is some \newpage, I guess
you're dead.)
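To make the splitting rules above a bit more concrete, here is a rough
sketch of how a first prototype could cut a simple LaTeX file into msgids.
po4a itself is written in Perl, so take this Python only as illustration.
The category tables and the function names (split_chunks, flush) are made
up for the example, and it assumes one structural macro per line, no nested
braces in its argument and no user-defined macros.

```python
import re

# Hypothetical category tables; a real module would need far more entries.
# Macros not listed here (\textit, \emph, $..$, ...) are treated as inline:
# they simply stay inside the msgid of the surrounding paragraph.
STRUCTURAL = {"section", "subsection", "subsubsection"}  # translate the argument alone
IGNORED = {"medskip", "bigskip", "smallskip"}            # layout only, nothing to translate

# A line made of a single macro, optionally followed by one brace-free {argument}.
LINE_MACRO = re.compile(r"\\([A-Za-z]+)(?:\{([^{}]*)\})?\s*$")


def flush(buf, chunks):
    """Turn the buffered lines into one msgid, if they contain any text."""
    text = " ".join(buf).strip()
    if text:
        chunks.append(text)
    buf.clear()


def split_chunks(source):
    """Split simple LaTeX source into the strings that would become msgids."""
    chunks, buf = [], []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:                      # empty line: end of paragraph
            flush(buf, chunks)
            continue
        m = LINE_MACRO.match(stripped)
        name = m.group(1) if m else None
        if name in IGNORED or name in ("begin", "end"):
            flush(buf, chunks)                # spacing or environment limit
            continue
        if name in STRUCTURAL and m.group(2) is not None:
            flush(buf, chunks)
            chunks.append(m.group(2))         # e.g. just the section title
            continue
        if stripped.startswith("\\item"):
            flush(buf, chunks)                # each item gets its own msgid
            stripped = stripped[len("\\item"):].strip()
        buf.append(stripped)
    flush(buf, chunks)
    return chunks
```

Fed with a \section, a two-line paragraph containing \textit{some}, a
\medskip and an itemize with two \item lines, this returns four strings:
the title, the joined paragraph (with \textit kept inline) and one string
per item.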
And so on and so forth. I believe in this approach for simple documents.
There are two main jobs here:
- Write a proper parser, which can detect macros, separate their arguments,
etc. This may be the most difficult part. TeX is full of \ and { all
over the place. You'll have to handle the escaped ones (\{, \}), and come
up with a usable way to find the } matching a given { (so that everything
in between can be treated as a macro argument).
Classical constructions (\item) should be dealt with there. Everything
else should be passed to macro handlers, just as in the man module.
- Read the LaTeX definitions and write the right handlers for the right
macros. There will be a lot of duplicated work if you don't do it as in the
man module (or come up with a better idea, of course).
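For the parser part, the core is finding the } that matches a given {
while skipping escaped characters, then handing the macro name and its
arguments to a handler. The sketch below illustrates that idea in Python
(po4a itself is Perl); read_group, parse_macro, handle and the HANDLERS
table are hypothetical names, loosely modeled on the man module's handler
idea rather than on any existing po4a API. Optional [..] arguments, starred
forms and % comments are not handled.

```python
import re

MACRO = re.compile(r"\\([A-Za-z]+)")


def read_group(text, pos):
    """Return (content, end) for the brace group opening at text[pos].

    A backslash always escapes the next character, so \\{, \\} and \\\\
    never change the nesting depth. Comments (%) are not handled here.
    """
    if text[pos] != "{":
        raise ValueError(f"expected '{{' at position {pos}")
    depth = 0
    i = pos
    while i < len(text):
        c = text[i]
        if c == "\\":
            i += 2                      # skip the backslash and what it escapes
            continue
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:              # found the } matching text[pos]
                return text[pos + 1:i], i + 1
        i += 1
    raise ValueError(f"unbalanced '{{' at position {pos}")


def parse_macro(text, pos):
    """Parse \\name{arg1}{arg2}... at text[pos]; return (name, args, end)."""
    m = MACRO.match(text, pos)
    if not m:
        raise ValueError(f"no macro at position {pos}")
    name, i = m.group(1), m.end()
    args = []
    while i < len(text) and text[i] == "{":
        arg, i = read_group(text, i)
        args.append(arg)
    return name, args, i


# Hypothetical handlers: each gets the macro name, its arguments and a
# translate() callback, and returns the rebuilt LaTeX.
def keep(name, args, translate):
    return "\\" + name + "".join("{%s}" % a for a in args)


def translate_first(name, args, translate):
    if not args:
        return keep(name, args, translate)
    rest = "".join("{%s}" % a for a in args[1:])
    return "\\%s{%s}%s" % (name, translate(args[0]), rest)


HANDLERS = {
    "section": translate_first,
    "subsection": translate_first,
    "label": keep,                      # labels are identifiers, not text
}


def handle(text, pos, translate):
    """Parse the macro at pos and let its handler rebuild it; unknown macros are kept."""
    name, args, end = parse_macro(text, pos)
    return HANDLERS.get(name, keep)(name, args, translate), end
```

With a translate() callback mapping "Introduction" to "Einleitung",
handle() on \section{Introduction} returns \section{Einleitung} plus the
position right after the closing brace, so the caller can resume scanning
from there.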
Once this is done, you'll be able to deal with documents with no
\newcommand. For new definitions, I guess that the only viable idea is to
rely on specially formatted comments in the document (lines beginning with
'%po4a:'?) to declare which category each macro belongs to. You may even
want to allow the interpretation of Perl code embedded in the document, if
you're not concerned about security *at all*.
If you want to give it a try, you're welcome, whoever you are. Just check
the documentation to see how po4a works:
man Locale::Po4a::TransTractor -> root of the project.
man Locale::Po4a::Sgml -> some ideas about the categories; file inclusion.
man Locale::Po4a::Man -> some ideas about the macro handler mechanism.
Then, mail us. Then start coding.
That's all I can think of at 6.30 am on a night train taking me to yet
another job interview...
Please comment/forward/react.
Mt.