Re: [Po4a-devel]Design of the (La)TeX module

Saturday, 11 December 2004

On Fri, Dec 10, 2004 at 11:34:32PM +0100, Denis Barbier wrote:
...
 On Mon, Dec 06, 2004 at 09:47:09PM +0100, Nicolas François wrote:
 > parse:
 >   * The parse function will only separate paragraphs. The separator is an
 >     empty line or a line beginning with a comment.
 >     What I'm calling here a "paragraph" is not a paragraph in the output
 >     document, but a block of code separated by one of these separators.

 No, comments should simply be ignored when splitting into paragraphs.
 It is not uncommon to write a comment within a paragraph.
I'm not sure I understand:

Here is what I'm doing when I encounter the following block:

foo
bar %baz
qux
%quux
corge
grault

I first remove the %baz comment and store it in a table. This comment will
disappear from the final document, but I will show it in the PO (possibly
at a wrong place) because it may help the translator.
Then, for the %quux comment, I consider that this comment separates two
paragraphs, which will be translated separately.

So, I'm ignoring the first comment when splitting into paragraphs, but not
the second one. Is this OK, or should I also ignore the second kind?
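A minimal Python sketch of the splitting described above, for illustration only (po4a itself is written in Perl). The name split_paragraphs and its return shape are hypothetical: empty lines and lines beginning with a comment separate blocks, while inline comments are stripped from the text and kept aside per block so they could be shown in the PO.

```python
import re

# A '%' that is not escaped as '\%'. Simplification: '\\%' (a line break
# followed by a comment) is misread as an escaped percent sign.
INLINE_COMMENT = re.compile(r"(?<!\\)%")


def split_paragraphs(text):
    """Split LaTeX source into blocks separated by empty lines or by lines
    beginning with a comment. Inline comments are removed from the text
    and kept per block index."""
    paragraphs, comments = [], {}
    lines, found = [], []

    def flush():
        # Close the current block, if any, and reset the accumulators.
        if lines:
            comments[len(paragraphs)] = list(found)
            paragraphs.append("\n".join(lines))
        lines.clear()
        found.clear()

    for line in text.splitlines():
        stripped = line.lstrip()
        if not stripped or stripped.startswith("%"):
            flush()  # separator: empty line or comment line
            continue
        m = INLINE_COMMENT.search(line)
        if m:
            found.append(line[m.start():].strip())
            line = line[:m.start()].rstrip()
        lines.append(line)
    flush()
    return paragraphs, comments
```

On the foo/bar/.../grault block, this yields two paragraphs, with "%baz" attached to the first one.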

...
 > translate_buffer: return the translation of a buffer (typically a
 > paragraph or a subset of a paragraph)

 See above, IMO it should be a paragraph. 
If I encounter the following paragraph:

\chapter{Lexical analysis\label{lexical}}

This buffer will be given to translate_buffer, which will split it into
one command with one argument and call the chapter subroutine. This
subroutine will then call translate_buffer back with the content of this
argument: "Lexical analysis\label{lexical}".
translate_buffer then splits this into one buffer (Lexical analysis) and
one trailing command with one argument (\label{lexical}). It will
translate the buffer and call the label subroutine.

The same thing can happen if a textual paragraph ends with a footnote. The
footnote can (and IMHO should) be translated separately.

That's why I wrote "a paragraph or a subset of a paragraph".
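A minimal Python sketch (po4a itself is Perl) of how translate_buffer could peel off a leading command, recurse into its argument, and split off a trailing command, run on the \chapter example. Only translate_buffer, get_leading_command and get_trailing_command borrow names from the design; CATALOG, COMMANDS, read_group and call_command are hypothetical helpers, and escaped braces are not handled.

```python
import re

# Hypothetical stand-ins: a one-entry "PO file" and a command table.
CATALOG = {"Lexical analysis": "Analyse lexicale"}
COMMANDS = {"chapter": "translate", "label": "untranslated"}
msgids = []  # strings that would be extracted to the PO file


def translate(text):
    msgids.append(text)
    return CATALOG.get(text, text)


def read_group(buf, start):
    """Return (content, end) of the {...} group opening at buf[start].
    Simplification: escaped braces \\{ and \\} are not handled."""
    depth = 0
    for i in range(start, len(buf)):
        if buf[i] == "{":
            depth += 1
        elif buf[i] == "}":
            depth -= 1
            if depth == 0:
                return buf[start + 1:i], i + 1
    raise ValueError("unbalanced braces")


def get_leading_command(buf):
    """(name, arg, rest) if buf starts with \\name{arg}, else None."""
    m = re.match(r"\\([A-Za-z]+)\{", buf)
    if not m:
        return None
    arg, end = read_group(buf, m.end() - 1)
    return m.group(1), arg, buf[end:].lstrip()


def get_trailing_command(buf):
    """(rest, name, arg) if buf ends with \\name{arg}, else None."""
    for m in re.finditer(r"\\([A-Za-z]+)\{", buf):
        arg, end = read_group(buf, m.end() - 1)
        if end == len(buf):  # first match ending at the end is outermost
            return buf[:m.start()].rstrip(), m.group(1), arg
    return None


def call_command(name, arg):
    """Dispatch on the command kind; 'translate' recurses on the argument."""
    if COMMANDS.get(name) == "translate":
        arg = translate_buffer(arg)
    return "\\%s{%s}" % (name, arg)


def translate_buffer(buf):
    out, rest = "", buf.strip()
    while (cmd := get_leading_command(rest)) is not None:  # leading commands
        name, arg, rest = cmd
        out += call_command(name, arg)
    trailing = ""
    while (cmd := get_trailing_command(rest)) is not None:  # trailing commands
        rest, name, arg = cmd
        trailing = call_command(name, arg) + trailing
    if rest:  # the remaining text, if any
        out += translate(rest)
    return out + trailing
```

Here translate_buffer(r"\chapter{Lexical analysis\label{lexical}}") returns r"\chapter{Analyse lexicale\label{lexical}}", and only "Lexical analysis" is extracted as a msgid. The sketch drops whitespace between the text and its trailing commands.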

...
 >   1) call get_leading_command, to handle a leading command
 >      If the paragraph begins with a command, call this command's subroutine
 >      with the paragraph as argument and append this translation to the
 >      translated buffer.
 >      Loop until there is no more leading command.
 >
 >   2) call get_trailing_command, to handle trailing commands (loop)
 >      while there are trailing commands, call these commands, and build
 >      a translated buffer to push at the end of the current paragraph.
 >   3) append the translation of the remaining paragraph (if any)
 >   4) append the translation of the trailing commands

 Should work mostly fine with Nicolas' book, but what are these trailing
 commands? 
Here is a paragraph from the Python documentation:

A Python program is read by a \emph{parser}.  Input to the parser is a
stream of \emph{tokens}, generated by the \emph{lexical analyzer}.  This
chapter describes how the lexical analyzer breaks a file into tokens.
\index{lexical analysis}
\index{parser}
\index{token}

The \index commands here are trailing commands. They are translated
separately from the paragraph (in this case, they may even be left
untranslated).
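A small Python sketch of the trailing-command loop applied to this paragraph. The regular expression, the split_trailing_commands and msgids names, and the UNTRANSLATED table are hypothetical; the argument of a trailing command is assumed not to contain nested braces.

```python
import re

# Hypothetical: \name{arg} at the very end of the buffer. Simplification:
# the argument may not contain nested braces.
TRAILING = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}\s*\Z")
UNTRANSLATED = {"index", "label"}  # hypothetical command table

PARAGRAPH = r"""A Python program is read by a \emph{parser}.  Input to the parser is a
stream of \emph{tokens}, generated by the \emph{lexical analyzer}.  This
chapter describes how the lexical analyzer breaks a file into tokens.
\index{lexical analysis}
\index{parser}
\index{token}
"""


def split_trailing_commands(paragraph):
    """Return (text, commands): the paragraph without its trailing
    commands, and those commands as (name, arg) pairs in source order."""
    text = paragraph.rstrip()
    commands = []
    while (m := TRAILING.search(text)) is not None:
        commands.insert(0, (m.group(1), m.group(2)))
        text = text[:m.start()].rstrip()
    return text, commands


def msgids(paragraph):
    """Strings sent to the PO file: the text, plus the arguments of
    trailing commands that are not marked as untranslated."""
    text, commands = split_trailing_commands(paragraph)
    ids = [text] if text else []
    ids += [arg for name, arg in commands if name not in UNTRANSLATED]
    return ids
```

With \index marked as untranslated, the only msgid is the prose up to "into tokens."; the \emph commands stay inside it because they are not at the end.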

...
 >   * it should be possible to keep the separator between the commands
 >     (could be none, a space or a newline).
 > 
 > One question: Is this separator important? For example, can I re-wrap:
 > \inputprotcode
 > \makeindex
 > \begin{document}
 > \myeqnspacing
 >    into:
 > \inputprotcode \makeindex \begin{document} \myeqnspacing
 > or even
 > \inputprotcode\makeindex\begin{document}\myeqnspacing

 Normally spaces, tabs and newlines are equivalent, but there are some
 circumstances where they are not, as when writing source code.
 It is likely that spaces do not matter in this book, so I would say
 to not bother if this is much easier for you. 
That's the answer I was hoping for ;)
(It can certainly be corrected later.)
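A hedged Python sketch of the re-wrapping discussed above: runs of spaces, tabs and single newlines are collapsed into one space, except inside verbatim environments (the "source code" case), and blank lines are kept because TeX reads an empty line as a paragraph break. normalize_spaces and _collapse are hypothetical names, and only the plain verbatim environment is protected here.

```python
import re

VERBATIM = re.compile(r"\\begin\{verbatim\}.*?\\end\{verbatim\}", re.S)


def normalize_spaces(buf):
    """Collapse whitespace outside verbatim environments; the verbatim
    blocks themselves are copied untouched."""
    out, pos = [], 0
    for m in VERBATIM.finditer(buf):
        out.append(_collapse(buf[pos:m.start()]))
        out.append(m.group(0))
        pos = m.end()
    out.append(_collapse(buf[pos:]))
    return "".join(out)


def _collapse(text):
    # Keep blank lines (paragraph breaks); collapse everything else.
    paragraphs = re.split(r"\n[ \t]*\n\s*", text)
    return "\n\n".join(re.sub(r"\s+", " ", p) for p in paragraphs)
```

On the four preamble lines, this produces the first form, "\inputprotcode \makeindex \begin{document} \myeqnspacing". Dropping the separator entirely also works there because a control word ends at the first non-letter; it would not be safe before a letter, since "\foo bar" and "\foobar" are different.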

...
 But those macros aside, is a LaTeX module very different from XML
 or SGML?  It looks similar to me, there is a stack of environments,
 and the parser could be told what to do with these environments by
 a command like set_tags_kind (from Sgml.pm) 
Yes, a parser is a parser. Maybe we could share the same interface. I'm
not sure we can do more, because for example these languages don't split
paragraphs the same way (in XML, empty lines have no special meaning, and
we have to look inside the buffer to find some <BR/>).

I've just started to have a look at Sgml. If anybody wants me to rename
the TeX subroutines so that they share the same interface, that's OK.
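To illustrate the environment stack suggested above, here is a minimal Python sketch. ENV_KINDS is a hypothetical table in the spirit of set_tags_kind, not that module's actual API or syntax, and classify_lines is a made-up name. Each line takes the kind of the innermost known open environment, and mismatched \end commands are reported.

```python
import re

# Hypothetical kind table, in the spirit of set_tags_kind in Sgml.pm
# (these names and kinds are illustrative, not that module's API).
ENV_KINDS = {"document": "translate", "itemize": "translate",
             "verbatim": "verbatim", "tabular": "untranslated"}
ENV = re.compile(r"\\(begin|end)\{([A-Za-z*]+)\}")


def classify_lines(source):
    """Return (kind, line) pairs. Lines holding \\begin or \\end are
    'markup'; other lines take the kind of the innermost known open
    environment. Simplification: markers are parsed even inside verbatim."""
    stack, result = [], []
    for line in source.splitlines():
        markers = ENV.findall(line)
        if markers:
            for action, name in markers:
                if action == "begin":
                    stack.append(name)
                elif not stack or stack.pop() != name:
                    raise ValueError("unbalanced \\end{%s}" % name)
            result.append(("markup", line))
            continue
        kind = next((ENV_KINDS[env] for env in reversed(stack)
                     if env in ENV_KINDS), "translate")
        result.append((kind, line))
    return result
```

This only handles the environment part. As noted above, where paragraphs break still differs between TeX and XML.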

I'm attaching my current implementation. I will clean it up and commit it.
It is mostly lacking customization of new commands and file inclusion.
I was able to parse bk2/bk2.tex with it.
There are currently two kinds of commands: untranslated and
translate_joined. It is possible to customize new commands on the command
line (or at the end of the file).
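A hedged Python sketch of such a per-command kind table. The "name:kind,name:kind" specification syntax, the parse_command_kinds and handle names, and the assumption that translate_joined hands the whole command (name plus argument) to the translator as a single string are all hypothetical, for illustration only.

```python
# The two command kinds mentioned above; their exact semantics here
# are an assumption for illustration.
KINDS = ("untranslated", "translate_joined")


def parse_command_kinds(spec, table=None):
    """Parse a hypothetical 'name:kind,name:kind' specification (as could
    be given on the command line) into a {command: kind} dict."""
    table = dict(table or {})
    for item in (s.strip() for s in spec.split(",")):
        if not item:
            continue
        name, sep, kind = item.partition(":")
        if not sep or kind not in KINDS:
            raise ValueError("bad command spec: %r" % item)
        table[name.lstrip("\\")] = kind  # accept "label" or "\label"
    return table


def handle(name, arg, table, translate):
    """Apply a command's kind; unknown commands default to untranslated.
    Assumption: translate_joined sends the whole command as one string."""
    text = "\\%s{%s}" % (name, arg)
    if table.get(name, "untranslated") == "translate_joined":
        return translate(text)
    return text
```

Defaulting unknown commands to untranslated leaves them intact in the output, so a missing entry can only cause an untranslated string, never a broken one.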

The commands I've added at the end of the file won't be committed (I've
arbitrarily assigned all encountered commands to one of these categories).

It is not ready AT ALL for texinfo.

Regards,
-- 
Nekral
