On Mon, Dec 06, 2004 at 09:47:09PM +0100, Nicolas François wrote:
Hello,
Based on my reading about LaTeX and Nicolas' book, I will change the
implementation a little.
It is much more formalized than the previous prototype. If it works, the
content of this mail may be used as documentation for this module.
1) The functions:
parse:
* The parse function will only separate paragraphs. The separator is an
empty line or a line beginning with a comment.
What I'm calling a "paragraph" here is not a paragraph in the output
document, but a block of code separated by one of these separators.
No, comments should simply be ignored when splitting into paragraphs.
It is not uncommon to write a comment within a paragraph.
* The parse function will also remove the comments from this paragraph
and keep them in a buffer (to be pushed as PO comments if there is
a string to translate in this paragraph, or ignored otherwise).
The comments will be ignored in the localized document.
(This doesn't concern lines beginning with a comment, which will just be
pushed, like empty lines.)
Looks fine.
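A minimal sketch of the parse step as amended in the reply above: paragraphs are split on blank lines only, and comments are stripped and collected (for later use as PO comments) instead of acting as separators. The function name `split_paragraphs` and its return shape are hypothetical; the sketch assumes `%` starts a comment unless written as `\%`, and ignores rarer cases such as `\\%` or verbatim text.

```python
import re

# An unescaped % starts a comment (assumption: "\\%" and verbatim text are ignored).
COMMENT_RE = re.compile(r"(?<!\\)%.*$")


def split_paragraphs(text):
    """Split LaTeX source into (paragraph, comments) pairs.

    Only blank lines separate paragraphs. Comments are removed from the
    paragraph text and collected so they can become PO comments; a line
    holding only a comment does not end the paragraph.
    """
    result = []
    lines, comments = [], []
    for line in text.splitlines():
        match = COMMENT_RE.search(line)
        if match:
            comments.append(match.group(0)[1:].strip())
            line = line[:match.start()]
            if not line.strip():
                continue  # comment-only line: skip it, keep the paragraph open
        if line.strip():
            lines.append(line.rstrip())
        elif lines:
            # A blank line closes the current paragraph.
            result.append(("\n".join(lines), comments))
            lines, comments = [], []
    if lines:
        result.append(("\n".join(lines), comments))
    return result
```

Comments seen before a paragraph's first line are attached to that paragraph, which seems the natural place for a PO comment.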
* Once a paragraph is found, the translation of the paragraph (built by
translate_buffer) is pushed.
translate_buffer: return the translation of a buffer (typically a
paragraph or a subset of a paragraph)
See above, IMO it should be a paragraph.
1) call get_leading_command, to handle a leading command
If the paragraph begins with a command, call this command's subroutine
with the paragraph as argument and append its translation to the
translated buffer.
Loop until there are no more leading commands.
2) call get_trailing_command, to handle trailing commands (loop)
While there are trailing commands, call their subroutines and build
a translated buffer to push at the end of the current paragraph.
3) append the translation of the remaining paragraph (if any)
4) append the translation of the trailing commands
Should work mostly fine with Nicolas' book, but what are these trailing
commands?
* it should be possible to keep the separator between the commands
(could be none, a space or a newline).
One question: Is this separator important? For example, can I re-wrap:
\inputprotcode
\makeindex
\begin{document}
\myeqnspacing
into:
\inputprotcode \makeindex \begin{document} \myeqnspacing
or even
\inputprotcode\makeindex\begin{document}\myeqnspacing
Normally spaces, tabs and newlines are equivalent, but there are some
circumstances where they are not, for instance in verbatim source code.
It is likely that spaces do not matter in this book, so I would say
not to bother if that makes things much easier for you.
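The four steps of translate_buffer could be outlined as follows. This is a hypothetical sketch, not po4a code: the leading/trailing extractors and the two translation callbacks are passed in as parameters, and, assuming spacing really does not matter here, the pieces are rejoined with single spaces instead of their original separators. The toy extractors `toy_leading` and `toy_trailing` are illustrative only and recognise just commands without arguments.

```python
import re
from typing import Callable, Optional, Tuple

# Each extractor returns (command_text, remaining_buffer), or None when the
# buffer does not start (or end) with a command.
Extractor = Callable[[str], Optional[Tuple[str, str]]]


def translate_buffer(buffer: str,
                     get_leading: Extractor,
                     get_trailing: Extractor,
                     translate_command: Callable[[str], str],
                     translate_text: Callable[[str], str]) -> str:
    head, tail = [], []
    # 1) Peel off leading commands until none is left.
    while (found := get_leading(buffer)) is not None:
        command, buffer = found
        head.append(translate_command(command))
    # 2) Peel off trailing commands; they are found right to left.
    while (found := get_trailing(buffer)) is not None:
        command, buffer = found
        tail.insert(0, translate_command(command))
    # 3) Translate whatever text remains in the middle (if any).
    middle = [translate_text(buffer.strip())] if buffer.strip() else []
    # 4) Reassemble; separators are normalised to a single space.
    return " ".join(head + middle + tail)


# Toy extractors (hypothetical): only argument-less commands such as \par.
def toy_leading(buf):
    m = re.match(r"\s*(\\[A-Za-z]+)", buf)
    return (m.group(1), buf[m.end():]) if m else None


def toy_trailing(buf):
    m = re.search(r"(\\[A-Za-z]+)\s*$", buf)
    return (m.group(1), buf[:m.start()]) if m else None


result = translate_buffer(r"\noindent Some text \par", toy_leading, toy_trailing,
                          lambda cmd: cmd, str.upper)
# result == r"\noindent SOME TEXT \par"
```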
parse_command:
A helper subroutine used by the command subroutines and by
get_leading_command / get_trailing_command
* take a paragraph/buffer as argument
* output the command name, an optional * (for \chapter*{foo}), an array
of optional arguments (between []), an array of arguments (between {}),
and the remaining paragraph/buffer.
Another question: Are optional arguments always before regular arguments?
Yes.
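A possible shape for parse_command, sketched under a few assumptions: a command name is a backslash followed by letters or by a single nonletter, arguments follow the name with no intervening spaces, and `[]`/`{}` groups are matched by counting nesting depth, since an argument can itself contain commands and a single regular expression cannot balance them. The helper `_group` is hypothetical. Without knowing how many arguments a command takes, this sketch greedily consumes every group that follows, so `\noindent{\bf x}` would be misparsed.

```python
import re

# A command is a backslash followed by letters, or by a single nonletter.
NAME_RE = re.compile(r"\\([A-Za-z]+|[^A-Za-z])")


def _group(buf, pos, open_ch, close_ch):
    """Return (content, end) for the balanced group opening at buf[pos]."""
    depth = 0
    i = pos
    while i < len(buf):
        ch = buf[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                return buf[pos + 1:i], i + 1
        i += 1
    raise ValueError("unbalanced %s...%s group" % (open_ch, close_ch))


def parse_command(buf):
    """Parse the command at the start of buf.

    Returns (name, star, optional_args, args, remainder), or None if buf
    does not start with a command.
    """
    m = NAME_RE.match(buf)
    if m is None:
        return None
    name, pos = m.group(1), m.end()
    star = buf.startswith("*", pos)
    if star:
        pos += 1
    optional, args = [], []
    # Optional [..] arguments always come before the regular {..} ones.
    while buf.startswith("[", pos):
        content, pos = _group(buf, pos, "[", "]")
        optional.append(content)
    while buf.startswith("{", pos):
        content, pos = _group(buf, pos, "{", "}")
        args.append(content)
    return name, star, optional, args, buf[pos:]
```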
get_leading_command:
It is probably the same as parse_command.
get_trailing_command:
If the given paragraph ends with a command, then extract this command and
return the command name, etc. and the remaining paragraph.
Again I do not understand what makes trailing commands special, can you
please elaborate?
The parameter of a command can contain a command, so a simple regular
expression won't be sufficient.
Right.
To be understood as a trailing command, the command will have to end with
an argument (possibly optional), or have no argument at all.
I've read that a command is a \ followed by a string of lower and/or
uppercase letters or a \ followed by a single nonletter.
Mostly true, you can ignore other cases for now.
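One way get_trailing_command could follow the rule above (a trailing command ends with an argument, possibly a `[]` one, or has no argument): walk backwards over the balanced groups that close the buffer, then require a command name immediately before them. This is a hypothetical sketch; it returns `(command_text, remaining_buffer)`, handles only control words (letters plus an optional `*`), and it treats inline markup such as `\emph{word}` at the end of a sentence as a trailing command, which may not be what is wanted.

```python
import re

# A control word ending exactly at the search end, optionally starred.
TAIL_NAME_RE = re.compile(r"\\[A-Za-z]+\*?$")


def _open_index(buf, end, open_ch, close_ch):
    """Return the index of the open_ch matching buf[end - 1] (a close_ch)."""
    depth = 0
    for i in range(end - 1, -1, -1):
        if i > 0 and buf[i - 1] == "\\":
            continue  # skip escaped delimiters such as \{ and \}
        if buf[i] == close_ch:
            depth += 1
        elif buf[i] == open_ch:
            depth -= 1
            if depth == 0:
                return i
    return None  # unbalanced


def get_trailing_command(buf):
    """Return (command_text, remaining_buffer), or None."""
    end = len(buf.rstrip())
    pos = end
    # Walk back over any unescaped {...} or [...] groups that close the buffer.
    while pos > 0 and buf[pos - 1] in "}]" and not (pos > 1 and buf[pos - 2] == "\\"):
        close_ch = buf[pos - 1]
        open_ch = "{" if close_ch == "}" else "["
        start = _open_index(buf, pos, open_ch, close_ch)
        if start is None:
            return None
        pos = start
    # The groups (if any) must be preceded directly by a command name.
    m = TAIL_NAME_RE.search(buf, 0, pos)
    if m is None:
        return None
    return buf[m.start():end], buf[:m.start()]
```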
[...]
3) Some questions:
* Are there commands that need to be translated?
For example, somebody may want to change \noindent into a
\localized_noindent.
No, localizing macros is a bad idea.
[...]
% po4a: new_command x y z t
where
* x is the number of optional arguments (between [])
0 - no optional argument
-1 - variable (can it be?)
Sometimes, yes.
n - maximum number of optional arguments (maybe -1 will be
easier to use)
* y is the number of arguments
maybe x and y are not needed
* z array of indexes of the optional arguments that have to be translated
-1 - all optional arguments should be translated
0 - none
1,3,7 - the 1st, 3rd and 7th arguments should be translated
* t array of indexes of the arguments that have to be translated
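A sketch of how the proposed `% po4a: new_command x y z t` line could be decoded. The proposal does not say where the command's name goes, so only the four fields are parsed here; the `CommandSpec` class and the `parse_new_command` function are hypothetical, and the -1 / 0 / comma-list encoding described for z is assumed to apply to t as well.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CommandSpec:
    max_optional: int                        # x: 0 none, -1 variable, n maximum
    num_args: int                            # y: number of {} arguments
    translate_optional: Optional[List[int]]  # z: None means "all"
    translate_args: Optional[List[int]]      # t: None means "all" (assumed)


def _indexes(field):
    """Decode "-1" -> None (all), "0" -> [] (none), "1,3,7" -> [1, 3, 7]."""
    if field == "-1":
        return None
    if field == "0":
        return []
    return [int(i) for i in field.split(",")]


LINE_RE = re.compile(r"^%\s*po4a:\s*new_command\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def parse_new_command(line):
    """Return a CommandSpec for a po4a directive line, or None otherwise."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    x, y, z, t = m.groups()
    return CommandSpec(int(x), int(y), _indexes(z), _indexes(t))
```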
I do not fully understand how your parser will work, but this point
seems important. You will always find macros which have different
kinds of arguments, and I see no other solution than yours above.
But those macros aside, is a LaTeX module very different from XML
or SGML? It looks similar to me: there is a stack of environments,
and the parser could be told what to do with these environments by
a command like set_tags_kind (from Sgml.pm).
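The stack-of-environments idea above could look roughly like this: every `\begin{...}` pushes onto a stack, every `\end{...}` must match the top, and a lookup table in the spirit of set_tags_kind says how each environment is handled. The function name `environment_spans` and the "translate"/"verbatim" kinds are hypothetical, and this sketch does not skip the body of verbatim-like environments.

```python
import re

ENV_RE = re.compile(r"\\(begin|end)\{([^}]*)\}")


def environment_spans(text, kinds):
    r"""Walk \begin/\end pairs with a stack.

    kinds maps environment names to a handling kind; unknown environments
    default to "translate". Returns (name, kind, depth) for each \begin.
    """
    stack, events = [], []
    for m in ENV_RE.finditer(text):
        action, name = m.groups()
        if action == "begin":
            events.append((name, kinds.get(name, "translate"), len(stack)))
            stack.append(name)
        else:
            # Each \end must close the innermost open environment.
            if not stack or stack[-1] != name:
                raise ValueError("unmatched \\end{%s}" % name)
            stack.pop()
    if stack:
        raise ValueError("unclosed \\begin{%s}" % stack[-1])
    return events
```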
Denis