questions on parser input output includes initialization and more
by Patrice Dumas
Hello,
I have questions that I could not answer by reading the documentation,
mainly about TransTractor. They relate to the specifics of the Texinfo
format and of the parser I intend to use to define a
TransTractor-derived parser. That parser produces a whole tree for an
input file and its included files. The tree can then be split into
translated paragraphs, environments and lines, and used to reconstitute
the translated output document.
My first question is about the input. I did not understand whether the
shiftline function provides the content of one file or of a series of
files. If a series of files is provided, I guess that it is up to
parser() to read the reference returned by shiftline and determine
when the input file changes. Maybe this could be documented in the
TransTractor manual.
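To make this question concrete, here is a minimal sketch of the kind of
detection I have in mind. It is plain Python, not po4a code, and it
assumes that each reference is a string of the form "file:line". The
function names and the sample file names are made up for illustration.

```python
from itertools import groupby


def split_ref(ref):
    """Split an assumed 'file:line' reference into (file, line number)."""
    name, _, lineno = ref.rpartition(":")
    return name, int(lineno)


def group_by_file(pairs):
    """Group consecutive (text, ref) pairs by the file named in ref."""
    runs = []
    for name, chunk in groupby(pairs, key=lambda p: split_ref(p[1])[0]):
        runs.append((name, [text for text, _ in chunk]))
    return runs


# Hypothetical stream, as a shiftline-like function might hand it out.
pairs = [
    ("\\input texinfo\n", "a.texi:1"),
    ("@bye\n", "a.texi:2"),
    ("\\input texinfo\n", "b.texi:1"),
]
runs = group_by_file(pairs)
```

Each entry of runs holds a file name and the lines read from it, so a
change of input file shows up as a new entry.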
Second, to what extent is it important to read the input file with
shiftline? My understanding is that the shiftline facility is there
to provide a way to read the input file line by line, but that the
caller of the parser does not really care about how the input file is
read. What really matters are the calls to translate and to pushline.
If parser() can determine on its own the file name and line number to
pass as the second argument of translate, that could be used instead of
the shiftline information. If several files are passed through
shiftline, however, the parser needs to read all the input, if only to
get the list of files to process.
For both input and output, is there a way to handle included files? It
seems to me that ideally, one translated file should be written for
each included file, along with the translation of the main input file,
and that the include directives in the translated files should be
modified to use the translated file names. My reading of the
documentation is that this could be possible for translate(), as it
takes a reference argument that could, in principle, specify a
different file than the one passed by shiftline. It does not seem to be
possible for pushline, however, which only accepts a line and no
information on the file(s) to write to. How is such a situation
supposed to be handled? I had a look at the TeX.pm code, and it seems
that read is redefined, but also that the included files are not
translated as separate files but output together with the main file.
I have another question about file encodings. Is it possible to change
the encoding based on the information in the file(s) being read, both
for input and output? In Texinfo, there is a @documentencoding
directive that can be used to specify the encoding, and it can change
within the document (more likely when including a file). This is
becoming less and less relevant now that UTF-8 is increasingly used for
every manual, but if it is possible to do something, it could still be
useful.
Also, there is a @documentlanguage command in Texinfo to specify the
language of the document. Processors use it, for example, to translate
strings added to the output format: the "Appendix" string could be
added and translated depending on the @documentlanguage. It can also be
used to change hyphenation patterns and the like. Is it possible to
obtain the language being translated to, so as to add the
@documentlanguage at the beginning of the translated output document?
I have not seen anything about that in the documentation.
Finally, it seems that there is an initialize function that can be
redefined and that is called (but only by new?). Is it possible to use
that function to initialize the parser globally? Or should that be done
in another way?
--
Pat