Bugs item #301335, was opened at 2005-03-29 16:20
You can respond by visiting:
http://alioth.debian.org/tracker/?func=detail&atid=410622&aid=301...
Category: Sgml.pm
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Chris Karakas (be-my-guest)
Assigned to: Nobody (None)
Summary: PHP code inside CDATA is parsed as processing instruction
Initial Comment:
Suppose you have the following in your SGML file:
<screen><![CDATA[<?php
]]><![CDATA[include("config.php");
]]><![CDATA[mysql_connect("$dbhost", "$dbuname",
"$dbpass");
]]><![CDATA[mysql_select_db("$dbname");
]]><![CDATA[echo mysql_error();
]]><![CDATA[phpinfo();
]]><![CDATA[?>
]]></screen>
This will be transformed to:
<screen>{PO4A-beg-CDATA}<?php
{PO4A-end}{PO4A-beg-CDATA}include("config.php");
{PO4A-end}{PO4A-beg-CDATA}mysql_connect("$dbhost", "$dbuname",
"$dbpass");
{PO4A-end}{PO4A-beg-CDATA}mysql_select_db("$dbname");
{PO4A-end}{PO4A-beg-CDATA}echo mysql_error();
{PO4A-end}{PO4A-beg-CDATA}phpinfo();
{PO4A-end}{PO4A-beg-CDATA}?>
{PO4A-end}</screen>
in the intermediate temporary file that is passed to nsgmls through the SGMLS module.
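For readers who want to see the substitution in isolation, here is a minimal Python sketch of it (po4a itself is written in Perl; this is not its code). The two placeholder strings are taken from the output above. The function name mask_cdata is made up for illustration, and the plain string replacement is an assumption about how the masking behaves, not a description of what SGML.pm actually does.

```python
# Hypothetical sketch of the placeholder substitution shown above.
# The marker strings come from the report; mask_cdata is an invented name.
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"
PLACEHOLDER_OPEN = "{PO4A-beg-CDATA}"
PLACEHOLDER_CLOSE = "{PO4A-end}"


def mask_cdata(text: str) -> str:
    """Replace the CDATA delimiters with placeholders, leaving the content as-is."""
    masked = text.replace(CDATA_OPEN, PLACEHOLDER_OPEN)
    return masked.replace(CDATA_CLOSE, PLACEHOLDER_CLOSE)


source = "<screen><![CDATA[<?php\n]]><![CDATA[phpinfo();\n]]></screen>"
masked = mask_cdata(source)
print(masked)
```

Once the delimiters are gone, nothing tells the parser that "<?php" was protected content, which is what leads to the error below.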
This will produce the error:
(po4a::sgml)
Unknown SGML event type: pi
The reason is that in SGML.pm the value of $event->type is "pi", meaning
"Processing Instruction". The SGMLS module reads the
"{PO4A-beg-CDATA}", does NOT understand that we are inside CDATA, and then reads
"<?php" and thinks it has a processing instruction, since it sees "something
not in CDATA that starts with <?". So SGMLS sets $event->type to "pi".
But "pi" events are not handled by SGML.pm, so the if-elsif-elsif... construct
that tests $event->type in SGML.pm ends in:
else {
die wrap_ref_mod($refs[$parse->line], "po4a::sgml",
dgettext("po4a","Unknown SGML event type: %s"), $event->type);
}
and we see the error above.
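The fall-through can be illustrated with a small Python sketch (again, not po4a's Perl code). The Event class, the dispatch function and the set of handled types are all assumptions chosen for illustration; only the error message and the "pi" type come from this report.

```python
# Hypothetical dispatcher mirroring the if/elsif chain described above.
from dataclasses import dataclass


@dataclass
class Event:
    type: str  # event type as reported by the parser, e.g. "cdata" or "pi"
    data: str  # payload of the event


# Assumed set of handled types; the real list lives in SGML.pm.
HANDLED_TYPES = {"start_element", "end_element", "cdata", "sdata", "re"}


def dispatch(event: Event) -> str:
    """Handle known event types and fail on anything else."""
    if event.type in HANDLED_TYPES:
        return f"handled {event.type}"
    # Equivalent of the final else branch: no handler, so give up.
    raise ValueError(f"Unknown SGML event type: {event.type}")


print(dispatch(Event("cdata", 'include("config.php");')))
try:
    dispatch(Event("pi", "php"))
except ValueError as err:
    print(err)
```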
The solution must be some kind of check that says:
"If you are inside a CDATA of the *original* file and get an event type of
'pi', then treat the data as part of the CDATA, not as processing
instruction."
However, this is bound to be tricky, since there might be multiple opening and closing
CDATA tags on one line. Simple checks with regexps will not do. Actually, the best way
would be to consult the parser itself - but po4a tricks the parser by changing
"<![CDATA[" strings to "{PO4A-beg-CDATA}", so we must "do the parser's
work" here. I think that is a bad idea: we will never be better than the parser itself. We
open a Pandora's box here. You have been warned.
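To make the concern concrete, here is a rough Python sketch of the kind of state tracking such a check would need (not po4a code, and not a proposed patch). It walks the placeholders in order, so several markers on one line are handled, but it assumes that every "{PO4A-end}" closes a CDATA section and that sections never nest. Both assumptions are exactly the sort of parser knowledge we would be duplicating. The function names are invented for illustration.

```python
# Hypothetical sketch: track whether text lies inside a masked CDATA section.
import re

BEGIN = "{PO4A-beg-CDATA}"
END = "{PO4A-end}"
MARKER = re.compile(re.escape(BEGIN) + "|" + re.escape(END))


def split_regions(text):
    """Return (inside_cdata, chunk) pairs, in document order."""
    regions = []
    inside = False
    pos = 0
    for match in MARKER.finditer(text):
        if match.start() > pos:
            regions.append((inside, text[pos:match.start()]))
        # Assumption: any end marker closes CDATA; nesting is ignored.
        inside = match.group() == BEGIN
        pos = match.end()
    if pos < len(text):
        regions.append((inside, text[pos:]))
    return regions


def pi_like_chunks_in_cdata(text):
    """Chunks that a parser would misread as processing instructions."""
    return [chunk for inside, chunk in split_regions(text) if inside and "<?" in chunk]


sample = BEGIN + "<?php\n" + END + BEGIN + "?>\n" + END + "<?real-pi>"
print(split_regions(sample))
print(pi_like_chunks_in_cdata(sample))
```

Even this small version has to guess at what the other "{PO4A-beg-*}" markers mean, which supports the point that the parser is the right place for this decision.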
But we need to fix this, if we want to do real work with po4a...maybe it's time to
abandon this "{PO4A-beg|end-*}" idea after all.
How to test
===========
Get the file
http://www.karakas-online.de/EN-Book/EN-Book.sgml
from
http://www.karakas-online.de/EN-Book/formats.html
As you can see, this is a perfect, valid, bug-free SGML document, whose rendering in HTML
you can admire in
http://www.karakas-online.de/EN-Book/
To reproduce the error, you must create an empty bibliography.sgml file:
touch bibliography.sgml
then run
po4a-gettextize -v --option debug=generic -f sgml -m EN-Book.sgml -M iso-8859-1 -p EN-Book-en.po
You will see some other errors first:
- An error saying "CONTRIB not recognized". Go and enter contrib in the list of
docbook tags in SGML.pm:
$self->set_tags_kind("translate" => "abbrev acronym arg artheader attribution ".
"contrib ".
"date ".
- An error saying "KEYWORD not recognized". Go and change "kerword" to
"keyword" in SGML.pm:
"imageobject important index indexterm informaltable itemizedlist ".
"keyword keywordset ".
"legalnotice listitem lot "
After these two changes, SGML.pm will be able to continue processing up to the point where
it encounters the above situation.