Hello,
There is a bug reported on the Alioth tracker against the Sgml module.
I did not notice it before.
Was there a notification on po4a-devel(a)lists.alioth?
Otherwise, is there a way to get some notifications from the tracker?
Then regarding the bug report:
* I've already uploaded a simple fix for a typo reported in the bug
report.
* the SGML book uses the contrib and epigraph tags. Are those tags
standard? Can I add them to the translate category?
* for the main part of the bug report, I propose to escape '<', '>' and
'&' to {PO4A-lt}, {PO4A-gt} and {PO4A-amp} before feeding nsgmls, and
to change them back to the original characters in the cdata type.
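
A minimal sketch of the escaping round trip proposed in the last item. The
helper names are hypothetical and not taken from Sgml.pm, and it assumes the
escaping is applied only to text that must not be parsed as markup:

```perl
use strict;
use warnings;

# Placeholders as proposed above; the hash and sub names are hypothetical.
my %to_placeholder = (
    '<' => '{PO4A-lt}',
    '>' => '{PO4A-gt}',
    '&' => '{PO4A-amp}',
);
my %from_placeholder = reverse %to_placeholder;

# Replace the three markup characters before the text reaches the parser.
sub escape_markup {
    my ($text) = @_;
    $text =~ s/([<>&])/$to_placeholder{$1}/g;
    return $text;
}

# Restore the original characters when the cdata is written out.
sub unescape_markup {
    my ($text) = @_;
    $text =~ s/(\{PO4A-(?:lt|gt|amp)\})/$from_placeholder{$1}/g;
    return $text;
}

my $cdata   = 'if ($a < $b && $b > 0)';
my $escaped = escape_markup($cdata);
print "$escaped\n";
print unescape_markup($escaped), "\n";
```

The round trip is only lossless if the input never contains a literal
{PO4A-lt}, {PO4A-gt} or {PO4A-amp} of its own.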
I also had some other issues with this PHP book:
* around line 795, PO4A-beg/end are changed back to their SGML
counterparts only if they appear at the beginning of a line.
Why only at the beginning?
This causes some PO4A-beg/end to be kept in the output document.
* also, the content of the cdata is pushed, but the buffer is not
flushed, so it can be pushed too early.
In my patch, I appended the content of the cdata to $buffer.
Should the content of cdata be verbatim? Shouldn't it be translated?
* also, I don't really understand what is done with the leading spaces
and the added trailing '\n', but this is probably not an issue.
* around line 535, & is changed to {PO4A-amp} if it is not the beginning
of an entity.
This uses:
while ($origfile =~ /^(.*?)&([^;\s]*);(.*)$/s) {
...
}
This regex is too permissive. It causes the following line:
]]><![CDATA[&d_op=viewdownload&cid=79\">Web Installer...
to be changed into:
]]><![CDATA[_op=viewdownload=79\">Web Installer...
I found the following grammar (for XML):
http://www.w3.org/TR/REC-xml/#NT-Name
It's probably too complicated (the Letter or Digit rules use a lot of
Unicode chars). So I propose to only allow ASCII chars (with a
non-greedy match):
while ($origfile =~ /^(.*?)&([A-Za-z_:][-_:.A-Za-z0-9]*?);(.*)$/s) {
...
}
* my last point: can anybody have a look at the sgmldiff between
EN-Book.sgml and po4a-normalize.output?
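
To illustrate the entity regex point above, here is a small standalone
comparison of the permissive and the ASCII-only name patterns. It is only a
sketch: the helper name and the sample strings are made up for the example,
and the patterns are reduced to the entity part of the two regexes.

```perl
use strict;
use warnings;

# Entity part of the current (permissive) and proposed (ASCII-only) regexes.
my $permissive = qr/&([^;\s]*);/;
my $strict     = qr/&([A-Za-z_:][-_:.A-Za-z0-9]*?);/;

# Hypothetical helper: return the matched entity name, or undef if none.
sub entity_name {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

# A URL query string followed by a stray ';', then a real entity reference.
for my $text (q{x.php?a=1&b=2">link</a>;}, '&amp;') {
    my $loose = entity_name($permissive, $text) // '(none)';
    my $tight = entity_name($strict, $text) // '(none)';
    print "permissive: $loose | strict: $tight\n";
}
```

With the permissive pattern, anything between an '&' and the next ';' counts
as an entity name as long as it contains no whitespace. The strict pattern
stops at the first character that cannot be part of a name.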
I'm highly incompetent regarding SGML and I based my analysis on po4a and
sgmldiff outputs. So please stop me if any of the above statements is
wrong.
Attached is the patch I plan to commit this week.
TIA,
--
Nekral