Hello,
I should have released these patches some time ago, but Alioth was having trouble.
Sorry if this mail is a little bit long.
Some are not yet "ready for production"; they are provided to let you know
that I'm working on those subjects (and also to gather some ideas).
I also had to work on the test suite (the check script) and added a
stats.sh script for regression tests. Here is how this script
formats the regression test statistics:
$ ./stats.sh orig work
        IGN    OK   OK2  WOK1  WOK2  WOK3   PBS WDIFF
IGN    1698     0     0     0     0     0     0     0
OK        0   125     0     0     0     0     0     0
OK2       0     0  1564     4     2     0     0     0
WOK1      0     8     0    89     0     1     0     0
WOK2      0     0     0     0   208     0     0     1
WOK3      0     2    11     2     4   301     3     0
PBS       0    35   150    15    48    43   585    14
WDIFF     0     1     9     4     5     8     0    39
total: 4979 | 4979
(It takes two directories as arguments: the two directories containing the
results of the check script, i.e. the LISTE files. It creates a stats_work
directory.)
You can read this table like this: rows are the categories in orig, and
columns are the categories in work. For example,
11 man pages which were in the WOK3 category in orig are now in OK2.
Those pages can be found in stats_work/WOK3_OK2.
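As an illustration of how such a table can be built, here is a minimal Python sketch. It is not the code of stats.sh: the names transition_table and format_table, and the idea of holding each run as a page-to-category dictionary, are assumptions made for this example.

```python
from collections import Counter

CATEGORIES = ["IGN", "OK", "OK2", "WOK1", "WOK2", "WOK3", "PBS", "WDIFF"]


def transition_table(before, after):
    # before/after map a man page name to its category in each run
    # (hypothetical input format). Rows are categories in the first run,
    # columns are categories in the second run.
    counts = Counter(
        (before[page], after[page]) for page in before if page in after
    )
    return [[counts[(row, col)] for col in CATEGORIES] for row in CATEGORIES]


def format_table(table):
    # Print the matrix with a header line and one label per row.
    lines = [" " * 5 + "".join(f"{name:>6}" for name in CATEGORIES)]
    for name, row in zip(CATEGORIES, table):
        lines.append(f"{name:<5}" + "".join(f"{n:>6}" for n in row))
    return "\n".join(lines)
```

With this layout, the cell at row WOK3 and column OK2 counts the pages that moved from WOK3 to OK2 between the two runs.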
The different categories are:
IGN    man pages po4a refused to operate on (e.g. pages generated by
       Pod::Man)
OK     diff -uBb didn't see any difference.
       This can contain very rare misformatting.
OK2    diff -uBb didn't see any difference after converting hyphens to
       minus signs, `` to ", and '' to " in both man pages.
       This contains a little more misformatting; for example, a man
       page referring to an empty argument ('') should not display only ".
WOK1   wdiff doesn't see any difference after the same modifications
WOK2   This tries to detect changes in the hyphenation of words (but has
       more false negatives)
WOK3   This removes minus signs, and thus detects more changes in
       hyphenation
PBS    po4a preferred to stop processing the man page (unsupported
       macro, ...)
WDIFF  These are probably bugs in po4a or in the man page (I started
       reporting some of them in the BTS, which is another way of
       improving po4a statistics)
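The normalization behind the OK2 category can be sketched as follows in Python. This is only an illustration under assumptions: the function names are hypothetical, and the character used for a rendered hyphen (U+2010 here) depends on the output device, so it is a parameter rather than a fact about the check script.

```python
def normalize_ok2(text, hyphen="\u2010"):
    # Map the rendered hyphen to a minus sign, and both `` and '' to ".
    # The default hyphen character is an assumption for this sketch.
    return text.replace(hyphen, "-").replace("``", '"').replace("''", '"')


def same_after_normalization(original, generated):
    # Two rendered pages fall into OK2 when they only differ by these
    # substitutions.
    return normalize_ok2(original) == normalize_ok2(generated)
```

The sketch also shows why OK2 can hide misformatting: an empty argument written as '' is reduced to a single ", so a page that wrongly displays only " compares equal.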
In the table above, large numbers in the bottom-left corner (below the
diagonal) are usually an improvement (with the exception of the IGN column).
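To make this reading concrete, here is a small Python sketch that sums the cells of the first table of this mail by position. The function name summarize and the key names are assumptions for this example; cells below the diagonal are only "usually" improvements, so the sketch reports positions and does not judge them.

```python
# First table of this mail: rows are categories in orig, columns in work.
TABLE = [
    [1698, 0, 0, 0, 0, 0, 0, 0],      # IGN
    [0, 125, 0, 0, 0, 0, 0, 0],       # OK
    [0, 0, 1564, 4, 2, 0, 0, 0],      # OK2
    [0, 8, 0, 89, 0, 1, 0, 0],        # WOK1
    [0, 0, 0, 0, 208, 0, 0, 1],       # WOK2
    [0, 2, 11, 2, 4, 301, 3, 0],      # WOK3
    [0, 35, 150, 15, 48, 43, 585, 14],  # PBS
    [0, 1, 9, 4, 5, 8, 0, 39],        # WDIFF
]


def summarize(table):
    # Split the page counts by position relative to the diagonal.
    # The IGN column (index 0) is counted separately, since moving a
    # page to IGN is the stated exception.
    totals = {"below": 0, "diagonal": 0, "above": 0, "ign_column": 0}
    for r, row in enumerate(table):
        for c, n in enumerate(row):
            if r == c:
                totals["diagonal"] += n
            elif c == 0:
                totals["ign_column"] += n
            elif c < r:
                totals["below"] += n
            else:
                totals["above"] += n
    return totals
```

Calling summarize(TABLE) gives the number of pages that kept their category, moved below the diagonal, or moved above it.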
Here are the patches for the Man module:
+ comments
It recognizes some (probably incorrect, but common) comment lines.
Here are the results of the regression tests for this patch:
        IGN    OK   OK2  WOK1  WOK2  WOK3   PBS WDIFF
IGN    1698     0     0     0     0     0     0     0
OK        0   125     0     0     0     0     0     0
OK2       0     0  1570     0     0     0     0     0
WOK1      0     0     0    98     0     0     0     0
WOK2      0     0     0     0   209     0     0     0
WOK3      0     0     0     0     0   323     0     0
PBS       0     0     3     0     1     1   885     0
WDIFF     0     0     1     1     0     1     0    63
+ nested_fonts
It deals with the nested font issue.
I have an idea for simplifying it a lot, but I think it can be
applied as is, because it does a good job.
The only remaining issue is with "un-terminated" fonts, as in:
Hello, my name is \fINicolas \fBFRANÇOIS
IMHO, groff has no nested fonts (with some exceptions, such as
SB and some italic and bold faces, or when using an exotic tmac).
\fIfoo\fBbar\fR is equivalent to \fIfoo\fR\fBbar\fR (with the
exception of \fP).
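The equivalence above can be illustrated with a short Python sketch that splits a line into (font, text) runs, where each font escape simply replaces the current font. This is not the code of the patch: the function name font_runs is hypothetical, and only single-letter escapes such as \fR, \fB, \fI and \fP are handled (forms like \f(CW or \f[...] are ignored here).

```python
import re

# Single-letter font escapes only; a simplification for this sketch.
FONT_ESCAPE = re.compile(r"\\f([A-Z])")


def font_runs(text, default="R"):
    # Return a list of (font, chunk) pairs. Fonts do not nest: a new
    # escape replaces the current font. \fP goes back to the previously
    # selected font.
    runs = []
    current = previous = default
    pos = 0
    for match in FONT_ESCAPE.finditer(text):
        chunk = text[pos:match.start()]
        if chunk:
            runs.append((current, chunk))
        new = match.group(1)
        if new == "P":
            new = previous
        previous, current = current, new
        pos = match.end()
    tail = text[pos:]
    if tail:
        runs.append((current, tail))
    return runs
```

Both \fIfoo\fBbar\fR and \fIfoo\fR\fBbar\fR yield the same two runs, and an "un-terminated" line still splits cleanly; what the sketch cannot tell is which font should be active on the following line.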
Here are the results of the regression tests for this patch:
        IGN    OK   OK2  WOK1  WOK2  WOK3   PBS WDIFF
IGN    1698     0     0     0     0     0     0     0
OK        0   125     0     0     0     0     0     0
OK2       0     0  1562     0     0     0     8     0
WOK1      0     0     0    98     0     0     0     0
WOK2      0     0     0     0   209     0     0     0
WOK3      0     1     6     2     1   307     6     0
PBS       0     1    81     5    27    31   738     7
WDIFF     0     0     0     0     0     0     0    66
+ arg_next_line
It allows arguments to be provided on the next line for some macros
(.SH, .I, ..., .BR, ...).
It works fine, but would require some cleanup (lots of redundant
code).
It applies cleanly to CVS, but requires the 'nested_fonts' patch
to work correctly.
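The idea can be sketched in Python as follows. This is an illustration only, not the patch itself: the function name, the (macro, argument) output format, and the exact set of macros are assumptions.

```python
# Assumed subset of macros that take the next input line as their
# argument when none is given on the macro line itself.
NEXT_LINE_MACROS = {".SH", ".SS", ".B", ".I", ".BR", ".RB", ".IR", ".RI"}


def pair_macros_with_args(lines):
    # Return (macro, argument_text) pairs; macro is None for text lines.
    pairs = []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if not line.startswith("."):
            pairs.append((None, line))
            continue
        macro, _, arg = line.partition(" ")
        arg = arg.strip()
        takes_next = (
            macro in NEXT_LINE_MACROS
            and not arg
            and i < len(lines)
            and not lines[i].startswith((".", "'"))
        )
        if takes_next:
            # The argument was given on the following line: consume it.
            arg = lines[i]
            i += 1
        pairs.append((macro, arg))
    return pairs
```

For example, a ".SH" line followed by "NAME" is paired exactly like ".SH NAME", so both forms can then go through the same handling.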
Here are the results of the regression tests for this patch (with the
previous patch also applied):
        IGN    OK   OK2  WOK1  WOK2  WOK3   PBS WDIFF
IGN    1698     0     0     0     0     0     0     0
OK        0   125     0     0     0     0     0     0
OK2       0     0  1567     1     0     0     0     2
WOK1      0     0     0    97     0     1     0     0
WOK2      0     0     0     0   209     0     0     0
WOK3      0     2    11     2     4   301     3     0
PBS       0     1   109     9    34    32   692    13
WDIFF     0     0     0     0     0     0     0    66
(Here most of the new 'WDIFF' man pages are due to bugs in the man page
itself, for which a bug was filed in the BTS.)
+ dot_lines
po4a generated some lines starting with a dot. In those cases, a \&
should be added to allow the line to be displayed. For example:
.I ../file
is displayed in groff, but
\fI../file\fR won't be displayed.
It also fixes the same issue for lines starting with a "'".
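A minimal Python sketch of this protection is given below. It is an assumption-laden illustration, not the patch: the function name protect_line is hypothetical, only single-letter font escapes are skipped, and placing the \& right before the dot or quote is one possible choice.

```python
import re

# Leading single-letter font escapes such as \fI or \fB (simplification).
LEADING_FONTS = re.compile(r"^(?:\\f[A-Z])*")


def protect_line(line):
    # If the first character after any leading font escapes is a dot or
    # a single quote, insert the zero-width \& in front of it so that the
    # line is treated as text.
    prefix = LEADING_FONTS.match(line).group(0)
    rest = line[len(prefix):]
    if rest.startswith((".", "'")):
        return prefix + "\\&" + rest
    return line
```

Lines that do not start with one of these two characters are returned unchanged.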
        IGN    OK   OK2  WOK1  WOK2  WOK3   PBS WDIFF
IGN    1698     0     0     0     0     0     0     0
OK        0   125     0     0     0     0     0     0
OK2       0     0  1570     0     0     0     0     0
WOK1      0     8     0    90     0     0     0     0
WOK2      0     0     0     0   209     0     0     0
WOK3      0     0     0     0     0   323     0     0
PBS       0     0     0     0     0     0   890     0
WDIFF     0     1     5     0     0     0     0    60
+ hyphen
I felt obliged to fix this because I told Martin that replacing
hyphens with minus signs was always allowed.
In fact, hyphens should not be modified in:
- .so/.mso arguments
- after a \s (font size modifier, e.g. \s-2)
I also added a comment on why I *hate* hyphens.
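The two exceptions can be sketched in Python like this. It is a simplified illustration, not the patch: the function name is hypothetical, and only the plain \s-N form is recognized (other size forms are not handled here).

```python
import re

# A .so or .mso request line: its argument is a file name.
SO_REQUEST = re.compile(r"^[.']\s*m?so\b")

# A hyphen that is neither already escaped (\-) nor the sign of a \s
# size change such as \s-2.
PLAIN_HYPHEN = re.compile(r"(?<!\\s)(?<!\\)-")


def hyphens_to_minus(line):
    # Leave .so/.mso lines alone, and escape the remaining plain hyphens.
    if SO_REQUEST.match(line):
        return line
    return PLAIN_HYPHEN.sub(lambda match: "\\-", line)
```

Hyphens that are already written as \- are left untouched, so applying the function twice gives the same result as applying it once.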
Here are the results of the regression tests for this patch:
        IGN    OK   OK2  WOK1  WOK2  WOK3   PBS WDIFF
IGN    1698     0     0     0     0     0     0     0
OK        0   125     0     0     0     0     0     0
OK2       0     0  1570     0     0     0     0     0
WOK1      0     0     0    98     0     0     0     0
WOK2      0     0     0     0   209     0     0     0
WOK3      0     0     0     0     0   323     0     0
PBS       0     0     0     0     0     0   890     0
WDIFF     0     0     2     1     3     5     0    55
+ new_macros
Some new macros:
.R
.EX and .EE
.so and .mso
.cs
minimal support (when no argument is given) for:
.ce
.ul
.cu
Here are the results of the regression tests for this patch:
        IGN    OK   OK2  WOK1  WOK2  WOK3   PBS WDIFF
IGN    1698     0     0     0     0     0     0     0
OK        0   125     0     0     0     0     0     0
OK2       0     0  1570     0     0     0     0     0
WOK1      0     0     0    98     0     0     0     0
WOK2      0     0     0     0   209     0     0     0
WOK3      0     0     0     0     0   323     0     0
PBS       0    27    11     0     0     1   843     8
WDIFF     0     0     0     0     0     0     0    66
+ escape
It tries to deal with the \c escape.
It still needs some work.
+ others
Some other minor points that I could isolate from my working directory.
+ split_args
This fixes an issue with the limits.conf man page.
It was also reported in #268904.
It adds one string for translation.
+ all
All the above patches, and more.
It also contains some comments that should be removed.
The results are presented in the first table.
Comments (and commits;) are welcome.
Thanks to those who read this mail to the end,
--
Nekral