Sorry for the delay.
On Wed, Sep 22, 2004 at 11:19:30PM +0200, Nicolas François wrote:
Hello,
I should have released these patches a few time ago, but Alioth had troubles.
Yeah, you really should. Now, you'll have to create an account on alioth so
that we can give you the cvs write access, so that you can commit them
yourself...
Sorry if this mail is a little bit long.
You must be kidding... Thanks for your work on po4a. And my answer is even
longer :)
Some are still not "ready for production", and are provided
to inform you
I'm working on those subjects (and also to grab some ideas).
ok, here we go.
formats regression tests statistics:
$ ./stats.sh orig work
IGN OK OK2 WOK1 WOK2 WOK3 PBS WDIFF
IGN 1698 0 0 0 0 0 0 0
OK 0 125 0 0 0 0 0 0
OK2 0 0 1564 4 2 0 0 0
WOK1 0 8 0 89 0 1 0 0
WOK2 0 0 0 0 208 0 0 1
WOK3 0 2 11 2 4 301 3 0
PBS 0 35 150 15 48 43 585 14
WDIFF 0 1 9 4 5 8 0 39
total: 4979 | 4979
Please add "was \ is" (or the contrary ;) in the upper left cell of the tab
to help my understanding.
The different categories are:
IGN man pages po4a refused to operate on (e.g. wad generated by
Pod::Man)
OK diff -uBb didn't see any difference
This can contain very rare misformatting
OK2 diff -uBb didn't see any difference after converting hyphens to
minus sign, `` to ", and '' to " in both man pages
This contains a little bit more misformatting, for example an man
page referring to an empty argument ('') should not display only ".
WOK1 wdiff doesn't see any difference after the same modifications
WOK2 This tries to detect changes in the hyphenation of words (but has
more false negative)
WOK3 This removes minus signs, and thus detects more changes in
hyphenation
PBS po4a preferred to stop processing the man page (non supported
macro, ...)
WDIFF These are probably bugs in po4a or in the man page (I started
reporting some of them in the BTS, which is another way of
improving po4a statistics)
In the table above, it is usually an improvement to have big numbers on
the bottom left corner (with the exception of the IGN column).
Please put this as comment in the script (at least, if not by default under
the table).
Where is the WOK category (diff sees a difference, but not wdiff even before
edition) ?
Here are the patches for the Man module:
+ comments
It recognize some (probably incorrect, but usual) comment lines.
Here are the results of the regression tests for this patch:
Question: the info groff part you cite in comment seems to imply that
.B toto \" a comment
is a valid construct, but this is not handled by the code. You should at
least add a comment stating so, shouldn't you?
That being said, if this feature is not used by the existing pages, don't
bother implementing (=> let's commit).
+ nested_fonts
It deals with the nested font issue.
I have an idea on how to simplify it a lot, but I think it could be
applied, because it is doing a good job.
The only remaining issue is with "un-terminated" fonts, as in:
Hello, my name is \fINicolas \fBFRANÇOIS
IMHO, in groff, there is no nested font (with some exceptions, like
SB, and some italic and bold faces, or by using exotic tmac).
\fIfoo\fBbar\fR is equivalent to \fIfoo\fR\fBbar\fR (with the
exception of the \fP).
Mmm. That's a very tought problem. As you saw in the code, I already tried
to tackle it, in vain so far. I must admit that the regression result you
show are rather impressive.
But I do not like this patch anyway, for several reasons:
- no print "NEKRAL: bla" should remain. Change them to debugging statements
or regular (translated) warnings, or whatever. Add a new debugging
category if needed. Same for "# print STDERR bla".
- you are kind of cheating by not die()ing when you said in the comments
that you should... It removes a lot of PBS :)
If the pages are ok afterward, then remove the fixme asking to die. If
not, die. Undetected errors are worse for the user than unprocessed
pages. He can always change the structure of the page to make po4a happy.
- you say that a proper implementation of \fP needs to keep track of the
font used at the end of the last paragraph. Are we in bold after
\fB\fI\fB\fP\fP ? (in which case, we need to keep track of the whole
stack)
- this is really complex, I'm concerned about maintaining the beast when
you leave. Ie, you shouldn't have said you have a simpler solution in
mind, now I want to see it ;)
- there is one stylistic rule I like in Perl: if you use a given variable
only once, remove it, you don't need it. @tmp_array1 falls in that
category. Likewise, I'd prefer to write (! $old eq "") as ($old ne
"")
[please don't get offended by those stylistic remarks, this code is
incredibly hairly, I'd like to keep it under control]
- I'm not sure I understand the "two concecutive \fR are not needed" part
What about the following:
while ($str =~ /^(.*)\\fR # Get the beginning, until a \fR
# Now, get the middle
((?: # [this parenthesis clusters, don't capture]
[^\\] # the middle is either not '\'
| # or
\\(?!f) # '\', but not followed by 'f' (look ahead)
)*)
\\fR # end sentinel
(.*)$/sx) {# use x modifier to add documentation and space
$str = "$1\\fR$2$3";
}
Ok, I admit, this is not tested at all, and I just discovered the
"negative look ahead" black magic in man perlre, I may well be wrong.
I'm
sure you'll correct me if it's the case, since you use this magic
elsewhere.
We should use the x modifier more often to document the REs.
All that to say that I'd prefer to keep this one out of the CVS for now.
+ arg_next_line
It allows arguments to be provided on the next line for some macros
(.SH, .I, ..., .BR, ...)
It works fine, but would require some cleanup (lots of redundant
code).
Erm. You'll say I'm picky, but I'd prefer you to do this cleanup before
commit :)
It can be applied cleanly on CVS, but require the
'nested_fonts' patch
to operate cleanly.
Well, you really should send us more mails to avoid having to deal with
interdepending patches. Those beast are sooo nasty to deal with, I'm really
sorry for you.
(here most of the new 'WDIFF' man pages are bug in the
man page, and a
bug was filed in the BTS)
Hey, cool.
+ dot_lines
po4a generated some lines starting with a dot. In those cases, a \&
should be added to allow the line to be displayed. (for exemple:
.I ../file
is displayed in groff, but
\fI../fil/\fR won't be displayed
It also fix the same issue for lines starting by a "'"
Arg, I'm so bad! I thought I got this one years ago. Thanks for chasing it.
Why don't you change it in paragraphs after the wrap? You'd get 'em all, and
the code for that would be in only one location.
For example, you may want to merge your change to the
$str =~ s/\n([.'"])/ $1/mg; #'
just before. What about:
$str =~ s/^ # at the begin of line
(?:\\f.)? # possibly followed by a font modifier
([.'"]) # avoid '.' and '''
/\\&$1$2 # add \& in front to fix it
/mgx;
Same story, not tested at all.
+ hyphen
I had a obligation to fix this because I said Martin that replacing
hyphens by minus signs were always allowed.
In fact, it should not be modified in
- .so/.mso arguments
- after a \s (font size modifier, e.g. \s-2)
I also added a comment on why I *hate* hyphens.
I like this one ;)
Change all items to an itemize in the comment, and, if you don't mind,
remove your name. That's just a suggestion. No need to overload the code
with who hate hyphen the most. I do to. But, by the way, add your name to
the authors of this module, and of po4a.
Please document the RE. I'm sure I'll forget that ?<! means negative look
behind until I'll need to read it ;)
s/they most of the time break/they break most of the time/
+ new_macros
Some new macros:
.R
.EX and .EE
.so and .mso
.cs
minimal support (when no argument is given) for:
.ce
.ul
.cu
This one is not in the tarbal. :-/
+ escape
It tries to deal with the \c escape.
It still need some work.
Kill the die I did put if it's not needed anymore.
You could override the pushline function, if you understand Perl objects. I
don't.
What is the effect of \c, again?
+ others
some other minor points that I could isolate from my working directory
What is the change for cdrdao ? It deals with non escaped leading spaces, am
I right? Does it still deal with not leading spaces? I don't think so. you
may want to kill those ^
Ok for the SH macro.
For the space at the begining of paragraph, shouldn't it be handled along
with ' or . at the beginning of line?
+ split_args
This fix an issue for the limits.conf man page.
It was also reported in #268904
It adds one string for the translation.
Ok, I tried to understand your bug report three times, in vain. It's not now
at 1 am that I will understand the patch. I belive you on this one.
+ all
all the above patch, and more.
It also contains some comments that should be removed.
The results are presented in the first table.
Nope, forget about this one ;)
Comments (and commits;) are welcome.
I will commit some of them (I'm now offline), but I'd really prefer you to
get the write access. It would help you concentrating on the bugs instead of
playing with interdependent patches.
Did you use quilt, at least? ;)
Thanks for those who read this mail to the end,
Many many thanks to you.
Mt.
PS: I forgot whether you're on the list, thus the cc.