On Fri, Sep 24, 2004 at 01:28:23AM +0200, Martin Quinson wrote:
On Wed, Sep 22, 2004 at 11:19:30PM +0200, Nicolas François wrote:
> I should have released these patches a few time ago, but Alioth had troubles.
Yeah, you really should. Now, you'll have to create an account on alioth so
that we can give you the cvs write access, so that you can commit them
yourself...
Alioth is up and running again. Here is my account: nekral-guest.
> formats regression tests statistics:
> $ ./stats.sh orig work
> IGN OK OK2 WOK1 WOK2 WOK3 PBS WDIFF
[...]
> IGN 1698 0 0 0 0 0 0 0
Please add "was \ is" (or the contrary ;) in the upper left cell of the tab
to help my understanding.
Done. (In fact dir1\dir2)
> The different categories are:
> IGN man pages po4a refused to operate on (e.g. wad generated by
> Pod::Man)
[...]
> WDIFF These are probably bugs in po4a or in the man page (I
started
> reporting some of them in the BTS, which is another way of
> improving po4a statistics)
>
> In the table above, it is usually an improvement to have big numbers on
> the bottom left corner (with the exception of the IGN column).
Please put this as comment in the script (at least, if not by default under
the table).
Done. I added a Usage for both scripts.
The categories are documented in check, because stats.sh could be used
with any other regression test tool.
Where is the WOK category (diff sees a difference, but not wdiff even
before
edition) ?
I tried not to have too much categories, because they increase the
execution time.
I propose to add more categories latter, when there will be less WDIFF.
At this time, I'm not sure I'm reading WOK1 and WOK2 as very different.
> Here are the patches for the Man module:
>
> + comments
> It recognize some (probably incorrect, but usual) comment lines.
> Here are the results of the regression tests for this patch:
Question: the info groff part you cite in comment seems to imply that
.B toto \" a comment
is a valid construct, but this is not handled by the code. You should at
least add a comment stating so, shouldn't you?
Yes, it is valid. (At least it is recognized as such by my groff).
My problem was: Ok, I recognized a comment inside a paragraph, what
should I do with this comment ?
Can I show it in the po (and how?)? Is there any interest in doing this?
Can I just trash the comments?
That being said, if this feature is not used by the existing pages,
don't
bother implementing (=> let's commit).
At this time, '.B toto \" a comment' will generate
'\fBtoto \" a comment\fR', which is buggy because the Bold font is
not terminated and \" may mask the end of the paragraph.
It is largely used for commenting .de, .if,... (but those macros are not
supported (and will probably never be).
I've tried a grep on my pages, but there was to much noise. However The
issue should have raise in the regression test, so I think it is not used
at this time.
Maybe a die could do the trick ("comments are not supported yet, please
contact ...")
> + nested_fonts
> It deals with the nested font issue.
> I have an idea on how to simplify it a lot, but I think it could be
> applied, because it is doing a good job.
> The only remaining issue is with "un-terminated" fonts, as in:
> Hello, my name is \fINicolas \fBFRANÇOIS
> IMHO, in groff, there is no nested font (with some exceptions, like
> SB, and some italic and bold faces, or by using exotic tmac).
> \fIfoo\fBbar\fR is equivalent to \fIfoo\fR\fBbar\fR (with the
> exception of the \fP).
Mmm. That's a very tought problem. As you saw in the code, I already tried
to tackle it, in vain so far. I must admit that the regression result you
show are rather impressive.
But I do not like this patch anyway, for several reasons:
- no print "NEKRAL: bla" should remain. Change them to debugging statements
or regular (translated) warnings, or whatever. Add a new debugging
category if needed. Same for "# print STDERR bla".
- you are kind of cheating by not die()ing when you said in the comments
that you should... It removes a lot of PBS :)
If the pages are ok afterward, then remove the fixme asking to die. If
not, die. Undetected errors are worse for the user than unprocessed
pages. He can always change the structure of the page to make po4a happy.
- you say that a proper implementation of \fP needs to keep track of the
font used at the end of the last paragraph. Are we in bold after
\fB\fI\fB\fP\fP ? (in which case, we need to keep track of the whole
stack)
Yes.
That's the reason why I replace \fP by the exact font (starting from the left).
In the patch a loop is in charge of replacing \fB\fI\fB\fP\fP by:
\fB\fI\fB\fI\fP
and then
\fB\fI\fB\fI\fB
And then I can clean up the "nested fonts"
I Think doing this can avoid keeping track of the whole stack
Another insane example:
.BI t1 t2 t3\fRt4\fIt5 t6\fPt7\fPt8 t9
(such groff requests were found)
- this is really complex, I'm concerned about maintaining the
beast when
you leave. Ie, you shouldn't have said you have a simpler solution in
mind, now I want to see it ;)
1) change all font modifiers (e.g. .B, .RI,...) to the corresponding \f
I'm thinking of doing this in shiftline (any objection ?), because I
need to handle these lines in parse, in the .TP, .SH, and maybe other
macro subroutines.
Then in pre_trans:
2) split the paragraph on \f
3) play with the first letters of each element of this array
(they can be [1-9],B,I,R,CW,P,s(-|+)[0-9])
- If the first letter is not recognized, then die (or give a \f to the
translator)
- If two consecutive elements start with the same font, merge them
A special care shall be taken for the first elements (maybe the two first,
because of \fP) and for the last (maybe the last two) elements.
I will probably do this with a global context that will keep the
current and previous font (which will be reset by some macros like
.SH).
I will do some experiments on this.
I think this will be better, cleaner, smaller.
- there is one stylistic rule I like in Perl: if you use a given
variable
only once, remove it, you don't need it. @tmp_array1 falls in that
category. Likewise, I'd prefer to write (! $old eq "") as ($old ne
"")
[please don't get offended by those stylistic remarks, this code is
incredibly hairly, I'd like to keep it under control]
I didn't knew ne ;) thanks!
How can I remove a variable (or maybe you mean I could have done it
without using variable with map)?
Using perldoc is still a little bit confusing for me. So please, no
hesitation on stylistic comments.
- I'm not sure I understand the "two concecutive \fR are
not needed" part
What about the following:
while ($str =~ /^(.*)\\fR # Get the beginning, until a \fR
# Now, get the middle
((?: # [this parenthesis clusters, don't capture]
[^\\] # the middle is either not '\'
| # or
\\(?!f) # '\', but not followed by 'f' (look ahead)
)*)
\\fR # end sentinel
(.*)$/sx) {# use x modifier to add documentation and space
$str = "$1\\fR$2$3";
}
Ok, I admit, this is not tested at all, and I just discovered the
"negative look ahead" black magic in man perlre, I may well be wrong.
I'm
sure you'll correct me if it's the case, since you use this magic
elsewhere.
Yes. This is equivalent. I've tested it (with some minor changes).
I will use it, but this will be removed once I will have another font
algorithm.
We should use the x modifier more often to document the REs.
Right!
I'm not sure I can still read my regexp now!
All that to say that I'd prefer to keep this one out of the CVS
for now.
> + arg_next_line
> It allows arguments to be provided on the next line for some macros
> (.SH, .I, ..., .BR, ...)
>
> It works fine, but would require some cleanup (lots of redundant
> code).
Erm. You'll say I'm picky, but I'd prefer you to do this cleanup before
commit :)
I never said the patches were ready for commit;)
> + dot_lines
> po4a generated some lines starting with a dot. In those cases, a \&
> should be added to allow the line to be displayed. (for exemple:
> .I ../file
> is displayed in groff, but
> \fI../fil/\fR won't be displayed
> It also fix the same issue for lines starting by a "'"
Arg, I'm so bad! I thought I got this one years ago. Thanks for chasing it.
Why don't you change it in paragraphs after the wrap? You'd get 'em all, and
the code for that would be in only one location.
For example, you may want to merge your change to the
$str =~ s/\n([.'"])/ $1/mg; #'
just before. What about:
$str =~ s/^ # at the begin of line
(?:\\f.)? # possibly followed by a font modifier
([.'"]) # avoid '.' and '''
/\\&$1$2 # add \& in front to fix it
/mgx;
You're right, doing it in post_trans is better.
However, there's currently a minor bug with .TP:
.TP
.BI foo bar
will proposes for translation the ".BR foo bar" string instead of
"B<foo>I<bar>".
This issue doesn't raise any difference in the regression test, but
escaping the leading dot add some pages to WDIFF.
So I only propose a degraded mode:
# # No . on first char, or nroff will think it's a macro
- $str =~ s/\n([.'"])/ $1/mg; #'
+ unless (defined $self->{type} && $self->{type} =~ m/^TP$/) {
+ # This doesn't work when the translated string start
+ # by a macro (which should be avoided).
+ # At this time, .TP lines followed by .B, .BI,... produce
+ # such strings.
+ # This will be fixed latter.
+ #
+ # Only "." are handled now.
+ # "'" will be handled latter (when it will also be handled in the
+ # parse subroutine.
+ $str =~ s/^( # at the beginning of a line
+ (?:[BI]<)? # possibly followed by a font modifier
+ [.]) # avoid "." and "'"
+ /\\&$1/mgx; # add \& in front to fix it
+ }
+ $str =~ s/\n([.'])/\n\\&$1/mg; # degraded mode for "'", and
"." in TP
+ # (doesn't handle the first line)
The unless, a part of the comments and the last regexp could be
removed latter. As ' could be added to [.] latter.
This seems to work fine. I will prepare a patch or commit it.
> + new_macros
> Some new macros:
> .R
> .EX and .EE
> .so and .mso
> .cs
> minimal support (when no argument is given) for:
> .ce
> .ul
> .cu
This one is not in the tarbal. :-/
It should be attached this time.
Do you think the .so/.mso part is OK ?
> + escape
> It tries to deal with the \c escape.
> It still need some work.
Kill the die I did put if it's not needed anymore.
You could override the pushline function, if you understand Perl objects. I
don't.
I've done some tests. Overriding pushline is OK. I will try to handle \c
there.
What is the effect of \c, again?
info groff --index-search "Line Control"
> + others
> some other minor points that I could isolate from my working directory
What is the change for cdrdao ? It deals with non escaped leading spaces, am
I right? Does it still deal with not leading spaces? I don't think so. you
may want to kill those ^
cdrdao uses:
.IP CATALOG\ "ddddddddddddd"
(Here, the quote have to be displayed)
When this passes through pushmacro, additional quotes shouldn't be added,
because
.IP "CATALOG\ "ddddddddddddd""
(this is understood by groff as an IP request with 2 arguments: 'CATALOG\ '
and 'ddddddddddddd""')
The regexp proposed in the patch would be shorter with a negative
look-behind regexp.
Ok for the SH macro.
Do you think it could be done in the $macro{'SH'} sub ?
For the space at the begining of paragraph, shouldn't it be
handled along
with ' or . at the beginning of line?
> + split_args
> This fix an issue for the limits.conf man page.
> It was also reported in #268904
> It adds one string for the translation.
Ok, I tried to understand your bug report three times, in vain. It's not now
at 1 am that I will understand the patch. I belive you on this one.
It is probably useful for only one man page. It may not
Did you use quilt, at least? ;)
No.
After two mails bounced by Alioth I had a look at the diff and thought it
was time to split it in different patches.
PS: I forgot whether you're on the list, thus the cc.
I'm on the list, but thanks for the cc (it permitted me to receive it
some days before).
--
Nekral