[Po4a-devel]Administrivia

Yves Rutschle

Wednesday, 3 November 2004

7:18 p.m.

Hello po4a-ers,

My better half has talked me into translating our Web site into French and maintaining it; after some searching around, I found po4a was probably the best tool, and it does almost all I need (or so I like to think now). So, expect some patches to the HTML module, which is curiously "almost finished", but also "useless as is". :)

Meanwhile, I've got a couple of remarks on the administrative side:

- There is no mention of where the tools are being developed. Surely it would make sense to add a link to the page on alioth somewhere in a README that would install with the package.
- As it is, before finding alioth, I found savannah. The savannah project is obviously obsolete and unused, but there is no way to know that. It wouldn't be bad if there was no reference to it from Google, but there is. Maybe someone who still has access to Savannah could add a mention that that site isn't used anymore?
- What does PO stand for?

That's all folks,
Y.

Denis Barbier

Thursday, 4 November

5:38 p.m.

On Thu, Nov 04, 2004 at 01:18:26AM +0000, Yves Rutschle wrote:

...

Hi Yves,

It looks like we are very busy with other projects at the moment, so your help is definitely welcome ;)

...

> Meanwhile, I've got a couple of remarks on the administrative side:
>
> - There is no mention of where the tools are being developed. Surely it would make sense to add a link to the page on alioth somewhere in a README that would install with the package.
> - As it is, before finding alioth, I found savannah. The savannah project is obviously obsolete and unused, but there is no way to know that. It wouldn't be bad if there was no reference to it from Google, but there is. Maybe someone who still has access to Savannah could add a mention that that site isn't used anymore?

You are fully right, will try to fix that.

...

> - What does PO stand for?

$ info gettext Files

Files Conveying Translations
============================

The letters PO in `.po' files means Portable Object, to distinguish it from `.mo' files, where MO stands for Machine Object.

Denis

Martin Quinson

Saturday, 6 November

5:37 p.m.

On Thu, Nov 04, 2004 at 01:18:26AM +0000, Yves Rutschle wrote:

...

> Hello po4a-ers,

Hello,

...

> My better half has talked me into translating our Web site into French and maintaining it; after some searching around, I found po4a was probably the best tool, and it does almost all I need (or so I like to think now). So, expect some patches to the HTML module, which is curiously "almost finished", but also "useless as is". :)

The story of the HTML module is that it was the very first module written by someone other than me. At one point, Laurent wrote in the documentation that it was almost done. But later, when I reviewed the code, I discovered that the way it split the sentences makes it very hard to use. "I <b>like</b> it" is split into 3 msgids which have to be translated separately ("I"; "like" and "it").

See http://po4a.alioth.debian.org/en/po4a.7.html#Why_not_to_split_on_ and http://graal.ens-lyon.fr/~mquinson/l10n.html#l2.2 for my point on this. I should have explained my point to Laurent before... That's why I added the comment about the useless state and decided not to distribute it.

Nowadays, this module should probably be reimplemented using Jordi's great work on XML-like formats.

...

Done by Denis, so no comment (besides "thanks").

...

> - As it is, before finding alioth, I found savannah. The savannah project is obviously obsolete and unused, but there is no way to know that. It wouldn't be bad if there was no reference to it from Google, but there is. Maybe someone who still has access to Savannah could add a mention that that site isn't used anymore?

I guess you speak of:

https://savannah.nongnu.org/projects/po4a

The only description of the package is:

This project moved to http://alioth.debian.org/projects/po4a/

I agree that we should turn the project completely off. I should contact savannah's admin about this. (I used a bug on savannah to remove all developers from the project, which thus cannot be modified anymore without admin intervention. What a lamer.)

...

> - What does PO stand for?

The letters PO in `.po' files means Portable Object, to distinguish it from `.mo' files, where MO stands for Machine Object. (info gettext)

Consider this a [meaningless] legacy of the eighties or even before. The T of pot stands for template.

Have fun around,
Mt.

Yves Rutschle

6:04 p.m.

On Sun, Nov 07, 2004 at 12:37:57AM +0100, Martin Quinson wrote:

[snip on HTML which I'll discuss separately]

...

> https://savannah.nongnu.org/projects/po4a
>
> The only description of the package is:
>
> This project moved to http://alioth.debian.org/projects/po4a/
>
> I agree that we should turn the project completely off. I should contact savannah's admin about this. (I used a bug on savannah to remove all developers from the project, which thus cannot be modified anymore without admin intervention. What a lamer.)

Yes. What a lamer indeed: I googled my way directly to the CVS page, found the navigation confusing, dismissed it and just checked out the project. It is just now that I realise there is a tiny 'main' link at the top, which goes to the page you speak of. Bah. Lucky the release tags made it obvious it was outdated. (<mumble>still think the navigation of savannah is confusing</mumble>) Y.

Jordi Vilalta

6:05 p.m.

On Sun, 7 Nov 2004, Martin Quinson wrote:

...

> > - As it is, before finding alioth, I found savannah. The savannah project is obviously obsolete and unused, but there is no way to know that. It wouldn't be bad if there was no reference to it from Google, but there is. Maybe someone who still has access to Savannah could add a mention that that site isn't used anymore?
>
> I guess you speak of:
>
> https://savannah.nongnu.org/projects/po4a
>
> The only description of the package is:
>
> This project moved to http://alioth.debian.org/projects/po4a/

The first result on Google points to http://www.nongnu.org/po4a/, which seems to be the same as the "Homepage" link at the top of the savannah project (which points to http://www.freesoftware.fsf.org/po4a but has the same content). There's some old po4a documentation. There should also be a link to the alioth project. Regards, Jordi Vilalta

Martin Quinson

Sunday, 7 November

4:11 a.m.

On Sun, Nov 07, 2004 at 01:05:12AM +0100, Jordi Vilalta wrote:

...

> On Sun, 7 Nov 2004, Martin Quinson wrote:
> > > - As it is, before finding alioth, I found savannah. The savannah project is obviously obsolete and unused, but there is no way to know that. It wouldn't be bad if there was no reference to it from Google, but there is. Maybe someone who still has access to Savannah could add a mention that that site isn't used anymore?
> >
> > I guess you speak of:
> >
> > https://savannah.nongnu.org/projects/po4a
> >
> > The only description of the package is:
> >
> > This project moved to http://alioth.debian.org/projects/po4a/
>
> The first result on Google points to http://www.nongnu.org/po4a/, which seems to be the same as the "Homepage" link at the top of the savannah project (which points to http://www.freesoftware.fsf.org/po4a but has the same content). There's some old po4a documentation. There should also be a link to the alioth project.

There are no developers of the po4a project anymore (I removed everybody, including myself). I asked the savannah admin to kill the project for us, since they are the only ones who can help us now.

https://savannah.gnu.org/support/index.php?func=detailitem&item_id=10...

Bye,
Mt.

Denis Barbier

Saturday, 6 November

6:08 p.m.

On Sun, Nov 07, 2004 at 12:37:57AM +0100, Martin Quinson wrote: [...]

...

> > - There is no mention of where the tools are being developed. Surely it would make sense to add a link to the page on alioth somewhere in a README that would install with the package.
>
> Done by Denis, so no comment (besides "thanks").

No, Jordi was faster.

PS: Martin, I have been playing with quilt for a couple of weeks; it is really great!

Denis

Martin Quinson

Sunday, 7 November

3:56 a.m.

New subject: HS (was: [Po4a-devel]Administrivia)

On Sun, Nov 07, 2004 at 01:08:43AM +0100, Denis Barbier wrote:

...

> On Sun, Nov 07, 2004 at 12:37:57AM +0100, Martin Quinson wrote: [...]
> > > - There is no mention of where the tools are being developed. Surely it would make sense to add a link to the page on alioth somewhere in a README that would install with the package.
> >
> > Done by Denis, so no comment (besides "thanks").
>
> No, Jordi was faster.

Ah, oops. Sorry Jordi. You keep saying you have no time, and yet you do more and more...

...

> PS: Martin, I have been playing with quilt for a couple of weeks; it is really great!

Yeah, indeed. I love it too. The fact is that I also lack time on this topic, and that I have a rather problematic bug against quilt. I suspect that upstream expects the user to set an environment variable. Which is release critical... If you have some time to have a look at that, too...

Mt.

Yves Rutschle

Saturday, 6 November

6:41 p.m.

New subject: [Po4a-devel]HTML translating [Was: Administrivia]

On Sun, Nov 07, 2004 at 12:37:57AM +0100, Martin Quinson wrote:

...

> But later, when I reviewed the code, I discovered that the way it split the sentences makes it very hard to use. "I <b>like</b> it" is split into 3 msgids which have to be translated separately ("I"; "like" and "it").

That's not actually the reason I found it useless as is: the current CVS version slices paragraphs arbitrarily (well, on 512-byte boundaries or something like that), which means that an irrelevant change in formatting anywhere in the file fuzzifies the entire file.
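The effect described above can be reproduced with a toy script. This is a minimal sketch, not code from Html.pm: the function names, the 512-character chunk size and the sample text are all assumptions made for illustration. It compares fixed-size slicing with paragraph-level slicing after a one-character formatting change.

```python
def chunk_fixed(text, size=512):
    """Cut text into fixed-size pieces, ignoring document structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_paragraphs(text):
    """Cut text on blank lines, so that each paragraph is one unit."""
    return [p for p in text.split("\n\n") if p.strip()]


def changed(old_units, new_units):
    """Count the units of the new version with no exact match in the old one."""
    known = set(old_units)
    return sum(1 for unit in new_units if unit not in known)


# Hypothetical sample document: ten paragraphs of filler text.
paragraphs = ["Paragraph %d. " % i + "word " * 60 for i in range(10)]
original = "\n\n".join(paragraphs)

# An irrelevant formatting change: one extra space at the very start.
edited = " " + original

fixed_changed = changed(chunk_fixed(original), chunk_fixed(edited))
para_changed = changed(chunk_paragraphs(original), chunk_paragraphs(edited))

print(fixed_changed, "of", len(chunk_fixed(edited)), "fixed-size chunks changed")
print(para_changed, "of", len(chunk_paragraphs(edited)), "paragraphs changed")
```

With fixed-size slicing, the extra space shifts every later boundary, so every chunk becomes a new string to translate. With paragraph slicing only the touched paragraph changes.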

...

> See http://po4a.alioth.debian.org/en/po4a.7.html#Why_not_to_split_on_ and http://graal.ens-lyon.fr/~mquinson/l10n.html#l2.2 for my point on this. I should have explained my point to Laurent before...

Yes, I have actually run into a couple of those problems myself. While the splitting in 3 as in your above example is, indeed, a bit confusing, I don't find that makes it useless, and more to the point, I just don't think there is any other good solution: the bottom line is, you want to specify that something in that sentence is important, which will need to be in a different msgid.

The solution, I find, is to have the translator understand the structure of the original text so she'll know to translate:

    "It's a <b>blue</b> car" => ("It's a", "blue", "car")

into:

    "C'est une voiture <b>bleue</b>" => ("C'est une voiture", "bleue", " ")

I guess an alternative would be to have a list of "small formatting tags" (bold, italics etc.) that do not actually split at all, and appear in the msgid with the onus on the translator to know enough HTML to know what to do with them, so you'd have something like:

    msgid "It's a <b>blue</b> car"
    msgstr "C'est une voiture <b>bleue</b>"

That would have the advantage of providing the translator with context information. In fact that goes a long way towards your point of splitting at paragraph level :-)

That's actually fairly easily achievable: the list of paragraph-marking tags is fairly small (<p>, <div>, <h1,2,3,4,...>) and XHTML makes it mandatory for text to be included in a block-level element of some sort.
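The alternative sketched above (cut at block-level tags, keep small formatting tags inside the msgid) could look roughly like this. It is only an illustration written with Python's standard html.parser, not po4a code: the class name, the helper function and both tag lists are assumptions.

```python
from html.parser import HTMLParser

# Assumed tag lists, chosen for illustration only.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6"}
INLINE_TAGS = {"b", "i", "em", "strong"}


class BlockExtractor(HTMLParser):
    """Collect one translatable unit per block, keeping inline tags in the text."""

    def __init__(self):
        super().__init__()
        self.units = []
        self._buffer = []

    def finish_unit(self):
        # Normalise whitespace and store the unit if it is not empty.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.units.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.finish_unit()
        elif tag in INLINE_TAGS:
            self._buffer.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.finish_unit()
        elif tag in INLINE_TAGS:
            self._buffer.append("</%s>" % tag)

    def handle_data(self, data):
        self._buffer.append(data)


def extract_units(html):
    """Return the list of paragraph-level msgid candidates found in html."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    parser.finish_unit()
    return parser.units


units = extract_units("<h1>Cars</h1><p>It's a <b>blue</b> car</p>")
print(units)
```

Each unit is a whole block, so the translator sees "It's a <b>blue</b> car" as one string and is free to move the bold word.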

...

> Nowadays, this module should probably be reimplemented using Jordi's great work on XML-like formats.

I know next to nothing about XML; last time I saw some, I thought it looked quite different from HTML. A quick read of Jordi's module makes me think it's mostly an XML parser: Html.pm relies on Gisle Aas' HTML parser, and it doesn't seem to be very beneficial to change parsers just for fun; besides, Gisle's parser is supposed to be quite good at handling broken HTML, which I doubt an XML parser is very good at (then again, helping bad HTML spread probably isn't good :)

Y.

Martin Quinson

Monday, 8 November

7:22 a.m.

New subject: [Po4a-devel]HTML translating [Was: Administrivia]

On Sun, Nov 07, 2004 at 12:41:32AM +0000, Yves Rutschle wrote:

...

> On Sun, Nov 07, 2004 at 12:37:57AM +0100, Martin Quinson wrote:
> > But later, when I reviewed the code, I discovered that the way it split the sentences makes it very hard to use. "I <b>like</b> it" is split into 3 msgids which have to be translated separately ("I"; "like" and "it").
>
> That's not actually the reason I found it useless as is: the current CVS version slices paragraphs arbitrarily (well, on 512-byte boundaries or something like that), which means that an irrelevant change in formatting anywhere in the file fuzzifies the entire file.

Ouch. Even worse than expected.

...

> > See http://po4a.alioth.debian.org/en/po4a.7.html#Why_not_to_split_on_ and http://graal.ens-lyon.fr/~mquinson/l10n.html#l2.2 for my point on this. I should have explained my point to Laurent before...
>
> Yes, I have actually run into a couple of those problems myself. While the splitting in 3 as in your above example is, indeed, a bit confusing, I don't find that makes it useless, and more to the point, I just don't think there is any other good solution: the bottom line is, you want to specify that something in that sentence is important, which will need to be in a different msgid.

Ok. I wanted to reply to this message the way it deserves (with a long argument to back up my point), but I lack the time to do so. I'll be short. Check the URLs given above for more details.

...

> The solution, I find, is to have the translator understand the structure of the original text so she'll know to translate:
>
>     "It's a <b>blue</b> car" => ("It's a", "blue", "car")
>
> into:
>
>     "C'est une voiture <b>bleue</b>" => ("C'est une voiture", "bleue", " ")

And now, add this English sentence to your system: "it's a <b>blue</b> horse"

You then have the following translations (one per line):

    it's a -> "c'est un" or "c'est une" depending on the context, since horse is masculine in French
    blue -> "bleu" or "bleue" (same issue)
    car -> voiture
    horse -> cheval

And now, add "it's a <b>small</b> car". This time, the issue is that in French this adjective is placed before the noun, whereas the translation of "blue" is placed after it.

How will you implement these different translations depending on the context, and the reordering of sentence elements?

My point is that splitting sentences is *never* a viable solution. If you think that such issues are rare and can be dealt with, type man Locale::Maketext::TPJ13 in a terminal ;)
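The argument can be checked mechanically. The sketch below is only an illustration under assumptions (the data layout and the function name are invented, and it models a catalogue as a plain mapping from one msgid to one msgstr). It lists the fragments that would need more than one translation once the three example sentences are cut at their tags.

```python
# Whole sentences: each msgid has exactly one translation.
sentences = {
    "It's a <b>blue</b> car": "C'est une voiture <b>bleue</b>",
    "It's a <b>blue</b> horse": "C'est un cheval <b>bleu</b>",
    "It's a <b>small</b> car": "C'est une <b>petite</b> voiture",
}

# The same sentences cut at the tags: (English fragments, French fragments).
# An empty French fragment means the slot has nothing left to hold.
fragments = [
    (["It's a", "blue", "car"], ["C'est une voiture", "bleue", ""]),
    (["It's a", "blue", "horse"], ["C'est un cheval", "bleu", ""]),
    (["It's a", "small", "car"], ["C'est une", "petite", "voiture"]),
]


def conflicts(pairs):
    """Return the source fragments that need more than one translation."""
    seen = {}
    for sources, targets in pairs:
        for source, target in zip(sources, targets):
            seen.setdefault(source, set()).add(target)
    return {src: tgts for src, tgts in seen.items() if len(tgts) > 1}


fragment_conflicts = conflicts(fragments)
for source in sorted(fragment_conflicts):
    print(repr(source), "->", sorted(fragment_conflicts[source]))
print(len(sentences), "whole sentences, each with a single translation")
```

Three of the five fragments ("It's a", "blue" and "car") end up with several competing translations, while the sentence-level table has none.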

...

> I guess an alternative would be to have a list of "small formatting tags" (bold, italics etc.) that do not actually split at all, and appear in the msgid with the onus on the translator to know enough HTML to know what to do with them, so you'd have something like:
>
>     msgid "It's a <b>blue</b> car"
>     msgstr "C'est une voiture <b>bleue</b>"
>
> That would have the advantage of providing the translator with context information. In fact that goes a long way towards your point of splitting at paragraph level :-)
>
> That's actually fairly easily achievable: the list of paragraph-marking tags is fairly small (<p>, <div>, <h1,2,3,4,...>) and XHTML makes it mandatory for text to be included in a block-level element of some sort.

That's exactly my point, indeed. You should split the translation on a paragraph boundary because if you take bigger chunks, gettext and PO editors get clumsy. If you take smaller chunks, you run into endless issues about context changing the meaning of the chunk.

You thus have to show some formatting tags to the translators. We do so in all other modules. I don't see any better idea.

...

> > Nowadays, this module should probably be reimplemented using Jordi's great work on XML-like formats.
>
> I know next to nothing about XML; last time I saw some, I thought it looked quite different from HTML. A quick read of Jordi's module makes me think it's mostly an XML parser: Html.pm relies on Gisle Aas' HTML parser, and it doesn't seem to be very beneficial to change parsers just for fun;

It's not for fun, it's because the XML module does work and is designed to allow the rapid development of other modules (no new code needed), whereas the existing HTML module does not work. Moreover, I'd be pleased to cut a dependency. I hate unjustified dependencies, but it may be personal.

...

> besides, Gisle's parser is supposed to be quite good at handling broken HTML, which I doubt an XML parser is very good at (then again, helping bad HTML spread probably isn't good :)

That's a good argument to stick to this parser, then.

Thanks for your interest in po4a,
Mt.

Yves Rutschle

Wednesday, 10 November

11:36 a.m.

New subject: [Po4a-devel]HTML translating

On Mon, Nov 08, 2004 at 02:22:42PM +0100, Martin Quinson wrote:

...

> Ok. I wanted to reply to this message the way it deserves (with a long argument to back up my point)

Thank you for sharing your experience; I'm getting convinced now.

...

> If you think that such issues are rare and can be dealt with, type man Locale::Maketext::TPJ13 in a terminal ;)

I read that article a long time ago, printed in a real paper version of TPJ... I think that's actually the single most interesting article I read in all TPJs :)

[splitting in HTML blocks]

...

> > That's actually fairly easily achievable: the list of paragraph-marking tags is fairly small (<p>, <div>, <h1,2,3,4,...>) and XHTML makes it mandatory for text to be included in a block-level element of some sort.
>
> You thus have to show some formatting tags to the translators. We do so in all other modules. I don't see any better idea.

Ok. Well, I'm afraid that means I'm gonna have to ditch the current Html.pm and redo one from scratch (bar a couple of routines that may be rescued).

So, we'll now be cutting the HTML along blocks and displaying formatting tags inline (at first sight, it looks like cutting along tags that have a 'display: block' property, while keeping those that have a 'display: inline' property).

While thinking about it, there is at least one thing I'd like feedback on: I'd personally rather not expose "complicated" tags to the translator, i.e. while I think it's acceptable to present them with <b> and <i> and so on, I don't think something like:

    This is a <a href="blahblah.com/this/that/blah.html">link</a> to <img src="blahblah.com/this/that/blah.png" alt="blah" title="Blah">

belongs in a PO. So I'd propose to collapse the inside of long inline tags, so as to simply state there is a tag (e.g. "you're in a link") without detailing what the tag contains. Thus, the example line would appear, in the PO, as:

    This is a <a>link</a> to <img>blah</img>

(Meanwhile we also output the title field of the img as a separate msgid; the alt field is a replacement for the image for text browsers, and therefore belongs interpolated in the rest of the text).

One argument for exposing the full tag would be that it allows the translator to update links (change a link to blah.html into a link to blah.fr.html for example), allowing the complete translation of a Web site. I'm not fond of the idea though, as:

- The translator doesn't necessarily know how the translation would be implemented
- The burden of maintaining the Web site should not be on the translator
- A small script can easily take care of that (I'll be happy to provide what I've written later on, but I'm not sure it belongs in po4a).

Any comments on this?

[HTML::Parser vs Jordi's XML parser]

...

> Moreover, I'd be pleased to cut a dependency. I hate unjustified dependencies, but it may be personal.

Me too, but I hate reimplementation of code (reinventing the wheel) more. Besides, HTML::Parser is also quite widespread, and only one apt-get away at worst (or emerge, or whatever -- if it's not one command away, you need a better distribution :) )

Y.
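The tag-collapsing proposal made earlier in this message can be sketched as follows. This is a simplified illustration, not po4a code: the function names and the regular expression are assumptions, only <a> and <img> are handled, and the alt text is not pulled into the msgid as the proposal suggests. Attributes are stashed at extraction time and put back after translation.

```python
import re

# Opening <a ...> or <img ...> tags, with or without attributes.
# Closing tags such as </a> are left untouched.
COLLAPSIBLE = re.compile(r"<(a|img)\b[^>]*>", re.IGNORECASE)


def collapse(html):
    """Replace each collapsible tag by its bare form; return (text, stashed tags)."""
    stash = []

    def replace(match):
        stash.append(match.group(0))
        return "<%s>" % match.group(1).lower()

    return COLLAPSIBLE.sub(replace, html), stash


def restore(translated, stash):
    """Put the stashed tags back, assuming the translator kept their order."""
    remaining = iter(stash)
    return re.sub(r"<(?:a|img)>", lambda match: next(remaining), translated)


source = ('This is a <a href="blahblah.com/this/that/blah.html">link</a> to '
          '<img src="blahblah.com/this/that/blah.png" alt="blah" title="Blah">')

msgid, stash = collapse(source)
print(msgid)

msgstr = "Ceci est un <a>lien</a> vers <img>"
print(restore(msgstr, stash))
```

Note that restore() matches tags to their attributes by position only, so this version breaks as soon as a translation has to change the order of two links.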

Martin Quinson

3:41 p.m.

New subject: [Po4a-devel]HTML translating

On Wed, Nov 10, 2004 at 05:36:29PM +0000, Yves Rutschle wrote:

...

> On Mon, Nov 08, 2004 at 02:22:42PM +0100, Martin Quinson wrote:
> > Ok. I wanted to reply to this message the way it deserves (with a long argument to back up my point)
>
> Thank you for sharing your experience; I'm getting convinced now.

Thanks for your patience. I'll have to be even shorter tonight...

...

> [splitting in HTML blocks]
> > > That's actually fairly easily achievable: the list of paragraph-marking tags is fairly small (<p>, <div>, <h1,2,3,4,...>) and XHTML makes it mandatory for text to be included in a block-level element of some sort.
> >
> > You thus have to show some formatting tags to the translators. We do so in all other modules. I don't see any better idea.
>
> Ok. Well, I'm afraid that means I'm gonna have to ditch the current Html.pm and redo one from scratch (bar a couple of routines that may be rescued).

I see three solutions to implement an Html module:

- Pretend html is an xml dialect (xhtml is), and use Jordi's parser. It should be about 20 lines long. See the Guide module for an example.
- Pretend html is an sgml dialect, and use the sgml module for that. It will work if all html pages begin with a prolog stating the dtd. It should be the case, shouldn't it? Then you have to list all tags in the relevant lists around line 400 of Sgml.pm. Just add a " } elsif ($prolog =~ /html/i) {" block, and do the same as for the other DTDs.
- Recognize that html is unique. You have to implement a whole new module in that case. You may well want to check how we did it for the sgml and xml modules. The best may be to translate a file with both of them, or so.

...

> This is a <a href="blahblah.com/this/that/blah.html">link</a> to <img src="blahblah.com/this/that/blah.png" alt="blah" title="Blah">
>
> [doesn't] belong in a PO. So I'd propose to collapse the inside of long inline tags, so as to simply state there is a tag (e.g. "you're in a link") without detailing what the tag contains. Thus, the example line would appear, in the PO, as:
>
>     This is a <a>link</a> to <img>blah</img>

I'm not fond of this because if the translator wants or has to reorder the links, you'll have trouble. Check the gettext info file, in the section explaining what "%2$s" is good for. It's not impossible, but you have to deal with it.
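One way to deal with the reordering problem, in the spirit of the numbered "%2$s" arguments mentioned above, is to number the collapsed tags so that each one can be found again wherever the translator moves it. The sketch below is an assumption-laden illustration, not po4a code: the <aN> placeholder syntax, the function names and the sample sentence are all invented, and only non-nested <a> tags are handled.

```python
import re


def collapse_numbered(html):
    """Replace each opening <a ...> tag by <aN>; return (text, {N: original tag})."""
    tags = {}

    def replace(match):
        number = len(tags) + 1
        tags[number] = match.group(0)
        return "<a%d>" % number

    return re.sub(r"<a\b[^>]*>", replace, html), tags


def restore_numbered(translated, tags):
    """Put each original tag back according to its number, whatever the order."""
    return re.sub(r"<a(\d+)>", lambda match: tags[int(match.group(1))], translated)


source = ('See <a href="install.html">the install guide</a> '
          'and <a href="faq.html">the FAQ</a>.')

msgid, tags = collapse_numbered(source)
print(msgid)

# The translator swaps the two links.
msgstr = "Voir <a2>la FAQ</a> et <a1>le guide d'installation</a>."
restored = restore_numbered(msgstr, tags)
print(restored)
```

The price is that the translator has to keep the numbers intact, which is the same contract gettext imposes with positional format arguments.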

...

> [HTML::Parser vs Jordi's XML parser]
> > Moreover, I'd be pleased to cut a dependency. I hate unjustified dependencies, but it may be personal.
>
> Me too, but I hate reimplementation of code (reinventing the wheel) more.

Then, that's an argument for pretending that html is xml or sgml and not reimplementing any specific po4a module :)

Ok, I'm sorry, this mail really should be longer, but I'm out of time, man.

Mt.
