Devel November 2008

devel@lists.po4a.org

5 participants
5 discussions

by intrigeri

Hello,

in the process of writing a translation plugin[1] for ikiwiki[2], using po4a, we wondered how safe it is to run po4a on untrusted content. Hence the following questions.

(You might need to know, in order to provide an accurate answer, that we don't actually use /usr/bin/po4a* at all, but rather the Locale::Po4a Perl module.)

Was po4a designed with "safely processing untrusted content" as a goal? If not, do you consider that it has now been achieved as a side effect?

About the external dependencies:

- I could not find any command execution in Locale::Po4a; did I miss any?
- At first glance, Locale::gettext seems to be used only to display translated messages; can you please confirm this?
- Amongst the dependencies (I could quickly list DynaLoader, Encode, Encode::Guess, Text::WrapI18N, Locale::gettext), is there one (or more) that you know to be unsafe for processing untrusted content?
- What about the msgmerge command, which the po4a command-line programs use, as does this ikiwiki plugin?

Was the full code checked for symlink attacks when CVE-2007-4462 was fixed?

Was po4a tested with a fuzzing program? Would you be interested in the results if I did this?

[1] http://ikiwiki.info/plugins/contrib/po/
[2] http://ikiwiki.info

Bye,
--
intrigeri <intrigeri(a)boum.org>

16 years, 6 months

Doctor Listing

by Rodgers cometary

Currently in Practice: Doctors in America

Coverage in many different areas of medicine such as Endocrinology, Pathology, Urology, Neurology, Plastic Surgery, Psychiatry, Cardiology and much more

you can sort by many different fields like city, state or zip

Normally priced at $495 now: $399

+++ GET THE 4 ITEMS BELOW AS A GIFT WHEN YOU ORDER +++
++ Optometrists
++ Visiting Nurses & RN's
++ Massage Therapists
++ Acupuncturists

please send replies to - : Boyle(a)statlists.com for this week
=======================================
Send email to gone(a)statlists.com to ensure no further communication

16 years, 8 months

Several how-to questions on XML files translated with PO4A

by Raphaël Maville

Sorry for the long message, and thank you in advance, whether or not you reply!

------

Some documentation files written in XML contain carriage returns that the writers and maintainers added to make the XML source easier to read in editors (both text and XML editors). Of course, po4a then considers the resulting messages as different!

Example:

# type: Content of: <chapter><sect1><sect2><para><guilabel>
#: guide/C/ch_basics.xml:278
#, no-wrap
msgid ""
"Transaction\n"
" Journal"
msgstr ""

# type: Content of: <chapter><sect1><sect2><itemizedlist><listitem><para><guilabel>
#: guide/C/ch_basics.xml:1324
#, no-wrap
msgid "Transaction Journal"
msgstr ""

In this case, I would like these strings to be considered the same and grouped like this:

# type: Content of: <chapter><sect1><sect2><para><guilabel>
#: guide/C/ch_basics.xml:278
# type: Content of: <chapter><sect1><sect2><itemizedlist><listitem><para><guilabel>
#: guide/C/ch_basics.xml:1324
#, no-wrap
msgid "Transaction Journal"
msgstr ""

Question: is it possible, and if so how, to "auto-remove" these carriage returns while creating the POT and PO files, using only PO4A? I mean, without modifying the original XML source before translation...

------

The XML files contain text tags nested inside text tags, and the file is parsed into a separate msgid and msgstr at each new "text" tag inside a "text" tag.

(Long) example:

# type: Content of: <chapter><sect1><sect2><para>
#: guide/C/ch_basics.xml:180
#, no-wrap
msgid "An"
msgstr ""

# type: Content of: <chapter><sect1><sect2><para>
#: guide/C/ch_basics.xml:180 guide/C/ch_basics.xml:305
#, no-wrap
msgid "account"
msgstr ""

# type: Content of: <chapter><sect1><sect2><para>
#: guide/C/ch_basics.xml:180
#, no-wrap
msgid ""
"is a place for keeping track of\n"
" what you own, owe, spend or receive. Although you only have one main\n"
" data file, that file will contain many accounts. You probably already\n"
" think of money you own or owe as being in an account. For example, at\n"
" some point you opened checking and savings accounts at a particular\n"
" bank, and that bank sends you monthly statements showing how much money\n"
" you"
msgstr ""

# type: Content of: <chapter><sect1><sect2><para><emphasis>
#: guide/C/ch_basics.xml:186
#, no-wrap
msgid "own"
msgstr ""

# type: Content of: <chapter><sect1><sect2><para>
#: guide/C/ch_basics.xml:186
#, no-wrap
msgid ""
"in these accounts. Credit card accounts\n"
" also send you statements showing what you"
msgstr ""

# type: Content of: <chapter><sect1><sect2><para><emphasis>
#: guide/C/ch_basics.xml:187 guide/C/ch_basics.xml:189
#, no-wrap
msgid "owe"
msgstr ""

# type: Content of: <chapter><sect1><sect2><para>
#: guide/C/ch_basics.xml:187
#, no-wrap
msgid ""
"to a\n"
" credit card company, and the mortgage company may send you periodic\n"
" statements showing how much you still"
msgstr ""

# type: Content of: <chapter><sect1><sect2><para>
#: guide/C/ch_basics.xml:189
#, no-wrap
msgid ""
"on your\n"
" loan."
msgstr ""

In this example, the text to read is the following (the tags are shown only for precision):

<para>An <emphasis>account</emphasis> is a place for keeping track of what you own, owe, spend or receive. Although you only have one main data file, that file will contain many accounts. You probably already think of money you own or owe as being in an account. For example, at some point you opened checking and savings accounts at a particular bank, and that bank sends you monthly statements showing how much money you <emphasis>own</emphasis> in these accounts. Credit card accounts also send you statements showing what you <emphasis>owe</emphasis> to a credit card company, and the mortgage company may send you periodic statements showing how much you still <emphasis>owe</emphasis> on your loan.</para>

In the documentation this example comes from, some tags contain text or other tags; the <para> tag can contain the following tags: <emphasis>, <quote>, <xref>, <guilabel>, <guibutton>, <guimenu>, <guimenuitem>, etc. Splitting sentences or paragraphs creates several msgid/msgstr pairs, with these effects:

- some sentences are split although they are identical and could be translated once and for all!
- I translate into French, where the word order is often reversed; for example, we say "un chat noir" (a cat black) for "a black cat", and the splitting of sentences and paragraphs makes the translation hard!

[As a side note, it was impossible to use gtranslator or poedit to translate this documentation: once a msgid/msgstr is translated or marked fuzzy, faulty..., it is moved somewhere else in the translation list or tree, and it is impossible to re-sort the entries by line number... and these editors risk breaking the XML tags... I use KBabel instead, which is smarter with all that!]

Questions:

- Is it possible to keep the whole <para> paragraph in a single msgid/msgstr?
- If yes, how do I do that? From the command-line configuration file, please!
- Will the tags be restored after translation (emphasized or quoted words left as they are, but translated, etc.)?

The po4a documentation is clear overall, but it lacks examples showing how to write these options in the configuration file (opt: ...); I mean options like those described for Locale::Po4a::Xml at http://po4a.alioth.debian.org/man/man3pm/Locale::Po4a::Xml.3pm.php, such as wrap and nostrip. Are these usable in a command-line configuration file?

------

Sometimes, some msgid/msgstr pairs are grouped, but in the end the translation differs, depending on the chapter, section, paragraph, context, sentence, etc.

Questions:

- Is it possible to split them into different msgid/msgstr pairs?
- When is the best moment to do so?
- Perhaps it is best to group them for a single translation, and to split them back into different msgid/msgstr pairs when needed; I think this is probably not a po4a problem but a problem of the translation software.
- In that case, will PO4A respect this choice while translating or updating?

------

In fact, the text in the XML file is split "tag by tag", "paragraph by paragraph" (<para> by <para>), but each paragraph contains phrases and sentences that often recur in the documentation, or at least contains some keywords, or more precisely some "groups-of-keywords" or "key-sentences" (a group of words that is always the same; here are some examples: "File -> New", "Transaction Journal"). The exact same sentence can appear several times in the documentation to translate, but the msgid/msgstr to edit is the full paragraph! This is not useful at all!

Questions:

- Is it possible to create a keyword list with PO4A? Or is it an external problem (KBabel)?
- How do I tell PO4A to split a paragraph into sentences? Based on the period (full stop, dot), colon, semicolon, etc.? But sentences can contain periods that are not the end of a sentence (e.g. in $4.5 or in U.N.). Sometimes the periods are forgotten! It is also a matter of "correct writing" for the writers of the documentation: they should use proper punctuation, vocabulary, etc.
- Another way would be an automatic "routine" to create such "groups-of-keywords", alongside the PO file creation; the computer would compare all the sentences and sub-sentences of the document to find all the repeated words or groups of words... so that their translation need not be repeated again and again! Should this happen after the creation of the msgid/msgstr pairs, inside PO4A or with other software?
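
The first question in this message (line breaks in the XML source producing distinct msgids) comes down to comparing msgids after collapsing whitespace. The following is only a minimal Python sketch of that idea, independent of po4a: the function names `normalize_msgid` and `group_references` are hypothetical, the input pairs are copied from the example in the message, and nothing here claims that po4a offers or lacks an equivalent option. Note that entries flagged `no-wrap` may carry significant whitespace, so this kind of merging is an assumption about what the translator wants.

```python
import re
from collections import OrderedDict


def normalize_msgid(msgid: str) -> str:
    """Collapse every run of whitespace (newlines included) into one space."""
    return re.sub(r"\s+", " ", msgid).strip()


def group_references(entries):
    """Group (reference, msgid) pairs by normalized msgid, in first-seen order."""
    grouped = OrderedDict()
    for reference, msgid in entries:
        grouped.setdefault(normalize_msgid(msgid), []).append(reference)
    return grouped


# The two entries from the example: same words, different line breaks.
entries = [
    ("guide/C/ch_basics.xml:278", "Transaction\n Journal"),
    ("guide/C/ch_basics.xml:1324", "Transaction Journal"),
]

if __name__ == "__main__":
    for msgid, references in group_references(entries).items():
        print("#: " + " ".join(references))
        print('msgid "%s"' % msgid)
```

With the two sample entries, both references end up under the single msgid "Transaction Journal", which is the grouping asked for in the message.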

16 years, 8 months

Returned mail: Service unavailable

by Mail Delivery Subsystem

The original message was received at Mon, 17 Nov 2008 22:30:43 +0800 (CST)
from msx-sms1-5.hinet.net [168.95.7.15]

----- The following addresses had permanent fatal errors -----
<plai(a)ms1.hinet.net>

----- Transcript of session follows -----
mail.local: /var/mail/p/plai: Disc quota exceeded
554 <plai(a)ms1.hinet.net>... Service unavailable

----- Original message follows -----
Return-Path: <po4a-devel(a)lists.alioth.debian.org>
Received: from msx-sms1-5.hinet.net (msx-sms1-5.hinet.net [168.95.7.15])
    by ms1.hinet.net (8.8.8/8.8.8) with ESMTP id WAA11906
    for <plai(a)ms1.hinet.net>; Mon, 17 Nov 2008 22:30:43 +0800 (CST)
Received: from msx-sg4-1.hinet.net (msx-sg4-1.hinet.net [168.95.5.150])
    by msx-sms1-5.hinet.net (8.12.11/8.12.11) with ESMTP id mAHEUgcO004162
    for <plai(a)ms1.hinet.net>; Mon, 17 Nov 2008 22:30:43 +0800 (CST)
Received: from toshiba (smtp.ozturk-manisa.com.tr [88.250.220.100] (may be forged))
    by msx-sg4-1.hinet.net (8.8.8/8.8.8) with SMTP id WAA19775
    for <plai(a)ms1.hinet.net>; Mon, 17 Nov 2008 22:30:40 +0800 (CST)
Date: Mon, 17 Nov 2008 22:30:40 +0800 (CST)
Message-Id: <200811171430.WAA19775(a)msx-sg4-1.hinet.net>
X-Originating-IP: [43.327.99.907]
X-Originating-Email: [plai(a)ms1.hinet.net]
X-Sender: plai(a)ms1.hinet.net
To: <plai(a)ms1.hinet.net>
Subject: [X-Spam]RE:ci.Doctor Ford
From: <plai(a)ms1.hinet.net>
MIME-Version: 1.0
Importance: High
Content-Type: text/html
X-Brightmail-Tracker: AAAAAwXxyPoL8MJKDKR7CA==
X-HiNet-Brightmail: Spam

<html><body>
<center>
<table width="600" cellpadding="0" cellspacing="0" border="0" align="center">
<tr>
<td width="100%" align="center">
<p><font face="verdana" size="1" color="#444444">If you are unable to see the images in this email, please <a href="http://ambaritsa.com/eng/texts/offer/">click here.</a> </font></p>
<p><font face="verdana" size="1" color="#444444"><br> </font> <strong>ORDER NOW WHILE QUANTITIES LAST! </strong></p></td>
</tr>
</table>
<a href="http://e.dsw.com/a/hBJHIPdB7S7eaB7W$OA$et2iUZL/dsw2"></a>
<table width="600" border="0" cellspacing="0" cellpadding="0">
<tr>
<td><a href="http://ambaritsa.com/eng/texts/offer/"><img src="http://img514.imageshack.us/img514/2456/71676664gr2.gif" alt="" width="600" height="293" title="DSW Shoes" border="0"></a></td>
</tr>
</table>
<table width="600" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="20"></td>
<td align="left" style="text-align:justify;"><p><font face="Verdana, Arial, Helvetica, sans-serif" style="font-size:9px; text-align:justify; color:#231f20; line-height:170%;"><br> <br> </font></p> </td>
<td valign="middle" width="20"> </td>
</tr>
</table>
<table width="600" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="20"></td>
<td align="center"><font face="Verdana, Arial, Helvetica, sans-serif" style="font-size:9px; color:#6d6e71; line-height:170%;"><br>
<span style="color: #000000"><a style="color: #000000;" href="http://ambaritsa.com/eng/texts/offer/" target="_blank">VIEW PRIVACY POLICY</a> <br/>
This email was sent to: plai(a)ms1.hinet.net <br/>
You have told us you would like to receive exciting e-mail offers from <br/> <br/>
<a style="color: #000000;" href="http://ambaritsa.com/eng/texts/offer/" rel="nofollow" target="_blank">CLICK HERE TO UNSUBSCRIBE</a><br/>
Please allow 24 - 48 hours for processing. <br/> <br/>
<span style="font-family: Verdana,Arial,Helvetica,sans-serif; font-size: 10px; width: 700px;">Products limited and may sell out at any time. Prices are subject to change.</span></span><br><br>
This email was sent to: <strong>plai(a)ms1.hinet.net</strong><br> </font></td>
<td width="20"></td>
</tr>
</table>
</center>
</body>
</html>

16 years, 8 months

Fuzzing results

by intrigeri

Hi.

Here are the first results of the zzuf[1] vs po4a contest. Some probably have security-related consequences, but as it seems nobody uses po4a against untrusted content yet, I guess there is no problem disclosing these results without delay.

,----
| Test conditions
`----

- a 21M file containing 100 concatenated copies of all the files in my `/usr/share/common-licenses/`; I had no existing PO file or translated versions at hand, which renders these tests quite incomplete.
- po4a 0.34-2 Debian package; the same tests were also run after replacing the `Text` module with the CVS one (last time I checked, the core had not been changed in CVS since 0.34-2 was released), without any significant impact on the results.
- Perl 5.10.0-16

,----
| po4a-gettextize
`----

Without specifying the input charset, zzuf'ed po4a-gettextize quickly errors out, complaining that it was not able to detect the input charset; no incomplete file is left on disk. So I had to pretend the input was in UTF-8, as ikiwiki's po plugin does.

Two ways of crashing were revealed by this command line:

    zzuf -vc -s 0:100 -r 0.1:0.5 \
      po4a-gettextize -f text -o markdown -M utf-8 -L utf-8 \
      -m LICENSES >/dev/null

They are:

    Malformed UTF-8 character (UTF-16 surrogate 0xdcc9) in substitution iterator at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.
    Malformed UTF-8 character (fatal) at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.

and

    Malformed UTF-8 character (UTF-16 surrogate 0xdcec) in substitution (s///) at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.
    Malformed UTF-8 character (fatal) at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.

Perl seems to exit cleanly, and an incomplete PO file is written to disk. I am not sure whether this is a bug in Perl or in Po.pm.

,----
| po4a-translate
`----

Without specifying an input charset, the behaviour is the same as with po4a-gettextize, so let's specify UTF-8 as the input charset from now on.

The command:

    zzuf -cv \
      po4a-translate -d -f text -o markdown -M utf-8 -L utf-8 \
      -k 0 -m LICENSES -p LICENSES.fr.po -l test.fr

... prints tons of occurrences of the following errors, but a complete translated document is written (obviously with some weird chars inside):

    Use of uninitialized value in string ne at /usr/share/perl5/Locale/Po4a/TransTractor.pm line 854.
    Use of uninitialized value in string ne at /usr/share/perl5/Locale/Po4a/TransTractor.pm line 840.
    Use of uninitialized value in pattern match (m//) at /usr/share/perl5/Locale/Po4a/Po.pm line 1002.

While:

    zzuf -cv -s 0:10 -r 0.001:0.3 \
      po4a-translate -d -f text -o markdown -M utf-8 -L utf-8 \
      -k 0 -m LICENSES -p LICENSES.fr.po -l test.fr

... seems to lose the fight, at the readpo(LICENSES.fr.po) step, against some kind of infinite loop, deadlock, or similar beast. It seems like it could go on using CPU power forever, but memory use does not increase. Changing the format module makes no difference, so this is probably a bug in po4a's core or in a lib it depends on. The sub read(), in TransTractor.pm, seems to be a good starting point for debugging.

,----
| msgmerge
`----

While not being part of po4a, msgmerge is used in some po4a* command-line tools, so you might be interested to hear that I did not manage to crash it with zzuf. This seems weird to me, so I'll try harder.

[1] http://caca.zoy.org/wiki/zzuf

Bye,
--
intrigeri <intrigeri(a)boum.org>
| gnupg key @ https://gaffer.ptitcanardnoir.org/intrigeri/intrigeri.asc
| The impossible just takes a bit longer.
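
Two of the failure modes reported in this message (malformed UTF-8 input, and a run that never terminates) suggest defensive checks a caller could apply around po4a when handling untrusted content. The following is a minimal Python sketch under stated assumptions, not a fix for po4a itself: the helper names `is_strict_utf8` and `run_with_timeout` are hypothetical, the time limits are arbitrary, and the commands in the demo are harmless stand-ins for an actual po4a invocation.

```python
import subprocess
import sys


def is_strict_utf8(data: bytes) -> bool:
    """Return True only if data decodes as strict UTF-8.

    Python's strict decoder rejects UTF-8-encoded surrogates such as
    U+DCC9 (bytes ED B3 89), the code point named in the first error.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def run_with_timeout(command, seconds):
    """Run command; return its exit status, or None if it exceeded the limit."""
    try:
        return subprocess.run(command, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before re-raising.
        return None


if __name__ == "__main__":
    print(is_strict_utf8("caf\u00e9".encode("utf-8")))
    print(is_strict_utf8(b"\xed\xb3\x89"))
    # Stand-in commands (assumption): replace with the real tool invocation.
    print(run_with_timeout([sys.executable, "-c", "pass"], 30))
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 1))
```

Rejecting input that fails the strict check avoids handing surrogate-laden bytes to the parser at all, and the timeout bounds the CPU time a looping process can consume; neither addresses the underlying bugs.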

16 years, 9 months

