po4a against untrusted content
by intrigeri
Hello,
In the process of writing a translation plugin[1] for ikiwiki[2]
using po4a, we wondered how safe it is to run po4a on
untrusted content. Hence the following questions.
(You might need to know, in order to provide an accurate answer, that
we actually don't use /usr/bin/po4a* at all, but rather the
Locale::Po4a Perl module.)
Was po4a designed with "processing untrusted content safely" as
a goal? If not, do you consider that goal to have been achieved as a side effect?
About the external dependencies:
- I could not find any command execution in Locale::Po4a; did I miss
any?
- At first glance, Locale::gettext seems to be used only to
display translated messages; can you please confirm this?
- Among the dependencies (I could quickly list DynaLoader, Encode,
Encode::Guess, Text::WrapI18N, Locale::gettext), is there one (or
more) that you know to be unsafe for processing untrusted content?
- What about the msgmerge command, which is used by the po4a command-line
programs as well as by this ikiwiki plugin?
Was the full code checked for symlink attacks when CVE-2007-4462
was fixed?
Was po4a tested with a fuzzing program? Would you be interested in the
results if I did this?
[1] http://ikiwiki.info/plugins/contrib/po/
[2] http://ikiwiki.info
Bye,
--
intrigeri <intrigeri(a)boum.org>
15 years, 11 months
Several how-to questions on XML files translated with PO4A
by Raphaël Maville
Sorry for being so talkative, and thank you in advance, whether or not you reply!
------
Writers and maintainers of some documentation files written in
XML insert line breaks inside them to make the XML source easier to
read in editors (both text and XML editors).
And of course, po4a then considers the resulting messages as different!
Example:
# type: Content of: <chapter><sect1><sect2><para><guilabel>
#: guide/C/ch_basics.xml:278
#, no-wrap
msgid ""
"Transaction\n"
" Journal"
msgstr ""
# type: Content of: <chapter><sect1><sect2><itemizedlist><listitem><para><guilabel>
#: guide/C/ch_basics.xml:1324
#, no-wrap
msgid "Transaction Journal"
msgstr ""
In this case, I wanted these strings to be considered the same, and
grouped like this:
# type: Content of: <chapter><sect1><sect2><para><guilabel>
#: guide/C/ch_basics.xml:278
# type: Content of: <chapter><sect1><sect2><itemizedlist><listitem><para><guilabel>
#: guide/C/ch_basics.xml:1324
#, no-wrap
msgid "Transaction Journal"
msgstr ""
Question: is it possible to "auto-remove" these line breaks while
creating the POT and PO files, using only po4a (that is, without
modifying the original XML source before translation)? If so, how?
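Outside po4a, the effect being asked for can be sketched as a post-processing step: collapse every run of whitespace in a msgid into a single space, then group messages whose normalized text is identical. The Python below is a hypothetical illustration, not a po4a feature; it assumes the extracted messages are available as (reference, msgid) pairs, and it must not be applied to elements where whitespace is significant (such as DocBook <programlisting> or <screen>).

```python
import re


def normalize_msgid(msgid):
    """Collapse every whitespace run (newlines included) into one space."""
    return re.sub(r"\s+", " ", msgid).strip()


def group_by_normalized(entries):
    """Map each normalized msgid to the references of all its variants."""
    groups = {}
    for reference, msgid in entries:
        groups.setdefault(normalize_msgid(msgid), []).append(reference)
    return groups


entries = [
    ("guide/C/ch_basics.xml:278", "Transaction\n Journal"),
    ("guide/C/ch_basics.xml:1324", "Transaction Journal"),
]
# Both references end up under the single key "Transaction Journal".
print(group_by_normalized(entries))
```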
------
The XML files contain text tags nested inside text tags, and the file is
split into a new msgid/msgstr pair at each "text" tag found inside a "text" tag.
(Long) Example:
# type: Content of: <chapter><sect1><sect2><para>
#: guide/C/ch_basics.xml:180
#, no-wrap
msgid "An"
msgstr ""
# type: Content of: <chapter><sect1><sect2><para>
#: guide/C/ch_basics.xml:180 guide/C/ch_basics.xml:305
#, no-wrap
msgid "account"
msgstr ""
# type: Content of: <chapter><sect1><sect2><para>
#: guide/C/ch_basics.xml:180
#, no-wrap
msgid ""
"is a place for keeping track of\n"
" what you own, owe, spend or receive. Although you only have one main\n"
" data file, that file will contain many accounts. You probably already\n"
" think of money you own or owe as being in an account. For example, at\n"
" some point you opened checking and savings accounts at a particular\n"
" bank, and that bank sends you monthly statements showing how much money\n"
" you"
msgstr ""
# type: Content of: <chapter><sect1><sect2><para><emphasis>
#: guide/C/ch_basics.xml:186
#, no-wrap
msgid "own"
msgstr ""
# type: Content of: <chapter><sect1><sect2><para>
#: guide/C/ch_basics.xml:186
#, no-wrap
msgid ""
"in these accounts. Credit card accounts\n"
" also send you statements showing what you"
msgstr ""
# type: Content of: <chapter><sect1><sect2><para><emphasis>
#: guide/C/ch_basics.xml:187 guide/C/ch_basics.xml:189
#, no-wrap
msgid "owe"
msgstr ""
# type: Content of: <chapter><sect1><sect2><para>
#: guide/C/ch_basics.xml:187
#, no-wrap
msgid ""
"to a\n"
" credit card company, and the mortgage company may send you periodic\n"
" statements showing how much you still"
msgstr ""
# type: Content of: <chapter><sect1><sect2><para>
#: guide/C/ch_basics.xml:189
#, no-wrap
msgid ""
"on your\n"
" loan."
msgstr ""
In this example, the original source reads as follows (tags included for precision):
<para>An <emphasis>account</emphasis> is a place for keeping track of
what you own, owe, spend or receive. Although you only have one main
data file, that file will contain many accounts. You probably already
think of money you own or owe as being in an account. For example, at
some point you opened checking and savings accounts at a particular
bank, and that bank sends you monthly statements showing how much money
you <emphasis>own</emphasis> in these accounts. Credit card accounts
also send you statements showing what you <emphasis>owe</emphasis> to a
credit card company, and the mortgage company may send you periodic
statements showing how much you still <emphasis>owe</emphasis> on your
loan.</para>
In the documentation this example comes from, some tags contain
text or other tags; for instance, <para> can contain the following
tags: <emphasis>, <quote>, <xref>, <guilabel>, <guibutton>, <guimenu>,
<guimenuitem>, etc.
Splitting sentences or paragraphs creates several msgid/msgstr pairs,
with these effects:
- some fragments are identical and could be translated once and
for all!
- I translate into French, where the word order is often reversed
(for example, "un chat noir", literally "a cat black", for "a black cat"),
so splitting sentences and paragraphs makes the translation hard!
[As an aside, I could not use gtranslator or poedit to translate the
documentation: once a msgid/msgstr pair is translated or marked fuzzy,
faulty, etc., it is moved somewhere else in the translation list or
tree, it cannot be re-sorted by line number, and the editors risk
breaking the XML tags. I use KBabel instead, which handles all that
better!]
Questions:
- Is it possible to keep a whole <para> paragraph in a single
msgid/msgstr pair?
- If so, how? Please explain how to do it from the command line or the
configuration file.
- Will the tags be restored after translation (emphasized or quoted
words kept as they are, but translated, etc.)?
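To make the desired result concrete, the sketch below (Python, standard library only, not po4a code; the function name is hypothetical) turns a whole <para> into one message, keeping inline tags such as <emphasis> inside the string so the translator can move them to match French word order, and collapsing layout line breaks. On the po4a side, the Locale::Po4a::Xml man page describes options for classifying tags (e.g. as inline), which is the first thing to check for your po4a version.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def para_as_single_message(para_xml):
    """Return the content of a <para> element as one string, inline tags kept."""
    para = ET.fromstring(para_xml)
    parts = [escape(para.text or "")]  # re-escape &, <, > in plain text
    for child in para:
        tail = child.tail
        child.tail = None  # serialize the child alone, then append its tail
        parts.append(ET.tostring(child, encoding="unicode"))
        child.tail = tail
        parts.append(escape(tail or ""))
    # Line breaks in the source are layout only: collapse them to spaces.
    return re.sub(r"\s+", " ", "".join(parts)).strip()


source = """<para>An <emphasis>account</emphasis> is a place for keeping
track of what you own, owe, spend or receive.</para>"""
# prints: An <emphasis>account</emphasis> is a place for keeping track of what you own, owe, spend or receive.
print(para_as_single_message(source))
```

Putting the translated string back would be the reverse step: parse the translated fragment inside a <para> wrapper, so the tags come back with whatever order the translator chose.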
The po4a documentation is clear overall, but it lacks examples showing
how to write these options in the configuration file (opt:...), I mean
options like those given for Locale::Po4a::Xml in
http://po4a.alioth.debian.org/man/man3pm/Locale::Po4a::Xml.3pm.php
such as wrap and nostrip. Can these be used in a configuration file?
------
Sometimes identical messages are merged into a single msgid/msgstr pair,
but the correct translation actually differs depending on the chapter,
section, paragraph, context, sentence, etc.
Questions:
- Is it possible to split them into separate msgid/msgstr pairs?
- When is the best moment to do so?
- Perhaps it is best to keep them grouped for a single translation, and
split them back into separate msgid/msgstr pairs only when needed; I
think this is probably a problem for the translation software rather
than for po4a.
- In that case, will po4a respect this choice when translating or
updating?
------
In fact, the text in the XML file is split "tag by tag" and "paragraph by
paragraph" (<para> by <para>), but each paragraph contains phrases and
sentences that often recur in the documentation, or at least contain
some keywords, or more precisely some "groups of keywords" or
"key sentences" (groups of words that are always the same, for
example "File -> New" or "Transaction Journal").
The exact same sentence can occur several times in the documentation
to translate, but the msgid/msgstr pair to edit is the full paragraph!
This is not useful at all!
Questions:
- Is it possible to create a keyword list with po4a? Or is that a
problem for external tools (KBabel)?
- How can I tell po4a to split a paragraph into sentences, based on
the period (full stop, dot), colon, semicolon, etc.? The sentences
can contain periods that do not end a sentence (e.g. in $4.5 or in
U.N.), and sometimes the periods are forgotten! This is also a
matter of "correct writing" for the documentation authors: they
should use proper punctuation, vocabulary, etc.
- Another way would be an automatic "routine" that creates such
"groups of keywords" alongside the PO file creation: the computer
would compare all the sentences and sub-sentences of the document to
find all the repeated words or groups of words, so that their
translation need not be repeated again and again! Should this happen
after the msgid/msgstr pairs are created, inside po4a, or with other
software?
16 years
Fuzzing results
by intrigeri
Hi.
Here are the first results of the zzuf[1] vs po4a contest.
Some probably have security-related consequences, but as it seems
nobody uses po4a against untrusted content yet, I guess there is no
problem disclosing these results without delay.
,----
| Test conditions
`----
- a 21M file containing 100 concatenated copies of all the files in my
`/usr/share/common-licenses/`; I had no existing PO file or
translated versions at hand, which renders these tests
quite incomplete.
- po4a 0.34-2 Debian package; the same tests were also run after
replacing the `Text` module with the CVS one (last time I checked,
the core had not been changed in CVS since 0.34-2 was released),
without any significant impact on the results.
- Perl 5.10.0-16
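The exact commands used to build the test input are not given; the Python below is a hypothetical way to produce an equivalent file (every file in a directory, concatenated, repeated 100 times). It reads files in sorted name order and follows symlinks, which may differ from how the original file was made.

```python
from pathlib import Path


def build_corpus(src_dir, out_path, copies=100):
    """Write `copies` concatenated copies of every file in src_dir to out_path.

    Files are read once, in sorted name order; symlinks to files are
    followed. Returns the number of bytes written.
    """
    files = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    blob = b"".join(p.read_bytes() for p in files)
    with open(out_path, "wb") as out:
        for _ in range(copies):
            out.write(blob)
    return len(blob) * copies
```

build_corpus("/usr/share/common-licenses", "LICENSES") would produce the LICENSES master document used in the commands below.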
,----
| po4a-gettextize
`----
Without an explicit input charset, zzuf'ed po4a-gettextize quickly
errors out, complaining that it cannot detect the input charset;
no incomplete file is left on disk.
So I declared the input to be UTF-8, as ikiwiki's po plugin does.
This command line revealed two ways of crashing:
zzuf -vc -s 0:100 -r 0.1:0.5 \
po4a-gettextize -f text -o markdown -M utf-8 -L utf-8 \
-m LICENSES >/dev/null
They are:
Malformed UTF-8 character (UTF-16 surrogate 0xdcc9) in substitution iterator at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.
Malformed UTF-8 character (fatal) at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.
and
Malformed UTF-8 character (UTF-16 surrogate 0xdcec) in substitution (s///) at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.
Malformed UTF-8 character (fatal) at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.
Perl seems to exit cleanly, and an incomplete PO file is left on
disk. I am not sure whether this is a bug in Perl or in Po.pm.
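One defensive option for a caller such as ikiwiki's po plugin, which declares untrusted input as UTF-8, would be to validate it strictly before handing it to po4a. The Python below is a hypothetical sketch of that check; inside Perl, strict decoding with Encode (the 'UTF-8' encoding rather than the lax 'utf8', with Encode::FB_CROAK) should play the same role. Whether to reject or repair such input is a design choice left open here.

```python
def first_invalid_utf8(data):
    """Return (byte_offset, reason) for the first invalid UTF-8 sequence,
    or None if data is well-formed UTF-8.

    Python's strict UTF-8 decoder rejects encoded UTF-16 surrogates
    (U+D800..U+DFFF), the kind of sequence the warnings above report.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# U+DCC9, one of the surrogates reported above, as a lax encoder writes it.
surrogate = b"\xed\xb3\x89"
print(first_invalid_utf8(b"ab" + surrogate + b"cd"))
```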
,----
| po4a-translate
`----
Without an input charset, the behaviour is the same as with
po4a-gettextize, so from now on UTF-8 is specified as the input charset.
The command:
zzuf -cv \
po4a-translate -d -f text -o markdown -M utf-8 -L utf-8 \
-k 0 -m LICENSES -p LICENSES.fr.po -l test.fr
... prints tons of occurrences of the following warnings, but a complete
translated document is written (obviously with some weird characters
inside):
Use of uninitialized value in string ne at /usr/share/perl5/Locale/Po4a/TransTractor.pm line 854.
Use of uninitialized value in string ne at /usr/share/perl5/Locale/Po4a/TransTractor.pm line 840.
Use of uninitialized value in pattern match (m//) at /usr/share/perl5/Locale/Po4a/Po.pm line 1002.
While:
zzuf -cv -s 0:10 -r 0.001:0.3 \
po4a-translate -d -f text -o markdown -M utf-8 -L utf-8 \
-k 0 -m LICENSES -p LICENSES.fr.po -l test.fr
... seems to get stuck at the readpo(LICENSES.fr.po) step, in some
kind of infinite loop, deadlock, or similar beast. It looks like it
could go on using CPU forever, but memory use does not increase.
The choice of format module makes no difference, so this is probably
a bug in po4a's core or in a library it depends on.
The read() sub in TransTractor.pm seems to be a good debugging
starting point.
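Until the hang is understood, a caller that runs the po4a* tools on untrusted input could at least bound their run time. The Python helper below is a hypothetical sketch of that; it only covers the command-line tools. ikiwiki calls Locale::Po4a in-process, where something like Perl's alarm would be needed instead. The 60-second budget in the example is arbitrary.

```python
import subprocess


def run_with_timeout(cmd, seconds):
    """Run cmd and return its exit status, or None if it exceeded `seconds`.

    On timeout, subprocess.run() kills the child before raising
    TimeoutExpired, so no runaway process is left behind.
    """
    try:
        completed = subprocess.run(cmd, timeout=seconds,
                                   stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


# Example (needs po4a installed):
# run_with_timeout(["po4a-translate", "-f", "text", "-o", "markdown",
#                   "-M", "utf-8", "-L", "utf-8", "-k", "0", "-m", "LICENSES",
#                   "-p", "LICENSES.fr.po", "-l", "test.fr"], 60)
```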
,----
| msgmerge
`----
Although msgmerge is not part of po4a, it is used by some po4a*
command-line tools, so you may be interested to hear that I did not
manage to crash it with zzuf. That seems odd to me, so I'll try harder.
[1] http://caca.zoy.org/wiki/zzuf
Bye,
--
intrigeri <intrigeri(a)boum.org>
| gnupg key @ https://gaffer.ptitcanardnoir.org/intrigeri/intrigeri.asc
| The impossible just takes a bit longer.
16 years, 1 month