On 2012/9/27 Anders Nawroth wrote:
Hi!
I have the following two documents:
The original:
What is a Graph Database?
=========================
The translation:
Qu'est-ce qu'une base de données graphe?
========================================
Running po4a-gettextize on them gives:
po4a gettextization: Structure disparity between original and translated files:
msgid (at target/original/src/dummy.asciidoc:2) is of type 'Title =' while
msgstr (at src/dummy.asciidoc:2) is of type 'Plain text'.
Original text: What is a Graph Database?
Translated text: Qu'est-ce qu'une base de données graphe?
[...]
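Indeed, this is due to the accented character in the translated title.
It seems that length() returns the number of bytes rather than the number
of characters, so the title presumably no longer matches the length of its
underline and gets parsed as plain text.
I looked at Unicode issues with Perl a very long time ago and do not
remember its quirks; if anyone has a clue, please tell ;-)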
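
To illustrate the suspicion, here is a minimal sketch. It is not taken
from the po4a code: the variable names are made up, and it assumes the
translated title reaches Perl as undecoded UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the translated title as raw UTF-8 bytes, as it would
# look when read from a file without any encoding layer ("\xc3\xa9" is "é").
my $title_bytes = "Qu'est-ce qu'une base de donn\xc3\xa9es graphe?";

# The same title decoded into a Perl character string.
my $title_chars = decode('UTF-8', $title_bytes);

# The underline from the translated document (40 '=' signs).
my $underline = '=' x 40;

print length($title_bytes), "\n";   # 41: "é" counts as two bytes
print length($title_chars), "\n";   # 40: "é" counts as one character
print length($underline), "\n";     # 40
```

On the undecoded string, length() is one more than the length of the
underline; on the decoded string the two agree. Whether po4a actually
handles its input as undecoded bytes at this point is an assumption here.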
Denis