Hi,
On Thursday 27 Sep 2012 at 08:24:46 (-0400), David Prévot wrote:
On 27/09/2012 07:55, D. Barbier wrote:
> Indeed, this is due to accented characters.
> It seems that length() returns the number of bytes and not characters.
> I looked at Unicode issues with Perl a very long time ago and do not
> remember about its quirks; if anyone has a clue, please tell ;-)
Thomas, CCed, helped us a lot for the DPNhtml2mail script [0], and
managed to make that work.
Not really me: it was Ryuunosuke Ayanokouzi. He helped a lot, since
for Japanese the problem is even worse than for accented characters.
I will just point out the portion of code that gets the real length
(the display width, in columns) of a UTF-8 string:
use Unicode::GCString;

# Return the display width of a string, in columns.
sub columns {
    return Unicode::GCString->new(shift)->columns();
}
After that, you can use the columns function instead of length.
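As a rough illustration of how this can be used for alignment, here is a minimal sketch. The pad_to helper and the sample strings are hypothetical, made up for this example. It assumes that Unicode::GCString (from CPAN) is installed and that the strings are already decoded character strings.

```perl
use strict;
use warnings;
use utf8;                                  # literals below are character strings
use Unicode::GCString;

binmode(STDOUT, ':encoding(UTF-8)');

# Same function as above: display width of a string, in columns.
sub columns {
    return Unicode::GCString->new(shift)->columns();
}

# Hypothetical helper: pad a string with spaces up to a display width.
sub pad_to {
    my ($text, $width) = @_;
    my $missing = $width - columns($text);
    return $text . (' ' x ($missing > 0 ? $missing : 0));
}

# The closing bar should line up for all three strings.
print pad_to($_, 12), "|\n" for ('length', 'café', '日本語');
```

Padding by length instead would likely misalign the Japanese line, since wide characters take more than one column each.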
The rest of the code made sure that the string was not empty and was
encoded in UTF-8, converting it to UTF-8 if it was not. Depending on
the application, you may not need it.
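For reference, that kind of check can be sketched as below. This is not the code from the DPNhtml2mail script: to_characters is a hypothetical name, and it simply assumes that any input not already flagged as a character string is made of UTF-8 bytes. It only uses core Perl (Encode), and it also shows why length() was counting bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return a decoded character string,
# or an empty string for undefined or empty input.
sub to_characters {
    my ($text) = @_;
    return '' unless defined $text && length $text;
    # Already flagged as a character string: leave it alone.
    return $text if utf8::is_utf8($text);
    # Assumption: anything else is raw UTF-8 bytes, so decode it.
    return decode('UTF-8', $text);
}

my $bytes = "caf\xC3\xA9";                   # "café" as raw UTF-8 bytes
print length($bytes), "\n";                  # 5: bytes
print length(to_characters($bytes)), "\n";   # 4: characters
```

The utf8::is_utf8 test is only a heuristic on Perl's internal flag; if your input may come in other encodings, you would need a real detection step here.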
Best regards,
Thomas
PS: keep me in CC if you still want my input, as I am not subscribed
to the Po4a-devel list.