On 2012/9/27 D. Barbier wrote:
On 2012/9/27 David Prévot wrote:
> Hi,
>
> On 27/09/2012 07:55, D. Barbier wrote:
>
>> Indeed, this is due to accented characters.
>> It seems that length() returns the number of bytes, not characters.
>> I looked at Unicode issues with Perl a very long time ago and do not
>> remember its quirks; if anyone has a clue, please tell ;-)
>
> Thomas (CCed) helped us a lot with the DPNhtml2mail script [0] and
> managed to make it work.
>
> 0: http://anonscm.debian.org/viewvc/publicity/dpn/scripts/DPNhtml2mail.pl?vi...
>
> I guess the magic happens at the end of the following code:
>
> use Encode qw(decode_utf8);
> use Unicode::GCString;
>
> # number of columns of a string
> sub _columns {
>     my $str = shift;
>
>     return 0 if ( !defined $str || $str eq '' );
>
>     $str = decode_utf8($str) unless utf8::is_utf8($str);
>     return Unicode::GCString->new($str)->columns();
> }

Thanks David,
This seems to be different: you are computing the string width, whereas
I need the number of characters.
I believe all we need is to add an ":encoding(foo)" layer when
opening the file for reading; the encoding must be specified and is
thus known.
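A minimal sketch of the bytes-versus-characters difference, assuming UTF-8 input (the helper `read_first_line` and the temporary file are hypothetical, for illustration only): `length()` counts bytes on an undecoded string and characters once the data has been decoded, either explicitly with `Encode::decode_utf8` or by an `:encoding(UTF-8)` layer on the filehandle.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);
use File::Temp qw(tempfile);

# "é" (U+00E9) is one character but two bytes in UTF-8.
my $bytes = encode_utf8("caf\x{e9}");   # raw bytes, as read without a layer
my $chars = decode_utf8($bytes);         # decoded character string

# prints: bytes: 5, characters: 4
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);

# Hypothetical helper: read the first line through an :encoding layer,
# so the returned string is already decoded into characters.
sub read_first_line {
    my ($path, $encoding) = @_;
    open my $fh, "<:encoding($encoding)", $path
        or die "cannot open $path: $!";
    my $line = <$fh>;
    close $fh;
    chomp $line if defined $line;
    return $line;
}

# Write the same raw bytes to a temporary file, then read them back decoded.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} $bytes, "\n";
close $tmp;

my $line = read_first_line($path, 'UTF-8');
printf "length via :encoding layer: %d\n", length($line);
```

Note that this counts characters, not display columns; the two coincide for most Latin text but not for double-width East Asian characters.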

Hello,

I was wrong: we need the text width. I checked in some commits that
use Unicode::GCString when it is available; thanks to both of you for
the help. The downside is that there is one new string in the PO files.
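Loading Unicode::GCString only when it is installed can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from those commits: the `text_width` name and the fallback to `length()` are hypothetical choices, and undecoded input is assumed to be UTF-8, as in the `_columns` snippet quoted above. The fallback counts characters rather than columns, so it undercounts double-width East Asian characters.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Try to load Unicode::GCString at run time; remember whether it worked.
my $HAVE_GCSTRING = eval { require Unicode::GCString; 1 } ? 1 : 0;

# Hypothetical helper: display width of a string.
# With Unicode::GCString, columns() accounts for wide characters;
# without it, fall back to the character count (assumption).
sub text_width {
    my ($str) = @_;
    return 0 if !defined $str || $str eq '';
    # Assume undecoded input is UTF-8 bytes.
    $str = decode_utf8($str) unless utf8::is_utf8($str);
    return $HAVE_GCSTRING
        ? Unicode::GCString->new($str)->columns()
        : length $str;
}

printf "GCString available: %s\n", $HAVE_GCSTRING ? 'yes' : 'no';
printf "width: %d\n", text_width(encode_utf8("caf\x{e9}"));
```

Using `require` inside `eval` keeps the module optional at run time instead of making it a hard dependency.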
Denis