how to convert from gbk encoding to utf-8 encoding in Perl

I have a simple question which I do not know how to solve in Perl. I know how to convert from utf-8 to GBK, for example, from e4b8ad to d6d0. But I am not sure how to go backward, i.e. given d6d0, how do I know e4b8ad.
Please enlighten me! Many thanks.

When you have hex digits, pack is your friend. Following is a REPL session. Notes:
To reverse the direction, pack the hex digits into octets, decode from GB octets to character string, encode character string to UTF-8 octets, unpack octets into hex digits.
GBK is superseded. Use of GB18030 (provided by Encode::HanExtra in Perl) has been mandatory for five years already.
$ use Encode qw(decode encode); use Encode::HanExtra; use Devel::Peek qw(Dump);
$ 'e4b8ad'
e4b8ad # hex digits
$ pack('H*', 'e4b8ad')
中
$ Dump(pack('H*', 'e4b8ad'))
SV = PV(0x3657680) at 0x36b7188
REFCNT = 1
FLAGS = (PADTMP,POK,pPOK)
PV = 0x36c0768 "\344\270\255"\0 # octets of UTF-8 encoded data
CUR = 3
LEN = 8
$ decode('UTF-8', pack('H*', 'e4b8ad'))
中
$ Dump(decode('UTF-8', pack('H*', 'e4b8ad')))
SV = PV(0x326c3a0) at 0x36a50c8
REFCNT = 1
FLAGS = (TEMP,POK,pPOK,UTF8)
PV = 0x3698a48 "\344\270\255"\0 [UTF8 "\x{4e2d}"] # character string
CUR = 3
LEN = 8
$ encode('GB18030', decode('UTF-8', pack('H*', 'e4b8ad')))
"\xd6\xd0"
$ Dump(encode('GB18030', decode('UTF-8', pack('H*', 'e4b8ad'))))
SV = PV(0x36a2da0) at 0x36b6d98
REFCNT = 1
FLAGS = (TEMP,POK,pPOK)
PV = 0x36db3e8 "\326\320"\0 # octets of GB18030 encoded data
CUR = 2
LEN = 8
$ unpack('H*', encode('GB18030', decode('UTF-8', pack('H*', 'e4b8ad'))))
d6d0 # hex digits
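The reverse direction described in the notes can be sketched as a small script. This is only an illustration: the helper name gbk_hex_to_utf8_hex is made up, and it uses the cp936 encoding that ships with core Encode (an assumption made so the sketch runs without Encode::HanExtra; swap in 'GB18030' once that module is installed).

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: hex digits of GBK octets in, hex digits of UTF-8 octets out.
sub gbk_hex_to_utf8_hex {
    my ($hex) = @_;
    my $octets = pack('H*', $hex);                            # hex digits -> octets
    my $chars  = decode('cp936', $octets, Encode::FB_CROAK);  # octets -> character string
    return unpack('H*', encode('UTF-8', $chars));             # characters -> UTF-8 octets -> hex digits
}

my $utf8_hex = gbk_hex_to_utf8_hex('d6d0');
print "$utf8_hex\n";    # e4b8ad
```

Encode::FB_CROAK makes decode die on malformed input instead of silently substituting a replacement character, which is usually what you want when checking conversions by hand.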

The answer to the question asked:
use Encode qw( from_to );
my $gbk = "\xD6\xD0";
from_to(my $utf8 = $gbk, 'GB18030', 'UTF-8'); # E4 B8 AD
or
use Encode qw( decode encode );
my $gbk = "\xD6\xD0";
my $utf8 = encode('UTF-8', decode('GB18030', $gbk)); # E4 B8 AD
However, a more normal flow looks like the following:
open(my $fh_in, '<:encoding(GB18030)', ...) or die ...;
open(my $fh_out, '>:encoding(UTF-8)', ...) or die ...;
while (<$fh_in>) {
...
print $fh_out ...;
...
}
Encode::HanExtra must be installed for Encode to find the encoding.
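As a rough, runnable version of the file-handle flow above, the following sketch uses in-memory filehandles in place of real files so it can be tried without any input data. The variable names are made up, and cp936 from core Encode stands in for GB18030 (an assumption, so that Encode::HanExtra is not required).

```perl
use strict;
use warnings;

my $gbk_bytes  = "\xD6\xD0\n";   # GBK octets standing in for an input file
my $utf8_bytes = '';             # receives the UTF-8 octets of the "output file"

# The input layer decodes GBK octets into characters as lines are read.
open(my $fh_in, '<:encoding(cp936)', \$gbk_bytes)
    or die "Cannot open input: $!";
# The output layer encodes characters into UTF-8 octets as they are printed.
open(my $fh_out, '>:encoding(UTF-8)', \$utf8_bytes)
    or die "Cannot open output: $!";

while (my $line = <$fh_in>) {
    print {$fh_out} $line;       # $line is decoded text here
}
close($fh_out) or die "Cannot close output: $!";
close($fh_in);

my $hex = unpack('H*', $utf8_bytes);
print "$hex\n";    # e4b8ad0a
```

With real files, replace the scalar references with file names; the loop body stays the same.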

use Encode qw/encode decode/;
$utf8 = decode("euc-cn", $euc_cn); # ditto
You can also normally specify the encoding when you open a filehandle, and it will perform the necessary conversions.
Works like a charm:
perl -e 'open(X,">","/tmp/x"); print X chr(0xd6).chr(0xd0);close(X)'
perl -mEncode -e 'open(X,"<","/tmp/x"); $x=<X>; print Encode::decode("euc-cn",$x);' > /tmp/xx

Related

Perl: Packing a sequence of bytes into a string

I'm trying to run a simple test whereby I want to have differently formatted binary strings and print them out. In fact, I'm trying to investigate a problem whereby sprintf cannot deal with a wide-character string passed in for the placeholder %s.
In this case, the binary string shall just contain the Cyrillic "д" (because it's above ISO-8859-1)
The code below works when I use the character directly in the source.
But nothing that passes through pack works.
For the UTF-8 case, I need to set the UTF-8 flag on the string $ch, but how?
The UCS-2 case fails, and I suppose it's because there is no way for Perl to tell UCS-2 from ISO-8859-1, so that test is probably bollocks, right?
The code:
#!/usr/bin/perl
use utf8; # Meaning "This lexical scope (i.e. file) contains utf8"
# https://perldoc.perl.org/open.html
use open qw(:std :encoding(UTF-8));
sub showme {
my ($name,$ch) = @_;
print "-------\n";
print "This is test: $name\n";
my $ord = ord($ch); # ordinal computed outside of "use bytes"; actually should yield the unicode codepoint
{
# https://perldoc.perl.org/bytes.html
use bytes;
my $mark = (utf8::is_utf8($ch) ? "yes" : "no");
my $txt = sprintf("Received string of length: %i byte, contents: %vd, ordinal x%04X, utf-8: %s\n", length($ch), $ch, $ord, $mark);
print $txt,"\n";
}
print $ch, "\n";
print "Combine: $ch\n";
print "Concat: " . $ch . "\n";
print "Sprintf: " . sprintf("%s",$ch) . "\n";
print "-------\n";
}
showme("Cryillic direct" , "д");
showme("Cyrillic UTF-8" , pack("HH","D0","B4")); # UTF-8 of д is D0B4
showme("Cyrillic UCS-2" , pack("HH","04","34")); # UCS-2 of д is 0434
Current output:
Looks good
-------
This is test: Cryillic direct
Received string of length: 2 byte, contents: 208.180, ordinal x0434, utf-8: yes
д
Combine: д
Concat: д
Sprintf: д
-------
That's a no. Where does the 176 come from??
-------
This is test: Cyrillic UTF-8
Received string of length: 2 byte, contents: 208.176, ordinal x00D0, utf-8: no
Ð°
Combine: Ð°
Concat: Ð°
Sprintf: Ð°
-------
This is even worse.
-------
This is test: Cyrillic UCS-2
Received string of length: 2 byte, contents: 0.48, ordinal x0000, utf-8: no
0
Combine: 0
Concat: 0
Sprintf: 0
-------

You have two problems.
Your calls to pack are incorrect
Each H represents one hex digit.
$ perl -e'printf "%vX\n", pack("HH", "D0", "B4")' # XXX
D0.B0
$ perl -e'printf "%vX\n", pack("H2H2", "D0", "B4")' # Ok
D0.B4
$ perl -e'printf "%vX\n", pack("(H2)2", "D0", "B4")' # Ok
D0.B4
$ perl -e'printf "%vX\n", pack("(H2)*", "D0", "B4")' # Better
D0.B4
$ perl -e'printf "%vX\n", pack("H*", "D0B4")' # Alternative
D0.B4
STDOUT is expecting decoded text, but you are providing encoded text
First, let's take a look at strings you are producing (once the problem mentioned above is fixed). All you need for that is the %vX format, which provides the period-separated value of each character in hex.
"д" produces a one-character string. This character is the Unicode Code Point for д.
$ perl -e'use utf8; printf("%vX\n", "д");'
434
pack("H*", "D0B4") produces a two-character string. These characters are the UTF-8 encoding of д.
$ perl -e'printf("%vX\n", pack("H*", "D0B4"));'
D0.B4
pack("H*", "0434") produces a two-character string. These characters are the UCS-2be and UTF-16be encodings of д.
$ perl -e'printf("%vX\n", pack("H*", "0434"));'
4.34
Normally, a file handle expects a string of bytes (characters with values in 0..255) to be printed to it. These bytes are output verbatim.[1][2]
When an encoding layer (e.g. :encoding(UTF-8)) is added to a file handle, it expects a string of Unicode Code Points (aka decoded text) to be printed to it instead.
Your program adds an encoding layer to STDOUT (through its use of the use open pragma), so you must provide UCP (decoded text) to print and say. You can obtain decoded text from encoded text using, for example, Encode's decode function.
use utf8;
use open qw( :std :encoding(UTF-8) );
use feature qw( say );
use Encode qw( decode );
say "д"; # ok (UCP of "д")
say pack("H*", "D0B4"); # XXX (UTF-8 encoding of "д")
say pack("H*", "0434"); # XXX (UCS-2be and UTF-16be encoding of "д")
say decode("UTF-8", pack("H*", "D0B4")); # ok (UCP of "д")
say decode("UCS-2be", pack("H*", "0434")); # ok (UCP of "д")
say decode("UTF-16be", pack("H*", "0434")); # ok (UCP of "д")
For the UTF-8 case, I need to set the UTF-8 flag on the string $ch
No, you need to decode the strings.
The UTF-8 flag is irrelevant. Whether the flag is set or not originally is irrelevant. Whether the flag is set or not after the string is decoded is irrelevant. The flag indicates how the string is stored internally, something you shouldn't care about.
For example, take
use strict;
use warnings;
use open qw( :std :encoding(UTF-8) );
use feature qw( say );
my $x = chr(0xE9);
utf8::downgrade($x); # Tell Perl to use the UTF8=0 storage format.
say sprintf "%s %vX %s", utf8::is_utf8($x) ? "UTF8=1" : "UTF8=0", $x, $x;
utf8::upgrade($x); # Tell Perl to use the UTF8=1 storage format.
say sprintf "%s %vX %s", utf8::is_utf8($x) ? "UTF8=1" : "UTF8=0", $x, $x;
It outputs
UTF8=0 E9 é
UTF8=1 E9 é
Regardless of the UTF8 flag, the UTF-8 encoding (C3 A9) of the provided UCP (U+00E9) is output.
I suppose it's because there is no way for Perl to tell UCS-2 from ISO-8859-1, so that test is probably bollocks, right?
At best, one could employ heuristics to guess whether a string is encoded using iso-latin-1 or UCS-2be. I suspect one could get rather accurate results (like those you'd get for iso-latin-1 and UTF-8.)
I'm not sure why you bring up iso-latin-1 since nothing else in your question relates to iso-latin-1.
[1] Except on Windows, where a :crlf layer is added to handles by default.
[2] You get a Wide character warning if you provide a string that contains a character that's not a byte, and the utf8 encoding of the string is output instead.

Please see if the following demonstration code is of any help.
use strict;
use warnings;
use feature 'say';
use utf8; # https://perldoc.perl.org/utf8.html
use Encode; # https://perldoc.perl.org/Encode.html
my $str;
my $utf8 = 'Привет Москва';
my $ucs2le = '1f044004380432043504420420001c043e0441043a0432043004'; # Little Endian
my $ucs2be = '041f044004380432043504420020041c043e0441043a04320430'; # Big Endian
my $utf16 = '041f044004380432043504420020041c043e0441043a04320430';
my $utf32 = '0000041f0000044000000438000004320000043500000442000000200000041c0000043e000004410000043a0000043200000430';
# https://perldoc.perl.org/functions/binmode.html
binmode STDOUT, ':utf8';
# https://perldoc.perl.org/feature.html#The-'say'-feature
say 'UTF-8: ' . $utf8;
# https://perldoc.perl.org/Encode.html#THE-PERL-ENCODING-API
$str = pack('H*',$ucs2be);
say 'UCS-2BE: ' . decode('UCS-2BE',$str);
$str = pack('H*',$ucs2le);
say 'UCS-2LE: ' . decode('UCS-2LE',$str);
$str = pack('H*',$utf16);
say 'UTF-16: '. decode('UTF16',$str);
$str = pack('H*',$utf32);
say 'UTF-32: ' . decode('UTF32',$str);
Output
UTF-8: Привет Москва
UCS-2BE: Привет Москва
UCS-2LE: Привет Москва
UTF-16: Привет Москва
UTF-32: Привет Москва
Supported Cyrillic encodings
use strict;
use warnings;
use feature 'say';
use Encode;
use utf8;
binmode STDOUT, ':utf8';
my $utf8 = 'Привет Москва';
my @encodings = qw/UCS-2 UCS-2LE UCS-2BE UTF-16 UTF-32 ISO-8859-5 CP855 CP1251 KOI8-F KOI8-R KOI8-U/;
say '
:: Supported Cyrillic encoding
---------------------------------------------
UTF-8 ', $utf8;
for (@encodings) {
printf "%-11s %s\n", $_, unpack('H*', encode($_,$utf8));
}
Output
:: Supported Cyrillic encoding
---------------------------------------------
UTF-8 Привет Москва
UCS-2 041f044004380432043504420020041c043e0441043a04320430
UCS-2LE 1f044004380432043504420420001c043e0441043a0432043004
UCS-2BE 041f044004380432043504420020041c043e0441043a04320430
UTF-16 feff041f044004380432043504420020041c043e0441043a04320430
UTF-32 0000feff0000041f0000044000000438000004320000043500000442000000200000041c0000043e000004410000043a0000043200000430
ISO-8859-5 bfe0d8d2d5e220bcdee1dad2d0
CP855 dde1b7eba8e520d3d6e3c6eba0
CP1251 cff0e8e2e5f220cceef1eae2e0
KOI8-F f0d2c9d7c5d420edcfd3cbd7c1
KOI8-R f0d2c9d7c5d420edcfd3cbd7c1
KOI8-U f0d2c9d7c5d420edcfd3cbd7c1
Documentation Encode::Supported

Both are good answers. Here is a slight extension of Polar Bear's code to print details about the string:
use strict;
use warnings;
use feature 'say';
use utf8;
use Encode;
sub about {
my($str) = @_;
# https://perldoc.perl.org/bytes.html
my $charlen = length($str);
my $txt;
{
use bytes;
my $mark = (utf8::is_utf8($str) ? "yes" : "no");
my $bytelen = length($str);
$txt = sprintf("Length: %d byte, %d chars, utf-8: %s, contents: %vd\n",
$bytelen,$charlen,$mark,$str);
}
return $txt;
}
my $str;
my $utf8 = 'Привет Москва';
my $ucs2le = '1f044004380432043504420420001c043e0441043a0432043004'; # Little Endian
my $ucs2be = '041f044004380432043504420020041c043e0441043a04320430'; # Big Endian
my $utf16 = '041f044004380432043504420020041c043e0441043a04320430';
my $utf32 = '0000041f0000044000000438000004320000043500000442000000200000041c0000043e000004410000043a0000043200000430';
binmode STDOUT, ':utf8';
say 'UTF-8: ' . $utf8;
say about($utf8);
{
my $str = pack('H*',$ucs2be);
say 'UCS-2BE: ' . decode('UCS-2BE',$str);
say about($str);
}
{
my $str = pack('H*',$ucs2le);
say 'UCS-2LE: ' . decode('UCS-2LE',$str);
say about($str);
}
{
my $str = pack('H*',$utf16);
say 'UTF-16: '. decode('UTF16',$str);
say about($str);
}
{
my $str = pack('H*',$utf32);
say 'UTF-32: ' . decode('UTF32',$str);
say about($str);
}
# Try identity transcoding
{
my $str_encoded_in_utf16 = encode('UTF16',$utf8);
my $str = decode('UTF16',$str_encoded_in_utf16);
say 'The same: ' . $str;
say about($str);
}
Running this gives:
UTF-8: Привет Москва
Length: 25 byte, 13 chars, utf-8: yes, contents: 208.159.209.128.208.184.208.178.208.181.209.130.32.208.156.208.190.209.129.208.186.208.178.208.176
UCS-2BE: Привет Москва
Length: 26 byte, 26 chars, utf-8: no, contents: 4.31.4.64.4.56.4.50.4.53.4.66.0.32.4.28.4.62.4.65.4.58.4.50.4.48
UCS-2LE: Привет Москва
Length: 26 byte, 26 chars, utf-8: no, contents: 31.4.64.4.56.4.50.4.53.4.66.4.32.0.28.4.62.4.65.4.58.4.50.4.48.4
UTF-16: Привет Москва
Length: 26 byte, 26 chars, utf-8: no, contents: 4.31.4.64.4.56.4.50.4.53.4.66.0.32.4.28.4.62.4.65.4.58.4.50.4.48
UTF-32: Привет Москва
Length: 52 byte, 52 chars, utf-8: no, contents: 0.0.4.31.0.0.4.64.0.0.4.56.0.0.4.50.0.0.4.53.0.0.4.66.0.0.0.32.0.0.4.28.0.0.4.62.0.0.4.65.0.0.4.58.0.0.4.50.0.0.4.48
The same: Привет Москва
Length: 25 byte, 13 chars, utf-8: yes, contents: 208.159.209.128.208.184.208.178.208.181.209.130.32.208.156.208.190.209.129.208.186.208.178.208.176
And a little diagram I made as an overview for next time, covering encode, decode and pack. Because one better be ready for next time.
(The above diagram & its graphml file available here)

how to print the result from pack function?

I like to verify what pack does. I have the following code to give it a try.
$bits = pack 'N','134744072';
how to print bits ?
I did the following:
printf ("bits = %032b \n", $bits);
but it does not work.
Thanks !!

If you want the binary representation of a number, use
my $num = 134744072;
printf("bits = %032b\n", $num);
If you want the binary representation of a string of bytes, use
my $bytes = pack('N', 134744072);
printf("bits = %s\n", unpack('B*', $bytes));
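To make the packed bytes easier to read, the same value can be shown as hex digits and as one group of eight bits per byte. This is a small sketch; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $bytes = pack('N', 134744072);                     # 32-bit big-endian integer, 4 bytes

my $hex     = unpack('H*', $bytes);                   # two hex digits per byte
my $grouped = join ' ', unpack('(B8)*', $bytes);      # eight bits per byte, space-separated

print "hex  = $hex\n";        # hex  = 08080808
print "bits = $grouped\n";    # bits = 00001000 00001000 00001000 00001000
```

The (B8)* template returns one bit string per byte, which is why join can put a separator between them; B* would return a single 32-character string.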

The Devel::Peek module (which comes with Perl) allows you to examine Perl's representation of the variable. This is probably more useful than just a raw print when you're dealing with binary data rather than printable character strings.
#!/usr/bin/perl
use strict;
use warnings;
use Devel::Peek qw(Dump);
my $bits = pack 'N','134744072';
Dump($bits);
Which produces output like this:
SV = PV(0xaedb20) at 0xb15650
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0xb06630 "\10\10\10\10"\0
CUR = 4
LEN = 10
The 'SV' at the beginning indicates that this is a dump of a 'scalar value' (as opposed to say an array or a hash value).
The 'SV = PV' indicates that this scalar contains a string of bytes (as opposed to say an integer or floating point value).
The 'PV = 0xb06630' is the pointer to where those bytes are located.
The "\10\10\10\10"\0 is probably the bit you're interested in. The double quoted string represents the bytes making up the contents of this string.
Inside the string, you would typically see the bytes interpreted as if they were ASCII, so the byte 65 decimal would appear as 'A'. All non-printable characters are displayed in octal with a preceding \.
So your $bits variable contains 4 bytes, each octal '10' which is hex 0x08.
The LEN and CUR are telling you that Perl allocated 10 bytes of storage and is currently using 4 of them (so length($bits) would return 4).
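If Devel::Peek is more detail than needed, the octal-escaped view of the PV line can be approximated by hand. The sketch below is an illustration only: unlike Dump it escapes every byte, printable or not.

```perl
use strict;
use warnings;

my $bits = pack('N', 134744072);

# Render each byte as a backslash followed by its octal value.
my $escaped = join '', map { sprintf('\\%o', ord($_)) } split(//, $bits);

print "PV-like view: \"$escaped\"\n";                 # PV-like view: "\10\10\10\10"
print "length (what CUR reports): ", length($bits), "\n";    # 4
```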

Using Term::ReadLine with Unicode input

I am trying to figure out how to read Unicode input from the terminal using Term::ReadLine. It turns out, if I enter a Unicode character at the prompt, the returned string varies depending on various settings. (I am running Ubuntu 14.10, and have installed Term::ReadLine::Gnu). For example (p.pl):
use open qw( :std :utf8 );
use strict;
use warnings;
use Devel::Peek;
use Term::ReadLine;
my $term = Term::ReadLine->new('ProgramName');
$term->ornaments( 0 );
my $ans = $term->readline("Enter message: ");
Dump ( $ans );
Running p.pl and typing å at the prompt gives output:
Enter message: å
SV = PV(0x83a5a0) at 0x87c080
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x917500 "\303\245"\0
CUR = 2
LEN = 10
So the returned string $ans has not set the UTF-8 flag. However, if I run the program using perl -CS p.pl, the output is:
Enter message: å
SV = PVMG(0x24c12e0) at 0x23050a0
REFCNT = 1
FLAGS = (PADMY,POK,pPOK,UTF8)
IV = 0
NV = 0
PV = 0x248faf0 "\303\245"\0 [UTF8 "\x{e5}"]
CUR = 2
LEN = 10
Now the UTF-8 flag is correctly set on $ans. So the first question is: Why is command line option -CS different from using the pragma use open qw( :std :utf8 )?
Next, I tested Term::ReadLine::Stub with -CS option:
$ PERL_RL=Stub perl -CS p.pl
the output is now:
Enter message: å
SV = PV(0xf97260) at 0xfd90c8
REFCNT = 1
FLAGS = (PADMY,POK,pPOK,UTF8)
PV = 0x10746e0 "\303\203\302\245"\0 [UTF8 "\x{c3}\x{a5}"]
CUR = 4
LEN = 10
and the output string $ans has been doubly encoded, so the output is corrupted. Is this a bug, or is it expected behavior?

As explained by Denis Ibaev in his answer, the problem is that Term::ReadLine does not read STDIN, it opens a new input filehandle. As an alternative to calling binmode($term->IN, ':utf8'), it turns out one can make either of command line option -CS or use open qw( :std :utf8) work out of the box with Term::ReadLine by supplying STDIN as an argument to Term::ReadLine->new(), as explained in the answer to this question: Term::Readline: encoding-question.
For example:
use strict;
use utf8;
use open qw( :std :utf8 );
use warnings;
use Term::ReadLine;
my $term = Term::ReadLine->new('Test', \*STDIN, \*STDOUT);
my $answer = $term->readline( 'Enter input: ' );

Term::ReadLine does not read STDIN; it opens a new filehandle. And so use open qw(:std :utf8); has no effect.
You need to do something like this:
my $term = Term::ReadLine->new('name');
binmode($term->IN, ':utf8');
Update about -CS:
The -C option sets the magic variable ${^UNICODE}. The -CS (or -CI) option makes the expression ${^UNICODE} & 0x0001 true, and Term::ReadLine turns the UTF-8 flag on for the input string if ${^UNICODE} & 0x0001 is true.
Note that the -CS option is different from binmode($term->IN, ':utf8'): the first only sets the UTF-8 flag, while the second decodes the input.
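The difference between decoding the input once and encoding it a second time can be reproduced without a terminal. The sketch below is only an illustration, not what Term::ReadLine does internally: it starts from the two octets shown in the first Dump and compares a correct decode with the double encoding seen in the Stub output.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $raw = "\xC3\xA5";                  # UTF-8 octets of "å" as they arrive from the terminal

# Correct: interpret the octets as UTF-8, giving one character (U+00E5).
my $decoded = decode('UTF-8', $raw);

# Wrong: treat each octet as a character of its own and encode again.
my $double = encode('UTF-8', $raw);

my $decoded_view = sprintf('%vX', $decoded);
my $double_view  = sprintf('%vX', $double);

print "decoded:        $decoded_view\n";    # decoded:        E5
print "double-encoded: $double_view\n";     # double-encoded: C3.83.C2.A5
```

The second result matches the octets \303\203\302\245 in the Dump of the Stub case.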

View Perl Variables as Bytes/Bits

Disclaimer: It's been ages since I've done any perl, so if I'm asking/saying something stupid please correct me.
Is it possible to view a byte/bit representation of a perl variable? That is, if I say something like
my $foo = 'a';
I know (think?) the computer sees $foo as something like
0b1100010
Is there a way to get perl to print out the binary representation of a variable?
(Not asking for any practical purpose, just tinkering around with a old friend and trying to understand it more deeply than I did in 1997)

Sure, using unpack:
print unpack "B*", $foo;
Example:
% perl -e 'print unpack "B*", "bar";'
011000100110000101110010
The perldoc pages for pack and perlpacktut give a nice overview about converting between different representations.

The place to start if you want the actual internals is a document called "perlguts". Either perldoc perlguts or read it here: http://perldoc.perl.org/perlguts.html

After seeing the way that Andy interpreted your question, I can follow up by saying that Devel::Peek has a Dump function which can show the internal representation of a variable. It won't take it to the binary level, but if what you are interested in is the internals, you might look at this.
$ perl -MDevel::Peek -e 'my $foo="a";Dump $foo';
SV = PV(0x7fa8a3004e78) at 0x7fa8a3031150
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0x7fa8a2c06190 "a"\0
CUR = 1
LEN = 16
$ perl -MDevel::Peek -e 'my %bar=(x=>"y",a=>"b");Dump \%bar'
SV = IV(0x7fbc5182d6e8) at 0x7fbc5182d6f0
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x7fbc51831168
SV = PVHV(0x7fbc5180c268) at 0x7fbc51831168
REFCNT = 2
FLAGS = (PADMY,SHAREKEYS)
ARRAY = 0x7fbc5140f9f0 (0:6, 1:2)
hash quality = 125.0%
KEYS = 2
FILL = 2
MAX = 7
RITER = -1
EITER = 0x0
Elt "a" HASH = 0xca2e9442
SV = PV(0x7fbc51804f78) at 0x7fbc51807340
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x7fbc5140fa60 "b"\0
CUR = 1
LEN = 16
Elt "x" HASH = 0x9303a5e5
SV = PV(0x7fbc51804e78) at 0x7fbc518070d0
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x7fbc514061a0 "y"\0
CUR = 1
LEN = 16

And one more way:
printf "%v08b\n", 'abc';
output:
01100001.01100010.01100011
(The v flag is a perl-only printf/sprintf feature and also works with numeric formats other than b.)
This differs from the unpack suggestion where there are characters greater than "\xff": unpack will only return the 8 low bits (with a warning), printf '%v...' will show all the bits:
$ perl -we'printf "%vX\n", "\cA\13P\x{1337}"'
1.B.50.1337

You can use ord to return the numeric value of a character, and printf with a %b format to display that value in binary.
printf "%08b\n", ord 'a';
output
01100001

How do I unpack a double-precision value in Perl?

From this question:
bytearray - Perl pack/unpack and length of binary string - Stack Overflow
I've learned that @unparray = unpack("d "x5, $aa); in the snippet below results in string items in @unparray, not double-precision numbers (as I expected).
Is it possible to somehow obtain an array of double-precision values from the $aa bytestring in the snippet below?:
$a = pack("d",255);
print length($a)."\n";
# prints 8
$aa = pack("ddddd", 255,123,0,45,123);
print length($aa)."\n";
# prints 40
@unparray = unpack("d "x5, $aa);
print scalar(@unparray)."\n";
# prints 5
print length($unparray[0])."\n";
# prints 3
printf "%d\n", $unparray[0];
# prints 255
# one liner:
# perl -e '$a = pack("d",255); print length($a)."\n"; $aa = pack("ddddd", 255,123,0,45,123); print length($aa)."\n"; @unparray = unpack("d "x5, $aa); print scalar(@unparray)."\n"; print length($unparray[0])."\n" '
Many thanks in advance for any answers,
Cheers!

What makes you think it's not stored as a double?
use feature qw( say );
use Config qw( %Config );
use Devel::Peek qw( Dump );
my @a = unpack "d5", pack "d5", 255,123,0,45,123;
say 0+@a; # 5
Dump $a[0]; # NOK (floating point format)
say $Config{nvsize}; # 8 byte floats on this build

Sorry, but you've misunderstood hobbs' answer to your earlier question.
$unparray[0] is a double-precision floating-point value; but length is not like (say) C's sizeof operator, and doesn't tell you the size of its argument. Rather, it converts its argument to a string, and then tells you the length of that string.
For example, this:
my $a = 3.0 / 1.5;
print length($a), "\n";
will print this:
1
because it sets $a to 2.0, which gets stringified as 2, which has length 1.
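A short sketch tying this back to the original snippet: the unpacked items behave as numbers, while length only measures their stringified form. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my @values = unpack('d5', pack('d5', 255, 123, 0, 45, 123));

# The unpacked items take part in arithmetic like any other number.
my $sum = 0;
$sum += $_ for @values;

# length() stringifies its argument first: 255 becomes "255", three characters.
my $string_length = length($values[0]);

# To see the storage size of a double, pack it again and measure the bytes.
my $packed_size = length(pack('d', $values[0]));

print "sum = $sum\n";                                   # sum = 546
print "length of stringified value = $string_length\n"; # length of stringified value = 3
print "bytes when packed as d = $packed_size\n";        # typically 8, as in the question
```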
