If I have a hex value produced with, e.g.,
my $hex = sprintf "%v02X", $packed_output;
and $packed_output is the result of pack over a series of numbers, i.e.
my $packed_output = pack "L>*", map { $_->[0] << 16 | $_->[1] } @array;
is there a way to get back $packed_output from that $hex string?
One approach: split the string on periods, convert each hex byte back to a number with hex, and then pack them all back together again:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $packed_output = pack "L>*", 1, 2, 3, 4, 5, 6;
my $hex = sprintf "%v02X", $packed_output;
# $hex is
# 00.00.00.01.00.00.00.02.00.00.00.03.00.00.00.04.00.00.00.05.00.00.00.06
my $binary = pack "C*", map(hex, split(/\./, $hex));   # "C" packs each number as one byte
$Data::Dumper::Useqq = 1;
print Dumper($packed_output, $binary);
# Outputs
# $VAR1 = "\0\0\0\1\0\0\0\2\0\0\0\3\0\0\0\4\0\0\0\5\0\0\0\6";
# $VAR2 = "\0\0\0\1\0\0\0\2\0\0\0\3\0\0\0\4\0\0\0\5\0\0\0\6";
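If the dotted format is not a requirement, a simpler route (not part of the original answer) is a plain hex string: unpack "H*" produces it and pack "H*" reverses it exactly. The grouped template "(H2)*" also decodes the dotted form directly, including bytes above 0x09. A sketch; the values 0xDEADBEEF and 0x0A0B0C0D are arbitrary test data:

```perl
use strict;
use warnings;

# Arbitrary test values; bytes above 0x09 matter for the dotted decode.
my $packed_output = pack "L>*", 0xDEADBEEF, 0x0A0B0C0D;

# Dotted form, as in the question: "DE.AD.BE.EF.0A.0B.0C.0D"
my $dotted = sprintf "%v02X", $packed_output;
# "(H2)*" packs each two-digit hex group into one byte, high nybble first.
my $from_dotted = pack "(H2)*", split /\./, $dotted;

# Plain form without separators: unpack "H*" and pack "H*" are inverses.
my $plain      = unpack "H*", $packed_output;   # "deadbeef0a0b0c0d"
my $from_plain = pack "H*", $plain;

print "$dotted\n$plain\n";
print "dotted round trip ok\n" if $from_dotted eq $packed_output;
print "plain round trip ok\n"  if $from_plain  eq $packed_output;
```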
It sounds like what you're really after is an easy round-trip conversion between binary and textual data, though, and a period-separated hex string built with sprintf is not a good fit for that.
Perl comes with support for the industry-standard Base64 encoding (via the core module MIME::Base64) and the older uuencode (which is built into pack and unpack rather than provided by a core module). Examples:
#!/usr/bin/env perl
use strict;
use warnings;
use MIME::Base64;
use feature qw/say/;
my $packed_output = pack "L>*", 1, 2, 3, 4, 5, 6;
# Base64
my $base64 = encode_base64($packed_output, "");
say $base64; # AAAAAQAAAAIAAAADAAAABAAAAAUAAAAG
my $decoded_b64 = decode_base64($base64);
say "It's a match!" if $packed_output eq $decoded_b64;
# uuencode
my $uuencoded = pack "u", $packed_output;
print $uuencoded; # 8`````0````(````#````!`````4````&
my ($decoded_uu) = unpack "u", $uuencoded;
say "Another match!" if $packed_output eq $decoded_uu;
I am doing this:
$a = "002c459f";
$b = $a%10000;
$c = int($a/10000);
print $b; #prints 2
print $c; #prints 0
I want
$b=459f;
$c=002c;
Can anyone suggest how I can get this?
If you had used warnings, you would have gotten a warning message indicating the problem.
Since your 8-character input is already formatted as a simple hex string, you can just use substr:
use warnings;
use strict;
my $x = '002c459f';
my $y = substr $x, 0, 4;
my $z = substr $x, 4, 4;
print "z=$z, y=$y\n";
Output:
z=459f, y=002c
It is good practice to also use strict. I changed your variable names since $a and $b are special variables in Perl.
You should always start with use strict; and use warnings;. Warnings would have told you that 002c459f isn't a number (it's the hex representation of a number), so you can't divide it before first converting it into a number. You also used the wrong divisor (10000 instead of 0x10000).
use strict;
use warnings;
my $a_hex = "002c459f";
my $a_num = hex($a_hex);
my $b_num = $a_num % 0x10000;          # More common: my $b_num = $a_num & 0xFFFF;
my $c_num = int( $a_num / 0x10000 );   # More common: my $c_num = $a_num >> 16;
my $b_hex = sprintf("%04x", $b_num);   # "459f"
my $c_hex = sprintf("%04x", $c_num);   # "002c"
But if you have exactly eight characters, you can use the following instead:
my ($c, $b) = unpack('a4 a4', $a);
Note: You should avoid using $a and $b as variable names, since they are special to sort (and to functions such as List::Util's reduce).
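If the input may have fewer than eight hex digits (for example a value written without its leading zeros), one option is to normalize it to eight digits before splitting. A minimal sketch, assuming the value fits in 32 bits; split_halves is a hypothetical helper name, and the output is lowercase because of %08x:

```perl
use strict;
use warnings;

# Hypothetical helper: split a hex value (up to 8 digits) into its
# high and low 16-bit halves, each as a 4-digit hex string.
sub split_halves {
    my ($hex_in) = @_;
    my $padded = sprintf "%08x", hex $hex_in;   # left-pad to 8 digits
    my ($high, $low) = unpack 'A4 A4', $padded;
    return ($high, $low);
}

my ($high, $low) = split_halves("002c459f");
print "high=$high low=$low\n";   # high=002c low=459f
my ($h2, $l2) = split_halves("2c459f");
print "high=$h2 low=$l2\n";      # high=002c low=459f
```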
The input data is a hex string, so a regular expression can split it into 4-character chunks.
At this point you can use the results as strings, or you can use hex() to convert each hex string into a number.
use strict;
use warnings;
use feature 'say';
my $a = "002c459f"; # value is a string
my($b,$c) = $a =~ /([\da-f]{4})/gi;
say "0x$b 0x$c\tstrings"; # values are strings
$b = hex($b); # convert to digit
$c = hex($c); # convert to digit
printf "0x%04x 0x%04x\tdigits\n", $b, $c;
Output
0x002c 0x459f strings
0x002c 0x459f digits
I have this working Perl one-liner that permutes the 16 bits of a 4-digit hex number according to a bitmap of bit positions (also given as hex digits):
perl -wMstrict -le '
my @bits = unpack "(A1)16", sprintf "%016b", hex shift;
my $bitmap = "D5679123C4EF80AB";
@bits = @bits[ map { hex } split //, $bitmap ];
$"="";
print sprintf "%04X", oct "0b@bits";
' "B455"
Result: CB15
How can I support more bytes, e.g. 128 bytes?
And how can I make this read the hex from a file.txt?
Thanks in advance.
You could try the following:
use feature qw(say);
use strict;
use warnings;
# Example with 64 bits
my $data = 'B455AB10A1230000'; # original data (64 bits)
my @bits = map { unpack '(A)*', sprintf '%08b', hex } unpack '(A2)*', $data;
my @bitmap = reverse 0..63; # example 64-bit map; replace with your actual map
my $result = unpack "H*", pack 'C*', map { oct "0b$_" } unpack "(A8)*", join '', @bits[@bitmap];
say "Input : $data";
say "Result: $result";
Output:
Input : B455AB10A1230000
Result: 0000c48508d5aa2d
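The answer above covers a longer bitmap but not the file.txt part of the question. A sketch under a few assumptions: the file name file.txt, one hex value per line, whole bytes only, and a placeholder bitmap that reverses all bits (replace it with your real map, which must list one source position per output bit, e.g. 1024 entries for 128 bytes). permute_hex is a hypothetical helper; it converts through pack/unpack with B* instead of sprintf "%b", so it is not limited to 64-bit integers. Output is lowercase hex.

```perl
use strict;
use warnings;
use feature qw(say);

# Hypothetical helper: output bit $i is input bit $map->[$i], where bit 0 is
# the most significant bit of the first byte. Works for any number of bytes.
sub permute_hex {
    my ($hex, $map) = @_;
    my $bits = unpack 'B*', pack 'H*', $hex;    # hex digits -> "0101..." string
    die "bitmap has ", scalar(@$map), " entries, need ", length($bits), "\n"
        unless @$map == length $bits;
    my $out = join '', map { substr $bits, $_, 1 } @$map;
    return unpack 'H*', pack 'B*', $out;        # bit string -> hex digits
}

# The 16-bit example from the question: prints cb15
say permute_hex('B455', [ map { hex } split //, 'D5679123C4EF80AB' ]);

# Read one hex value per line from file.txt (assumed name), if it exists.
my $file = 'file.txt';
if (open my $fh, '<', $file) {
    while (my $line = <$fh>) {
        $line =~ s/\s+//g;                               # strip newline/spaces
        next unless $line =~ /\A(?:[0-9A-Fa-f]{2})+\z/;  # whole bytes only
        my @bitmap = reverse 0 .. 4 * length($line) - 1; # placeholder map
        say permute_hex($line, \@bitmap);
    }
    close $fh;
}
```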
I want to convert Hindi text to Unicode escape sequences in Perl. I have searched CPAN, but I could not find the exact module or approach I am looking for. Basically, I am looking for something like this:
My Input is:
इस परीक्षण के लिए है
My expected output is:
\u0907\u0938\u0020\u092a\u0930\u0940\u0915\u094d\u0937\u0923\u0020\u0915\u0947\u0020\u0932\u093f\u090f\u0020\u0939\u0948
How can I achieve this in Perl?
Any suggestions would be appreciated.
Try this
use utf8;
my $str = 'इस परीक्षण के लिए है';
for my $c (split //, $str) {
printf("\\u%04x", ord($c));
}
print "\n";
You don't really need a module for that. ord to extract the character code and sprintf to format it as 4-digit zero-padded hex are more than enough:
use utf8;
my $str = 'इस परीक्षण के लिए है';
(my $u_encoded = $str) =~ s/(.)/sprintf "\\u%04x", ord($1)/sge;
# \u0907\u0938\u0020\u092a\u0930\u0940\u0915\u094d\u0937\u0923\u0020\u0915\u0947\u0020\u0932\u093f\u090f\u0020\u0939\u0948
Because I left a few comments on how the other answers might fall short of what various tools expect, I'd like to share a solution that encodes characters outside the Basic Multilingual Plane as a pair of escapes (a UTF-16 surrogate pair): "😃" becomes \ud83d\ude03.
This is done by:
Encoding the string as UTF-16, without a byte order mark. We explicitly choose an endianness; here, we arbitrarily use the big-endian form. This produces a string of octets (“bytes”), where two octets form one UTF-16 code unit, and two or four octets represent a Unicode code point.
This is done for convenience and performance; we could just as well determine the numeric values of the UTF-16 code units ourselves.
Unpacking the resulting binary string into 16-bit integers, each representing one UTF-16 code unit. We have to respect the endianness chosen above, so we use the n* pattern for unpack (i.e. 16-bit big-endian unsigned integers).
Formatting each code unit as a \uxxxx escape.
As a Perl subroutine, this would look like
use strict;
use warnings;
use Encode ();
sub unicode_escape {
    my ($str) = @_;
    my $UTF_16BE_octets = Encode::encode("UTF-16BE", $str);
    my @code_units = unpack "n*", $UTF_16BE_octets;
    return join '', map { sprintf "\\u%04x", $_ } @code_units;
}
Test cases:
use Test::More tests => 3;
use utf8;
is unicode_escape(''), '',
'empty string is empty string';
is unicode_escape("\N{SMILING FACE WITH OPEN MOUTH}"), '\ud83d\ude03',
'non-BMP code points are escaped as surrogate halves';
my $input = 'इस परीक्षण के लिए है';
my $output = '\u0907\u0938\u0020\u092a\u0930\u0940\u0915\u094d\u0937\u0923\u0020\u0915\u0947\u0020\u0932\u093f\u090f\u0020\u0939\u0948';
is unicode_escape($input), $output,
'ordinary BMP code points each have a single escape';
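If you also need the reverse direction, the same UTF-16BE idea can be run backwards: pack the code units with n* and let Encode decode them, which recombines surrogate pairs into single characters. A sketch, assuming the input consists only of \uXXXX escapes (anything else is ignored); unicode_unescape is a hypothetical name:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical inverse of unicode_escape(): collect every \uXXXX escape,
# pack the code units as 16-bit big-endian integers and decode as UTF-16BE.
sub unicode_unescape {
    my ($escaped) = @_;
    my @code_units = map { hex } $escaped =~ /\\u([0-9a-fA-F]{4})/g;
    return Encode::decode("UTF-16BE", pack "n*", @code_units);
}

my $text = unicode_unescape('\u0907\u0938\u0020\ud83d\ude03');
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```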
If you only want a simple converter, you can use the following filter:
perl -CSDA -nle 'printf "\\u%*v04x\n", "\\u",$_'
#or
perl -CSDA -nlE 'printf "\\u%04x",$_ for unpack "U*"'
like:
echo "इस परीक्षण के लिए है" | perl -CSDA -ne 'printf "\\u%*v04x\n", "\\u",$_'
#or
perl -CSDA -ne 'printf "\\u%*v04x\n", "\\u",$_' <<< "इस परीक्षण के लिए है"
prints:
\u0907\u0938\u0020\u092a\u0930\u0940\u0915\u094d\u0937\u0923\u0020\u0915\u0947\u0020\u0932\u093f\u090f\u0020\u0939\u0948\u000a
Unicode with surrogate pairs.
use strict;
use warnings;
use utf8;
use open qw(:std :utf8);
my $str = "if( \N{U+1F42A}+\N{U+1F410} == \N{U+1F41B} ){ \N{U+1F602} = \N{U+1F52B} } # ορισμός ";
print "$str\n";
for my $ch (unpack "U*", $str) {
if( $ch > 0xffff ) {
my $h = int(($ch - 0x10000) / 0x400) + 0xD800;
my $l = ($ch - 0x10000) % 0x400 + 0xDC00;
printf "\\u%04x\\u%04x", $h, $l;
}
else {
printf "\\u%04x", $ch;
}
}
print "\n";
prints
if( 🐪+🐐 == 🐛 ){ 😂 = 🔫 } # ορισμός
\u0069\u0066\u0028\u0020\ud83d\udc2a\u002b\ud83d\udc10\u0020\u003d\u003d\u0020\ud83d\udc1b\u0020\u0029\u007b\u0020\ud83d\ude02\u0020\u003d\u0020\ud83d\udd2b\u0020\u007d\u0020\u0023\u0020\u03bf\u03c1\u03b9\u03c3\u03bc\u03cc\u03c2\u0020
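The surrogate arithmetic above can also be written with integer bit operations, which makes the 10-bit split explicit: subtract 0x10000, put the top 10 bits in the high surrogate and the bottom 10 bits in the low one. For U+1F42A (the camel), 0x1F42A - 0x10000 = 0xF42A, giving 0xD800 + 0x3D and 0xDC00 + 0x02A. surrogate_pair is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: UTF-16 surrogate pair for a code point above 0xFFFF.
sub surrogate_pair {
    my ($cp) = @_;
    my $v = $cp - 0x10000;               # 20-bit offset above the BMP
    return (0xD800 + ($v >> 10),         # high surrogate: top 10 bits
            0xDC00 + ($v & 0x3FF));      # low surrogate: bottom 10 bits
}

printf "\\u%04x\\u%04x\n", surrogate_pair(0x1F42A);   # \ud83d\udc2a
```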
I'm new to Perl. I know I can split off a constant number of characters via unpack or using regexes.
But is there a standard way to split a string every n characters, also breaking at newlines?
Here's the string I'm looking to split:
my $str="hello\nworld";
my $num_split_chars=2;
Perhaps the following will be helpful:
use strict;
use warnings;
use Data::Dumper;
my $str = "hello\nworld";
my $num_split_chars = 2;
$num_split_chars--;
my @arr = $str =~ /.{$num_split_chars}.?/g;
print Dumper \@arr;
Output:
$VAR1 = [
'he',
'll',
'o',
'wo',
'rl',
'd'
];
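A possibly simpler variant of the same idea uses a bounded quantifier instead of decrementing the count. Since . does not match a newline, chunks still break at line ends, and {1,$n} never yields an empty chunk (the decrement trick produces empty matches when $num_split_chars is 1). A sketch:

```perl
use strict;
use warnings;

my $str = "hello\nworld";
my $n   = 2;

# . does not match "\n", so each chunk stops at a newline; {1,$n} takes
# up to $n characters and always at least one.
my @chunks = $str =~ /(.{1,$n})/g;
print join('|', @chunks), "\n";   # he|ll|o|wo|rl|d
```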
For example,
my $str = '中國c'; # Chinese language of china
I want to print out the numeric values
20013,22283,99
unpack will be more efficient than split and ord, because it doesn't have to make a bunch of temporary 1-character strings:
use utf8;
my $str = '中國c'; # Chinese language of china
my @codepoints = unpack 'U*', $str;
print join(',', @codepoints) . "\n"; # prints 20013,22283,99
A quick benchmark shows it's about 3 times faster than split+ord:
use utf8;
use Benchmark 'cmpthese';
my $str = '中國中國中國中國中國中國中國中國中國中國中國中國中國中國c';
cmpthese(0, {
  'unpack'     => sub { my @codepoints = unpack 'U*', $str; },
  'split-map'  => sub { my @codepoints = map { ord } split //, $str },
  'split-for'  => sub { my @cp; for my $c (split(//, $str)) { push @cp, ord($c) } },
  'split-for2' => sub { my $cp; for my $c (split(//, $str)) { $cp = ord($c) } },
});
Results:
Rate split-map split-for split-for2 unpack
split-map 85423/s -- -7% -32% -67%
split-for 91950/s 8% -- -27% -64%
split-for2 125550/s 47% 37% -- -51%
unpack 256941/s 201% 179% 105% --
The difference is less pronounced with a shorter string, but unpack is still more than twice as fast. (split-for2 is a bit faster than the other splits because it doesn't build a list of codepoints.)
See perldoc -f ord:
foreach my $c (split(//, $str))
{
print ord($c), "\n";
}
Or compressed into a single line: my @chars = map { ord } split //, $str;
Dumped with Data::Dumper, this produces:
$VAR1 = [
20013,
22283,
99
];
To have UTF-8 in your source code recognized as such, you must add use utf8; first:
$ perl
use utf8;
my $str = '中國c'; # Chinese language of china
foreach my $c (split(//, $str))
{
print ord($c), "\n";
}
__END__
20013
22283
99
or more tersely,
print join ',', map ord, split //, $str;
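use utf8 only covers literals in the source file. If the string instead arrives as UTF-8 bytes (for example read from a file or socket without a decoding layer), it has to be decoded first, otherwise ord returns byte values. A sketch using the core Encode module; the byte string below is simply '中國c' encoded as UTF-8:

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes for '中國c', as they might come from an undecoded input stream.
my $bytes = "\xE4\xB8\xAD\xE5\x9C\x8Bc";
my $chars = decode('UTF-8', $bytes);   # now 3 characters instead of 7 bytes
print join(',', map { ord } split //, $chars), "\n";   # 20013,22283,99
```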
http://www.perl.com/pub/2012/04/perlunicook-standard-preamble.html
#!/usr/bin/env perl
use utf8; # so literals and identifiers can be in UTF-8
use v5.12; # or later to get "unicode_strings" feature
use strict; # quote strings, declare variables
use warnings; # on by default
use warnings qw(FATAL utf8); # fatalize encoding glitches
use open qw(:std :utf8); # undeclared streams in UTF-8
# use charnames qw(:full :short); # unneeded in v5.16
# http://perldoc.perl.org/functions/sprintf.html
# vector flag
# This flag tells Perl to interpret the supplied string as a vector of integers, one for each character in the string.
my $str = '中國c';
printf "%*vd\n", ",", $str;
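The same vector flag works with other conversions and join strings. A short sketch; the space separator and the %X conversion are just examples:

```perl
use strict;
use warnings;
use utf8;

my $str = '中國c';
# The * takes the join string from the argument list; v applies the
# conversion to the ordinal of each character in turn.
my $dec = sprintf "%*vd", ",", $str;   # 20013,22283,99
my $hex = sprintf "%*vX", " ", $str;   # 4E2D 570B 63
print "$dec\n$hex\n";
```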