How to get number of bytes consumed by unpack - perl

Is there a way to get the number of bytes "consumed" by an unpack call?
I want to parse (unpack) different structures from a long string in several steps, like this:
my $record1 = unpack "TEMPLATE", substr($long_str, $pos);
# Advance position pointer
$pos += NUMBER_OF_BYTES_CONSUMED_BY_LAST_UNPACK();
# Other code that might determine what to read in the following steps
# ...
# Read again at the new position
my $record2 = unpack "TEMPLATE2", substr($long_str, $pos);

This does seem like a glaring omission in unpack, doesn't it? As a consolation prize, you could add a* to the end of the unpack template to return the unused portion of the input string.
# The variable-length "w" format is to make the example slightly more interesting.
$x = pack "w*", 126..129;
while(length $x) {
# unpack one number, keep the rest packed in $x
($n, $x) = unpack "wa*", $x;
print $n;
}
If your packed string is really long, this is not a good idea since it has to make a copy of the "remainder" portion of the string every time you do an unpack.

You can add the character . to the end of the format string:
my (@ary) = unpack("a4v3a*.", "abcdefghijklmn");
say for @ary;
Output:
abcd
26213
26727
27241
klmn
14 # <-- 14 bytes consumed
This was cleverly hidden in the perl5100delta file. If it is documented in perlfunc somewhere, I cannot find it.
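A minimal sketch of how the trailing . could drive the position pointer from the question, assuming Perl 5.10 or later. Dropping the a* means the remainder is not returned, and . then reports how many bytes the preceding fields consumed. The variable names ($pos, @nums) are illustrative only.

```perl
use strict;
use warnings;

# Four BER-compressed integers of varying width (1, 1, 2 and 2 bytes).
my $long_str = pack "w*", 126 .. 129;

my $pos = 0;
my @nums;
while ($pos < length $long_str) {
    # "w" reads one number; "." then yields the offset reached within
    # the string handed to unpack, i.e. the bytes consumed by this call.
    my ($n, $consumed) = unpack "w.", substr($long_str, $pos);
    push @nums, $n;
    $pos += $consumed;
}

print "@nums\n";   # 126 127 128 129
```

Note that substr still copies the tail of the string on every call, so for very long inputs this only saves the second copy that a* would make.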

Related

Ip to hexa formatting

I need to convert a dotted-decimal IP address to hexadecimal.
Example: 110.1.1.3 to 6e01:103.
With the code below I get 6e01103, but I need either 6e01:103 or 6e:01:103. I then need to append it to the hex prefix 64:ff9b::, so my final output needs to be 64:ff9b::6e01:103. Kindly help me with this.
sub ip_hexa($){
my $ip = shift;
my @octets = split /\./, $ip;
my $result;
foreach (@octets){
$hexa_ip = join":",printf("%02x", "$_");
}
return $hexa_ip;
}
I'm not completely certain about the output you want, but there are a few issues with the code which I'll list below:
The $ in the function declaration is not required. It sets the function's prototype which most likely does not do what you think it does. See perlsub for details.
$hexa_ip should be declared before being used as good practice to prevent hard to find errors. Perhaps you meant my $hexa_ip instead of my $result? In any case, use use strict at the start of the program to catch such errors.
printf() prints to screen and only returns a boolean. Look at sprintf for the right function to use.
join() is not being used correctly. See join.
# 6e01:103
sprintf "%x:%x",
unpack 'nn',
pack 'C4',
split /\./,
'110.1.1.3'
# 6e:01:103 (the middle field needs %02x to keep its leading zero)
sprintf "%x:%02x:%x",
unpack 'CCn',
pack 'C4',
split /\./,
'110.1.1.3'
If you don't need the zero padding, the sprintf lines can be replaced with join ':', map { sprintf '%x', $_ } placed in front of the unpack call.
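Putting the pieces together, here is a small sketch of a complete subroutine that produces the final output the question asks for. The name ip_to_nat64 is made up for illustration, and the 64:ff9b:: prefix is taken from the question.

```perl
use strict;
use warnings;

# Convert a dotted-decimal IPv4 address to two 16-bit hex groups
# and append them to the 64:ff9b:: prefix.
sub ip_to_nat64 {
    my ($ip) = @_;
    # Pack the four octets as bytes, then read them back as two
    # big-endian 16-bit integers.
    my ($hi, $lo) = unpack 'nn', pack 'C4', split /\./, $ip;
    return sprintf '64:ff9b::%x:%x', $hi, $lo;
}

print ip_to_nat64('110.1.1.3'), "\n";   # 64:ff9b::6e01:103
```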

Print and save data in binary format in Perl

My script generates some very large files, and I am trying to print/save the output in a binary format to reduce the file size as much as possible.
Each time, the script generates five values, like:
$a1 = 1.64729
$a2 = 4.33329
$a3 = 3.55724
$a4 = 1.45759
$a5 = 7.474700
It prints in the output like:
A:1.64729,4.33329,3.55724,1.45759,7.474700
I am not sure whether this is the best way, but I thought I would pack each row as it is printed to the output, using Perl's built-in pack/unpack functions.
I had a look at perldoc, but I did not understand which format specifiers were appropriate (???).
#!/usr/bin/perl
...
@A = ($a1,$a2,$a3,$a4,$a5);
print pack ("???", ("A:", join(",", map { sprintf "%.1f", $_ } @A)), "\n";
If you compress the file (instead of trying to write binary bytes) you will get a small file. That's because your entire file will have mostly the ten digit characters, plus a decimal point, and a comma.
You can compress a file as you write it via IO::Zlib. This will use either the Zlib library, or the gzip command.
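As a rough sketch of the compression route, the following writes the sample row repeatedly through IO::Zlib and reads the first line back. The temporary file and the repeat count of 1000 are assumptions made only to keep the example self-contained; a real script would use its own output path.

```perl
use strict;
use warnings;
use IO::Zlib;
use File::Temp qw(tempfile);

# Hypothetical output path; a temp file keeps the sketch self-contained.
my ($tmp, $path) = tempfile(SUFFIX => '.gz', UNLINK => 1);
close $tmp;

my $line = "A:1.64729,4.33329,3.55724,1.45759,7.474700\n";

# Write gzip-compressed text exactly as it would be printed normally.
my $out = IO::Zlib->new($path, "wb") or die "cannot open $path for writing";
$out->print($line) for 1 .. 1000;
$out->close;

# Read the first line back to show the data survives the round trip.
my $in = IO::Zlib->new($path, "rb") or die "cannot open $path for reading";
my $first = $in->getline;
$in->close;

printf "%d bytes of text stored in %d bytes on disk\n",
    1000 * length($line), -s $path;
```

Identical repeated rows compress far better than real data would, so the ratio printed here is only an upper bound on what to expect.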
However, if you want to use pack, go ahead. Get the Camel Book which gives much clearer documentation than the standard Perldoc.
It's not all that difficult:
my $output = "A:1.64729,4.33329,3.55724,1.45759,7.474700";
$output =~ s/^A://; #Remove the 'A:'
my @numbers = split /,/, $output; # Make into an array
my $packed = pack "d5", @numbers; # Pack five inputs as floating point numbers
say join ",", unpack "d5", $packed; # Unpacks those five binary-encoded numbers
You'll probably have to use syswrite and sysread since you aren't reading and writing strings. This is unbuffered reading and writing, and you have to specify the number of bytes you're reading or writing.
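As an alternative sketch, ordinary buffered print and read also handle packed data, provided the handle is opened in raw mode so that no newline or encoding translation is applied. An in-memory filehandle stands in for a real file here, and the names @rows and $rec_len are illustrative.

```perl
use strict;
use warnings;

my @rows = ([1.64729, 4.33329, 3.55724, 1.45759, 7.474700]);

# An in-memory handle stands in for a real file here.
my $buf = '';
open my $out, '>:raw', \$buf or die "open: $!";
print {$out} pack('d5', @$_) for @rows;
close $out;

# Size of one fixed-length record (five native doubles).
my $rec_len = length pack 'd5', (0) x 5;

open my $in, '<:raw', \$buf or die "open: $!";
my @decoded;
# read returns the number of bytes read, 0 at end of file.
while ((read($in, my $rec, $rec_len) // 0) == $rec_len) {
    push @decoded, [ unpack 'd5', $rec ];
}
close $in;

print join(',', @{ $decoded[0] }), "\n";
```

One text row in the question's format is 43 bytes including the newline, while five native doubles are typically 40 bytes, so packing as "d" alone saves little; "f" (single precision) halves the size at the cost of precision.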
One more thing: If you know where the decimal point is in the number (that is, it's always a number between 1 and up to 10) you can convert the number into an integer which will allow you to pack the number into an even smaller number of bytes:
my $output = "A:1.64729,4.33329,3.55724,1.45759,7.474700";
$output =~ s/^A://; #Remove the 'A:'
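# Scale each value to an integer with five decimal places; simply deleting
# the decimal points would mis-scale 7.474700, which has six.
my @numbers = map { sprintf '%.0f', $_ * 100_000 } split /,/, $output;
my $packed = pack "L5", @numbers; # Pack five inputs as unsigned 32-bit integers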

perl-how to treat a string as a binary number?

Read a file that contains an address and a data, like below:
#0, 12345678
#1, 5a5a5a5a
...
My aim is to read the address and the data. The data I read is in hex format, and I need to unpack it into a binary number.
So 12345678 would become 00010010001101000101011001111000
Then I need to expand the resulting binary string one more level, so that each bit becomes four bits.
So it becomes, 00000000000000010000000000010000000000000001000100000001000000000000000100000001000000010001000000000001000100010001000000000000
The way I did it is below:
while(<STDIN>) {
if (/\#(\S+)\s+(\S+)/) {
$addr = $1;
$data = $2;
$mem{$addr} = ${data};
}
}
foreach $key (sort {$a <=> $b} (keys %mem)) {
my $str = unpack ('B*', pack ('H*',$mem{$key}));
my $str2 = unpack ('B*', pack ('H*', $str));
printf ("#%x ", $key);
printf ("%s",$str2);
printf ("\n");
}
It works; however, my next step is to do some numeric operations on the converted bits, such as bitwise OR and shifting. I tried the << and | operators, but both are for numbers, not strings, so I don't know how to solve this.
Please leave a comment if you have better ideas. Thanks.
You can use the Bit::Vector module from MetaCPAN:
use strict;
use warnings;
use Bit::Vector;
my $str = "1111000011011001010101000111001100010000001111001010101000111010001011";
printf "orig str: %72s\n", $str;
#only 72 bits for better view
my $vec = Bit::Vector->new_Bin(72,$str);
printf "vec : %72s\n", $vec->to_Bin();
$vec->Move_Left(2);
printf "left 2 : %72s\n", $vec->to_Bin();
$vec->Move_Right(4);
printf "right 4 : %72s\n", $vec->to_Bin();
prints:
orig str: 1111000011011001010101000111001100010000001111001010101000111010001011
vec : 001111000011011001010101000111001100010000001111001010101000111010001011
left 2 : 111100001101100101010100011100110001000000111100101010100011101000101100
right 4 : 000011110000110110010101010001110011000100000011110010101010001110100010
If you need to do some math with arbitrary precision, you can also use Math::BigInt or use bigint (http://perldoc.perl.org/bigint.html).
Hex and binary are text representations of numbers. Shifting and bit manipulations are numerical operations. You want a number, not text.
my $hex = '5a5a5a5a';
my $num = hex($hex); # Convert to number.
$num >>= 1; # Manipulate the number.
$hex = sprintf('%08X', $num); # Convert back to hex.
In a comment, you mention you want to deal with 256 bit numbers. The native numbers don't support that, but you can use Math::BigInt.
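A short sketch of the Math::BigInt route, assuming the values arrive as hex strings. The same calls work for values far wider than a native integer; the 32-bit sample value is reused from the question only to keep the output easy to check.

```perl
use strict;
use warnings;
use Math::BigInt;

# new() accepts a hex string when it carries the 0x prefix.
my $num = Math::BigInt->new('0x5a5a5a5a');

# blsft, brsft and bior modify the object in place, so work on copies.
my $left  = $num->copy->blsft(4);                          # shift left by 4 bits
my $right = $num->copy->brsft(1);                          # shift right by 1 bit
my $ored  = $left->copy->bior(Math::BigInt->new('0xf'));   # bitwise OR

print $left->as_hex,  "\n";   # 0x5a5a5a5a0
print $right->as_hex, "\n";   # 0x2d2d2d2d
print $ored->as_hex,  "\n";   # 0x5a5a5a5af
```

as_bin is also available if the result is wanted as a string of binary digits rather than hex.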
My final solution was to forget about treating them as numbers and just treat them as strings. I use substr and string concatenation instead of shifting. For the OR operation, I just add the corresponding bits of the two strings: if the sum is 0 the result is 0, otherwise it is 1.
It may not be the best way to solve this problem, but that's the way I finally used.

Perl Cryptology: Encrypting/Decrypting ASCII characters with pack and unpack functions

I need help figuring out how these two subroutines work and what values or data structures they return. Here's a minimal representation of the code:
#!/usr/bin/perl
use strict; use warnings;
# an array of ASCII encrypted characters
my @quality = ("C~#p)eOA`/>*", "DCCec)ds~~", "*^&*"); # for instance
# input the quality
# the '@' character in front dereferences the subroutine's returned array ref
my @q = @{unpack_qual_to_phred(@quality)};
print pack_phred_to_qual(\@q) . "\n";
sub unpack_qual_to_phred{
my ($qual)=@_;
my $upack_code='c' . length($qual);
my @q=unpack("$upack_code",$qual);
for(my $i=0;$i<@q;$i++){
$q[$i]-=64;
}
return(\@q);
}
sub pack_phred_to_qual{
my ($q_ref)=@_;
@q=@{$q_ref};
for(my $i=0;$i<@q;$i++){
$q[$i]+=64;
}
my $pack_code='c' . int(@q);
my $qual=pack("$pack_code",@q);
return ($qual);
}
1;
From my understanding, the unpack_qual_to_phred() subroutine apparently decrypts the ASCII character elements stored in @quality. The subroutine reads in an array whose elements are strings of ASCII characters. Each element of the array is processed and apparently decrypted. The subroutine then returns an array ref containing the decrypted values. I understand this much; however, I'm not really familiar with the Perl functions pack and unpack, and I was unable to find any good examples of them online.
I think the pack_phred_to_qual subroutine converts the quality array ref back into ASCII characters and prints them.
Thanks. Any help or suggestions are greatly appreciated. Also, if someone could provide a simple example of how Perl's pack and unpack functions work, that would help too.
Calculating the length is needless. Those functions can be simplified to
sub unpack_qual_to_phred { [ map $_ - 64, unpack 'c*', $_[0] ] }
sub pack_phred_to_qual { pack 'c*', map $_ + 64, @{ $_[0] } }
In encryption terms, it's a crazy simple substitution cypher. It simply subtracts 64 from the character number of each character. It could have been written as
sub encrypt { map $_ - 64, @_ }
sub decrypt { map $_ + 64, @_ }
The pack/unpack doesn't factor in the encryption/decryption at all; it's just a way of iterating over each byte.
It is fairly simple, as packs go. It is calling unpack("c12", "C~#p)eOA`/>*"), which takes each letter in turn and finds the ASCII value for that letter, and then subtracts 64 from the value (well, subtracting 64 is a post-processing step, nothing to do with unpack). So the letter "C" is ASCII 67 and 67-64 is 3. Thus the first value out of that function is 3. Next is "~", which is ASCII 126, and 126-64 is 62. Next is "#", which is ASCII 35, and 35-64 is -29, etc.
The complete set of numbers being generated from your script is:
3,62,-29,48,-23,37,15,1,32,-17,-2,-22
The "encryption" step simply reverses this process. Adds 64 and then converts to a char.
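The round trip can be checked with a small sketch that uses the simplified one-line versions of the two subroutines. Note that the original script passes the whole array but the subroutine only looks at its first argument, so only the first string is ever processed; the sketch passes that string directly.

```perl
use strict;
use warnings;

# Characters -> signed byte values, shifted down by 64.
sub unpack_qual_to_phred { [ map { $_ - 64 } unpack 'c*', $_[0] ] }

# Values shifted back up by 64 -> characters.
sub pack_phred_to_qual { pack 'c*', map { $_ + 64 } @{ $_[0] } }

my $qual  = 'C~#p)eOA`/>*';
my $phred = unpack_qual_to_phred($qual);
print join(',', @$phred), "\n";   # 3,62,-29,48,-23,37,15,1,32,-17,-2,-22

my $back = pack_phred_to_qual($phred);
print $back eq $qual ? "round trip ok\n" : "round trip failed\n";
```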
This is not a full answer to your question, but did you read perlpacktut? Or the pack/unpack docs on perldoc? Those will probably go a long way to helping you understand.
EDIT:
Here's a simple way to think of it: say you have the number 1234 in a Perl scalar, $num. Then
pack('s*', $num)
would return
π♦
or whatever the raw bytes of 1234 happen to look like on your terminal. So pack() took the scalar's numeric value and turned it into the actual binary representation of the number, here a signed 16-bit short (you see "pi-diamond" printed out because that's how those bytes are displayed as characters). Conversely,
unpack('s*', "π♦")
would return the number 1234.
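To see the actual bytes rather than whatever glyphs a terminal shows, a hex dump helps. This sketch swaps in the 'v' format (unsigned 16-bit, always little-endian) for 's' so that the result does not depend on the machine's byte order; that substitution is an assumption made for reproducibility.

```perl
use strict;
use warnings;

my $num   = 1234;              # 0x04D2
my $bytes = pack 'v', $num;    # two bytes, low byte first

print length($bytes), "\n";          # 2
print unpack('H*', $bytes), "\n";    # d204

# unpack reverses the process and gives the number back.
my ($round_trip) = unpack 'v', $bytes;
print "$round_trip\n";               # 1234
```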
The unpack() part of your unpack_qual_to_phred() subroutine could be simplified to:
my @q = unpack("c12", "C~#p)eOA`/>*");
which would return a list of signed integers, one for each byte in the second argument.

Read chunks of data in Perl

What is a good way in Perl to split a line into pieces of varying length when there is no delimiter I can use? My data is organized by column length, so the first variable is in positions 1-4, the second variable is in positions 5-15, etc. There are many variables, each with a different length.
Put another way, is there some way to use the split function based on the position in the string, not a matched expression?
Thanks.
Yes there is. The unpack function is well-suited to dealing with fixed-width records.
Example
my $record = "1234ABCDEFGHIJK";
my @fields = unpack 'A4A11', $record; # 1st field is 4 chars long, 2nd is 11
print "@fields"; # Prints '1234 ABCDEFGHIJK'
The first argument is the template, which tells unpack where the fields begin and end. The second argument tells it which string to unpack.
unpack can also be told to ignore character positions in a string by specifying null bytes, x. The template 'A4x2A9' could be used to ignore the "AB" in the example above.
See perldoc -f pack and perldoc perlpacktut for in-depth details and examples.
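When there are many fields, one option is to build the template from a list of names and widths instead of writing it by hand. The following is only a sketch; the field names id and name are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical layout: field name and width, in column order.
my @layout = ( [ id => 4 ], [ name => 11 ] );

# Build 'A4A11' from the widths.
my $template = join '', map { "A$_->[1]" } @layout;

my $line = "1234ABCDEFGHIJK";

# Hash slice: assign the unpacked values to the field names in order.
my %rec;
@rec{ map { $_->[0] } @layout } = unpack $template, $line;

print "$rec{id} | $rec{name}\n";   # 1234 | ABCDEFGHIJK
```

Because the A format strips trailing spaces, space-padded fields come back already trimmed.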
Instead of using split, try the old-school substr method:
my $first = substr($input, 0, 4);
my $second = substr($input, 4, 11);
# etc...
(I like the unpack method too, but substr is easier to write without consulting the documentation, if you're only parsing out a few fields.)
You could use the substr() function to extract data by offset:
$first = substr($line, 0, 4);
$second = substr($line, 4, 11);
Another option is to use a regular expression:
($first, $second) = ($line =~ /(.{4})(.{11})/);