Print and save data in binary format in Perl - perl

My script generates some very large files, and I am trying to save the output in a binary format to reduce the file size as much as possible.
Each time that script generates five values, like:
$a1 = 1.64729
$a2 = 4.33329
$a3 = 3.55724
$a4 = 1.45759
$a5 = 7.474700
It prints in the output like:
A:1.64729,4.33329,3.55724,1.45759,7.474700
I am not sure whether this is the best way, but I thought I would pack each row as it is printed to the output, using Perl's built-in pack/unpack functions.
I had a look at perldoc, but I did not understand which format specifiers were appropriate (the ??? below).
#!/usr/bin/perl
...
@A = ($a1,$a2,$a3,$a4,$a5);
print pack("???", "A:", join(",", map { sprintf "%.1f", $_ } @A)), "\n";

If you compress the file (instead of trying to write binary bytes) you will get a small file. That's because your entire file will have mostly the ten digit characters, plus a decimal point, and a comma.
You can compress a file as you write it via IO::Zlib. This will use either the Zlib library, or the gzip command.
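A minimal sketch of compressing while writing, assuming the core IO::Zlib module is installed. The sample rows and the temporary file are made up for illustration; a real script would write to its own output path.

```perl
use strict;
use warnings;
use IO::Zlib;
use File::Temp qw(tempfile);

# Hypothetical sample rows in the question's "A:..." text format.
my @rows = (
    "A:1.64729,4.33329,3.55724,1.45759,7.474700",
    "A:2.10000,3.20000,4.30000,5.40000,6.500000",
);

# Temporary output file, used here only so the sketch is self-contained.
my (undef, $path) = tempfile(SUFFIX => '.gz', UNLINK => 1);

# "wb" opens the file for compressed writing.
my $out = IO::Zlib->new($path, "wb") or die "Cannot open $path for writing";
print $out "$_\n" for @rows;
$out->close;

# Read the rows back to confirm the round trip.
my $in = IO::Zlib->new($path, "rb") or die "Cannot open $path for reading";
my @read_back = $in->getlines;
$in->close;
chomp @read_back;

print "$_\n" for @read_back;
```

The rows stay in plain text, so any tool that understands gzip can still inspect the file.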
However, if you want to use pack, go ahead. Get the Camel Book which gives much clearer documentation than the standard Perldoc.
It's not all that difficult:
use feature 'say';
my $output = "A:1.64729,4.33329,3.55724,1.45759,7.474700";
$output =~ s/^A://; # Remove the 'A:'
my @numbers = split /,/, $output; # Make into an array
my $packed = pack "d5", @numbers; # Pack five inputs as floating point numbers
say join ",", unpack "d5", $packed; # Unpacks those five packed floating point numbers
You'll probably have to use syswrite and sysread since you aren't reading and writing strings. These perform unbuffered reads and writes, and you have to specify the number of bytes you're reading or writing.
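As a hedged alternative sketch: ordinary buffered print and read also work on packed data, provided the handle is in binary mode. The record values and the temporary file below are made up for illustration, and the 40-byte record size assumes 8-byte native doubles.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $template    = "d5";                              # five native doubles per record
my $record_size = length(pack($template, (0) x 5));  # 40 bytes with 8-byte doubles

# Hypothetical records, five values each.
my @records = (
    [1.64729, 4.33329, 3.55724, 1.45759, 7.474700],
    [0.5, 1.5, 2.5, 3.5, 4.5],
);

# Write the packed records to a temporary file in binary mode.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} pack($template, @$_) for @records;
close $fh or die "Cannot close $path: $!";

# Read the file back one fixed-size record at a time.
open my $in, '<:raw', $path or die "Cannot open $path: $!";
my @decoded;
while ((read($in, my $buf, $record_size) // 0) == $record_size) {
    push @decoded, [unpack $template, $buf];
}
close $in;

print join(",", @$_), "\n" for @decoded;
```

Because every record has the same size, the reader never needs a separator or a newline between rows.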
One more thing: if you know where the decimal point is in the number (that is, the value is always at least 1 and less than 10), you can convert the number into an integer, which will allow you to pack it into even fewer bytes:
my $output = "A:1.64729,4.33329,3.55724,1.45759,7.474700";
$output =~ s/^A://; # Remove the 'A:'
$output =~ s/\.//g; # Remove all the decimal points
# Note: this only works if every value has the same number of decimal places
my @numbers = split /,/, $output; # Make into an array
my $packed = pack "L5", @numbers; # Pack five inputs as unsigned long numbers
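The integer idea can also be sketched numerically instead of by deleting the decimal point from the text. The scale factor of 100,000 (five decimal places) and the helper names `pack_fixed` and `unpack_fixed` are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Assumption: five decimal places of precision are enough for this data.
my $SCALE = 100_000;

# Scale each value to an integer, round it, and pack as 32-bit unsigned.
sub pack_fixed {
    return pack "L*", map { int($_ * $SCALE + 0.5) } @_;
}

# Reverse the process: unpack the integers and divide by the scale.
sub unpack_fixed {
    my ($packed) = @_;
    return map { $_ / $SCALE } unpack "L*", $packed;
}

my @values   = (1.64729, 4.33329, 3.55724, 1.45759, 7.474700);
my $packed   = pack_fixed(@values);
my @restored = unpack_fixed($packed);

printf "%d bytes: %s\n", length($packed),
    join(",", map { sprintf "%.5f", $_ } @restored);
```

Five values take 20 bytes this way, compared with 40 bytes for five doubles, at the cost of fixing the precision in advance.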

Related

unable to convert string to hex in PERL

I am parsing a file which consists of decimal as well as hexadecimal values separated by ":":
foreach $line (<INFO>) {
my ($seq_no, $size_in_bytes, $Hitcount, $buffer) = split /:/, $line;
# $size_in_bytes is a hexadecimal value.
print "check 1 $size_in_bytes\n"; # printing some value in hexadecimal
$size_in_bytes = hex($size_in_bytes);
print "check 2 $size_in_bytes\n"; # Printing ZERO??
}
I tried below approach also but still it is giving ZERO only.
$dec_num = sprintf("%d", hex($num));
Can you please tell me how I can convert the string to decimal?
Since the problem is with superfluous spaces in your fields, you should split like this instead
split /\s*:\s*/, $line
That way the spaces will be removed if there are any, but the split will still work fine if not.
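A short sketch of that split in context. The input line is made up for illustration, since the original file contents are not shown. Note that the pattern only strips whitespace around the colons, so whitespace at the very start of the line would still need trimming separately.

```perl
use strict;
use warnings;

# Hypothetical input line with padding around the separators.
my $line = "12 : 1F : 3 : some buffer\n";
chomp $line;

# Split on colons and swallow any surrounding whitespace.
my ($seq_no, $size_in_bytes, $hitcount, $buffer) = split /\s*:\s*/, $line;

# With the padding gone, hex() sees only hexadecimal digits.
my $size = hex $size_in_bytes;
print "check 2 $size\n";
```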

Perl Pack Unpack and Reverse Byte Order

I'm trying to write a script to find hex strings in a text file and convert them to their reverse byte order. The trouble I'm having is that some of the hex strings are 16 bit and some are 64 bits. I've used Perl's pack to pack and unpack the 16 bit hex numbers and that works fine, but the 64 bit does not.
print unpack("H*", (pack('I!', 0x20202032))). "\n"; #This works, gives 32202020
#This does not
print unpack("H*", (pack('I!', 0x4f423230313430343239303030636334))). "\n";
I've tried the second with the q and Q (where I get ffffffffffffffff). Am I approaching this all wrong?
As bit of background, I've got a multi-gigabyte pipe-delimited text file that has hex strings in reverse byte order as explained above. Also, the columns of the file are not standard; sometimes the hex strings appear in one column, and sometimes in another. I need to convert the hex strings to their reverse byte order.
Always use warnings;. If you do, you'll get the following message:
Integer overflow in hexadecimal number at scratch.pl line 8.
Hexadecimal number > 0xffffffff non-portable at scratch.pl line 8.
These can be resolved by use bigint; and by changing your second number declaration to hex('0x4f423230313430343239303030636334').
However, that number is still too large for pack 'I' to be able to handle.
Perhaps this can be done using simple string manipulation:
use strict;
use warnings;
my @nums = qw(
0x20202032
0x4f423230313430343239303030636334
);
for (@nums) {
my $rev = join '', reverse m/([[:xdigit:]]{2})/g;
print "$_ -> 0x$rev\n"
}
__END__
Outputs:
0x20202032 -> 0x32202020
0x4f423230313430343239303030636334 -> 0x3463633030303932343034313032424f
Or to handle hex strings with an odd number of digits:
my $rev = $_;
$rev =~ s{0x\K([[:xdigit:]]*)}{
my $hex = $1;
$hex = "0$hex" if length($hex) % 2;
join '', reverse $hex =~ m/(..)/g;
}e;
print "$_ -> $rev\n"
To be pedantic, the hex numbers in your example are 32-bit and 128-bit long, not 16 and 64. If the longest one were only 64 bits long, you could successfully use the Q pack template as you supposed (provided that your perl has been compiled to support 64-bit integers).
The pack/unpack solution can be used anyway, with the addition of a reverse. You also have to remove the leading 0x from the hex strings, or trim the last two characters from the results:
print unpack "H*", reverse pack "H*", $hex_string;
Example with your values:
perl -le 'print unpack "H*", reverse pack "H*", "4f423230313430343239303030636334"'
3463633030303932343034313032424f
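The same pack/unpack approach can be wrapped in a small helper. The name `reverse_hex_bytes` is made up for this sketch, which also strips an optional 0x prefix and pads odd-length input, two choices that go slightly beyond the one-liner above.

```perl
use strict;
use warnings;

# Reverse the byte order of a hex string of any length.
sub reverse_hex_bytes {
    my ($hex) = @_;
    $hex =~ s/^0x//i;                        # drop an optional 0x prefix
    $hex = "0$hex" if length($hex) % 2;      # pad to a whole number of bytes
    # "scalar" forces reverse to flip the byte string rather than a list.
    return unpack "H*", scalar reverse pack "H*", $hex;
}

for my $hex (qw(0x20202032 0x4f423230313430343239303030636334)) {
    print "$hex -> 0x", reverse_hex_bytes($hex), "\n";
}
```

Because the value is handled as a string of bytes rather than as a number, integer width limits never come into play.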

convert ASCII code to number

I have a file with 12 columns (a BAM file), and the eleventh column contains ASCII codes. There is more than one ASCII code in the file. I would like to convert them to numbers.
I think this code would do it:
perl -e '$a=ord("ALL_ASCII_CODES_FROM-FILE"); print "$a\t"'
I would also like to write a for loop that reads all the ASCII codes in the eleventh column, converts them to numbers, and sums the results into one number.
You need to split the string into individual characters, loop over every character, and call ord in the loop.
my @codes = map ord, split //, $str;
say join '.', map { sprintf("%02X", $_) } @codes;
Conveniently, unpack 'C*' does all of that.
my @codes = unpack 'C*', $str;
say join '.', map { sprintf("%02X", $_) } @codes;
If you do intend to print it out in hex, you can use the v modifier in a printf.
say sprintf("%v02X", $str);
The natural tool to convert a string of characters into a list of the corresponding ASCII codes would be unpack:
my @codes = unpack "C*", $string;
In particular, assuming that you're parsing the QUAL column of a SAM file (or, more generally, any FASTQ-style quality string), I believe the correct conversion would be:
my @qual = map {$_ - 33} unpack "C*", $string;
Ps. From your mention of "columns", I'm assuming you're actually parsing a SAM file, not a BAM file. If I'm reading the spec correctly, the BAM format doesn't seem to use the +33 offset for quality values, so if you are parsing BAM files, you'd simply use the first example above for that.
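Putting the pieces together, a minimal sketch that converts one quality string and sums the scores, as the question asks. The sample string "II5!" and the helper name `phred_scores` are made up for illustration, and the +33 offset is the assumption discussed above.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Convert a FASTQ-style quality string into a list of numeric scores,
# assuming the usual +33 ASCII offset.
sub phred_scores {
    my ($qual) = @_;
    return map { $_ - 33 } unpack "C*", $qual;
}

# Hypothetical quality string standing in for one eleventh-column value.
my @scores = phred_scores("II5!");
my $total  = sum(@scores) // 0;    # sum returns undef for an empty list

print "scores: @scores\n";
print "total: $total\n";
```

To process a whole file, the same call would go inside a loop that splits each line on tabs and picks out the eleventh field.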

How to get number of bytes consumed by unpack

Is there a way to get number of bytes that "consumed" by an 'unpack' call?
I just want to parse(unpack) different structures from a long string in several steps, like following:
my $record1 = unpack "TEMPLATE", substr($long_str, $pos);
# Advance position pointer
$pos += NUMBER_OF_BYTES_CONSUMED_BY_LAST_UNPACK();
# Other code that might determine what to read in the following steps
# ...
# Read again at the new position
my $record2 = unpack "TEMPLATE2", substr($long_str, $pos);
This does seem like a glaring omission in unpack, doesn't it? As a consolation prize, you could add an a* to the end of the unpack template to return the unused portion of the input string.
# The variable-length "w" format is to make the example slightly more interesting.
$x = pack "w*", 126..129;
while(length $x) {
# unpack one number, keep the rest packed in $x
($n, $x) = unpack "wa*", $x;
print $n;
}
If your packed string is really long, this is not a good idea since it has to make a copy of the "remainder" portion of the string every time you do an unpack.
You can add the character . to the end of the format string:
my (@ary) = unpack("a4v3a*.", "abcdefghijklmn");
say for @ary;
Output:
abcd
26213
26727
27241
klmn
14 # <-- 14 bytes consumed
This was cleverly hidden in the perl5100delta file. If it is documented in perlfunc somewhere, I cannot find it.
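A sketch of how the trailing . might drive the position pointer from the question, assuming Perl 5.10 or later. The length-prefixed record layout (a 16-bit big-endian length followed by that many bytes) is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical buffer: two length-prefixed strings packed back to back.
my $buf = pack "n/a* n/a*", "hello", "hi";

my $pos = 0;
my @payloads;
while ($pos < length($buf)) {
    # The final "." yields the offset reached inside the substring,
    # which is the number of bytes this unpack consumed.
    my ($payload, $used) = unpack "n/a* .", substr($buf, $pos);
    push @payloads, $payload;
    $pos += $used;
}

print "payloads: @payloads\n";
print "final position: $pos\n";
```

Each iteration still copies the tail of the buffer through substr, so for very long strings the copying cost mentioned above still applies.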

Perl pack/unpack/shift

I've been having this problem in Perl for a few days now, and after scouring countless man pages, perldocs and googling too many search terms, hopefully someone here can help me out.
I am given two strings which represent hex values, i.e. "FFFF", not the Perl hex number 0xFFFF. Given two of these strings, I wish to convert them to binary form, perform a bitwise AND of the two, then take the output of this and examine each bit from LSB to MSB.
I have two problems right now: converting the hex string into a hex number, and shifting the result of the bitwise AND.
For converting the hex string into a hex number, I've tried the following approaches which don't seem to work when I print them out to examine:
$a = unpack("H*", pack("N*", $a));
$a = sprintf("%H", $a);
Using a 'print' to examine each of these does not show a correct value, nor does using 'sprintf' either...
The second problem I have occurs after I perform a bitwise AND, and I want to examine each bit by shifting right by 1. To avoid the previous problem, I used actual Perl hex numbers instead of hex strings (0xffff instead of "ffff"). If I try to perform a shift right as follows:
#Convert from hex number to binary number
$a = sprintf("%B", $a);
$b = sprintf("%B", $b);
$temp = pack("B*", $a) & pack("B*", $b);
$output = unpack("B*", $temp);
At this point everything looks fine, and using a 'print' I can see that the values of the AND operation look right, but when I try to shift as follows:
$output = pack("B*", $output);
$output = $output >> 1;
$output = unpack("B*", $output);
The resulting value I get is in binary form but not correct.
What is the correct way of performing this kind of operation?
There's no such thing as a "hex number". A number is a number, a hexadecimal representation of a number is just that - a representation.
Just turn it into a number and use bitwise and.
my $num = (hex $a) & (hex $b);
for (; $num; $num >>= 1) { print $num & 1, "\n" } # print each bit, LSB first, before shifting
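A slightly fuller sketch of the same idea, for cases where leading zero bits matter. The helper name `and_bits_lsb_first` and the fixed bit width argument are assumptions added for illustration.

```perl
use strict;
use warnings;

# AND two hex strings and return the bits of the result, LSB first.
# $width is the number of bits to report, so leading zeros are kept.
sub and_bits_lsb_first {
    my ($hex_a, $hex_b, $width) = @_;
    my $num = hex($hex_a) & hex($hex_b);
    return map { ($num >> $_) & 1 } 0 .. $width - 1;
}

# 0xFFFF & 0x0F0F is 0x0F0F.
my @bits = and_bits_lsb_first("FFFF", "0F0F", 16);
print join("", @bits), "\n";
```

Shifting a copy of the number inside map leaves the ANDed value itself untouched, which helps if it is needed again later.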