Perl, Decoding bits stored in a binary file

Perl, Decoding bits stored in a binary file - perl

I'm decoding a fixed-width binary file in perl. One of the fields is 1 byte in length and the 8 bits are encoded such that:
The 7th Bit unused, The 0-4th bit is used to indicate Field1. The 5-6th Bit is used to indicate Field2.
For example the field 0x27 will give Field1 = 7, Field2 = 1.
I'm reading all the fields into a hash table called raw{} and then decoding the values into another hash called processed{}.
The code (only shown for this one byte) is
while (read(FILE, $buff, 559)) {
%raw = (); # Hash for first-pass extractions
%processed = (); # Hash for cleaned up values, for output
(
......
$raw{'Field12'},
......
)= unpack('
.....
H2
.....
h*',$buff);
$prcoessed{'Field1'} = unpack("B8", pack("H8", $raw{'Field12'})) & 0x1f;
$prcoessed{'Field2'} = (unpack("B8", pack("H8", $raw{'Field12'})) >> 5) & 0x3;
However, I don't get the desired values. I get $prcoessed{'Field1'} = 15 and $prcoessed{'Field2'} = 3. Where am I going wrong?

0x1f is a number, you need to transform it into a character.
Instead of packs and unpacks, you can use ord and chr:
#!/usr/bin/perl
use warnings;
use strict;
my $char = chr 0x27;
my $field1 = $char & chr 0x1f;
my $field2 = $char & chr 0x60;
$field2 = chr(ord($field2) >> 5);
print ord, "\n" for $field1, $field2;
The same script using pack and unpack:
my $char = pack 'C', 0x27;
my $field1 = $char & pack 'C', 0x1f;
my $field2 = $char & pack 'C', 0x60;
$field2 = pack 'C', unpack('C', $field2) >> 5;
print unpack('C', $_), "\n" for $field1, $field2;

Related

How to separate the Hex 8 bits into two 4 bits

I am using like this,
$a = "002c459f";
$b = $a%10000;
$c = int($a/10000);
print $b; #prints 0
print $c; #prints 2
I want
$b=459f;
$c=002c;
Can anyone suggest how will I get this?

If you had used warnings, you would have gotten a warning message indicating a problem.
Since your 8-bit input is already formatted as a simple hex string, you can just use substr:
use warnings;
use strict;
my $x = '002c459f';
my $y = substr $x, 0, 4;
my $z = substr $x, 4, 4;
print "z=$z, y=$y\n";
Output:
z=459f, y=002c
It is a good practice to also use strict. I changed your variable names since a and b are special variables in Perl.

You should always use use strict; use warnings;! It would have told you that 002c459f isn't a number. (It's the hex representation of a number.) As such, you can't use division before first converting it into a number. You also used the wrong divisor (10000 instead of 0x10000).
my $a_num = hex($a_hex);
my $b_num = $a_num % 0x10000; # More common: my $b_num = $a_num & 0xFFFF;
my $c_num = int( $a_num / 0x10000 ); # More common: my $c_num = $a_num >> 16
my $b_hex = sprintf("%04x", $b_num);
my $c_hex = sprintf("%04x", $c_num);
But if you have exactly eight characters, you can use the following instead:
my ($c, $b) = unpack('a4 a4', $a);
Note: You should avoid using $a and $b as it may interfere with sort and some subs.

Input data is a hex string, regular expression can be applied to split string by 4 characters into an array.
At this point you can use result as a strings, or you can use hex() to convert hex string representation into perl's internal digital representation.
use strict;
use warnings;
use feature 'say';
my $a = "002c459f"; # value is a string
my($b,$c) = $a =~ /([\da-f]{4})/gi;
say "0x$b 0x$c\tstrings"; # values are strings
$b = hex($b); # convert to digit
$c = hex($c); # convert to digit
printf "0x%04x 0x%04x\tdigits\n", $b, $c;
Output
0x002c 0x459f strings
0x002c 0x459f digits

Right shift a binary no and get the shifted bits in a variable

I have a binary no say and I have a value in variable say value = 4.
I want to right shift the binary no by no of bits stored in "value" variable and then want to store the shifted bits in a variable and also want to save binary no obtained after right shift in another variable
Example:
binary_number = 110000001
value =4
then shifting no of bits in "value" to right (11000001 >> value)
Now I want to finally have two variables one containing the binary no after shift and one variable with shifted bits.
For above example the solution that I want is
right_shifted_binary = 11000
bits_shifted = 0001
I can not find a proper documentation for the problem as most of the problem are telling about arithmetic right shift.

Generate a bit mask based on $value and use the AND (&) operator:
#!/usr/bin/perl
use warnings;
use strict;
my $binary = 0b110000001;
my $value = 4;
# create mask with $value left-most bits 1
my $mask = ~(~0 << $value);
print "INPUT: ", unpack("B*", pack("N", $binary)), " ($binary)\n";
# right shift by $value bits
my $right_shifted_binary = $binary >> $value;
print "RIGHT: ", unpack("B*", pack("N", $right_shifted_binary)), " ($right_shifted_binary)\n";
# extract "remainder" of shift using mask
my $bits_shifted = $binary & $mask;
print "REMAINDER: ", unpack("B*", pack("N", $bits_shifted)), " ($bits_shifted)\n";
exit 0;
Test run:
$ perl dummy.pl
INPUT: 00000000000000000000000110000001 (385)
RIGHT: 00000000000000000000000000011000 (24)
REMAINDER: 00000000000000000000000000000001 (1)
# Proof
$ echo "24 * 16 + 1" | bc
385
If the binary number is given as string you can convert it to an integer first:
my $binary_string = "110000001";
my $binary = unpack("N", pack("B32", substr("0" x 32 . $binary_string, -32)));
But if it is already a string then the solution would be much simpler:
#!/usr/bin/perl
use warnings;
use strict;
my $binary_string = "110000001";
my $value = 4;
print "INPUT: $binary_string\n";
print "RIGHT: ", substr($binary_string, 0, -$value), "\n";
print "REMAINDER: ", substr($binary_string, -$value), "\n";
exit 0:
$ perl dummy.pl
INPUT: 110000001
RIGHT: 11000
REMAINDER: 0001

Decoding 3-byte integer in Perl

I'm reading a binary file format that starts out with 4 constant check bytes, followed by 3 octets that indicate how long the data portion of the record will be. I can decode this as follows:
read($fh, $x, 7) or do {
last if eof;
die "Can't read: $!";
};
my ($type, $l1, $l2, $l3) = unpack("a4 C3", $x);
my $length = $l1 << 16 | $l2 << 8 | $l3;
Is there a more direct way to read that 3-byte value, without intermediate variables? Something I'm missing in the pack specifications maybe? I haven't used pack very much except for hex encoding and other dilettantish pursuits.

You could insert a null byte into the string in order to be able to use the "N" format:
substr($x, 4, 0, "\0");
my ($type, $length) = unpack "a4 N", $x;
Edit: Or else unpack in two steps:
my ($type, $length) = unpack "a4 a3", $x;
$length = unpack "N", "\0" . $length;

my $type = unpack("a4", $x);
my $len = unpack("N", "\0".substr($x, 4));
or
my ($type, $plen) = unpack("a4 a3", $x);
my $len = unpack("N", "\0$plen");

No, unpack doesn't support 3-byte (or arbitrary length) integers, but you could use an unsigned 16-bit big-endian int to save a little work:
my ($type, $l1, $l23) = unpack("a4 Cn", $x);
my $length = $l1 << 16 | $l23;

Solution: Your method for getting the type is fine. However, I suggest that you unpack the length separately as a four-byte integer, then discard the first byte of those four bytes. This is more efficient even though it overlaps and discards the last byte of the type.
my $type = unpack("a4", $x);
my $length = unpack("x3N", $x); # skips the first 3 bytes of your original 7-byte string
$length = $length & 0xFFFFFF; # returns only the last 3 bytes of the four-byte integer

Writing (and reading) bits to binary files in Perl + EOF handling

I have two related problems (in Perl):
Write data to binary files, in the format: single bit flag followed by 8 bits
Read back the same format
I tried this (and other variations but for the life of me I can't figure this out):
binmode(OUT);
my $bit = pack("B1", '1');
my $byte = pack("H2", "02");
print OUT $bit . $byte;
Using a hex editor, I see I get 16 bits:
1000000000000020
What I want is 9 bits:
100000020
Also: Suppose I write out two of these patterns. That means I end up with 9 + 9 = 18 bits. I am not sure how to handle the last byte (padding?)
This is to compress and uncompress files, with space at premium. I was hoping there would be some simple idiomatic way to do this that I am not aware of.

Files are sequences of bytes. If you want to print out bits, you'll have to use some form of buffering.
my $bits = '';
$bits .= '1'; # Add 1 bit.
$bits .= unpack('B8', pack('C', 0x02)); # Add 8 bits.
$bits .= substr(unpack('B8', pack('C', 0x02)), -6); # Add 6 bits.
This prints as much as the buffer as possible:
my $len = ( length($bits) >> 3 ) << 3;
print($fh, pack('B*', substr($bits, 0, $len, '')));
You'll eventually need to pad the buffer so that you have a multiple of 8 bits in order to flush out the rest. You could simply pad with zeroes.
$bits .= "0" x ( -length($bits) % 8 );
If you're smart, though, you can come up with a padding scheme that can be used to indicate where the file actually ends. Remember, you can't rely on just the file length anymore. If you don't use a smart padding scheme, you'll have to use another method.
One example of a smart padding scheme would be:
$bits .= "0";
$bits .= "1" x ( -length($bits) % 8 );
Then, to unpad, remove all trailing 1 bits and the 0 bit before that.

You can use Bit::Vector to manage your bits and conversion with some more ease,
use Bit::Vector;
my $bit = Bit::Vector->new_Bin( 1, '1' );
my $byte = Bit::Vector->new_Bin( 8, '00000010' );
my $byte_9 = Bit::Vector->new_Bin( 9, '000000010' );
my $nineBits = Bit::Vector->new_Bin( 9, '100000000' );
my $carry = Bit::Vector->new_Bin( 9, '000000000' );
my $ORed = Bit::Vector->new_Bin( 9, '000000000' );
my $added = Bit::Vector->new_Bin( 9, '000000000' );
$ORed->Union($nineBits,$byte_9);
print "bit: 0x". $bit->to_Hex(). "\n";
print "byte 2: 0x". $byte->to_Hex(). "\n";
print "nineBits: 0x". $nineBits->to_Hex(). "\n";
print "nineBits: 0x". $nineBits->to_Bin(). "\n";
print "ORed bit and byte 0x". $ORed->to_Dec(). "\n";
open BINOUT, ">out.bin"
or die "\nCan't open out.bin for writing: $!\n";
binmode BINOUT;
print BINOUT pack ('B*', $ORed->to_Bin()) ."\n"
Here's the output
>perl bitstuff.pl
bit: 0x1
byte 2: 0x02
nineBits: 0x100
nineBits: 0x100000000
ORed bit and byte 0x-254
>cat out.bin
\201^#

extract multiple substr and match & replace using perl

I need to extract multiple substrings at fixed positions from a line and the same time replace whitespaces at another position.
For example, I have a string '01234567890 '. I want to extract characters at positions 1,2,6,7,8 and the same time if position 12, 13 are whitespaces, I want to replace them with 0101. It is all position based.
What is the best way to achieve this using perl ?
I can use substr and string comparison and then concatenate them together, but the code looked rather chuncky....

I would probably split (or: explode) the string into an array of single chars:
my #chars = split //, $string; # // is special with split
Now we can do array slices: extracting multiple arguments at once.
use List::MoreUtils qw(all);
if (all {/\s/} #chars[12, 13]) {
#chars[12, 13] = (0, 1);
my #extracted_chars = #chars[1, 2, 6..8];
# do something with extracted data.
}
We can then turn the #chars back into a string like
$string = join "", #chars;
If you want to remove certain chars instead of extracting them, you would have to use slices inside a loop, an ugly undertaking.
Complete sub with nice interface to do this kind of thing
sub extract (%) {
my ($at, $ws, $ref) = #{{#_}}{qw(at if_whitespace from)};
$ws //= [];
my #chars = split //, $$ref;
if (all {/\s/} #chars[#$ws]) {
#chars[#$ws] = (0, 1) x int(#$ws / 2 + 1);
$$ref = join "", #chars;
return #chars[#$at];
}
return +();
}
my $string = "0123456789ab \tef";
my #extracted = extract from => \$string, at => [1,2,6..8], if_whitespace => [12, 13];
say "#extracted";
say $string;
Output:
1 2 6 7 8
0123456789ab01ef

This is two separate operations, and should be coded as such. This code seems to do what you need.
use strict;
use warnings;
my $str = 'abcdefghijab efghij';
my #extracted = map { substr $str, $_, 1 } 1, 2, 6, 7, 8;
print "#extracted\n";
for (substr $str, 12, 2) {
$_ = '01' if $_ eq ' ';
}
print $str, "\n";
output
b c g h i
abcdefghijab01efghij