Can you extend pack() to handle custom, variable length fields? - perl

The Bitcoin protocol, in order to save space, encodes its integers using what it calls variable length integers, or varints. The first byte of the varint encodes its length and its interpretation:
FirstByte   Value
< 0xfd      treat the byte itself as an 8 bit integer
0xfd        next 2 bytes form a 16 bit integer
0xfe        next 4 bytes form a 32 bit integer
0xff        next 8 bytes form a 64 bit integer
(All ints are little endian and unsigned). I wrote the following function to unpack varints:
use feature 'say';
my $varint = "\xfd\x00\xff"; # \x00\xff in little endian == 65280
say unpack_varint($varint); # prints 65280
sub unpack_varint {
    my $v = shift;
    my $first_byte = unpack "C", $v;
    say $first_byte; # debug output
    if ($first_byte < 253) { # \xfd == 253
        return $first_byte;
    }
    elsif ($first_byte == 253) {
        return unpack "S<", substr $v, 1, 2;
    }
    elsif ($first_byte == 254) {
        return unpack "L<", substr $v, 1, 4;
    }
    elsif ($first_byte == 255) {
        return unpack "Q<", substr $v, 1, 8;
    }
    else {
        die "error";
    }
}
This works, but it's very inelegant: if I have a long bytestring with embedded varints, I would have to read up to the beginning of the varint, pass the remainder to the function above, find out how long the encoded varint was, and so on. Is there a better way to write this? In particular, can I somehow extend pack() to support this kind of structure?

You can create a set of shift_$type functions that read and delete a value at the beginning of the given string, so your code becomes something like the following:
my $buffer = ...;
my $val1 = shift_varint($buffer);
my $val2 = shift_string($buffer);
my $val3 = shift_uint32($buffer);
...
You can also add a multirecord "shifter":
my ($val1, $val2, $val3) = shift_multi($buffer, qw(varint string uint32));
If you need more speed you could also write a compiler which can convert a set of types into an unpacker sub.
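A minimal sketch of what such shift_$type functions could look like. The helper shift_bytes and the layout assumed for shift_string (a varint length followed by that many bytes) are assumptions made for illustration, not part of the answer above. The functions rely on @_ aliasing the caller's variables, so a four-argument substr on $_[0] removes the bytes from the caller's buffer:

```perl
use strict;
use warnings;

# Hypothetical helper: remove and return the first $n bytes of the caller's buffer.
# $_[0] is an alias for the caller's variable, so the buffer is edited in place.
sub shift_bytes {
    my $n = $_[1];
    die "buffer too short\n" if length($_[0]) < $n;
    return substr($_[0], 0, $n, '');
}

# Little-endian unsigned 32-bit integer.
sub shift_uint32 { return unpack 'V', shift_bytes($_[0], 4) }

# Varint as described in the question: the first byte selects the width.
sub shift_varint {
    my $first = unpack 'C', shift_bytes($_[0], 1);
    return $first                              if $first < 0xfd;
    return unpack('v', shift_bytes($_[0], 2))  if $first == 0xfd;
    return unpack('V', shift_bytes($_[0], 4))  if $first == 0xfe;
    return unpack('Q<', shift_bytes($_[0], 8));   # needs a perl with 64-bit integers
}

# Assumed layout: a varint length followed by that many bytes.
sub shift_string {
    my $len = shift_varint($_[0]);
    return shift_bytes($_[0], $len);
}

my $buffer = "\xfd\x00\xff" . "\x03abc" . pack('V', 42);
my $val1 = shift_varint($buffer);
my $val2 = shift_string($buffer);
my $val3 = shift_uint32($buffer);
print "$val1 $val2 $val3\n";
```

After the three calls the buffer is empty, so the caller never has to track offsets by hand.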

Related

Perl script to convert a binary number to a decimal number

I have to write a Perl script that converts a binary number, specified as an
argument, to a decimal number. In the question there's a hint to use the reverse function.
We have to assume that the binary number is in this format
EDIT: This is what I've progressed to (note this is code from my textbook that I've messed with):
#!/usr/bin/perl
# dec2.pl: Converts decimal number to binary
#
die("No arguments\n") if ( $#ARGV == -1 ) ;
foreach $number (@ARGV) {
    $original_number = $number ;
    until ($number == 0 ) {
        $bit = $number % 2 ;
        unshift (@bit_arr, $bit) ;
        $number = int($number / 2 );
    }
    $binary_number = join ("", @bit_arr) ;
    print reverse ("The decimal number of $binary_number is $original_number\n");
    $#bit_arr = -1;
}
When executed:
>./binary.pl 8
The decimal number of 1000 is 8
I don't know how to word it to make the program know to add up all of the 1's in the number that is inputted.
You could just use sprintf to do the converting for you...
sprintf("%d", 0b010101); # Binary string 010101 -> Decimal 21
sprintf("%b", 21); # Decimal 21 -> Binary string 10101 (no leading zero)
Of course, you can also just eval a binary string with 0b in front to indicate binary:
my $binary_string = '010101';
my $decimal = eval("0b$binary_string"); # 21; oct("0b$binary_string") gives the same result without eval
You don't have to use reverse, but it makes it easy to think about the problem with respect to exponents and array indices.
use strict;
use warnings;
my $str = '111110100';
my @bits = reverse(split(//, $str));
my $sum = 0;
for my $i (0 .. $#bits) {
    next unless $bits[$i];
    $sum += 2 ** $i;
}
First of all, you are supposed to convert from binary to decimal, not the other way around, which means you take an input like $binary = '1011001';.
The first thing you need to do is obtain the individual bits (a0, a1, etc) from that. We're talking about splitting the string into its individual digits.
for my $bit (split(//, $binary)) {
...
}
That should be a great starting point. With that, you have all that you need to apply the following refactoring of the formula you posted:
n = ( ( ( ... )*2 + a2 )*2 + a1 )*2 + a0
[I have no idea why reverse would be recommended. It's possible to use it, but it's suboptimal.]
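The refactored formula above can be sketched as a short loop that walks the digits from left to right, doubling the running total and adding the next bit. The sub name bin2dec and the input check are assumptions added for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: n = ( ( ( ... )*2 + a2 )*2 + a1 )*2 + a0
sub bin2dec {
    my ($binary) = @_;
    die "not a binary string: $binary\n" unless $binary =~ /^[01]+$/;
    my $n = 0;
    $n = $n * 2 + $_ for split //, $binary;   # most significant digit first
    return $n;
}

print bin2dec('1011001'), "\n";   # prints 89
```

No reverse is needed because the most significant digit is consumed first and is doubled once for every digit that follows it.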

perl-how to treat a string as a binary number?

Read a file that contains an address and a data, like below:
#0, 12345678
#1, 5a5a5a5a
...
My aim is to read the address and the data. The data I read is in hex format, and I need to unpack it into a binary number.
So 12345678 would become 00010010001101000101011001111000
Then, I need to further unpack the transferred binary number to another level.
So it becomes, 00000000000000010000000000010000000000000001000100000001000000000000000100000001000000010001000000000001000100010001000000000000
The way I did it is shown below:
while(<STDIN>) {
    if (/\#(\S+)\s+(\S+)/) {
        $addr = $1;
        $data = $2;
        $mem{$addr} = ${data};
    }
}
foreach $key (sort {$a <=> $b} (keys %mem)) {
    my $str = unpack ('B*', pack ('H*',$mem{$key}));
    my $str2 = unpack ('B*', pack ('H*', $str));
    printf ("#%x ", $key);
    printf ("%s",$str2);
    printf ("\n");
}
It works. However, my next step is to do some numeric operations on the transferred bits, such as bitwise OR and shifting. I tried the << and | operators, but both work on numbers, not strings, so I don't know how to solve this.
Please leave your comments if you have better ideas. Thanks.
You can use the Bit::Vector module from MetaCPAN:
use strict;
use warnings;
use Bit::Vector;
my $str = "1111000011011001010101000111001100010000001111001010101000111010001011";
printf "orig str: %72s\n", $str;
#only 72 bits for better view
my $vec = Bit::Vector->new_Bin(72,$str);
printf "vec : %72s\n", $vec->to_Bin();
$vec->Move_Left(2);
printf "left 2 : %72s\n", $vec->to_Bin();
$vec->Move_Right(4);
printf "right 4 : %72s\n", $vec->to_Bin();
prints:
orig str: 1111000011011001010101000111001100010000001111001010101000111010001011
vec : 001111000011011001010101000111001100010000001111001010101000111010001011
left 2 : 111100001101100101010100011100110001000000111100101010100011101000101100
right 4 : 000011110000110110010101010001110011000100000011110010101010001110100010
If you need do some math with arbitrary precision, you can also use Math::BigInt or use bigint (http://perldoc.perl.org/bigint.html)
Hex and binary are text representations of numbers. Shifting and bit manipulations are numerical operations. You want a number, not text.
my $hex = '5a5a5a5a';
my $num = hex($hex); # Convert to number.
$num >>= 1; # Manipulate the number.
$hex = sprintf('%08X', $num); # Convert back to hex.
In a comment, you mention you want to deal with 256 bit numbers. The native numbers don't support that, but you can use Math::BigInt.
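A minimal sketch of the Math::BigInt route for a 256-bit value. The sample value, the shift amount, and the mask are made up for illustration. Note that a Math::BigInt has no fixed width, so a left shift makes the number longer instead of dropping the top bits:

```perl
use strict;
use warnings;
use Math::BigInt;

my $hex = '5a5a5a5a' x 8;                  # 64 hex digits = 256 bits (sample data)
my $num = Math::BigInt->new("0x$hex");     # the "0x" prefix makes new() parse hex

my $shifted = $num->copy->blsft(4);        # shift left by 4 bits (copy keeps $num intact)
my $ored    = $shifted->copy->bior(0x0f);  # bitwise OR with a small mask

(my $out = $ored->as_hex) =~ s/^0x//;      # as_hex returns a string with a 0x prefix
print "$out\n";
```

If a fixed 256-bit width is required, the result would additionally have to be masked with band, which is left out here.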
My final solution was to forget about treating them as numbers and just treat them as strings. I use substr and string concatenation instead of shifting. For the OR operation, I just add the corresponding bits of the two strings: if the sum is 0 the result is 0, otherwise it is 1.
It may not be the best way to solve this problem. But that's the way I finally used.

convert scientific notation to decimal (not integer) in bash/perl

I have a tab delimited file with several columns (9 columns) that looks like this:
1:21468 1 21468 2.8628817609765984 0.09640845515631684 0.05034710996552612 1.0 0.012377712911711025 54.0
However in column 5 I sometimes have scientific numbers like:
8.159959468796783E-4
8.465114165595303E-4
8.703354859736187E-5
9.05132870067004E-4
I need to have all numbers in column 5 in decimal notation. From the example above:
0.0008159959468796783
0.0008465114165595303
0.00008703354859736187
0.000905132870067004
And I need to change these numbers without changing the rest of the numbers in column 5 or the rest of the file.
I know there is a similar post in Convert scientific notation to decimal in multiple fields. But in this case there was a if statement not related to the type of number present in the field, and it was for all numbers in that column. So, I'm having trouble transforming the information in there to my specific case. Can someone help me figuring this out?
Thank you!
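The easiest (and fastest) way to convert a number in scientific notation to regular notation in Perl is to force it into numeric context. Note that Perl prints numbers with about 15 significant digits and switches back to exponent notation for very small or very large values, so this does not preserve every digit of inputs like the ones in the question: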
my $num = '0.12345678E5';
$num *= 1;
print "$num\n";
As Jim already proposed, one way to do this is to simply treat the number as a string and do the translation yourself. This way you're able to fully maintain your significant digits.
The following demonstrates a function for doing just that. It takes in a number that's potentially in scientific notation, and it returns the decimal representation. Works with both positive and negative exponents:
use warnings;
use strict;
while (<DATA>) {
    my ($num, $expected) = split;
    my $dec = sn_to_dec($num);
    print $dec . ' - ' . ($dec eq $expected ? 'good' : 'bad') . "\n";
}
sub sn_to_dec {
    my $num = shift;
    if ($num =~ /^([+-]?)(\d*)(\.?)(\d*)[Ee]([-+]?\d+)$/) {
        my ($sign, $int, $period, $dec, $exp) = ($1, $2, $3, $4, $5);
        if ($exp < 0) {
            my $len = 1 - $exp;
            $int = ('0' x ($len - length $int)) . $int if $len > length $int;
            substr $int, $exp, 0, '.';
            return $sign.$int.$dec;
        } elsif ($exp > 0) {
            $dec .= '0' x ($exp - length $dec) if $exp > length $dec;
            substr $dec, $exp, 0, '.' if $exp < length $dec;
            return $sign.$int.$dec;
        } else {
            return $sign.$int.$period.$dec;
        }
    }
    return $num;
}
__DATA__
8.159959468796783E-4 0.0008159959468796783
8.465114165595303E-4 0.0008465114165595303
8.703354859736187E-5 0.00008703354859736187
9.05132870067004E-4 0.000905132870067004
9.05132870067004E+4 90513.2870067004
9.05132870067004E+16 90513287006700400
9.05132870067004E+0 9.05132870067004
If you do this the simple way, by parsing as floating point and then using printf to force it to print as a decimal, you may end up with slightly different results because you're at the upper limit of precision available in double-precision format.
What you should do is split each line into fields, then examine field 5 (index 4 in a zero-based array) with something like this.
my ($u, $d, $exp) = $field[4] =~ /(\d)\.(\d+)[Ee]([-+]?\d+)/;
If $field[4] is in scientific notation this will give you
$u the digit before the decimal
$d the digits after the decimal
$exp the exponent
(if it's not you'll get back undefined values and can just skip the reformatting step)
Using that information you can reassemble the digits with the correct number of leading zeros and decimal point. If the exponent is positive you have to reassemble the digits but then insert the decimal point in the right place.
Once you've reformatted the value the way you want, reassemble the entire line (using, say, join) and write it out.
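The steps above can be sketched roughly as follows. The helper names expand_sci and fix_line are hypothetical, the mantissa is assumed to have exactly one digit before the decimal point, and the sample lines are built in code from values in the question; a real script would loop over its input instead:

```perl
use strict;
use warnings;

# Hypothetical helper: expand a d.dddE[+-]n value as a string so that no
# significant digits are lost. Anything else is returned unchanged.
sub expand_sci {
    my ($field) = @_;
    my ($sign, $u, $d, $exp) = $field =~ /^([-+]?)(\d)\.(\d+)[Ee]([-+]?\d+)$/
        or return $field;
    my $digits = $u . $d;
    my $point  = 1 + $exp;              # number of digits before the decimal point
    return $sign . '0.' . ('0' x -$point) . $digits if $point <= 0;
    return $sign . $digits . ('0' x ($point - length $digits))
        if $point >= length $digits;
    return $sign . substr($digits, 0, $point) . '.' . substr($digits, $point);
}

# Rewrite column 5 (index 4) of one tab-delimited line, leaving the rest alone.
sub fix_line {
    my ($line) = @_;
    my @field = split /\t/, $line, -1;  # -1 keeps trailing empty columns
    $field[4] = expand_sci($field[4]) if defined $field[4];
    return join "\t", @field;
}

my @tail  = ('0.05034710996552612', '1.0', '0.012377712911711025', '54.0');
my @lines = (
    join("\t", '1:21468', 1, 21468, '2.8628817609765984', '8.703354859736187E-5', @tail),
    join("\t", '1:21468', 1, 21468, '2.8628817609765984', '0.09640845515631684', @tail),
);
print fix_line($_), "\n" for @lines;
```

Only values that match the scientific-notation pattern are touched, so ordinary decimals in column 5 pass through unchanged.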

Trouble understanding obfuscated Perl method

I'm trying my best to decipher some Perl code and convert it into C# code so I can use it with a larger program. I've been able to get most of it converted, but am having trouble with the following method:
sub dynk {
my ($t, $s, $v, $r) = (unpack("b*", $_[0]), unpack("b*", pack("v",$_[1])));
$v^=$t=substr($t,$r=$_*$_[($_[1]>>$_-1&1)+2]).substr($t,0,$r)^$s for (1..16);
pack("b*", $v);
}
It is called like:
$sid = 0;
$rand = pack("H*", 'feedfacedeadbeef1111222233334444');
$skey = dynk($rand, $sid, 2, 3) ^ dynk(substr($dbuf, 0, 16), $sid, -1, -4);
I understand most of it except for this section:
$_*$_[($_[1]>>$_-1&1)+2]
I'm not sure how $_ is being used in that context? If someone could explain that, I think I can get the rest.
pack and unpack take a pattern, and some data, and transform this data according to the pattern. For example, pack "H*", "466F6F" treats the data as a hex string of arbitrary length, and decodes it to the bytes it represents. Here: Foo. The unpack function does the reverse, and extracts data from a binary representation to a certain format.
The "b*" pattern produces a bit string, with the bits of each byte in ascending order (least significant bit first) – unpack "b*", "42" is "0010110001001100".
The v represents one little-endian 16-bit integer.
The Perl is rather obfuscated. Here is a rewrite that simplifies some aspects.
sub dynk {
    # Extract arguments: A salt, another parameter, and then two ints that determine rotation.
    my ($initial, $sid, $rot_a, $rot_b) = @_;
    # Unpack the initial value to a bitstring
    my $temp = unpack("b*", $initial);
    # Unpack the 16-bit number $sid to a bitstring
    my $sid_bits = unpack("b*", pack("v", $sid));
    my $v = ''; # an accumulator (empty string, so the first XOR just copies $temp)
    # Loop through the 16 bits of our $sid
    for my $bit_number (1..16) {
        # Pick the $bit_number-th bit from the $sid as an index for the data
        my $bit_value = substr($sid_bits, $bit_number-1, 1);
        # calculate rotation from one data argument
        my $rotation = $bit_number * ( $bit_value ? $rot_b : $rot_a );
        # Rotate the $temp bitstring by $rotation bits
        $temp = substr($temp, $rotation) . substr($temp, 0, $rotation);
        # XOR the $temp with $sid_bits
        $temp = $temp ^ $sid_bits;
        # ... and XOR with the $v accumulator
        $v = $v ^ $temp;
    }
    # Pack the bitstring back to binary data, return.
    return pack("b*", $v);
}
This seems to be some sort of encryption or hashing. It mainly jumbles the first argument according to the following ones. Each of the 16 bits of $sid is used in turn as an index that selects one of the two extra parameters for that iteration, so only two extra parameters are ever used. The length of the first argument stays constant in this operation, but the output is at least two bytes long.
If the selected extra argument is zero, no rotation takes place during that loop iteration. Uninitialized arguments are treated as zero.
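To address the $_ question directly: inside the for (1..16) loop, $_ is the loop counter, while $_[...] indexes the argument array @_, which is a different variable. A small sketch that only collects the rotation amounts may make this easier to see. The sub name rotations, the shortened loop, and the sample arguments are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical demo sub: same indexing expression as in dynk, but it only
# collects the rotation amounts instead of rotating anything.
sub rotations {
    my @out;
    for (1 .. 4) {                        # $_ is the loop counter here
        my $bit   = $_[1] >> $_ - 1 & 1;  # bit number ($_ - 1) of the second argument
        my $param = $_[$bit + 2];         # $_[2] or $_[3], both elements of @_
        push @out, $_ * $param;           # counter times the selected parameter
    }
    return @out;
}

print join(',', rotations('unused', 0b0101, 2, 3)), "\n";   # prints 3,4,9,8
```

Because subtraction binds tighter than >>, and >> binds tighter than &, the expression reads as ($_[1] >> ($_ - 1)) & 1.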

Perl: test for an arbitrary bit in a bit string

I'm trying to parse CPU node affinity+cache sibling info in Linux sysfs.
I can get a string of bits, just for example:
0000111100001111
Now I need a function where I have a decimal number (e.g. 4 or 5) and I need to test whether the nth bit is set or not. So it would return true for 4 and false for 5. I could create a string by shifting 1 n number of times, but I'm not sure about the syntax, and is there an easier way? Also, there's no limit on how long the string could be, so I want to avoid decimal <-> binary conversions.
Assuming that you have the string of bits "0000111100001111" in $str, if you do the precomputation step:
my $bit_vector = pack "b*", $str;
you can then use vec like so:
$is_set = vec $bit_vector, $offset, 1;
so for example, this code
for (0..15) {
print "$_\n" if vec $bit_vector, $_, 1;
}
will output
4
5
6
7
12
13
14
15
Note that the offsets are zero-based, so if you want the first bit to be bit 1, you'll need to add/subtract 1 yourself.
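If bit 0 should instead be the rightmost character of the string, as is usual for bit masks, one option is to reverse the string before packing it. This is a sketch under that assumption; the helper name bits_from_right is made up:

```perl
use strict;
use warnings;

# Hypothetical helper: build a vec()-compatible bit vector in which
# bit 0 is the RIGHTMOST character of the text bit string.
sub bits_from_right {
    my ($str) = @_;
    return pack 'b*', scalar reverse $str;
}

my $vector = bits_from_right('0000111100001111');
my @set = grep { vec($vector, $_, 1) } 0 .. 15;
print "@set\n";   # prints 0 1 2 3 8 9 10 11
```

The packing is done once, so each later test is a single vec call regardless of how long the string is.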
Well, this seems to work, and I'm not going for efficiency:
sub is_bit_set
{
    my $bitstring = shift;
    my $bit = shift;
    my $index = length($bitstring) - $bit - 1;
    if (substr($bitstring, $index, 1) eq "1") {
        return 1;
    }
    else {
        return 0;
    }
}
A simpler variant without a bit vector, although a bit vector would certainly be a more efficient way to deal with this:
sub is_bit_set
{
    my $bitstring = shift;
    my $bit = shift;
    # bit 0 is the rightmost character, as in the version above
    return int substr($bitstring, -$bit - 1, 1);
}