Perl: test for an arbitrary bit in a bit string - perl

I'm trying to parse CPU node affinity+cache sibling info in Linyx sysfs.
I can get a string of bits, just for example:
0000111100001111
Now I need a function where I have a decimal number (e.g. 4 or 5) and I need to test whether the nth bit is set or not. So it would return true for 4 and false for 5. I could create a string by shifting 1 n number of times, but I'm not sure about the syntax, and is there an easier way? Also, there's no limit on how long the string could be, so I want to avoid decimal <-> binary conversoins.

Assuming that you have the string of bits "0000111100001111" in $str, if you do the precomputation step:
my $bit_vector = pack "b*", $str;
you can then use vec like so:
$is_set = vec $bit_vector, $offset, 1;
so for example, this code
for (0..15) {
print "$_\n" if vec $bit_vector, $_, 1;
}
will output
4
5
6
7
12
13
14
15
Note that the offsets are zero-based, so if you want the first bit to be bit 1, you'll need to add/subtract 1 yourself.

Well, this seems to work, and I'm not going for efficiency:
sub is_bit_set
{
my $bitstring = shift;
my $bit = shift;
my $index = length($bitstring) - $bit - 1;
if (substr($bitstring, $index, 1) == "1") {
return 1;
}
else {
return 0;
}
}

Simpler variant without bit vector, but for sure vector would be more efficient way to deal.
sub is_bit_set
{
my $bitstring = shift;
my $bit = shift;
return int substr($bitstring, -$bit, 1);
}

Related

Perl script to convert a binary number to a decimal number

I have to write a Perl script that converts a binary number, specified as an
argument, to a decimal number. In the question there's a hint to use the reverse function.
We have to assume that the binary number is in this format
EDIT: This is what I've progressed to (note this is code from my textbook that I've messed with):
#!/usr/bin/perl
# dec2.pl: Converts decimal number to binary
#
die("No arguments\n") if ( $#ARGV == -1 ) ;
foreach $number (#ARGV) {
$original_number = $number ;
until ($number == 0 ) {
$bit = $number % 2 ;
unshift (#bit_arr, $bit) ;
$number = int($number / 2 );
}
$binary_number = join ("", #bit_arr) ;
print reverse ("The decimal number of $binary_number is $original_number\n");
$#bit_arr = -1;
}
When executed:
>./binary.pl 8
The decimal number of 1000 is 8
I don't know how to word it to make the program know to add up all of the 1's in the number that is inputted.
You could just use sprintf to do the converting for you...
sprintf("%d", 0b010101); # Binary string 010101 -> Decimal 21
sprintf("%b", 21) # Decimal 21 -> Binary 010101 string
Of course, you can also just eval a binary string with 0b in front to indicate binary:
my $binary_string = '010101';
my $decimal = eval("0b$binary"); # 21
You don't have to use reverse, but it makes it easy to think about the problem with respect to exponents and array indices.
use strict;
use warnings;
my $str = '111110100';
my #bits = reverse(split(//, $str));
my $sum = 0;
for my $i (0 .. $#bits) {
next unless $bits[$i];
$sum += 2 ** $i;
}
First of all, you are suppose to convert from a binary to decimal, not the other way around, which you means you take an input like $binary = '1011001';.
The first thing you need to do is obtain the individual bits (a0, a1, etc) from that. We're talking about splitting the string into its individual digits.
for my $bit (split(//, $binary)) {
...
}
That should be a great starting point. With that, you have all that you need to apply the following refactoring of the formula you posted:
n = ( ( ( ... )*2 + a2 )*2 + a1 )*2 + a0
[I have no idea why reverse would be recommended. It's possible to use it, but it's suboptimal.]

Binary search—Can't use string "1" as a symbol ref while strict refs is in use

I've been browsing over the already answered questions regarding this error message.
I am trying to solve a problem from the Rosalind web site that looks for some indexes using a binary search.
When my subroutine finds the number it seems to ignore it, and if I try to print the $found variable, it gives me the error
Can't use string "1" as a symbol ref while strict refs is in use
The code is this
sub binarysearch
{
my $numbertolook = shift;
my #intarray=#_;
my $lengthint = scalar #intarray;
my #sorted = sort {$a <=> $b} #intarray;
#print $numbertolook, " " , #sorted, "\n";
my $low=0;
my $high=$lengthint-1;
my $found =undef;
my $midpoint;
while ($low<$high)
{
$midpoint=int(($low+$high)/2);
#print $midpoint, " ",$low," ", $high, " ", #sorted, "\n";
if ($numbertolook<$sorted[$midpoint])
{
$high=$midpoint;
}
elsif ($numbertolook>$sorted[$midpoint])
{
$low=$midpoint;
}
elsif ($numbertolook==$sorted[$midpoint])
{
$found=1;
print $found "\n";
last;
}
if ($low==$high-1 and $low==$midpoint)
{
if ($numbertolook==$sorted[$high])
{
$found=1;
print $found "\n";
last;
}
$low=$high;
}
}
return $found;
}
You want
print $found, "\n";
Or
print $found . "\n";
With no operator between $found and the newline, it thinks $found is the filehandle to print a newline to, and is getting an error because it isn't a filehandle.
I'll try to help
First of all, as simple as it may seem, a binary search is quite difficult to code correctly. The main reason is that it's a hotbed of off-by-one errors, which are so prevalent that they have their own Wikipedia page
The issue is that an array containing, say, the values A to Z will have 26 elements with indices 0 to 25. I think FORTRAN bucks the trend, and Lua, but pretty much every other language has the first element of an array at index zero
A zero base works pretty well for everything until you start using divide and conquer algorithms. Merge Sort as well as Binary Search are such algorithms. Binary search goes
Is it in the first half?
If so then search the first half further
Else search the second half further
The hard part is when you have to decide when you've found the object, or when you need to give up looking. Splitting data in two nearly-halves is easy. Knowing when to stop is hard
It's highly efficient for sorted data, but the problem comes when implementing it that, if we do it properly, we have to deal with all sorts of weird index bases beyond zero or one.
Suppose I have an array
my #alpha = 'A' .. 'Q'
If I print scalar #alpha I will see 17, meaning the array has seventeen elements, indexed from 0 to 16
Now I'm looking for E in that array, so I do a binary search, so I want the "first half" and the "second half" of #alpha. If I add 0 to 16 and divide by 2 I get a neat "8", so the middle element is at index 8, which is H
But wait. There are 17 elements, which is an odd number, so if we say the first eight (A .. H) are left of the middle and the last eight (I .. Q) are right of the middle then surely the "middle" is I?
In truth this is all a deception, because a binary search will work however we partition the data. In this case binary means two parts, and although the search would be more efficient if those parts could be equal in size it's not necessary for the algorithm to work. So it can be the first third and the last two-thirds, or just the first element and the rest
That's why using int(($low+high)/2) is fine. It rounds down to the nearest integer so that with our 17-element array $mid is a usable 8 instead of 8.5
But your code still has to account for some unexpected things. In the case of our 17-element array we have calculated the middle index to be 8. So indexes 0 .. 7 are the "first half" while 8 .. 16 are the "second half", and the middle index is where the second half starts
But didn't we round the division down? So in the case of an odd number of elements, shouldn't our mid point be at the end of the first half, and not the start of the second? This is an arcane off-by-one error, but let's see if it still works with a simple even number of elements
#alpha = `A` .. `D`
The start and and indices are 0 and 3; the middle index is int((0+3)/2) == 1. So the first half is 0..1 and the second half is 2 .. 3. That works fine
But there's still a lot more. Say I have to search an array with two elements X and Y. That has two clear halves, and I'm looking for A, which is before the middle. So I now search the one-element list X for A. The minimum and maximum elements of the target array are both zero. The mid-point is int((0+0)/2) == 0. So what happens next?
It is similar but rather worse when we're searching for Z in the same list. The code has to be exactly right, otherwise we will be either searching off the end of the array or checking the last element again and again
Saving the worst for last, suppose
my #alpha = ( 'A', 'B, 'Y, 'Z' )
and I'm looking for M. That lest loose all sorts of optimisations that involve checks that may may the ordinary case much slower
Because of all of this it's by far the best solution to use a library or a language's built-in function to do all of this. In particular, Perl's hashes are usually all you need to check for specific strings and any associated data. The algorithm used is vastly better than a binary search for any non-trivial data sets
Wikipedia shows this algorithm for an iterative binary search
The binary search algorithm can also be expressed iteratively with two index limits that progressively narrow the search range.
int binary_search(int A[], int key, int imin, int imax)
{
// continue searching while [imin,imax] is not empty
while (imin <= imax)
{
// calculate the midpoint for roughly equal partition
int imid = midpoint(imin, imax);
if (A[imid] == key)
// key found at index imid
return imid;
// determine which subarray to search
else if (A[imid] < key)
// change min index to search upper subarray
imin = imid + 1;
else
// change max index to search lower subarray
imax = imid - 1;
}
// key was not found
return KEY_NOT_FOUND;
}
And here is a version of your code that is far from bug-free but does what you intended. You weren't so far off
use strict;
use warnings 'all';
print binarysearch( 76, 10 .. 99 ), "\n";
sub binarysearch {
my $numbertolook = shift;
my #intarray = #_;
my $lengthint = scalar #intarray;
my #sorted = sort { $a <=> $b } #intarray;
my $low = 0;
my $high = $lengthint - 1;
my $found = undef;
my $midpoint;
while ( $low < $high ) {
$midpoint = int( ( $low + $high ) / 2 );
#print $midpoint, " ",$low," ", $high, " ", #sorted, "\n";
if ( $numbertolook < $sorted[$midpoint] ) {
$high = $midpoint;
}
elsif ( $numbertolook > $sorted[$midpoint] ) {
$low = $midpoint;
}
elsif ( $numbertolook == $sorted[$midpoint] ) {
$found = 1;
print "FOUND\n";
return $midpoint;
}
if ( $low == $high - 1 and $low == $midpoint ) {
if ( $numbertolook == $sorted[$high] ) {
$found = 1;
print "FOUND\n";
return $midpoint;
}
return;
}
}
return $midpoint;
}
output
FOUND
66
If you call print with several parameters separated with a space print expects the first one to be a filehandle. This is interprented as print FILEHANDLE LIST from the documentation.
print $found "\n";
What you want to do is either to separate with ,, to call it as print LIST.
print $found, "\n";
or to concat as strings, which will also call it as print LIST, but with only one element in LIST.
print $found . "\n";

Can you extend pack() to handle custom, variable length fields?

The Bitcoin protocol, in order to save space, encodes their integers using what they call variable length integers or varints. The first byte of the varint encodes its length and its interpretation:
FirstByte Value
< 0xfd treat the byte itself as an 8 bit integer
0xfd next 2 bytes form a 16 bit integer
0xfe next 4 bytes form a 32 bit integer
0xff next 8 bytes form a 64 bit integer
(All ints are little endian and unsigned). I wrote the following function to unpack varints:
my $varint = "\xfd\x00\xff"; # \x00\xff in little endian == 65280
say unpack_varint($varint); # print 65280
sub unpack_varint{
my $v = shift;
my $first_byte = unpack "C", $v;
say $first_byte;
if ($first_byte < 253) { # \xfd == 253
return $first_byte;
}
elsif ($first_byte == 253){
return unpack "S<", substr $v, 1, 2;
}
elsif ($first_byte == 254){
return unpack "L<", substr $v, 1, 4;
}
elsif ($first_byte == 255){
return unpack "Q<", substr $v, 1, 8;
}
else{
die "error";
}
}
This works... but its very inelegant b/c if I have a long bytestring with embedded varints, I would have to read up to the beginning of the varint, pass the remainder to the function above, find out how long the encoded varint was, etc. etc. Is there a better way to write this? In particular, can I somehow extend pack() to support this kind of structure?
You can create a set of shift_$type functions that read and delete some value at the beginning of the given string, so your code becomes something as the following:
my $buffer = ...;
my $val1 = shift_varint($buffer);
my $val2 = shift_string($buffer);
my $val3 = shift_uint32($buffer);
...
You can also add a multirecord "shifter":
my ($val1, $val2, $val3) = shift_multi($buffer, qw(varint string uint32));
If you need more speed you could also write a compiler which can convert a set of types into an unpacker sub.

Creating a lazy hashed iterator

In research of this question I peeked at the Iterators chapter in the book Higher Order Perl, and some of the material there was a bit over my head and didn't think necessarily addressed what I specifically want here.
What I mean by lazy hashed iterator is a way to create a structure that would emulate this behavior:
%Ds = {
'1' => 1 .. 20;
'2' => 21 .. 40;
'3' => 41 .. 60;
'4' => 61 .. 80;
...
}
Unfortunately, since this is a hash it would not be in order and thus useless in case of very large numbers.
The behavior is this:
I have a number.
I need to compare it with a sequence of ranges and as a result of the comparison the
code/sub would return another number that is the "key" of that range in case the
number is in that range. (>= with the beginning or <= with the end point of said range)
The "key" of the ranges are numbers from 1..2..3 and so on.
The code/sub will always return for a positive integer no matter how large it is.
By implementing this all lazily I mean if there is a way to emulate this behavior and not compute the sequences of ranges with their respective "keys" with every call of the sub or iteration of a loop. Basically compute once.
Yes it's true that I could choose a maximum boundary, hardcode this in a loop and be done with it but the problem is I don't know of how many of these steps I would need in the end.
Is there a way to do this with perl constructs or maybe perhaps there is a CPAN module that offers this kind of behaviour and my simple search of it didn't uncover it.
Here is a piece of code that illustrates what I mean:
sub get_nr {
my $nr = shift;
my %ds = map { $a = '1' if /1/ .. /20/;
$a = '2' if /21/ .. /40/;
$a = '3' if /41/ .. /60/;
$a = '4' if /61/ .. /80/;
$_ => $a } 1 .. 80;
while (my ($k, $v) = each %ds) {
if ( $k == $nr){
print "number is in range $v \n";
}
}
}
The output for:
get_nr(4);
get_nr(15);
get_nr(22);
get_nr(45);
Is:
number is in range 1
number is in range 1
number is in range 2
number is in range 3
Based on the discussion in the comments, the code you seem to want is a very simple subroutine
sub get_nr {
my $nr = shift;
my $range = int(($nr-1) / 20) + 1;
return $range;
}
You need to compensate for the edge cases, you wanted 20 to return 1, for example, so we need to subtract 1 from the number before dividing it.
If you want to customize further, you might use a variable for the range size, instead of a hard coded number.
sub get_range_number {
my ($n) = #_;
return int(($n-1)/20) + 1;
}
print "$_ is in range ".get_range_number($_)."\n"
for 4, 15, 22, 45;

When is the spaceship operator used outside a sort?

I've only seen the Perl spaceship operator (<=>) used in numeric sort routines. But it seems useful in other situations. I just can't think of a practical use.
What would be an example of when it could be used outside of a Perl sort?
This is a best practice question.
I'm writing a control system for robot Joe that wants to go to robot Mary and recharge her. They move along the integer points on the line. Joe starts at $j and can walk 1 meter in any direction per time unit. Mary stands still at $m and can't move -- she needs a good recharge! The controlling program would look like that:
while ($m != $j) {
$j += ($m <=> $j);
}
The <=> operator would be useful for a binary search algorithm. Most programing languages don't have an operator that does a three-way comparison which makes it necessary to do two comparisons per iteration. With <=> you can do just one.
sub binary_search {
my $value = shift;
my $array = shift;
my $low = 0;
my $high = $#$array;
while ($low <= $high) {
my $mid = $low + int(($high - $low) / 2);
given ($array->[$mid] <=> $value) {
when (-1) { $low = $mid + 1 }
when ( 1) { $high = $mid - 1 }
when ( 0) { return $mid }
}
}
return;
}
In any sort of comparison method. For example, you could have a complicated object, but it still has a defined "order", so you could define a comparison function for it (which you don't have to use inside a sort method, although it would be handy):
package Foo;
# ... other stuff...
# Note: this is a class function, not a method
sub cmp
{
my $object1 = shift;
my $object2 = shift;
my $compare1 = sprintf("%04d%04d%04d", $object1->{field1}, $object1->{field2}, $object1->{field3});
my $compare2 = sprintf("%04d%04d%04d", $object2->{field1}, $object2->{field2}, $object2->{field3});
return $compare1 <=> $compare2;
}
This is a totally contrived example of course. However, in my company's source code I found nearly exactly the above, for comparing objects used for holding date and time information.
One other use I can think of is for statistical analysis -- if a value is repeatedly run against a list of values, you can tell if the value is higher or lower than the set's arithmetic median:
use List::Util qw(sum);
# $result will be
# -1 if value is lower than the median of #setOfValues,
# 1 if value is higher than the median of #setOfValues,
# 0 if value is equal to the median
my $result = sum(map { $value <=> $_ } #setOfValues);
Here's one more, from Wikipedia: "If the two arguments cannot be compared (e.g. one of them is NaN), the operator returns undef.", i.e., you can determine if two numbers are a a number at once, although personally I'd go for the less cryptic Scalar::Util::looks_like_number.