How to isolate leftmost bytes in integer - perl

This has to be done in Perl:
I have integers on the order of e.g. 30_146_890_129 and 17_181_116_691 and 21_478_705_663.
These are supposedly made up of 6 bytes, where:
bytes 0-1 : value a
bytes 2-3 : value b
bytes 4-5 : value c
I want to isolate what value a is. How can I do this in Perl?
I've tried using the >> operator:
perl -e '$a = 330971351478 >> 16; print "$a\n";'
5050222
perl -e '$a = 17181116691 >> 16; print "$a\n";'
262163
But these numbers are not on the order of what I am expecting, more like 0-1000.
Bonus if I can also get values b and c but I don't really need those.
Thanks!

number >> 16 returns number shifted by 16 bit and not the shifted bits as you seem to assume. To get the last 16 bit you might for example use number % 2**16 or number & 0xffff. To get to b and c you can just shift before getting the last 16 bits, i.e.
$a = $number & 0xffff;
$b = ($number >> 16) & 0xffff;
$c = ($number >> 32) & 0xffff;

If you have 6 bytes, you don't need to convert them to a number first. You can use one the following depending on the order of the bytes: (Uppercase represents the most significant byte.)
my ($num_c, $num_b, $num_a) = unpack('nnn', "\xCC\xcc\xBB\xbb\xAA\xaa");
my ($num_a, $num_b, $num_c) = unpack('nnn', "\xAA\xaa\xBB\xbb\xAA\xaa");
my ($num_c, $num_b, $num_a) = unpack('vvv', "\xcc\xCC\xbb\xBB\xaa\xAA");
my ($num_a, $num_b, $num_c) = unpack('vvv', "\xaa\xAA\xbb\xBB\xcc\xCC");
If you are indeed provided with a number 0xCCccBBbbAAaa), you can convert it to bytes then extract the numbers you want from it as follows:
my ($num_c, $num_b, $num_a) = unpack('xxnnn', pack('Q>', $num));
Alternatively, you could also use an arithmetic approach like you attempted.
my $num_a = $num & 0xFFFF;
my $num_b = ( $num >> 16 ) & 0xFFFF;
my $num_c = $num >> 32;
While the previous two solutions required a Perl built to use 64-bit integers, the following will work with any build of Perl:
my $num_a = $num % 2**16;
my $num_b = ( $num / 2**16 ) % 2**16;
my $num_c = int( $num / 2**32 );
Let's look at ( $num >> 16 ) & 0xFFFF in detail.
Original number: 0x0000CCccBBbbAAaa
After shifting: 0x00000000CCccBBbb
After masking: 0x000000000000BBbb

Related

What unicode characters are suitable for making a heatmap in console?

I want to make a command that takes a matrix or vectors of numbers and prints a string-representation where every number is mapped to a character with varying darkness depending on its value.
The only characters I've found that gives a consistent shape but varying color is the unicode block elements ' ░▒▓█' (see e.g. wikipedia), but this only gives me 5 possible shades (space, 3 shades, 1 filled block). I use every character twice so the widhts is approximately the same as the height.
What other characters are suitable for drawing a heatmap in console?
See example code in python below. The question is of course applicable for other languages as well.
import numpy as np
def ascii_heatmap(matrix: np.ndarray, disp=True):
assert matrix.ndim <= 2
matrix = np.atleast_2d(matrix)
vmax = matrix.max()
vmin = matrix.min()
symbolrange= ' ░▒▓█'
symbol_index_matrix = (matrix - vmin) * (len(symbolrange)-1) / (vmax-vmin)
heatmap_rows = []
for row in symbol_index_matrix:
heatmap_rows.append("".join(map(lambda x: symbolrange[int(x)]*2, row)))
heatmap = "\n".join(heatmap_rows)
if disp==True:
print(heatmap)
return heatmap
#Examples with vector and matrix
ascii_heatmap(np.array([1,2,3,4,5]))
ascii_heatmap(np.arange(9).reshape((3,3)))
Use the turbo ramp (Python code) and true colour terminal output.
#!/usr/bin/env perl
my $step = 17;
for (my $r = 0; $r <= 255; $r += $step) {
for (my $g = 0; $g <= 255; $g += $step) {
for (my $b = 0; $b <= 255; $b += $step) {
print "\e[48;2;$r;$g;${b}m ";
# ↑ escape char ↑ coloured space char
}
}
}

Expression for setting lowest n bits that works even when n equals word size

NB: the purpose of this question is to understand Perl's bitwise operators better. I know of ways to compute the number U described below.
Let $i be a nonnegative integer. I'm looking for a simple expression E<$i>1 that will evaluate to the unsigned int U, whose $i lowest bits are all 1's, and whose remaining bits are all 0's. E.g. E<8> should be 255. In particular, if $i equals the machine's word size (W), E<$i> should equal ~02.
The expressions (1 << $i) - 1 and ~(~0 << $i) both do the right thing, except when $i equals W, in which case they both take on the value 0, rather than ~0.
I'm looking for a way to do this that does not require computing W first.
EDIT: OK, I thought of an ugly, plodding solution
$i < 1 ? 0 : do { my $j = 1 << $i - 1; $j < $j << 1 ? ( $j << 1 ) - 1 : ~0 }
or
$i < 1 ? 0 : ( 1 << ( $i - 1 ) ) < ( 1 << $i ) ? ( 1 << $i ) - 1 : ~0
(Also impractical, of course.)
1 I'm using the strange notation E<$i> as shorthand for "expression based on $i".
2 I don't have a strong preference at the moment for what E<$i> should evaluate to when $i is strictly greater than W.
On systems where eval($Config{nv_overflows_integers_at}) >= 2**($Config{ptrsize*8}) (which excludes one that uses double-precision floats and 64-bit ints),
2**$i - 1
On all systems,
( int(2**$i) - 1 )|0
When i<W, int will convert the NV into an IV/UV, allowing the subtraction to work on systems with the precision of NVs is less than the size of UVs. |0 has no effect in this case.
When i≥W, int has no effect, so the subtraction has no effect. |0 therefore overflows, in which case Perl returns the largest integer.
I don't know how reliable that |0 behaviour is. It could be compiler-specific. Don't use this!
use Config qw( %Config );
$i >= $Config{uvsize}*8 ? ~0 : ~(~0 << $i)
Technically, the word size is looked up, not computed.
Fun challenge!
use Devel::Peek qw[Dump];
for my $n (8, 16, 32, 64) {
Dump(~(((1 << ($n - 1)) << 1) - 1) ^ ~0);
}
Output:
SV = IV(0x7ff60b835508) at 0x7ff60b835518
REFCNT = 1
FLAGS = (PADTMP,IOK,pIOK)
IV = 255
SV = IV(0x7ff60b835508) at 0x7ff60b835518
REFCNT = 1
FLAGS = (PADTMP,IOK,pIOK)
IV = 65535
SV = IV(0x7ff60b835508) at 0x7ff60b835518
REFCNT = 1
FLAGS = (PADTMP,IOK,pIOK)
IV = 4294967295
SV = IV(0x7ff60b835508) at 0x7ff60b835518
REFCNT = 1
FLAGS = (PADTMP,IOK,pIOK,IsUV)
UV = 18446744073709551615
Perl compiled with:
ivtype='long', ivsize=8, nvtype='double', nvsize=8
The documentation on the shift operators in perlop has an answer to your problem: use bigint;.
From the documentation:
Note that both << and >> in Perl are implemented directly using << and >> in C. If use integer (see Integer Arithmetic) is in force then signed C integers are used, else unsigned C integers are used. Either way, the implementation isn't going to generate results larger than the size of the integer type Perl was built with (32 bits or 64 bits).
The result of overflowing the range of the integers is undefined because it is undefined also in C. In other words, using 32-bit integers, 1 << 32 is undefined. Shifting by a negative number of bits is also undefined.
If you get tired of being subject to your platform's native integers, the use bigint pragma neatly sidesteps the issue altogether:
print 20 << 20; # 20971520
print 20 << 40; # 5120 on 32-bit machines,
# 21990232555520 on 64-bit machines
use bigint;
print 20 << 100; # 25353012004564588029934064107520

How do I determine the maximum range for perl's range iterator?

I can exceed perl's range iteration bounds like so, with or without -Mbigint:
$» perl -E 'say $^V; say for (0..shift)' 1e19
v5.16.2
Range iterator outside integer range at -e line 1.
How can I determine this upper limit, without simply trying until I exceed it?
It's an IV.
>> similarly works on integers, so you can use
my $max_iv = -1 >> 1;
my $min_iv = -(-1 >> 1) - 1;
They can also be derived from the size of an IV.
my $max_iv = (1 << ($iv_bits-1)) - 1;
my $min_iv = -(1 << ($iv_bits-1));
The size of an IV can be obtained using
use Config qw( %Config );
my $iv_bits = 8 * $Config{ivsize};
or
my $iv_bits = 8 * length pack 'j', 0;

Randomly selecting letters by frequency of use

After feeding few Shakespeare books to my Perl script I have a hash with 26 english letters as keys and the number of their occurences in texts - as value:
%freq = (
a => 24645246,
b => 1409459,
....
z => 807451,
);
and of course the total number of all letters - let's say in the $total variable.
Is there please a nice trick to generate a string holding 16 random letters (a letter can occur several times there) - weighted by their frequency of use?
To be used in a word game similar to Ruzzle:
Something elegant - like picking a random line from a file, as suggested by a Perl Cookbook receipt:
rand($.) < 1 && ($line = $_) while <>;
The Perl Cookbook trick for picking a random line (which can also be found in perlfaq5) can be adapted for weighted sampling too:
my $chosen;
my $sum = 0;
foreach my $item (keys %freq) {
$sum += $freq{$item};
$chosen = $item if rand($sum) < $freq{$item};
}
Here, $sum corresponds to the line counter $. and $freq{$item} to the constant 1 in the Cookbook version.
If you're going to be picking a lot of weighted random samples, you can speed this up a bit with some preparation (note that this destroys %freq, so make a copy first if you want to keep it):
# first, scale all frequencies so that the average frequency is 1:
my $avg = 0;
$avg += $_ for values %freq;
$avg /= keys %freq;
$_ /= $avg for values %freq;
# now, prepare the array we'll need for fast weighted sampling:
my #lookup;
while (keys %freq) {
my ($lo, $hi) = (sort {$freq{$a} <=> $freq{$b}} keys %freq)[0, -1];
push #lookup, [$lo, $hi, $freq{$lo} + #lookup];
$freq{$hi} -= (1 - $freq{$lo});
delete $freq{$lo};
}
Now, to draw a random weighted sample from the prepared distribution, you just do this:
my $r = rand #lookup;
my ($lo, $hi, $threshold) = #{$lookup[$r]};
my $chosen = ($r < $threshold ? $lo : $hi);
(This is basically the Square Histogram method described in Marsaglia, Tsang & Wang (2004), "Fast Generation of Discrete Random Variables", J. Stat. Soft. 11(3) and originally due to A.J. Walker (1974).)
I have no clue about Perl syntax so I'll just write pseudo-code. You can do something like that
sum <= 0
foreach (letter in {a, z})
sum <= sum + freq[letter]
pick r, a random integer in [0, sum[
letter <= 'a' - 1
do
letter <= letter + 1
r <= r - freq(letter)
while r > 0
letter is the resulting value
The idea behind this code is to make a stack of boxes for each letter. The size of each box is the frequency of the letter. Then we choose a random location on this stack and see which letter's box we landed.
Example :
freq(a) = 5
freq(b) = 3
freq(c) = 3
sum = 11
| a | b | c |
- - - - - - - - - - -
When we choose a 0 <= r < 11, we have the following probabilities
Pick a 'a' = 5 / 11
Pick a 'b' = 3 / 11
Pick a 'c' = 3 / 11
Which is exactly what we want.
You can first built a table of the running sum of the frequency. So if you have the following data:
%freq = (
a => 15,
b => 25,
c => 30,
d => 20
);
the running sum would be;
%running_sums = (
a => 0,
b => 15,
c => 40, # 15 + 25
d => 70, # 15 + 25 + 30
);
$max_sum = 90; # 15 + 25 + 30 + 20
To pick a single letter with the weighted frequency, you need to select a number between [0,90), then you can do a linear search on the running_sum table for the range that includes the letter. For example, if your random number is 20 then the appropriate range is 15-40, which is for the letter 'b'. Using linear search gives a total running time of O(m*n) where m is the number of letters we need and n is the size of the alphabet (therefore m=16, n=26). This is essentially what #default locale do.
Instead of linear search, you can also do a binary search on the running_sum table to get the closest number rounded down. This gives a total running time of O(m*log(n)).
For picking m letters though, there is a faster way than O(m*log(n)), perticularly if n < m. First you generate m random numbers in sorted order (which can be done without sorting in O(n)) then you do a linear matching for the ranges between the list of sorted random numbers and the list of running sums. This gives a total runtime of O(m+n). The code in its entirety running in Ideone.
use List::Util qw(shuffle);
my %freq = (...);
# list of letters in sorted order, i.e. "a", "b", "c", ..., "x", "y", "z"
# sorting is O(n*log(n)) but it can be avoided if you already have
# a list of letters you're interested in using
my #letters = sort keys %freq;
# compute the running_sums table in O(n)
my $sum = 0;
my %running_sum;
for(#letters) {
$running_sum{$_} = $sum;
$sum += $freq{$_};
}
# generate a string with letters in $freq frequency in O(m)
my $curmax = 1;
my $curletter = $#letters;
my $i = 16; # the number of letters we want to generate
my #result;
while ($i > 0) {
# $curmax generates a uniformly distributed decreasing random number in [0,1)
# see http://repository.cmu.edu/cgi/viewcontent.cgi?article=3483&context=compsci
$curmax = $curmax * (1-rand())**(1. / $i);
# scale the random number $curmax to [0,$sum)
my $num = int ($curmax * $sum);
# find the range that includes $num
while ($num < $running_sum{$letters[$curletter]}) {
$curletter--;
}
push(#result, $letters[$curletter]);
$i--;
}
# since $result is sorted, you may want to use shuffle it first
# Fisher-Yates shuffle is O(m)
print "", join('', shuffle(#result));

Perl function for negative integers using the 2's complement

I am trying to convert AD maxpwdAge (a 64-bit integer) into a number of days.
According to Microsoft:
Uses the IADs interface's Get method to retrieve the value of the domain's maxPwdAge attribute (line 5).
Notice we use the Set keyword in VBScript to initialize the variable named objMaxPwdAge—the variable used to store the value returned by Get. Why is that?
When you fetch a 64-bit large integer, ADSI does not return one giant scalar value. Instead, ADSI automatically returns an IADsLargeInteger object. You use the IADsLargeInteger interface's HighPart and LowPart properties to calculate the large integer's value. As you may have guessed, HighPart gets the high order 32 bits, and LowPart gets the low order 32 bits. You use the following formula to convert HighPart and LowPart to the large integer's value.
The existing code in VBScript from the same page:
Const ONE_HUNDRED_NANOSECOND = .000000100 ' .000000100 is equal to 10^-7
Const SECONDS_IN_DAY = 86400
Set objDomain = GetObject("LDAP://DC=fabrikam,DC=com") ' LINE 4
Set objMaxPwdAge = objDomain.Get("maxPwdAge") ' LINE 5
If objMaxPwdAge.LowPart = 0 Then
WScript.Echo "The Maximum Password Age is set to 0 in the " & _
"domain. Therefore, the password does not expire."
WScript.Quit
Else
dblMaxPwdNano = Abs(objMaxPwdAge.HighPart * 2^32 + objMaxPwdAge.LowPart)
dblMaxPwdSecs = dblMaxPwdNano * ONE_HUNDRED_NANOSECOND ' LINE 13
dblMaxPwdDays = Int(dblMaxPwdSecs / SECONDS_IN_DAY) ' LINE 14
WScript.Echo "Maximum password age: " & dblMaxPwdDays & " days"
End If
How can I do this in Perl?
Endianness may come into this, but you may be able to say
#!/usr/bin/perl
use strict;
use warnings;
my $num = -37_108_517_437_440;
my $binary = sprintf "%064b", $num;
my ($high, $low) = $binary =~ /(.{32})(.{32})/;
$high = oct "0b$high";
$low = oct "0b$low";
my $together = unpack "q", pack "LL", $low, $high;
print "num $num, low $low, high $high, together $together\n";
Am I missing something? As far as I can tell from your question, your problem has nothing at all to do with 2’s complement. As far as I can tell, all you need/want to do is
use Math::BigInt;
use constant MAXPWDAGE_UNIT_PER_SEC => (
1000 # milliseconds
* 1000 # microseconds
* 10 # 100 nanoseconds
);
use constant SECS_PER_DAY => (
24 # hours
* 60 # minutes
* 60 # seconds
);
my $maxpwdage_full = ( Math::BigInt->new( $maxpwdage_highpart ) << 32 ) + $maxpwdage_lowpart;
my $days = $maxpwdage_full / MAXPWDAGE_UNIT_PER_SEC / SECS_PER_DAY;
Note that I deliberately use 2 separate constants, and I divide by them in sequence, because that keeps the divisors smaller than the range of a 32-bit integer. If you want to write this another way and you want it to work correctly on 32-bit perls, you’ll have to keep all the precision issues in mind.