I am getting the row count of a Sybase table in Perl. For example, the table has 100 rows, so n=100.
I want to split this value into 3 parts:
1-33 | 34-66 | 67-99 or 100
Please advise how to do this in Perl.
Reason for this split: I need to pass the values 1 and 33 as input parameters to a stored proc to select rows whose identity column value is between 1 and 33.
The same goes for 34-66 and 67-99.
The interesting part is deciding where each range starts. From there it's easy to decide that each range ends at one less than the start of the next range.
This partition() function will determine the start points for a given number of partitions within a given number of elements, starting at a given offset.
sub partition {
    my ($offset, $n_elements, $n_partitions) = @_;
    die "Cannot create $n_partitions partitions from $n_elements elements.\n"
        if $n_partitions > $n_elements;
    my $step = int($n_elements / $n_partitions);
    return map { $step * $_ + $offset } 0 .. $n_partitions - 1;
}
Here's how it works:
First, determine the step by dividing the number of elements by the number of partitions, truncating any trailing decimal places to keep an integer.
Next, walk through the steps by starting at zero and multiplying the step by the step number (or the partition number). So if the step is 5, then 5*0=0, 5*1=5, 5*2=10, and so on. We do not take the final step, because it makes more sense to include an "off by one" in the last partition than to start a new partition with only one element.
Finally, we allow for an offset to be applied, so that partition(0,100,5) means to find the starting element positions for five partitions starting at zero and continuing for 100 elements (so a range of 0 to 99). And partition(1,100,5) would mean start at 1 and continue for 100 elements, partitioning into five segments, so a range of 1 to 100.
Here's an example of putting the function to use to find the partition points in a set of several ranges:
use strict;
use warnings;
use Test::More;
sub partition {
    my ($offset, $n_elements, $n_partitions) = @_;
    die "Cannot create $n_partitions partitions from $n_elements elements.\n"
        if $n_partitions > $n_elements;
    my $step = int($n_elements / $n_partitions);
    return map { $step * $_ + $offset } 0 .. $n_partitions - 1;
}

while (<DATA>) {
    chomp;
    next unless length;
    my ($off, $n_elems, $n_parts, @starts) = split /,\s*/;
    local $" = ',';
    is_deeply
        [partition($off, $n_elems, $n_parts)],
        [@starts],
        "Partitioning $n_elems elements starting at $off by $n_parts yields start positions of [@starts]";
}

done_testing();
__DATA__
0,10,2,0,5
1,11,2,1,6
0,3,2,0,1
0,7,3,0,2,4
0,21,3,0,7,14
0,21,7,0,3,6,9,12,15,18
0,20,3,0,6,12
0,100,4,0,25,50,75
1,100,4,1,26,51,76
1,100,3,1,34,67
0,10,1,0
1,10,10,1,2,3,4,5,6,7,8,9,10
This yields the following output:
ok 1 - Partitioning 10 elements starting at 0 by 2 yields start positions of [0,5]
ok 2 - Partitioning 11 elements starting at 1 by 2 yields start positions of [1,6]
ok 3 - Partitioning 3 elements starting at 0 by 2 yields start positions of [0,1]
ok 4 - Partitioning 7 elements starting at 0 by 3 yields start positions of [0,2,4]
ok 5 - Partitioning 21 elements starting at 0 by 3 yields start positions of [0,7,14]
ok 6 - Partitioning 21 elements starting at 0 by 7 yields start positions of [0,3,6,9,12,15,18]
ok 7 - Partitioning 20 elements starting at 0 by 3 yields start positions of [0,6,12]
ok 8 - Partitioning 100 elements starting at 0 by 4 yields start positions of [0,25,50,75]
ok 9 - Partitioning 100 elements starting at 1 by 4 yields start positions of [1,26,51,76]
ok 10 - Partitioning 100 elements starting at 1 by 3 yields start positions of [1,34,67]
ok 11 - Partitioning 10 elements starting at 0 by 1 yields start positions of [0]
ok 12 - Partitioning 10 elements starting at 1 by 10 yields start positions of [1,2,3,4,5,6,7,8,9,10]
1..12
For additional examples, see "Split range 0 to M into N non-overlapping (roughly equal) ranges" on PerlMonks.
Your question asks for complete range start and end points. This method makes that rather trivial:
sub partition {
    my ($offset, $n_elements, $n_partitions) = @_;
    my $step = int($n_elements / $n_partitions);
    return map { $step * $_ + $offset } 0 .. $n_partitions - 1;
}

my $n_elems = 100;
my $offset  = 1;
my $n_parts = 3;

my @starts = partition($offset, $n_elems, $n_parts);
my @ranges = map {
    [
        $starts[$_],
        ($starts[$_ + 1] // $n_elems + $offset) - 1,
    ]
} 0 .. $#starts;

print "($_->[0], $_->[1])\n" foreach @ranges;
The output:
(1, 33)
(34, 66)
(67, 100)
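To close the loop on the original goal, each (start, end) pair can then be fed to the stored procedure. Here is a minimal sketch using DBD::Sybase and the @ranges array computed above; the DSN, credentials, and the procedure name my_proc are placeholders for whatever your environment actually uses:

use DBI;

# Hypothetical connection details; substitute your own server and credentials.
my $dbh = DBI->connect('dbi:Sybase:server=MYSERVER', 'user', 'password',
                       { RaiseError => 1 });

foreach my $range (@ranges) {
    my ($lo, $hi) = @$range;
    # $lo and $hi are integers we computed ourselves, so direct
    # interpolation is safe here; adjust the proc name and parameter
    # syntax to match your actual stored procedure.
    my $sth = $dbh->prepare("exec my_proc $lo, $hi");
    $sth->execute;
    while (my @row = $sth->fetchrow_array) {
        print join("\t", @row), "\n";
    }
}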
Even more implementation examples appear in "Algorithm for dividing a range into ranges and then finding which range a number belongs to" on the Software Engineering Stack Exchange site.
I have a table in MATLAB with attributes in the first three columns and data from the fourth column onwards. I was trying to sort the entire table based on the first three columns. However, one of the columns (Column C) contains months ('January', 'February', etc.). The sortrows function only lets me choose 'ascend' or 'descend', with no custom option to sort by month. Any help would be greatly appreciated. Below is the code I used.
sortrows(Table, {'Column A','Column B','Column C'} , {'ascend' , 'ascend' , '???' } )
As @AnonSubmitter85 suggested, the best thing you can do is to convert your month names to numeric values from 1 (January) to 12 (December) as follows:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t.ColumnC = month(datenum(t.ColumnC,'mmmm'));
This will facilitate the access to a standard sorting criterion for your ColumnC too (in this example, ascending):
t = sortrows(t,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
If, for any reason that is unknown to us, you are forced to keep your months as literals, you can use a workaround that consists of sorting a clone of the table using the approach described above, and then applying the resulting indices to the original table:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t_original = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t_clone = t_original;
t_clone.ColumnC = month(datenum(t_clone.ColumnC,'mmmm'));
[~,idx] = sortrows(t_clone,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
t_original = t_original(idx,:);
After feeding a few Shakespeare books to my Perl script, I have a hash with the 26 English letters as keys and the number of their occurrences in the texts as values:
%freq = (
a => 24645246,
b => 1409459,
....
z => 807451,
);
and of course the total number of all letters - let's say in the $total variable.
Is there a nice trick to generate a string holding 16 random letters (a letter can occur several times), weighted by their frequency of use?
To be used in a word game similar to Ruzzle:
Something elegant - like picking a random line from a file, as suggested by a Perl Cookbook recipe:
rand($.) < 1 && ($line = $_) while <>;
The Perl Cookbook trick for picking a random line (which can also be found in perlfaq5) can be adapted for weighted sampling too:
my $chosen;
my $sum = 0;

foreach my $item (keys %freq) {
    $sum += $freq{$item};
    $chosen = $item if rand($sum) < $freq{$item};
}
Here, $sum corresponds to the line counter $. and $freq{$item} to the constant 1 in the Cookbook version.
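Wrapping this pick in a loop gives the 16-letter string the question asks for; a minimal sketch, assuming %freq is populated as in the question:

sub weighted_pick {
    my $chosen;
    my $sum = 0;
    foreach my $item (keys %freq) {
        $sum += $freq{$item};
        $chosen = $item if rand($sum) < $freq{$item};
    }
    return $chosen;
}

# Draw 16 letters independently and join them into one string.
my $word = join '', map { weighted_pick() } 1 .. 16;
print "$word\n";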
If you're going to be picking a lot of weighted random samples, you can speed this up a bit with some preparation (note that this destroys %freq, so make a copy first if you want to keep it):
# first, scale all frequencies so that the average frequency is 1:
my $avg = 0;
$avg += $_ for values %freq;
$avg /= keys %freq;
$_ /= $avg for values %freq;

# now, prepare the array we'll need for fast weighted sampling:
my @lookup;
while (keys %freq) {
    my ($lo, $hi) = (sort { $freq{$a} <=> $freq{$b} } keys %freq)[0, -1];
    push @lookup, [$lo, $hi, $freq{$lo} + @lookup];
    $freq{$hi} -= (1 - $freq{$lo});
    delete $freq{$lo};
}
Now, to draw a random weighted sample from the prepared distribution, you just do this:
my $r = rand @lookup;
my ($lo, $hi, $threshold) = @{ $lookup[$r] };
my $chosen = ($r < $threshold ? $lo : $hi);
(This is basically the Square Histogram method described in Marsaglia, Tsang & Wang (2004), "Fast Generation of Discrete Random Variables", J. Stat. Soft. 11(3) and originally due to A.J. Walker (1974).)
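A 16-letter string then falls out of repeating the draw; a short sketch reusing the prepared @lookup table from above (the helper name is mine):

# Draw one weighted sample from the prepared square histogram.
sub draw_letter {
    my $r = rand @lookup;
    my ($lo, $hi, $threshold) = @{ $lookup[$r] };
    return $r < $threshold ? $lo : $hi;
}

my $word = join '', map { draw_letter() } 1 .. 16;
print "$word\n";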
I have no clue about Perl syntax, so I'll just write pseudo-code. You can do something like this:
sum <= 0
foreach (letter in {a .. z})
    sum <= sum + freq[letter]

pick r, a random integer in [0, sum)

letter <= 'a' - 1
do
    letter <= letter + 1
    r <= r - freq(letter)
while r > 0

letter is the resulting value
The idea behind this code is to make a stack of boxes for each letter. The size of each box is the frequency of the letter. Then we choose a random location on this stack and see which letter's box we landed in.
Example:
freq(a) = 5
freq(b) = 3
freq(c) = 3
sum = 11
| a | b | c |
- - - - - - - - - - -
When we choose 0 <= r < 11, we have the following probabilities:
Pick 'a': 5 / 11
Pick 'b': 3 / 11
Pick 'c': 3 / 11
Which is exactly what we want.
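Since the question is about Perl, here is a direct translation of the pseudocode above; a sketch, assuming the %freq hash from the question:

# Total weight of all letters.
my $sum = 0;
$sum += $freq{$_} for keys %freq;

sub pick_letter {
    my $r = int rand $sum;            # random integer in [0, sum)
    for my $letter (sort keys %freq) {
        $r -= $freq{$letter};
        return $letter if $r < 0;     # we landed in this letter's box
    }
}

my $word = join '', map { pick_letter() } 1 .. 16;
print "$word\n";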
You can first build a table of the running sums of the frequencies. So if you have the following data:

%freq = (
    a => 15,
    b => 25,
    c => 30,
    d => 20,
);

the running sums would be:

%running_sums = (
    a => 0,
    b => 15,
    c => 40,   # 15 + 25
    d => 70,   # 15 + 25 + 30
);
$max_sum = 90;   # 15 + 25 + 30 + 20
To pick a single letter with the weighted frequency, you select a number in the range [0, 90), then do a linear search of the running_sum table for the range that includes that letter. For example, if your random number is 20 then the appropriate range is 15-40, which is for the letter 'b'. Using linear search gives a total running time of O(m*n), where m is the number of letters we need and n is the size of the alphabet (here m=16, n=26). This is essentially what @default locale's answer does.
Instead of linear search, you can also do a binary search on the running_sum table to get the closest number rounded down. This gives a total running time of O(m*log(n)).
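For completeness, a sketch of that binary search, using a plain array of running sums, which is easier to index than the hash (the helper names are mine):

my @letters = sort keys %freq;

# cumulative sums: $cumsum[$i] = total frequency of $letters[0 .. $i]
my @cumsum;
my $total = 0;
for my $l (@letters) {
    $total += $freq{$l};
    push @cumsum, $total;
}

# find the first index whose cumulative sum exceeds $r: O(log n)
sub pick_index {
    my ($r) = @_;                     # $r in [0, $total)
    my ($lo, $hi) = (0, $#cumsum);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($cumsum[$mid] > $r) { $hi = $mid } else { $lo = $mid + 1 }
    }
    return $lo;
}

my $letter = $letters[ pick_index(int rand $total) ];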
For picking m letters, though, there is a faster way than O(m*log(n)), particularly if n < m. First you generate the m random numbers in sorted order (which can be done without sorting in O(m)), then you do a linear matching between the sorted random numbers and the list of running sums. This gives a total runtime of O(m+n). The complete code, runnable on Ideone, follows.
use List::Util qw(shuffle);

my %freq = (...);

# list of letters in sorted order, i.e. "a", "b", "c", ..., "x", "y", "z"
# sorting is O(n*log(n)) but it can be avoided if you already have
# a list of letters you're interested in using
my @letters = sort keys %freq;

# compute the running_sum table in O(n)
my $sum = 0;
my %running_sum;
for (@letters) {
    $running_sum{$_} = $sum;
    $sum += $freq{$_};
}

# generate a string with letters in %freq frequency in O(m)
my $curmax = 1;
my $curletter = $#letters;
my $i = 16;   # the number of letters we want to generate
my @result;
while ($i > 0) {
    # $curmax generates a uniformly distributed decreasing random number in [0,1)
    # see http://repository.cmu.edu/cgi/viewcontent.cgi?article=3483&context=compsci
    $curmax = $curmax * (1 - rand())**(1. / $i);
    # scale the random number $curmax to [0,$sum)
    my $num = int($curmax * $sum);
    # find the range that includes $num
    while ($num < $running_sum{$letters[$curletter]}) {
        $curletter--;
    }
    push @result, $letters[$curletter];
    $i--;
}

# since @result is sorted, you may want to shuffle it first
# Fisher-Yates shuffle is O(m)
print join '', shuffle(@result);
I have two files:
regions.txt: First column is the chromosome name, second and third are start and end position.
1 100 200
1 400 600
2 600 700
coverage.txt: First column is chromosome name, again second and third are start and end positions, and last column is the score.
1 100 101 5
1 101 102 7
1 103 105 8
2 600 601 10
2 601 602 15
This file is huge: it is about 15 GB, with about 300 million lines.
I basically want to get the mean of all scores in coverage.txt that are in each region in regions.txt.
In other words, start at the first line in regions.txt; if there is a line in coverage.txt with the same chromosome, a coverage start >= the region start, and a coverage end <= the region end, then save its score to a new array. After searching through all of coverage.txt, print the region chromosome, start, end, and the mean of all scores that were found.
Expected output:
1 100 200 14.6 which is (5+7+8)/3
1 400 600 0 no match at coverages.txt
2 600 700 12.5 which is (10+15)/2
I built the following MATLAB script, which takes a very long time since I have to loop over coverage.txt many times. I don't know how to write a fast awk-like script.
My MATLAB script:
fc = fopen('coverage.txt', 'r');
ft = fopen('regions.txt', 'r');
fw = fopen('out.txt', 'w');

while feof(ft) == 0
    linet = fgetl(ft);
    scant = textscan(linet, '%d%d%d');
    tchr = scant{1};
    tx = scant{2};
    ty = scant{3};
    coverages = [];
    frewind(fc);
    while feof(fc) == 0
        linec = fgetl(fc);
        scanc = textscan(linec, '%d%d%d%d');
        cchr = scanc{1};
        cx = scanc{2};
        cy = scanc{3};
        cov = scanc{4};
        if (cchr == tchr) && (cx >= tx) && (cy <= ty)
            coverages = cat(2, coverages, cov);
        end
    end
    covmed = median(coverages);
    fprintf(fw, '%d\t%d\t%d\t%d\n', tchr, tx, ty, covmed);
end
Any suggestions for an alternative using AWK, Perl, etc.? I would also be pleased if someone could teach me how to get rid of all the loops in my MATLAB script.
Thanks
Here is a Perl solution. I use hashes (aka dictionaries) to access the various ranges via the chromosome, thus reducing the number of loop iterations.
This is potentially efficient, as I don't do a full loop over regions.txt on every input line. Efficiency could perhaps be increased further when multithreading is used.
#!/usr/bin/perl
my ($rangefile) = @ARGV;
open my $rFH, '<', $rangefile or die "Can't open $rangefile";

# construct the ranges. The chromosome is used as range key.
my %ranges;
while (<$rFH>) {
    chomp;
    my @field = split /\s+/;
    push @{ $ranges{$field[0]} }, [@field[1, 2], 0, 0];
}
close $rFH;

# iterate over all the input
while (my $line = <STDIN>) {
    chomp $line;
    my ($chrom, $lower, $upper, $value) = split /\s+/, $line;
    # only loop over ranges with matching chromosome
    foreach my $range (@{ $ranges{$chrom} }) {
        if ($$range[0] <= $lower and $upper <= $$range[1]) {
            $$range[2]++;
            $$range[3] += $value;
            last; # break out of foreach early because ranges don't overlap
        }
    }
}

# create the report
foreach my $chrom (sort { $a <=> $b } keys %ranges) {
    foreach my $range (@{ $ranges{$chrom} }) {
        my $value = $$range[2] ? $$range[3] / $$range[2] : 0;
        printf "%d %d %d %.1f\n", $chrom, @$range[0, 1], $value;
    }
}
Example invocation:
$ perl script.pl regions.txt <coverage.txt >output.txt
Output on the example input:
1 100 200 6.7
1 400 600 0.0
2 600 700 12.5
(because (5+7+8)/3 = 6.66…)
Normally, I would load the files into R and calculate it, but given that one of them is so huge, this would become a problem. Here are some thoughts that might help you solve it.
Consider splitting coverage.txt by chromosomes. This would make the calculations less demanding.
Instead of looping over coverage.txt, first read regions.txt fully into memory (I assume it is much smaller). For each region, keep a running score and a count.
Process coverage.txt line by line. For each line, determine the chromosome and the region that this particular stretch belongs to. This will require some footwork, but if regions.txt is not too large, it might be more efficient. Add the score to the region's score and increment its count by one.
An alternative, most efficient way requires both files to be sorted first by chromosome, then by position.
1. Take a line from regions.txt. Record the chromosome and positions. If there is a line remaining from the previous loop, go to 3; otherwise go to 2.
2. Take a line from coverage.txt.
3. Check whether it is within the current region.
   - yes: add the score to the region, increment the count. Go to 2.
   - no: divide the score by the count, write the current region to the output, go to 1.
This last method requires some fine tuning, but will be most efficient: it goes through each file only once and stores almost nothing in memory. A sketch of this merge-style scan follows below.
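Here is a hedged Perl sketch of that merge-style scan. It assumes both files are sorted as described, that chromosome names are numeric, and that every coverage line falls into at most one region:

use strict;
use warnings;

open my $rfh, '<', 'regions.txt'  or die $!;
open my $cfh, '<', 'coverage.txt' or die $!;

my $cov = <$cfh>;   # read one coverage line ahead

while (my $reg = <$rfh>) {
    my ($rchr, $rstart, $rend) = split ' ', $reg;
    my ($sum, $n) = (0, 0);

    # consume coverage lines until we pass the end of this region
    while (defined $cov) {
        my ($cchr, $cstart, $cend, $score) = split ' ', $cov;
        # stop if the coverage line lies beyond the current region
        last if $cchr > $rchr or ($cchr == $rchr and $cstart >= $rend);
        if ($cchr == $rchr and $cstart >= $rstart and $cend <= $rend) {
            $sum += $score;
            $n++;
        }
        $cov = <$cfh>;   # this line is done; fetch the next one
    }
    printf "%s %s %s %.1f\n", $rchr, $rstart, $rend, $n ? $sum / $n : 0;
}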
Here's one way using join and awk. Run like:
join regions.txt coverage.txt | awk -f script.awk - regions.txt
Contents of script.awk:
FNR==NR && $4>=$2 && $5<=$3 {
    sum[$1 FS $2 FS $3] += $6
    cnt[$1 FS $2 FS $3]++
    next
}

{
    if ($1 FS $2 FS $3 in sum) {
        printf "%s %.1f\n", $0, sum[$1 FS $2 FS $3] / cnt[$1 FS $2 FS $3]
    }
    else if (NF == 3) {
        print $0 " 0"
    }
}
Results:
1 100 200 6.7
1 400 600 0
2 600 700 12.5
Alternatively, here's the one-liner:
join regions.txt coverage.txt | awk 'FNR==NR && $4>=$2 && $5<=$3 { sum[$1 FS $2 FS $3]+=$6; cnt[$1 FS $2 FS $3]++; next } { if ($1 FS $2 FS $3 in sum) printf "%s %.1f\n", $0, sum[$1 FS $2 FS $3]/cnt[$1 FS $2 FS $3]; else if (NF == 3) print $0 " 0" }' - regions.txt
Here is a simple MATLAB way to bin your coverage into regions:
% extract the regions extents
bins = regions(:,2:3)';
bins = bins(:);
% extract the coverage - only the start is needed
covs = coverage(:,2);
% use histc to place the coverage start into proper regions
% this line counts how many coverages there are in a region
% and assigns them proper region ids.
[h, i]= histc(covs(:), bins(:));
% sum the scores into correct regions (second output of histc gives this)
total = accumarray(i, coverage(:,4), [numel(bins),1]);
% average the score in regions (first output of histc is useful)
avg = total./h;
% remove every second entry - our regions are defined by start/end
avg = avg(1:2:end);
Now this works assuming that the regions are non-overlapping, but I guess that is the case. Also, every entry in the coverage file has to fall into some region.
Also, it is trivial to 'block' this approach over coverages, if you want to avoid reading in the whole file. You only need the bins, your regions file, which presumably is small. You can process the coverages in blocks, incrementally add to total and compute the average in the end.
I am relatively new to Perl, and I need to perform a fairly sophisticated matrix computation but don't know what data structures to use.
Not sure if this is the appropriate forum for this, but say you have the following matrix in a multi-dimensional array in Perl:
0.2 0.7 0.2
0.6 0.8 0.7
0.6 0.1 0.8
0.1 0.2 0.9
0.6 0.3 0.0
0.6 0.9 0.2
I am trying to identify column segments in this matrix corresponding to continuous values that are higher than a given threshold, e.g. 0.5.
For example, if we threshold this matrix, we have:
0 1 0
1 1 1
1 0 1
0 0 1
1 0 0
1 1 0
If we now focus on the first column:
0
1
1
0
1
1
we can see that there are two continuous segments:
0 1 1 0 1 1
The first track (sequence of ones) starts with index 1 and ends with index 2
The second track (sequence of ones) starts with index 4 and ends with index 5
I would like to detect all such tracks in the original matrix, but I don't know how to proceed or what Perl data structures are most appropriate for this.
Ideally I would like something easy to index, e.g. assuming that we use the variable tracks, I can store the indices for the first column (index 0) as follows:
# First column, first track
$tracks{0}{0}{'start'} = 1;
$tracks{0}{0}{'end'} = 2;
# First column, second track
$tracks{0}{1}{'start'} = 4;
$tracks{0}{1}{'end'} = 5;
# ...
What are good data structures and/or libraries I can use to approach this problem in Perl?
I am just giving the algorithmic answer, and you can code it in whatever language you like.
Split the problem into subproblems:
Thresholding: depending on how you store your input, this can be as simple as an iteration over an n-dimensional matrix, or a tree/list traversal if your matrices are sparse. This is the easy bit.
The algorithm for finding continuous segments is called run-length encoding (RLE). It takes a sequence with possible duplicates, like
1 0 0 1 1 1 1 0 1, and returns another sequence which tells you which element comes next and how many of them there are. So the above sequence would become 1 1 0 2 1 4 0 1 1 1. The encoding is unique, so if you ever want to invert it you are OK.
The first 1 is there because the original input starts with 1, and the second 1 says there is just one of it. The following 0 is there because after the 1 comes a 0, and the fourth number is 2 because there are two consecutive zeros. There are zillions of RLE encoders out there if you don't want to write your own.
Its main purpose is compression, and it works reasonably well for that purpose if you have long runs of identical items. Depending on your needs, you may have to run it horizontally, vertically, and even diagonally.
You will find the precise algorithm in all the classical books on data structures and algorithms. I'd suggest Cormen, Leiserson, Rivest, Stein: 'Introduction to Algorithms' first, then Knuth.
Once you get the gist, you can safely 'fuse' the thresholding with the RLE to avoid iterating twice over your inputs, as sketched below.
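As an illustration of that fusion, a small Perl sketch that run-length encodes one column while thresholding on the fly (the 0.5 threshold and the first column of the example matrix are taken from the question):

use strict;
use warnings;

my $threshold = 0.5;
my @column = (0.2, 0.6, 0.6, 0.1, 0.6, 0.6);   # first column of the example

# fused threshold + RLE: collect [value, run_length] pairs
my @rle;
for my $x (@column) {
    my $bit = $x > $threshold ? 1 : 0;
    if (@rle and $rle[-1][0] == $bit) {
        $rle[-1][1]++;              # extend the current run
    } else {
        push @rle, [$bit, 1];       # start a new run
    }
}

print "@$_\n" for @rle;   # prints: 0 1, 1 2, 0 1, 1 2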
This seems to do what you want. I have represented the data in the form you suggested, as the ideal form depends entirely on what you want to do with the result.
It works by calculating the list of 0s and 1s for each column, adding barrier values of zero at each end (one in $prev and one in the for list), and then scanning the list for changes between 1 and 0.
Every time a change is found, a track start or end is recorded. If $start is undefined then the current index is recorded as the start of a segment; otherwise the current segment ended at one less than the current index. A hash is built with start and end keys and pushed onto the @segments array.
The final set of nested loops dumps the calculated data in the form you show in the question.
use strict;
use warnings;

use constant THRESHOLD => 0.5;

my @data = (
    [ qw/ 0.2 0.7 0.2 / ],
    [ qw/ 0.6 0.8 0.7 / ],
    [ qw/ 0.6 0.1 0.8 / ],
    [ qw/ 0.1 0.2 0.9 / ],
    [ qw/ 0.6 0.3 0.0 / ],
    [ qw/ 0.6 0.9 0.2 / ],
);

my @tracks;

for my $colno (0 .. $#{ $data[0] }) {

    my @segments;
    my $start;
    my $prev = 0;
    my $i = 0;

    for my $val ( (map { $_->[$colno] > THRESHOLD ? 1 : 0 } @data), 0 ) {
        next if $val == $prev;
        if (defined $start) {
            push @segments, { start => $start, end => $i - 1 };
            undef $start;
        }
        else {
            $start = $i;
        }
    }
    continue {
        $prev = $val;
        $i++;
    }

    push @tracks, \@segments;
}

# Dump the derived @tracks data
#
for my $colno (0 .. $#tracks) {
    my $col = $tracks[$colno];
    for my $track (0 .. $#$col) {
        my $data = $col->[$track];
        printf "\$tracks[%d][%d]{start} = %d\n", $colno, $track, $data->{start};
        printf "\$tracks[%d][%d]{end} = %d\n", $colno, $track, $data->{end};
    }
    print "\n";
}
output
$tracks[0][0]{start} = 1
$tracks[0][0]{end} = 2
$tracks[0][1]{start} = 4
$tracks[0][1]{end} = 5
$tracks[1][0]{start} = 0
$tracks[1][0]{end} = 1
$tracks[1][1]{start} = 5
$tracks[1][1]{end} = 5
$tracks[2][0]{start} = 1
$tracks[2][0]{end} = 3
Lamenting Perl's poor support for multidimensional arrays, I soon found myself throwing together a small solution of my own. The algorithm is rather similar to Borodin's idea, but with a slightly different structure:
sub tracks {
    my ($data) = @_;   # this sub takes a callback as argument
    my @tracks;        # holds all found ranges
    my @state;         # true if we are inside a range/track; also holds
                       # the starting index of the current range
    my $rowNo = 0;     # current row number
    while (my @row = $data->()) {   # fetch new data
        for my $i (0 .. $#row) {
            if (not $state[$i] and $row[$i]) {
                # a new track is found
                $state[$i] = $rowNo + 1;   # store $rowNo+1 to guarantee a true value
            } elsif ($state[$i] and not $row[$i]) {
                # push a finished track into @tracks, reverting the +1 adjustment
                push @{ $tracks[$i] }, [$state[$i] - 1, $rowNo - 1];
                $state[$i] = 0;   # reset state to false
            }
        }
    } continue { $rowNo++ }
    # flush remaining tracks
    for my $i (0 .. $#state) {
        push @{ $tracks[$i] }, [$state[$i] - 1, $rowNo - 1] if $state[$i];
    }
    return @tracks;
}
@state doubles as a flag indicating whether we are inside a track and as a record of the track's starting index. In the @state and @tracks arrays, the index indicates the current column.
As a data source, I used an external file, but this can be easily plugged into anything, e.g. a preexisting array. The only contract is that it must return an arbitrary sequence of true and false values and the empty list when no further data is available.
my $limit = 0.5;
my $data_source = sub {
    defined(my $line = <>) or return ();   # return empty list when no data is left
    chomp $line;
    # split the line and map the data to true and false values
    return map { $_ >= $limit ? $_ : 0 } split /\s+/, $line;
};
With the data you gave copy-pasted as input, I get the following printout as output (printing code omitted):
[ [1 2], [4 5] ]
[ [0 1], [5 5] ]
[ [1 3] ]
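For reference, a printing routine along these lines would produce that bracketed output; this is a reconstruction, not the author's omitted code:

my @tracks = tracks($data_source);
for my $col (@tracks) {
    print '[ ', join(', ', map { "[$_->[0] $_->[1]]" } @$col), " ]\n";
}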
With your structure, this would be
$tracks[0][0][0] = 1;
$tracks[0][0][1] = 2;
$tracks[0][1][0] = 4;
...;
If this is modified to a hash, further data like the original value could be incorporated.
I am trying to convert AD maxpwdAge (a 64-bit integer) into a number of days.
According to Microsoft:
Uses the IADs interface's Get method to retrieve the value of the domain's maxPwdAge attribute (line 5).
Notice we use the Set keyword in VBScript to initialize the variable named objMaxPwdAge—the variable used to store the value returned by Get. Why is that?
When you fetch a 64-bit large integer, ADSI does not return one giant scalar value. Instead, ADSI automatically returns an IADsLargeInteger object. You use the IADsLargeInteger interface's HighPart and LowPart properties to calculate the large integer's value. As you may have guessed, HighPart gets the high order 32 bits, and LowPart gets the low order 32 bits. You use the following formula to convert HighPart and LowPart to the large integer's value.
The existing code in VBScript from the same page:
Const ONE_HUNDRED_NANOSECOND = .000000100   ' .000000100 is equal to 10^-7
Const SECONDS_IN_DAY = 86400

Set objDomain = GetObject("LDAP://DC=fabrikam,DC=com")   ' LINE 4
Set objMaxPwdAge = objDomain.Get("maxPwdAge")            ' LINE 5

If objMaxPwdAge.LowPart = 0 Then
    WScript.Echo "The Maximum Password Age is set to 0 in the " & _
                 "domain. Therefore, the password does not expire."
    WScript.Quit
Else
    dblMaxPwdNano = Abs(objMaxPwdAge.HighPart * 2^32 + objMaxPwdAge.LowPart)
    dblMaxPwdSecs = dblMaxPwdNano * ONE_HUNDRED_NANOSECOND   ' LINE 13
    dblMaxPwdDays = Int(dblMaxPwdSecs / SECONDS_IN_DAY)      ' LINE 14
    WScript.Echo "Maximum password age: " & dblMaxPwdDays & " days"
End If
How can I do this in Perl?
Endianness may come into this, but you may be able to say
#!/usr/bin/perl
use strict;
use warnings;
my $num = -37_108_517_437_440;
my $binary = sprintf "%064b", $num;
my ($high, $low) = $binary =~ /(.{32})(.{32})/;
$high = oct "0b$high";
$low = oct "0b$low";
my $together = unpack "q", pack "LL", $low, $high;
print "num $num, low $low, high $high, together $together\n";
Am I missing something? As far as I can tell from your question, your problem has nothing at all to do with 2's complement; all you need/want to do is:
use Math::BigInt;

use constant MAXPWDAGE_UNIT_PER_SEC => (
    1000     # milliseconds
    * 1000   # microseconds
    * 10     # 100 nanoseconds
);

use constant SECS_PER_DAY => (
    24       # hours
    * 60     # minutes
    * 60     # seconds
);

my $maxpwdage_full = (Math::BigInt->new($maxpwdage_highpart) << 32) + $maxpwdage_lowpart;
my $days = $maxpwdage_full / MAXPWDAGE_UNIT_PER_SEC / SECS_PER_DAY;
Note that I deliberately use 2 separate constants, and I divide by them in sequence, because that keeps the divisors smaller than the range of a 32-bit integer. If you want to write this another way and you want it to work correctly on 32-bit perls, you’ll have to keep all the precision issues in mind.
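To make that concrete, here is a self-contained sketch using the sample value from the previous answer, -37,108,517,437,440, which happens to split exactly into HighPart = -8640 and LowPart = 0 (these parts are derived by hand for the example, not fetched from AD):

use strict;
use warnings;
use Math::BigInt;

use constant MAXPWDAGE_UNIT_PER_SEC => 1000 * 1000 * 10;   # 100ns units per second
use constant SECS_PER_DAY           => 24 * 60 * 60;

my $maxpwdage_highpart = -8640;   # sample values; normally read via LDAP
my $maxpwdage_lowpart  = 0;

my $maxpwdage_full =
    (Math::BigInt->new($maxpwdage_highpart) << 32) + $maxpwdage_lowpart;

# maxPwdAge is stored as a negative interval, so take the absolute value
my $days = $maxpwdage_full->babs / MAXPWDAGE_UNIT_PER_SEC / SECS_PER_DAY;
print "Maximum password age: $days days\n";   # prints 42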