Converting text MB/GB to value - perl

I have a variable which contains a file size:
my $tx = "41.4 MB";
or
my $tx = "34.4 GB";
How do I go about converting this to a KB value. So if tx contains MB then * 1024, and if tx contains GB then * 1024 * 1024?

You need to separate out and test the units.
use strict;
use warnings;
sub size_to_kb {
my $size = shift;
my ($num, $units) = split ' ', $size;
if ($units eq 'MB') {
$num *= 1024;
} elsif ($units eq 'GB') {
$num *= 1024 ** 2;
} elsif ($units ne 'KB') {
die "Unrecognized units: $units"
}
return "$num KB";
}
print size_to_kb("41.4 MB"), "\n";
print size_to_kb("34.4 GB"), "\n";
Outputs:
42393.6 KB
36071014.4 KB
< / hand holding >

Related

Strawberry Perl: "out of memory!"

trying to run the following code:
$combs = combinations(\#set,$k);
while (my $c = $combs->next)
{
$nrc=1;
}
Gives me "out of memory!" when I hit Ctrl+C (because its taking too long and it should not) if I pass a set from, for example, (0..450) and numbers to combine ($k) of 6. This issue does not occur with, lets say, a set of 0..45 and $k=6.
Note that the while loop seems to do nothing, in the original script it printed out the combination and incremented a counter that will hold the total number of combinations. But since I was not sure what the problem was, I decided to eliminate that.
I've read the Algorithm:Combinatorics on CPAN and it states that memory usage is minimal, so I don't know what's happening.
I am using Strawberry Perl 32bit on a Windows 10 machine.
Thanks.
--------------------- COMPLETE CODE
#!/usr/bin/perl
use List::MoreUtils "uniq";
use Algorithm::Combinatorics "combinations";
my $argc = $#ARGV+1;
my #set;
if ($argc == 0)
{
print STDERR "Valor minimo de rango: "; # range min
my $minrange = int <STDIN>;
print STDERR "Valor maximo de rango: "; #range max
my $maxrange = int <STDIN>;
#set = uniq sort { $a <=> $b }($minrange...$maxrange);
}
elsif ($argc == 1)
{
open(SETFROMFILE,"<$ARGV[0]") or die "No se puedo abrir el fichero, $!";
chomp(#set = <SETFROMFILE>);
close(SETFROMFILE);
#set = uniq sort { $a <=> $b } #set;
}
else
{
print STDERR "Uso: $0 [file]\n";
exit;
}
my $nrc = 0;
print STDERR "\n";
print STDERR "Numeros a combinar: "; # get subset
my $k = <STDIN>;
if ($k == 0) { exit; }
$combs = combinations(\#set,$k);
print STDERR "\n";
while (my $c = $combs->next)
{
print join(";",#$c) . "\n";
$nrc++;
}
print STDERR "\n";
print STDERR "numero total de combinaciones: $nrc\n";
It works for me.
use strict;
use warnings;
use Algorithm::Combinatorics qw( combinations );
sub show_mem { system('ps', '--no-heading', '-o', 'rss', $$); }
my #set = (0..450);
my $k = 6;
my $count = 0;
#show_mem();
my $combs = combinations(\#set, $k);
#show_mem();
while (my $c = $combs->next) {
++$count;
if (($count % 100_000) == 0) {
print("$count\n");
#show_mem();
}
}
Output:
784
784
100000
776
200000
784
300000
788
400000
776
500000
780
600000
784
700000
768
800000
784
900000
784
1000000
776
...
Of course, it will take forever to go through all C(451, 6) = 11,303,769,578,640 combinations! (We're talking about 251 days on my machine[1].)
(Note that 11,303,769,578,640 is too large for a 32-bit integer. Fortunately, Perl will switching to using a double-precision floating-point number, and those are large enough to hold that all numbers up to and including that one.)
By the way, if you just need the number of combinations, you can use
my $count = 1; $count *= ( #set - $_ + 1 ) / $_ for 1..$k;
How I timed it:
use Algorithm::Combinatorics qw( combinations );
use Time::HiRes qw( time );
my #set = (0..450);
my $k = 6;
my $count = 0;
my $combs = combinations(\#set, $k);
my $s = time;
while (my $c = $combs->next) {
++$count;
last if $count == 1_000_000;
}
my $e = time;
print($e-$s, "\n");
There are 11.1 trillion combinations of six items out of 450. I'm not surprised it ran out of memory!

How can I normalize my results in perl with foreach control structure?

I have this output:
10dvex2_miRNA_ce.out.data|6361
10dvex2_miRNA_ce.out.data|6361
10dvex2_misc_RNA_ce.out.data|0
10dvex2_rRNA_ce.out.data|239
with this script in Perl:
#!/usr/bin/perl
use warnings;
use strict;
open(MYINPUTFILE, $ARGV[0]); # open for input
my #lines = <MYINPUTFILE>; # read file into list
my $count = 0;
print "Frag"."\t"."ncRNA"."\t"."Amount"."\n";
foreach my $lines (#lines){
my $pattern = $lines;
$pattern =~ s/(.*)dvex\d_(.*)_(.*).(out.data)\|(.*)/$1 $2 $3 $5/g;
$count += $5;
print $1."\t".$2.$3."\t".$5."\n";
}
close(MYINPUTFILE);
exit;
I extract this information:
Frag ncRNA Amount
10 miRNAce 6361
10 misc_RNAce 0
10 rRNAce 239
but in the Amount column I want to report those numbers divided by the total result (6600). In this case I want this output:
Frag ncRNA Amount
10 miRNAce 0.964
10 misc_RNAce 0
10 rRNAce 0.036
My problem is extract the TOTAL result in the loop...to normalize this data. Some ideas?
Perhaps the following will be helpful:
use strict;
use warnings;
my ( %hash, $total, %seen, #array );
while (<>) {
next if $seen{$_}++;
/(\d+).+?_([^.]+).+\|(\d+)$/;
$hash{$1}{$2} = $3;
$total += $3;
}
print "Frag\tncRNA\tAmount\n";
while ( my ( $key1, $val1 ) = each %hash ) {
while ( my ( $key2, $val2 ) = each %$val1 ) {
my $frac = $val2 / $total == 0 ? 0 : sprintf( '%.3f', $val2 / $total );
push #array, "$key1\t$key2\t$frac\n";
}
}
print map { $_->[0] }
sort { $b->[1] <=> $a->[1] }
map { [ $_, (split)[2] ] }
#array;
Output from your data set:
Frag ncRNA Amount
10 miRNA_ce 0.964
10 rRNA_ce 0.036
10 misc_RNA_ce 0
Identical lines are skipped, and then the required elements are captured from each line. A running total is kept for the subsequent calculation. Your desired output showed sorting from high to low, which is why each record is pushed onto #array. However, if sorting isn't necessary, you can just print that line and omit the Schwartzian transform on #array.
Hope this helps!
To do this you will need two passes over the data.
#! /usr/bin/env perl
use warnings;
use strict;
print join("\t",qw'Frag ncRNA Amount'),"\n";
my #data;
my $total = 0;
# parse the lines
while( <> ){
my #elem = /(.+?)(?>dvex)\d_(.+)_([^._]+)[.]out[.]data[|](d+)/;
next unless #elem;
# running total
$total += $elem[-1];
# combine $2 and $3
splice #elem, 1, 2, $2.$3; # $elem[1].$elem[2];
push #data, \#elem;
}
# print them
for( #data ){
my #copy = #$_;
$copy[-1] = $copy[-1] / $total;
$copy[-1] = sprintf('%.3f', $copy[-1]) if $copy[-1];
print join("\t",#copy),"\n";
}

perl way of getting a value expressed with the memory unit

I'm searching a way to reduce the following piece of code to a single regexp statement:
if( $current_value =~ /(\d+)(MB)*/ ){
$current_value = $1 * 1024 * 1024;
}
elsif( $current_value =~ /(\d+)(GB)*/ ){
$current_value = $1 * 1024 * 1024 * 1024;
}
elsif( $current_value =~ /(\d+)(KB)*/ ){
$current_value = $1 * 1024;
}
The code performs an evaluation of the value that can be expressed as a single number (bytes), a number and KB (kilobytes), with megabytes (MB) and so on. Any idea on how to reduce the block code?
Number::Format
use warnings;
use strict;
use Number::Format qw(format_bytes);
print format_bytes(1024), "\n";
print format_bytes(2535116549), "\n";
__END__
1K
2.36G
You could set up a hash like this:
my %FACTORS = ( 'KB' => 1024, 'MB' => 1024**2, 'GB' => 1024**3 );
And then parse the text like this:
if ( $current_value =~ /(\d+)(KB|MB|GB)/ ) {
$current_value = $1 * $FACTORS{$2};
}
In your example the regex has a * which I'm not sure you intend, because * means "zero or more" and so (+\d)(MB)* would match 10 or 10MB or 10MBMB or 10MBMBMBMBMBMBMB.
Using benzado's modified code, here is a test you can run to see if it works.
We advise you to always put code like this in a reusable method, and write a small unit-test for it:
use Test::More;
plan tests => 4;
##
# Convert a string denoting '50MB' into an amount in bytes.
my %FACTORS = ( 'KB' => 1024, 'MB' => 1024*1024, 'GB' => 1024*1024*1024 );
sub string_to_bytes {
my $current_value = shift;
if ( $current_value =~ /(\d+)(KB|MB|GB)/ ) {
$current_value = $1 * $FACTORS{$2};
}
return $current_value;
}
my $tests = {
'50' => 50,
'52KB' => 52*1024,
'55MB' => 55*1024*1024,
'57GB' => 57*1024*1024*1024
};
foreach(keys %$tests) {
is( string_to_bytes($_),$tests->{$_},
"Testing if $_ becomes $tests->{$_}");
}
Running this gives:
$ perl testz.pl
1..4
ok 1 - Testing if 55MB becomes 57671680
ok 2 - Testing if 50 becomes 50
ok 3 - Testing if 52KB becomes 53248
ok 4 - Testing if 57GB becomes 61203283968
Now you can
Add more testcases (what happens with BIG numbers? What do you want to happen? What for undef, for strings, when kB is written with small k, when you encounter kibiB or kiB or Kb?)
Turn this into a module
Write documentation in POD
Upload the Module to CPAN
And voilá!
You can do it in one regexp, by putting code snippits inside the regexp to handle the three cases differently
my $r;
$current_value =~ s/
(\d+)(?:
Ki (?{ $r = $^N * 1024 })
| Mi (?{ $r = $^N * 1024 * 1024 })
| Gi (?{ $r = $^N * 1024 * 1024 * 1024 })
)/$r/xso;
There is a problem with using KB for 1024 bytes. Kilo as a prefix generally means 1000 of a thing not 1024.
The problem gets even worse with MB since it has meant 1000*1000, 1024*1024, and 1000*1024.
A 1.44 MB floppy actually holds 1.44 * 1000 * 1024.
The only real way out of this is to use the new KiB (Kibibyte) to mean 1024 bytes.
The way you implemented it also has the limitation that you can't use 8.4Gi to mean 8.4 * 1024 * 1024. To remove that limitation I used $RE{num}{real} from Regexp::Common instead of \d+.
Some of the other answers hardwire the match by writing out all of the possible matches. That can get very tedious, not to mention error prone. To get around that I used the keys of %multiplier to generate the regex. This means that if you add or remove elements from %multiplier you won't have to modify the regex by hand.
use strict;
use warnings;
use Regexp::Common;
my %multiplier;
my $multiplier_match;
{
# populate %multiplier
my %exponent = (
K => 1, # Kilo Kibi
M => 2, # Mega Mebi
G => 3, # Giga Gibi
T => 4, # Tera Tebi
P => 5, # Peta Pebi
E => 6, # Exa Exbi
Z => 7, # Zetta Zebi
Y => 8, # Yotta Yobi
);
while( my ($str,$exp) = each %exponent ){
#multiplier{ $str, "${str}B" } = (1000 ** $exp) x2; # K KB
#multiplier{ "${str}i", "${str}iB" } = (1024 ** $exp) x2; # Ki KiB
}
# %multiplier now holds 32 pairs (8*4)
# build $multiplier_match
local $" #" # fix broken highlighting
= '|';
my #keys = keys %multiplier;
$multiplier_match = qr(#keys);
}
sub remove_multiplier{
die unless #_ == 1;
local ($_) = #_;
# s/^($RE{num}{real})($multiplier_match)$/ $1 * $multiplier{$2} /e;
if( /^($RE{num}{real})($multiplier_match)$/ ){
return $1 * $multiplier{$2};
}
return $_;
}
If you absolutely need 1K to mean 1024 then you only need to change one line.
# #multiplier{ $str, "${str}B" } = (1000 ** $exp) x2; # K KB
#multiplier{ $str, "${str}B" } = (1024 ** $exp) x2; # K KB
Note that since I used $RE{num}{real} from Regexp::Common it will also work with 5.3e1Ki.

Help me finish the last part of my app? It solves any Countdown Numbers game on Channel 4 by brute forcing every possibly equation

For those not familiar with the game. You're given 8 numbers and you have to reach the target by using +, -, / and *.
So if the target is 254 and your game numbers are 2, 50, 5, 2, 1, you would answer the question correctly by saying 5 * 50 = 250. Then 2+2 is four. Add that on aswell to get 254.
Some videos of the game are here:
Video 1
video 2
Basically I brute force the game using by generating all perms of all sizes for the numbers and all perms of the symbols and use a basic inflix calculator to calculate the solution.
However it contains a flaw because all the solutions are solved as following: ((((1+1)*2)*3)*4). It doesn't permutate the brackets and it's causing my a headache.
Therefore I cannot solve every equation. For example, given
A target of 16 and the numbers 1,1,1,1,1,1,1,1 it fails when it should do (1+1+1+1)*(1+1+1+1)=16.
I'd love it in someone could help finish this...in any language.
This is what I've written so far:
#!/usr/bin/env perl
use strict;
use warnings;
use Algorithm::Permute;
# GAME PARAMETERS TO FILL IN
my $target = 751;
my #numbers = ( '2', '4', '7', '9', '1', '6', '50', '25' );
my $num_numbers = scalar(#numbers);
my #symbols = ();
foreach my $n (#numbers) {
push(#symbols, ('+', '-', '/', '*'));
}
my $num_symbols = scalar(#symbols);
print "Symbol table: " . join(", ", #symbols);
my $lst = [];
my $symb_lst = [];
my $perms = '';
my #perm = ();
my $symb_perms = '';
my #symb_perm;
my $print_mark = 0;
my $progress = 0;
my $total_perms = 0;
my #closest_numbers;
my #closest_symb;
my $distance = 999999;
sub calculate {
my #oprms = #{ $_[0] };
my #ooperators = #{ $_[1] };
my #prms = #oprms;
my #operators = #ooperators;
#print "PERMS: " . join(", ", #prms) . ", OPERATORS: " . join(", ", #operators);
my $total = pop(#prms);
foreach my $operator (#operators) {
my $x = pop(#prms);
if ($operator eq '+') {
$total += $x;
}
if ($operator eq '-') {
$total -= $x;
}
if ($operator eq '*') {
$total *= $x;
}
if ($operator eq '/') {
$total /= $x;
}
}
#print "Total: $total\n";
if ($total == $target) {
#print "ABLE TO ACCURATELY SOLVE WITH THIS ALGORITHM:\n";
#print "PERMS: " . join(", ", #oprms) . ", OPERATORS: " . join(", ", #ooperators) . ", TOTAL=$total\n";
sum_print(\#oprms, \#ooperators, $total, 0);
exit(0);
}
my $own_distance = ($target - $total);
if ($own_distance < 0) {
$own_distance *= -1;
}
if ($own_distance < $distance) {
#print "found a new solution - only $own_distance from target $target\n";
#print "PERMS: " . join(", ", #oprms) . ", OPERATORS: " . join(", ", #ooperators) . ", TOTAL=$total\n";
sum_print(\#oprms, \#ooperators, $total, $own_distance);
#closest_numbers = #oprms;
#closest_symb = #ooperators;
$distance = $own_distance;
}
$progress++;
if (($progress % $print_mark) == 0) {
print "Tested $progress permutations. " . (($progress / $total_perms) * 100) . "%\n";
}
}
sub factorial {
my $f = shift;
$f == 0 ? 1 : $f*factorial($f-1);
}
sub sum_print {
my #prms = #{ $_[0] };
my #operators = #{ $_[1] };
my $total = $_[2];
my $distance = $_[3];
my $tmp = '';
my $op_len = scalar(#operators);
print "BEST SOLUTION SO FAR: ";
for (my $x = 0; $x < $op_len; $x++) {
print "(";
}
$tmp = pop(#prms);
print "$tmp";
foreach my $operator (#operators) {
$tmp = pop(#prms);
print " $operator $tmp)";
}
if ($distance == 0) {
print " = $total\n";
}
else {
print " = $total (distance from target $target is $distance)\n";
}
}
# look for straight match
foreach my $number (#numbers) {
if ($number == $target) {
print "matched!\n";
}
}
for (my $x = 1; $x < (($num_numbers*2)-1); $x++) {
$total_perms += factorial($x);
}
print "Total number of permutations: $total_perms\n";
$print_mark = $total_perms / 100;
if ($print_mark == 0) {
$print_mark = $total_perms;
}
for (my $num_size=2; $num_size <= $num_numbers; $num_size++) {
$lst = \#numbers;
$perms = new Algorithm::Permute($lst, $num_size);
print "Perms of size: $num_size.\n";
# print matching symb permutations
$symb_lst = \#symbols;
$symb_perms = new Algorithm::Permute($symb_lst, $num_size-1);
while (#perm = $perms->next) {
while (#symb_perm = $symb_perms->next) {
calculate(\#perm, \#symb_perm);
}
$symb_perms = new Algorithm::Permute($symb_lst, $num_size-1);
}
}
print "exhausted solutions";
print "CLOSEST I CAN GET: $distance\n";
sum_print(\#closest_numbers, \#closest_symb, $target-$distance, $distance);
exit(0);
Here is the example output:
[15:53: /mnt/mydocuments/git_working_dir/countdown_solver$] perl countdown_solver.pl
Symbol table: +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *, +, -, /, *Total number of permutations: 93928268313
Perms of size: 2.
BEST SOLUTION SO FAR: (2 + 4) = 6 (distance from target 751 is 745)
BEST SOLUTION SO FAR: (2 * 4) = 8 (distance from target 751 is 743)
BEST SOLUTION SO FAR: (4 + 7) = 11 (distance from target 751 is 740)
BEST SOLUTION SO FAR: (4 * 7) = 28 (distance from target 751 is 723)
BEST SOLUTION SO FAR: (4 * 9) = 36 (distance from target 751 is 715)
BEST SOLUTION SO FAR: (7 * 9) = 63 (distance from target 751 is 688)
BEST SOLUTION SO FAR: (4 * 50) = 200 (distance from target 751 is 551)
BEST SOLUTION SO FAR: (7 * 50) = 350 (distance from target 751 is 401)
BEST SOLUTION SO FAR: (9 * 50) = 450 (distance from target 751 is 301)
Perms of size: 3.
BEST SOLUTION SO FAR: ((4 + 7) * 50) = 550 (distance from target 751 is 201)
BEST SOLUTION SO FAR: ((2 * 7) * 50) = 700 (distance from target 751 is 51)
BEST SOLUTION SO FAR: ((7 + 9) * 50) = 800 (distance from target 751 is 49)
BEST SOLUTION SO FAR: ((9 + 6) * 50) = 750 (distance from target 751 is 1)
Perms of size: 4.
BEST SOLUTION SO FAR: (((9 + 6) * 50) + 1) = 751
Here is Java applet (source) and Javascript version.
The suggestion to use reverse polish notation is excellent.
If you have N=5 numbers, the template is
{num} {num} {ops} {num} {ops} {num} {ops} {num} {ops}
There can be zero to N ops in any spot, although the total number will be N-1. You just have to try different placements of numbers and ops.
The (((1+1)+1)+1)*(((1+1)+1)+1)=16 solution will be found when you try
1 1 + 1 + 1 + 1 1 + 1 + 1 + *
Update: Maybe not so good, since finding the above could take 433,701,273,600 tries. The number was obtained using the following:
use strict;
use warnings;
{
my %cache = ( 0 => 1 );
sub fact { my ($n) = #_; $cache{$n} ||= fact($n-1) * $n }
}
{
my %cache;
sub C {
my ($n,$r) = #_;
return $cache{"$n,$r"} ||= do {
my $i = $n;
my $j = $n-$r;
my $c = 1;
$c *= $i--/$j-- while $j;
$c
};
}
}
my #nums = (1,1,1,1,1,1,1,1);
my $Nn = 0+#nums; # Number of numbers.
my $No = $Nn-1; # Number of operators.
my $max_tries = do {
my $num_orderings = fact($Nn);
{
my %counts;
++$counts{$_} for #nums;
$num_orderings /= fact($_) for values(%counts);
}
my $op_orderings = 4 ** $No;
my $op_placements = 1;
$op_placements *= C($No, $_) for 1..$No-1;
$num_orderings * $op_orderings * $op_placements
};
printf "At most %.f tries needed\n", $max_tries;

How do I check for numeric overflow and underflow conditions in Perl?

I am doing one multiplication operation between two float variables. After that i need to check for numeric overflow, underflow and divide by zero errors if any.
How I can do this?
Here's a way to check for overflow (which is actually just floating point +Infinity and -Infinity):
#!perl -w
use strict;
my $x = 10 ** 200;
my $positive_overflow = $x * $x;
my $negative_overflow = -$x * $x;
print is_infinity($positive_overflow) ? 'true' : 'false';
print "\n";
print is_infinity($negative_overflow) ? 'true' : 'false';
print "\n";
sub is_infinity
{
my $x = shift;
return $x =~ /inf/i;
}
Division by zero is tricky because you can't actually perform the division in normal program scope without having it die on you. You can wrap it in eval though:
#!perl -w
use strict;
my $x = 100;
my $y = 0;
my $q = try_divide($x, $y);
print "Might be division by zero...\n" if !defined $q;
$y = 10;
$q = try_divide($x, $y);
print "$q\n";
sub try_divide
{
my $x = shift;
my $y = shift;
my $q;
eval { $q = $x / $y };
return $q;
}
If your dividend is non-zero and your quotient (result) is zero, you've had an underflow.
If your result is non-zero, underflow can be checked by finding the number closest to 1 and multiplying your non-result by it and seeing if it changes; if it was a subnormal result, it will remain unchanged, since it will lack the full range of precision a normal result would have.
my $underflow_checker;
for ( my $i = 1; 1 + $i > 1; $i /= 2 ) { $underflow_checker = 1 + $i }
...
$x = 2**-520;
$y = 2**520;
$result = $x / $y;
if ( $result == 0 && $x != 0 || $result != 0 && $result * $underflow_checker == $result ) { print "Underflow!\n" }