I'm looking for a way to reduce the following piece of code to a single regexp statement:
if( $current_value =~ /(\d+)(MB)*/ ){
    $current_value = $1 * 1024 * 1024;
}
elsif( $current_value =~ /(\d+)(GB)*/ ){
    $current_value = $1 * 1024 * 1024 * 1024;
}
elsif( $current_value =~ /(\d+)(KB)*/ ){
    $current_value = $1 * 1024;
}
The code evaluates a value that can be expressed as a plain number (bytes), or as a number followed by KB (kilobytes), MB (megabytes), and so on. Any ideas on how to reduce this block of code?
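Number::Format (note that its format_bytes function goes in the opposite direction, turning a byte count into a human-readable string):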
use warnings;
use strict;
use Number::Format qw(format_bytes);
print format_bytes(1024), "\n";
print format_bytes(2535116549), "\n";
__END__
1K
2.36G
You could set up a hash like this:
my %FACTORS = ( 'KB' => 1024, 'MB' => 1024**2, 'GB' => 1024**3 );
And then parse the text like this:
if ( $current_value =~ /(\d+)(KB|MB|GB)/ ) {
    $current_value = $1 * $FACTORS{$2};
}
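In your example the regex has a * which I'm not sure you intend: * means "zero or more", so (\d+)(MB)* would match 10 or 10MB or 10MBMB or 10MBMBMBMBMBMBMB. Worse, because (MB)* may match zero times, the first branch matches any string that contains a digit, so 5GB is treated as 5 MB and the elsif branches are never reached.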
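Putting the hash and a corrected quantifier together, a minimal sketch might look like this. It anchors the pattern with ^ and $ and makes the unit optional with ? instead of *, so a bare number comes back unchanged and 5GB can no longer be read as 5 MB. The sub name to_bytes and the decision to die on unparseable input are my own assumptions, not part of the answer above.

```perl
use strict;
use warnings;

# Binary (1024-based) multipliers, as in the question.
my %FACTORS = ( KB => 1024, MB => 1024**2, GB => 1024**3 );

# Hypothetical helper: convert "50", "52KB", "55 MB", ... to a byte count.
sub to_bytes {
    my ($value) = @_;

    # ^...$ anchors the whole string; (KB|MB|GB)? allows at most one unit.
    if ( $value =~ /^(\d+)\s*(KB|MB|GB)?$/ ) {
        return defined $2 ? $1 * $FACTORS{$2} : $1;
    }
    die "Cannot parse size '$value'\n";
}

print "$_ => ", to_bytes($_), "\n" for qw(50 52KB 55MB 57GB);
```

Strings such as 10MBMB, which the original pattern quietly accepted, now make to_bytes die instead.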
Using benzado's modified code, here is a test you can run to see if it works.
I'd advise always putting code like this in a reusable subroutine and writing a small unit test for it:
use Test::More;
plan tests => 4;
##
# Convert a string denoting '50MB' into an amount in bytes.
my %FACTORS = ( 'KB' => 1024, 'MB' => 1024*1024, 'GB' => 1024*1024*1024 );
sub string_to_bytes {
    my $current_value = shift;
    if ( $current_value =~ /(\d+)(KB|MB|GB)/ ) {
        $current_value = $1 * $FACTORS{$2};
    }
    return $current_value;
}
my $tests = {
    '50'   => 50,
    '52KB' => 52*1024,
    '55MB' => 55*1024*1024,
    '57GB' => 57*1024*1024*1024
};
foreach (keys %$tests) {
    is( string_to_bytes($_), $tests->{$_},
        "Testing if $_ becomes $tests->{$_}");
}
Running this gives:
$ perl testz.pl
1..4
ok 1 - Testing if 55MB becomes 57671680
ok 2 - Testing if 50 becomes 50
ok 3 - Testing if 52KB becomes 53248
ok 4 - Testing if 57GB becomes 61203283968
Now you can
Add more test cases (what happens with big numbers? What do you want to happen? What about undef, plain strings, kB written with a lowercase k, or kibiB, kiB or Kb?)
Turn this into a module
Write documentation in POD
Upload the Module to CPAN
And voilà!
You can do it in one regexp by putting code snippets inside the regexp to handle the three cases differently (this version uses the Ki, Mi and Gi suffixes):
my $r;
$current_value =~ s/
(\d+)(?:
Ki (?{ $r = $^N * 1024 })
| Mi (?{ $r = $^N * 1024 * 1024 })
| Gi (?{ $r = $^N * 1024 * 1024 * 1024 })
)/$r/xso;
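If all you need is a single statement, the /e modifier, which evaluates the replacement part as Perl code, is arguably simpler than embedded (?{ }) blocks. A minimal sketch, reusing a %FACTORS hash like the one in the earlier answer and assuming the KB/MB/GB suffixes from the question:

```perl
use strict;
use warnings;

my %FACTORS = ( KB => 1024, MB => 1024**2, GB => 1024**3 );

my $current_value = '3MB';

# /e evaluates "$1 * $FACTORS{$2}" as an expression; a plain number
# such as "42" does not match and is left untouched.
$current_value =~ s/^(\d+)(KB|MB|GB)$/$1 * $FACTORS{$2}/e;

print "$current_value\n";    # 3145728
```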
There is a problem with using KB for 1024 bytes: kilo as a prefix generally means 1000 of a thing, not 1024.
The problem gets even worse with MB, since it has meant 1000*1000, 1024*1024, and 1000*1024.
A 1.44 MB floppy actually holds 1.44 * 1000 * 1024 bytes.
The only real way out of this is to use the newer KiB (kibibyte) to mean 1024 bytes.
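To make the ambiguity concrete, here is the arithmetic behind the three meanings of MB listed above, plus the floppy example. It is only an illustration; the labels are mine.

```perl
use strict;
use warnings;

# Three different byte counts that have all been called "1 MB".
my %mb = (
    'decimal (1000 * 1000)'     => 1000 * 1000,    # 1_000_000
    'binary, MiB (1024 * 1024)' => 1024 * 1024,    # 1_048_576
    'mixed (1000 * 1024)'       => 1000 * 1024,    # 1_024_000
);
printf "%-28s %9d bytes\n", $_, $mb{$_} for sort keys %mb;

# A "1.44 MB" floppy holds 1440 KiB, i.e. 1.44 * 1000 * 1024 bytes.
my $floppy = 1440 * 1024;
print "1.44 MB floppy = $floppy bytes\n";    # 1474560
```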
The way you implemented it also has the limitation that you can't use 8.4Gi to mean 8.4 * 1024 * 1024 * 1024. To remove that limitation I used $RE{num}{real} from Regexp::Common instead of \d+.
Some of the other answers hardwire the match by writing out all of the possible matches. That can get very tedious, not to mention error-prone. To get around that I used the keys of %multiplier to generate the regex. This means that if you add or remove elements from %multiplier, you won't have to modify the regex by hand.
use strict;
use warnings;
use Regexp::Common;
my %multiplier;
my $multiplier_match;
{
    # populate %multiplier
    my %exponent = (
        K => 1, # Kilo  Kibi
        M => 2, # Mega  Mebi
        G => 3, # Giga  Gibi
        T => 4, # Tera  Tebi
        P => 5, # Peta  Pebi
        E => 6, # Exa   Exbi
        Z => 7, # Zetta Zebi
        Y => 8, # Yotta Yobi
    );
    while( my ($str,$exp) = each %exponent ){
        @multiplier{ $str, "${str}B" } = (1000 ** $exp) x 2; # K  KB
        @multiplier{ "${str}i", "${str}iB" } = (1024 ** $exp) x 2; # Ki KiB
    }
    # %multiplier now holds 32 pairs (8*4)
    # build $multiplier_match
    local $" = '|';    # join interpolated arrays with '|'
    my @keys = keys %multiplier;
    $multiplier_match = qr(@keys);
}
sub remove_multiplier{
    die unless @_ == 1;
    local ($_) = @_;
    # s/^($RE{num}{real})($multiplier_match)$/ $1 * $multiplier{$2} /e;
    if( /^($RE{num}{real})($multiplier_match)$/ ){
        return $1 * $multiplier{$2};
    }
    return $_;
}
If you absolutely need 1K to mean 1024, then you only need to change one line.
# @multiplier{ $str, "${str}B" } = (1000 ** $exp) x 2; # K  KB
@multiplier{ $str, "${str}B" } = (1024 ** $exp) x 2; # K  KB
Note that since I used $RE{num}{real} from Regexp::Common it will also work with 5.3e1Ki.
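For anyone who can't install Regexp::Common, here is a reduced, self-contained sketch of the same idea: the suffix alternation is generated from the keys of %multiplier. It swaps in a simplified number pattern for $RE{num}{real} (an assumption: it accepts forms like 8.4, .5 and 5.3e1 but is far less thorough), sorts the keys longest first so KiB is tried before Ki and K, and only covers K, M and G to stay short.

```perl
use strict;
use warnings;

# Build %multiplier from a table of exponents, as in the answer above.
my %exponent = ( K => 1, M => 2, G => 3 );
my %multiplier;
while ( my ( $str, $exp ) = each %exponent ) {
    @multiplier{ $str,      "${str}B"  } = ( 1000**$exp ) x 2;    # K, KB
    @multiplier{ "${str}i", "${str}iB" } = ( 1024**$exp ) x 2;    # Ki, KiB
}

# Longest keys first, escaped, joined into one alternation.
my $alternation = join '|',
    map { quotemeta } sort { length $b <=> length $a } keys %multiplier;
my $multiplier_match = qr/$alternation/;

# Simplified stand-in for $RE{num}{real} (assumption, not Regexp::Common).
my $number = qr/[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?/;

sub remove_multiplier {
    my ($value) = @_;
    if ( $value =~ /^($number)($multiplier_match)$/ ) {
        return $1 * $multiplier{$2};
    }
    return $value;    # no recognised suffix: return the input unchanged
}

print "$_ => ", remove_multiplier($_), "\n" for qw(8.4Gi 5.3e1Ki 2KB 100);
```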
Related
I have disk size values like the following:
323.2T, 123.23G, 1.011T, 2.42M
How can I convert all of these into KB in Perl?
I would build a hash of multipliers for each factor and use it in a regex substitution.
The following starts with a multiplier of 1 for kilobytes and increases it by a factor of 1024 == 2^10 for each subsequent factor. You can change 1024 to 1000 == 10^3 if that's what you prefer.
The substitution simply looks for a sequence of digits and decimal points followed by one of the eligible factor letters, does the multiplication, and replaces the letter with K.
use strict;
use warnings 'all';
use feature 'say';
my %factors;
{
    my $f = 1;
    for my $c ( qw/ K M G T P E / ) {
        $factors{$c} = $f;
        $f *= 1024;
    }
}
my $s = '323.2T, 123.23G, 1.011T, 2.42M';
$s =~ s/([\d.]+)([KMGTPE])/$1 * $factors{$2} . 'K'/eg;
say $s;
output
347033357516.8K, 129216020.48K, 1085552984.064K, 2478.08K
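Since the answer mentions swapping 1024 for 1000, one option is to make the base a parameter rather than an edit. This is a minimal sketch; the sub name to_kb and its $base argument are hypothetical, and the substitution itself is the one from the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every "<number><K|M|G|T|P|E>" in $s as kilobytes.
# $base is 1024 for binary units or 1000 for decimal units.
sub to_kb {
    my ( $s, $base ) = @_;
    my %factors;
    my $f = 1;
    for my $c (qw/ K M G T P E /) {
        $factors{$c} = $f;
        $f *= $base;
    }
    ( my $out = $s ) =~ s/([\d.]+)([KMGTPE])/$1 * $factors{$2} . 'K'/eg;
    return $out;
}

my $s = '323.2T, 123.23G, 1.011T, 2.42M';
print to_kb( $s, 1024 ), "\n";
print to_kb( $s, 1000 ), "\n";
```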
I have a random number between 0.001 and 1000 and I need perl to print it with a fixed column width of 5 characters. That is, if it's too long, it should be rounded, and if it's too short, it should be padded with spaces.
Everything I found online suggested using sprintf, but sprintf ignores the field width if the number is too long.
Is there any way to get perl to do this?
What doesn't work:
my $number = 0.001 + rand(1000);
my $formattednumber = sprintf("%5f", $number);
print <<EOF;
$formattednumber
EOF
You need to define your sprintf pattern dynamically. The number of decimals depends on the number of digits on the left hand side of the decimal point.
This function will do that for you.
use strict;
use warnings 'all';
use feature 'say';
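sub round_to_col {
    my ( $number, $width ) = @_;
    my $width_int = length int $number;
    # Build e.g. '%3.1f' for 430.7: the decimals get whatever width the integer part and the '.' leave over
    return sprintf(
        sprintf( '%%%d.%df', $width_int, $width - 1 - $width_int ),
        $number
    );
}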
my $number = 0.001 + rand(1000);
say round_to_col( $number, 5);
Output could be:
11.18
430.7
0.842
You could use pack after the sprintf. It may not be a computationally efficient approach, but it is relatively simple to implement and maintain:
my $formattednumber = pack ('A5', sprintf("%5f", $number));
Note that pack 'A5' truncates rather than rounds: 11.189 becomes 11.18, not 11.19.
The answer posted by simbabque does not cover all cases, so this is my improved version, just in case anyone also needs something like this:
sub round_to_col {
    my ( $number, $width ) = @_;
    my $width_int = length int $number;
    my $sprintf;
    warn "round_to_col error: number longer than width\n" if $width_int > $width;
    $sprintf = "%d"  if $width_int == $width;
    $sprintf = "% d" if $width_int == $width - 1;
    $sprintf = sprintf( '%%%d.%df', $width_int, $width - 1 - $width_int )
        if $width_int < $width - 1;
    return sprintf( $sprintf , $number );
}
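Another way to deal with the case where rounding adds a digit (for example 999.97, which %.1f turns into the six-character 1000.0) is to try decreasing precisions until the rounded text fits, then right-align it. This is a sketch with a hypothetical name, fit_to_width, not a drop-in replacement for either answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: format $number into exactly $width characters when possible.
sub fit_to_width {
    my ( $number, $width ) = @_;

    # Try the most decimals first, then fewer, until the rounded text fits.
    for my $prec ( reverse 0 .. $width - 2 ) {
        my $s = sprintf '%.*f', $prec, $number;
        return sprintf '%*s', $width, $s if length $s <= $width;
    }

    # Even the rounded integer part is too wide: return it, overflowing the width.
    return sprintf '%.0f', $number;
}

for my $n ( 0.001, 0.842, 11.18, 430.7, 999.97 ) {
    printf "[%s]\n", fit_to_width( $n, 5 );
}
```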
I have a variable which contains a file size:
my $tx = "41.4 MB";
or
my $tx = "34.4 GB";
How do I go about converting this to a value in KB? That is, if $tx contains MB then multiply by 1024, and if it contains GB then multiply by 1024 * 1024.
You need to separate out and test the units.
use strict;
use warnings;
sub size_to_kb {
    my $size = shift;
    my ($num, $units) = split ' ', $size;
    if ($units eq 'MB') {
        $num *= 1024;
    } elsif ($units eq 'GB') {
        $num *= 1024 ** 2;
    } elsif ($units ne 'KB') {
        die "Unrecognized units: $units"
    }
    return "$num KB";
}
print size_to_kb("41.4 MB"), "\n";
print size_to_kb("34.4 GB"), "\n";
Outputs:
42393.6 KB
36071014.4 KB
Is this usage of unpack correct if I want to try this guessing subroutine on the first 1000 bytes of the variable?
#!/usr/bin/env perl
use warnings;
use 5.10.1;
my $var = ...;
my $part = unpack( 'b1000', $var ) ;
sub is_binary_data {
    local $_ = shift;
    ( tr/ -~//c / length ) >= .3;
}
if ( is_binary_data( $part ) ) {
    say "Binary";
}
else {
    say "Text";
}
No, it isn't: unpack 'b1000' creates a string of up to 1000 '0' and '1' characters (the bits of the first 125 bytes), which would certainly pass the ASCII test (which I believe tr/ -~//c / length is).
I would suggest using just substr ($var, 0, 1000) instead.
Also, maybe \r and \n should appear in the tr//.
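Following that suggestion, a minimal sketch that takes the first 1000 bytes with substr and counts tab, CR and LF as text. Including \t alongside \r and \n, and returning false for an empty string, are my own assumptions.

```perl
use strict;
use warnings;

# Heuristic from the question: "binary" if at least 30% of the characters are
# outside printable ASCII. Tab, CR and LF are counted as text here (assumption).
sub is_binary_data {
    my ($data) = @_;
    return 0 unless length $data;    # avoid dividing by zero
    my $non_text = ( $data =~ tr/ -~\t\r\n//c );    # counts, does not modify
    return $non_text / length($data) >= 0.3;
}

my $var  = "plain text\r\nwith a second line\n";    # stand-in for the real data
my $part = substr $var, 0, 1000;                     # first 1000 bytes (or fewer)

print is_binary_data($part) ? "Binary\n" : "Text\n";
```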
I need to do some arithmetic with the large hexadecimal numbers below, but when I try to output the result I get overflow warnings such as "Hexadecimal number > 0xffffffff non-portable", other messages about portability, or the maximum 32-bit hex value FFFFFFFF.
All of which imply that the standard language and output routines only cope with 32-bit values. I need 64-bit values and have done a lot of research, but I found nothing that BOTH enables the arithmetic AND outputs the large number in hex.
my $result = 0x00000200A0000000 +
( ( $id & 0xFFFFF ) * 2 ) + ( ( $id / 0x100000 ) * 0x40000000 );
So, for $id with the following values I should get $result:
$id = 0, $result = 0x00000200A0000000
$id = 1, $result = 0x00000200A0000002
$id = 2, $result = 0x00000200A0000004
How can I do this?
Here are my inconclusive research results, with reasons why:
How can I do 64-bit arithmetic in Perl?
How can I sum large hexadecimal values in Perl? Vague; the answer is not definitively precise and has no example.
Integer overflow: not conclusive.
Integer overflow: not conclusive.
bigint: no information about assignment, arithmetic or output.
bignum: the examples are not close to my problem.
How can I sprintf a big number in Perl? The example given is not enough information for me: it doesn't deal with hex assignment or arithmetic.
Re: secret code generator: some examples using Fleximal; it mentions to_str to output the value of a variable, but 1) I don't see how the variable was assigned and 2) I get the error "Can't call method "to_str" without a package or object reference" when I run my code using it.
String to Hex: an example of using Math::BigInt, which doesn't work for me; I still get an overflow error.
Is there a 64-bit hex()? Nearly there, but it doesn't deal with outputting the large number in hex; it only talks about decimal.
CPAN Math::Fleximal: does the arithmetic, but there doesn't seem to be any way to output the value in hex.
sprintf: doesn't seem to be able to cope with numbers greater than 32 bits; I get the saturated FFFFFFFF output.
Edit: update with a new requirement and my solution; please feel free to comment.
Chas. Owens's answer is still accepted and excellent (part 2 works for me; I haven't tried the part 1 version for newer Perl, and I would invite others to confirm it).
However, another requirement was to be able to convert back from the result to the original id.
So I've written the code to do this. Here's the full solution, including Chas. Owens's original solution, followed by the implementation for this new requirement:
#!/usr/bin/perl
use strict;
use warnings;
use bigint;
use Carp;
sub bighex {
    my $hex = shift;
    my $part = qr/[0-9a-fA-F]{8}/;
    croak "$hex is not a 64-bit hex number"
        unless my ($high, $low) = $hex =~ /^0x($part)($part)$/;
    return hex("0x$low") + (hex("0x$high") << 32);
}
sub to_bighex {
    my $decimal = shift;
    croak "$decimal is not an unsigned integer"
        unless $decimal =~ /^[0-9]+$/;
    my $high = $decimal >> 32;
    my $low = $decimal & 0xFFFFFFFF;
    return sprintf("%08x%08x", $high, $low);
}
for my $id (0 ,1, 2, 0xFFFFF, 0x100000, 0x100001, 0x1FFFFF, 0x200000, 0x7FDFFFFF ) {
    my $result = bighex("0x00000200A0000000");
    $result += ( ( $id & 0xFFFFF ) * 2 ) + ( ( $id / 0x100000 ) * 0x40000000 );
    my $clusterid = to_bighex($result);
    # the convert back code here:
    my $clusterid_asHex = bighex("0x".$clusterid);
    my $offset = $clusterid_asHex - bighex("0x00000200A0000000");
    my $index_small_units = ( $offset / 2 ) & 0xFFFFF;
    my $index_0x100000_units = ( $offset / 0x40000000 ) * 0x100000;
    my $index = $index_0x100000_units + $index_small_units;
    print "\$id = ".to_bighex( $id ).
        " clusterid = ".$clusterid.
        " back to \$id = ".to_bighex( $index ).
        " \n";
}
Try out this code at http://ideone.com/IMsp6.
#!/usr/bin/perl
use strict;
use warnings;
use bigint qw/hex/;
for my $id (0 ,1, 2) {
    my $result = hex("0x00000200A0000000") +
        ( ( $id & 0xFFFFF ) * 2 ) + ( ( $id / 0x100000 ) * 0x40000000 );
    # width 18 covers the "0x" prefix plus 16 hex digits
    printf "%d: %#018x\n", $id, $result;
}
The bigint pragma replaces the hex function with a version that can handle numbers that large. It also transparently makes the mathematical operators deal with big ints instead of the ints on the target platform.
Note that this only works in Perl 5.10 and later. If you are running an earlier version of Perl 5, you can try this:
#!/usr/bin/perl
use strict;
use warnings;
use bigint;
use Carp;
sub bighex {
    my $hex = shift;
    my $part = qr/[0-9a-fA-F]{8}/;
    croak "$hex is not a 64-bit hex number"
        unless my ($high, $low) = $hex =~ /^0x($part)($part)$/;
    return hex("0x$low") + (hex("0x$high") << 32);
}
sub to_bighex {
    my $decimal = shift;
    croak "$decimal is not an unsigned integer"
        unless $decimal =~ /^[0-9]+$/;
    my $high = $decimal >> 32;
    my $low = $decimal & 0xFFFFFFFF;
    return sprintf("%08x%08x", $high, $low);
}
for my $id (0 ,1, 2) {
    my $result = bighex("0x00000200A0000000");
    $result += ( ( $id & 0xFFFFF ) * 2 ) + ( ( $id / 0x100000 ) * 0x40000000 );
    print "$id ", to_bighex($result), "\n";
}
The comment by ysth is right. Here is a short example of 64-bit arithmetic using the Perl from Debian stretch, without Math::BigInt (aka "use bigint"):
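#!/usr/bin/perl -w
use strict;
sub do_64bit_arith {
    use integer;
    my $x = ~2;     # 0xFFFFFFFFFFFFFFFD, i.e. -3 as a signed 64-bit integer
    $x <<= 4;       # 0xFFFFFFFFFFFFFFD0, i.e. -48
    # print the high and low 32-bit halves separately
    printf "0x%08x%08x\n", ( $x >> 32 ) & 0xFFFFFFFF, $x & 0xFFFFFFFF;
}
do_64bit_arith();
exit 0;
On a perl built with 64-bit integers, the script prints 0xffffffffffffffd0.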
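Along the same native 64-bit lines, here is a sketch of the original forward mapping and the inverse from the edit, without bigint. It assumes a perl built with 64-bit integers (perl -V:ivsize prints 8) and uses int() because plain / is floating-point division outside bigint; the sub names id_to_cluster and cluster_to_id and the BASE constant are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Assumption: this perl was built with 64-bit integers.
die "This sketch needs a perl with 64-bit integers\n" unless $Config{ivsize} >= 8;

# Literals above 0xFFFFFFFF raise the 'portable' warning; silence only that one.
no warnings 'portable';
use constant BASE => 0x00000200A0000000;

# Forward mapping from the question; int() replaces bigint's integer division.
sub id_to_cluster {
    my ($id) = @_;
    return BASE + ( ( $id & 0xFFFFF ) * 2 ) + int( $id / 0x100000 ) * 0x40000000;
}

# Inverse mapping, following the same steps as the bigint version in the edit.
sub cluster_to_id {
    my ($cluster) = @_;
    my $offset = $cluster - BASE;
    my $low    = ( $offset / 2 ) & 0xFFFFF;                 # the ($id & 0xFFFFF) part
    my $high   = int( $offset / 0x40000000 ) * 0x100000;    # the multiples of 0x100000
    return $high + $low;
}

for my $id ( 0, 1, 2, 0xFFFFF, 0x100000, 0x7FDFFFFF ) {
    my $cluster = id_to_cluster($id);
    printf "id=0x%X cluster=0x%016X back=0x%X\n", $id, $cluster, cluster_to_id($cluster);
}
```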