I am trying to finish a homework for my class. After I installed a API::Twitter module using CPAN, the Net::Twitter module seems to be affected and can no longer be used.
I tried to run this code in my Mac terminal:
song_hanlun_hw8.pl
#!/usr/bin/perl
use Net::Twitter;
use JSON;
use LWP::Simple;
use XML::Bare;
use Data::Dumper;
# keys for twitter
$consumer_key = "key";
$consumer_secret = "key";
$token = "key-key";
$token_secret = "key";
# keys for sentiment analysis
$apikey = "key";
# As of 13-Aug-2010, Twitter requires OAuth for authenticated requests
my $nt = Net::Twitter->new(
traits => [qw/API::RESTv1_1/],
# traits => [qw/API::Search/],
consumer_key => $consumer_key,
consumer_secret => $consumer_secret,
access_token => $token,
access_token_secret => $token_secret,
);
#enter a term you want to search for
$search_term = "Halloween";
$count = 100;
my $r = $nt->search( { q => $search_term, count => $count } );
# print Dumper $r;
for my $status ( #{ $r->{statuses} } ) {
push #tweets, $status->{text};
}
$nextMaxId = $r->{search_metadata}->{next_results};
$nextMaxId =~ s/\?max_id=//g;
$nextMaxId =~ s/&q=Halloween&count=100&include_entities=1//g;
# print $nextMaxId;
my $r2 = $nt->search( { q => $search_term, count => $count, max_id => $nextMaxId } );
for my $status ( #{ $r2->{statuses} } ) {
push #tweets, $status->{text};
}
foreach $tweet ( #tweets ) {
# $tweet = $tweets[1];
$return = get "http://gateway-a.watsonplatform.net/calls/text/TextGetTextSentiment?apikey=$apikey&text=$tweet&outputMode=xml";
$bare = new XML::Bare( text => $return );
$root = $bare->parse();
$sentiment = $root->{results}->{docSentiment}->{score}->{value};
push #sentiments, $sentiment;
# print "$sentiment";
}
$m100to75 = 0;
$m75to50 = 0;
$m50to25 = 0;
$m25to0 = 0;
$p0to25 = 0;
$p25to50 = 0;
$p50to75 = 0;
$p75to100 = 0;
foreach $sent ( #sentiments ) {
# smaller than zero. four ranges.
if ( $sent >= -1 && $sent < -0.75 ) {
$m100to75++;
}
if ( $sent >= -0.75 && $sent < -0.5 ) {
$m75to50++;
}
if ( $sent >= -0.5 && $sent < -0.25 ) {
$m50to25++;
}
if ( $sent >= -0.25 && $sent < 0 ) {
$m25to0++;
}
#bigger than zero. four ranges.
if ( $sent >= 0 && $sent < 0.25 ) {
$p0to25++;
}
if ( $sent >= 0.25 && $sent < 0.5 ) {
$p25to50++;
}
if ( $sent >= 0.5 && $sent < 0.75 ) {
$p50to75++;
}
if ( $sent >= 0.75 && $sent < 1 ) {
$p75to100++;
}
}
# print "$m100to75\n$m75to50\n$m50to25\n$m25to0\n$p0to25\n$p25to50\n$p50to75\n$p75to100\n";
print "tweets sentiment score summary histogram:\n";
print "-1.00 to -0.75: ";
for ( $i = 0 ; $i < $m100to75 ; $i++ ) {
print "*";
}
print "\n";
print "-0.75 to -0.50: ";
for ( $i = 0 ; $i < $m75to50 ; $i++ ) {
print "*";
}
print "\n";
print "-0.50 to -0.25: ";
for ( $i = 0 ; $i < $m50to25 ; $i++ ) {
print "*";
}
print "\n";
print "-0.25 to -0.00: ";
for ( $i = 0 ; $i < $m25to0 ; $i++ ) {
print "*";
}
print "\n";
print "+0.00 to +0.25: ";
for ( $i = 0 ; $i < $p0to25 ; $i++ ) {
print "*";
}
print "\n";
print "+0.25 to +0.50: ";
for ( $i = 0 ; $i < $p25to50 ; $i++ ) {
print "*";
}
print "\n";
print "+0.50 to +0.75: ";
for ( $i = 0 ; $i < $p50to75 ; $i++ ) {
print "*";
}
print "\n";
print "+0.75 to +1.00: ";
for ( $i = 0 ; $i < $p75to100 ; $i++ ) {
print "*";
}
print "\n";
But the terminal returns
Invalid version format (version required) at /Library/Perl/5.18/Module/Runtime.pm line 386.
BEGIN failed--compilation aborted at /Library/Perl/5.18/Net/Twitter.pm line 3.
Compilation failed in require at song_hanlun_hw8.pl line 3.
BEGIN failed--compilation aborted at song_hanlun_hw8.pl line 3.
Could someone tell me how to fix this?
After some Google search I highly believe downgrading the Module::Runtime would solve this issue. But I could't find how.
Eventually I solved this issue by delete the Runtime module file in perl folder and reinstalled the Net::Twitter module, which also reinstalled the Runtime that Net::Twitter depends on.
Related
I'm trying to count the number of bases using a for loop and the substr function but the counts are off and I'm not sure why! Please help! I have to use these functions in my assignment. Where am I going wrong? Here is my code:
use strict;
use warnings;
my $user_input = "accgtutf5";
#initalizing the lengths
my $a_base_total = 0;
my $c_base_total = 0;
my $g_base_total = 0;
my $t_base_total = 0;
my $other_total = 0;
for ( my $position = 0; $position < length $user_input; $position++ ) {
my $nucleotide = substr( $user_input, $position, 1 );
if ( $nucleotide eq "a" ) {
$a_base_total++;
} elsif ( $nucleotide eq "c" ) {
$c_base_total++;
} elsif ( $nucleotide eq "g" ) {
$g_base_total++;
} elsif ( $nucleotide eq "t" ) {
$t_base_total++;
} else {
$other_total++;
}
$position++;
}
print "a = $a_base_total\n";
print "c = $c_base_total\n";
print "g = $g_base_total\n";
print "t = $t_base_total\n";
print "other = $other_total\n";
The output I'm getting is :
a=1
c=1
g=0
t=2
other=1
When it should be:
a = 1
c = 2
g = 1
t = 2
other = 3
Thanks in advance! :)
You're incrementing twice.
Simply remove this line:
$position++;
Also, instead of iterating on position, I would suggest iterating on character.
Your script can be simplified to just:
use strict;
use warnings;
my $user_input = "accgtutf5";
my %count;
for my $nucleotide (split '', $user_input) {
$nucleotide = 'other' unless $nucleotide =~ /[acgt]/;
$count{$nucleotide}++;
}
printf "%s = %d\n", $_, $count{$_} // 0 for qw(a c g t other);
You are incrementing $position twice: once at the for and once at the end of the loop. Remove the second $position++.
I am writing a Perl code, using substr to extract characters one by one but encountered a very strange problem.
I am trying to do the following
Scan character one by one, if it is # go to end of line, if it is ' or " then find the next matching one. Also added HTML color tag to highlight them. Everything else just print.
Here is the block of code
while ($char = (substr $src, $off_set, 1)) {
if ($char eq '#') {
$end_index = index $src, "\n", $off_set+ 1;
my $c = substr($src, $off_set, $end_index-$off_set+1);
print $comment_color.$c.$color_end;
} elsif (($char eq '"') || ($char eq "'")) {
$end_index = index ($src, $char, $off_set+1);
my $char_before = substr $src, $end_index-1, 1;
while ($end_index > 0 && $char_before eq '\\') {
$end_index = index $src, $char, $end_index + 1;
$char_before = substr $src, $end_index-1, 1;
}
my $s = substr($src, $off_set, $end_index-$off_set+1);
print $string_color.$s.$color_end;
} else {
print $char;
$end_index++;
}
$off_set = $end_index + 1;
}
When I use the following testing code, the script will just exit on first 0, if I remove all the 0 then it runs ok. If I remove first 0, it will exit on 2nd. I really have no idea why this happens.
# Comment 1
my $zero = 0;
my #array = (0xdead_beef, 0377, 0b011011);
# xor
sub sample2
{
print "true or false";
return 3 + 4 eq " 7"; # true or false
}
#now write input to STDOUT
print $time . "\n";
my $four = "4";
Check for defined in your while loop:
while (defined(my $char = substr $src, $off_set, 1)) {
The reason why your code was exiting early is because '0' is a false value, and therefore the while would end. Instead, this will check if any value is pulled from the substr call.
This is your loop condition:
while ($char = (substr $src, $off_set, 1)) {
...
So what happens when $char = "0"? As Perl considers that to be a false value, the loop will terminate. Instead, loop as long as characters are left:
while ($off_set < length $src) {
my $char = substr $src, $off_set, 1;
...
Anyway, your code is convoluted and hard to read. Consider using regular expressions instead:
use re '/xsm';
my $src = ...;
pos($src) = 0;
my $out = '';
while (pos($src) < length $src) {
if ($src =~ m/\G ([#][^\n]*)/gc) {
$out .= colored(comment => $1);
}
elsif ($src =~ m/\G (["] (?:[^"\\]++|[\\].)* ["])/gc) {
$out .= colored(string => $1);
}
elsif ($src =~ m/\G (['] (?:[^'\\]++|[\\]['\\])* ['])/gc) {
$out .= colored(string => $1);
}
elsif ($src =~ m/\G ([^"'#]+)/gc) {
$out .= $1;
}
else {
die "illegal state";
}
}
where colored is some helper function.
Finally, figured out it is the while loop. It exit the loop when it sees a 0.
Updated the while loop condition to
while (($char = (substr $src, $off_set, 1)) || ($off_set < (length $src))) {
and it is working now.
I'm comparing a reference sequence of size 5500 bases and query sequence of size 3600, using dynamic programming (semi global alignment), in fact I don't know much about complexity and performance and the code is blowing up and giving me the error "out of memory". Knowing that it works normally on smaller sequences, my question is: This behavior is normal or I might have another problem in code ?if it's normal any hint to solve this problem ? Thanks in advance.
sub semiGlobal {
my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = #_;
# initialization: first row to 0 ;
my #matrix;
$matrix[0][0]{score} = 0;
$matrix[0][0]{pointer} = "none";
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j]{score} = 0;
$matrix[0][$j]{pointer} = "none";
}
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
$matrix[$i][0]{score} = $GAP * $i;
$matrix[$i][0]{pointer} = "up";
}
# fill
my $max_i = 0;
my $max_j = 0;
my $max_score = 0;
print "seq2: ".length($seq2);
print "seq1: ".length($seq1);
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
my ( $diagonal_score, $left_score, $up_score );
# calculate match score
my $letter1 = substr( $seq1, $j - 1, 1 );
my $letter2 = substr( $seq2, $i - 1, 1 );
if ( $letter1 eq $letter2 ) {
$diagonal_score = $matrix[ $i - 1 ][ $j - 1 ]{score} + $MATCH;
}
else {
$diagonal_score = $matrix[ $i - 1 ][ $j - 1 ]{score} + $MISMATCH;
}
# calculate gap scores
$up_score = $matrix[ $i - 1 ][$j]{score} + $GAP;
$left_score = $matrix[$i][ $j - 1 ]{score} + $GAP;
# choose best score
if ( $diagonal_score >= $up_score ) {
if ( $diagonal_score >= $left_score ) {
$matrix[$i][$j]{score} = $diagonal_score;
$matrix[$i][$j]{pointer} = "diagonal";
}
else {
$matrix[$i][$j]{score} = $left_score;
$matrix[$i][$j]{pointer} = "left";
}
}
else {
if ( $up_score >= $left_score ) {
$matrix[$i][$j]{score} = $up_score;
$matrix[$i][$j]{pointer} = "up";
}
else {
$matrix[$i][$j]{score} = $left_score;
$matrix[$i][$j]{pointer} = "left";
}
}
# set maximum score
if ( $matrix[$i][$j]{score} > $max_score ) {
$max_i = $i;
$max_j = $j;
$max_score = $matrix[$i][$j]{score};
}
}
}
my $align1 = "";
my $align2 = "";
my $j = $max_j;
my $i = $max_i;
while (1) {
if ( $matrix[$i][$j]{pointer} eq "none" ) {
$stseq1 = $j;
last;
}
if ( $matrix[$i][$j]{pointer} eq "diagonal" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
$j--;
}
elsif ( $matrix[$i][$j]{pointer} eq "left" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= "-";
$j--;
}
elsif ( $matrix[$i][$j]{pointer} eq "up" ) {
$align1 .= "-";
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
}
}
$align1 = reverse $align1;
$align2 = reverse $align2;
return ( $align1, $align2, $stseq1 ,$max_j);
}
One way to possibly solve the problem is to tie the #matrix with a file. However, this will dramatically slow down the program. Consider this:
sub semiGlobal {
use Tie::Array::CSV;
tie my #matrix, 'Tie::Array::CSV', 'temp.txt'; # Don't forget to add your own error handler.
my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = #_;
# initialization: first row to 0 ;
$matrix[0][0] = '0 n';
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j] = '0 n';
}
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
my $score = $GAP * $i;
$matrix[$i][0] = join ' ',$score,'u';
}
#print Dumper(\#matrix);
# fill
my $max_i = 0;
my $max_j = 0;
my $max_score = 0;
print "seq2: ".length($seq2)."\n";
print "seq1: ".length($seq1)."\n";
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
my ( $diagonal_score, $left_score, $up_score );
# calculate match score
my $letter1 = substr( $seq1, $j - 1, 1 );
my $letter2 = substr( $seq2, $i - 1, 1 );
my $score = (split / /, $matrix[ $i - 1 ][ $j - 1 ])[0];
if ( $letter1 eq $letter2 ) {
$diagonal_score = $score + $MATCH;
}
else {
$diagonal_score = $score + $MISMATCH;
}
# calculate gap scores
$up_score = (split / /,$matrix[ $i - 1 ][$j])[0] + $GAP;
$left_score = (split / /,$matrix[$i][ $j - 1 ])[0] + $GAP;
# choose best score
if ( $diagonal_score >= $up_score ) {
if ( $diagonal_score >= $left_score ) {
$matrix[$i][$j] = join ' ',$diagonal_score,'d';
}
else {
$matrix[$i][$j] = join ' ', $left_score, 'l';
}
}
else {
if ( $up_score >= $left_score ) {
$matrix[$i][$j] = join ' ', $up_score, 'u';
}
else {
$matrix[$i][$j] = join ' ', $left_score, 'l';
}
}
# set maximum score
if ( (split / /, $matrix[$i][$j])[0] > $max_score ) {
$max_i = $i;
$max_j = $j;
$max_score = (split / /, $matrix[$i][$j])[0];
}
}
}
my $align1 = "";
my $align2 = "";
my $stseq1;
my $j = $max_j;
my $i = $max_i;
while (1) {
my $pointer = (split / /, $matrix[$i][$j])[1];
if ( $pointer eq "n" ) {
$stseq1 = $j;
last;
}
if ( $pointer eq "d" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
$j--;
}
elsif ( $pointer eq "l" ) {
$align1 .= substr( $seq1, $j - 1, 1 );
$align2 .= "-";
$j--;
}
elsif ( $pointer eq "u" ) {
$align1 .= "-";
$align2 .= substr( $seq2, $i - 1, 1 );
$i--;
}
}
$align1 = reverse $align1;
$align2 = reverse $align2;
untie #matrix; # Don't forget to add your own error handler.
unlink 'temp.txt'; # Don't forget to add your own error handler.
return ( $align1, $align2, $stseq1 ,$max_j);
}
You can still use your original sub for short sequences, and switch to this sub for long ones.
I think that #j_random_hacker and #Ashalynd are on the right track regarding using this algorithm in most Perl implementations. The datatypes you're using are going to use more memory that absolutely needed for the calculations.
So this is "normal" in that you should expect to see this kind of memory usage for how you've written this algorithm in perl. You may have other problems in surrounding code that are using a lot of memory but this algorithm will hit your memory hard with large sequences.
You can address some of the memory issues by changing the datatypes that you're using as #Ashalynd suggests. You could try changing the hash which holds score and pointer into an array and changing the string pointers into integer values. Something like this might get you some benefit while still maintaining readability:
use strict;
use warnings;
# define constants for array positions and pointer values
# so the code is still readable.
# (If you have the "Readonly" CPAN module you may want to use it for constants
# instead although none of the downsides of the "constant" pragma apply in this code.)
use constant {
SCORE => 0,
POINTER => 1,
DIAGONAL => 0,
LEFT => 1,
UP => 2,
NONE => 3,
};
...
sub semiGlobal2 {
my ( $seq1, $seq2,$MATCH,$MISMATCH,$GAP ) = #_;
# initialization: first row to 0 ;
my #matrix;
# score and pointer are now stored in an array
# using the defined constants as indices
$matrix[0][0][SCORE] = 0;
# pointer value is now a constant integer
$matrix[0][0][POINTER] = NONE;
for ( my $j = 1 ; $j <= length($seq1) ; $j++ ) {
$matrix[0][$j][SCORE] = 0;
$matrix[0][$j][POINTER] = NONE;
}
for ( my $i = 1 ; $i <= length($seq2) ; $i++ ) {
$matrix[$i][0][SCORE] = $GAP * $i;
$matrix[$i][0][POINTER] = UP;
}
... # continue to make the appropriate changes throughout the code
However, when I tested this I didn't get a huge benefit when attempting to align a 3600 char string in a 5500 char string of random data. I programmed my code to abort when it consumed more than 2GB of memory. The original code aborted after 23 seconds while the one using constants and an array instead of a hash aborted after 32 seconds.
If you really want to use this specific algorithm I'd check out the performance of Algorithm::NeedlemanWunsch. It doesn't look like it's very mature but it may have addressed your performance issues. Otherwise look into writing an Inline or Perl XS wrapper around a C implementation
I would like to write a Perl function that gets a GFF3 filename and a range (i.e. 100000 .. 2000000). and returns a reference to an array containing all names/accessions of genes found in this range.
I guess using bioperl will make sense, but I have very little experience with it. I can write a script that parses a GFF3 by my self, but if using bioperl (or another packagae) is not too complicated - I'd rather reuse their code.
use Bio::Tools::GFF;
my $range_start = 100000;
my $range_end = 200000;
my #features_in_range = ( );
my $gffio = Bio::Tools::GFF->new(-file => $gff_file, -gff_version => 3);
while (my $feature = $gffio->next_feature()) {
## What about features that are not contained within the coordinate range but
## do overlap it? Such features won't be caught by this check.
if (
($feature->start() >= $range_start)
&&
($feature->end() <= $range_end)
) {
push #features_in_range, $feature;
}
}
$gffio->close();
DISCLAIMER: Naive implementation. I just banged that out, it's had no testing. I won't even guarantee it compiles.
You do want to use BioPerl for this, using possibly the Bio::Tools::GFF module.
You should really ask on the BioPerl mailing list. It's very friendly and the subscribers are very knowledgeable -- they'll definitely be able to help you. And once you do get an answer (and if you don't get one here first), I suggest answering your own question here with the answer so we can all benefit!
The following function takes a hash of targets and ranges and returns a function that will iterate over all targets that overlap any of the ranges. The targets should be a reference to an array of references:
my $targets =
[
[
$start,
$end,
],
...,
]
The ranges should be a reference to an array of hashes:
my $ranges =
[
{
seqname => $seqname,
source => $source,
feature => $feature,
start => $start,
end => $end,
score => $score,
strand => $strand,
frame => $frame,
attribute => $attribute,
},
...,
]
You can, of course, just pass a single target.
my $brs_iterator
= binary_range_search( targets => $targets, ranges => $ranges );
while ( my $gff_line = $brs_iterator->() ) {
# do stuff
}
sub binary_range_search {
my %options = #_;
my $targets = $options{targets} || croak 'Need a targets parameter';
my $ranges = $options{ranges} || croak 'Need a ranges parameter';
my ( $low, $high ) = ( 0, $#{$ranges} );
my #iterators = ();
TARGET:
for my $range (#$targets) {
RANGE_CHECK:
while ( $low <= $high ) {
my $try = int( ( $low + $high ) / 2 );
$low = $try + 1, next RANGE_CHECK
if $ranges->[$try]{end} < $range->[0];
$high = $try - 1, next RANGE_CHECK
if $ranges->[$try]{start} > $range->[1];
my ( $down, $up ) = ($try) x 2;
my %seen = ();
my $brs_iterator = sub {
if ( $ranges->[ $up + 1 ]{end} >= $range->[0]
and $ranges->[ $up + 1 ]{start} <= $range->[1]
and !exists $seen{ $up + 1 } )
{
$seen{ $up + 1 } = undef;
return $ranges->[ ++$up ];
}
elsif ( $ranges->[ $down - 1 ]{end} >= $range->[0]
and $ranges->[ $down - 1 ]{start} <= $range->[1]
and !exists $seen{ $down - 1 }
and $down > 0 )
{
$seen{ $down - 1 } = undef;
return $ranges->[ --$down ];
}
elsif ( !exists $seen{$try} ) {
$seen{$try} = undef;
return $ranges->[$try];
}
else {
return;
}
};
push #iterators, $brs_iterator;
next TARGET;
}
}
# In scalar context return master iterator that iterates over the list of range iterators.
# In list context returns a list of range iterators.
return wantarray
? #iterators
: sub {
while (#iterators) {
if ( my $range = $iterators[0]->() ) {
return $range;
}
shift #iterators;
}
return;
};
}
Do you know an easy and straight-forward method/sub/module which allows me to convert a number (say 1234567.89) to an easily readable form - something like 1.23M?
Right now I can do this by making several comparisons, but I'm not happy with my method:
if($bytes > 1000000000){
$bytes = ( sprintf( "%0.2f", $bytes/1000000000 )). " Gb/s";
}
elsif ($bytes > 1000000){
$bytes = ( sprintf( "%0.2f", $bytes/1000000 )). " Mb/s";
}
elsif ($bytes > 1000){
$bytes = ( sprintf( "%0.2f", $bytes/1000 )). " Kb/s";
}
else{
$bytes = sprintf( "%0.2f", $bytes ). "b/s";
}
Thank you for your help!
The Number::Bytes::Human module should be able to help you out.
An example of how to use it can be found in its synopsis:
use Number::Bytes::Human qw(format_bytes);
$size = format_bytes(0); # '0'
$size = format_bytes(2*1024); # '2.0K'
$size = format_bytes(1_234_890, bs => 1000); # '1.3M'
$size = format_bytes(1E9, bs => 1000); # '1.0G'
# the OO way
$human = Number::Bytes::Human->new(bs => 1000, si => 1);
$size = $human->format(1E7); # '10MB'
$human->set_options(zero => '-');
$size = $human->format(0); # '-'
Number::Bytes::Human seems to do exactly what you want.
sub magnitudeformat {
my $val = shift;
my $expstr;
my $exp = log($val) / log(10);
if ($exp < 3) { return $val; }
elsif ($exp < 6) { $exp = 3; $expstr = "K"; }
elsif ($exp < 9) { $exp = 6; $expstr = "M"; }
elsif ($exp < 12) { $exp = 9; $expstr = "G"; } # Or "B".
else { $exp = 12; $expstr = "T"; }
return sprintf("%0.1f%s", $val/(10**$exp), $expstr);
}
In pure Perl form, I've done this with a nested ternary operator to cut on verbosity:
sub BytesToReadableString($) {
my $c = shift;
$c >= 1073741824 ? sprintf("%0.2fGB", $c/1073741824)
: $c >= 1048576 ? sprintf("%0.2fMB", $c/1048576)
: $c >= 1024 ? sprintf("%0.2fKB", $c/1024)
: $c . "bytes";
}
print BytesToReadableString(225939) . "/s\n";
Outputs:
220.64KB/s
This snippet is in PHP, and it's loosely based on some example someone else had on their website somewhere (sorry buddy, I can't remember).
The basic concept is instead of using if, use a loop.
function formatNumberThousands($a,$dig)
{
$unim = array("","k","m","g");
$c = 0;
while ($a>=1000 && $c<=3) {
$c++;
$a = $a/1000;
}
$d = $dig-ceil(log10($a));
return number_format($a,($c ? $d : 0))."".$unim[$c];
}
The number_format() call is a PHP library function which returns a string with commas between the thousands groups. I'm not sure if something like it exists in perl.
The $dig parameter sets a limit on the number of digits to show. If $dig is 2, it will give you 1.2k from 1237.
To format bytes, just divide by 1024 instead.
This function is in use in some production code to this day.