Progress line in perl - perl

I would like to create a very simple progressbar for my script. So far I've got this, and it works. However, I cannot get it to be a percentage out of 100. My code is the following and it produces basically a dot for every 5 entries in #entries.
my $total_entries = #entries;
my $count = 0;
my $count_tens = $total_entries/0.2;
$count_tens = sprintf ('%d',$count_tens);
foreach (#entries){
# do some stuff #
for (1 .. $total_entries){
if ($count == $count_tens){
print ".";
$count = 0;
}
$count++;
}
}
I would like to have something that produces always a fixed amount of dots, regardless of the total number of entries in #entries.

Let's say we want 80 dots. Then:
my $number_of_dots = 80;
my #items = 0 .. 20; # or something
my $items_per_dot = #items / $number_of_dots;
STDOUT->autoflush(1); # print everything out immediately
for my $i (0 .. $#items) {
my $dots = $i / $items_per_dot;
print "\r", "." x $dots;
sleep 1; # do something
}
print "\n";
Note that we avoid rounding errors by calculating the number of dots per item anew on each iteration. The \r will move the cursor to the start of the line, so the existing dots will be overwritten each time. You can easily skip the printing if the $dots value doesn't change between iterations.

Rather than rewriting the wheel, you may want to use existing code that has already been written, tested and debugged.
http://metacpan.org/pod/Term::ProgressBar

Related

if-clause seems not to be executed under certain arbitrary conditions

I am trying to divide a triangular matrix into parts, which hold approximately the same amount of elements.
I wrote the following code, which works nicely for most combinations of inputs, and segments my matrix into the given number of parts from 0 to $length.
However, there are input combinations like $length = 2003 and $number_of_segments = 50, where the last segment is missing in the output.
I tested the values of $threshold and $total, but they seem to be correct even in those odd cases.
Do you have any ideas, where the bug is?
#!/usr/bin/perl
use strict; #should always be used
use warnings; #that one too
use autodie; #just in case I forgot to check anything
my $length = shift or die "ERROR: Not enough arguments!\n"; #number of rows in the matrix
my $number_of_segments = shift or die "ERROR: Not enough arguments!\n"; #number of segments we want to get
my #segments = รท #array of segment-limits
print "$_\n" foreach #segments;
sub divide {
my #segments = (0); #the first segment starts at 0
my $number_of_pairs = ($length*($length+1))/2; #number of elements in matrix
my $total = 0; #counter for the elements we already visited
my $segment_counter = 1; #we are in the first segment
for (my $i=0; $i<$length; $i++){ #going over the rows of the matrix
$total += $length-$i; #counting the elements in each row
my $threshold = ($number_of_pairs/$number_of_segments)*$segment_counter; #threshold for the next segment
if ($total >= $threshold){ #if our current segment is large enough
push #segments, $i+1; #save the limit
$segment_counter++; #and open the next segment
}
}
return #segments;
}
The problem is that you can't generally compare floating-point numbers for equality because of their limited accuracy. The final value of $threshold comes out fractionally high (2007006.0000000002 on my 32-bit Perl) so you have to allow for a margin of error.
If you change the test to
if ( $total + 1E-8 >= $threshold ) { ... }
then you will get the results you expect. You may have to adjust the delta value to get the correct results.
Note that this is a very slow and inaccurate way of doing things. You should really keep all the arithmetic to integers instead of floating point values, but I don't have time at present to refactor your code

Overwriting a print statement with one that is shorter in length

I am trying to overwrite a print statement in a foreach loop to create somewhat of a progress bar.
What I am doing:
my $arraySize = #listOfIps;
local $| = 1;
my $counter = 0;
my $progressString;
print 'Progress: ';
foreach my $ip (#listOfIps) {
$counter++;
print "\b" x length($progressString) if defined $progressString;
$progressString = "\r$counter / $arraySize - Working on $ip";
print $progressString;
#does stuff here but thats irrelevant to the problem
}
The problem I am having is that when the foreach loop gets to an IP that is shorter than the previous one it has printed eg 10.0.0.1 it still displays the extra characters left over from the previous longer print statement.
The problem:
Progress: 3 / 10 - Working on 200.144.223.211
then overwriting this print statement with the next smaller ip address in the array gives:
Progress: 4 / 10 - Working on 10.0.0.1223.211
and so on... when actually it should print just :
Progress: 4 / 10 - Working on 10.0.0.1
so that it does not have any of the characters from the previous print left over.
There must be something really obvious I am overlooking here, as I can't see any reason why this would not be working.
Use printf with a format that pads the IP string with spaces:
printf "\r$counter / $arraySize - Working on %-15s", $ip;
Adapt the number 15 to the length of your longest IP. If you have a variable, you can use it in the format string like so (thanks amon for sharing):
printf ".... Working on %-*s", $length, $ip;
Though that is quite excessive, since you can control all variable length strings with the printf:
printf "\r%-*s / %-*s - Working on %-*s", 5, $counter, 5, $arraySize, 15, $ip;
The \b escape is apparently a non-destructive backspace, which does not delete, according to this answer. Which would mean that it does just about the same as \r in your case.

Perl: increment 2d array cell?

I have a set of numerical data for which is important to me to know what pairs of numbers occurred together, and how many times. Each set of data contain 7 numbers betwen 1 and 20. There are several hundred sets of data.
Essentially, by parsing each set of my data, I want to create a 20 x 20 array that I can use to keep a count of when pairs of numbers occurred together.
I have done a lot of searching, but maybe I've used the wrong key words. I've seen loads of examples how to create a "2D array" - I know perl doesn't actually do that, and that it's really an array of references - and to print the values contained therein, but nothing really on how to work with one particular cell by number and alter it.
Below is my conceptual code. The commented lines don't work, but illustrate what I want to achieve. I'm reasonably new to coding perl, and this just seems to advanced for me to understand the examples I've seen and translate it into something I can actually use.
my #datapairs;
while (<DATAFILE>)
{
chomp;
my #data = split(",",$_);
for ($prcount=0; $prcount <=5; $prcount++)
{
for ($othcount=($prcount+1); $othcount<=6; $othcount++)
{
#data[$prcount]=#data[$prcount]+1;
#data[$othcount]=#data[$othcount]+1;
#data[$prcount]=#data[$prcount]-1;
#data[$othcount]=#data[$othcount]-1;
print #data[$prcount]." ".#data[$othcount]."; ";
##datapairs[#data[$prcount]][#data[$othcount]]++;
##datapairs[#data[$othcount]][#data[$prcount]]++;
}
}
}
Any input or suggestions would be much appreciated.
To access a "cell" in a "2-d array" in Perl (as you alredy figured out, it's an array of arrayrefs), is simple:
my #datapairs;
# Add 1 for a pair with indexes $i and $j
$datapairs[$i]->[$j]++;
print that value
print "$datapairs[$i]->[$j]\n";
It's not clear what you mean by "occur together" - if you mean "in the same length-7 array", it's easy:
my #datapairs;
while (<DATAFILE>) {
chomp;
my #data = split(",", $_);
for (my $prcount = 0; $prcount <= 5; $prcount++) {
for (my $othcount = $prcount + 1; $othcount <=6 ; $othcount++) {
$datapairs[ $data[$prcount] ]->[ $data[$othcount] ]++;
}
}
}
# Print
for (my $i = 0; $i < 20; $i++) {
for (my $j = 0; $j < 20; $j++) {
print "$datapairs[$i]->[$j], ";
}
print "\n";
}
As a side note, personally, just for stylistic reasons, I strongly prefer to reference EVERYTHING, e.g. use arrayref of arrayrefs instead of array of arrays. E.g.
my $datapairs;
# Add 1 for a pair with indexes $i and $j
$datapairs->[$i]->[$j]++;
print that value
print "$datapairs->[$i]->[$j]\n";
The second (and third...) arrow dereference operator is optional in Perl but I personally find it significantly more readable to enforce its usage - it spaces out the index expressions.

Perl need the right grep operator to match value of variable

I want to see if I have repeated items in my array, there are over 16.000 so will automate it
There may be other ways but I started with this and, well, would like to finish it unless there is a straightforward command. What I am doing is shifting and pushing from one array into another and this way, check the destination array to see if it is "in array" (like there is such a command in PHP).
So, I got this sub routine and it works with literals, but it doesn't with variables. It is because of the 'eq' or whatever I should need. The 'sourcefile' will contain one or more of the words of the destination array.
// Here I just fetch my file
$listamails = <STDIN>;
# Remove the newlines filename
chomp $listamails;
# open the file, or exit
unless ( open(MAILS, $listamails) ) {
print "Cannot open file \"$listamails\"\n\n";
exit;
}
# Read the list of mails from the file, and store it
# into the array variable #sourcefile
#sourcefile = <MAILS>;
# Close the handle - we've read all the data into #sourcefile now.
close MAILS;
my #destination = ('hi', 'bye');
sub in_array
{
my ($destination,$search_for) = #_;
return grep {$search_for eq $_} #$destination;
}
for($i = 0; $i <=100; $i ++)
{
$elemento = shift #sourcefile;
if(in_array(\#destination, $elemento))
{
print "it is";
}
else
{
print "it aint there";
}
}
Well, if instead of including the $elemento in there I put a 'hi' it does work and also I have printed the value of $elemento which is also 'hi', but when I put the variable, it does not work, and that is because of the 'eq', but I don't know what else to put. If I put == it complains that 'hi' is not a numeric value.
When you want distinct values think hash.
my %seen;
#seen{ #array } = ();
if (keys %seen == #array) {
print "\#array has no duplicate values\n";
}
It's not clear what you want. If your first sentence is the only one that matters ("I want to see if I have repeated items in my array"), then you could use:
my %seen;
if (grep ++$seen{$_} >= 2, #array) {
say "Has duplicates";
}
You said you have a large array, so it might be faster to stop as soon as you find a duplicate.
my %seen;
for (#array) {
if (++$seen{$_} == 2) {
say "Has duplicates";
last;
}
}
By the way, when looking for duplicates in a large number of items, it's much faster to use a strategy based on sorting. After sorting the items, all duplicates will be right next to each other, so to tell if something is a duplicate, all you have to do is compare it with the previous one:
#sorted = sort #sourcefile;
for (my $i = 1; $i < #sorted; ++$i) { # Start at 1 because we'll check the previous one
print "$sorted[$i] is a duplicate!\n" if $sorted[$i] eq $sorted[$i - 1];
}
This will print multiple dupe messages if there are multiple dupes, but you can clean it up.
As eugene y said, hashes are definitely the way to go here. Here's a direct translation of the code you posted to a hash-based method (with a little more Perlishness added along the way):
my #destination = ('hi', 'bye');
my %in_array = map { $_ => 1 } #destination;
for my $i (0 .. 100) {
$elemento = shift #sourcefile;
if(exists $in_array{$elemento})
{
print "it is";
}
else
{
print "it aint there";
}
}
Also, if you mean to check all elements of #sourcefile (as opposed to testing the first 101 elements) against #destination, you should replace the for line with
while (#sourcefile) {
Also also, don't forget to chomp any values read from a file! Lines read from a file have a linebreak at the end of them (the \r\n or \n mentioned in comments on the initial question), which will cause both eq and hash lookups to report that otherwise-matching values are different. This is, most likely, the reason why your code is failing to work correctly in the first place and changing to use sort or hashes won't fix that. First chomp your input to make it work, then use sort or hashes to make it efficient.

How is this Perl code selecting two different elements from an array?

I have inherited some code from a guy whose favorite past time was to shorten every line to its absolute minimum (and sometimes only to make it look cool). His code is hard to understand but I managed to understand (and rewrite) most of it.
Now I have stumbled on a piece of code which, no matter how hard I try, I cannot understand.
my #heads = grep {s/\.txt$//} OSA::Fast::IO::Ls->ls($SysKey,'fo','osr/tiparlo',qr{^\d+\.txt$}) || ();
my #selected_heads = ();
for my $i (0..1) {
$selected_heads[$i] = int rand scalar #heads;
for my $j (0..#heads-1) {
last if (!grep $j eq $_, #selected_heads[0..$i-1]);
$selected_heads[$i] = ($selected_heads[$i] + 1) % #heads; #WTF?
}
my $head_nr = sprintf "%04d", $i;
OSA::Fast::IO::Cp->cp($SysKey,'',"osr/tiparlo/$heads[$selected_heads[$i]].txt","$recdir/heads/$head_nr.txt");
OSA::Fast::IO::Cp->cp($SysKey,'',"osr/tiparlo/$heads[$selected_heads[$i]].cache","$recdir/heads/$head_nr.cache");
}
From what I can understand, this is supposed to be some kind of randomizer, but I never saw a more complex way to achieve randomness. Or are my assumptions wrong? At least, that's what this code is supposed to do. Select 2 random files and copy them.
=== NOTES ===
The OSA Framework is a Framework of our own. They are named after their UNIX counterparts and do some basic testing so that the application does not need to bother with that.
This looks like some C code with Perl syntax. Sometimes knowing the language the person is thinking in helps you figure out what's going on. In this case, the person's brain is infected with the inner workings of memory management, pointer arithmetic, and other low level concerns, so he wants to minutely control everything:
my #selected_heads = ();
# a tricky way to make a two element array
for my $i (0..1) {
# choose a random file
$selected_heads[$i] = int rand #heads;
# for all the files (could use $#heads instead)
for my $j (0..#heads-1) {
# stop if the chosen file is not already in #selected_heads
# it's that damned ! in front of the grep that's mind-warping
last if (!grep $j eq $_, #selected_heads[0..$i-1]);
# if we are this far, the two files we selected are the same
# choose a different file if we're this far
$selected_heads[$i] = ($selected_heads[$i] + 1) % #heads; #WTF?
}
...
}
This is a lot of work because the original programmer either doesn't understand hashes or doesn't like them.
my %selected_heads;
until( keys %selected_heads == 2 )
{
my $try = int rand #heads;
redo if exists $selected_heads{$try};
$selected_heads{$try}++;
}
my #selected_heads = keys %selected_heads;
If you still hate hashes and have Perl 5.10 or later, you can use smart-matching to check if a value is in an array:
my #selected_heads;
until( #selected_heads == 2 )
{
my $try = int rand #heads;
redo if $try ~~ #selected_heads;
push #selected_heads, $try;
}
However, you have a special constraint on this problem. Since you know there are only two elements, you just have to check if the element you want to add is the prior element. In the first case it won't be undef, so the first addition always works. In the second case, it just can't be the last element in the array:
my #selected_heads;
until( #selected_heads == 2 )
{
my $try = int rand #heads;
redo if $try eq $selected_heads[-1];
push #selected_heads, $try;
}
Huh. I can't remember the last time I used until when it actually fit the problem. :)
Note that all of these solutions have the problem that they can cause an infinite loop if the number of original files is less than 2. I'd add a guard condition higher up so the no and single file cases through an error and perhaps the two file case doesn't bother to order them.
Another way you might do this is to shuffle (say, with List::Util) the entire list of original files and just take off the first two files:
use List::Util qw(shuffle);
my #input = 'a' .. 'z';
my #two = ( shuffle( #input ) )[0,1];
print "selected: #two\n";
It selects a random element from #heads.
Then it adds on another random but different element from #heads (if it is the element previously selected, it scrolls through #heads till it find an element not previously selected).
In summary, it selects N (in your case N=2) different random indexes in #heads array and then copies files corresponding to those indexes.
Personally I would write it a bit differently:
# ...
%selected_previously = ();
foreach my $i (0..$N) { # Generalize for N random files instead of 2
my $random_head_index = int rand scalar #heads;
while ($selected_previously[$random_head_index]++) {
$random_head_index = $random_head_index + 1) % #heads; # Cache me!!!
}
# NOTE: "++" in the while() might be considered a bit of a hack
# More readable version: $selected_previously[$random_head_index]=1; here.
The part you labeled "WTF" isn't so troubling, it's just simply making sure that $selected_heads[$i] remains as a valid subscript of #head. The really troubling part is that it is a pretty inefficient way of making sure he's not selecting the same file.
Then again, if the size of #heads is small, stepping from 0..$#heads is probably more efficient than just generating int rand( 2 ) and testing if they are the same.
But basically it copies two files at random (why?) as a '.txt' file and a '.cache' file.
How about just
for my $i (0..1) {
my $selected = splice( #heads, rand #heads, 1 );
my $head_nr = sprintf "%04d", $i;
OSA::Fast::IO::Cp->cp($SysKey,'',"osr/tiparlo/$selected.txt","$recdir/heads/$head_nr.txt");
OSA::Fast::IO::Cp->cp($SysKey,'',"osr/tiparlo/$selected.cache","$recdir/heads/$head_nr.cache");
}
unless #heads or #selected_heads are used later.
Here's yet another way to select 2 unique random indices:
my #selected_heads = ();
my #indices = 0..$#heads;
for my $i (0..1) {
my $j = int rand (#heads - $i);
push #selected_heads, $indices[$j];
$indices[$j] = $indices[#heads - $i - 1];
}