C-style loops vs Perl loops (in Perl)

I feel like there is something I don't get about Perl's looping mechanism.
It was my understanding that
for my $j (0 .. $#arr){...}
was functionally equivalent to:
for(my $i=0; $i<=$#arr; $i++){..}
However, in my code there seem to be some slight differences in the way they operate, specifically in when they decide to terminate. For example:
Assume @arr is initialized with one element in it.
These two blocks should do the same thing, right?
for my $i (0 .. $#arr)
{
if(some condition that happens to be true)
{
push(@arr, $value);
}
}
and
for (my $i=0; $i<=$#arr; $i++)
{
if(some condition that happens to be true)
{
push(@arr, $value);
}
}
In execution however, even though a new value gets pushed in both cases, the first loop will stop after only one iteration.
Is this supposed to happen? If so, why?
EDIT: Thank you for all of your answers. I am aware I can accomplish the same thing with other looping mechanisms. When I asked if there was another syntax, I was specifically talking about using for. Obviously there isn't, as the syntax to do what I want already exists in the C style. I was only asking because I was told to avoid the C style, but I still like my for loops.

$i<=$#arr is evaluated before each iteration, while (0 .. $#arr) is evaluated once, before the first iteration.
As such, the first loop doesn't "see" the changes to the size of @arr.
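A minimal sketch of that difference, assuming a one-element array and a push that is capped at five elements so the C-style loop terminates. The cap and the variable names are illustrative and not taken from the question:

```perl
use strict;
use warnings;

# foreach over a range: the list 0 .. $#range is built once, up front.
my @range = (1);
my $range_iters = 0;
for my $i (0 .. $#range) {
    $range_iters++;
    push @range, $i if @range < 5;   # grows the array, but the list is already fixed
}

# C-style: the condition $i <= $#cstyle is re-checked before every iteration.
my @cstyle = (1);
my $c_iters = 0;
for (my $i = 0; $i <= $#cstyle; $i++) {
    $c_iters++;
    push @cstyle, $i if @cstyle < 5; # each push raises $#cstyle, so the loop goes on
}

print "range: $range_iters, c-style: $c_iters\n";   # prints "range: 1, c-style: 5"
```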

Is there another syntax I can use that would force the evaluation after each iteration? (besides using c-style)
for (my $i=0; $i<=$#arr; $i++) {
...
}
is just another way of writing
my $i=0;
while ($i<=$#arr) {
...
} continue {
$i++;
}
(Except the scope of $i is slightly different.)
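To illustrate why the increment belongs in a continue block rather than at the end of the loop body, here is a small sketch with made-up data. A next inside the body still reaches the continue block, just as it would still run the third expression of a C-style for:

```perl
use strict;
use warnings;

my @arr = (10, 20, 30);
my @visited;

my $i = 0;
while ($i <= $#arr) {
    next if $arr[$i] == 20;      # jumps to the continue block, not past it
    push @visited, $arr[$i];
} continue {
    $i++;                        # plays the role of the for loop's third expression
}

print "@visited\n";              # prints "10 30"
```

Had the $i++ been the last statement of the body instead, the next would skip it and the loop would never advance past index 1.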

An alternative would be the do-while construct, although it is a little ungainly. Note that the body always runs at least once, even if @arr is empty.
my $i = 0;
do {
push @arr, $value if condition;
} while ( ++$i < @arr );

Related

Reducing code verbosity and efficiency

I came across the code below, where some heavy stipulations were applied; in the end we have a number of @hits and need to return just one:
if ($#hits > 0)
{
my $highestScore = 0;
my $chosenMatch = "";
for my $hit (@hits)
{
my $currScore = 0;
foreach my $k (keys %{$hit})
{
next if $k eq $retColumn;
$currScore++ if ($hit->{$k} =~ /\S+/);
}
if ($currScore > $highestScore)
{
$chosenMatch = $hit;
$highestScore = $currScore;
}
}
return ($chosenMatch);
}
elsif ($#hits == 0)
{
return ($hits[0]);
}
That's an eyeful, and I was hoping to simplify the above code. I came up with:
return reduce {grep /\S+/, values %{$a} > grep /\S+/, values %{$b} ? $a : $b} @matches;
That was after adding use List::Util, of course.
I wonder if the terse version is any efficient and/or advantage over the original one. Also, there's one condition that's skipped: if $k eq $retColumn, how can I efficiently get that in?
There is a famous quote:
"Premature optimisation is the root of all evil" - Donald Knuth
It is almost invariably the case that making code more concise really doesn't make much difference to the efficiency, and can cause significant penalties to readability and maintainability.
Algorithm is important, code layout ... isn't really. Things like reduce, map and grep are still looping - they're just doing so behind the scenes. You've gained almost no efficiency by using them, you've just saved some bytes in your file. That's fine if they make your code more clear, but that should be your foremost consideration.
Please - keep things clear first, foremost and always. Make your algorithm good. Don't worry about replacing an explicit loop with a grep or map unless these things make your code clearer.
And in the interests of being constructive:
use strict and warnings is really important. Really really important.
To answer your original question:
I wonder if the terse version is any efficient and/or advantage over the original one
No, I think if anything the opposite. Short of profiling code speed, the rule of thumb is look at number and size of loops - a single chunk of code rarely makes much difference, but running it lots and lots of times (unnecessarily) is where you get your inefficiency.
In your first example - you have two loops, a foreach loop inside a for loop. It looks like you traverse your @hits data structure once, and 'unwrap' it to get at the inner layers.
In your second example, both your greps are loops, and your reduce is as well. If I'm reading it correctly, then it'll be traversing your data structure multiple times. (Because you are grepping the values of $a and $b - these will be applied several times.)
So I don't think you have gained either readability or efficiency by doing what you've done. But you have made a function that's going to make future maintenance programmers have to think really hard. To take another quote:
"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" - Brian Kernighan
I wonder if the terse version is any efficient and/or advantage over the original one
The terse version is less efficient than the original because it calculates the score of every element twice, but it does have readability advantages.
The following keeps the readability gain (and even adds some):
sub get_score {
my ($match) = @_;
my @keys = grep { $_ ne $retColumn } keys %$match;
my $score = grep { /\S/ } @{$match}{ @keys };
return $score;
}
return reduce { get_score($a) > get_score($b) ? $a : $b } @matches;
You can look at any part of that sub and understand it without looking around. The less context you need to understand code, the more readable it is.
If you did need an efficiency boost, you can avoid calling get_score on every input twice by using a Schwartzian Transform. As with many optimizations, you will take a readability hit, but at least it's idiomatic (well known and thus well recognizable).
return
map { $_->[0] }
reduce { $a->[1] > $b->[1] ? $a : $b }
map { [ $_, get_score($_) ] }
@matches;
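For reference, a self-contained sketch of the same transform with sample data. The records, the $ret_column variable (standing in for $retColumn) and the %calls counter are assumptions added for illustration; the counter is only there to show that each match is scored exactly once:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my $ret_column = 'id';   # hypothetical stand-in for $retColumn

# Made-up sample records; the score is the number of non-blank fields.
my @matches = (
    { id => 1, name => 'a', city => '' },
    { id => 2, name => 'b', city => 'x', zip => '1' },
    { id => 3, name => ' ', city => '' },
);

my %calls;   # how many times each match gets scored

sub get_score {
    my ($match) = @_;
    $calls{ $match->{id} }++;
    my @keys = grep { $_ ne $ret_column } keys %$match;
    return scalar grep { /\S/ } @{$match}{@keys};
}

# Decorate each match with its score, pick the best pair, then undecorate.
my ($best) =
    map    { $_->[0] }
    reduce { $a->[1] > $b->[1] ? $a : $b }
    map    { [ $_, get_score($_) ] }
    @matches;

print "best id: $best->{id}\n";   # prints "best id: 2"
```

One difference from the original loop is worth knowing: with > in the reduce block a tie goes to the later element, whereas the original loop keeps the first highest score. Using >= instead would keep the earlier one.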

Perl explicit package in nested loops

I declared this array:
my @array;
And in this code block...
for (my $i=0; $i<$j; $i++) {
do {
# stdout operations
} while (! ($arr != 1 ));
}
The error is specifically in the } while (! ($arr != 1 )); line.
Here's the specific error:
Global symbol "$arr" requires explicit package name at exer4bernal.pl line 71.
Why do I have this problem in 2 levels of nested loops? I never had this in only 1 level. What should I change to fix this? Thanks!
What you are seeing is totally unrelated to the nesting of loops. What Perl is trying to tell you is that it doesn't know about the $arr variable. Did you mean @array or $#array?
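As a quick reminder of how those names differ, here is a small sketch with an illustrative array (the values are made up):

```perl
use strict;
use warnings;

my @array = (4, 5, 6);

my $count      = @array;      # array in scalar context: number of elements (3)
my $last_index = $#array;     # index of the last element (2)
my $first      = $array[0];   # one element of @array (4)

# A scalar named $array or $arr would be a separate variable
# and would need its own "my" declaration under use strict.
print "$count $last_index $first\n";   # prints "3 2 4"
```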
Normally, you shouldn't be using do...while blocks. What is $arr? Where is that value declared? Where is it changed in your while loop? What is $j?
Actually, what are you trying to do with a double loop? This is usually considered bad programming because when you move from processing x elements to y elements, you increase your processing time by y^2 - (x * y).
Maybe this is more what you mean?
for my $index (0 .. $#array) {
next if $array[$index] == 1;
...
}
Note I got rid of that ugly C Style for loop and replaced it with one that's easier to understand.

Perl need the right grep operator to match value of variable

I want to see if I have repeated items in my array. There are over 16,000, so I will automate it.
There may be other ways but I started with this and, well, would like to finish it unless there is a straightforward command. What I am doing is shifting and pushing from one array into another and this way, check the destination array to see if it is "in array" (like there is such a command in PHP).
So, I got this sub routine and it works with literals, but it doesn't with variables. It is because of the 'eq' or whatever I should need. The 'sourcefile' will contain one or more of the words of the destination array.
# Here I just fetch my file
$listamails = <STDIN>;
# Remove the newlines filename
chomp $listamails;
# open the file, or exit
unless ( open(MAILS, $listamails) ) {
print "Cannot open file \"$listamails\"\n\n";
exit;
}
# Read the list of mails from the file, and store it
# into the array variable @sourcefile
@sourcefile = <MAILS>;
# Close the handle - we've read all the data into @sourcefile now.
close MAILS;
my @destination = ('hi', 'bye');
sub in_array
{
my ($destination,$search_for) = @_;
return grep {$search_for eq $_} @$destination;
}
for($i = 0; $i <=100; $i ++)
{
$elemento = shift @sourcefile;
if(in_array(\@destination, $elemento))
{
print "it is";
}
else
{
print "it aint there";
}
}
Well, if instead of including the $elemento in there I put a 'hi' it does work and also I have printed the value of $elemento which is also 'hi', but when I put the variable, it does not work, and that is because of the 'eq', but I don't know what else to put. If I put == it complains that 'hi' is not a numeric value.
When you want distinct values think hash.
my %seen;
@seen{ @array } = ();
if (keys %seen == @array) {
print "\@array has no duplicate values\n";
}
It's not clear what you want. If your first sentence is the only one that matters ("I want to see if I have repeated items in my array"), then you could use:
my %seen;
if (grep ++$seen{$_} >= 2, @array) {
say "Has duplicates";
}
You said you have a large array, so it might be faster to stop as soon as you find a duplicate.
my %seen;
for (@array) {
if (++$seen{$_} == 2) {
say "Has duplicates";
last;
}
}
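If the goal is to know which values are repeated rather than just whether any are, the same counting hash can be reused. A minimal sketch with made-up data, reporting each duplicated value once, at the moment its count reaches two:

```perl
use strict;
use warnings;

my @array = qw(a b a c b a);   # illustrative data

my %seen;
# A value passes the filter only on its second occurrence,
# so each duplicated value appears once in @dupes.
my @dupes = grep { ++$seen{$_} == 2 } @array;

print "@dupes\n";   # prints "a b"
```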
By the way, when looking for duplicates in a large number of items, a strategy based on sorting is much faster than scanning the whole list for every item. After sorting the items, all duplicates will be right next to each other, so to tell if something is a duplicate, all you have to do is compare it with the previous one:
my @sorted = sort @sourcefile;
for (my $i = 1; $i < @sorted; ++$i) { # Start at 1 because we'll check the previous one
print "$sorted[$i] is a duplicate!\n" if $sorted[$i] eq $sorted[$i - 1];
}
This will print multiple dupe messages if there are multiple dupes, but you can clean it up.
As eugene y said, hashes are definitely the way to go here. Here's a direct translation of the code you posted to a hash-based method (with a little more Perlishness added along the way):
my @destination = ('hi', 'bye');
my %in_array = map { $_ => 1 } @destination;
for my $i (0 .. 100) {
$elemento = shift @sourcefile;
if(exists $in_array{$elemento})
{
print "it is";
}
else
{
print "it aint there";
}
}
Also, if you mean to check all elements of @sourcefile (as opposed to testing the first 101 elements) against @destination, you should replace the for line with
while (@sourcefile) {
Also also, don't forget to chomp any values read from a file! Lines read from a file have a linebreak at the end of them (the \r\n or \n mentioned in comments on the initial question), which will cause both eq and hash lookups to report that otherwise-matching values are different. This is, most likely, the reason why your code is failing to work correctly in the first place and changing to use sort or hashes won't fix that. First chomp your input to make it work, then use sort or hashes to make it efficient.
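A short sketch of the effect, using an in-memory list in place of the lines read from the MAILS handle (the sample lines are made up):

```perl
use strict;
use warnings;

my %in_array = map { $_ => 1 } ('hi', 'bye');

# Lines as they would come back from a filehandle: newline still attached.
my @lines = ("hi\n", "ciao\n", "bye\n");

# Without chomp, "hi\n" is not the same key as "hi", so nothing matches.
my @raw_hits = grep { exists $in_array{$_} } @lines;

# chomp on a copy of the list strips the trailing newline from every element.
chomp(my @clean = @lines);
my @clean_hits = grep { exists $in_array{$_} } @clean;

print scalar(@raw_hits), " ", scalar(@clean_hits), "\n";   # prints "0 2"
```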

Is there a cleaner way to conditionally 'last' out of this Perl loop?

Not really knowing Perl, I have been enhancing a Perl script with help from a friendly search engine.
I find that I need to break out of a loop while setting a flag if a condition comes true:
foreach my $element (@array) {
if($costlyCondition) {
$flag = 1;
last;
}
}
I know that the nicer way to use 'last' is something like this:
foreach my $element (@array) {
last if ($costlyCondition);
}
Of course, this means that while I can enjoy the syntactic sugar, I cannot set my flag inside the loop, which means I need to evaluate $costlyCondition once again outside.
Is there a cleaner way to do this?
You can use a do {...} block:
do {$flag = 1; last} if $costlyCondition;
You can use the , operator to join the statements:
$flag = 1, last if $costlyCondition;
You can do the same with the logical && operator:
(($flag = 1) && last) if $costlyCondition;
Or even the lower-precedence and:
(($flag = 1) and last) if $costlyCondition;
At the end of the day, there's no real reason to do any of these. They all do exactly the same as your original code. If your original code works and is legible, leave it as it is.
I agree with Nathan, that while neat looking code is neat, sometimes a readable version is better. Just for the hell of it, though, here's a horrible version:
last if $flag = $costly_condition;
Note the use of assignment = instead of equality ==. The assignment will return whatever value is in $costly_condition.
This of course will not make $flag = 1, but whatever $costly_condition is. But, since that needs to be true, so will $flag. To remedy that, you can - as Zaid mentioned in the comments - use:
last if $flag = !! $costly_condition;
As mentioned, pretty horrible solutions, but they do work.
One thought is to do the loop in a subroutine that returns different values depending on the exit point.
my $flag = check_elements(\@array);
# later...
sub check_elements {
my $arrayref = shift;
for my $ele (@$arrayref) {
return 1 if $costly_condition;
}
return 0;
}
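A related option, not mentioned in the answers above, is any from List::Util (available from List::Util 1.33, which ships with Perl 5.20 and later). It stops at the first element for which the block is true, so the costly test is not repeated. In this sketch is_costly, the sample data and the $checks counter are hypothetical and only illustrate the short-circuit:

```perl
use strict;
use warnings;
use List::Util qw(any);

my @array  = (3, 8, 15, 22);   # illustrative data
my $checks = 0;                # counts how often the costly test runs

# Hypothetical stand-in for the costly condition.
sub is_costly {
    my ($element) = @_;
    $checks++;
    return $element > 10;
}

# any stops at the first true result, so 22 is never tested.
my $flag = (any { is_costly($_) } @array) ? 1 : 0;

print "flag=$flag checks=$checks\n";   # prints "flag=1 checks=3"
```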
This is possible, but strongly discouraged: such tricks decrease the readability of your code.
foreach my $element (@array) {
$flag = 1 and last if $costlyCondition;
}

Perl programming: continue block

I have just started learning Perl scripting language and have a question.
In Perl, what is the logical reason for having continue block work with while and do while loops, but not with for loop?
From http://perldoc.perl.org/functions/continue.html
"If there is a continue BLOCK attached to a BLOCK (typically in a while or foreach), it is always executed just before the conditional is about to be evaluated again, just like the third part of a for loop in C."
Meaning that in the for loop, the third argument IS the continue expression, e.g. for (initialization; condition; continue), so therefore it is not needed. On the other hand, if you use for in the foreach style, such as:
my $i = 0;
for (0 .. 10) {
print "$i\n";
} continue { $i++ }
It will be acceptable.
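A small sketch of a foreach-style loop with a continue block, using made-up values. It shows that the continue block runs after every iteration, including those cut short by next:

```perl
use strict;
use warnings;

my @log;
for my $n (1 .. 4) {
    next if $n % 2;              # odd numbers skip the rest of the body
    push @log, "body $n";
} continue {
    push @log, "continue $n";    # still runs after next; $n is visible here
}

print join(", ", @log), "\n";
# prints "continue 1, body 2, continue 2, continue 3, body 4, continue 4"
```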
I suspect that the continue block isn't used in for loops since it is exactly equivalent to the for loop's 3rd expression (increment/decrement, etc.)
eg. the following blocks of code are mostly equivalent:
for ($i = 0; $i < 10; $i++)
{
}
$i = 0;
while ($i < 10)
{
}
continue
{
$i++;
}
You can use a continue block everywhere it makes sense: with while, until and foreach loops, as well as 'basic' blocks -- blocks that aren't part of another statement. Note that you can use the keyword for instead of foreach for the list iteration construct, and of course you can have a continue block in that case.
As everybody else said, for (;;) loops already have a continue part -- which one would you want to execute first?
continue blocks also don't work with do { ... } while ... because syntactically that's a very different thing (do is a builtin function taking a BLOCK as its argument, and the while part is a statement modifier). I suppose you could use the double curly construct with them (basic block inside argument block), if you really had to:
do {
{
...;
next if $bad;
...;
}
continue {
...; # clean up
}
} while $more;
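A runnable sketch of that double curly construct, with a hypothetical queue standing in for the real work. The inner bare block counts as a loop that runs once, so next leaves it early and its continue block still runs before the while condition is tested:

```perl
use strict;
use warnings;

my @queue = (1, 2, 3);   # illustrative work items
my @log;

do {
    {
        my $item = shift @queue;
        next if $item == 2;            # leaves the inner block early
        push @log, "process $item";
    }
    continue {
        push @log, "cleanup";          # runs whether or not next was called
    }
} while @queue;

print join(", ", @log), "\n";
# prints "process 1, cleanup, cleanup, process 3, cleanup"
```

Using last instead of next in the inner block would skip the continue block as well.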