Perl my and our variable in for loop gives misleading results [duplicate] - perl

Output should be aaabbbcccdddeee, not aaa555234dddeee
perl -wle'
map { for (my $i =2; $i <5; $i++) { push @a, $_ } } "a".."e";
print @a
'
aaa555234dddeee
I get the expected behavior only when I take a reference to $_ and immediately dereference it:
perl -wle'
map { for (my $i =2; $i <5; $i++) { push @a, ${\$_} } } "a".."e";
print @a
'
aaabbbcccdddeee
Is this a documented bug (or feature)?

Looks to me like this is an instance of a bug reported just last month (but that has been around for 12 years): https://rt.perl.org/Public/Bug/Display.html?id=123285
It has been fixed, but I don't believe the fix is in any released version yet.
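Independent of that bug, when the goal is simply to repeat each element a fixed number of times, the list repetition operator avoids the C-style loop inside map altogether. A minimal sketch (the helper name repeat_each is made up for illustration, and this neither diagnoses nor reproduces the bug itself):

```perl
use strict;
use warnings;

# Hypothetical helper: repeat every element of a list $times times.
sub repeat_each {
    my ($times, @items) = @_;
    # ($_) x $times builds a list containing $_ repeated $times times
    return map { ($_) x $times } @items;
}

print join('', repeat_each(3, 'a' .. 'e')), "\n";   # prints aaabbbcccdddeee
```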

Related

Variable Scope outside foreach loop Perl

Here is the problem:
Generating 10 iterations of 50 iterations and accessing the 50 character string outside the inner foreach loop.
I have tried putting the 50x iteration inside a sub function and calling it, but that was unsuccessful.
Thus far, I only get a single character outside the foreach loop whether it's in a sub function or not. I'm fairly certain this is a scope issue that I'm failing to see.
So, code:
#!/usr/bin/perl
use strict;
use warnings;
my @dna = ('A','G','T','C');
my $i;
my $str;
for ($i=1; $i<11; $i++){
#print $i . " ";
foreach(1..50){
my $nt = int(rand $#dna + 1);
$str = $dna[$nt];
#correct here all 50 nts
print $str;
}
#single nt here
#print $str;
print "\n";
}
Output: Correct, but I need to access $str as shown below, outside the foreach loop but inside the outer for loop.
TGATTAGCGTCCGCGCGTATTGTATTAAGCCACAGAATGTAATGCCAAGA
GCTATAGGAAGACGCCGATCCCTGGACCGGCACAGGCACGGTAACAGCAG
TTGTTGTAGGATCCCAGGGAGCGAAGCACGTGAACTGCGACTAATTTCAA
TAACCAGGCAACACTAAACAGCTCCCATGTGTAAGGACGTATAGGCAGTT
GTAATTGTAGATCACAAAATTTACACGGTATAGCATTAACTGGAACCTGC
AACAGTGCCGTTTATTAATCTCCTCTAGTGTAGGGACGAATCGACCACGG
CGTGAGCAAGCACAAATATCCTTTAGGGGTGTGCTTAAAACACCCAGTAG
GAGTTCATAGGCCAACAATATGGCAAAGCCTTGCCCCATCAAATTCGGCG
TTGCGTCTGCGAACACTGTTGGTGTGCCTTTAGTGCGGGTTACTCGAGAA
CGCGATCTCCGTTTATAACGCTAGCAAACTACTACGGACCGAGGCATCGC
I removed the extra space in the string. It was superfluous.
This was another attempt at getting to the variable to no avail:
use strict;
use warnings;
my $str;
my @dna = ('A','G','T','C');
for (my $i=1; $i<11; $i++){
fifty();
print $str;
}
sub fifty {
foreach (1 .. 50){
my $nt = int(rand $#dna + 1);
$str = $dna[$nt];
return $str;
}
}
for (my $i=1; $i<11; $i++){
fifty();
In fifty you return something, but you discard it, because there is no assignment such as $str = fifty();
print $str;
}
And here you print something that appears to have no value yet. In fact you do assign a value inside fifty, but you shouldn't use global variables.
sub fifty {
foreach (1 .. 50){
my $nt = int(rand $#dna + 1);
$str = $dna[$nt];
Here you discard whatever is in $str and assign a single letter instead. You are also assigning to a global variable, which you should avoid.
return $str;
}
}
And here you leave fifty immediately, on the first iteration, returning just that one character, which (see above) you then discard.
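Putting those remarks together, one way to write it is to build the string in a lexical variable inside the sub, append to it with .=, and return it only after the loop has finished. This is a sketch under assumptions: the name random_seq and its length parameter are illustrative, not from the original post.

```perl
use strict;
use warnings;

my @dna = ('A', 'G', 'T', 'C');

# Hypothetical helper: build and return a random sequence of $length nucleotides.
sub random_seq {
    my ($length) = @_;
    my $seq = '';
    # Append one random nucleotide per iteration instead of overwriting
    $seq .= $dna[ int rand @dna ] for 1 .. $length;
    return $seq;   # return only after the loop is done
}

for my $i (1 .. 10) {
    my $str = random_seq(50);   # the full 50-character string is available here
    print "$str\n";
}
```

The caller assigns the return value to its own variable, so no global $str is needed.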
I found this to work perfectly. It turned out to be a scope issue as far as I could tell, and I'm not sure why I was stuck. Regardless, moving on now.
#!/usr/bin/perl
use strict;
use warnings;
my @dna = ('A','G','T','C');
my $i;
my $str;
for ($i=1; $i<11; $i++){
my $filename = "seq_" . $i;
open(my $OUT, '>', $filename) or die("Can't open $filename($!)");
foreach(1..50){
my $nt = int(rand $#dna + 1);
$str = $dna[$nt];
print $OUT $str;
}
close $OUT;
}

perl cookbook fixstyle2 perplexed by & 1

This is again from the Perl Cookbook. I know what this program does and I understand most of it, but the code below escapes me.
It is basically an if/else, but what does ( $i++ & 1 ) mean?
#!/usr/bin/perl -w
# fixstyle2 - like fixstyle but faster for many many matches
use strict;
my $verbose = (@ARGV && $ARGV[0] eq '-v' && shift);
my %change = ();
while (<DATA>) {
chomp;
my ($in, $out) = split /\s*=>\s*/;
next unless $in && $out;
$change{$in} = $out;
}
if (@ARGV) {
$^I = ".orig";
} else {
warn "$0: Reading from stdin\n" if -t STDIN;
}
while (<>) {
my $i = 0;
s/^(\s+)// && print $1; # emit leading whitespace
for (split /(\s+)/, $_, -1) { # preserve trailing whitespace
print( ($i++ & 1) ? $_ : ($change{$_} || $_));
}
}
__END__
analysed => analyzed
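$i++ returns the value of $i and increments $i afterwards. & is the bitwise AND operator, so the expression takes that value of $i and tests its lowest bit (1 in binary is 00..01).
As $i is incremented by 1 in each iteration, its lowest bit alternates between 0 and 1, so the expression simply distinguishes even-numbered from odd-numbered fields. Because the split pattern contains capturing parentheses, the returned list alternates between words (even positions) and the whitespace that separated them (odd positions), so only the words are looked up in %change.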
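To see the alternation concretely, the following sketch labels each field that split returns, using the same ($i++ & 1) test. The helper name classify_fields and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: label each field produced by split as "word" or "sep".
sub classify_fields {
    my ($line) = @_;
    my $i = 0;
    my @labels;
    # The capturing parentheses make split return the separators too
    for my $field (split /(\s+)/, $line, -1) {
        # Lowest bit set means an odd position, i.e. a whitespace separator
        push @labels, ($i++ & 1) ? 'sep' : 'word';
    }
    return @labels;
}

print join(',', classify_fields("foo  bar baz")), "\n";   # prints word,sep,word,sep,word
```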

What am I not getting about foreach loops?

It was always my understanding that
foreach (@arr)
{
....
}
and
for(my $i=0; $i<@arr; $i++)
{
.....
}
were functionally equivalent.
However, in all of my code, whenever I use a foreach loop I run into problems that get fixed when I change to a for loop. It always has to do with comparing the values of two things, usually with nested loops.
Here is an example:
for(my $i=0; $i<@files; $i++)
{
my $sel;
foreach (@selected)
{
if(files[$i] eq selected[$_])
{
$selected='selected';
}
}
<option value=$Files[$i] $sel>$files[$i]</option>
}
The above code falls between select tags in a cgi program.
Basically I am editing the contents of a select box according to user specifications.
But after they add or delete choices I want the choices that were originally selected to remain selected.
The above code is supposed to accomplish this when reassembling the select on the next form. However, with the foreach version it only gets the first choice that's selected and skips the rest. If I switch it to a 3 part for loop, without changing anything else, it will work as intended.
This is only a recent example, so clearly I am missing something here, can anyone help me out?
Let's assume that @files is a list of filenames.
In the following code, $i is the array index (i.e. it's an integer):
for (my $i=0; $i<@files; $i++) { ... }
In the following code, $i is set to each array item in turn (i.e. it's a filename):
foreach my $i (@files) { ... }
So for example:
use strict;
use warnings;
my @files = (
'foo.txt',
'bar.txt',
'baz.txt',
);
print "for...\n";
for (my $i=0; $i<@files; $i++) {
print "\$i is $i.\n";
}
print "foreach...\n";
foreach my $i (@files) {
print "\$i is $i.\n";
}
Produces the following output:
for...
$i is 0.
$i is 1.
$i is 2.
foreach...
$i is foo.txt.
$i is bar.txt.
$i is baz.txt.
foreach loops are generally preferred for looping through arrays to avoid accidental off-by-one errors caused by things like for (my $i=1;...;...) or for (my $i=0;$i<=@arr;...).
That said, for and foreach are actually implemented as synonyms in Perl, so the following script produces identical output to my previous example:
use strict;
use warnings;
my @files = (
'foo.txt',
'bar.txt',
'baz.txt',
);
print "for...\n";
foreach (my $i=0; $i<@files; $i++) {
print "\$i is $i.\n";
}
print "foreach...\n";
for my $i (@files) {
print "\$i is $i.\n";
}
It is simply customary to refer to the second type of loop as a foreach loop, even if the source code uses the keyword for to perform the loop (as has become quite common).
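Applied to the select-box example from the question, this means that inside foreach (@selected) the loop variable already holds a selected filename, so it must be compared directly rather than used as an index. A minimal sketch, assuming both arrays hold plain filenames (the helper name build_options and the sample data are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: build <option> tags, marking previously selected files.
sub build_options {
    my ($files, $selected) = @_;
    my @html;
    for my $file (@$files) {
        my $sel = '';
        for my $chosen (@$selected) {
            # $chosen is the element itself, not an index into @$selected
            $sel = ' selected' if $file eq $chosen;
        }
        push @html, qq{<option value="$file"$sel>$file</option>};
    }
    return @html;
}

print "$_\n" for build_options(['a.txt', 'b.txt', 'c.txt'], ['a.txt', 'c.txt']);
```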

Perl - Use of uninitialized value in string

I started teaching myself Perl, and with the help of some Googling, I was able to throw together a script that prints out the file extensions in a given directory. The code works well; however, it will sometimes complain with the following warning:
Use of uninitialized value $exts[xx] in string eq at get_file_exts.plx
I tried to correct this by initializing my array as follows: my @exts = (); but this did not work as expected.
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
#Check for correct number of arguments
if(@ARGV != 1) {
print "ERROR: Incorrect syntax...\n";
print "Usage: perl get_file_exts.plx <Directory>\n";
exit 0;
}
#Search through directory
find({ wanted => \&process_file, no_chdir => 1 }, @ARGV);
my @exts;
sub process_file {
if (-f $_) {
#print "File: $_\n";
#Get extension
my ($ext) = $_ =~ /(\.[^.]+)$/;
#Add first extension
if(scalar @exts == 0) {
push(@exts, $ext);
}
#Loop through array
foreach my $index (0..$#exts) {
#Check for match
if($exts[$index] eq $ext) {
last;
}
if($index == $#exts) {
push(@exts, $ext);
}
}
} else {
#print "Searching $_\n";
}
}
#Sort array
@exts = sort(@exts);
#Print contents
print ("@exts", "\n");
You need to test whether you actually found an extension.
Also, you should not be indexing into your array, and you do not need to special-case the first push; just push. Index juggling is not the Perl way. Your subroutine should look something like this:
sub process_file {
if (-f $_) {
#print "File: $_\n";
#Get extension
my ($ext) = $_ =~ /(\.[^.]+)$/;
# If we found an extension, and we have not seen it before, add it to @exts
if ($ext) {
#Loop through array to see if this is a new extension
my $newExt = 1;
for my $seenExt (@exts) {
#Check for match
if ($seenExt eq $ext) {
$newExt = 0;
last;
}
}
if ($newExt) {
push @exts, $ext;
}
}
}
}
}
But what you really want to do is use a hash to record whether you have seen an extension:
# Declare this before find(...) if you want to initialize it, or you will
# clobber the contents
my %sawExt;
sub process_file {
if (-f $_) {
#print "File: $_\n";
# Get extension
my ($ext) = $_ =~ /(\.[^.]+)$/;
# If we have an extension, mark that we've seen it
$sawExt{$ext} = 1
if $ext;
}
}
# Print the extensions we've seen in sorted order
print join(' ',sort keys %sawExt) . "\n";
Or even
sub process_file {
if (-f $_ && $_ =~ /(\.[^.]+)$/) {
$sawExt{$1} = 1;
}
}
Or
sub process_file {
$sawExt{$1} = 1
if -f && /(\.[^.]+)$/;
}
Once you start thinking in Perl, this is the natural way to write it.
The warning is complaining about the contents of $exts[xx], not about @exts itself.
$ext can be undef when the filename doesn't match your regexp, for instance README.
Try something like:
my ($ext) = $_ =~ /(\.[^.]+)$/ or return;
The main problem is that you aren't accounting for file names that don't contain a dot, so
my ($ext) = $_ =~ /(\.[^.]+)$/;
sets $ext to undef.
Despite the warning, processing continues by evaluating undef as the null string, failing to find that in @exts, and so percolating undef to the array as well.
The minimal change to get your code working is to replace
my ($ext) = $_ =~ /(\.[^.]+)$/;
with
return unless /(\.[^.]+)$/;
my $ext = $1;
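The undef case is easy to see in isolation. The sketch below wraps the same pattern in a small function and runs it on a few made-up file names; the helper name extension_of is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last extension of a file name, or undef.
sub extension_of {
    my ($name) = @_;
    # $1 is only meaningful when the match succeeded
    return $name =~ /(\.[^.]+)$/ ? $1 : undef;
}

for my $name ('notes.txt', 'archive.tar.gz', 'README') {
    my $ext = extension_of($name);
    print "$name: ", (defined($ext) ? $ext : '(no extension)'), "\n";
}
```

Testing with defined before comparing or pushing is what keeps undef out of the array.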
But there are a couple of Perl lessons to be learned here. It used to be taught that good programs were well-commented programs. That was in the days of having to write efficient but incomprehensible code, and it is no longer true. You should write code that is as clear as possible, and add comments only if you absolutely have to write something that isn't self-explanatory.
You should remember and use Perl idioms, and try to forget most C that you knew. For instance, Perl accepts the "here document" syntax, and it is common practice to use or and and as short-circuit operators. Your parameter check becomes
@ARGV or die <<END;
ERROR: Incorrect syntax...
Usage: perl get_file_exts.plx <Directory>
END
Perl allows for clear but concise programming. This is how I would have written your wanted subroutine
sub process_file {
return unless -f and /(\.[^.]+)$/;
my $ext = $1;
foreach my $index (0 .. $#exts) {
return if $exts[$index] eq $ext;
}
push @exts, $ext;
}
Use exists on $exts[xx] before accessing it.
exists on arrays is deprecated, though, as @chrsblck pointed out:
Be aware that calling exists on array values is deprecated and likely
to be removed in a future version of Perl.
But you should be able to check that it exists (and is not 0 or "") simply with:
if($exts[$index] && $exts[$index] eq $ext){
...
}

Perl IF statement not matching variables in REGEX

my $pointer = 0;
foreach (@new1)
{
my $test = $_;
foreach (@chk)
{
my $check = $_;
chomp $check;
delete($new1[$pointer]) if ($test =~ /^$check/i);
}
$pointer++;
}
The if statement never matches, despite the fact that many entries in the @new1 array do contain $check at the start of the array element (88 at least).
I am not sure it is the nested loop that is causing the problem, because if I try this it also fails to match:
foreach (@chk)
{
@final = (grep /^$_/, @new1);
}
@final is empty but I know at least 88 entries for $_ are in @new1.
I wrote this code on a machine running Windows ActivePerl 5.14.2 and the top code works. I then (using a copy of @new1) compare the two and remove any duplicates (also works on 5.14.2). I did try to negate the if match but that seemed to wipe out the @new1 array (so that I didn't need to do a hash compare).
When I try to run this code on a Linux RedHat box with Perl 5.8.0 it seems to struggle with the variable matching in the regex. If I hard-code the regex with an example I know is in @new1, the match works and in the first code the entry is deleted (in the second one the value is inserted in @final).
The @chk array is a listing file on the web server and the @new1 array is created by opening two log files on the web server and then pushing one into the other.
I had even gone to the trouble of printing out $test and $check in each loop iteration and manually checking to see if any of the values did match, and some of them do.
It has had me baffled for days now and I have had to throw the towel in and ask for help, any ideas?
As tested by user1568538, the solution was to replace
chomp $check;
with
$check =~ s/\r\n//g;
to remove Windows-style line endings from the variable.
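The reason chomp alone was not enough: with the default input record separator ("\n"), chomp removes only the trailing newline and leaves the carriage return in place, so the interpolated pattern then demands a "\r" that the log entries do not contain. A small sketch of the difference, where the helper name strip_eol and the sample value are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: strip one trailing line ending, Unix or Windows style.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

my $line = "index.html\r\n";     # made-up sample with a Windows line ending

chomp(my $chomped = $line);      # default $/ is "\n", so the "\r" is left behind
my $stripped = strip_eol($line);

printf "chomp left %d characters, strip_eol left %d\n",
    length($chomped), length($stripped);
```

Anchoring the substitution with \z touches only the end of the string, whereas s/\r\n//g would also remove line endings embedded in the middle.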
Since chomp removes the contents of the input record separator $/ from the end of its argument, you could also change its value:
my $pointer = 0;
foreach (@new1)
{
my $test = $_;
foreach (@chk)
{
local $/="\r\n";
my $check = $_;
chomp $check;
delete($new1[$pointer]) if ($test =~ /^$check/i);
}
$pointer++;
}
However, since $/ also affects other operations (such as reading from a file handle), it is safest to avoid changing $/ unless you are sure that it is safe. Here I limit the change to the inner foreach loop where the chomp occurs.
Not knowing what your input data looks like, I can only guess, but using \Q might help:
delete($new1[$pointer]) if ($test =~ /^\Q$check/i);
See quotemeta.
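As a quick illustration of what \Q changes: without it, characters such as "." in the interpolated variable act as regex metacharacters; with it, they match only themselves. The helper name starts_with and the sample strings below are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: does $text start with $prefix, taken literally?
sub starts_with {
    my ($text, $prefix) = @_;
    return $text =~ /^\Q$prefix\E/i ? 1 : 0;
}

my $check = 'log.1';   # made-up value containing a regex metacharacter

# Unquoted, "." matches any character, so "logX1" matches too
print "unquoted pattern matches logX1\n" if 'logX1' =~ /^$check/i;

# Quoted, only a literal "." is accepted
print "quoted pattern matches logX1\n"    if starts_with('logX1', $check);
print "quoted pattern matches log.1.gz\n" if starts_with('log.1.gz', $check);
```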
It is not clear what you are trying to do. However, you may be trying to only get those elements for which there is no match or vice versa. Adapt the code below for your needs
#!/usr/bin/perl
use strict; use warnings;
my @item = qw(...); # your @new?
my @check = qw(...); # your @chk?
my @match;
my @nomatch;
ITEM:
foreach my $item (@item) {
CHECK:
foreach my $check (@check) {
# if $check should not be interpreted as a pattern, but as
# literal characters, use /^\Q$check/ in the match below
if ($item =~ /^$check/) {
push @match, $item;
next ITEM; # there was a match, so this $item is burnt
# we don't need to test against other $checks.
}
}
# there was no match, so lets store it:
push @nomatch, $item;
}
print "matched $_\n" for @match;
print "didn't match $_\n" for @nomatch;
Your code is somewhat difficult to read. Let me tell you what this
foreach (@chk) {
@final = (grep /^$_/, @new1);
}
does: It is roughly equivalent to
my @final = ();
foreach my $check (@chk) {
@final = grep /^$check/, @new1;
}
which is equivalent to
my @final = ();
foreach my $check (@chk) {
# @final = grep /^$check/, @new1;
@final = ();
foreach (@new1) {
if (/^$check/) {
push @final, $_;
}
}
}
So your @final array gets reset on every pass of the outer loop, and possibly emptied: only the matches for the last element of @chk survive.
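If the intent was to collect the matches for every check, one option is to append to the result with push instead of assigning to it. A minimal sketch with made-up sample data; the helper name matches_any_prefix is hypothetical, and \Q assumes the checks are literal prefixes rather than patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every item that starts with any of the prefixes.
sub matches_any_prefix {
    my ($items, $prefixes) = @_;
    my @final;
    for my $check (@$prefixes) {
        # push appends, so matches from earlier checks are kept
        push @final, grep { /^\Q$check/ } @$items;
    }
    return @final;
}

my @new1 = qw(apple apricot banana cherry);   # made-up sample data
my @chk  = qw(ap ch);
print join(' ', matches_any_prefix(\@new1, \@chk)), "\n";   # prints: apple apricot cherry
```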