Display folders using perl script - perl

I have the following code and my problem is that I cannot modify it in order to use the $file3 outside the for function
for ($i = 0; $i < scalar(@temp); $i++){
$path7 = 'path_to'.$temp[$i];
foreach $path ($path7){
opendir my ($dh3), $path7 or die $!;
while ( my $file3 = readdir $dh3 ) {
next if $file3 eq '.' or $file3 eq '..';
next unless -d catfile($path7, $file3);
print "$file3\n";
}
closedir ($dh3);
}
}

Your $file3 is lexical to the while loop because you declared it with my. If you want it to be available outside, declare it in a larger scope, i.e. outside the for.
my $file3; # here!
for ( ...) {
# ...
# ...
######### no my below
while ( $file3 = readdir $dh3 ) {
# ...
}
# ...
}
Remember that in Perl it's a good practice to declare variables in the smallest scope necessary.
Also note that outside the while loop $file3 starts out as undef. Once the while loop has finished, it is undef again, because the loop only ends when readdir returns undef; it still holds a file name after the loop only if you leave the loop early with last. Your foreach does not change any of this: its list has only one element, as $path7 is a scalar and not an array, so its body runs exactly once per iteration of the outer for. In fact, there is no need for that foreach loop at all. Just use $path7 directly.
Confused by my explanation because of the variable names? Me too. Always pick meaningful variable names, don't just append numbers. Numbered names make the code very hard to maintain. :)
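If the real goal is to use the directory names after the loop, a single scalar cannot hold them all. A minimal sketch, assuming you want every subdirectory name collected: push each name onto an array declared outside the loop. The temporary directory and the names `@subdirs`, `alpha` and `beta` are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catdir catfile);
use File::Temp qw(tempdir);

# Hypothetical test tree: a temporary directory with two subdirectories and one file.
my $base = tempdir(CLEANUP => 1);
for my $name (qw(alpha beta)) {
    mkdir catdir($base, $name) or die "Cannot create $name: $!";
}
open my $fh, '>', catfile($base, 'notes.txt') or die "Cannot create notes.txt: $!";
close $fh;

my @subdirs;    # declared outside the loop, so it is still visible afterwards

opendir my $dh, $base or die "Cannot open $base: $!";
while (my $entry = readdir $dh) {
    next if $entry eq '.' or $entry eq '..';
    next unless -d catdir($base, $entry);    # keep directories only
    push @subdirs, $entry;
}
closedir $dh;

@subdirs = sort @subdirs;    # readdir order is not guaranteed
print "$_\n" for @subdirs;
```

The loop variable `$entry` stays in the smallest scope, and only the collected result lives in the larger one.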

Related

Perl recursive code for scanning directory tree

In this script, which scans a directory recursively, I would like to know what happens when ScanDirectory($name) is called: does the next get executed right after?
Because if @names gets populated with new directories on each call, then we go into the first directory in @names, and if there are other directories there, ScanDirectory is called again, but then the other directories in the previous @names are replaced and so they are never treated by the loop? Sorry if I don't make sense.
I know there is already a module for this purpose, but I want to improve my understanding of how this loop works so I can deal with recursive code in other situations.
sub ScanDirectory {
my $workdir = shift;
my $startdir = cwd;
chdir $workdir or die;
opendir my $DIR, '.' or die;
my @names = readdir $DIR or die;
closedir $DIR;
foreach my $name (@names) {
next if ($name eq ".");
next if ($name eq "..");
if (-d $name) {
ScanDirectory($name);
next;
}
}
chdir $startdir or die;
}
ScanDirectory('.');
Is this your code?
In the subroutine you call my @names = readdir, which defines a new lexically scoped variable, so each recursion will create a new instance of that variable. It might work if you use our instead of my. Variables defined with our are package scoped, which means each call will use the same @names variable. Actually, not even then: you're wiping out the previous value of the variable with your readdir.
You'll be better off using File::Find. File::Find comes with most Perl installations, so it's always available.
use strict;
use warnings;
use File::Find;
my @names;
find ( sub {
return if $_ eq "." or $_ eq "..";
push @names, $File::Find::name;
}, "."
);
This is simpler to understand, easier to write, more flexible, and much more efficient since it doesn't call itself recursively. Most of the time, you'll see this without the sub being embedded in the function:
my @names;
find ( \&wanted, ".");
sub wanted {
return if $_ eq "." or $_ eq "..";
push @names, $File::Find::name;
}
I prefer to embed the subroutine if the subroutine is fairly small. It prevents the subroutine from wandering away from the find call, and it prevents the mysterious instance of @names being used in the subroutine without a clear definition.
Okay, they're both the same. Both are subroutine references (one is called wanted and one is an anonymous subroutine). However, the first use of @names doesn't appear so mysterious since it's literally defined on the line right above the find call.
If you must write your own routine from scratch (maybe a homework assignment?), then don't use recursion. Use push to push the reversed readdir into an array.
Then, pop off the items of the array one at a time. If you find a directory, read it (again in reverse) and push it onto your array. Be careful with the . and .. entries.
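The push/pop approach described above might look roughly like this. It is only a sketch: the sub name `list_tree`, the `@todo` stack and the temporary test tree are assumptions made for the example, not part of the original code.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catdir catfile);
use File::Temp qw(tempdir);

# Walk a tree without recursion: paths still to visit are kept on a stack.
sub list_tree {
    my ($top) = @_;
    my @found;
    my @todo = ($top);
    while (@todo) {
        my $path = pop @todo;
        push @found, $path;
        next unless -d $path;    # plain files have nothing to descend into
        next if -l $path;        # do not follow symlinked directories
        opendir my $dh, $path or die "Cannot open $path: $!";
        my @entries = sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;
        # Push in reverse so that pop hands the entries back in sorted order.
        push @todo, map { catfile($path, $_) } reverse @entries;
    }
    return @found;
}

# Hypothetical test tree: a/ containing b/ and f.txt
my $base = tempdir(CLEANUP => 1);
mkdir catdir($base, 'a')      or die "mkdir a: $!";
mkdir catdir($base, 'a', 'b') or die "mkdir a/b: $!";
open my $fh, '>', catfile($base, 'a', 'f.txt') or die "create f.txt: $!";
close $fh;

my @paths = list_tree($base);
print "$_\n" for @paths;
```

Because every path is complete when it goes onto the stack, this version never needs chdir, and each directory handle is closed before the next one is opened.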
This is strangely-written code, especially if it is published in a book.
Your confusion is because the @names array is declared lexically, which means it exists only for the extent of the current block, and is unique to a particular stack frame (subroutine call). So each call of scan_directory (local identifiers shouldn't really contain capital letters) has its own independent @names array which vanishes when the subroutine exits, and there is no question of "replacing" the contents.
Also, the next you're referring to is redundant: it skips to the next iteration of the loop over @names, which is just what would happen without it.
It would be much better written like this
sub scan_directory {
my ($workdir) = @_;
my $startdir = cwd;
chdir $workdir or die $!;
opendir my $dh, '.' or die $!;
while (my $name = readdir $dh) {
next if $name eq '.' or $name eq '..';
scan_directory($name) if -d $name;
}
chdir $startdir or die $!;
}
scan_directory('.');

Scope of $_ : why does this change it?

I have a code snippet like the following:
use strict;
use warnings;
# file names to search for
open(my $files, "<", "fileList.txt") or die "Can't open fileList.txt: $!";
my $flag = 0;
while (<$files>) {
print "File loop: $_\n";
open(my $search, "<", "searchMe.txt") or die "Can't open searchMe.txt: $!";
$flag = 0;
while (<$search>){
print "Search loop: $_\n";
}
}
fileList.txt contains one line: "CheckFilesFunctions.pm"
searchMe.txt contains one line: abc
The output here is
File loop: CheckFilesFunctions.pm
Search loop: abc
However, when I change the search loop to the following
while (<$search> && !$flag){
Suddenly the search loop starts printing
Search loop: CheckFilesFunctions.pm
Why does the scope of $_ change here?
while (<filehandle>) is convenient shorthand for while (defined( $_ = <filehandle> )); if you have a more complicated expression to test, you need to explicitly include the full thing:
while ( defined( $_ = <$search> ) && ! $flag ) {
though I would suggest explicitly using readline (<> can mean either readline or glob, depending on the argument; I prefer to use those directly) and using a lexical variable:
while ( defined( my $line = readline $search ) && ! $flag ) {
Alternatively, you could break out of the loop instead of modifying the condition:
while (<$search>) {
...
if (...) {
last;
}
}
Though looking at your code, you probably want to be reading the search file just once into an array before the file loop, and just looping over that array.
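To see the difference in isolation, here is a small sketch that uses an in-memory filehandle in place of searchMe.txt. The data and the names `@seen_implicit` and `@seen_explicit` are made up for the demonstration.

```perl
use strict;
use warnings;

# Made-up contents standing in for searchMe.txt.
my $data = "abc\ndef\n";
my $flag = 0;

$_ = "outer value\n";    # plays the role of the line read by the outer loop

# 1) Compound condition: the line is read and tested, but never stored in $_.
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my @seen_implicit;
while (<$fh> && !$flag) {
    push @seen_implicit, $_;    # still the outer value
}
close $fh;

# 2) Explicit assignment plus defined(): each line is captured.
open $fh, '<', \$data or die "Can't open in-memory file: $!";
my @seen_explicit;
while (defined(my $line = readline $fh) && !$flag) {
    push @seen_explicit, $line;
}
close $fh;

print "implicit: $_" for @seen_implicit;
print "explicit: $_" for @seen_explicit;
```

The first loop still runs once per line of the file, which is why the question saw the outer loop's line printed by the search loop: the scope of $_ did not change, it was simply never assigned.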

What am I not getting about foreach loops?

It was always my understanding that
foreach (@arr)
{
....
}
and
for(my $i=0; $i<@arr; $i++)
{
.....
}
were functionally equivalent.
However, in all of my code, whenever I use a foreach loop I run into problems that get fixed when I change to a for loop. It always has to do with comparing the values of two things, usually with nested loops.
Here is an example:
for(my $i=0; $i<@files; $i++)
{
my $sel;
foreach (@selected)
{
if($files[$i] eq $selected[$_])
{
$sel='selected';
}
}
<option value=$Files[$i] $sel>$files[$i]</option>
}
The above code falls between select tags in a cgi program.
Basically I am editing the contents of a select box according to user specifications.
But after they add or delete choices I want the choices that were originally selected to remain selected.
The above code is supposed to accomplish this when reassembling the select on the next form. However, with the foreach version it only gets the first choice that's selected and skips the rest. If I switch it to a 3 part for loop, without changing anything else, it will work as intended.
This is only a recent example, so clearly I am missing something here, can anyone help me out?
Let's assume that @files is a list of filenames.
In the following code, $i is the array index (i.e. it's an integer):
for (my $i=0; $i<@files; $i++) { ... }
In the following code, $i is set to each array item in turn (i.e. it's a filename):
foreach my $i (@files) { ... }
So for example:
use strict;
use warnings;
my @files = (
'foo.txt',
'bar.txt',
'baz.txt',
);
print "for...\n";
for (my $i=0; $i<@files; $i++) {
print "\$i is $i.\n";
}
print "foreach...\n";
foreach my $i (@files) {
print "\$i is $i.\n";
}
Produces the following output:
for...
$i is 0.
$i is 1.
$i is 2.
foreach...
$i is foo.txt.
$i is bar.txt.
$i is baz.txt.
foreach loops are generally preferred for looping through arrays to avoid accidental off-by-one errors caused by things like for (my $i=1;...;...) or for (my $i=0;$i<=@arr;...).
That said, for and foreach are actually implemented as synonyms in Perl, so the following script produces identical output to my previous example:
use strict;
use warnings;
my @files = (
'foo.txt',
'bar.txt',
'baz.txt',
);
print "for...\n";
foreach (my $i=0; $i<@files; $i++) {
print "\$i is $i.\n";
}
print "foreach...\n";
for my $i (@files) {
print "\$i is $i.\n";
}
It is simply customary to refer to the second type of loop as a foreach loop, even if the source code uses the keyword for to perform the loop (as has become quite common).
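Applied to the select box from the question, this means the inner loop variable is already the selected value, so it must not be used as an index. A minimal sketch, with made-up data and a hypothetical `%is_selected` lookup hash that replaces the inner loop altogether:

```perl
use strict;
use warnings;

# Made-up data standing in for the question's arrays.
my @files    = ('foo.txt', 'bar.txt', 'baz.txt');
my @selected = ('bar.txt', 'baz.txt');

# Lookup hash: one key per selected value.
my %is_selected = map { $_ => 1 } @selected;

my @options;
foreach my $file (@files) {
    # $file is the filename itself, not an index into @files.
    my $sel = $is_selected{$file} ? ' selected' : '';
    # Real CGI code should HTML-escape $file before printing it.
    push @options, "<option value=\"$file\"$sel>$file</option>";
}
print "$_\n" for @options;
```

With the hash, every selected entry is found, not only the first one, and there is no index that could go out of step.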

Nested foreach loop not working

It should be a simple nested foreach loop but it's not working and really starting to annoy me that I can't figure this out! Still a perl beginner but I thought I understood this by now. Can someone explain to me where I'm going wrong? The idea is simple: 2 files, 1 small, 1 large with info I want in the small one. Both have unique id's in them. Compare and match the id's and output a new small file with the added info in the small file.
I have 2 pieces of code: 1 without stricts and 1 with and both are not working. I know to use stricts but i'm still curious as to why the one without stricts isn't working either.
WITHOUT STRICTS:
if ($#ARGV != 2){
print "input_file1 input_file2 output_file\n";
exit;
}
$inputfile1=$ARGV[0];
$inputfile2=$ARGV[1];
$outputfile1=$ARGV[2];
open(INFILE1,$inputfile1) || die "No inputfile :$!\n";
open(INFILE2,$inputfile2) || die "No inputfile :$!\n";
open(OUTFILE_1,">$outputfile1") || die "No outputfile :$!\n";
$i = 0;
$j = 0;
@infile1=<INFILE1>;
@infile2=<INFILE2>;
foreach ( @infile1 ){
@elements = split(";",$infile1[$i]);
$id1 = $elements[3];
print "1. $id1\n";
$lat = $elements[5];
$lon = $elements[6];
$lat =~ s/,/./;
$lon =~ s/,/./;
print "2. $lat\n";
print "3. $lon\n";
foreach ( @infile2 ){
@loopelements = split(";",$infile2[$j]);
$id2 = $loopelements[4];
print "4. $id2\n";
if ($id1 == $id2){
print OUTFILE_1 "$loopelements[0];$loopelements[1];$loopelements[2];$loopelements[3];$loopelements[4];$lat,$lon\n";
};
$j = $j+1;
};
@elements = join(";",@elements); # add ';' to all elements
#print "$i\r";
$i = $i+1;
}
close(INFILE1);
close(INFILE2);
close(OUTFILE_1);
The error without is the second loop will not start if i'm not mistaken.
WITH STRICTS:
use strict;
use warnings;
my $inputfile1 = shift || die "Give input!\n";
my $inputfile2 = shift || die "Give more input!\n";
my $outputfile = shift || die "Give output!\n";
open my $INFILE1, '<', $inputfile1 or die "In use/Not found :$!\n";
open my $INFILE2, '<', $inputfile2 or die "In use/Not found :$!\n";
open my $OUTFILE, '>', $outputfile or die "In use/Not found :$!\n";
my $i = 0;
my $j = 0;
foreach ( my $infile1 = <$INFILE1> ){
my @elements = split(";",$infile1[$i]);
my $id1 = $elements[3];
print "1: $id1\n";
my $lat = $elements[5];
my $lon = $elements[6];
$lat =~ s/,/./;
$lon =~ s/,/./;
print "2: $lat\n";
print "3: $lon\n";
foreach ( my $infile2 = <$INFILE2> ){
my @loopelements = split(";",$infile2[$j]);
my $id2 = $loopelements[4];
print "4: $id2\n";
if ($id1 == $id2){
print $OUTFILE "$loopelements[0];$loopelements[1];$loopelements[2];$loopelements[3];$loopelements[4];$lat,$lon\n";
};
$j = $j+1;
};
#@elements = join(";",@elements); # add ';' to all elements
#print "$i\r";
$i = $i+1;
}
close($INFILE1);
close($INFILE2);
close($OUTFILE);
The error with stricts:
Global symbol "@infile1" requires explicit package name at Z:\Data-Content\Data\test\jan\bestemming_zonder_acco\add_latlon_dest_test.pl line 16.
Global symbol "@infile2" requires explicit package name at Z:\Data-Content\Data\test\jan\bestemming_zonder_acco\add_latlon_dest_test.pl line 31.
Your 'strict' implementation gives you errors because of confusion about the sigils (the $ and @ characters), which indicate whether a variable is a scalar or an array. In the loop statement you read a line of the file into a scalar called $infile1, but in the following line you try to access an element of the array @infile1. These two variables are not related and, as perl tells you, the latter is not declared.
Another problem with your 'strict' implementation is that you are reading the file inside the loop. For nested loops this means that you will read file 2 in the first iteration of the outer loop, and in all succeeding iterations the inner loop will not be able to read any lines.
I missed the foreach/while issue pointed out by stevenl: even after fixing the stricture issues, you will be left with foreach loops that run only one iteration.
I'm not sure what your problem with the unstrict script is.
But I wouldn't use a nested loop at all for processing two files. I would un-nest the loops, so it roughly looked like this:
my %coord;
while ( my $line = <$INFILE1> ) {
my @elements = split /;/, $line;
$coord{ $elements[3] } = "$elements[5],$elements[6]";
}
while ( my $line = <$INFILE2> ) {
my @elements = split /;/, $line;
if ( exists $coord{ $elements[4] } ) {
print $OUTFILE "....;$coord{ $elements[4] }\n";
}
}
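For reference, a self-contained version of the same idea that can be run as is. It is a sketch under assumptions: the sample data is invented, the files are replaced by in-memory filehandles, and the column positions (id in field 3 of the first file and field 4 of the second, latitude and longitude in fields 5 and 6) are taken from the question.

```perl
use strict;
use warnings;

# Invented sample data; only the column positions follow the question.
my $first  = "a;b;c;101;e;52,37;4,89\n"
           . "a;b;c;102;e;48,85;2,35\n";
my $second = "p;q;r;s;101;t\n"
           . "p;q;r;s;999;t\n";

open my $in1, '<', \$first  or die "Can't open first data set: $!";
open my $in2, '<', \$second or die "Can't open second data set: $!";

# Pass 1: remember the coordinates for every id in the first file.
my %coord;
while (my $line = <$in1>) {
    chomp $line;
    my @fields = split /;/, $line;
    my ($lat, $lon) = @fields[5, 6];
    s/,/./ for $lat, $lon;    # decimal comma -> decimal point
    $coord{ $fields[3] } = "$lat,$lon";
}

# Pass 2: keep the lines of the second file whose id was seen in pass 1.
my @out;
while (my $line = <$in2>) {
    chomp $line;
    my @fields = split /;/, $line;
    next unless exists $coord{ $fields[4] };
    push @out, join(';', @fields[0 .. 4], $coord{ $fields[4] });
}
print "$_\n" for @out;
```

Each file is read exactly once, so the cost grows with the sum of the file sizes rather than their product, and there are no $i and $j counters to keep in step.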
I can't see exactly where the problem with the non-strict version is. What is the problem that you are encountering?
The problem with the strict version is particularly in these 2 lines:
foreach ( my $infile1 = <$INFILE1> ){
my @elements = split(";",$infile1[$i]);
You have a scalar $infile1 in the first line, but you are treating it as an array in the next line. Also, change the foreach to a while (see below).
A few comments.
For the non-strict version, you could have collapsed the loop to a C-style for loop as:
for (my $i = 0; $i < @infile1; $i++) {
...
}
That can be made simpler to read if you go without the array indexes altogether:
foreach my $infile1 (@infile1) {
my @elements = split ';', $infile1;
...
}
But with the larger file, it might take time to slurp the entire file into the array at the beginning. So it might be better to iterate through the file as you go:
while (my $infile = <$INFILE1>) {
...
}
Note the last point should be how the strict version looks. You need a while loop rather than a foreach loop, because assigning <$INFILE1> to a scalar means it will return the next line only, which evaluates to true as long as there is another line in the file. (Thus, the foreach would only ever get the first line to loop over.)
You don't reset $j before the inner foreach loop runs. Therefore, the second time your inner loop runs, you are trying to access elements that are past the end of the array. This mistake exists in both the strict and non-strict version.
You should not be using $i and $j at all; the point of foreach is that it automatically gets each element for you. Here is an example of correctly using foreach in the inner loop:
foreach my $line ( @infile2 ){
@loopelements = split(";",$line);
#...now do stuff as before
}
This puts each element of @infile2 into the variable $line in succession, until you have gone through the whole array.

Perl IF statement not matching variables in REGEX

my $pointer = 0;
foreach (@new1)
{
my $test = $_;
foreach (@chk)
{
my $check = $_;
chomp $check;
delete($new1[$pointer]) if ($test =~ /^$check/i);
}
$pointer++;
}
The if statement never matches, even though many entries in the @new1 array do contain $check at the start of the array element (88 at least).
I am not sure it is the nested loop that is causing the problem because if i try this it also fails to match:
foreach (@chk)
{
@final = (grep /^$_/, @new1);
}
@final is empty but I know at least 88 entries for $_ are in @new1.
I wrote this code on a machine running Windows ActivePerl 5.14.2 and the top code works. I then (using a copy of @new1) compare the two and remove any duplicates (also works on 5.14.2). I did try to negate the if match but that seemed to wipe out the @new1 array (so that I didn't need to do a hash compare).
When I try to run this code on a Linux RedHat box with Perl 5.8.0 it seems to struggle with the variable matching in the REGEX. If I hard code the REGEX with an example I know is in @new1 the match works and in the first code the entry is deleted (in the second one the value is inserted in @final).
The @chk array is a listing file on the web server and the @new1 array is created by opening two log files on the web server and then pushing one into the other.
I had even gone to the trouble of printing out $test and $check in each loop iteration and manually checking to see if any of the values did match and some of them do.
It has had me baffled for days now and I have had to throw the towel in and ask for help, any ideas?
As tested by user1568538, the solution was to replace
chomp $check;
with
$check =~ s/\r\n//g;
to remove Windows-style line endings from the variable.
Since chomp removes the contents of the input record separator $/ from the end of its argument, you could also change its value:
my $pointer = 0;
foreach (@new1)
{
my $test = $_;
foreach (@chk)
{
local $/="\r\n";
my $check = $_;
chomp $check;
delete($new1[$pointer]) if ($test =~ /^$check/i);
}
$pointer++;
}
However, since $/ also affects other operations (such as reading from a file handle), it is safest to avoid changing $/ unless you are sure it is safe. Here I limit the change to the foreach loop where the chomp occurs.
Not knowing what your input data looks like, using \Q might help:
if ($test =~ /^\Q$check/i);
See quotemeta.
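Both effects can be checked in isolation. The following sketch uses made-up strings (a file name containing regex metacharacters, read with a Windows line ending) to show what chomp leaves behind when $/ has its default value, and what \Q changes about the match.

```perl
use strict;
use warnings;

# Made-up data: a line from a Windows-style listing and a log entry.
my $check = "file(1).txt\r\n";
my $test  = "file(1).txt 200 OK";

# With the default $/ only the trailing "\n" is removed; the "\r" stays.
chomp(my $chomped = $check);
my $still_has_cr = $chomped =~ /\r\z/ ? 1 : 0;

# Strip either "\r\n" or a bare "\n" at the end of the string.
(my $clean = $check) =~ s/\r?\n\z//;

# Unquoted, "(1)" is a capture group and "." matches any character.
my $plain_match = $test =~ /^$clean/i ? 1 : 0;

# With \Q ... \E the whole string is matched literally.
my $quoted_match = $test =~ /^\Q$clean\E/i ? 1 : 0;

print "CR left after chomp: $still_has_cr\n";
print "unquoted match: $plain_match\n";
print "quoted match: $quoted_match\n";
```

A leftover "\r" is invisible when the values are printed, which would explain why entries can look identical on screen and still fail to match.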
It is not clear what you are trying to do. However, you may be trying to only get those elements for which there is no match or vice versa. Adapt the code below for your needs
#!/usr/bin/perl
use strict; use warnings;
my @item = qw(...); # your @new1?
my @check = qw(...); # your @chk?
my @match;
my @nomatch;
ITEM:
foreach my $item (@item) {
CHECK:
foreach my $check (@check) {
# if $check should not be interpreted as a pattern but as literal
# characters, use /^\Q$check\E/ in the test below
if ($item =~ /^$check/) {
push @match, $item;
next ITEM; # there was a match, so this $item is burnt
# we don't need to test against other $checks.
}
}
# there was no match, so lets store it:
push @nomatch, $item;
}
print "matched $_\n" for @match;
print "didn't match $_\n" for @nomatch;
Your code is somewhat difficult to read. Let me tell you what this
foreach (@chk) {
@final = (grep /^$_/, @new1);
}
does: It is roughly equivalent to
my @final = ();
foreach my $check (@chk) {
@final = grep /^$check/, @new1;
}
which is equivalent to
my @final = ();
foreach my $check (@chk) {
# @final = grep /^$check/, @new1;
@final = ();
foreach (@new1) {
if (/^$check/) {
push @final, $_;
}
}
}
So your @final array gets reset on every iteration, possibly emptied: only the matches for the last entry in @chk survive.
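Two possible ways to keep the matches for every entry, sketched with made-up data. Naming the loop variable also avoids relying on $_, which grep rebinds to each element of @new1 while it runs. The names `$alternatives`, `$prefix_re` and `@final_once` are hypothetical.

```perl
use strict;
use warnings;

# Made-up data standing in for @chk and @new1.
my @chk  = ('alpha', 'beta');
my @new1 = ('alpha.log', 'Beta.txt', 'gamma.dat', 'alphabet.csv');

# Variant 1: push inside the loop instead of assigning, so nothing is overwritten.
my @final;
foreach my $check (@chk) {
    push @final, grep { /^\Q$check\E/i } @new1;
}

# Variant 2: build one alternation from all entries and scan @new1 once.
my $alternatives = join '|', map { quotemeta($_) } @chk;
my $prefix_re    = qr/^(?:$alternatives)/i;
my @final_once   = grep { $_ =~ $prefix_re } @new1;

print "push:  @final\n";
print "regex: @final_once\n";
```

Variant 1 groups the results by check entry and can list an element twice if it matches more than one entry; variant 2 keeps the original order of @new1 and lists each element at most once.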