How to regex and get file and directory path - perl

My array (@array) contains the directory and file paths below.
/home/testuser/mysql/data/userdata/pushdir/
/home/testuser/mysql/data/userdata/pushdir/test1.sql
/home/testuser/mysql/data/userdata/nextdir/testdir/
/home/testuser/mysql/data/userdata/pushdir/testdir/test2.sql
The prefix /home/testuser/mysql/data/userdata/ is the same for every entry in the list above.
I want to process the files in another loop. For that I need only the file names relative to that prefix, like "pushdir/test1.sql" and "pushdir/testdir/test2.sql".
I am using the code below, but it does not produce that output. Please share your ideas on how to get it with a regex.
foreach $dir (@array)
{
    chomp $dir;
    print "$dir\n";
    @files = <$dir/*>;
    my @names = join("\n", sort(@files));
    print @names, "\n";
}
foreach my $filepath (@names) {
    (my $volume, my $dirs, my $filelist) = File::Spec->splitpath(+$filepath );
    print "$filelist\n";
}

@names is declared with my inside the foreach $dir loop, so it is scoped to that loop only. There is no @names array to iterate over in the second foreach loop. Moreover, join returns a single string; you probably don't want that string in the array, you want the individual file names there.
Use strict (it will tell you that no @names is declared at that point) and warnings. Indent code blocks properly to see which commands belong where.
#!/usr/bin/perl
use warnings;
use strict;
use File::Spec;
my @array = qw( home/testuser/mysql/data/userdata/pushdir/
                home/testuser/mysql/data/userdata/pushdir/test1.sql
                home/testuser/mysql/data/userdata/nextdir/testdir/
                home/testuser/mysql/data/userdata/pushdir/testdir/test2.sql );
my @names;
for my $dir (@array) {
    print "DIR: $dir\n";
    push @names, sort glob "$dir/*";
    print "NAMES: @names\n";
}
for my $filepath (@names) {
    my ($volume, $dirs, $filelist) = 'File::Spec'->splitpath($filepath);
    print "FL: $filelist\n";
}
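The code above fixes the scoping problem but does not yet strip the constant prefix that the question asks about. A minimal sketch of that step, assuming the prefix is known in advance and that entries ending in a slash are directories to skip (both assumptions are read off the sample data; $base and @relative are names made up for this illustration):

```perl
use strict;
use warnings;

# Constant prefix shared by every entry (taken from the question's sample data).
my $base = '/home/testuser/mysql/data/userdata/';

my @array = (
    '/home/testuser/mysql/data/userdata/pushdir/',
    '/home/testuser/mysql/data/userdata/pushdir/test1.sql',
    '/home/testuser/mysql/data/userdata/nextdir/testdir/',
    '/home/testuser/mysql/data/userdata/pushdir/testdir/test2.sql',
);

my @relative;
for my $path (@array) {
    next if $path =~ m{/\z};                 # a trailing slash marks a directory entry
    (my $rel = $path) =~ s{\A\Q$base\E}{};   # \Q...\E quotes any regex metacharacters in the prefix
    push @relative, $rel;
}

print "$_\n" for @relative;
# prints:
# pushdir/test1.sql
# pushdir/testdir/test2.sql
```

Copying into $rel before substituting leaves the original entries in @array untouched, which matters if they are needed later with their full paths.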

Related

Perl script to pair two array

I want to pair two arrays and put a '/' between their elements. Say the two arrays are like this:
@array1 = (FileA .. FileZ);
@array2 = (FileA.txt .. FileZ.txt);
The output that I want is like this:
../../../experiment/fileA/fileA.txt
.
.
../../../experiment/fileZ/fileZ.txt
Here is my code:
my @input_name = input();
my $dirname = "../../../experiment/";
# CREATE FOLDER PATH
my @fileDir;
foreach my $input_name (@input_name){
    chomp $input_name;
    $_ = $dirname . $input_name;
    push @fileDir, $_;
}
# CREATE FILE NAME
my @filename;
my $extension = '.txt';
foreach my $input_name (@input_name){
    chomp $input_name;
    $_ = $input_name . $extension;
    push @filename, $_;
}
The code I tried for the last step is below, but it doesn't seem to work:
#CREATE FULL PATH
foreach my $test_path (@test_path){
    foreach my $testname (@testname){
        my $test = map "$test_path[$_]/$testname[$_]", 0..$#test_path;
        push @file, $test;
    }
}
print @file;
I assume input() returns something like ('fileA', 'fileB').
The problem with your code is the nested loop here:
foreach my $test_path (@test_path){
    foreach my $testname (@testname){
This combines every $test_path with every possible $testname. You don't want that. Also, it doesn't make much sense to assign the result of map to a scalar: All you'll get is the number of elements in the list created by map.
(Also, you have random chomp calls sprinkled throughout your code. None of those should be there.)
You only need a single array and a single loop:
use strict;
use warnings;
sub input {
    return ('fileA', 'fileB');
}
my @input = input();
my $dirname = '../../../experiment';
my @files = map "$dirname/$_/$_.txt", @input;
for my $file (@files) {
    print "got $file\n";
}
Here the loop is hidden in the map ..., @input call. If you want to write it as a for loop, it would look like this:
my @files;
for my $input (@input) {
    push @files, "$dirname/$input/$input.txt";
}
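The point about assigning map to a scalar can be checked with a short, self-contained sketch. The list of names here is made up for the illustration; only the difference between scalar and list context matters:

```perl
use strict;
use warnings;

my @input = ('fileA', 'fileB', 'fileC');    # hypothetical input names

my $count = map { "$_/$_.txt" } @input;     # scalar context: number of elements generated
my @paths = map { "$_/$_.txt" } @input;     # list context: the generated elements themselves

print "scalar: $count\n";    # prints "scalar: 3"
print "list:   @paths\n";    # prints "list:   fileA/fileA.txt fileB/fileB.txt fileC/fileC.txt"
```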
The problem is your algorithm. You're iterating all filenames and all dirnames at the same time.
I mean, your code says "For every directory, create every file".
Try something along the lines of this and you'll be fine:
# WRITE TESTFILE
foreach my $filename (@filename){
    chomp $filename;
    if ( -e "$filename/$filename" and -d "$filename/$filename" ){
        print "File already exists\n";
    }
    else {
        open ( TXT_FILE, ">$filename/$filename" );
        print TXT_FILE "Hello World";
        close TXT_FILE;
    }
}

Perl recursive code for scanning directory tree

In this script, which scans a directory recursively, I would like to know what happens when ScanDirectory($name) is called: does the next get executed right after?
Because if @names gets populated with new directories on each call, then we go into the first directory in @names, and if there are other directories there, ScanDirectory is called again, but then the other directories in the previous @names are replaced and so they are never processed by the loop? Sorry if I don't make sense.
I know there is already a module for this purpose, but I want to improve my understanding of how this loop works so I can deal with recursive code in other situations.
sub ScanDirectory {
    my $workdir = shift;
    my $startdir = cwd;
    chdir $workdir or die;
    opendir my $DIR, '.' or die;
    my @names = readdir $DIR or die;
    closedir $DIR;
    foreach my $name (@names) {
        next if ($name eq ".");
        next if ($name eq "..");
        if (-d $name) {
            ScanDirectory($name);
            next;
        }
    }
    chdir $startdir or die;
}
ScanDirectory('.');
Is this your code?
In the subroutine you call my @names = readdir, which defines a new lexically scoped variable, so each recursive call creates a new instance of that variable. It might seem that using our instead of my would change this, since variables declared with our are package-scoped, which means each call would use the same @names variable. Actually, not even then: each readdir assignment would overwrite the previous value of the variable.
You'll be better off using File::Find. File::Find is a core module shipped with Perl, so it is available in any standard installation.
use strict;
use warnings;
use File::Find;
my @names;
find ( sub {
        # return, not next: this is a subroutine body, not a loop
        return if $_ eq "." or $_ eq "..";
        push @names, $File::Find::name;
    }, "."
);
This is simpler to understand, easier to write, more flexible, and much more efficient since it doesn't call itself recursively. Most of the time, you'll see this without the sub being embedded in the function:
my @names;
find ( \&wanted, ".");
sub wanted {
    # return, not next: this is a subroutine body, not a loop
    return if $_ eq "." or $_ eq "..";
    push @names, $File::Find::name;
}
I prefer to embed the subroutine if the subroutine is fairly small. It prevents the subroutine from wandering away from the find call, and it prevents the mysterious instance of @names being used in the subroutine without a clear definition.
Okay, they're both the same. Both are subroutine references (one is called wanted and one is an anonymous subroutine). However, the first use of @names doesn't appear so mysterious since it's literally defined on the line right above the find call.
If you must write your own routine from scratch (maybe a homework assignment?), then don't use recursion. Use push to push the reversed readdir results onto an array.
Then pop the items off the array one at a time. If you find a directory, read it (again in reverse) and push its entries onto your array. Be careful with . and .. (the current and parent directory entries).
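The push/pop approach described above can be sketched as follows. This is only one way to write it: scan_tree is a hypothetical name, the sketch builds full paths instead of calling chdir, and skipping symlinked directories is an added assumption to avoid loops.

```perl
use strict;
use warnings;

# Walk a directory tree without recursion, using an explicit stack.
# Returns every path found, starting with $top itself.
sub scan_tree {
    my ($top) = @_;
    my @stack = ($top);
    my @found;
    while (@stack) {
        my $path = pop @stack;
        push @found, $path;
        next unless -d $path && !-l $path;    # only descend into real directories
        opendir my $dh, $path or die "Cannot open $path: $!";
        my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;
        # Push in reverse sorted order so that entries pop off in sorted order.
        push @stack, map { "$path/$_" } reverse sort @entries;
    }
    return @found;
}

print "$_\n" for scan_tree('.');
```

Because each directory's entries go onto the stack as full paths, nothing from an earlier directory is lost when a later one is read, which is the same guarantee the lexical @names gives the recursive version.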
This is strangely-written code, especially if it is published in a book.
Your confusion arises because the @names array is declared lexically, which means it exists only for the extent of the current block and is unique to a particular stack frame (subroutine call). So each call of scan_directory (local identifiers shouldn't really contain capital letters) has its own independent @names array, which vanishes when the subroutine exits, and there is no question of "replacing" the contents.
Also, the next you're referring to is redundant: it skips to the next iteration over the @names array, which is just what would happen without it.
It would be much better written like this
use Cwd;    # provides cwd
sub scan_directory {
    my ($workdir) = @_;
    my $startdir = cwd;
    chdir $workdir or die $!;
    opendir my $dh, '.' or die $!;
    while (my $name = readdir $dh) {
        next if $name eq '.' or $name eq '..';
        scan_directory($name) if -d $name;
    }
    chdir $startdir or die $!;
}
scan_directory('.');

Perl - passing an array to subroutine

I'm in the process of learning Perl and am trying to write a script that takes a pattern and list of files as command line arguments and passes them to a subroutine, the subroutine then opens each file and prints the lines that match the pattern. The code below works; however, it stops after printing the lines from the first file and doesn't even touch the second file. What am I missing here?
#!/usr/bin/perl
use strict;
use warnings;
sub grep_file
{
    my $pattern = shift;
    my @files = shift;
    foreach my $doc (@files)
    {
        open FILE, $doc;
        while (my $line = <FILE>)
        {
            if ($line =~ m/$pattern/)
            {
                print $line;
            }
        }
    }
}
grep_file @ARGV;
shift removes and returns only the first element of the argument list (see: http://perldoc.perl.org/functions/shift.html).
So @files can only contain one value.
Try
sub foo
{
    my $one = shift @_;
    my @files = @_;
    print $one."\n";
    print @files;
}
foo(@ARGV);
There is little reason to use a subroutine here. You are just putting the whole program inside a function and then calling it.
The empty <> operator will read from all the files in @ARGV in sequence, without you having to open them explicitly.
I would code your program like this
use strict;
use warnings;
my $pattern = shift;
$pattern = qr/$pattern/; # Compile the regex
while (<>) {
    print if $_ =~ $pattern;
}

change the directory and grab the xml file to parse certain data in perl

I am trying to parse a specific XML file that is located in the subdirectories of one directory. For some reason I am getting an error saying the file does not exist. If the file does not exist, it should move on to the next subdirectory.
HERE IS MY CODE
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
my @xmlsearch = map { chomp; $_ } `ls`;
foreach my $directory (@xmlsearch) {
    print "$directory \n";
    chdir($directory) or die "Couldn't change to [$directory]: $!";
    my @findResults = `find -name education.xml`;
    foreach my $educationresults (@findResults){
        print $educationresults;
        my $parser = new XML::Simple;
        my $data = $parser->XMLin($educationresults);
        print Dumper($data);
        chdir('..');
    }
}
ERROR
music/gitar/education.xml
File does not exist: ./music/gitar/education.xml
Using chdir the way you did makes the code IMO less readable. You can use File::Find for that:
use autodie;
use feature 'say';    # needed for say below
use File::Find;
use XML::Simple;
use Data::Dumper;
sub findxml {
    my @found;
    opendir(DIR, '.');
    my @where = grep { -d && m#^[^.]+$# } readdir(DIR);
    closedir(DIR);
    File::Find::find({wanted => sub {
        push @found, $File::Find::name if m#^education\.xml$#s && -f _;
    } }, @where);
    return @found;
}
foreach my $xml (findxml()){
    say $xml;
    print Dumper XMLin($xml);
}
Whenever you find yourself relying on backticks to execute shell commands, you should consider whether there is a proper perl way to do it. In this case, there is.
ls can be replaced with <*>, which is a simple glob. The line:
my @array = map { chomp; $_ } `ls`;
Is just a roundabout way of saying
chomp(my @array = `ls`); # chomp takes list arguments as well
But of course the proper way is
my @array = <*>; # no chomp required
Now, the simple solution to all of this is simply to do
for my $xml (<*/education.xml>) { # find the xml files one directory level down
Which will cover one level of directories, with no recursion. For full recursion, use File::Find:
use strict;
use warnings;
use File::Find;
my @list;
find( sub { push @list, $File::Find::name if /^education\.xml$/i; }, ".");
for (@list) {
    # do stuff
    # @list contains full path names of education.xml files found in subdirs
    # e.g. ./music/gitar/education.xml
}
You should note that changing directories is not required, and in my experience, not worth the trouble. Instead of doing:
chdir($somedir);
my $data = XMLin($somefile);
chdir("..");
Simply do:
my $data = XMLin("$somedir/$somefile");

How do I read multiple directories and read the contents of subdirectories in Perl?

I have a folder, and inside it I have many subfolders. In those subfolders I have many .html files to be read. I have written the following code to do that. It opens the parent folder and the first subfolder, but it prints only one .html file. It shows the error:
NO SUCH FILE OR DIRECTORY
I don't want to change the entire code. Any modifications to the existing code will be good for me.
use FileHandle;
opendir PAR_DIR,"D:\\PERL\\perl_programes\\parent_directory";
while (our $sub_folders = readdir(PAR_DIR))
{
    next if(-d $sub_folders);
    opendir SUB_DIR,"D:\\PERL\\perl_programes\\parent_directory\\$sub_folders";
    while(our $file = readdir(SUB_DIR))
    {
        next if($file !~ m/\.html/i);
        print_file_names($file);
    }
    close(FUNC_MODEL1);
}
close(FUNC_MODEL);
sub print_file_names()
{
    my $fh1 = FileHandle->new("D:\\PERL\\perl_programes\\parent_directory\\$file")
        or die "ERROR: $!"; #ERROR HERE
    print("$file\n");
}
Your posted code looks way overcomplicated. Check out File::Find::Rule and you could do most of that heavy lifting in very little code.
use File::Find::Rule;
my $finder = File::Find::Rule->new()->name(qr/\.html?$/i)->start("D:/PERL/perl_programes/parent_directory");
while( my $file = $finder->match() ){
    print "$file\n";
}
I mean isn't that sexy?!
A commenter pointed out that you may want only the entries at depth 2:
use File::Find::Rule;
my $finder = File::Find::Rule->new()->name(qr/\.html?$/i)->mindepth(2)->maxdepth(2)->start("D:/PERL/perl_programes/parent_directory");
while( my $file = $finder->match() ){
    print "$file\n";
}
The mindepth(2) and maxdepth(2) calls apply this restriction.
You're not extracting the supplied $file parameter in the print_file_names() function.
It should be:
sub print_file_names
{
    my $file = shift;
    ...
}
Your -d test in the outer loop looks wrong too, BTW. You're saying next if -d ..., which means it skips the inner loop for directories, which appears to be the complete opposite of what you require. The only reason it's working at all is that you're testing $sub_folders, which is only the name relative to the parent directory, and not the full path name.
Note also:
Perl on Windows copes fine with / as a path separator
Set your parent directory once, and then derive other paths from that
Use opendir($scalar, $path) instead of opendir(DIR, $path)
nb: untested code follows:
use strict;
use warnings;
use FileHandle;
my $parent = "D:/PERL/perl_programes/parent_directory";
my ($par_dir, $sub_dir);
opendir($par_dir, $parent) or die "ERROR: $!";
while (my $sub_folders = readdir($par_dir)) {
    next if ($sub_folders =~ /^\.\.?$/); # skip . and ..
    my $path = $parent . '/' . $sub_folders;
    next unless (-d $path); # skip anything that isn't a directory
    opendir($sub_dir, $path) or die "ERROR: $!";
    while (my $file = readdir($sub_dir)) {
        next unless $file =~ /\.html?$/i;
        my $full_path = $path . '/' . $file;
        print_file_names($full_path);
    }
    closedir($sub_dir);
}
closedir($par_dir);
sub print_file_names
{
    my $file = shift;
    my $fh1 = FileHandle->new($file)
        or die "ERROR: $!"; #ERROR HERE
    print("$file\n");
}
Please start putting:
use strict;
use warnings;
at the top of all your scripts, it will help you avoid problems like this and make your code much more readable.
You can read more about it here: Perlmonks
You are going to need to change the entire code to make it robust:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
my $top = $ENV{TEMP};
find( { wanted => \&wanted, no_chdir=> 1 }, $top );
sub wanted {
    return unless -f and /\.html$/i;
    print $_, "\n";
}
__END__
Have you considered using File::Find?
Here's one method that does not require File::Find:
First open the root directory, and store all the sub-folders' names in an array by using readdir;
Then use a foreach loop. For each sub-folder, open the new directory by joining the root directory and the folder's name, and again use readdir to store the file names in an array.
The last step is to write the code for processing the files inside this foreach loop.
Special thanks to my teacher who gave me this idea :) It really worked well!
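The steps in this answer can be sketched roughly as follows. The subroutine name html_files_below is hypothetical, and restricting the result to .html/.htm files is taken from the question rather than from this answer; the "processing" step is reduced to collecting full paths.

```perl
use strict;
use warnings;

# Collect .html/.htm files that sit exactly one level below $parent.
sub html_files_below {
    my ($parent) = @_;

    # Step 1: read the root directory and keep only real sub-folders.
    opendir my $par_dh, $parent or die "Cannot open $parent: $!";
    my @subdirs = grep { !/^\.\.?\z/ && -d "$parent/$_" } readdir $par_dh;
    closedir $par_dh;

    # Steps 2 and 3: open each sub-folder by joining root and name, then process its files.
    my @html;
    for my $sub (sort @subdirs) {
        my $path = "$parent/$sub";
        opendir my $sub_dh, $path or die "Cannot open $path: $!";
        push @html, map { "$path/$_" } sort grep { /\.html?\z/i } readdir $sub_dh;
        closedir $sub_dh;
    }
    return @html;
}

print "$_\n" for html_files_below('.');
```

Testing -d against "$parent/$_" rather than the bare name is what keeps this working regardless of the current working directory.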