Perl - passing an array to subroutine - perl

I'm in the process of learning Perl and am trying to write a script that takes a pattern and a list of files as command-line arguments and passes them to a subroutine. The subroutine then opens each file and prints the lines that match the pattern. The code below works; however, it stops after printing the lines from the first file and doesn't even touch the second file. What am I missing here?
#!/usr/bin/perl
use strict;
use warnings;
sub grep_file
{
my $pattern = shift;
my @files = shift;
foreach my $doc (@files)
{
open FILE, $doc;
while (my $line = <FILE>)
{
if ($line =~ m/$pattern/)
{
print $line;
}
}
}
}
grep_file @ARGV;

shift removes and returns the first element of an array; inside a subroutine it works on @_ by default (see: http://perldoc.perl.org/functions/shift.html).
So @files can only contain one value.
Try
sub foo
{
my $one = shift @_;
my @files = @_;
print $one."\n";
print @files;
}
foo(@ARGV);
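Applied to the original grep_file, the fix might look like the sketch below. It keeps the asker's subroutine but takes the pattern with shift and the remaining arguments as the file list. The lexical filehandle, the three-argument open, the warn-and-skip behaviour for unreadable files and the returned match count are my own additions, not part of the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print every line of the given files that matches $pattern.
sub grep_file {
    my $pattern = shift;    # first argument: the pattern
    my @files   = @_;       # everything that is left: the file names
    my $matches = 0;

    foreach my $doc (@files) {
        my $fh;
        # Assumption: an unreadable file is skipped with a warning.
        unless ( open $fh, '<', $doc ) {
            warn "Cannot open $doc: $!\n";
            next;
        }
        while ( my $line = <$fh> ) {
            if ( $line =~ m/$pattern/ ) {
                print $line;
                $matches++;
            }
        }
        close $fh;
    }
    return $matches;        # number of matching lines printed
}
```

Calling it as grep_file(@ARGV); at the end of the script passes the pattern and every file name given on the command line.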

There is little reason to use a subroutine here. You are just putting the whole program inside a function and then calling it.
The empty <> operator will read from all the files in @ARGV in sequence, without you having to open them explicitly.
I would code your program like this
use strict;
use warnings;
my $pattern = shift;
$pattern = qr/$pattern/; # Compile the regex
while (<>) {
print if $_ =~ $pattern;
}

Related

Parsing string in multiline data with positive lookbehind

I am trying to parse data like:
header1
-------
var1 0
var2 5
var3 9
var6 1
header2
-------
var1 -3
var3 5
var5 0
Now I want to get e.g. var3 for header2. What's the best way to do this?
So far I was parsing my files line-by-line via
open(FILE,"< $file");
while (my $line = <FILE>){
# do stuff
}
but I guess it's not possible to handle multiline parsing properly.
Now I am thinking to parse the file at once but wasn't successful so far...
my @Input;
open(FILE,"< $file");
while (<FILE>){ @Input = <FILE>; }
if (@Input =~ /header2/){
#...
}
The easier way to handle this is "paragraph mode".
local $/ = "";
while (<>) {
my ($header, $body) = /^([^\n]*)\n-+\n(.*)/s
or die("Bad data");
my @data = map [ split ], split /\n/, $body;
# ... Do something with $header and @data ...
}
The same can be achieved without messing with $/ as follows:
my @buf;
while (1) {
my $line = <>;
$line =~ s/\s+\z// if defined($line);
if (!length($line)) {
if (@buf) {
my $header = shift(@buf);
shift(@buf);
my @data = map [ split ], splice(@buf);
# ... Do something with $header and @data ...
}
last if !defined($line);
next;
}
push @buf, $line;
}
(In fact, the second snippet includes a couple of small improvements over the first.)
Quick comments on your attempt:
The while loop is useless because @Input = <FILE> places the remaining lines of the file in @Input, so the body runs only once, and the first line, already consumed by the loop condition, is lost.
@Input =~ /header2/ matches header2 against the stringification of the array, which is the stringification of the number of elements in @Input. If you want to check whether an element of @Input contains header2, you will need to loop over the elements of @Input and check them individually.
while (<FILE>){ @Input = <FILE>; }
This doesn't make much sense. "While you can read a record from FILE, read all of the data on FILE into @Input". I think what you actually want is just:
my @Input = <FILE>;
if (@Input =~ /header2/){
This is quite strange too. The binding operator (=~) expects scalar operands, so it evaluates both operands in scalar context. That means @Input will be evaluated as the number of elements in @Input. That's an integer and will never match "header2".
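To test whether any element matches, grep is the usual tool. A minimal sketch, using a hard-coded stand-in for the lines read from the file (the sample values are mine, modelled on the question's data):

```perl
use strict;
use warnings;

# Stand-in data for the lines read from the file.
my @Input = ( "header1\n", "var1 0\n", "header2\n", "var3 5\n" );

# grep in scalar context returns how many elements matched.
my $hits = grep { /header2/ } @Input;
print "header2 found\n" if $hits;

# By contrast, the array in scalar context is just its element count.
my $count = @Input;
print "scalar(\@Input) is $count\n";
```

Here $hits is 1 and $count is 4, which shows why matching against the array itself can never find "header2".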
A couple of approaches. Firstly a regex approach.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $file = 'file';
open my $fh, '<', $file or die $!;
my $data = join '', <$fh>;
if ($data =~ /header2.+var3 (.+?)\n/s) {
say $1;
} else {
say 'Not found';
}
The key to this is the /s on the m// operator. Without it, the two dots in the regex won't match newlines.
The other approach is more of a line by line parser.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $file = 'file';
open my $fh, '<', $file or die $!;
my $section = '';
while (<$fh>) {
chomp;
# if the line is all word characters,
# then we've got a section header.
if ($_ !~ /\W/) {
$section = $_;
next;
}
my ($key, $val) = split;
if ($section eq 'header2' and $key eq 'var3') {
say $val;
last;
}
}
We read the file a line at a time and make a note of the section headers. For data lines, we split on whitespace and check to see if we're in the right section and have the right key.
In both cases, I've switched to using a more standard approach (lexical filehandles, 3-arg open(), or die $!) for opening the file.
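If more than one value is needed, the line-by-line idea extends naturally to a hash of hashes. The sketch below is one possible shape, not taken from any of the answers: parse_sections is a hypothetical helper, the sample data is an abbreviated copy of the question's, and it assumes every data line has exactly a key and a value.

```perl
use strict;
use warnings;

# Parse "header / dashes / key value" blocks into $data->{header}{key} = value.
sub parse_sections {
    my @lines = @_;
    my ( %data, $section );
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^-*\s*$/;    # skip underlines and blank lines
        my ( $key, $val ) = split ' ', $line;
        if ( defined $val ) {
            $data{$section}{$key} = $val if defined $section;
        }
        else {
            $section = $key;           # a lone word starts a new section
        }
    }
    return \%data;
}

# split /^/ breaks the here-document into lines, keeping the newlines.
my $data = parse_sections( split /^/, <<'END' );
header1
-------
var1 0
var3 9
header2
-------
var1 -3
var3 5
END

print $data->{header2}{var3}, "\n";    # prints 5
```

Once the structure is built, any header/key pair can be looked up without rereading the file.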

How to regex and get file and directory path

My array (@array) contains the directory and file paths below.
/home/testuser/mysql/data/userdata/pushdir/
/home/testuser/mysql/data/userdata/pushdir/test1.sql
/home/testuser/mysql/data/userdata/nextdir/testdir/
/home/testuser/mysql/data/userdata/pushdir/testdir/test2.sql
/home/testuser/mysql/data/userdata/ - every path in the list above starts with this constant prefix.
I want to process the files in another loop, so I am looking for output containing only the file names relative to that prefix, like "pushdir/test1.sql" and "pushdir/testdir/test2.sql".
I am using the code below, but I am not getting the expected output. Please share your ideas for a regex that produces it.
foreach $dir(@array)
{
chomp $dir;
print "$dir\n";
@files = <$dir/*>;
my @names=join("\n", sort(@files));
print @names,"\n";
}
foreach my $filepath (@names) {
(my $volume,my $dirs, my $filelist) = File::Spec->splitpath(+$filepath );
print "$filelist\n";
}
@names is declared with my, and is therefore scoped to the foreach $dir loop only. There's no @names array to iterate over in the second foreach loop. Moreover, join returns a single string; you probably don't want that string stored in the array, you want the individual file names.
Use strict (it will tell you there's no @names declared) and warnings. Indent code blocks properly to see which commands belong where.
#!/usr/bin/perl
use warnings;
use strict;
use File::Spec;
my @array = qw( home/testuser/mysql/data/userdata/pushdir/
home/testuser/mysql/data/userdata/pushdir/test1.sql
home/testuser/mysql/data/userdata/nextdir/testdir/
home/testuser/mysql/data/userdata/pushdir/testdir/test2.sql );
my @names;
for my $dir (@array) {
print "DIR: $dir\n";
push @names, sort glob "$dir/*";
print "NAMES: @names\n";
}
for my $filepath (@names) {
my ($volume, $dirs, $filelist) = 'File::Spec'->splitpath($filepath);
print "FL: $filelist\n";
}
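For the output the question actually asks for (paths relative to the constant prefix), File::Spec->abs2rel may be enough on its own. This is a sketch under two assumptions of mine: entries ending in a slash are directories and can be dropped, and the paths are Unix-style. $base is a variable I introduced for the prefix.

```perl
use strict;
use warnings;
use File::Spec;

my $base  = '/home/testuser/mysql/data/userdata';
my @array = qw(
    /home/testuser/mysql/data/userdata/pushdir/
    /home/testuser/mysql/data/userdata/pushdir/test1.sql
    /home/testuser/mysql/data/userdata/nextdir/testdir/
    /home/testuser/mysql/data/userdata/pushdir/testdir/test2.sql
);

# Keep only entries that do not end in a slash (assumed to be files),
# then express each one relative to $base.
my @relative =
    map  { File::Spec->abs2rel( $_, $base ) }
    grep { !m{/\z} } @array;

print "$_\n" for @relative;
```

On a Unix-like system this prints pushdir/test1.sql and pushdir/testdir/test2.sql, with no regex needed.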

Unable to get absolute path of a file from $File::Find::name - perl

I am unable to get the absolute path of a file from $File::Find::name. It shows an undef value as output, and I can't figure out why. Can anyone please help me with this?
Error displayed is : Use of uninitialized value $file_name in concatenation
My Code :
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Copy;
use Cwd;
use Data::Printer;
my $rootPATH = $ARGV[0];
my $id = $ARGV[1];
my @Arraypath;
my $file_name;
our $anr_name;
opendir( my $DIR, $rootPATH );
while ( my $entry = readdir $DIR ) {
next unless -d $rootPATH . '/' . $entry;
next if $entry eq '.' or $entry eq '..';
#print "Found directory $entry\n";
push( @Arraypath, ( split( "\n", $entry ) ) );
}
closedir $DIR;
my $count = 0;
foreach my $line (@Arraypath) {
my $fulllogpath = $rootPATH . "\\" . $line;
#print "$fulllogpath\n";
$count++;
start($fulllogpath);
}
sub start {
my $fulllogpath = shift;
our @content;
#print "$fulllogpath\n\n";
find( \&wanted, $fulllogpath );
sub wanted {
push @content, $_;
return;
}
foreach my $file (@content) {
# print "$file\n\n";
if ( $file =~ /traces[_d]*/ ) {
print "$file\n\n";
$file_name = $File::Find::name;
p $file_name;
print "$file_name\n";
}
}
}
Your program is very poorly laid out. It will be much simpler to write and debug code if you indent it properly and use carefully chosen identifiers: a name like start for a subroutine is useless.
You also have unnecessary subroutine declarations which break up the program flow and make it awkward to follow.
Why do you have a couple of package variables (declared with our)? There is generally no need for them, and it is best to use lexical variables throughout, declared at an appropriate place so that all code has access to them if it needs it.
It is also preferable to use File::Spec to work with file paths, rather than manipulate them using string operators, with which it is easy to make a mistake.
The best way to manage the results of find is to work with absolute paths all the way through. It looks like you want to do more than just print the results returned by find since you load modules like Cwd and File::Copy, but without knowing what that further purpose is I cannot help you to write it.
This code removes all the subroutines and makes everything much more concise.
use strict;
use warnings;
use autodie;
use File::Find 'find';
use File::Spec;
use Data::Printer;
my ($root_path, $id) = @ARGV;
opendir my ($dh), $root_path;
my @dir_list =
grep -d,
map File::Spec->catfile($root_path, $_),
grep { not /\A\.\.?\z/ } readdir $dh;
closedir $dh;
my $count;
for my $dir (@dir_list) {
++$count;
find(sub {
return unless /traces[_d]*/;
my $file = $_;
print "$file\n\n";
my $file_name = $File::Find::name;
p $file_name;
print "$file_name\n";
}, $dir);
}
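$File::Find::name is valid only within the wanted function, not outside of it. Your code reads it in a separate loop after find has returned, by which point it is undefined; hence the "uninitialized value" warning.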
However, I would recommend making the shift to using Path::Class and Path::Class::Rule for some simpler processing of your files in a cross platform compatible way:
use strict;
use warnings;
use Data::Printer;
use Path::Class;
use Path::Class::Rule;
my ( $root_path, $id ) = @ARGV;
my $dir = dir($root_path);
my $next = Path::Class::Rule->new->file->name(qr{traces[_d]*})->iter(
grep { $_->is_dir() } $dir->children
);
while ( my $file = $next->() ) {
# Accomplishes the same as your script. I suspect these print statements are mostly for debugging though.
print $file->basename(), "\n\n";
p "$file";
print "$file\n";
}
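Staying with plain File::Find, the full path has to be recorded inside the callback, because that is the only place $File::Find::name holds a value. A minimal sketch; find_matching is a hypothetical helper name, and restricting the result to plain files is my assumption:

```perl
use strict;
use warnings;
use File::Find;

# Return the full path of every plain file under $top whose name matches $re.
sub find_matching {
    my ( $top, $re ) = @_;
    my @paths;
    find(
        sub {
            # $_ is the bare file name; $File::Find::name is the full path,
            # and it only holds a value while this callback is running.
            push @paths, $File::Find::name if -f $_ && $_ =~ $re;
        },
        $top
    );
    return @paths;
}
```

A call such as my @traces = find_matching( $dir, qr/traces/ ); then yields paths that remain usable after find has returned.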

how to compare the array values with a different file in a different directory?

#!/usr/bin/perl
use strict;
use warnings;
use warnings;
use 5.010;
my @names = ("RD", "HD", "MP");
my $flag = 0;
my $filename = 'Sample.txt';
if (open(my $fh, '<', $filename))
{
while (my $row = <$fh>)
{
foreach my $i (0 .. $#names)
{
if( scalar $row =~ / \G (.*?) ($names[$i]) /xg )
{
$flag=1;
}
}
}
if( $flag ==1)
{
say $filename;
}
$flag=0;
}
Here I read the content of one file and compare it with the array values; if the file content matches an array value, I display the file name. In the same way, how can I access a different file in a different directory and compare the array values with it?
Q: How can I access a different file?
A: By specifying a different filename.
By the way: If you are using flags for loop control in Perl, you are doing something wrong. You can specify that this was the last iteration of the loop (in C: break), or that you want to start the next iteration. You can label the loops so that you can break out of as many loops as you like at once:
#!/usr/bin/perl
use 5.010; use warnings;
my @names = qw(RD HD MP);
# unpack command line arguments
my ($filename) = @ARGV;
open my $fh, "<", $filename or die "Oh noes, $filename is bad: $!";
LINE:
while (my $line = <$fh>) {
NAME:
foreach my $name (@names) {
if ($line =~ /\Q$name\E/) { # \QUOT\E the $name to escape everything
say "$filename contains $name";
last LINE;
}
}
}
Other highlights:
using a foreach loop as intended and
removing the (in this context) senseless \G assertion
You can then execute the script as perl script.pl Sample.txt or perl script.pl ../another.dir/foo.bar or whatever.
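To check several files in different directories in one run, the same logic can be wrapped in a subroutine and called once per command-line argument. This is a sketch rather than part of the answer above: first_match is a hypothetical helper, and silently skipping arguments that are not plain files is my assumption.

```perl
use strict;
use warnings;

# Return the first name from @names found in $file, or undef if none is.
sub first_match {
    my ( $file, @names ) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while ( my $line = <$fh> ) {
        for my $name (@names) {
            # Returning here leaves both loops at once.
            return $name if $line =~ /\Q$name\E/;
        }
    }
    return;
}

my @names = qw(RD HD MP);
for my $file (@ARGV) {
    next unless -f $file;    # assumption: ignore anything that is not a plain file
    my $hit = first_match( $file, @names );
    print "$file contains $hit\n" if defined $hit;
}
```

It could then be run as perl script.pl Sample.txt ../another.dir/foo.bar, with as many paths as needed.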
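You can use the ~~ (smartmatch) operator, available since Perl 5.10. Note that $row ~~ @names is true only when the whole row equals one of the names, and that smartmatch has been flagged as experimental since Perl 5.18, where it emits a warning.
Don't forget to chomp the trailing newline.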
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
my @names = ('RD', 'HD', 'MP');
my $other_dir = '/tmp';
my $filename = 'Sample.txt';
if ( open( my $fh, '<', "$other_dir/$filename" ) ) {
ROW:
while ( my $row = <$fh> ) {
chomp $row; # remove trailing \n
if ( $row ~~ @names ) {
say $filename;
last ROW;
}
}
close $fh; # $fh is only visible inside this block
}
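A hash lookup gives the same whole-line membership test without relying on ~~. A minimal sketch with made-up sample rows:

```perl
use strict;
use warnings;

my @names = ( 'RD', 'HD', 'MP' );

# Build a lookup table once; exists then answers "is this one of the names?"
my %is_name = map { $_ => 1 } @names;

for my $row ( "HD\n", "HDX\n", "xyz\n" ) {
    my $copy = $row;    # copy first: the literals in the list are read-only
    chomp $copy;
    print "$copy is in the list\n" if exists $is_name{$copy};
}
```

Only the first row is reported, since HDX and xyz are not exact members of @names.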

change the directory and grab the xml file to parse certain data in perl

I am trying to parse a specific XML file that is located in subdirectories of one directory. For some reason I am getting an error saying the file does not exist. If the file does not exist, it should move on to the next subdirectory.
HERE IS MY CODE
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
my @xmlsearch = map { chomp; $_ } `ls`;
foreach my $directory (@xmlsearch) {
print "$directory \n";
chdir($directory) or die "Couldn't change to [$directory]: $!";
my @findResults = `find -name education.xml`;
foreach my $educationresults (@findResults){
print $educationresults;
my $parser = new XML::Simple;
my $data = $parser->XMLin($educationresults);
print Dumper($data);
chdir('..');
}
}
ERROR
music/gitar/education.xml
File does not exist: ./music/gitar/education.xml
Using chdir the way you did makes the code IMO less readable. You can use File::Find for that:
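use strict;
use warnings;
use autodie;
use feature 'say'; # needed for say below
use File::Find;
use XML::Simple;
use Data::Dumper;
sub findxml {
my @found;
opendir(DIR, '.');
# subdirectories of the current directory whose names contain no dot
my @where = grep { -d && m#^[^.]+$# } readdir(DIR);
closedir(DIR);
File::Find::find({wanted => sub {
# plain -f instead of -f _: a prior stat is only guaranteed with follow
push @found, $File::Find::name if m#^education\.xml$#s && -f;
} }, @where);
return @found;
}
foreach my $xml (findxml()){
say $xml;
print Dumper XMLin($xml);
}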
Whenever you find yourself relying on backticks to execute shell commands, you should consider whether there is a proper perl way to do it. In this case, there is.
ls can be replaced with <*>, which is a simple glob. The line:
my @array = map { chomp; $_ } `ls`;
Is just a roundabout way of saying
chomp(my @array = `ls`); # chomp takes list arguments as well
But of course the proper way is
my @array = <*>; # no chomp required
Now, the simple solution to all of this is simply to do
for my $xml (<*/education.xml>) { # find the xml files one directory level down
Which will cover one level of directories, with no recursion. For full recursion, use File::Find:
use strict;
use warnings;
use File::Find;
my @list;
find( sub { push @list, $File::Find::name if /^education\.xml$/i; }, ".");
for (@list) {
# do stuff
# @list contains full path names of education.xml files found in subdirs
# e.g. ./music/gitar/education.xml
}
You should note that changing directories is not required, and in my experience, not worth the trouble. Instead of doing:
chdir($somedir);
my $data = XMLin($somefile);
chdir("..");
Simply do:
my $data = XMLin("$somedir/$somefile");
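When the directory and the file name come from separate variables, File::Spec->catfile is one portable way to build the combined path instead of interpolating a slash by hand. A minimal sketch; candidate_paths and the directory names are made up for illustration:

```perl
use strict;
use warnings;
use File::Spec;

# Build "dir/file" paths without changing the working directory.
sub candidate_paths {
    my ( $file, @dirs ) = @_;
    return map { File::Spec->catfile( $_, $file ) } @dirs;
}

my @paths = candidate_paths( 'education.xml', 'music', 'sports' );
print "$_\n" for @paths;
```

Filtering the result with grep { -e } @paths would keep only the files that actually exist, which gives the "move on to the next subdirectory" behaviour without any chdir.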