Pass by reference within File Find function in perl - perl

I have read a few post similar to this, but they dont quite have the answer I want. Basically, I am struggling to do pass by reference with this File::find function. The original code is huge. So I take out pieces of the code.
Note that
$node,$ref_p, $ref_alphaSUM ,$ref_rank, $ref_sorted_alpha, $output_file
are all variables or references of global variables. I need to recursively traverse through the directories and apply the &eachFile subroutine to each file, which will update my variables stated above. The "&whichFiles" subroutine just sorts the filenames.
First problem: Since not all files in the directories contains the string nfcapd, i have added the line if(/^nfcapd./) to check the if the name of the file contains this string. But since I am also passing those variables above to eachFile, the "$_" cannot be used now.
Thus, I believe This is why I kept getting the following error:(please correct me if I am wrong):
Use of uninitialized value $File::Find::name in pattern match (m//) at
./pvalues.pl line 178, <PRE> line 65184 (#1)
Line 178 is the line where if(/^nfcapd./) appears in the code below.
This leads to 2nd problem: How to do pass by reference within find function and at the same time preserving a variable for the name of file such that I can still check if the "&eachFile" is being applied to the correct files?
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
find ({wanted => \&eachFile($node,$ref_p, $ref_alphaSUM ,$ref_rank, $ref_sorted_alpha, $output_file), preprocess => \&whichFiles},$flow_data_path);
sub eachFile {
my ($node , $ref_p, $ref_alphaSUM ,$ref_rank, $ref_sorted_alpha , $output_file) = #_;
if(/^nfcapd\./){
#do something
}
}

To pass variables to your wanted sub, you need a closure:
sub eachFile {
my ($node , $ref_p, $ref_alphaSUM ,$ref_rank, $ref_sorted_alpha , $output_file) = #_;
if(/^nfcapd\./){
#do something
}
}
my $wanted = sub { eachFile($node,$ref_p, $ref_alphaSUM ,$ref_rank, $ref_sorted_alpha, $output_file) };
find({wanted => $wanted, ...});

Related

Getting Variable "#xml_files" will not stay shared at ... line

I have the following Perl code:
sub merge_xml {
foreach my $repository ('repo1', 'repo2') {
my #xml_files;
sub match_xml {
my $filename = $File::Find::dir . '/' . $_;
if ($filename =~ m%/(main|test)\.xml$%) {
push(#xml_files, $filename);
}
}
find(\&match_xml, $repository);
print Dumper(\#xml_files);
}
}
And I am getting the warning:
Variable "#xml_files" will not stay shared at ./script/src/repair.pl line 87.
How does one fix this?
PS find as in File::Find
"Nested" named subs in fact aren't -- they're compiled as separate subroutines, and so having them written as "nested" can only be misleading.
Further, this creates a problem since the "inner" subroutine supposedly closes over the variable #xml_files that it uses, which gets redefined on each new call, being lexical. But the sub, built at compile-time and not being a lexical closure, only keeps the refernce to the value at first call and so it works right only upon the first call to the outer sub (merge_xml here).
We do get the warning though. With use diagnostics; (or see it in perldiag)
Variable "$x" will not stay shared at -e line 1 (#1)
(W closure) An inner (nested) named subroutine is referencing a
lexical variable defined in an outer named subroutine.
When the inner subroutine is called, it will see the value of
the outer subroutine's variable as it was before and during the first
call to the outer subroutine; in this case, after the first call to the
outer subroutine is complete, the inner and outer subroutines will no
longer share a common value for the variable. In other words, the
variable will no longer be shared.
This problem can usually be solved by making the inner subroutine
anonymous, using the sub {} syntax. When inner anonymous subs that
reference variables in outer subroutines are created, they
are automatically rebound to the current values of such variables.
So pull out that "inner" sub (match_xml) and use it normally from the "outer" one (merge_xml). In general you'd pass the reference to the array (#xml_files) to it; or, since in this case one cannot pass to File::Find's find, can have the array in such scope so to be seen as needed.
Or, since the purpose of match_xml is to be find's "wanted" function, can use an anonymous sub for that purpose so there is no need for a separate named sub
find( sub { ... }, #dirs );
Or store that coderef in a variable, as shown in Ed Heal's answer
my $wanted_coderef = sub { ... };
find( $wanted_coderef, #dirs );
With help from zdim I came up with:
sub merge_xml {
foreach my $repository ('repo1', 'repo2') {
my #xml_files;
my match_xml = sub {
my $filename = $File::Find::dir . '/' . $_;
if ($filename =~ m%/(main|test)\.xml$%) {
push(#xml_files, $filename);
}
};
find($match_xml, $repository);
print Dumper(\#xml_files);
}
}
Might I suggest another alternative. By using a factory function, you can eliminate the need to hand write a find subroutine each time.
A factory is a function that generates another function (or subroutine in this case). You feed it some parameters and it creates a custom subroutine with those parameters bolted in. My example uses a closure but you could also build it with a string eval if the closure is costly for some reason.
sub match_factory {
my ($filespec, $array) = #_;
# Validate arguments
die qq{File spec must be a regex, not "$filespec"\n}
unless ref $filespec eq "Regexp";
die qq{Second argument must be an array reference, not "$array"\n}
unless ref $array eq "ARRAY";
# Generate an anonymous sub to perform the match check
# that creates a closure on $filespec and $array
my $matcher = sub {
# Automatically compare against $_
m/$filespec/ and
push(#$array, $File::Find::name);
};
return $matcher;
}
sub merge_xml {
my #repos = #_ # or qw/foo bar .../;
foreach my $repository (#repos) {
my #matched_files;
find(
match_factory( qr/\/(main|test)\.xml$/, \#matched_files ),
$repository
);
print Dumper(\#matched_files);
}
}
HTH

Setting file handle attribute in a Class::Std class in Perl

I am trying to create a Perl class LogManager that is to be used to write logs to a file. For that I have written a Class::Std class with following specifications
package LogManager;
use strict;
use warnings;
use Class::Std;
use FileHandle;
my %fName : ATTR( :name<fName> :default<"log">);
my %fLocation : ATTR( :name<fLocation> :default<"./">);
my %fHandle : ATTR( :name<fHandle> :default<undef>);
sub START {
my ($self, $ident, $args_ref) = #_;
my $logfile = $self->get_fLocation(). '/' . $self->get_fName();
$self->set_fHandle(FileHandle->new("> $logfile"));
}
sub log {
my ($self, $msg) = #_;
print $self->get_fHandle(), $msg;
}
1;
What I know of Class::Std is that one can use get and set functions (eg. get_fName(), set_fName()) to read and write values of the attributes.
So, I have defined a START() function as above.
I have used this object as follows
my $fh = LogManager->new();
LogManager::log($fh, "This is my log");
However, instead of writing to the file, following is being printed on the terminal
FileHandle=GLOB(0x5dcdd0)This is my log
I could think of a reason for this. In the START() function third line - there is something else happening than what I intend of - (might be it is copying address of the file handle object - this is a guess).
I'm new to Perl, and can't think of a different way to set the file handle. What I could think of now is creating a normal class instead of a Class::Std class and write something like
my $fHandle = FileHandle->new("> $logfile");
But, what I wanted to know - is there a way I can achieve the same using Class::Std, by some modifications to the above code ?
The syntax for print to a file handle is different: there's no comma. Also, for complex structures, you need curly braces:
print { $self->get_fHandle } $msg;

Replacing a class in Perl ("overriding"/"extending" a class with same name)?

I am trying to Iterate directories in Perl, getting introspectable objects as result, mostly so I can print fields like mtime when I'm using Dumper on the returns from IO::All.
I have discovered, that it can be done, if in the module IO::All::File (for me, /usr/local/share/perl/5.10.1/IO/All/File.pm), I add the line field mtimef => undef;, and then modify its sub file so it runs $self->mtimef($self->mtime); (note, this field cannot have the same name (mtime) as the corresponding method/property, as those are dynamically assigned in IO::All). So, in essence, I'm not interested in "overloading", as in having the same name for multiple function signatures - I'd want to "replace" or "override" a class with its extended version (not sure how this is properly called), but under the same name; so all other classes that may use it, get on to using the extended version from that point on.
The best approach for me now would be, if I could somehow "replace" the IO::All::File class, from my actual "runnable" Perl script -- if somehow possible, by using the mechanisms for inheritance, so I can just add what is "extra". To show what I mean, here is an example:
use warnings;
use strict;
use Data::Dumper;
my #targetDirsToScan = ("./");
use IO::All -utf8 ; # Turn on utf8 for all io
# try to "replace" the IO::All::File class
{ # recursive inheritance!
package IO::All::File;
use IO::All::File -base;
# hacks work if directly in /usr/local/share/perl/5.10.1/IO/All/File.pm
field mtimef => undef; # hack
sub file {
my $self = shift;
bless $self, __PACKAGE__;
$self->name(shift) if #_;
$self->mtimef($self->mtime); # hack
return $self->_init;
}
1;
}
# main script start
my $io = io(#targetDirsToScan);
my #contents = $io->all(0); # Get all contents of dir
for my $contentry ( #contents ) {
print Dumper \%{*$contentry};
}
... which fails with "Recursive inheritance detected in package 'IO::All::Filesys' at /usr/local/share/perl/5.10.1/IO/All/Base.pm line 13."; if you comment out the "recursive inheritance" section, it all works.
I'm sort of clear on why this happens with this kind of syntax - however, is there a syntax, or a way, that can be used to "replace" a class with its extended version but of the same name, similar to how I've tried it above? Obviously, I want the same name, so that I wouldn't have to change anything in IO::All (or any other files in the package). Also, I would preferably do this in the "runner" Perl script (so that I can have everything in a single script file, and I don't have to maintain multiple files) - but if the only way possible is to have a separate .pm file, I'd like to know about it as well.
So, is there a technique I could use for something like this?
Well, I honestly have no idea what is going on, but I poked around with the code above, and it seems all that is required, is to remove the -base from the use IO::All::File statement; and the code otherwise seems to work as I expect it - that is, the package does get "overriden" - if you change this snippet in the code above:
# ...
{ # no more recursive inheritance!? IO::All::File gets overriden with this?!
package IO::All::File;
use IO::All::File; # -base; # just do not use `-base` here?!
# hacks work if directly in /usr/local/share/perl/5.10.1/IO/All/File.pm
field mtimef => undef; # hack
sub file {
my $self = shift;
bless $self, __PACKAGE__;
$self->name(shift) if #_;
$self->mtimef($self->mtime); # hack
print("!! *haxx0rz'd* file() reporting in\n");
return $self->_init;
}
1;
}
# ...
I found this so unbelievable, I even added the print() there to make sure it is the "overriden" function that runs, and sure enough, it is; this is what I get in output:
...
!! *haxx0rz'd* file() reporting in
$VAR1 = {
'_utf8' => 1,
'mtimef' => 1394828707,
'constructor' => sub { "DUMMY" },
'is_open' => 0,
'io_handle' => undef,
'name' => './test.blg',
'_encoding' => 'utf8',
'package' => 'IO::All'
};
...
... and sure enough,the field is there, as expected, too...
Well - I hope someone eventually puts a more qualified answer here; for the time being, I hope this is as good as a fix to my problems :) ...

reasoning: Can't modify non-lvalue subroutine

This code does work. But my question is this: If I uncomment the two commented lines and comment out the next three lines, I would get a Can't modify non-lvalue subroutine and I would like to know why? I would save a variable and ride a ..., if I could use the commented lines.
Next question how would I make this more object oriented?
open FILE, "FBIDs" or die $!;
while (<FILE>) {
#csv = split /,/;
}
for (my $i=0;$i<$#csv;$i++) {
my $browser = LWP::UserAgent->new( );
my $url = "https://graph.facebook.com/$csv[$i]?fields=id,name\n";
my $response = $browser->get($url);
# $response->content=~s/[{}\"]//g;
# my #json = split (/[,:]/,$response->content);
my $resp=$response->content;
$resp=~s/[{}\"]//g;
my #json = split (/[,:]/,$resp);
print $json[1],", ",$json[3],"\n";
$browser->delete( );
}
close FILE;
Perl realizes you're trying to do something useless — modifying a value that's not stored anywhere — so it throws an error. Remember that $response->content is a method call (something that returns a value), not variable (storage aka lvalue).
$response->content or $response->content() is method call and you can't make substitution or change it.
On the other hand some perl functions can be treated in such way, and they are called lvalue subroutines.

dynamic call to subroutines in perl

I'm a bit messed up with the following:
I have a function that calls subroutines in the following way:
sub someFunction {
my $self = shift;
my $type = $self->{'type'};
if($type eq 'one_subroutine') {
$self->updateOneSubroutine();
}
elsif($type eq 'another_one_sub') {
$self->updateAnotherOneSub();
}
(...)
else {
die "Unsupported '$type'";
}
I have to change this to let each subroutine be coded in its own file, include all available files, and automagically call the subroutine inside.
I did this in a test file with the following code:
# Assume a routines subdir with one_subroutine.pm file with
sub updateOneSubroutine(){
$self = shift;
$self->doSomeThings();
(...) #my code
}
1;
test.pl
# Saves in routines hash_ref a pair of file_name => subRoutineName for each file in routines subdir.
# This will be used later to call subroutine.
opendir(DIR,"lib/routines") or die "routines directory not found";
for my $filename (readdir(DIR)) {
if($filename=~m/\.pm$/){
# includes file
require "lib/routines/$filename";
# get rid of file extension
$filename=~s/(.*)\.pm/$1/g;
my $subroutine = "update_${file}";
# camelizes the subroutine name
$subroutine=~s/_([a-z0-9])/\u$1/g;
$routine->{ $filename } = $subroutine;
}
}
{
no strict "refs";
$routine->{$param}();
}
where param is something like "one_subroutine", that matches with a filename available.
Since each subroutine receives $self in the call, I should call the routine by $self->something();
I've tried $self->$routine->{$param}() , $self->${routine->${param}}() and many other things without success. I've checked chapter 9 "dynamic subroutines" of mastering perl, and a similar question to perl monks, but I can't still figure out how to reference the subroutine in a way that represents $self->updateAnotherOneSub() , or something similar that lets $self be read as a param in those subroutines.
Thanks in advance, Keber.
This seems a bit like an X/Y problem. What exactly are you trying to do? If it is to reduce loading time, then modules like AutoSplit/AutoLoader might be of interest to you.
If it is to create some sort of data structure of subroutines, you should be installing anonymous subs into a hash, rather than giving them all names.
Given a subroutine reference:
my $code = sub {...};
you would call it as:
$self->$code(...);
If instead you have a subroutine name, you can lookup the coderef:
my $code = 'Package::With::The::Subroutines'->can('method_name');
and if that succeeds (check it), then you can use $self->$code(...) to call it.
Given this code:
{
no strict "refs";
$routine->{$param}();
}
You would pass $self to the routine with:
{
no strict "refs";
$routine->{$param}($self);
}
Or you could approach it the way I did above with can:
'package'->can($routine->{$param})->($self)
if you don't want to turn off strict 'refs'
Try to extract the method name first, then it should work. I did a small test script that may do something like you want to, so:
my $method = $routine->{$param};
$self->$method->();
You can and of course should check, if the desired method exists like Eric said:
if ($self->can($method)) {
$self->$method->();
}
The important part here is, that you extract the method name so you have it in a single variable; otherwise perl won't figure that out for you - and as far as I know there is no way of setting parens or braces to do so.