How do I insert new fields into $self in Perl, from a File::Find callback - perl

In a Perl object, I'm trying to add a new field into $self from within a File::Find wanted() sub.
use File::Find;
sub _searchForXMLDocument {
my ($self) = #_;
if($_ =~ /[.]+\.xml/) {
$self->{_xmlDocumentPath} = $_;
}
}
sub runIt{
my ($self) = #_;
find (\&_searchForXMLDocument, $self->{_path});
print $self->{_xmlDocumentPath};
}
_searchForXMLDocument() searches for an XML Document within $self->{_path} and is supposed to append that XML path to $self->{_xmlDocumentPath} but when I try to print it, it remains uninitialized. How do I add the field in $self?
Use of uninitialized value in print at /home/scott/workspace/CCGet/XMLProcessor.pm line 51.

You aren't calling _searchForXMLDocument() in an OO manner, so your $self object isn't being passed to it. This should do the trick now. Use a closure for your method and you have access to $self;
sub runIt{
my ($self) = #_;
my $closure = sub {
if($_ !~ m/[.]+\.xml/) {
$self->{_xmlDocumentPath} = $_;
}
};
find(\&$closure, $self->{_path});
print $self->{_xmlDocumentPath};
}

The first argument to find() needs to carry two pieces of information: the test condition, and the object you're working with. The way to do this is with a closure. The sub { ... } creates a code ref, like you get from \&_searchForXMLDocument, but the closure has access to lexical variables in the enclosing scope, so the current object ($self) is associated with the closure.
sub _searchForXMLDocument {
my ($self) = #_;
if($_ =~ /[.]+\.xml/) {
$self->{_xmlDocumentPath} = $_;
}
}
sub runIt{
my ($self) = #_;
find (sub { $self->_searchForXMLDocument (#_) }, $self->{_path});
print $self->{_xmlDocumentPath};
}

I think you're looking for something like this:
package XMLDocThing;
use strict;
use warnings;
use English qw<$EVAL_ERROR>;
use File::Find qw<find>;
...
use constant MY_BREAK = do { \my $v = 133; };
sub find_XML_document {
my $self = shift;
eval {
find( sub {
return unless m/[.]+\.xml/;
$self->{_xmlDocumentPath} = $_;
die MY_BREAK;
}
, $self->{_path}
);
};
if ( my $error = $EVAL_ERROR ) {
die Carp::longmess( $EVAL_ERROR ) unless $error == MY_BREAK;
}
}
...
# meanwhile, in some other package...
$xmldocthing->find_XML_document;
You pass a closure to find and it can access $self from the containing scope. File::Find::find has no capacity to pass in baggage like objects.

Related

Link a variable to a class attribute in Perl

This question was born out of another (Completely destroy all traces of an object in Perl). After seeing some of the comments I believe I have narrowed the problem down to the "real" issue.
I'm looking for a simple way to link a variable to a class attribute in Perl so that whenever the attribute is modified, the variable will be automatically updated.
ex (some pseudo code):
# Create a file object
my $file = File->new();
# Get the text
my $text = $file->text();
# prints 'hello'
print $text;
# Set the text
$file->text('goodbye');
# prints 'goodbye'
print $text;
Also I want the $text variable to be read only so that you cannot inadvertently modify the text attribute of the file.
Use tie:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
{ package File;
sub new {
bless ['hello'], shift
}
sub text {
my $self = shift;
if (#_) {
$self->[0] = shift;
} else {
return $self->[0]
}
}
}
{ package FileVar;
use Tie::Scalar;
use parent qw( -norequire Tie::StdScalar );
sub TIESCALAR {
my ($class, $obj) = #_;
bless \$obj, $class
}
sub FETCH {
my $self = shift;
${$self}->text()
}
sub STORE {
die 'Read only!';
# Or, do you want to change the object by changing the var, too?
my ($self, $value) = #_;
${$self}->text($value);
}
}
my $file = 'File'->new();
tie my $text, 'FileVar', $file;
say $text;
$file->text('goodbye');
say $text;
# Die or change the object:
$text = 'Magic!';
say $file->text;

Overload object operation in Perl

I want to use overloaded operators in a method which modifies an object. I also want to achieve it without duplicating the code.
To illustrate the problem, I will show a simplified version of what I am trying to do. In my original code, the add method overloads + and complicated_calculation method tries to update the object.
The add method creates a new Number object to avoid an expression like $n + 1 modifying the object.
package Number;
use overload
'0+' => 'get_value',
'+' => 'add';
sub new {
my ($class, $value) = #_;
my $self->{value} = $value;
return bless($self, $class);
}
sub get_value {
my ($self) = #_;
return $self->{value};
}
sub set_value {
my ($self, $value) = #_;
$self->{value} = $value;
}
# Actual class has more attributes and the logic of addition includes branches.
sub add {
my ($self, $other) = #_;
print "add $other\n";
return Number->new($self->get_value + $other);
}
sub complicated_calculation {
my ($self) = #_;
# Do something complicated.
$self += 10;
}
package main;
my $n = Number->new(1);
print $n + 1 . "\n";
$n++;
print $n . "\n";
$n->complicated_calculation;
print $n . "\n";
Will output
add 1
2
add 1
2
add 10
2
I want the result of complicated_calculation method (12) to be printed, but 2 is printed instead. The result of the complicated_calculation method is set to an object created by the add method, instead of to the object which called it.
I can make the complicated_calculation method update the object using an add_in_place method to add a number in-place, but this requires duplicated code in add and add_in_place which I was taught to avoid.
In the actual application the Number class will have many more attributes, and the code for addition will be much longer.
package Number;
use overload
'0+' => 'get_value',
'+' => 'add',
'+=' => 'add_in_place',
'fallback' => 1;
sub new {
my ($class, $value) = #_;
my $self->{value} = $value;
return bless($self, $class);
}
sub get_value {
my ($self) = #_;
return $self->{value};
}
sub set_value {
my ($self, $value) = #_;
$self->{value} = $value;
}
# Actual class has more attributes and the logic of addition includes branches.
sub add {
my ($self, $other) = #_;
print "add $other\n";
return Number->new($self->get_value + $other);
}
sub add_in_place {
my ($self, $other) = #_;
print "add_in_place $other\n";
$self->set_value($self->get_value + $other);
}
sub complicated_calculation {
my ($self) = #_;
# Do something complicated.
$self += 10;
}
package main;
my $n = Number->new(1);
print $n + 1 . "\n";
$n++;
print $n . "\n";
$n->complicated_calculation;
print $n . "\n";
Will output
add 1
2
add_in_place 1
2
add_in_place 10
12
I feel that there should be a better way and would like to have some advice from you guys.
First of all, you must always use strict and use warnings at the top of every Perl program file you write. This applies especially when you are asking for help with your code, as it is the first line of defence against bugs and really should be your first resort before troubling others.
This is happening because the add method is called to implement the += operator, which returns a new Number object as a result. That results in the value of $self within complicated_calculation being changed to refer to the new Number object that, correctly, has a value of 12. But the original value -- $n in the main code -- still points to an object with the value of 2.
To get it to work, you could arrange that complicated_calculation returns the new object, and the calling code assigns it to $n. Just changing that statement to
$n = $n->complicated_calculation
will get it working.
However, it is a little strange to write stuff like that as a method. The code in the Number class should be focused on making the object behave correctly, so all the methods should be operators. If you were writing complicated_calculation as a subroutine in the main package then you would be fine with
$n += 10;
print $n;
as the copying of $n would then work correctly and transparently. It is only when you are writing a method that reassigning $self makes no sense, because it then no longer refers to the object the calling code is using.
If you really consider complicated_calculation to be an operator, then it should mutate the object in-place rather than relying on overload to provide the mechanism. If you changed it to
sub complicated_calculation {
my ($self) = #_;
$self->{value} += 10;
}
then everything would work as it should.
Update
I strongly believe that you should write everything in terms of add_in_place, which should be a private method for use only internally by the class.
Both add and complicated_calculation can be very simply rewritten, and there is no longer any need to write $n = $n->complicated_calculation as the method modifies the object in-place.
This example code for the module demonstrates.
package Number;
use strict;
use warnings;
use 5.010;
use overload
'0+' => 'get_value',
'+' => 'add';
sub new {
my ($class, $value) = #_;
bless { value => $value };
}
sub get_value {
my ($self) = #_;
$self->{value};
}
sub set_value {
my ($self, $value) = #_;
$self->{value} = $value;
}
sub add {
my ($self, $other) = #_;
print "add $other\n";
Number->new($self->get_value)->add_in_place($other);
}
sub add_in_place {
my ($self, $other) = #_;
print "add_in_place $other\n";
$self->{value} += $other;
$self;
}
sub complicated_calculation {
my ($self) = #_;
$self->add_in_place(10);
}

How can I do function partial application in Perl?

Is there any way to achieve partial application in Perl?
Suppose, I want to do something like:
sub each_file($arr, $op) {
$op->($_) for #{$arr};
...
}
sub each_line($op, $file) {
...
}
each_file($arr, each_line($op));
I want to partially apply each_line() to only $op, so it'll become a new function can be passed to $each_file, how do I express this in idiomatic Perl?
You can do this in Perl with two approaches combined:
A function which returns a function reference
Closures
Example:
sub each_file {
my ($arr, $line_fn) = #_;
$line_fn->($_) for #{$arr};
...
}
sub each_line {
my ($op, $file) = #_;
...
}
sub make_line_processor {
my ( $op ) = #_;
# This is closed over $op, which effectively becomes
# a constant for the returned function
my $fn = sub {
return each_line( $op, #_ );
};
return $fn;
}
# To call it:
each_file( $arr, make_line_processor($op) );
This can be an even more useful technique in cases where you don't want $op directly, but some expensive-to-fetch derivation of it. In which case you would calculate the derived value just once (in the make_line_processor function) and close over that instead.
# given some $op as implied by your code snippet
each_file($arr, sub { each_line($op, shift) });
# shift op will be applied when anonymous sub { … } is called
(Your code snippet doesn't make it entirely clear what you intend $op to be when you make the call to each_line. It's usually better to present small working programs.)
You can roll this functionality up into a class. Then you can overload the subroutine dereference operator to make it look like your class is really a code reference.
package Partial;
use overload '&{}' => \&call;
sub new {
my $class = shift;
my $code = shift;
bless {code => $code, args => \#_}, $class;
}
sub call {
my ($self) = #_;
return sub{ $self->{code}->(#{$self->{args}}, #_) }
}
You can then use it like this:
sub printArgs {
print join ", ", #_;
print "\n";
}
my $partial = Partial->new(\&printArgs, 'foo', 'bar');
$partial->('baz', 'bat');
# prints foo, bar, baz, bat

How can I pass a module's function as a reference to another module in Perl?

How can I pass a reference to a module's function as parameter in a function call of another module?
I tried the following (simple example):
This is the module that has a function (process_staff) that takes as a parameter a function reference (is_ok).
#!/usr/bin/perl
use strict;
use warnings;
package Objs::Processing;
sub new {
my ($class) = #_;
bless {} ;
}
sub process_staff {
my ($employee, $func) = #_;
if($func->is_ok($employee)) {
print "Is ok to process\n";
}
else {
print "Not ok to process\n";
}
}
1;
This is the module that implements the passed function (is_ok)
#!usr/bin/perl
use strict;
use warnings;
package Objs::Employee;
my $started;
sub new {
my ($class) = #_;
my $cur_time = localtime;
my $self = {
started => $cur_time,
};
print "Time: $cur_time \n";
bless $self;
}
sub get_started {
my ($class) = #_;
return $class->{started};
}
sub set_started {
my ($class, $value) = #_;
$class->{started} = $value;
}
sub is_ok {
my ($emp) = #_;
print "In is ok I received:\n";
use Data::Dumper;
print Dumper($emp);
return 1;
}
This is my test script that I run:
#!/usr/bin/perl
use strict;
use warnings;
use Objs::Manager;
use Objs::Processing;
my $emp = Objs::Manager->new('John Smith');
use Data::Dumper;
print Dumper($emp);
my $processor = Objs::Processing->new();
$processor->process_staff(\&$emp->is_ok); #error is here
I get a:
Not a CODE reference at testScript.pl line 14.
I also tried: $processor->process_staff(\&$emp->is_ok()); but also still does not work.
What am I doing wrong here
You appear to want to pass an object and a method to call on it; the easiest way to do that would be:
$processor->process_staff( sub { $emp->is_ok } );
where process_staff looks like:
sub process_staff {
my ($self, $func) = #_;
if ( $func->() ) {
...
or you can pass the reference and the object separately:
sub process_staff {
my ($self, $emp, $method) = #_;
if ( $emp->$method() ) {
...
$processor->process_staff( $emp, $emp->can('is_ok') );
I think this could work with:
$processor->process_staff(\&Objs::Employee::is_ok);
where you pass in the method ref.
and where you currently have
if( $func->is_ok($employee) ) {
you need
if( $func->( $employee ) ) {
This is because you cannot reference named methods simply from an object, by the syntax \&$obj->method.
However, in your example code it is not at all clear why you don't do this instead:
if( $employee->is_ok() ) {
in which case you would not need to reference the method to call in process_staff at all. There are also other ways to achieve the same method indirection that might give you better encapsulation in future.
In this expression:
$processor->process_staff(\&$emp->is_ok);
You are saying "call the method $emp->is_ok, take the return value, treat it as a CODE reference, dereference it, and return a reference to that. That doesn't work, since the return value from that sub is not a CODE reference.
To do what you want, you can use a reference to an anonymous sub to wrap the call to your object method:
$processor->process_staff( sub { $emp->is_ok } );
You can pass anonymous coderef which returns result from desired method,
$processor->process_staff(sub{ $emp->is_ok(#_) });
#_ can be dropped as is_ok method doesn't take any arguments.
It's not specifically what you asked for, but I think you simply need the following:
sub process_staff {
my ($self, $emp) = #_;
if ($emp->is_ok()) {
print "Is ok to process\n";
}
else {
print "Not ok to process\n";
}
}
$processor->process_staff($emp);

Deferring code on scope change in Perl

I often find it useful to be able to schedule code to be executed upon leaving the current scope. In my previous life in TCL, a friend created a function we called defer.
It enabled code like:
set fp [open "x"]
defer("close $fp");
which was invoked when the current scope exited. The main benefit is that it's always invoked no matter how/where I leave scope.
So I implemented something similar in Perl but it seems there'd be an easier way. Comments critiques welcome.
The way I did it in Perl:
create a global, tied variable which holds an array of subs to be executed.
whenever I want to schedule a fn to be invoked on exit, I use local to change the array.
when I leave the current scope, Perl changes the global to the previous value
because the global is tied, I know when this value change happens and can invoke the subs in the list.
The actual code is below.
Is there a better way to do this? Seems this would be a commonly needed capability.
use strict;
package tiescalar;
sub TIESCALAR {
my $class = shift;
my $self = {};
bless $self, $class;
return $self;
}
sub FETCH {
my $self = shift;
return $self->{VAL};
}
sub STORE {
my $self = shift;
my $value = shift;
if (defined($self->{VAL}) && defined($value)) {
foreach my $s (#{$self->{VAL}}) { &$s; }
}
$self->{VAL} = $value;
}
1;
package main;
our $h;
tie($h, 'tiescalar');
$h = [];
printf "1\n";
printf "2\n";
sub main {
printf "3\n";
local $h = [sub{printf "9\n"}];
push(#$h, sub {printf "10\n";});
printf "4\n";
{
local $h = [sub {printf "8\n"; }];
mysub();
printf "7\n";
return;
}
}
sub mysub {
local $h = [sub {printf "6\n"; }];
print "5\n";
}
main();
printf "11\n";
Well, your specific case is already handled if you use lexical filehandles (as opposed to the old style bareword filehandles). For other cases, you could always use the DESTROY method of an object guaranteed to go to zero references when it goes out of scope:
#!/usr/bin/perl
use strict;
use warnings;
for my $i (1 .. 5) {
my $defer = Defer::Sub->new(sub { print "end\n" });
print "start\n$i\n";
}
package Defer::Sub;
use Carp;
sub new {
my $class = shift;
croak "$class requires a function to call\n" unless #_;
my $self = {
func => shift,
};
return bless $self, $class;
}
sub DESTROY {
my $self = shift;
$self->{func}();
}
ETA: I like brian's name better, Scope::OnExit is a much more descriptive name.
Instead of using tie for this, I think I'd just create an object. You can also avoid the local that way too.
{
my $defer = Scope::OnExit->new( #subs );
$defer->push( $other_sub ); # and pop, shift, etc
...
}
When the variable goes out of scope, you have a chance to do things in the DESTROY method.
Also, in the example you posted, you need to check that the values you store are code references, and it's probably a good idea to check that the VAL value is an array reference:
sub TIESCALAR { bless { VAL => [] }, $_[0] }
sub STORE {
my( $self, $value ) = #_;
carp "Can only store array references!" unless ref $value eq ref [];
foreach { #$value } {
carp "There should only be code refs in the array"
unless ref $_ eq ref sub {}
}
foreach ( #{ $self->{VAL}} ) { $_->() }
$self->{VAL} = $value;
}
You may want to try out B::Hooks::EndOfScope
I Believe this works:
use B::Hooks::EndOfScope;
sub foo {
on_scope_end {
$codehere;
};
$morecode
return 1; # scope end code executes.
}
foo();
I think you want something like Scope::Guard, but it can't be pushed. Hmmm.
Thanks.
Trivially,
sub OnLeavingScope::DESTROY { ${$_[0]}->() }
used like:
{
...
my $onleavingscope = bless \sub { ... }, 'OnLeavingScope';
my $onleavingscope2 = bless \\&whatever, 'OnLeavingScope';
...
}
(The extra level of having a reference to a reference to a sub is necessary only to work around an optimization (that's arguably a bug) when using a non-closure anonymous sub.)