Anything wrong with nested Perl subs, that are only called locally? - perl

If I have the following code
sub a {
my $id = shift;
# does something
print &a_section($texta);
print &a_section($textb);
sub a_section {
my $text = shift;
# combines the $id and the $text to create and return some result.
}
}
Assuming a_section is called only by a, will I run into a memory leak, variable dependability, or other problem?
I am exploring this as an alternative so I can avoid the necessity of passing $id to a_section.

First, it's not a private sub. It's fully visible from the outside. Two, you will have problems.
$ perl -wE'
sub outer {
my ($x) = #_;
sub inner { say $x; }
inner();
}
outer(123);
outer(456);
'
Variable "$x" will not stay shared at -e line 4.
123
123 <--- XXX Not 456!!!!
You could do:
sub a {
my $id = shift;
local *a_section = sub {
my $text = shift;
# combines the $id and the $text to create and return some result.
};
print a_section($texta);
print a_section($textb);
}
(You can call the inner sub recursively using a_section(...).)
or:
sub a {
my $id = shift;
my $a_section = sub {
my $text = shift;
# combines the $id and the $text to create and return some result.
};
print $a_section->($texta);
print $a_section->($textb);
}
(Use __SUB__->(...) if you want to call the inner sub recursively to avoid memory leak, available in Perl 5.16+.)

Related

Possible to call STORE on deeper level tied hash assignment?

I'm trying to write a Perl module for a “persistent YAML hash”, with the following properties:
With every access, check if the YAML file has changed, and if so, reload.
As soon as any data in the hash is changed, save.
Don't save on UNTIE, so that the file isn't updated when you only read values.
My first attempt seemed to work pretty well:
package YAMLHash;
use v5.24;
use warnings;
use experimental 'signatures';
use YAML::XS qw(DumpFile LoadFile);
use File::stat;
sub refresh($self)
{
if (-f $self->{file}) {
if (stat($self->{file})->mtime > $self->{mtime}) {
$self->{data} = LoadFile($self->{file});
$self->{mtime} = stat($self->{file})->mtime;
}
}
}
sub save($self)
{
DumpFile($self->{file}, $self->{data});
$self->{mtime} = stat($self->{file})->mtime;
}
sub TIEHASH($class, #args)
{
my ($filename) = $args[0];
die "No filename specified" unless $filename;
my $self = bless { data=>{}, file=>$filename, mtime=>0 }, $class;
refresh($self);
return $self;
}
sub FETCH($self, $key = '')
{
refresh($self);
return $self->{data}{$key};
}
sub EXISTS($self, $key)
{
refresh($self);
return exists($self->{data}{$key});
}
sub FIRSTKEY($self)
{
refresh($self);
my #ignore = keys %{$self->{data}}; # reset iterator
return each %{$self->{data}};
}
sub NEXTKEY($self, $lastkey)
{
refresh($self);
return each %{$self->{data}};
}
sub SCALAR($self)
{
return scalar %{$self->{data}};
}
sub STORE($self, $key, $value)
{
refresh($self);
$self->{data}{$key} = $value;
save($self);
}
sub DELETE($self, $key)
{
refresh($self);
delete $self->{data}{$key};
save($self);
}
sub CLEAR($self, $key)
{
$self->{data} = {};
save($self);
}
1;
I tried this as follows:
use YAMLHash;
tie my %foo, 'YAMLHash', 'test.yaml';
$foo{hello} = 'world';
$foo{answer} = 42;
$foo{counter}++;
and the resulting YAML file looks like this:
---
answer: 42
counter: 1
hello: world
But then I changed my example code to:
use YAMLHash;
tie my %foo, 'YAMLHash', 'test.yaml';
$foo{hello} = 'world';
$foo{answer} = 42;
$foo{counter}++;
$foo{a}{b}{c}{d} = 'e';
and the result is:
---
a: {}
answer: 42
counter: 2
hello: world
So, obviously, STORE is called when $foo{a} is created, but not when $foo{a}{b}{c}{d} is assigned.
Is there any way to make this do what I want?
You will need to tie %{ $foo{a} }, %{ $foo{a}{b} } and %{ $foo{a}{b}{c} } as well.
You could recursively tie the hashes and arrays in the data structure in TIEHASH. Don't forget to the do the same thing to data added to the structure via STORE!
You might want to use a different class for the root of the data structure and non-root nodes.
Warning: Using tie will make accesses slower.
Note that you need to tie the scalars too, not just the hashes (and arrays). All of the following change the value of a hash element without calling STORE:
Changing the scalar directly:
++$foo{a};
chomp($foo{a});
$foo{a} =~ s/x/y/g;
...
Changing a scalar via an alias or a reference:
my \$x = \$foo{a}; $x = 123;
my $r = \$foo{a}; $$r = 123;
for ($foo{a}) { $_ = 123; }
sub { $_[0] = 123; }->($foo{a});
...

Perl: Subroutines first argument is not the class

I'm relatively new to Perl so bear with me.
My sub getGenes calls getFeaturesByGeneName in the class SequenceModule. The first loop runs fine however in the second loop it tries to invoke get_SeqFeatures (a BioPerl sub) on the string $name meaning that it skips my $self = shift.
What am I missing?
sub getGenes
{
my #names = shift;
my $genome = shift;
my #cds;
foreach my $name (#names)
{
my $feat = SequenceModule -> getFeatureByGeneName($genome, $name);
push (#cds, $feat);
}
return #cds;
}
...
sub getFeatureByGeneName
{
my $self = shift;
my $seq = shift;
my $name = shift;
my #cds = $seq -> get_SeqFeatures("CDS");
...
}
Speculation: you called getGenes with several names:
getGenes(('name1', 'name2'), $genome);
List don't nest in Perl, so the arguments are flattened:
getGenes('name1', 'name2', $genome);
shift can't return more than one element. Therefore,
my #names = shift;
is equivalent to
my #names;
$names[0] = shift;
The second name is still in #_, so it goes to $genome:
my $genome = shift;
If you need to pass a list to a sub, make it the last argument, or send a reference:
sub getGenes {
my $genome = shift;
my #names = #_;
}
getGenes($genome, 'name1', 'name2');
# OR
sub getGenes {
my $names = shift;
my $genome = shift;
for my $name (#$names) { # dereference
...
}
}
getGenes(['name1', 'name2'], $genome);

glob (star) operator, method disappearing, Perl

I have a module which is giving me the error "Can't locate object method "isSubset" via package "a" (perhaps you forgot to load "a"?) at /path/to/set.pm line 121.
SET.PM:
package set; #we will create set objects, instead of treating arrays as sets
sub new{
my $packagename = shift;
my #elements = #_;
bless { 'elements' => \#elements } => $packagename;
}
sub contains{
my $set = shift;
my ($element) = #_;
foreach ($set->elements){ if( $_ eq $element ){ return 1 } }
return 0
}
sub isElement{
my ($element,$set) = #_;
return $set->contains($element)
}
sub isSubset{
my $setA = shift;
my $setB = shift;
foreach ($setA->elements){ unless( isElement($_,$setB) ){ return 0 } }
return 1
}
*subset = *isContainedIn = *isContained = \&isSubset;
sub isSuperset{
my $setA = shift;
my $setB = shift;
return $setB->isSubset($setA) # this is line 121
}
*superset = *isContaining = *contains = \&isSuperset; # when i get rid of THIS line, it works fine.
When I comment out the last line, it works fine. Can you enlighten me on what is causing the failure? Am I using glob incorrectly?
CALLING PROGRAM:
my $a = set->new('a'..'g');
my $b = set->new('b'..'f');
print $a->isSubset($b);
Turn on warnings. Perl will tell you:
Subroutine set::contains redefined at ./1.pl line 44.
You use contains for testing both an element and a set.

How to call a subroutine with a variable pre-assigned to some value?

In Perl, when one uses the sort function with a custom comparison, the variables $a and $b are already assigned to the current pair of elements to compare, e.g. in:
#decreasing = sort { $b <=> $a } #list;
How can I write other subroutines with a similar functionality? For example, imagine that I want to write sort of process_and_store function that does something special with each item of a list and then stores it in a database; and where the variable $item is already assigned to the current item being processed. I would like to write for example something like:
process_and_store { do_something_with($item); } #list;
Rather than
process_and_store { my $item = shift; do_something_with($item); } #list;
How should I go about doing this?
UPDATE: For completeness, although flesk's answer works without problems, in order to “properly” localize the changes I make to the $item variable I had to follow the advice from Axeman. In SomePackage.pm I placed something like:
package SomePackage;
use strict;
require Exporter;
our #ISA = qw/Exporter/;
our #EXPORT = qw(process_and_store);
our $item;
sub import {
my $import_caller = caller();
{ no strict 'refs';
*{ $import_caller . '::item' } = \*item;
}
# Now, cue Exporter!
goto &{ Exporter->can( 'import' ) };
}
sub process_and_store (&#) {
my $code = shift;
for my $x (#_) {
local *item = \$x;
$code->();
print "stored: $item\n"
}
}
1;
Then I call this from main.pl with something like:
#!/usr/bin/perl -w
use strict;
use SomePackage;
process_and_store { print "seen: $item\n"; } (1, 2, 3);
And get the expected result:
seen: 1
stored: 1
seen: 2
stored: 2
seen: 3
stored: 3
In my "associative array" processing library, I do something similar. The user can export the variables $k and $v (key-value) so that they can do things like this:
each_pair { "$k => $v" } some_source_list()
Here's how I do it:
I declare our ( $k, $v ) in the implementing package.
In import I allow packages to export those symbols and alias them in the
receiving package: *{$import_caller.'::'.$name} = \*{ $name };
In the pair processors, I do the following:
local *::_ = \$_[0];
local *k = \$_[0];
local *v = \$_[1];
#res = $block->( $_[0], $_[1] );
Thus $k and $v are aliases of what's in the queue. If this doesn't have to be the case, then you might be happy enough with something like the following:
local ( $k, $v ) = splice( #_, 0, 2 );
local $_ = $k;
But modifiable copies also allow me to do things like:
each_pair { substr( $k, 0, 1 ) eq '-' and $v++ } %some_hash;
UPDATE:
It seems that you're neglecting step #2. You have to make sure that the symbol in the client package maps to your symbol. It can be as simple as:
our $item;
sub import {
my $import_caller = caller();
{ no strict 'refs';
*{ $import_caller . '::item' } = \*item;
}
# Now, cue Exporter!
goto &{ Exporter->can( 'import' ) };
}
Then when you localize your own symbol, the aliased symbol in the client package is localized as well.
The main way that I can see that it would work without the local, is if you were calling it from the same package. Otherwise, $SomePackage::item and $ClientPackage::item are two distinct things.
I think it's a bit of a hack, but you could do something like this:
#!/usr/bin/perl
use strict;
use warnings;
my $item;
sub process_and_store(&#) {
my $code = shift;
for (#_) {
$item = $_;
&$code();
}
undef $item;
}
The thing is, $item has to be a global scalar for this to work, so process_and_store has to update that scalar while looping over the list. You should also undef $item at the end of the sub routine to limit any potential side-effects. If I were to write something like this, I'd tuck it away in a module and make it possible to define the iterator variable, so as to limit name conflicts.
Test:
my #list = qw(apples pears bananas);
process_and_store { do_something_with($item) } #list;
sub do_something_with {
my $fruit = shift;
print "$fruit\n";
}
Output:
apples
pears
bananas
The $a and $b variables are special in Perl; they're real global variables and hence exempt from use strict, and also used specifically by the sort() function.
Most other similar uses in Perl would use the $_ global for this sort of thing:
process_and_store { do_something_with( $_ ) } #list;
Which is already handled by the normal $_ rules. Don't forget to localise $_:
sub process_and_store(&#)
{
my $code = shift;
foreach my $item (#_) {
local $_ = $item;
$code->();
}
}

How do I insert new fields into $self in Perl, from a File::Find callback

In a Perl object, I'm trying to add a new field into $self from within a File::Find wanted() sub.
use File::Find;
sub _searchForXMLDocument {
my ($self) = #_;
if($_ =~ /[.]+\.xml/) {
$self->{_xmlDocumentPath} = $_;
}
}
sub runIt{
my ($self) = #_;
find (\&_searchForXMLDocument, $self->{_path});
print $self->{_xmlDocumentPath};
}
_searchForXMLDocument() searches for an XML Document within $self->{_path} and is supposed to append that XML path to $self->{_xmlDocumentPath} but when I try to print it, it remains uninitialized. How do I add the field in $self?
Use of uninitialized value in print at /home/scott/workspace/CCGet/XMLProcessor.pm line 51.
You aren't calling _searchForXMLDocument() in an OO manner, so your $self object isn't being passed to it. This should do the trick now. Use a closure for your method and you have access to $self;
sub runIt{
my ($self) = #_;
my $closure = sub {
if($_ !~ m/[.]+\.xml/) {
$self->{_xmlDocumentPath} = $_;
}
};
find(\&$closure, $self->{_path});
print $self->{_xmlDocumentPath};
}
The first argument to find() needs to carry two pieces of information: the test condition, and the object you're working with. The way to do this is with a closure. The sub { ... } creates a code ref, like you get from \&_searchForXMLDocument, but the closure has access to lexical variables in the enclosing scope, so the current object ($self) is associated with the closure.
sub _searchForXMLDocument {
my ($self) = #_;
if($_ =~ /[.]+\.xml/) {
$self->{_xmlDocumentPath} = $_;
}
}
sub runIt{
my ($self) = #_;
find (sub { $self->_searchForXMLDocument (#_) }, $self->{_path});
print $self->{_xmlDocumentPath};
}
I think you're looking for something like this:
package XMLDocThing;
use strict;
use warnings;
use English qw<$EVAL_ERROR>;
use File::Find qw<find>;
...
use constant MY_BREAK = do { \my $v = 133; };
sub find_XML_document {
my $self = shift;
eval {
find( sub {
return unless m/[.]+\.xml/;
$self->{_xmlDocumentPath} = $_;
die MY_BREAK;
}
, $self->{_path}
);
};
if ( my $error = $EVAL_ERROR ) {
die Carp::longmess( $EVAL_ERROR ) unless $error == MY_BREAK;
}
}
...
# meanwhile, in some other package...
$xmldocthing->find_XML_document;
You pass a closure to find and it can access $self from the containing scope. File::Find::find has no capacity to pass in baggage like objects.