Preserve local context across nested subroutines - perl

Consider the code I would like to write, below. I make recursive calls to process, and each recursion uses its own local %context, so I get my context back when I return from a call.
sub process {
my %context; # Local context
process() if rerun();
job1();
job2();
sub job1() {print $context{foo}}
sub job2() {print $context{bar}}
}
Unfortunately, Perl does not handle nested subroutines the way I expected. If I move my subroutines out of process, they can no longer access %context. So I need to make it global and use a stack, as follows:
my %context; # Local context
my @context_stack;
sub process {
push @context_stack, {%context}; # save a copy of the caller's context
%context = ();
process() if rerun();
job1();
job2();
%context = %{ pop @context_stack }; # restore the caller's context
}
sub job1 {print $context{foo}}
sub job2 {print $context{bar}}
The third solution is to pass the context to all subroutines, which can be annoying for very small subroutines. Also, %context becomes global to my whole program, so I lose the privacy of this variable.
my %context; # Local context
my @context_stack;
sub process {
push @context_stack, {%context}; # save a copy of the caller's context
%context = ();
process() if rerun(\%context);
job1(\%context);
job2(\%context);
%context = %{ pop @context_stack }; # restore the caller's context
}
sub job1 {my $context = shift; print $context->{foo}}
sub job2 {my $context = shift; print $context->{bar}}
What would be the best approach?
EDIT
For a better understanding of my specific case, I provide another example:
process(@ARGV);
exit 0;
sub process {
my $infile = shift;
my $outfile = shift;
open my $fp_in, '<', $infile;
open my $fp_out, '>', $outfile;
LINE: while(<$fp_in>) {
remove_c_comment();
say STDERR "File is $infile";
process($1, "$1.processed") if /#include "(.*?)"/;
warning("Huh raisin, no!") if /\braisin/;
say STDERR "File is still $infile";
print $fp_out $_;
}
sub remove_c_comment { s|//.*$|| }
sub warning { say "[Warning] $infile:$. ".shift() }
}

The thing you're looking for - but you may not know it - is called a closure. (see also: perlref)
{
my %context;
sub job1 { print $context{foo} };
sub job2 { print $context{bar} };
sub init_context{ $context{foo} = 1 };
}
Context remains private within this block, but accessible to all the subroutines.
As an alternative - you can return a code reference from a subroutine - like this:
use strict;
use warnings;
sub make_sub_with_context {
my %context;
$context{"bar"} = 1;
return sub { print $context{"bar"}++ };
}
my $job1_with_context = make_sub_with_context();
my $another_job_with_context = make_sub_with_context();
$job1_with_context->();
$job1_with_context->();
$another_job_with_context->();
$another_job_with_context->();
$another_job_with_context->();
Which may be a better example.
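Applied to the recursive process from the question, the same idea can be sketched with anonymous subs created inside each call, so that every recursion level closes over its own %context. This is only a sketch: the $depth argument (standing in for rerun()), the contents of %context, and returning strings instead of printing them are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the question's process(); $depth replaces rerun().
sub process {
    my ($depth) = @_;
    my %context = ( foo => "foo$depth", bar => "bar$depth" );    # private to this call

    # Anonymous subs capture the %context of the current call only.
    my $job1 = sub { return $context{foo} };
    my $job2 = sub { return $context{bar} };

    my @out;
    push @out, process( $depth - 1 ) if $depth > 0;    # the recursion gets a fresh %context
    push @out, $job1->(), $job2->();                   # our own context is untouched
    return @out;
}

print join( ' ', process(2) ), "\n";    # prints "foo0 bar0 foo1 bar1 foo2 bar2"
```

No global variable and no explicit stack are needed here, because each call to process creates new lexicals and new closures over them.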
Edit:
Following on from your updated example it looks like your problem spec is to iterate a set of files, and (recursively) traverse referenced files.
Sort of like find, but following include directives. I would point out that recursing this way can end up in an infinite loop if two files include each other, which isn't ideal.
Can I suggest instead taking a different approach? Don't recurse:
use strict;
use warnings;
my @files_to_process = @ARGV;
my %done;
while ( my $infile = pop @files_to_process ) {
next if $done{$infile}++;
open my $fp_in, '<', $infile or die $!;
open my $fp_out, '>', $infile . ".processed" or die $!;
while ( my $line = <$fp_in> ) {
$line =~ s|\/\/.*$||;
if ( my ($include) = ( $line =~ m/#include "(.*?)"/ ) ) {
push @files_to_process, $include;
}
print {$fp_out} $line;
}
close($fp_out);
close($fp_in);
}
With a bit more thought, and given the added requirement that this task must process things in declaration order, I'd suggest that an OO approach might help instead. Something like:
use strict;
use warnings;
use feature 'say';
package parser;
sub new {
my ($class) = @_;
my $self = {};
bless $self, $class;
return $self;
}
# Minimal warning helper; not part of the original answer, added so the call below resolves.
sub warning {
my ( $self, $infile, $message ) = @_;
say STDERR "[Warning] $infile: $message";
}
sub process {
my ( $self, $infile, $outfile ) = @_;
open my $fp_in, '<', $infile or die "Cannot read $infile: $!";
open my $fp_out, '>', $outfile or die "Cannot write $outfile: $!";
LINE: while ( my $line = <$fp_in> ) {
$line =~ s|//.*$||;
say STDERR "File is $infile";
if ( my ($includefile) = ( $line =~ m/#include "(.*?)"/ ) ) {
my $processor = parser->new();
$processor->process( $includefile, "$includefile.processed" );
}
$self->warning( $infile, "Huh raisin, no!" ) if $line =~ /\braisin/;
say STDERR "File is still $infile";
print $fp_out $line;
}
}
package main;
my $processor = parser->new()->process(@ARGV);

Related

perl redirect stdout to lexical filehandle

I'm trying to write a helper function that runs a perl function in another process and returns a closure that produces a line of output at a time when called.
I figured out a way of doing this using pipe that mixes old and new-style filehandles. I used an old-style one for the sink in order to use the open(STDOUT, ">&thing") syntax and a new-style one for the source since it needs to be captured by a closure and I didn't want to burden the caller with providing a filehandle.
Is there a way of using a new-style filehandle in a construction with the same meaning as open(STDOUT, ">&thing")?
#!/usr/bin/env perl
# pipe.pl
# use pipe() to create a pair of fd's.
# write to one and read from the other.
#
# The source needs to be captured by the closure and can't be
# destructed at the end of get_reader(), so it has to be lexical.
#
# We need to be able to redirect stdout to sink in such a way that
# we actually dup the file descriptor (so shelling out works as intended).
# open(STDOUT, ">&FILEHANDLE") achieves this but appears to require an
# old-style filehandle.
use strict;
use warnings;
sub get_reader {
local *SINK;
my $source;
pipe($source, SINK) or die "can't open pipe!";
my $cpid = fork();
if (!defined $cpid) { # fork returns undef on failure, not -1
die 'failed to fork';
}
elsif ($cpid == 0) {
open STDOUT, ">&SINK" or die "can't open sink";
system("echo -n hi");
exit;
}
else {
return sub {
my $line = readline($source);
printf "from child (%s)\n", $line;
exit;
}
}
}
sub main {
my $reader = get_reader();
$reader->();
}
main();
When run, this produces
from child (hi)
as expected.
sub get_reader {
my ($cmd) = @_;
open(my $pipe, '-|', @$cmd);
return sub {
return undef if !$pipe;
my $line = <$pipe>;
if (!defined($line)) {
close($pipe);
$pipe = undef;
return undef;
}
chomp($line);
return $line;
};
}
If that's not good enough (e.g. because you also need to redirect the child's STDIN or STDERR), you can use IPC::Run instead.
use IPC::Run qw( start );
sub get_reader {
my ($cmd) = @_;
my $buf = '';
my $h = start($cmd, '>', \$buf);
return sub {
return undef if !$h;
while (1) {
if ($buf =~ s/^([^\n]*)\n//) {
return $1;
}
if (!$h->pump()) {
$h->finish();
$h = undef;
return substr($buf, 0, length($buf), '') if length($buf);
return undef;
}
}
};
}
Either way, you can now do
my $i = get_reader(['prog', 'arg', 'arg']);
while (defined( my $line = $i->() )) {
print "$line\n";
}
Either way, error handling is left to you.
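As for the literal question of whether a lexical filehandle can be used with the same meaning as open(STDOUT, ">&thing"): the three-argument form of open accepts a filehandle after the '>&' mode, so both ends of the pipe can be lexical. The sketch below reuses the get_reader name from the question and assumes a POSIX-like system with fork and an echo command on the PATH.

```perl
use strict;
use warnings;

sub get_reader {
    pipe( my $source, my $sink ) or die "can't open pipe: $!";
    my $cpid = fork();
    die "failed to fork: $!" unless defined $cpid;
    if ( $cpid == 0 ) {
        close $source;
        # Three-argument dup: STDOUT now refers to the write end of the pipe,
        # so anything the child shells out to writes there as well.
        open( STDOUT, '>&', $sink ) or die "can't dup sink: $!";
        system( 'echo', 'hi' );
        exit 0;
    }
    close $sink;    # the parent keeps only the read end
    return sub {
        my $line = readline($source);
        waitpid( $cpid, 0 ) unless defined $line;    # reap the child at EOF
        return $line;
    };
}

my $reader = get_reader();
while ( defined( my $line = $reader->() ) ) {
    print "from child: $line";
}
```

Closing the write end in the parent matters: without it, readline in the parent never sees end-of-file after the child exits.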

Search file in directory structure

Does anybody know a method to search for a file in a directory structure without using File::Find? I know step-by-step how to do it but if it is possible to make it smoother that will be helpful.
File::Find is a core module since perl 5.000 so I don't see a reason for not using it.
But if you still want to take your crazy way you could call the find command.
From one File::Find hater to another: DirWalk.pm, inspired by Python's os.walk().
package DirWalk;
use strict;
use warnings;
sub new {
my ($class, @dirs) = @_;
my @odirs = @dirs;
@dirs = qw/./ unless @dirs;
s!/+$!! for @dirs;
s!/+\.$!! for @dirs;
my $self = { _odirs => [@odirs], _dirs => [@dirs], _dhstack => [], _dnstack => [] };
opendir my($dirh), $dirs[0];
return undef unless $dirh;
shift @{ $self->{_dirs} };
unshift @{ $self->{_dhstack} }, $dirh;
unshift @{ $self->{_dnstack} }, $dirs[0];
return bless $self, $class;
}
sub _walk_op {
my ($self) = @_;
if (wantarray) {
my @ret;
while (defined(my $x = $self->next())) {
push @ret, $x;
}
return @ret;
}
elsif (defined wantarray) {
return $self->next();
}
return undef;
}
sub next
{
my ($self) = @_;
my $dstack = $self->{_dhstack};
my $nstack = $self->{_dnstack};
if (@$dstack) {
my $x;
do {
$x = readdir $dstack->[0];
} while (defined($x) && ($x eq '.' || $x eq '..'));
if (defined $x) {
my $nm = $nstack->[0].'/'.$x;
if (-d $nm) {
# open dir, and put the handle on the stack
opendir my($dh), $nm;
if (defined $dh) {
unshift @{ $self->{_dhstack} }, $dh;
unshift @{ $self->{_dnstack} }, $nm;
}
else {
warn "can't walk into $nm!"
}
$nm .= '/';
}
# return the name
return $nm;
}
else {
closedir $dstack->[0];
shift @$dstack;
shift @$nstack;
unless (@$dstack) {
while (@{ $self->{_dirs} }) {
my $dir = shift @{ $self->{_dirs} };
opendir my($dirh), $dir;
next unless defined $dirh;
unshift @{ $self->{_dhstack} }, $dirh;
unshift @{ $self->{_dnstack} }, $dir;
last;
}
}
return $self->next();
}
}
else {
return undef;
}
}
use overload '<>' => \&_walk_op;
use overload '""' => sub { 'DirWalk('.join(', ', @{$_[0]->{_odirs}}).')'; };
1;
Example:
# prepare test structure
mkdir aaa
touch aaa/bbb
mkdir aaa/ccc
touch aaa/ccc/ddd
# example invocation:
perl -mDirWalk -E '$dw=DirWalk->new("aaa"); say while <$dw>;'
#output
aaa/ccc/
aaa/ccc/ddd
aaa/bbb
Another example:
use strict;
use warnings;
use DirWalk;
# iteration:
my $dw = DirWalk->new("aaa");
while (<$dw>) {
print "$_\n";
}
# or as a list:
$dw = DirWalk->new("aaa");
my @list = <$dw>;
for (@list) {
print "$_\n";
}
The method I've been implementing uses three commands: opendir, readdir, and closedir. See below for an example:
opendir my $dir1, $cwd or die "cannot read the directory $cwd: $!";
@cwd = readdir $dir1;
closedir $dir1;
shift @cwd; shift @cwd;
foreach (@cwd) { if ($_ =~ /$file_search_name/) { print "I have found the file in $_!\n"; } }
The directory listing is stored in @cwd, which includes . and .. On Windows, the two shift @cwd calls typically remove these, but readdir does not guarantee any order, so filtering them by name is safer elsewhere. I am unfortunately tight for time, but you can build on this idea with an anonymous array that stores the directory handles and another array that stores the directory paths. Use -d to check whether an entry is a directory. There might be file permission issues, so unless (opendir ...) would be a good way to handle failures.
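The idea described above can be sketched as a loop over a queue of directory paths, with no recursion and no File::Find. This is a minimal sketch, not the answerer's code: the find_file name and its return value are assumptions, and it keeps only a queue of paths (not directory handles) because each directory is read in full before moving on.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the paths of all entries named $name below $root.
sub find_file {
    my ( $root, $name ) = @_;
    my @pending = ($root);    # directories still to be scanned
    my @found;
    while ( defined( my $dir = shift @pending ) ) {
        my $dh;
        unless ( opendir $dh, $dir ) {
            warn "cannot read the directory $dir: $!";
            next;
        }
        # Filter . and .. by name rather than by position in the listing.
        my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;
        for my $entry (@entries) {
            my $path = "$dir/$entry";
            push @found, $path if $entry eq $name;
            # Queue real subdirectories; skip symlinks to avoid cycles.
            push @pending, $path if -d $path && !-l $path;
        }
    }
    return @found;
}

my $root = @ARGV ? $ARGV[0] : '.';
print "$_\n" for find_file( $root, 'xyz.txt' );
```

Unreadable directories are reported with warn and skipped, which covers the permission issue mentioned above without aborting the whole search.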
Best of luck.
I'm sure I will be flayed alive for this answer but you could always use either system() or backticks `` to execute the regular linux find command. Or do some sort of ls...
@files = `ls $var/folder/*.logfile`;
@files = `find . -name $file2find`;
I expect some seasoned perlers have many good reasons not to do this.
You can also try something like this:
# I want to find file xyz.txt in $dir (say C:\sandbox)
Findfile("xyz.txt", $dir);
sub Findfile
{
my $file = shift;
my $Searchdir = shift;
my @content = <$Searchdir/*>;
foreach my $element (@content)
{
if($element =~ /.*$file$/)
{
print "found";
last;
}
elsif (-d $element)
{
Findfile($file, $element); #recursive search
}
}
}
File::Find::Rule is "smoother".
use feature 'say';
use File::Find::Rule qw( );
say for File::Find::Rule->in(".");

How do I use a class property/variable as a print filehandle in Perl?

I want to do the same thing as
open MYFILE, ">", "data.txt";
print MYFILE "Bob\n";
but instead in class variable like
sub _init_tmp_db
{
my ($self) = @_;
open $$self{tmp_db_fh}, ">", "data.txt";
print $$self{tmp_db_fh} "Bob\n";
}
It gave me this error : 'String found where operator expected near "Bob\n"'
what should I do?
From the print manpage:
If you're storing handles in an array or hash, or in general whenever
you're using any expression more complex than a bareword handle or a
plain, unsubscripted scalar variable to retrieve it, you will have to
use a block returning the filehandle value instead.
You should be using:
print { $$self{tmp_db_fh} } "Bob\n";
This code won't work under use strict. To fix it just use a my variable:
open my $fh, ">", "data.txt" or die $!;
$$self{tmp_db_fh} = $fh;
print { $$self{tmp_db_fh} } "Bob\n";
You should use the IO::File module instead.
use IO::File;
my $file = IO::File->new;
$file->open("> data.txt");
print_something($file);
sub print_something {
my ($file) = @_;
$file->print("hello world\n");
}
Or in your example function:
use IO::File;
# ...
sub _init_tmp_db
{
my ($self) = @_;
$self->{tmp_db_fh} = IO::File->new;
$self->{tmp_db_fh}->open("data.txt", ">");
$self->{tmp_db_fh}->print("Bob\n");
}
(Note: you can still use non -> based calls too, but I wrote the above
using the more traditional ->open() type calls.)
Filehandles can only be scalars.
So either $$self{tmp_db_fh} already holds an open filehandle (to data.txt), in which case this would work:
sub _init_tmp_db
{
my ($self) = @_;
my $filehandle = $$self{tmp_db_fh} ;
print $filehandle "Bob\n";
}
or you open the filehandle inside _init_tmp_db
sub _init_tmp_db
{
my ($self) = @_;
open my $filehandle , ">", "data.txt" or die "Cannot open data.txt" ;
print $filehandle "Bob\n";
}
But providing a string in $$self{tmp_db_fh} (like 'FILEHANDLE') won't work.
This is easily solved by creating a variable for a file handle:
sub _init_tmp_db {
my $self = shift;
my $fh;
open $fh, ">", "data.txt" or die "Cannot open data.txt: $!";
$self->{temp_db_fh} = $fh;
# Sometime later...
$fh = $self->{temp_db_fh};
print $fh "Bob\n";
}
This is an issue because of the way the print syntax is parsed and the early sloppiness of that syntax. The print statement really has two separate formats. In format #1, you simply pass it the stuff to print. In format #2, the first item may be a file handle, and the rest is the stuff you want to print to that file handle. If print can't easily determine that the first parameter is a file handle, it fails.
If you look at other languages, they'll use a parameter for passing the file handle, and maybe the stuff to print. Or in object oriented languages, they'll overload >> for the file handle parameter. They'll look something like this:
print "This is my statement", file=file_handle;
or
print "This is my statement" >> file_handle;
You might be able to munge the syntax to get away from using a variable. However, it doesn't make the program more efficient or more readable, and may simply make the program harder to maintain. So, just use a variable for the file handle.
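The three ways of printing to a handle stored in an object that come up in these answers can be compared side by side. This is a small sketch under stated assumptions: it writes to an in-memory filehandle ($buffer) instead of data.txt so that it leaves no file behind, and the names written are arbitrary.

```perl
use strict;
use warnings;
use IO::Handle;    # makes the ->print method available on older perls

my $buffer = '';
open my $fh, '>', \$buffer or die "Cannot open in-memory file: $!";
my $self = { tmp_db_fh => $fh };

# 1. Copy the handle into a plain scalar first, as recommended above.
my $handle = $self->{tmp_db_fh};
print $handle "Bob\n";

# 2. Block form: any expression may supply the handle.
print { $self->{tmp_db_fh} } "Alice\n";

# 3. Method call: an ordinary method invocation, so no parsing ambiguity.
$self->{tmp_db_fh}->print("Carol\n");

close $fh or die "Cannot close: $!";
print $buffer;    # prints Bob, Alice and Carol on separate lines
```

All three write to the same handle; which one reads best is largely a matter of taste.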
You said class in your title. I assume that you are interested in writing a fully fledged object-oriented package to do this. Here's a quick example. Notice that in the write method I retrieve the file handle into a variable and use that variable in the print statement.
#! /usr/bin/env perl
#
use strict;
use warnings;
#######################################################
# MAIN PROGRAM
#
my $file = File->new;
$file->open("OUTPUT") or
die "Can't open 'OUTPUT' for writing\n";
$file->write("This is a test");
#
#######################################################
package File;
use Carp;
sub new {
my $class = shift;
my $self = {};
bless $self, $class;
return $self;
}
sub open {
my $self = shift;
my $file = shift;
my $fh;
if (defined $file) {
$self->{FILE} = $file;
open ($fh, ">", $file) and $self->_fh($fh);
}
return $self->_fh;
}
sub _fh {
my $self = shift;
my $fh = shift;
if (defined $fh) {
$self->{FH} = $fh;
}
return $self->{FH};
}
sub write {
my $self = shift;
my $note = shift;
my $fh = $self->_fh;
print $fh $note . "\n";
return
}

How can I modify the output of the PRINT function using Tie with a Moose implementation?

I can't exactly wrap my head around TIE just yet, but the examples ( example-1 example-2 example-3 ) I've seen so far use a non-Moosy implementation. Is there any way to do this:
package MY_STDOUT;
use strict;
my $c = 0;
my $malformed_header = 0;
open(TRUE_STDOUT, '>', '/dev/stdout');
tie *STDOUT, __PACKAGE__, (*STDOUT);
sub TIEHANDLE {
my $class = shift;
my $handles = [#_];
bless $handles, $class;
return $handles;
}
sub PRINT {
my $class = shift;
if (!$c++ && $_[0] !~ /^content-type/) {
my (undef, $file, $line) = caller;
print STDERR "Missing content-type in $file at line $line!!\n";
$malformed_header = 1;
}
return 0 if ($malformed_header);
return print TRUE_STDOUT @_;
}
1;
use MY_STDOUT;
print "content-type: text/html\n\n"; #try commenting out this line
print "<html>\n";
print "</html>\n";
In a more Perl-Moosy way?
For example should I do
open(TRUE_STDOUT, '>', '/dev/stdout');
tie *STDOUT, __PACKAGE__, (*STDOUT);
in a BUILD{} function?
Would it make more sense to implement this as a Moosy class or as Moose::Role?
And finally, would I have to do something like
my $MY_STDOUT = MY_STDOUT->new();
to use it?
I've figured out how to do it with IO::Scalar
https://gist.github.com/1250048
Now I just need to figure out how to do it for STDOUT!

How do I insert new fields into $self in Perl, from a File::Find callback

In a Perl object, I'm trying to add a new field into $self from within a File::Find wanted() sub.
use File::Find;
sub _searchForXMLDocument {
my ($self) = @_;
if($_ =~ /[.]+\.xml/) {
$self->{_xmlDocumentPath} = $_;
}
}
sub runIt{
my ($self) = @_;
find (\&_searchForXMLDocument, $self->{_path});
print $self->{_xmlDocumentPath};
}
_searchForXMLDocument() searches for an XML Document within $self->{_path} and is supposed to append that XML path to $self->{_xmlDocumentPath} but when I try to print it, it remains uninitialized. How do I add the field in $self?
Use of uninitialized value in print at /home/scott/workspace/CCGet/XMLProcessor.pm line 51.
You aren't calling _searchForXMLDocument() in an OO manner, so your $self object isn't being passed to it. Use a closure instead of the method and you have access to $self. This should do the trick:
sub runIt{
my ($self) = @_;
my $closure = sub {
if($_ =~ m/\.xml$/) {
$self->{_xmlDocumentPath} = $_;
}
};
find($closure, $self->{_path});
print $self->{_xmlDocumentPath};
}
The first argument to find() needs to carry two pieces of information: the test condition, and the object you're working with. The way to do this is with a closure. The sub { ... } creates a code ref, like you get from \&_searchForXMLDocument, but the closure has access to lexical variables in the enclosing scope, so the current object ($self) is associated with the closure.
sub _searchForXMLDocument {
my ($self) = @_;
if($_ =~ /\.xml$/) {
$self->{_xmlDocumentPath} = $_;
}
}
sub runIt{
my ($self) = @_;
find (sub { $self->_searchForXMLDocument (@_) }, $self->{_path});
print $self->{_xmlDocumentPath};
}
I think you're looking for something like this:
package XMLDocThing;
use strict;
use warnings;
use English qw<$EVAL_ERROR>;
use File::Find qw<find>;
...
use constant MY_BREAK => \do { my $v = 133 };
sub find_XML_document {
my $self = shift;
eval {
find( sub {
return unless m/\.xml$/;
$self->{_xmlDocumentPath} = $_;
die MY_BREAK;
}
, $self->{_path}
);
};
if ( my $error = $EVAL_ERROR ) {
die Carp::longmess( $EVAL_ERROR ) unless $error == MY_BREAK;
}
}
...
# meanwhile, in some other package...
$xmldocthing->find_XML_document;
You pass a closure to find and it can access $self from the containing scope. File::Find::find has no capacity to pass in baggage like objects.
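A self-contained sketch of that closure technique follows. It is illustrative only: the XMLFinder class name, the run method, and the _xmlDocumentPaths field (which collects every match rather than just one) are assumptions, not names from the question.

```perl
use strict;
use warnings;
use File::Find ();

package XMLFinder;    # hypothetical class name

sub new {
    my ( $class, $path ) = @_;
    return bless { _path => $path, _xmlDocumentPaths => [] }, $class;
}

sub run {
    my ($self) = @_;
    # The anonymous sub closes over $self, so find() needs no extra arguments.
    File::Find::find(
        sub {
            push @{ $self->{_xmlDocumentPaths} }, $File::Find::name
                if /\.xml\z/;
        },
        $self->{_path}
    );
    my @paths = sort @{ $self->{_xmlDocumentPaths} };
    return @paths;
}

package main;

my $root = @ARGV ? $ARGV[0] : '.';
print "$_\n" for XMLFinder->new($root)->run;
```

Inside the callback, $_ holds the bare file name and $File::Find::name holds the path relative to the starting directory, which is usually the more useful value to store.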