How to get a user-configurable buffer for printing? - perl

I'd like to have a print function supporting a user-configurable buffer, so to print what I have in the buffer only when the buffer is > a threshold).
I need to write multiple files, so I have multiple filehandles to write to, and for this an object oriented module might be handier.
I imagine something like this:
my $printer1 = Print::Buffer->new({ size => 1000, filehandle => \$OUT1 });
for (my $i=1; $i<1000; $i++) {
$printer1->print("This string will be eventually printed ($i/1000)");
}
# and at the end print the remaining buffer
$printer1->flush();
Any recommendation? I probably don't use the right keywords as with print/buffer I didn't find clear matches in CPAN.
UPDATE:
Thanks everyone for the very useful comments. As some of you pointed out, the problem is more complex than I initially thought, and probably a bad idea. (This question arose as I was printing very large files [>100Gb] in with a print statement at each loop iteration, and noted that if I was printing every hunderth iteration I had a speedup, but it could be dependent on how the loop was changed...)
UPDATE 2:
I need/want to accept an answer. To me both have been instructive and they are both useful. I tested both and they both need further work before being able to benchmark the improvement (if any, see update above). The tie handle is a less known feature that I loved, that's why I accepted that. They were both equally close to the desired answer in my opinion. Thank you all very much for the discussion and the insights.

I'd like to have a print function supporting a user-configurable buffer, [...]
I imagine something like this: [...]
It's not hard to write something like it. Here's a basic sketch
File PrintBuffer.pm
package PrintBuffer;
use warnings;
use strict;
sub new {
my ($class, %args) = #_;
my $self = {
_size => $args{size} // 64*1024, #//
_fh => $args{filehandle} // *STDOUT,
_buf => ''
};
$self->{_fh}->autoflush; # want it out once it's printed
bless $self, $class;
}
sub print {
my ($self, $string) = #_;
$self->{_buf} .= $string;
if ( length($self->{_buf}) > $self->{_size} ) {
print { $self->{_fh} } $self->{_buf};
$self->{_buf} = '';
}
return $self;
}
sub DESTROY {
my $self = shift;
print { $self->{_fh} } $self->{_buf} if $self->{_buf} ne '';
$self->{_buf} = '';
}
1;
There's a bit more to do here, and a whole lot that can be added, and since it relies only on basic tools one can add/change as desired.† For one, I can imagine a size method to manipulate the buffer size of an existing object (print if there's already more data than the new size), and flush.
Note that DESTROY method provides for the buffer to be printed as the object drops out of any scope, and is getting destroyed, what seems reasonable to do.
A driver
use warnings;
use strict;
use feature 'say';
use PrintBuffer;
my $fout = shift // die "Usage: $0 out-file\n";
open my $fh, '>', $fout or die "Can't open $fout: $!";
my $obj_file = PrintBuffer->new(size => 100, filehandle => $fh);
my $obj_stdout = PrintBuffer->new(size => 100);
$obj_file->print('a little bit');
$obj_stdout->print('a little bit');
say "printed 'a little bit' ..."; sleep 10;
$obj_file->print('out'x30); # push it over a 100 chars
$obj_stdout->print('out'x30);
say "printed 'out'x30 ... "; sleep 10;
$obj_file->print('again...'); # check DESTROY
$obj_stdout->print('again');
say "printed 'again' (and we're done)";
Check the size of output file in another terminal after each informational print.
I tried PerlIO::buffersize brought up by Grinnz in a comment and it seems to work "as advertised" as they say. It doesn't allow you to do all you may wish but it may be a ready solution for basic needs. Note that this doesn't work with :encoding layer in use.
Thanks to ikegami for comments and tests (linked in comments).
† The print works with an autoflush-ed handle. Still, the first change could be to use syswrite instead, which is unbuffered and attempts to directly write all that's asked of it, via one write(2) call. But since there's no guarantee that all got written we also need to check
use Carp; # for croak
WRITE: {
my $bytes_written = 0;
while ( $bytes_written < length $self->{_buf} ) {
my $rv = syswrite(
$self->{_fh},
$self->{_buf},
length($self->{_buf}) - $bytes_written,
$bytes_written
);
croak "Error writing: $!" if not defined $rv;
$bytes_written += $rv;
}
$self->{_buf} = '';
};
I've put this in a block only to limit the scope of $bytes_written and any other variables that one may wish to introduce so to reduce the number of dereferences of $self (but note that $self->{_buf} may be quite large and copying it "to optimize" dereferencing may end up slower).
Naively we'd only need syswrite(FH, SCALAR) but if it happens that not all of SCALAR gets written then we need to continue writing from past what was written, thus the need to use the form with length-to-write and offset as well.
Since this is unbuffered it mustn't be mixed with buffered IO (or that need be done very carefully); see the docs. Also, :encoding layers can't be used with it. Consider these restrictions against other capabilities that may be wanted in this class.

I don't see a general solution on CPAN, either. But this is straightforward enough with tied filehandles. Something like
use Symbol;
sub Print::Buffer::new {
my ($class,$mode,$file,#opts) = #_;
my $x = Symbol::gensym;
open ($x, $mode, $file) or die "failed to open '$file': $!";
tie *$x, "Print::Buffer", fh => $fh, #opts;
$x;
}
sub Print::Buffer::TIEHANDLE {
my $pkg = shift;
my $self = { #_ };
$self->{bufsize} //= 16 * 1024 * 1024;
$self->{_buffer} = "";
bless $self, $pkg;
}
sub Print::Buffer::PRINT {
my ($self,#msg) = #_;
$self->{buffer} .= join($,,#msg);
$self->_FLUSH if length($self->{buffer}) > $self->{bufsize};
}
sub Print::Buffer::_FLUSH {
my $self = shift;
print {$self->{fh}} $self->{buffer};
$self->{buffer} = "";
}
sub Print::Buffer::CLOSE {
my $self = shift;
$self->_FLUSH;
close( $self->{fh} );
}
sub Print::Buffer::DESTROY {
my $self = shift;
$self->_FLUSH;
}
# ----------------------------------------
my $fh1 = Print::Buffer->new(">", "/tmp/file1",
bufsize => 16*1024*1024);
for (my $i=1; $i<1000; $i++) {
print $fh1 "This string will be eventually printed ($i/1000)\n";
}

Related

process hangs when writing large data to pipe

I have a problem with hung processes with my Perl program, and I think I have isolated it to whenever I write significant amounts of data to a pipe.
Below is all of the code that I think is relevant from my program. When the program hangs, it hangs on the line in ResponseConstructor.pm: print { $self->{writer} } $data;.
I've tested with different data sizes, and it doesn't appear to hang at an exact size. It may become more likely with size, though: sizes around 32KB sometimes work, sometimes don't. Every time I've tried a 110KB string it has failed.
I believe I've also ruled out the contents of the data as a cause, because the same data sometimes causes a hang, and othertimes doesn't.
This is probably the first time I have used pipes in a program before, so I'm not sure what to try next. Any ideas?
use POSIX ":sys_wait_h";
STDOUT->autoflush(1);
pipe(my $pipe_reader, my $pipe_writer);
$pipe_writer->autoflush(1);
my $pid = fork;
if ($pid) {
#I am the parent
close $pipe_writer;
while (waitpid(-1, WNOHANG) <= 0){
#do some stuff while waiting for child to send data on pipe
}
#process the data it got
open(my $fh, '>', "myoutfile.txt");
while ( my $line = <$pipe_reader>){
print $fh $line;
}
close $pipe_reader;
close $fh;
else {
#I am the child
die "cannot fork: $!" unless defined $pid;
close $pipe_reader;
my $response = ResponseConstructor->new($pipe_writer);
if ([a condition where we want to return small data]){
$response->respond('small data');
exit;
}
elsif ([a condition where we want to return big data]){
$response->respond('imagine this is a really big string');
}
}
ResponseConstructor.pm:
package ResponseConstructor;
use strict;
use warnings;
sub new {
my $class = shift;
my $writer = shift;
my $self = {
writer => $writer
};
bless($self, $class);
return($self);
}
#Writes the response then closes the writer (pipe)
sub respond {
my $self = shift;
my $data = shift;
print { $self->{writer} } $data;
close $self->{writer};
}
1;
You probably shouldn't be ignoring your pipe while it is returning data: you can use a select on the pipe (instead of waitpid) to see if there's any data to read during your waiting loop, but if you really want a larger pipe buffer so you can read it all at once, you can use a socketpair instead of a pipe and then you can use setsockopt to make the buffer as large as you need it.

Is there a way to be notified when 'print' is called on $fh?

When I do:
print $fh 'text';
I need a some &sub to be called.
It there a way to do that?
You can tie a filehandle and customize the behavior for printing to that filehandle or for any other operation on that filehandle.
sub PrintNotifier::TIEHANDLE {
my ($pkg, $orignalHandle) = #_;
bless { glob => $orignalHandle }, $pkg;
}
sub PrintNotifier::PRINT {
my ($self,#msg) = #_;
... do whatever you want with #msg here ...
return print {$self->{glob}} #msg;
}
sub PrintNotifier::CLOSE { return close $_[0]->{glob} }
open my $fh, '>', 'some-file';
tie *$fh, 'PrintNotifier', $fh;
print $fh "something"; # calls PrintNotifier::PRINT
You can tie the handle, as mob suggested. Or, if you can change the code and your Perl is new enough, you can replace
print $fh 'text';
with
$fh->print('text');
which you might consider cleaner syntax; then you can sub-class IO::File:
package MyFH {
use parent qw/ IO::File /;
use mro; # Get next::method
sub print {
my ($self, #args) = #_;
warn 'Printing ', #args;
$self->next::method(#args);
}
}
my $fh = MyFH->new();
$fh->open('file', '>') or die $!;
However, this doesn't capture the old-fashioned
print $fh 'text';
style.
Depending on your preference, you might find the new style cleaner anyway, since if your file handle is an expression it allows
$obj->method()->print('text');
instead of
print {$obj->method()} 'text';
It works transparently for Perl 5.14 and up, and can be made to work for older Perls back to (at least) 5.8 by adding
use IO::Handle;
to the top of the files you want to use it in (just to be on the safe side).
In is not documented in perl but has usefull usage
If you want to process print'ing on your object in your class you can tie into itself:
tie *$self, $self;
print $self 'text';
$self->print( 'text' );

How to pass a file handle to a function?

When I run the code below I get
Can't use string ("F") as a symbol ref while "strict refs" in use at ./T.pl line 21.
where line 21 is
flock($fh, LOCK_EX);
What am I doing wrong?
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl ':flock', 'SEEK_SET'; # file locking
use Data::Dumper;
# use xx;
my $file = "T.yaml";
my $fh = "F";
my $obj = open_yaml_with_lock($file, $fh);
$obj->{a} = 1;
write_yaml_with_lock($obj, $fh);
sub open_yaml_with_lock {
my ($file, $fh) = #_;
open $fh, '+<', $file;
flock($fh, LOCK_EX);
my $obj = YAML::Syck::LoadFile($fh);
return $obj;
}
sub write_yaml_with_lock {
my ($obj, $fh) = #_;
my $yaml = YAML::Syck::Dump($obj);
$YAML::Syck::ImplicitUnicode = 1;
seek $fh,0, SEEK_SET; # seek back to the beginning of file
print $fh $yaml . "---\n";
close $fh;
}
What you're doing wrong is using the string "F" as a filehandle. This
has never been something that's worked; you could use a bareword as a
filehandle (open FH, ...; print FH ...), or you could pass in an
empty scalar and perl would assign a new open file object to that
variable. But if you pass in the string F, then you need to refer to
then handle as F, not $fh. But, don't do that.
Do this instead:
sub open_yaml_with_lock {
my ($file) = #_;
open my $fh, '+<', $file or die $!;
flock($fh, LOCK_EX) or die $!;
my $obj = YAML::Syck::LoadFile($fh); # this dies on failure
return ($obj, $fh);
}
We're doing several things here. One, we're not storing the
filehandle in a global. Global state makes your program extremely
difficult to understand -- I had a hard time with your 10 line post --
and should be avoided. Just return the filehandle, if you want to
keep it around. Or, you can alias it like open does:
sub open_yaml_with_lock {
open $_[0], '+<', $_[1] or die $!;
...
}
open_yaml_with_lock(my $fh, 'filename');
write_yaml_with_lock($fh);
But really, this is a mess. Put this stuff in an object. Make new
open and lock the file. Add a write method. Done. Now you can
reuse this code (and let others do the same) without worrying about
getting something wrong. Less stress.
The other thing we're doing here is checking errors. Yup, disks can
fail. Files can be typo'd. If you blissfully ignore the return value
of open and flock, then your program may not be doing what you think
it's doing. The file might not be opened. The file might not be
locked properly. One day, your program is not going to work properly
because you spelled "file" as "flie" and the file can't be opened.
You will scratch your head for hours wondering what's going on.
Eventually, you'll give up, go home, and try again later. This time,
you won't typo the file name, and it will work. Several hours will
have been wasted. You'll die several years earlier than you should
because of the accumulated stress. So just use autodie or write or
die $! after your system calls so that you get an error message when
something goes wrong!
Your script would be correct if you wrote use autodie qw/open flock
seek close/ at the top. (Actually, you should also check that
"print" worked or use
File::Slurp or
syswrite, since autodie can't detect a failing print statement.)
So anyway, to summarize:
Don't open $fh when $fh is defined. Write open my $fh to
avoid thinking about this.
Always check the return values of system calls. Make autodie do
this for you.
Don't keep global state. Don't write a bunch of functions that
are meant to be used together but rely on implicit preconditions
like an open file. If functions have preconditions, put them in
a class and make the constructor satisfy the preconditions.
This way, you can't accidentally write buggy code!
Update
OK, here's how to make this more OO. First we'll do "pure Perl" OO
and then use Moose. Moose is
what I would use for any real work; the "pure Perl" is just for the
sake of making it easy to understand for someone new to both OO and
Perl.
package LockedYAML;
use strict;
use warnings;
use Fcntl ':flock', 'SEEK_SET';
use YAML::Syck;
use autodie qw/open flock sysseek syswrite/;
sub new {
my ($class, $filename) = #_;
open my $fh, '+<', $filename;
flock $fh, LOCK_EX;
my $self = { obj => YAML::Syck::LoadFile($fh), fh => $fh };
bless $self, $class;
return $self;
}
sub object { $_[0]->{obj} }
sub write {
my ($self, $obj) = #_;
my $yaml = YAML::Syck::Dump($obj);
local $YAML::Syck::ImplicitUnicode = 1; # ensure that this is
# set for us only
my $fh = $self->{fh};
# use system seek/write to ensure this really does what we
# mean. optional.
sysseek $fh, 0, SEEK_SET;
syswrite $fh, $yaml;
$self->{obj} = $obj; # to keep things consistent
}
Then, we can use the class in our main program:
use LockedYAML;
my $resource = LockedYAML->new('filename');
print "Our object looks like: ". Dumper($resource->object);
$resource->write({ new => 'stuff' });
Errors will throw exceptions, which can be handled with
Try::Tiny, and the YAML
file will stay locked as long as the instance exists. You can, of
course, have many LockedYAML objects around at once, that's why we
made it OO.
And finally, the Moose version:
package LockedYAML;
use Moose;
use autodie qw/flock sysseek syswrite/;
use MooseX::Types::Path::Class qw(File);
has 'file' => (
is => 'ro',
isa => File,
handles => ['open'],
required => 1,
coerce => 1,
);
has 'fh' => (
is => 'ro',
isa => 'GlobRef',
lazy_build => 1,
);
has 'obj' => (
is => 'rw',
isa => 'HashRef', # or ArrayRef or ArrayRef|HashRef, or whatever
lazy_build => 1,
trigger => sub { shift->_update_obj(#_) },
);
sub _build_fh {
my $self = shift;
my $fh = $self->open('rw');
flock $fh, LOCK_EX;
return $fh;
}
sub _build_obj {
my $self = shift;
return YAML::Syck::LoadFile($self->fh);
}
sub _update_obj {
my ($self, $new, $old) = #_;
return unless $old; # only run if we are replacing something
my $yaml = YAML::Syck::Dump($new);
local $YAML::Syck::ImplicitUnicode = 1;
my $fh = $self->fh;
sysseek $fh, 0, SEEK_SET;
syswrite $fh, $yaml;
return;
}
This is used similarly:
use LockedYAML;
my $resource = LockedYAML->new( file => 'filename' );
$resource->obj; # the object
$resource->obj( { new => 'object' }); # automatically saved to disk
The Moose version is longer, but does a lot more runtime consistency
checking and is easier to enhance. YMMV.
From the documentation:
open FILEHANDLE,EXPR
If FILEHANDLE is an undefined scalar variable (or array or hash
element) the variable is assigned a reference to a new anonymous
filehandle, otherwise if FILEHANDLE is an expression, its value is
used as the name of the real filehandle wanted. (This is considered a
symbolic reference, so "use strict 'refs'" should
not be in effect.)
Filehandle here is an expression ("F") so itsvalue is used as the name of the real filehandle you want. (A filehandle called F). And then... the documentation says "use strict 'refs'" should not be in effect, because you're using F as a symbolic reference.
(use strict; on line 1 includes strict 'refs'.)
Had you just said at the beginning:
my $fh;
This would have worked, because then $fh would become a reference to a new anonymous filehandle and Perl won't try to use it as a symbolic reference.
This works:
#!/usr/bin/perl
my $global_fh;
open_filehandle(\$global_fh);
use_filehandle(\$global_fh);
sub open_filehandle {
my ($fh)=#_;
open($$fh, ">c:\\temp\\testfile") || die;
}
sub use_filehandle {
my($fh) = #_;
# Print is pecular that it expects the next token to be the filehandle
# or a simple scalar. Thus, print $$fh "Hello, world!" will not work.
my $lfh = $$fh;
print $lfh "Hello, world!";
close($$fh);
}
Or you can do what the other poster suggested and use $_[1] directly, but that's a bit harder to read.
If you use the value directly in the sub, it will work:
use strict;
use warnings;
use autodie;
my $fh;
yada($fh);
print $fh "testing, testing";
sub yada {
open $_[0], '>', 'yada.gg';
}
Or as a reference:
yada(\$fh);
sub yada {
my $handle = shift;
open $$handle, '>', 'yada.gg';
}
Or better yet, return a filehandle:
my $fh = yada($file);
sub yada {
my $inputfile = shift;
open my $gg, '>', $inputfile;
return $gg;
}
Replace
my $fh = "F"; # text and also a ref in nonstrict mode
with
my $fh = \*F; # a reference, period
Of course, it's better yet to use lexical filehandles, as in open my $fd, ... or die ..., but that's not always possible, e.g. you have STDIN that's predefined. In such cases, use \*FD wherever $fd fits.
There's also a case with old scripts, you have to watch out where a global FD is opened and closed.

Pimp my Perl code

I'm an experienced developer, but not in Perl. I usually learn Perl to hack a script, then I forget it again until the next time. Hence I'm looking for advice from the pros.
This time around I'm building a series of data analysis scripts. Grossly simplified, the program structure is like this:
01 my $config_var = 999;
03 my $result_var = 0;
05 foreach my $file (#files) {
06 open(my $fh, $file);
07 while (<$fh>) {
08 &analyzeLine($_);
09 }
10 }
12 print "$result_var\n";
14 sub analyzeLine ($) {
15 my $line = shift(#_);
16 $result_var = $result_var + calculatedStuff;
17 }
In real life, there are up to about half a dozen different config_vars and result_vars.
These scripts differ mostly in the values assigned to the config_vars. The main loop will be the same in every case, and analyzeLine() will be mostly the same but could have some small variations.
I can accomplish my purpose by making N copies of this code, with small changes here and there; but that grossly violates all kinds of rules of good design. Ideally, I would like to write a series of scripts containing only a set of config var initializations, followed by
do theCommonStuff;
Note that config_var (and its siblings) must be available to the common code, as must result_var and its lookalikes, upon which analyzeLine() does some calculations.
Should I pack my "common" code into a module? Create a class? Use global variables?
While not exactly code golf, I'm looking for a simple, compact solution that will allow me to DRY and write code only for the differences. I think I would rather not drive the code off a huge table containing all the configs, and certainly not adapt it to use a database.
Looking forward to your suggestions, and thanks!
Update
Since people asked, here's the real analyzeLine:
# Update stats with time and call data in one line.
sub processLine ($) {
my $line = shift(#_);
return unless $line =~ m/$log_match/;
# print "$1 $2\n";
my ($minute, $function) = ($1, $2);
$startMinute = $minute if not $startMinute;
$endMinute = $minute;
if ($minute eq $currentMinute) {
$minuteCount = $minuteCount + 1;
} else {
if ($minuteCount > $topMinuteCount) {
$topMinute = $currentMinute;
$topMinuteCount = $minuteCount;
printf ("%40s %s : %d\n", '', $topMinute, $topMinuteCount);
}
$totalMinutes = $totalMinutes + 1;
$totalCount = $totalCount + $minuteCount;
$currentMinute = $minute;
$minuteCount = 1;
}
}
Since these variables are largely interdependent, I think a functional solution with separate calculations won't be practical. I apologize for misleading people.
Two comments: First, don't post line numbers as they make it more difficult than necessary to copy, paste and edit. Second, don't use &func() to invoke a sub. See perldoc perlsub:
A subroutine may be called using an explicit & prefix. The & is optional in modern Perl, ... Not only does the & form make the argument list optional, it also disables any prototype checking on arguments you do provide.
In short, using & can be surprising unless you know what you are doing and why you are doing it.
Also, don't use prototypes in Perl. They are not the same as prototypes in other languages and, again, can have very surprising effects unless you know what you are doing.
Do not forget to check the return value of system calls such as open. Use autodie with modern perls.
For your specific problem, collect all configuration variables in a hash. Pass that hash to analyzeLine.
#!/usr/bin/perl
use warnings; use strict;
use autodie;
my %config = (
frobnicate => 'yes',
machinate => 'no',
);
my $result;
$result += analyze_file(\%config, $_) for #ARGV;
print "Result = $result\n";
sub analyze_file {
my ($config, $file) = #_;
my $result;
open my $fh, '<', $file;
while ( my $line = <$fh> ) {
$result += analyze_line($config, $line);
}
close $fh;
return $result;
}
sub analyze_line {
my ($line) = #_;
return length $line;
}
Of course, you will note that $config is being passed all over the place, which means you might want to turn this in to a OO solution:
#!/usr/bin/perl
package My::Analyzer;
use strict; use warnings;
use base 'Class::Accessor::Faster';
__PACKAGE__->follow_best_practice;
__PACKAGE__->mk_accessors( qw( analyzer frobnicate machinate ) );
sub analyze_file {
my $self = shift;
my ($file) = #_;
my $result;
open my $fh, '<', $file;
while ( my $line = <$fh> ) {
$result += $self->analyze_line($line);
}
close $fh;
return $result;
}
sub analyze_line {
my $self = shift;
my ($line) = #_;
return $self->get_analyzer->($line);
}
package main;
use warnings; use strict;
use autodie;
my $x = My::Analyzer->new;
$x->set_analyzer(sub {
my $length; $length += length $_ for #_; return $length;
});
$x->set_frobnicate('yes');
$x->set_machinate('no');
my $result;
$result += $x->analyze_file($_) for #ARGV;
print "Result = $result\n";
Go ahead and create a class hierarchy. Your task is an ideal playground for OOP style of programming.
Here's an example:
package Common;
sub new{
my $class=shift;
my $this=bless{},$class;
$this->init();
return $this;
}
sub init{}
sub theCommonStuff(){
my $this=shift;
for(1..10){ $this->analyzeLine($_); }
}
sub analyzeLine(){
my($this,$line)=#_;
$this->{'result'}.=$line;
}
package Special1;
our #ISA=qw/Common/;
sub init{
my $this=shift;
$this->{'sep'}=','; # special param: separator
}
sub analyzeLine(){ # modified logic
my($this,$line)=#_;
$this->{'result'}.=$line.$this->{'sep'};
}
package main;
my $c = new Common;
my $s = new Special1;
$c->theCommonStuff;
$s->theCommonStuff;
print $c->{'result'}."\n";
print $s->{'result'}."\n";
If all the common code is in one function, a function taking your config variables as parameters, and returning the result variables (either as return values, or as in/out parameters), will do. Otherwise, making a class ("package") is a good idea, too.
sub common_func {
my ($config, $result) = #_;
# ...
$result->{foo} += do_stuff($config->{bar});
# ...
}
Note in the above that both the config and result are hashes (actually, references thereto). You can use any other data structure that you feel will suit your goal.
Some thoughts:
If there are several $result_vars, I would recommend creating a separate subroutine for calculating each one.
If a subroutine relies on information outside that function, it should be passed in as a parameter to that subroutine, rather than relying on global state.
Alternatively wrap the whole thing in a class, with $result_var as an attribute of the class.
Practically speaking, there are a couple ways you could implement this:
(1) Have your &analyzeLine function return calculatedStuff, and add it to &result_var in a loop outside the function:
$result_var = 0;
foreach my $file (#files) {
open(my $fh, $file);
while (<$fh>) {
$result_var += analyzeLine($_);
}
}
}
sub analyzeLine ($) {
my $line = shift(#_);
return calculatedStuff;
}
(2) Pass $result_var into analyzeLine explicitly, and return the changed $result_var.
$result_var = 0;
foreach my $file (#files) {
open(my $fh, $file);
while (<$fh>) {
$result_var = addLineToResult($result_var, $_);
}
}
}
sub addLineToResult ($$) {
my $running_total = shift(#_);
my $line = shift(#_);
return $running_total + calculatedStuff;
}
The important part is that if you separate out functions for each of your several $result_vars, you'll be more readily able to write clean code. Don't worry about optimizing yet. That can come later, when your code has proven itself slow. The improved design will make optimization easier when the time comes.
why not create a function and using $config_var and $result_var as parameters?

Access to Perl's empty angle "<>" operator from an actual filehandle?

I like to use the nifty perl feature where reading from the empty angle operator <> magically gives your program UNIX filter semantics, but I'd like to be able to access this feature through an actual filehandle (or IO::Handle object, or similar), so that I can do things like pass it into subroutines and such. Is there any way to do this?
This question is particularly hard to google, because searching for "angle operator" and "filehandle" just tells me how to read from filehandles using the angle operator.
From perldoc perlvar:
ARGV
The special filehandle that iterates over command-line filenames in #ARGV. Usually written as the null filehandle in the angle operator <>. Note that currently ARGV only has its magical effect within the <> operator; elsewhere it is just a plain filehandle corresponding to the last file opened by <>. In particular, passing \*ARGV as a parameter to a function that expects a filehandle may not cause your function to automatically read the contents of all the files in #ARGV.
I believe that answers all aspects of your question in that "Hate to say it but it won't do what you want" kind of way. What you could do is make functions that take a list of filenames to open, and do this:
sub takes_filenames (#) {
local #ARGV = #_;
// do stuff with <>
}
But that's probably the best you'll be able to manage.
Expanding on Chris Lutz's idea, here is a very rudimentary implementation:
#!/usr/bin/perl
package My::ARGV::Reader;
use strict; use warnings;
use autodie;
use IO::Handle;
use overload
'<>' => \&reader,
'""' => \&argv,
'0+' => \&input_line_number,
;
sub new {
my $class = shift;
my $self = {
names => [ #_ ],
handles => [],
current_file => 0,
};
bless $self => $class;
}
sub reader {
my $self = shift;
return scalar <STDIN> unless #{ $self->{names}};
my $line;
while ( 1 ) {
my $current = $self->{current_file};
return if $current >= #{ $self->{names} };
my $fh = $self->{handles}->[$current];
unless ( $fh ) {
$self->{handles}->[$current] = $fh = $self->open_file;
}
if( eof $fh ) {
close $fh;
$self->{current_file} = $current + 1;
next;
}
$line = <$fh>;
last;
}
return $line;
}
sub open_file {
my $self = shift;
my $name = $self->{names}->[ $self->{current_file} ];
open my $fh, '<', $name;
return $fh;
}
sub argv {
my $self = shift;
my $name = #{$self->{names}}
? $self->{names}->[ $self->{current_file} ]
: '-'
;
return $name;
}
sub input_line_number {
my $self = shift;
my $fh = #{$self->{names}}
? $self->{handles}->[$self->{current_file}]
: \*STDIN
;
return $fh->input_line_number;
}
which can be used as:
package main;
use strict; use warnings;
my $it = My::ARGV::Reader->new(#ARGV);
echo($it);
sub echo {
my ($it) = #_;
printf "[%s:%d]:%s", $it, +$it, $_ while <$it>;
}
Output:
[file1:1]:bye bye
[file1:2]:hello
[file1:3]:thank you
[file1:4]:no translation
[file1:5]:
[file2:1]:chao
[file2:2]:hola
[file2:3]:gracias
[file2:4]:
It looks like this has already been implemented as Iterator::Diamond. Iterator::Diamond also disables the 2-argument-open magic that perl uses when reading <ARGV>. Even better, it supports reading '-' as STDIN, without enabling all the other magic. In fact, I might use it for that purpose just on single files.