Use of reference to elements in #_ to avoid duplicating code - perl

Is it safe to take reference of elements of #_ in a subroutine in order to avoid duplicating code? I also wonder if the following is good practice or can be simplified. I have a subroutine mod_str that takes an option saying if a string argument should be modified in-place or not:
use feature qw(say);
use strict;
use warnings;
my $str = 'abc';
my $mstr = mod_str( $str, in_place => 0 );
say $mstr;
mod_str( $str, in_place => 1 );
say $str;
sub mod_str {
my %opt;
%opt = #_[1..$#_];
if ( $opt{in_place} ) {
$_[0] =~ s/a/A/g;
# .. do more stuff with $_[0]
return;
}
else {
my $str = $_[0];
$str =~ s/a/A/g;
# .. do more stuff with $str
return $str;
}
}
In order to avoid repeating/duplicating code in the if and else blocks above, I tried to improve mod_str:
sub mod_str {
my %opt;
%opt = #_[1..$#_];
my $ref;
my $str;
if ( $opt{in_place} ) {
$ref = \$_[0];
}
else {
$str = $_[0]; # make copy
$ref = \$str;
}
$$ref =~ s/a/A/g;
# .. do more stuff with $$ref
$opt{in_place} ? return : return $$ref;
}

The "in place" flag changes the function's interface to the point where it should be a new function. It will simplify the interface, testing, documentation and the internals to have two functions. Rather than having to parse arguments and have a big if/else block, the user has already made that choice for you.
Another way to look at it is the in_place option will always be set to a constant. Because it fundamentally changes how the function behaves, there's no sensible case where you'd write in_place => $flag.
Once you do that, the reuse becomes more obvious. Write one function to do the operation in place. Write another which calls that on a copy.
sub mod_str_in_place {
# ...Do work on $_[0]...
return;
}
sub mod_str {
my $str = $_[0]; # string is copied
mod_str_in_place($str);
return $str;
}

In the absence of the disgraced given I like using for as a topicalizer. This effectively aliases $_ to either $_[0] or the local copy depending on the value of the in_place hash element. It's directly comparable to your $ref but with aliases, and a lot cleaner
I see no reason to return a useless undef / () in the case that the string is modified in place; the subroutine may as well return the new value of the string. (I suspect the old value might be more useful, after the fashion of $x++, but that makes for uglier code!)
I'm not sure whether this is readable code to anyone but me, so comments are welcome!
use strict;
use warnings;
my $ss = 'abcabc';
printf "%s %s\n", mod_str($ss), $ss;
$ss = 'abcabc';
printf "%s %s\n", mod_str($ss, in_place => 1), $ss;
sub mod_str {
my ($copy, %opt) = #_;
for ( $opt{in_place} ? $_[0] : $copy ) {
s/a/A/g;
# .. do more stuff with $_
return $_;
}
}
output
AbcAbc abcabc
AbcAbc AbcAbc

Related

Perl: Syntactical Sugar for Latter Coderef Arguments?

Using sub prototypes, we can define our own subs that look like map or grep. That is, the first coderef argument has shorter syntax than a normal anonymous sub. For example:
sub thunked (&) { $_[0] }
my $val = thunked { 2 * 4 };
Works great here, since the first argument is the coderef. For latter arguments however, it simple won't parse properly.
I made a with sub designed to make writing GTK2 code cleaner. It's meant to look like this (untested since it's hypothetical code):
use 5.012;
use warnings;
use Gtk2 '-init';
sub with ($&) {
local $_ = $_[0];
$_[1]->();
$_;
}
for (Gtk2::Window->new('toplevel')) {
$_->set_title('Test Application');
$_->add(with Gtk2::VBox->new {
my $box = $_;
$box->add(Gtk2::Button->new("Button $_")) for (1..4);
});
$_->show_all;
}
Gtk2->main;
It doesn't work because with needs to take the block as a first argument for the nice syntax to work. Is there any way to pull it off?
The module Devel::Declare contains tools for extending Perl's syntax in a relatively safe way.
Using Devel::Declare you would create a hook on the with token, which will stop the parser when it reaches that word. From there, you have control over the parser and you can read ahead until you reach a { symbol. At that point, you have what you need to work with, so you rewrite it into valid Perl, and pass it back to the parser.
in the file With.pm:
package With;
use warnings;
use strict;
use Devel::Declare;
sub import {
my $caller = caller;
Devel::Declare->setup_for (
$caller => {with => {const => \&parser}}
);
no strict 'refs';
*{$caller.'::with'} = sub ($&) {
$_[1]() for $_[0];
$_[0]
}
}
our $prefix = '';
sub get {substr Devel::Declare::get_linestr, length $prefix}
sub set { Devel::Declare::set_linestr $prefix . $_[0]}
sub parser {
local $prefix = substr get, 0, length($_[0]) + $_[1];
my $with = strip_with();
strip_space();
set "scalar($with), sub " . get;
}
sub strip_space {
my $skip = Devel::Declare::toke_skipspace length $prefix;
set substr get, $skip;
}
sub strip_with {
strip_space;
my $with;
until (get =~ /^\{/) {
(my $line = get) =~ s/^([^{]+)//;
$with .= $1;
set $line;
strip_space;
}
$with =~ s/\s+/ /g;
$with
}
and to use it:
use With;
sub Window::add {say "window add: ", $_[1]->str}
sub Window::new {bless [] => 'Window'}
sub Box::new {bless [] => 'Box'}
sub Box::add {push #{$_[0]}, #_[1..$#_]}
sub Box::str {"Box(#{$_[0]})"}
sub Button::new {"Button($_[1])"}
with Window->new {
$_->add(with Box->new {
for my $num (1 .. 4) {
$_->add(Button->new($num))
}
})
};
Which prints:
window add: Box(Button(1) Button(2) Button(3) Button(4))
A completely different approach would be to skip the with keyword altogether and write a routine to generate constructor subroutines:
BEGIN {
for my $name (qw(VBox)) { # and any others you want
no strict 'refs';
*$name = sub (&#) {
use strict;
my $code = shift;
my $with = "Gtk2::$name"->new(#_);
$code->() for $with;
$with
}
}
}
and then your code could look like
for (Gtk2::Window->new('toplevel')) {
$_->set_title('Test Application');
$_->add(VBox {
my $box = $_;
$box->add(Gtk2::Button->new("Button $_")) for (1..4);
});
$_->show_all;
}
One way that you could deal with it is to add a fairly useless keyword:
sub perform(&) { $_[0] }
with GTK2::VBox->new, perform { ... }
where perform is really just a sugarier alternative to sub.
Another way is to write a Devel::Declare filter or a Syntax::Keyword:: plugin to implement your with, as long as you have some way to tell when you're done parsing the with argument and ready to start parsing the block — balanced parentheses would do (so would an opening curly brace, but then hashes become a problem). Then you could support something like
with (GTK2::VBox->new) { ... }
and let the filter rewrite it to something like
do {
local $_ = GTK2::VBox->new;
do {
...;
};
$_;
}
which, if it works, has the advantage of not actually creating a sub, and thus not interfering with #_, return, and a few other things. The two layers of do-age I think are necessary for being able to install an EndOfScope hook in the proper place.
The obvious disadvantages of this are that it's tricky, it's hairy, and it's a source filter (even if it's a tame one) which means there are problems you have to solve if you want any code using it to be debuggable at all.

Where can I find an array of the (un)assigned Unicode code points for a particular block?

At the moment, I'm writing these arrays by hand.
For example, the Miscellaneous Mathematical Symbols-A block has an entry in hash like this:
my %symbols = (
...
miscellaneous_mathematical_symbols_a => [(0x27C0..0x27CA), 0x27CC,
(0x27D0..0x27EF)],
...
)
The simpler, 'continuous' array
miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF]
doesn't work because Unicode blocks have holes in them. For example, there's nothing at 0x27CB. Take a look at the code chart [PDF].
Writing these arrays by hand is tedious, error-prone and a bit fun. And I get the feeling that someone has already tackled this in Perl!
Perhaps you want Unicode::UCD? Use its charblock routine to get the range of any named block. If you want to get those names, you can use charblocks.
This module is really just an interface to the Unicode databases that come with Perl already, so if you have to do something fancier, you can look at the lib/5.x.y/unicore/UnicodeData.txt or the various other files in that same directory to get what you need.
Here's what I came up with to create your %symbols. I go through all the blocks (although in this sample I skip that ones without "Math" in their name. I get the starting and ending code points and check which ones are assigned. From that, I create a custom property that I can use to check if a character is in the range and assigned.
use strict;
use warnings;
digest_blocks();
my $property = 'My::InMiscellaneousMathematicalSymbolsA';
foreach ( 0x27BA..0x27F3 )
{
my $in = chr =~ m/\p{$property}/;
printf "%X is %sin $property\n",
$_, $in ? '' : ' not ';
}
sub digest_blocks {
use Unicode::UCD qw(charblocks);
my $blocks = charblocks();
foreach my $block ( keys %$blocks )
{
next unless $block =~ /Math/; # just to make the output small
my( $start, $stop ) = #{ $blocks->{$block}[0] };
$blocks->{$block} = {
assigned => [ grep { chr =~ /\A\p{Assigned}\z/ } $start .. $stop ],
unassigned => [ grep { chr !~ /\A\p{Assigned}\z/ } $start .. $stop ],
start => $start,
stop => $stop,
name => $block,
};
define_my_property( $blocks->{$block} );
}
}
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = join "\n", # can do ranges here too
map { sprintf "%X", $_ }
#{ $block->{assigned} };
*{"My::In$subname"} = sub { $string };
}
If I were going to do this a lot, I'd use the same thing to create a Perl source file that has the custom properties already defined so I can just use them right away in any of my work. None of the data should change until you update your Unicode data.
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = num2range( #{ $block->{assigned} } );
print <<"HERE";
sub My::In$subname {
return <<'CODEPOINTS';
$string
CODEPOINTS
}
HERE
}
# http://www.perlmonks.org/?node_id=87538
sub num2range {
local $_ = join ',' => sort { $a <=> $b } #_;
s/(?<!\d)(\d+)(?:,((??{$++1})))+(?!\d)/$1\t$+/g;
s/(\d+)/ sprintf "%X", $1/eg;
s/,/\n/g;
return $_;
}
That gives me output suitable for a Perl library:
sub My::InMiscellaneousMathematicalSymbolsA {
return <<'CODEPOINTS';
27C0 27CA
27CC
27D0 27EF
CODEPOINTS
}
sub My::InSupplementalMathematicalOperators {
return <<'CODEPOINTS';
2A00 2AFF
CODEPOINTS
}
sub My::InMathematicalAlphanumericSymbols {
return <<'CODEPOINTS';
1D400 1D454
1D456 1D49C
1D49E 1D49F
1D4A2
1D4A5 1D4A6
1D4A9 1D4AC
1D4AE 1D4B9
1D4BB
1D4BD 1D4C3
1D4C5 1D505
1D507 1D50A
1D50D 1D514
1D516 1D51C
1D51E 1D539
1D53B 1D53E
1D540 1D544
1D546
1D54A 1D550
1D552 1D6A5
1D6A8 1D7CB
1D7CE 1D7FF
CODEPOINTS
}
sub My::InMiscellaneousMathematicalSymbolsB {
return <<'CODEPOINTS';
2980 29FF
CODEPOINTS
}
sub My::InMathematicalOperators {
return <<'CODEPOINTS';
2200 22FF
CODEPOINTS
}
Maybe this?
my #list =
grep {chr ($_) =~ /^\p{Assigned}$/}
0x27C0..0x27EF;
#list = map { $_ = sprintf ("%X", $_ )} #list;
print "#list\n";
Gives me
27C0 27C1 27C2 27C3 27C4 27C5 27C6 27C7 27C8 27C9 27CA 27D0 27D1 27D2 27D3
27D4 27D5 27D6 27D7 27D8 27D9 27DA 27DB 27DC 27DD 27DE 27DF 27E0 27E1 27E2
27E3 27E4 27E5 27E6 27E7 27E8 27E9 27EA 27EB
I don't know why you wouldn't say miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF], because that's how the Unicode standard is defined according to the PDF.
What do you mean when you say it doesn't "work"? If it's giving you some sort of error when you check the existence of the character in the block, then why not just weed them out of the block when your checker comes across an error?

Perl: if ( element in list )

I'm looking for presence of an element in a list.
In Python there is an in keyword and I would do something like:
if element in list:
doTask
Is there something equivalent in Perl without having to manually iterate through the entire list?
UPDATE:
The smartmatch family of features are now experimental
Smart match, added in v5.10.0 and significantly revised in v5.10.1, has been a regular point of complaint. Although there are a number of ways in which it is useful, it has also proven problematic and confusing for both users and implementors of Perl. There have been a number of proposals on how to best address the problem. It is clear that smartmatch is almost certainly either going to change or go away in the future. Relying on its current behavior is not recommended.
Warnings will now be issued when the parser sees ~~, given, or when.
If you can get away with requiring Perl v5.10, then you can use any of the following examples.
The smart match ~~ operator.
if( $element ~~ #list ){ ... }
if( $element ~~ [ 1, 2, 3 ] ){ ... }
You could also use the given/when construct. Which uses the smart match functionality internally.
given( $element ){
when( #list ){ ... }
}
You can also use a for loop as a "topicalizer" ( meaning it sets $_ ).
for( #elements ){
when( #list ){ ... }
}
One thing that will come out in Perl 5.12 is the ability to use the post-fix version of when. Which makes it even more like if and unless.
given( $element ){
... when #list;
}
If you have to be able to run on older versions of Perl, there still are several options.
You might think you can get away with using List::Util::first, but there are some edge conditions that make it problematic.
In this example it is fairly obvious that we want to successfully match against 0. Unfortunately this code will print failure every time.
use List::Util qw'first';
my $element = 0;
if( first { $element eq $_ } 0..9 ){
print "success\n";
} else {
print "failure\n";
}
You could check the return value of first for defined-ness, but that will fail if we actually want a match against undef to succeed.
You can safely use grep however.
if( grep { $element eq $_ } 0..9 ){ ... }
This is safe because grep gets called in a scalar context. Arrays return the number of elements when called in scalar context. So this will continue to work even if we try to match against undef.
You could use an enclosing for loop. Just make sure you call last, to exit out of the loop on a successful match. Otherwise you might end up running your code more than once.
for( #array ){
if( $element eq $_ ){
...
last;
}
}
You could put the for loop inside the condition of the if statement ...
if(
do{
my $match = 0;
for( #list ){
if( $element eq $_ ){
$match = 1;
last;
}
}
$match; # the return value of the do block
}
){
...
}
... but it might be more clear to put the for loop before the if statement.
my $match = 0;
for( #list ){
if( $_ eq $element ){
$match = 1;
last;
}
}
if( $match ){ ... }
If you're only matching against strings, you could also use a hash. This can speed up your program if #list is large and, you are going to match against %hash several times. Especially if #array doesn't change, because then you only have to load up %hash once.
my %hash = map { $_, 1 } #array;
if( $hash{ $element } ){ ... }
You could also make your own subroutine. This is one of the cases where it is useful to use prototypes.
sub in(&#){
local $_;
my $code = shift;
for( #_ ){ # sets $_
if( $code->() ){
return 1;
}
}
return 0;
}
if( in { $element eq $_ } #list ){ ... }
if( $element ~~ #list ){
do_task
}
~~ is the "smart match operator", and does more than just list membership detection.
grep is helpful here
if (grep { $_ eq $element } #list) {
....
}
If you plan to do this many times, you can trade-off space for lookup time:
#!/usr/bin/perl
use strict; use warnings;
my #array = qw( one ten twenty one );
my %lookup = map { $_ => undef } #array;
for my $element ( qw( one two three ) ) {
if ( exists $lookup{ $element }) {
print "$element\n";
}
}
assuming that the number of times the element appears in #array is not important and the contents of #array are simple scalars.
List::Util::first
$foo = first { ($_ && $_ eq "value" } #list; # first defined value in #list
Or for hand-rolling types:
my $is_in_list = 0;
foreach my $elem (#list) {
if ($elem && $elem eq $value_to_find) {
$is_in_list = 1;
last;
}
}
if ($is_in_list) {
...
A slightly different version MIGHT be somewhat faster on very long lists:
my $is_in_list = 0;
for (my $i = 0; i < scalar(#list); ++$i) {
if ($list[i] && $list[i] eq $value_to_find) {
$is_in_list = 1;
last;
}
}
if ($is_in_list) {
...
TIMTOWTDI
sub is (&#) {
my $test = shift;
$test->() and return 1 for #_;
0
}
sub in (#) {#_}
if( is {$_ eq "a"} in qw(d c b a) ) {
print "Welcome in perl!\n";
}
List::MoreUtils
On perl >= 5.10 the smart match operator is surely the easiest way, as many others have already said.
On older versions of perl, I would instead suggest List::MoreUtils::any.
List::MoreUtils is not a core module (some say it should be) but it's very popular and it's included in major perl distributions.
It has the following advantages:
it returns true/false (as Python's in does) and not the value of the element, as List::Util::first does (which makes it hard to test, as noted above);
unlike grep, it stops at the first element which passes the test (perl's smart match operator short circuits as well);
it works with any perl version (well, >= 5.00503 at least).
Here is an example which works with any searched (scalar) value, including undef:
use List::MoreUtils qw(any);
my $value = 'test'; # or any other scalar
my #array = (1, 2, undef, 'test', 5, 6);
no warnings 'uninitialized';
if ( any { $_ eq $value } #array ) {
print "$value present\n"
}
P.S.
(In production code it's better to narrow the scope of no warnings 'uninitialized').
Probably Perl6::Junction is the clearest way to do. No XS dependencies, no mess and no new perl version required.
use Perl6::Junction qw/ any /;
if (any(#grant) eq 'su') {
...
}
This blog post discusses the best answers to this question.
As a short summary, if you can install CPAN modules then the best solutions are:
if any(#ingredients) eq 'flour';
or
if #ingredients->contains('flour');
However, a more usual idiom is:
if #any { $_ eq 'flour' } #ingredients
which i find less clear.
But please don't use the first() function! It doesn't express the intent of your code at all. Don't use the "Smart match" operator: it is broken. And don't use grep() nor the solution with a hash: they iterate through the whole list. While any() will stop as soon as it finds your value.
Check out the blog post for more details.
PS: i'm answering for people who will have the same question in the future.
You can accomplish a similar enough syntax in Perl if you do some Autoload hacking.
Create a small package to handle the autoload:
package Autoloader;
use strict;
use warnings;
our $AUTOLOAD;
sub AUTOLOAD {
my $self = shift;
my ($method) = (split(/::/, $AUTOLOAD))[-1];
die "Object does not contain method '$method'" if not ref $self->{$method} eq 'CODE';
goto &{$self->{$method}};
}
1;
Then your other package or main script will contain a subroutine that returns the blessed object which gets handled by Autoload when its method attempts to be called.
sub element {
my $elem = shift;
my $sub = {
in => sub {
return if not $_[0];
# you could also implement this as any of the other suggested grep/first/any solutions already posted.
my %hash; #hash{#_} = ();
return (exists $hash{$elem}) ? 1 : ();
}
};
bless($sub, 'Autoloader');
}
This leaves you with usage looking like:
doTask if element('something')->in(#array);
If you reorganize the closure and its arguments, you can switch the syntax around the other way to make it look like this, which is a bit closer to the autobox style:
doTask if search(#array)->contains('something');
function to do that:
sub search {
my #arr = #_;
my $sub = {
contains => sub {
my $elem = shift or return;
my %hash; #hash{#arr} = ();
return (exists $hash{$elem}) ? 1 : ();
}
};
bless($sub, 'Autoloader');
}

Pimp my Perl code

I'm an experienced developer, but not in Perl. I usually learn Perl to hack a script, then I forget it again until the next time. Hence I'm looking for advice from the pros.
This time around I'm building a series of data analysis scripts. Grossly simplified, the program structure is like this:
01 my $config_var = 999;
03 my $result_var = 0;
05 foreach my $file (#files) {
06 open(my $fh, $file);
07 while (<$fh>) {
08 &analyzeLine($_);
09 }
10 }
12 print "$result_var\n";
14 sub analyzeLine ($) {
15 my $line = shift(#_);
16 $result_var = $result_var + calculatedStuff;
17 }
In real life, there are up to about half a dozen different config_vars and result_vars.
These scripts differ mostly in the values assigned to the config_vars. The main loop will be the same in every case, and analyzeLine() will be mostly the same but could have some small variations.
I can accomplish my purpose by making N copies of this code, with small changes here and there; but that grossly violates all kinds of rules of good design. Ideally, I would like to write a series of scripts containing only a set of config var initializations, followed by
do theCommonStuff;
Note that config_var (and its siblings) must be available to the common code, as must result_var and its lookalikes, upon which analyzeLine() does some calculations.
Should I pack my "common" code into a module? Create a class? Use global variables?
While not exactly code golf, I'm looking for a simple, compact solution that will allow me to DRY and write code only for the differences. I think I would rather not drive the code off a huge table containing all the configs, and certainly not adapt it to use a database.
Looking forward to your suggestions, and thanks!
Update
Since people asked, here's the real analyzeLine:
# Update stats with time and call data in one line.
sub processLine ($) {
my $line = shift(#_);
return unless $line =~ m/$log_match/;
# print "$1 $2\n";
my ($minute, $function) = ($1, $2);
$startMinute = $minute if not $startMinute;
$endMinute = $minute;
if ($minute eq $currentMinute) {
$minuteCount = $minuteCount + 1;
} else {
if ($minuteCount > $topMinuteCount) {
$topMinute = $currentMinute;
$topMinuteCount = $minuteCount;
printf ("%40s %s : %d\n", '', $topMinute, $topMinuteCount);
}
$totalMinutes = $totalMinutes + 1;
$totalCount = $totalCount + $minuteCount;
$currentMinute = $minute;
$minuteCount = 1;
}
}
Since these variables are largely interdependent, I think a functional solution with separate calculations won't be practical. I apologize for misleading people.
Two comments: First, don't post line numbers as they make it more difficult than necessary to copy, paste and edit. Second, don't use &func() to invoke a sub. See perldoc perlsub:
A subroutine may be called using an explicit & prefix. The & is optional in modern Perl, ... Not only does the & form make the argument list optional, it also disables any prototype checking on arguments you do provide.
In short, using & can be surprising unless you know what you are doing and why you are doing it.
Also, don't use prototypes in Perl. They are not the same as prototypes in other languages and, again, can have very surprising effects unless you know what you are doing.
Do not forget to check the return value of system calls such as open. Use autodie with modern perls.
For your specific problem, collect all configuration variables in a hash. Pass that hash to analyzeLine.
#!/usr/bin/perl
use warnings; use strict;
use autodie;
my %config = (
frobnicate => 'yes',
machinate => 'no',
);
my $result;
$result += analyze_file(\%config, $_) for #ARGV;
print "Result = $result\n";
sub analyze_file {
my ($config, $file) = #_;
my $result;
open my $fh, '<', $file;
while ( my $line = <$fh> ) {
$result += analyze_line($config, $line);
}
close $fh;
return $result;
}
sub analyze_line {
my ($line) = #_;
return length $line;
}
Of course, you will note that $config is being passed all over the place, which means you might want to turn this in to a OO solution:
#!/usr/bin/perl
package My::Analyzer;
use strict; use warnings;
use base 'Class::Accessor::Faster';
__PACKAGE__->follow_best_practice;
__PACKAGE__->mk_accessors( qw( analyzer frobnicate machinate ) );
sub analyze_file {
my $self = shift;
my ($file) = #_;
my $result;
open my $fh, '<', $file;
while ( my $line = <$fh> ) {
$result += $self->analyze_line($line);
}
close $fh;
return $result;
}
sub analyze_line {
my $self = shift;
my ($line) = #_;
return $self->get_analyzer->($line);
}
package main;
use warnings; use strict;
use autodie;
my $x = My::Analyzer->new;
$x->set_analyzer(sub {
my $length; $length += length $_ for #_; return $length;
});
$x->set_frobnicate('yes');
$x->set_machinate('no');
my $result;
$result += $x->analyze_file($_) for #ARGV;
print "Result = $result\n";
Go ahead and create a class hierarchy. Your task is an ideal playground for OOP style of programming.
Here's an example:
package Common;
sub new{
my $class=shift;
my $this=bless{},$class;
$this->init();
return $this;
}
sub init{}
sub theCommonStuff(){
my $this=shift;
for(1..10){ $this->analyzeLine($_); }
}
sub analyzeLine(){
my($this,$line)=#_;
$this->{'result'}.=$line;
}
package Special1;
our #ISA=qw/Common/;
sub init{
my $this=shift;
$this->{'sep'}=','; # special param: separator
}
sub analyzeLine(){ # modified logic
my($this,$line)=#_;
$this->{'result'}.=$line.$this->{'sep'};
}
package main;
my $c = new Common;
my $s = new Special1;
$c->theCommonStuff;
$s->theCommonStuff;
print $c->{'result'}."\n";
print $s->{'result'}."\n";
If all the common code is in one function, a function taking your config variables as parameters, and returning the result variables (either as return values, or as in/out parameters), will do. Otherwise, making a class ("package") is a good idea, too.
sub common_func {
my ($config, $result) = #_;
# ...
$result->{foo} += do_stuff($config->{bar});
# ...
}
Note in the above that both the config and result are hashes (actually, references thereto). You can use any other data structure that you feel will suit your goal.
Some thoughts:
If there are several $result_vars, I would recommend creating a separate subroutine for calculating each one.
If a subroutine relies on information outside that function, it should be passed in as a parameter to that subroutine, rather than relying on global state.
Alternatively wrap the whole thing in a class, with $result_var as an attribute of the class.
Practically speaking, there are a couple ways you could implement this:
(1) Have your &analyzeLine function return calculatedStuff, and add it to &result_var in a loop outside the function:
$result_var = 0;
foreach my $file (#files) {
open(my $fh, $file);
while (<$fh>) {
$result_var += analyzeLine($_);
}
}
}
sub analyzeLine ($) {
my $line = shift(#_);
return calculatedStuff;
}
(2) Pass $result_var into analyzeLine explicitly, and return the changed $result_var.
$result_var = 0;
foreach my $file (#files) {
open(my $fh, $file);
while (<$fh>) {
$result_var = addLineToResult($result_var, $_);
}
}
}
sub addLineToResult ($$) {
my $running_total = shift(#_);
my $line = shift(#_);
return $running_total + calculatedStuff;
}
The important part is that if you separate out functions for each of your several $result_vars, you'll be more readily able to write clean code. Don't worry about optimizing yet. That can come later, when your code has proven itself slow. The improved design will make optimization easier when the time comes.
why not create a function and using $config_var and $result_var as parameters?

Access to Perl's empty angle "<>" operator from an actual filehandle?

I like to use the nifty perl feature where reading from the empty angle operator <> magically gives your program UNIX filter semantics, but I'd like to be able to access this feature through an actual filehandle (or IO::Handle object, or similar), so that I can do things like pass it into subroutines and such. Is there any way to do this?
This question is particularly hard to google, because searching for "angle operator" and "filehandle" just tells me how to read from filehandles using the angle operator.
From perldoc perlvar:
ARGV
The special filehandle that iterates over command-line filenames in #ARGV. Usually written as the null filehandle in the angle operator <>. Note that currently ARGV only has its magical effect within the <> operator; elsewhere it is just a plain filehandle corresponding to the last file opened by <>. In particular, passing \*ARGV as a parameter to a function that expects a filehandle may not cause your function to automatically read the contents of all the files in #ARGV.
I believe that answers all aspects of your question in that "Hate to say it but it won't do what you want" kind of way. What you could do is make functions that take a list of filenames to open, and do this:
sub takes_filenames (#) {
local #ARGV = #_;
// do stuff with <>
}
But that's probably the best you'll be able to manage.
Expanding on Chris Lutz's idea, here is a very rudimentary implementation:
#!/usr/bin/perl
package My::ARGV::Reader;
use strict; use warnings;
use autodie;
use IO::Handle;
use overload
'<>' => \&reader,
'""' => \&argv,
'0+' => \&input_line_number,
;
sub new {
my $class = shift;
my $self = {
names => [ #_ ],
handles => [],
current_file => 0,
};
bless $self => $class;
}
sub reader {
my $self = shift;
return scalar <STDIN> unless #{ $self->{names}};
my $line;
while ( 1 ) {
my $current = $self->{current_file};
return if $current >= #{ $self->{names} };
my $fh = $self->{handles}->[$current];
unless ( $fh ) {
$self->{handles}->[$current] = $fh = $self->open_file;
}
if( eof $fh ) {
close $fh;
$self->{current_file} = $current + 1;
next;
}
$line = <$fh>;
last;
}
return $line;
}
sub open_file {
my $self = shift;
my $name = $self->{names}->[ $self->{current_file} ];
open my $fh, '<', $name;
return $fh;
}
sub argv {
my $self = shift;
my $name = #{$self->{names}}
? $self->{names}->[ $self->{current_file} ]
: '-'
;
return $name;
}
sub input_line_number {
my $self = shift;
my $fh = #{$self->{names}}
? $self->{handles}->[$self->{current_file}]
: \*STDIN
;
return $fh->input_line_number;
}
which can be used as:
package main;
use strict; use warnings;
my $it = My::ARGV::Reader->new(#ARGV);
echo($it);
sub echo {
my ($it) = #_;
printf "[%s:%d]:%s", $it, +$it, $_ while <$it>;
}
Output:
[file1:1]:bye bye
[file1:2]:hello
[file1:3]:thank you
[file1:4]:no translation
[file1:5]:
[file2:1]:chao
[file2:2]:hola
[file2:3]:gracias
[file2:4]:
It looks like this has already been implemented as Iterator::Diamond. Iterator::Diamond also disables the 2-argument-open magic that perl uses when reading <ARGV>. Even better, it supports reading '-' as STDIN, without enabling all the other magic. In fact, I might use it for that purpose just on single files.