Saving a reference to a localized filehandle. How does it work? - perl

This question is based on the observed behavior of patch running with a certain version of perl. When running a command like:
$ patch -N -p0 -u -b .bak < my.patch
I occasionally got output like:
print() on unopened filehandle NULL at patch line 715, <IN> line 12330.
When looking into the code, I see that the NULL filehandle is localized and saved in the object hash:
sub new {
# ....
local *NULL;
tie *NULL, 'Dev::Null';
$self->{o_fh} = \*NULL; # output filehandle
# ....
}
Since this behavior (the output of the message print() on unopened filehandle NULL) only occured for certain versions of perl and (maybe certain version of the patch program) I wondered if this is a bug? To me it looks like one should not localize NULL since we are saving a reference to it and the value of reference (*NULL) will be restored to its previous value when returning from new().
Here is a minimal example:
use feature qw(say);
use strict;
use warnings;
my $p = Patch->new();
$p->apply();
package Patch;
sub new {
my ( $class ) = #_;
my $self = bless {}, $class;
local *NULL;
tie *NULL, 'Dev::Null';
$self->{null} = \*NULL;
local *OUT;
my $out = 'out.txt';
open OUT, ">$out" or die "Couldn't open '$out': $!\n";
$self->{out} = \*OUT;
return $self;
}
sub apply {
my ( $self ) = #_;
my $null = $self->{null};
say $null "This should be discarded..";
my $out = $self->{out};
say $out "This is output to the file..";
}
package Dev::Null;
sub TIEHANDLE { bless \my $null }
sub PRINT {}
sub PRINTF {}
sub WRITE {}
sub READLINE {''}
sub READ {''}
sub GETC {''}
The output when I run this is:
say() on unopened filehandle NULL at ./p.pl line 34.
say() on unopened filehandle OUT at ./p.pl line 36.

It's a bug in patch.
$self->{...} = \*NULL;
should be
$self->{...} = *NULL;
Let's look at these four snippets:
my $r; $s = "abc"; $r = \$s; say $$r;
my $r; { local $s; $s = "abc"; $r = \$s; } say $$r;
my $r; *F = \*STDOUT; $r = \*F; say $r "abc";
my $r; { local *F; *F = \*STDOUT; $r = \*F; } say $r "abc";
Given that the first three work, we would expect the fourth to work too, but it doesn't.
We can't really talk in terms of variables and values in Perl. Perl's model is far more complex than C's where a variable is just a name that represents a location. Globs are even more complex because they're both a variable type (*FOO) something that can be found in a scalar ($foo = *FOO;). The above difference is related to this.
The following does work while still properly localizing *F:
my $r; { local *F; *F = \*STDOUT; $r = *F; } say $r "abc";
patch already uses this approach for *OUT, but it needs to use it for *NULL too. It probably went unnoticed because *NULL is used as a sink, and using an undefined handle also acts as a sink (if you disregard the warning and the error returned by print).

Related

Perl print out all subs arguments at every call at runtime

I'm looking for way to debug print each subroutine call from the namespace Myapp::* (e.g. without dumping the CPAN modules), but without the need edit every .pm file manually for to inserting some module or print statement.
I just learning (better to say: trying to understand) the package DB, what allows me tracing the execution (using the shebang #!/usr/bin/perl -d:Mytrace)
package DB;
use 5.010;
sub DB {
my( $package, $file, $line ) = caller;
my $code = \#{"::_<$file"};
print STDERR "--> $file $line $code->[$line]";
}
#sub sub {
# print STDERR "$sub\n";
# &$sub;
#}
1;
and looking for a way how to use the sub call to print the actual arguments of the called sub from the namespace of Myapp::*.
Or is here some easier (common) method to
combine the execution line-tracer DB::DB
with the Dump of the each subroutine call arguments (and its return values, if possible)?
I don't know if it counts as "easier" in any sane meaning of the word, but you can walk the symbol table and wrap all functions in code that prints their arguments and return values. Here's an example of how it might be done:
#!/usr/bin/env perl
use 5.14.2;
use warnings;
package Foo;
sub first {
my ( $m, $n ) = #_;
return $m+$n;
}
sub second {
my ( $m, $n ) = #_;
return $m*$n;
}
package main;
no warnings 'redefine';
for my $k (keys %{$::{'Foo::'}}) {
my $orig = *{$::{'Foo::'}{$k}}{CODE};
$::{'Foo::'}{$k} = sub {
say "Args: #_";
unless (wantarray) {
my $r = $orig->(#_);
say "Scalar return: $r";
return $r;
}
else {
my #r = $orig->(#_);
say "List return: #r";
return #r
}
}
}
say Foo::first(2,3);
say Foo::second(4,6);

Printing to stdout from a Perl XS extension

I recently started playing around with writing Perl (v5.8.8) extensions using XS. One of the methods I am writing collects a bunch of data and splats it to the client. I want to write some unit tests that make assertions against the output, but I'm running in to a problem: It doesn't appear that the PerlIO methods are passing data through the same channels as a print call in Perl does. Normally, you can tie in to the STDOUT file handler and intercept the result, but the PerlIO methods seem to be bypassing this completely.
I've pasted an example below, but the basic jist of my test is this: Tie in to STDOUT, run code, untie, return collected string. Doing this, I'm able to capture print statements, but not the PerlIO_* calls from my module. I've tried using PerlIO_write, PerlIO_puts, PerlIO_printf, and more. No dice.
From scratch, here is a minimal repro of what I'm doing:
h2xs -A -n IOTest
cd IOTest
Put this in to IOTest.xs:
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "ppport.h"
MODULE = IOTest PACKAGE = IOTest
void
oink ()
CODE:
PerlIO_puts(PerlIO_stdout(), "oink!\n");
And this goes in to a file called test.pl (The interesting part is near the bottom, everything else is just for capturing stdout):
# Set up the include path to match the build directories
BEGIN {
push #INC, './blib/lib/';
push #INC, './blib/arch/auto/IOTest';
}
use IOTest;
# This package is just a set of hooks for tieing in to stdout
{
# Lifted from the Test::Output module found here:
# http://search.cpan.org/~bdfoy/Test-Output-1.01/lib/Test/Output.pm
package OutputTie;
sub TIEHANDLE {
my $class = shift;
my $scalar = '';
my $obj = shift || \$scalar;
bless( $obj, $class);
}
sub PRINT {
my $self = shift;
$$self .= join(defined $, ? $, : '', #_);
$$self .= defined $\ ? $\ : '';
}
sub PRINTF {
my $self = shift;
my $fmt = shift;
$$self .= sprintf $fmt, #_;
}
sub read {
my $self = shift;
my $data = $$self;
$$self = '';
return $data;
}
}
# Runs a sub, intercepts stdout and returns it as a string
sub getStdOut (&) {
my $callback = shift;
select( ( select(STDOUT), $| = 1 )[0] );
my $out = tie *STDOUT, 'OutputTie';
$callback->();
my $stdout = $out->read;
undef $out;
untie *STDOUT;
return $stdout;
}
# This is the interesting part, the actual test:
print "Pre-capture\n";
my $output = getStdOut(sub {
print "before";
IOTest::oink();
print "after";
});
print "Captured StdOut:\n" . $output . "\nend\n";
Building and testing is then just a matter of:
perl Makefile.PL
make
perl test.pl
The output I'm seeing is:
Pre-capture
oink!
Captured StdOut:
beforeafter
end
Obviously, I'm expecting "oink!" to be sandwiched between "before" and "after", but that doesn't appear to be happening.
Any ideas?
I think the capturing is faulty. Compare:
use IOTest;
use Capture::Tiny qw(capture);
print "Pre-capture\n";
my $output = capture {
print "before";
IOTest::oink();
print "after";
};
print "Captured StdOut:\n" . $output . "\nend\n";
Pre-capture
Captured StdOut:
beforeoink!
after
end

How can I store and access a filehandle in a Perl class?

please look at the following code first.
#! /usr/bin/perl
package foo;
sub new {
my $pkg = shift;
my $self = {};
my $self->{_fd} = undef;
bless $self, $pkg;
return $self;
}
sub Setfd {
my $self = shift;
my $fd = shift;
$self_->{_fd} = $fd;
}
sub write {
my $self = shift;
print $self->{_fd} "hello word";
}
my $foo = new foo;
My intention is to store a file handle within a class using hash. the file handle is undefined at first, but can be initilized afterwards by calling Setfd function. then
write can be called to actually write string "hello word" to a file indicated by the file handle, supposed that the file handle is the result of a success "write" open.
but, perl compiler just complains that there are syntax error in the "print" line. can anyone of you tells me what's wrong here? thanks in advance.
You will need to put the $self->{_fd} expression in a block or assign it to a simpler expression:
print { $self->{_fd} } "hello word";
my $fd = $self->{_fd};
print $fd "hello word";
From perldoc -f print:
Note that if you're storing FILEHANDLEs in an array, or if you're using any other expression more complex than a scalar variable to retrieve it, you will have to use a block returning the filehandle value instead:
print { $files[$i] } "stuff\n";
print { $OK ? STDOUT : STDERR } "stuff\n";
Alternately:
use IO::Handle;
# ... later ...
$self->{_fd}->print('hello world');

Pimp my Perl code

I'm an experienced developer, but not in Perl. I usually learn Perl to hack a script, then I forget it again until the next time. Hence I'm looking for advice from the pros.
This time around I'm building a series of data analysis scripts. Grossly simplified, the program structure is like this:
01 my $config_var = 999;
03 my $result_var = 0;
05 foreach my $file (#files) {
06 open(my $fh, $file);
07 while (<$fh>) {
08 &analyzeLine($_);
09 }
10 }
12 print "$result_var\n";
14 sub analyzeLine ($) {
15 my $line = shift(#_);
16 $result_var = $result_var + calculatedStuff;
17 }
In real life, there are up to about half a dozen different config_vars and result_vars.
These scripts differ mostly in the values assigned to the config_vars. The main loop will be the same in every case, and analyzeLine() will be mostly the same but could have some small variations.
I can accomplish my purpose by making N copies of this code, with small changes here and there; but that grossly violates all kinds of rules of good design. Ideally, I would like to write a series of scripts containing only a set of config var initializations, followed by
do theCommonStuff;
Note that config_var (and its siblings) must be available to the common code, as must result_var and its lookalikes, upon which analyzeLine() does some calculations.
Should I pack my "common" code into a module? Create a class? Use global variables?
While not exactly code golf, I'm looking for a simple, compact solution that will allow me to DRY and write code only for the differences. I think I would rather not drive the code off a huge table containing all the configs, and certainly not adapt it to use a database.
Looking forward to your suggestions, and thanks!
Update
Since people asked, here's the real analyzeLine:
# Update stats with time and call data in one line.
sub processLine ($) {
my $line = shift(#_);
return unless $line =~ m/$log_match/;
# print "$1 $2\n";
my ($minute, $function) = ($1, $2);
$startMinute = $minute if not $startMinute;
$endMinute = $minute;
if ($minute eq $currentMinute) {
$minuteCount = $minuteCount + 1;
} else {
if ($minuteCount > $topMinuteCount) {
$topMinute = $currentMinute;
$topMinuteCount = $minuteCount;
printf ("%40s %s : %d\n", '', $topMinute, $topMinuteCount);
}
$totalMinutes = $totalMinutes + 1;
$totalCount = $totalCount + $minuteCount;
$currentMinute = $minute;
$minuteCount = 1;
}
}
Since these variables are largely interdependent, I think a functional solution with separate calculations won't be practical. I apologize for misleading people.
Two comments: First, don't post line numbers as they make it more difficult than necessary to copy, paste and edit. Second, don't use &func() to invoke a sub. See perldoc perlsub:
A subroutine may be called using an explicit & prefix. The & is optional in modern Perl, ... Not only does the & form make the argument list optional, it also disables any prototype checking on arguments you do provide.
In short, using & can be surprising unless you know what you are doing and why you are doing it.
Also, don't use prototypes in Perl. They are not the same as prototypes in other languages and, again, can have very surprising effects unless you know what you are doing.
Do not forget to check the return value of system calls such as open. Use autodie with modern perls.
For your specific problem, collect all configuration variables in a hash. Pass that hash to analyzeLine.
#!/usr/bin/perl
use warnings; use strict;
use autodie;
my %config = (
frobnicate => 'yes',
machinate => 'no',
);
my $result;
$result += analyze_file(\%config, $_) for #ARGV;
print "Result = $result\n";
sub analyze_file {
my ($config, $file) = #_;
my $result;
open my $fh, '<', $file;
while ( my $line = <$fh> ) {
$result += analyze_line($config, $line);
}
close $fh;
return $result;
}
sub analyze_line {
my ($line) = #_;
return length $line;
}
Of course, you will note that $config is being passed all over the place, which means you might want to turn this in to a OO solution:
#!/usr/bin/perl
package My::Analyzer;
use strict; use warnings;
use base 'Class::Accessor::Faster';
__PACKAGE__->follow_best_practice;
__PACKAGE__->mk_accessors( qw( analyzer frobnicate machinate ) );
sub analyze_file {
my $self = shift;
my ($file) = #_;
my $result;
open my $fh, '<', $file;
while ( my $line = <$fh> ) {
$result += $self->analyze_line($line);
}
close $fh;
return $result;
}
sub analyze_line {
my $self = shift;
my ($line) = #_;
return $self->get_analyzer->($line);
}
package main;
use warnings; use strict;
use autodie;
my $x = My::Analyzer->new;
$x->set_analyzer(sub {
my $length; $length += length $_ for #_; return $length;
});
$x->set_frobnicate('yes');
$x->set_machinate('no');
my $result;
$result += $x->analyze_file($_) for #ARGV;
print "Result = $result\n";
Go ahead and create a class hierarchy. Your task is an ideal playground for OOP style of programming.
Here's an example:
package Common;
sub new{
my $class=shift;
my $this=bless{},$class;
$this->init();
return $this;
}
sub init{}
sub theCommonStuff(){
my $this=shift;
for(1..10){ $this->analyzeLine($_); }
}
sub analyzeLine(){
my($this,$line)=#_;
$this->{'result'}.=$line;
}
package Special1;
our #ISA=qw/Common/;
sub init{
my $this=shift;
$this->{'sep'}=','; # special param: separator
}
sub analyzeLine(){ # modified logic
my($this,$line)=#_;
$this->{'result'}.=$line.$this->{'sep'};
}
package main;
my $c = new Common;
my $s = new Special1;
$c->theCommonStuff;
$s->theCommonStuff;
print $c->{'result'}."\n";
print $s->{'result'}."\n";
If all the common code is in one function, a function taking your config variables as parameters, and returning the result variables (either as return values, or as in/out parameters), will do. Otherwise, making a class ("package") is a good idea, too.
sub common_func {
my ($config, $result) = #_;
# ...
$result->{foo} += do_stuff($config->{bar});
# ...
}
Note in the above that both the config and result are hashes (actually, references thereto). You can use any other data structure that you feel will suit your goal.
Some thoughts:
If there are several $result_vars, I would recommend creating a separate subroutine for calculating each one.
If a subroutine relies on information outside that function, it should be passed in as a parameter to that subroutine, rather than relying on global state.
Alternatively wrap the whole thing in a class, with $result_var as an attribute of the class.
Practically speaking, there are a couple ways you could implement this:
(1) Have your &analyzeLine function return calculatedStuff, and add it to &result_var in a loop outside the function:
$result_var = 0;
foreach my $file (#files) {
open(my $fh, $file);
while (<$fh>) {
$result_var += analyzeLine($_);
}
}
}
sub analyzeLine ($) {
my $line = shift(#_);
return calculatedStuff;
}
(2) Pass $result_var into analyzeLine explicitly, and return the changed $result_var.
$result_var = 0;
foreach my $file (#files) {
open(my $fh, $file);
while (<$fh>) {
$result_var = addLineToResult($result_var, $_);
}
}
}
sub addLineToResult ($$) {
my $running_total = shift(#_);
my $line = shift(#_);
return $running_total + calculatedStuff;
}
The important part is that if you separate out functions for each of your several $result_vars, you'll be more readily able to write clean code. Don't worry about optimizing yet. That can come later, when your code has proven itself slow. The improved design will make optimization easier when the time comes.
why not create a function and using $config_var and $result_var as parameters?

Access to Perl's empty angle "<>" operator from an actual filehandle?

I like to use the nifty perl feature where reading from the empty angle operator <> magically gives your program UNIX filter semantics, but I'd like to be able to access this feature through an actual filehandle (or IO::Handle object, or similar), so that I can do things like pass it into subroutines and such. Is there any way to do this?
This question is particularly hard to google, because searching for "angle operator" and "filehandle" just tells me how to read from filehandles using the angle operator.
From perldoc perlvar:
ARGV
The special filehandle that iterates over command-line filenames in #ARGV. Usually written as the null filehandle in the angle operator <>. Note that currently ARGV only has its magical effect within the <> operator; elsewhere it is just a plain filehandle corresponding to the last file opened by <>. In particular, passing \*ARGV as a parameter to a function that expects a filehandle may not cause your function to automatically read the contents of all the files in #ARGV.
I believe that answers all aspects of your question in that "Hate to say it but it won't do what you want" kind of way. What you could do is make functions that take a list of filenames to open, and do this:
sub takes_filenames (#) {
local #ARGV = #_;
// do stuff with <>
}
But that's probably the best you'll be able to manage.
Expanding on Chris Lutz's idea, here is a very rudimentary implementation:
#!/usr/bin/perl
package My::ARGV::Reader;
use strict; use warnings;
use autodie;
use IO::Handle;
use overload
'<>' => \&reader,
'""' => \&argv,
'0+' => \&input_line_number,
;
sub new {
my $class = shift;
my $self = {
names => [ #_ ],
handles => [],
current_file => 0,
};
bless $self => $class;
}
sub reader {
my $self = shift;
return scalar <STDIN> unless #{ $self->{names}};
my $line;
while ( 1 ) {
my $current = $self->{current_file};
return if $current >= #{ $self->{names} };
my $fh = $self->{handles}->[$current];
unless ( $fh ) {
$self->{handles}->[$current] = $fh = $self->open_file;
}
if( eof $fh ) {
close $fh;
$self->{current_file} = $current + 1;
next;
}
$line = <$fh>;
last;
}
return $line;
}
sub open_file {
my $self = shift;
my $name = $self->{names}->[ $self->{current_file} ];
open my $fh, '<', $name;
return $fh;
}
sub argv {
my $self = shift;
my $name = #{$self->{names}}
? $self->{names}->[ $self->{current_file} ]
: '-'
;
return $name;
}
sub input_line_number {
my $self = shift;
my $fh = #{$self->{names}}
? $self->{handles}->[$self->{current_file}]
: \*STDIN
;
return $fh->input_line_number;
}
which can be used as:
package main;
use strict; use warnings;
my $it = My::ARGV::Reader->new(#ARGV);
echo($it);
sub echo {
my ($it) = #_;
printf "[%s:%d]:%s", $it, +$it, $_ while <$it>;
}
Output:
[file1:1]:bye bye
[file1:2]:hello
[file1:3]:thank you
[file1:4]:no translation
[file1:5]:
[file2:1]:chao
[file2:2]:hola
[file2:3]:gracias
[file2:4]:
It looks like this has already been implemented as Iterator::Diamond. Iterator::Diamond also disables the 2-argument-open magic that perl uses when reading <ARGV>. Even better, it supports reading '-' as STDIN, without enabling all the other magic. In fact, I might use it for that purpose just on single files.