Prevent strings from being interpreted as a file handle - perl

Perl has the feature that strings named like a file handle are taken to be a filehandle:
# let this be some nice class I wrote
package Input {
sub awesome { ... }
}
So when we do Input->awesome or extra-careful: 'Input'->awesome, the method will get called. Unless:
# now somewhere far, far away, in package main, somebody does this:
open Input, "<&", \*STDIN or die $!; # normally we'd open to a file
This code doesn't even have to be executed, but only be seen by the parser in order to have Perl interpret the string 'Input' as a file handle from now on. Therefore, a method call 'Input'->awesome will die because the object representing the file handle doesn't have awesome methods.
As I am only in control of my class but not of other code, I can't simply decide to only use lexical filehandles everywhere.
Is there any way I can force Input->awesome to always be a method call on the Input package, but never a file handle (at least in scopes controlled by me)? I'd think there shouldn't be any name clash because the Input package is actually the %Input:: stash.
Full code to reproduce the problem (see also this ideone):
use strict;
use warnings;
use feature 'say';
say "This is perl $^V";
package Input {
sub awesome {
say "yay, this works";
}
}
# this works
'Input'->awesome;
# the "open" is parsed, but not actually executed
eval <<'END';
sub red_herring {
open Input, "<&", \*STDIN or die $!;
}
END
say "eval failed: $@" if $@;
# this will die
eval {
'Input'->awesome;
};
say "Caught: $@" if $@;
Example output:
This is perl v5.16.2
yay, this works
Caught: Can't locate object method "awesome" via package "IO::File" at prog.pl line 27.

Using the same identifier for two different things (a used class and filehandle) begs for problems. If your class is used from a different class that's used in the code that uses the filehandle, the error does not appear:
My1.pm
package My1;
use warnings;
use strict;
sub new { bless [], shift }
sub awesome { 'My1'->new }
__PACKAGE__
My2.pm
package My2;
use warnings;
use strict;
use parent 'My1';
sub try {
my $self = shift;
return ('My1'->awesome, $self->awesome);
}
__PACKAGE__
script.pl
#!/usr/bin/perl
use warnings;
use strict;
use My2;
open My1, '<&', *STDIN;
my $o = 'My2'->new;
print $o->awesome, $o->try;

Using the bareword Input as a filehandle is a breach of the naming convention to have only uppercase barewords for FILEHANDLEs and Capitalized/CamelCased barewords for Classes and Packages.
Furthermore, lexical $filehandles were introduced, and have been encouraged, for a very long time.
So the programmer using your class is clearly misbehaving, and since namespaces are by definition global this can hardly be addressed by Perl (supporting choroba's statement about begging for problems).
Some naming conventions are crucial for all (dynamic) languages.
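To illustrate the lexical-filehandle style recommended above, here is a minimal sketch. It assumes an in-memory buffer in place of a real file (the variable names are made up for the example). Because the handle lives in a scalar, no bareword named Input is ever created, and the class name stays unambiguous:

```perl
use strict;
use warnings;

# The class from the question, reduced to one method.
package Input {
    sub awesome { return "yay, this works" }
}

# Illustration only: an in-memory buffer stands in for a real file.
my $data = "first line\nsecond line\n";

# A lexical handle is stored in a scalar, not in a bareword glob.
open my $input_fh, '<', \$data or die "Cannot open in-memory file: $!";
my $first_line = <$input_fh>;
close $input_fh;

# The bareword Input still resolves to the package.
my $result = Input->awesome;
print "$first_line$result\n";
```

This only helps in code you control, which is the limitation the question already points out.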
Thanks for the interesting question though; it's the first time I've seen a Perl question on SO that I would have preferred to see on PerlMonks! :)
UPDATE: The discussion has been deepened here: http://www.perlmonks.org/?node_id=1083985

Related

Perl unexpected behavior: croak vs. try catch

I had seen some exceptions that pointed to (end of) the catch block itself (see the example below).
In my opinion, this is unexpected behavior, since it alters the location of the original exception and makes debugging difficult (it should say die at line 13).
It shows the (correct) line 13 if I use die/confess, or if I use eval instead of try-catch.
Not knowing how my code will be called within the stack, I have started to avoid using croak. What do you think? Did I get this right, or is there a way to improve this?
Best regards, Steve
use Carp;
use Try::Tiny;
try {
foo();
}
catch {
# do something before die
die $_;
}; # this is line 10
sub foo {
croak 'die'; # this is line 13
}
Output:
die at line 10.
This is the intended behavior from Carp
[...] use carp() or croak() which report the error as being from where your module was called. [...] There is no guarantee that that is where the error was, but it is a good educated guess.
So the error is reported at where the module's sub is called, which is what the user wants
use warnings;
use strict;
use feature 'say';
use Try::Tiny;
package Throw {
use warnings;
use Carp qw(croak confess);
#sub bam { die "die in module" }; # l.11
sub bam { croak "croak in module" };
1;
};
try {
Throw::bam(); # l.17
}
catch {
say "caught one: $_";
die "die in catch: $_";
};
say "done";
Prints
caught one: croak in module at exceptions.pl line 17.
die in catch: croak in module at exceptions.pl line 17.
If the sub throws using die then this is reported at line 11, which is the normal behavior of die, and what you seem to expect.
If any of this is unclear or suboptimal then better use confess and nicely get a full stacktrace. Also, if you wish for more exception-like behavior, you can put together an exception/error class and throw its object,† designed and populated as desired.
If you want to confess an object note that at this time Carp has limits with that
The Carp routines don't handle exception objects currently. If called with a first argument that is a reference, they simply call die() or warn(), as appropriate.
One way then would be to confess a stringification of the object,‡ getting at least both a full stack backtrace and whatever is in the object.
I get the same behavior with eval, by replacing try-catch and $_ above
eval {
Throw::bam();
};
if ($@) {
say "caught one: $@";
die "die in catch: $@";
};
Prints exactly the same as above
While the above is clear and behaves as expected, a weird thing is indeed seen in the question's example: the error is reported from the whole try-catch statement, ie. at its closing brace, where line 10 is. (The try sub is prototyped and the whole try-catch is a syntax aid equivalent to a call to try that takes an anonymous sub, and then perhaps more. See ikegami's comment, and docs. Also see this post for more about its syntax.)
This is strange since the call to the croaking sub is foo() inside the try statement and this line should be reported, which can be confirmed by running the script with -MCarp::Always. But in the code in this answer the line of the call to Throw::bam is indeed reported -- why this difference?
The clear purpose of croak is to be used in the libraries, so that the user can see at which point in their (user's) code they called the library in a way that triggered an error. (While die would point to the place where error is detected, so in the library, most likely useless to the user. But read die and Carp docs for related complexities.)
What isn't obvious is that when croak is emitted in the same namespace (main::foo()) from try-catch in its own namespace (Try::Tiny) things get confused, and the end of its statement is reported. This can be checked by adding a foo() to my code above and calling it (instead of a sub from a module), and we get the question's behavior reproduced.
This doesn't happen if main::foo() with croak inside is called from a (complex) statement right in main::, so it seems to be due to the try-catch mix up of namespaces. (On the other hand, try-catch sugar adds an anonymous sub to the callstack, and this sure can mix things up as well.)
In practical terms, I'd say: always use croak out of modules (otherwise use die), or, better yet if you want to mimic exception-based code, use confess and/or your exception class hierarchy.
† Even just like die ExceptionClass->new(...);
Bear in mind that in the way of exceptions Perl only has the lonesome die, and eval. For more structure you'll need to implement it all, or use frameworks like Exception::Class or Throwable
‡ By writing and using a method that generates a plain string with useful information from the object, for Carp::confess $obj->stringify.
Or by overloading the "" (quote) operator for the class since it gets used when confess-ing an object (string context), for Carp::confess $obj; this is good to have anyway.
A basic example for both:
use overload ( q("") => \&stringify );
sub stringify {
my $self = shift;
join ", ", map { "$_ => " . ( $self->{$_} // 'undef' ) } keys %$self
}
where instead of a reference to a named sub one can directly write an anonymous sub.
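Putting the two footnotes together, a small exception class might look like the sketch below. The class name My::Error, its throw method, and the field names are invented for illustration and are not part of any module mentioned here:

```perl
use strict;
use warnings;

package My::Error {
    # Stringify the object whenever it is used as a string.
    use overload q("") => \&stringify, fallback => 1;

    sub new {
        my ($class, %args) = @_;
        return bless {%args}, $class;
    }

    # Convenience: build the object and die with it in one step.
    sub throw {
        my $class = shift;
        die $class->new(@_);
    }

    sub stringify {
        my $self = shift;
        return join ", ",
            map { "$_ => " . ( $self->{$_} // 'undef' ) } sort keys %$self;
    }
}

my $ok  = eval { My::Error->throw( code => 42, message => 'bad input' ); 1 };
my $err = $@;    # copy $@ right away, before anything else can reset it

if ( !$ok && ref $err && $err->isa('My::Error') ) {
    print "caught: $err\n";
}
```

Sorting the keys keeps the stringified form stable between runs, which is convenient for logs and tests.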
As a way of solving the OP's problem, but with a different module, if you use Nice::Try instead, you will get the result you expect:
use Carp;
use Nice::Try;
try {
foo();
}
catch {
# do something before die
die $_;
} # this is line 10
sub foo {
croak 'die'; # this is line 13
}
You get:
die at ./try-catch-and-croak.pl line 13.
main::foo() called at ./try-catch-and-croak.pl line 4
main::__ANON__ called at ./try-catch-and-croak.pl line 7
eval {...} called at ./try-catch-and-croak.pl line 7 ...propagated at ./try-catch-and-croak.pl line 9.
For full disclosure, I am the author behind Nice::Try

Cannot load `Cwd` (and other, non-core, modules) at runtime

Imagine I want to load a module at runtime. I expected this to work
use warnings;
use strict;
eval {
require Cwd;
Cwd->import;
};
if ($@) { die "Can't load Cwd: $@" }
say "Dir: ", getcwd;
but it doesn't, per Bareword "getcwd" not allowed ....
The Cwd exports getcwd by default. I tried giving the function name(s) to import and I tried with its other functions.
It works with the full name, say Cwd::getcwd, so I'd think that it isn't importing.
This works as attempted for a few other core modules that I tried, for example
use warnings;
use strict;
eval {
require List::Util;
List::Util->import('max');
};
if ($@) { die "Can't load List::Util: $@" }
my $max = max (1, 14, 3, 26, 2);
print "Max is $max\n";
NOTE added: Apparently, function calls with parentheses give a clue to the compiler. However, in my opinion the question remains; please see the EDIT at the end. In addition, a function like first BLOCK LIST from the module above does not work.
However, it does not work for a few (well established) non-core modules that I tried. Worse and more confusingly, it does not work even with the fully qualified names.
I can imagine that the symbol (function) used is not known at compile time if require is used at runtime, but it works for (other) core modules. I thought that this was a standard way to load at runtime.
If I need to use full names when loading dynamically then fine, but what is it with the inconsistency? And how do I load (and use) non-core modules at runtime?
I also tried with Module::Load::Conditional and it did not work.
What am I missing, and how does one load modules at runtime? (Tried with 5.16 and 5.10.1.)
EDIT
As noted by Matt Jacob, a call with parentheses works, getcwd(). However, given perlsub
NAME LIST; # Parentheses optional if predeclared/imported.
this implies that the import didn't work and the question of why remains.
Besides, having to use varied syntax based on how the module is loaded is not good. Also, I cannot get non-core modules to work this way, especially the ones with syntax like List::MoreUtils has.
First, this has nothing to do with core vs. non-core modules. It happens when the parser has to guess whether a particular token is a function call.
eval {
require Cwd;
Cwd->import;
};
if ($@) { die "Can't load Cwd: $@" }
say "Dir: ", getcwd;
At compile time, there is no getcwd in the main:: symbol table. Without any hints to indicate that it's a function (getcwd() or &getcwd), the parser has no way to know, and strict complains.
eval {
require List::Util;
List::Util->import('max');
};
if ($@) { die "Can't load List::Util: $@" }
my $max = max (1, 14, 3, 26, 2);
At compile time, there is no max in the main:: symbol table. However, since you call max with parentheses, the parser can guess that it's a function that will be defined later, so strict doesn't complain.
In both cases, the strict check happens before import is ever called.
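Since use Module LIST is equivalent to BEGIN { require Module; Module->import(LIST) }, one way to keep the eval-guarded load and still let the parser see the imported names is to do the load inside a BEGIN block. This is a sketch rather than part of the original answer:

```perl
use strict;
use warnings;

# Load at compile time, but still trap a failure ourselves.
BEGIN {
    eval {
        require Cwd;
        Cwd->import;    # getcwd is exported by default
        1;
    } or die "Can't load Cwd: $@";
}

# By the time this line is compiled, main::getcwd already exists,
# so the bareword call passes strict without parentheses.
my $dir = getcwd;
print "Dir: $dir\n";
```

The trade-off is that the load is no longer deferred to runtime: it happens while the file is being compiled, whether or not the code path that needs it is ever reached.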
List::MoreUtils is special because the functions use prototypes. Prototypes are ignored if the function definition is not visible at compile time. So, not only do you have to give the parser a hint that you're calling a function, you also have to call it differently since the prototype will be ignored:
use strict;
use warnings 'all';
use 5.010;
eval {
require List::MoreUtils;
List::MoreUtils->import('any')
};
die "Can't load List::MoreUtils: $@" if $@;
say 'found' if any( sub { $_ > 5 }, 1..9 );

Perl module class method vs ordinary subroutine

So I was wondering if there is any difference in usage between a Perl class method and an ordinary subroutine from a standard module. Is there a time you would use one over the other? For this example, I'm assuming that no object methods are present in either module.
Quick little main class here:
#!/usr/local/bin/perl
use strict;
use warnings;
use Foo;
use Bar;
my $arg1 = "blah";
my ($str1, $str2);
$str1 = Foo::subroutine($arg1);
$str2 = Bar->subroutine($arg1);
exit(0);
Package Foo would hold my ordinary subroutine call
use strict;
use warnings;
package Foo;
sub subroutine {
my $arg = shift;
my $string = "Ordinary subroutine arg is $arg\n";
return $string;
}
1;
Package Bar would hold my class method call
use strict;
use warnings;
package Bar;
sub subroutine {
my $class = shift;
my $arg = shift;
my $string = "Class method arg is $arg\n";
return $string;
}
1;
Normally if I'm writing Perl code, I would just use the class method option (like with the Bar example), but I started pondering this question after reading some code from a former coworker that used the syntax like in the Foo example. Both seem to do the same thing inherently, but there seems to be more than meets the eye.
The decider is whether your Module is an object-oriented module or not.
If Module is simply a container for a collection of subroutines, then I would expect it to use Exporter and offer the opportunity to import a subset of its subroutines into the calling name space. An example is List::Util
On the other hand, if there is a constructor Module::new, and the intention is to use it in an OO manner, then you shouldn't mix simple subroutines in with methods (except perhaps for private subroutines that the module uses internally). An example is LWP::UserAgent
So I would expect the sources to be written like one or the other of these, and not a mixture in between. Of course there are always circumstances where a rule of thumb should be ignored, but nothing comes to mind in this case.
Foo.pm
use strict;
use warnings;
package Foo;
use Exporter 'import';
our @EXPORT_OK = qw/ subroutine /;
sub subroutine {
my ($arg) = @_;
"Ordinary subroutine arg is $arg\n";
}
1;
Bar.pm
use strict;
use warnings;
package Bar;
sub new {
my $class = shift;
bless {}, $class;
}
sub subroutine {
my $class = shift;
my ($arg) = @_;
"Class method arg is $arg\n";
}
1;
main.pl
#!/usr/local/bin/perl
use strict;
use warnings;
use Foo 'subroutine';
use Bar;
my $arg1 = "blah";
print subroutine($arg1);
print Bar->subroutine($arg1);
output
Ordinary subroutine arg is blah
Class method arg is blah
There's nothing inherently wrong with an ordinary subroutine. They do what they're designed to do very well.
Methods on the other hand, do all the same things and play nicely with any Class that inherits from yours.
So ask yourself:
Are you expecting/permitting/encouraging folks to write Classes that inherit from your module?
Is your module defining a more complex data structure that works well as an object?
or
Is your module a library of utilities that operate on fundamental data types?
There's plenty of room in this world for both, but if you find yourself, as you did in Bar, ignoring $class (or more commonly $self) throughout the module, then perhaps you've gone too far by designing them as methods. More importantly, anyone who tries to inherit from your marginally OO "Class" will get a rude surprise when your methods can't tell the difference between the two classes...
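The difference shows up as soon as inheritance is involved. In the following sketch (Animal and Dog are made-up classes), the method call finds describe through @ISA and receives the class it was invoked on, while a fully qualified sub call does no lookup at all:

```perl
use strict;
use warnings;

package Animal {
    # Class method: the invocant tells us which class was asked.
    sub describe {
        my $class = shift;
        return "I am a $class";
    }
}

package Dog {
    our @ISA = ('Animal');    # Dog inherits describe() from Animal
}

my $via_method = Dog->describe;              # method lookup walks @ISA
my $via_sub    = Animal::describe('Dog');    # plain call: class passed by hand

# There is no sub named Dog::describe, so a plain call dies at runtime.
my $no_lookup = eval { Dog::describe('Dog'); 1 } ? 'found' : 'not found';

print "$via_method\n$via_sub\nDog::describe as a plain sub: $no_lookup\n";
```

A method that ignores $class, like the one in Bar, would return the same text for Animal and Dog, which is the "rude surprise" described above.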
This is more a question of code paradigm.
There is absolutely nothing wrong with a non object oriented approach to your code. It works, and it works well.
However, object orientation provides a bunch of benefits that are worth considering - and if they're something you want, go an OO route.
Specifically - objects provide encapsulation. It makes it a lot easier for me to write a module and you to just use it. Look at say, LWP::UserAgent for an example:
require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
$ua->agent('Mozilla/5.0');
my $response = $ua->get('http://search.cpan.org/');
if ($response->is_success) {
print $response->decoded_content; # or whatever
}
else {
die $response->status_line;
}
Now, all of the above you could do via inherited subroutines. But if you wanted to do multiple fetches of multiple pages, you'd either have to:
Build a sub that took all the parameters you needed - including returning somehow, a 'success/fail/result' - maybe in an array?
Otherwise have 'state' hidden in your external module.
OO is just a neater, more understandable method of doing that. (There are other benefits of doing OO, which I'm sure you could Google).

Perl - New definition of myprint() or Overload print command

I am a newb to Perl. I am writing some scripts and want to define my own print called myprint() which will print the stuff passed to it based on some flags (verbose/debug flag)
open(FD, "> /tmp/abc.txt") or die "Cannot create abc.txt file";
print FD "---Production Data---\n";
myprint "Hello - This is only a comment - debug data";
Can someone please help me with some sample code for a myprint() function?
Do you care more about writing your own logging system, or do you want to know how to put logging statements in appropriate parts of your program which you can turn off (and, incur little performance penalty when they are turned off)?
If you want a logging system that is easy to start using, but also offers a world of features which you can incrementally discover and use, Log::Log4perl is a good option. It has an easy mode, which allows you to specify the desired logging level, and emits only those logging messages that are above the desired level.
#!/usr/bin/env perl
use strict; use warnings;
use File::Temp qw(tempfile);
use Log::Log4perl qw(:easy);
Log::Log4perl->easy_init({level => $INFO});
my ($fh, $filename) = tempfile;
print $fh "---Production Data---\n";
WARN 'Wrote something somewhere somehow';
The snippet also shows a better way of opening a temporary file using File::Temp.
As for overriding the built-in print … It really isn't a good idea to fiddle with built-ins except in very specific circumstances. perldoc perlsub has a section on Overriding Built-in Functions. The accepted answer to this question lists the Perl built-ins that cannot be overridden. print is one of those.
But, then, one really does not need to override a built-in to write a logging system.
So, if an already-written logging system does not do it for you, you really seem to be asking "how do I write a function that prints stuff conditionally depending on the value of a flag?"
Here is one way:
#!/usr/bin/env perl
package My::Logger;
{
use strict; use warnings;
use Sub::Exporter -setup => {
exports => [
DEBUG => sub {
return sub {} unless $ENV{MYDEBUG};
return sub { print 'DEBUG: ' => @_ };
},
]
};
}
package main;
use strict; use warnings;
# You'd replace this with use My::Logger qw(DEBUG) if you put My::Logger
# in My/Logger.pm somewhere in your @INC
BEGIN {
My::Logger->import('DEBUG');
}
sub nicefunc {
print "Hello World!\n";
DEBUG("Isn't this a nice function?\n");
return;
}
nicefunc();
Sample usage:
$ ./yy.pl
Hello World!
$ MYDEBUG=1 ./yy.pl
Hello World!
DEBUG: Isn't this a nice function?
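If pulling in Sub::Exporter is more than you need, a plain subroutine guarded by a flag does the same job within a single script. This is a minimal sketch; the $Verbose variable and the choice of the MYDEBUG environment variable as its source are assumptions made for the example:

```perl
use strict;
use warnings;

# Assumption: verbosity comes from the MYDEBUG environment variable.
our $Verbose = $ENV{MYDEBUG} ? 1 : 0;

# Hypothetical helper: prints to STDERR only when $Verbose is set.
# Returns 1 if something was printed, 0 otherwise.
sub myprint {
    my @message = @_;
    return 0 unless $Verbose;
    print STDERR "DEBUG: ", @message, "\n";
    return 1;
}

print "---Production Data---\n";
myprint "Hello - This is only a comment - debug data";
```

Because myprint is defined before it is used, it can be called without parentheses, as in the question.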
I wasn't going to answer this because Sinan already has the answer I'd recommend, but tonight I also happened to be working on the "Filehandle References" chapter of the upcoming Intermediate Perl. There are a couple of relevant paragraphs which I'll just copy directly without adapting them to your question:
IO::Null and IO::Interactive
Sometimes we don't want to send our output anywhere, but we are forced
to send it somewhere. In that case, we can use IO::Null to create
a filehandle that simply discards anything that we give it. It looks
and acts just like a filehandle, but does nothing:
use IO::Null;
my $null_fh = IO::Null->new;
some_printing_thing( $null_fh, @args );
Other times, we want output in some cases but not in others. If we are
logged in and running our program in our terminal, we probably want to
see lots of output. However, if we schedule the job through cron, we
probably don't care so much about the output as long as it does the job.
The IO::Interactive module is smart enough to tell the difference:
use IO::Interactive qw(interactive);
print { interactive } 'Bamboo car frame';
The interactive subroutine returns a filehandle. Since the
call to the subroutine is not a simple scalar variable, we surround
it with braces to tell Perl that it's the filehandle.
Now that you know about "do nothing" filehandles, you can replace some
ugly code that everyone tends to write. In some cases you want output
and in some cases you don't, so many people use a post-expression
conditional to turn off a statement in some cases:
print STDOUT "Hey, the radio's not working!" if $Debug;
Instead of that, you can assign different values to $debug_fh based
on whatever condition you want, then leave off the ugly if $Debug
at the end of every print:
use IO::Null;
my $debug_fh = $Debug ? *STDOUT : IO::Null->new;
$debug_fh->print( "Hey, the radio's not working!" );
The magic behind IO::Null might give a warning about "print() on
unopened filehandle GLOB" with the indirect object notation (e.g.
print $debug_fh) even though it works just fine. We don't get that
warning with the direct form.
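The same "do nothing" filehandle idea can be sketched with core modules only, by opening the platform's null device instead of using IO::Null. The $Debug switch and its MYDEBUG source are assumptions for the example:

```perl
use strict;
use warnings;
use File::Spec;

my $Debug = $ENV{MYDEBUG};    # hypothetical switch

# File::Spec->devnull returns the name of the platform's null device.
open my $null_fh, '>', File::Spec->devnull
    or die "Cannot open null device: $!";

# Pick the destination once, then print unconditionally.
my $debug_fh = $Debug ? \*STDOUT : $null_fh;

my $printed = print {$debug_fh} "Hey, the radio's not working!\n";
```

Unlike IO::Null, this is a real handle, so both print {$debug_fh} ... and $debug_fh->print(...) should behave like any other file write.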

How can I dynamically include Perl modules without using eval?

I need to dynamically include a Perl module, but if possible would like to stay away from eval due to work coding standards. This works:
$module = "My::module";
eval("use $module;");
But I need a way to do it without eval if possible. All google searches lead to the eval method, but none in any other way.
Is it possible to do it without eval?
Use require to load modules at runtime. It is often a good idea to wrap this in a block (not string) eval in case the module can't be loaded.
eval {
require My::Module;
My::Module->import();
1;
} or do {
my $error = $@;
# Module load failed. You could recover, try loading
# an alternate module, die with $error...
# whatever's appropriate
};
The reason for the eval {...} or do {...} syntax and making a copy of $@ is that $@ is a global variable that can be set by many different things. You want to grab the value as atomically as possible to avoid a race condition where something else has set it to a different value.
If you don't know the name of the module until runtime you'll have to do the translation between module name (My::Module) and file name (My/Module.pm) manually:
my $module = 'My::Module';
eval {
(my $file = $module) =~ s|::|/|g;
require $file . '.pm';
$module->import();
1;
} or do {
my $error = $@;
# ...
};
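The translation above can be wrapped in a small helper. The sketch below adds one thing that is not in the original answer: it checks that the string looks like a plain package name before building a file name from it. The name try_load is made up for the example:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $module could be loaded, 0 otherwise.
sub try_load {
    my ($module) = @_;

    # Accept only plain package names such as My::Module.
    return 0
        unless defined $module
        && $module =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/;

    # My::Module -> My/Module.pm
    ( my $file = "$module.pm" ) =~ s{::}{/}g;

    return eval { require $file; 1 } ? 1 : 0;
}

for my $name ( 'List::Util', 'No::Such::Module::Here', 'bad name; exit' ) {
    printf "%-25s %s\n", $name, try_load($name) ? 'loaded' : 'not loaded';
}
```

The helper deliberately does not call import, since exporting into the helper's own package would not help the caller; call $module->import yourself afterwards if you need the exported names.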
How about using the core module Module::Load
With your example:
use Module::Load;
my $module = "My::module";
load $module;
"Module::Load - runtime require of both modules and files"
"load eliminates the need to know whether you are trying to require either a file or a module."
If it fails it will die with something like "Can't locate xxx in @INC (@INC contains: ...".
Well, there's always require as in
require 'My/Module.pm';
My::Module->import();
Note that you lose whatever effects you may have gotten from the import being called at compile time instead of runtime.
Edit: The tradeoffs between this and the eval way are: eval lets you use the normal module syntax and gives you a more explicit error if the module name is invalid (as opposed to merely not found). OTOH, the eval way is (potentially) more subject to arbitrary code injection.
No, it's not possible without eval, as require() needs the bareword module name, as described at perldoc -f require. However, it's not an evil use of eval, as it doesn't allow injection of arbitrary code (assuming you have control over the contents of the file you are requiring, of course).
EDIT: Code amended below, but I'm leaving the first version up for completeness.
I used to use this little sugar module to do dynamic loads at runtime:
package MyApp::Util::RequireClass;
use strict;
use warnings;
use Exporter 'import'; # gives you Exporter's import() method directly
our @EXPORT_OK = qw(requireClass);
# Usage: requireClass(moduleName);
# does not do imports (wrong scope) -- you should do this after calling me: $class->import(@imports);
sub requireClass
{
my ($class) = @_;
eval "require $class" or do { die "Ack, can't load $class: $@" };
}
1;
PS. I'm staring at this definition (I wrote it quite a while ago) and I'm pondering adding this:
$class->export_to_level(1, undef, @imports);... it should work, but is not tested.
EDIT: version 2 now, much nicer without an eval (thanks ysth): :)
package MyApp::Util::RequireClass;
use strict;
use warnings;
use Exporter 'import'; # gives you Exporter's import() method directly
our @EXPORT_OK = qw(requireClass);
# Usage: requireClass(moduleName);
# does not do imports (wrong scope) -- you should do this after calling me: $class->import(@imports);
sub requireClass
{
my ($class) = @_;
(my $file = $class) =~ s|::|/|g;
$file .= '.pm';
require $file; # will die if there was an error
}
1;
Class::MOP on CPAN has a load_class method for this:
http://metacpan.org/pod/Class::MOP
i like doing things like..
require Win32::Console::ANSI if ( $^O eq "MSWin32" );