Catching runtime errors in Perl and converting to exceptions - perl

Perl currently implements $SIG{__DIE__} in such a way that it will catch any error that occurs, even inside eval blocks. This has a really useful property that you can halt the code at the exact point where the error occurs, collect a stack trace of the actual error, wrap this up in an object, and then call die manually with this object as the parameter.
This abuse of $SIG{__DIE__} is deprecated. Officially, you are supposed to replace $SIG{__DIE__} with *CORE::GLOBAL::die. However, these two are NOT remotely equivalent. *CORE::GLOBAL::die is NOT called when a runtime error occurs! All it does is replace explicit calls to die().
I am not interested in replacing die.
I am specifically interested in catching runtime errors.
I need to ensure that any runtime error, in any function, at any depth, in any module, causes Perl to pass control to me so that I can collect the stack trace and rethrow. This needs to work inside an eval block -- one or more enclosing eval blocks may want to catch the exception, but the runtime error could be in a function without an enclosing eval, inside any module, from anywhere.
$SIG{__DIE__} supports this perfectly—and has served me faithfully for a couple of years or more—but the Powers that Be™ warn that this fantastic facility may be snatched away at any time, and I don't want a nasty surprise one day down the line.
Ideally, for Perl itself, they could create a new signal $SIG{__RTMERR__} for this purpose (switching signal is easy enough, for me anyway, as it's only hooked in one place). Unfortunately, my persuasive powers wouldn't lead an alcoholic to crack open a bottle, so assuming this will not happen, how exactly is one supposed to achieve this aim of catching runtime errors cleanly?
(For example, another answer here recommends Carp::Always, which … also hooks DIE!)

Just do it. I've done it. Probably everyone who's aware of this hook has done it.
It's Perl; it's still compatible going back decades. I interpret "deprecated" here to mean "please don't use this if you don't need it, ew, gross". But you do need it, and seem to understand the implications, so imo go for it. I seriously doubt an irreplaceable language feature is going away any time soon.
And release your work on CPAN so the next dev doesn't need to reinvent this yet again. :)

Related

Skipping error in eval{} statement

I am trying to extract data from website using Perl API. I am using a list of URIs to get the data from the website.
Initially the problem was that if there was no data available for the URI it would die and I wanted it to skip that particular URI and go to the next available URI. I used next unless ....; to come over this problem.
Now the problem is I am trying to extract specific data from the web by calling a specific method (called as identifiers()) from the API. Now the data is available for the URI but the specific data (the identifiers), what I am looking for, is not available and it dies.
I tried to use eval{} like this
eval {
for $bar ($foo->identifiers()){
#do something
};
}
When I use eval{} I think it skips the error and moves ahead but I am not sure. Because the error it gives is Invalid content type in response:text/plain.
Whereas I checked the URI manually, though it doesn't have the identifiers it has rest of the data. I want this to skip and move to next URI. How can I do that?
OK, I think I understand your question, but a little more code would have helped, as would specifying which Perl API -- not that it seems to matter to the answer, but it is a big part of your question. Having said that, the problem seems very simple.
When Perl hits an error, like most languages, it runs out through the calling contexts in order until it finds a place where it can handle the error. Perl's most basic error handling is eval{} (but I'd use Try::Tiny if you can, as it is then clearer that you're doing error handling instead of some of the other strange things eval can do).
Anyway, when Perl hits eval{}, the whole of eval{} exits, and $& is set to the error. So, having the eval{} outside the loop means errors will leave the loop. If you put the eval{} inside the loop, when an error occurs, eval{} will exit, but you will carry on to the next iteration. It's that simple.
I also detect signs that maybe you're not using use strict; and use warnings;. Please do, as they help you find many bugs quicker.

Do you use an exception class in your Perl programs? Why or why not?

I've got a bunch of questions about how people use exceptions in Perl. I've included some background notes on exceptions, skip this if you want, but please take a moment to read the questions and respond to them.
Thanks.
Background on Perl Exceptions
Perl has a very basic built-in exception system that provides a spring-board for more sophisticated usage.
For example die "I ate a bug.\n"; throws an exception with a string assigned to $#.
You can also throw an object, instead of a string: die BadBug->new('I ate a bug.');
You can even install a signal handler to catch the SIGDIE psuedo-signal. Here's a handler that rethrows exceptions as objects if they aren't already.
$SIG{__DIE__} = sub {
my $e = shift;
$e = ExceptionObject->new( $e ) unless blessed $e;
die $e;
}
This pattern is used in a number of CPAN modules. but perlvar says:
Due to an implementation glitch, the
$SIG{DIE} hook is called even
inside an eval(). Do not use this to
rewrite a pending exception in $# , or
as a bizarre substitute for overriding
CORE::GLOBAL::die() . This strange
action at a distance may be fixed in a
future release so that $SIG{DIE}
is only called if your program is
about to exit, as was the original
intent. Any other use is deprecated.
So now I wonder if objectifying exceptions in sigdie is evil.
The Questions
Do you use exception objects? If so, which one and why? If not, why not?
If you don't use exception objects, what would entice you to use them?
If you do use exception objects, what do you hate about them, and what could be better?
Is objectifying exceptions in the DIE handler a bad idea?
Where should I objectify my exceptions? In my eval{} wrapper? In a sigdie handler?
Are there any papers, articles or other resources on exceptions in general and in Perl that you find useful or enlightening.
Cross-posted at Perlmonks.
I don't use exception objects very often; mostly because a string is usually enough and involves less work. This is because there is usually nothing the program can do about the exception. If it could have avoided the exception, it wouldn't have caused it in the first place.
If you can do something about the exceptions, use objects. If you are just going to kill the program (or some subset, say, a web request), save yourself the effort of coming up with an elaborate hierarchy of objects that do nothing more than contain a message.
As for number 4; $SIG{__DIE__} should never be used. It doesn't compose; if one module expects sigdie to work in one way, and another module is loaded that makes it work some other way, those modules can't be used in the same program anymore. So don't do that.
If you want to use objects, just do the very-boring die Object->new( ... ). It may not be exciting as some super-awesome magic somewhere, but it always works and the code does exactly what it says.

What's the best practice in case something goes wrong in Perl code? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
How can I cleanly handle error checking in Perl?
What’s broken about exceptions in Perl?
I saw code which works like this:
do_something($param) || warn "something went wrong\n";
and I also saw code like this:
eval {
do_something_else($param);
};
if($#) {
warn "something went wrong\n";
}
Should I use eval/die in all my subroutines? Should I write all my code based on stuff returned from subroutines? Isn't eval'ing the code ( over and over ) gonna slow me down?
Block eval isn't string eval, so no, it's not slow. Using it is definitely recommended.
There are a few annoying subtleties to the way it works though (mostly annoying side-effects of the fact that $# is a global variable), so consider using Try::Tiny instead of memorizing all of the little tricks that you need to use eval defensively.
do_something($param) || warn "something went wrong\n";
In this case, do_something is expected to return an error code if something goes wrong. Either it can't die or if it does, it is a really unusual situation.
eval {
do_something_else($param);
};
if($#) {
warn "something went wrong\n";
}
Here, the assumption is that the only mechanism by which do_something_else communicates something going wrong is by throwing exceptions.
If do_something_else throws exceptions in truly exceptional situations and returns an error value in some others, you should also check its return value.
Using the block form of eval does not cause extra compilation at run time so there are no serious performance drawbacks:
In the second form, the code within the BLOCK is parsed only once--at the same time the code surrounding the eval itself was parsed--and executed within the context of the current Perl program. This form is typically used to trap exceptions more efficiently than the first (see below), while also providing the benefit of checking the code within BLOCK at compile time.
Modules that warn are very annoying. Either succeed or fail. Don't print something to the terminal and then keep running; my program can't take action based on some message you print. If the program can keep running, only print a message if you have been explicitly told that it's ok. If the program can't keep running, die. That's what it's for.
Always throw an exception when something is wrong. If you can fix the problem, fix it. If you can't fix the problem, don't try; just throw the exception and let the caller deal with it. (And if you can't handle an exception from something you call, don't.)
Basically, the reason many programs are buggy is because they try to fix errors that they can't. A program that dies cleanly at the first sign of a problem is easy to debug and fix. A program that keeps running when it's confused just corrupts data and annoys everyone. So don't do that. Die as soon as possible.
Your two examples do entirely different things. The first checks for a false return value, and takes some action in response. The second checks for an actual death of the called code.
You'll have to decide for yourself which action is appropriate in each case. I would suggest simply returning false in most circumstances. You should only be explicitly dieing if you have encountered errors so severe that you cannot continue (or there is no point in continuing, but even then you could still return false).
Wrapping a block in eval {} is not the same thing as wrapping arbitrary code in eval "". In the former case, the code is still parsed at compile-time, and you do not incur any extra overhead. You will simply catch any death of that code (but you won't have any indication as to what went wrong or how far you got in your code, except for the value that is left for you in $#). In the latter case, the code is treated as a simple string by the Perl interpreter until it is actually evaluated, so there is a definite cost here as the interpreter is invoked (and you lose all compile-time checking of your code).
Incidentally, the way you called eval and checked for the value of $# is not a recommended form; for an extensive discussion of exception gotchas and techniques in Perl, see this discussion.
The first version is very "perlish" and pretty straightforward to understand. The only drawback of this idiom is that it is readable only for short cases. If error handling needs more logic, use the second version.
Nobody's really addressed the "best practice" part of this yet, so I'll jump in.
Yes, you should definitely throw an exception in your code when something goes wrong, and you should do it as early as possible (so you limit the code that needs to be debugged to work out what's causing it).
Code that does stuff like return undef to signify failure isn't particularly reliable, simply because people will tend to use it without checking for the undef returnvalue - meaning that a variable they assume has something meaningful in it actually may not. This leads to complicated, hard to debug problems, and even unexpected problems cropping up later in previously-working code.
A more solid approach is to write your code so that it dies if something goes wrong, and then only if you need to recover from that failure, wrap the any calls to it in eval{ .. } (or, better, try { .. } catch { .. } from Try::Tiny, as has been mentioned). In most cases, there won't be anything meaningful that the calling code can do to recover, so calling code remains simple in the common case, and you can just assume you'll get a useful value back. If something does go wrong, then you'll get an error message from the actual part of the code that failed, rather than silently getting an undef. If your calling code can do something to recover failures, then it can arrange to catch exceptions and do whatever it needs to.
Something that's worth reading about is Exception classes, which are a structured way to send extra information to calling code, as well as allow it to pick which exceptions it wants to catch and which it can't handle. You probably won't want to use them everywhere in your code, but they're a useful technique when you have something complicated that can fail in equally complicated ways, and you want to arrange for failures to be recoverable.

Should a Perl constructor return an undef or a "invalid" object?

Question:
What is considered to be "Best practice" - and why - of handling errors in a constructor?.
"Best Practice" can be a quote from Schwartz, or 50% of CPAN modules use it, etc...; but I'm happy with well reasoned opinion from anyone even if it explains why the common best practice is not really the best approach.
As far as my own view of the topic (informed by software development in Perl for many years), I have seen three main approaches to error handling in a perl module (listed from best to worst in my opinion):
Construct an object, set an invalid flag (usually "is_valid" method). Often coupled with setting error message via your class's error handling.
Pros:
Allows for standard (compared to other method calls) error handling as it allows to use $obj->errors() type calls after a bad constructor just like after any other method call.
Allows for additional info to be passed (e.g. >1 error, warnings, etc...)
Allows for lightweight "redo"/"fixme" functionality, In other words, if the object that is constructed is very heavy, with many complex attributes that are 100% always OK, and the only reason it is not valid is because someone entered an incorrect date, you can simply do "$obj->setDate()" instead of the overhead of re-executing entire constructor again. This pattern is not always needed, but can be enormously useful in the right design.
Cons: None that I'm aware of.
Return "undef".
Cons: Can not achieve any of the Pros of the first solution (per-object error messages outside of global variables and lightweight "fixme" capability for heavy objects).
Die inside the constructor. Outside of some very narrow edge cases, I personally consider this an awful choice for too many reasons to list on the margins of this question.
UPDATE: Just to be clear, I consider the (otherwise very worthy and a great design) solution of having very simple constructor that can't fail at all and a heavy initializer method where all the error checking occurs to be merely a subset of either case #1 (if initializer sets error flags) or case #3 (if initializer dies) for the purposes of this question. Obviously, choosing such a design, you automatically reject option #2.
It depends on how you want your constructors to behave.
The rest of this response goes into my personal observations, but as with most things Perl, Best Practices really boils down to "Here's one way to do it, which you can take or leave depending on your needs." Your preferences as you described them are totally valid and consistent, and nobody should tell you otherwise.
I actually prefer to die if construction fails, because we set it up so that the only types of errors that can occur during object construction really are big, obvious errors that should halt execution.
On the other hand, if you prefer that doesn't happen, I think I'd prefer 2 over 1, because it's just as easy to check for an undefined object as it is to check for some flag variable. This isn't C, so we don't have a strong typing constraint telling us that our constructor MUST return an object of this type. So returning undef, and checking for that to establish success or failure, is a great choice.
The 'overhead' of construction failure is a consideration in certain edge cases (where you can't quickly fail before incurring overhead), so for those you might prefer method 1. So again, it depends on what semantics you've defined for object construction. For example, I prefer to do heavyweight initialization outside of construction. As to standardization, I think that checking whether a constructor returns a defined object is as good a standard as checking a flag variable.
EDIT: In response to your edit about initializers rejecting case #2, I don't see why an initializer can't simply return a value that indicates success or failure rather than setting a flag variable. Actually, you may want to use both, depending on how much detail you want about the error that occurred. But it would be perfectly valid for an initializer to return true on success and undef on failure.
I prefer:
Do as little initialization as possible in the constructor.
croak with an informative message when something goes wrong.
Use appropriate initialization methods to provide per object error messages etc
In addition, returning undef (instead of croaking) is fine in case the users of the class may not care why exactly the failure occurred, only if they got a valid object or not.
I despise easy to forget is_valid methods or adding extra checks to ensure methods are not called when the internal state of the object is not well defined.
I say these from a very subjective perspective without making any statements about best practices.
I would recommend against #1 simply because it leads to more error handling code which will not be written. For example, if you just return false then this works fine.
my $obj = Class->new or die "Construction failed...";
But if you return an object which is invalid...
my $obj = Class->new;
die "Construction failed #{[ $obj->error_message ]}" if $obj->is_valid;
And as the quantity of error handling code increases the probability of it being written decreases. And its not linear. By increasing the complexity of your error handling system you actually decrease the amount of errors it will catch in practical use.
You also have to be careful that your invalid object in question dies when any method is called (aside from is_valid and error_message) leading to yet more code and opportunities for mistakes.
But I agree there is value in being able to get information about the failure, which makes returning false (just return not return undef) inferior. Traditionally this is done by calling a class method or global variable as in DBI.
my $dbh = DBI->connect($data_source, $username, $password)
or die $DBI::errstr;
But it suffers from A) you still have to write error handling code and B) its only valid for the last operation.
The best thing to do, in general, is throw an exception with croak. Now in the normal case the user writes no special code, the error occurs at the point of the problem, and they get a good error message by default.
my $obj = Class->new;
Perl's traditional recommendations against throwing exceptions in library code as being impolite is outdated. Perl programmers are (finally) embracing exceptions. Rather than writing error handling code ever and over again, badly and often forgetting, exceptions DWIM. If you're not convinced just start using autodie (watch pjf's video about it) and you'll never go back.
Exceptions align Huffman encoding with actual use. The common case of expecting the constructor to just work and wanting an error if it doesn't is now the least code. The uncommon case of wanting to handle that error requires writing special code. And the special code is pretty small.
my $obj = eval { Class->new } or do { something else };
If you find yourself wrapping every call in an eval you are doing it wrong. Exceptions are called that because they are exceptional. If, as in your comment above, you want graceful error handling for the user's sake, then take advantage of the fact that errors bubble up the stack. For example, if you want to provide a nice user error page and also log the error you can do this:
eval {
run_the_main_web_code();
} or do {
log_the_error($#);
print_the_pretty_error_page;
};
You only need it in one place, at top of your call stack, rather than scattered everywhere. You can take advantage of this at smaller increments, for example...
my $users = eval { Users->search({ name => $name }) } or do {
...handle an error while finding a user...
};
There's two things going on. 1) Users->search always returns a true value, in this case an array ref. That makes the simple my $obj = eval { Class->method } or do work. That's optional. But more importantly 2) you only need to put special error handling around Users->search. All the methods called inside Users->search and all the methods they call... they just throw exceptions. And they're all caught at one point and handled the same. Handling the exception at the point which cares about it makes for much neater, compact and flexible error handling code.
You can pack more information into the exception by croaking with a string overloaded object rather than just a string.
my $obj = eval { Class->new }
or die "Construction failed: $# and there were #{[ $#->num_frobnitz ]} frobnitzes";
Exceptions:
Do the right thing without any thought by the caller
Require the least code for the most common case
Provide the most flexibility and information about the failure to the caller
Modules such as Try::Tiny fix most of the hanging issues surrounding using eval as an exception handler.
As for your use case where you might have a very expensive object and want to try and continue with it partially build... smells like YAGNI to me. Do you really need it? Or you have a bloated object design which is doing too much work too early. IF you do need it, you can put the information necessary to continue the construction in the exception object.
First the pompous general observations:
A constructor's job should be: Given valid construction parameters, return a valid object.
A constructor that does not construct a valid object cannot perform its job and is therefore a perfect candidate for exception generation.
Making sure the constructed object is valid is part of the constructor's job. Handing out a known-to-be-bad object and relying on the client to check that the object is valid is a surefire way to wind up with invalid objects that explode in remote places for non-obvious reasons.
Checking that all the correct arguments are in place before the constructor call is the client's job.
Exceptions provide a fine-grained way of propagating the particular error that occurred without needing to have a broken object in hand.
return undef; is always bad[1]
bIlujDI' yIchegh()Qo'; yIHegh()!
Now to the actual question, which I will construe to mean "what do you, darch, consider the best practice and why". First, I'll note that returning a false value on failure has a long Perl history (most of the core works that way, for example), and a lot of modules follow this convention. However, it turns out this convention produces inferior client code and newer modules are moving away from it.[2]
[The supporting argument and code samples for this turn out to be the more general case for exceptions that prompted the creation of autodie, and so I will resist the temptation to make that case here. Instead:]
Having to check for successful creation is actually more onerous than checking for an exception at an appropriate exception-handling level. The other solutions require the immediate client to do more work than it should have to just to obtain an object, work that is not required when the constructor fails by throwing an exception.[3] An exception is vastly more expressive than undef and equally expressive as passing back a broken object for purposes of documenting errors and annotating them at various levels in the call stack.
You can even get the partially-constructed object if you pass it back in the exception. I think this is a bad practice per my belief about what a constructor's contract with its clients ought to be, but the behavior is supported. Awkwardly.
So: A constructor that cannot create a valid object should throw an exception as early as possible. The exceptions a constructor can throw should be documented parts of its interface. Only the calling levels that can meaningfully act on the exception should even look for it; very often, the behavior of "if this construction fails, don't do anything" is exactly correct.
[1]: By which I mean, I am not aware of any use cases where return; is not strictly superior. If someone calls me on this I might have to actually open a question. So please don't. ;)
[2]: Per my extremely unscientific recollection of the module interfaces I've read in the last two years, subject to both selection and confirmation biases.
[3]: Note that throwing an exception does still require error-handling, as would the other proposed solutions. This does not mean wrapping every instantiation in an eval unless you actually want to do complex error-handling around every construction (and if you think you do, you're probably wrong). It means wrapping the call which is able to meaningfully act on the exception in an eval.

How can I replace all 'die's with 'confess' in a Perl application?

I'm working in a large Perl application and would like to get stack traces every time 'die' is called. I'm aware of the Carp module, but I would prefer not to search/replace every instance of 'die' with 'confess'. In addition, I would like full stack traces for errors in Perl modules or the Perl interpreter itself, and obviously I can't change those to use Carp.
So, is there a way for me to modify the 'die' function at runtime so that it behaves like 'confess'? Or, is there a Perl interpreter setting that will throw full stack traces from 'die'?
Use Devel::SimpleTrace or Carp::Always and they'll do what you're asking for without any hard work on your part. They have global effect, which means they can easily be added for just one run on the commandline using e.g. -MDevel::SimpleTrace.
What about setting a __DIE__ signal handler? Something like
$SIG{__DIE__} = sub { Carp::confess #_ };
at the top of your script? See perlvar %SIG for more information.
I usually only want to replace the dies in a bit of code, so I localize the __DIE__ handler:
{
use Carp;
local $SIG{__DIE__} = \&Carp::confess;
....
}
As a development tool this can work, but some modules play tricks with this to get their features to work. Those features may break in odd ways when you override the handler they were expecting. It's not a good practice, but it happens sometimes.
The Error module will convert all dies to Error::Simple objects, which contain a full stacktrace (the constructor parses the "at file... line.." text and creates a stack trace). You can use an arbitrary object (generally subclassed from Error::Simple) to handle errors with the $Error::ObjectifyCallback preference.
This is especially handy if you commonly throw around exceptions of other types to signal other events, as then you just add a handler for Error::Simple (or whatever other class you are using for errors) and have it dump its stacktrace or perform specialized logging depending on the type of error.