What is wrong with accessing DBI directly? - perl

I'm currently reading Effective Perl Programming (2nd edition). I have come across a piece of code which was described as being poorly written, but I don't yet understand what's so bad about it, or how it should be improved. It would be great if someone could explain the matter to me.
Here's the code in question:
sub sum_values_per_key {
    my ( $class, $dsn, $user, $password, $parameters ) = @_;
    my %results;
    my $dbh =
        DBI->connect( $dsn, $user, $password, $parameters );
    my $sth = $dbh->prepare(
        'select key, calculate(value) from my_table');
    $sth->execute();
    # ... fill %results ...
    $sth->finish();
    $dbh->disconnect();
    return \%results;
}
The example comes from the chapter on testing your code (p. 324/325). The sentence that has left me wondering about how to improve the code is the following:
Since the code was poorly written and accesses DBI directly, you'll have to create a fake DBI object to stand in for the real thing.
I have probably not understood a lot of what the book has so far been trying to teach me, or I have skipped the section relevant for understanding what's bad practice about the above code... Well, thanks in advance for your help!

Since the chapter is about testing, consider this:
When testing your function, you are also (implicitly) testing DBI. This is why it's bad.
A good test checks exactly one piece of functionality. To guarantee that, the function should not use DBI directly but should work against a mock object instead. That way, if your test fails, you know the problem is in your function and not somewhere else (like in DBI, in your example).

I think what Brian was trying to say by "poorly written" is that you do not have a separation between business logic and data access code (and database connection mechanics, while at it).
A correct approach to writing functions is that a function (or method) should do one thing, not 3 things at once.
As a result of this big lump of functionality, when testing, you have to test ALL THREE at the same time, which is difficult (see discussion of using "test SQLite DB" in those paragraphs). Or, as an alternative, do what the chapter was devoted to, and mock the DBI object to test the business logic by pretending that the data access AND DB setup worked a certain way.
But mocking an object with behavior as complicated as DBI's is very, very hard to do right.
What if the database is not accessible? What if there's blocking? What if your query has a syntax error? What if the DB connection times out when executing the query? What if...
Good test code tests ALL those error situations and more.
A more correct approach (pattern) for the code would be:
my $dbh = set_up_dbh();
my $query = qq[select key, calculate(value) from my_table];
my $data = retrieve_data($dbh, $query);
# Now, we don't need to test setting up database connection AND data retrieval
my $calc_results = calculate_results($data);
This way, to test the logic in calculate_results (e.g. summing the data), you merely need to mock DATA passed to it, which is very easy (in many cases, you just store several sets of test data in some test config); as opposed to mocking the behavior of a complicated DBI object used to retrieve the data.
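For instance, a test for a calculate_results that sums values per key then needs no database at all. A rough sketch, with the summing logic and data shapes invented here purely for illustration:

use strict;
use warnings;
use Test::More tests => 1;

# Stand-in for the real business logic: sum values per key.
# (The real calculate_results would come from the module under test.)
sub calculate_results {
    my ($data) = @_;                 # $data: arrayref of [ key, value ] rows
    my %sums;
    $sums{ $_->[0] } += $_->[1] for @{$data};
    return \%sums;
}

# Fixture data -- no DBI, no database, just the rows we pretend were fetched.
my $rows = [ [ a => 1 ], [ a => 2 ], [ b => 5 ] ];

is_deeply( calculate_results($rows), { a => 3, b => 5 },
    'values are summed per key' );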

There is nothing wrong with using DBI by itself.
The clue is in the fact that this is the testing chapter. I assume the issue being pointed out is that the function opens and closes a database connection itself. It should instead expect a database handle as a parameter and just run queries on it, leaving any concerns about opening and closing a database connection to its caller. That will make the job of the function more narrow, so it makes the function more flexible.
That in turn also makes the function easier to test: just pass it a mock object as a database handle. As it is currently written, you need at least to redefine DBI::connect to test it, which isn’t hard, but is definitely messy.
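For concreteness, here is one possible shape for that refactoring plus a test using a hand-rolled fake handle. The package names (My::Summary, Fake::Dbh, Fake::Sth) and the fetch loop are invented for the sketch, not taken from the book:

use strict;
use warnings;
use Test::More tests => 1;

# Hypothetical class for the sketch: the sub now expects a ready-made handle.
package My::Summary;

sub sum_values_per_key {
    my ( $class, $dbh ) = @_;
    my %results;
    my $sth = $dbh->prepare('select key, calculate(value) from my_table');
    $sth->execute();
    while ( my ( $key, $value ) = $sth->fetchrow_array() ) {
        $results{$key} += $value;
    }
    return \%results;
}

# Hand-rolled fakes standing in for a DBI handle and statement.
package Fake::Sth;
sub new            { my ( $class, @rows ) = @_; bless { rows => [@rows] }, $class }
sub execute        { 1 }
sub fetchrow_array { my $row = shift @{ $_[0]{rows} } or return; @$row }

package Fake::Dbh;
sub new     { bless {}, shift }
sub prepare { Fake::Sth->new( [ a => 1 ], [ a => 2 ], [ b => 5 ] ) }

package main;

is_deeply(
    My::Summary->sum_values_per_key( Fake::Dbh->new ),
    { a => 3, b => 5 },
    'sums per key without a real database',
);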

A method called sum_values_per_key should be interested in summing the values of some keys, not fetching the data to be summed.
It does not meet the S (Single responsibility principle) of SOLID programming. http://en.wikipedia.org/wiki/Solid_%28object-oriented_design%29
This means that it is both:
Not reusable if you wish to use different source data.
Difficult to test in an environment without a database connection.

1) Suppose you have a dozen objects each with a dozen methods like this. Twenty of those methods will be called during the execution of the main program. You now have made 20 DB connections where you only need one.
2) Suppose you are not happy with original DBI and extended it with My::DBI. You now have to rewrite 144 functions in 12 files.
(Apache::DBI might be an example here).
3) You have to carry 3 positional parameters in each call to those 144 functions. The human brain works well with about 7 objects at a time; you have just wasted almost half of that space. This makes the code less maintainable.

Related

Perl - best practices in dealing with invalid data passed to a sub

What is best practice in Perl when data is passed incorrectly to a subroutine? Should the sub die or just return?
Here is what I usually do
my @text = ('line 1', 'line 2');

print_text(\@text)
    or die "ERROR: something went wrong in the sub";

sub print_text {
    my ($aref_text) = @_;
    return unless ref($aref_text) eq "ARRAY";
    print "$_\n" for @{$aref_text};
    return 1;
}
Here the sub just returns if the passed input is invalid and it expects the caller to check for errors as it does here. I wonder if it is always a better practice to just "die" at the sub level. In big scripts, I'm afraid of doing that because I don't want to kill the entire script just because some simple sub fails.
On the other hand, I'm afraid of just returning because if the caller forgets to check if the sub returns true, then the script will keep going and weird stuff could happen.
Thanks
This falls squarely under the question of how to deal with errors in subroutines in general.
In principle, these are the ways to handle errors in subroutines that can't themselves recover:
return codes, some of which indicate errors
return "special" values, like undef in Perl
throw exceptions, and a device for that in Perl is die
The caller either checks the return, or tests for undef, or uses eval† to catch and handle the die. What is most suitable depends entirely on the context and on what the code does.
I don't see much reason in modern languages to be restrained to "codes" (like negative values) that indicate errors. For one thing, that either interferes with legitimate returns or it constrains them to go via pointer/reference, which is a big design decision.
Returning undef is often a good middle-of-the-road approach, in particular in code that isn't overly complex. It indicates some "failure" of the sub to perform what it is meant to. However, even in the smallest of subs undef may be suitable to indicate a result that isn't acceptable. Then if it is also used for bad input we have a problem of distinguishing between those failings.
Throwing an exception, based in Perl on the simple die, adds more possibilities. In complex code you may well want to write (or use) an error-handling class that mimics a more elaborate exception handling support from languages that have it, and then throw that
my $error_obj = ErrorHandlingClass->new( params );
... or die $error_obj;
Then the calling code can analyze the object. This would be the most structured way to do it.
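A bare-bones sketch of that idea, reusing the question's print_text for context; the error class, its fields, and the calling code are all made up for illustration:

use strict;
use warnings;

package My::Error;    # illustrative error class, not a CPAN module
sub new     { my ( $class, %args ) = @_; bless {%args}, $class }
sub kind    { $_[0]{kind} }
sub message { $_[0]{message} }

package main;

sub print_text {
    my ($aref_text) = @_;
    die My::Error->new( kind => 'bad_input', message => 'expected an array reference' )
        unless ref $aref_text eq 'ARRAY';
    print "$_\n" for @{$aref_text};
    return 1;
}

if ( !eval { print_text('not a ref'); 1 } ) {
    my $err = $@;
    if ( ref $err and $err->isa('My::Error') ) {
        warn 'print_text failed (', $err->kind, '): ', $err->message, "\n";
    }
    else {
        die $err;    # not one of ours -- re-throw
    }
}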
A nice and simple example is Path::Tiny, with its own Path::Tiny::Error found in its source.
Again, what is suitable in any one particular case depends on details of that application.
A few comments on direct questions.
The dilemma of what to return is stressed by the information-free message in die (it tells us nothing of what failed). But how do we make the failure informative, in this case?
Note that your or results in a die if the sub returns 0 or an empty string. If we replace it with // (defined-or), so to die on undef, we still can't print a specific message if undef may also indicate a bad result.
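To make that concrete, reusing print_text and @text from the question (only the two operators differ):

# 'or' fires on ANY false return value: undef, 0, or the empty string
print_text( \@text ) or die "print_text returned a false value";

# '//' (defined-or, perl 5.10+) fires only when the return value is undef
print_text( \@text ) // die "print_text returned undef";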
So in this case you may want the function to die on bad input, with a suitable message.
That would do it for debugging after there's been a problem. If the code needs to be able to recover then you'd better return more structured information -- throw (or return) an object of an error handling class you'd write. (As an ad hoc stop-gap measure you can parse the message from die.)
As for the age-old question of discipline to check returns, a die is a good tool. There is no "simple sub" that is unworthy – you do not want to proceed with an error so it's OK to die. And in complex projects error handling is more complex, so we need more tools and structure, not less.
Recall that exceptions "bubble up", propagate up the call stack if unhandled, and so does die. This can be used nicely for debugging without having eval on every single call. In the end, most of this is a part of debugging.
There is no "best practice" for this. But a default of die-ing is rather reasonable.
† By now we are getting try/catch-style handling of exceptions (die) in the core. It was introduced as experimental in 5.34.0, but they recommend using Feature::Compat::Try for now, which is ported from Syntax::Keyword::Try.

Skipping error in eval{} statement

I am trying to extract data from a website using a Perl API. I am using a list of URIs to get the data from the website.
Initially the problem was that if there was no data available for the URI it would die, and I wanted it to skip that particular URI and go to the next available URI. I used next unless ....; to get around this problem.
Now the problem is that I am trying to extract specific data by calling a specific method (identifiers()) from the API. The data is available for the URI, but the specific data I am looking for (the identifiers) is not available, and it dies.
I tried to use eval{} like this
eval {
    for $bar ($foo->identifiers()) {
        # do something
    }
};
When I use eval{} I think it skips the error and moves ahead, but I am not sure, because the error it gives is Invalid content type in response:text/plain.
When I check the URI manually, it doesn't have the identifiers but it does have the rest of the data. I want to skip this URI and move to the next one. How can I do that?
OK, I think I understand your question, but a little more code would have helped, as would specifying which Perl API -- not that it seems to matter to the answer, but it is a big part of your question. Having said that, the problem seems very simple.
When Perl hits an error, like most languages, it runs out through the calling contexts in order until it finds a place where it can handle the error. Perl's most basic error handling is eval{} (but I'd use Try::Tiny if you can, as it is then clearer that you're doing error handling instead of some of the other strange things eval can do).
Anyway, when Perl hits an error inside eval{}, the whole eval{} exits, and $@ is set to the error. So, having the eval{} outside the loop means errors will leave the loop. If you put the eval{} inside the loop, then when an error occurs the eval{} will exit, but you will carry on to the next iteration. It's that simple.
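As a sketch, assuming the loop over URIs looks roughly like this (get_record and the use of @ARGV for the URI list are placeholders for whatever your code already does):

use strict;
use warnings;

my @uris = @ARGV;                          # however you already get your list of URIs

for my $uri (@uris) {
    my $foo = get_record($uri);            # get_record() stands in for your API call
    my $ok = eval {
        for my $bar ( $foo->identifiers() ) {
            # do something with $bar
        }
        1;                                 # reached the end of the eval without dying
    };
    warn "skipping $uri: $@" unless $ok;
    # the loop simply continues with the next URI
}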
I also detect signs that maybe you're not using use strict; and use warnings;. Please do, as they help you find many bugs quicker.

Should a Perl constructor return an undef or an "invalid" object?

Question:
What is considered to be "best practice" - and why - for handling errors in a constructor?
"Best Practice" can be a quote from Schwartz, or 50% of CPAN modules use it, etc...; but I'm happy with well reasoned opinion from anyone even if it explains why the common best practice is not really the best approach.
As far as my own view of the topic (informed by software development in Perl for many years), I have seen three main approaches to error handling in a perl module (listed from best to worst in my opinion):
Construct an object, set an invalid flag (usually "is_valid" method). Often coupled with setting error message via your class's error handling.
Pros:
Allows for standard (compared to other method calls) error handling, as it allows you to use $obj->errors() type calls after a bad constructor just like after any other method call.
Allows for additional info to be passed (e.g. >1 error, warnings, etc...)
Allows for lightweight "redo"/"fixme" functionality. In other words, if the constructed object is very heavy, with many complex attributes that are always OK, and the only reason it is not valid is that someone entered an incorrect date, you can simply call "$obj->setDate()" instead of paying the overhead of re-executing the entire constructor. This pattern is not always needed, but can be enormously useful in the right design.
Cons: None that I'm aware of.
Return "undef".
Cons: Can not achieve any of the Pros of the first solution (per-object error messages outside of global variables and lightweight "fixme" capability for heavy objects).
Die inside the constructor. Outside of some very narrow edge cases, I personally consider this an awful choice for too many reasons to list on the margins of this question.
UPDATE: Just to be clear, I consider the (otherwise very worthy and a great design) solution of having very simple constructor that can't fail at all and a heavy initializer method where all the error checking occurs to be merely a subset of either case #1 (if initializer sets error flags) or case #3 (if initializer dies) for the purposes of this question. Obviously, choosing such a design, you automatically reject option #2.
It depends on how you want your constructors to behave.
The rest of this response goes into my personal observations, but as with most things Perl, Best Practices really boils down to "Here's one way to do it, which you can take or leave depending on your needs." Your preferences as you described them are totally valid and consistent, and nobody should tell you otherwise.
I actually prefer to die if construction fails, because we set it up so that the only types of errors that can occur during object construction really are big, obvious errors that should halt execution.
On the other hand, if you prefer that doesn't happen, I think I'd prefer 2 over 1, because it's just as easy to check for an undefined object as it is to check for some flag variable. This isn't C, so we don't have a strong typing constraint telling us that our constructor MUST return an object of this type. So returning undef, and checking for that to establish success or failure, is a great choice.
The 'overhead' of construction failure is a consideration in certain edge cases (where you can't quickly fail before incurring overhead), so for those you might prefer method 1. So again, it depends on what semantics you've defined for object construction. For example, I prefer to do heavyweight initialization outside of construction. As to standardization, I think that checking whether a constructor returns a defined object is as good a standard as checking a flag variable.
EDIT: In response to your edit about initializers rejecting case #2, I don't see why an initializer can't simply return a value that indicates success or failure rather than setting a flag variable. Actually, you may want to use both, depending on how much detail you want about the error that occurred. But it would be perfectly valid for an initializer to return true on success and undef on failure.
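For illustration, a minimal sketch of that style, with all names invented:

use strict;
use warnings;

package My::Widget;    # class and attribute names invented for the sketch

sub new {
    my ( $class, %args ) = @_;
    my $self = bless {}, $class;         # trivial constructor
    return $self->init(%args) ? $self : undef;
}

sub init {
    my ( $self, %args ) = @_;
    return undef unless defined $args{name};   # failure reported back to new()
    $self->{name} = $args{name};
    return 1;                                  # success
}

package main;

my $w = My::Widget->new( name => 'gadget' )
    or die "could not build widget";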
I prefer:
Do as little initialization as possible in the constructor.
croak with an informative message when something goes wrong.
Use appropriate initialization methods to provide per object error messages etc
In addition, returning undef (instead of croaking) is fine when the users of the class may not care why exactly the failure occurred, only whether or not they got a valid object.
I despise easy to forget is_valid methods or adding extra checks to ensure methods are not called when the internal state of the object is not well defined.
I say these from a very subjective perspective without making any statements about best practices.
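A minimal sketch of the croak-early style described above, with all names invented:

use strict;
use warnings;

package My::Account;    # names invented for the sketch
use Carp qw(croak);

sub new {
    my ( $class, %args ) = @_;
    croak "My::Account->new requires an 'owner'" unless defined $args{owner};
    # heavier setup would live in a separate init method, per the list above
    return bless { owner => $args{owner} }, $class;
}

package main;

# The caller writes handling code only if it actually wants to recover:
my $acct = eval { My::Account->new() };
warn "no account: $@" unless $acct;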
I would recommend against #1 simply because it leads to more error handling code which will not be written. For example, if you just return false then this works fine.
my $obj = Class->new or die "Construction failed...";
But if you return an object which is invalid...
my $obj = Class->new;
die "Construction failed #{[ $obj->error_message ]}" if $obj->is_valid;
And as the quantity of error handling code increases, the probability of it being written decreases. And it's not linear. By increasing the complexity of your error handling system you actually decrease the number of errors it will catch in practical use.
You also have to be careful that your invalid object in question dies when any method is called (aside from is_valid and error_message) leading to yet more code and opportunities for mistakes.
But I agree there is value in being able to get information about the failure, which makes returning false (just return, not return undef) inferior. Traditionally this is done by calling a class method or a global variable, as in DBI.
my $dbh = DBI->connect($data_source, $username, $password)
or die $DBI::errstr;
But it suffers from A) you still have to write error handling code and B) it's only valid for the last operation.
The best thing to do, in general, is throw an exception with croak. Now in the normal case the user writes no special code, the error occurs at the point of the problem, and they get a good error message by default.
my $obj = Class->new;
Perl's traditional recommendation against throwing exceptions in library code as being impolite is outdated. Perl programmers are (finally) embracing exceptions. Rather than writing error handling code over and over again, badly, and often forgetting it entirely, exceptions DWIM. If you're not convinced, just start using autodie (watch pjf's video about it) and you'll never go back.
Exceptions align Huffman encoding with actual use. The common case of expecting the constructor to just work and wanting an error if it doesn't is now the least code. The uncommon case of wanting to handle that error requires writing special code. And the special code is pretty small.
my $obj = eval { Class->new } or do { something else };
If you find yourself wrapping every call in an eval you are doing it wrong. Exceptions are called that because they are exceptional. If, as in your comment above, you want graceful error handling for the user's sake, then take advantage of the fact that errors bubble up the stack. For example, if you want to provide a nice user error page and also log the error you can do this:
eval {
    run_the_main_web_code();
} or do {
    log_the_error($@);
    print_the_pretty_error_page();
};
You only need it in one place, at top of your call stack, rather than scattered everywhere. You can take advantage of this at smaller increments, for example...
my $users = eval { Users->search({ name => $name }) } or do {
...handle an error while finding a user...
};
There are two things going on. 1) Users->search always returns a true value, in this case an array ref. That makes the simple my $obj = eval { Class->method } or do pattern work. That's optional. But more importantly 2) you only need to put special error handling around Users->search. All the methods called inside Users->search, and all the methods they call... they just throw exceptions. And they're all caught at one point and handled the same way. Handling the exception at the point which cares about it makes for much neater, more compact, and more flexible error handling code.
You can pack more information into the exception by croaking with a string overloaded object rather than just a string.
my $obj = eval { Class->new }
    or die "Construction failed: $@ and there were @{[ $@->num_frobnitz ]} frobnitzes";
Exceptions:
Do the right thing without any thought by the caller
Require the least code for the most common case
Provide the most flexibility and information about the failure to the caller
Modules such as Try::Tiny fix most of the hanging issues surrounding using eval as an exception handler.
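For example, a Try::Tiny version of the constructor-call handling might look like this (Class->new stands for whatever constructor you are calling):

use strict;
use warnings;
use Try::Tiny;

my $obj = try {
    Class->new;                       # whatever constructor you are calling
}
catch {
    warn "construction failed: $_";   # the error lands in $_ (and $_[0]), not $@
    undef;                            # value the caller sees on failure
};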
As for your use case where you might have a very expensive object and want to try to continue with it partially built... smells like YAGNI to me. Do you really need it? Or do you have a bloated object design which is doing too much work too early? If you do need it, you can put the information necessary to continue the construction in the exception object.
First the pompous general observations:
A constructor's job should be: Given valid construction parameters, return a valid object.
A constructor that does not construct a valid object cannot perform its job and is therefore a perfect candidate for exception generation.
Making sure the constructed object is valid is part of the constructor's job. Handing out a known-to-be-bad object and relying on the client to check that the object is valid is a surefire way to wind up with invalid objects that explode in remote places for non-obvious reasons.
Checking that all the correct arguments are in place before the constructor call is the client's job.
Exceptions provide a fine-grained way of propagating the particular error that occurred without needing to have a broken object in hand.
return undef; is always bad[1]
bIlujDI' yIchegh()Qo'; yIHegh()!
Now to the actual question, which I will construe to mean "what do you, darch, consider the best practice and why". First, I'll note that returning a false value on failure has a long Perl history (most of the core works that way, for example), and a lot of modules follow this convention. However, it turns out this convention produces inferior client code and newer modules are moving away from it.[2]
[The supporting argument and code samples for this turn out to be the more general case for exceptions that prompted the creation of autodie, and so I will resist the temptation to make that case here. Instead:]
Having to check for successful creation is actually more onerous than checking for an exception at an appropriate exception-handling level. The other solutions require the immediate client to do more work than it should have to just to obtain an object, work that is not required when the constructor fails by throwing an exception.[3] An exception is vastly more expressive than undef and equally expressive as passing back a broken object for purposes of documenting errors and annotating them at various levels in the call stack.
You can even get the partially-constructed object if you pass it back in the exception. I think this is a bad practice per my belief about what a constructor's contract with its clients ought to be, but the behavior is supported. Awkwardly.
So: A constructor that cannot create a valid object should throw an exception as early as possible. The exceptions a constructor can throw should be documented parts of its interface. Only the calling levels that can meaningfully act on the exception should even look for it; very often, the behavior of "if this construction fails, don't do anything" is exactly correct.
[1]: By which I mean, I am not aware of any use cases where return; is not strictly superior. If someone calls me on this I might have to actually open a question. So please don't. ;)
[2]: Per my extremely unscientific recollection of the module interfaces I've read in the last two years, subject to both selection and confirmation biases.
[3]: Note that throwing an exception does still require error-handling, as would the other proposed solutions. This does not mean wrapping every instantiation in an eval unless you actually want to do complex error-handling around every construction (and if you think you do, you're probably wrong). It means wrapping the call which is able to meaningfully act on the exception in an eval.

Is my Rose::DB::Object compile-time too slow?

I'm planning to move from Class::DBI to Rose::DB::Object due to its nice structure and the claim that RDBO is faster compared to CDBI and DBIC.
However, on my machine (linux 2.6.9-89, perl 5.8.9) RDBO's compile time is much slower than CDBI's:
$ time perl -MClass::DBI -e0
real 0m0.233s
user 0m0.208s
sys 0m0.024s
$ time perl -MRose::DB::Object -e0
real 0m1.178s
user 0m1.097s
sys 0m0.078s
That's a lot different...
Anyone experiences similar behaviour here?
Cheers.
@manni and @john: thanks for the explanation about the modules referenced by RDBO; it certainly answers why the compile time is slower than CDBI's.
The application is not running in a persistent environment. In fact it's invoked by several simultaneous cron jobs that run at 2-minute, 5-minute, and x-minute intervals, so yes, compile time is crucial here...
Jonathan Rockway's App::Persistent seems interesting, but its (current) limitation of allowing only one application to run at a time is not suitable for my purpose. It also has an issue where, if we kill the client, the server process keeps running...
Rose::DB::Object simply contains (or references from other modules) much more code than Class::DBI. On the bright side, it also has many more features and is much faster at runtime than Class::DBI. If compile time is a concern for you, then your best bet is to load as little code as possible (or get faster disks).
Another option is to set auto_load_related_classes to false in your Metadata objects. To do this early enough and globally will probably require you to make a Metadata subclass and then set that as the meta_class in your common Rose::DB::Object base class.
Turning auto_load_related_classes off means that you'd have to manually load related classes that you actually want to use in your script. That's a bit of a pain, but it lets you control how many classes get loaded. (If you have heavily interrelated classes, loading a single one can end up pulling all the other ones in.)
You could, perhaps, have an environment variable to control the behavior. Example metadata class:
package My::DB::Object::Metadata;

use base 'Rose::DB::Object::Metadata';

# New class method to handle default
sub default_auto_load_related_classes
{
    return $ENV{'RDBO_AUTO_LOAD_RELATED_CLASSES'} ? 1 : 0;
}

# Override existing object method, honoring new class-defined default
sub auto_load_related_classes
{
    my($self) = shift;

    return $self->SUPER::auto_load_related_classes(@_)  if(@_);

    if(defined(my $value = $self->SUPER::auto_load_related_classes))
    {
        return $value;
    }

    # Initialize to default
    return $self->SUPER::auto_load_related_classes(ref($self)->default_auto_load_related_classes);
}
And here's how it's tied to your common object base class:
package My::DB::Object;
use base 'Rose::DB::Object';
use My::DB::Object::Metadata;
sub meta_class { 'My::DB::Object::Metadata' }
Then set RDBO_AUTO_LOAD_RELATED_CLASSES to true when you're running in a persistent environment, and leave it false (and don't forget to explicitly load related classes) for command-line scripts.
Again, this will only help if you're currently loading more classes than you strictly need in a particular script due to the default true value of the auto_load_related_classes Metadata attribute.
If compile time is an issue, there are methods to lessen the impact. One is PPerl which makes a normal Perl script into a daemon that is compiled once. The only change you need to make (after installing it, of course) is to the shebang line:
#!/usr/bin/pperl
Another option is to write a client/server program where the bulk of the work is done by a server that loads the expensive modules, and a thin client script that just interacts with the server over sockets or pipes.
You should also look at App::Persistent and this article, both of which were written by Jonathan Rockway (aka jrockway).
This looks almost as dramatic over here:
time perl -MClass::DBI -e0
real 0m0.084s
user 0m0.080s
sys 0m0.004s
time perl -MRose::DB::Object -e0
real 0m0.391s
user 0m0.356s
sys 0m0.036s
I'm afraid part of the difference can simply be explained by the number of dependencies in each module:
perl -MClass::DBI -le 'print scalar keys %INC'
46
perl -MRose::DB::Object -le 'print scalar keys %INC'
95
Of course, you should ask yourself how much compilation time really matters for your particular problem. And what source code would be easier to maintain for you.

Why do I need to know how many tests I will be running with Test::More?

Am I a bad person if I use use Test::More qw(no_plan)?
The Test::More POD says
Before anything else, you need a testing plan. This basically declares how many tests your script is going to run to protect against premature failure...
use Test::More tests => 23;
There are rare cases when you will not know beforehand how many tests your script is going to run. In this case, you can declare that you have no plan. (Try to avoid using this as it weakens your test.)
use Test::More qw(no_plan);
But premature failure can be easily seen when there are no results printed at the end of a test run. It just doesn't seem that helpful.
So I have 3 questions:
What is the reasoning behind requiring a test plan by default?
Has anyone found this a useful and time saving feature in the long run?
Do other test suites for other languages support this kind of thing?
What is the reason for requiring a test plan by default?
ysth's answer links to a great discussion of this issue which includes comments by Michael Schwern and Ovid who are the Test::More and Test::Most maintainers respectively. Apparently this comes up every once in a while on the perl-qa list and is a bit of a contentious issue. Here are the highlights:
Reasons to not use a test plan
It's annoying and takes time.
It's not worth the time because test scripts won't die without the test harness noticing, except in some rare cases.
Test::More can count tests as they happen
If you use a test plan and need to skip tests, then you have the additional pain of needing a SKIP{} block.
Reasons to use a test plan
It only takes a few seconds to do. If it takes longer, your test logic is too complex.
If there is an exit(0) in the code somewhere, your test will complete successfully without running the remaining test cases. An observant human may notice the screen output doesn't look right, but in an automated test suite it could go unnoticed.
A developer might accidentally write test logic so that some tests never run.
You can't really have a progress bar without knowing ahead of time how many tests will be run. This is difficult to do through introspection alone.
The alternative
Test::Simple, Test::More, and Test::Most have a done_testing() method which should be called at the end of the test script. This is the approach I take currently.
This fixes the problem where code has an exit(0) in it. It doesn't fix the problem of logic which unintentionally skips tests though.
In short, it's safer to use a plan, but the chances of this actually saving the day are low unless your test suites are complicated (and they should not be complicated).
So using done_testing() is a middle ground. It's probably not a huge deal whatever your preference.
Has this feature been useful to anyone in the real world?
A few people mention that this feature has been useful to them in the real world. This includes Larry Wall. Michael Schwern says the feature originates with Larry, more than 20 years ago.
Do other languages have this feature?
None of the xUnit type testing suites has the test plan feature. I haven't come across any examples of this feature being used in any other programming language.
I'm not sure what you are really asking because the documentation extract seems to answer it. I want to know if all my tests ran. However, I don't find that useful until the test suite stabilizes.
While developing, I use no_plan because I'm constantly adding to the test suite. As things stabilize, I verify the number of tests that should run and update the plan. Some people mention the "test harness" catching that already, but there is no such thing as "the test harness". There's the one that most modules use by default because that's what MakeMaker or Module::Build specify, but the TAP output is independent of any particular TAP consumer.
A couple of people have mentioned situations where the number of tests might vary. I figure out the tests however I need to compute the number then use that in the plan. It also helps to have small test files that target very specific functionality so the number of tests is low.
use vars qw( $tests );

BEGIN {
    $tests = ...; # figure it out
}

# loaded after the BEGIN block above has set $tests
use Test::More tests => $tests;
You can also separate the count from the loading:
use Test::More;
plan tests => $tests;
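For example, in a data-driven test the count can simply fall out of the data; a trivial sketch:

use strict;
use warnings;
use Test::More;

my @cases = (
    [ 1, 1, 2 ],
    [ 2, 2, 4 ],
    [ 2, 3, 5 ],
);

plan tests => scalar @cases;    # count derived from the data, not hard-coded

for my $case (@cases) {
    my ( $x, $y, $want ) = @$case;
    is( $x + $y, $want, "$x + $y == $want" );
}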
The latest TAP lets you put the plan at the end too.
In one comment, you seem to think prematurely exiting will count as a failure, since the plan won't be output at the end, but this isn't the case: the plan will be output unless you terminate with POSIX::_exit or a fatal signal or the like. In particular, die() and exit() will result in the plan being output (though the test harness should detect anything other than an exit(0) as a prematurely terminated test).
You may want to look at Test::Most's deferred plan option, soon to be in Test::More (if it's not already).
There's also been discussion of this on the perl-qa list recently. One thread: http://www.nntp.perl.org/group/perl.qa/2009/03/msg12121.html
Doing any testing is better than doing no testing, but testing is about being deliberate. Stating the number of tests expected gives you the ability to see if there is a bug in the test script that is preventing a test from executing (or executing too many times). If you don't run tests under specific conditions you can use the skip function to declare this:
SKIP: {
    skip $why, $how_many if $condition;

    ...normal testing code goes here...
}
I think it's ok to bend the rules and use no_plan when the human cost of figuring out the plan is too high, but this cost is a good indication that the test suite has not been well designed.
Another case where it's useful to have the test plan explicitly defined is when you are doing this kind of test:
$coderef = sub { my $arg = shift; isa_ok $arg, 'MyClass' };
do(@args, $coderef);
and
## hijack our interface to test it's called.
local *MyClass::do = $coderef;
If you don't specify a plan, it's easy to miss out that your test failed and that some assertions weren't run as you expected.
Having the number of tests explicitly in the plan is a good idea, unless it is too expensive to compute that number. The question has been properly answered already but I wanted to stress two points:
Better than no_plan is to use done_testing()
use Test::More;
... run your tests ...;
done_testing( $number_of_tests_run );
# or done_testing() if the number of tests is not known
This Matt Trout blog entry is interesting; it rants about how numeric plans cause version-control merge conflicts and other issues that make the plan problematic: Why numeric test plans are bad, wrong, and don't actually help anyway
I find it annoying, too, and I usually ignore the number at the very beginning until the test suite stabilizes. Then I just keep it up to date manually. I do like the idea of knowing how many total tests there are as the seconds tick by, as a kind of a progress indicator.
To make counting easier I put the following before each test:
#----- load non-existant record -----
....
#----- add a new record -----
....
#----- load the new record (by name) -----
....
#----- verify the name -----
etc.
Then I can quickly scan the file and easily count the tests, just looking for the #----- lines. I suppose I could even write something up in Emacs to do it for me, but it's honestly not that much of a chore.
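For what it's worth, a throwaway one-liner can do that count, assuming the marker lines look exactly like the ones above (the test file name is a placeholder):

perl -ne '$n++ if /^#-----/; END { print "$n\n" }' t/some_test.t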
It is a pain when doing TDD, because you are writing new tests opportunistically. When I was teaching TDD and the shop used Perl, we decided to use our test suite the no plan way. I guess we could have changed from no_plan to lock down the number of tests. At the time I saw it as more hindrance than help.
Eric Johnson's answer is exactly correct. I just wanted to add that done_testing, a much better replacement for no_plan, was released in Test-Simple 0.87_1 recently. It's an experimental release, but you can download it directly from the previous link.
done_testing allows you to declare the number of tests you think you've run at the end of your testing script, rather than trying to guess it before your script starts. You can read the documentation here.