How should I deal with failing tests for bugs that will not be fixed - perl

I have a complex set of integration tests that uses Perl's WWW::Mechanize to drive a web app and check the results based on specific combinations of data. There are over 20 subroutines that make up the logic of the tests, loop through data, etc. Each test runs several of the test subroutines on a different dataset.
The web app is not perfect, so sometimes bugs cause the tests to fail with very specific combinations of data. But these combinations are rare enough that our team will not bother to fix the bugs for a long time; building many other new features takes priority.
So what should I do with the failing tests? It's just a few tests out of several dozen per combination of data.
1) I can't let it fail because then the whole test suite would fail.
2) If we comment them out, we lose that check for all the other datasets as well.
3) I could add a flag in the specific dataset that fails, and have the test not run if that flag is set, but then I'm passing extra flags all over the place in my test subroutines.
What's the cleanest and easiest way to do this?
Or are clean and easy mutually exclusive?

That's what TODO is for.
With a TODO block, the tests inside are expected to fail. Test::More will run the tests normally, but print out special flags indicating they are "todo". Test::Harness will interpret failures as being ok. Should anything succeed, it will report it as an unexpected success; you then know the thing you had to do is done and can remove the TODO flag.
The nice part about TODO tests, as opposed to simply commenting out a block of tests, is that it's like having a programmatic to-do list. You know how much work is left to be done, you're aware of what bugs there are, and you'll know immediately when they're fixed.
Once a todo test starts succeeding, simply move it outside the block. When the block is empty, delete it.
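For reference, a minimal sketch of a TODO block (the lookup subroutine is a hypothetical stand-in for your own WWW::Mechanize-driven check):
use strict;
use warnings;
use Test::More tests => 2;

ok( 1, 'checks outside the block count as normal' );

TODO: {
    local $TODO = 'bug #1234: fails for this rare data combination';

    # This runs as usual, but a failure is reported as "not ok ... # TODO"
    # and the harness still treats the overall suite as passing.
    is( lookup_for_rare_dataset(), 'expected value', 'rare dataset handled' );
}

# Stand-in for the real application check that still hits the bug.
sub lookup_for_rare_dataset { return 'wrong value' }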

I see two major options:
Disable the test (comment it out) with a reference to your bug-tracking system (i.e. a bug id), possibly also keeping a note in the bug that there is a test ready for it.
Move the failing tests into a separate test suite. You could even reverse the failing assertion, so that the suite stays green while the bug is still there; if it turns red, either the bug is gone or something else is fishy (sketch below). Of course, a link to the bug-tracking system and the bug id is still a good thing to have.
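A rough sketch of the reversed-assertion idea, assuming a separate known_bugs.t file (the check subroutine is illustrative, not your real test code):
# known_bugs.t
use strict;
use warnings;
use Test::More tests => 1;

# Inverted assertion: this stays green as long as bug #1234 still reproduces.
# When it turns red, either the bug has been fixed or behaviour changed again.
ok( !rare_combination_renders_correctly(), 'bug #1234 still reproduces (see tracker)' );

# Stand-in for the real check that currently fails because of the bug.
sub rare_combination_renders_correctly { return 0 }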

If you actually use Test::More in conjunction with WWW::Mechanize, case closed (see comment from @daxim). If not, think of a similar approach:
# In your testing module
our $TODO;
# ...
if (defined $TODO) {
    # only print warnings
}

# in a test script
local $My::Test::TODO = "This bug is delayed until iteration 42";

Related

VSTS Test fails but vstest.console passes; the assert executes before the code for some reason?

Well, the system we have has a bunch of dependencies, but I'll try to summarize what's going on without divulging too many details.
The test assembly, in the form of a .dll, is what gets executed. A lot of these tests call an API.
In the problematic method there are two API calls with an await on them: one to write a record to an external interface, and another to extract all records and then read the last one from that external interface, both via the API. The test simply checks whether writing the last record was successful in an end-to-end context, which is why there is both a write and then a read.
If we execute the test in Visual Studio, everything works as expected. I also tested it manually by running vstest.console.exe from the command line, and the expected results always come out as well.
However, with the VS Test task in VSTS it fails for some reason. We've been trying to figure it out, and eventually we reached the point where we printed the list from the 'read' part. It turns out the last record we inserted isn't in the data we pulled, but if we check the external interface via a different method, we can confirm that the write actually happened. What gives? Why is VSTest seeing what looks like an outdated set of records?
We also noticed two things:
1.) For the tests that passed, none of the Console.WriteLine outputs appear in the logs; they only show up for failed tests.
2.) Even though our Data.Should.Be call is at the very end of the TestMethod, the logs report the failure BEFORE the lines are printed! And even then, the printing should happen after reading the list of records, yet when the prints do appear we're still missing the record we just wrote.
Is there some bottom-to-top thing we're missing here? It really seems to me like the VSTS vstest task is executing the assert before the actual code. The TestMethods do run in the right order, though (the 4th test written top-to-bottom in the code is executed 4th rather than 4th from last), and we need them to run in order because some of the later tests depend on the earlier ones succeeding.
Anything we're missing here? I'd post source code, but there's a bunch of things I'd need to scrub first.
It turns out we were sorely misunderstanding what 'await' does. We're using .Wait() on the culprit call instead, and we'll also go back through the other tests to check their quality.

may the compiler optimize based on assert(...) expressions/contracts?

http://dlang.org/expression.html#AssertExpression
Regarding assert(0): "The optimization and code generation phases of compilation may assume that it is unreachable code."
The same documentation claims assert(0) is a 'special case', but there are several reasons that follow.
Can the D compiler optimize based on general assert-ions made in contracts and elsewhere?
(as if I needed another reason to enjoy the in{} and out{} constructs, but it certainly would make me feel a little more giddy to know that writing them could make things go fwoosh-ier)
In theory, yes; in practice, I don't think it does, especially since the asserts are stripped before even getting to the optimizer with dmd -release. I'm not sure about gdc and ldc, but I think they share this portion of the code.
The spec's "special case" reference, by the way, is that assert(0) is still present, in some form, with the -release compile flag. It is translated into an illegal instruction there (asm { hlt; }; non-kernel programs on x86 aren't allowed to use that instruction, so the program will segfault upon hitting it), whereas all other asserts are simply left out of the code entirely in -release mode.
GDC certainly does optimise based on asserts. The if-conditions make for much better code, even causing unnecessary code to disappear. Unfortunately, the way it is implemented at the moment, the entire assert can disappear in release-build mode, so the compiler never sees the beneficial if-condition info and actually generates worse code in release than in debug mode. Ironic.
I have to admit that I've only looked at this effect with if-conditions in asserts in the body; I haven't checked what effect in and out blocks have. The in- and out-contract blocks can be turned off with a command-line switch, if I recall correctly, so they are not even compiled; I think this possibly means the compiler doesn't even look at them. So this is another thing that might affect code generation, which I haven't looked at.
But there is a feature here that I would very much like to see: that the truth values of the assert conditions (after checking that there is no side-effect code in the assert expression) could always be injected into the compiler as an assumption, just as if there had been an if statement, even in release mode. It would involve pretending you had just seen if ( xxx ), but with the actual code generation for the test suppressed in release mode, while subsequent code still feels the beneficial effects of known truth values, value-range limits, and so on.

Which is better in PHP: suppress warnings with '@' or run extra checks with isset()?

For example, if I implement some simple object caching, which method is faster?
1. return isset($cache[$cls]) ? $cache[$cls] : $cache[$cls] = new $cls;
2. return @$cache[$cls] ?: $cache[$cls] = new $cls;
I read somewhere that @ takes significant time to execute (and I wonder why), especially when warnings/notices are actually being issued and suppressed. isset(), on the other hand, means an extra hash lookup. So which is better and why?
I do want to keep E_NOTICE on globally, both on dev and production servers.
I wouldn't worry about which method is FASTER. That is a micro-optimization. I would worry more about which is more readable code and better coding practice.
I would certainly prefer your first option over the second, as your intent is much clearer. Also, it's best to avoid edge-condition problems by always explicitly testing variables to make sure you are getting what you expect to get. For example, what if the class stored in $cache[$cls] is not of type $cls?
Personally, if I typically would not expect the index on $cache to be unset, then I would also put error handling in there rather than using ternary operations. If I could reasonably expect that that index would be unset on a regular basis, then I would make class $cls behave as a singleton and have your code be something like
return $cls::get_instance();
The isset() approach is better. It is code that explicitly states the index may be undefined. Suppressing the error is sloppy coding.
According to the article 10 Performance Tips to Speed Up PHP, warnings take additional execution time, and it also calls the @ operator "expensive":
Cleaning up warnings and errors beforehand can also keep you from
using @ error suppression, which is expensive.
Additionally, the @ will not suppress the errors with respect to custom error handlers:
http://www.php.net/manual/en/language.operators.errorcontrol.php
If you have set a custom error handler function with
set_error_handler() then it will still get called, but this custom
error handler can (and should) call error_reporting() which will
return 0 when the call that triggered the error was preceded by an @.
If the track_errors feature is enabled, any error message generated by
the expression will be saved in the variable $php_errormsg. This
variable will be overwritten on each error, so check early if you want
to use it.
@ temporarily changes the error_reporting state; that's why it is said to take time.
If you expect a certain value, the first thing to do to validate it, is to check that it is defined. If you have notices, it's probably because you're missing something. Using isset() is, in my opinion, a good practice.
I ran timing tests for both cases, using hash keys of various lengths, also using various hit/miss ratios for the hash table, plus with and without E_NOTICE.
The results were: with error_reporting(E_ALL) the isset() variant was faster than the @ variant by some 20-30%. Platform used: command-line PHP 5.4.7 on OS X 10.8.
However, with error_reporting(E_ALL & ~E_NOTICE) the difference was within 1-2% for short hash keys, and up to 10% for longer ones (16 chars).
Note that the first variant performs two hash-table lookups, whereas the variant with @ does only one.
Thus, @ is inferior in all scenarios, and I wonder if there are any plans to optimize it.
I think you have your priorities a little mixed up here.
First of all, if you want to get a real world test of which is faster - load test them. As stated though suppressing will probably be slower.
The problem here is that if you have performance issues with regular code, you should be upgrading your hardware or optimizing the overall logic of your code, rather than preventing proper execution and error checking.
Suppressing errors to steal the tiniest fraction of a speed gain won't do you any favours in the long run. Especially if you think that this error may keep happening time and time again, and cause your app to run more slowly than if the error was caught and fixed.

pytest: are pytest_sessionstart() and pytest_sessionfinish() valid hooks?

Are pytest_sessionstart(session) and pytest_sessionfinish(session) valid hooks? They are not described in the dev hook docs or the latest hook docs.
What is the difference between them and pytest_configure(config)/pytest_unconfigure(config)?
The docs say:
pytest_configure(config): called after command line options have been parsed and all plugins
and initial conftest files have been loaded.
and
pytest_unconfigure(config): called before the test process is exited.
Session is the same, right?
Thanks!
The bad news is that the situation with sessionstart/configure is not very well specified. sessionstart in particular is not much documented, because the semantics differ depending on whether one is in the xdist/distributed case or not. One can distinguish these situations, but it's all a bit too complicated.
The good news is that pytest-2.3 should make things easier. If you define a @fixture with scope="session", you can implement a fixture that is called once per process within which tests execute.
For distributed testing, this means once per test slave. For single-process testing, it means once for the whole test run. In either case, if you do a "--collectonly" run, or "-h" or other options that do not involve the running of tests, then fixture functions will not execute at all.
Hope this clarifies.

Why do I need to know how many tests I will be running with Test::More?

Am I a bad person if I use use Test::More qw(no_plan)?
The Test::More POD says
Before anything else, you need a testing plan. This basically declares how many tests your script is going to run to protect against premature failure...
use Test::More tests => 23;
There are rare cases when you will not know beforehand how many tests your script is going to run. In this case, you can declare that you have no plan. (Try to avoid using this as it weakens your test.)
use Test::More qw(no_plan);
But premature failure can be easily seen when there are no results printed at the end of a test run. It just doesn't seem that helpful.
So I have 3 questions:
What is the reasoning behind requiring a test plan by default?
Has anyone found this a useful and time saving feature in the long run?
Do other test suites for other languages support this kind of thing?
What is the reason for requiring a test plan by default?
ysth's answer links to a great discussion of this issue which includes comments by Michael Schwern and Ovid who are the Test::More and Test::Most maintainers respectively. Apparently this comes up every once in a while on the perl-qa list and is a bit of a contentious issue. Here are the highlights:
Reasons to not use a test plan
It's annoying and takes time.
It's not worth the time, because test scripts won't die without the test harness noticing, except in some rare cases.
Test::More can count tests as they happen.
If you use a test plan and need to skip tests, then you have the additional pain of needing a SKIP{} block.
Reasons to use a test plan
It only takes a few seconds to do. If it takes longer, your test logic is too complex.
If there is an exit(0) in the code somewhere, your test will complete successfully without running the remaining test cases. An observant human may notice that the screen output doesn't look right, but in an automated test suite it could go unnoticed (see the sketch after this list).
A developer might accidentally write test logic so that some tests never run.
You can't really have a progress bar without knowing ahead of time how many tests will be run. This is difficult to do through introspection alone.
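A small sketch of the exit(0) case under an explicit plan (the bare exit stands in for code under test that terminates unexpectedly):
use strict;
use warnings;
use Test::More tests => 4;

ok( 1, 'first check' );
ok( 1, 'second check' );

# Imagine the code under test calls exit(0) here by mistake.
exit(0);

ok( 1, 'third check' );    # never reached
ok( 1, 'fourth check' );   # never reached
With the plan declared, the harness reports that only 2 of the 4 planned tests ran; with no_plan the truncated run is reported as passing.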
The alternative
Test::Simple, Test::More, and Test::Most have a done_testing() method which should be called at the end of the test script. This is the approach I take currently.
This fixes the problem where code has an exit(0) in it. It doesn't fix the problem of logic which unintentionally skips tests though.
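A minimal sketch of the done_testing() approach:
use strict;
use warnings;
use Test::More;    # no plan declared up front

ok( 1, 'first check' );
ok( 1, 'second check' );

done_testing();    # or done_testing(2) to assert the count as well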
In short, it's safer to use a plan, but the chances of this actually saving the day are low unless your test suites are complicated (and they should not be complicated).
So using done_testing() is a middle ground. It's probably not a huge deal whatever your preference.
Has this feature been useful to anyone in the real world?
A few people mention that this feature has been useful to them in the real world. This includes Larry Wall. Michael Schwern says the feature originates with Larry, more than 20 years ago.
Do other languages have this feature?
None of the xUnit type testing suites has the test plan feature. I haven't come across any examples of this feature being used in any other programming language.
I'm not sure what you are really asking because the documentation extract seems to answer it. I want to know if all my tests ran. However, I don't find that useful until the test suite stabilizes.
While developing, I use no_plan because I'm constantly adding to the test suite. As things stabilize, I verify the number of tests that should run and update the plan. Some people mention the "test harness" catching that already, but there is no such thing as "the test harness". There's the one that most modules use by default because that's what MakeMaker or Module::Build specify, but the TAP output is independent of any particular TAP consumer.
A couple of people have mentioned situations where the number of tests might vary. I figure out the tests however I need to compute the number then use that in the plan. It also helps to have small test files that target very specific functionality so the number of tests is low.
use vars qw( $tests );

BEGIN {
    $tests = ...; # figure it out
}

use Test::More tests => $tests;
You can also separate the count from the loading:
use Test::More;
plan tests => $tests;
The latest TAP lets you put the plan at the end too.
In one comment, you seem to think prematurely exiting will count as a failure, since the plan won't be output at the end, but this isn't the case - the plan will be output unless
you terminate with POSIX::_exit or a fatal signal or the like. In particular, die() and exit() will result
in the plan being output (though the test harness should detect anything other than an exit(0) as a prematurely terminated test).
You may want to look at Test::Most's deferred plan option, soon to be in Test::More (if it's not already).
There's also been discussion of this on the perl-qa list recently. One thread: http://www.nntp.perl.org/group/perl.qa/2009/03/msg12121.html
Doing any testing is better than doing no testing, but testing is about being deliberate. Stating the number of tests expected gives you the ability to see if there is a bug in the test script that is preventing a test from executing (or executing too many times). If you don't run tests under specific conditions, you can use the skip function to declare this:
SKIP: {
    skip $why, $how_many if $condition;

    ...normal testing code goes here...
}
I think it's ok to bend the rules and use no_plan when the human cost of figuring out the plan is too high, but this cost is a good indication that the test suite has not been well designed.
Another case where it's useful to have the test plan explicitly defined is when you are doing this kind of test:
$coderef = sub { my $arg = shift; isa_ok $arg, 'MyClass' };
do(@args, $coderef);
and
## hijack our interface to test it's called.
local *MyClass::do = $coderef;
If you don't specify a plan, it's easy to miss that your test failed and that some assertions weren't run as you expected.
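A hedged sketch of that situation (run_with_callback and MyClass are illustrative names): the assertion lives inside the callback, so if the code under test never invokes it, only the declared plan reveals the gap.
use strict;
use warnings;
use Test::More tests => 2;

# The isa_ok assertion only runs if the callback is actually invoked.
my $coderef = sub { my $arg = shift; isa_ok $arg, 'MyClass' };

ok( run_with_callback($coderef), 'call under test returned true' );

# Stand-in for the real interface; note it "forgets" to call the coderef,
# so only 1 of the 2 planned tests runs and the harness reports the shortfall.
sub run_with_callback {
    my ($cb) = @_;
    return 1;
}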
Having the number of tests explicitly in the plan is a good idea, unless retrieving that number is too expensive. The question has been properly answered already, but I wanted to stress two points:
Better than no_plan is to use done_testing()
use Test::More;
... run your tests ...;
done_testing( $number_of_tests_run );
# or done_testing() if the number of tests is not known
This Matt Trout blog entry is interesting; it rants about plans causing CVS conflicts and other issues that make the plan problematic: Why numeric test plans are bad, wrong, and don't actually help anyway
I find it annoying, too, and I usually ignore the number at the very beginning until the test suite stabilizes. Then I just keep it up to date manually. I do like the idea of knowing how many total tests there are as the seconds tick by, as a kind of a progress indicator.
To make counting easier I put the following before each test:
#----- load non-existent record -----
....
#----- add a new record -----
....
#----- load the new record (by name) -----
....
#----- verify the name -----
etc.
Then I can quickly scan the file and easily count the tests, just looking for the #----- lines. I suppose I could even write something up in Emacs to do it for me, but it's honestly not that much of a chore.
It is a pain when doing TDD, because you are writing new tests opportunistically. When I was teaching TDD and the shop used Perl, we decided to use our test suite the no plan way. I guess we could have changed from no_plan to lock down the number of tests. At the time I saw it as more hindrance than help.
Eric Johnson's answer is exactly correct. I just wanted to add that done_testing, a much better replacement for no_plan, was released recently in Test-Simple 0.87_1. It's an experimental release, but you can download it directly from the previous link.
done_testing allows you to declare the number of tests you think you've run at the end of your testing script, rather than trying to guess it before your script starts. You can read the documentation here.