Can I configure Matlab's Unit Test Framework to fail for specific warnings? - matlab

I have a Matlab model and lots of unit tests, based on Matlab's own class-based unit test framework (matlab.unittest.TestCase and matlab.unittest.TestRunner). The tests produce quite a lot of warnings, some of which are serious from my point of view. I would like the framework to report a test case failure if certain specific warnings pop up.
A test runner can be easily configured to fail on warnings. But then it will fail on any warning:
import matlab.unittest.TestRunner;
import matlab.unittest.plugins.FailOnWarningsPlugin;
runner = TestRunner.withNoPlugins;
runner.addPlugin(FailOnWarningsPlugin);
A test runner can also be configured to ignore specific warnings, for example:
runner.addPlugin(FailOnWarningsPlugin('Ignoring',{'MATLAB:singularMatrix'}));
Here is the documentation:
https://se.mathworks.com/help/matlab/ref/matlab.unittest.plugins.failonwarningsplugin-class.html
Using the 'Ignoring' option and listing lots of warnings seems troublesome.
Is there a way to do it the other way around? That is, to force my test cases to fail only on certain warnings and ignore others?

You can temporarily set a specific warning to be reported as an error:
s = warning('error', 'MATLAB:DELETE:FileNotFound'); % set warning as an error
warning(s) % restore the warning to non-error
Reference: https://undocumentedmatlab.com/blog/trapping-warnings-efficiently

Related

How to report Fail instead of Error at fixture teardown?

I would like to make a fixture that does automatic assertions at the end of any test using it. However, such assertions are reported as ERROR instead of FAIL because they occur during the teardown. Is there a solution?

How to debug when hypothesis produces flaky test error?

I am using the hypothesis python package for stateful testing.
I am getting the following error when I run my tests:
hypothesis.errors.Flaky: Unreliable assumption: An example which satisfied assumptions on the first run now fails it.
I understand what flaky error means from a similar post. I have a test which failed the first time but passed during the second time. I can understand from the log, which test has led to this failure. Hypothesis tries the same test sequence 4 times during the overall test run among which, 2 of them pass and 2 of them fail.
I have tried the failing test individually without hypothesis and it does not fail. I am trying to understand what leads to the flaky error. Is it possibly a bug in Hypothesis as given in the post below:
What does Flaky: Hypothesis test produces unreliable results mean?
How do I get around this? Please find the log file of the test run at the link:
https://github.com/aparnasbose/hypothesis/blob/master/flaky%20test
The problem is almost certainly that your test is not deterministic for all inputs; there are some arguments or sequences of actions that Hypothesis can find which sometimes pass and sometimes fail. Hypothesis considers this a bug in your test, and raises the Flaky error.
To diagnose this in more detail I'd need to see your actual source code.
FYI the verbose verbosity level is much more useful here than debug (which dumps too much internal state). You may also want to upgrade to Hypothesis >= 4.41.1 for improved statistics.

Capture all warnings that occur during execution

After executing a script I try to read the warning state via
matlab_warnings = warning;
The point is, not all warnings that occurred during execution are inside this warning state. When calling
warning('on', 'verbose');
I get a useful hint on how to disable a specific warning; however, I'm still curious why some warnings won't appear in warning. In my case I'm calling quadprog and this function (part of the Optimization Toolbox) throws the warnings I'm looking for.
warning does not return the warnings that occurred in your code; it returns the settings that control whether each warning is displayed. lastwarn is the only way to access warnings and it only allows access to the most recent warning.
If you know which parts of your code are likely to generate warnings, you could append lastwarn to a list each time after you execute that code. Code snippet below.
warnlist = {};
while somecondition
lastwarn(''); % Clear the last warning first, so you won't get stale entries or dupes in the list
% Code that might generate a warning, e.g. your 'quadprog' function call.
msg = lastwarn;
if ~isempty(msg)
warnlist{end+1} = msg; % Cell array, because the messages differ in length
end
end

How should I deal with failing tests for bugs that will not be fixed

I have a complex set of integration tests that uses Perl's WWW::Mechanize to drive a web app and check the results based on specific combinations of data. There are over 20 subroutines that make up the logic of the tests, loop through data, etc. Each test runs several of the test subroutines on a different dataset.
The web app is not perfect, so sometimes bugs cause the tests to fail with very specific combinations of data. But these combinations are rare enough that our team will not bother to fix the bugs for a long time; building many other new features takes priority.
So what should I do with the failing tests? It's just a few tests out of several dozen per combination of data.
1) I can't let it fail because then the whole test suite would fail.
2) If we comment them out, that means we miss out on making that test for all the other datasets.
3) I could add a flag in the specific dataset that fails, and have the test not run if that flag is set, but then I'm passing extra flags all over the place in my test subroutines.
What's the cleanest and easiest way to do this?
Or are clean and easy mutually exclusive?
That's what TODO is for.
With a todo block, the tests inside are expected to fail. Test::More will run the tests normally, but print out special flags indicating they are "todo". Test::Harness will interpret failures as being ok. Should anything succeed, it will report it as an unexpected success. You then know the thing you had todo is done and can remove the TODO flag.
The nice part about todo tests, as opposed to simply commenting out a block of tests, is it's like having a programmatic todo list. You know how much work is left to be done, you're aware of what bugs there are, and you'll know immediately when they're fixed.
Once a todo test starts succeeding, simply move it outside the block. When the block is empty, delete it.
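For readers coming from other languages, a roughly analogous mechanism exists in Python's standard unittest module. The sketch below is only an analogy, not how Test::More implements TODO. The class name, the test names and the run_suite helper are hypothetical, and the failing assertion merely stands in for a check that trips over a known, unfixed bug.

```python
import unittest


class KnownBugTests(unittest.TestCase):
    @unittest.expectedFailure
    def test_rare_data_combination(self):
        # Hypothetical assertion standing in for a check that fails
        # because of a known bug that will not be fixed soon.
        self.assertEqual(1 + 1, 3)

    def test_everything_else(self):
        self.assertEqual(1 + 1, 2)


def run_suite():
    """Run the tests above and return the collected result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(KnownBugTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    result = run_suite()
    print(result.wasSuccessful(), len(result.expectedFailures))
```

As with a TODO block, the expected failure does not turn the run red, and if the bug is ever fixed the test is reported as an unexpected success, which is the cue to remove the marker.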
I see two major options:
disable the test (commenting it out), with a reference to your bug tracking system (i.e. a bug ID), possibly keeping a note in the bug as well that there is a test ready for this bug
move the failing tests into a separate test suite. You could even reverse the failing assertion so you can run the suite and while it is green the bug is still there and if it becomes red either the bug is gone or something else is fishy. Of course a link to the bug tracking system and bug is still a good thing to have.
If you actually use Test::More in conjunction with WWW::Mechanize, case closed (see comment from @daxim). If not, think of a similar approach:
# In your testing module
our $TODO;
# ...
if (defined $TODO) {
# only print warnings
};
# in a test script
local $My::Test::TODO = "This bug is delayed until iteration 42";

Why do I need to know how many tests I will be running with Test::More?

Am I a bad person if I use use Test::More qw(no_plan)?
The Test::More POD says
Before anything else, you need a testing plan. This basically declares how many tests your script is going to run to protect against premature failure...
use Test::More tests => 23;
There are rare cases when you will not know beforehand how many tests your script is going to run. In this case, you can declare that you have no plan. (Try to avoid using this as it weakens your test.)
use Test::More qw(no_plan);
But premature failure can be easily seen when there are no results printed at the end of a test run. It just doesn't seem that helpful.
So I have 3 questions:
What is the reasoning behind requiring a test plan by default?
Has anyone found this a useful and time saving feature in the long run?
Do other test suites for other languages support this kind of thing?
What is the reason for requiring a test plan by default?
ysth's answer links to a great discussion of this issue which includes comments by Michael Schwern and Ovid who are the Test::More and Test::Most maintainers respectively. Apparently this comes up every once in a while on the perl-qa list and is a bit of a contentious issue. Here are the highlights:
Reasons to not use a test plan
It's annoying and takes time.
It's not worth the time because test scripts won't die without the test harness noticing except in some rare cases.
Test::More can count tests as they happen
If you use a test plan and need to skip tests, then you have the additional pain of needing a SKIP{} block.
Reasons to use a test plan
It only takes a few seconds to do. If it takes longer, your test logic is too complex.
If there is an exit(0) in the code somewhere, your test will complete successfully without running the remaining test cases. An observant human may notice the screen output doesn't look right, but in an automated test suite it could go unnoticed.
A developer might accidentally write test logic so that some tests never run.
You can't really have a progress bar without knowing ahead of time how many tests will be run. This is difficult to do through introspection alone.
The alternative
Test::Simple, Test::More, and Test::Most have a done_testing() method which should be called at the end of the test script. This is the approach I take currently.
This fixes the problem where code has an exit(0) in it. It doesn't fix the problem of logic which unintentionally skips tests though.
In short, it's safer to use a plan, but the chances of this actually saving the day are low unless your test suites are complicated (and they should not be complicated).
So using done_testing() is a middle ground. It's probably not a huge deal whatever your preference.
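The trade-off between a numeric plan and done_testing() can be sketched with a toy counter. This is a minimal illustration in Python, not how Test::More or any TAP harness is implemented. TinyTap, its methods and the two example scripts are hypothetical names invented for the sketch.

```python
class TinyTap:
    """Toy test counter; all names here are hypothetical."""

    def __init__(self, planned=None):
        self.planned = planned  # None means no plan was declared up front
        self.run = 0
        self.failed = 0
        self.done = False

    def ok(self, condition, name=""):
        self.run += 1
        if not condition:
            self.failed += 1
        return bool(condition)

    def done_testing(self):
        # Records that the script reached its last line.
        self.done = True

    def passed(self):
        if self.failed:
            return False
        if self.planned is not None:
            return self.run == self.planned  # numeric plan: counts must match
        return self.done  # no plan: only check that the end was reached


def early_exit_script(tap):
    tap.ok(True, "first check")
    # Stands in for a stray exit(0): the remaining checks and
    # done_testing() are never reached.


def skipping_script(tap):
    tap.ok(True, "setup check")
    for item in []:  # bug: the data set is unexpectedly empty
        tap.ok(item > 0, "per-item check")
    tap.done_testing()


def outcome(script, planned=None):
    tap = TinyTap(planned)
    script(tap)
    return tap.passed()


if __name__ == "__main__":
    print(outcome(early_exit_script, planned=2))  # plan catches early exit
    print(outcome(early_exit_script))             # so does done_testing
    print(outcome(skipping_script, planned=3))    # plan catches skipped checks
    print(outcome(skipping_script))               # done_testing does not
```

In this sketch both approaches notice the early exit, but only the numeric plan notices the loop that silently ran zero checks.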
Has this feature been useful to anyone in the real world?
A few people mention that this feature has been useful to them in the real world. This includes Larry Wall. Michael Schwern says the feature originates with Larry, more than 20 years ago.
Do other languages have this feature?
None of the xUnit type testing suites has the test plan feature. I haven't come across any examples of this feature being used in any other programming language.
I'm not sure what you are really asking because the documentation extract seems to answer it. I want to know if all my tests ran. However, I don't find that useful until the test suite stabilizes.
While developing, I use no_plan because I'm constantly adding to the test suite. As things stabilize, I verify the number of tests that should run and update the plan. Some people mention the "test harness" catching that already, but there is no such thing as "the test harness". There's the one that most modules use by default because that's what MakeMaker or Module::Build specify, but the TAP output is independent of any particular TAP consumer.
A couple of people have mentioned situations where the number of tests might vary. I figure out the tests however I need to compute the number then use that in the plan. It also helps to have small test files that target very specific functionality so the number of tests is low.
use vars qw( $tests );
BEGIN {
$tests = ...; # figure it out
}
# 'use' runs at compile time, so it must come after the BEGIN block that sets $tests
use Test::More tests => $tests;
You can also separate the count from the loading:
use Test::More;
plan tests => $tests;
The latest TAP lets you put the plan at the end too.
In one comment, you seem to think prematurely exiting will count as a failure, since the plan won't be output at the end, but this isn't the case - the plan will be output unless you terminate with POSIX::_exit or a fatal signal or the like. In particular, die() and exit() will result in the plan being output (though the test harness should detect anything other than an exit(0) as a prematurely terminated test).
You may want to look at Test::Most's deferred plan option, soon to be in Test::More (if it's not already).
There's also been discussion of this on the perl-qa list recently. One thread: http://www.nntp.perl.org/group/perl.qa/2009/03/msg12121.html
Doing any testing is better than doing no testing, but testing is about being deliberate. Stating the number of tests expected gives you the ability to see if there is a bug in the test script that is preventing a test from executing (or making it execute too many times). If you don't run tests under specific conditions you can use the skip function to declare this:
SKIP: {
skip $why, $how_many if $condition;
...normal testing code goes here...
}
I think it's ok to bend the rules and use no_plan when the human cost of figuring out the plan is too high, but this cost is a good indication that the test suite has not been well designed.
Another case where it's useful to have the test plan explicitly defined is when you are doing this kind of test:
$coderef = sub { my $arg = shift; isa_ok $arg, 'MyClass' };
do(@args, $coderef);
and
## hijack our interface to test it's called.
local *MyClass::do = $coderef;
If you don't specify a plan, it's easy to miss out that your test failed and that some assertions weren't run as you expected.
Explicitly stating the number of tests in the plan is a good idea, unless that number is too expensive to work out. The question has been properly answered already but I wanted to stress two points:
Better than no_plan is to use done_testing()
use Test::More;
... run your tests ...;
done_testing( $number_of_tests_run );
# or done_testing() if the number of tests is not known
This Matt Trout blog entry is interesting; it rants about how a numeric plan causes cvs conflicts and other issues that make the plan problematic: Why numeric test plans are bad, wrong, and don't actually help anyway
I find it annoying, too, and I usually ignore the number at the very beginning until the test suite stabilizes. Then I just keep it up to date manually. I do like the idea of knowing how many total tests there are as the seconds tick by, as a kind of a progress indicator.
To make counting easier I put the following before each test:
#----- load non-existent record -----
....
#----- add a new record -----
....
#----- load the new record (by name) -----
....
#----- verify the name -----
etc.
Then I can quickly scan the file and easily count the tests, just looking for the #----- lines. I suppose I could even write something up in Emacs to do it for me, but it's honestly not that much of a chore.
It is a pain when doing TDD, because you are writing new tests opportunistically. When I was teaching TDD and the shop used Perl, we decided to use our test suite the no plan way. I guess we could have changed from no_plan to lock down the number of tests. At the time I saw it as more hindrance than help.
Eric Johnson's answer is exactly correct. I just wanted to add that done_testing, a much better replacement for no_plan, was released in Test-Simple 0.87_1 recently. It's an experimental release, but you can download it directly from the previous link.
done_testing allows you to declare the number of tests you think you've run at the end of your testing script, rather than trying to guess it before your script starts. You can read the documentation here.