Perl: What is the definition of a test?

I feel a bit ashamed to ask this question, but I am curious. I recently wrote a script (without organizing the code into modules) that reads the log files of a store and saves the info to the database.
For example, I wrote something like this (via Richard Huxton):
while (<$infile>) {
    if (/item_id:(\d+)\s*,\s*sold/) {
        my $item_id = $1;
        $item_id_sold_times{$item_id}++;
    }
}

my @matched_item_ids = keys %item_id_sold_times;

my $owner_ids =
    Store::Model::Map::ItemOwnerMap->fetch_by_keys( \@matched_item_ids )
        ->entry();

for my $owner_id (@$owner_ids) {
    $item_id_owner_map{$owner_id}++;
}
Say this script is called script.pl. To test it, I made a file script.t and had to repeat some blocks of script.pl in script.t. After copy-pasting the relevant code sections, I do confirmations like:
is( $item_id_sold_times{1}, 1, "Number of sold items of item 1" );
is( $item_id_owner_map{3}, 8, "Number of sold items for owner 3" );
And so on and so on.
But some have pointed out that what I wrote is not a test. It is a confirmation script. A good test would involve organizing the code into modules, writing a thin script that calls the methods in those modules, and writing tests for the modules.
This has made me think about what definition of a test is most widely used in software engineering. Perhaps some of you who have even tested Perl core functions can give me a hand. Can a script (not organized into modules) not be properly tested?
Regards

An important distinction is: if someone were to edit (bugfix or new feature) your 'script.pl' file, would running your 'script.t' file in its current state give any useful information about the state of the new 'script.pl'? For this to be the case, you must include or use the .pl file instead of selectively copy-pasting excerpts.
In the best case you design your software modularly and plan for testing. If you want to test a monolithic script after the fact, I suppose you could write a test script that includes your main program after an exit() and at least have the subroutines to test...

You can test a program just like a module if you set it up as a modulino. I've written about this all over the place, but most notably in Mastering Perl. You can read Chapter 18 online for free right now, since I'm working on the second edition in public. :)
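For the script in the question, a modulino version might look roughly like this (count_sold_items is a made-up name; the essential trick is only the `main() unless caller()` line, which means "run as a script only when nothing else loaded us"):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: the log parsing moves into a sub, and main()
# only runs when the file is executed directly, so script.t can
# require this file and call the subs without any copy-pasting.
sub count_sold_items {
    my ($fh) = @_;
    my %item_id_sold_times;
    while (<$fh>) {
        $item_id_sold_times{$1}++ if /item_id:(\d+)\s*,\s*sold/;
    }
    return \%item_id_sold_times;
}

sub main {
    my ($path) = @ARGV;
    if ( !defined $path ) {
        warn "usage: script.pl LOGFILE\n";
        return;
    }
    open my $infile, '<', $path or die "can't open $path: $!";
    my $counts = count_sold_items($infile);
    printf "item %s sold %d times\n", $_, $counts->{$_}
        for sort keys %{$counts};
}

main() unless caller();    # no caller() means we were run as a script

1;
```

script.t can then simply `require './script.pl';` and feed count_sold_items a filehandle of test data with Test::More's is_deeply.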

Without getting into verification versus validation, etc., a test is any activity that verifies the behavior of something else: that the behaviors match requirements and that certain undesirable behaviors don't occur. The scope of a test can be limited (verifying only a few of the desired behaviors) or it can be comprehensive (verifying most or all of the required or desired behaviors), probably with lots of steps or components to make it so. In my mind, the term "confirmation" in your context is another way of saying "limited-scope test". Certainly it doesn't completely verify the behavior of what you're testing, but it does something. In direct answer to the question at hand: I think calling what you did a "confirmation" rather than a "test" is just a matter of semantics.

Related

nytprofhtml seems to ignore a module named DB

I'm trying to profile a big application that contains a module creatively named DB. After running it under -d:NYTProf and calling nytprofhtml, both without any additional switches or related environment variables, I get the usual nytprof directory with HTML output. However, it seems that due to some internal logic, any output related to my module DB is severely mangled. Just to make sure: DB is pure Perl.
In the top and all-subroutines lists: while other function links point to the relevant -pm-NN-line.html file, links to subs from DB point to the entry script instead.
The "line" link in the "Source Code Files" section does point to DB-pm-NN-line.html, and it does exist, but unlike all other files it doesn't have a "Statements" table inside, and the "Line" table has absolutely no lines of code, only a summary of calls.
Actually, here's a small example:
# main.pl
use lib '.';
use DB;
for (1..10) {
    print DB::do_stuff();
}
# DB.pm
package DB;
sub do_stuff {
    my $a = 1;
    my $b = 2;
    my $c = $a + $b;
    return $c;
}
1;
Try running perl -d:NYTProf main.pl, then nytprofhtml and then inspect nytprof/DB-pm-8-line.html.
I don't know if this happens because NYTProf itself has an internal module named DB, or whether it handles modules starting with DB in some magical way - I've noticed the output for functions from DBI looks somewhat different too.
Is there a way to change/disable this behavior short of renaming my DB module?
That's hardly an option
You don't really have an option. The special DB package and the associated Devel:: namespace are coded into the perl compiler/interpreter. Unless you want to do without any debugging facilities altogether and live in fear of mysterious, unquantifiable side effects, you must refactor your code to rename your proprietary DB library.
Anyway, the name is very generic and that's exactly why it is expected to be encountered
On the contrary, Devel::NYTProf is bound to use the existing core package DB. It's precisely because it is a very generic identifier that an engineer should reject it as a choice for production code, which could be required to work with pre-existing third-party code at any point
a module creatively named DB
This belies your own support of the choice. DB has been a core module since v5.6 in 2000, and anything already on CPAN should have been off the cards from the start. I feel certain that this clash of namespaces must have been discovered before now. Please don't be someone else who just sweeps the issue under the carpet.
Remember that, as it stands, any code you are now putting in the DB package is sharing space with everything that perl has already put there. It is astonishing that you are not experiencing strange and inexplicable symptoms already
I don't see that it's too big a task to fix this as long as you have a proper test suite for your 10MB of Perl code. If you don't, then hopefully you will at least not make the same mistakes again

What happens if I reference a package but don't use/require it?

As much as I can (mostly for clarity/documentation), I've been trying to say
use Some::Module;
use Another::Module qw( some namespaces );
in my Perl modules that use other modules.
I've been cleaning up some old code and see some places where I reference modules in my code without ever having used them:
my $example = Yet::Another::Module->AFunction($data); # EXAMPLE 1
my $demo = Whats::The:Difference::Here($data); # EXAMPLE 2
So my questions are:
Is there a performance impact (I'm thinking compile time) by not stating use x and simply referencing it in the code?
I assume that I shouldn't use modules that aren't utilized in the code - I'm telling the compiler to compile code that is unnecessary.
What's the difference between calling functions in example 1's style versus example 2's style?
I would say that this falls firmly into the category of premature optimisation, and if you're not sure, then leave it in. You would have to be including some vast unused libraries for removing them to help at all.
It is typical of Perl to hide a complex issue behind a simple mechanism that will generally do what you mean without too much thought
The simple mechanisms are these
use My::Module 'function' is the same as writing
BEGIN {
require My::Module;
My::Module->import( 'function' );
}
The first time perl successfully executes a require statement, it adds an element to the global %INC hash which has the "pathified" module name (in this case, My/Module.pm) for a key and the absolute location where it found the source as a value
If another require for the same module is encountered (that is, it already exists in the %INC hash) then require does nothing
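The caching behaviour is easy to watch from %INC directly. A small sketch, using the core module File::Spec only because it's guaranteed to be available:

```perl
use strict;
use warnings;

# After a successful require, %INC maps the "pathified" module name to
# the location the source was found at; a second require sees the key
# already present and does nothing beyond that hash lookup.
require File::Spec;
my $first = $INC{'File/Spec.pm'};
print "File/Spec.pm loaded from $first\n";

require File::Spec;    # already in %INC, so this is a cheap no-op
print "second require changed nothing\n"
    if $INC{'File/Spec.pm'} eq $first;
```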
So your question
What happens if I reference a package but don't use/require it?
We're going to have a problem with use, utilise, include and reference here, so I'm code-quoting only use and require when I mean the Perl language words.
Keeping things simple, these are the three possibilities
As above, if require is seen more than once for the same module source, then it is ignored after the first time. The only overhead is checking to see whether there is a corresponding element in %INC
Clearly, if you use source files that aren't needed then you are doing unnecessary compilation. But Perl is damn fast, and you will be able to shave only fractions of a second from the build time unless you have a program that uses enormous libraries and looks like use Catalyst; print "Hello, world!\n";
We know what happens if you make method calls to a class library that has never been compiled. We get
Can't locate object method "new" via package "My::Class" (perhaps you forgot to load "My::Class"?)
If you're using a function library, then what matters is the part of use that says
My::Module->import( 'function' );
because the first part is require and we already know that require never does anything twice. Calling import is usually a simple function call, and you would be saving nothing significant by avoiding it
What is perhaps less obvious is the effect of big modules that pull in multiple subsidiary modules. For instance, if I write just
use LWP::UserAgent;
then it knows what it is likely to need, and these modules will also be compiled
Carp
Config
Exporter
Exporter::Heavy
Fcntl
HTTP::Date
HTTP::Headers
HTTP::Message
HTTP::Request
HTTP::Response
HTTP::Status
LWP
LWP::MemberMixin
LWP::Protocol
LWP::UserAgent
Storable
Time::Local
URI
URI::Escape
and that's ignoring the pragmas!
Did you ever feel like you were kicking your heels, waiting for an LWP program to compile?
I would say that, in the interests of keeping your Perl code clear and tidy, it may be an idea to remove unnecessary modules from the compilation phase. But don't agonise over it, and benchmark your build times before doing any pre-handover tidy. No one will thank you for reducing the build time by 20ms and then causing them hours of work because you removed a non-obvious requirement.
You actually have a bunch of questions.
Is there a performance impact (thinking compile time) by not stating use x and simply referencing it in the code?
No, there is no performance impact, because you can't do that. Every namespace you are using in a working program gets defined somewhere. Either you used or required it before the point where it's called, or one of your dependencies did, or another way[1] was used to make Perl aware of it.
Perl keeps track of those things in symbol tables. They hold all the knowledge about namespaces and variable names. So if your Some::Module is not in the referenced symbol table, Perl will complain.
I assume that I shouldn't use modules that aren't utilized in the code - I'm telling the compiler to compile code that is unnecessary.
There is no question here. But yes, you should not do that.
It's hard to say if this is a performance impact. If you have a large Catalyst application that just runs and runs for months it doesn't really matter. Startup cost is usually not relevant in that case. But if this is a cronjob that runs every minute and processes a huge pile of data, then an additional module might well be a performance impact.
That's actually also a reason why all use and require statements should be at the top. So it's easy to find them if you need to add or remove some.
What's the difference between calling functions in example 1's style versus example 2's style?
Those are for different purposes mostly.
my $example = Yet::Another::Module->AFunction($data); # EXAMPLE 1
This syntax is very similar to the following:
my $e = Yet::Another::Module::AFunction( 'Yet::Another::Module', $data );
It's used for class methods in OOP. The most well-known one would be new, as in Foo->new. It passes the thing in front of the -> (either a blessed reference or a bare package identifier) as the first argument to the function named AFunction in that thing's package. But it does more: because it's a method call, it also takes inheritance into account.
package A::First::Module;
sub AFunction { ... }

package Yet::Another::Module;
use parent -norequire, 'A::First::Module';   # -norequire: the parent lives in this same file
1;
In this case, your example would also call AFunction, because it's inherited from A::First::Module. In addition to the symbol table referenced above, Perl uses @ISA to keep track of who inherits from whom. See perlobj for more details.
my $demo = Whats::The:Difference::Here($data); # EXAMPLE 2
This has a syntax error. There is a : missing after The.
my $demo = Whats::The::Difference::Here($data); # EXAMPLE 2
This is a function call. It calls the function Here in the package Whats::The::Difference and passes $data and nothing else.
Note that as Borodin points out in a comment, your function names are very atypical and confusing. Usually functions in Perl are written with all lowercase and with underscores _ instead of camel case. So AFunction should be a_function, and Here should be here.
[1] For example, you can have multiple package definitions in one file, which you should not normally do, or you could assign stuff into a namespace directly with syntax like *Some::Namespace::frobnicate = sub {...}. There are other ways, but that's a bit out of scope for this answer.
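The glob-assignment trick from the footnote looks like this in full (Some::Namespace and frobnicate are made-up names for illustration):

```perl
use strict;
use warnings;

# No use or require anywhere: assigning a code ref into the symbol
# table entry *Some::Namespace::frobnicate conjures the package into
# existence, so the "referenced but never loaded" call below works.
*Some::Namespace::frobnicate = sub { 2 * shift };

print Some::Namespace::frobnicate(21), "\n";    # prints 42
```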

Perl convert from script to module design ideas

I have a script that I would like to convert to a module and call its functions from a Perl script as I do with CPAN modules now. My question is how would others design the module, I'd like to get some idea how to proceed as I have never written a module before. The script, as written now does this:
1) Sets up logging to a db using an in-house module
2) Establishes a connection to the db using DBI
3) Fetches file(s) from a remote server using Net::SFTP::Foreign
4) Processes the users in each file and adds the data to the DB
The script currently takes command line options to override defaults using Getopt::Long.
Each file is a pipe-delimited collection of user data, ~4000 lines, which goes into the db, provided the user has an entry in our LDAP directory.
So more to the point:
How should I design my module? Should everything my script currently does be moved into the module, or are there some things that are best left in the script. For example, I was thinking of designing my module, so it would be called like this:
use MyModule;
$obj = MyModule->new;   # possibly pass some arguments
$obj->setupLogging;
$obj->connectToDB;
$obj->fetchFiles;
$obj->processUsers;
That would keep the script nice and clean, but is this the best idea for the module? I was thinking of having the script retrieve the files and then just pass the paths to the module for processing.
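A minimal skeleton for that interface might look like the following. The method names are the ones from the question; everything else (constructor arguments, returning $self so calls can chain, the stub bodies) is just one plausible choice, to be replaced by the script's real logging/DBI/SFTP code:

```perl
package MyModule;
use strict;
use warnings;

sub new {
    my ( $class, %args ) = @_;
    # %args would carry the defaults currently overridden via Getopt::Long
    my $self = { %args, files => [] };
    return bless $self, $class;
}

# Stubs: real bodies would wrap the existing in-house logging module,
# the DBI connection, and Net::SFTP::Foreign. Each stage returns $self
# so the driver script can chain the calls if it wants to.
sub setupLogging { my $self = shift; $self->{logging_ready} = 1; return $self }
sub connectToDB  { my $self = shift; $self->{db_ready}      = 1; return $self }

sub fetchFiles {
    my $self = shift;
    # fetch via SFTP here; record the local paths for processUsers()
    push @{ $self->{files} }, @_;
    return $self;
}

sub processUsers {
    my $self = shift;
    return scalar @{ $self->{files} };    # stub: number of files to process
}

1;
```

With that shape the driver script stays a handful of lines, and each stage can be unit-tested in isolation.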
Thanks
I think the most useful question is "What of this code is going to be usable by multiple scripts?" Surely the whole thing won't. I rarely create a perl module until I find myself writing the same subroutine more than once.
At that stage I generally dump it into something called "Utility.pm" until that gathers enough code that patterns (not in the gang of four sense really) start to suggest what belongs in a module with well-defined responsibilities.
The other thing you'll want to think about is whether your module is going to represent an object class or if it's going to be a collection of library subroutines.
In your example I could see Logging and Database connection management (though I'd likely use DBI) belonging in external modules.
But putting everything in there just leaves you with a five line script that calls "MyModule::DoStuff" and that's less than useful.
Most of the time ;)
"is this the best idea for the module? I was thinking of having the
script retrieve the files and then just pass the paths to the module
for processing."
It looks like a decent design to me. I also agree that you should not hardcode paths (or URLS) in the module, but I'm a little confused by what "having the script retrieve the files and then just pass the paths to the module for processing" means: do you mean, the script would write them to disk? Why do that? Instead, if it makes sense, have fetchFile(URL) retrieve a single file and return a reference/handle/object suitable for submission to processUsers, so the logic is like:
my @FileUrls = ( ... );

foreach (@FileUrls) {
    my $file = $obj->fetchFile($_);
    $obj->processUsers($file);
}
Although, if the purpose of fetchFile is just to get some raw text, you might want to make that a function independent of the database class -- either something that is part of the script, or another module.
If your reason for wanting to write to disk is that you don't want an entire file loaded into memory at once, you might want to adjust everything to process chunks of a file, such that you could have one kind of object for fetching files via a socket and outputting chunks as they are read to another kind of object (the process that adds to database one). In that case:
my $db       = DBmodule->new(...);
my $netfiles = NetworkFileReader->new(...);

foreach (@FileUrls) {
    $netfiles->connect($_);
    $db->newfile();    # initialize/reset
    while (my $chunk = $netfiles->more()) {
        $db->process($chunk);
    }
    $db->complete();
    $netfiles->close();
}
Or incorporate the chunk reading into the db class if you think it is more appropriate and unlikely to serve a generic purpose.

How do I find all the modules used by a Perl script?

I'm getting ready to try to deploy some code to multiple machines. As far as I know, using a Makefile.PL to track dependencies is the best way to ensure they are installed everywhere. The problem I have is that I'm not sure our Makefile.PL has been updated as this application has passed through a few different developers.
Is there any way to automatically parse through either my source or a few full runs of my program to determine exactly what versions of what modules my application is depending on? On top of that, is there any way to filter it based on CPAN packages? (So that I only depend on Moose instead of every single module that comes with Moose.)
A third related question is, if you depend on a version of a module that is not the latest, what is the best way to have someone else install it? Should I start including entire localized Perl installations with my application?
Just to be clear - you cannot generically get a list of modules that the app depends on by code analysis alone. E.g. if your app does eval { require $module; $module->import() }, where $module is passed via the command line, then this can ONLY be detected by actually running the specific command line version with ALL the module values.
If you do wish to do this, you can figure out every module used by a combination of runs via:
Devel::Cover. Coverage reports would list 100% of modules used. But you don't get version #s.
Print %INC at every single possible exit point in the code, as slu's answer said. This should probably be done in an END{} block as well as a __DIE__ handler to cover all possible exit points, and even then it may not be fully 100% covering in the generic case if somewhere within the program your __DIE__ handler gets overwritten.
Devel::Modlist (also mentioned by slu's answer) - the downside compared to Devel::Cover is that it does NOT seem to be able to aggregate a database across multiple sample runs like Devel::Cover does. On the plus side, it's purpose-built, so has a lot of very useful options (CPAN paths, versions).
Please note that the other module (Module::ScanDeps) does NOT seem to allow you to do runtime analysis based on arbitrary command line arguments (e.g. it seems at first glance to only allow you to execute the program with no arguments) and if that's true, is inferior to all the above 3 methods for any code that may possibly load modules dynamically.
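The END-block idea from the second option can be sketched like this (version discovery goes through each package's $VERSION, which not every module defines; a __DIE__ handler would install the same dump for abnormal exits):

```perl
use strict;
use warnings;

# At every normal exit point, report each loaded module and its
# $VERSION (if any). The "pathified" %INC keys are turned back into
# package names before the symbolic $VERSION lookup.
END {
    for my $path ( sort keys %INC ) {
        ( my $mod = $path ) =~ s{\.pm\z}{};
        $mod =~ s{/}{::}g;
        my $version = do { no strict 'refs'; ${"${mod}::VERSION"} };
        printf "%s %s\n", $mod, defined $version ? $version : '(no version)';
    }
}

use File::Spec;    # stand-in for the application's real dependencies
```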
Module::ScanDeps - Recursively scan Perl code for dependencies
Does both static and runtime scanning. Just modules, I don't know of any exact way of verifying what versions from what distributions. You could get old packages from BackPan, or just package your entire chain of local dependencies up with PAR.
You could look at %INC, see http://www.perlmonks.org/?node_id=681911 which also mentions Devel::Modlist
I would definitely use Devel::TraceUse, which also shows a tree of the modules, so it's easy to guess where they are being loaded.

Dependencies in Perl code

I've been assigned to pick up a web application written in some old legacy Perl code, get it working on our server, and later extend it. The code was written 10 years ago by a solitary self-taught developer...
The code has weird stuff going on - they are not afraid to do 'lib-param.pl' on line one, and later in the file do '/lib-pl/lib-param.pl' - which is of course a different file.
Including a.pl with methods b() and c() and later including d.pl with methods c() and e() seems to be quite popular too... Packages appear to be unknown, so you'll just find &c() somewhere in the code later.
Interesting questions:
Is there a tool that can draw relations between perl-files? Show a list of files used by each other file?
The same for MySQL databases and tables? Can it show which schemas/tables are used by which files?
Is there an IDE that knows which c() is called - the one in a.pl or the one in d.pl?
How would you start to try to understand the code?
I'm inclined to go through each file and refactor it, but am not allowed to do that - only the strict minimum to get the code working. (But since the code never uses strict, I don't know if I'm gonna...)
Not using strict is a mistake -- don't continue it. Move the stuff in d.pl to D.pm (or perhaps a better name altogether), and if the code is procedural, use Sub::Exporter to get those subs back into the calling package. strict is lexical, so you can turn it on for just one package, such as your new package D;. To find out which code is being called, use Devel::SimpleTrace.
perl -MDevel::SimpleTrace ./foo.pl
Now any warnings will be accompanied by a full stack trace -- sprinkle warn statements around the code and run it.
I think the MySQL question should be removed from this. Schema/table mappings have nothing to do with Perl; it seems an out-of-place distraction in this question.
I would write a utility to scan a complete list of all subs and which file they live in; then I would write a utility to give me a list of all function calls and which file they come from.
By the way - it is not terribly hard to write a fairly mindless static analysis tool to generate a call graph.
For many cases, in well-written code, that will be enough to help me out...
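A first cut at that sub-listing utility can be as mindless as a line-by-line regex pass. It misses string eval, AUTOLOAD, and glob assignments, but for code of this vintage it is usually enough to start mapping who defines what:

```perl
use strict;
use warnings;

# List every "sub NAME" definition found in a file. The same loop,
# with a different pattern (e.g. /&(\w+)\s*\(/), gives the matching
# "who calls what" report for the second utility.
sub subs_in_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    my @subs;
    while ( my $line = <$fh> ) {
        push @subs, $1 if $line =~ /^\s*sub\s+(\w+)/;
    }
    close $fh;
    return @subs;
}

for my $file (@ARGV) {
    printf "%s: %s\n", $file, join ', ', subs_in_file($file);
}
```

Run it over a.pl and d.pl and the duplicate c() definitions show up immediately.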