Testing an XS module that uses Dist::Zilla - perl

I'm working on a Perl module that has a lot of XS code and also uses Dist::Zilla to manage packaging. What's the best way to test things efficiently? I know about dzil test, but that's pretty slow because it does a full build/compile/test cycle every time it's invoked.
It would be nice to only update the parts that need updating since last test, and also to be able to run only certain t/*.t test scripts rather than all of them. Anyone have a solution that they like?

I have, in the past, just taken a Build.PL/Makefile.PL as generated by dzil and dropped it into the source repository as a "Makefile_dev.PL" (or "Build_dev.PL"), added it to MANIFEST.SKIP (or the dzil-based, generated equivalent) and used it during development.

For my XS modules, I use either MakeMaker::Custom or ModuleBuild::Custom (both by me). If you set things up properly, you can run Makefile.PL or Build.PL directly in your repo without invoking dzil at all. To run specific tests, you just build the dist and use prove -b testname.
Some examples using ModuleBuild::Custom: Media-LibMTP-API, Win32-IPC.
An example using MakeMaker::Custom: Win32-Setupsup.

I know I'm pigeonholing myself as old-school, but its for these very reasons that I don't use Dist::Zilla: when it works its great, when it doesn't it can be really hard to make it do what you want.
I guess that means, my answer is: when it gets too hard, just move to one of the primary tools that dzil generates, ie. EUMM or MB directly.

Related

How can I have 2 verions of Gensim for summarization in one Jupyter notebook?

I want to have 2 versions of Gensim for using summarization and keyword function from old Gensim.
How can I setup this senario?
In general, a single Jupyter notebook is backed by a single Python interpreter/environment, and popular packages at their 'official' installation paths can only be installed once.
There are a few hackish workarounds suggested in answers like:
Installing multiple versions of a package with pip
However, each workaround presents operational problems.
One approach is to install the older package to a non-standard path (directory) that's still found by Python importing logic (controlled by PYTHONPATH). For example, put/move the older copy of Gensim to a gensim_old package directory. But: this is only likely to work well with very sime (single-.py-file) packages.
With any signficant library (like Gensim) which cross-imports a lot of things from its own utility modules, using the standard paths, lots of things are likely to break unless you dig into all involved individual files to change their import paths. That's kind of kludgey & hard-to-maintain. (Though, to the extent you're just using one old version, say gensim-3.8.3 for the removed summarization feature, perhaps it'd be worth fighting through this process once, then keeping the changes around.)
Another approach is to create a totally-separate Python environment with the alternate version, and only use that other environment from the notebook by a system-call – via either something in Python-code like subprocess.call(), or the notebook-cell ! or !! magic-escapes to run a shell command. That is, you give up the ability to run individual interactive lines of Python in that alt environment - but could still send it batches of data, and either capture the console output or observe its output files to continue processing in your notebook.
I'd expect this to be a better option – cleaner & more-maintainable – provided that either the old-version-functionality (summarization) or new-version-functionality (whatever else) can be condensed into one (or a few) single-step scripts.
Another option would be to try to completely copy the gensim.summarization source code files to some new location inside your own project – performing whatever (few, minor) edits are necessary to ensure it works from the alternate location.
One of the reasons that functionality was removed was that its approach to things like tokenization was not consistent/integrated with other Gensim practices – which actually means it's likely to be a little easier to keep it working (given its use of its own idiosyncratic approaches) separately.
Personally I'd rank these three options desirability as:
(best) Section off the summarization tasks to be run via subprocess executions in a separate Python environment, which has only the older package installed.
(maybe ok) Copy the 10 .py files that implement the gensim.summarization' to your own local module. Edit lightly as necessary to ensure they still work. (That should mainly be updating import` lines, but might reuire a few other adaptations to other Python 3.x/Gensim 4.x changes.)
(probably too messy) Install the whole old package to a non-standard directory, edit lots of files to ensure anything you're using still works.
Finally, note that the main reason the feature was removed is that it did not offer very impressive or adaptable results. While I've seen some people say it's worked OK for their applications, I've never seen even so much as a demo where its practices/algorithm – which can only extract some subset of important sentences, never paraphrase – gave impressive results.
So unless you already know that its approach works well for your needs, don't get your hopes up! Good luck.

In Perl can I create a test in one file that only runs after a test in another file has run?

I am running test files on a Perl module I have written and I would like to have a condition where some tests are only run after others have been run successfully (but still keeping them in separate files).
I was looking at Test::Builder, but I don't think it caters for cross-file testing.
Just to explain why I want to do this; I have a test file for every subroutine in my module. Some of these subroutines are passed large hashes from other subroutines which are difficult to replicate for testing purposes.
So instead of spending a few hours trying to hard-code a testable hash, I would like to be passed one from the code like it would be, only after the subroutine where that hash has been generated is tested.
I hope that makes sense! I could write a script to just run the tests in a certain order, but thought that there may well already be a feature in a Perl testing module that I haven't seen. When it comes to using the module, I ideally want to be able to run the tests without having to fiddle about with the 'make test' bit.
The Perl test harness seems to run test files in alphabetical order. This is why so many CPAN distributions have test files that start with numbers - so that the author can control the order of test execution. That will break if, for example, someone runs those tests with prove -s.
In general a test file should be seen as a completely separate unit of testing. You should be able to run all of your tests in any order without any of them affecting any of the others. If two tests rely on each other then they should be in the same file.
You haven't explained why you're so keen for these tests to be in separate files. Perhaps that's the assumption that you should be questioning.

Are there any modules I'm missing to help me write better code?

I'm using the following command to test my perl code:
perl -MB::Lint::StrictOO -MO=Lint,all,oo -M-circular::require -M-indirect -Mwarnings::method -Mwarnings::unused -c $file
On a system with a perl version less than 5.10 I am also using uninit.
I am also using Perl::Critic and Perl::Tidy and have set up the appropriate rc files to my liking.
These modules have done a great job in helping me break some bad habits I learned when first learning perl.
Are there any more modules or pragmas that will kick me back on the straight and narrow when I mess up?
Using tests, and the Test::* family of modules and some good books have been pointed out. This new information has caused me to reconsider some assumptions about the relationship between testing and code skill building. These are all appreciated and already being researched and put to use.
It seems to me that these are two separate parts of a whole. 'perl -c', Perl::Critic and Perl::Tidy all help during the process of writing code and before execution of code. Devel::Cover, Devel::NYTProf and Tests happen during and after execution of code.
Good development dictates an iterative process, so tests will be run, and code developed over and over, but we still have this separation.
It appears to me that the focus in the answers have been on the 'during and after execution' of code. Again, this is very appreciated. Can I assume that I have the 'writing and pre-execution' part down pretty well then? At least, insomuch as the pragmas, modules and utilities are concerned.
I'm a little worried that you're using Perl 5.9. For two reasons.
Firstly it's a little old. 5.9.0 was released in 2003 and 5.9.5 (the last version in the 5.9.x series) was released in 2007. There have been several high quality versions of Perl since then.
Secondly (and most importantly), 5.9 is an unstable development version of Perl. 5.9 is basically the series of experiments that eventually led to Perl 5.10.0. The only reason to use it is to test that 5.10 will be a stable version of Perl. No-one should be using it at all now.
You don't appear to be testing your code, merely checking that it will compile. I suggest that you look at Test::More (which makes writing actual tests nice and easy), Test::Class (which makes dealing with very large test suites easier), and Devel::Cover (to see which bits of your code are covered by your tests and which aren't).

Why is "package" keyword sometimes separated by a comment from the package name?

Analyzing sources of CPAN modules I can see something like this:
...
package # hide from PAUSE
Try::Tiny::ScopeGuard;
...
Obviously, it's taken from Try::Tiny, but I have seen this kind of comments between package keyword and package identifier in other modules too.
Why this procedure is used? What is its goal and what benefits does it have?
It is indeed a hack to hide a package from PAUSE's indexer.
When a distribution is uploaded to PAUSE, the indexer will examine each file in the upload, looking for the names of packages that are included in the distribution. Any indexed packages can show up in CPAN search results.
There are many reasons for not wanting the indexer to discover your packages. Your distribution may have many small or insignificant packages that would clutter up the search results for your module. You may have packages defined in your t (test) directory or some other non-standard directory that are not meant to be installed as part of the distribution. Your distribution may include files from a completely different distribution (that somebody else wrote).
The hack works because the indexer strictly looks for the keyword package and an expression that looks like a package name on the same line.
Nowadays, you can include a META.yml file with your distribution. The PAUSE indexer will look for and respect a no_index specification in this file. But this is a relatively new capability of the indexer so older modules and old-timer CPAN contributors will still use the line break hack.
Here's an example of a no_index spec from Forks::Super
no_index:
directory:
- t
- inc
package:
- Sys::CpuAffinity
- Signals::XSIG
- Signals::XSIG::Default
- Signals::XSIG::TieArray56
Sys::CpuAffinity and Signals::XSIG are separate distributions that are also packaged with Forks::Super. Some of the test scripts contain package declarations (e.g., Arbitrary::Test::Package) that shouldn't be indexed.
Okay, here's another shot at this phenomenon ... I've been whacky-hacking Perl for a dozen years and I've rarely seen this packy hack and possibly simply ignored and never bothered to investigate. One thing seems clear, though. There's some hackish processing going on at PAUSE that's been crafted in the good ol' Perl'n'UNIX school of thought that without the shadow of a doubt involves line-oriented text parsing, so they parse those Perl files, possibly even using grep, but rather perl itself, who knows, to extract package names and then kick of some procedure or get some stats or whatnot. And to trip up this procedure and hack around its ways the author splits the package declaration in two lines so the hacky packy grep job doesn't have a clue that there's a package declared right under its nose and the programmer is happy about his hacky skills and the PAUSE stats or whatever it is they're cobbling together are as they should be. Does that make sense?

How to guess minimum perl version a particular script is written for?

I have a bunch of scripts that I wrote at times when I did not realize how use v1.2.3; can be useful. So some of them may be using features from later versions of perl, some of them may be OK with, say, perl 5.8.
Now I would like to get that into some order and add proper uses where there is need for them, just to be able to sleep better. :-)
How should I do that? Is there any tool that could help me make an educated guess?
Perl::MinimumVersion
Find a minimum required version of perl for Perl code
The most reliable way is 1) to write a decent test suite, then 2) to run your tests using each version of Perl.
You've surely already done the first part (!), and the second part is actually pretty easy to do using perlbrew.