How to make Perl's Devel::Cover ignore certain lines in coverage totals?

In some code coverage tools you can "hide" certain lines of code from the coverage tool, so that those lines do not count towards the coverage totals. For example, some code might be run only in circumstances that are hard or impossible to test (such as certain hardware failures). Thus, you might get 100% coverage reported even though some code was not exercised.
Setting aside for the moment whether this is wise, is this sort of thing possible with Perl's Devel::Cover?
(Devel::Cover can ignore entire files, but I am interested in ignoring just a few lines in a single file.)

A lot of uncoverable code features have been implemented but they are not documented because I wasn't sure of the interface. However, it's been a few years since anything changed in that area.
Probably the easiest way to see how to use the features is to look at tests/uncoverable in the distribution (see https://github.com/pjcj/Devel--Cover/blob/master/tests/uncoverable). If you can't or don't want to change your code, you can use the .uncoverable file (see https://github.com/pjcj/Devel--Cover/blob/master/tests/.uncoverable) and the cover options as mentioned by toolic.
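For a flavour of what the in-code markup looks like, here is a minimal sketch based on that test file (treat the exact keywords and placement as assumptions, and check the test file itself):

if ($hardware_failure) {    # uncoverable branch true
    # uncoverable statement
    die "cannot proceed after hardware failure";
}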
If you do this, be sure to use the basic_html report which will mark a construct as in error if you tag it as uncoverable but it gets executed anyway.
I really should get around to tidying everything up and documenting it.

According to the TODO file on CPAN, this capability is not currently supported, but the developers see it as a valuable addition:
Enhancements:
Marking of unreachable code - commandline tool and gui.
The cover script mentions promising options: -add_uncoverable_point and -delete_uncoverable_point.

How can I have 2 versions of Gensim for summarization in one Jupyter notebook?

I want to have 2 versions of Gensim so that I can use the summarization and keyword functions from the old Gensim.
How can I set up this scenario?
In general, a single Jupyter notebook is backed by a single Python interpreter/environment, and popular packages at their 'official' installation paths can only be installed once.
There are a few hackish workarounds suggested in answers like:
Installing multiple versions of a package with pip
However, each workaround presents operational problems.
One approach is to install the older package to a non-standard path (directory) that's still found by Python's importing logic (controlled by PYTHONPATH). For example, put/move the older copy of Gensim to a gensim_old package directory. But: this is only likely to work well with very simple (single-.py-file) packages.
With any significant library (like Gensim) which cross-imports a lot of things from its own utility modules, using the standard paths, lots of things are likely to break unless you dig into all the involved individual files to change their import paths. That's kind of kludgey and hard to maintain. (Though, to the extent you're just using one old version, say gensim-3.8.3 for the removed summarization feature, perhaps it'd be worth fighting through this process once, then keeping the changes around.)
Another approach is to create a totally separate Python environment with the alternate version, and only use that other environment from the notebook via a system call – either something in Python code like subprocess.call(), or the notebook-cell ! or !! magic-escapes to run a shell command. That is, you give up the ability to run individual interactive lines of Python in that alternate environment – but you could still send it batches of data, and either capture the console output or observe its output files to continue processing in your notebook.
I'd expect this to be the better option – cleaner and more maintainable – provided that either the old-version functionality (summarization) or the new-version functionality (whatever else) can be condensed into one (or a few) single-step scripts.
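As a concrete sketch of that subprocess approach (hedged: the environment path and the summarize.py helper script are hypothetical, and it assumes gensim==3.8.3 is installed in that separate environment):

import subprocess

# Hypothetical path to the Python binary of a separate environment
# that has gensim==3.8.3 installed.
OLD_ENV_PYTHON = "/path/to/gensim-3.8.3-env/bin/python"

def summarize_with_old_gensim(text):
    # Run a small one-step helper script in the old environment,
    # feeding the text on stdin and capturing the summary from stdout.
    result = subprocess.run(
        [OLD_ENV_PYTHON, "summarize.py"],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout

where summarize.py, run under the old environment, could be as small as:

import sys
from gensim.summarization import summarize
print(summarize(sys.stdin.read()))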
Another option would be to try to completely copy the gensim.summarization source code files to some new location inside your own project – performing whatever (few, minor) edits are necessary to ensure it works from the alternate location.
One of the reasons that functionality was removed was that its approach to things like tokenization was not consistent/integrated with other Gensim practices – which actually means it's likely to be a little easier to keep it working (given its use of its own idiosyncratic approaches) separately.
Personally, I'd rank the desirability of these three options as follows:
(best) Section off the summarization tasks to be run via subprocess executions in a separate Python environment, which has only the older package installed.
(maybe ok) Copy the ~10 .py files that implement gensim.summarization into your own local module. Edit lightly as necessary to ensure they still work. (That should mainly be updating import lines, but might require a few other adaptations to other Python 3.x/Gensim 4.x changes.)
(probably too messy) Install the whole old package to a non-standard directory, edit lots of files to ensure anything you're using still works.
Finally, note that the main reason the feature was removed is that it did not offer very impressive or adaptable results. While I've seen some people say it's worked OK for their applications, I've never seen even so much as a demo where its practices/algorithm – which can only extract some subset of important sentences, never paraphrase – gave impressive results.
So unless you already know that its approach works well for your needs, don't get your hopes up! Good luck.

Tool to compare/diff HTML in bulk

I have a lot of HTML files (tens of thousands, GBs' worth) scraped from a server, and I want to check that the server produces the same results after some modifications, while ignoring the kinds of differences that don't matter, e.g. whitespace, missing newlines, timestamps, small changes in certain kinds of numbers, etc.
Does anyone know of a tool for doing this? I'd really rather not do more filtering than I have to.
(Oh, and it needs to run under Linux.)
You might consider using a clone detector such as our CloneDR. This tool parses large sets of computer program files (HTML is a special case), builds abstract syntax trees representing the essential structure of each file, and compares programs for similarity.
Because it is comparing essential program structure, it ignores inessential differences such as comments and whitespace, and determines that two code segments are either identical or that one can be obtained from the other by substituting other blocks of code. The latter allows the recognition of code that has been modified in various ways. You can see samples of clone detection runs on a variety of computer languages at the web site.
In your case, what you would be looking for are files in system A which are essentially clones (exact or near misses) of files in system B. As a general rule, if file a is a variant of file b (e.g., with a few changes), CloneDR will report it as a clone and show the exact differences.
At the scale of 20,000 files, I can see why you want a tool, and I can see why you want near-miss matches rather than exact matches.
It doesn't run under Linux, but I assume your problem is hard enough to solve that the operating system isn't what you are optimizing for.
I use WinMerge a lot on Windows, and from what I can see some people enjoy Meld on Linux, so perhaps that could do the trick for you:
http://meld.sourceforge.net/
Other examples I saw from a quick googling were Kompare, xxdiff.sourceforge.net, and kdiff3.sourceforge.net.
(I could only post one link, so I wrote the addresses for xxdiff and kdiff3 as text.)
Beyond Compare is paid software that is actually worth the money (I never thought I'd hear myself typing that!). It is GUI-based but handles thousands of files very well. It will allow you to specify unimportant changes with regular expressions, as well as whitespace (at the beginning, middle, and end of a line). The feature set is very extensive; check out the trial download.
I do not work for this company, I just use Beyond Compare every day at work and enjoy it every time!

Code formatting and source control diffs

What source control products have a "diff" facility that ignores white space, braces, etc., in calculating the difference between checked-in versions? I seem to remember that Clearcase's diff did this but Visual SourceSafe (or at least the version I used) did not.
The reason I ask is probably pretty typical. Four perfectly reasonable developers on a team have four entirely different ways of formatting their code. Upon checking out the code last changed by someone else, each will immediately run some kind of program or editor macro to format things the way they like. They make actual code changes. They check-in their changes. They go on vacation. Two days later that program, which had been running fine for two years, blows up. The developer assigned to the bug does a diff between versions and finds 204 differences, only 3 of which are of any significance, because the diff algorithm is lame.
Yes, you can have coding standards. Almost everyone finds them dreadful. A solution where everyone can have their cake and eat it too seems far preferable.
=========
EDIT: Thanks to everyone for some great suggestions.
What I take away from this is:
(1) A source control system with plug-in type diffs is preferable.
(2) Find a diff with suitable options.
(3) Use a good source formatting program and settle on a check-in standard.
Sounds like a plan. Thanks again.
Git does have these options:
--ignore-space-at-eol
    Ignore changes in whitespace at EOL.
-b, --ignore-space-change
    Ignore changes in amount of whitespace. This ignores whitespace at line end, and considers all other sequences of one or more whitespace characters to be equivalent.
-w, --ignore-all-space
    Ignore whitespace when comparing lines. This ignores differences even if one line has whitespace where the other line has none.
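For example:

git diff -w            # compare while ignoring all whitespace
git log -p -w          # show historical patches with whitespace ignored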
I am not sure if brace changes can be ignored using Git's diff.
If it is C/C++ code, you can define Astyle rules and then convert the source code's brace style to the one that you want, using Astyle. A git diff will then produce sane output.
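Something along these lines (the style option here is an assumption; pick whatever matches your standard):

astyle --style=allman --recursive "src/*.cpp"
git diff -w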
Choose one (dreadful) coding standard, write it down in some official coding standards document, and get on with your life, messing with whitespace is not productive work.
And remember you are a professional developer; it's your job to get the project done. Changing anything in the code because of a personal style preference hurts the project - it won't only make diff-ing more difficult, it can also introduce hard-to-find problems if your source formatter or compiler has bugs (and your fancy diff tool won't save you when two co-workers start fighting over casing).
And if someone just doesn't agree to work with the selected style, remind him (or her) that he is programming as a profession, not as a hobby; see http://www.ericsink.com/entries/No_Great_Hackers.html
Maybe you should choose one format and run some indentation tool before checking in so that each person can check out, reformat to his/her own preferences, do the changes, reformat back to the official standard and then check in?
A couple of extra steps but they already use indentation tools when working. Maybe it can be a triggered check-in script?
Edit: this would perhaps also solve the brace problem.
(I haven't tried this solution myself, hence the "perhaps" and "maybes", but I have been in projects with the same problems, and it is a pain to try to go through diffs with hundreds of irrelevant changes that are not limited to whitespace, but include the formatting itself.)
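A rough sketch of such a trigger, written here as a Git pre-commit hook (the formatter and file patterns are assumptions; the same idea can be implemented as a server-side trigger in other VCSes):

#!/bin/sh
# Reformat staged C files to the official style, then re-stage them.
for f in $(git diff --cached --name-only -- '*.c' '*.h'); do
    indent -kr "$f" && git add "$f"
done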
As explained in Is it possible for git-merge to ignore line-ending differences?, it is more a matter of associating the right diff tool with your favorite VCS than of relying on the right VCS option (even if Git does have some options regarding whitespace, like the ones mentioned in Alan's answer, it will never be as complete as one would like).
DiffMerge is the most complete on those "ignore" options, as it can ignore not only spaces but also other "variations" based on the programming language used in a given file.
Subversion apparently supports this, either natively in the latest versions, or by using an alternate diff like Gnu Diff.
Beyond Compare does this (and much much more) and you can integrate it either in Subversion or Sourcesafe as an external diff tool.

CLI Patterns/Antipatterns for usability

What patterns contribute or detract from the usability of a CLI interface?
As an example, consider the CLI for ClearCase. The CLI is very comprehensive (+1), but it has several glaring problems. Recently, I wanted to force file names to lower case when importing into ClearCase using clearfsimport. Unfortunately, I wound up on the documentation for its cousin, clearimport. It may seem slight, but it cost me more hours than I care to admit. The variation in the middle got me.
Why provide such nearly identical functionality with such nearly identical names? There are many better options, in my opinion:
clearimport -fs
fsclearimport
clear_fs_import
clearimport_fs
Anything would be better than what they went with. The code I am working on IS a CLI and this experience made me look at my own choices. I think I have all the basics covered (standard help, long-form vs short-form, short meaningful names, providing examples, eliminate ambiguity, accurately handling spaces within quotes, etc).
There is some literature on this subject.
Perhaps a bad CLI is no different from a bad API; a CLI is a type of API in some sense. The goals are naturally common: flexibility, readability, and completeness. Several factors differentiate a CLI from a typical API, though. One is that a CLI needs to support scriptability (perhaps participating many times in a series of pipes). Another is that autocompletion and namespaces don't exist in the same way. You don't always have a nice colorful GUI doing stuff for you. A CLI must document itself to the customer directly. And finally, the audience of a CLI is vastly different from that of a standard API. I appreciate any insight you may have.
I like the subcommand pattern, which I'm most familiar with as it's implemented in the command-line Subversion client.
svn [subcommand] [options] [files]
Without the subcommands, subversion would have waaaaay too many different options for me to remember them effectively, and the help system would be a pain to slog through.
But, if I don't remember how any particular subcommand works, I can just type:
svn help [subcommand]
...and it shows me only the relevant portions of the help documentation.
As noted above, this format:
[master verb] [subverb] [optionally, noun] [options]
is good in terms of remembering what commands are available. cvs, svn, Perforce, git, all adhere to this. It improves discoverability of commands, a major CLI problem. One wrinkle that occurs here is options for the master-verb vs. options for the subverb. I.e.,
cvs -d dir command bar
is different than
cvs command -d dir bar
This was a confusing situation in cvs, which svn "fixed" by allowing options specified in any order. Your own solution may vary; if you have a very good reason to pass options to the master verb, okay, just be aware of the overhead.
Looking to API usability is a good idea too, but beware that there is no real typing in CLI commands, and there is a lot of richness in what CLI commands 'return', since you've got both a return code and an output to work with. In the unixy/streams world, the output is usually much more important than the return code. Getting the format of your output right is crucial. Also, while tempting, I've found that sending different things to stdout vs. stderr is not always useful; it confuses novice and even intermediate users (because both get dumped to the console in most cases), and is rarely useful to advanced users. So unless there's a real need for it I avoid it; it's too easy for (e.g.) someone to get very confused about why the output of a command was '' in an error condition just because the programmer nicely dumped the errors to stderr.
Another issue in design is the "what next" problem. In a GUI, the next steps for the user are spelled out by the available buttons, menus, etc. In a CLI, the user can literally type any command next, and pipe any command to any other. (Or try, at least.) I design my commands to give hints (either in the help or the output) as to what potential next steps might be in a typical workflow.
Another good pattern is allowing user customization of the output. While it is possible for users to use cut, sort, etc. to tailor the output, being able to specify a format string magnifies the utility of a command. The example I cite here is top, which lets you tell it which columns you want.
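Another familiar illustration of the same idea (my example, not from top) is git log's format string, which lets the user build exactly the report they need:

git log --pretty=format:'%h %an %s'    # short hash, author, subject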

Are there any good automated frameworks for applying coding standards in Perl?

One I am aware of is Perl::Critic
And my googling has turned up nothing on multiple attempts so far. :-(
Does anyone have any recommendations here?
Any resources for configuring Perl::Critic to match our coding standards and running it on our code base would be appreciated.
In terms of setting up a profile, have you tried perlcritic --profile-proto? This will emit to stdout all of your installed policies with all their options with descriptions of both, including their default values, in perlcriticrc format. Save and edit to match what you want. Whenever you upgrade Perl::Critic, you may want to run this command again and do a diff with your current perlcriticrc so you can see any changes to existing policies and pick up any new ones.
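For example (a sketch; adjust the file names to taste):

perlcritic --profile-proto > .perlcriticrc
# ...later, after upgrading Perl::Critic...
perlcritic --profile-proto > perlcriticrc.new
diff .perlcriticrc perlcriticrc.new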
In terms of running perlcritic regularly, set up a Test::Perl::Critic test along with the rest of your tests. This is good for new code.
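A minimal test file for that, following the Test::Perl::Critic synopsis (the profile path is an assumption):

# t/perlcritic.t
use strict;
use warnings;
use Test::Perl::Critic (-profile => 't/perlcriticrc');
all_critic_ok();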
For your existing code, use Test::Perl::Critic::Progressive instead. T::P::C::Progressive will succeed the first time you run it, but will save counts on the number of violations; thereafter, T::P::C::Progressive will complain if any of the counts go up. One thing to look out for is when you revert changes in your source control system. (You are using one, aren't you?) Say I check in a change and run tests and my changes reduce the number of P::C violations. Later, it turns out my change was bad, so I revert to the old code. The T::P::C::Progressive test will fail due to the reduced counts. The easiest thing to do at this point is to just delete the history file (default location t/.perlcritic-history) and run again. It should reproduce your old counts and you can write new stuff to bring them down again.
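The progressive variant looks almost identical; a minimal sketch using the default history-file location mentioned above:

# t/perlcritic-progressive.t
use strict;
use warnings;
use Test::Perl::Critic::Progressive qw( progressive_critic_ok );
progressive_critic_ok();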
Perl::Critic has a lot of policies that ship with it, but there are a bunch of add-on distributions of policies. Have a look at Task::Perl::Critic and Task::Perl::Critic::IncludingOptionalDependencies.
You don't need to have a single perlcriticrc handle all your code. Create separate perlcriticrc files for each set of files you want to test and then a separate test that points to each one. For an example, have a look at the author tests for P::C itself at http://perlcritic.tigris.org/source/browse/perlcritic/trunk/Perl-Critic/xt/author/. When author tests are run, there's a test that runs over all the code of P::C, a second test that applies additional rules just on the policies, and a third one that criticizes P::C's tests.
I personally think that everyone should run at the "brutal" severity level, but knock out the policies that they don't agree with. Perl::Critic isn't entirely self-compliant; even the P::C developers don't agree with everything Conway says. Look at the perlcriticrc files used on Perl::Critic itself, and search the Perl::Critic code for instances of "## no critic"; I count 143 at present.
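For reference, those annotations disable a policy for a particular line or block; a made-up one-liner:

eval $string;    ## no critic (BuiltinFunctions::ProhibitStringyEval)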
(Yes, I'm one of the Perl::Critic developers.)
There is perltidy for most stylistic standards. perlcritic can be easily configured using a .perlcriticrc file. I personally use it at severity level one, but I've disabled a few policies.
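For example (a sketch of both tools on one module; the path is made up):

perltidy -b lib/MyModule.pm       # tidy in place, keeping a .bak backup
perlcritic --severity 1 lib/      # severity one ("brutal") is the strictest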
In addition to 'automated frameworks', I highly recommend Damian Conway's Perl Best Practices. I don't agree with 100% of what he suggests, but most of the time he's bang on.
The post above mentioning Devel::Prof probably really means Devel::Cover (to get the code coverage of a test suite).
Like:
http://metacpan.org/pod/Perl::Critic
http://www.slideshare.net/joshua.mcadams/an-introduction-to-perl-critic/
Looks like a nice tool!
A nice combination is perlcritic with EPIC for Eclipse - hit CTRL-SHIFT-C (or your preferred configured shortcut) and your code is marked up with warning indicators wherever perlcritic has found something to complain about. Much nicer than remembering to run it before checkin. And as normal with perlcritic, it will pick up your .perlcriticrc so you can customise the rules. We keep our .perlcriticrc in version control so everyone gets the same standards.
In addition to the cosmetic best practices, I always find it useful to run Devel::Prof on my unit test suite to check test coverage.