Tool for helping with deduplication of Perl code? - perl

I'm looking for some tool/library that would scan given project tree, and report on code duplicates - i.e. blocks of code that are repeated in various files.
Is there anything like this?
As it is now, I have to view them (files) all, and search for duplicates, but it doesn't strike me as very efficient.

A System for Detecting Software Plagiarism might work; it supports Perl.
And here's a list.
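For a rough idea of what such a scan involves, here is a minimal Python sketch. It is not how any of the tools above work; the names find_duplicate_blocks, normalize and window are made up for illustration. It assumes that a "duplicate" means the same run of non-blank, non-comment lines once indentation is stripped, and it knows nothing about Perl syntax (POD, heredocs, etc.).

```python
import hashlib
from collections import defaultdict


def normalize(line):
    """Strip surrounding whitespace; full-line '#' comments become empty."""
    stripped = line.strip()
    return "" if stripped.startswith("#") else stripped


def find_duplicate_blocks(sources, window=4):
    """Find runs of `window` significant lines that occur more than once.

    `sources` maps a file name to its text. Returns a list of groups, each
    group being a list of (file name, first line number) locations.
    """
    seen = defaultdict(list)
    for name, text in sources.items():
        numbered = [(n, normalize(l)) for n, l in enumerate(text.splitlines(), 1)]
        # Keep only significant lines, remembering their original numbers.
        significant = [(n, l) for n, l in numbered if l]
        for i in range(len(significant) - window + 1):
            chunk = significant[i:i + window]
            block = "\n".join(l for _, l in chunk)
            digest = hashlib.sha1(block.encode("utf-8")).hexdigest()
            seen[digest].append((name, chunk[0][0]))
    return [locations for locations in seen.values() if len(locations) > 1]
```

To use it on a project tree, read each file into the sources dict (for example with os.walk) and print the groups it returns. Overlapping windows are reported separately, so a long duplicated block shows up as several adjacent hits.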

ctags or similar tagging system for a kernel source tree

ctags is a simple source code tagging system, also integrated in vi (and its flavours nvi, vim, etc.). AFAIK, it builds a plain text file where all the elements (functions, macros, ...) of the source code are indexed. But this file may become too large and unmanageable when the source tree is very large, as is the case with a kernel (Linux, *BSD, or similar).
Is ctags or exuberant-ctags still suitable for a complex source tree like a kernel?
If not, what tools (with the same integration in vi as ctags) can replace it? This may become subjective, so if possible provide a list of suggested tools: any comments, and references to a guide with the keyboard shortcuts in vi, are welcome.
Supported languages should be at least C, C++ and assembly. The tool should be usable through the CLI. I would principally like to jump to the definitions of functions, macros, structs and similar objects (with ctags, pressing Ctrl+] with the cursor over the item name), to their manpages if possible, and back to the code.
The only alternative tool I know so far is GNU global, with a pretty complex vi integration, which seems to be possible only through Perl (and I can't find the equivalent of Ctrl+]).
The answer to your first point is a resounding yes.
You can use ctags to generate a separate tags file for each subtree, thus keeping the size of each generated file to a minimum. You then need a mechanism for searching across these multiple tags files. Vim provides this, of course.
I have given some advice here, so you may want to check that out.
Of course, I use exuberant-ctags there, so keep that in mind.
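As a rough sketch of the per-subtree approach (not the exact setup from the advice linked above), the following Python builds one ctags command per subtree and the matching value for Vim's comma-separated 'tags' option. The function names and the example subtree names are assumptions for illustration; -R (recurse) and -f (output file) are standard exuberant-ctags options.

```python
def ctags_commands(root, subtrees):
    """Return one ctags command (as an argv list) per subtree."""
    commands = []
    for sub in subtrees:
        directory = f"{root}/{sub}"
        # -R recurses into the subtree; -f names the tags file to write.
        commands.append(["ctags", "-R", "-f", f"{directory}/tags", directory])
    return commands


def vim_tags_setting(root, subtrees):
    """Build a 'set tags=...' line listing every per-subtree tags file."""
    return "set tags=" + ",".join(f"{root}/{sub}/tags" for sub in subtrees)


if __name__ == "__main__":
    # Hypothetical kernel checkout and subtree selection.
    for cmd in ctags_commands("/usr/src/linux", ["kernel", "fs", "mm"]):
        print(" ".join(cmd))
    print(vim_tags_setting("/usr/src/linux", ["kernel", "fs", "mm"]))
```

Each command can be run with subprocess.run(cmd, check=True), and the printed set tags=... line goes into your vimrc. Ctrl+] then searches the listed files in order.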

How to make Perl's Devel::Cover ignore certain lines in coverage total?

In some code coverage tools you can "hide" certain lines of code from the coverage tool, so that those lines do not count towards the coverage totals. For example, some code might be run only in circumstances that are hard or impossible to test (such as certain hardware failures). Thus, you might get 100% coverage reported even though some code was not exercised.
Setting aside for the moment whether this is wise, is this sort of thing possible with Perl's Devel::Cover?
(Devel::Cover can ignore entire files, but I am interested in ignoring just a few lines in a single file.)
A lot of uncoverable code features have been implemented but they are not documented because I wasn't sure of the interface. However, it's been a few years since anything changed in that area.
Probably the easiest way to see how to use the features is to look at tests/uncoverable in the distribution (see https://github.com/pjcj/Devel--Cover/blob/master/test/uncoverable). If you can't or don't want to change your code you can use the .uncoverable file (see https://github.com/pjcj/Devel--Cover/blob/master/tests/.uncoverable) and the cover options as mentioned by toolic.
If you do this, be sure to use the basic_html report which will mark a construct as in error if you tag it as uncoverable but it gets executed anyway.
I really should get around to tidying everything up and documenting it.
According to the TODO file on CPAN, this capability is not currently supported, but the developers see it as a valuable addition:
Enhancements:
Marking of unreachable code - commandline tool and gui.
The cover script mentions promising options: -add_uncoverable_point and -delete_uncoverable_point.

Lotus Notes Diff Tool

Is there a diff tool for Lotus Notes that can compare scripts, design elements and documents?
I see this is an old question, and most of the other answers are a little outdated now, so I thought I would add some hopefully valuable information for those who should stumble upon this now.
In Domino Designer, open either the Navigator or Package Explorer (Window menu -> Show Eclipse Views). Here you can expand databases/templates to see the design elements they contain. Select two or three elements (CTRL-click). They can be in different databases or the same database. Right click on one of the elements and select Compare with -> Each other.
You can also compare two databases element by element by selecting two databases/templates, right-clicking and selecting Compare with -> Each other. You will then get the differences between the two databases listed. You will be able to see which elements differ between the two databases, and which elements exist in one database but not the other. By double-clicking on a differing element, you will open a diff tool which lets you see differences line by line, and you can easily copy changes from left to right or right to left.
There is a tool from TeamStudio called Delta: http://www.teamstudio.com/products/delta.html
If all else fails (and by "all else" I mean the often ridiculous corporate procurement system) you can always do an export to DXL (or a Design Synopsis for code alone) and use any decent text editor with a diff function. It's not TeamStudio Delta, but it will get you where you want to go.
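If you would rather script the comparison than use an editor, a minimal Python sketch with the standard difflib module could look like this. It assumes you have already exported both designs to text (DXL or a synopsis); diff_exports and the default file labels are made-up names, and it compares plain lines without understanding DXL structure.

```python
import difflib


def diff_exports(old_text, new_text, old_name="template.dxl", new_name="variant.dxl"):
    """Return a unified diff of two exported design dumps as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",  # the input lines carry no newlines, so add none to headers
    ))
```

Read the two export files into strings, call diff_exports, and print the result line by line. An empty list means the exports are textually identical.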
There is a free tool from OpenNTF which does document comparisons:
http://www.openntf.org/Projects/pmt.nsf/ProjectLookup/Compare%20Notes%20Documents
Ytria also has a product which, among other things, will compare data documents (I don't believe it compares design elements).
http://www.ytria.com/website.nsf/WebPageRequest/Solutions_scanEZ_specen
And, I believe Martin Scott (http://www.martinscott.com) has a similar product which compares documents.
DDE (Domino Designer on Eclipse) lets you compare design elements natively. Same way as the search. It's pretty efficient (faster than a DXL export) and it's free.
I had a discussion on my blog a little while back about this:
http://rosshawkins.net/archive/2009/12/24/notesdomino-refactoringanalysis-tools.aspx
However, what I've ended up doing in the past is exporting the design to the filesystem and using standard text tools (WinMerge and SublimeText for me personally) to do what I need.
Being able to do the raw dump is something that was added with the Eclipse based designer, and isn't overly obvious, but you can read more about it here:
rosshawkins.net/archive/2010/01/20/searching-the-contents-of-notesdomino-design-elements.aspx
(link mangled as my rep is too low to post 2 links in one post yet!)
Teamstudio Delta is really nice. However it might kill you with too many details. As Ross pointed out the Domino Designer 8.5 can use the Diff tool inherited from Eclipse. You also could head over to http://www.openntf.org and look for the DXLMagic project. It can generate a report that shows differences (including code) between 2 databases (typically a template and a variation of it). It is not as complete as Delta, but shows the essentials. It's free and source is included (Disclaimer: I wrote it).
This is what I do. I run a design synopsis of the database using the Notes Designer and dump it to a text file. You can actually split the synopsis out into different objects like Agents, Forms, Views, etc. Then you can run UNIX/Linux/Mac command-line tools to compare the elements. Doing this shows you what code is active, and gives you a complete, documented copy of the source code. It takes a lot of csplit and a few sed commands.
Version 12.0.1 has such a tool as part of the server. Look for comparedbs.ntf and designsynopsis.ntf on the Domino server.

Tool to compare/diff HTML in bulk

I have a lot of HTML files (tens of thousands, gigabytes' worth) scraped from a server, and I want to check that the server produces the same results after some modifications, while ignoring kinds of differences that don't matter, e.g. whitespace, missing newlines, timestamps, small changes in some kinds of numbers, etc.
Does anyone know of a tool for doing this? I'd really rather not do more filtering than I have to.
(Oh, and it needs to run under Linux.)
You might consider using a clone detector such as our CloneDR. This tool parses large sets of computer program files (HTML is a special case), builds abstract syntax trees representing the essential structure of each file, and compares programs for similarity.
Because it is comparing essential program structure, it ignores inessential differences such as comments and whitespace, and determines that two code segments are either identical or one can be obtained from the other by substituting other blocks of code. The latter allows the recognition of code that has been modified in various ways. You can see samples of clone detection runs on a variety of computer languages at the web site.
In your case, what you would be looking for are files in system A which are essentially clones (exact or near misses) of files in system B. As a general rule, if a file a is a variant of file b (e.g., with a few changes), CloneDR will report it as a clone and show the exact differences.
At the scale of 20,000 files, I can see why you want a tool, and I can see why you want near-miss matches rather than exact matches.
It doesn't run under Linux, but I assume your problem is hard enough to solve that this isn't what you are optimizing for.
I use WinMerge a lot on Windows, and from what I can see some people enjoy Meld on Linux, so perhaps that could do the trick for you:
http://meld.sourceforge.net/
Other examples I found from a quick Google search were Kompare, xxdiff.sourceforge.net, and kdiff3.sourceforge.net
(could only post 1 link, so I wrote the addresses for xxdiff and kdiff3 as text)
Beyond Compare is commercial software that is actually worth the money (I never thought I'd hear myself typing that!). It is GUI based but handles thousands of files very well. It will allow you to specify unimportant changes with regular expressions as well as whitespace (beginning, middle and end of line). The feature set is very extensive; check out the trial download.
I do not work for this company, I just use Beyond Compare every day at work and enjoy it every time!
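If you end up scripting the filtering yourself, the "ignore unimportant changes with regular expressions" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any of the tools above work: the timestamp pattern, the decision to drop whitespace between tags, and the names normalize_html, fingerprint and changed_pages are all made up, and the rules need adjusting to the real pages.

```python
import hashlib
import re

# Assumed shape of the timestamps to ignore, e.g. 2020-01-01 10:00:00.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")
WHITESPACE = re.compile(r"\s+")
BETWEEN_TAGS = re.compile(r">\s+<")


def normalize_html(text):
    """Mask timestamps and collapse whitespace so trivial changes vanish."""
    text = TIMESTAMP.sub("<TIMESTAMP>", text)
    text = WHITESPACE.sub(" ", text)
    # Assumption: whitespace between adjacent tags does not matter here.
    text = BETWEEN_TAGS.sub("><", text)
    return text.strip()


def fingerprint(text):
    """Hash of the normalized page, cheap to store and compare in bulk."""
    return hashlib.sha256(normalize_html(text).encode("utf-8")).hexdigest()


def changed_pages(before, after):
    """Names whose normalized content differs or that exist on one side only.

    `before` and `after` map a page name to its HTML text.
    """
    names = set(before) | set(after)
    return sorted(
        n for n in names
        if n not in before or n not in after
        or fingerprint(before[n]) != fingerprint(after[n])
    )
```

For gigabytes of files you would compute and store the fingerprints per file rather than hold all the text in memory, then run a real diff tool only on the pages this reports as changed.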

How to diff two regions of the same file in Eclipse

I'm a TDDer and often have a need to refactor out common or similar code. Similar code is not always a result of copy and paste.
I'm not looking for tools to identify the regions or suspected duplications; there are a number of tools that do that. And if the code is exactly the same there is no big problem; Eclipse can almost always handle that by itself.
I am looking for tools to visualize differences of sections of code that are radically different, but my human eye can see the structural similarities, and could possibly be made even more similar, so that the common code eventually could be factored out.
It would be very handy if there was a possibility to mark two regions and get Eclipse (or some other tool) to mark the differences. With this information it would be much simpler to iteratively move the regions closer until they are the same and then activate the Extract Method refactoring.
It can be done in Emacs of course, but I'd like to have this readily available from Eclipse. Any pointers?
It seems there were some usable answers in this question, which articulates the same need. But, again, those answers focus on finding duplications, not visualizing them.
Two suggestions that work are KDiff3 and Diffuse. Both allow you to open the same file twice or paste different sections into the panes. There seems to be no way to use them from Eclipse though.
I do not know of a way to mark regions and diff them in Eclipse, but you can diff two files. So you might get what you are looking for, or at least 90% of it, by copying the parts you want to diff into two scratch files.
Select the two files you want to diff in the project tree, then right-click and select Compare with -> Each other.
cheers,
Jørgen
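Outside Eclipse, the "mark two regions and show the differences" idea is easy to sketch with Python's standard difflib. This is only an illustration, not an Eclipse feature; diff_regions and its parameters are made-up names, and it assumes indentation differences between the two regions should be ignored.

```python
import difflib


def diff_regions(text, first, second):
    """Diff two regions of the same text.

    `first` and `second` are (start, end) line numbers, 1-based and inclusive.
    Returns ndiff lines: '  ' common, '- ' only in the first region,
    '+ ' only in the second, '? ' hints marking changes within a line.
    """
    lines = text.splitlines()
    # Strip indentation so differing nesting depth does not count as a change.
    a = [line.strip() for line in lines[first[0] - 1:first[1]]]
    b = [line.strip() for line in lines[second[0] - 1:second[1]]]
    return list(difflib.ndiff(a, b))
```

Calling print("\n".join(diff_regions(source, (10, 25), (80, 95)))) with your own line ranges shows where the two regions still diverge, which you can repeat after each refactoring step until only the parts that should become parameters remain.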