Is there any diff tool for Lotus Notes which allows you to compare scripts, design elements and documents?
I see this is an old question, and most of the other answers are a little outdated now, so I thought I would add some hopefully valuable information for anyone who stumbles upon this today.
In Domino Designer, open either the Navigator or Package Explorer (Window menu -> Show Eclipse Views). Here you can expand databases/templates to see the design elements they contain. Select two or three elements (CTRL-click). They can be in different databases or the same database. Right click on one of the elements and select Compare with -> Each other.
You can also compare two databases element by element by selecting two databases/templates, right-clicking and selecting Compare with -> Each other. You will then get the differences between the two databases listed. You will be able to see which elements differ between the two databases, and which elements exist in one database but not the other. By double-clicking on a differing element, you will open a diff tool which lets you see differences line by line, and you can easily copy changes from left to right or right to left.
There is a tool from TeamStudio called Delta: http://www.teamstudio.com/products/delta.html
If all else fails (and by "all else" I mean the often ridiculous corporate procurement system) you can always do an export to DXL (or a Design Synopsis for code alone) and use any decent text editor with a diff function. It's not TeamStudio Delta, but it will get you where you want to go.
There is a free tool from OpenNTF which does document comparisons:
http://www.openntf.org/Projects/pmt.nsf/ProjectLookup/Compare%20Notes%20Documents
Ytria also has a product which, among other things, will compare data documents (I don't believe it compares design elements).
http://www.ytria.com/website.nsf/WebPageRequest/Solutions_scanEZ_specen
And, I believe Martin Scott (http://www.martinscott.com) has a similar product which compares documents.
DDE (Domino Designer on Eclipse) lets you compare design elements natively, in the same way as the search. It's pretty efficient (faster than a DXL export) and it's free.
I had a discussion on my blog a little while back about this:
http://rosshawkins.net/archive/2009/12/24/notesdomino-refactoringanalysis-tools.aspx
However, what I've ended up doing in the past is exporting the design to the filesystem and using standard text tools (WinMerge and Sublime Text for me personally) to do what I need.
Being able to do the raw dump is something that was added with the Eclipse-based Designer, and isn't overly obvious, but you can read more about it here:
http://rosshawkins.net/archive/2010/01/20/searching-the-contents-of-notesdomino-design-elements.aspx
Teamstudio Delta is really nice. However, it might kill you with too many details. As Ross pointed out, Domino Designer 8.5 can use the diff tool inherited from Eclipse. You could also head over to http://www.openntf.org and look for the DXLMagic project. It can generate a report that shows differences (including code) between two databases (typically a template and a variation of it). It is not as complete as Delta, but it shows the essentials. It's free and the source is included (disclaimer: I wrote it).
This is what I do: I run a design synopsis of the database using Domino Designer and dump it to a text file. You can split the synopsis out into separate objects such as agents, forms and views, and then run standard UNIX/Linux/Mac commands to compare the elements. Doing this, you find out what code is active and end up with completely documented source code. It takes a lot of csplit and a few sed commands.
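As a rough illustration of the splitting step (the section headers in a design synopsis vary by Designer version and by what you chose to include, so the /^Form:/ pattern below is only an assumption), a GNU csplit run on two synopsis dumps might look like:

csplit -z -f old_ old_synopsis.txt '/^Form:/' '{*}'
csplit -z -f new_ new_synopsis.txt '/^Form:/' '{*}'
diff old_03 new_03

Each numbered piece then corresponds roughly to one design element, and, assuming the pieces line up, any ordinary diff tool can compare the matching pair.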
Version 12.0.1 has such a tool as part of the server. Look for comparedbs.ntf and designsynopsis.ntf on the Domino server.
Related
I have a lot of HTML files (tens of thousands, GBs worth) scraped from a server, and I want to check that the server produces the same results after some modifications, while ignoring the kinds of differences that don't matter, e.g. whitespace, missing newlines, timestamps, small changes in some kinds of numbers, etc.
Does anyone know of a tool for doing this? I'd really rather not do more filtering than I have to.
(Oh and it needs to run under linux)
You might consider using a clone detector such as our CloneDR. This tool parses large sets of computer program files (HTML is a special case), builds abstract syntax trees representing the essential structure of each file, and compares programs for similarity.
Because it is comparing essential program structure, it ignores inessential differences such as comments and whitespace, and determines that two code segments are either identical or that one can be obtained from the other by substituting other blocks of code. The latter allows the recognition of code that has been modified in various ways. You can see samples of clone detection runs on a variety of computer languages at the web site.
In your case, what you would be looking for are files in system A which are essentially clones (exact or near misses) of files in system B. As a general rule, if file a is a variant of file b (e.g., with a few changes), CloneDR will report it as a clone and show the exact differences.
At the scale of 20,000 files, I can see why you want a tool, and I can see why you want near-miss matches rather than exact matches.
It doesn't run under Linux, but I assume your problem is hard enough to solve that the operating system isn't what you are optimizing for.
I use WinMerge a lot on Windows, and from what I can see some people enjoy Meld on Linux, so perhaps that could do the trick for you:
http://meld.sourceforge.net/
Other examples I saw from a quick googling were Kompare, xxdiff (xxdiff.sourceforge.net) and KDiff3 (kdiff3.sourceforge.net).
Beyond Compare is purchased software that is actually worth the money (I never thought I'd hear myself typing that!). It is GUI based but handles thousands of files very well. It will allow you to specify unimportant changes with regular expressions as well as whitespace (beginning, middle and end of line). The feature set is very extensive; check out a trial download.
I do not work for this company, I just use Beyond Compare every day at work and enjoy it every time!
I'm a TDDer and often have a need to refactor out common or similar code. Similar code is not always a result of copy and paste.
I'm not looking for tools to identify the regions or suspected duplications; there are a number of tools that do that. And if the code is exactly the same, there is no big problem; Eclipse can almost always handle that by itself.
I am looking for tools to visualize the differences between sections of code that are radically different, but where my human eye can see structural similarities and which could possibly be made even more similar, so that the common code could eventually be factored out.
It would be very handy if there was a possibility to mark two regions and get Eclipse (or some other tool) to mark the differences. With this information it would be much simpler to iteratively move the regions closer until they are the same and then activate the Extract Method refactoring.
It can be done in Emacs of course, but I'd like to have this readily available from Eclipse. Any pointers?
It seems there were some usable answers to this question, a question articulating the same need. But again, those answers focus on finding duplications, not visualizing them.
Two suggestions that work are KDiff3 and Diffuse. Both allow you to open the same file twice or paste different sections into the panes. There seems to be no way to use them from Eclipse, though.
I do not know of a way to mark regions and diff them in Eclipse, but you can diff two files. That way you might get what you are looking for, at least 90% of it, by copying the parts you want to compare into two scratch files.
Select the two files you want to diff in the project tree and right-click -> select compare with -> each other.
What source control products have a "diff" facility that ignores white space, braces, etc., in calculating the difference between checked-in versions? I seem to remember that ClearCase's diff did this but Visual SourceSafe (or at least the version I used) did not.
The reason I ask is probably pretty typical. Four perfectly reasonable developers on a team have four entirely different ways of formatting their code. Upon checking out the code last changed by someone else, each will immediately run some kind of program or editor macro to format things the way they like. They make actual code changes. They check-in their changes. They go on vacation. Two days later that program, which had been running fine for two years, blows up. The developer assigned to the bug does a diff between versions and finds 204 differences, only 3 of which are of any significance, because the diff algorithm is lame.
Yes, you can have coding standards. Most everyone finds them dreadful. A solution where everyone can have their cake and eat it too seems far more preferable.
=========
EDIT: Thanks to everyone for some great suggestions.
What I take away from this is:
(1) A source control system with plug-in type diffs is preferable.
(2) Find a diff with suitable options.
(3) Use a good source formatting program and settle on a check-in standard.
Sounds like a plan. Thanks again.
Git does have these options:
--ignore-space-at-eol
    Ignore changes in whitespace at EOL.

-b, --ignore-space-change
    Ignore changes in amount of whitespace. This ignores whitespace at line end, and considers all other sequences of one or more whitespace characters to be equivalent.

-w, --ignore-all-space
    Ignore whitespace when comparing lines. This ignores differences even if one line has whitespace where the other line has none.
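For example, to review a change while ignoring whitespace entirely (the commits and path here are just placeholders):

git diff -w HEAD~1 HEAD -- path/to/file.c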
I am not sure if brace changes can be ignored using Git's diff.
If it is C/C++ code, you can define Astyle rules and then convert the source code's brace style to the one that you want, using Astyle. A git diff will then produce sane output.
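For instance, something along these lines (the chosen style and options are only an example; check your Astyle version's documentation):

astyle --style=allman --suffix=none --recursive "src/*.cpp"

Run the same command over both versions before diffing, so brace placement no longer shows up as a change.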
Choose one (dreadful) coding standard, write it down in an official coding standards document, and get on with your life; messing with whitespace is not productive work.
And remember you are a professional developer: it's your job to get the project done. Changing anything in the code because of a personal style preference hurts the project - it won't only make diffing more difficult, it can also introduce hard-to-find problems if your source formatter or compiler has bugs (and your fancy diff tool won't save you when two co-workers start fighting over casing).
And if someone just doesn't agree to work with the selected style, remind him (or her) that he is programming as a profession, not as a hobby; see http://www.ericsink.com/entries/No_Great_Hackers.html
Maybe you should choose one format and run some indentation tool before checking in so that each person can check out, reformat to his/her own preferences, do the changes, reformat back to the official standard and then check in?
That adds a couple of extra steps, but they already use indentation tools while working. Maybe it could even be a triggered check-in script? (See the sketch after this answer.)
Edit: this would perhaps also solve the brace problem.
(I haven't tried this solution myself, hence the "perhaps" and "maybes", but I have been in projects with the same problems, and it is a pain to try to go through diffs with hundreds of irrelevant changes that are not limited to whitespace but include the formatting itself.)
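To make the triggered check-in script idea concrete, here is a minimal sketch of a pre-commit hook, assuming Git as the VCS and Astyle as the formatter purely for illustration; the same idea applies to any VCS that supports check-in triggers:

#!/usr/bin/perl
# Hypothetical .git/hooks/pre-commit: reformat staged C/C++ files to the
# agreed style before the commit is recorded. Formatter and options are
# examples only.
use strict;
use warnings;

# Files staged for this commit (added, copied or modified).
my @staged = grep { /\.(c|cc|cpp|h|hpp)$/ }
             split /\n/, `git diff --cached --name-only --diff-filter=ACM`;

for my $file (@staged) {
    system('astyle', '--style=allman', '--suffix=none', $file) == 0
        or die "astyle failed on $file\n";
    # Re-stage the reformatted file so the commit contains the clean version.
    system('git', 'add', $file) == 0
        or die "git add failed on $file\n";
}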
As explained in "Is it possible for git-merge to ignore line-ending differences?", it is more a matter of associating the right diff tool with your favorite VCS than of relying on the right VCS option (even though Git does have some options regarding whitespace, like the ones mentioned in Alan's answer, they will never be as complete as one would like).
DiffMerge is the most complete when it comes to those "ignore" options, as it can ignore not only spaces but also other "variations" based on the programming language used in a given file.
Subversion apparently supports this, either natively in the latest versions or by using an alternate diff such as GNU diff.
Beyond Compare does this (and much, much more), and you can integrate it into either Subversion or SourceSafe as an external diff tool.
What patterns contribute or detract from the usability of a CLI interface?
As an example, consider the CLI for ClearCase. The CLI is very comprehensive (+1), but it has several glaring opportunities for improvement. Recently, I wanted to force the files to lower case into ClearCase using clearfsimport. Unfortunately, I wound up on the documentation for its cousin, clearimport. It may seem slight, but it cost me more hours than I care to admit. The variation in the middle got me.
Why provide such nearly identical functionality with such nearly identical names? There are many better options, in my opinion:
clearimport -fs
fsclearimport
clear_fs_import
clearimport_fs
Anything would be better than what they went with. The code I am working on IS a CLI, and this experience made me look at my own choices. I think I have all the basics covered (standard help, long form vs. short form, short meaningful names, providing examples, eliminating ambiguity, accurately handling spaces within quotes, etc.).
There is some literature on this subject.
Perhaps a bad CLI is no different from a bad API; a CLI is a type of API in some sense. The goals are naturally common: flexibility, readability, and completeness. Several factors differentiate a CLI from a typical API. One is that a CLI needs to support scriptability (perhaps participating many times in a series of pipes). Another is that autocompletion and namespaces don't exist in the same way; you don't always have a nice colorful GUI doing stuff for you. CLIs must document themselves externally, directly to the customer. And finally, the audience of a CLI is vastly different from that of a standard API. I appreciate any insight you may have.
I like the subcommand pattern, which I'm most familiar with as it's implemented in the command-line Subversion client.
svn [subcommand] [options] [files]
Without the subcommands, Subversion would have waaaaay too many different options for me to remember them effectively, and the help system would be a pain to slog through.
But, if I don't remember how any particular subcommand works, I can just type:
svn help [subcommand]
...and it shows me only the relevant portions of the help documentation.
As noted above, this format:
[master verb] [subverb] [optionally, noun] [options]
is good in terms of remembering what commands are available. cvs, svn, Perforce and git all adhere to it. It improves the discoverability of commands, a major CLI problem. One wrinkle that occurs here is options for the master verb vs. options for the subverb. I.e.,
cvs -d dir command bar
is different than
cvs command -d dir bar
This was a confusing situation in cvs, which svn "fixed" by allowing options to be specified in any order. Your own solution may vary; if you have a very good reason to pass options to the master verb, okay, just be aware of the overhead.
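As a rough illustration of the pattern, here is a minimal Perl sketch of a subcommand dispatcher; the tool name ("mytool") and its subcommands are invented for this example, and only the core Getopt::Long module is assumed:

#!/usr/bin/perl
# Minimal sketch of a "[master verb] [subcommand] [options]" CLI.
use strict;
use warnings;
use Getopt::Long;

# 'require_order' makes GetOptions stop at the first non-option word,
# so master-verb options and subcommand options cannot be confused.
Getopt::Long::Configure('require_order');

my $verbose = 0;
GetOptions('verbose|v' => \$verbose)
    or die "usage: mytool [-v] <import|export> [options] [files]\n";

my $subcommand = shift @ARGV
    or die "usage: mytool [-v] <import|export> [options] [files]\n";

my %handlers = (import => \&cmd_import, export => \&cmd_export);
my $handler = $handlers{$subcommand}
    or die "mytool: unknown subcommand '$subcommand'\n";
$handler->();

sub cmd_import {
    # Each subcommand parses its own options from what is left in @ARGV.
    my $dir = '.';
    GetOptions('directory|d=s' => \$dir)
        or die "usage: mytool import [-d dir] [files]\n";
    print "importing @ARGV from $dir (verbose=$verbose)\n";
}

sub cmd_export {
    print "export: not implemented in this sketch\n";
}

With this in place, "mytool -v import -d incoming a.txt" and "mytool import -d incoming a.txt" both parse unambiguously, which is the property described above.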
Looking to API usability is a good idea too, but beware that there is no real typing in CLI commands, and there is a lot of richness in what CLI commands 'return', since you've got both a return code and an output to work with. In the unixy/streams world, the output is usually much more important than the return code. Getting the format of your output right is crucial. Also, while tempting, I've found that sending different things to stdout vs. stderr is not always useful; it confuses novice and even intermediate users (because both get dumped to the console in most cases), and is rarely useful to advanced users. So unless there's a real need for it I avoid it; it's too easy for (e.g.) someone to get very confused about why the output of a command was '' in an error condition, just because the programmer nicely dumped the errors to stderr.
Another issue in design is the "what next" problem. In a GUI, the next steps for the user are spelled out by the available buttons, menus, etc. In a CLI, the user can literally type any command next, and pipe any command to any other. (Or try, at least.) I design my commands to give hints (either in the help or the output) as to what potential next steps might be in a typical workflow.
Another good pattern is allowing user customization of the output. While it is possible for users to use cut, sort, etc. to tailor the output, being able to specify a format string magnifies the utility of a command. The example I cite here is top, which lets you tell it which columns you want.
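A small sketch of that idea in Perl; the --format option, its %n/%s/%o tokens, and the record fields are all invented for this illustration:

#!/usr/bin/perl
# Hypothetical "lstool" that lets the user choose output columns with a
# format string, in the spirit of top's column selection.
use strict;
use warnings;
use Getopt::Long;

my $format = '%n\t%s';             # default: name and size
GetOptions('format=s' => \$format)
    or die "usage: lstool [--format FMT]\n";

# Stand-in data; a real tool would gather this from the system.
my @records = (
    { name => 'alpha.txt', size => 120,  owner => 'ross' },
    { name => 'beta.txt',  size => 9043, owner => 'anna' },
);

# Map format tokens to record fields.
my %token = (n => 'name', s => 'size', o => 'owner');

for my $rec (@records) {
    my $line = $format;
    $line =~ s/%(\w)/exists $token{$1} ? $rec->{$token{$1}} : "%$1"/ge;
    $line =~ s/\\t/\t/g;           # allow a literal "\t" on the command line
    print "$line\n";
}

Run as "lstool --format '%o %n'", it prints only the owner and name columns, without the caller having to pipe through cut or awk.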
We have a Perl-based web application whose data originates from a vast repository of flat text files. Those flat files are placed into a directory on our system, we extensively parse them inserting bits of information into a MySQL database, and subsequently move those files to their archived repository and permanent home (/www/website/archive/*.txt). Now, we don't parse every single bit of data from these flat files and some of the more obscure data items don't get databased.
The requirement currently out there is for users to be able to perform a full-text search of the entire flat-file repository from a Perl-generated webpage and bring back a list of hits that they could then click on and open the text files for review.
What is the most elegant, efficient and non CPU intensive method to enable this searching capability?
I'd recommend, in this order:
Suck the whole of every document into a MySQL table and use MySQL's full-text search and indexing features. I've never done it, but MySQL has always been able to handle more than I can throw at it. (A sketch of this approach follows after this list.)
Swish-E still exists and is designed for building full-text indexes and allowing ranked results. I've been running it for a few years and it works pretty well.
You can use File::Find in your Perl code to chew through the repository like grep -r, but it will suck compared to one of the indexed options above. However, it will work, and might even surprise you :)
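A minimal sketch of the first option, assuming DBI and DBD::mysql are available; the table name, column names and connection details are placeholders:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=website;host=localhost',
                       'webuser', 'secret', { RaiseError => 1 });

# One-time setup: a table holding the full text of every archived file.
# (On older MySQL versions FULLTEXT indexes require the MyISAM engine.)
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS flatfiles (
        id   INT AUTO_INCREMENT PRIMARY KEY,
        path VARCHAR(255) NOT NULL,
        body MEDIUMTEXT   NOT NULL,
        FULLTEXT (body)
    ) ENGINE=MyISAM
});

# Called as each file is moved into the archive.
sub index_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot read $path: $!";
    my $body = do { local $/; <$fh> };    # slurp the whole file
    $dbh->do('INSERT INTO flatfiles (path, body) VALUES (?, ?)',
             undef, $path, $body);
}

# Called from the search page: natural-language search, ranked by relevance.
sub search_archive {
    my ($query) = @_;
    my $sth = $dbh->prepare(q{
        SELECT path, MATCH(body) AGAINST (?) AS score
        FROM flatfiles
        WHERE MATCH(body) AGAINST (?)
        ORDER BY score DESC
        LIMIT 50
    });
    $sth->execute($query, $query);
    return @{ $sth->fetchall_arrayref({}) };
}

print "$_->{score}\t$_->{path}\n" for search_archive('some search terms');

MATCH ... AGAINST returns a relevance score, so the web page can present ranked hits that link back to the archived file at the stored path.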
I recommend using a dedicated search engine to do your indexing and searches.
I haven't looked at search engines recently, but I used ht://dig a few years ago, and was happy with the results.
Update: It looks like ht://dig is a zombie project at this point. You may want to use another engine. Hyper Estraier, besides being unpronounceable, looks promising.
I second the recommendation to add an indexing machine. Consider Namazu from http://namazu.org. When I needed it, it looked easier to get started with than Swish-e or ht://dig, and I'm quite content with it.
If you don't want the overhead of an indexer, look at forking a grep/egrep. Once the text volume goes to multiple megabytes, this will be significantly faster than scanning solely in Perl, e.g.:
open GREP, "find $dirlist -name '$filepattern' | xargs egrep '$textpattern' |"
or die "grep: $!";
while (<GREP>) {
...
}
Bonus: use file name conventions like dates/tags/etc to reduce the set of files to grep.
The clunky find ... | xargs ... is meant to work around the shell's limit on command-line length for wildcard expansion, which you might hit with big archives.
I see someone recommended Lucene/Plucene. Check out KinoSearch; I have been using it for a year or more on a Catalyst-based project and am very happy with the performance and ease of programming/maintenance.
The caveat on that page should be considered for your circumstance, but I can attest to the module's stability.