Rebuild text sequence from base and diff - diff

I have two text sequences in memory and run a diff on them. Later I want to take the base sequence and the diff and reconstitute the second sequence. A Python solution would be perfect but a command line utility would be OK too. The solution must work on Windows, Linux and MacOS. Needs to be efficient because it will be carried out millions of times. My Internet searches have not turned up a good answer.

Related

Execute Commands in the Linux Commandline [Lazarus / Free Pascal]

I have a problem. I want to execute some commands in the Commandline of linux. I tested TProcess (So i am using Lazarus) but now when i am starting the programm, there is nothing, wich the Program do.
Here is my Code:
uses [...], unix, process;
[...]
var LE_Path: TLabeledEdit;
[...]
Pro1:=TProcess.Create(nil);
Pro1.CommandLine:=(('sudo open'+LE_Path.Text));
Pro1.Options := Pro1.Options; //Here i used Options before
Pro1.Execute;
With this Program, i want to open Files with sudo (The Programm is running on the User Interface)
->Sorry for my Bad English; Sorry for fails in the Question: I am using StackOverflow the first time.
I guess the solution was a missing space char?
Change
Pro1.CommandLine:=(('sudo open'+LE_Path.Text));
to
Pro1.CommandLine:=(('sudo open '+LE_Path.Text));
# ----------------------------^--- added this space char.
But if you're a beginner programmer, my other comments are still worth considering:
trying to use sudo in your first bit of code may be adding a whole extra set of problems. SO... Get something easier to work first, maybe
/bin/ls -l /path/to/some/dir/that/has/only/a/few/files.
find out how to print a statement that will be executed. This is the most basic form of debugging and any language should support that.
Your english communicated your problem well enough, and by including sample code and reasonable (not perfect) problem description "we" were able to help you. In general, a good question contains the fewest number of steps to re-create the problem. OR, if you're trying to manipulate data,
a. small sample input,
b. sample output from that same input
c. your "best" code you have tried
d. your current output
e. your thoughts about why it is not working
AND comments to indicate generally other things you have tried.

Is there a tool to automatically extract common subroutines from files?

I have two 1000+ line programs in Perl, each with about 20 subroutines in the main file. One was forked from the other some time ago and I want to factor out the common parts (before porting features backward.) Is there a diff tool that will treat the subroutines (and preceding comments) as units, and extract the common units into a new file? (if one line of a subroutine is different, the unit doesn’t match.)
My SCM is currently Subversion if that helps. A Perl script that processes the code would be cool.
You can try to use the PPI module; to my knowledge there's no tool for refactoring as the one you mentioned.
If you had 500,000 lines of code it might be useful to have or write such a tool. For 1000 lines, this shouldn't be too hard with a simple visual diff tool, like BeyondCompare ($) or WinMerge (free).
You're trying to compare two different versions of the files?
I use VIM which comes with a built in diff program vimdiff and a fully gui one called gvimdiff. It'll fold common lines and just show you the lines that differ and where.
With gvim, you can open up three splits in one window (the two versions and a blank) and then copy over the various lines you want. If you're using Subversion, you can use the built in merge tool (if you're talking about different versions of the same file). The Subversion merge is pretty good and will probably help you with the merge issues.

Graphically view long line diffs that mostly match

I have two data files. They're both 5000 lines long, and each row is ~10000 characters wide. They are nearly identical, except a few characters on some lines are different. Is there any tool that will jump both vertically and horizontally to the disagreements for both sides, so I can see what's different?
I'm using OSX, but I could use Linux or windows if it was absolutely necessary. I've tried:
FileMerge
KDiff3
DiffMerge
command line diff
Now trying to install meld
vimdiff, or even better its graphical twin gvimdiff, would work if there is at most two edits per line. It highlights the part between the first and last changed character on a line. It also collapses the parts of the files which are identical.
These are part of Vim; I have no idea whether the MacOS versions contain (g)vimdiff.
Note that in my experience Vim sometimes becomes slow with long lines, so that might hurt you for this specific use case.
Araxis Merge may do what you're after (it does character-diffs within lines), and there's an OSX version. It's not cheap, and I've never tried it on 10,000 character wide lines, though.

Tool to compare/diff HTML in bulk

I have a lot of HTML files (10,000's and GBs worth) scraped from a server and I want to check to make sure the server produces the same results after some modifications but ignore kinds of differences that don't matter, e.g. whitespace, missing newlines, timestamps, small changes in some kinds of number, etc.
Does anyone know of a tool for doing this? I'd really rather not do more filtering than I have to.
(Oh and it needs to run under linux)
You might consider using a clone detector such as our CloneDR. This tool parses large sets of computer program (HTML is special case) files, builds abstract syntax trees representing the essential structure of each files, and compares programs for similarity.
Because it is comparing essential program structure, it ignores inessential differences such as comments and whitespace, and deterimines that two code segments are either identical or one can be obtained from the other by substituting other blocks of code. The latter allows the recognition of code that has been modified in various ways. You can see samples of clone detection runs on a variety of computer languages at the web site.
In your case, what you would be looking for are files in system A which are essentially clones (exact or near misses) of files in system B. As a general rule, if a file a is a variant of file b (e.g., with a few changes) the CloneDr will report it as a clone and show the exact differences.
At the scale of 20,000 files, I can see why you want a tool, and I can see why you want near-miss matches rather than exact matches.
Doesn't run under Linux, but I assume your problem is hard to enough to solve so that isn't what you are optimizing.
I use winmerge alot in windows and from what i can see some people enjoy meld in linux, so perhaps that could do the trick for you
http://meld.sourceforge.net/
Other examples i saw from a quick googling was Kompare,xxdiff.sourceforge.net, and kdiff3.sourceforge.net
(could only post 1 link so wrote the adresses to xxdiff and kdiff3 as text)
Beyond Compare is purchased software that is actually worth the money (I never thought I'd hear myself typing that!). It is GUI based but handles thousands of files very well. It will allow you to specify unimportant changes with regular expressions as well as whitespace (beginning, middle and end of line). The feature set is very extensive, check out a trial download.
I do not work for this company, I just use Beyond Compare every day at work and enjoy it every time!

How can I control an interactive Unix application programmatically through Perl?

I have inherited a 20-year-old interactive command-line unix application that is no longer supported by its vendor. We need to automate some tasks in this application.
The most troublesome of these is creating thousands of new records with slightly different parameters (e.g. different identifiers, different names). The records have to be created in sequence, one at a time, which would take many months (and therefore dollars) to do manually. In most cases, creating a record has a very predictable pattern of keying in commands, reading responses, keying in further commands, etc. However, some record creation operations will result in error conditions ('record with this identifier already exists') that require a different set of commands to be exit gracefully.
I can see a few different ways to do this:
Named pipes. Write a Perl script that runs the target application with STDIN and STDOUT set to named pipes then sends the target application the sequence of commands to create a record with the required parameters, and then instructs the target application to exit and shut down. We then run the script as many times as required with different parameters.
Application. Find another Unix tool that can be used to script interactive programs. The only ones I have been able to find though are expect, but this does not seem top be maintained; and chat, which I recall from ages ago, and which seems to do more-or-less what I want, but appears to be only for controlling modems.
One more potential complication: I think the target application was written for a VT100 terminal and it uses some sort of escape sequences to do things like provide highlighting.
My question is what approach should I take? One of these, or something completely different? I quite like the idea of using named pipes and then having a Perl script that opens the FIFOs and reads and writes as required, as it provides a lot of flexibility, but from what I have read it seems like there's a lot of potential problems if I go down this path.
Thanks in advance.
I'd definitely stick to Perl for the extra flexibility, as chaos suggested. Are you aware of the Expect perl module? It's a lot nicer than the named pipe approach.
Note also with named pipes, you can't force the output coming back from your legacy application to be unbuffered, which could be annoying. I think Expect.pm uses pseudo-ttys to get around this problem, but I'm not sure. See the discussion in perlipc in the section "Bidirectional Communication with Another Process" for more details.
expect is a lot more solid than you're probably giving it credit for, but if I were you I'd still go with the Perl option, wanting to have a full and familiar programming language for managing the process and having confidence that whatever weird issues arise, there will be ways of addressing them.
Expect, either with the Tcl or Perl implementations, would be my first attempt. If you are seeing odd sequences in the output because it's doing odd terminal things, just filter those from the output before you do your matching.
With named pipes, you're going to end up reinventing Expect anyway.