diff: How to use another diff algorithm - diff

Not git diff, just plain diff or an equivalent tool: How to diff files using another diff algorithm, like histogram? I'm tired of seeing unrelated curly braces being matched up.
I tried diff --minimal, but it wasn't noticeably better for the differences I'm looking at.
This turns up nothing:
diff --help | grep -i histogram
man 1 diff | grep -i histogram
Is there a better diff tool or method for diffing files?
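One option that may be worth trying, even though the question rules out git diff inside a repository: git diff can compare two arbitrary files outside any repository with --no-index, and it accepts --histogram. This is only a sketch; the file names old.c and new.c and their contents are made up for illustration, and it assumes git is installed.

```shell
# Hypothetical sample files; old.c and new.c are illustrative names only.
workdir=$(mktemp -d)
printf 'int a(void)\n{\n    return 1;\n}\n' > "$workdir/old.c"
printf 'int b(void)\n{\n    return 2;\n}\n\nint a(void)\n{\n    return 1;\n}\n' > "$workdir/new.c"

# --no-index compares two paths without needing a repository;
# --histogram selects the histogram diff algorithm.
# The exit status is 1 when the files differ, as with plain diff.
histogram_out=$(git diff --no-index --histogram "$workdir/old.c" "$workdir/new.c")
printf '%s\n' "$histogram_out"
```

Whether the histogram output pairs up braces better than plain diff depends on the files being compared, so it is worth checking against --patience and the default algorithm as well.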

Related

How do I "inverse" a diff file?

Say I have a diff file looking basically like the following.
+line a
-line b
Is it possible to do one (or both) of the following:
Inverse this file (so I'd get)
-line a
+line b
Pass some argument to patch so the end result is the same as applying the inverted diff file described above
You can leave the diff as is and apply in reverse
git apply --reverse backwards-diff
Here is what you should do (assuming newFile.txt is the file you want to apply the reversed diff file on and diffFile.txt is the diff file):
patch -R newFile.txt diffFile.txt -o oldFile.txt
To rewrite a reversed / inverted diff file, use interdiff from patchutils:
interdiff -q my-diff-file /dev/null
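The patch -R approach can be checked with a small round trip. This is a minimal sketch with made-up file names (old.txt, new.txt, change.diff, restored.txt); it assumes diff and patch are installed.

```shell
# Hypothetical files: old.txt is the original, new.txt the modified copy.
workdir=$(mktemp -d)
printf 'line b\nshared\n' > "$workdir/old.txt"
printf 'line a\nshared\n' > "$workdir/new.txt"

# Record the forward change (old -> new); diff exits 1 because the files differ.
diff -u "$workdir/old.txt" "$workdir/new.txt" > "$workdir/change.diff"

# Apply the same diff in reverse to new.txt, writing the result to restored.txt.
patch -R -o "$workdir/restored.txt" "$workdir/new.txt" "$workdir/change.diff"

# restored.txt should now be identical to old.txt.
cmp "$workdir/old.txt" "$workdir/restored.txt" && echo "restored matches old"
```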

pre-pend word to the last word of a line

I want to pre-pend a directory name to the last word in a line. The line has the following format:
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0^IoneFile$
where ^I denotes a tab, and $ denotes the end-of-line. This line is generated by git ls-files -s.
I want a sed command to prepend one/ to the filename in this line, like so:
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0^Ione/oneFile$
Some of the lines that I've tried, and their corresponding outputs:
Match the longest string of characters that are not \t followed by $; prepend one/:
$ git ls-files -s | sed 's|[^\t]*$|one/&|'
one/100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 oneFile
Match the longest string of characters that are not \t or ' ' followed by $; pre-pend one/:
$ git ls-files -s | sed 's|[^\t ]*$|one/&|'
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e one/0 oneFile
Match the longest string of characters that are not horizontal whitespace, prepend 'one/':
$ git ls-files -s | sed 's|[^[[:blank:]]]*$|one/&|'
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 oneFilone/e
I've basically tried a whole bunch of things matching for:
[^\t ]*$|one/&
[^[[:space:]]]*$|one/&
and the ones listed above. The closest I can get is to have oneFilone/e, which was [^[[:blank:]]]*$|one/&|', or to pre-pend to the 0, but I can't seem to quite get what I want.
EDIT
Because a few people have commented / posted answers, none of which work for me, I figured I'd add: I am using Mac OS X 10.7.3. The version of sed I'm not completely sure of (if anybody knows a way to get it feel free to add a comment to that effect) - the man sed page says it's a BSD sed. I'm not sure how different that is to GNU sed, if any.
I'm also using zsh, with oh-my-zsh running (pretty much unmodified). I have turned on extended_glob (setopt extended_glob).
I've commented with my results for the answers people have given; I assume they are run on a Linux distribution? I don't have access to a non-OS X system tonight, but I will re-run any answers tomorrow; maybe it's just my [shell|OS|bad karma] that isn't letting them work for me.
EDIT Again:
So I've tested on a Ubuntu system, and the (1) above does work. I'd love a working version for my Mac, though.
Final EDIT:
Thanks to all who answered with working equivalent commands. It turns out that my first one does work, but not with BSD sed. I do, however, have gsed available (thanks for pointing that out!) which makes these all magically work.
This is simpler, but seems to work:
echo -e '100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 oneFile' | sed -e 's/\([a-zA-Z]\+\)$/one\/\0/g'
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 one/oneFile
It should work with a tab just before oneFile also.
The following works for me:
git ls-files -s | sed -e 's|\t\(.\+\)$|\tone/\1|g'
100644 345242cb0c4e9bb01a6fef9947f4342ff2f68553 0 one/ExView/resource.h
would have been
100644 345242cb0c4e9bb01a6fef9947f4342ff2f68553 0 ExView/resource.h
I've also had trouble when I got strange new line problems. I doubt this is your problem, but in the past, git ls-files -s | tr -d '\r' | ... has been helpful for me.
unable to test right now, but you're using too many brackets with the character classes.
[^[[:space:]]]*$|one/&
should be
[^[:space:]]*$|one/&
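The extra brackets change how the expression is parsed: [^[[:space:]] is already a complete bracket expression (one character that is neither '[' nor whitespace), and the trailing ]* then matches zero or more literal ']' characters. The pattern therefore matches only the final 'e', which explains why the dir is inserted before the last 'e'.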
This might work for you (or am I missing something?):
echo -e "100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0\toneFile" | sed 's/\t/&one\//'
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 one/oneFile
I wanted to do exactly this but did not have gsed around...
The only issue is that OS X sed does not recognise \t as a TAB character, as explained here. You have to use an actual TAB character. Use Option 1 in the original post, but instead of typing \t, press Ctrl+v followed by the Tab key. This is hard to copy and paste here, so you will have to do it yourself. :-)
See also this question.
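If typing a literal TAB is awkward, another way that should behave the same with BSD and GNU sed is to let printf produce the tab and splice it into the expression. A minimal sketch, using the sample line from the question (the variable names tab, line and result are just for illustration):

```shell
# Let printf generate a real TAB so sed never has to interpret \t itself.
tab=$(printf '\t')

# Sample line in the format printed by git ls-files -s (TAB before the path).
line="100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0${tab}oneFile"

# Match the run of non-TAB characters at the end of the line and prepend one/.
result=$(printf '%s\n' "$line" | sed "s|[^${tab}]*\$|one/&|")
printf '%s\n' "$result"
```

In real use the printf line would be replaced by git ls-files -s piped into the same sed command.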

How do you get a list of files included in a diff?

I have a patch file containing the output from git diff. I want to get a summary of all the files that, according to the patch file, have been added or modified. What command can I use to achieve this?
patchutils includes a lsdiff utility.
grep '+++' mydiff.patch seems to do the trick.
I can also use git diff --name-only which is probably the better approach.
grep '+++' mydiff.patch|perl -pe 's/\+\+\+ //g'
Details:
git diff produces output in the format
+++ b/file
So if you're using grep as Nathan suggested
grep '+++' mydiff.patch
You'll have the list of affected files, prepended by '+++ ' (3 plus signs and a space).
I often need to further process files and find it convenient to have one filename per line without anything else. This can be achieved with the following command, where perl/regex removes these plus signs and the space.
grep '+++' mydiff.patch|perl -pe 's/\+\+\+ //g'
For patch files generated with diff -Naur, the mydiff.patch file contains entries with a filename and a date (<tab> indicates the tab character):
+++ b/file<tab>2013-07-03 13:58:45.000000000 +0200
To extract the filenames for this, use
grep '+++' mydiff.patch|perl -pe 's/\+\+\+ (.*)\t.*/\1/g'
A decent way to do this is to use the --stat flag (or the --summary flag, if you need only new / deleted / renamed files for some reason).
Example:
git apply --stat peer.diff | awk '{ print $1 }' | sed '$d'
1-js/03-code-quality/index.md
CONTR.md
LICENSE.md
README.md
chat-app.readme.md
When you parse patches generated by git format-patch or others containing additional information about number of lines edited, it's crucial to search for ^+++ (at the start of the line) rather than just +++.
For example:
grep '^+++' *.patch | sed -e 's#+++ [ab]/##'
will output paths without the a/ or b/ at the beginning.
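To see why the anchor matters, here is a small sketch run against an inline patch. The patch text is invented for illustration and includes a removed line that happens to contain +++ in its content.

```shell
# Hypothetical patch text; the removed line contains "+++" but is not a file header.
patch_text='diff --git a/src/app.c b/src/app.c
--- a/src/app.c
+++ b/src/app.c
@@ -1 +1 @@
-total +++ bonus
+total + bonus
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -1 +1 @@
-old title
+new title'

# Anchoring at the start of the line keeps only real "+++ " headers,
# then sed strips the marker and the a/ or b/ prefix.
files=$(printf '%s\n' "$patch_text" | grep '^+++ ' | sed -e 's#^+++ [ab]/##')
printf '%s\n' "$files"
```

An unanchored grep '+++' would also pick up the "-total +++ bonus" line here.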

Diff Ignoring GUIDS

When using Diff, how would one go about ignoring line differences that only diff on GUID's? Something along the lines of:
diff -I "^.*[a-zA-Z0-9]{8}\-[a-zA-Z0-9]{4}\-[a-zA-Z0-9]{5}\-[a-zA-Z0-9]{5}\-[a-zA-Z0-9]{12}.*$"
Where obviously the above doesn't work, but just to get an idea of what is needed.
diff -I '[0-9A-F\-]\{36\}' foo.txt bar.txt
Perhaps you could first pipe the input files through sed to remove anything matching a GUID, then perform the diff.
Can you pipe the output of diff to a grep -v and use your pattern?
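The sed suggestion could look roughly like the following. It is a sketch only: the sample files and their contents are invented, the placeholder <GUID> is arbitrary, and the pattern assumes the usual 8-4-4-4-12 hexadecimal layout.

```shell
# POSIX basic regex for the common 8-4-4-4-12 hexadecimal GUID layout.
guid_re='[0-9A-Fa-f]\{8\}-[0-9A-Fa-f]\{4\}-[0-9A-Fa-f]\{4\}-[0-9A-Fa-f]\{4\}-[0-9A-Fa-f]\{12\}'

# Hypothetical input files that differ in a GUID and in one real value.
workdir=$(mktemp -d)
printf 'id=3F2504E0-4F89-11D3-9A0C-0305E82C3301\nname=alpha\n' > "$workdir/foo.txt"
printf 'id=9B2C6F1A-0D4E-4C7B-8A11-5E6F7A8B9C0D\nname=beta\n' > "$workdir/bar.txt"

# Replace every GUID with a fixed placeholder, then diff the masked copies.
sed "s/$guid_re/<GUID>/g" "$workdir/foo.txt" > "$workdir/foo.masked"
sed "s/$guid_re/<GUID>/g" "$workdir/bar.txt" > "$workdir/bar.masked"
guid_diff=$(diff "$workdir/foo.masked" "$workdir/bar.masked")
printf '%s\n' "$guid_diff"
```

Unlike diff -I, which only suppresses hunks where every changed line matches, masking also hides GUID changes on lines that sit next to real differences.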

Finding most commonly edited files in clearcase

We are currently planning a quality improvement exercise and I would like to target the most commonly edited files in our ClearCase VOBs. Since we have just been through a bug-fixing phase, the most commonly edited files should give a good indication of where the most bug-prone code is, and therefore where quality improvement is most needed.
Does anyone know if there is a way of obtaining a top 100 list of most edited files? Preferably this would cover edits that are happening on multiple branches.
(The previous answer was for a simpler case: single branch)
Since "most projects dev has not all happened on the one branch so the version numbers don't necessarily mean most edited", a "way to get number of check-ins across all branches" would be:
search all versions created since the date of the last bug fixing phase,
sort them by file,
then by occurrence.
Something along the lines of:
C:\Prog\cc\test\test>ct find -all -type f -ver "created_since(16-Oct-2009)" -exec "cleartool descr -fmt """%En~%Sn\n""" """%CLEARCASE_XPN%"""" | grep -v "\\0" | awk -F ~ "{print $1}" | sort | uniq -c | sort /R | head -100
Or, for Unix syntax:
$ ct find -all -type f -ver 'created_since(16-Oct-2009)' -exec 'cleartool descr -fmt "%En~%Sn\n" "$CLEARCASE_XPN"' | grep -v "/0" | awk -F ~ '{print $1}' | sort | uniq -c | sort -rn | head -100
replace the date by the one of the label marking the start of your bug-fixing phase
Again, note the double-quotes around the '%CLEARCASE_XPN%' to accommodate spaces within file names.
Here, '%CLEARCASE_XPN%' is used rather than '%CLEARCASE_PN%' because we need every version.
grep -v "/0" is here to exclude version 0 (/main/0, /main/myBranch/0, ...)
awk -F ~ "{print $1}" is used to only print the first part of each line:
C:\Prog\cc\test\test\a.txt~\main\mybranch\2 becomes C:\Prog\cc\test\test\a.txt
From there, the counting and sorting can begin:
sort to make sure every identical line is grouped
uniq -c to remove duplicate lines and precede each remaining line with a count of said duplicates
sort -rn (or sort /R for Windows) for having the most edited files at the top
head -100 for keeping only the 100 most edited files.
Again, GnuWin32 will come in handy for the Windows version of the one-liner.
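The counting half of the pipeline can be tried without ClearCase on some made-up input. In this sketch the path~version lines stand in for the cleartool descr output and are purely hypothetical; the grep is anchored as '/0$' here, a slight variation on the command above, so that only lines ending in version 0 are dropped.

```shell
# Hypothetical "element~version" lines standing in for the cleartool descr output.
versions='src/a.c~/main/1
src/b.c~/main/0
src/a.c~/main/bugfix/1
src/b.c~/main/1
src/a.c~/main/bugfix/2
src/c.c~/main/0'

# Drop version 0 entries, keep the element path, then count and rank.
ranking=$(printf '%s\n' "$versions" \
  | grep -v '/0$' \
  | awk -F '~' '{print $1}' \
  | sort | uniq -c | sort -rn | head -100)
printf '%s\n' "$ranking"
```

With this input src/a.c comes out on top with three counted versions, followed by src/b.c with one; src/c.c disappears because its only version is 0.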
(See answer for more complicated case: multiple branches)
First, use a dynamic view: easier and quicker to update its content and fiddle with its config spec rules.
If your bug-fixing has been made in a branch, starting from a given label, set-up a dynamic view with the following config spec as:
element * .../MY_BRANCH/LATEST
element * MY_STARTING_LABEL
element * /main/LATEST
Then you find all files, with their current version number (closely related to the number of edits)
ct find . -type f -exec "cleartool desc -fmt """%Ln\t\t%En\n""" """%CLEARCASE_PN%""""|sort /R|head -100
This is the Windows syntax (note the triple "double-quotes" around %CLEARCASE_PN% in order to accommodate spaces within the file names).
the 'head' command comes from the GnuWin32 library.
The most edited files are at the top of the list.
A Unix version would be:
$ ct find . -type f -exec 'cleartool desc -fmt "%Ln\t\t%En\n" "$CLEARCASE_PN"' | sort -rn | head -100
The most edited files would be at the top.
Do not forget that for metrics, the raw numbers are not enough, trends are important too.