Failing to diff some subdirectories' files - diff

I am using the diff command to compare the files in a directory and its subdirectories, like so:
diff -bBE ./dir/* ../parent/dir/* >> diff.txt
But I get this error:
diff: extra operand `./dir/somefile'
The two directory trees are structured exactly the same way, but the file contents differ. I don't know what I am missing.

* is expanded by the shell, so diff may receive many arguments instead of the two it expects. Just get rid of the *s; diff already knows how to handle directories (add -r if it should also descend into subdirectories).
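As a minimal sketch of that fix, assuming GNU diff (the -E option is a GNU extension): pass the two directories themselves rather than their expanded contents, and use -r so subdirectories are compared as well. The helper name compare_trees is hypothetical and not part of the original question.

```shell
# Hypothetical helper: recursively compare two directory trees,
# ignoring the whitespace differences covered by -b, -B and -E
# (the same options used in the question).
# Exit status follows diff: 0 if the trees match, 1 if they differ, 2 on error.
compare_trees() {
    diff -bBEr "$1" "$2"
}

# Usage with the paths from the question:
#   compare_trees ./dir ../parent/dir >> diff.txt
```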

Related

Display the numbers of lines that were changed between two files

I have two files: A and its modified version B. Is there some convenient way to display the numbers of lines that were added or changed in B (so basically the + lines in diff output)?
The solution would be even better if it scaled to multiple files (I intend to use it on the output of git diff), displaying the result in a format like filename:line.
Edit: my current idea is to use git difftool with some shell script and diff with --unchanged-group-format='' --old-line-format='' --new-line-format='%dn' (or something like that); these options are not listed in the diff manual on my system for some reason.
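A rough sketch of the idea in the edit above, for a single pair of files. It assumes GNU diff, whose line-format options accept %dn for the line number; it uses the three line-format options rather than the mix of group and line formats mentioned in the question, and the helper name added_line_numbers is hypothetical.

```shell
# Hypothetical helper: print the line numbers (in the new file, $2) of every
# line that was added or changed relative to the old file ($1).
# Assumes GNU diff; the new-line format ends with a literal newline so that
# each line number is printed on its own line.
added_line_numbers() {
    diff --unchanged-line-format='' \
         --old-line-format='' \
         --new-line-format='%dn
' "$1" "$2"
    status=$?
    # diff exits with 1 when the files differ; only 2 or more is a real error.
    [ "$status" -le 1 ]
}

# Usage:
#   added_line_numbers A B
```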

What's the best way to perform a differential between a list of directories?

I want to compare a previous list of directories with the current list, and to set up a script to do so, maybe in Perl or as a shell script.
Should I use something like diff? Programmatically, what would be an ideal way to do this? For example, say I write the diff to an output file: if there is no difference, the script should exit; if there are results, I want to see them.
For example, say I have the following directories today:
/foo/bar/staging/abc
/foo/bar/staging/def
/foo/bar/staging/a1b2c3
The next day it might look like this, where a directory has been either added or renamed:
/foo/bar/staging/abc
/foo/bar/staging/def
/foo/bar/staging/ghi
/foo/bar/staging/a1b2c4
There might be better ways, but the way I typically do something like this is to run a find command in each directory root, and pipe the output to separate files. You can then diff the files using the diff tool of your choice. If you want to filter out certain directories or files, you can throw in some grep or grep -v commands in the pipeline, or you can experiment with options on the find command.
The other main option is to find a diff tool that offers directory/folder comparisons. Most of the good ones support this, but I like the command-line method, because you get more control over what you're diffing.
cd /my/directory/one
find . -print | sort > /temp/one.txt
cd /my/directory/two
find . -print | sort > /temp/two.txt
diff /temp/one.txt /temp/two.txt
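The same approach can be wrapped up for the daily check the question describes. This is only a sketch: the helper names list_dirs and check_dirs are hypothetical, and it assumes that only directories (not files) should be listed.

```shell
# Hypothetical helper: print a sorted list of all directories under $1,
# as paths relative to $1. LC_ALL=C keeps the sort order stable across runs.
list_dirs() {
    (cd "$1" && find . -type d -print | LC_ALL=C sort)
}

# Hypothetical helper: compare a saved listing ($1) with the current state
# of the directory root ($2). Prints nothing and returns 0 when nothing
# changed; prints the diff and returns 1 when directories were added,
# removed or renamed.
check_dirs() {
    tmp=$(mktemp) || return 2
    list_dirs "$2" > "$tmp"
    diff "$1" "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage:
#   list_dirs /foo/bar/staging > yesterday.txt      # day one
#   check_dirs yesterday.txt /foo/bar/staging       # day two
```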
Also check out the inotifywait command, which lets you monitor files in real time.
You might also consider the find command using the -newer switch.
The usage is:
find . -newer timefile.txt -print
The -newer switch makes find return a list of files that are created or updated after the specified file's modification time. In the example above, any file created or updated after timefile.txt would be returned. You'd have to create a timefile.txt file, most likely once per day. Some versions of find have variations of newer that compare against other time stamps for a file (last modified, last accessed, last created, etc.)
This technique would not report a file that was deleted, however. A daily diff of the file listings could report that.
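A small sketch of the -newer technique, assuming the reference file is refreshed with touch after each run. The helper name files_newer_than is hypothetical, and the restriction to regular files (-type f) is an addition for illustration.

```shell
# Hypothetical helper: list regular files under directory $1 that were
# modified more recently than the reference file $2.
files_newer_than() {
    find "$1" -type f -newer "$2" -print | sort
}

# Usage (once per day):
#   files_newer_than /my/directory/one timefile.txt
#   touch timefile.txt    # reset the reference point for the next run
```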

Difference in the paths in .gitignore file?

I've been using git but still having confusion about the .gitignore file paths.
So, what is the difference between the following two paths in .gitignore file?
tmp/*
public/documents/**/*
I can understand that tmp/* will ignore all the files and folders inside it. Am I right?
But what does that second line path mean?
This depends on the behavior of your shell. Git doesn't do any work to determine how to expand these. In general, * matches any single file or folder:
/a/*/z
matches /a/b/z
matches /a/c/z
doesn't match /a/b/c/z
** matches any string of folders:
/a/**/z
matches /a/b/z
matches /a/b/c/z
matches /a/b/c/d/e/f/g/h/i/z
doesn't match /a/b/c/z/d.pr0n
Combine ** with * to match files in an entire folder tree:
/a/**/z/*.pr0n
matches /a/b/c/z/d.pr0n
matches /a/b/z/foo.pr0n
doesn't match /a/b/z/bar.txt
Update (08-Mar-2016)
Today, I am unable to find a machine where ** does not work as claimed. That includes OSX-10.11.3 (El Capitan) and Ubuntu-14.04.1 (Trusty). Possibly git-ignore has been updated, or possibly recent fnmatch handles ** as people expect. So the accepted answer now seems to be correct in practice.
Original post
The ** has no special meaning in git. It is a feature of bash >= 4.0, via
shopt -s globstar
But git does not use bash. To see what git actually does, you can experiment with git add -nv and files in several levels of sub-directories.
For the OP, I've tried every combination I can think of for the .gitignore file, and nothing works any better than this:
public/documents/
The following does not do what everyone seems to think:
public/documents/**/*.obj
I cannot get that to work no matter what I try, but at least that is consistent with the git docs. I suspect that when people add that to .gitignore, it works by accident, only because their .obj files are precisely one sub-directory deep. They probably copied the double-asterisk from a bash script. But perhaps there are systems where fnmatch(3) can handle the double-asterisk as bash can.
If you're using a shell such as Bash 4, then ** is essentially a recursive version of *, which will match any number of subdirectories.
This makes more sense if you add a file extension to your examples. To match log files immediately inside tmp, you would type:
/tmp/*.log
To match log files anywhere in any subdirectory of tmp, you would type:
/tmp/**/*.log
But testing with git version 1.6.0.4 and bash version 3.2.17(1)-release, it appears that git does not support ** globs at all. The most recent man page for gitignore doesn't mention **, either, so this is either (1) very new, (2) unsupported, or (3) somehow dependent on your system's implementation of globbing.
Also, there's something subtle going on in your examples. This expression:
tmp/*
...actually means "ignore any file inside a tmp directory, anywhere in the source tree, but don't ignore the tmp directories themselves". Under normal circumstances, you'd probably just write:
/tmp
...which would ignore a single top-level tmp directory. If you do need to keep the tmp directories around, while ignoring their contents, you should place an empty .gitignore file in each tmp directory to make sure that git actually creates the directory.
Note that the behavior of '**' when combined with a sub-directory (**/bar) must have changed, since the release notes for git 1.8.2 now mention:
The patterns in .gitignore and .gitattributes files can have **/, as a pattern that matches 0 or more levels of subdirectory.
E.g. "foo/**/bar" matches "bar" in "foo" itself or in a subdirectory of "foo".
See commit 4c251e5cb5c245ee3bb98c7cedbe944df93e45f4:
"foo/**/bar" matches "foo/x/bar", "foo/x/y/bar"... but not "foo/bar".
We make a special case, when foo/**/ is detected (and "foo/" part is already matched), try matching "bar" with the rest of the string.
"Match one or more directories" semantics can be easily achieved using "foo/*/**/bar".
This also makes "**/foo" match "foo" in addition to "x/foo", "x/y/foo"..
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Simon Buchan also commented:
current docs (.gitignore man page) are pretty clear that no subdirectory is needed, x/** matches all files under (possibly empty) x
The .gitignore man page does mention:
A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to the location of the .gitignore file, with infinite depth.
A slash followed by two consecutive asterisks then a slash matches zero or more directories. For example, "a/**/b" matches "a/b", "a/x/b", "a/x/y/b" and so on.
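One way to check such a pattern on your own machine, rather than relying on any one answer, is to ask git directly with git check-ignore (available in git 1.8.2 and later). The sketch below is an assumption-laden illustration: the helper name ignored_by_pattern is hypothetical, and it builds a throwaway repository in a temporary directory.

```shell
# Hypothetical helper: create a throwaway repository whose .gitignore holds
# the single pattern $1, then print which of the remaining arguments
# (paths relative to the repository root) git considers ignored.
ignored_by_pattern() {
    pattern=$1
    shift
    repo=$(mktemp -d) || return 2
    (
        cd "$repo" || exit 2
        git init -q . 2>/dev/null || exit 2
        printf '%s\n' "$pattern" > .gitignore
        # Create each path as an empty file so the layout really exists.
        for path in "$@"; do
            mkdir -p "$(dirname "$path")" && : > "$path"
        done
        # check-ignore prints each ignored path; exit status 1 only means
        # "none of the paths are ignored", so it is not treated as an error.
        git check-ignore "$@" || [ $? -eq 1 ]
    )
    status=$?
    rm -rf "$repo"
    return "$status"
}

# Usage, with the man page example:
#   ignored_by_pattern 'a/**/b' a/b a/x/b a/x/y/b a/x/c
```

With a git version that implements the man page text quoted above, the usage line should list a/b, a/x/b and a/x/y/b but not a/x/c.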
When ** isn't supported, the "/" is essentially a terminating character for the wildcard, so when you have something like:
public/documents/**/*
it is essentially looking for two wildcard items in between the slashes and does not pick up the slashes themselves. Consequently, this would be the same as:
public/documents/*/*
The ** pattern doesn't work for me, but you could create a new .gitignore in that subdirectory instead:
tmp/**/*.log
can be replaced by a .gitignore in tmp:
*.log

How do you use the diff command against two source trees

I tried running 'diff' against two source directories to get a patch file containing the differences between them.
diff -rupN flyingsaucer-R8pre2_b/ flyingsaucer-R8pre2/ > a.patch
The command above does not seem to work: it generates a diff of everything and I get a 13 MB file, when in reality there should only be a couple of changes.
Should work with any recent version of GNU diff (tested here with GNU diff 2.8.1).
You might want to add -b (and perhaps -B) to ignore differences in white space, which may be generating large patch files unnecessarily.
I don't see any reason why it wouldn't work. Try adding "wb" to the argument list to ignore whitespace changes. Are you sure you got the trailing slashes the same on both sides?
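A sketch of the whitespace suggestion, assuming GNU diff and assuming that whitespace-only changes are what inflates the patch (the answers above only suspect this). The helper name make_patch is hypothetical.

```shell
# Hypothetical helper: write a recursive unified diff of tree $1 against
# tree $2 to the file $3, ignoring changes in the amount of whitespace (-b).
# diff exits with 1 when the trees differ, so only 2 or more counts as failure.
make_patch() {
    diff -rupNb "$1" "$2" > "$3" || [ $? -eq 1 ]
}

# Usage with the directories from the question:
#   make_patch flyingsaucer-R8pre2_b/ flyingsaucer-R8pre2/ a.patch
```

If the resulting file is still far larger than expected, the difference is probably not just spacing, and it is worth inspecting the first few hunks by hand.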

Using the output of diff to create the patch

I have something like this
src/sim/simulate.cc
41d40
< #include "mem/mem-interface.h"
90,91d88
< dram_print_stats_common(curTick/500);
<
src/mem/physical.hh
52d51
< public:
55,56d53
< public:
<
58a56,57
> public:
>
61,62c60,61
< virtual bool recvTiming(PacketPtr pkt); //baoyg
<
---
I believe this was created using the diff command in a source tree. What I want is to create the patch using that output, and to apply the same changes to my source tree.
I believe that diff -u oldfile newfile > a.patch is used to create patch files, although some other switches may be thrown in as well (-N?).
Edit: OK, 4 years later and finally going to explain what the switches mean:
-u creates a unified diff. Unified diffs are the kind of diffs most commonly fed to the patch program. You can also ask for a different number of context lines with -U NUM (the default is 3), in case 3 lines isn't unique enough to pinpoint just one spot in the program.
-N treats absent files as being empty, which means it will produce a lot of additional content if a file exists on only one side (or see next point).
Also, newfile and oldfile can both be directories instead of single files. You'll likely want the -r argument for this to recurse any subdirectories.
If you want to get the same patch output as SVN or git diff, given two different files or folders:
diff -Naur file1.cpp file2.cpp
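For two directories, creating and applying such a patch might look like the sketch below. The helper names create_tree_patch and apply_tree_patch are hypothetical, and the -p1 strip level assumes the patch was created from the directory that contains both trees, so every path in it starts with one extra component.

```shell
# Hypothetical helper: create a unified patch that turns tree $1 into tree $2
# and write it to $3. Run it from the directory containing both trees so the
# patch holds short relative paths such as old/sub/file and new/sub/file.
create_tree_patch() {
    diff -Naur "$1" "$2" > "$3" || [ $? -eq 1 ]
}

# Hypothetical helper: apply the patch file $2 inside directory $1.
# -p1 strips the leading tree name from each path; -s keeps patch quiet.
apply_tree_patch() {
    patch -s -p1 -d "$1" < "$2"
}

# Usage:
#   create_tree_patch old new a.patch
#   apply_tree_patch my-copy-of-old a.patch
```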
What you have there is a non-unified diff. patch can read it, but will be unable to make context matches and is more likely to make mistakes.
That is a (partial) patch file, though it would have been better if they provided you with a unified diff output.
The main issue with that patch is that it doesn't mention which files are being modified, and since there is no context provided, the files must be exact, patch will be unable to allow for minor changes in the file.
Copy the diff in the original post to a patch file named test.patch, then run the following, replacing <original file> with the path of the file to patch:
patch <original file> test.patch
@Sparr and @Arafangion point out that this works best if you have the exact original file used to create the original diff.
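A minimal sketch of that step for one file at a time. The helper name apply_normal_diff is hypothetical; since a normal-format diff carries neither file names nor context, the hunks for each source file would have to be saved to their own patch file first.

```shell
# Hypothetical helper: apply a normal-format (non-unified) diff stored in $2
# to the single file $1, modifying $1 in place. Because such a diff has no
# context lines, $1 should be identical to the file the diff was made from.
apply_normal_diff() {
    patch -s "$1" "$2"
}

# Usage, after saving only the simulate.cc hunks to simulate.patch:
#   apply_normal_diff src/sim/simulate.cc simulate.patch
```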