How do you get a list of files included in a diff?

I have a patch file containing the output from git diff. I want to get a summary of all the files that, according to the patch file, have been added or modified. What command can I use to achieve this?

patchutils includes an lsdiff utility.

grep '+++' mydiff.patch seems to do the trick.
If I still have the repository, I can also use git diff --name-only, which is probably the better approach.

grep '+++' mydiff.patch|perl -pe 's/\+\+\+ //g'
Details:
git diff produces output in the format
+++ b/file
So if you're using grep as Nathan suggested
grep '+++' mydiff.patch
You'll have the list of affected files, each prefixed with '+++ ' (three plus signs and a space).
I often need to process the files further and find it convenient to have one file name per line with nothing else. The following command does this: the Perl regex removes the plus signs and the space.
grep '+++' mydiff.patch|perl -pe 's/\+\+\+ //g'
For patch files generated with diff -Naur, mydiff.patch contains entries with a file name and a date (<tab> stands for the tab character):
+++ b/file<tab>2013-07-03 13:58:45.000000000 +0200
To extract the file names in this case, use
grep '+++' mydiff.patch|perl -pe 's/\+\+\+ (.*)\t.*/$1/g'
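For comparison, here is a minimal sketch of the same extraction with only grep and sed (no Perl). It assumes the new-side headers look like "+++ NAME" or "+++ NAME<tab>TIMESTAMP" and that file names contain no tabs; the function name patch_files is made up for illustration. Because it works line by line instead of parsing hunks, an added line whose own text starts with "++ " would be misreported, which is the kind of case lsdiff handles properly.

```shell
# patch_files is a hypothetical helper name used for illustration.
# Print the new-side file name from every "+++ " header of a unified diff,
# one per line, dropping an optional "<TAB>timestamp" suffix (diff -Naur style).
# Assumes file names contain no tab characters.
patch_files() {
    tab=$(printf '\t')
    # ^+++ anchors the match; in a basic regex a plain + is a literal character.
    grep '^+++ ' "$1" | sed -e 's/^+++ //' -e "s/${tab}.*//"
}
```

Usage: patch_files mydiff.patch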

A decent way to do this is to use the --stat flag of git apply (or the --summary flag, if you only need new, deleted, or renamed files).
Example (awk keeps the path column, and sed '$d' drops the final "N files changed" summary line):
git apply --stat peer.diff | awk '{ print $1 }' | sed '$d'
1-js/03-code-quality/index.md
CONTR.md
LICENSE.md
README.md
chat-app.readme.md

When you parse patches generated by git format-patch, or others that include a diffstat (lines such as " file.c | 3 +++" summarizing the number of lines edited), it's crucial to search for ^+++ (anchored at the start of the line) rather than just +++.
For example:
grep -h '^+++' *.patch | sed -e 's#^+++ [ab]/##'
will output the paths without a/ or b/ at the beginning (-h stops grep from prefixing each match with the patch file name when several files are searched).
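A slightly fuller sketch under the same git-style assumptions: it reads both the "--- a/" and "+++ b/" headers, so files deleted by the patch (whose "+++" side is /dev/null) are still listed, and sort -u merges the duplicates. The name patch_paths is hypothetical, and the output comes out sorted rather than in patch order.

```shell
# patch_paths is a hypothetical helper name used for illustration.
# List the paths touched by one or more git-style patches, without the a/ or b/ prefix.
# Both header lines are read so that deleted files (whose "+++" side is
# /dev/null) still appear; sort -u merges the duplicates.
# Assumes paths contain no tabs or newlines.
patch_paths() {
    grep -h -e '^--- a/' -e '^+++ b/' "$@" |
        sed 's#^[-+]\{3\} [ab]/##' |
        sort -u
}
```

Usage: patch_paths *.patch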

Related

Extracting the contents between two different strings using bash or perl

I have tried to scan through the other posts on Stack Overflow for this, but couldn't get my code to work, so I am posting a new question.
Below is the content of the file temp.
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Body><dp:response xmlns:dp="http://www.datapower.com/schemas/management"><dp:timestamp>2015-01-22T13:38:04Z</dp:timestamp><dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file><dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file></dp:response></env:Body></env:Envelope>
This file contains the base64-encoded contents of two files named test.txt and test1.txt. I want to extract the base64-encoded content of each file into separate files, test.txt and test1.txt respectively.
To achieve this, I have to remove the XML tags around the base64 content. I am trying the commands below, but they are not working as expected.
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g' > test.txt
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g' > test1.txt
The command below:
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g'
produces this output:
XJzLXJlc3VsdHMtYWN0aW9uX18i
<dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:response> </env:Body></env:Envelope>
However, I expect only the first line, XJzLXJlc3VsdHMtYWN0aW9uX18i, in the output. Where am I making a mistake?
When I run the command below, I get the expected output:
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g'
It produces the string below:
lc3VsdHMtYWN0aW9uX18i
I can then easily route this to the test1.txt file.
UPDATE
I have edited the question to update the source file content. The source file doesn't contain any newline character, so the whole content is a single line (wc -l temp reports at most 1). The current solution does not work in that case; I have tried it and it failed.
OS: solaris 10
Shell: bash
sed -n 's_<dp:file name="\([^"]*\)">\([^<]*\).*_\1 -> \2_p' temp
I added \1 -> to show the link from file name to content; for the content only, just remove that part.
This is the POSIX version, so on GNU sed use --posix.
It assumes that the base64-encoded content is on the same line as its surrounding tags (content spread over several lines would need some modification).
Thanks to JID for the full explanation below.
How it works
sed -n
-n suppresses automatic printing, so unless explicitly told to print, sed produces no output.
's_
Substitute the following regex, using _ to separate the regex from the replacement.
<dp:file name=
Regular text
"\([^"]*\)"
The parentheses form a capture group and must be escaped unless the -r option is used (-r is not available in POSIX sed). Everything inside them is captured. [^"]* means zero or more occurrences of any character that is not a quote, so this just captures everything between the two quotes.
>\([^<]*\)
Again a capture group, this time capturing everything between the > and the next <.
.*
Everything else on the line
_\1 -> \2
This is the replacement: replace everything the regex matched with the first capture group, then " -> ", then the second capture group.
_p
Print the line if a substitution was made.
Resources
http://unixhelp.ed.ac.uk/CGI/man-cgi?sed
http://www.grymoire.com/Unix/Sed.html
/usr/xpg4/bin/sed works well here.
/usr/bin/sed does not work as expected if the file contains just one line.
The command below works for a file containing only a single line:
/usr/xpg4/bin/sed -n 's_<env:Envelope\(.*\)<dp:file name="temporary://BackUpDir/backupmanifest.xml">\([^>]*\)</dp:file>\(.*\)_\2_p' securebackup.xml 2>/dev/null
Without 2>/dev/null this sed command outputs the warning sed: Missing newline at end of file.
This is because of the following:
The default Solaris sed ignores an unterminated last line so as not to break existing scripts, because the original Unix implementation required every line to be terminated by a newline.
GNU sed is more relaxed, and the POSIX implementation (/usr/xpg4/bin/sed) accepts such a line but outputs a warning.
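Since the whole document sits on one line, a tool that loops over every match within a line sidesteps sed's line orientation. The following is a minimal awk sketch of that idea (a swapped-in technique, not one of the answers above). It assumes every name starts with "temporary://" as in the question and uses the rest of the name, unchecked, as an output path in the current directory; extract_dp_files is a hypothetical helper name. On Solaris 10 it would probably need nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk (untested there).

```shell
# extract_dp_files is a hypothetical helper name used for illustration.
# For every <dp:file name="temporary://NAME">PAYLOAD</dp:file> element in FILE,
# write PAYLOAD (still base64-encoded) to a file called NAME in the current directory.
# Assumptions: names always start with "temporary://", NAME is a plain file name,
# and PAYLOAD contains no "</dp:file>". A missing trailing newline is fine.
extract_dp_files() {
    awk '
    {
        line = $0
        open_tag = "<dp:file name=\"temporary://"
        close_tag = "</dp:file>"
        # Walk along the line, which may hold the whole document.
        while ((start = index(line, open_tag)) > 0) {
            line = substr(line, start + length(open_tag))
            q = index(line, "\"")              # closing quote of the name
            name = substr(line, 1, q - 1)
            line = substr(line, q + 2)         # skip past the quote and the >
            stop = index(line, close_tag)
            if (stop == 0) break               # unterminated element: give up
            print substr(line, 1, stop - 1) > name
            close(name)
            line = substr(line, stop + length(close_tag))
        }
    }' "$1"
}
```

Usage: extract_dp_files temp, which should leave test.txt and test1.txt in the current directory.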

In-place replacement

I have a CSV. I want to edit the 35th field of the CSV and write the change back to the 35th field. This is what I am doing on bash:
awk -F "," '{print $35}' test.csv | sed -i 's/^0/+91/g'
So I am pulling the 35th entry using awk and then replacing the "0" at the start of the string with "+91". This works perfectly, and I get the desired output on the console.
Now I want this new entry to be written to the file. I am thinking of sed's in-place replacement feature, but this feature needs an input file. In the above command, I cannot provide an input file because my primary command is awk and sed is taking its input from awk.
Thanks.
You should choose one of the two tools. As for sed, it can be done as follows:
sed -ri 's/^(([^,]*,){34})0([^,]*)/\1+91\3/' test.csv
Not sure about awk, but @shellter's comment might help with that.
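Since awk is left open here, a minimal awk sketch of the same edit follows. It assumes a plain CSV with no quoted fields containing commas; fix_field35 is a hypothetical helper name. Lines with fewer than 35 fields pass through unchanged, and when sub() finds nothing to replace the line is printed exactly as read.

```shell
# fix_field35 is a hypothetical helper name used for illustration.
# Replace a leading 0 in field 35 with +91 and print every line to stdout.
# Assumes a plain CSV: no quoted fields containing commas.
fix_field35() {
    awk 'BEGIN { FS = OFS = "," }
         NF >= 35 { sub(/^0/, "+91", $35) }
         { print }' "$1"
}
```

Writing the result back still goes through a temporary file, e.g. fix_field35 test.csv > test.csv.tmp && mv test.csv.tmp test.csv.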
The in-place feature of sed is misnamed, as it does not edit the file in place. Instead, it creates a new file with the same name. For example:
$ echo foo > foo
$ ln -f foo bar
$ ls -i foo bar # These are the same file
797325 bar 797325 foo
$ echo new-text > foo # Changes bar
$ cat bar
new-text
$ printf '/new/s//newer\nw\nq\n' | ed foo # Edit foo "in-place"; changes bar
9
newer-text
11
$ cat bar
newer-text
$ ls -i foo bar # Still the same file
797325 bar 797325 foo
$ sed -i s/new/newer/ foo # Does not edit in-place; creates a new file
$ ls -i foo bar
797325 bar 792722 foo
Since sed is not actually editing the file in place, but writing a new file and then renaming it to the old file, you might as well do the same.
awk ... test.csv | sed ... > test.csv.1 && mv test.csv.1 test.csv
There is a misconception that using sed -i somehow avoids creating a temporary file. It does not; it just hides that fact from you. Sometimes abstraction is a good thing, but other times it is unnecessary obfuscation, and sed -i is a case of the latter. The shell is really good at file manipulation, so use it as intended. If you do need to edit a file in place, don't use the streaming version of ed; just use ed.
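To do explicitly what sed -i does behind the scenes, a small shell function is enough. This is a sketch, not a standard command; rewrite_file is a hypothetical name. It creates the temporary file next to the original so the final mv is a rename, and it leaves the original untouched if the filter fails.

```shell
# rewrite_file is a hypothetical helper name used for illustration.
# Run COMMAND with FILE as stdin and replace FILE with the output,
# only if COMMAND succeeds.
rewrite_file() {
    local file=$1 tmp
    shift
    # Create the temporary file in the same directory so mv is a rename.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if "$@" < "$file" > "$tmp"; then
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

Usage: rewrite_file test.csv sed 's/^0/+91/'. As the transcript shows for sed -i, the result is a new inode. Unlike GNU sed -i, this sketch does not copy the original permissions (mktemp creates the file with mode 0600).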
So, it turned out there are numerous ways to do it. I got it working with sed as below:
sed -i 's/0\([0-9]\{10\}\)/\+91\1/g' test.csv
But this is a little tricky, as it will edit any entry that matches the pattern. However, in my case it works fine.
A similar implementation of the above logic in Perl:
perl -p -i -e 's/\b0(\d{10})\b/\+91$1/g;' test.csv
Again, same caveat as mentioned above.
A more precise way, as shown by Lev Levitsky, operates specifically on the 35th field:
sed -ri 's/^(([^,]*,){34})0([^,]*)/\1+91\3/g' test.csv
For more complex situations, I will have to consider using one of Perl's CSV modules.
Thanks everyone for your time and input. I surely know more about sed/awk after reading your replies.
This might work for you:
sed -i 's/[^,]*/+91/35' test.csv
EDIT:
To replace the leading zero in the 35th field:
sed 'h;s/[^,]*/\n&/35;/\n0/!{x;b};s//+91/' test.csv
or more simply:
sed 's/^\(\([^,]*,\)\{34\}\)0/\1+91/' test.csv
If you have moreutils installed, you can simply use the sponge tool:
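awk -F "," '{print $35}' test.csv | sed 's/^0/+91/' | sponge test.csv
(sed -i is dropped here because it needs a file argument, and sponge does the writing. Note that this particular pipeline writes back only the 35th column; to keep whole rows, pipe a command that prints complete lines into sponge, e.g. sed 's/^\(\([^,]*,\)\{34\}\)0/\1+91/' test.csv | sponge test.csv.)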
sponge soaks up the input, closes the input pipe (stdin) and, only then, opens and writes to the test.csv file.
As of 2015, moreutils is available in package repositories of several major Linux distributions, such as Arch Linux, Debian and Ubuntu.
Another perl solution to edit the 35th field in-place:
perl -i -F, -lane '$F[34] =~ s/^0/+91/; print join ",",@F' test.csv
These command-line options are used:
-i edit the file in-place
-n loop around every line of the input file
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace.
-e execute the perl code
-F autosplit modifier, in this case splits on ,
@F is the array of fields in each line, indexed starting with 0
$F[34] is the 35th element of the array
s/^0/+91/ does the substitution

prepend a word to the last word of a line

I want to prepend a directory name to the last word in a line. The line has the following format:
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0^IoneFile$
where ^I denotes a tab, and $ denotes the end-of-line. This line is generated by git ls-files -s.
I want a sed command to prepend one/ to the filename in this line, like so:
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0^Ione/oneFile$
Some of the lines that I've tried, and their corresponding outputs:
Match the longest string of characters that are not \t followed by $; prepend one/:
$ git ls-files -s | sed 's|[^\t]*$|one/&|'
one/100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 oneFile
Match the longest string of characters that are not \t or ' ' followed by $; prepend one/:
$ git ls-files -s | sed 's|[^\t ]*$|one/&|'
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e one/0 oneFile
Match the longest string of characters that are not horizontal whitespace, prepend 'one/':
$ git ls-files -s | sed 's|[^[[:blank:]]]*$|one/&|'
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 oneFilone/e
I've basically tried a whole bunch of things matching for:
[^\t ]*$|one/&
[^[[:space:]]]*$|one/&
and the ones listed above. The closest I can get is oneFilone/e, which came from [^[[:blank:]]]*$|one/&, or prepending to the 0, but I can't quite get what I want.
EDIT
Because a few people have commented or posted answers, none of which work for me, I figured I'd add: I am using Mac OS X 10.7.3. I'm not completely sure of the sed version (if anybody knows a way to find it, feel free to comment), but the sed man page says it's BSD sed. I'm not sure how different that is from GNU sed, if at all.
I'm also using zsh, with oh-my-zsh running (pretty much unmodified). I have turned on extended_glob (setopt extended_glob).
I've commented with my results for the answers people have given; I assume they are run on a Linux distribution? I don't have access to a non-OS X system tonight, but I will re-run any answers tomorrow; maybe it's just my [shell|OS|bad karma] that isn't letting them work for me.
EDIT Again:
So I've tested on an Ubuntu system, and the first command above does work. I'd love a working version for my Mac, though.
Final EDIT:
Thanks to all who answered with working equivalent commands. It turns out that my first one does work, but not with BSD sed. I do, however, have gsed available (thanks for pointing that out!) which makes these all magically work.
This is simpler, but seems to work:
echo -e '100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 oneFile' | sed -e 's/\([a-zA-Z]\+\)$/one\/\0/g'
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 one/oneFile
It should work with a tab just before oneFile also.
The following works for me:
git ls-files -s | sed -e 's|\t\(.\+\)$|\tone/\1|g'
100644 345242cb0c4e9bb01a6fef9947f4342ff2f68553 0 one/ExView/resource.h
would have been
100644 345242cb0c4e9bb01a6fef9947f4342ff2f68553 0 ExView/resource.h
I've also run into strange newline problems (stray carriage returns). I doubt this is your problem, but in the past, git ls-files -s | tr -d '\r' | ... has helped me.
I'm unable to test right now, but you're using too many brackets in the character classes.
[^[[:space:]]]*$|one/&
should be
[^[:space:]]*$|one/&
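With the extra brackets, [^[[:space:]]] parses as the bracket expression [^[[:space:]] (any single character that is neither '[' nor whitespace) followed by a literal ']', and the * applies only to that ']'. The pattern therefore matches just the last character of the line, which explains why the dir is inserted before the last 'e'.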
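A minimal sketch of the corrected pattern, wrapped in a function for testing (prepend_dir is a hypothetical name). [^[:blank:]] uses the POSIX class [:blank:] (space and tab), so the sed script needs no \t escape and should behave the same in GNU and BSD sed. It assumes the directory name contains no '|', '&' or backslash, which sed would treat specially in the replacement.

```shell
# prepend_dir is a hypothetical helper name used for illustration.
# Prepend DIR/ to the last blank-delimited word of each input line.
# Assumes DIR contains no '|', '&' or backslash.
prepend_dir() {
    # In double quotes, \$ passes a literal $ (end-of-line anchor) to sed.
    sed "s|[^[:blank:]]*\$|$1/&|"
}
```

Usage: git ls-files -s | prepend_dir one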
This might work for you (or am I missing something?):
echo -e "100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0\toneFile" | sed 's/\t/&one\//'
100644 bfadfab6f98b8fa1e9989fe16b2bf0fb13ffd39e 0 one/oneFile
I wanted to do exactly this but did not have gsed around...
The only issue is that OS X sed does not recognise \t as a TAB character, as explained here. You have to use an actual TAB character. Use Option 1 in the original post, but instead of typing \t, press Ctrl+v followed by the Tab key. This is hard to copy and paste here, so you will have to do it yourself. :-)
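If typing Ctrl+v Tab is awkward (in a script, for instance), the shell can produce the literal TAB instead. A minimal sketch, with prepend_after_tab as a hypothetical name: $(printf '\t') yields a real TAB character that the shell splices into the sed script, so sed never has to interpret \t itself. It assumes the directory name contains no '|', '&' or backslash.

```shell
# prepend_after_tab is a hypothetical helper name used for illustration.
# Insert DIR/ right after the first TAB on each input line.
# The TAB comes from printf, so sed sees a literal tab, not a \t escape.
# Assumes DIR contains no '|', '&' or backslash.
prepend_after_tab() {
    tab=$(printf '\t')
    sed "s|${tab}|${tab}$1/|"
}
```

Usage: git ls-files -s | prepend_after_tab one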
See also this question.

SED Delete lines and replace with new from file

I have been looking at the sed documentation but need a little pointer in the right direction.
I have 200 files I want to modify in a batch.
The source is an HTML file.
I need to create a new file for the changes.
I want to delete the first part of each file, up to the first tag (this is 20 or so lines but can vary slightly).
Then insert the contents of a source file (the same for all files) into the new target file starting at line 1, for 30 or so lines. The number of lines to insert does not match the number that are deleted though.
Hope you can help.
Paul
This can certainly be done with sed(1), but I would probably use the vanilla editor ed(1).
$ cat > bigfix.sh
for i in "$@"; do
ed "$i" << \eof
1,/<tag>/-1d
0r otherfile.html
w
q
eof
done
$ sh bigfix.sh file*.html
This shell script takes arguments and runs ed(1) on each arg. It deletes lines starting from the first and ending on the line right before the one with <tag>. It then puts otherfile.html at the top and writes out the result.
For an individual file:
sed -e '1,/tag/{/tag/r insertfile' -e ';d}' inputfile > outputfile
For many files:
find . -name 'criterion*.ext' -type f -exec sh -c 'sed -e "1,/tag/{/tag/r insertfile" -e ";d}" "$1" > "$1.new"' sh {} \;
Edit:
Fixed the find command to use sh because of the redirection. Note the change in quoting from the previous version.
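Since the question asks for new files rather than in-place edits, here is a minimal sketch of a batch wrapper that leaves the originals alone (replace_header is a hypothetical name). Like the ed script, it keeps the tag line itself. It assumes the tag is a basic regular expression with no '/', and that the header file is not among the files being processed; if the tag never matches, FILE.new holds only the header.

```shell
# replace_header is a hypothetical helper name used for illustration.
# For each FILE, write FILE.new containing HEADER followed by everything
# from the first line matching TAG to the end of FILE.
# Assumes TAG is a basic regular expression containing no '/'.
replace_header() {
    header=$1 tag=$2
    shift 2
    for f in "$@"; do
        # sed -n '/TAG/,$p' prints from the first TAG line to the last line.
        { cat "$header"; sed -n "/$tag/,\$p" "$f"; } > "$f.new"
    done
}
```

Usage: replace_header otherfile.html '<tag>' file*.html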

Diff Ignoring GUIDs

When using diff, how would one go about ignoring differences between lines that differ only in GUIDs? Something along the lines of:
diff -I "^.*[a-zA-Z0-9]{8}\-[a-zA-Z0-9]{4}\-[a-zA-Z0-9]{4}\-[a-zA-Z0-9]{4}\-[a-zA-Z0-9]{12}.*$"
Where obviously the above doesn't work, but just to get an idea of what is needed.
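diff -I '[0-9A-Fa-f-]\{36\}' foo.txt bar.txt
Note that -I ignores a change only when every changed line in it matches the regex, so any changed line containing a GUID-like string is ignored entirely, even if it also differs elsewhere.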
Perhaps you could first pipe the input files through sed to remove anything matching a GUID, then perform the diff.
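A minimal sketch of that idea: mask every 8-4-4-4-12 hex token with a fixed placeholder in temporary copies, then diff the copies. The function names mask_guids and diff_ignoring_guids and the placeholder text GUID are hypothetical choices. Unlike -I, this still reports lines that differ in anything other than the GUID.

```shell
# mask_guids and diff_ignoring_guids are hypothetical helper names.
# Replace every GUID-shaped token (8-4-4-4-12 hex digits) with "GUID".
mask_guids() {
    sed 's/[0-9A-Fa-f]\{8\}-[0-9A-Fa-f]\{4\}-[0-9A-Fa-f]\{4\}-[0-9A-Fa-f]\{4\}-[0-9A-Fa-f]\{12\}/GUID/g' "$1"
}

# Diff two files after masking GUIDs; returns diff's exit status
# (0 = same apart from GUIDs, 1 = other differences, 2 = trouble).
diff_ignoring_guids() {
    local a b rc
    a=$(mktemp) || return 2
    b=$(mktemp) || { rm -f "$a"; return 2; }
    mask_guids "$1" > "$a"
    mask_guids "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

Usage: diff_ignoring_guids foo.txt bar.txt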
Can you pipe the output of diff to a grep -v and use your pattern?