Diff Ignoring GUIDS - diff

When using Diff, how would one go about ignoring line differences that only diff on GUID's? Something along the lines of:
diff -I "^.*[a-zA-Z0-9]{8}\-[a-zA-Z0-9]{4}\-[a-zA-Z0-9]{4}\-[a-zA-Z0-9]{4}\-[a-zA-Z0-9]{12}.*$"
Where obviously the above doesn't work, but just to get an idea of what is needed.

diff -I '[0-9A-F\-]\{36\}' foo.txt bar.txt

Perhaps you could first pipe the input files through sed to remove anything matching a GUID, then perform the diff.
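The sed-then-diff idea could look roughly like the sketch below. The function names strip_guids and diff_ignoring_guids and the <GUID> placeholder are made up for illustration, and the pattern assumes the usual 8-4-4-4-12 hexadecimal layout.

```shell
# Hypothetical helper: replace anything GUID-shaped (8-4-4-4-12 hex digits)
# with a fixed token so that it no longer shows up as a difference.
strip_guids() {
    sed -E 's/[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}/<GUID>/g' "$1"
}

# Hypothetical helper: diff two files after normalising their GUIDs.
# Returns diff's exit status (0 = same, 1 = different).
diff_ignoring_guids() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 2
    strip_guids "$1" > "$tmp1"
    strip_guids "$2" > "$tmp2"
    diff "$tmp1" "$tmp2"
    status=$?
    rm -f "$tmp1" "$tmp2"
    return $status
}
```

Usage would be diff_ignoring_guids foo.txt bar.txt. Unlike diff -I, this also hides a GUID change when it sits in the same hunk as other changed lines, at the cost of the diff output showing <GUID> instead of the real values.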

Can you pipe the output of diff to a grep -v and use your pattern?

Related

Sed Pattern filtering long html doc

I am trying to filter a long HTML page so that only the fingerprints remain. They have a consistent structure, for example:
DCD0 5B71 EAB9 4199 527F 44AC DB6B 8C1F 96D8 BF60
I know how to do it with standard command-line tools such as grep, cut and head/tail, but is there a more elegant way to do it with sed? The shell command I use is long and does not look very nice.
thank you
grep is the right tool for extracting strings from a file based on regular expression matching:
grep -Eo '([A-F0-9]{4}[[:space:]]){9}[A-F0-9]{4}' file.html
Here is a sed command tested with GNU sed 4.2.2:
sed -nr '/(([[:xdigit:]]){4} ?){10}/p' file
It prints every line that contains
10 groups that are made of
4 hexdigits
followed by an optional space
With GNU sed:
sed -E 's/.*(([A-F0-9]{4}[[:space:]]){9}[A-F0-9]{4}).*/\1/' file
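Without -n, a substitution like the one above also prints lines that contain no fingerprint, unchanged. A sketch of a variant that prints only the extracted fingerprints, assuming one fingerprint per line and single spaces between groups (the function name extract_fingerprints is made up for illustration):

```shell
# Hypothetical helper: print only the 10-group fingerprint from each line
# that has one; -n plus the p flag suppresses all other lines.
extract_fingerprints() {
    sed -nE 's/.*(([A-F0-9]{4} ){9}[A-F0-9]{4}).*/\1/p' "$@"
}
```

It can be called as extract_fingerprints file.html or used at the end of a pipeline.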

Sed command to fetch particular string from full string

I've got a file which contains lot of strings like below input.
Need to extract the below output and process it further.
Input:
History={ExecAt=[2013-05-03 03:00:20,2013-05-03 03:00:23,2013-05-03 03:00:26],MId=["msgId3","msgId4","msgId5"]};
Output should be:
MId=["msgId3","msgId4","msgId5"]
Using the command sed 's/^.*,MId=/MId=/' I get MId=["msgId3","msgId4","msgId5"]};
but I want the exact output shown above, so the trailing }; still needs to be removed.
This works for me:
sed 's/.*\(MId=.*\)}.*/\1/'
If your grep supports the -o option, you can use it rather than sed:
grep -o 'MId=\[[^]]\+\]'
Using the same regex in sed works fine, just remove anything before and after:
sed -e 's/.*\(MId=\[[^]]\+\]\).*/\1/'
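As a quick check, here is the same idea run against the sample line from the question. This sketch uses [^]]* instead of the GNU-specific \+ so that it stays within basic regular expressions; the variable names line and mid are only for illustration.

```shell
line='History={ExecAt=[2013-05-03 03:00:20,2013-05-03 03:00:23,2013-05-03 03:00:26],MId=["msgId3","msgId4","msgId5"]};'

# Keep only MId=[...]: \[ is a literal bracket, [^]]* is "anything but ]",
# and the final ] before \) is the literal closing bracket.
mid=$(printf '%s\n' "$line" | sed 's/.*\(MId=\[[^]]*]\).*/\1/')

printf '%s\n' "$mid"   # MId=["msgId3","msgId4","msgId5"]
```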

sed stripping hex from start of file including pattern

I've been at this most of this afternoon hacking with sed and it's a bit of a minefield.
I have a file of hex of the form:
485454502F312E31203230300D0A0D0AFFD8FFE000104A46494600
I'm pattern matching on 0D0A0D0A and have managed to delete the contents from the start of the file to there. The problem is that it leaves the 0D0A0D0A, so I have to do a second pass to pick that up.
Is there a way, in one command, to delete everything up to and including the pattern you match on, and save the result back into the same file?
thanks in advance.
This should work:
sed -e 's/.*0D0A0D0A//' file.txt
You need to provide a better description of your problem.
Based on what you wrote, you can use sed's -i switch (edit files in place) to save the changed file:
sed -i.bak 's/^.*0D0A0D0A//' file
PS: The -i switch is not part of POSIX and is not available in some older versions of sed. If that's the case, use a temporary file instead:
sed 's/^.*0D0A0D0A//' file > _temp && mv _temp file
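A small sketch of the substitution applied to the sample from the question, without touching any file (the variable names hex and body are only for illustration). Note that .* is greedy, so if the marker occurred more than once on the line, everything up to the last occurrence would be removed.

```shell
hex='485454502F312E31203230300D0A0D0AFFD8FFE000104A46494600'

# Delete everything from the start of the line up to and including the marker.
body=$(printf '%s\n' "$hex" | sed 's/^.*0D0A0D0A//')

printf '%s\n' "$body"   # FFD8FFE000104A46494600
```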

Add text at the end of each line

I'm on Linux command line and I have file with
127.0.0.1
128.0.0.0
121.121.33.111
I want
127.0.0.1:80
128.0.0.0:80
121.121.33.111:80
I remember my colleagues using sed for that, but after reading the sed manual it is still not clear to me how to do it on the command line.
You could try using something like:
sed 's/$/:80/' ips.txt > new-ips.txt
Provided that your file format is just as you have described in your question.
The s/// substitution command matches (finds) the end of each line in your file (using the $ character) and then appends (replaces) the :80 to the end of each line. The ips.txt file is your input file... and new-ips.txt is your newly-created file (the final result of your changes.)
Also, if you have a list of IP numbers that happen to have port numbers attached already, (as noted by Vlad and as given by aragaer,) you could try using something like:
sed '/:[0-9]*$/ ! s/$/:80/' ips.txt > new-ips.txt
So, for example, if your input file looked something like this (note the :80):
127.0.0.1
128.0.0.0:80
121.121.33.111
The final result would look something like this:
127.0.0.1:80
128.0.0.0:80
121.121.33.111:80
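The guarded substitution can be wrapped in a small helper. This is only a sketch: the function name add_port and its port argument are made up for illustration, and it assumes one address per line as in the question.

```shell
# Hypothetical helper: add_port PORT [FILE...]
# Appends :PORT to every line that does not already end in :<digits>.
add_port() {
    port=$1
    shift
    sed '/:[0-9][0-9]*$/!s/$/:'"$port"'/' "$@"
}
```

For example, add_port 80 ips.txt > new-ips.txt leaves a line such as 128.0.0.0:80 untouched and appends :80 to the others.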
Concise version of the sed command:
sed -i s/$/:80/ file.txt
Explanation:
sed stream editor
-i in-place (edit file in place)
s substitution command
/replacement_from_reg_exp/replacement_to_text/ statement
$ matches the end of line (replacement_from_reg_exp)
:80 text you want to add at the end of every line (replacement_to_text)
file.txt the file name
How can this be achieved without modifying the original file?
If you want to leave the original file unchanged and have the results in another file, drop the -i option and redirect the output (>) to another file:
sed s/$/:80/ file.txt > another_file.txt
sed 's/.*/&:80/' abcd.txt >abcde.txt
If you'd like to add text at the end of each line in-place (in the same file), you can use -i parameter, for example:
sed -i'.bak' 's/$/:80/' foo.txt
However, the -i option is a non-standard extension and may not be available on all operating systems.
So you can consider using ex (which is equivalent to vi -e/vim -e):
ex +"%s/$/:80/g" -cwq foo.txt
which will add :80 to each line, but sometimes it can append it to blank lines.
So a better method is to check whether the line actually contains a digit, and only then append to it, for example:
ex +"g/[0-9]/s/$/:80/g" -cwq foo.txt
If the file has a more complex format, consider using a proper regex instead of [0-9].
You can also achieve this with a backreference:
sed -i.bak 's/\(.*\)/\1:80/' foo.txt
You can also do this with awk:
awk '{print $0":80"}' foo.txt > tmp && mv tmp foo.txt
Using a text editor, check for ^M (Ctrl-M, a carriage return) at the end of each line. If any are present, remove them first, then append the additional text to the end of each line. With GNU sed the carriage return can be written as \r:
sed -i 's|\r$||' ips.txt
sed -i 's|$|:80|g' ips.txt
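If \r is not understood by the local sed, the carriage returns can be stripped with tr before appending. A sketch that leaves the original file untouched (the function name append_port_crlf_safe is made up for illustration, and :80 is hard-coded as in the question):

```shell
# Hypothetical helper: strip carriage returns, then append :80 to each line.
# Writes the result to standard output; the input file is not modified.
append_port_crlf_safe() {
    tr -d '\r' < "$1" | sed 's/$/:80/'
}
```

Usage would be append_port_crlf_safe ips.txt > new-ips.txt.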
sed -i 's/$/,/g' foo.txt
I do this quite often to add a comma to the end of each line of output, so I can easily copy and paste it into a Python (or your favourite language) array.

How do you get a list of files included in a diff?

I have a patch file containing the output from git diff. I want to get a summary of all the files that, according to the patch file, have been added or modified. What command can I use to achieve this?
patchutils includes a lsdiff utility.
grep '+++' mydiff.patch seems to do the trick.
I can also use git diff --name-only, which is probably the better approach.
grep '+++' mydiff.patch|perl -pe 's/\+\+\+ //g'
Details:
git diff produces output in the format
+++ b/file
So if you're using grep as Nathan suggested
grep '+++' mydiff.patch
You'll have the list of affected files, prepended by '+++ ' (3 plus signs and a space).
I often need to further process files and find it convenient to have one filename per line without anything else. This can be achieved with the following command, where perl/regex removes these plus signs and the space.
grep '+++' mydiff.patch|perl -pe 's/\+\+\+ //g'
For patch files generated with diff -Naur, the mydiff.patch file contains entries with a filename and a date (<tab> indicates the tab character):
+++ b/file<tab>2013-07-03 13:58:45.000000000 +0200
To extract the filenames for this, use
grep '+++' mydiff.patch|perl -pe 's/\+\+\+ (.*)\t.*/\1/g'
A decent way to do this is to use the --stat flag (or the --summary flag, if you need only new / deleted / renamed files for some reason).
Example:
git apply --stat peer.diff | awk '{ print $1 }' | sed '$d'
1-js/03-code-quality/index.md
CONTR.md
LICENSE.md
README.md
chat-app.readme.md
When you parse patches generated by git format-patch or others containing additional information about number of lines edited, it's crucial to search for ^+++ (at the start of the line) rather than just +++.
For example:
grep '^+++' *.patch | sed -e 's#+++ [ab]/##'
will output the paths without the a/ or b/ at the beginning.
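The pieces from the answers above (anchoring on ^+++, dropping a tab-separated timestamp, stripping the a/ or b/ prefix) could be combined roughly as follows. This is a sketch, not a full patch parser: the function name files_in_patch is made up for illustration, and an added line whose own text starts with "++ " would be misread as a file header.

```shell
# Hypothetical helper: list the files a unified diff adds or modifies.
files_in_patch() {
    tab=$(printf '\t')
    # Take the "+++ " header lines, drop the marker, any tab-separated
    # timestamp and a leading a/ or b/, then skip /dev/null (deleted files).
    sed -n "/^+++ /{s/^+++ //;s/${tab}.*//;s#^[ab]/##;p;}" "$@" |
        grep -v '^/dev/null$' |
        sort -u
}
```

Usage would be files_in_patch mydiff.patch, which prints one path per line.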