difference between the content of two files - diff

I have two files, one of which is a subset of the other, and I want to obtain a file containing the lines that are not common to both. For example:
File1
apple
mango
banana
orange
jackfruit
cherry
grapes
eggplant
okra
cabbage
File2
apple
banana
cherry
eggplant
cabbage
The resulting file, the difference of the two files above:
mango
orange
jackfruit
grapes
okra
Any ideas on this are appreciated.

You can sort the files then use comm:
$ comm -23 <(sort file1.txt) <(sort file2.txt)
grapes
jackfruit
mango
okra
orange
You might also want to use comm -3 instead of comm -23:
-1 suppress lines unique to FILE1
-2 suppress lines unique to FILE2
-3 suppress lines that appear in both files
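To see how the flags differ, here is a minimal sketch that writes the sample data from the question to temporary files. The temporary directory and the variable names are illustrative only, not part of the original answer.

```shell
# Work in a throwaway directory; all names here are illustrative.
tmp=$(mktemp -d)
printf '%s\n' apple mango banana orange jackfruit cherry grapes eggplant okra cabbage | sort > "$tmp/file1.sorted"
printf '%s\n' apple banana cherry eggplant cabbage | sort > "$tmp/file2.sorted"

# Lines only in file1: suppress column 2 (unique to file2) and column 3 (common).
only_in_file1=$(comm -23 "$tmp/file1.sorted" "$tmp/file2.sorted")

# Lines only in file2: empty here, because file2 is a subset of file1.
only_in_file2=$(comm -13 "$tmp/file1.sorted" "$tmp/file2.sorted")

printf '%s\n' "$only_in_file1"
rm -rf "$tmp"
```

With comm -3, lines unique to the second file would be printed as a second, tab-indented column; for this data that column is empty, so -3 and -23 give the same lines.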

1. Lines that appear in only one of the files:
cat File1 File2 | sort | uniq -u
2. Lines only in the first file:
cat File1 File2 File2 | sort | uniq -u
3. Lines only in the second file:
cat File1 File1 File2 | sort | uniq -u
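Note that the uniq -u trick relies on no line being repeated within a single file. A hedged sketch, assuming that may not hold for your data (the sample contents below are made up for illustration), de-duplicates the first file before combining:

```shell
tmp=$(mktemp -d)
printf '%s\n' apple mango mango banana > "$tmp/File1"   # "mango" appears twice
printf '%s\n' apple banana > "$tmp/File2"

# Without de-duplication, "mango" occurs twice in the stream, so uniq -u drops it.
naive=$(cat "$tmp/File1" "$tmp/File2" "$tmp/File2" | sort | uniq -u)

# De-duplicate File1 first; File2 is listed twice so its lines are never unique.
sort -u "$tmp/File1" > "$tmp/File1.uniq"
only_in_first=$(cat "$tmp/File1.uniq" "$tmp/File2" "$tmp/File2" | sort | uniq -u)

printf '%s\n' "$only_in_first"
rm -rf "$tmp"
```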

Use awk; no sorting is necessary, which avoids that overhead:
$ awk 'FNR==NR{f[$1];next}(!($1 in f)) ' file2 file1
mango
orange
jackfruit
grapes
okra
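For readers unfamiliar with the FNR==NR idiom, here is an annotated sketch of the same technique. It keys on $0 (the whole line) rather than $1, which is my assumption for data that may contain spaces; the file contents are a shortened version of the question's sample.

```shell
tmp=$(mktemp -d)
printf '%s\n' apple mango banana orange > "$tmp/file1"
printf '%s\n' apple banana > "$tmp/file2"

# FNR==NR is true only while awk reads the first file named (file2 here):
#   record each of its lines as an array key, then skip to the next line.
# For the second file (file1): print lines that were never recorded.
result=$(awk 'FNR==NR { seen[$0]; next } !($0 in seen)' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The output keeps the original order of file1. One caveat: if the first file is empty, FNR==NR also holds for the second file, so nothing is printed.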

1. Lines not common to both files
diff --changed-group-format="%<%>" --unchanged-group-format="" file1 file2
2. Lines unique to the first file
diff --changed-group-format="%<" --unchanged-group-format="" file1 file2
3. Lines unique to the second file
diff --changed-group-format="%>" --unchanged-group-format="" file1 file2
Hope it works for you.

Related

How to make sed take input from pipe, and insert into a file

Is it possible to pipe the output of a previous command into sed, and have sed use that output as the text (pattern or string) to insert into a file?
I know that with sed alone you can use something like
sed -i '1 i\anything' file
But can I do something like
head -1 file1 | sed -i '1 i\OutputFromPreviousCmd' file2
This way, I wouldn't need to manually copy the output and change the sed command every time.
Update:
Added the files I meant
head -3 file1.txt
Side A,Age(us),mm:ss.ms_us_ns_ps
84 Vendor Specific, 0000000009096, 0349588242
84 Vendor Specific, 0000000011691, 0349591828
head -3 file2.txt
84 Vendor Specific, 0000000000418, 0349575322
83 Vendor Specific, 0000000002099, 0349575343
83 Vendor Specific, 0000000001628, 0349576662
I'd like to grab the first line of file1 and insert it at the top of file2, so the result should be:
head -3 file2.txt
Side A,Age(us),mm:ss.ms_us_ns_ps
84 Vendor Specific, 0000000000418, 0349575322
83 Vendor Specific, 0000000002099, 0349575343
83 Vendor Specific, 0000000001628, 0349576662
head -1 file1 | sed '1s/^/1i /' | sed -i -f- file2
This takes your one line of output, prepends the sed 1i command, then pipes that sed script to a second sed, which uses -f- to read its commands from stdin.
For example:
$ echo bob > bob.txt
$ echo alice | sed '1s/^/1i /' | sed -i -f- bob.txt
$ more bob.txt
alice
bob
This looks like a pure pipeline rather than a command ending in > temp ; mv temp file2, but that is nonetheless what sed does internally when -i is used.
This might work for you (GNU sed):
head -1 file1 | sed -i '1e cat /dev/stdin' file2
Insert the first line of file1 into the start of file2.
But why not use cat?:
cat <(head -1 file1) file2
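If GNU sed is not available, the same result can be had with an explicit temporary file. This is only a sketch: the temporary directory and the file2.new name are my own choices, and the data is a shortened version of the question's sample.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'Side A,Age(us),mm:ss.ms_us_ns_ps' '84 Vendor Specific, 0000000009096, 0349588242' > "$tmp/file1"
printf '%s\n' '84 Vendor Specific, 0000000000418, 0349575322' '83 Vendor Specific, 0000000002099, 0349575343' > "$tmp/file2"

# Build the new contents (header line, then the old file2) in a temporary
# file, and replace file2 only if that step succeeded.
{ head -n 1 "$tmp/file1"; cat "$tmp/file2"; } > "$tmp/file2.new" && mv "$tmp/file2.new" "$tmp/file2"

first_line=$(head -n 1 "$tmp/file2")
line_count=$(wc -l < "$tmp/file2" | tr -d ' ')
printf '%s\n' "$first_line"
rm -rf "$tmp"
```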

Sed. How to print lines matching pattern from another file?

I have file1 containing some text, like:
abcdef 123456 abcdef
ghijkl 789123 abcdef
mnopqr 123456 abcdef
and I have file2 containing single line of text which I want to use as pattern:
ghijkl 789123
How can I use the second file as a pattern and, using sed, print the lines that contain it to a third file? The result, file3, should look like:
ghijkl 789123 abcdef
I've tried to use
sed -ne "s/r file2//p" file1 > file3
But the content of file3 is blank for some reason
P.S. using Windows
If you have sed, do you have access to grep?
grep -f file2 file1 > file3
This is the simplest sed solution on Linux: sed -n /`<file2`/p file1 > file3, but Windows does not provide backticks. So the Windows workaround would be:
set /p PATERN=<file2
sed -n /%PATERN%/p file1 > file3
The sed solution is:
cat f2.txt | xargs -I {} sed -n "/{}/p" f1.txt > f3.txt
but, as @Cyrus correctly notes, grep is the proper tool for this problem and it's much nicer:
grep -f f2.txt f1.txt > f3.txt
Note: using these incredibly powerful *nix tools like sed, grep, cat, xargs, bash, etc. on Microsoft Windows can be frustrating. Consider spinning up a Linux environment, instead -- you'll save yourself many hours of grief dealing with subtle path and environment issues from emulators like Cygwin, etc.
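One caveat with grep -f is that each line of the pattern file is treated as a regular expression. A sketch using the question's data, assuming you want the pattern matched literally, adds -F:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'abcdef 123456 abcdef' 'ghijkl 789123 abcdef' 'mnopqr 123456 abcdef' > "$tmp/file1"
printf '%s\n' 'ghijkl 789123' > "$tmp/file2"

# -F: treat each line of file2 as a fixed string, not a regular expression.
grep -F -f "$tmp/file2" "$tmp/file1" > "$tmp/file3"

result=$(cat "$tmp/file3")
printf '%s\n' "$result"
rm -rf "$tmp"
```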

Merge Files using ksh

I have 2 files in a directory (Files given below are only Examples)
File 1
abcd
efghi
1234
5678
File2
qwert
werty
poqrs
Desired Output
abcd
efghi
1234
5678
qwert
werty
poqrs
Currently I use the following code to merge the records in the files:
for file in *.txt
do
cat "$file"
echo
done > output.txt
This merges the records as expected, but the total size of the merged file does not match the sum of the sizes of the input files.
For example, if the size of File1 is 120 and the size of File2 is 140, the merged file size comes to 262 and not 260.
I guess it is because of the echo statement in the code.
Can anyone help me find another way to merge the data as described above?
Thanks in advance,
Anand
This appends each file's contents directly to output.txt with ">>". The original code wrote cat's output to stdout and then ran echo, which adds one extra newline after every file; that is where the extra size comes from.
for file in *.txt ; do
cat "$file" >> output.txt
done
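If some input files might not end with a newline, dropping echo entirely would glue the last line of one file to the first line of the next. A possible sketch adds a newline only when one is missing. The names a.txt, b.txt and merged.out are hypothetical, and the output deliberately does not match *.txt so the glob cannot pick it up.

```shell
tmp=$(mktemp -d)
printf 'abcd\nefghi' > "$tmp/a.txt"      # no trailing newline
printf 'qwert\nwerty\n' > "$tmp/b.txt"   # ends with a newline

out="$tmp/merged.out"
: > "$out"                               # start with an empty output file
for file in "$tmp"/*.txt; do
    cat "$file" >> "$out"
    # Command substitution strips a trailing newline, so a non-empty result
    # means the file's last byte is not a newline and one must be added.
    if [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$out"
    fi
done

merged=$(cat "$out")
size=$(wc -c < "$out" | tr -d ' ')
printf '%s\n' "$merged"
rm -rf "$tmp"
```

Here the merged size is the sum of the input sizes plus one byte, for the single newline that a.txt was missing.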

Sed replace pattern with line number

I need to replace the pattern ### with the current line number.
I managed to print the line number on the next line with both awk and sed.
sed -n "/###/{p;=;}" file prints it on the next line; without the p;, it replaces the whole line.
sed -e "s/###/{=;}/g" file made sense in my head, since =; returns the line number of the matched pattern, but it just gives me the literal text {=;}.
What am I missing? I know this is a silly question, but I couldn't find the answer in the sed manual; it's not quite clear.
If possible, point out what I was missing and what would make it work. Thank you.
Simple awk oneliner:
awk '{gsub("###",NR,$0);print}'
Given the limitations of the = command, I think it's easier to divide the job in two (actually, three) parts. With GNU sed you can do:
$ sed -n '/###/=' test > lineno
and then something like
$ sed -e '/###/R lineno' test | sed '/###/{:r;N;s/###\([^\n]*\n\)\([^\n]*\)/\2\1/;tr;:c;s/\n\n/\n/;tc}'
I'm afraid there's no simple way with sed because, as well as the = command, the r and GNU extension R commands don't read files into the pattern space, but rather directly append the lines to the output, so the contents of the file cannot be modified in any way. Hence piping to another sed command.
If the contents of test are
fooo
bar ### aa
test
zz ### bar
the above will produce
fooo
bar 2 aa
test
zz 4 bar
This might work for you (GNU sed):
sed = file | sed 'N;:a;s/\(\(.*\)\n.*\)###/\1\2/;ta;s/.*\n//'
An alternative using cat:
cat -n file | sed -E ':a;s/^(\s*(\S*)\t.*)###/\1\2/;ta;s/.*\t//'
As noted by Lev Levitsky this isn't possible with one invocation of sed, because the line number is sent directly to standard out.
You could have sed write a sed-script for you, and do the replacement in two passes:
infile
a
b
c
d
e
###
###
###
a
b
###
c
d
e
###
Find the lines that contain the pattern:
sed -n '/###/=' infile
Output:
6
7
8
11
15
Pipe that into a sed-script writing a new sed-script:
sed 's:.*:&s/###/&/:'
Output:
6s/###/6/
7s/###/7/
8s/###/8/
11s/###/11/
15s/###/15/
Execute:
sed -n '/###/=' infile | sed 's:.*:&s/###/&/:' | sed -f - infile
Output:
a
b
c
d
e
6
7
8
a
b
11
c
d
e
15
is this ok ?
kent$ echo "a
b
c
d
e"|awk '/d/{$0=$0" "NR}1'
a
b
c
d 4
e
If a line matches the pattern "d", append the line number to the end of the line.
edit
Oh, you want to replace the pattern, not append the line number... take a look at the new command:
kent$ echo "a
b
c
d
e"|awk '/d/{gsub(/d/,NR)}1'
a
b
c
4
e
and the line could be written like this as well: awk '1+gsub(/d/,NR)' file
one-liner to modify the FILE in place, replacing LINE with the corresponding line number:
seq 1 `wc -l FILE | awk '{print $1}'` | xargs -IX sed -i 'X s/LINE/X/' FILE
Following on from https://stackoverflow.com/a/53519367/29924
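If you try this on macOS, the bundled sed is the BSD version, whose -i option takes a backup suffix, and the bundled xargs has no long options (-t echoes each command), so you need:
seq 1 `wc -l FILE | awk '{print $1}'` | xargs -t -IX sed -i bak "X s/__line__/X/" FILE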
see https://markhneedham.com/blog/2011/01/14/sed-sed-1-invalid-command-code-r-on-mac-os-x/
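To sidestep the differences between GNU and BSD sed -i altogether, one option is to let awk do the replacement and write through a temporary file. This is a sketch under that assumption; the test.new name is hypothetical, and the data is the four-line test file used earlier in this thread.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'fooo' 'bar ### aa' 'test' 'zz ### bar' > "$tmp/test"

# NR is awk's current line number; gsub replaces every ### on the line with it.
# Write to a temporary file first, then move it over the original on success.
awk '{ gsub(/###/, NR); print }' "$tmp/test" > "$tmp/test.new" && mv "$tmp/test.new" "$tmp/test"

result=$(cat "$tmp/test")
printf '%s\n' "$result"
rm -rf "$tmp"
```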

How can I show lines in common (reverse diff)?

I have a series of text files for which I'd like to know the lines in common rather than the lines which are different between them. Command line Unix or Windows is fine.
File foo:
linux-vdso.so.1 => (0x00007fffccffe000)
libvlc.so.2 => /usr/lib/libvlc.so.2 (0x00007f0dc4b0b000)
libvlccore.so.0 => /usr/lib/libvlccore.so.0 (0x00007f0dc483f000)
libc.so.6 => /lib/libc.so.6 (0x00007f0dc44cd000)
File bar:
libkdeui.so.5 => /usr/lib/libkdeui.so.5 (0x00007f716ae22000)
libkio.so.5 => /usr/lib/libkio.so.5 (0x00007f716a96d000)
linux-vdso.so.1 => (0x00007fffccffe000)
So, given these two files above, the output of the desired utility would be akin to file1:line_number, file2:line_number == matching text (just a suggestion; I really don't care what the syntax is):
foo:1, bar:3 == linux-vdso.so.1 => (0x00007fffccffe000)
On *nix, you can use comm. The answer to the question is:
comm -1 -2 file1.sorted file2.sorted
# where file1 and file2 have been sorted and saved as *.sorted
Here's the full usage of comm:
comm [-1] [-2] [-3] file1 file2
-1 Suppress the output column of lines unique to file1.
-2 Suppress the output column of lines unique to file2.
-3 Suppress the output column of lines duplicated in file1 and file2.
Also note that it is important to sort the files before using comm, as mentioned in the man pages.
I found this answer on a question listed as a duplicate. I find grep to be more administrator-friendly than comm, so if you just want the set of matching lines (useful for comparing CSV files, for instance) simply use
grep -F -x -f file1 file2
Or the simplified fgrep version:
fgrep -xf file1 file2
Plus, you can use file2* to glob and look for lines in common with multiple files, rather than just two.
Some other handy variations include
-n flag to show the line number of each matched line
-c to only count the number of lines that match
-v to display only the lines in file2 that differ (or use diff).
Using comm is faster, but that speed comes at the expense of having to sort your files first. It isn't very useful as a 'reverse diff'.
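To illustrate why the -x flag matters in the grep approach, here is a small sketch with made-up file contents: without -x, a short line in file1 acts as a substring pattern and matches more than intended.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'lib' 'libc.so.6' > "$tmp/file1"
printf '%s\n' 'libc.so.6' 'libvlc.so.2' > "$tmp/file2"

# Without -x, the pattern "lib" matches every line of file2 that contains it.
substring_matches=$(grep -F -f "$tmp/file1" "$tmp/file2")

# With -x, a pattern has to match an entire line, so only true common lines remain.
whole_line_matches=$(grep -F -x -f "$tmp/file1" "$tmp/file2")

printf '%s\n' "$whole_line_matches"
rm -rf "$tmp"
```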
It was asked here before: Unix command to find lines common in two files
You could also try with Perl (credit goes here):
perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' file1 file2
I just learned the comm command from the answers, but I wanted to add something extra: if the files are not sorted, and you don't want to touch the original files, you can pipe the output of the sort command. This leaves the original files intact. It works in Bash, but I can't say about other shells.
comm -1 -2 <(sort file1) <(sort file2)
This can be extended to compare command output, instead of files:
comm -1 -2 <(ls /dir1 | sort) <(ls /dir2 | sort)
The easiest way to do it is:
awk 'NR==FNR{a[$0]++;next} a[$0]' file1 file2
The files do not need to be sorted.
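The same awk idea can be extended to produce the file:line_number format suggested in the question. This is a sketch, not a polished tool: the array name first is my own, and if a line is repeated in the first file only its last line number is reported.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'linux-vdso.so.1 => (0x00007fffccffe000)' 'libvlc.so.2 => /usr/lib/libvlc.so.2 (0x00007f0dc4b0b000)' > "$tmp/foo"
printf '%s\n' 'libkdeui.so.5 => /usr/lib/libkdeui.so.5 (0x00007f716ae22000)' 'libkio.so.5 => /usr/lib/libkio.so.5 (0x00007f716a96d000)' 'linux-vdso.so.1 => (0x00007fffccffe000)' > "$tmp/bar"

result=$(cd "$tmp" && awk '
    # First file: remember the line number of each line, keyed by its text.
    NR == FNR { first[$0] = FNR; next }
    # Second file: report lines whose text was also seen in the first file.
    $0 in first {
        printf "%s:%d, %s:%d == %s\n", ARGV[1], first[$0], FILENAME, FNR, $0
    }' foo bar)

printf '%s\n' "$result"
rm -rf "$tmp"
```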
I think the diff utility itself, using its unified (-U) option, can be used to achieve this effect. Because the first column of diff's output marks whether a line is an addition or a deletion, we can look for the lines that haven't changed.
diff -U1000 file_1 file_2 | grep '^ '
The number 1000 is chosen arbitrarily; it only needs to be large enough that every unchanged line falls within the context of some hunk of diff output.
Here's the full, foolproof set of commands:
f1="file_1"
f2="file_2"
lc1=$(wc -l < "$f1")   # reading from stdin makes wc print only the count
lc2=$(wc -l < "$f2")
lcmax=$(( lc1 > lc2 ? lc1 : lc2 ))
diff -U$lcmax "$f1" "$f2" | grep '^ ' | less
# Alternatively, use this grep to ignore the lines starting
# with +, -, and @ signs.
# grep -vE '^[+@-]'
If you want to include the lines that are just moved around, you can sort the input before diffing, like so:
f1="file_1"
f2="file_2"
lc1=$(wc -l < "$f1")   # reading from stdin makes wc print only the count
lc2=$(wc -l < "$f2")
lcmax=$(( lc1 > lc2 ? lc1 : lc2 ))
diff -U$lcmax <(sort "$f1") <(sort "$f2") | grep '^ ' | less
On Windows, you can use a PowerShell script with Compare-Object:
Compare-Object -IncludeEqual -ExcludeDifferent -PassThru (Get-Content A.txt) (Get-Content B.txt) > MATCHING.txt | Out-Null # Find matching lines
Compare-Object:
-IncludeEqual without -ExcludeDifferent: everything
-ExcludeDifferent without -IncludeEqual: nothing
Just for information, I made a little tool for Windows doing the same thing as "grep -F -x -f file1 file2" (As I haven't found anything equivalent to this command on Windows)
Here it is:
http://www.nerdzcore.com/?page=commonlines
Usage is "CommonLines inputFile1 inputFile2 outputFile"
Source code is also available (GPL).