diff: ignore blank lines

How can I get GNU diff to ignore the blank lines in the following example?
File a (the third line is empty):
x
do

done
File b (the second line is empty):
x

do
done
Neither file has trailing whitespace on any line.
Using GNU diff 3.1 on Mac OS X I get:
diff -w a b
2d1
< do
3a3
> do
I get the same result when I add various promising-looking options:
diff --suppress-blank-empty -E -b -w -B -I '^[[:space:]]*$' --strip-trailing-cr -i a b
2d1
< do
3a3
> do
What am I missing here?
diff --version
diff (GNU diffutils) 3.1

I think the problem here is that diff sees do as being removed from the first file and added to the second, maybe because there isn't enough context around the change.
If you reverse the order of the files as arguments, diff reports that the blank line is added and removed, and will then ignore it with --ignore-blank-lines.
Looking at it as a unified diff, this is a little more clear:
$ diff test.txt test2.txt -u
--- test.txt 2015-10-20 10:50:52.585167600 -0700
+++ test2.txt 2015-10-20 10:51:01.042167600 -0700
@@ -1,4 +1,4 @@
x
-do
+do
done
prp@QW7PRP09-14 ~/temp
$ diff test2.txt test.txt -u
--- test2.txt 2015-10-20 10:51:01.042167600 -0700
+++ test.txt 2015-10-20 10:50:52.585167600 -0700
@@ -1,4 +1,4 @@
x
-
do
+
done
And the result with --ignore-blank-lines and the order switched (no output):
prp@QW7PRP09-14 ~/temp
$ diff test2.txt test.txt -B -u
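If the result must not depend on the argument order, one workaround (not part of the answer above) is to delete blank lines from both inputs before comparing them. A minimal POSIX shell sketch; the helper name diff_noblank is hypothetical, and the line numbers diff reports then refer to the stripped copies rather than to the original files:

```shell
# diff_noblank FILE1 FILE2 (hypothetical helper, not a diff option):
# compare two files after deleting lines that are empty or whitespace-only.
diff_noblank() {
    tmp1=$(mktemp) || return 2
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 2; }
    sed '/^[[:space:]]*$/d' "$1" > "$tmp1"
    sed '/^[[:space:]]*$/d' "$2" > "$tmp2"
    status=0
    diff "$tmp1" "$tmp2" || status=$?   # 0 = same, 1 = different, 2 = trouble
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

With the files a and b from the question, diff_noblank a b prints nothing and returns 0, whichever file is given first.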

How do I extract a version number when there is another number in front with sed

OpenSUSE, in their infinite wisdom, has decided that ld -v will return
GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.37.20211103-7.26
I need to extract the 2 and 37 values and throw out the rest, and this needs to work with ld that isn't so screwed up.
I have tried numerous examples found here and elsewhere for extracting the version, but they all get hung up on 15. Does anyone have any idea on how I can extract this using sed?
Currently in the Makefile I am using
LD_MAJOR_VER := $(shell $(LD) -v | perl -pe '($$_)=/([0-9]+([.][0-9]+)+)/' | cut -f1 -d. )
LD_MINOR_VER := $(shell $(LD) -v | perl -pe '($$_)=/([0-9]+([.][0-9]+)+)/' | cut -f2 -d. )
though I would much prefer to use sed like it did before SuSE screwed up our build process with their 15.3 update. Any help would be greatly appreciated.
You can use
LD_MAJOR_VER := $(shell $(LD) -v | sed -n 's/.* \([0-9]*\).*/\1/p')
LD_MINOR_VER := $(shell $(LD) -v | sed -n 's/.* [0-9]*\.\([0-9]*\).*/\1/p')
Details:
-n - an option that suppresses default line output with sed
.* \([0-9]*\).* - a regex that matches the whole string:
.* - any zero or more chars
' ' - a literal space
\([0-9]*\) - Group 1 (the parentheses are escaped to form a capturing group since this is a POSIX BRE pattern): any zero or more digits
.* - any zero or more chars
\1 - the replacement is the Group 1 value
p - only prints the result of the substitution.
In the second regex, [0-9]*\. also matches zero or more digits (the major version number) with a dot after it to skip that value.
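The two commands can be tried without a Makefile. In this sketch the wrapper names ld_major and ld_minor are made up for illustration, and the sample lines are the version strings quoted in this thread. Both patterns assume the version is the last space-separated word of the line:

```shell
# Hypothetical wrappers around the two sed commands above; each reads the
# output of `ld -v` on stdin and prints one version component.
ld_major() { sed -n 's/.* \([0-9]*\).*/\1/p'; }
ld_minor() { sed -n 's/.* [0-9]*\.\([0-9]*\).*/\1/p'; }

for line in \
    'GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.37.20211103-7.26' \
    'GNU ld (GNU Binutils for Debian) 2.35.2'
do
    printf '%s.%s\n' "$(printf '%s\n' "$line" | ld_major)" \
                     "$(printf '%s\n' "$line" | ld_minor)"
done
# prints 2.37, then 2.35
```

Because the leading .* is greedy, the space that is matched is the last one on the line, which is why the 15 in "Enterprise 15" is skipped.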
I would do it in two steps, which makes it clearer:
get the version information
get the major/minor or whatever from the version information
It would be easier to use awk to solve it, but since you said you prefer sed:
kent$ ver=$(sed 's/.*[[:space:]]//' <<< "GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.37.20211103-7.26")
kent$ echo $ver
2.37.20211103-7.26
kent$ major=$(sed 's/[.].*//' <<< $ver)
kent$ echo $major
2
kent$ minor=$(sed 's/^[^.-]*[.]//;s/[.].*//' <<< $ver)
kent$ echo $minor
37
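The same two steps can also be written with plain POSIX parameter expansion, which avoids both sed and the bash-only <<< here-strings. This is a variation on the answer, not what it proposes; the function name parse_version and the variable rest are hypothetical:

```shell
# parse_version LINE: set the variables ver, major and minor from LINE.
parse_version() {
    ver=${1##* }        # step 1: drop everything up to the last space
    major=${ver%%.*}    # step 2: keep what precedes the first dot
    rest=${ver#*.}      #         drop "major."
    minor=${rest%%.*}   #         keep what precedes the next dot
}

parse_version 'GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.37.20211103-7.26'
echo "$ver $major $minor"   # prints: 2.37.20211103-7.26 2 37
```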
If you use GNU make then its Functions for Transforming Text solve all this:
LD_VERSION := $(subst ., ,$(lastword $(shell $(LD) -v)))
LD_MAJOR_VER := $(word 1,$(LD_VERSION))
LD_MINOR_VER := $(word 2,$(LD_VERSION))
Moreover, it is fairly robust: it should work with any version string where the version is the last word and its components are separated by dots. Demo (where the version string is passed as a make variable instead of being returned by $(LD) -v):
$ cat Makefile
LD_VERSION := $(subst ., ,$(lastword $(LD_VERSION_STRING)))
LD_MAJOR_VER := $(word 1,$(LD_VERSION))
LD_MINOR_VER := $(word 2,$(LD_VERSION))
.PHONY: all
all:
	@echo $(LD_MAJOR_VER)
	@echo $(LD_MINOR_VER)
$ make LD_VERSION_STRING='blah blah blah 1.2.3.4.5.6.7'
1
2
$ make LD_VERSION_STRING='GNU ld (GNU Binutils for Debian) 2.35.2'
2
35
$ make LD_VERSION_STRING='GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.37.20211103-7.26'
2
37

How can I get the name of a symbolic link?

Output of a command looks something like this
# ls -l abc.zip
lrwxrwxrwx 1 sri dba 122 Mar 27 23:37 /a/b/c/abc.zip -> /x/y/z/abc.zip
I need to extract the whole path that comes after ->. I have used cut -f -d, but it doesn't always work; I guess the column number changes. So I need a sed equivalent of this.
What you are reading is the information about a symlink. Parsing ls is definitely a bad idea.
What about using readlink instead? It prints the value of a symbolic link:
readlink abc.zip
Or with -f for the full path:
readlink -f abc.zip
See an example:
$ touch a
$ ln -s a b # we create a symbolic link "b" to the file "a"
$ ls -l b
lrwxrwxrwx 1 me me 1 Apr 6 09:54 b -> a
$ readlink -f "b" # we check the full "destination" of the symlink "b"
/home/me/a
$ readlink "b" # we check the "destination" of the symlink "b"
a
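In a script it can be useful to check that the argument really is a symbolic link before calling readlink. A small sketch, where the helper name link_target is hypothetical:

```shell
# link_target PATH (hypothetical helper): print the target of a symbolic
# link exactly as stored, or fail if PATH is not a symlink.
link_target() {
    if [ -L "$1" ]; then
        readlink "$1"
    else
        echo "link_target: $1 is not a symbolic link" >&2
        return 1
    fi
}
```

For the link in the question, link_target abc.zip would print the stored target without any parsing of ls output.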
This might work for you (GNU sed):
sed -n 's/^l.*->\s*//p' file
Using 'cut' you can only specify single character delimiters.
For string delimiters an easy and clear option is awk. In your case try:
ls -ls abc.zip | awk -F '->' '{print $2}'
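Splitting on '->' alone leaves the space that follows the arrow at the start of $2. A small variation, shown here on the sample line from the question rather than on live ls output, uses ' -> ' (with the surrounding spaces) as the separator. It still assumes that neither file name contains ' -> ':

```shell
# Sample line copied from the question; in real use it would come from ls -l.
line='lrwxrwxrwx 1 sri dba 122 Mar 27 23:37 /a/b/c/abc.zip -> /x/y/z/abc.zip'

# A field separator longer than one character is treated as a regular
# expression by awk, so the spaces around the arrow are consumed too.
target=$(printf '%s\n' "$line" | awk -F ' -> ' '{print $2}')
echo "$target"   # prints /x/y/z/abc.zip
```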

comparing two directories with separate diff output per file

I need to see what has changed between two directories that contain different versions of a program's source code. I have found a way to get a single .diff file, but how can I obtain a separate diff for each changed file in the two directories? I need this because the "main" diff is about 6 MB and I want something handier.
I came across this problem too, so I ended up with a few lines of shell script. It takes three arguments: the source and destination directories (as used for diff) and a target folder (which should exist) for the output.
It's a bit hacky, but maybe it would be useful for someone. So use with care, especially if your paths have special characters.
#!/bin/sh
DIFFARGS="-wb"
LANG=C
TARGET=$3
SRC=`echo $1 | sed -e 's/\//\\\\\\//g'`
DST=`echo $2 | sed -e 's/\//\\\\\\//g'`
if [ ! -d "$TARGET" ]; then
echo "'$TARGET' is not a directory." >&2
exit 1
fi
diff -rqN $DIFFARGS "$1" "$2" | sed "s/Files $SRC\/\(.*\?\) and $DST\/\(.*\?\) differ/\1/" | \
while read file
do
if [ ! -d "$TARGET/`dirname \"$file\"`" ]; then
mkdir -p "$TARGET/`dirname \"$file\"`"
fi
diff $DIFFARGS -N "$1/$file" "$2/$file" > "$TARGET"/"$file.diff"
done
If you want to compare source code, it is better to commit it to a version control system such as svn.
After you have done so, diff the two committed versions and redirect the output to file.diff:
svn diff --old svn:url1 --new svn:url2 > file.diff
A bash for loop will work for you. The following will diff two directories with C source code and produce a separate diff for each file.
for FILE in $(find <FIRST_DIR> -name '*.[ch]'); do DIFF=<DIFF_DIR>/$(echo $FILE | grep -o '[-_a-zA-Z0-9.]*$').diff; diff -u $FILE <SECOND_DIR>/$FILE > $DIFF; done
Use the correct patch level for the lines starting with +++
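A variation on the loops above that also covers files present in only one of the two trees and does not depend on the wording of diff -q messages. This is only a sketch: the function name per_file_diffs is hypothetical, the output directory should lie outside both trees, and file names containing newlines are not handled:

```shell
# per_file_diffs OLD NEW OUT (hypothetical helper): write OUT/<path>.diff for
# every file that differs between the trees OLD and NEW. A file missing on
# one side is compared against /dev/null, so it shows up as added or removed.
per_file_diffs() {
    old=$1 new=$2 out=$3
    { (cd "$old" && find . -type f); (cd "$new" && find . -type f); } |
    sort -u |
    while IFS= read -r rel; do
        rel=${rel#./}
        a=$old/$rel; [ -f "$a" ] || a=/dev/null
        b=$new/$rel; [ -f "$b" ] || b=/dev/null
        if ! cmp -s "$a" "$b"; then
            mkdir -p "$out/$(dirname "$rel")"
            # diff exits with 1 when the files differ, which is expected here
            diff -u "$a" "$b" > "$out/$rel.diff" || :
        fi
    done
}
```

Called as per_file_diffs v1 v2 diffs, it mirrors the directory layout of the sources under diffs, one .diff per changed file.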

split a large text (xyz) database into x equal parts

I want to split a large text database (~10 million lines). I can use a command like
$ sed -i -e '4 s/(dB)//' -e '4 s/Best\ unit/Best_Unit/' -e '1,3 d' '/cygdrive/c/ Radio Mobile/Output/TRC_TestProcess/trc_longlands.txt'
$ split -l 1000000 /cygdrive/P/2012/Job_044_DM_Radio_Propogation/Working/FinalPropogation/TRC_Longlands/trc_longlands.txt 1
The first command cleans the database and the second splits it, but then the output files do not have the field names. How can I add the field names to each dataset and produce a list containing the original file, the new file name and the line numbers (from the original file)? This is so that it can be used in the ArcGIS model to re-join the final simplified polygon datasets.
Alternatively, and more usefully, since this needs to go into an ArcGIS model, a Python-based solution would be best. More details are in https://gis.stackexchange.com/questions/21420/large-point-to-polygon-by-buffer-join-buffer-dissolve-issues#comment29062_21420 and "Remove specific lines from a large text file in python".
So, going with a Cygwin-based solution as per the answer by icyrock.com,
we have process_text.sh
cd /cygdrive/P/2012/Job_044_DM_Radio_Propogation/Working/FinalPropogation/TRC_Longlands
mkdir processing
cp trc_longlands.txt processing/trc_longlands.txt
cd txt_processing
sed -i -e '4 s/(dB)//' -e '4 s/Best\ unit/Best_Unit/' -e '1,3 d' 'trc_longlands.txt'
split -l 1000000 trc_longlands.txt trc_longlands_
cat > a
h
1
2
3
4
5
6
7
8
9
^D
split -l 3
split -l 3 a 1
mv 1aa 21aa
for i in 1*; do head -n1 21aa|cat - $i > 2$i; done
for i in 21*; do echo ---- $i; cat $i; done
How can "TRC_Longlands" and the path be replaced with the input filename? In Python we have %path%/%name for this.
In the last line, is "do echo" necessary?
This is called from Python using
import os
os.system("process_text.bat")
where process_text.bat is basically
bash process_text.sh
I get the following error when run from dos...
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Users\georgec>bash P:\2012\Job_044_DM_Radio_Propogation\Working\FinalPropogation\TRC_Longlands\process_text.sh
'bash' is not recognized as an internal or external command,
operable program or batch file.
Also, when I run the bash command from Cygwin, I get:
georgec@ATGIS25
/cygdrive/P/2012/Job_044_DM_Radio_Propogation/Working/FinalPropogation/TRC_Longlands
$ bash process_text.sh : No such file or directory:
/cygdrive/P/2012/Job_044_DM_Radio_Propogation/Working/FinalPropogation/TRC_Longlands
cp: cannot create regular file `processing/trc_longlands.txt\r': No
such file or directory : No such file or directory: txt_processing :
No such file or directoryds.txt
But the files are created in the root directory.
Why is there a "." after the directory name? How can they be given a .txt extension?
If you want to just prepend the first line of the original file to all but the first of the splits, you can do something like:
$ cat > a
h
1
2
3
4
5
6
7
^D
$ split -l 3
$ split -l 3 a 1
$ ls
1aa 1ab 1ac a
$ mv 1aa 21aa
$ for i in 1*; do head -n1 21aa|cat - $i > 2$i; done
$ for i in 21*; do echo ---- $i; cat $i; done
---- 21aa
h
1
2
---- 21ab
h
3
4
5
---- 21ac
h
6
7
Obviously, the first file will have one data line fewer than the middle parts, and the last part might be shorter too, but if that's not a problem, this should work just fine. Of course, if your header has more lines, just change head -n1 to head -nX, X being the number of header lines.
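The steps above can be wrapped in a function that also gives each part a .txt extension and prints the list the question asks for (original file, part name, original line numbers). This is a sketch under assumptions: the name split_with_header is hypothetical, the header is exactly one line, no other files matching PREFIX plus two characters exist in the directory, and split's default two-letter suffixes are enough for the number of parts:

```shell
# split_with_header FILE LINES PREFIX (hypothetical helper): split FILE into
# PREFIXaa.txt, PREFIXab.txt, ... with at most LINES data lines each, copy
# the first line of FILE to the top of every part, and print one index line
# per part: original file, part name, range of original line numbers.
split_with_header() {
    file=$1 lines=$2 prefix=$3
    header=$(head -n 1 "$file")
    tail -n +2 "$file" | split -l "$lines" - "$prefix"
    start=2                                   # data begins on line 2 of FILE
    for part in "$prefix"??; do
        count=$(( $(wc -l < "$part") ))       # arithmetic strips any padding
        { printf '%s\n' "$header"; cat "$part"; } > "$part.txt"
        rm -f "$part"
        printf '%s\t%s\t%s-%s\n' "$file" "$part.txt" \
            "$start" "$((start + count - 1))"
        start=$((start + count))
    done
}
```

Unlike the transcript above, the header is removed before splitting, so every part holds the same number of data lines except possibly the last one. For the 8-line example file a, split_with_header a 3 part_ creates part_aa.txt, part_ab.txt and part_ac.txt and reports the ranges 2-4, 5-7 and 8-8.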
Hope this helps.

Using diff and patch to force one local code base to look like another

I've noticed this strange behavior of diff and patch when I've used them to force one code base to be identical to another. Let's say I want to update update_me to look identical to leave_unchanged. I go to update_me. I run a diff from leave_unchanged to update_me. Then I patch the diff into update_me. If there are new files in leave_unchanged, patch asks me if my patch was reversed! If I answer yes, it deletes the new files in leave_unchanged. Then, if I simply re-run the patch, it correctly patches update_me.
Why does patch try to modify both leave_unchanged and update_me?
What's the proper way to do this? I found a hacky way which is to replace all +++ lines with nonsense paths so patch can't find leave_unchanged. Then it works fine. It's such an ugly solution though.
$ mkdir copyfrom
$ mkdir copyto
$ echo "Hello world" > copyfrom/myFile.txt
$ cd copyto
$ diff -Naur . ../copyfrom > my.diff
$ less my.diff
diff -Naur ./myFile.txt ../copyfrom/myFile.txt
--- ./myFile.txt 1969-12-31 19:00:00.000000000 -0500
+++ ../copyfrom/myFile.txt 2010-03-15 17:21:22.000000000 -0400
@@ -0,0 +1 @@
+Hello world
$ patch -p0 < my.diff
The next patch would create the file ../copyfrom/myFile.txt,
which already exists! Assume -R? [n] yes
patching file ../copyfrom/myFile.txt
$ patch -p0 < my.diff
patching file ./myFile.txt
Edit
I noticed that Mercurial avoids this problem by prepending "a" and "b" directories.
$ hg diff
--- a/crowdsourcing/models.py Mon Jun 14 17:18:46 2010 -0400
+++ b/crowdsourcing/models.py Thu Jun 17 11:08:42 2010 -0400
...
I believe the answer here is to execute your diff from the parent directory, then use patch -p1 to strip the first path segment. Give -p1 explicitly: without a -p option, patch ignores the directory part of each name and uses only the base name. E.g. to use your example from above:
$ mkdir copyfrom
$ mkdir copyto
$ echo "Hello world" > copyfrom/myFile.txt
$ diff -Naur copyto copyfrom > my.diff
$ less my.diff
diff -Naur copyto/myFile.txt copyfrom/myFile.txt
--- copyto/myFile.txt 1970-01-01 12:00:00.000000000 +1200
+++ copyfrom/myFile.txt 2010-10-19 10:03:43.000000000 +1300
@@ -0,0 +1 @@
+Hello world
$ cd copyto
$ patch -p1 < ../my.diff
The only difference from your example is that I've executed the diff from the parent directory so that the directories being compared are at the same level.
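The whole procedure can be sketched as one function. The name make_like is hypothetical, both arguments are assumed to be plain sibling directories of the current directory (so every path in the diff has exactly one leading component for -p1 to strip), and only text files are handled:

```shell
# make_like FROM TO (hypothetical helper): update directory TO so that it
# matches directory FROM, leaving FROM untouched. The patch is kept as
# my.diff in the current directory.
make_like() {
    from=$1 to=$2
    status=0
    diff -Naur "$to" "$from" > my.diff || status=$?
    case $status in
        0) return 0 ;;           # trees are already identical
        1) ;;                    # differences found: apply them below
        *) return "$status" ;;   # diff itself failed
    esac
    (cd "$to" && patch -p1 < ../my.diff)
}
```

With the directories from the question, make_like copyfrom copyto creates copyto/myFile.txt and never touches copyfrom, because after -p1 both the --- and +++ names reduce to myFile.txt.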