diff: ignoring line position when the same text appears on different line numbers

I use diff --ignore-all-space
to ignore whitespace when I run diff file1 file2.
What do I need to add to also ignore line position (the text in file1 and file2 is the same but appears on different line numbers)?
file1 and file2 actually contain the same text, but the position of the text in file1 differs from file2.
for example
diff --ignore-all-space
391a376
> AAAAAAAA
397d381
< AAAAAAAA
423a408
>
Lidia

This is not related to ignoring spaces. If a line sits at a different position in file2 than in file1, diff reports it as moved (one deletion plus one addition), even though the line itself is identical. That is the result you're seeing, so diff works as designed. If you really don't care about the order of the lines in your files (which I doubt), you can sort them (with the sort command) before diffing them.
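A minimal sketch of the sort-then-diff idea, assuming line order really does not matter. The sample file contents and the temporary directory are made up for illustration; -w is the short form of --ignore-all-space.

```shell
# Hypothetical sample files: same lines, different order.
tmpdir=$(mktemp -d)
printf 'AAAAAAAA\nBBBB\nCCCC\n' > "$tmpdir/file1"
printf 'BBBB\nCCCC\nAAAAAAAA\n' > "$tmpdir/file2"

# Sort copies of both files so that line position no longer matters.
sort "$tmpdir/file1" > "$tmpdir/file1.sorted"
sort "$tmpdir/file2" > "$tmpdir/file2.sorted"

# diff exits with status 0 when it finds no differences.
if diff -w "$tmpdir/file1.sorted" "$tmpdir/file2.sorted" > /dev/null; then
    result="same lines"
else
    result="different lines"
fi
echo "$result"

rm -rf "$tmpdir"
```

Note that sorting also hides real reordering problems, so this only makes sense when the files are unordered collections of lines.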

Related

sed Remove lines between two patterns (excluding end pattern)

given text like
_adsp TXT "dkim=all"
VVKMU6SE3C2MF88BG4DJQAECMR9SIIF0 NSEC3 1 1 10 C4F407437E8EA4C5 (
175MCHR31K25LP89OVJI5LCE0JA2N2AP
A MX TXT AAAA RRSIG SPF )
RRSIG NSEC3 7 3 1800 (
20200429171433 20200330161758 11672 example.com
H3l26qmtkuiFZCeSYCCAo5krFE3gjM0I8UeQ9jhj3STy
X6fM0YizCHEuv4VZynOJGJc1XJnHRHI+p7yLlZ+OVseK
UfIkPVP+VOmlerwozEpM+Tnt8evwnMTDbcn0zxf/6YJx
kZeO2AszWkRZ0bctqW7INYo8YuyyuTSxSr8se27fiaPA
4GXQymepGgv/JGqargzHbyhhkDhENmNo7Qwkjl+a0kI4
6qqKcEWCsDvnlYUQiDFzc5oRs2j7TT9uybTfwUDQxV+t
MQFMhzu7LNbRIUuOb16sAEGSdl9mWQ4sZRJ9wuXJWbso
G+3tY0pBbq4ffScz/JKcrJ0qAuBF1F5JcQ== )
$TTL 1800
I want to get rid of the part from the line matching "(not beginning with whitespace) NSEC3 " up to, but not including, the next line that does not begin with a whitespace character,
resulting
_adsp TXT "dkim=all"
$TTL 1800
in the example.
I tried sed '/^[^\s].*\sNSEC3\s/,/^[^\s]/d;' filename but that doesn't work as expected; on the example it produces
_adsp TXT "dkim=all"
H3l26qmtkuiFZCeSYCCAo5krFE3gjM0I8UeQ9jhj3STy
X6fM0YizCHEuv4VZynOJGJc1XJnHRHI+p7yLlZ+OVseK
UfIkPVP+VOmlerwozEpM+Tnt8evwnMTDbcn0zxf/6YJx
kZeO2AszWkRZ0bctqW7INYo8YuyyuTSxSr8se27fiaPA
4GXQymepGgv/JGqargzHbyhhkDhENmNo7Qwkjl+a0kI4
6qqKcEWCsDvnlYUQiDFzc5oRs2j7TT9uybTfwUDQxV+t
MQFMhzu7LNbRIUuOb16sAEGSdl9mWQ4sZRJ9wuXJWbso
G+3tY0pBbq4ffScz/JKcrJ0qAuBF1F5JcQ== )
$TTL 1800
So printing resumes way too early?
What am I missing?
Thank you
P.S.:
As you may see, what I want to do is remove the DNSSEC parts from a named zone. I haven't found any other way to remove RRSIG and NSEC3 entries yet. If someone has an idea, I would appreciate that too.
[\s] matches a literal \ or s character. It doesn't match whitespace.
Even if [\s] worked as you expect, the range end /^[^\s]/ would also delete the closing line, i.e. the first line that has no leading whitespace. I think you have to loop manually.
On the example you've given, the following seems to work:
sed -n '/^[^ \t].*\sNSEC3\s/{ :a; n; /^[^ \t]/bb; ba}; :b; p'
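The point about [\s] can be checked directly. This is a small sketch with a made-up sample string; it compares [\s] with the POSIX class [[:space:]].

```shell
# Sample text containing a space, an 's' and a backslash.
sample='a s\b'

# Inside a bracket expression, \s is just the two characters '\' and 's'.
bracket_s=$(printf '%s\n' "$sample" | sed 's/[\s]/X/g')

# [[:space:]] is the portable way to match whitespace inside brackets.
posix_space=$(printf '%s\n' "$sample" | sed 's/[[:space:]]/X/g')

printf '%s\n' "$bracket_s" "$posix_space"
# prints:
# a XXb
# aXs\b
```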
This might work for you (GNU sed):
sed -n '/^\S.*NSEC/{:a;n;/^\S/!ba};p' file
Turn off implicit printing by using the -n option.
Throw away the line that starts with a non-space character and contains the string NSEC, together with all following lines that start with whitespace.
Print all other lines.
Alternative:
sed '/^\S.*NSEC/,/^\S/{/^\S.*NSEC\|^\s/d}' file
Yet another alternative:
sed '/^\S.*NSEC/{:a;N;/\n\S/!ba;s/.*\n//}' file
And another:
sed '/^\S.*NSEC/{:a;N;/\n\S/!s/\n//;ta;D}' file
N.B. The first two solutions will delete lines regardless of a line delimiting the end of the deletions. Whereas the last two solutions will only delete lines if there is a line delimiting the end of the deletions.
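As a quick check, the first GNU sed solution can be run on a shortened zone fragment. The fragment below is invented for illustration and assumes that the continuation lines are indented, as in a real zone file; \S requires GNU sed.

```shell
tmpfile=$(mktemp)

# Shortened, hypothetical zone fragment; continuation lines start with spaces.
cat > "$tmpfile" <<'EOF'
_adsp TXT "dkim=all"
VVKMU6SE NSEC3 1 1 10 C4F407437E8EA4C5 (
    175MCHR31K25LP89
    A MX TXT AAAA RRSIG SPF )
    RRSIG NSEC3 7 3 1800 (
    H3l26qmtkuiFZCeS== )
$TTL 1800
EOF

# Skip the NSEC record and its indented continuation lines, print the rest.
result=$(sed -n '/^\S.*NSEC/{:a;n;/^\S/!ba};p' "$tmpfile")
printf '%s\n' "$result"

rm -f "$tmpfile"
```

On this fragment only the _adsp line and the $TTL line remain.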

Improving sed program - conditions

I use this code according to this question.
$ names=(file1.txt file2.txt file3.txt) # Declare array
$ printf 's/%s/a-&/g\n' "${names[@]%.txt}" # Generate sed replacement script
s/file1/a-&/g
s/file2/a-&/g
s/file3/a-&/g
$ sed -f <(printf 's/%s/a-&/g\n' "${names[@]%.txt}") f.txt
TEXT
\connect{a-file1}
\begin{a-file2}
\connect{a-file3}
TEXT
75
How can I add conditions that solve the following problem?
names=(file1.txt file2.txt file3file2.txt)
I mean that a word in one file name is repeated as part of another file name, so a- is added more than once.
I tried
sed -f <(printf 's/{%s}/{s-&}/g\n' "${files[@]%.tex}")
but the result is
\input{a-{file1}}
I need to find {%s} and place a- between { and %s.
It's not clear from the question how to resolve conflicting input. In particular, the code will replace any instance of file1 with a-file1, even things like 'foofile1'.
On the surface, the goal seems to be to change whole tokens only (e.g., foofile1 should not be affected by the file1 substitution). This could be achieved by adding a word-boundary assertion (\b, a GNU sed extension) before and after the file name. This prevents the pattern from matching inside other, longer file names.
printf 's/\\b%s\\b/a-&/g\n' "${names[@]%.txt}"
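A sketch of the word-boundary approach on the problematic name list. To stay independent of bash arrays and process substitution it uses a plain loop and a temporary script file, which is a substitution for the original form; the sample input is made up, and \b requires GNU sed.

```shell
tmpdir=$(mktemp -d)

# Hypothetical input that contains both file2 and file3file2.
cat > "$tmpdir/f.txt" <<'EOF'
\connect{file1}
\begin{file2}
\connect{file3file2}
EOF

# Generate one substitution per base name, anchored with \b on both sides.
for name in file1.txt file2.txt file3file2.txt; do
    printf 's/\\b%s\\b/a-&/g\n' "${name%.txt}"
done > "$tmpdir/rename.sed"

result=$(sed -f "$tmpdir/rename.sed" "$tmpdir/f.txt")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Each name receives the a- prefix exactly once, because file2 no longer matches inside file3file2.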
Since this explanation is too long for a comment, I am adding it as an answer here. I am not sure whether my previous answer was clear, but it handles this case: it replaces exact file names only and NOT combinations of file names.
Let's say the following is the array value and the input file:
names=(file1.txt file2.txt file3file2.txt)
echo "${names[*]}"
file1.txt file2.txt file3file2.txt
cat file1
TEXT
\connect{file1}
\begin{file2}
\connect{file3}
TEXT
75
Now when we run the following code:
awk -v arr="${names[*]}" '
BEGIN{
FS=OFS="{"
num=split(arr,array," ")
for(i=1;i<=num;i++){
sub(/\.txt/,"",array[i])
array1[array[i]"}"]
}
}
$2 in array1{
$2="a-"$2
}
1
' file1
The output will be as follows. You can see that file3 is NOT replaced, since it is NOT present in the array.
TEXT
\connect{a-file1}
\begin{a-file2}
\connect{file3}
TEXT
75

How can I ignore line endings when comparing files?

I am comparing two text files and I get the following result:
diff file1 file2 | grep 12345678
> 12345678
< 12345678
As you can see, the same string exists in both files, and both files were sorted with sort.
The line endings must be getting in the way here (Windows vs. Unix).
Is there a way to get diff to ignore line endings on Unix?
Use the --strip-trailing-cr option:
diff --strip-trailing-cr file1 file2
The option causes diff to strip the trailing carriage return character before comparing the files.
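A small sketch that shows the effect, assuming GNU diff (which provides --strip-trailing-cr). The two files are created here only for the demonstration: one with Windows (CRLF) and one with Unix (LF) line endings.

```shell
tmpdir=$(mktemp -d)

# Same text, different line endings.
printf '12345678\r\n' > "$tmpdir/file1"   # CRLF
printf '12345678\n'   > "$tmpdir/file2"   # LF

# A plain diff sees the carriage return as part of the line.
if diff "$tmpdir/file1" "$tmpdir/file2" > /dev/null; then
    plain="same"
else
    plain="differ"
fi

# With --strip-trailing-cr the trailing CR is ignored.
if diff --strip-trailing-cr "$tmpdir/file1" "$tmpdir/file2" > /dev/null; then
    stripped="same"
else
    stripped="differ"
fi

echo "plain: $plain, with --strip-trailing-cr: $stripped"

rm -rf "$tmpdir"
```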

Sed Range of Patterns only if it contains Pattern

What I would like to know is how to print a range of patterns, but only if it contains a specific pattern.
For example:
I have a file that contains:
HEADER 1
AAA
BBBBBBB
MSG:testing
CCCCCC
DDD
PAGE 1
HEADER 2
EEE
FFFFFF
GGG
HHH
PAGE 2
I want to print from any HEADER to any PAGE, but only if the range contains the pattern MSG.
The result I want is to print only this section:
HEADER 1
AAA
BBBBBBB
MSG:testing
CCCCCC
DDD
PAGE 1
What I have so far is: sed -n -e '/HEADER /,/PAGE /p' inputfile.txt > outputfile.txt
I'm open to any suggestions including the usage of Awk or Grep.
Thanks in advance.
This
sed '/HEADER/ { :a; N; /PAGE/!ba; /MSG/!d }' inputfile.txt
works as follows:
/HEADER/ { # in a line that contains HEADER
:a # jump label for looping
N # fetch next line, append to pattern space
/PAGE/!ba # if the pattern space doesn't contain PAGE (this
# is the case if the new line doesn't), go back to :a
/MSG/!d # if the block that's now in the pattern space doesn't
# contain MSG, discard it
}
This removes offending ranges from the file and leaves everything else intact. To print only matching ranges and discard garbage data between ranges,
sed -n '/^HEADER/ { :a; N; /PAGE/!ba; /MSG/p }' inputfile.txt
This removes the default print action with -n and uses /MSG/p to explicitly print matching ranges instead of deleting non-matching ranges.
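The print-only variant can be tried on a shortened version of the sample file. This sketch assumes GNU sed (for the one-line form with braces and semicolons) and writes the label as ":a;" so that it is terminated explicitly; the temporary file exists only for the demonstration.

```shell
tmpfile=$(mktemp)

# Shortened sample: only the first block contains MSG.
cat > "$tmpfile" <<'EOF'
HEADER 1
AAA
MSG:testing
PAGE 1
HEADER 2
EEE
PAGE 2
EOF

# Collect each HEADER..PAGE block in the pattern space, print it if it has MSG.
result=$(sed -n '/^HEADER/ { :a; N; /PAGE/!ba; /MSG/p }' "$tmpfile")
printf '%s\n' "$result"

rm -f "$tmpfile"
```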
If your data blocks are separated by blank lines, you can use this awk command:
awk -v RS= '/MSG/' file
HEADER 1
AAA
BBBBBBB
MSG:testing
CCCCCC
DDD
PAGE 1
By setting RS to the empty string, awk works in paragraph mode (records are separated by blank lines), and the pattern then just selects the matching block.
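Paragraph mode only helps when the blocks really are separated by blank lines. A minimal sketch with invented input that has one blank line between the two blocks:

```shell
# Two blocks separated by an empty line; only the first contains MSG.
result=$(printf 'HEADER 1\nMSG:testing\nPAGE 1\n\nHEADER 2\nEEE\nPAGE 2\n' |
    awk -v RS= '/MSG/')
printf '%s\n' "$result"
```

Without the blank line, the whole input would be a single record and both blocks would be printed.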
This uses HEADER as the record separator (a multi-character RS needs an awk that supports it, such as GNU awk).
awk -v RS="HEADER" '/MSG/ {print RS$0}' file
HEADER 1
AAA
BBBBBBB
MSG:testing
CCCCCC
DDD
PAGE 1
sed -n '/^HEADER/,/^PAGE /!d;H;/^HEADER/h;/^PAGE / {x; /\nMSG/ p;}' YourFile
Assuming every section starts with HEADER and ends with PAGE (on different lines).
Explanation:
Don't print anything unless a print is explicitly requested (-n)
If the line is not between HEADER and PAGE (inclusive), delete it
Append the line to the hold buffer
If the line is HEADER, copy it to the hold buffer (overwriting the previous content)
If the line is PAGE:
swap the hold buffer into the working buffer
print it if MSG is inside
Start the next cycle
This might work for you (GNU sed):
sed '/HEADER/!{H;$!d};x;/MSG/!d' file
If the line does not contain HEADER append it to the hold space and if it is not the last line delete it. This means any other line (lines that contain HEADER or last line) will swap with the hold space and if the pattern space (a multiline previously the hold space) does not contain MSG delete it. Lines containing MSG will be printed.

Comparing contents of files

I have two tab delimited files of 15 columns each and n and m number of rows.
File 1 has more rows than file2; say file 1 has 15 rows
that are not present in file2.
How can I find these rows?
Thank you
The comm command will find lines that are unique to either file, or common to both.
comm -23 <( sort file1 ) <( sort file2 )
will print the lines that are only in file1 (lines only in file2 and common lines are suppressed by the -2 and -3 options). The files must be sorted; it doesn't really matter how they are sorted, as long as they are both sorted on the same key and in the same way.
does this help?
awk 'NR==FNR{a[$0];next}!($0 in a)' file2 file1
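Both suggestions can be compared on a tiny example. The tab-delimited rows are made up, and sorted temporary files are used instead of process substitution so that the sketch also runs in a plain POSIX shell.

```shell
tmpdir=$(mktemp -d)

# Hypothetical tab-delimited files; file1 has two rows that file2 lacks.
printf 'a\t1\nb\t2\nc\t3\n' > "$tmpdir/file1"
printf 'b\t2\n'             > "$tmpdir/file2"

# comm needs sorted input; -23 keeps only the lines unique to the first file.
sort "$tmpdir/file1" > "$tmpdir/file1.sorted"
sort "$tmpdir/file2" > "$tmpdir/file2.sorted"
comm_result=$(comm -23 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")

# awk remembers every line of file2, then prints file1 lines not seen there.
awk_result=$(awk 'NR==FNR{a[$0];next}!($0 in a)' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$comm_result"

rm -rf "$tmpdir"
```

The awk version keeps the original order of file1 and needs no sorting, while comm reports the rows in sorted order.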
The join command may help, particularly the -a option:
-a FILENUM
print unpairable lines coming from file FILENUM, where FILENUM is 1 or 2, corresponding to FILE1 or FILE2
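A sketch of the join approach, under two assumptions that are not in the answer above: rows are matched on the first column only (not on the whole line), and the related POSIX option -v is used, which prints only the unpairable lines, whereas -a prints them in addition to the joined ones. The sample data is invented and already sorted on the join field.

```shell
tmpdir=$(mktemp -d)
tab=$(printf '\t')

# Hypothetical tab-delimited files, sorted on the first column.
printf 'a\t1\nb\t2\nc\t3\n' > "$tmpdir/file1"
printf 'b\t2\n'             > "$tmpdir/file2"

# -t sets the field delimiter; -v 1 prints lines of file1 with no match in file2.
result=$(join -t "$tab" -v 1 "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```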