Replace first occurrence of a pattern if not preceded with another pattern - sed

Using GNU sed, I am trying to replace the first occurrence of a pattern in a file, but I don't want to replace it if another pattern appears before the match.
For example, if the file contains a line with "bird [number]", I want to replace the number with "0", but only if the word "cat" does not appear anywhere before it on that line.
Example text:
dog cat - fish bird 123
dog fish - bird 1234567
dog - cat fish, lion bird 3456
Expected result:
dog cat - fish bird 123
dog fish - bird 0
dog - cat fish, lion bird 3456
I tried to combine the solutions from "How to use sed to replace only the first occurrence in a file?" and "Sed regex and substring negation" and came up with something like:
sed -E '0,/cat.*bird +[0-9]+/b;/(bird +)[0-9]+/ s//\10/'
where 0,/cat.*bird +[0-9]+/b;/(bird +)[0-9]+/ should match the first occurrence of (bird +)[0-9]+ if the cat.*bird +[0-9]+ pattern does not match, but I get:
dog cat - fish bird 123
dog fish - bird 0
dog - cat fish, lion bird 0
The third line is also changed. How can I prevent that? I think it is related to address ranges, but I do not understand how to negate the second part of the address range.

This might work for you (GNU sed):
sed '/\<cat\>.*\<bird\>/b;s/\<\(bird\) \+[0-9]\+/\1 0/;T;:a;n;ba' file
If a line contains the word cat before the word bird, end processing for that line.
Otherwise, try to replace the number following the word bird with zero. If the substitution fails, end processing for that line. If it succeeds, read and print all remaining lines unchanged until the end of the file.
Might also be written:
sed -E '/cat.*bird/b;/(bird +)[0-9]+/{s//\10/;:a;n;ba}' file

sed is for doing simple s/old/new replacements, that is all. For anything else just use awk, e.g. with GNU awk instead of the GNU sed you were using:
$ awk 'match($0,/(.*bird\s+)[0-9]+(.*)/,a) && (a[1] !~ /cat/) {$0=a[1] 0 a[2]} 1' file
dog cat - fish bird 123
dog fish - bird 0
dog - cat fish, lion bird 3456
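The gawk one-liner above relies on the three-argument match(), which is a GNU extension, and it rewrites every qualifying line rather than only the first one. A minimal sketch of a variant that should run on any POSIX awk and stops after the first replacement, assuming that "before" means earlier on the same line (the function name replace_first_bird and the variable names are made up for illustration):

```shell
# Hypothetical helper: replace the number after the first "bird N" that has
# no "cat" earlier on the same line; every other line passes through as-is.
replace_first_bird() {
    awk '
    !done && match($0, /bird +[0-9]+/) {
        head = substr($0, 1, RSTART - 1)       # text before the match
        if (head !~ /cat/) {
            hit = substr($0, RSTART, RLENGTH)  # e.g. "bird 1234567"
            sub(/[0-9]+$/, "0", hit)           # keep "bird" and its spacing
            $0 = head hit substr($0, RSTART + RLENGTH)
            done = 1                           # only the first occurrence
        }
    }
    { print }
    ' "$@"
}

printf '%s\n' \
    'dog cat - fish bird 123' \
    'dog fish - bird 1234567' \
    'dog - cat fish, lion bird 3456' \
    'dog bird 42' |
    replace_first_bird
```

The fourth sample line is not in the question; it is added here only to show that a later qualifying line is left alone once the first replacement has happened.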

Join certain lines with sed

I have an input which looks like this:
1
2

3
4
5

6
And I want to transform it with sed into:
12
345
6
I know it can be easily done with other tools but I want to do it specifically with sed as a learning exercise.
I have attempted this:
sed ':x ; /^ *$/{ N; s/\n// ; bx; }'
But it prints:
123456
Can someone help me fix this?
Quoting from the GNU sed manual:
A common technique to process blocks of text such as paragraphs (instead of line-by-line) is using the following construct:
sed '/./{H;$!d} ; x ; s/REGEXP/REPLACEMENT/'
The first expression, /./{H;$!d} operates on all non-empty lines, and adds the current line (in the pattern space) to the hold space. On all lines except the last, the pattern space is deleted and the cycle is restarted.
The other expressions x and s are executed only on empty lines (i.e. paragraph separators). The x command fetches the accumulated lines from the hold space back to the pattern space. The s/// command then operates on all the text in the paragraph (including the embedded newlines).
And indeed,
sed '/./{H;$!d} ; x ; s/\n//g'
does what you want.
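As a quick check of the command above, here is a small sketch that feeds it sample input. It assumes GNU sed and assumes the groups in the input are separated by single blank lines; the function name join_paragraphs is made up for illustration.

```shell
# Hypothetical wrapper around the command from the answer (GNU sed assumed).
join_paragraphs() {
    sed '/./{H;$!d} ; x ; s/\n//g' "$@"
}

# Assumed input: three groups separated by single blank lines.
printf '1\n2\n\n3\n4\n5\n\n6\n' | join_paragraphs
```

Each blank line triggers the x and s commands on the lines collected so far, and the last line triggers them for the final group because $!d does not delete it.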
FWIW here's how to really do that task in UNIX:
$ awk -v RS= -v OFS= '{$1=$1}1' file
12
345
6
The above will work on any UNIX box.
A GNU awk approach:
$ awk -F"\n" '{gsub("\n","");}1' RS='\n{2,}' file
12
345
6
Note that it will add a trailing newline (\n) after the last line.

Need perl/shell script to compare 2 files

Hi, I have two files as shown below. I need a script to compare them and find the matching lines. How can I achieve this?
file1 (a.txt):
Anirban
Ball
Cat
Dog
cow
file2 (b.txt):
I am Anirban
I am Ball
I am Cat_cat
I am Dog
I am cow
I am horse
I want output like this:
I am Anirban
I am Ball
I am Dog
I am cow
I tried grep -f b a, but it did not give exact matches.
This is one way to do it:
$ grep -wf a.txt b.txt
I am Anirban
I am Ball
I am Dog
I am cow
Your attempt did not use grep -w, which is what you need here. Also, note that you gave the files in the opposite order.
-f tells grep to read its patterns from a file, one per line.
-w matches whole words only.
Using awk
awk 'NR==FNR{a[$1];next} $NF in a' a.txt b.txt
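The awk one-liner above uses the common two-file idiom: NR==FNR is true only while the first file is being read. A commented sketch of the same logic, assuming each name in a.txt is a single word and that the name is always the last field in b.txt (the function name match_last_field and the array name are made up for illustration):

```shell
# Hypothetical helper: print lines of file 2 whose last field is listed in file 1.
match_last_field() {
    awk '
    NR == FNR { names[$1]; next }   # first file: remember each name, print nothing
    $NF in names                    # second file: print if the last field is known
    ' "$1" "$2"
}

dir=$(mktemp -d)
printf '%s\n' Anirban Ball Cat Dog cow > "$dir/a.txt"
printf '%s\n' 'I am Anirban' 'I am Ball' 'I am Cat_cat' 'I am Dog' 'I am cow' 'I am horse' > "$dir/b.txt"
match_last_field "$dir/a.txt" "$dir/b.txt"
rm -r "$dir"
```

Unlike grep -w, this compares the whole last field, so it would miss a name that appears in the middle of a line.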

Terminal command to find unique pairs where order does not matter

I have a Python script my_script.py which generates a list of tab-separated pairings between two elements, one for each line:
$ python my_script.py
cat dog
dog wolf
cat dog
pig chicken
dog cat
I am looking to pipe the output of this script into a terminal command that filters out duplicate combinations, not just duplicate permutations. For duplicate permutations, I can use something like:
$ python my_script.py | sort | uniq
cat dog
dog cat
dog wolf
pig chicken
to remove the duplicate "cat dog".
The problem with this approach is that I am left with both "cat dog" and "dog cat", which for my purposes should be treated as the same (same combination). I know I could write another very simple Python script to perform the kind of filtering I am after, but I wanted to see whether there is an even simpler terminal command that will do the equivalent.
Here's one way using awk:
... | awk -F "\t" '!a[$1,$2]++ && !a[$2,$1]++'
Results:
cat dog
dog wolf
pig chicken
Explanation:
-F "\t" # sets the field (column) separator to a single tab character
!a[$1,$2]++ # adds column one and column two to a pseudo-multidimensional
# array if they haven't already been added to the array
!a[$2,$1]++ # does the same thing, but adds the columns in the opposite
# orientation.
Putting it all together:
So for every line of input, the line is printed if and only if the first two fields (in either orientation) are not already in the array. You can read more about how to emulate a multi-dimensional array here.
Caution: the script above produces no output for lines where $1 == $2. You can test this with:
echo "dog dog" | awk '!a[$1,$2]++ && !a[$2,$1]++'|wc -l
Try this instead:
|awk '{if($1<$2)print $1,$2; else print $2,$1}'|sort|uniq
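If the original field order and the first-seen line order matter, the normalization can also be done inside awk without sort. A minimal sketch, assuming tab-separated input with exactly two fields per line (the function name unique_pairs and the array name seen are made up for illustration):

```shell
# Hypothetical helper: keep the first line seen for each unordered pair.
unique_pairs() {
    awk -F '\t' '
    {
        # Build an order-independent key; "" forces a string comparison.
        key = ($1 "" < $2 "") ? $1 FS $2 : $2 FS $1
    }
    !seen[key]++    # print only the first line for each key
    ' "$@"
}

printf 'cat\tdog\ndog\twolf\ncat\tdog\npig\tchicken\ndog\tcat\ndog\tdog\n' |
    unique_pairs
```

Because each line touches the array only once, a pair such as "dog dog" is still printed the first time it appears.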

Extract the part enclosed by a predefined multiline character sequence

I hope the AWK gurus can provide a solution to my problem.
I have a file that goes like this:
cat cat cat cat cat cat dog rat ate dog tit
dog cat dog dog dog rat dog pat ate cat dog
I have to use AWK to extract the text between the first occurring c and a d. Starting from the first c, a count should be kept of the number of c's and d's, such that when the counts match, the part between the first c and the matching d is output to a file, together with the number of the line on which the matching d occurred.
In this particular example the match occurs on the seventh dog, so the output has to be:
cat cat cat cat cat cat dog rat ate dog tit
dog cat dog dog dog rat d
The match can extend beyond two lines! The output may or may not include the c and the d. The text contains all kinds of characters, including special ones!
For anything to be printed, the counts have to match.
Thanks in advance for the replies. Suggestions are always welcome.
EDIT: Capturing the text between c and d can be sacrificed as long as the condition is met and the line number of the closing d is obtained :)
A few tips, without giving the full solution:
By default, awk considers each line as a record. The default record separator is RS="\n".
Depending on your version of awk, you may be able to set RS, the record separator, to a regex which matches either c or d. Then, for each record, you can check the RT variable (a GNU awk extension), which will contain either c or d, depending on what was actually matched. Starting from there, using a variable incremented on c and decremented on d, you will be able to find the end of the match when it reaches 0.
You can then use a variable that contains your match so far, and keep concatenating RT and the new record to it, until you're done.
If you need to know the line number of the end of the match, you can set RS to a regex that matches c or d, as before, and also \n. By maintaining another counter variable that is incremented every time RT tells you that \n has been matched, you'll have your line number.
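The counter idea from the tips above can be sketched as follows. Note that this is a swapped-in variant: instead of a regex RS and RT, which need GNU awk, it walks each line character by character so that it should run on any POSIX awk. The function name balanced_cd and the variable names are made up for illustration, and the "line N" output format is an assumption.

```shell
# Hypothetical helper: print the text from the first "c" up to the "d" that
# balances the c/d count, followed by the line number of that "d".
balanced_cd() {
    awk '
    {
        n = length($0)
        for (i = 1; i <= n; i++) {
            ch = substr($0, i, 1)
            if (ch == "c") { started = 1; depth++ }   # count every c
            else if (ch == "d" && started) depth--    # count d only after the first c
            if (started) buf = buf ch                 # collect the match so far
            if (started && depth == 0) {              # counts are equal: done
                print buf
                print "line " NR
                exit
            }
        }
        if (started) buf = buf "\n"                   # keep line breaks inside the match
    }
    ' "$@"
}

printf '%s\n' \
    'cat cat cat cat cat cat dog rat ate dog tit' \
    'dog cat dog dog dog rat dog pat ate cat dog' |
    balanced_cd
```

If the counts never balance, nothing is printed, which matches the requirement that output only happens on a match.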
Here's a sed solution just for fun:
sed -rne ':r;$!{N;br};s/^[^c]*(.*d)[^d]*$/\1/;:a;h;s/[^cd]//g;' \
-e ':s;s/d(.*)c/c\1d/;ts;s/cd/c\nd/;T;y/c/d/;/^(d+)\n\1$/{g;i -------' \
-e 'p};g;s/d[^d]*d$/d/;ta'
This prints all satisfying sequences from longest to shortest.

Printing text between regexps

I tried the '/pat1/,/pat2/p', but I want to print only the text between the patterns, not the whole line. How do I do that?
A pattern range is for multiline patterns. This is how you'd do that:
sed -n '/pat1/,/pat2/{/pat1\|pat2/!p}' inputfile
-n - don't print by default
/pat1/,/pat2/ - within the two patterns inclusive
/pat1\|pat2/!p - print everything that's not one of the patterns
What you may be asking for is what's between two patterns on the same line. One of the other answers will do that.
Edit:
A couple of examples:
$ cat file1
aaaa bbbb cccc
123 start 456
this is what
I want
789 end 000
xxxx yyyy zzzz
$ sed -n '/start/,/end/{/start\|end/!p}' file1
this is what
I want
You can shorten it by telling sed to use the most recent pattern again (//):
$ sed -n '/.*start.*/,/^[0-9]\{3\} end 0*$/{//!p}' file1
this is what
I want
As you can see, I didn't have to duplicate the long, complicated regex in the second part of the command.
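For comparison, the same "lines strictly between two markers" selection can be sketched in awk with a flag variable. This is an alternative technique, not the sed range shown above; the function name between_lines and the variable inside are made up for illustration.

```shell
# Hypothetical helper: print the lines strictly between a "start" line
# and the next "end" line.
between_lines() {
    awk '
    /end/   { inside = 0 }   # closing marker: switch off before printing
    inside                   # print lines while the flag is set
    /start/ { inside = 1 }   # opening marker: switch on for the next line
    ' "$@"
}

printf '%s\n' \
    'aaaa bbbb cccc' \
    '123 start 456' \
    'this is what' \
    'I want' \
    '789 end 000' \
    'xxxx yyyy zzzz' |
    between_lines
```

The order of the three rules is what excludes the marker lines themselves: the flag is cleared before the print rule and set after it.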
sed -r 's/pat1(.*)pat2/\1/g' somefile.txt
I don't know the kind of pattern you used, but I think it is also possible with regular expressions. With -n and the p flag, only lines that contain both patterns are printed:
sed -rn 's/^(.*)pat1(.*)pat2(.*)$/\2/p' myfile
You can use awk.
$ cat file
other TEXT
pat1 text i want pat2
pat1 TEXT I
WANT
pat2
other text
$ awk -vRS="pat2" 'RT{gsub(/.*pat1/,"");print}' file
text i want
TEXT I
WANT
This solution requires GNU awk (RT is a gawk extension) and also works for patterns that span multiple lines.