Delete a matching pattern with sed

I'm trying to delete all occurrences of a string from my XML file. The pattern I would like to delete looks something like this:
& lt;foo_bar>300</foo_bar& gt;
I'm not familiar with sed, but I know it can do this. I tried something like:
sed 's^&lt[foo_bar]>$g' myfile.xml
or
sed 's/^&lt[foo_bar]>$//' myfile.xml
Both failed with an error message. Could you please help me figure this out? The OS is Solaris 10, so the installed sed is most likely the standard one, not GNU sed. Please ignore the space after the & sign in the pattern above; there is no space in the actual text.
Thanks

For a start, the way you are using the character class [foo_bar] is wrong: [foo_bar] matches exactly one character, any one of f, o, b, a, r or _. You also make no attempt to match the /. The first expression lacks proper regex delimiters: sed takes ^ as the delimiter, but the command is then missing the remaining delimiters, as in s^find^replace^g.
This seems to work:
sed 's!<foo_bar>[^&]*</foo_bar>!!g' input

This might work for you (GNU sed):
sed -r 's/&\s*lt;(foo_bar)>[0-9]+<\/\1&\s*gt;//g' file
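For the Solaris sed mentioned in the question, here is a minimal sketch that sticks to POSIX basic regular expressions (no -r, no \s). It assumes the text is exactly &lt;foo_bar>...</foo_bar&gt; with no space after the &, and that the element body contains no <; the function name strip_foo_bar is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: delete every &lt;foo_bar>...</foo_bar&gt; element.
# '|' is the s delimiter, so the '/' in the closing tag needs no escape.
# '&' is only special in the replacement, so it is literal in the pattern.
# Assumption: the body between the tags contains no '<' (e.g. 300).
strip_foo_bar() {
  sed 's|&lt;foo_bar>[^<]*</foo_bar&gt;||g' "$@"
}

printf '%s\n' 'before&lt;foo_bar>300</foo_bar&gt;after' | strip_foo_bar
# prints: beforeafter
```

Usage: strip_foo_bar myfile.xml > cleaned.xml writes the result to a new file, which avoids relying on an in-place -i option the installed sed may not have.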

Related

Using sed to convert singular/plural words into uppercase

Using one sed command, I'm trying to convert all occurrences of test and tests found in a .txt file into all caps. I also want to print only the converted lines, so I'm using -n. I've been playing around with it for over an hour. The problem is that I'm able to convert one or the other (either test or tests) but not both.
Any help would be so greatly appreciated. Thank you!
Use this (the T command is a GNU sed extension):
sed -n 's/tests/TESTS/g; s/test/TEST/g; T; p' input.txt
The semicolons let you execute multiple commands.
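If T is not available, a sketch of the same idea for a POSIX sed wraps the substitutions in a block that runs only on lines containing test, and prints them there. Like the answer above, it ignores word boundaries, so testing would become TESTing; the function name upcase_tests is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only lines containing "test",
# with "tests" -> "TESTS" and "test" -> "TEST".
# -n suppresses automatic printing; the block's p prints converted lines.
# "tests" is handled first so its trailing s is upper-cased as well.
upcase_tests() {
  sed -n '/test/{
s/tests/TESTS/g
s/test/TEST/g
p
}' "$@"
}

printf 'a test\nnothing here\nthree tests\n' | upcase_tests
# prints:
# a TEST
# three TESTS
```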
This might work for you (GNU sed):
sed 's/\<tests\?\>/\U&/gp;d' file
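This uppercases whole words (\<...\>) made of test followed by an optional s (s\?). The p flag prints each changed line, and d then deletes every line, so unchanged lines are never printed.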
Sorry for the late response, but here is hopefully an easy-to-follow version with basic regex (no extended regex):
sed 's:\<test\(s*\)\>:TEST\1:g' < inputFile.txt > outputFile.txt && grep -n TEST outputFile.txt
Explanation:
: delimiter (instead of usual /)
\<test\> matches test only as a whole word: the character before the t must not be a letter, digit or underscore (or the word must start the line), and the same applies to the character after the word.
\(\) remembers what is inside the parentheses.
s* matches zero or more s's.
\1 inserts the first remembered match (i.e. whatever s's were matched). These s's are copied unchanged, so tests becomes TESTs rather than TESTS.
The rest is hopefully clear. Otherwise leave a comment.

Sed replace error

I have a pattern I am trying to match:
<x>anything</x>
I am trying to replace 'anything' (which can be any text, i.e. .*, not the literal word anything) with 'something', so that every occurrence would become:
<x>something</x>
I am trying to use the following sed command:
sed "s/<x>.*</x>/<x>something</x>/g" file
I am getting the following error:
sed: -e expression #1, char 19: unknown option to `s'
Can someone point me in the right direction?
This might work for you (GNU sed):
sed -r 's/(<x>)[^<]*/\1something/g' file
This replaces <x> followed by any run of characters other than < with <x>something, for every occurrence on the line.
N.B. .* is greedy and may well swallow up further tags on the same line.
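To see the greediness problem concretely, here is a small sketch with two <x> elements on one line; greedy and bounded are hypothetical helper names, and both use plain POSIX sed with escaped slashes.

```shell
#!/bin/sh
# .* is greedy: it runs to the LAST </x> on the line,
# so both elements collapse into one replacement.
greedy()  { sed 's/<x>.*<\/x>/<x>something<\/x>/g'; }
# [^<]* stops at the next '<', so each element is replaced on its own.
bounded() { sed 's/<x>[^<]*<\/x>/<x>something<\/x>/g'; }

line='<x>one</x> <x>two</x>'
printf '%s\n' "$line" | greedy    # prints: <x>something</x>
printf '%s\n' "$line" | bounded   # prints: <x>something</x> <x>something</x>
```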
The slashes in the closing XML tags are confusing it: / is also the delimiter of the s command, so the first unescaped / in </x> ends the pattern early. Try escaping them like this:
sed "s/<x>.*<\/x>/<x>something<\/x>/g" file
You can apparently also use an equals sign as the delimiter (s=...=...=g), which I'd never seen before. I'll be changing a bunch of scripts when I get to work!
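Any character other than backslash or newline can follow s as the delimiter, so picking one that does not occur in the text avoids escaping the slashes altogether. A minimal sketch with | and with =, using hypothetical function names:

```shell
#!/bin/sh
# Same substitution as the escaped version; '/' is no longer special
# because the delimiter is '|' or '='.
with_pipe()   { sed 's|<x>.*</x>|<x>something</x>|g'; }
with_equals() { sed 's=<x>.*</x>=<x>something</x>=g'; }

printf '%s\n' '<x>anything</x>' | with_pipe     # prints: <x>something</x>
printf '%s\n' '<x>anything</x>' | with_equals   # prints: <x>something</x>
```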

One-liners to remove lines in which a specific character appears more than x times

I think the title says it all: I'm looking for a one-liner to remove lines of a file in which a specific character, say /, appears more than x times (5, for instance).
Start:
/Bo/byl/apointe
S/ta/ck/ov/er/flo/w
M/oon/
Expected result:
/Bo/byl/apointe
M/oon/
Thank you for your suggestions!
You can use awk's gsub function. gsub returns the number of substitutions it made, so you can use it to count the occurrences of a particular character; replacing each / with & (the matched text itself) leaves the line unchanged.
awk 'gsub(/\//,"&")<5' file
Updated based on Ed Morton's suggestion.
This might work for you (GNU sed):
sed 's|/|&|5;T;d' file
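The T command above is GNU-only. A sketch of an equivalent test for a POSIX sed uses an interval expression: \(.*\/\)\{5\} matches only when the line contains at least five slashes, the same cutoff as s|/|&|5;T;d. The function name drop_many_slashes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: delete lines containing five or more '/'.
# \(.*\/\) is "anything, then a slash"; \{5\} requires it five times.
drop_many_slashes() { sed '/\(.*\/\)\{5\}/d' "$@"; }

printf '%s\n' /Bo/byl/apointe S/ta/ck/ov/er/flo/w M/oon/ | drop_many_slashes
# prints:
# /Bo/byl/apointe
# M/oon/
```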
All you need is:
awk -F/ 'NF<6' file
Look:
$ cat file
/Bo/byl/apointe
S/ta/ck/ov/er/flo/w
M/oon/
$ awk -F/ 'NF<6' file
/Bo/byl/apointe
M/oon/
I believe sed would be sufficient here. You'll want to look into the /regex/d command and supply the correct condition. I'm going to try something and will update when I have better ideas; you should too :)
Once you find it, sed -i '/{blah}/d' file will be enough to change the file in place (-i is a GNU sed option), but you might want to run it without -i and pipe the output through less first to confirm it's doing what you think it's doing.
This would do it:
sed -r '/(\/.*){5}\//d' file

Replace 3 lines with another line SED Syntax

This is a simple question; I'm not sure if I'm able to do this with sed/awk.
How can I make sed search for these 3 lines and replace with a line with a determined string?
<Blarg>
<Bllarg>
<Blllarg>
replace with
<test>
I tried sed "s/<Blarg>\n<Bllarg>\n<Blllarg>/<test>/g", but it just doesn't seem to find these lines. Probably something to do with my line-break character (\n)? Am I missing something?
Because sed usually handles only one line at a time, your pattern will never match. Try this:
sed '1N;$!N;s/<Blarg>\n<Bllarg>\n<Blllarg>/<test>/;P;D' filename
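The one-liner keeps a sliding window of up to three lines in the pattern space. Here is the same script laid out one command per line inside a hypothetical wrapper function, with shell comments describing each step; the sed commands themselves are unchanged.

```shell
#!/bin/sh
# 1N    on line 1, append line 2 so the window starts with two lines
# $!N   unless at the last line, append the next line (window of three)
# s///  try the three-line match; \n matches the embedded newlines
# P     print the window up to its first newline
# D     delete that first line; if anything remains, restart the cycle
#       on the remainder without reading a new input line
replace_three() {
  sed '1N
$!N
s/<Blarg>\n<Bllarg>\n<Blllarg>/<test>/
P
D' "$@"
}

printf 'a\n<Blarg>\n<Bllarg>\n<Blllarg>\nb\n' | replace_three
# prints:
# a
# <test>
# b
```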
This might work for you:
sed '/<Blarg>/ {N;N;s/<Blarg>\n<Bllarg>\n<Blllarg>/<test>/;}' filename
It works as follows:
Scan the file until a line containing <Blarg> is found
Then append the two following lines to the current pattern space using N;N;
Check if the current pattern space matches <Blarg>\n<Bllarg>\n<Blllarg>
If so, then substitute it with <test>
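You can use a range address built from regular expressions together with the c command, which does exactly what you are asking for (the one-line c<test> form is a GNU sed extension). Note that the range covers every line from <Blarg> to the next <Blllarg>, not necessarily just three consecutive lines:
sed '/<Blarg>/,/<Blllarg>/c<test>' filename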
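POSIX sed expects the text for c on the line after c\. A sketch of the same range replacement in that portable form (the wrapper name replace_range is hypothetical); with a range, the text is printed once, at the end of the range.

```shell
#!/bin/sh
# Hypothetical helper: replace the lines from <Blarg> through the next
# <Blllarg> with a single <test> line. The replacement text sits on
# its own line after "c\", which is the portable POSIX syntax.
replace_range() {
  sed '/<Blarg>/,/<Blllarg>/c\
<test>' "$@"
}

printf 'a\n<Blarg>\n<Bllarg>\n<Blllarg>\nb\n' | replace_range
# prints:
# a
# <test>
# b
```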

capturing groups in sed

I have many lines of the form
ko04062 ko:CXCR3
ko04062 ko:CX3CR1
ko04062 ko:CCL3
ko04062 ko:CCL5
ko04080 ko:GZMA
and would dearly like to get rid of the ko: bit of the right-hand column. I'm trying to use sed, as follows:
echo "ko05414 ko:ITGA4" | sed 's/\(^ko\d{5}\)\tko:\(.*$\)/\1\2/'
which simply outputs the original string I echo'd. I'm very new to command line scripting, sed, pipes etc, so please don't be too angry if/when I'm doing something extremely dumb.
The main thing that is confusing me is that the same thing happens if I reverse the \1\2 bit to read \2\1 or just use one group. This, I guess, implies that I'm missing something about the mechanics of piping the output of echo into sed, or that my regexp is wrong or that I'm using sed wrong or that sed isn't printing the results of the substitution.
Any help would be greatly appreciated!
sed is outputting its input because the substitution isn't matching. Since you're probably using GNU sed, try this:
echo "ko05414 ko:ITGA4" | sed 's/\(^ko[0-9]\{5\}\)\tko:\(.*$\)/\1\2/'
\d -> [0-9] since GNU sed doesn't recognize \d
{} -> \{\} since GNU sed by default uses basic regular expressions.
This should do it. You could also skip the last group and just use \1, but since you're learning sed and regex, this is good practice. I wanted to use a non-capturing group (?:...) for the middle part, but I could not get that to work with sed: POSIX basic and extended regular expressions have no non-capturing groups, so sed doesn't support them.
sed --posix 's/\(^ko[0-9]\{5\}\)\( ko:\)\(.*$\)/\1 \3/g' file > result
And of course you can simply use
sed --posix 's/ko://'
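Pulling the fixes together, here is a sketch that keeps the capturing-group approach but uses only POSIX basic regular expressions, so it depends on neither GNU's \t nor --posix. It assumes the columns are separated by a tab or by spaces, which [[:space:]]\{1,\} covers; strip_ko is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: drop "ko:" from the start of the second column.
#   [0-9]\{5\}         replaces \d{5} (no \d in sed; braces escaped in BRE)
#   [[:space:]]\{1,\}  matches the separator, whether tab or spaces
#   \1                 puts back the first column and its separator
strip_ko() { sed 's/^\(ko[0-9]\{5\}[[:space:]]\{1,\}\)ko:/\1/' "$@"; }

printf 'ko04062\tko:CXCR3\nko04080 ko:GZMA\n' | strip_ko
```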
You don't need sed for this.
Here is how you can do it with bash:
var="ko05414 ko:ITGA4"
echo "${var//ko:/}"
${var//ko:/} replaces every occurrence of ko: with the empty string.
See Manipulating Strings for more info
OP, if you just want to get rid of "ko:" in the second column, then:
$ cat file
ko04062 ko:CXCR3
ko04062 ko:CX3CR1
ko04062 ko:CCL3
ko04062 ko:CCL5
some text with a legit ko: this ko: will be deleted if you use gsub.
ko04080 ko:GZMA
$ awk '{sub("ko:","",$2)}1' file
ko04062 CXCR3
ko04062 CX3CR1
ko04062 CCL3
ko04062 CCL5
some text with a legit ko: this ko: will be deleted if you use gsub.
ko04080 GZMA
Just a note: while you can use pure bash string substitution, it's only more efficient when you are changing a single string. If you have a file, especially a big one, a bash while read loop is still slower than sed or awk.