delete strings from text file - sed

I have a file containing nucleotide strings (A, C, G, T). I would like to find specific substrings in this text file and delete them.
For example:
ACTGGGCTGTCCAACTG
ACTTCTGGGTCGAACTG
CCCACTTCTGGGTTCAA
I would like to delete only the substrings ACT and GGG from every line.
Then I would get a file with these strings:
CTGTCCAACTG
TCTTCGAACTG
CCCTCTTTCAA

sed can help you:
sed 's/ACT//g; s/GGG//g' inputFile
i.e. replace all occurrences of ACT and GGG with an empty string.
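One thing worth checking against the expected output in the question: the g flag removes every occurrence on a line, while the listed result still contains a trailing ACT on the first two lines. A small sketch comparing both variants on the sample data (the variable names are only for illustration):

```shell
# Sample input from the question
input='ACTGGGCTGTCCAACTG
ACTTCTGGGTCGAACTG
CCCACTTCTGGGTTCAA'

# With g: every occurrence of ACT and GGG on a line is removed
all=$(printf '%s\n' "$input" | sed 's/ACT//g; s/GGG//g')

# Without g: only the first occurrence of each on a line is removed
first=$(printf '%s\n' "$input" | sed 's/ACT//; s/GGG//')

printf 'all occurrences:\n%s\n\nfirst occurrence only:\n%s\n' "$all" "$first"
```

On this input, the variant without g reproduces the three lines listed in the question, so pick the one that matches what you actually need.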

You can try:
awk '{gsub(/ACT|GGG/,"")}1' file

Using sed
sed -r 's/(ACT|GGG)//g' file

Using perl
perl -pe 's/ACT|GGG//g' your_file
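The single-pass alternation (ACT|GGG) and the two sequential substitutions are not always equivalent: deleting ACT first can create a new GGG that the second substitution then removes. A minimal sketch with a made-up input (GGACTG is hypothetical, not from the question), using -E for extended regular expressions:

```shell
# Hypothetical input chosen so that deleting ACT leaves a new GGG behind
input='GGACTG'

# Two substitutions in sequence: the second one sees the result of the first
sequential=$(printf '%s\n' "$input" | sed 's/ACT//g; s/GGG//g')

# One pass with alternation: already substituted text is not scanned again
single_pass=$(printf '%s\n' "$input" | sed -E 's/ACT|GGG//g')

printf 'sequential:  [%s]\nsingle pass: [%s]\n' "$sequential" "$single_pass"
```

For the sample data in the question both forms give the same result, so this only matters if such overlaps can occur in your sequences.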

Related

sed is replacing whole line instead of just the string

I just want to replace a few strings in a file with nothing, but sed replaces the whole line. Can someone help me with this?
line in file.xml:
<tag>sample text1 text2</tag>
My code:
sed "s/'text1 text2'//" file.xml 2>/dev/null || :
I also tried
sed -i -e "s/'text1 text2'//" file.xml 2>/dev/null || :
expected result:
<tag>sample</tag>
Actual result:
The whole line is removed from file.
Others:
text1 and text2 are complex strings containing characters such as ., = and -
What can I do to fix this?
TIA
Remove the single quotes:
sed "s/text1 text2//" file.xml
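Since the real strings contain characters such as ., = and -, keep in mind that an unescaped . in a sed pattern matches any character. A small sketch, assuming a made-up replacement for text1 text2 (the string v1.2=a-b x.y is hypothetical):

```shell
# Hypothetical line; the text to delete contains ".", "=" and "-"
line='<tag>sample v1.2=a-b x.y</tag>'

# "\." matches a literal dot only; "=" and "-" need no escaping here
result=$(printf '%s\n' "$line" | sed 's/ v1\.2=a-b x\.y//')
printf '%s\n' "$result"

# For comparison: an unescaped "." also matches other characters
loose=$(printf 'axc\n' | sed 's/a.c//')
strict=$(printf 'axc\n' | sed 's/a\.c//')
printf 'loose: [%s] strict: [%s]\n' "$loose" "$strict"
```

The leading space is part of the pattern so that no blank is left behind before the closing tag.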
You could use
sed 's/\([^ ]*\)[^<]*\(.*\)/\1\2/' filename
Output:
<tag>sample</tag>
This uses grouping. The first group captures everything up to the first space, [^<]* then matches (and drops) everything up to the next <, and the second group captures the rest of the line.
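To see what each group captures, one option is to print the groups with labels instead of joining them. This is only a sketch for inspection, run on the sample line from the question:

```shell
line='<tag>sample text1 text2</tag>'

# Same regex as above, but the replacement labels the two captured groups
groups=$(printf '%s\n' "$line" | sed 's/\([^ ]*\)[^<]*\(.*\)/group1=[\1] group2=[\2]/')
printf '%s\n' "$groups"
```

Note that this approach depends on the layout of the line (first space, next <) rather than on the text being removed.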

How can I do this using the sed command?

Problem: I cannot insert text using sed.
Content of the file:
aa=
I want to add text after aa= using sed.
The output should look like this:
aa=testing
The following should do it:
sed 's/aa=/aa=testing/'
You can try awk if you like.
awk '/aa=/ {$0=$0"testing"}1' file
If you want to make sure it only changes lines that contain exactly aa= and nothing more, use:
awk '/^aa=$/ {$0=$0"testing"}1' file
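The same restriction can be expressed in sed by anchoring the pattern; & in the replacement stands for the matched text. A minimal sketch with a few made-up lines (aaa= and aa=old are hypothetical):

```shell
# Only the first line consists of exactly "aa="
input='aa=
aaa=
aa=old'

# ^ and $ anchor the match to the whole line; & re-inserts the matched "aa="
result=$(printf '%s\n' "$input" | sed 's/^aa=$/&testing/')
printf '%s\n' "$result"
```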

Append text to a line on multiple conditions

I am very new to sed so please bear with me... I have a file with contents like
a=1
b=2,3,4
c=3
d=8
.
.
I want to append 'x' to a line which starts with 'c=' and does not contain an 'x'. What I am using right now is
sed -i '/^c=/ s/$/x/'
but this does not cover the second part of my explanation, the 'x' should only be appended if the line did not have it already and hence if I run the command twice it makes the line "c=3xx" which I do not want.
Any help here would be highly appreciated and I know there are a lot of sharp heads around here :) I understand that this can be handled pretty easily through bash but using sed here is a hard requirement.
You can do something like this:
sed -i '/^c=/ {/x/b; s/$/x/}' file
Curly brackets are used for grouping. The b command branches to the end of the script (stops the processing of the current line).
b label
Branch to label; if label is omitted, branch to end of script.
Edit: as William Pursell suggests in the comment, a shorter version would be
sed -i '/^c=/ { /x/ !s/$/x/ }' file
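A quick way to check that the command can be run repeatedly without adding a second x is to pipe the data through it twice. A sketch without -i, so no file is modified (append_x is just a helper name for this example):

```shell
input='a=1
b=2,3,4
c=3
d=8'

# Append x to lines starting with c= unless they already contain an x
append_x() { sed '/^c=/{/x/!s/$/x/;}'; }

once=$(printf '%s\n' "$input" | append_x)
twice=$(printf '%s\n' "$input" | append_x | append_x)

printf 'once:\n%s\n\ntwice:\n%s\n' "$once" "$twice"
```

Both runs should give the same output, with c=3x appearing exactly once.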
awk is probably a better choice here as you can easily combine regular expression matches with logical operators. Given the input:
$ cat file
a=1
b=2,3,4
c=3
c=x
c=3
d=8
The command would be:
$ awk '/^c=/ && !/x/ {$0=$0"x"} 1' file
a=1
b=2,3,4
c=3x
c=x
c=3x
d=8
Where $0 is the awk variable that contains the current line being read.
This might work for you (GNU sed):
sed -i '/^c=[^x]*$/s/$/x/' file
or:
sed -i 's/^c=[^x]*$/&x/' file

sed remove multiple characters surrounded by digits

I have a file with following contents:
EMAIL|TESTNUMBER|DATE
somemail#address.com|123456789|2011-02-08T16:36:02Z
How do I remove the capital letter T between the date and time, and the Z at the end of the line, using sed?
Thanks!
If the format is fixed and each line always matches T\d\d:\d\d:\d\dZ, then you could try the simple:
$ sed 's/T\(..:..:..\)Z$/ \1/'
(Untested)
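Since the command above is marked as untested, here is a small check against the two lines from the question, reading from a pipe instead of a file:

```shell
input='EMAIL|TESTNUMBER|DATE
somemail#address.com|123456789|2011-02-08T16:36:02Z'

# Replace T<hh:mm:ss>Z at the end of the line with a space and the time
result=$(printf '%s\n' "$input" | sed 's/T\(..:..:..\)Z$/ \1/')
printf '%s\n' "$result"
```

The header line is left alone because it contains no T followed by a time and a final Z.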
Perhaps there's a fancier way, but the following script works for me:
s/\(....-..-..\)T\(.*\)/\1 \2/
s/Z$//
Example...in-bound file:
somemail#address.com|123456789|2011-02-08A16:36:02X
somemail#address.com|123456789|2011-02-08T16:36:02Z
somemail#address.com|123456789|2011-02-08B16:36:02Y
Output:
D:\>sed -f sedscr testfile
somemail#address.com|123456789|2011-02-08A16:36:02X
somemail#address.com|123456789|2011-02-08 16:36:02
somemail#address.com|123456789|2011-02-08B16:36:02Y
Cat it through:
sed 's/\([0-9]\)T\([0-9]\)/\1\2/' | sed 's/Z$//'
Edit
Oh my! I've just realized (thanks @Fredrik) that for a long time I wasted processes! Shame on me! Now I'm a Church of The One Process convert. Here is the blessed version of the abominable one-liner above:
sed 's/\([0-9]\)T\([0-9]\)/\1\2/; s/Z$//' the_file.txt

Printing text between regexps

I tried the '/pat1/,/pat2/p', but I want to print only the text between the patterns, not the whole line. How do I do that?
A pattern range is for multiline patterns. This is how you'd do that:
sed -n '/pat1/,/pat2/{/pat1\|pat2/!p}' inputfile
-n - don't print by default
/pat1/,/pat2/ - within the two patterns inclusive
/pat1\|pat2/!p - print everything that's not one of the patterns
What you may be asking for is what's between two patterns on the same line. One of the other answers will do that.
Edit:
A couple of examples:
$ cat file1
aaaa bbbb cccc
123 start 456
this is what
I want
789 end 000
xxxx yyyy zzzz
$ sed -n '/start/,/end/{/start\|end/!p}' file1
this is what
I want
You can shorten it by telling sed to use the most recent pattern again (//):
$ sed -n '/.*start.*/,/^[0-9]\{3\} end 0*$/{//!p}' file1
this is what
I want
As you can see, I didn't have to duplicate the long, complicated regex in the second part of the command.
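If \| is not available in your sed (it is a GNU extension in basic regular expressions), a flag-based awk one-liner is a common alternative. A sketch on the same file1 contents, with inside as an arbitrary variable name:

```shell
input='aaaa bbbb cccc
123 start 456
this is what
I want
789 end 000
xxxx yyyy zzzz'

# Clear the flag on the end line, print while the flag is set,
# and set the flag only after the start line has been passed
between=$(printf '%s\n' "$input" | awk '/end/{inside=0} inside; /start/{inside=1}')
printf '%s\n' "$between"
```

The order of the three rules is what keeps the start and end lines themselves out of the output.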
sed -r 's/pat1(.*)pat2/\1/g' somefile.txt
I don't know what kind of pattern you used, but I think this is also possible with regular expressions.
cat myfile | sed -r 's/^(.*)pat1(.*)pat2(.*)$/\2/g'
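Without -n, lines that do not contain both patterns are still printed unchanged. If only the extracted text is wanted, combining -n with the p flag of the s command is one way to do it. A sketch with made-up input lines:

```shell
input='other TEXT
pat1 text i want pat2
other text'

# -n suppresses default output; the p flag prints only lines where s matched
result=$(printf '%s\n' "$input" | sed -n 's/.*pat1\(.*\)pat2.*/\1/p')
printf '[%s]\n' "$result"
```

The spaces around the captured text are kept, because they sit between the two patterns.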
You can use awk.
$ cat file
other TEXT
pat1 text i want pat2
pat1 TEXT I
WANT
pat2
other text
$ awk -vRS="pat2" 'RT{gsub(/.*pat1/,"");print}' file
text i want
TEXT I
WANT
This solution also works for patterns that span multiple lines. Note that RT is a gawk extension, so it requires GNU awk.