How to rename FASTA file headers using sed

I know this is pretty easy, but I can't get it to work. I am trying to rename the headers using sed, and even though the regular expression works, I can't rename the FASTA header. Here is a small example. I have a multi-sequence FASTA file like the one below:
>Bra000001
CTTATTTTCTCCTTCACCACCGTACCACAGAAAAAAACTGTGATTTTAAA
AGCCACATTTACTTCTTTTTTTGTTGGGTCTAAATGTTAAAATAACATGT
>Bra000002
TTTATGTAGTACTGGACTAATCGGGTAGGGAAACAATCTTGATTTAGCAA
TACAGTGTAATAACTAATAATCATATTCATATTCCATAAATCCAAATGTT
Now I just want to add "Brassica rapa" at the end of each FASTA header, like this:
>Bra000001 Brassica rapa
CTTATTTTCTCCTTCACCACCGTACCACAGAAAAAAACTGTGATTTTAAA
AGCCACATTTACTTCTTTTTTTGTTGGGTCTAAATGTTAAAATAACATGT
>Bra000002 Brassica rapa
TTTATGTAGTACTGGACTAATCGGGTAGGGAAACAATCTTGATTTAGCAA
TACAGTGTAATAACTAATAATCATATTCATATTCCATAAATCCAAATGTT
This is what I tried:
grep ">" in.fa | sed 's/$/ Brassica rapa/' > out.fa
However, this only gives me the changed headers, without any sequence lines. Ideally I want to change the headers and keep the sequences as they are.

You can do this with sed alone, using its substitute command: match the lines that begin with the > character, capture the whole line in a group, and append your string after it, like:
sed 's/^\(>.*\)$/\1 Brassica rapa/' infile
It yields:
>Bra000001 Brassica rapa
CTTATTTTCTCCTTCACCACCGTACCACAGAAAAAAACTGTGATTTTAAA
AGCCACATTTACTTCTTTTTTTGTTGGGTCTAAATGTTAAAATAACATGT
>Bra000002 Brassica rapa
TTTATGTAGTACTGGACTAATCGGGTAGGGAAACAATCTTGATTTAGCAA
TACAGTGTAATAACTAATAATCATATTCATATTCCATAAATCCAAATGTT

awk does this nicely and simply:
awk '/^>/ {$0=$0 " Brassica rapa"}1' in.fa >out.fa
The resulting out.fa:
>Bra000001 Brassica rapa
CTTATTTTCTCCTTCACCACCGTACCACAGAAAAAAACTGTGATTTTAAA
AGCCACATTTACTTCTTTTTTTGTTGGGTCTAAATGTTAAAATAACATGT
>Bra000002 Brassica rapa
TTTATGTAGTACTGGACTAATCGGGTAGGGAAACAATCTTGATTTAGCAA
TACAGTGTAATAACTAATAATCATATTCATATTCCATAAATCCAAATGTT
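
If the species name changes from file to file, the awk answer above can take it as a parameter instead of hard-coding it. This is only a sketch: the variable names `species` and `sp`, the temporary directory, and the shortened sequences are my own assumptions for illustration. Note that awk interprets backslash escapes in values passed with -v, so this assumes the name contains no backslashes.

```shell
#!/bin/sh
# Hypothetical sample file: the two records from the question, sequences shortened.
workdir=$(mktemp -d)
printf '%s\n' '>Bra000001' 'CTTATTTTCT' '>Bra000002' 'TTTATGTAGT' > "$workdir/in.fa"

# "species" is an illustrative name, not something from the original answer.
species="Brassica rapa"

# Append the species to header lines only; the trailing 1 prints every line,
# so sequence lines pass through unchanged.
awk -v sp="$species" '/^>/ { $0 = $0 " " sp } 1' "$workdir/in.fa" > "$workdir/out.fa"

awk_result=$(cat "$workdir/out.fa")
printf '%s\n' "$awk_result"
rm -rf "$workdir"
```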

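Apply the substitution only to lines that start with >:
sed '/^>/ s/$/ Brassica rapa/' YourFile
or match the whole header line and reuse it in the replacement with &:
sed 's/^>.*/& Brassica rapa/' YourFile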
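
One thing to watch with any of these commands: running them a second time on the output appends the suffix again. A possible guard, as a sketch only, is to nest a negated address inside the header address so the suffix is added only when it is missing. The sample file (one header already labelled, sequences shortened) and the variable names are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sample: the second header already carries the suffix.
workdir=$(mktemp -d)
printf '%s\n' '>Bra000001' 'CTTATTTTCT' '>Bra000002 Brassica rapa' 'TTTATGTAGT' > "$workdir/in.fa"

# On header lines only: append the suffix unless the line already ends with it.
script='/^>/ { / Brassica rapa$/!s/$/ Brassica rapa/; }'

once=$(sed "$script" "$workdir/in.fa")
# Feeding the result through the same script again should change nothing.
twice=$(printf '%s\n' "$once" | sed "$script")

printf '%s\n' "$once"
rm -rf "$workdir"
```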

Related

Select line matching pattern +1

How do I use sed to select every line matching a pattern + the next line?
For instance, I'd like to select all lines with tag="foo" plus the next line.
As an alternative, I'd also like to be able to select lines with tag="foo" OR group="bar" plus the next line.
This might work for you (GNU sed):
sed -En '/tag="foo"|group="bar"/,+1p' file
-E turns on extended regular expressions and -n turns off implicit printing.
The address matches either tag="foo" or group="bar", and the range ,+1 prints the matching line plus the one line after it.
Alternative:
sed '/tag="foo"\|group="bar"/!d;n' file
To always print 2 lines, use:
sed -n '/tag="foo"\|group="bar"/{N;p}' file
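
If GNU sed is not available, a counter in awk is one portable way to get "matching line plus the next one". This is a sketch with a made-up sample file and variable names. As far as I can tell it differs from the range form in one respect: the counter restarts on every match, so when two matching lines follow each other, the line after the second match is printed as well.

```shell
#!/bin/sh
# Hypothetical sample file with two matching lines in a row.
workdir=$(mktemp -d)
cat > "$workdir/tags.txt" <<'EOF'
a tag="foo"
b group="bar"
c
d
EOF

# A match sets the counter to 2; every line is printed while the counter is
# non-zero, and each printed line decrements it.
awk_out=$(awk '/tag="foo"|group="bar"/ { c = 2 } c && c--' "$workdir/tags.txt")

printf '%s\n' "$awk_out"
rm -rf "$workdir"
```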

Replace new line character by comma in SED

I have the following csv data in a file.
14,95884250,ENSG00000176438,C,T,A
CCAATCAGA
14,95884250,ENSG00000176438,C,T,A
CAATCAGAG
I would like to replace every other newline character with ',' (preferably using 'sed'). The desired output is below.
14,95884250,ENSG00000176438,C,T,A,CCAATCAGA
14,95884250,ENSG00000176438,C,T,A,CAATCAGAG
Give this awk one-liner a try:
awk 'NR%2{printf "%s,",$0;next}{print}' file
This might work for you (GNU sed):
sed 'N;s/\n/,/' file
Append the next line and replace the newline by a comma.
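
Not sed, but for completeness: paste can also join pairs of lines when it is given standard input twice. A small sketch using the data from the question; the file name `pairs.txt` and the variable names are assumptions for illustration. With an odd number of lines, paste leaves a trailing comma on the last line, so this assumes the file really consists of pairs.

```shell
#!/bin/sh
# Sample data copied from the question into a hypothetical file.
workdir=$(mktemp -d)
printf '%s\n' \
  '14,95884250,ENSG00000176438,C,T,A' 'CCAATCAGA' \
  '14,95884250,ENSG00000176438,C,T,A' 'CAATCAGAG' > "$workdir/pairs.txt"

# Each "-" reads one line from stdin, so two of them consume lines in pairs;
# -d, joins each pair with a comma.
joined=$(paste -d, - - < "$workdir/pairs.txt")

printf '%s\n' "$joined"
rm -rf "$workdir"
```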

sed: How do I delete the first 100 lines of a text file

I would like to delete the first 100 lines of a text file using sed. I know how to delete the first line by using:
sed '1d' filename
or the 100th line by typing
sed '100d' filename
How do I specify a range? I thought something like this would work:
sed '1:100d' filename
However, this obviously didn't work. Can someone show me how to specify a range? Thanks in advance for your help.
A range is written with a comma between the two line numbers. This should work in GNU sed:
sed '1,100d' file
awk can also print lines based on conditions on the line number (the record number, in awk terms).
For example, the following prints every line whose number is greater than 100:
awk 'NR>100' inputfile
Other conditions work the same way:
awk 'NR==100' inputfile   # prints only the 100th line
awk 'NR<100' inputfile    # prints lines 1 to 99
awk 'NR>100' inputfile    # prints from line 101 onwards
awk 'NR>=100' inputfile   # prints from line 100 onwards
You can also try the following, which suppresses default output and prints only from line 101 to the end:
sed -n '101,$p' Input_file
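
To check that the approaches agree, here is a small sketch that builds a 105-line test file and compares the sed and awk commands from above with tail -n +101, which starts output at line 101. The file name `big.txt` and the variable names are made up for the example.

```shell
#!/bin/sh
# Build a hypothetical 105-line test file: "line 1" ... "line 105".
workdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 105; i++) print "line " i }' > "$workdir/big.txt"

with_sed=$(sed '1,100d' "$workdir/big.txt")     # delete lines 1 to 100
with_awk=$(awk 'NR > 100' "$workdir/big.txt")   # keep lines numbered above 100
with_tail=$(tail -n +101 "$workdir/big.txt")    # start output at line 101

printf '%s\n' "$with_tail"
rm -rf "$workdir"
```

All three should leave only the last five lines of the test file.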

sed multiple pattern matches in a line

I'm trying to write a sed command to convert lines:
<http://dbpedia.org/resource/BoA> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Ne-Yo> .
<http://dbpedia.org/resource/BoA> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Tablo> .
to
BoA, Ne-Yo
BoA, Tablo
I know how to match and print using \( \) but I can't find a way to print two matches.
Using awk you can do:
awk -F"[/>]" '/http/ {print $5 ", " $15}' file
BoA, Ne-Yo
BoA, Tablo
Use parentheses to capture, then \1 to print the first captured group, \2 to print the second, and so on.
sed 's|<http://dbpedia.org/resource/\([^>]\+\)> <[^>]\+> <http://dbpedia.org/resource/\([^>]\+\)>.*|\1, \2|g' input.txt
It's a little verbose, though, and \+ is a GNU sed extension. Put your text into the file input.txt.
Less verbose, but also less accurate than @rendon's solution:
sed -e 's?.*/resource/\([^>]*\)>.*/resource/\([^>]*\).*?\1, \2?' input.txt
If it's good enough then this is more readable.
This might work for you (GNU sed):
sed -r 's|[^>]*/([^>]*)>.*/([^>]*).*|\1, \2|' file
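
For a stricter match, one option is to anchor the pattern to the whole line and capture the last path segment of the first and third IRI. This is a sketch, assuming every line has exactly the three-IRI shape from the question followed by " ."; lines with any other shape pass through unchanged. The variable names are made up for the example.

```shell
#!/bin/sh
# Sample lines copied from the question into a hypothetical input.txt.
workdir=$(mktemp -d)
cat > "$workdir/input.txt" <<'EOF'
<http://dbpedia.org/resource/BoA> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Ne-Yo> .
<http://dbpedia.org/resource/BoA> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://dbpedia.org/resource/Tablo> .
EOF

# Group 1: last path segment of the first IRI; the middle IRI is skipped;
# group 2: last path segment of the third IRI.
pairs=$(sed -E 's|^<[^>]*/([^/>]+)> <[^>]*> <[^>]*/([^/>]+)> \.$|\1, \2|' "$workdir/input.txt")

printf '%s\n' "$pairs"
rm -rf "$workdir"
```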

repeat string in a line using sed

I would like to repeat the content of each line of a file. Is there a quick solution using sed?
Suppose the input file is
abc def 123
The expected output is:
abcabc defdef 123123
sed 's/\(.*\)/\1\1/' infile
This might work for you:
echo -e 'aaA\nbbB\nccC' | sed 's/.*/&&/'
aaAaaA
bbBbbB
ccCccC
sed 'h;G;s/\n//' file.txt
It's even simpler if you take advantage of \0, which in GNU sed holds the whole matched string, just like &:
sed 's/.*/\0\0/'
Try this
awk '{print $0$0}' temp.txt
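
Note that the answers above double the whole line. If the input really is a single line of space-separated words, as the example in the question is shown, then the expected output doubles each word instead. A sketch of both readings, with made-up variable names and assuming words are separated by plain spaces:

```shell
#!/bin/sh
# Double the entire line: & stands for everything the pattern matched.
whole_line=$(printf '%s\n' 'abc def 123' | sed 's/.*/&&/')

# Double each word: match every run of non-space characters (g = all of them).
per_word=$(printf '%s\n' 'abc def 123' | sed 's/[^ ][^ ]*/&&/g')

printf '%s\n' "$whole_line"   # abc def 123abc def 123
printf '%s\n' "$per_word"     # abcabc defdef 123123
```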