I have a specific file (file.txt) with several lines.
How can I delete all lines that do not have exactly 12 characters, using sed?
Use an interval expression to specify the exact number of characters you want to match between the beginning (^) and end ($) of the input record.
sed '/^.\{12\}$/!d' file
Not sure why you would use sed. This is much cleaner in awk:
awk 'length == 12' file.txt
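To check that the sed and awk one-liners above select the same lines, here is a small shell sketch. The sample lines are made up for illustration: one has 5 characters, two have exactly 12, and one has 13. It assumes plain ASCII input, where characters and bytes are the same thing.

```shell
# Made-up sample input: 'short' (5), two 12-character lines, 'thirteen char' (13).
tmp=$(mktemp)
printf '%s\n' 'short' '123456789012' 'thirteen char' 'abcdefghijkl' > "$tmp"

# sed: delete (d) every line that does NOT (!) consist of exactly 12 characters.
via_sed=$(sed '/^.\{12\}$/!d' "$tmp")

# awk: a pattern with no action prints every line for which it is true.
via_awk=$(awk 'length == 12' "$tmp")

printf 'sed:\n%s\nawk:\n%s\n' "$via_sed" "$via_awk"
rm -f "$tmp"
```

Both variables should contain only the two 12-character lines.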
I am learning sed and awk and am trying some more complicated logic, but I couldn't find a solution for the following.
File contents:
This is apple,apple.com 443,apple2.com 80,apple3.com 232,
We talk on 1 banana,banana.com 80,banannna.com 23,
take 5 grape,grape5.com 23,
When I try:
$ cat sample.txt | sed -e 's/[[:space:]][^,]*,/,/g'
This,apple.com,apple2.com,apple3.com,
We,banana.com,banannna.com,
take,grape5.com,
The result is OK, but I want the substitution to skip the first comma in each line, so the expected output is:
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
Any help is appreciated.
If you are using GNU sed, you can do something like
sed -e 's/[[:space:]][^,]*,/,/2g' file
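where 2g means: start substituting at the 2nd match (the 2), and keep replacing every match after it (the g). POSIX does not specify what happens when a number and g are combined; GNU sed defines it as ignoring the matches before the numberth and replacing all matches from the numberth on.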
The output of the above command:
sed -e 's/[[:space:]][^,]*,/,/2g' file
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
An excerpt from the man page of GNU sed:
g
    Apply the replacement to all matches to the regexp, not just the first.
number
    Only replace the numberth match of the regexp.
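For a sed that does not support the GNU number-plus-g combination, one portable option is a loop built from a label and the t command. This is a sketch, not the answer's method: it assumes the host names contain no spaces or commas, and it uses the sample input from the question. Because every match must start at a comma, the text before the first comma is never touched.

```shell
# Sample input from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
This is apple,apple.com 443,apple2.com 80,apple3.com 232,
We talk on 1 banana,banana.com 80,banannna.com 23,
take 5 grape,grape5.com 23,
EOF

# Each pass turns one ",host stuff," into ",host," (\1 keeps ",host").
# t jumps back to label a as long as a substitution succeeded,
# so the loop ends once no ",host<space>..." segment is left.
out=$(sed -e ':a' -e 's/\(,[^[:space:],]*\)[[:space:]][^,]*,/\1,/' -e 'ta' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Separate -e expressions are used for the label and the branch because some non-GNU seds do not accept a label followed by ; on the same line.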
awk '{gsub(/[ ]+/," "); gsub(/com [0-9]+/,"com")}1' file
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
The first gsub squeezes runs of spaces into a single space, and the second removes the space and number between com and the following comma. This relies on every host name ending in com.
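If the hosts do not all end in com, a field-based awk variant avoids that dependency: split on commas, leave the first field alone, and strip everything from the first space in each later field. A sketch, assuming every field after the first comma holds a host name optionally followed by a space and a number; the input is the question's sample.

```shell
# Sample input from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
This is apple,apple.com 443,apple2.com 80,apple3.com 232,
We talk on 1 banana,banana.com 80,banannna.com 23,
take 5 grape,grape5.com 23,
EOF

out=$(awk 'BEGIN { FS = OFS = "," }
{
    # Field 1 ("This is apple") is kept as-is; in fields 2..NF,
    # drop everything from the first space to the end of the field.
    # The empty last field (after the trailing comma) keeps the comma in place.
    for (i = 2; i <= NF; i++) sub(/ .*/, "", $i)
    print
}' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```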
I'm trying to convert the FASTA headers of UniRef FASTA files into the ">ref|myid|seq definition" form. I know people use sed for this.
Header of the UniRef FASTA file:
>UniRef100_Q6GZX4 Putative transcription factor 001R n=1 Tax=Frog virus 3 (isolate Goorha) RepID=001R_FRG3G
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
I want it to become:
>UniRef100|Q6GZX4|Putative transcription factor 001R n=1 Tax=Frog virus 3 (isolate Goorha) RepID=001R_FRG3G
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
I hope to get some clues on this. Thanks.
Try this to replace the first _ with | and the first space with | on each line:
sed 's/_/|/;s/ /|/' file > new_file
or, with GNU sed, this to edit the file in place:
sed -i 's/_/|/;s/ /|/' file
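The commands above run on every line. That is harmless for this sample, because sequence lines contain no _ or spaces, but a defensive variant restricts the substitutions to header lines with a /^>/ address. A sketch, assuming every header line starts with > (standard FASTA) and using the record from the question as input.

```shell
# Record from the question (header joined onto one line, as in a real FASTA file).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
>UniRef100_Q6GZX4 Putative transcription factor 001R n=1 Tax=Frog virus 3 (isolate Goorha) RepID=001R_FRG3G
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
EOF

# Only lines starting with ">" get the first "_" and the first space turned into "|".
# The later "_" in RepID=001R_FRG3G is left alone because s/// without g replaces once.
out=$(sed -e '/^>/s/_/|/' -e '/^>/s/ /|/' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```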
Here is a Perl alternative:
perl -pe 's:^(.+?)_(.+?) :$1|$2|:' your-fasta-file
The regular expression after s: finds the shortest match from the beginning of the line up to the first underscore, ^(.+?)_, and then the shortest match up to the next space, (.+?) followed by a space. The replacement writes back the first captured group ($1) and the second ($2), each followed by |, so the underscore and the first space both become |. The colons delimit the search pattern and the replacement, in place of the usual slashes.
I want to insert a newline after the following pattern:
lcl|NC_005966.1_gene_750
The last number (750 in this case) varies; it ranges from 1 to 3407.
How can I tell sed to keep the whole number together instead of splitting it after the first digit?
So far I have:
sed 's/lcl|NC_005966.1_gene_[[:digit:]]/&\n/g' file
But this inserts the newline right after the first digit.
Try:
sed 's/lcl|NC_005966.1_gene_[[:digit:]]*/&\n/g' file
(note the *)
Alternatively, you could say:
sed '/lcl|NC_005966.1_gene_[[:digit:]]/G' file
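which appends a newline (followed by the empty hold space) to every line that contains the pattern, so the blank line comes after the whole line rather than right after the matched text.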
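To see where G puts the newline, this sketch runs the G command from the answer on a single made-up line (the trailing " rest" is hypothetical) and numbers the output lines with awk, bracketing each one so the empty line is visible.

```shell
# Made-up sample line containing the pattern.
input='lcl|NC_005966.1_gene_750 rest'

# G appends "\n" plus the hold space (empty here) to every matching line,
# so the whole line is printed unchanged, followed by an empty line.
out=$(printf '%s\n' "$input" |
  sed '/lcl|NC_005966.1_gene_[[:digit:]]/G' |
  awk '{ print NR ": [" $0 "]" }')
printf '%s\n' "$out"
```

The first output line still contains " rest", which is the difference from the s///-based answers that split the line right after the number.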
sed 's/lcl|NC_005966\.1_gene_[[:digit:]][[:digit:]]*/&\
/g' file
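You need to escape the . because it is an RE metacharacter that matches any character. You need [[:digit:]][[:digit:]]* to mean one or more digits, because \+ is not part of POSIX basic regular expressions. And you need a \ followed by a literal newline in the replacement for portability across seds, since not every sed understands \n there.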
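The difference between a single [[:digit:]] and [[:digit:]][[:digit:]]* can be checked directly. This sketch uses two made-up input lines (the trailing " rest" and " more" are hypothetical) and the portable backslash-newline replacement from the answer above.

```shell
# Made-up input: two lines whose gene numbers have three and four digits.
input=$(printf '%s\n' 'lcl|NC_005966.1_gene_750 rest' 'lcl|NC_005966.1_gene_3407 more')

# One [[:digit:]]: the newline lands after the first digit ("..._7" / "50 rest").
one_digit=$(printf '%s\n' "$input" | sed 's/lcl|NC_005966\.1_gene_[[:digit:]]/&\
/g')

# [[:digit:]][[:digit:]]*: one or more digits, so the whole number stays together.
all_digits=$(printf '%s\n' "$input" | sed 's/lcl|NC_005966\.1_gene_[[:digit:]][[:digit:]]*/&\
/g')

printf 'one digit:\n%s\nall digits:\n%s\n' "$one_digit" "$all_digits"
```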
I want to list the lines in a batch file that are longer than 120 characters. I thought of using sed, but I was not successful. How can I achieve this?
Is there any other way to get the list besides sed?
Thanks.
Another way to do this using awk:
awk 'length($0) > 120' file
You can use grep and its repetition quantifier:
grep '.\{120\}' script.sh
Using sed, you have some alternatives:
sed -e '/.\{120\}/!d'
sed -e '/^.\{0,119\}$/d'
sed -ne '/.\{120\}/p'
The first option selects lines that do not contain at least 120 characters (the ! after the address runs the command on lines that do not match it) and deletes them, i.e. does not print them.
The second option matches lines whose whole content, from start (^) to end ($), is between 0 and 119 characters long. These lines are also deleted.
The third option uses the -n flag, which tells sed not to print lines by default, so only lines you explicitly print appear. Here we match lines with at least 120 characters and print them with p.
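Note the boundary: the grep and sed patterns select lines with at least 120 characters, while the awk answer (length > 120) selects strictly more than 120. This sketch builds made-up lines of 119, 120 and 121 characters and prints the lengths each command keeps; using .\{121\} is one way to make sed match "more than 120".

```shell
# Made-up input: three lines of 119, 120 and 121 "x" characters.
tmp=$(mktemp)
awk 'BEGIN { for (n = 119; n <= 121; n++) { s = ""; for (i = 0; i < n; i++) s = s "x"; print s } }' > "$tmp"

# Helper: print the lengths of the selected lines, comma-separated.
lengths() { awk '{ l = l (NR > 1 ? "," : "") length($0) } END { print l }'; }

at_least_120=$(sed -n '/.\{120\}/p' "$tmp" | lengths)     # 120 or more characters
more_than_120=$(awk 'length($0) > 120' "$tmp" | lengths)  # strictly more than 120
sed_strict=$(sed -n '/.\{121\}/p' "$tmp" | lengths)       # sed version of > 120

echo "at least 120 (sed): $at_least_120"
echo "more than 120 (awk): $more_than_120"
echo "more than 120 (sed): $sed_strict"
rm -f "$tmp"
```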