sed command to remove the leading digits and spaces from each line

I have a simple text file in the format below:
1 12658003Y
2 34345345N
3 34653785Y
4 36452342N
5 86747488Y
6 34634543Y
and so on
10 37456338Y
11 33535555Y
12 37456378Y
and so on
100 23432434Y
As you can see, there are two spaces after the first number.
I'm trying to write a sed command to remove the digits before the whitespace. Is there a sed command that removes the spaces and the number before them?
The output file should look like this:
12658003Y
34345345N
34653785Y
36452342N
and so on.
Please assist. I'm very new to shell scripting.

sed 's/[0-9]\+\s\+//' infile > outfile
Explanation:
s: we want to use substitution
/: marks the start of the regular expression we want to match
[0-9]: match any digit
\+: match the previous item one or more times (a GNU sed extension)
\s: match a whitespace character, such as a space or tab (also a GNU extension)
\+: match the previous item one or more times
/: marks the end of the regular expression and the start of the replacement (which is empty here)
/: marks the end of the replacement; flags such as g would go after it (we use none)
infile: the file we want to read
>: redirect stdout to
outfile: the file where we want to store the output
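The `\+` and `\s` escapes above are GNU sed extensions. A minimal sketch of a POSIX-portable equivalent, assuming the input lines look like the sample in the question: `\+` is spelled as a repeated bracket expression and `\s` as `[[:space:]]`. The function name `strip_index` is hypothetical.

```shell
#!/bin/sh
# strip_index (hypothetical helper): remove the leading number and the
# whitespace after it from every line read on stdin.
#   [0-9][0-9]*              one or more digits (POSIX spelling of [0-9]\+)
#   [[:space:]][[:space:]]*  one or more whitespace characters (POSIX spelling of \s\+)
# The ^ anchor restricts the match to the start of the line.
strip_index() {
    sed 's/^[0-9][0-9]*[[:space:]][[:space:]]*//'
}

# Usage with a few lines shaped like the sample in the question:
printf '1 12658003Y\n2  34345345N\n100 23432434Y\n' | strip_index
```

Anchoring with `^` also guarantees that digits followed by a space later in the line are never touched.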

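Your sed command would be:
sed 's/.* //g' file
This removes everything up to and including the last space on each line. On the sample data that is just the leading number and the spaces after it, but a line containing more spaces would lose more text. The g flag is not needed, because the greedy .* can match at most once per line.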
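Because `.*` is greedy, `s/.* //` deletes everything up to the last space on the line, not just the leading number. A small sketch comparing it with the anchored form from the next answer, using a made-up line that contains a second space; `greedy` and `anchored` are hypothetical helper names.

```shell
# greedy (hypothetical): .* runs all the way to the LAST space on the line.
greedy() { sed 's/.* //'; }
# anchored (hypothetical): only the leading digits, then the spaces right after them.
anchored() { sed 's/^[0-9]* *//'; }

line='7 12345678Y extra'            # made-up line with an extra field
printf '%s\n' "$line" | greedy      # prints: extra
printf '%s\n' "$line" | anchored    # prints: 12345678Y extra
```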

Remove leading digits, then following spaces:
sed 's/^[0-9]* *//' file

sed 's/^[0-9]*[ ]*//g' input.txt

Related

Delete lines with a specific number of characters

I have a file (file.txt) with several lines.
How can I use sed to delete all lines that do not have exactly 12 characters?
Use an interval expression to specify the exact number of characters you want to match between the beginning (^) and end ($) of the input record.
sed '/^.\{12\}$/!d' file
Not sure why you would use sed. This is much cleaner in awk:
awk 'length == 12' file.txt
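A quick way to check that the sed and awk answers agree, assuming a POSIX sed and awk: feed both the same lines and compare. The sample lines are made up; `abcdefghijkl` is the only one with exactly 12 characters.

```shell
# Made-up sample: lines of 11, 12 and 13 characters.
sample='abcdefghijk
abcdefghijkl
abcdefghijklm'

# sed: delete (d) every line that is NOT (!) exactly 12 characters long.
printf '%s\n' "$sample" | sed '/^.\{12\}$/!d'

# awk: a pattern with no action prints matching lines, here those of length 12.
printf '%s\n' "$sample" | awk 'length == 12'
```

Both commands print only `abcdefghijkl`.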

Remove text from the whitespace up to each comma, but skip the first comma in each line of a file

I am learning sed and awk and trying some complicated logic, but I couldn't find a solution for the following.
File contents:
This is apple,apple.com 443,apple2.com 80,apple3.com 232,
We talk on 1 banana,banana.com 80,banannna.com 23,
take 5 grape,grape5.com 23,
When I run
$ cat sample.txt | sed -e 's/[[:space:]][^,]*,/,/g'
,apple.com,apple2.com,apple3.com,
,banana.com,banannna.com,
,grape5.com,
the result is correct after the first comma, but I want the substitution to skip the first comma in each line, so the expected output is:
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
Any help is appreciated.
If you are using GNU sed, you can do something like:
sed -e 's/[[:space:]][^,]*,/,/2g' file
where 2g means: start substituting at the 2nd match and, because of g, keep substituting every match after it.
This produces:
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
An excerpt from the man page of GNU sed:
g: Apply the replacement to all matches to the regexp, not just the first.
number: Only replace the numberth match of the regexp.
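The GNU sed manual notes that POSIX does not specify what happens when g is combined with a number, so `2g` may not behave the same outside GNU sed. As a hedged alternative for other systems, here is a different technique (not the one in the answer above): split each line on commas with awk and strip everything from the first space or tab onward in every field except the first. It assumes, as in the sample, that the text before the first comma is the part to leave untouched; it can differ from `2g` on lines whose first field contains no whitespace. `skip_first_field` is a hypothetical name.

```shell
# skip_first_field (hypothetical helper): drop the " port" part of every
# comma-separated field except the first one.
skip_first_field() {
    awk 'BEGIN { FS = OFS = "," }
    {
        # Fields 2..NF: remove from the first space or tab to the end of the field.
        # Changing a field makes awk rebuild the line with OFS (a comma).
        for (i = 2; i <= NF; i++)
            sub(/[ \t].*/, "", $i)
        print
    }'
}

printf '%s\n' \
    'This is apple,apple.com 443,apple2.com 80,apple3.com 232,' \
    'take 5 grape,grape5.com 23,' | skip_first_field
```

The trailing comma survives because the empty last field is kept when awk rejoins the fields.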
awk '{gsub(/[ ]+/," "); gsub(/com [0-9]+/,"com")}1' file
This is apple,apple.com,apple2.com,apple3.com,
We talk on 1 banana,banana.com,banannna.com,
take 5 grape,grape5.com,
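The first gsub squeezes runs of spaces into a single space, and the second removes the space and number between com and the following comma. The trailing 1 is an always-true pattern, so awk prints every line. Note that this relies on every hostname ending in com.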

Editing Uniref FASTA header ID

I'm trying to get the FASTA headers of UniRef FASTA files into the form >ref|myid|seq definition. I know people use sed for this.
Header of the UniRef FASTA file:
>UniRef100_Q6GZX4 Putative transcription factor 001R n=1 Tax=Frog virus 3 (isolate Goorha) RepID=001R_FRG3G
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
I want it to look like this:
>UniRef100|Q6GZX4|Putative transcription factor 001R n=1 Tax=Frog virus 3 (isolate Goorha) RepID=001R_FRG3G
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
I hope to get some clues. Thanks.
Try this with GNU sed to replace the first _ with | and the first space with |:
sed 's/_/|/;s/ /|/' file > new_file
or this to edit the file in place (-i is not POSIX; BSD and macOS sed need -i ''):
sed -i 's/_/|/;s/ /|/' file
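The commands above touch every line, so they rely on sequence lines containing no `_` or space (true for the sample). A slightly more defensive sketch, assuming header lines are exactly the lines that start with `>`, restricts the substitutions to those lines with an address. It uses only POSIX sed features (braces with one command per line); `fix_headers` is a hypothetical name.

```shell
# fix_headers (hypothetical helper): only on lines starting with ">",
# turn the first "_" into "|" and then the first space into "|".
fix_headers() {
    sed '/^>/{
s/_/|/
s/ /|/
}'
}

printf '%s\n' \
    '>UniRef100_Q6GZX4 Putative transcription factor 001R n=1' \
    'MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS' | fix_headers
```

The header becomes `>UniRef100|Q6GZX4|Putative transcription factor 001R n=1`, and the sequence line passes through unchanged.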
Here is an alternative using Perl:
cat your-fasta-file | perl -pe 's:^(.+?)_(.+?) :\1|\2|:'
The regular expression after s: first finds the shortest match from the beginning of the line up to the first underscore, ^(.+?)_, and then the shortest match up to the next space, (.+?) followed by a space. The replacement writes back the first captured group (\1), a |, the second captured group (\2), and another |, so the underscore and the space both become |. The colons delimit the search pattern and the replacement. (In Perl, $1 and $2 are the preferred way to refer to the groups in the replacement; \1 and \2 still work.)

Insert newline after pattern with changing number in sed

I want to insert a newline after the following pattern:
lcl|NC_005966.1_gene_750
where the last number (750 in this case) changes. The numbers range from 1 to 3407.
How can I tell sed to keep the whole number together and not split it after the first digit?
So far I have:
sed 's/lcl|NC_005966.1_gene_[[:digit:]]/&\n/g' file
But this inserts the newline after the first digit.
Try:
sed 's/lcl|NC_005966.1_gene_[[:digit:]]*/&\n/g' file
(note the *)
Alternatively, you could say:
sed '/lcl|NC_005966.1_gene_[[:digit:]]/G' file
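which appends a newline to every line that contains the pattern. Note that G adds the newline (followed by the empty hold space) at the end of the whole line, not directly after the matched text.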
sed 's/lcl|NC_005966\.1_gene_[[:digit:]][[:digit:]]*/&\
/g' file
You need to escape . because it is a regex metacharacter; you need [[:digit:]][[:digit:]]* to represent one or more digits; and you need a backslash followed by a literal newline in the replacement for portability across sed implementations (\n in the replacement is not specified by POSIX).
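To see the difference between the `s` and `G` approaches, here is a small sketch on a made-up input line with text after the gene ID (the trailing ` foo` is hypothetical). The `s` command puts the newline directly after the full number; `G` leaves the line intact and adds an empty line after it.

```shell
line='lcl|NC_005966.1_gene_750 foo'   # made-up sample line

# Portable form: backslash + literal newline in the replacement.
printf '%s\n' "$line" | sed 's/lcl|NC_005966\.1_gene_[[:digit:]][[:digit:]]*/&\
/g'
# prints:
# lcl|NC_005966.1_gene_750
#  foo

# G appends a newline plus the (empty) hold space to each matching line.
printf '%s\n' "$line" | sed '/lcl|NC_005966\.1_gene_[[:digit:]]/G'
# prints:
# lcl|NC_005966.1_gene_750 foo
# (followed by an empty line)
```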

Finding lines longer than 120 characters using sed

I want to get a list of the lines in a batch file that are longer than 120 characters. I thought of using sed for this, but my attempts were not successful. How can I achieve this?
Is there any other way to get this list besides sed?
Thanks.
Another way to do this using awk:
cat file | awk 'length($0) > 120'
You can use grep with an interval expression:
grep '.\{120\}' script.sh
Using sed, you have some alternatives:
sed -e '/.\{120\}/!d'
sed -e '/^.\{0,119\}$/d'
sed -ne '/.\{120\}/p'
The first option matches lines that do not have at least 120 characters (the ! after the address runs the command on lines that do not match the pattern) and deletes them (i.e. does not print them).
The second option matches lines that have between zero and 119 characters in total from start (^) to end ($), and deletes them too.
The third option uses the -n flag, which tells sed not to print lines by default, only when told to. Here we match lines that have at least 120 characters and use p to print them.
Note that all of these, like the grep command above, select lines of at least 120 characters; for lines strictly longer than 120, use 121 instead of 120 (and 120 instead of 119).
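The question asks for lines longer than 120 characters, while the patterns above match lines of at least 120. A hedged sketch of a helper that takes the limit as a parameter and keeps only lines strictly longer than it, using awk's `length`, plus the matching sed pattern; the name `longer_than` is hypothetical and the test lines are generated, not real data.

```shell
# longer_than N (hypothetical helper): print stdin lines with more than N characters.
longer_than() {
    awk -v n="$1" 'length($0) > n + 0'
}

# Build one line of exactly 120 characters and one of 121 (made-up data).
l120=$(printf '%120s' '' | tr ' ' 'x')
l121=$(printf '%121s' '' | tr ' ' 'x')

printf '%s\n%s\n' "$l120" "$l121" | longer_than 120          # prints only the 121-character line
printf '%s\n%s\n' "$l120" "$l121" | sed -n '/.\{121\}/p'     # same result with sed
```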