sed with vertical bar? - sed

I have a list
>ANARCI-HMM_human_167.7|pdb|7EPU|A
>ANARCI-HMM_alpaca_173.7|pdb|7EVY|E
>ANARCI-HMM_alpaca_172.8|pdb|7F2O|S
>ANARCI-HMM_alpaca_171.8|pdb|7F4F|S
>ANARCI-HMM_alpaca_173.6|pdb|7F8W|D
I want to remove from ANARCI to the first vertical bar |.
expecting
>pdb|7EPU|A
>pdb|7EVY|E
>pdb|7F2O|S
>pdb|7F4F|S
>pdb|7F8W|D
I tried
sed 's/ANARCI.*\|//g'
but didn't work.
Do you have any idea how to sed in this case?

Using sed
$ sed 's/[A-Z][^|]*|//' input_file
>pdb|7EPU|A
>pdb|7EVY|E
>pdb|7F2O|S
>pdb|7F4F|S
>pdb|7F8W|D
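A minimal sketch of why the bracket expression matters, using one header from the question. It assumes a POSIX-style sed, where an unescaped | in a basic regular expression is a literal pipe; the variable names are only for illustration.

```shell
# One sample header from the question.
line='>ANARCI-HMM_human_167.7|pdb|7EPU|A'

# .* is greedy, so the match runs to the LAST pipe on the line.
greedy=$(printf '%s\n' "$line" | sed 's/ANARCI.*|//')

# [^|]* cannot cross a pipe, so the match ends at the FIRST pipe.
first=$(printf '%s\n' "$line" | sed 's/ANARCI[^|]*|//')

printf '%s\n' "$greedy" "$first"
```

The first command leaves only >A, while the second leaves >pdb|7EPU|A, which is the expected result.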

If you want to remove from ANARCI to the first vertical bar |, try this:
sed 's/ANARCI[^|]*|//'
or, with extended regular expressions, where the literal pipe must be escaped:
sed -E 's/ANARCI[^|]*\|(.*)/\1/'

1st solution: With your shown samples, please try following sed code.
sed -E 's/(.*)ANARCI[^|]*\|(.*)/\1\2/' Input_file
Explanation:
The -E option enables ERE (extended regular expressions).
The regex uses capturing groups, which store matched text so that it can be reused in the replacement.
There are 2 capturing groups: the 1st holds everything before the string ANARCI, and the 2nd holds everything after the first pipe. The text between them, from ANARCI up to and including the first pipe, is matched but not captured.
The substitution replaces the line with the 1st and 2nd capturing groups.
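To see what each capturing group holds, a small sketch can wrap the groups in brackets instead of joining them. This is only an illustration on one sample header, assuming a sed that supports -E; the variable names are made up for the example.

```shell
# One sample header from the question.
line='>ANARCI-HMM_human_167.7|pdb|7EPU|A'

# Same regex as the 1st solution, but the replacement shows
# group 1 and group 2 separately inside literal brackets.
groups=$(printf '%s\n' "$line" | sed -E 's/(.*)ANARCI[^|]*\|(.*)/[\1][\2]/')

printf '%s\n' "$groups"
```

Group 1 is just the leading >, and group 2 is pdb|7EPU|A, so joining them gives the desired header.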
2nd solution: You could also use awk's match function for this task. It matches only the part that is not required in the output; the print statement then emits everything except the matched part and the pipe that follows it.
awk 'match($0,/ANARCI[^|]*/){print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH+1)}' Input_file
3rd solution: One more awk solution, which sets the field separator to the text from ANARCI up to the first pipe. The main program then prints the 1st and last fields, which gives the required values for the shown samples.
awk -v FS="ANARCI[^\\\\|]*\\\\|" '{print $1 $NF}' Input_file

Try:
sed 's/ANARCI[^|]*|//'
The [^|]* cannot match a |, so the match stops at the first pipe.

Related

How to replace a specific character in bash

I want to replace '_v' with a whitespace and the last dot . into a dash "-". I tried using
sed 's/_v/ /' and tr '_v' ' '
Original Text
src-env-package_v1.0.1.18
output
src-en -package 1.0.1.18
Expected Output
src-env-package 1.0.1-18
This might work for you (GNU sed):
sed -E 's/(.*)_v(.*)\./\1 \2-/' file
This uses the greediness of .* to find the last occurrence of _v and, likewise, the last ., substituting a space for the former and a - for the latter.
If one of the conditions may occur but not necessarily both, use:
sed -E 's/(.*)_v/\1 /;s/(.*)\./\1-/' file
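A quick sketch comparing the two commands. The first input is the name from the question; the second, without _v, is a hypothetical input added only to show the case where one of the conditions is absent.

```shell
# Name from the question: both _v and a final dot are present.
name='src-env-package_v1.0.1.18'
both=$(printf '%s\n' "$name" | sed -E 's/(.*)_v(.*)\./\1 \2-/')

# Hypothetical name with no _v: the combined regex would not match,
# but the two independent substitutions still fix the last dot.
only_dot=$(printf '%s\n' 'src-env-package1.0.1.18' |
  sed -E 's/(.*)_v/\1 /;s/(.*)\./\1-/')

printf '%s\n' "$both" "$only_dot"
```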
With your shown samples please try following sed code. Using sed's capability to store matched regex values into temp buffer(called capturing groups) here. Also using -E option here to enable ERE(extended regular expressions) for handling regex in better way.
Here is the Online demo for used regex.
sed -E 's/^(src-env-package)_v([0-9]+\..*)\.([0-9]+)$/\1 \2-\3/' Input_file
OR if its a variable value on which you want to run sed command then use following:
var="src-env-package_v1.0.1.18"
sed -E 's/^(src-env-package)_v([0-9]+\..*)\.([0-9]+)$/\1 \2-\3/' <<<"$var"
src-env-package 1.0.1-18
Bonus solution: A perl one-liner that uses the same capturing-group concept (as explained above) to get the required values.
perl -pe 's/^(src-env-package)_v((?:[0-9]+\.){1,}[0-9]+)\.([0-9]+)$/$1 $2-$3/' Input_file

GREP Print Blank Lines For Non-Matches

I want to extract strings between two patterns with GREP, but when no match is found, I would like to print a blank line instead.
Input
This is very new
This is quite old
This is not so new
Desired Output
is very
is not so
I've attempted:
grep -o -P '(?<=This).*?(?=new)'
But this does not preserve the second blank line in the above example. Have searched for over an hour, tried a few things but nothing's worked out.
Will happily use a solution in SED if that's easier!
You can use
#!/bin/bash
s='This is very new
This is quite old
This is not so new'
sed -En 's/.*This(.*)new.*|.*/\1/p' <<< "$s"
See the online demo yielding
is very
is not so
Details:
E - enables POSIX ERE regex syntax
n - suppresses default line output
s/.*This(.*)new.*|.*/\1/ - finds any text, This, any text (captured into Group 1, \1), then new and any text again, or else the whole string (in sed, the line), and replaces the match with the Group 1 value.
p - prints the result of the substitution.
And this is what you need for your actual data:
sed -En 's/.*"user_ip":"([^"]*).*|.*/\1/p'
See this online demo. The [^"]* matches zero or more chars other than a " char.
With your shown samples, please try following awk code.
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} NF!=3{print ""}' Input_file
OR
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} {print ""}' Input_file
Explanation: This\\s+ or \\s+new is set as the field separator for all lines of Input_file. The main program checks whether NF (the number of fields) is 3 and, if so, prints the 2nd field (next then skips to the next line). Otherwise it simply prints a blank line.
sed:
sed -E '
/This.*new/! s/.*//
s/.*This(.*)new.*/\1/
' file
first line: for lines not matching "This.*new", remove all characters, leaving a blank line
second line: for lines matching the pattern, keep only the "middle" text
Note that this is not the PCRE non-greedy match: the line
This is new but that is not new
will produce the output
is new but that is not
To continue to use PCRE, use perl:
perl -lpe '$_ = /This(.*?)new/ ? $1 : ""' file
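The greedy behaviour can be checked with a short sketch that runs the two-command sed script on three lines: two from the question and the "This is new but that is not new" line from this answer. It assumes a sed that supports -E; the variable names are illustrative.

```shell
input='This is very new
This is quite old
This is new but that is not new'

# Line 1 of the script blanks non-matching lines;
# line 2 keeps only the text between This and the LAST new (greedy).
result=$(printf '%s\n' "$input" | sed -E '
/This.*new/! s/.*//
s/.*This(.*)new.*/\1/
')

printf '%s\n' "$result"
```

The second output line is empty, and the third keeps " is new but that is not " because (.*) extends to the last new on the line.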
This might work for you:
sed -E 's/.*This(.*)new.*|.*/\1/' file
If the first match is made, the line is replaced by everything between This and new.
Otherwise the second match will remove everything.
N.B. The substitution will always match one of the conditions. The solution was suggested by Wiktor Stribiżew.

Substring file name in Unix using sed command

I want to substring the File name in unix using sed command.
File name : Test_Test1_Test2_10082019_030013.csv.20191008-075740
I want to print the characters after the 3rd underscore (that is, all the characters after Test2).
Can this be done using sed command?
I have tried this command
sed 's/^.*_\([^_]*\)$/\1/' <<< 'Test_Test1_Test2_10082019_030013.csv.20191008-075740'
but this is giving result as 030013.csv.20191008-075740
I need it from 10082019_030013.csv.20191008-075740
Thanks
Neha
To remove from the beginning up to including the 3rd underscore you can use
sed 's/^\([^_]*_\)\{3\}//' <<< 'Test_Test1_Test2_10082019_030013.csv.20191008-075740'
This removes the initial part that consists of 3 groups of (any number of non-underscore characters followed by an underscore). The result is
10082019_030013.csv.20191008-075740
If you use GNU sed you can switch it to extended regular expressions and omit the backslashes.
sed -r 's/^([^_]*_){3}//' <<< 'Test_Test1_Test2_10082019_030013.csv.20191008-075740'
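The same interval expression can be wrapped in a small helper so the number of leading groups is a parameter. This is only a sketch: strip_fields is a hypothetical function name, and it assumes its argument is a positive integer.

```shell
# strip_fields N: remove the first N underscore-terminated groups
# from each input line (hypothetical helper, BRE interval syntax).
strip_fields() {
  sed "s/^\([^_]*_\)\{$1\}//"
}

file='Test_Test1_Test2_10082019_030013.csv.20191008-075740'

after_three=$(printf '%s\n' "$file" | strip_fields 3)
after_one=$(printf '%s\n' "$file" | strip_fields 1)

printf '%s\n' "$after_three" "$after_one"
```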
Could you please try following.
sed 's/\([^_]*\)_\([^_]*\)_\([^_]*\)_\(.*\)/\4/' Input_file
Or as per Bodo's nice suggestion:
sed 's/[^_]*_[^_]*_[^_]*_\(.*\)/\1/' Input_file
This might work for you (GNU sed):
sed 's/_/\n/3;s/.*\n//;t;s/Test2/\n/;s/.*\n//;t;d' file
Replace the third _ by a newline and then remove everything up to and including the first newline. If this succeeds, bail out and print the result. Otherwise, try the same method with Test2 and if this fails delete the entire line.
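A sketch of the three possible outcomes, assuming GNU sed (the \n in the replacement is a GNU extension). Only the first input line comes from the question; the other two are hypothetical lines added to exercise the fallback and the delete.

```shell
input='Test_Test1_Test2_10082019_030013.csv.20191008-075740
a_Test2rest
plain'

# Line 1: has a third underscore, so the first branch succeeds.
# Line 2: too few underscores, so the Test2 fallback is used.
# Line 3: neither applies, so the line is deleted.
result=$(printf '%s\n' "$input" |
  sed 's/_/\n/3;s/.*\n//;t;s/Test2/\n/;s/.*\n//;t;d')

printf '%s\n' "$result"
```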

Use sed to take all lines containing regex and append to end of file

I'm trying to come up with a sed script to take all lines containing a pattern and move them to the end of the output. This is an exercise in learning hold vs pattern space and I'm struggling to come up with it (though I feel close).
I'm here:
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -E '/foo/H; //d; $G'
hi
bar
something
yo

foo1
foo2
But I want the output to be:
hi
bar
something
yo
foo1
foo2
I understand why this is happening. It is because the first time we find foo the hold space is empty so the H appends \n to the blank hold space and then the first foo, which I suppose is fine. But then the $G does it again, namely another append which appends \n plus what is in the hold space to the pattern space.
I tried a final delete command with /^$/d but that didn't remove the blank line (I think this is because this pattern is being matched not against the last line, but against the now multiline pattern space, which has a \n\n in it).
I'm sure the sed gurus have a fix for me.
This might work for you (GNU sed):
sed '/foo/H;//!p;$!d;x;//s/.//p;d' file
If the line contains the required string, append it to the hold space (HS); otherwise print it as normal. If it is not the last line, delete it; otherwise swap the HS for the pattern space (PS). If the required string is now in the PS (what was the HS), the first character will be a newline, since every such line was appended with H; delete that first character and print. Delete whatever is left.
An alternative, using the -n flag:
sed -n '/foo/H;//!p;$!b;x;//s/.//p' file
N.B. When the d or b (without a parameter) command is performed, no further sed commands are executed; a new line is read into the PS and the sed script begins again with the first command, i.e. the sed commands do not resume following the previous d command.
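A small sketch that runs the script from the question next to the first script above, on the same input, to make the difference visible. It assumes GNU sed, as the answer does; the variable names are illustrative.

```shell
input='hi
foo1
bar
something
foo2
yo'

# Script from the question: H and G each add a newline,
# so a blank line appears before the moved lines.
with_blank=$(printf '%s\n' "$input" | sed '/foo/H; //d; $G')

# Script from this answer: the leading newline of the
# hold space is removed with s/.// before printing.
fixed=$(printf '%s\n' "$input" | sed '/foo/H;//!p;$!d;x;//s/.//p;d')

printf '%s\n---\n%s\n' "$with_blank" "$fixed"
```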
Why? Stuff like this is absolutely trivial in awk, awk is available everywhere that sed is, and the resulting awk script will be simpler, more portable, faster and better in almost every other way than a sed script to do the same task. All that hold space stuff was necessary in sed before the mid-1970s when awk was invented but there's absolutely no use for it now other than as a mental exercise.
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" |
awk '/foo/{buf = buf $0 RS;next} {print} END{printf "%s",buf}'
hi
bar
something
yo
foo1
foo2
The above will work as-is in every awk on every UNIX installation and I bet you can figure out how it works very easily.
This feels like a hack and I think it should be possible to handle this situation more gracefully. The following works on GNU sed:
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -r '/foo/{H;d;}; $G; s/\n\n/\n/g'
However, on OSX/BSD sed, results in this odd output:
hi
bar
something
yonfoo1
foo2
Note that the 2 consecutive newlines were replaced with the literal character n.
The OSX/BSD vs GNU sed is explained in this article. And the following works (in GNU SED as well):
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed '/foo/{H;d;}; $G; s/\n\n/\'$'\n''/'
TL;DR: BSD sed does not accept escape sequences such as \n in the RHS of the replacement expression, so you either have to put a true LF/newline in there at the command line, or do the above, where you split the sed script string at the point where you need the newline on the RHS and put a dollar sign in front of '\n' so the shell interprets it as a line feed.
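The other option mentioned, a true newline in the replacement, can be sketched as follows: a backslash followed by an actual line break inside the quoted script. As far as I know this form is accepted by both GNU and BSD sed, but treat that as an assumption and test on your platform.

```shell
input='hi
foo1
bar
something
foo2
yo'

# The replacement is a backslash followed by a real newline,
# so no \n escape is needed on the right-hand side.
result=$(printf '%s\n' "$input" | sed '/foo/{H;d;}; $G; s/\n\n/\
/')

printf '%s\n' "$result"
```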

Select specific items from a file using sed

I'm very much a junior when it comes to the sed command, and my Bruce Barnett guide sits right next to me, but one thing has been troubling me. With a file, can you filter it using sed to select only specific items? For example, in the following file:
alpha|november
bravo|october
charlie|papa
alpha|quebec
bravo|romeo
charlie|sahara
Would it be possible to set a command to return only the bravos, like:
bravo|october
bravo|romeo
With sed:
sed '/^bravo|/!d' filename
Alternatively, with grep (because it's sort of made for this stuff):
grep '^bravo|' filename
or with awk, which works nicely for tabular data,
awk -F '|' '$1 == "bravo"' filename
The first two use a regular expression, selecting those lines that match it. In ^bravo|, ^ matches the beginning of the line and bravo| the literal string bravo|, so this selects all lines that begin with bravo|.
The awk way splits the line across the field separator | and selects those lines whose first field is bravo.
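A quick check that the three approaches select the same lines from the sample data. The data is held in a shell variable instead of a file purely for convenience; the variable names are illustrative.

```shell
data='alpha|november
bravo|october
charlie|papa
alpha|quebec
bravo|romeo
charlie|sahara'

# Regex-based selection: delete non-matching lines, or print matching ones.
by_sed=$(printf '%s\n' "$data" | sed '/^bravo|/!d')
by_grep=$(printf '%s\n' "$data" | grep '^bravo|')

# Field-based selection: first |-separated field equals bravo.
by_awk=$(printf '%s\n' "$data" | awk -F '|' '$1 == "bravo"')

printf '%s\n' "$by_sed"
```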
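You could also use a regex with awk (awk uses extended regular expressions, where | means alternation, so the literal pipe has to be escaped):
awk '/^bravo\|/' filename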
...but I don't think this plays to awk's strengths in this case.
Another solution with sed:
sed -n '/^bravo|/p' filename
-n option => no printing by default.
If line begins with bravo|, print it (p)
Two ways (at least) with sed:
Removing unwanted lines
sed '/^bravo|/ !d' YourFile
Printing only wanted lines
sed -n '/^bravo|/ p' YourFile
If there is no other constraint or action, both are the same and grep is better.
If further commands follow, the two differ: d ends the cycle and moves straight to the next line, while p prints and then continues with the following commands.
Note that the pipe must not be escaped here: in a basic regular expression a plain | is literal, and GNU sed treats \| as alternation. With -E (extended regular expressions) it is the other way round, so the literal pipe is written \|.
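A short sketch of three ways to write a literal pipe in a sed address, assuming a sed that supports -E. The bracket form [|] is a convenient choice because it means the same thing in basic and extended regular expressions.

```shell
data='alpha|november
bravo|october
charlie|papa
bravo|romeo'

# Basic regular expression: a plain | is a literal pipe.
bre=$(printf '%s\n' "$data" | sed -n '/^bravo|/p')

# Extended regular expression: the literal pipe is escaped.
ere=$(printf '%s\n' "$data" | sed -n -E '/^bravo\|/p')

# Bracket expression: literal in both syntaxes.
bracket=$(printf '%s\n' "$data" | sed -n '/^bravo[|]/p')

printf '%s\n' "$bre"
```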