Remove string between dash (-) and the first dot (.) - sed

I have many web addresses that include special interface names, which I would like to remove. Examples:
aaaaaaa-INT1.aaaa.aaaa.com
bbbbbbb-INT2.bbbb.bbbb.com
ccccccc-INT.cccc.cccc.com
So my expected result after sed should be:
aaaaaaa.aaaa.aaaa.com
bbbbbbb.bbbb.bbbb.com
ccccccc.cccc.cccc.com
I have tried this, but it doesn't work:
sed 's/-.*^.//'
Any suggestion please?

To remove the first dash and everything after it up to, but not including, the first period:
$ sed 's/-[^.]*//' file
aaaaaaa.aaaa.aaaa.com
bbbbbbb.bbbb.bbbb.com
ccccccc.cccc.cccc.com
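A small sketch of the same substitution wrapped in a function, to show that [^.]* stops at the first dot and that a line without a dash passes through unchanged. The function name strip_iface and the extra hostname plain.example.com are made up for illustration.

```shell
# strip_iface is a hypothetical helper name, not part of the original answer.
strip_iface() {
    # Delete the first '-' plus the run of non-dot characters that follows it.
    sed 's/-[^.]*//'
}

result=$(printf '%s\n' 'aaaaaaa-INT1.aaaa.aaaa.com' 'plain.example.com' | strip_iface)
printf '%s\n' "$result"
```

Because the bracket expression excludes the dot, the match can never run past the first period, which is what the .* in the original attempt could not guarantee.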

Solution 1st: The following sed may also help.
sed 's/\([^-]*\)-\([^.]*\)\(.*\)/\1\3/' Input_file
Solution 2nd: With awk.
awk -F"." '{sub(/-.*/,"",$1)} 1' OFS="." Input_file

Related

sed with vertical bar?

I have a list
>ANARCI-HMM_human_167.7|pdb|7EPU|A
>ANARCI-HMM_alpaca_173.7|pdb|7EVY|E
>ANARCI-HMM_alpaca_172.8|pdb|7F2O|S
>ANARCI-HMM_alpaca_171.8|pdb|7F4F|S
>ANARCI-HMM_alpaca_173.6|pdb|7F8W|D
I want to remove from ANARCI to the first vertical bar |.
expecting
>pdb|7EPU|A
>pdb|7EVY|E
>pdb|7F2O|S
>pdb|7F4F|S
>pdb|7F8W|D
I tried
sed 's/ANARCI.*\|//g'
but didn't work.
Do you have any idea how to sed in this case?
Using sed
$ sed 's/[A-Z][^|]*|//' input_file
>pdb|7EPU|A
>pdb|7EVY|E
>pdb|7F2O|S
>pdb|7F4F|S
>pdb|7F8W|D
If you want to remove from ANARCI to the first vertical bar |, try this:
sed 's/ANARCI[^|]*|//'
or
sed -E 's/ANARCI[^|]*\|(.*)/\1/'
1st solution: With your shown samples, please try following sed code.
sed -E 's/(.*)ANARCI[^|]*\|(.*)/\1\2/' Input_file
Explanation of the sed code above:
The -E option enables ERE (extended regular expressions).
sed can store matched text in capturing groups, whose values can then be reused in the substitution.
Two capturing groups are created here: the 1st holds everything before the string ANARCI, and the 2nd holds everything after the first pipe (the text from ANARCI up to and including the first pipe is matched outside the groups).
The substitution replaces the line with the 1st and 2nd capturing groups.
2nd solution: You could also use awk's match function for this task. Match only the part you don't want in the output, then print everything apart from the matched part.
awk 'match($0,/ANARCI[^|]*/){print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH+1)}' Input_file
3rd solution: One more awk solution, which sets the field separator to the text from ANARCI up to and including the first pipe, then prints the first and last fields, which gives the required values for the shown samples.
awk -v FS="ANARCI[^\\\\|]*\\\\|" '{print $1 $NF}' Input_file
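Try:
sed 's/ANARCI[^|]*|//'
Here [^|]* cannot match a |, so the match stops at the first one. The | itself needs no backslash in a basic regular expression (in GNU sed, \| means alternation there).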
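A minimal sketch of how the vertical bar is written in each regex flavour, using one line from the question. It assumes a sed that supports -E (GNU and BSD sed both do); the variable names are only for illustration.

```shell
line='>ANARCI-HMM_human_167.7|pdb|7EPU|A'

# Basic regular expressions (the default): a bare | is an ordinary character.
bre=$(printf '%s\n' "$line" | sed 's/ANARCI[^|]*|//')

# Extended regular expressions (-E): | means alternation, so it is escaped.
ere=$(printf '%s\n' "$line" | sed -E 's/ANARCI[^|]*\|//')

printf '%s\n' "$bre" "$ere"
```

Both commands should print >pdb|7EPU|A for this input; mixing the two conventions (an escaped bar without -E) is the likely reason the original attempt left the bar in place.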

Substring file name in Unix using sed command

I want to substring the File name in unix using sed command.
File name : Test_Test1_Test2_10082019_030013.csv.20191008-075740
I need to print the characters after the 3rd underscore (or all the characters after Test2).
Can this be done using sed command?
I have tried this command
sed 's/^.*_\([^_]*\)$/\1/' <<< 'Test_Test1_Test2_10082019_030013.csv.20191008-075740'
but this gives 030013.csv.20191008-075740
I need 10082019_030013.csv.20191008-075740
Thanks
Neha
To remove from the beginning up to and including the 3rd underscore you can use
sed 's/^\([^_]*_\)\{3\}//' <<< 'Test_Test1_Test2_10082019_030013.csv.20191008-075740'
This removes the initial part that consists of 3 groups of (any number of non-underscore characters followed by an underscore). The result is
10082019_030013.csv.20191008-075740
If you use GNU sed you can switch it to extended regular expressions and omit the backslashes.
sed -r 's/^([^_]*_){3}//' <<< 'Test_Test1_Test2_10082019_030013.csv.20191008-075740'
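The repetition count can also be passed in as a parameter. The sketch below assumes a helper named drop_fields (a made-up name) whose first argument is the number of leading underscore-terminated fields to remove.

```shell
# drop_fields is a hypothetical helper; $1 is how many leading
# underscore-terminated fields to remove from each input line.
drop_fields() {
    # Double quotes let the shell expand $1 inside the \{...\} interval.
    sed "s/^\([^_]*_\)\{$1\}//"
}

name='Test_Test1_Test2_10082019_030013.csv.20191008-075740'
three=$(printf '%s\n' "$name" | drop_fields 3)
four=$(printf '%s\n' "$name" | drop_fields 4)
printf '%s\n' "$three" "$four"
```

With 3 the date and time stamp are kept; with 4 the date field is dropped as well, which reproduces the result of the asker's original attempt.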
Could you please try following.
sed 's/\([^_]*\)_\([^_]*\)_\([^_]*\)_\(.*\)/\4/' Input_file
Or as per Bodo's nice suggestion:
sed 's/[^_]*_[^_]*_[^_]*_\(.*\)/\1/' Input_file
This might work for you (GNU sed):
sed 's/_/\n/3;s/.*\n//;t;s/Test2/\n/;s/.*\n//;t;d' file
Replace the third _ with a newline and then remove everything up to and including the first newline. If this succeeds, bail out and print the result. Otherwise, try the same method with Test2, and if this fails, delete the entire line.

find sed regex for {}, ignoring the string in it

in a text file (on linux system) I have this string:
O\WIN_INFRASTRUKTUR{Windows Fabrik}\FIM{Forefront Identity Manager(Benutzer)}\EXTRA{}
Now, I want to replace the O\WIN_INFRASTRUKTUR{Windows Fabrik}, but I don't know what is standing in {}. It could be empty or text in it.
I try this, but without success:
sed -e 's/O\\WIN_INFRASTRUKTUR{[a-zA-Z0-9]}/O\\WIFI{}/g'
And that must be the Result:
O\WIFI{}\FIM{Forefront Identity Manager(Benutzer)}\EXTRA{}
Could anyone help me?
Use the closing delimiter, here }, as the end of your pattern: match any run of characters other than the delimiter, followed by the delimiter itself, with [^}]*}
sed -e 's/O\\WIN_INFRASTRUKTUR{[^}]*}/O\\WIFI{}/g' YourFile
sed -e 's/WIN_INFRASTRUKTUR{[^}]*}/WIFI{}/g' <filename>
Thanks, that works, but what if I want this result instead:
O\WIFI{}\EXTRA{}
If I do this:
sed -e 's/O\\WIN_INFRASTRUKTUR{[^}]*}\\FIM{[^}]*}/O\\WIFI{}/g'
then I get only this result: O\WIFI{}
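As a quick check against the sample line from the question, the sketch below runs both substitutions. The variable names are only for illustration, and the input is assumed to be exactly the single line shown in the question.

```shell
line='O\WIN_INFRASTRUKTUR{Windows Fabrik}\FIM{Forefront Identity Manager(Benutzer)}\EXTRA{}'

# Replace only the first block; [^}]* matches empty or non-empty brace contents.
first=$(printf '%s\n' "$line" | sed -e 's/O\\WIN_INFRASTRUKTUR{[^}]*}/O\\WIFI{}/g')

# Also consume the \FIM{...} block that follows it.
both=$(printf '%s\n' "$line" | sed -e 's/O\\WIN_INFRASTRUKTUR{[^}]*}\\FIM{[^}]*}/O\\WIFI{}/g')

printf '%s\n' "$first" "$both"
```

On this input the second command leaves \EXTRA{} in place, so if it disappears in practice, the real data probably differs from the sample (for example, the \EXTRA{} part may sit on a separate line).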

sed: if line does not contain lower-case, add a blank line above and below

There are a number of questions here about sed to find lines that don't contain a string, but all of them seem to be about then deleting those lines. I want to keep mine, with a blank line added above and below.
Try doing this :
$ sed '/[[:lower:]]/!{a
i
}' file.txt
Here is an awk solution:
awk '!/[[:lower:]]/ {$0=RS$0RS}1' file
If the line does not contain any lowercase characters, add the record separator (RS, a newline by default) before and after the line, then print.
This might work for you (GNU sed):
sed '/[[:lower:]]/b;x;p;x;G' file
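The hold-space commands in that one-liner can be read step by step. The sketch below is a variant that uses ! with a braced block instead of the b branch, which should also work in non-GNU sed; the three input lines are invented for illustration.

```shell
out=$(printf '%s\n' 'abc' 'HEADER' 'def' |
    sed '/[[:lower:]]/!{
        # x: swap in the (empty) hold space; p: print it as a blank line
        x;p
        # x: swap the original line back into the pattern space
        x
        # G: append a newline plus the empty hold space, giving a trailing blank line
        G
    }')
printf '%s\n' "$out"
```

This relies on the hold space staying empty, which is true here because nothing else in the script writes to it.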

One-liners to remove lines in which a specific character appears more than x times

I think the title says it all, I'm looking for a one-liner to remove lines of a file in which a specific character, let's say /, appears more than x times - 5, for instance.
Start:
/Bo/byl/apointe
S/ta/ck/ov/er/flo/w
M/oon/
Expected result:
/Bo/byl/apointe
M/oon/
Thank you for your suggestions !
You can use awk's gsub function. gsub returns the number of substitutions made, so you can use it to count the occurrences of a particular character.
awk 'gsub(/\//,"&")<5' file
Updated Based on Ed Morton's suggestion.
This might work for you (GNU sed):
sed 's|/|&|5;T;d' file
All you need is:
awk -F/ 'NF<6' file
Look:
$ cat file
/Bo/byl/apointe
S/ta/ck/ov/er/flo/w
M/oon/
$ awk -F/ 'NF<6' file
/Bo/byl/apointe
M/oon/
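The threshold can be made explicit. With -F/ a non-empty line containing n slashes has n+1 fields, so NF - 1 is the slash count. The sketch below assumes a variable named max (not from the original answers) holding the largest number of slashes a kept line may contain.

```shell
# max is an assumed name: the largest number of '/' a kept line may contain.
max=5

kept=$(printf '%s\n' '/Bo/byl/apointe' 'S/ta/ck/ov/er/flo/w' 'M/oon/' |
    awk -F/ -v max="$max" 'NF - 1 <= max')
printf '%s\n' "$kept"
```

Note the boundary: NF<6 drops a line with exactly 5 slashes, whereas this version keeps it, matching a strict reading of "more than 5 times". Pick whichever comparison matches the intended rule.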
I believe sed would be sufficient here. You'll want to look into //d and supply the correct condition. I'm going to try something and update when I have better ideas, you should too :)
Once you find it sed -i /{blah}/d will be enough to change it in the file, but you might want to run it without the -i and pipe it through less first to confirm it's doing what you think it's doing.
This would do :
sed -r '/(\/.*){5}\//d' file