Using Two Regex Strings in SED - sed

I have this text file where I need to first find a string "BEGINNING" and then find a string "HERE" after the first "BEGINNING" but only once. And there can be any amount of strings in between. This must be done with SED commands so no awk. I know I can simply do /BEGINNING/ to find the first one but I don't know how to put the two together in one SED command.

something like this?
$ sed -n '/BEGINNING/,${/HERE/{p;q}}' file
may be supported only by GNU sed, not sure.

Related

Insert specific lines from file before first occurrence of pattern using Sed

I want to insert a range of lines from a file, say something like 210,221r before the first occurrence of a pattern in a bunch of other files.
As I am clearly not a GNU sed expert, I cannot figure how to do this.
I tried
sed '0,/pattern/{210,221r file
}' bunch_of_files
But apparently file is read from line 210 to EOF.
Try this:
sed -r 's/(FIND_ME)/PUT_BEFORE\1/' test.text
-r enables extendend regular expressions
the string you are looking for ("FIND_ME") is inside parentheses, which creates a capture group
\1 puts the captured text into the replacement.
About your second question: You can read the replacement from a file like this*:
sed -r 's/(FIND_ME)/`cat REPLACEMENT.TXT`\1/' test.text
If replace special characters inside REPLACEMENT.TXT beforehand with sed you are golden.
*= this depends on your terminal emulator. It works in bash.
In https://stackoverflow.com/a/11246712/4328188 CodeGnome gave some "sed black magic" :
In order to insert text before a pattern, you need to swap the pattern space into the hold space before reading in the file. For example:
sed '/pattern/ {
h
r file
g
N
}' in
However, to read specific lines from file, one may have to use a two-calls solution similar to dummy's answer. I'd enjoy knowing of a one-call solution if it is possible though.

Manipulate characters with sed

I have a list of usernames and i would like add possible combinations to it.
Example. Lets say this is the list I have
johna
maryb
charlesc
Is there is a way to use sed to edit it the way it looks like
ajohn
bmary
ccharles
And also
john_a
mary_b
charles_c
etc...
Can anyone assist me into getting the commands to do so, any explanation will be awesome as well. I would like to understand how it works if possible. I usually get confused when I see things like 's/\.(.*.... without knowing what some of those mean... anyway thanks in advance.
EDIT ... I change the username
sed s/\(user\)\(.\)/\2\1/
Breakdown:
sed s/string/replacement/ will replace all instances of string with replacement.
Then, string in that sed expression is \(user\)\(.\). This can be broken down into two
parts: \(user\) and \(.\). Each of these is a capture group - bracketed by \( \). That means that once we've matched something with them, we can reuse it in the replacement string.
\(user\) matches, surprisingly enough, the user part of the string. \(.\) matches any single character - that's what the . means. Then, you have two captured groups - user and a (or b or c).
The replacement part just uses these to recreate the pattern a little differently. \2\1 says "print the second capture group, then the first capture group". Which in this case, will print out auser - since we matched user and a with each group.
ex:
$ echo "usera
> userb
> userc" | sed "s/\(user\)\(.\)/\2\1/"
auser
buser
cuser
You can change the \2\1 to use any string you want - ie. \2_\1 will give a_user, b_user, c_user.
Also, in order to match any preceding string (not just "user"), just replace the \(user\) with \(.*\). Ex:
$ echo "marya
> johnb
> alfredc" | sed "s/\(.*\)\(.\)/\2\1/"
amary
bjohn
calfred
here's a partial answer to what is probably the easy part. To use sed to change usera to user_a you could use:
sed 's/user/user_/' temp
where temp is the name of the file that contains your initial list of usernames. How this works: It is finding the first instance of "user" on each line and replacing it with "user_"
Similarly for your dot example:
sed 's/user/user./' temp
will replace the first instance of "user" on each line with "user."
Sed does not offer non-greedy regex, so I suggest perl:
perl -pe 's/(.*?)(.)$/$2$1/g' file
ajohn
bmary
ccharles
perl -pe 's/(.*?)(.)$/$1_$2/g' file
john_a
mary_b
charles_c
That way you don't need to know the username before hand.
Simple solution using awk
awk '{a=$NF;$NF="";$0=a$0}1' FS="" OFS="" file
ajohn
bmary
ccharles
and
awk '{a=$NF;$NF="";$0=$0"_" a}1' FS="" OFS="" file
john_a
mary_b
charles_c
By setting FS to nothing, every letter is a field in awk. You can then easy manipulate it.
And no need to using capturing groups etc, just plain field swapping.
This might work for you (GNU sed):
sed -r 's/^([^_]*)_?(.)$/\2\1/' file
This matches any charactes other than underscores (in the first back reference (\1)), a possible underscore and the last character (in the second back reference (\2)) and swaps them around.

Replace 3 lines with another line SED Syntax

This is a simple question, I'm not sure if i'm able to do this with sed/awk
How can I make sed search for these 3 lines and replace with a line with a determined string?
<Blarg>
<Bllarg>
<Blllarg>
replace with
<test>
I tried with sed "s/<Blarg>\n<Bllarg>\n<Blllarg>/<test>/g" But it just don't seem to find these lines. Probably something with my break line character (?) \n. Am I missing something?
Because sed usually handles only one line at a time, your pattern will never match. Try this:
sed '1N;$!N;s/<Blarg>\n<Bllarg>\n<Blllarg>/<test>/;P;D' filename
This might work for you:
sed '/<Blarg>/ {N;N;s/<Blarg>\n<Bllarg>\n<Blllarg>/<test>/}' <filename>
It works as follows:
Search the file till <Blarg> is found
Then append the two following lines to the current pattern space using N;N;
Check if the current pattern space matches <Blarg>\n<Bllarg>\n<Blllarg>
If so, then substitute it with <test>
You can use range addresses with regular expressions an the c command, which does exactly what you are asking for:
sed '/<Blarg>/,/<Blllarg>/c<test>' filename

Remove a hyphen from a specific line in a file

I have a data file that needs to have several uniq identifiers stripped of hyphens.
So I have:
(Special_Section "data-values")
and I want to have it replaced with:
(Special_Section "datavalues")
I wanted to use a simple sed find/replace, but the data and values are different each time. Preferably, I'd run this in-place since the file has a lot of other information I want to keep in tact.
Does sed or awk have a way to remove the hyphen from the matched portion only?
Currently I can match with: sed -i 's/Special_Section "[a-zA-Z0-9]*-[a-zA-Z0-9]*"/&/g *myfiles*
But I would like to then run s/-// on & if it's possible.
You seems to be using GNU sed, so something like this might work:
sed -ri '
s/(Special_Section [^-]*)-([^)]*)/\1\2/g
' <your_filename_glob>
does this work?
sed -i '/(Special_Section ".*-.*")/{s/-//}' yourFile
Close - scan for the lines and then substitute on those that match:
sed -i '/Special_Section "[a-zA-Z0-9]*-[a-zA-Z0-9]*"/s/\( "[a-zA-Z0-9]*\)-\([a-zA-Z0-9]*\)"/\1\2/' *myfiles*
You can split that over several lines to avoid the scroll bar in SO:
sed -i '/Special_Section "[a-zA-Z0-9]*-[a-zA-Z0-9]*"/{
s/\( "[a-zA-Z0-9]*\)-\([a-zA-Z0-9]*\)"/\1\2/
}' *myfiles*
And on further thoughts, you can also do:
sed -i 's/\(Special_Section "[a-zA-Z0-9]*\)-\([a-zA-Z0-9]*"\)/\1\2/' *myfiles*
This is more compact. You can add the g qualifier if you need it. Both solutions use the special \(...\) notation to capture parts of the regular expression.

capturing groups in sed

I have many lines of the form
ko04062 ko:CXCR3
ko04062 ko:CX3CR1
ko04062 ko:CCL3
ko04062 ko:CCL5
ko04080 ko:GZMA
and would dearly like to get rid of the ko: bit of the right-hand column. I'm trying to use sed, as follows:
echo "ko05414 ko:ITGA4" | sed 's/\(^ko\d{5}\)\tko:\(.*$\)/\1\2/'
which simply outputs the original string I echo'd. I'm very new to command line scripting, sed, pipes etc, so please don't be too angry if/when I'm doing something extremely dumb.
The main thing that is confusing me is that the same thing happens if I reverse the \1\2 bit to read \2\1 or just use one group. This, I guess, implies that I'm missing something about the mechanics of piping the output of echo into sed, or that my regexp is wrong or that I'm using sed wrong or that sed isn't printing the results of the substitution.
Any help would be greatly appreciated!
sed is outputting its input because the substitution isn't matching. Since you're probably using GNU sed, try this:
echo "ko05414 ko:ITGA4" | sed 's/\(^ko[0-9]\{5\}\)\tko:\(.*$\)/\1\2/'
\d -> [0-9] since GNU sed doesn't recognize \d
{} -> \{\} since GNU sed by default uses basic regular expressions.
This should do it. You can also skip the last group and simply use, \1 instead, but since you're learning sed and regex this is good stuff. I wanted to use a non-capturing group in the middle (:? ) but I could not get that to play with sed for whatever reason, perhaps it's not supported.
sed --posix 's/\(^ko[0-9]\{5\}\)\( ko:\)\(.*$\)/\1 \3/g' file > result
And ofcourse you can use
sed --posix 's/ko://'
You don't need sed for this
Here is how you can do it with bash:
var="ko05414 ko:ITGA4"
echo ${var//"ko:"}
${var//"ko:"} replaces all "ko:" with ""
See Manipulating Strings for more info
#OP, if you just want to get rid of "ko:", then
$ cat file
ko04062 ko:CXCR3
ko04062 ko:CX3CR1
ko04062 ko:CCL3
ko04062 ko:CCL5
some text with a legit ko: this ko: will be deleted if you use gsub.
ko04080 ko:GZMA
$ awk '{sub("ko:","",$2)}1' file
ko04062 CXCR3
ko04062 CX3CR1
ko04062 CCL3
ko04062 CCL5
some text with a legit ko: this ko: will be deleted if you use gsub.
ko04080 GZMA
Jsut a note. While you can use pure bash string substitution, its only more efficient when you are changing a single string. If you have a file, especially a big file, using bash's while read loop is still slower than using sed or awk.