sed: capture the text between the first ( and )

I have the following badly formatted text:
<h1 id="page-title">ABCD TEXT TEXT ( QQQ-10-123-01)</h1>
<h1 id="page-title">ABCD TEXT TEXT (QQQ-10-123-02)</h1>
<h1 id="page-title">ABCD TEXT TEXT (QQQ-10-123-03 (QWERTY))</h1>
and need to get from it:
QQQ-10-123-01
QQQ-10-123-02
QQQ-10-123-03 (QWERTY)
I.e. I want only the text between the first "(" and ")". At the moment I am doing the following:
sed -n "s/.*<h1 id=\"page-title\">.*(\(.*\))<\/h1>.*/\1/p" ./file.txt
and get:
QQQ-10-123-01
QQQ-10-123-02
QWERTY)
As you can see, only the second line is processed properly, since it is the most cleanly formatted. The problems are ignoring optional whitespace and handling a second "(" and ")" on the same line. Can somebody point me in the right direction for solving them?
P.S. I need to parse over 2k lines; would there be a big difference in performance between sed and awk? As far as I have been reading and understood, sed should have a little benefit in speed. Is that really so?

Using sed
$ sed 's/[^(]*([[:space:]]\?\([^)]*)\?\)).*/\1/' input_file
QQQ-10-123-01
QQQ-10-123-02
QQQ-10-123-03 (QWERTY)
$ sed -E 's/[^(]*\([[:space:]]?([^)]*\)?)\).*/\1/' input_file
QQQ-10-123-01
QQQ-10-123-02
QQQ-10-123-03 (QWERTY)

Using any sed:
$ sed 's/[^(]*( *\(.*\)).*/\1/g' file
QQQ-10-123-01
QQQ-10-123-02
QQQ-10-123-03 (QWERTY)
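To see how the last command treats all three cases, it can help to run it on the sample input. This is only a sketch: `input` holds the three lines from the question, and `extract_parens` is a hypothetical helper name used for illustration.

```shell
# The three sample lines from the question.
input='<h1 id="page-title">ABCD TEXT TEXT ( QQQ-10-123-01)</h1>
<h1 id="page-title">ABCD TEXT TEXT (QQQ-10-123-02)</h1>
<h1 id="page-title">ABCD TEXT TEXT (QQQ-10-123-03 (QWERTY))</h1>'

# extract_parens: hypothetical helper wrapping the "any sed" command.
#   [^(]*     everything before the first "(" (it cannot run past it,
#             unlike the greedy ".*(" in the question's attempt)
#   ( *       the "(" itself and any spaces after it
#   \(.*\))   greedy capture, so it ends at the LAST ")" on the line
extract_parens() {
  sed -n 's/[^(]*( *\(.*\)).*/\1/p'
}

printf '%s\n' "$input" | extract_parens
```

The `-n` and `p` pair is an addition here so that lines without parentheses are not printed at all.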

Related

How to replace a specific character in bash

I want to replace '_v' with a space and the last dot "." with a dash "-". I tried using
sed 's/_v/ /' and tr '_v' ' '
Original Text
src-env-package_v1.0.1.18
output
src-en -package 1.0.1.18
Expected Output
src-env-package 1.0.1-18
This might work for you (GNU sed):
sed -E 's/(.*)_v(.*)\./\1 \2-/' file
This uses the greediness of .* to find the last occurrence of _v and, likewise, the last occurrence of . and substitutes a space for the former and a - for the latter.
If one of the conditions may occur but not necessarily both, use:
sed -E 's/(.*)_v/\1 /;s/(.*)\./\1-/' file
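As a quick check of the two-step variant, the sketch below runs it on the sample string and on two made-up inputs that each contain only one of the two patterns. `fix_version` is a hypothetical helper name, and `-E` is assumed to be available (GNU and BSD sed both accept it).

```shell
# fix_version: hypothetical wrapper around the two-step command.
# Each s command is independent, so a missing "_v" or a missing "."
# simply leaves that part of the line unchanged.
fix_version() {
  sed -E 's/(.*)_v/\1 /;s/(.*)\./\1-/'
}

printf '%s\n' 'src-env-package_v1.0.1.18' | fix_version   # both patterns
printf '%s\n' 'pkg_v7' | fix_version                      # no dot
printf '%s\n' 'pkg.1.2' | fix_version                     # no _v
```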
With your shown samples, please try the following sed command. It uses capturing groups to store the matched parts so they can be referenced in the replacement, and the -E option to enable ERE (extended regular expressions).
sed -E 's/^(src-env-package)_v([0-9]+\..*)\.([0-9]+)$/\1 \2-\3/' Input_file
Or, if the value is in a variable, pass it to sed with a here-string:
var="src-env-package_v1.0.1.18"
sed -E 's/^(src-env-package)_v([0-9]+\..*)\.([0-9]+)$/\1 \2-\3/' <<<"$var"
src-env-package 1.0.1-18
Bonus solution: a perl one-liner that uses the same capturing-group approach:
perl -pe 's/^(src-env-package)_v((?:[0-9]+\.){1,}[0-9]+)\.([0-9]+)$/$1 $2-$3/' Input_file

sed is replacing whole line instead of just the string

I want to just replace few strings in file with nothing, but sed replaces the whole line. Can someone help me with this?
line in file.xml:
<tag>sample text1 text2</tag>
My code:
sed "s/'text1 text2'//" file.xml 2>/dev/null || :
I also tried
sed -i -e "s/'text1 text2'//" file.xml 2>/dev/null || :
expected result:
<tag>sample</tag>
Actual result:
The whole line is removed from file.
Others:
text1 and text 2 are complex text with .=- characters in it
What can I do to fix this?
TIA
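Remove the single quotes. Inside the double-quoted script they are literal characters, so sed looks for quotes that are not in the file. Include the leading space in the pattern so that none is left before </tag>:
sed "s/ text1 text2//" file.xml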
You could use
sed 's/\([^ ]*\)[^<]*\(.*\)/\1\2/' filename
Output:
<tag>sample</tag>
This uses grouping: the first group captures everything up to the first space, [^<]* then matches everything up to the next <, and the second group captures the rest of the line.
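Since the question notes that the real strings contain characters such as `.`, `=` and `-`, the literal text may need escaping before it is placed in the pattern. A minimal sketch, assuming basic regular expressions and `/` as the `s` delimiter; `escape_bre` and the sample value `v1.0=x-y` are made up for illustration.

```shell
# escape_bre: hypothetical helper that puts a backslash in front of the
# characters ] [ \ . * ^ $ and the "/" delimiter.
# A comma is used as the s delimiter so "/" can sit in the bracket list.
escape_bre() {
  printf '%s\n' "$1" | sed 's,[][\.*^$/],\\&,g'
}

text='v1.0=x-y'                 # made-up stand-in for "text1 text2"
pattern=$(escape_bre "$text")   # the "." is now escaped

# The leading space in the pattern removes the separator before the text.
printf '%s\n' '<tag>sample v1.0=x-y</tag>' | sed "s/ $pattern//"
```

Without the escaping, the `.` would match any character, so the substitution could also hit text that merely resembles the intended string.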

find sed regex for {}, ignoring the string in it

in a text file (on linux system) I have this string:
O\WIN_INFRASTRUKTUR{Windows Fabrik}\FIM{Forefront Identity Manager(Benutzer)}\EXTRA{}
Now I want to replace O\WIN_INFRASTRUKTUR{Windows Fabrik}, but I don't know what is inside the {}. It could be empty or contain text.
I tried this, without success:
sed -e 's/O\\WIN_INFRASTRUKTUR{[a-zA-Z0-9]}/O\\WIFI{}/g'
This should be the result:
O\WIFI{}\FIM{Forefront Identity Manager(Benutzer)}\EXTRA{}
Could anyone help me?
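Use the closing delimiter, here }, as the end of your pattern: match any number of characters other than } and then the } itself, i.e. [^}]*}. Your attempt fails because [a-zA-Z0-9] matches exactly one character and does not allow the space in "Windows Fabrik".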
sed -e 's/O\\WIN_INFRASTRUKTUR{[^}]*}/O\\WIFI{}/g' YourFile
sed -e 's/WIN_INFRASTRUKTUR{[^}]*}/WIFI{}/g' <filename>
Thanks, that works. But what if I want this result instead:
O\WIFI{}\EXTRA{}
When I try this:
sed -e 's/O\\WIN_INFRASTRUKTUR{[^}]*}\\FIM{[^}]*}/O\\WIFI{}/g'
then I only get this result: O\WIFI{}
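A quick way to check the `[^}]*}` idea is to run both substitutions on the sample line. In the sketch below, `line` holds the string from the question. On that input the second command leaves `\EXTRA{}` in place, so if real data yields only `O\WIFI{}`, it presumably differs from the sample shown here.

```shell
# The sample string from the question (single quotes keep the backslashes).
line='O\WIN_INFRASTRUKTUR{Windows Fabrik}\FIM{Forefront Identity Manager(Benutzer)}\EXTRA{}'

# Replace only the first block; [^}]*} matches up to the nearest "}".
printf '%s\n' "$line" | sed 's/O\\WIN_INFRASTRUKTUR{[^}]*}/O\\WIFI{}/'

# Replace the first block together with the \FIM{...} block after it.
printf '%s\n' "$line" | sed 's/O\\WIN_INFRASTRUKTUR{[^}]*}\\FIM{[^}]*}/O\\WIFI{}/'
```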

sed: if line does not contain lower-case, add a blank line above and below

There are a number of questions here about sed to find lines that don't contain a string, but all of them seem to be about then deleting those lines. I want to keep mine, with a blank line added above and below.
Try doing this :
$ sed '/[[:lower:]]/!{a
i
}' file.txt
Here is an awk solution:
awk '!/[[:lower:]]/ {$0=RS$0RS}1' file
If a line has no lowercase characters, add the record separator RS (a newline by default) before and after the line, then print it.
This might work for you (GNU sed):
sed '/[[:lower:]]/b;x;p;x;G' file
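The hold-space version is compact, so here is a commented sketch of what each command does. It uses the same commands as the one-liner, written one per line; `pad_caps` is a hypothetical helper name and the three input lines are made up.

```shell
# pad_caps: hypothetical helper; same commands as the one-liner.
#   /[[:lower:]]/b  line has a lowercase letter: skip to the end, print as is
#   x               swap the line with the (empty) hold space
#   p               print the now-empty pattern space: the blank line above
#   x               swap back so the line is in the pattern space again
#   G               append a newline plus the empty hold space: blank line below
pad_caps() {
  sed '/[[:lower:]]/b
x
p
x
G'
}

printf '%s\n' 'HEADER' 'some text' 'ANOTHER' | pad_caps
```

This relies on the hold space staying empty, which holds here because the script never copies anything into it.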

sed/awk - removing text between delimiters

How would I remove all text between certain delimiters?
example:
hello;you;are;nice
returns:
hello;you;nice
In sed, I know how to remove text before the first delimiter and after the last, but I'm not sure about the ones in between.
thanks as always to everyone.
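What about using cut? Quote the delimiter so the shell does not treat ; as a command separator, and list the fields to keep:
cut -d';' -f1,2,4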
It is quite straightforward with sed:
sed "s/\w*;//3"
awk -F\; -v OFS=";" '{print $1,$2,$4}' file
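For comparison, the sketch below runs the three approaches on the sample line; each should print `hello;you;nice`. The sed pattern here uses `[^;]*` in place of `\w*`, a swapped-in variant chosen because `\w` is a GNU extension.

```shell
line='hello;you;are;nice'

# cut: keep fields 1, 2 and 4 (the ";" must be quoted for the shell).
printf '%s\n' "$line" | cut -d';' -f1,2,4

# sed: delete the third "field plus delimiter" occurrence.
printf '%s\n' "$line" | sed 's/[^;]*;//3'

# awk: print the wanted fields with ";" as the output separator.
printf '%s\n' "$line" | awk -F';' -v OFS=';' '{print $1, $2, $4}'
```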