sed: cut a string within a pattern

I have many XHTML files whose contents are like:
<h:panelGroup rendered="#{not accessBean.isUserLoggedIn}">
<h:form>
<p:panel style="margin-top:10px">
<table style="margin:10px">
<tbody>
<tr>
<td align="center">#{i.m['Login']}</td>
<td align="center">
<h:inputText value="#{accessBean.login}" />
</td>
</tr>
<tr>
<td align="center">#{i.m['Password']}</td>
<td align="center">
<h:inputSecret value="#{accessBean.password}" />
</td>
</tr>
</tbody>
</table>
<p:commandButton ajax="false" value="#{i.m['Submit']}" action="#{accessBean.login}" />
</p:panel>
</h:form>
</h:panelGroup>
I want to replace every occurrence of #{i.m['any-string']} with any-string, i.e., cut the string out of the pattern.
I have created the following sed command:
sed -e "s/#{i.m\['\(.*\)']}/\1/g"
And to run it recursively within a directory I could execute
find . -iname '*.xhtml' -type f -exec sed -i -e "s/#{i.m\['\(.*\)']}/\1/g" {} \;
Here, any-string can contain any human-readable character that HTML can display (letters, digits, punctuation, etc.), which is why I used the regex (.*).
But it does not work in every case.
Here are some tests I made using echo:
$ echo "<td align=\"center\">#{i.m['Login']}</td>" | sed -e "s/#{i.m\['\(.*\)']}/\1/g"
Result:
<td align="center">Login</td>
OK
$ echo "<p:commandButton ajax=\"false\" value=\"#{i.m['Submit']}\" action=\"#{accessBean.login}\" />" | sed -e "s/#{i.m\['\(.*\)']}/\1/g"
Result:
<p:commandButton ajax="false" value="Submit" action="#{accessBean.login}" />
OK
$ echo "<p:commandButton ajax=\"false\" value=\"#{i.m['Submit']}\" action=\"#{accessBean.login}\" /> <td align=\"center\">#{i.m['Login']}</td>" | sed -e "s/#{i.m\['\(.*\)']}/\1/g"
Result:
<p:commandButton ajax="false" value="Submit']}" action="#{accessBean.login}" /> <td align="center">#{i.m['Login</td>
NOK
I'm using Ubuntu 18.04.

Per your request, and as noted in my comment and in the comments of others, you should definitely use a proper XML parser such as xmlstarlet for XHTML parsing. A simple regex does no validation of what is left behind.
That being said, for your example (only), to replace the text leaving Login, Password and Submit, you could use the following regex:
sed "s/[#][{]i[.]m[[][']\([^']*\)['][]][}]/\1/g" <file
Whenever you have to match characters that can also be regex metacharacters, it helps to make sure explicitly that the character is treated as a literal character and not as part of the regex syntax. To do that, you can use a character class, e.g. [...], which matches any one of the characters between the brackets. (If the first character in the class is '^', the match is inverted, i.e. it matches any character except those in the class.)
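As a small illustration of the character-class point, here is a sketch on a made-up input string (the sample text and the HIT marker are assumptions, not part of the question's files):

```shell
# Made-up input: "i.m" contains a literal dot, "iXm" does not.
sample='i.m iXm'

# An unbracketed . matches any single character, so both words match.
any=$(printf '%s\n' "$sample" | sed 's/i.m/HIT/g')

# [.] matches only a literal dot, so only the first word matches.
literal=$(printf '%s\n' "$sample" | sed 's/i[.]m/HIT/g')

printf '%s\n' "$any" "$literal"
```

The first command rewrites both words, the second only the one that really contains a dot.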
With that explanation, the regex should become clear. The regex uses the basic substitution form:
sed "s/find/replace/" file
The 'find' REGEX
[#] - match the pound sign
[{] - match the opening brace
i - match the 'i'
[.] - explicitly match the '.' character (instead of . any character)
m - match the 'm'
[[] - match the opening bracket
['] - match the single quote
\( - begin your capture group to capture text to reinsert as a back reference
[^']* - match zero-or-more characters that are not a single-quote
\) - end your capture group
['] - match the single-quote as the next character
[]] - match the closing bracket
[}] - match the closing brace.
The 'replace' REGEX
All characters captured as part of the find capture group (between the \(....\)), are available to use as a back reference in the replace portion of the substitution. You can have more than one capture group in the find portion, which you reference in the replace part of the substitution as \1, \2, ... and so on. Here you have only a single capture group in the find portion, so whatever was matched can be used as the entire replacement, e.g.
\1 - to replace the whole mess with just the text that was captured with [^']*
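To show how several capture groups map to \1 and \2, here is a minimal sketch on a hypothetical key=value line (the input format is an assumption chosen only for illustration):

```shell
# Hypothetical input; the key=value format is not from the question.
line='Login=user1'

# \(...\) captures text; \1 and \2 reinsert it in the replacement,
# here in swapped order.
swapped=$(printf '%s\n' "$line" | sed 's/\([^=]*\)=\(.*\)/\2=\1/')

printf '%s\n' "$swapped"
```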
Example Use/Output
For use with your example, it will properly leave Login, Password and Submit as indicated in your question, e.g.
sed "s/[#][{]i[.]m[[][']\([^']*\)['][]][}]/\1/g" file
<h:panelGroup rendered="#{not accessBean.isUserLoggedIn}">
<h:form>
<p:panel style="margin-top:10px">
<table style="margin:10px">
<tbody>
<tr>
<td align="center">Login</td>
<td align="center">
<h:inputText value="#{accessBean.login}" />
</td>
</tr>
<tr>
<td align="center">Password</td>
<td align="center">
<h:inputSecret value="#{accessBean.password}" />
</td>
</tr>
</tbody>
</table>
<p:commandButton ajax="false" value="Submit" action="#{accessBean.login}" />
</p:panel>
</h:form>
</h:panelGroup>
Again, as a disclaimer and just good common sense: don't parse X/HTML with a regex, use a proper tool like xmlstarlet. Don't parse JSON with a regex, use a proper tool for the job like jq. You get the drift. For this limited example the regex works well, but it is fragile: if anything in the input changes, it will break, which is why we have tools like xmlstarlet and jq.

The problem here is that you do not take the greedy nature of regexps into account: .* matches as much as it can, so on a line with two occurrences it runs up to the last ']}. You need to prevent your regexp from gobbling up extra 's:
sed -e "s/#{i.m\['\([^']*\)']}/\1/g"
This is also the reason why David C. Rankin's solution works. His regexp is unnecessarily complex, however.
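The difference can be checked side by side. The sketch below uses a shortened line with two occurrences, modelled on the failing test in the question (the exact line is an assumption made for brevity):

```shell
# One line containing two occurrences, modelled on the failing test.
line="value=\"#{i.m['Submit']}\" <td>#{i.m['Login']}</td>"

# Greedy .* runs from the first [' to the last ']} on the line.
greedy=$(printf '%s\n' "$line" | sed -e "s/#{i.m\['\(.*\)']}/\1/g")

# [^']* cannot cross a single quote, so each occurrence is matched separately.
fixed=$(printf '%s\n' "$line" | sed -e "s/#{i.m\['\([^']*\)']}/\1/g")

printf '%s\n' "$greedy" "$fixed"
```

The greedy version leaves the debris seen in the NOK test, while the negated character class yields value="Submit" <td>Login</td>.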

Related

Perl - Replace all but last occurrence

I have a string which contains multiple occurrences of the string <br />. I want to replace all of them, except the last one, with the slash-less form <br>.
So, if I have a string:
A<br />B<br />C<br />D<br />.
I want to have the string:
A<br>B<br>C<br>D<br />.

You can use a lookahead assertion, that requires the string to have at least one <br /> left: (?=.*<br />). Here is an example:
$ perl -pe's|<br />(?=.*<br />)|<br>|g'
A<br />B<br />C<br />D<br />
A<br>B<br>C<br>D<br />
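If perl is not at hand, roughly the same result can be sketched with sed, which has no lookahead. Note that this is a different technique from the lookahead above: it converts every <br /> and then restores the last one. It assumes the line contained no plain <br> to begin with.

```shell
# Same sample string as in the question.
s='A<br />B<br />C<br />D<br />.'

# Step 1: convert every <br /> to <br>.
# Step 2: turn the last <br> (the one followed by no further tag) back into <br />.
out=$(printf '%s\n' "$s" | sed 's|<br />|<br>|g; s|<br>\([^<]*\)$|<br />\1|')

printf '%s\n' "$out"
```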

Sed only with specific place

For example:
I'd like to replace the /test src path only within the <img> tag.
However, <p>test</p> should not be touched.
$ cat test.html
<img src="/test" width="18" alt="" /><br>
<p>test</p>
For now I can execute something like:
sed -i 's|/test|/hoge|g' test.html
However, it changes the word globally.

sed '/<img/s|/test|/hoge|g' test.html would work for <img> tags that fit on one line.
Sed allows an s///g command to be prefixed with an address /PATTERN/ that restricts the replacement to lines matching PATTERN.
But you should really use an XML parser to be safe.
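A quick way to see the address at work is the sketch below. The <a> line is a hypothetical addition to the question's sample, included only to show a line that contains /test but no <img tag:

```shell
# Sample from the question plus one made-up <a> line for contrast.
html='<img src="/test" width="18" alt="" /><br>
<a href="/test">test</a>
<p>test</p>'

# The /<img/ address limits the s command to lines containing "<img".
out=$(printf '%s\n' "$html" | sed '/<img/s|/test|/hoge|g')

printf '%s\n' "$out"
```

Only the <img line changes; the other two lines pass through untouched.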

Another approach with sed:
sed -i 's|\(<img *src="/\)test|\1hoge|' test.html
<img *src="/ is captured and back-referenced with \1 in the replacement string.
The string that follows it (test) is replaced with hoge.

Replace word tag to entire file content

Assume that we have an XML content file:
<field name="id" id="1" type="number" default="" />
Assume that we have a template file containing the tag:
INCLUDE_XML
We need to replace the INCLUDE_XML tag with the entire content of the XML file. We can try:
tpl_content=$(<tpl.xml)
xml_content=$(<cnt.xml)
xml_content="$(echo "$tpl_content" | sed "s/INCLUDE_XML/"$xml_content"/g")"
echo "$xml_content" > out.xml
The problem is an "unterminated `s' command" error, because the XML file contains many special characters (quotes, slashes, etc.). How can we do the replacement without having to care about the characters in the XML content file?

Just use sed's built-in facilities.
sed -e '/INCLUDE_XML/!b' -e 'r cnt.xml' -e d tpl.xml >out.xml
Translation: if the current input line doesn't match the regex, just continue. Otherwise, read in and print the other file, and delete the current line.
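The following sketch tries the command out in a scratch directory. The <fields> wrapper lines in the template are made up for illustration, and the file is read via a path variable instead of a bare cnt.xml:

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
printf '%s\n' '<field name="id" id="1" type="number" default="" />' > "$dir/cnt.xml"
# The surrounding <fields> lines are a made-up template for illustration.
printf '%s\n' '<fields>' 'INCLUDE_XML' '</fields>' > "$dir/tpl.xml"

# Non-matching lines branch to the end of the script and are printed as-is;
# a matching line queues cnt.xml for output and is then deleted.
out=$(sed -e '/INCLUDE_XML/!b' -e "r $dir/cnt.xml" -e d "$dir/tpl.xml")

printf '%s\n' "$out"
rm -rf "$dir"
```

Because the file content is read by sed itself rather than pasted into the s command, quotes and slashes in cnt.xml need no escaping.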

How to modify the line above if a pattern matches in Sun Solaris

Could anyone help me modify the line before a matched pattern on Sun Solaris, i.e. search for a pattern and replace the line above it with some other text?
E.g.:
In the input file:
<td>
Completed
</td>
output needed:
<td bgcolor = 'green'>
Completed
</td>
The pattern "Completed" should be searched for first, and then the line just above it should be replaced with some other text.
These are the commands I have tried, without getting the desired result:
sed 's/<td>\nCompleted/<td>Completed/' exp12.html > sample.html
sed 's/<td>$Completed/<td>Completed/' exp12.html > sample.html
tr '\n' '*' exp12.html > sample.html
With this, all the text ends up on a single line, and then I used this:
sed '/<td>*Completed*/<td bgcolor = 'green'>*Completed*/' exp12.html > sample.html
Please provide me with a Sun Solaris command that produces the output mentioned above.

Sed: How to edit a line while keeping some of its contents after substitution?

I have a site that needs some mass edits. I used sed for most of the task, but adding heading tags (<h1>, <h2>) is so tricky that I can't think of a way to deal with it.
The pattern that I can guarantee is as follows:
<td class="content_subhd">Heading Name</td>
I want to change it to:
<td class="content_subhd"><h2>Heading Name</h2></td>
Heading Name is not static; it is different on each page, which is why I can't use a plain fixed-string substitution for it.
Any suggestions?

echo '<td class="content_subhd">Heading Name</td>' | \
sed -r 's;(<td\s*class\s*=\s*"content_subhd"\s*>)([^<]+)(</td>);\1<h2>\2</h2>\3;'

Two tricks are needed:
Use pattern grouping \( ... \) and reinsertion with \1
Use a colon as the pattern separator instead of / to avoid excessive escaping
Result:
sed 's:\(<td class="content_subhd">\)\(.*\)\(</td>\):\1<h2>\2</h2>\3:'
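A quick check of this command on the sample line from the question might look like the sketch below (only the variable names are introduced here):

```shell
# Sample line from the question.
line='<td class="content_subhd">Heading Name</td>'

# Group 1 keeps the opening tag, group 2 the heading text, group 3 the closing tag.
out=$(printf '%s\n' "$line" | sed 's:\(<td class="content_subhd">\)\(.*\)\(</td>\):\1<h2>\2</h2>\3:')

printf '%s\n' "$out"
```

Since group 2 is .* and therefore greedy, this assumes only one such cell per line.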

What about this?
sed 's/\([^>]*>\)\(.*\)\(<.*\)/\1<h1>\2<\/h1>\3/'
The three capture groups, in order:
1. catches everything up to and including the first >
2. catches the Heading Name and other possible content
3. catches everything after that
