Could anyone help me modify the line before a matched pattern on Sun Solaris, i.e. search for a pattern and replace the line above it with some other text?
For example:
In the input file:
<td>
Completed
</td>
output needed:
<td bgcolor = 'green'>
Completed
</td>
The pattern "Completed" should be searched for first, and then the line just above it should be replaced with some other text.
These are the commands I have tried, none of which produced the result:
sed 's/<td>\nCompleted/<td>Completed/' exp12.html > sample.html
sed 's/<td>$Completed/<td>Completed/' exp12.html > sample.html
tr '\n' '*' exp12.html > sample.html
This puts all the text on a single line, after which I used:
sed '/<td>*Completed*/<td bgcolor = 'green'>*Completed*/' exp12.html > sample.html
Please provide a Sun Solaris command that produces the output shown above.
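One possible approach, sketched here with POSIX sed and not tested on Solaris itself, is to keep a two-line window in the pattern space with N, P and D, so that the line before "Completed" is still available for editing when the match is seen. The sample input is piped in from printf purely for illustration; on older Solaris releases /usr/xpg4/bin/sed may be needed rather than /usr/bin/sed, which is an assumption to check locally.

```shell
# Sample input from the question, piped in instead of read from exp12.html.
result=$(printf '%s\n' '<td>' 'Completed' '</td>' |
    sed -e '$!N' \
        -e "s/^<td>\(\nCompleted\)\$/<td bgcolor = 'green'>\1/" \
        -e 'P' -e 'D')
# $!N  appends the next line, so the pattern space holds two lines
# s    rewrites the first line only when the second one is "Completed"
# P;D  print and drop the first line, keeping the second for the next cycle
printf '%s\n' "$result"
```

With a real file, replace the printf pipeline with the file name as the last argument to sed and redirect the output to sample.html.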
Related
I have the following text:
<h2 id="title"> ABC A BBBBB0 </h2>
<h2 id="title">ABC A BBBBB1 </h2>
<h2 id="title">ABC A BBBBB2</h2>
<h2 id="title"> ABC A BBBBB3 </h2>
and want to extract the following from it:
ABC A BBBBB0
ABC A BBBBB1
ABC A BBBBB2
ABC A BBBBB3
I am currently running the following command:
sed -n "s/.*\"title\">[[:space:]]*\(.*\)<.*/\1/p" ./file.txt
but get lines with spaces at the end:
ABC A BBBBB0[space][space][space][space]
ABC A BBBBB1[space]
ABC A BBBBB2
ABC A BBBBB3[space]
I understand how to ignore possible spaces at the beginning of a match, but I cannot work out how to ignore possible spaces at the end in my case. Can somebody give me a clear example of this?
The last character in the group must not be a space; after it there may be spaces.
's/.*"title">[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*<.*/\1/p'
I can not understand the concept
.* matches everything up to the end of the whole line. The regex engine then reads < and goes back from right to left until it finds a <, and then continues matching from there.
You have to put something in the pattern so that, when the engine goes back from the end of the string, it stops at the place you want. "Not a space", for example. This process of going back is called "backtracking".
I can recommend https://www.regular-expressions.info/engine.html
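To see the effect of backtracking, here is a small sketch that runs both patterns on one line and wraps the captured text in brackets so that trailing spaces become visible. The variable names are made up for the illustration.

```shell
line='<h2 id="title"> ABC A BBBBB0    </h2>'

# Greedy group: .* backtracks only as far as the last <, so the spaces stay.
greedy=$(printf '%s\n' "$line" |
    sed -n 's/.*"title">[[:space:]]*\(.*\)<.*/\1/p')

# Group forced to end in a non-space: backtracking stops at the last letter.
trimmed=$(printf '%s\n' "$line" |
    sed -n 's/.*"title">[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*<.*/\1/p')

printf '[%s]\n' "$greedy"    # the brackets make any trailing spaces visible
printf '[%s]\n' "$trimmed"
```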
Using sed
$ sed 's/[^>]*>[[:space:]]*\?\([[:alnum:][:space:]]*\)[[:space:]]\?<.*/\1/' file
ABC A BBBBB0
ABC A BBBBB1
ABC A BBBBB2
ABC A BBBBB3
$ sed -E 's/[^>]*> *?([A-Z0-9 ]*) ?<.*/\1/' file
ABC A BBBBB0
ABC A BBBBB1
ABC A BBBBB2
ABC A BBBBB3
When using sed's grouping and back-references, you can exclude any character, including spaces, by leaving it outside the grouping parentheses.
[^>]*> - Skip everything up to the next >; as this is outside the parentheses, it is excluded.
 *? - So is this run of spaces. The * already allows zero or more spaces, so the ? is redundant.
([A-Z0-9 ]*) - Everything within the parentheses is captured: capitals, digits and spaces. Because the class itself contains a space and * is greedy, spaces directly before < are captured as well.
 ?<.* - Matches an optional single space before <, then the rest of the line, all outside the group.
I'd just use awk:
$ awk -F'> *| *<' '{print $3}' file
ABC A BBBBB0
ABC A BBBBB1
ABC A BBBBB2
ABC A BBBBB3
This might work for you (GNU sed):
sed -nE 's/<h2 id="title">\s*(.*\S)\s*<\/h2>/\1/p' file
Use pattern matching to return the required strings.
N.B. \s matches white space and \S is its dual. Thus (.*\S) captures word or words.
I want to achieve this:
Skip the first occurrence of a match
For all the other occurrences (except the first)
Delete the entire line containing that occurrence
So for example if I have this text:
<div>
<p>First text</p>
</div>
<div>
<p>Second text</p>
<p>Third text</p>
</div>
And I am matching for <p>
I want the output to be:
<div>
<p>First text</p>
</div>
<div>
</div>
I tried sed '0,/<p>/! /<p>/d', but it outputs unknown command: `/' .
How could I achieve my desired result?
I am still a novice, so my mistake may be a silly one.
I would appreciate your help a lot.
From the question, it looks to me that you are not considering cases where <p> and </p> are on different lines, nor that you even care about </p>; you're just deleting all lines containing <p>, except for the first such line.
The following command should do the job:
sed -z 's/<p>/\x0/;s/[^\n]*<p>[^\n]*\n//g;s/\x0/<p>/' input_file
This solution has a fairly simple logic:
it marks and "hides" the first <p>;
deletes all the lines containing <p>, except the first one where <p> is "hidden";
restores the "hidden" <p>.
Detailed explanation:
the option -z makes Sed treat the file as a single string consisting of all lines concatenated, with each line terminated by \n;
the Sed command consists of 3 parts separated by ;:
s/<p>/\x0/ changes the first <p> to \x0 which is not a character present in the file;
s/[^\n]*<p>[^\n]*\n//g deletes (actually substitutes with the empty string) any line which consists only of non-\n characters with a <p> somewhere, followed by \n; the first line containing <p> is not deleted because, after step 1, it no longer contains <p>;
s/\x0/<p>/ changes the marker \x0 back to <p>.
If you want to keep the second <p> when it is on the same line as the first, you can use
sed -rz ':a;s/(<p>.*\n)[^\n]*<p>[^\n]*\n/\1/;ta' file
When you really like sed, you can use
sed -n '1,/<p>/p' file; sed '/<p>/d' <(sed '1,/<p>/d' file)
You wanted sed, I will show an awk solution too:
awk '/<p>/ && delp {next}
/<p>/ {delp=1}
1' file
This might work for you (GNU sed):
sed '/<p>/{x;/./{x;d};x;h}' file
If the current line does not contain <p>, print as normal.
If the current line contains <p> and there is a copy in the hold space, delete the current line.
Otherwise copy the current line to the hold space and print as normal.
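As a quick check of the hold-space logic described above, here is the same script run on the sample from the question. It is written with a ; before each closing brace, which GNU sed does not require but other sed implementations may; the input is piped from printf only for illustration.

```shell
result=$(printf '%s\n' '<div>' '<p>First text</p>' '</div>' \
                       '<div>' '<p>Second text</p>' '<p>Third text</p>' '</div>' |
    sed '/<p>/{x;/./{x;d;};x;h;}')
# x;/./  look at the hold space: non-empty means a <p> line was already kept
# x;d    swap back and delete the current line
# x;h    otherwise swap back and remember this first <p> line
printf '%s\n' "$result"
```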
Alternative:
sed -z 's/.*<p>.*\n//2mg' file
Here's another solution which uses a somewhat more complex logic, but consists of a shorter command:
sed 'x;s/<p>/&/;x;ta;bb;:a;/<p>/d;:b;H' input_file
Here's a pseudo-code describing the logic:
if one of previous lines contains <p>
    set flag to true
else
    set/leave flag to false
end
if flag
    if line contains <p>
        delete line
    end
end
Detailed explanation:
unlike the other answer, it doesn't use the -z option, which means that the script is run for every line of the input file
the script does the following (again the commands are separated by ;s):
x swaps (exchanges) the content of the pattern space (whose content is "normally" the line that's being processed) with that of the hold space (a register where you can store stuff, which is initially empty; see step 7 to see how we use it in this script);
s/<p>/&/ searches for <p> in the current content of the pattern space, which means the content of the hold space before step 1 was run, and replaces it with itself (&); this is a no-op as regards the text being processed, but it sets to true an internal flag that means that the last executed s command was successful; in fact this s command acts like if the pattern space contains <p> set the flag to true, otherwise leave it to false;
x swaps pattern and hold space again; the net effect of these first steps (1, 2, and 3) is that the text has not been changed, and the internal flag is set to true if the hold space contains a <p>;
ta tests the flag and, if it is true, control moves to where :a is; this means that if the hold space contains <p>, we continue with step 5, otherwise we continue with step 6;
(this is right after :a) /<p>/d deletes the current line being processed if it contains <p>;
(we are here if the test at step 4 had negative result, i.e. the hold space doesn't contain <p>) bb unconditionally branches (jumps) to where :b is, which means that we have simply skipped step 5, i.e. we have let a line containing <p> go, without deleting it;
H appends the current pattern space to the hold space; in practice, we are accumulating line after line to the hold space as we read them.
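The flag logic from the pseudo-code can also be sketched as a plain shell loop, which may be easier to follow than the sed branching. This is only an illustration of the same idea; keep_first_p is a hypothetical helper name, not part of the answer above.

```shell
# Hypothetical helper: print stdin, dropping every line that contains <p>
# except the first such line.
keep_first_p() {
    seen=0                              # becomes 1 once a <p> line was printed
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *'<p>'*)
                if [ "$seen" -eq 1 ]; then
                    continue            # flag set and line has <p>: delete it
                fi
                seen=1
                ;;
        esac
        printf '%s\n' "$line"
    done
}

result=$(printf '%s\n' '<div>' '<p>First text</p>' '</div>' \
                       '<div>' '<p>Second text</p>' '<p>Third text</p>' '</div>' |
    keep_first_p)
printf '%s\n' "$result"
```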
You were close with 0,/<p>/! /<p>/d! An address such as /pat/ or /pat/! can't be followed immediately by another /pat/ address - you need { } around the nested command, hence the syntax error.
There is no need to repeat the <p> pattern - an empty pattern reuses the last one.
$ printf "%s\n" a '<p>' c d '<p>' '<p>' '<p>' e | sed '0,/<p>/!{//d}'
a
<p>
c
d
e
I have many XHTML files whose contents are like:
<h:panelGroup rendered="#{not accessBean.isUserLoggedIn}">
<h:form>
<p:panel style="margin-top:10px">
<table style="margin:10px">
<tbody>
<tr>
<td align="center">#{i.m['Login']}</td>
<td align="center">
<h:inputText value="#{accessBean.login}" />
</td>
</tr>
<tr>
<td align="center">#{i.m['Password']}</td>
<td align="center">
<h:inputSecret value="#{accessBean.password}" />
</td>
</tr>
</tbody>
</table>
<p:commandButton ajax="false" value="#{i.m['Submit']}" action="#{accessBean.login}" />
</p:panel>
</h:form>
</h:panelGroup>
I want to replace every occurrence of #{i.m['any-string']} with any-string, i.e., cut the string out of the pattern.
I have created the following sed command
sed -e "s/#{i.m\['\(.*\)']}/\1/g"
And to run it recursively within a directory I could execute
find . -iname '*.xhtml' -type f -exec sed -i -e "s/#{i.m\['\(.*\)']}/\1/g" {} \;
Here, the any-string can be any human-readable HTML displayable character, i.e, alphabet, numbers, other characters etc. That's why I have used regex (.*).
But it does not work in every case.
Here are some tests I made using echo:
$ echo "<td align=\"center\">#{i.m['Login']}</td>" | sed -e "s/#{i.m\['\(.*\)']}/\1/g"
Result:
<td align="center">Login</td>
OK
$ echo "<p:commandButton ajax=\"false\" value=\"#{i.m['Submit']}\" action=\"#{accessBean.login}\" />" | sed -e "s/#{i.m\['\(.*\)']}/\1/g"
Result:
<p:commandButton ajax="false" value="Submit" action="#{accessBean.login}" />
OK
$ echo "<p:commandButton ajax=\"false\" value=\"#{i.m['Submit']}\" action=\"#{accessBean.login}\" /> <td align=\"center\">#{i.m['Login']}</td>" | sed -e "s/#{i.m\['\(.*\)']}/\1/g"
Result:
<p:commandButton ajax="false" value="Submit']}" action="#{accessBean.login}" /> <td align="center">#{i.m['Login</td>
NOK
I'm using Ubuntu 18.04.
Per your request, and as noted in my comment and the comments of others, you should definitely use a proper XML parser like xmlstarlet for proper XHTML parsing. A simple regex does no validation of what is left behind.
That being said, for your example (only), to replace the text leaving Login, Password and Submit you could use the following regex:
sed "s/[#][{]i[.]m[[][']\([^']*\)['][]][}]/\1/" <file
Whenever you have to match characters that can also be regex metacharacters, it helps to make explicit that the character is to be treated literally and not as part of the regex syntax. To do that you can use a character class, e.g. [...], where the characters between the brackets are matched literally. (If the first character in the class is '^', the match is inverted, i.e. everything except what is in the class is matched.)
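A small sketch of the difference, using two made-up input lines: an unbracketed . matches any character, while [.] matches only a literal dot.

```shell
# Unbracketed dot: both lines match, because . stands for any character.
dot_any=$(printf '%s\n' 'i.m' 'ixm' | sed -n '/^i.m$/p')

# Bracketed dot: only the line with a literal . matches.
dot_literal=$(printf '%s\n' 'i.m' 'ixm' | sed -n '/^i[.]m$/p')

printf 'unbracketed:\n%s\n' "$dot_any"
printf 'bracketed:\n%s\n' "$dot_literal"
```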
With that explanation, the regex should become clear. The regex uses the basic substitution form:
sed "s/find/replace/" file
The 'find' REGEX
[#] - match the pound sign
[{] - match the opening brace
i - match the 'i'
[.] - explicitly match the '.' character (instead of . any character)
m - match the 'm'
[[] - match the opening bracket
['] - match the single quote
\( - begin your capture group to capture text to reinsert as a back reference
[^']* - match zero-or-more characters that are not a single-quote
\) - end your capture group
['] - match the single-quote as the next character
[]] - match the closing bracket
[}] - match the closing brace.
The 'replace' REGEX
All characters captured as part of the find capture group (between the \(....\)), are available to use as a back reference in the replace portion of the substitution. You can have more than one capture group in the find portion, which you reference in the replace part of the substitution as \1, \2, ... and so on. Here you have only a single capture group in the find portion, so whatever was matched can be used as the entire replacement, e.g.
\1 - to replace the whole mess with just the text that was captured with [^']*
Example Use/Output
For use with your example, it will properly leave Login, Password and Submit as indicated in your question, e.g.
sed "s/[#][{]i[.]m[[][']\([^']*\)['][]][}]/\1/" file
<h:panelGroup rendered="#{not accessBean.isUserLoggedIn}">
<h:form>
<p:panel style="margin-top:10px">
<table style="margin:10px">
<tbody>
<tr>
<td align="center">Login</td>
<td align="center">
<h:inputText value="#{accessBean.login}" />
</td>
</tr>
<tr>
<td align="center">Password</td>
<td align="center">
<h:inputSecret value="#{accessBean.password}" />
</td>
</tr>
</tbody>
</table>
<p:commandButton ajax="false" value="Submit" action="#{accessBean.login}" />
</p:panel>
</h:form>
</h:panelGroup>
Again, as a disclaimer and just good common sense, don't parse X/HTML with a regex; use a proper tool like xmlstarlet. Don't parse JSON with a regex; use a proper tool for the job like jq -- you get the drift. (For this limited example the regex works well, but it is fragile: if anything in the input changes, it will break -- which is why we have tools like xmlstarlet and jq.)
The problem here is that you do not take the greedy nature of regexps into account. You need to prevent your regexp from gobbling up extra 's:
sed -e "s/#{i.m\['\([^']*\)']}/\1/g"
This is also the reason why David C. Rankin's solution works. His regexp is unnecessarily complex, however.
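As a quick check, here is a sketch that runs a pattern using [^']* instead of .* on a line with two occurrences, which is the case that failed in the question. The line is a shortened, made-up version of the question's test input, and the escaping shown is for basic regular expressions.

```shell
# Quoted here-document so the single quotes and #{...} are taken literally.
input=$(cat <<'EOF'
<p:commandButton value="#{i.m['Submit']}" /> <td>#{i.m['Login']}</td>
EOF
)

# [^']* cannot run past the closing quote, so each occurrence is matched alone.
fixed=$(printf '%s\n' "$input" | sed -e "s/#{i.m\['\([^']*\)']}/\1/g")
printf '%s\n' "$fixed"
```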
For example;
I'd like to replace the /test src path only within the <img> tag.
However <p>test</p> should not be touched.
$ cat test.html
<img src="/test" width="18" alt="" /><br>
<p>test</p>
For now I can run something like:
sed -i 's|/test|/hoge|g' test.html
However, it changes the word globally.
sed '/<img/s|/test|/hoge|g' test.html would work for <img tags that fit on one line
Sed allows the s///g replacement to be prefixed with another /PATTERN/ to restrict the replacement to lines matching PATTERN.
But you should really use an xml parser to be safe.
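A short sketch of the address-restricted substitution on the two sample lines from the question, piped in from printf rather than read from test.html:

```shell
result=$(printf '%s\n' '<img src="/test" width="18" alt="" /><br>' '<p>test</p>' |
    sed '/<img/s|/test|/hoge|g')
# Only lines matching /<img/ are passed to the s command; others are untouched.
printf '%s\n' "$result"
```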
Another approach with sed:
sed -i 's|\(<img *src="/\)test|\1hoge|' test.html
<img *src="/ is captured and inserted back with \1 in the substitution string.
The string that follows (test) is replaced with hoge.
I have a site that needs some mass edits. I used sed for most of the task, but adding heading tags (<h1>, <h2>) is tricky enough that I can't think of a way to deal with it:
The pattern that I can guarantee is as follows:
<td class="content_subhd">Heading Name</td>
I want to change it to:
<td class="content_subhd"><h2>Heading Name</h2></td>
Heading Name is not static; it is different on each page, which is why I can't use a fixed substitution to deal with it.
Any suggestions?
echo '<td class="content_subhd">Heading Name</td>' | \
sed -r 's;(<td\s*class\s*=\s*"content_subhd"\s*>)([^<]+)(</td>);\1<h2>\2</h2>\3;'
2 tricks needed:
Use pattern grouping \( ... \) and reinsertion \1
Use a colon as the pattern separator instead of / to avoid excessive escaping
Result:
sed 's:\(<td class="content_subhd">\)\(.*\)\(</td>\):\1<h2>\2</h2>\3:'
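A quick sketch of this command on the sample line from the question, with the input piped from printf for illustration. It assumes one such cell per line, since the middle group is greedy.

```shell
result=$(printf '%s\n' '<td class="content_subhd">Heading Name</td>' |
    sed 's:\(<td class="content_subhd">\)\(.*\)\(</td>\):\1<h2>\2</h2>\3:')
# \1 is the opening tag, \2 the heading text, \3 the closing tag.
printf '%s\n' "$result"
```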
What about this?
sed 's/\([^>]*>\)\(.*\)\(<.*\)/\1<h1>\2<\/h1>\3/'
Group 1 catches everything up to and including the first >
Group 2 catches the Heading Name and other possible content
Group 3 catches everything after that