Replace a word tag with the entire content of a file - sed

Assume that we have a content xml-file:
<field name="id" id="1" type="number" default="" />
Assume that we have template file with tag:
INCLUDE_XML
We need to replace the INCLUDE_XML tag with the entire content of the xml-file. We can try:
tpl_content=$(<tpl.xml)
xml_content=$(<cnt.xml)
xml_content="$(echo "$tpl_content" | sed "s/INCLUDE_XML/"$xml_content"/g")"
echo "$xml_content" > out.xml
The problem is an "unterminated `s' command" error, because the xml-file has a lot of special characters (quotes, slashes, etc.). How can we do the replacement without having to care about the characters in the content xml-file?

Just use sed's built-in facilities.
sed -e '/INCLUDE_XML/!b' -e 'r cnt.xml' -e d tpl.xml >out.xml
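Translation: if the current input line doesn't match the regex, branch to the end of the script, which prints the line unchanged. Otherwise, queue the other file for output with r, and delete the current line, so only the file's content is printed in its place.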
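A minimal, self-contained run of that command, assuming GNU sed. It recreates tpl.xml and cnt.xml in a scratch directory (the `<fields>` wrapper around INCLUDE_XML is a hypothetical stand-in for a real template). Because r copies the file verbatim, the quotes and slashes in cnt.xml never pass through the s command and need no escaping.

```shell
# Sketch assuming GNU sed; the <fields> wrapper in tpl.xml is hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '<fields>' 'INCLUDE_XML' '</fields>' > tpl.xml
printf '%s\n' '<field name="id" id="1" type="number" default="" />' > cnt.xml

# Non-matching lines: b jumps to the end of the script, so they print unchanged.
# Matching line: r queues cnt.xml for output at the end of the cycle,
# and d drops the INCLUDE_XML line itself.
sed -e '/INCLUDE_XML/!b' -e 'r cnt.xml' -e d tpl.xml > out.xml

result=$(cat out.xml)
printf '%s\n' "$result"
```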

Related

sed: skip the first occurrence of a match, then for all the other occurrences, delete the whole line containing the match

I want to achieve this:
Skip the first occurrence of a match
For all the other occurrences (except the first)
Delete the entire line containing that occurence
So for example if I have this text:
<div>
<p>First text</p>
</div>
<div>
<p>Second text</p>
<p>Third text</p>
</div>
And I am matching for <p>
I want the output to be:
<div>
<p>First text</p>
</div>
<div>
</div>
I tried sed '0,/<p>/! /<p>/d', but it outputs unknown command: `/' .
How could I achieve my desired result?
I am still a novice, so my mistake may seem silly.
I would really appreciate any help.
From the question, it looks to me like you are not considering cases where <p> and </p> are on different lines, nor that you even care about </p>; you're just deleting all lines containing <p>, except for the first such line.
The following command should do the job:
sed -z 's/<p>/\x0/;s/[^\n]*<p>[^\n]*\n//g;s/\x0/<p>/' input_file
This solution has a fairly simple logic:
it marks and "hides" the first <p>;
deletes all the lines containing <p>, except the first one where <p> is "hidden";
restores the "hidden" <p>.
Detailed explanation:
the option -z makes sed use the NUL character instead of newline as the line separator; since the file contains no NULs, sed treats the whole file as a single string consisting of all lines concatenated, each terminated by \n;
the Sed command consists of 3 parts separated by ;:
s/<p>/\x0/ changes the first <p> to \x0 (a NUL byte), which is assumed not to occur in the file;
s/[^\n]*<p>[^\n]*\n//g deletes (actually substitutes with the empty string) every line that contains <p>, i.e. a run of non-\n characters with a <p> somewhere in it, followed by \n; the first line containing <p> is not deleted because, after step 1, it no longer contains <p>;
s/\x0/<p>/ changes the marker \x0 back to <p>.
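The three steps can be checked on the question's sample input with a short sketch, assuming GNU sed (-z and the \x0 escape are GNU extensions):

```shell
# Sketch assuming GNU sed; input is the sample from the question.
input='<div>
<p>First text</p>
</div>
<div>
<p>Second text</p>
<p>Third text</p>
</div>'

# 1) hide the first <p> as a NUL byte, 2) delete every line still containing <p>,
# 3) turn the NUL back into <p>.
result=$(printf '%s\n' "$input" |
  sed -z 's/<p>/\x0/;s/[^\n]*<p>[^\n]*\n//g;s/\x0/<p>/')
printf '%s\n' "$result"
```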
When you want to keep the second <p> when it is on the same line as the first, you can use
sed -rz ':a;s/(<p>.*\n)[^\n]*<p>[^\n]*\n/\1/;ta' file
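A small sketch of that loop, assuming GNU sed (-r, -z). The input is hypothetical: its first <p> line carries a second <p>, which the loop keeps because it only deletes lines after the one where the first <p> appears.

```shell
# Sketch assuming GNU sed; the input below is a made-up example.
input='<div>
<p>First</p><p>Same line</p>
<p>Second</p>
</div>'

# :a ... ta loops until s fails; each pass keeps everything from the first <p>
# (captured as \1) and drops the last later line that contains <p>
# (the last one, because .* is greedy).
result=$(printf '%s\n' "$input" |
  sed -rz ':a;s/(<p>.*\n)[^\n]*<p>[^\n]*\n/\1/;ta')
printf '%s\n' "$result"
```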
When you really like sed, you can use
sed -n '1,/<p>/p' file; sed '/<p>/d' <(sed '1,/<p>/d' file)
You asked for sed, but I will show an awk solution too:
awk '/<p>/ && delp {next}
/<p>/ {delp=1}
1' file
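The awk program can be tried on the question's sample input with a short sketch (any POSIX awk should do):

```shell
# Sketch: the awk program above, fed the sample input from the question.
input='<div>
<p>First text</p>
</div>
<div>
<p>Second text</p>
<p>Third text</p>
</div>'

# delp is unset (false) until the first <p> line is seen; from then on,
# every further <p> line is skipped with next. "1" prints all other lines.
result=$(printf '%s\n' "$input" | awk '/<p>/ && delp {next}
/<p>/ {delp=1}
1')
printf '%s\n' "$result"
```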
This might work for you (GNU sed):
sed '/<p>/{x;/./{x;d};x;h}' file
If the current line does not contain <p>, print as normal.
If the current line contains <p> and there is a copy in the hold space, delete the current line.
Otherwise copy the current line to the hold space and print as normal.
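A quick check of the hold-space version on the same sample input, assuming GNU sed:

```shell
# Sketch assuming GNU sed; input is the sample from the question.
input='<div>
<p>First text</p>
</div>
<div>
<p>Second text</p>
<p>Third text</p>
</div>'

# On a <p> line: swap to inspect the hold space; if it is non-empty (/./),
# a <p> line was already kept, so swap back and delete. Otherwise swap back
# and copy the line into the hold space (h) so later <p> lines get deleted.
result=$(printf '%s\n' "$input" | sed '/<p>/{x;/./{x;d};x;h}')
printf '%s\n' "$result"
```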
Alternative:
sed -z 's/.*<p>.*\n//2mg' file
Here's another solution, which uses somewhat more complex logic but a shorter command:
sed 'x;s/<p>/&/;x;ta;bb;:a;/<p>/d;:b;H' input_file
Here's a pseudo-code describing the logic:
if one of previous lines contains <p>
    set flag to true
else
    set/leave flag to false
end
if flag
    if line contains <p>
        delete line
    end
end
Detailed explanation:
unlike the other answer, it doesn't use the -z option, which means that the script is run once for every line of the input file;
the script does the following (again, the commands are separated by ;s):
1. x swaps (exchanges) the content of the pattern space (which "normally" holds the line being processed) with that of the hold space (a register, initially empty, where you can store stuff; see step 7 for how this script uses it);
2. s/<p>/&/ searches for <p> in the current content of the pattern space, which is the content the hold space had before step 1, and replaces it with itself (&); this is a no-op as regards the text being processed, but it sets an internal flag recording that a substitution succeeded; in effect, this s command means "if the pattern space contains <p>, set the flag to true, otherwise leave it false";
3. x swaps pattern and hold space again; the net effect of steps 1, 2, and 3 is that the text is unchanged, and the internal flag is true if the hold space contains <p>;
4. ta tests the flag and, if it is true, jumps to :a; this means that if the hold space contains <p>, we continue with step 5, otherwise we continue with step 6;
5. (this is right after :a) /<p>/d deletes the current line if it contains <p>;
6. (we get here only if the test at step 4 was negative, i.e. the hold space doesn't contain <p>) bb unconditionally branches (jumps) to :b, which means we skip step 5, i.e. we let a line containing <p> through without deleting it;
7. H appends the current pattern space to the hold space; in practice, we accumulate the lines in the hold space as we read them.
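The one-liner can be run on the question's sample input to confirm that the steps above produce the desired output; this sketch assumes GNU sed, which accepts labels and branches separated by ";".

```shell
# Sketch assuming GNU sed; input is the sample from the question.
input='<div>
<p>First text</p>
</div>
<div>
<p>Second text</p>
<p>Third text</p>
</div>'

# x;s/<p>/&/;x  sets the t flag when the hold space (earlier kept lines) has <p>
# ta -> :a      delete the current line if it contains <p>
# bb -> :b      otherwise skip the deletion
# H             append every kept line to the hold space
result=$(printf '%s\n' "$input" | sed 'x;s/<p>/&/;x;ta;bb;:a;/<p>/d;:b;H')
printf '%s\n' "$result"
```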
You were close with 0,/<p>/! /<p>/d! An address such as /pat/ or /pat/! can't be immediately followed by another address; you need { } to group the second one, hence the syntax error.
No need to repeat the <p> pattern - empty pattern reuses the last one.
$ printf "%s\n" a '<p>' c d '<p>' '<p>' '<p>' e | sed '0,/<p>/!{//d}'
a
<p>
c
d
e

sed command to replace text from some search position

I want to replace the value 2000 with 5000 for the fruit grapes. How can I use sed to start searching from the 'grapes' position and replace the first occurrence?
<fruits>
<fruit>
<name>apple</name>
<value>2000</value>
</fruit>
<fruit>
<name>grapes</name>
<value>2000</value>
</fruit>
<fruit>
<name>banana</name>
<value>2000</value>
</fruit>
</fruits>
I tried
sed '\,grapes, s/2000/5000/' fruits.txt
sed -i '\%<name>grapes</name>%,\%</fruit>%s%<value>2000</value>%<value>5000</value>%' fruits.xml
We use the generalized \% to avoid needing to backslash the slash in the closing tags. The s%%% doesn't need to be backslashed because sed already knows that the thing after s is the separator.
On *BSDish platforms, including macOS, you need an explicit empty-string argument to sed -i before the script itself (so sed -i '' '\%...')
This tries to be strict, but of course cannot cope with all the possible variations which are syntactically allowed in XML. If your file is always in exactly the expected format, this should work for now. One of the drawbacks is that you will get no warning when it stops working because Jenkins decides to start using whitespace inside the tags, or whatever.
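A self-contained run of the range substitution, assuming GNU sed (sed -i without a backup suffix). It recreates the question's fruits.xml in a scratch directory.

```shell
# Sketch assuming GNU sed; fruits.xml is the sample from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > fruits.xml <<'EOF'
<fruits>
<fruit>
<name>apple</name>
<value>2000</value>
</fruit>
<fruit>
<name>grapes</name>
<value>2000</value>
</fruit>
<fruit>
<name>banana</name>
<value>2000</value>
</fruit>
</fruits>
EOF

# The range starts at the grapes <name> line and ends at the next </fruit>,
# so only the <value> inside that block is rewritten.
sed -i '\%<name>grapes</name>%,\%</fruit>%s%<value>2000</value>%<value>5000</value>%' fruits.xml
cat fruits.xml
```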
Using xmlstarlet:
xmlstarlet ed -u "//fruits/fruit/name[.='grapes']/following-sibling::value" -v 5000 fruits.xml
This will update (ed -u) the XML: the XPath selects the value element that is a following sibling of a //fruits/fruit/name element whose text is grapes, and sets its content to 5000.
sed is for doing s/old/new, that is all. For anything else you should be using awk:
$ awk '/grapes/{f=1} f && sub(/2000/,"5000"){f=0} 1' file
<fruits>
<fruit>
<name>apple</name>
<value>2000</value>
</fruit>
<fruit>
<name>grapes</name>
<value>5000</value>
</fruit>
<fruit>
<name>banana</name>
<value>2000</value>
</fruit>
</fruits>

Sed only with specific place

For example,
I'd like to replace the /test src path only within the <img> tag.
However <p>test</p> should not be touched.
$ cat test.html
<img src="/test" width="18" alt="" /><br>
<p>test</p>
For now I can execute something like:
sed -i '/test'|/hoge|g' test.html
However it changes the word globally.
sed '/<img/s|/test|/hoge|g' test.html would work for <img> tags that fit on a single line.
Sed allows the s///g replacement to be prefixed with another /PATTERN/ to restrict the replacement to lines matching PATTERN.
But you should really use an xml parser to be safe.
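A small sketch of the address-restricted substitution on the question's test.html content, printing to stdout instead of editing in place with -i:

```shell
# Sketch: the question's test.html content, piped instead of edited with -i.
input='<img src="/test" width="18" alt="" /><br>
<p>test</p>'

# /<img/ restricts the s command to lines that contain <img;
# | is used as the s delimiter so the slashes in /test need no escaping.
result=$(printf '%s\n' "$input" | sed '/<img/s|/test|/hoge|g')
printf '%s\n' "$result"
```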
Another approach with sed:
sed -i 's|\(<img *src="/\)test|\1hoge|' test.html
The <img *src="/ part is captured and referenced with \1 in the replacement string.
The string that follows it (test) is replaced with hoge.
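Because <img *src="/ only allows spaces between <img and src, this pattern is stricter than the line-address version. The sketch below adds a hypothetical second <img> whose src is not its first attribute; that line is left unchanged.

```shell
# Sketch: the second line is a hypothetical <img> with src after another attribute.
input='<img src="/test" width="18" alt="" /><br>
<img width="18" src="/test" />'

# \( ... \) captures '<img src="/' and \1 puts it back; only the following
# 'test' is replaced. <img *src requires src to come right after <img.
result=$(printf '%s\n' "$input" | sed 's|\(<img *src="/\)test|\1hoge|')
printf '%s\n' "$result"
```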

Using sed script to replace specific parts of filtered lines

Given an xml file consisting of lines like below:
<dependency field="no_change" name="test" conf="blahblah"/>
<dependency field="to_be_picked_up" name="test" conf="blahREPLACE_ME"/>
I would like to identify lines where the value of field is to_be_picked_up (which can be anything apart from a specific string, e.g. no_change) and replace the string REPLACE_ME with a specific string.
I have used the following command to make some line-level changes, but I am not sure how to script the logic for replacing REPLACE_ME only in lines where the value of field is anything other than no_change, and for locating the change within conf="".
sed -e 's/<dependency \(.*\)\(\.*\)>/\<dependency \1\/\>/'
Don't use sed to edit XML. Use an XML-aware tool. For example, in xsh, a tool based on libxml I happen to maintain, you can write
open file.xml ;
for //dependency[@field="to_be_picked_up"]/@conf
set . xsh:subst(., 'REPLACE_ME', 'RESULT') ;
save :b ;
sed '/field="no_change"/!s/REPLACE_ME/whatever/'
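A quick check of the negated address. The third input line is hypothetical, added to show that a no_change line is skipped even when it contains REPLACE_ME; "whatever" is the placeholder replacement from the answer.

```shell
# Sketch: first two lines are from the question, the third is hypothetical.
input='<dependency field="no_change" name="test" conf="blahblah"/>
<dependency field="to_be_picked_up" name="test" conf="blahREPLACE_ME"/>
<dependency field="no_change" name="test" conf="keepREPLACE_ME"/>'

# /field="no_change"/! selects every line that does NOT match,
# and only those lines get the substitution.
result=$(printf '%s\n' "$input" | sed '/field="no_change"/!s/REPLACE_ME/whatever/')
printf '%s\n' "$result"
```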
Using xmlstarlet, it would be:
xmlstarlet ed -u '//dependency[@field!="no_change"]/@conf' -x 'concat(substring-before(.,"REPLACE_ME"), "whatever", substring-after(., "REPLACE_ME"))'

How to sed stuff within pairs of quotes?

I want to change lines like:
<A HREF="classes_index_additions.html"class="hiddenlink">
to
<A HREF="classes_index_additions.html" class="hiddenlink">
(note the added ' ' before class) but it should leave lines like
<meta name="generator" content="JDiff v1.1.1">
alone. sed -e 's|\("[^"]*"\)\([^ />]\)|\1 \2|g' satisfies the first condition but it changes the other text to
<meta name="generator" content=" JDiff v1.1.1"/>
How do I get sed to process the correct pairs of double quotes?
You can try this:
sed -e 's/"\([^" ]*\)=/" \1=/g'
But with sed, the regular expression may match other parts of your document that you didn't intend, so it is best to try it and look over the results for unintended side effects!
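A quick check of that substitution against both lines from the question: the HREF line gains the space, and the meta line is left alone because none of its quotes is followed by a space-free run ending in =.

```shell
# Sketch: both sample lines from the question.
input='<A HREF="classes_index_additions.html"class="hiddenlink">
<meta name="generator" content="JDiff v1.1.1">'

# Match a quote followed by non-quote, non-space characters and then '=',
# i.e. an attribute name glued to the previous closing quote.
result=$(printf '%s\n' "$input" | sed -e 's/"\([^" ]*\)=/" \1=/g')
printf '%s\n' "$result"
```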
You can try putting each attribute on a new line, then trimming trailing spaces on each line before removing the newlines.
sed -r 's/(\w*="[^"]*")/\n\1/g; s/ *\n/\n/g; s/\n/ /g'
This works as follows:
s/(\w*="[^"]*")/\n\1/g
Put every attribute on a new line, so your node looks like this:
<A
HREF="classes_index_additions.html"
class="hiddenlink">
After that, remove trailing spaces:
s/ *\n/\n/g
And replace the newlines with spaces:
s/\n/ /g
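A sketch running the three substitutions on both sample lines, assuming GNU sed (-r, \w, and \n in the replacement are GNU features). All three must run in one sed invocation: the inserted newlines only exist inside the pattern space, so splitting the steps into separate sed calls would break the trailing-space step.

```shell
# Sketch assuming GNU sed; both sample lines from the question.
input='<A HREF="classes_index_additions.html"class="hiddenlink">
<meta name="generator" content="JDiff v1.1.1">'

# 1) newline before each name="value" attribute,
# 2) drop spaces right before those newlines,
# 3) turn every newline back into a single space.
result=$(printf '%s\n' "$input" |
  sed -r 's/(\w*="[^"]*")/\n\1/g; s/ *\n/\n/g; s/\n/ /g')
printf '%s\n' "$result"
```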