sed replace text in an XML file

I have a huge XML file with data like this:
<amount quantity="1">12.00</amount>
How can I replace the 12.00 with something else using sed?

There is not really enough information in your question, but to replace all values of 12.00 with, say, 24.00 you could do:
$ sed 's/>12\.00</>24.00</g' file.xml
If you are happy with the results you can store them back using the -i option:
$ sed -i 's/>12\.00</>24.00</g' file.xml
A more robust match would be:
$ sed -r 's_(<amount quantity="[0-9]+">)12\.00(</amount>)_\124.00\2_g' file.xml
But you should really parse the XML properly and not force regexp to do something it wasn't designed for.
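The preview-then-commit workflow from this answer can be sketched end to end. This is only a minimal sketch: the temporary file and the second line with 112.00 are made up for illustration, and it uses -i.bak instead of plain -i so the original survives as a backup.

```shell
#!/bin/sh
# Hypothetical demo data in a temporary directory (not from the question).
tmpdir=$(mktemp -d)
file="$tmpdir/file.xml"
printf '%s\n' \
  '<amount quantity="1">12.00</amount>' \
  '<amount quantity="2">112.00</amount>' > "$file"

# Preview: print only the lines that would change, without touching the file.
preview=$(sed -n 's/>12\.00</>24.00</gp' "$file")

# Commit: edit in place, keeping the original as file.xml.bak.
sed -i.bak 's/>12\.00</>24.00</g' "$file"

# Capture both versions before cleaning up the temporary directory.
edited=$(cat "$file")
backup=$(cat "$file.bak")
rm -rf "$tmpdir"

printf '%s\n' "$edited"
```

The leading > in the pattern is what keeps 112.00 from being rewritten, and the escaped dot stops 12x00 from matching.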

script.sh:
#!/bin/bash
# Single quotes keep the double quotes around the attribute value intact.
xml='<amount quantity="1">12.00</amount>'
newxml=$(printf '%s\n' "$xml" | sed -n 's/\(<amount[^>]*>\)\([^<]*\)\(<\/amount>\)/\113.37\3/gp')
echo "$newxml"
result:
$ ./script.sh
<amount quantity="1">13.37</amount>

Related

Replace value in single quotes using sed

I already know that sed has its own way of dealing with single quotes, but I think it is still possible to use it in my automation script.
I have to replace the value of the fingerprint in a SaltStack config file.
Current value:
#master_finger: ''
Target value:
master_finger: 'some:value'
My current command, which doesn't work:
$ sed -i 's/#master_finger: ''/master_finger: 'some:value'/g' /etc/salt/minion
returns:
master_finger: some:value''
How can I solve this?

Just use double quotes to enclose the script:
$ echo "#master_finger: ''" | sed "s/#master_finger: ''/master_finger: 'some:value'/"
master_finger: 'some:value'

It's not sed that makes single quotes hard to handle, it's the shell: the shell does not allow a ' inside any '-quoted string, and that includes sed scripts.
You could save the sed script in a file and run it with -f or use a here document:
$ sed -f- file <<'EOF'
s/#master_finger: ''/master_finger: 'some:value'/g
EOF
master_finger: 'some:value'
To see the difference between the above and @karakfa's suggestion:
$ sed -f- file <<'EOF'
s/#master_finger: ''/master_finger: '$(date)'/g
EOF
master_finger: '$(date)'
$ sed "s/#master_finger: ''/master_finger: '$(date)'/" file
master_finger: 'Sun Feb 14 06:50:43 CST 2021'
Now imagine if date were replaced by rm -rf * or something worse.
Also consider:
$ sed 's/#master_finger: '\'\''/master_finger: '\''$(date)'\''/' file
master_finger: '$(date)'
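If the new value comes from a variable, the double-quoted form can still be used without the command-substitution risk, because the shell expands the variable once and does not evaluate the result again. What still needs care is sed's own special characters in the replacement. A rough sketch, assuming a made-up value and a value without newlines:

```shell
#!/bin/sh
# Hypothetical value; it deliberately contains characters special to sed or the shell.
value='a/b&c:$(date)'

# Escape backslash, the / delimiter and & so sed treats them literally in the replacement.
escaped=$(printf '%s' "$value" | sed 's|[\\/&]|\\&|g')

# Double quotes let the shell expand $escaped once; the expanded text is not run as a command.
result=$(printf '%s\n' "#master_finger: ''" |
  sed "s/#master_finger: ''/master_finger: '$escaped'/")

printf '%s\n' "$result"
```

The escaping step uses | as its own delimiter so that / can sit inside the bracket expression without being mistaken for the end of the pattern.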

manipulation of text by sed command

I have a file containing genome IDs like NZ_FLAT01000030.1_173, and I need to turn each ID into the form NZ_FLAT01000030.1.
I tried a few commands, but none gave me exactly that:
sed 's/_/\t/' output: NZ FLAT01000030.1_173
sed -r 's/_//' output: NZFLAT01000030.1_173
sed -r 's/_//g' output: NZFLAT01000030.1173
How can I do that using sed?

Are you trying to remove the underscore and the digits following it?
echo 'NZ_FLAT01000030.1_173' | sed -E 's/_[0-9]+//g'
NZ_FLAT01000030.1

$ echo 'NZ_FLAT01000030.1_173' | sed 's/_[^_]*$//'
NZ_FLAT01000030.1
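The two answers agree on the ID from the question but not in general. A small comparison, where the second ID is invented purely to show the difference (digits directly after the first underscore):

```shell
#!/bin/sh
# The ID from the question plus a hypothetical one with digits after the first underscore.
ids='NZ_FLAT01000030.1_173
NZ_123456.1_42'

# First answer: removes every underscore that is followed by digits.
all_numeric=$(printf '%s\n' "$ids" | sed -E 's/_[0-9]+//g')

# Second answer: removes only the last underscore and whatever follows it.
last_only=$(printf '%s\n' "$ids" | sed 's/_[^_]*$//')

printf 'first answer:\n%s\n' "$all_numeric"
printf 'second answer:\n%s\n' "$last_only"
```

If the goal is "drop the trailing _NNN suffix", anchoring at the end of the line as in the second command is probably the safer choice.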

How to replace a string using Sed?

Suppose I have a string like this
<start><a></a><a></a><a></a></start>
I want to replace values inside <start></start> like this
<start><ab></ab><ab></ab><ab></ab><more></more><value></value></start>
How do I do this using Sed?

Try this:
sed 's#<start>.*</start>#<start><ab></ab><ab></ab><ab></ab></start>#' file

I came up with this line using GNU sed:
sed -r 's#(<start>)(.*)(</start>)#echo "\1"$(echo "\2"\|sed "s:a>:ab>:g")"\3"#ge'
see example:
kent$ echo "<start><a></a><a></a><a></a><foo></foo><bar></bar></start>"|sed -r 's#(<start>)(.*)(</start>)#echo "\1"$(echo "\2"\|sed "s:a>:ab>:g")"\3"#ge'
<start><ab></ab><ab></ab><ab></ab><foo></foo><bar></bar></start>
Note:
This replaces every tag between <start> and </start> that ends with a>, which works for your example. But if you also have tags like <aaa></aaa>, they would be changed too.
In that case you could do this (broken into lines for readability):
sed -r 's#(<start>)(.*)(</start>)
#echo "\1"$(echo "\2"\|sed "s:<a>:<ab>:g;s:</a>:</ab>:g")"\3"
#ge'
e.g.
kent$ echo "<start><a></a><a></a><a></a><aaa></aaa><aba></aba></start>" \
|sed -r 's#(<start>)(.*)(</start>)#echo "\1"$(echo "\2"\|sed "s:<a>:<ab>:g;s:</a>:</ab>:g")"\3"#ge'
<start><ab></ab><ab></ab><ab></ab><aaa></aaa><aba></aba></start>

sed -E 's#(</?)a>#\1ab>#g' yourfile, though that would also change any <a></a> outside <start>...
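One way to keep a plain substitution away from <a> tags outside <start> is to put an address in front of it. This is only a sketch and rests on an assumption the question does not confirm: each <start>...</start> element sits on a single line with nothing else on that line. The input below is made up.

```shell
#!/bin/sh
# Hypothetical input: <a> tags both inside and outside a one-line <start> element.
input='<a></a>
<start><a></a><aaa></aaa></start>
<a></a>'

# The address selects only lines that hold a complete <start> element;
# the substitution then renames <a> and </a> and leaves <aaa> alone.
output=$(printf '%s\n' "$input" |
  sed -E '/<start>.*<\/start>/ s#(</?)a>#\1ab>#g')

printf '%s\n' "$output"
```

A two-address range such as /<start>/,/<\/start>/ would not do here: sed only looks for the end of a range on the lines after the one that opened it, so a one-line element would leave the range open.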

grep -rl 'abc' a.txt | xargs sed -i 's/abc/def/g'

extract a substring of 11 characters from a line using sed,awk or perl

I have a file with many lines. Each line contains either the substring
whatever_blablablalsfjlsdjf;asdfjlds;f/watch?v=yPrg-JN50sw&amp,whatever_blabla
or
whatever_blablabla"/watch?v=yPrg-JN50sw&amp" class=whatever_blablablavwhate
I want to extract a substring like the "yPrg-JN50sw" above.
The matching pattern is
the 11 characters after the string "/watch?v=".
How can I extract the substring?
I am hoping for a one-line sed or awk solution;
if not, a one-line Perl script is also OK.

You can do
grep -oP '(?<=/watch\?v=).{11}'
if your grep knows Perl regex, or
sed 's/.*\/watch?v=\(.\{11\}\).*/\1/g'
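Note that the sed command prints lines without a match unchanged. If only the IDs are wanted, a variant with -n and the p flag is one option. A short sketch, with sample lines adapted from the question plus an invented line that has no ID:

```shell
#!/bin/sh
# Sample lines: two with a video ID, one without (the middle line is made up).
input='foo;f/watch?v=yPrg-JN50sw&amp,bar
no video id on this line
blabla"/watch?v=0_NfNAL3Ffc" class=x'

# -n suppresses default output; the p flag prints only lines where the substitution succeeded.
ids=$(printf '%s\n' "$input" |
  sed -n 's/.*\/watch?v=\(.\{11\}\).*/\1/p')

printf '%s\n' "$ids"
```

In a basic regular expression the ? is an ordinary character, which is why it needs no backslash here.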

$ cat file
/watch?v=yPrg-JN50sw&amp
"/watch?v=yPrg-JN50sw&amp" class=
$
$ awk 'match($0,/\/watch\?v=/) { print substr($0,RSTART+RLENGTH,11) }' file
yPrg-JN50sw
yPrg-JN50sw

Just with the shell's parameter expansion, extract the 11 chars after "watch?v=":
while IFS= read -r line; do
    # Strip everything up to and including the last "watch?v=" (the ? is escaped so it is literal).
    tmp=${line##*watch\?v=}
    echo "${tmp:0:11}"
done < filename

You could use sed to remove the extraneous information:
sed 's/[^=]\+=//; s/&.*$//' file
Or with awk and sensible field separators:
awk -F '[=&]' '{print $2}' file
Contents of file:
cat <<EOF > file
/watch?v=yPrg-JN50sw&amp
"/watch?v=yPrg-JN50sw&amp" class=
EOF
Output:
yPrg-JN50sw
yPrg-JN50sw
Edit, accommodating the new requirements mentioned in the comments:
cat <<EOF > file
<div id="" yt-grid-box "><div class="yt-lockup-thumbnail"><a href="/watch?v=0_NfNAL3Ffc" class="ux-thumb-wrap yt-uix-sessionlink yt-uix-contextlink contains-addto result-item-thumb" data-sessionlink="ved=CAMQwBs%3D&ei=CPTsy8bhqLMCFRR0fAodowXbww%3D%3D"><span class="video-thumb ux-thumb yt-thumb-default-185 "><span class="yt-thumb-clip"><span class="yt-thumb-clip-inner"><img src="//i1.ytimg.com/vi/0_NfNAL3Ffc/mqdefault.jpg" alt="Miniature" width="185" ><span class="vertical-align"></span></span></span></span><span class="video-time">5:15</span>
EOF
Use awk with a sensible record separator:
awk -v RS='[=&"]' '/watch/ { getline; print }' file
Note, you should use a proper XML parser for this sort of task.

grep --perl-regexp --only-matching --regexp="(?<=/watch\\?v=)([^&]{0,11})"

Assuming your lines have exactly the format you quoted, this should work.
awk '{print substr($0,10,11)}'
Edit: From the comment on another answer, I guess your lines are much longer and more complicated than this, in which case something more comprehensive is needed. The character class includes - and _ because IDs such as yPrg-JN50sw contain them:
gawk '{if(match($0, "/watch\\?v=([[:alnum:]_-]+)",a)) print a[1]}'

How do I get rid of this unicode character?

Any idea how to get rid of this irritating character U+0092 from a bunch of text files? I've tried all of the commands below, but none of them work. The character map lists it as U+0092+control.
sed -i 's/\xc2\x92//' *
sed -i 's/\u0092//' *
sed -i 's///' *
Ah, I've found a way:
CHARS=$(python2 -c 'print u"\u0092".encode("utf8")')
sed 's/['"$CHARS"']//g'
But is there a direct sed method for this?

Try sed "s/\`//g" *. (I added the g so it will remove all the backticks it finds).
EDIT: It's not a backtick that OP wants to remove.
Following the solution in this question, this ought to work:
sed 's/\xc2\x92//g'
To demonstrate it does:
$ CHARS=$(python -c 'print u"asdf\u0092asdf".encode("utf8")')
$ echo $CHARS
asdf<funny glyph symbol>asdf
$ echo $CHARS | sed 's/\xc2\x92//g'
asdfasdf
Seeing as it's something you tried already, perhaps what is in your text file is not U+0092?

This might work for you (GNU sed):
echo "string containing funny character(s)" | sed -n 'l0'
This will display the string as sed sees it, with non-printable bytes shown as octal escapes. Then use:
echo "string containing funny character(s)" | sed 's/\onnn//g'
where nnn is the octal value, to delete it (or them).
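For U+0092 specifically, the UTF-8 encoding is the two bytes 0xC2 0x92, which is \302\222 in octal. A sketch that does not depend on GNU-only escapes inside the sed script is to let printf build the bytes and splice them into the pattern. The sample string is invented for the demonstration.

```shell
#!/bin/sh
# U+0092 in UTF-8 is the byte pair 0xC2 0x92, i.e. octal 302 222.
char=$(printf '\302\222')

# Sample text with the character in the middle, built the same way.
sample=$(printf 'asdf\302\222asdf')

# LC_ALL=C makes sed treat its input as plain bytes rather than multibyte characters.
cleaned=$(printf '%s\n' "$sample" | LC_ALL=C sed "s/$char//g")

printf '%s\n' "$cleaned"
```

If this leaves the files unchanged, the bytes in them are probably not C2 92. For example, a file saved in a single-byte encoding would hold only the byte 0x92, which the l command would show as \222.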