Print text value from xml using sed

Using a one-line sed command, how can I get the something and someone values from this line?
<value name="something">someone</value>
With the regex <value name="(.*)">(.*)<\/value> I can retrieve both values on https://www.regex101.com/, but I'm not sure how to do the same on the command line.
Thanks in advance.

Something like
sed 's#.*name="\(.*\)">\(.*\)<.*#\1 \2#g'
Test
$ echo "<value name=\"something\">someone</value>" | sed 's#.*name="\(.*\)">\(.*\)<.*#\1 \2#g'
something someone

Try this.
sed -r 's/.*"(.*)">(.*)<.*>$/\1 \2/'
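The greedy .* in the commands above works for the sample line, but it can run too far if a line holds more quotes or tags. A minimal sketch of a stricter variant, assuming the whole element sits on one line; the function name extract_name_value is made up for illustration:

```shell
# Hypothetical helper: print the name attribute and the text of a <value> element.
# Assumes the whole element sits on one line, as in the question's sample.
extract_name_value() {
  # [^"]* cannot run past the closing quote and [^<]* cannot run into the next tag.
  # -n plus the p flag prints only lines where the substitution matched.
  sed -n 's#.*<value name="\([^"]*\)">\([^<]*\)</value>.*#\1 \2#p'
}

echo '<value name="something">someone</value>' | extract_name_value
# prints: something someone
```

This uses only basic regular expressions, so it should not need -r or -E.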

Noting the usual caveats about parsing XML with regular expressions, here's an XML parsing tool in action on your sample data that finds the attribute value and tag value for a "value" tag with a "name" attribute.
xmlstarlet sel -t -v '//value[@name]/@name' -n -v '//value[@name]' -n file.xml
something
someone

Related

Get TagValue of nth occurrence of a Tag in XML using sed

My XML:
<?xml version="1.0" encoding="UTF-8" ?>
<Attributes>
<Attribute>123</Attribute>
<Attribute>959595</Attribute>
<Attribute>1233</Attribute>
<Attribute>jiji</Attribute>
</Attributes>
I need to get the tag value of the second occurrence of the Attribute tag, i.e. 959595, using sed.
I used the command
sed -n ':a;$!{N;ba};s#\(<Attribute\)\(.*\)\(</Attribute>\)#\1#2#\2#p' file
The idea was: pattern one, the second occurrence, pattern two, then print the value. It doesn't work.
I don't know whether my approach is correct or not; please correct my command.

The proper way to do this is :
$ xmllint --xpath '/Attributes/Attribute[2]/text()' file.xml
NOTES
xmllint comes with libxml2.
The [2] in the XPath selects the second matching Attribute element.

sed -n '/<Attributes>/,\#</Attributes># {
/<Attribute>/ {
H;g
s#.*<Attribute>\(.*\)</Attribute>.*#\1#
t found
}
b
:found
p;q
}' YourFile
Assuming, as in your sample, that there is only one Attributes block to find, this sed returns only the first match. (If the XML content is exactly like your sample, the /<Attributes>/,\#</Attributes># range selection is not needed.)
This is the POSIX version, so use --posix with GNU sed.

This sed prints all Attribute entries from the Attributes block, then takes the second entry and removes the tags:
sed -n '/<Attributes>/,\#</Attributes>#{/<Attribute>/p}' attrib.txt | sed -n '2p' | sed 's#</Attribute>##;s/<Attribute>//'
Output:
959595
Another way, without pipes, is to do it all in one sed command. This goes to the second entry, strips the Attribute tags, and then quits:
sed -n '/<Attributes>/,\#</Attributes>#{/<Attribute>/{n;s#.*<Attribute>\(.*\)</Attribute>.*#\1#;p;q};}' attrib.txt
Or, if the number of Attribute entries varies, you can make it more flexible by extracting all the values and then using a second sed to print the one at the position you want:
sed -n '/<Attributes>/,\#</Attributes>#{/<Attribute>/{s#</Attribute>##;s#<Attribute>##;p}}' attrib.txt | sed -n '2p'
You can change the 2 at the end to whichever Attribute position you want to display, or take multiple values with, for example, sed -n '2p;3p' or sed -n '1,2p'.
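If a different tool is acceptable, the "pick the n-th entry" step can also be done in one awk call. This is only a sketch under the same assumption as the sed versions (one Attribute element per line); the function name nth_attribute and the temporary sample file are made up for illustration:

```shell
# Hypothetical helper: print the text of the n-th <Attribute> element.
# Assumes one element per line, as in the question's sample.
nth_attribute() {
  # $1 = occurrence number, $2 = XML file
  # -F'[<>]' splits "<Attribute>123</Attribute>" so that $3 is the text.
  awk -v n="$1" -F'[<>]' '/<Attribute>/ && (++count == n) { print $3; exit }' "$2"
}

# Build the sample from the question in a temporary file and query it.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<Attributes>
<Attribute>123</Attribute>
<Attribute>959595</Attribute>
<Attribute>1233</Attribute>
<Attribute>jiji</Attribute>
</Attributes>
EOF
nth_attribute 2 "$sample"   # prints: 959595
rm -f "$sample"
```

The pattern /<Attribute>/ does not match the <Attributes> wrapper line, because the wrapper has an s before the closing bracket.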

I would also go the xmllint XPath way. However, there seem to be two versions around: the man page at https://linux.die.net/man/1/xmllint does not list an --xpath option, only one called --pattern.
Following that documentation, the call would be
$ xmllint --pattern '/Attributes/Attribute[2]/text()' file.xml
I recommend checking your local man page to see which option your version supports. Note that the man page describes --pattern as a debugging option that takes only a subset of XPath, so this expression may not be accepted as-is.

Sed command to fetch particular string from full string

I've got a file that contains a lot of strings like the input below.
I need to extract the output shown below and process it further.
Input:
History={ExecAt=[2013-05-03 03:00:20,2013-05-03 03:00:23,2013-05-03 03:00:26],MId=["msgId3","msgId4","msgId5"]};
Output should be:
MId=["msgId3","msgId4","msgId5"]
Using the command sed 's/^.*,MId=/MId=/' I got output like MId=["msgId3","msgId4","msgId5"]};
but I still want the exact output (the last two special characters, };, need to be removed).

This works for me:
sed 's/.*\(MId=.*\)}.*/\1/'

If your grep supports the -o option, you can use it rather than sed:
grep -o 'MId=\[[^]]\+\]'
Using the same regex in sed works fine, just remove anything before and after:
sed -e 's/.*\(MId=\[[^]]\+\]\).*/\1/'
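The \+ used above is a GNU extension in basic regular expressions. A sketch of the same extraction that sticks to portable syntax, assuming the MId list contains no ] inside it; the function name extract_mid is made up for illustration:

```shell
# Hypothetical helper: print only the MId=[...] part of each matching input line.
extract_mid() {
  # \[ is a literal opening bracket, [^]]* matches everything up to the
  # first closing bracket, and the final ] is that closing bracket itself.
  sed -n 's/.*\(MId=\[[^]]*]\).*/\1/p'
}

echo 'History={ExecAt=[2013-05-03 03:00:20,2013-05-03 03:00:23,2013-05-03 03:00:26],MId=["msgId3","msgId4","msgId5"]};' | extract_mid
# prints: MId=["msgId3","msgId4","msgId5"]
```

With -n and the p flag, lines that contain no MId entry produce no output instead of being echoed unchanged.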

How to extract URL from html source with sed/awk or cut?

I am writing a script that will download an html page source as a file and then read the file and extract a specific URL that is located after a specific code. (it only has 1 occurrence)
Here is a sample that I need matched:
<img id="sample-image" class="photo" src="http://xxxx.com/some/ic/pic_1asda963_16x9.jpg"
The code preceding the URL will always be the same so I need to extract the part between:
<img id="sample-image" class="photo" src="
and the " after the URL.
I tried something with sed like this:
sed -n '\<img\ id=\"sample-image\"\ class=\"photo\"\ src=\",\"/p' test.txt
But it does not work. I would appreciate your suggestions, thanks a lot !

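You can use grep like this:
grep -oP '<img\s+id="sample-image"\s+class="photo"\s+src="\K[^"]+' test.txt
or with sed (-n and the p flag print only the matching line, and the surrounding .* drops the rest of it):
sed -rn 's/.*<img\s+id="sample-image"\s+class="photo"\s+src="([^"]+)".*/\1/p' test.txt
or with awk (splitting on " makes the URL the sixth field):
awk -F'"' '/<img[[:space:]]+id="sample-image"/{print $6}' test.txt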

If you have GNU grep then you can do something like:
grep -oP "(?<=src=\")[^\"]+(?=\")" test.txt
If you wish to use awk then the following would work:
awk -F\" '{print $(NF-1)}' test.txt

With sed (note that < must not be escaped, since \< is a word boundary in GNU sed):
echo "$string" | sed 's/.*<img.*src="\([^"]*\)".*/\1/'

A few things about the sed command you are using:
sed -n '\<img\ id=\"sample-image\"\ class=\"photo\"\ src=\",\"/p' test.txt
You don't need to escape the <, " or space. The single quotes prevent the shell from doing word splitting and other expansion on your sed expression.
You are essentially doing sed -n '/pattern/p' test.txt (except you seem to be missing the opening slash), which says "match this pattern, then print the line that contains the match"; you are not really extracting the URL.
This is minor, but you don't need to match class="photo" since the id already makes the HTML element unique (no two elements share the same id within the same HTML document).
Here's what I would do:
sed -n 's/.*<img id="sample-image".*src="\([^"]*\)".*/\1/p' test.txt
The p flag tells sed to print the line where the substitution (s) was performed.
\(pattern\) captures a subexpression, which can be accessed via \1, \2, etc. on the right side of s///. Note that in a basic regular expression + is a literal character, so the capture uses [^"]* rather than [^"]+.
The .* at the start of the regex is there in case something else precedes the <img> element on the line (you did mention you are parsing an HTML file).
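The points above can be folded into a small reusable function. This is a sketch only: the name img_src_by_id is made up, and it assumes the tag sits on one line, that id comes before src, and that the id contains no regex metacharacters or slashes.

```shell
# Hypothetical helper: print the src of the <img> with the given id.
# Reads HTML on stdin; $1 is the id attribute value.
img_src_by_id() {
  # [^>]* keeps the match inside the tag; [^"]* stops at the closing quote.
  sed -n 's/.*<img id="'"$1"'"[^>]*src="\([^"]*\)".*/\1/p'
}

printf '%s\n' '<p><img id="sample-image" class="photo" src="http://xxxx.com/some/ic/pic_1asda963_16x9.jpg" alt=""></p>' |
  img_src_by_id sample-image
# prints: http://xxxx.com/some/ic/pic_1asda963_16x9.jpg
```

For a file, redirect it into the function, e.g. img_src_by_id sample-image < test.txt.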

sed replace text in a XML file

I have a huge XML file with data like this:
<amount quantity="1">12.00</amount>
How can I replace the 12.00 with something else using sed?

Not really enough information in your question but to replace all values of 12.00 with say 24.00 you could do:
$ sed 's/>12\.00</>24.00</g' file.xml
If you are happy with the results you can store them back using the -i option:
$ sed -i 's/>12\.00</>24.00</g' file.xml
A more robust match would be:
$ sed -r 's_(<amount quantity="[0-9]+">)12\.00(</amount>)_\124.00\2_g' file.xml
But you should really parse the XML properly and not force regexp to do something it wasn't designed for.
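In a sed pattern an unescaped . matches any character, so 12.00 would also match a value such as 12000. A quick sketch of the difference using basic regular expressions; the function name replace_exact_amount and the second sample line are made up for illustration:

```shell
# Hypothetical helper: change an amount of exactly 12.00 to 24.00.
replace_exact_amount() {
  # \. matches a literal dot only. In the replacement, \1 is followed by the
  # text 24.00 (sed back-references are a single digit).
  sed 's_\(<amount quantity="[0-9]*">\)12\.00\(</amount>\)_\124.00\2_g'
}

printf '%s\n' \
  '<amount quantity="1">12.00</amount>' \
  '<amount quantity="2">12000</amount>' | replace_exact_amount
# prints:
# <amount quantity="1">24.00</amount>
# <amount quantity="2">12000</amount>
```

The second line is left alone because 12000 has no literal dot where the pattern requires one.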

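script.sh:
#!/bin/bash
# Single quotes keep the inner double quotes of the attribute intact.
xml='<amount quantity="1">12.00</amount>'
# \1 and \3 keep the opening and closing tags; 13.37 replaces the old value.
newxml=$(echo "$xml" | sed -n 's/\(<amount[^>]*>\)\([^<]*\)\(<\/amount>\)/\113.37\3/gp')
echo "$newxml"
result:
$ ./script.sh
<amount quantity="1">13.37</amount>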

Xmlstarlet and sed to replace string in a file

I have a huge number of HTML files. I need to replace every , and " with the HTML entities &nsbquo and &quot respectively.
I need to do this in two steps:
1) Find all the text between tags. I need to replace only in this text between tags.
2) Replace all required strings using sed.
My command for this is:
xmlstarlet sel -t -v "*//p" "index.html" | sed 's/,/\&nsbquo/'
This works, but now I don't know how to write the changes back to the index.html file.
sed has the -i option, but for that I need to pass the filename to the sed command. In my case, I have to use | to filter the required string out of the HTML file.
Please help. I have been searching for this for two days with no luck.
Thank you,
Divya.

The main problem here is that in XML there is no difference between " and &quot;, so you can't use xmlstarlet to do this directly. You could replace " with a special placeholder string and then use sed to replace that with &quot;:
xmlstarlet ed -u "//p/text()" \
-x "str:replace(str:replace(., ',', '#NSBQUO#'), '\"', '#QUOT#')" \
quote.html | \
sed 's/#NSBQUO#/\&nsbquo\;/g; s/#QUOT#/\&quot\;/g' > quote-new.html
mv quote-new.html quote.html
NOTE: str:replace and other exslt functions were only added to xmlstarlet ed in version 1.3.0, so it was not available at the time this question was asked.

We Keep Coding
