Print lines between x and y from the log using Perl

I have a log file that contains some XML messages like...
<fixsim xyz='tststtsts'>
<name test="test1">
<time t=234>
</time>
</name>
</fixsim>
here some normal log text
whoiwoei
blsdbndsnb
<fixsim xyz='tssts'
<name test="test2"
<time t=234>
</time>
</name>
</fixsim>
and so on....
From the above log file I want to grab the XML message (from <fixsim> to </fixsim>) that satisfies some condition. For example,
I want the XML message having test="test2", so as output I should get
<fixsim xyz='tssts'
<name test="test2"
<time t=234>
</time>
</name>
</fixsim>

The following will get the XML docs:
process($_) for $log =~ m{<fixsim.*?</fixsim>}sg;
and so would
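my $xml;
while (<$log_fh>) {
    # The range operator is true from the opening tag through the closing tag.
    if ( my $count = m{<fixsim} .. m{</fixsim>} ) {
        $xml .= $_;
        # The sequence number of the last line in the range ends in "E0".
        if ($count =~ /E0\z/) {
            process($xml);
            $xml = undef;
        }
    }
}
# Handle a final block whose closing tag never appeared.
process($xml) if defined($xml);
Once you have the XML, you can extract the field you need using your favorite XML parser.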
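As a rough sketch of how the range-operator approach could be combined with the filter asked for in the question, the following reads a log from an in-memory string and keeps only the blocks that contain test="test2". The sample log, the helper name extract_blocks, and the plain regex used as the filter are assumptions made for illustration; for well-formed input a real XML parser would be more robust.

```perl
use strict;
use warnings;

# Collect every <fixsim ...> ... </fixsim> block from a filehandle.
# extract_blocks is a hypothetical helper name, not part of any module.
sub extract_blocks {
    my ($fh) = @_;
    my @blocks;
    my $xml;
    while (my $line = <$fh>) {
        # In scalar context ".." is the flip-flop operator: it returns a
        # sequence number while inside the range, and the number for the
        # last line of the range has "E0" appended.
        if ( my $count = ($line =~ m{<fixsim}) .. ($line =~ m{</fixsim>}) ) {
            $xml .= $line;
            if ($count =~ /E0\z/) {
                push @blocks, $xml;
                $xml = undef;
            }
        }
    }
    # Keep a trailing block that was never closed.
    push @blocks, $xml if defined $xml;
    return @blocks;
}

# Sample log (assumed data, modeled on the question).
my $log = <<'END';
<fixsim xyz='tststtsts'>
<name test="test1">
<time t=234>
</time>
</name>
</fixsim>
here some normal log text
<fixsim xyz='tssts'>
<name test="test2">
<time t=234>
</time>
</name>
</fixsim>
and so on....
END

open my $log_fh, '<', \$log or die "Cannot open in-memory log: $!";
my @matches = grep { /test="test2"/ } extract_blocks($log_fh);
close $log_fh;

print @matches;
```

Matching on the raw text keeps the sketch dependency-free; if the condition becomes more involved, each collected block could be handed to an XML parser instead.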

Related

unable to parse xml file using registered namespace

I am using XML::LibXML to parse an XML file. There seems to be a problem with using a registered namespace when accessing the node elements. I am planning to convert this XML data into a CSV file, and I am trying to access each and every element here. To start with, I tried extracting the attribute values of the <country> and <state> tags. Below is the code I have come up with, but I am getting an error saying XPath error : Undefined namespace prefix.
use strict;
use warnings;
use Data::Dumper;
use XML::LibXML;
my $XML=<<EOF;
<DataSet xmlns="http://www.w3schools.com" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3schools.com note.xsd">
<exec>
<survey_region ver="1.1" type="x789" date="20160312"/>
<survey_loc ver="1.1" type="x789" date="20160312"/>
<note>Population survey</note>
</exec>
<country name="ABC" type="MALE">
<state name="ABC_state1" result="PASS">
<info>
<type>literacy rate comparison</type>
</info>
<comment><![CDATA[
Some random text
contained here
]]></comment>
</state>
</country>
<country name="XYZ" type="MALE">
<state name="XYZ_state2" result="FAIL">
<info>
<type>literacy rate comparison</type>
</info>
<comment><![CDATA[
any random text data
]]></comment>
</state>
</country>
</DataSet>
EOF
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($XML);
my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs('x','http://www.w3schools.com');
foreach my $camelid ($xc->findnodes('//x:DataSet')) {
my $country_name = $camelid->findvalue('./x:country/@name');
my $country_type = $camelid->findvalue('./x:country/@type');
my $state_name = $camelid->findvalue('./x:state/@name');
my $state_result = $camelid->findvalue('./x:state/@result');
print "state_name ($state_name)\n";
print "state_result ($state_result)\n";
print "country_name ($country_name)\n";
print "country_type ($country_type)\n";
}
Update
If I remove the namespace from the XML and change my XPath slightly, it seems to work. Can someone help me understand the difference?
foreach my $camelid ($xc->findnodes('//DataSet')) {
my $country_name = $camelid->findvalue('./country/@name');
my $country_type = $camelid->findvalue('./country/@type');
my $state_name = $camelid->findvalue('./country/state/@name');
my $state_result = $camelid->findvalue('./country/state/@result');
print "state_name ($state_name)\n";
print "state_result ($state_result)\n";
print "country_name ($country_name)\n";
print "country_type ($country_type)\n";
}
This would be my approach
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $XML=<<EOF;
<DataSet xmlns="http://www.w3schools.com" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3schools.com note.xsd">
<exec>
<survey_region ver="1.1" type="x789" date="20160312"/>
<survey_loc ver="1.1" type="x789" date="20160312"/>
<note>Population survey</note>
</exec>
<country name="ABC" type="MALE">
<state name="ABC_state1" result="PASS">
<info>
<type>literacy rate comparison</type>
</info>
<comment><![CDATA[
Some random text
contained here
]]></comment>
</state>
</country>
<country name="XYZ" type="MALE">
<state name="XYZ_state2" result="FAIL">
<info>
<type>literacy rate comparison</type>
</info>
<comment><![CDATA[
any random text data
]]></comment>
</state>
</country>
</DataSet>
EOF
my $parser = XML::LibXML->new();
my $tree = $parser->parse_string($XML);
my $root = $tree->getDocumentElement;
my @country = $root->getElementsByTagName('country');
foreach my $citem (@country) {
    my $country_name = $citem->getAttribute('name');
    my $country_type = $citem->getAttribute('type');
    print "Country Name -- $country_name\nCountry Type -- $country_type\n";
    my @state = $citem->getElementsByTagName('state');
    foreach my $sitem (@state) {
        my @info = $sitem->getElementsByTagName('info');
        my $state_name = $sitem->getAttribute('name');
        my $state_result = $sitem->getAttribute('result');
        print "State Name -- $state_name\nState Result -- $state_result\n";
        foreach my $i (@info) {
            my $text = $i->getElementsByTagName('type');
            print "Info --- $text\n";
        }
    }
    print "\n";
}
Of course you can manipulate the data any way you'd like. If you are parsing from a file, change parse_string to parse_file.
For the individual elements in the XML, use getElementsByTagName to get the elements within the tags. This should be enough to get you going.
There seem to be two small mistakes here.
1. Call findvalue on the XPathContext object, passing the context node as the second parameter.
2. name is an attribute of country, not a child node, so it must be addressed as @name.
Therefore try:
my $country_name = $xc->findvalue('./x:country/@name', $camelid);
Update for the updated question ("if I remove the name space from XML and change my XPath slightly it seems to work. Can someone help me understand the difference."):
To understand what happens here, have a look at NOTE ON NAMESPACES AND XPATH in the XML::LibXML documentation.
In your case, $camelid->findvalue('./x:state/@name'); calls findvalue on a node, so the prefix x registered on $xc is not available.
But: "The recommended way is to use the XML::LibXML::XPathContext module to define an explicit context for XPath evaluation, in which a document independent prefix-to-namespace mapping can be defined." That is what I did above.
Conclusion:
Calling find on a node will only work if the root element has no namespace
(or if you use the same prefix as in the XML document, if there is one).
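To make the conclusion concrete, here is a minimal sketch that routes every XPath lookup through the XPathContext object, passing the context node as the second argument. The trimmed sample document and the comma-joined output (the question mentions CSV as the goal) are assumptions for illustration; the prefix x is arbitrary and only the namespace URI has to match the document.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed sample (assumed data) with the same default namespace as the question.
my $doc = XML::LibXML->load_xml(string => <<'END');
<DataSet xmlns="http://www.w3schools.com">
  <country name="ABC" type="MALE">
    <state name="ABC_state1" result="PASS"/>
  </country>
  <country name="XYZ" type="MALE">
    <state name="XYZ_state2" result="FAIL"/>
  </country>
</DataSet>
END

my $xc = XML::LibXML::XPathContext->new($doc);
# 'x' is a prefix chosen here; it is bound to the document's default namespace URI.
$xc->registerNs('x', 'http://www.w3schools.com');

my @rows;
for my $country ($xc->findnodes('/x:DataSet/x:country')) {
    # Evaluate the relative path via $xc so the 'x' prefix stays known.
    for my $state ($xc->findnodes('./x:state', $country)) {
        push @rows, join ',',
            $country->getAttribute('name'),
            $country->getAttribute('type'),
            $state->getAttribute('name'),
            $state->getAttribute('result');
    }
}

print "$_\n" for @rows;
```

Attributes without a prefix are in no namespace, which is why getAttribute('name') works here without any namespace handling.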

XML::LibXML replace value

I have the XML file below, and I want to write a Perl script that matches first on the book name and then on the author name; if both match, it should change the field "value" from false to true. For example, if the book name is abc and the author name is john, change the value from false to true.
use warnings;
use strict;
use XML::LibXML;
my $parser = XML::LibXML->new();
my $xmldoc = $parser->parse_file('b.xml');
for my $book ($xmldoc->findnodes('/library/book')) {
my $name = $book->findvalue('/@name');
if($name eq "abc")
{
print "yes" ;
}
}
<library>
<book name="abc" id="3">
<key name="history">
<default label="base" value="1"/>
</key>
<author name="john">
<default label="base" value="false"/>
</author>
</book>
<book name="xyz" id="4">
<key name="science">
<default label="base" value="1"/>
</key>
<author name="nik">
<default label="base" value="false"/>
</author>
</book>
I am new to Perl; can someone help me here? I have written some code to reach the first step of checking the book name, but that itself is not working.
for my $default_author_node ($xmldoc->findnodes(
'/library/book[@name="abc"]/author[@name="john"]/default'
)) {
$default_author_node->setAttribute('value', 'true');
}
But you probably want the names to be variable.
Solution 1: Dynamically build the above XPath.
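sub text_to_xpath {
    my ($s) = @_;
    # No double quotes in the string: wrap it in double quotes.
    return qq{"$s"} if $s !~ tr/"//;
    # No single quotes in the string: wrap it in single quotes.
    return qq{'$s'} if $s !~ tr/'//;
    # Both kinds present: split around each double quote and use concat().
    $s =~ s/"/", '"', "/g;
    return qq{concat("$s")};
}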
my $target_book_xp = text_to_xpath($target_book);
my $target_author_xp = text_to_xpath($target_author);
for my $default_author_node ($xmldoc->findnodes(
"/library/book[\@name=$target_book_xp]/author[\@name=$target_author_xp]/default"
)) {
$default_author_node->setAttribute('value', 'true');
}
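XPath 1.0 string literals have no escape sequences, which is why the helper has to choose a quoting style per input. The following sketch repeats the helper so that it runs on its own and shows what it produces for three assumed sample strings.

```perl
use strict;
use warnings;

# Turn an arbitrary string into an XPath 1.0 string literal.
sub text_to_xpath {
    my ($s) = @_;
    return qq{"$s"} if $s !~ tr/"//;    # no double quotes: wrap in "..."
    return qq{'$s'} if $s !~ tr/'//;    # no single quotes: wrap in '...'
    $s =~ s/"/", '"', "/g;              # both kinds: split around each "
    return qq{concat("$s")};
}

# Sample inputs (assumptions for illustration): plain text, text with
# double quotes only, and text with both kinds of quote.
my @literals = map { text_to_xpath($_) } 'abc', 'say "hi"', q{it's "x"};

print "$_\n" for @literals;
# Prints:
# "abc"
# 'say "hi"'
# concat("it's ", '"', "x", '"', "")
```

The trailing empty string in the concat() call is harmless; it is simply what the substitution leaves behind when the input ends with a double quote.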
Solution 2: Do the checking yourself.
This is what you were attempting, but XPath /@name gets the child attributes (@) named name at the root of the document (/), but the only node there is the root element (library). Just like in directory paths, don't use a leading / if you want to search relative to the context.
for my $book_node ($xmldoc->findnodes('/library/book')) {
    my $name = $book_node->getAttribute('name');
    next if !defined($name) || $name ne $target_book;
    for my $author_node ($book_node->findnodes('author')) {
        # Check the author's own name attribute, not the book's.
        my $name = $author_node->getAttribute('name');
        next if !defined($name) || $name ne $target_author;
        for my $default_author_node ($author_node->findnodes('default')) {
            $default_author_node->setAttribute('value', 'true');
        }
    }
}
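The difference between an absolute and a relative XPath can be seen in a small sketch. The trimmed library document below is an assumption for illustration; the two findvalue calls differ only in the leading slash.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed sample document (assumed data).
my $doc = XML::LibXML->load_xml(string => <<'END');
<library>
  <book name="abc" id="3"/>
  <book name="xyz" id="4"/>
</library>
END

my (@absolute, @relative);
for my $book ($doc->findnodes('/library/book')) {
    # '/@name' starts at the document root, which has no attributes,
    # so the result is an empty string for every book.
    push @absolute, $book->findvalue('/@name');
    # '@name' is evaluated relative to the current <book> node.
    push @relative, $book->findvalue('@name');
}

print "absolute: [", join('|', @absolute), "]\n";
print "relative: [", join('|', @relative), "]\n";
```

Neither call raises an error, which is why the original attempt failed silently instead of reporting a problem.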

How to actually modify values of an XML file using XML::LibXML

I have an XML file (information.xml). I have to extract element and attribute values from this XML file and insert those element and attribute values into another XML file (build.xml). I have to change the build.xml file by filling the appropriate element values and tags from information.xml file.
I have to use XML::LibXML to do so. I am able to extract the element and attribute values from information.xml. But, I am unable to open and fill those values in build.xml
Example :
information.xml
<info>
<app version="10.5.10" long_name ="My Application">
<name> MyApp </name>
<owner>larry </owner>
<description> This is my first application</description>
</app>
</info>
build.xml
<build long_name="" version="">
<section type="Appdesciption">
<description> </description>
</section>
<section type="Appdetails">
<app_name> </app_name>
<owner></owner>
</section>
</build>
Now, my task is to extract value of owner from information.xml, open build.xml, search for owner tag in build.xml and put the extracted value there.
The Perl script looks like:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $file1="/root/shubhra/myapp/information.xml";
my $file2="/root/shubhra/myapp/build.xml";
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file1);
foreach my $line ($doc->findnodes('//info/app'))
{
my $owner= $line->findnodes('./owner'); # 1st way
print "\n",$owner->to_literal,"\n";
my ($long_name) = $line->findvalue('./@long_name'); # 2nd way
print "\n $long_name \n";
my $version = $line->findnodes('@version');
print "\n",$version->to_literal,"\n";
}
my $parser2 = XML::LibXML->new();
my $doc2 = $parser2->parse_file($file2);
foreach my $line2 ($doc2->findnodes('//build'))
{
my ($owner2)= $line2->findnodes('./section/owner/text()');
my ($version2)=$line2->findvalue('./@version');
print "\n Build.xml already has version : $version2 \n";
print "\n Build.xml already has owner :",$owner2->to_literal;
$owner2->setData("Windows Application 2"); # Not changing build.xml
$line2->setAttribute(q|version|,"60.60.60"); # Not changing build.xml
my $changedversion = $line2->getAttribute(q|version|);
#superficially changed but didn't changed build.xml content
print "\n The changed version is : $changedversion";
}
build.xml looks like :
<build long_name="" version="9.10.10">
<section type="Appdesciption">
<description> </description>
</section>
<section type="Appdetails">
<app_name> </app_name>
<owner>shubhra</owner>
</section>
</build>
my $doc3 = XML::LibXML->load_xml(location => $file2, no_blanks => 1);
my $xpath_expression = '/build/section/owner/text()';
my @nodes = $doc3->findnodes( $xpath_expression );
for my $node (@nodes) {
    # $owner is assumed to hold the owner string extracted from information.xml
    $node->setData($owner);
}
# Nothing is written to disk until the modified document is serialized
$doc3->toFile($file2 . '.new', 1);
The following fails to find anything (setting $owner2 to undef) since owner has no text:
my ($owner2) = $line2->findnodes('./section/owner/text()');
You want
my ($owner2) = $line2->findnodes('./section/owner');
This entails changing
print "\n Build.xml already has owner :", $owner2->to_literal;
to
print "\n Build.xml already has owner :", $owner2->textContent;
and
$owner2->setData("Windows Application 2");
to
$owner2->removeChildNodes();
$owner2->appendText("Windows Application 2");
You imply you want the following to change build.xml, but it doesn't even mention build.xml:
$line2->setAttribute(q|version|, "60.60.60");
It does modify $doc2, but you'll need to add the following code to modify build.xml too:
$doc2->toFile('build.xml');
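Putting these pieces together, a minimal end-to-end sketch might look like the following. It writes a trimmed build.xml into a temporary directory so that no real file is touched; the temporary path, the trimmed content, and the replacement values are assumptions for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;
use File::Temp qw(tempdir);

# Work in a temporary directory so the example does not touch real files.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/build.xml";

open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} <<'END';
<build long_name="" version="">
  <section type="Appdetails">
    <app_name> </app_name>
    <owner></owner>
  </section>
</build>
END
close $out;

my $doc = XML::LibXML->load_xml(location => $file);
my ($build) = $doc->findnodes('/build');
my ($owner) = $build->findnodes('./section/owner');

# Select the element itself, so this works even when <owner> is empty.
$build->setAttribute('version', '60.60.60');
$owner->removeChildNodes();
$owner->appendText('larry');

# The changes live only in memory until the document is serialized.
$doc->toFile($file);

# Reload the file to confirm that the changes reached the disk.
my $reloaded      = XML::LibXML->load_xml(location => $file);
my $saved_version = $reloaded->findvalue('/build/@version');
my $saved_owner   = $reloaded->findvalue('/build/section/owner');

print "version=$saved_version owner=$saved_owner\n";
```

Reloading the file afterwards is only there to verify the result; in a real script the toFile call would be the last step.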

Perl LibXML raw data from textContent?

Given the following XML:
<?xml version="1.0" encoding="utf-8" ?>
<Request>
<form_submit>
<form_submit id = 1424>
<form_id>1424</form_id>
<field1 id=’5’> <![CDATA[ test ]]> </field1>
<field2 id=’6’> <![CDATA[ test2 ]]> </field2>
</form_submit>
</form_submit>
</Request>
I'm trying to get the raw values for the field1 and field2 elements. I'm using the following code:
foreach my $node ( $xml_request->findnodes('Request/*/*/*[@id]') )
{
my $form_field_value = $node->textContent;
print "Value:\"$form_field_value\"\n";
}
But the output is:
Value:" test "
Value:" test2 "
How do I retrieve the exact data, raw and as is, with all the special characters? So that the output is:
Value:" <![CDATA[ test ]]> "
Value:" <![CDATA[ test2 ]]> "
Thank you.
I am not a libxml expert.
However, this is what I could figure out after playing with your XML and libxml a bit.
A CDATA section is a node of its own and is not part of the text.
The code below goes one level deep and calls toString() for CDATA child nodes
and textContent for other nodes.
foreach my $node ( $xml_request->findnodes('Request/*/*/*[@id]') )
{
my $text;
if($node->childNodes) {
foreach my $child ($node->childNodes()) {
if ($child->nodeType == XML::LibXML::XML_CDATA_SECTION_NODE) {
$text .= $child->toString;
} else {
$text .= $child->textContent;
}
}
} else {
$text = $node->textContent;
}
print qq{"$text"\n};
}
will print
" <![CDATA[ test ]]> "
" <![CDATA[ test2 ]]> "
Your sample data is invalid XML, and won't parse unless you replace 1424, ’5’ and ’6’ with "1424", "5" and "6".
You have asked for the text content and have got exactly that. To get what you need you must search for the children of the <fieldN> elements and use the toString method on them.
This code shows the idea. Note that the spaces before and after the CDATA, which would otherwise appear as separate text nodes, have been eliminated using a keep_blanks => 0 option on the object constructor.
use strict;
use warnings;
use XML::LibXML;
my $xml_request = XML::LibXML->load_xml(string => <<'END', keep_blanks => 0);
<?xml version="1.0" encoding="utf-8" ?>
<Request>
<form_submit>
<form_submit id = "1424">
<form_id>1424</form_id>
<field1 id="5"> <![CDATA[ test ]]> </field1>
<field2 id="6"> <![CDATA[ test2 ]]> </field2>
</form_submit>
</form_submit>
</Request>
END
foreach my $node ( $xml_request->findnodes('//form_submit/*[@id]/text()') ) {
my $form_field_value = $node->toString;
print qq(Value: "$form_field_value"\n);
}
output
Value: "<![CDATA[ test ]]>"
Value: "<![CDATA[ test2 ]]>"
Edit
ikegami has commented that the output requested in the question includes the whitespace surrounding the CDATA section. I don't know whether that is truly part of the requirement, but this edit provides a way to do that.
This would be clearer using XML::LibXML::Reader as it has a readInnerXml method (comparable to JavaScript's innerHTML ) that does exactly what is necessary. Instead, this program has to serialize all the children of the <fieldN> nodes and concatenate them with join.
This is a new foreach loop. The rest of the program remains unchanged except for the construction of $xml_request, which must have the keep_blanks option set to 1 or removed altogether.
foreach my $node ( $xml_request->findnodes('//*[starts-with(name(),"field")][@id]') ) {
my $form_field_value = join '', map $_->toString, $node->childNodes;
print qq(Value: "$form_field_value"\n);
}
output
Value: " <![CDATA[ test ]]> "
Value: " <![CDATA[ test2 ]]> "

perl script to replace the xml values

I have this XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<BroadsoftDocument protocol = "OCI" xmlns="C" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<sessionId xmlns="">169.254.52.85,16602326,1324821125562</sessionId>
<command xsi:type="UserAddRequest14sp9" xmlns="">
<serviceProviderId>AtyafBahrain</serviceProviderId>
<groupId>LoadTest</groupId>
<userId>user_0002@atyaf.me</userId>
<lastName>0002</lastName>
<firstName>user</firstName>
<callingLineIdLastName>0002</callingLineIdLastName>
<callingLineIdFirstName>user</callingLineIdFirstName>
<password>123456</password>
<language>English</language>
<timeZone>Asia/Bahrain</timeZone>
<address/>
</command>
</BroadsoftDocument>
and I need to replace the values of some fields (userId, firstName, password) and save the output under the same file name.
Using the code below changes the syntax of the XML fields (the XML format gets disturbed):
XMLout( $xml, KeepRoot => 1, NoAttr => 1, OutputFile => $xml_file, );
Can you please advise how to edit the XML file without changing its syntax?
You can check out the XML::Simple parser for Perl; see its page on CPAN. I have used it for parsing XML files, but I think it should allow modification as well.
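# Open the XML file for reading and writing ($filename_1 is the XML file name)
open(INPUTFILE, "+<", $filename_1) or die "Cannot open $filename_1: $!";
@file = <INPUTFILE>;
seek INPUTFILE, 0, 0;
foreach $file (@file)
{
    # Find string_1 and replace it by string_2
    $file =~ s/$str_1/$str_2/g;
    # Write the line back to the file
    print INPUTFILE $file;
}
# Drop leftover bytes in case the new content is shorter than the old
truncate INPUTFILE, tell(INPUTFILE);
close INPUTFILE;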
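An alternative that keeps the element structure intact is to edit the parsed tree with XML::LibXML, the module used elsewhere on this page, rather than round-tripping through XML::Simple or rewriting the raw text. This is only a sketch under stated assumptions: the sample is trimmed and leaves out the namespace declarations, and the replacement values are made up. In the original file the command element resets the default namespace with xmlns="", so the same unprefixed paths would be expected to apply there too.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed sample (assumed data); namespace declarations are omitted here.
my $doc = XML::LibXML->load_xml(string => <<'END');
<BroadsoftDocument protocol="OCI">
<command>
<groupId>LoadTest</groupId>
<userId>user_0002@atyaf.me</userId>
<firstName>user</firstName>
<password>123456</password>
</command>
</BroadsoftDocument>
END

# Hypothetical replacement values, keyed by element name.
my %new_value = (
    userId    => 'user_0003@atyaf.me',
    firstName => 'newuser',
    password  => 'secret',
);

for my $field (sort keys %new_value) {
    for my $node ($doc->findnodes("/BroadsoftDocument/command/$field")) {
        # Replace the element's text while leaving the element itself in place.
        $node->removeChildNodes();
        $node->appendText($new_value{$field});
    }
}

my $result = $doc->toString;
print $result;

# To overwrite the original file instead of printing:
# $doc->toFile($xml_file);
```

Elements that are not listed in %new_value, such as groupId, are serialized unchanged.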