Perl parsing using SAX

I would like to write an XML parsing script in Perl that prints all the firstname values from the following XML file using the XML::SAX module.
<employees>
<employee>
<firstname>John</firstname>
<lastname>Doe</lastname>
<age>gg</age>
<department>Operations</department>
<amount Ccy="EUR">100</amount>
</employee>
<employee>
<firstname>Larry</firstname>
<lastname>Page</lastname>
<age>45</age>
<department>Accounts</department>
<amount Ccy="EUR">200</amount>
</employee>
<employee>
<firstname>Harry</firstname>
<lastname>Potter</lastname>
<age>50</age>
<department>Human Resources</department>
<amount Ccy="EUR">300</amount>
</employee>
</employees>
Can anyone help me with a sample script?
I am new to Perl.

Here's an example using XML::SAX. I've used XML::SAX::PurePerl.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use XML::SAX::ParserFactory;
use XML::SAX::PurePerl;
my $characters;
my @firstnames;
my $factory = XML::SAX::ParserFactory->new;
# Let's see which handlers we have available
# print Dumper $factory;
my $handler = XML::SAX::PurePerl->new;
my $parser = $factory->parser(
    Handler => $handler,
    Methods => {
        characters => sub {
            $characters = shift->{Data};
        },
        end_element => sub {
            push @firstnames, $characters if shift->{LocalName} eq 'firstname';
        }
    }
);
$parser->parse_uri("sample.xml");
print Dumper \@firstnames;
Output:
$VAR1 = [
'John',
'Larry',
'Harry'
];
I use $characters to hold character data, and push its contents onto @firstnames whenever I see a closing firstname tag.
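One caveat with keeping only the most recent character data: a SAX parser is allowed to deliver the text of one element in several characters events, so a long or entity-laden value could be cut short. A more conventional layout is a small handler class derived from XML::SAX::Base that appends to a buffer. The following is a minimal sketch of that idea, not the answer's own method; the class name FirstnameHandler and the hash keys buffer and firstnames are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical handler class; the name and its hash keys are illustrative.
package FirstnameHandler;
use base 'XML::SAX::Base';

sub start_element {
    my ($self, $el) = @_;
    # Start a fresh buffer whenever a <firstname> element opens.
    $self->{buffer} = '' if $el->{LocalName} eq 'firstname';
}

sub characters {
    my ($self, $chars) = @_;
    # Append rather than assign: text may arrive in more than one chunk.
    $self->{buffer} .= $chars->{Data} if defined $self->{buffer};
}

sub end_element {
    my ($self, $el) = @_;
    if ($el->{LocalName} eq 'firstname') {
        push @{ $self->{firstnames} }, $self->{buffer};
        $self->{buffer} = undef;    # stop collecting until the next <firstname>
    }
}

package main;
use XML::SAX::ParserFactory;

my $xml = <<'XML';
<employees>
<employee><firstname>John</firstname><lastname>Doe</lastname></employee>
<employee><firstname>Larry</firstname><lastname>Page</lastname></employee>
</employees>
XML

my $handler = FirstnameHandler->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_string($xml);

my @firstnames = @{ $handler->{firstnames} || [] };
print "$_\n" for @firstnames;
```

With a file on disk, parse_uri("sample.xml") would replace the parse_string call.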

Do you have a reason to stick with XML::SAX? If not, you could look at other XML parsers for Perl, such as XML::Twig, XML::LibXML, XML::LibXML::Reader, and XML::Simple, among many others.
Here is sample code that retrieves the firstname values using XML::Twig.
use XML::Twig;
my $twig = XML::Twig->new ();
$twig->parsefile ('sample.xml');
my @firstname = map { $_->text } $twig->findnodes ('//firstname');

Related

How do I extract an attribute/property in Perl using XML::Twig module?

If I have the below sample XML, how do I extract the _Id from the field using XML::Twig?
<note>
<to _Id="100">Share</to>
<from>Jane</from>
<heading>Reminder</heading>
<body>A simple text</body>
</note>
I've tried combinations of the below with no luck.
sub getId {
my ($twig, $mod) = @_;
##my $to_id = $mod->field('to')->{'_Id'}; ## does not work
##my $to_id = $mod->{'atts'}->{_Id}; ## does not work
##my $to_id = $mod->id; ## does not work
$twig->purge;
}
This is one way to get 100. It uses the first_child method:
use warnings;
use strict;
use XML::Twig;
my $xml = <<XML;
<note>
<to _Id="100">Share</to>
<from>Jane</from>
<heading>Reminder</heading>
<body>A simple text</body>
</note>
XML
my $twig = XML::Twig->new(twig_handlers => { note => \&getId });
$twig->parse($xml);
sub getId {
my ($twig, $mod) = @_;
my $to_id = $mod->first_child('to')->att('_Id');
print "$to_id \n";
}
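Another option, sketched here as an assumption about what fits the use case, is to attach the handler to the to element itself, so the element passed to the handler is the one carrying the attribute. The @ids array is a name introduced only for this example.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = <<'XML';
<note>
<to _Id="100">Share</to>
<from>Jane</from>
<heading>Reminder</heading>
<body>A simple text</body>
</note>
XML

my @ids;    # illustrative container for the collected attribute values
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once per <to> element; $elt is that element.
        to => sub {
            my ($t, $elt) = @_;
            push @ids, $elt->att('_Id');
        },
    },
);
$twig->parse($xml);

print "$_\n" for @ids;
```

This avoids the first_child lookup when only the to element is of interest.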

XML reading using Perl

I am new to the Perl language. I have an XML file like this:
<xml>
<date>
<date1>2012-10-22</date1>
<date2>2012-10-23</date2>
</date>
</xml>
I want to parse this XML file and store the dates in an array. How can I do this with a Perl script?
Use XML::Simple (an easy API to maintain XML, especially config files), or see XML::Twig (a Perl module for processing huge XML documents in tree mode).
For example:
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $xml = q~<xml>
<date>
<date1>2012-10-22</date1>
<date2>2012-10-23</date2>
</date>
</xml>~;
print $xml,$/;
my $data = XMLin($xml);
print Dumper( $data );
my @dates;
foreach my $attributes (keys %{$data->{date}}){
    push(@dates, $data->{date}{$attributes});
}
print Dumper(\@dates);
Output:
$VAR1 = [
'2012-10-23',
'2012-10-22'
];
Here's one way with XML::LibXML:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;
my $doc = XML::LibXML->load_xml(location => 'data.xml');
my @nodes = $doc->findnodes('/xml/date/*');
my @dates = map { $_->textContent } @nodes;
Using XML::XSH2, a wrapper around XML::LibXML:
#!/usr/bin/perl
use warnings;
use strict;
use XML::XSH2;
xsh << '__XSH__';
open 2.xml ;
for $t in /xml/date/* {
my $s = string($t) ;
perl { push @l, $s }
}
__XSH__
no warnings qw(once);
print join(' ', @XML::XSH2::Map::l), ".\n";
If you can't or don't want to use any CPAN module:
my @hits = $xml =~ /<date\d+>(.+?)<\/date\d+>/g;
This should give you all the dates in the @hits array (the /g modifier is what returns every match rather than only the first).
If the XML isn't as simple as your example, using an XML parser is recommended; XML::Parser is one of them.
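For the regex route, the /g modifier decides how many dates come back. The short sketch below uses only core Perl and the sample document from the question; the variable names @first_only and @all are made up for the comparison.

```perl
use strict;
use warnings;

my $xml = <<'XML';
<xml>
<date>
<date1>2012-10-22</date1>
<date2>2012-10-23</date2>
</date>
</xml>
XML

# Without /g, a match in list context returns the captures of the first match only.
my @first_only = $xml =~ /<date\d+>(.+?)<\/date\d+>/;

# With /g, list context returns the capture from every match in the string.
my @all = $xml =~ /<date\d+>(.+?)<\/date\d+>/g;

print scalar(@first_only), " vs ", scalar(@all), "\n";    # prints "1 vs 2"
print "$_\n" for @all;
```

The outer date element is not matched because the pattern requires at least one digit after "date".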

Perl: Handling duplicate element names using XML::SAX

How do I handle duplicate element names with Perl's XML::SAX module? Here is my XML file:
<employees>
<employee>
<name>John</name>
<age>gg</age>
<department>Operations</department>
<amount Ccy="EUR">100</amount>
<company>
<name> abc </name>
</company>
</employee>
<employee>
<name>Larry</name>
<age>45</age>
<department>Accounts</department>
<amount Ccy="EUR">200</amount>
<company>
<name> xyz </name>
</company>
</employee>
</employees>
My question is how to access the element employees->employee->company->name (I should be able to print "abc" and "xyz"). I am asking because there is another 'name' element at employees->employee->name, which I want to skip. I would like to use XML::SAX only, as my environment supports only this module. Please help. Thanks a lot.
Use a stack to keep a record of which nodes you're inside, pushing every time you enter a node and popping every time you leave one:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use XML::SAX::ParserFactory;
use XML::SAX::PurePerl;
my (@nodes, $characters, @names);
my $factory = XML::SAX::ParserFactory->new;
my $handler = XML::SAX::PurePerl->new;
my $parser = $factory->parser(
    Handler => $handler,
    Methods => {
        start_element => sub {
            push @nodes, shift->{LocalName};
        },
        characters => sub {
            $characters = shift->{Data};
        },
        end_element => sub {
            if (shift->{LocalName} eq 'name' && $nodes[-2] eq 'company') {
                push @names, $characters;
            }
            pop @nodes;
        }
    }
);
$parser->parse_uri("sample2.xml");
print Dumper \@names;
Output:
$VAR1 = [
' abc ',
' xyz '
];
$nodes[-2] is the second-to-last element in @nodes. When end_element fires for a 'name' element, 'name' itself is still on top of the stack, so $nodes[-2] is its parent and resolves to 'employee' or 'company'.
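The same stack idea can be written as a handler class that compares the full path instead of only the parent. This is a sketch under assumptions: the class name CompanyNameHandler and the keys stack, text, and names are invented for the example, and the surrounding whitespace is trimmed, which the answer above does not do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical handler class; names are illustrative only.
package CompanyNameHandler;
use base 'XML::SAX::Base';

sub start_element {
    my ($self, $el) = @_;
    push @{ $self->{stack} }, $el->{LocalName};    # entering a node
    $self->{text} = '';                            # reset the text buffer
}

sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};               # text may come in chunks
}

sub end_element {
    my ($self, $el) = @_;
    my $path = join '/', @{ $self->{stack} };
    if ($path eq 'employees/employee/company/name') {
        (my $name = $self->{text}) =~ s/^\s+|\s+$//g;    # trim whitespace
        push @{ $self->{names} }, $name;
    }
    pop @{ $self->{stack} };                       # leaving the node
    $self->{text} = '';
}

package main;
use XML::SAX::ParserFactory;

my $xml = <<'XML';
<employees>
<employee><name>John</name><company><name> abc </name></company></employee>
<employee><name>Larry</name><company><name> xyz </name></company></employee>
</employees>
XML

my $handler = CompanyNameHandler->new;
XML::SAX::ParserFactory->parser(Handler => $handler)->parse_string($xml);

my @names = @{ $handler->{names} || [] };
print "$_\n" for @names;
```

Matching on the whole path keeps the check unambiguous if another element elsewhere in the document also has a company parent.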

How to parse a multi-record XML file using XML::Simple in Perl

My data.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<cd country="UK">
<title>Hide your heart</title>
<artist>Bonnie Tyler</artist>
<price>10.0</price>
</cd>
<cd country="CHN">
<title>Greatest Hits</title>
<artist>Dolly Parton</artist>
<price>9.99</price>
</cd>
<cd country="USA">
<title>Hello</title>
<artist>Say Hello</artist>
<price>0001</price>
</cd>
</catalog>
My test.pl
#!/usr/bin/perl
# use module
use XML::Simple;
use Data::Dumper;
# create object
$xml = new XML::Simple;
# read XML file
$data = $xml->XMLin("data.xml");
# access XML data
print "$data->{cd}->{country}\n";
print "$data->{cd}->{artist}\n";
print "$data->{cd}->{price}\n";
print "$data->{cd}->{title}\n";
Output:
Not a HASH reference at D:\learning\perl\t1.pl line 16.
Comment: I googled and found an article that handles a single XML record:
http://www.go4expert.com/forums/showthread.php?t=812
I tested the article's code and it works well on my laptop.
Then I wrote the practice code above to try to access multiple records, but it failed. How can I fix it? Thank you.
Always use strict; and always use warnings;. Don't quote complex references like you're doing. You're right to use Dumper; it should have shown you that cd is an array ref, so you have to specify which cd.
#!/usr/bin/perl
use strict;
use warnings;
# use module
use XML::Simple;
use Data::Dumper;
# create object
my $xml = new XML::Simple;
# read XML file
my $data = $xml->XMLin("file.xml");
# access XML data
print $data->{cd}[0]{country};
print $data->{cd}[0]{artist};
print $data->{cd}[0]{price};
print $data->{cd}[0]{title};
If you do print Dumper($data), you will see that the data structure does not look like you think it does:
$VAR1 = {
'cd' => [
{
'country' => 'UK',
'artist' => 'Bonnie Tyler',
'price' => '10.0',
'title' => 'Hide your heart'
},
{
'country' => 'CHN',
'artist' => 'Dolly Parton',
'price' => '9.99',
'title' => 'Greatest Hits'
},
{
'country' => 'USA',
'artist' => 'Say Hello',
'price' => '0001',
'title' => 'Hello'
}
]
};
You need to access the data like so:
print "$data->{cd}->[0]->{country}\n";
print "$data->{cd}->[0]->{artist}\n";
print "$data->{cd}->[0]->{price}\n";
print "$data->{cd}->[0]->{title}\n";
In addition to what has been said by Evan, if you're unsure if you're stuck with one or many elements, ref() can tell you what it is, and you can handle it accordingly:
my $data = $xml->XMLin("file.xml");
if (ref($data->{cd}) eq 'ARRAY')
{
    for my $cd (@{ $data->{cd} })
    {
        print Dumper $cd;
    }
}
else # Chances are it's a single element
{
    print Dumper $data->{cd};
}
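An alternative to checking ref() at every access is to ask XML::Simple to fold cd into an array even when only one record is present, using its ForceArray option. The sketch below assumes a one-record catalog passed as a string (the $one_cd variable is made up for the example) and sets KeyAttr to an empty list so no attribute is used for folding.

```perl
use strict;
use warnings;
use XML::Simple;

# A catalog with a single <cd>, where the default behaviour would give a hash ref.
my $one_cd = '<catalog><cd country="UK"><title>Hide your heart</title></cd></catalog>';

# ForceArray => ['cd'] makes <cd> an array ref regardless of how many there are.
my $data = XMLin($one_cd, ForceArray => ['cd'], KeyAttr => []);

for my $cd (@{ $data->{cd} }) {
    print "$cd->{country}: $cd->{title}\n";
}
```

With this option the same loop works for one record or many, at the cost of always writing the array index.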

How can I use Perl's XML::LibXML to extract an attribute in a tag?

I have an XML file
<PARENT >
<TAG string1="asdf" string2="asdf" >
</TAG >
</PARENT>
I want to extract the string2 value here, and I also want to set it to a new value.
How can I do that?
Use XPath expressions:
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;
my $doc = XML::LibXML->new->parse_string(q{
<PARENT>
<TAG string1="asdf" string2="asdfd">
</TAG>
</PARENT>
});
my $xpath = '/PARENT/TAG/@string2';
# getting value of attribute:
print Dumper $doc->findvalue($xpath);
my ($attr) = $doc->findnodes($xpath);
# setting new value:
$attr->setValue('dfdsa');
print Dumper $doc->findvalue($xpath);
# do following if you need to get string representation of your XML structure
print Dumper $doc->toString(1);
And read the documentation, of course :)
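If selecting the attribute node itself feels indirect, the element's getAttribute and setAttribute methods are another way to read and change the value. This is a small sketch using the same sample data inline; the variable names $old and $new are only for the demonstration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<PARENT><TAG string1="asdf" string2="asdfd"></TAG></PARENT>',
);

# Select the element, then work with its attributes by name.
my ($tag) = $doc->findnodes('/PARENT/TAG');
my $old = $tag->getAttribute('string2');
$tag->setAttribute('string2', 'dfdsa');
my $new = $tag->getAttribute('string2');

print "$old -> $new\n";
print $doc->toString(1);
```

Calling toString on the document afterwards shows the updated attribute in the serialized XML.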
You could use XML::Parser to get the value as well. For more information refer to the XML::Parser documentation:
#!/usr/local/bin/perl
use strict;
use warnings;
use XML::Parser;
use Data::Dumper;
my $attributes = {};
my $start_handler = sub
{
my ( $expat, $elem, %attr ) = @_;
if ($elem eq 'TAG')
{
$attributes->{$attr{'string1'}} = 'Found';
}
};
my $p1 = new XML::Parser(
Handlers => {
Start => $start_handler
}
);
$p1->parsefile('test.xml');
print Dumper($attributes);
I think you might be better off starting with XML::Simple and playing around a little first:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
my $xml = XMLin(\*DATA);
print $xml->{TAG}->{string2}, "\n";
$xml->{TAG}->{string2} = "asdf";
print XMLout( $xml, RootName => 'PARENT');
__DATA__
<PARENT>
<TAG string1="asdf" string2="value of string 2">
</TAG>
</PARENT>
Thanks for your responses. I found another answer in "Config file processing with LibXML2" which I found very useful.