XML reading using Perl

I am new to Perl. I have an XML file like this:
<xml>
<date>
<date1>2012-10-22</date1>
<date2>2012-10-23</date2>
</date>
</xml>
I want to parse this XML file and store the dates in an array. How can I do this in a Perl script?

Use XML::Simple - Easy API to maintain XML (esp config files) or
see XML::Twig - A perl module for processing huge XML documents in tree mode.
For example:
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $xml = q~<xml>
<date>
<date1>2012-10-22</date1>
<date2>2012-10-23</date2>
</date>
</xml>~;
print $xml,$/;
my $data = XMLin($xml);
print Dumper( $data );
my @dates;
foreach my $attributes (keys %{$data->{date}}){
push(@dates, $data->{date}{$attributes});
}
print Dumper(\@dates);
Output:
$VAR1 = [
'2012-10-23',
'2012-10-22'
];

Here's one way with XML::LibXML
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;
my $doc = XML::LibXML->load_xml(location => 'data.xml');
my @nodes = $doc->findnodes('/xml/date/*');
my @dates = map { $_->textContent } @nodes;

Using XML::XSH2, a wrapper around XML::LibXML:
#!/usr/bin/perl
use warnings;
use strict;
use XML::XSH2;
xsh << '__XSH__';
open 2.xml ;
for $t in /xml/date/* {
my $s = string($t) ;
perl { push @l, $s }
}
__XSH__
no warnings qw(once);
print join(' ', @XML::XSH2::Map::l), ".\n";

If you can't or don't want to use any CPAN module:
my @hits = $xml =~ /<date\d+>(.+?)<\/date\d+>/g;
This should give you all the dates in the @hits array.
If the XML isn't as simple as your example, using an XML parser is recommended; XML::Parser is one of them.
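A self-contained sketch of the regex approach, for reference. The helper name extract_dates is an assumption made for illustration; the key point is that the match needs the /g flag in list context to return every captured date rather than only the first.

```perl
use strict;
use warnings;

# extract_dates is a hypothetical helper name, not part of any module.
# It returns the text of every <dateN>...</dateN> element in the string.
sub extract_dates {
    my ($xml) = @_;
    # /g in list context returns the capture from every match.
    my @hits = $xml =~ /<date\d+>(.+?)<\/date\d+>/g;
    return @hits;
}

my $xml = <<'XML';
<xml>
<date>
<date1>2012-10-22</date1>
<date2>2012-10-23</date2>
</date>
</xml>
XML

print join(', ', extract_dates($xml)), "\n";   # prints "2012-10-22, 2012-10-23"
```

This only holds up for flat, predictable input like the sample; nested or attribute-bearing elements call for a real parser.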

Related

How to display readable UTF-8 strings with Data::Dumper?

I have some UTF-8 encoded strings in structures which I am dumping for debugging purposes with Data::Dumper.
A small test case is:
use utf8;
use Data::Dumper;
say Dumper({да=>"не"});
It outputs
{
"\x{434}\x{430}" => "\x{43d}\x{435}"
};
but I want to see
{
"да" => "не"
};
Of course my structure is quite more complex.
How can I make the strings in the dumped structure readable while debugging? Maybe I have to process the output via chr somehow before warn/say?
Just for debugging:
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;
use utf8;
use Data::Dumper;
binmode STDOUT, ':utf8';
CASE_1: {
# Redefine Data::Dumper::qquote() to do nothing
no warnings 'redefine';
local *Data::Dumper::qquote = sub { qq["${\(shift)}"] };
# Use the Pure Perl implementation of Dumper
local $Data::Dumper::Useperl = 1;
say Dumper({да=>"не"});
}
CASE_2: {
# Use YAML instead
use YAML;
say Dump({да=>"не"});
}
CASE_3: {
# Evaluate whole dumped string
no strict 'vars';
local $Data::Dumper::Terse = 1;
my $var = Dumper({да=>"не"});
say eval "qq#$var#" or die $@;
}
__END__
$VAR1 = {
"да" => "не"
};
---
да: не
{
"да" => "не"
}
print Dumper(%mydata) =~ s/\\x\{([0-9a-f]{2,})\}/chr hex $1/ger;
Sorry, but when I tested evaluating the whole dump I ran into problems with my data, so I use this instead:
Data::Dumper->new(\@_)
->Indent(1)->Sortkeys(1)->Terse(1)->Useqq(0)->Dump
=~ s/((?:\\x\{[\da-f]+\})+)/eval '"'.$1.'"'/eigr;
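The substitution idea from the answers above can be wrapped in a small helper. This is a debugging sketch only: readable_dump is a hypothetical name, it needs Perl 5.14 or later for the /r flag, and it would also rewrite a literal backslash-x sequence that happened to appear in the data.

```perl
use strict;
use warnings;
use utf8;
use Data::Dumper;

# readable_dump is a hypothetical helper name. It turns the \x{...}
# escapes that Data::Dumper emits back into the characters they stand for.
sub readable_dump {
    my ($data) = @_;
    local $Data::Dumper::Sortkeys = 1;   # stable key order while debugging
    # /e evaluates "chr hex $1"; /r returns the modified copy.
    return Dumper($data) =~ s/\\x\{([0-9a-f]+)\}/chr hex $1/ger;
}

binmode STDOUT, ':encoding(UTF-8)';
print readable_dump({ 'да' => 'не' });
```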

List all the subroutine names in perl program

I am using several modules in my Perl program.
Example:
use File::copy;
Likewise, the File namespace contains Basename, Path, stat and so on.
I want to list all the subroutine (function) names in a module of the File package.
Python has dir(modulename) for this.
It lists all the functions defined in that module.
example:
#!/usr/bin/python
# Import built-in module math
import math
content = dir(math)
print content
Is there an equivalent of this in Perl?
If you want to look at the contents of a namespace in perl, you can use %modulename::.
For main that's either %main:: or %::.
E.g.:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
sub fish {};
sub carrot {};
print "Stuff defined in Dumper:\n";
print Dumper \%Data::Dumper::;
print "Stuff defined:\n";
print Dumper \%::;
That covers a load of stuff though - including pragmas. But you can check for e.g. subroutines by simply testing it for being a code reference.
foreach my $thing ( keys %:: ) {
if ( defined &$thing ) {
print "sub $thing\n";
}
}
And with reference to the above sample, this prints:
sub Dumper
sub carrot
sub fish
So with reference to your original question:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use File::Copy;
print "File::Copy has subs of:\n";
foreach my $thing ( keys %File::Copy:: ) {
if ( defined &$thing ) {
print "sub $thing\n";
}
}
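Note that defined &$thing resolves the name in the current package, so the loop above reports the names File::Copy has exported into main. A minimal sketch that qualifies each name with the package instead, so non-exported subs show up as well; subs_in_package is a hypothetical helper name:

```perl
use strict;
use warnings;
use File::Copy ();   # load the module without importing anything

# subs_in_package is a hypothetical helper name. It returns the sorted
# names of all subs found in the symbol table of the given package.
sub subs_in_package {
    my ($package) = @_;
    no strict 'refs';    # looking up symbols by name needs symbolic refs
    return sort
        grep { !/::$/ && defined &{"${package}::$_"} }   # skip nested packages
        keys %{"${package}::"};
}

print "$_\n" for subs_in_package('File::Copy');
```

The list includes anything the module itself imported or defined privately, so names starting with an underscore will appear too.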
Unfortunately you can't do the same thing with the whole File:: namespace, because there's a whole bunch of different modules that could be installed/loaded, but might not be.
You'd have to use e.g. CPAN to check that -
perl -MCPAN -e shell
i /^File::/
Which will list you around 717 modules that are grouped into the File:: tree.
You could look this up on CPAN. Or if you're just after the core modules, then some variant of using Module::CoreList might do what you want.
Something like this:
#!/usr/bin/perl
use strict;
use warnings;
use Module::CoreList;
foreach my $module ( Module::CoreList->find_modules(qr/^File::/) ) {
if ( eval { require $module =~ s|::|/|gr . ".pm" } ) {
print "Module: $module contains\n";
my $key_str = "\%$module\:\:";
my %stuff = eval $key_str;
foreach my $thing ( sort keys %stuff ) {
my $full_sub_path = "${module}::$thing";
if ( eval "defined &$full_sub_path" ) {
if ( defined &$thing ) {
print "$thing <- $full_sub_path imported by default\n";
}
else {
print "\t$full_sub_path might be loadable\n";
}
}
}
}
else {
print "Module: $module couldn't be loaded\n";
}
}
It's a bit messy because you have to eval various bits of it to test if a module is in fact present and loadable at runtime. Oddly enough, File::Spec::VMS wasn't present on my Win32 system. Can't think why.... :).
Should note - just because you could import a sub from a module (that isn't exported by default) doesn't make it a good idea. By convention, any sub prefixed with an _ is not supposed to be used externally, etc.
My Devel::Examine::Subs module can do this, plus much more. Note that whether it's a method or function is irrelevant, it'll catch both. It works purely on subroutines as found with PPI.
use warnings;
use strict;
use Devel::Examine::Subs;
my $des = Devel::Examine::Subs->new;
my $subs = $des->module(module => 'File::Copy');
for (@$subs){
print "$_\n";
}
Output:
_move
move
syscopy
carp
mv
_eq
_catname
cp
copy
croak
Or a file/full directory. For all Perl files in a directory (recursively), just pass the dir to file param without a file at the end of the path:
my $des = Devel::Examine::Subs->new(file => '/path/to/file.pm');
my $subs = $des->all;
If you just want to print it use the Data::Dumper module and the following method, CGI used as an example:
use strict;
use warnings;
use CGI;
use Data::Dumper;
my $object = CGI->new();
{
no strict 'refs';
print "Instance METHOD IS " . Dumper( \%{ref ($object)."::" }) ;
}
Also note, it's File::Copy, not File::copy.

What module can I use to parse RSS feeds in a Perl CGI script?

I am trying to find a RSS parser that can be used with a Perl CGI script. I found simplepie and that's really easy parser to use in PHP scripting. Unfortunately that doesn't work with a Perl CGI script. Please let me know if there is anything that's easy to use like simplepie.
I came across this one RssDisplay but I am not sure about the usage and also how good it is.
From CPAN: XML::RSS::Parser.
XML::RSS::Parser is a lightweight liberal parser of RSS feeds. This parser is "liberal" in that it does not demand compliance of a specific RSS version and will attempt to gracefully handle tags it does not expect or understand. The parser's only requirements is that the file is well-formed XML and remotely resembles RSS.
#!/usr/bin/perl
use strict; use warnings;
use XML::RSS::Parser;
use FileHandle;
my $parser = XML::RSS::Parser->new;
unless ( -e 'uploads.rdf' ) {
require LWP::Simple;
LWP::Simple::getstore(
'http://search.cpan.org/uploads.rdf',
'uploads.rdf',
);
}
my $fh = FileHandle->new('uploads.rdf');
my $feed = $parser->parse_file($fh);
print $feed->query('/channel/title')->text_content, "\n";
my $count = $feed->item_count;
print "# of Items: $count\n";
foreach my $i ( $feed->query('//item') ) {
print $i->query('title')->text_content, "\n";
}
Available Perl Modules
XML::RSS::Tools
XML::RSS::Parser:
#!/usr/bin/perl -w
use strict;
use XML::RSS::Parser;
use FileHandle;
my $p = XML::RSS::Parser->new;
my $fh = FileHandle->new('/path/to/some/rss/file');
my $feed = $p->parse_file($fh);
# output some values
my $feed_title = $feed->query('/channel/title');
print $feed_title->text_content;
my $count = $feed->item_count;
print " ($count)\n";
foreach my $i ( $feed->query('//item') ) {
my $node = $i->query('title');
print ' '.$node->text_content;
print "\n";
}
XML::RSS::Parser::Lite (Pure Perl):
use XML::RSS::Parser::Lite;
use LWP::Simple;
my $xml = get("http://url.to.rss");
my $rp = new XML::RSS::Parser::Lite;
$rp->parse($xml);
print join(' ', $rp->get('title'), $rp->get('url'), $rp->get('description')), "\n";
for (my $i = 0; $i < $rp->count(); $i++) {
my $it = $rp->get($i);
print join(' ', $it->get('title'), $it->get('url'), $it->get('description')), "\n";
}
dirtyRSS:
use dirtyRSS;
$tree = parse($in);
die("$tree\n") unless (ref $tree);
disptree($tree, 0);

How can I use Perl's XML::LibXML to extract an attribute in a tag?

I have an XML file
<PARENT >
<TAG string1="asdf" string2="asdf" >
</TAG >
</PARENT>
I want to extract the string2 value here, and I also want to set it to a new value.
How can I do that?
Use XPath expressions
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;
my $doc = XML::LibXML->new->parse_string(q{
<PARENT>
<TAG string1="asdf" string2="asdfd">
</TAG>
</PARENT>
});
my $xpath = '/PARENT/TAG/@string2';
# getting value of attribute:
print Dumper $doc->findvalue($xpath);
my ($attr) = $doc->findnodes($xpath);
# setting new value:
$attr->setValue('dfdsa');
print Dumper $doc->findvalue($xpath);
# do following if you need to get string representation of your XML structure
print Dumper $doc->toString(1);
And read documentation, of course :)
You could use XML::Parser to get the value as well. For more information refer to the XML::Parser documentation:
#!/usr/local/bin/perl
use strict;
use warnings;
use XML::Parser;
use Data::Dumper;
my $attributes = {};
my $start_handler = sub
{
my ( $expat, $elem, %attr ) = @_;
if ($elem eq 'TAG')
{
$attributes->{$attr{'string1'}} = 'Found';
}
};
my $p1 = new XML::Parser(
Handlers => {
Start => $start_handler
}
);
$p1->parsefile('test.xml');
print Dumper($attributes);
I think you might be better off starting with XML::Simple and playing around a little first:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
my $xml = XMLin(\*DATA);
print $xml->{TAG}->{string2}, "\n";
$xml->{TAG}->{string2} = "asdf";
print XMLout( $xml, RootName => 'PARENT');
__DATA__
<PARENT>
<TAG string1="asdf" string2="value of string 2">
</TAG>
</PARENT>
Thanks for your responses. I found another answer in "Config file processing with LibXML2" which I found very useful.

How do I convert Data::Dumper output back into a Perl data structure?

I was wondering if you could shed some light on code I've been working on for a couple of days.
I've been trying to convert a Perl-parsed hash back to XML using the XMLout() and XMLin() methods, and it has been quite successful with this format.
#!/usr/bin/perl -w
use strict;
# use module
use IO::File;
use XML::Simple;
use XML::Dumper;
use Data::Dumper;
my $dump = new XML::Dumper;
my ( $data, $VAR1 );
Topology:$VAR1 = {
'device' => {
'FOC1047Z2SZ' => {
'ChassisID' => '2009-09',
'Error' => undef,
'Group' => {
'ID' => 'A1',
'Type' => 'Base'
},
'Model' => 'CATALYST',
'Name' => 'CISCO-SW1',
'Neighbor' => {},
'ProbedIP' => 'TEST',
'isDerived' => 0
}
},
'issues' => [
'TEST'
]
};
# create object
my $xml = new XML::Simple (NoAttr=>1,
RootName=>'data',
SuppressEmpty => 'true');
# convert Perl array ref into XML document
$data = $xml->XMLout($VAR1);
#reads an XML file
my $X_out = $xml->XMLin($data);
# access XML data
print Dumper($data);
print "STATUS: $X_out->{issues}\n";
print "CHASSIS ID: $X_out->{device}{ChassisID}\n";
print "GROUP ID: $X_out->{device}{Group}{ID}\n";
print "DEVICE NAME: $X_out->{device}{Name}\n";
print "DEVICE NAME: $X_out->{device}{name}\n";
print "ERROR: $X_out->{device}{error}\n";
I can access all the element in the XML with no problem.
But when I store the parsed hash in a file, a problem arises: I can't access all the XML elements. I guess I wasn't able to unparse the file with the following code.
#!/usr/bin/perl -w
use strict;
#!/usr/bin/perl
# use module
use IO::File;
use XML::Simple;
use XML::Dumper;
use Data::Dumper;
my $dump = new XML::Dumper;
my ( $data, $VAR1, $line_Holder );
#this is the file that contains the parsed hash
my $saveOut = "C:/parsed_hash.txt";
my $result_Holder = IO::File->new($saveOut, 'r');
while ($line_Holder = $result_Holder->getline){
print $line_Holder;
}
# create object
my $xml = new XML::Simple (NoAttr=>1, RootName=>'data', SuppressEmpty => 'true');
# convert Perl array ref into XML document
$data = $xml->XMLout($line_Holder);
#reads an XML file
my $X_out = $xml->XMLin($data);
# access XML data
print Dumper($data);
print "STATUS: $X_out->{issues}\n";
print "CHASSIS ID: $X_out->{device}{ChassisID}\n";
print "GROUP ID: $X_out->{device}{Group}{ID}\n";
print "DEVICE NAME: $X_out->{device}{Name}\n";
print "DEVICE NAME: $X_out->{device}{name}\n";
print "ERROR: $X_out->{device}{error}\n";
Do you have any idea how I could access the $VAR1 inside the text file?
Regards,
newbee_me
$data = $xml->XMLout($line_Holder);
$line_Holder has only the last line of your file, not the whole file, and not the perl hashref that would result from evaling the file. Try something like this:
my $ref = do $saveOut;
The do function loads and evals a file for you. You may want to do it in separate steps, like:
use File::Slurp "read_file";
my $fileContents = read_file( $saveOut );
my $ref = eval( $fileContents );
You might want to look at the Data::Dump module as a replacement for Data::Dumper; its output is already ready to re-eval back.
Basically to load Dumper data you eval() it:
use strict;
use Data::Dumper;
my $x = {"a" => "b", "c"=>[1,2,3],};
my $q = Dumper($x);
$q =~ s{\A\$VAR\d+\s*=\s*}{};
my $w = eval $q;
print $w->{"a"}, "\n";
The regexp (s{\A\$VAR\d+\s*=\s*}{}) is used to remove $VAR1= from the beginning of string.
On the other hand, if you need a way to store a complex data structure and load it again, it's much better to use the Storable module and its store() and retrieve() functions.
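A minimal sketch of the Storable round trip, using a temporary file so it is self-contained. The helper name round_trip and the sample data are assumptions made for illustration:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# round_trip is a hypothetical helper name used only for this sketch.
# It writes a structure to disk with store() and reads it back.
sub round_trip {
    my ($data) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);   # removed at program exit
    close $fh;
    store($data, $path);        # serialize the structure to the file
    return retrieve($path);     # returns a reference to a fresh copy
}

my $copy = round_trip({ device => { Name => 'CISCO-SW1' }, issues => ['TEST'] });
print $copy->{device}{Name}, "\n";   # prints "CISCO-SW1"
```

Unlike eval-ing Dumper output, this never executes the file contents as Perl code, though Storable files should still only be read from trusted sources.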
This has worked for me for hashes of hashes. It may not work so well with structures that contain references to other structures, but it works well enough for simple structures such as arrays, hashes, or hashes of hashes.
open(DATA,">",$file);
print DATA Dumper(\%g_write_hash);
close(DATA);
my %g_read_hash = %{ do $file };
Consider using the Data::Dump module as a replacement for Data::Dumper.
You can configure the variable name used in Data::Dumper's output with $Data::Dumper::Varname.
Example
use Data::Dumper;
$Data::Dumper::Varname = "foo";
my $string = Dumper($object);
eval($string);
...will create the variable $foo1 (Data::Dumper appends a number to the name given in Varname), which should contain the same data as $object.
If your data structure is complicated and you have strange results, you may want to consider Storable's freeze() and thaw() methods.
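For the in-memory variant, a short sketch of freeze() and thaw(); the helper name freeze_thaw_copy is hypothetical. freeze() returns a binary string that can be kept in a variable, database column, or socket, and thaw() rebuilds the structure from it:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# freeze_thaw_copy is a hypothetical helper name used only for this sketch.
sub freeze_thaw_copy {
    my ($data) = @_;
    my $frozen = freeze($data);   # serialized form, held in memory
    return thaw($frozen);         # new reference with the same contents
}

my $object = { a => 'b', c => [1, 2, 3] };
my $clone  = freeze_thaw_copy($object);
print $clone->{c}[1], "\n";   # prints 2
```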