How to read value of a node in Perl libXML::Reader - perl

My XML looks like this-
<info>
<name>NameGoesHere</name>
<age>99</age>
</info>
Using XML::LibXML::Reader, I would like to print the age.
I read the documentation from here, and I am looking for an example. I am able to use $reader->nextElement('info') and then print innerXML but not able to figure out how do I print only age and not the whole innerxml.

Try localName:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML::Reader;
my $reader = XML::LibXML::Reader->new(IO => \*DATA)
or die "Cannot read from \\*DATA\n";
while ( $reader->read ) {
print $reader->readInnerXml if $reader->localName eq 'age';
}
$reader->finish;
__DATA__
<info>
<name>NameGoesHere</name>
<age>99</age>
</info>
Mostly from Regex: keep everything in <profession></profession> tags

use strict;
use warnings;
use feature qw( say );
use XML::LibXML::Reader qw(
XML_READER_TYPE_ELEMENT
XML_READER_TYPE_END_ELEMENT
XML_READER_TYPE_TEXT
);
my $reader = XML::LibXML::Reader->new(IO => \*DATA);
my $in_age = 0;
my $age;
while ($reader->read()) {
if ($reader->nodeType() == XML_READER_TYPE_ELEMENT && $reader->name() eq 'age') {
$age = '' if !$in_age;
++$in_age;
}
elsif ($reader->nodeType() == XML_READER_TYPE_END_ELEMENT && $reader->name() eq 'age') {
--$in_age;
say $age if !$in_age;
}
elsif ($in_age && $reader->nodeType() == XML_READER_TYPE_TEXT) {
$age .= $reader->value();
}
}
__DATA__
<info>
<name>NameGoesHere</name>
<age>99</age>
</info>

Related

Config::Simple, how to change default output header?

Testing the generation on the fly of config files in ini format with the package Config::Simple generates the desired file but always with the same header section, which includes the name of the Perl package at the beginning of the file. Does Config::Simple have a method to modify this default printing? I would like to replace this name with a new one. Is it possible with some feature in the package?
Here is a toy code:
#!/usr/bin/perl
use strict;
use warnings;
use Config::Simple;
my $cfg = new Config::Simple(
syntax => 'ini'
) or die Config::Simple->error();
$cfg->param("Program.mode", "ALL");
$cfg->param("Program.strategies", "1,2,3,4,5,6,7,8,9,10");
$cfg->param("Data.name_specie", "Homo sapiens");
$cfg->write("test.cfg") or die $cfg->error();
The output:
; Config::Simple 4.58
; Mon Mar 16 12:33:55 2020
[Program]
strategies=1,2,3,4,5,6,7,8,9,10
mode=ALL
[Data]
name_specie=Homo sapiens
Just wanted to replace the ; Config::Simple 4.58 line.
Thanks in advance for your suggestions.
"Simple" modules often aren't simple because they make so many decisions for you. If you don't like those decisions, you are stuck. This particular module hasn't been updated in over a decade and has several architectural issues. If you want INI files, use a different module, such as Config:: IniFiles, Config::Tiny, or Config::INI which are maintained and flexible.
My first thought was to subclass and override the method that adds the header since it was hard-baked into one of the methods. This is onerous because the module uses a combination of "private" subroutines and package variables. I tend to try this first because I don't disturb the original package:
use strict;
use warnings;
use v5.12;
use Config::Simple;
package Local::Config::Simple {
use parent qw(Config::Simple);
# generates a writable string
sub as_string {
my $self = shift;
my $syntax = $self->{_SYNTAX} or die "'_SYNTAX' is not defined";
my $sub_syntax = $self->{_SUB_SYNTAX} || '';
my $currtime = localtime;
my $STRING = undef;
if ( $syntax eq 'ini' ) {
while ( my ($block_name, $key_values) = each %{$self->{_DATA}} ) {
unless ( $sub_syntax eq 'simple-ini' ) {
$STRING .= sprintf("[%s]\n", $block_name);
}
while ( my ($key, $value) = each %{$key_values} ) {
my $values = join (Config::Simple::WRITE_DELIM(), map { Config::Simple::quote_values($_) } #$value);
$STRING .= sprintf("%s=%s\n", $key, $values );
}
$STRING .= "\n";
}
} elsif ( $syntax eq 'http' ) {
while ( my ($key, $value) = each %{$self->{_DATA}} ) {
my $values = join (Config::Simple::WRITE_DELIM(), map { Config::Simple::quote_values($_) } #$value);
$STRING .= sprintf("%s: %s\n", $key, $values);
}
} elsif ( $syntax eq 'simple' ) {
while ( my ($key, $value) = each %{$self->{_DATA}} ) {
my $values = join (Config::Simple::WRITE_DELIM(), map { Config::Simple::quote_values($_) } #$value);
$STRING .= sprintf("%s %s\n", $key, $values);
}
}
$STRING .= "\n";
return $STRING;
}
}
my $cfg = Local::Config::Simple->new(
syntax => 'ini'
) or die Config::Simple->error();
$cfg->param("Program.mode", "ALL");
$cfg->param("Program.strategies", "1,2,3,4,5,6,7,8,9,10");
$cfg->param("Data.name_specie", "Homo sapiens");
$cfg->write("file.ini") or die $cfg->error();
That works and gives the output:
[Data]
name_specie=Homo sapiens
[Program]
mode=ALL
strategies=1,2,3,4,5,6,7,8,9,10
However, it broke several OO ideas, so I find this approach unpleasant. I can do a little bit less work by fixing the original package by redefining the original subroutine. Then the package variables and subroutines still work. Load the original module first then add your redefinitions:
use strict;
use warnings;
use v5.12;
use Config::Simple;
no warnings 'redefine';
package Config::Simple {
# generates a writable string
sub as_string {
my $self = shift;
my $syntax = $self->{_SYNTAX} or die "'_SYNTAX' is not defined";
my $sub_syntax = $self->{_SUB_SYNTAX} || '';
my $currtime = localtime;
my $STRING = undef;
if ( $syntax eq 'ini' ) {
while ( my ($block_name, $key_values) = each %{$self->{_DATA}} ) {
unless ( $sub_syntax eq 'simple-ini' ) {
$STRING .= sprintf("[%s]\n", $block_name);
}
while ( my ($key, $value) = each %{$key_values} ) {
my $values = join (WRITE_DELIM, map { quote_values($_) } #$value);
$STRING .= sprintf("%s=%s\n", $key, $values );
}
$STRING .= "\n";
}
} elsif ( $syntax eq 'http' ) {
while ( my ($key, $value) = each %{$self->{_DATA}} ) {
my $values = join (WRITE_DELIM, map { quote_values($_) } #$value);
$STRING .= sprintf("%s: %s\n", $key, $values);
}
} elsif ( $syntax eq 'simple' ) {
while ( my ($key, $value) = each %{$self->{_DATA}} ) {
my $values = join (WRITE_DELIM, map { quote_values($_) } #$value);
$STRING .= sprintf("%s %s\n", $key, $values);
}
}
$STRING .= "\n";
return $STRING;
}
}
my $cfg = Config::Simple->new(
syntax => 'ini'
) or die Config::Simple->error();
$cfg->param("Program.mode", "ALL");
$cfg->param("Program.strategies", "1,2,3,4,5,6,7,8,9,10");
$cfg->param("Data.name_specie", "Homo sapiens");
$cfg->write("file.ini") or die $cfg->error();
I write quite a bit about this in Effective Perl Programming as a way to deal with legacy code.
As a side note, you asked if there was some method in the module. You could have simply looked at the source to see what was happening and what was available. You would have seen that the header was hard-coded into as_string.

How to distinguish between "0" and NULL in perl?

Here we are looking for the string "reftext" in the given file. The line next to this contains a string with 3 integers. So we are extracting them in #all_num. We are printing the value of #all_num[2] only if is not NULL. But the logic used here doesn't print #all_num[2] even if it has 0.
#!/usr/bin/perl
open( READFILE, "<myfile.txt" );
#list = <READFILE>;
$total_lines = scalar #list;
for ( $count = 0; $count < $total_lines; $count++ ) {
if (#list[ $count =~ /reftext/ )
{
#all_num = #list[ $count + 1 ] =~ /(\d+)/g;
if ( #all_num[2] != NULL ) {
print "#all_num[2]\n";
}
}
}
Hope this helps,
use strict;
use warnings;
my #fvals = (
[ i => undef ],
[ j => 0 ],
[ k => "" ],
);
for my $r (#fvals) {
my ($k, $v) = #$r;
if (!defined($v)) { print "$k is undef\n"; }
elsif (!length($v)) { print "$k is empty string\n"; }
# elsif (!$v) { print "$k is zero\n"; }
# recognizes zero value in "0.0" or "0E0" notation
elsif ($v == 0) { print "$k is zero\n"; }
}
output
i is undef
j is zero
k is empty string
Perl does not include a NULL, so the line
if(#all_num[2]!= NULL)
is nonsensical in Perl. (More accurately, it attempts to locate a sub named NULL and run it to get the value to compare against #all_num[2], but fails to do so because you (presumably) haven't defined such a sub.) Note that, if you had enabled use strict, this would cause a fatal error instead of pretending to work. This is one of the many reasons to always use strict.
Side note: When you pull a value out of an array, it's only a single value, so you should say $all_num[2] rather than #all_num[2] when referring to the third element of the array #all_num. (Yes, this is a little confusing to get used to. I hear that it's been changed in Perl 6, but I'm assuming you're using Perl 5 here.) Note that, if you had enabled use warnings, it would have told you that "Scalar value #all_num[2] better written as $all_num[2]". This is one of the many reasons to always use warnings.
If you want to test whether $all_num[2] contains a value, the proper way to express that in Perl is
if (defined $all_num[2])
This is how your program would look using best practices
You should
Always use strict and use warnings, and declare all your variables with my
Use the three-parameter form of open
Check that open calls succeeded, and include $! in the die string if not
Use a while loop to process a file one line at a time, in preference to reading the entire file into memory
#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<', 'myfile.txt' or die $!;
while ( <$fh> ) {
next unless /reftext/;
my $next_line = <$fh>;
my #all_num = $next_line =~ /\d+/g;
print "$all_num[2]\n" if defined $all_num[2];
}
Try this:
#!/usr/bin/perl
use warnings;
use strict;
open(READFILE, "<", "myfile.txt") or die $!;
my #list = <READFILE>;
my $total_lines = scalar #list;
close (READFILE);
for(my $count=0; $count<$total_lines; $count++)
{
if($list[$count] =~ /reftext/)
{
my #all_num = $list[$count+1] =~ /(\d+)/g;
if($all_num[2] ne '')
{
print "$all_num[2]\n";
}
}
}
To check a variable is null or not:
if ($str ne '')
{
print $str;
}
or better:
my ($str);
$str = "";
if (defined($str))
{
print "defined";
}
else
{
print "not defined";
}
If the other answers do not work, try treating the variable as a string:
if ( $all_num[2] == 'null' && length($all_num[2]) == 4 ){
# null
} else {
# not null
}
As with any code you write, be sure to test your code.

How to search a string in web page and print that full line in which search string is present?

I'm new to programming, learning perl as well.
Here's my question: How do I search a string in web page and print that full line in which search string is present?
Is it possible to find/hit directly that string and then print that full line in which search string is present? Do we need to use xpaths compulsory for this?
If it is just a very basic string you are looking for you can use LWP::Simple and a small regular expression like this:
use LWP::Simple;
my $doc = get('http://stackoverflow.com/q/11771655/479133') || die "GET failed";
foreach my $line (split("\n", $doc)) {
print $line and last if $line =~ m/Here's my query/;
}
There are countless modules available on CPAN to do such things. Have a look at Task::Kensho::WebCrawling if you need something "bigger".
LWP::UserAgent and HTML::Parser can be used:
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::Parser;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $response = $ua->get('http://search.cpan.org/');
if ( !$response->is_success ) {
print "No matches\n";
exit 1;
}
my $parser = HTML::Parser->new( 'text_h' => [ \&text_handler, 'dtext' ] );
$parser->parse( $response->decoded_content );
sub text_handler {
chomp( my $text = shift );
if ( $text =~ /language/i ) {
print "Matched: $text\n";
}
}

How to add one more node information to xml file

I written one script that create one xml file from multiple files,I written script like this.
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
use Carp;
use File::Find;
use File::Spec::Functions qw( canonpath );
use XML::LibXML::Reader;
use Digest::MD5 'md5';
if ( #ARGV == 0 ) {
push #ARGV, "c:/main/work";
warn "Using default path $ARGV[0]\n Usage: $0 path ...\n";
}
open( my $allxml, '>', "all_xml_contents.combined.xml" )
or die "can't open output xml file for writing: $!\n";
print $allxml '<?xml version="1.0" encoding="UTF-8"?>',
"\n<Shiporder xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n";
my %shipto_md5;
find(
sub {
return unless ( /(_stc\.xml)$/ and -f );
extract_information();
return;
},
#ARGV
);
print $allxml "</Shiporder>\n";
sub extract_information {
my $path = $_;
if ( my $reader = XML::LibXML::Reader->new( location => $path )) {
while ( $reader->nextElement( 'data' )) {
my $elem = $reader->readOuterXml();
my $md5 = md5( $elem );
print $allxml $reader->readOuterXml() unless ( $shipto_md5{$md5}++ );
}
}
return;
}
from above script I am extracting data node information from all xml files and stored in a new xml file . but I have one more node starts with "details", I need to extract that information and I need to add that information also to the file, I tried like this
$reader->nextElement( 'details' );
my $information = $reader->readOuterXml();
I added this in while loop but how can I assign or print this data into same file($all xml). Please help me with this problem.
After your suggestion I tried like this, It gives error
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
use Carp;
use File::Find;
use File::Spec::Functions qw( canonpath );
use XML::LibXML::Reader;
if ( #ARGV == 0 ) {
push #ARGV, "V:/main/work";
warn "Using default path $ARGV[0]\n Usage: $0 path ...\n";
}
my $libXML = new XML::LibXML;
my $outputDom = $libXML->parse_string('<?xml version="1.0" encoding="UTF-8"?
>','<Shiporder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">');
my $shiporder = $outputDom->documentElement;
find(
sub {
return unless ( /(_stc\.xml)$/ and -f );
extract_information();
return;
},
#ARGV
);
sub extract_information {
my $path = $_;
if(my #inputDom = XML::LibXML->load_xml(location => $path)){
$inputDom->findnodes('//data || //deatils');
foreach (#$inputDom) {
$shiporder->appendChild($_->parentNode->cloneNode(1));
}
$outputDom->toFile("allfiles.xml");
}
}
but it gives like " '\n\n:1: Parser error:Strat tag expected,'<' not found " Can you help me with script because I am very new to perl.
You would do a lot better if you used what XML::LibXML and related modules gives you, it is a very large and comprehensive module and allows you to do a lot in few lines.
You can use the parser to start a new dom document using parse_string, storing the root node using documentElement. From there, use parse_file to load up each of your input files, then findnodes on the input files to extract the nodes you want to clone. Then append a clone of your input nodes to the output document, and finally use the toFile method to write out your output.
Something like:
my $libXML = new XML::LibXML;
my $outputDom = $libXML->parse_string('<?xml version="1.0" encoding="UTF-8"?>',
'\n<Shiporder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n');
my $shiporder = $outputDom->documentElement;
...
my $inputDom = $libXML->parse_file(some_file_name);
$inputDom->findnodes('//data || //details'); # use a more suitable xpath
foreach (#$inputDom) {
$shipOrder->appendChild($_->parentNode->cloneNode(1)); # if you want parent too...
}
...
$outputDom->toFile(some_output_file);
}
You will have to allow for namespaces and whatnot, but this gives one approach to start with.

How to keep data marked as UTF-8 after parsing with HTML::Tree?

I wrote a script, where i slurp in UTF-8 encoded HTML-file and then parse it to tree with HTML::Tree. Problem is that after parsing the strings are not marked as UTF-8 anymore.
As _utf8_on() is not recommended way to set flag on, i am looking for proper way.
My simplified code-example:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use utf8::all;
use autodie;
use HTML::Tree;
use Encode qw/is_utf8/;
my $file = shift;
my $tree;
if ($file) {
my $content = slurp_in( 'file' => $file );
$tree = html_tree('content' => $content);
} else {
die "no file";
}
my $title = $tree->look_down(_tag => 'title');
$title = $title->as_HTML('');
if ( is_utf8( $title ) ) {
say "OK: $title";
} else {
say "NOT OK: $title";
}
## SUBS
##
sub slurp_in {
my %v = #_;
open(my $fh, "<:utf8", $v{file}) || die "no $v{file}: $!";
local $/;
my $content = (<$fh>);
close $fh;
if ($content) {
return $content;
} else {
die "no content in $v{file} !";
}
}
sub html_tree {
my %v = #_;
my $tree = HTML::Tree->new();
$tree->utf8_mode(1); ## wrong call here, no such method, but no warnings on it!
$tree->parse( $v{content} );
if ($tree) {
return $tree;
} else {
die "no tree here";
}
}
Your code is overcomplicated, and you employ utf8::all and decode manually and call that strange method all at once. Rhetorically asking, what do you expect to achieve that way? I do not have the patience to find out the details what goes wrong and where, especially since you did not post any input with which your program fails to do the expected, so I drastically reduce it to a much simpler one. This works:
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings FATAL => ':all';
use File::Slurp qw(read_file); # autodies on error
use HTML::Tree qw();
my $file = shift;
die 'no file' unless $file;
my $tree = HTML::Tree->new_from_content(
read_file($file, binmode => ':encoding(UTF-8)')
);
my $title = $tree->look_down(_tag => 'title');
$title->as_HTML(''); # returns a Perl string