How to skip unwanted elements using XML::Twig?

How to skip unwanted elements using XML::Twig? - perl

Trying to learn XML::Twig and fetch some data from an XML document.
My XML contains 20k+ <ADN> elements. Eaach <ADN> element contains tens of child elements, one of them is the <GID>. I want process only those ADN where the GID == 1. (See the example XML is the __DATA__)
The docs says:
Handlers are triggered in fixed order, sorted by their type (xpath
expressions first, then regexps, then level), then by whether they
specify a full path (starting at the root element) or not, then by
number of steps in the expression , then number of predicates, then
number of tests in predicates. Handlers where the last step does not
specify a step (foo/bar/*) are triggered after other XPath handlers.
Finally all handlers are triggered last.
Important: once a handler has been triggered if it returns 0 then no
other handler is called, except a all handler which will be called
anyway.
My actual code:
use 5.014;
use warnings;
use XML::Twig;
use Data::Dumper;
my $cat = load_xml_catalog();
say Dumper $cat;
sub load_xml_catalog {
my $hr;
my $current;
my $twig= XML::Twig->new(
twig_roots => {
ADN => sub { # process the <ADN> elements
$_->purge; # and purge when finishes with one
},
},
twig_handlers => {
'ADN/GID' => sub {
return 1 if $_->trimmed_text == 1;
return 0; # skip the other handlers - if the GID != 1
},
'ADN/ID' => sub { #remember the ID as a "key" into the '$hr' for the "current" ADN
$current = $_->trimmed_text;
$hr->{$current}{$_->tag} = $_->trimmed_text;
},
#rules for the wanted data extracting & storing to $hr->{$current}
'ADN/Name' => sub {
$hr->{$current}{$_->tag} = $_->text;
},
},
);
$twig->parse(\*DATA);
return $hr;
}
__DATA__
<ArrayOfADN>
<ADN>
<GID>1</GID>
<ID>1</ID>
<Name>name 1</Name>
</ADN>
<ADN>
<GID>2</GID>
<ID>20</ID>
<Name>should be skipped because GID != 1</Name>
</ADN>
<ADN>
<GID>1</GID>
<ID>1000</ID>
<Name>other name 1000</Name>
</ADN>
</ArrayOfADN>
It outputs
$VAR1 = {
'1000' => {
'ID' => '1000',
'Name' => 'other name 1000'
},
'1' => {
'Name' => 'name 1',
'ID' => '1'
},
'20' => {
'Name' => 'should be skipped because GID != 1',
'ID' => '20'
}
};
So,
The handler for the ADN/GID returns 0 when the GID != 1.
Why the other handlers are still called?
The expected (wanted) output is without the '20' => ... .
How to skip the unwanted nodes correctly?

The "returns zero" thing is a bit of a red herring in this context. If you had multiple matches on your element, then one of them returning zero would inhibit the others.
That doesn't mean it won't still try and process subsequent nodes.
I think you're getting confused - you have handlers for separate subelements of your <ADN> elements - and they trigger separately. That's by design. There is a precedence order for xpath but only on duplicate matches. Yours are completely separate though, so they all 'fire' because they trigger on different elements.
However, you might find it useful to know - twig_handlers allows xpath expressions - so you can explicitly say:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->parse( \*DATA );
$twig -> set_pretty_print('indented_a');
foreach my $ADN ( $twig -> findnodes('//ADN/GID[string()="1"]/..') ) {
$ADN -> print;
}
This also works in the twig_handlers syntax. I would suggest doing a handler is only really useful if you need to pre-process your XML, or you're memory constrained. With 20,000 nodes, you may be. (at which point purge is your friend).
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new(
pretty_print => 'indented_a',
twig_handlers => {
'//ADN[string(GID)="1"]' => sub { $_->print }
}
);
$twig->parse( \*DATA );
__DATA__
<ArrayOfADN>
<ADN>
<GID>1</GID>
<ID>1</ID>
<Name>name 1</Name>
</ADN>
<ADN>
<GID>2</GID>
<ID>20</ID>
<Name>should be skipped because GID != 1</Name>
</ADN>
<ADN>
<GID>1</GID>
<ID>1000</ID>
<Name>other name 1000</Name>
</ADN>
</ArrayOfADN>
Although, I would probably just do it this way instead:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
sub process_ADN {
my ( $twig, $ADN ) = #_;
return unless $ADN -> first_child_text('GID') == 1;
print "ADN with name:", $ADN -> first_child_text('Name')," Found\n";
}
my $twig = XML::Twig->new(
pretty_print => 'indented_a',
twig_handlers => {
'ADN' => \&process_ADN
}
);
$twig->parse( \*DATA );
__DATA__
<ArrayOfADN>
<ADN>
<GID>1</GID>
<ID>1</ID>
<Name>name 1</Name>
</ADN>
<ADN>
<GID>2</GID>
<ID>20</ID>
<Name>should be skipped because GID != 1</Name>
</ADN>
<ADN>
<GID>1</GID>
<ID>1000</ID>
<Name>other name 1000</Name>
</ADN>
</ArrayOfADN>

Related

How to print an element from an array which is inside the hash in perl

I'm trying to print the outputs from an API which are in multidimensional format.
use strict;
use warnings;
use Data::Dumper;
my $content={
'school_set' => 'SSET1234',
'result' => [
{
'school_name' => 'school_abc',
'display_value' => 'IL25',
'school_link' => 'example.com',
'status' => 'registerd',
'status_message' => 'only arts',
'school_id' => '58c388d40596191f',
}
],
'school_table' => 'arts_schools'
};
print "school_name is=".$content{result}[0]{school_name};
print "school_status is=".$content{result}[3]{status};
output
Global symbol "%content" requires explicit package name (did you forget to declare "my %content"?) at test8.pl line 20.
Global symbol "%content" requires explicit package name (did you forget to declare "my %content"?) at test8.pl line 21.
I have to print the outputs like below from the result.
school_name = school_abc
school_status = registered

If $content is a hash reference, you need to dereference it first. Use the arrow operator for that:
$content->{result}[0]{school_name}
The syntax without the arrow is only possible for %content.
my %content = ( result => [ { school_name => 'abc' } ] );
print $content{result}[0]{school_name};
If you want to print all the results, you have to loop over the array somehow. For example
#!/usr/bin/perl
use warnings;
use strict;
my $content = {
'result' => [
{
'school_name' => 'school_abc',
'status' => 'registerd',
},
{
'school_name' => 'school_def',
'status' => 'pending',
}
],
};
for my $school (#{ $content->{result} }) {
print "school_name is $school->{school_name}, status is $school->{status}\n";
}

Your data structure assumes an array, perhaps it would be useful to utilize loop output for the data of interest.
The data presented as hash reference and will require de-referencing to loop through an array.
Following code snippet is based on your posted code and demonstrates how desired output can be achieved.
use strict;
use warnings;
use feature 'say';
my $dataset = {
'school_set' => 'SSET1234',
'result' => [
{
'school_name' => 'school_abc',
'display_value' => 'IL25',
'school_link' => 'example.com',
'status' => 'registerd',
'status_message' => 'only arts',
'school_id' => '58c388d40596191f',
}
],
'school_table' => 'arts_schools'
};
for my $item ( #{$dataset->{result}} ) {
say "school_name is = $item->{school_name}\n"
. "school_status is = $item->{status}";
}
exit 0;
Output
school_name is = school_abc
school_status is = registerd

Perl: overwrite structure member value

I am creating $input with this code:
push(#{$input->{$step}},$time);, then I save it in an xml file, and at the next compiling, I read it from that file. When i print it, i get the structure bellow.
if(-e $file)
my $input =XMLin($file...);
print Dumper $input;
and I get this structure
$VAR1 = {
'opt' => {
'step820' => '0',
'step190' => '0',
'step124' => '0',
}
};
for each step with it's time..
push(#{$input->{$step}},$time3);
XmlOut($file, $input);
If I run the program again, I get this structure:
$VAR1 = {
'opt' => {
'step820' => '0',
'step190' => '0',
'step124' => '0',
'opt' => {
'step820' => '0',
'step190' => '0',
'step124' => '0'
}
}
I just need to overwrite the values of steps(ex:$var1->opt->step820 = 2). How can i do that?

I just need to overwrite the values of steps(ex:$var1->opt->step820 = 2). How can i do that?
$input->{opt}->{step820} = 2;

I'm going to say what I always do, whenever someone posts something asking about XML::Simple - and that is that XML::Simple is deceitful - it isn't simple at all.
Why is XML::Simple "Discouraged"?
So - in your example:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $xml= XML::Twig->new->parsefile($file);
$xml -> get_xpath('./opt/step820',0)->set_text("2");
$xml -> print;
The problem is that XML::Simple is only any good for parsing the type of XML that you didn't really need XML for in the first place.
For more simple examples - have you considered using JSON for serialisation? As it more directly reflects the hash/array structure of native perl data types.
That way you can instead:
print {$output_fh} to_json ( $myconfig, {pretty=>1} );
And read it back in:
my $myconfig = from_json ( do { local $/; <$input_fh> });
Something like:
#!/usr/bin/env perl
use strict;
use warnings;
use JSON;
my $input;
my $time = 0;
foreach my $step ( qw ( step820 step190 step124 ) ) {
push(#{$input->{$step}},$time);
}
print to_json ( $input, {pretty=>1} );
Giving resultant JSON of:
{
"step190" : [
0
],
"step820" : [
0
],
"step124" : [
0
]
}
Although actually, I'd probably:
foreach my $step ( qw ( step820 step190 step124 ) ) {
$input->{$step} = $time;
}
print to_json ( $input, {pretty=>1} );
Which gives;
{
"step190" : 0,
"step124" : 0,
"step820" : 0
}
JSON uses very similar conventions to perl - in that {} denote key value pairs (hashes) and [] denote arrays.

Look at the RootName option of XMLout. By default, when "XMLout()" generates XML, the root element will be named 'opt'. This option allows you to specify an alternative name.
Specifying either undef or the empty string for the RootName option will produce XML with no root elements.

Returning a hash of the Parsed document (using Twig in Perl) to be used for processing in other subs

I am failing terribly to return a Hash of the Parsed XML document using twig - in order to use it in OTHER subs for performing several validation checks. The goal is to do abstraction and create re-usable blocks of code.
XML Block:
<?xml version="1.0" encoding="utf-8"?>
<Accounts locale="en_US">
<Account>
<Id>abcd</Id>
<OwnerLastName>asd</OwnerLastName>
<OwnerFirstName>zxc</OwnerFirstName>
<Locked>false</Locked>
<Database>mail</Database>
<Customer>mail</Customer>
<CreationDate year="2011" month="8" month-name="fevrier" day-of-month="19" hour-of-day="15" minute="23" day-name="dimanche"/>
<LastLoginDate year="2015" month="04" month-name="avril" day-of-month="22" hour-of-day="11" minute="13" day-name="macredi"/>
<LoginsCount>10405</LoginsCount>
<Locale>nl</Locale>
<Country>NL</Country>
<SubscriptionType>free</SubscriptionType>
<ActiveSubscriptionType>free</ActiveSubscriptionType>
<SubscriptionExpiration year="1980" month="1" month-name="janvier" day-of-month="1" hour-of-day="0" minute="0" day-name="jeudi"/>
<SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
<PaymentMode>Undefined</PaymentMode>
<Provision>0</Provision>
<InternalMail>asdf#asdf.com</InternalMail>
<ExternalMail>fdsa#zxczxc.com</ExternalMail>
<GroupMemberships>
<Group>werkgroep X.Y.Z.</Group>
</GroupMemberships>
<SynchroCount>6</SynchroCount>
<LastSynchroDate year="2003" month="12" month-name="decembre" day-of-month="5" hour-of-day="12" minute="48" day-name="mardi"/>
<HasActiveSync>false</HasActiveSync>
<Company/>
</Account>
<Account>
<Id>mnbv</Id>
<OwnerLastName>cvbb</OwnerLastName>
<OwnerFirstName>bvcc</OwnerFirstName>
<Locked>true</Locked>
<Database>mail</Database>
<Customer>mail</Customer>
<CreationDate year="2012" month="10" month-name="octobre" day-of-month="10" hour-of-day="10" minute="18" day-name="jeudi"/>
<LastLoginDate/>
<LoginsCount>0</LoginsCount>
<Locale>fr</Locale>
<Country>BE</Country>
<SubscriptionType>free</SubscriptionType>
<ActiveSubscriptionType>free</ActiveSubscriptionType>
<SubscriptionExpiration year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
<SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
<PaymentMode>Undefined</PaymentMode>
<Provision>0</Provision>
<InternalMail/>
<ExternalMail>qweqwe#qwe.com</ExternalMail>
<GroupMemberships/>
<SynchroCount>0</SynchroCount>
<LastSynchroDate year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
<HasActiveSync>false</HasActiveSync>
<Company/>
</Account>
</Accounts>
Perl Block:
my $file = shift || (print "NOTE: \tYou didn't provide the name of the file to be checked.\n" and exit);
my $twig = XML::Twig -> new ( twig_roots => { 'Account' => \& parsing } ); #'twig_roots' mode builds only the required sub-trees from the document while ignoring everything outside that twig.
$twig -> parsefile ($file);
sub parsing {
my ( $twig, $accounts ) = #_;
my %hash = #_;
my $ref = \%hash; #because was getting an error of Odd number of hash elements
return $ref;
$twig -> purge;
It gives a hash reference - which I'm unable to deference properly (even after doing thousands of attempts).
Again - just need a single clean function (sub) for doing the Parsing and returning the hash of all elements ('Accounts' in this case) - to be used in other other function (valid_sub) for performing the validation checks.
I'm literally stuck at this point - and will HIGHLY appreciate your HELP.

Such a hash is not created by Twig, you have to create it yourself.
Beware: Commands after return will never be reached.
#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;
use Data::Dumper;
my $twig = 'XML::Twig'->new(twig_roots => { Account => \&account });
$twig->parsefile(shift);
sub account {
my ($twig, $account) = #_;
my %hash;
for my $ch ($account->children) {
if (my $text = $ch->text) {
$hash{ $ch->name } = $text;
} else {
for my $attr (keys %{ $ch->atts }) {
$hash{ $ch->name }{$attr} = $ch->atts->{$attr};
}
}
}
print Dumper \%hash;
$twig -> purge;
validate(\%hash);
}
Handling of nested elements (e.g. GroupMemberships) left as an exercise to the reader.
And for validation:
sub validate {
my $account = shift;
if ('abcd' eq $account->{Id}) {
...
}
}

The problem with downconverting XML into hashes, is that XML is fundamentally a more complicated data structure. Each element has properties, children and content - and it's ordered - where hashes... don't.
So I would suggest that you not do what you're doing, and instead of passing a hash, use an XML::Twig::Elt and pass that into your validation.
Fortunately, this is exactly what XML::Twig passes to it's handlers:
## this is fine:
sub parsing {
my ( $twig, $accounts ) = #_;
but this is nonsense - think about what's in #_ at this point - it's references to XML::Twig objects - two of them, you've just assigned them.
my %hash = #_;
And this doesn't makes sense as a result
my $ref = \%hash; #because was getting an error of Odd number of hash elements
And where are you returning it to? (this is being called when XML::Twig is parsing)
return $ref;
#this doesn't happen, you've already returned
$twig -> purge;
But bear in mind - you're returning it to your twig proces that's parsing, that's ... discarding the return code. So that's not going to do anything anyway.
I would suggest instead you 'save' the $accounts reference and use that for your validation - just pass it into your subroutines to validate.
Or better yet, configure up a set of twig_handlers that do this for you:
my %validate = ( 'Account/Locked' => sub { die if $_ -> trimmed_text eq "true" },
'Account/CreationDate' => \&parsing,
'Account/ExternalMail' => sub { die unless $_ -> text =~ m/\w+\#\w+\.\w+ }
);
my $twig = XML::Twig -> new ( twig_roots => \%validate );
You can either die if you want to discard the whole lot, or use things like cut to remove an invalid entry from a document as you parse. (and maybe paste it into a seperate doc).
But if you really must turn your XML into a perl data structure - first read this for why it's a terrible idea:
Why is XML::Simple "Discouraged"?
And then, if you really want to carry on down that road, look at the simplify option of XML::Twig:
sub parsing {
my ( $twig, $accounts ) = #_;
my $horrible_hacky_hashref = $accounts->simplify(forcearray => 1, keyattr => [], forcecontent => 1 );
print Dumper \$horrible_hacky_hashref;
$twig -> purge;
#do something with it.
}
Edit:
To expand:
XML::Twig::Elt is a subset of XML::Twig - it's the 'building block' of an XML::Twig data structure - so in your example above, $accounts is.
sub parsing {
my ( $twig, $accounts ) = #_;
print Dumper $accounts;
}
You will get a lot of data if you do this, because you're dumping the whole data structure - which is effectively a daisy chain of XML::Twig::Elt objects.
$VAR1 = \bless( {
'parent' => bless( {
'first_child' => ${$VAR1},
'flushed' => 1,
'att' => {
'locale' => 'en_US'
},
'gi' => 6,
....
'att' => {},
'last_child' => ${$VAR1}->{'first_child'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'}->{'next_sibling'},
'gi' => 7
}, 'XML::Twig::Elt' );
But it already encapsulates the information you need, as well as the structure you require - that's why XML::Twig is using it. And is in no small part going to illustrate why forcing your data into a hash/array, you're going to lose data.

adding incremented id attribute to all elements

There is a XML-Twig example that shows how to add id attribute with incremented value to a specified element. Is there easy way to add incremented id to all elements.
#!/bin/perl -w
#########################################################################
# #
# This example adds an id to each player #
# It uses the set_id method, by default the id attribute will be 'id' #
# #
#########################################################################
use strict;
use XML::Twig;
my $id="player001";
my $twig= new XML::Twig( twig_handlers => { player => \&player } );
$twig->parsefile( "nba.xml"); # process the twig
$twig->flush;
exit;
sub player
{ my( $twig, $player)= #_;
$player->set_id( $id++);
$twig->flush;
}

I'm going to assume when you say "every element" you mean it. There's several ways you can do that via twig_handlers. There is the special handler _all_. Or since twig_handler keys are XPath expressions you can use *.
use strict;
use warnings;
use XML::Twig;
my $id="player001";
sub add_id {
my($twig, $element)= #_;
# Only set if not already set
$element->set_id($id++) unless defined $element->id;
$twig->flush;
}
my $twig= new XML::Twig(
twig_handlers => {
# Either one will work.
# '*' => \&add_id,
'_all_' => \&add_id,
},
pretty_print => 'indented',
);
$twig->parsefile(shift); # process the twig
$twig->flush;

troubleshooting "pseudo-hashes are deprecated" while using xml module

I am just learning how to use perl hashes and ran into this message in perl. I am using XML::Simple to parse xml output and using exists to check on the hash keys.
Message:
Pseudo-hashes are deprecated at ./h2.pl line 53.
Argument "\x{2f}\x{70}..." isn't numeric in exists at ./h2.pl line 53.
Bad index while coercing array into hash at ./h2.pl line 53.
I had the script working earlier with one test directory and then executed the script on another directory for testing when I got this message. How do I resolve/workaround this?
Code that the error references:
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
#my $data = XMLin($xml);
my $data = XMLin($xml, ForceArray => [qw (file) ]);
my $size=0;
if (exists $data->{class}
and $data->{class}=~ /FileNotFound/) {
print "The directory: $Path does not exist\n";
exit;
} elsif (exists $data->{file}->{path}
and $data->{file}->{path} =~/test-out-00/) {
$size=$data->{file}->{size};
if ($size < 1024000) {
print "FILE SIZE:$size BYTES\n";
exit;
}
} else {
exit;
}
print Dumper( $data );
Working test case, data structure looks like this:
$VAR1 = {
'recursive' => 'no',
'version' => '0.20.202.1.1101050227',
'time' => '2011-09-30T02:49:39+0000',
'filter' => '.*',
'file' => {
'owner' => 'test_act',
'replication' => '3',
'blocksize' => '134217728',
'permission' => '-rw-------',
'path' => '/source/feeds/customer/test/test-out-00',
'modified' => '2011-09-30T02:48:41+0000',
'size' => '135860644',
'group' => '',
'accesstime' => '2011-09-30T02:48:41+0000'
'modified' => '2011-09-30T02:48:41+0000'
},
'exclude' => ''
};
recursive:no
version:0.20.202.1.1101050227
time:2011-10-01T07:06:16+0000
filter:.*
file:HASH(0x84c83ec)
path:/source/feeds/customer/test
directory:HASH(0x84c75d8)
exclude:
Data structure with seeing error:
$VAR1 = {
'recursive' => 'no',
'version' => '0.20.202.1.1101050227',
'time' => '2011-10-03T04:49:36+0000',
'filter' => '.*',
'file' => [
{
'owner' => 'test_act',
'replication' => '3',
'blocksize' => '134217728',
'permission' => '-rw-------',
'path' => '/source/feeds/customer/test/20110531/test-out-00',
'modified' => '2011-10-03T04:47:46+0000',
'size' => '121406618',
'group' => 'feeds',
'accesstime' => '2011-10-03T04:47:46+0000'
},
Test xml file:
<?xml version="1.0" encoding="UTF-8"?><listing time="2011-10-03T04:49:36+0000" recursive="no" path="/source/feeds/customer/test/20110531" exclude="" filter=".*" version="0.20.202.1.1101050227"><directory path="/source/feeds/customer/test/20110531" modified="2011-10-03T04:48:19+0000" accesstime="1970-01-01T00:00:00+0000" permission="drwx------" owner="test_act" group="feeds"/><file path="/source/feeds/customer/test/20110531/test-out-00" modified="2011-10-03T04:47:46+0000" accesstime="2011-10-03T04:47:46+0000" size="121406618" replication="3" blocksize="134217728" permission="-rw-------" owner="test_act" group="feeds"/><file path="/source/feeds/customer/test/20110531/test-out-01" modified="2011-10-03T04:48:04+0000" accesstime="2011-10-03T04:48:04+0000" size="127528522" replication="3" blocksize="134217728" permission="-rw-------" owner="test_act" group="feeds"/><file path="/source/feeds/customer/test/20110531/test-out-02" modified="2011-10-03T04:48:19+0000" accesstime="2011-10-03T04:48:19+0000" size="125452919" replication="3" blocksize="134217728" permission="-rw-------" owner="test_act" group="feeds"/></listing>

The "Pseudo-hashes are deprecated" error means you're trying to access an array as a hash, which means that either $data->{file} or $data->{file}{path} is an arrayref.
You can check the data type by using print ref $data->{file}. The Data::Dumper module may also help you to see what is in your data structure (perhaps while setting $Data::Dumper::Maxdepth = N to limit the dump to N number of levels if the structure is big).
UPDATE
Now that you are using ForceArray, $data->{file} should always point to an arrayref, which may possibly have multiple references to path. Here is a modified segment of your code to handle that. But note that the logic of the if-then-exit conditions may have to change.
if (defined $data->{class} and $data->{class}=~ /FileNotFound/) {
print "The directory: $Path does not exist\n";
exit;
}
exit if ! defined $data->{file};
# filter the list for the first file entry named test-out-00
my ( $file ) = grep {
defined $_->{path} && $_->{path} =~ /test-out-00/
} #{ $data->{file} };
exit if ! defined $file;
$size = $file->{size};
if ($size < 1024000) {
print "FILE SIZE:$size BYTES\n";
exit;
}

When using XML::Simple, the ForceArray option is one of the most important to understand, especially in cases when your input data has nested elements that can occur 1 or more times. For example:
use XML::Simple;
use Data::Dumper;
my #xml_snippets = (
'<opt> <name x="3" y="4">B</name> <name x="5" y="6">C</name> </opt>',
'<opt> <name x="1" y="2">A</name> </opt>',
);
for my $xs (#xml_snippets){
my $data = XMLin($xs, ForceArray => 0);
print Dumper($data);
}
Output:
$VAR1 = {
'name' => [ # Array ref because there are 2 <name> elements.
{
'y' => '4',
'content' => 'B',
'x' => '3'
},
{
'y' => '6',
'content' => 'C',
'x' => '5'
}
]
};
$VAR1 = {
'name' => { # No intermediate array ref.
'y' => '2',
'content' => 'A',
'x' => '1'
}
};
By activating the ForceArray option, you can direct XML::Simple to produce consistent data structures that always use the intermediate array reference, even when there is only 1 of a particular nested element. You can activate the option globally or for specific tags, as illustrated here:
my $data = XMLin($xs, ForceArray => 1 ); # Globally.
my $data = XMLin($xs, ForceArray => [qw(name foo bar)]);

First, I recommend that you use ForceArray => [qw( file )] as previously discussed. That will cause an array to be returned for file, whether there's one or more file element. This is easier to handle than having two possible formats.
As I previously indicated, the problem is that you made no provision for looping over multiple file elements. You said you wanted to exit if the file doesn't exist, so that means you want
my $found;
for my $file (#{ $data->{file} }) {
if ($file->{path} =~ m{/test-out-00\z}) {
$found = $file;
last;
}
}
die("Test file not found\n") if !$found;
... do something with file data in $found ...

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

How to skip unwanted elements using XML::Twig? - perl

Related

How to print an element from an array which is inside the hash in perl

Perl: overwrite structure member value

Returning a hash of the Parsed document (using Twig in Perl) to be used for processing in other subs

adding incremented id attribute to all elements

troubleshooting "pseudo-hashes are deprecated" while using xml module

Categories

Resources