I'm trying to parse some XML into Perl, but testing isn't yielding what I'd expect.
use XML::Simple;
use Data::Dumper;
$buffer = qq[<DeliveryReport><message id="msgID" sentdate="xxxxx" donedate="xxxxx" status="xxxxxx" gsmerror="0" /></DeliveryReport>];
$xml = XML::Simple->new( ForceArray => 1 );
$file = $xml->XMLin($buffer);    # XMLin croaks by itself if the XML is malformed
print Dumper($file);
$msgid = $file->{message}->{id};
$message_status = $file->{message}->{status};
print "OUTPUT: $msgid $message_status";
But the output is blank, and the Dumper output looks wrong with regard to the id attribute. I'm not sure why.
$VAR1 = {
'message' => {
'msgID' => {
'status' => 'xxxxxx',
'gsmerror' => '0',
'sentdate' => 'xxxxx',
'donedate' => 'xxxxx'
}
}
};
OUTPUT:
Here is the final code working correctly.
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
# KeyAttr => [] turns off folding on name/key/id; ForceArray => 1 keeps
# message as an array even when there is only one element.
my $xml = XML::Simple->new( KeyAttr => [], ForceArray => 1 );
my $file = $xml->XMLin('
<DeliveryReport>
<message id="msgID1" sentdate="xxxxx" donedate="xxxxx" status="xxxxxx" gsmerror="0" />
<message id="msgID2" sentdate="yyy" donedate="yyy" status="yyy" gsmerror="0" />
</DeliveryReport>
');    # XMLin croaks on malformed XML, so no "or die" is needed
print Dumper($file);
my $numOfMsgs = @{ $file->{message} };    # array in scalar context = count
print "<br /><br />I've received $numOfMsgs records<br />";
for my $msg ( @{ $file->{message} } ) {
    my $msgid          = $msg->{id};
    my $message_status = $msg->{status};
    print "message id: [$msgid]<br />";
    print "status id: [$message_status]<br />";
    print "<br />";
}
By default, XML::Simple folds arrays of nested elements into a hash keyed on whichever of these attributes is present: name, key, id.
Your XML contains an id attribute, which is why the message is keyed on its id value (msgID) instead of id appearing as an ordinary key. You can clear KeyAttr when you create the object (e.g. $xml = XML::Simple->new( KeyAttr => [] );) to override the default behavior.
Your output, with multiple message entries, would look like:
$VAR1 = {
'message' => [
{
'gsmerror' => '0',
'status' => 'xxxxxx',
'id' => 'msgID',
'donedate' => 'xxxxx',
'sentdate' => 'xxxxx'
},
{
'gsmerror' => '1',
'status' => 'yyyyyy',
'id' => 'msgID2',
'donedate' => 'yyyyy',
'sentdate' => 'yyyyy'
}
]
};
So you need to adjust your code slightly to account for $file->{message} being a reference to an array of message hashes. The structure is the same for a single message as long as you keep the ForceArray option, so the change works in both cases.
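A minimal sketch of that adjustment, assuming the structure shown in the Dumper output above (KeyAttr disabled, ForceArray on). The data is hard-coded so the example runs without XML::Simple; in real code $file would come from XMLin.

```perl
use strict;
use warnings;

# Hard-coded copy of what XMLin returns with KeyAttr => [] and ForceArray => 1
# (values taken from the Dumper output above, trimmed to a few fields).
my $file = {
    message => [
        { id => 'msgID',  status => 'xxxxxx', gsmerror => '0' },
        { id => 'msgID2', status => 'yyyyyy', gsmerror => '1' },
    ],
};

# $file->{message} is an array reference, so dereference it and loop.
my @seen;
for my $msg ( @{ $file->{message} } ) {
    print "message id: [$msg->{id}] status: [$msg->{status}]\n";
    push @seen, $msg->{id};
}

# An array in scalar context gives the number of messages.
my $count = scalar @{ $file->{message} };
print "I've received $count records\n";
```

Looping over the dereferenced array avoids the index bookkeeping of a C-style for loop and works the same whether there is one message or many.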
I have some data in an input file:
user date="" name="" id="small"
user date="" name="" id="sample test"
user date="" name="" id="big city"
I want to get only the ids from the file above.
Code:
use strict;
use warnings;
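my $input = "location/input.txt";    # "\i" in double quotes is not a path separator
open( my $fh, '<', $input ) or die "Cannot open $input: $!";
while ( my $str = <$fh> )
{
chomp $str;
my @arr = split( / /, $str );    # fields: user, date="", name="", id="..."
$arr[3] =~ s/id=//g;
$arr[3] =~ s/"//g;
print "$arr[3]\n";
}
close($fh);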
Output :
small
sample
big
Note: I'm not able to print the complete values, such as "sample test" and "big city".
Expected: I need to get the complete values "sample test" and "big city". Can anyone help me with this?
If you know the format will always have quotes after id, you can do:
use feature qw(say);
use strict;
use warnings;
open my $fh, "<", "location/input.txt" or die $!;
while (my $line = <$fh>) {
my ($id) = $line =~ /id="(.*?)"/;
say $id;
}
Breaking down that complicated line we have:
$line =~ /id="(.*?)"/: match id="..." and capture the shortest possible "...". If you use .* instead, the capture extends to the last " on the line, which might belong to another field. That isn't a problem for id, since it is the last field, but try it with date and you'll see.
my ($id) = ...: evaluate the match in list context, which returns the capture groups, and assign them pairwise to the list ($id). Concretely, this puts the captured value in $id.
say $id: prints $id with an automatic newline after it.
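To see the greedy versus non-greedy difference described above, here is a small sketch that applies both patterns to the date field of one of the sample lines.

```perl
use strict;
use warnings;

my $line = 'user date="" name="" id="small"';

# Non-greedy: stops at the first closing quote after date=".
my ($lazy) = $line =~ /date="(.*?)"/;

# Greedy: runs to the LAST quote on the line, swallowing the other fields.
my ($greedy) = $line =~ /date="(.*)"/;

print "non-greedy: [$lazy]\n";
print "greedy:     [$greedy]\n";
```

The non-greedy capture is empty (date=""), while the greedy one is `" name="" id="small`, which is exactly the "grab up to the last quote" problem.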
A nice module for handling quoted strings is Text::ParseWords. It is a core module too, making it even handier. You can use it here to easily split the string on whitespace, then parse the result into hash keys.
use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;
while (<DATA>) {
chomp;
my %data = map { my ($key, $data) = split /=/, $_, 2; ($key => $data); } quotewords('\s+', 0, $_);
print Dumper \%data;
}
__DATA__
user date="" name="" id="small"
user date="" name="" id="sample test"
user date="" name="" id="big city"
Output:
$VAR1 = {
'user' => undef,
'name' => '',
'date' => '',
'id' => 'small'
};
$VAR1 = {
'name' => '',
'date' => '',
'id' => 'sample test',
'user' => undef
};
$VAR1 = {
'id' => 'big city',
'date' => '',
'name' => '',
'user' => undef
};
A simpler version that extracts just the data of interest:
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
while(<DATA>) {
my %d = /(\w+)="(.*?)"/g;
say 'id: ' . $d{id};
say Dumper(\%d);
}
__DATA__
user date="" name="" id="small"
user date="" name="" id="sample test"
user date="" name="" id="big city"
Output
id: small
$VAR1 = {
'date' => '',
'id' => 'small',
'name' => ''
};
id: sample test
$VAR1 = {
'id' => 'sample test',
'date' => '',
'name' => ''
};
id: big city
$VAR1 = {
'name' => '',
'id' => 'big city',
'date' => ''
};
I have a query string like this:
id=60087888;jid=16471827;from=advance;action=apply
or it can be like this :
id=60087888&jid=16471827&from=advance&action=apply
From this I want to create a hash that maps each key (such as id) to its value.
I have done this
my %in;
my $buffer = 'resid=60087888;jobid=16471827;from=advance;action=apply';
my @pairs = split(/;/, $buffer);
foreach my $pair (@pairs){
my ($name, $value) = split(/=/, $pair);
$in{$name} = $value;
}
print %in;
But the separator in the query string can be either a semicolon or &. How can I handle both?
Don't try to solve it with new code; this is what CPAN modules are for. In this case, specifically URI::Query:
use URI::Query;
use Data::Dumper;
my $q = URI::Query->new( "resid=60087888;jobid=16471827;from=advance;action=apply" );
my %hash = $q->hash;
print Dumper( \%hash );
Gives
{ action => 'apply',
from => 'advance',
jobid => '16471827',
resid => '60087888' }
You already have an answer that works, but personally I might tackle it like this:
my %in = $buffer =~ m/(\w+)=(\w+)/g;
This uses a regular expression to match the word on each side of every equals sign.
Each match returns two captures (key and value), so the full list of matches is treated as a sequence of key/value pairs in the hash assignment.
Note that it assumes your keys and values contain no special characters and that there are no empty values. (Pairs with an empty value are skipped; use (\w*) for the value instead if you need them.)
You get:
$VAR1 = {
'from' => 'advance',
'jid' => '16471827',
'action' => 'apply',
'id' => '60087888'
};
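To check that the separator really doesn't matter, this sketch runs the same regex over both query strings from the question; any character outside \w simply acts as a separator.

```perl
use strict;
use warnings;

# Both separator styles from the question.
my @queries = (
    'id=60087888;jid=16471827;from=advance;action=apply',
    'id=60087888&jid=16471827&from=advance&action=apply',
);

my @results;
for my $buffer (@queries) {
    # Each match yields (key, value); the //g list becomes key/value pairs.
    my %in = $buffer =~ m/(\w+)=(\w+)/g;
    push @results, \%in;
    print join( ', ', map {"$_=$in{$_}"} sort keys %in ), "\n";
}
```

Both iterations print the same line, because the delimiters never take part in the match.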
Alternatively:
my %in = map { split /=/ } split ( /[^=\w]/, $buffer );
This splits on anything that isn't a word character or an equals sign to get a list of name=value strings, then splits each of those on = to build the same key/value pairs. Again, it makes certain assumptions about which characters are delimiters and which are not.
Check this answer:
my %in;
$buffer = 'resid=60087888;jobid=16471827;from=advance;action=apply';
@pairs = split(/[&,;]/, $buffer);
foreach $pair (@pairs){
($name, $value) = split(/=/, $pair);
$in{$name} = $value;
}
delete $in{resid};
print keys %in;
I know I'm late to the game, but....
#!/usr/bin/perl
use strict;
use CGI;
use Data::Dumper;
my $query = 'id=60087888&jid=16471827&from=advance&action=apply&blank=&not_blank=1';
my $cgi = CGI->new($query);
my %hash = $cgi->Vars();
print Dumper \%hash;
will produce:
$VAR1 = {
'not_blank' => '1',
'jid' => '16471827',
'from' => 'advance',
'blank' => '',
'action' => 'apply',
'id' => '60087888'
};
This has the added benefit of handling keys that have no value in the source string.
Some of the other examples will produce:
$VAR1 = {
'id' => '60087888',
'1' => undef,
'jid' => '16471827',
'from' => 'advance',
'blank' => 'not_blank',
'action' => 'apply'
};
which may not be desirable.
I would have used URI::Query, as in @LeoNerd's answer, but I couldn't install modules in my case and CGI.pm was handy.
Also, you could do:
my $buffer = 'id=60087888&jid=16471827&from=advance&action=apply';
my %hash = split(/&|=/, $buffer);
which gives:
$hash = {
'jid' => '16471827',
'from' => 'advance',
'action' => 'apply',
'id' => '60087888'
};
This is VERY fragile, so I wouldn't advocate using it.
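If neither URI::Query nor CGI.pm is an option, a less fragile hand-rolled version is still short. This is a minimal sketch, assuming the values are not URL-encoded (no %XX or + decoding is done); parse_query is a hypothetical helper name. It splits on either separator, then on the first = only, so an empty value such as blank= stays paired with its own key.

```perl
use strict;
use warnings;

# Hypothetical helper: split on ; or &, then on the FIRST = only.
# Assumption: no percent-encoding (%20, +) needs decoding.
sub parse_query {
    my ($query) = @_;
    my %h;
    for my $pair ( split /[;&]/, $query ) {
        next if $pair eq '';    # tolerate ";;" or a trailing "&"
        # A positive limit keeps trailing empty fields, so "blank=" -> ("blank", "").
        my ( $name, $value ) = split /=/, $pair, 2;
        $h{$name} = defined $value ? $value : '';    # bare "flag" -> ''
    }
    return %h;
}

my %hash = parse_query(
    'id=60087888&jid=16471827&from=advance&action=apply&blank=&not_blank=1');
print "$_ => '$hash{$_}'\n" for sort keys %hash;
```

Unlike the split(/&|=/) one-liner, a missing value cannot shift the remaining pairs out of alignment.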
my %book = (
'name' => 'abc',
'author' => 'monk',
'isbn' => '123-890',
'issn' => '#issn',
);
my %chapter = (
'title' => 'xyz',
'page' => '90',
);
How do I incorporate %book into %chapter by reference, so that $chapter{name} prints 'abc'?
You can copy the keys and values of %book into %chapter with a hash slice:
@chapter{keys %book} = values %book;
Or something like
%chapter = (%chapter, %book);
Now you can say $chapter{name}, but changes in %book are not reflected in %chapter.
You can include the %book via reference:
$chapter{book} = \%book;
Now you could say $chapter{book}{name}, and changes do get reflected.
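The difference between copying and referencing is easiest to see by changing %book afterwards. This sketch uses trimmed-down versions of the question's hashes; %copy and %linked are names chosen only for illustration so both variants can be compared side by side.

```perl
use strict;
use warnings;

my %book    = ( name => 'abc', author => 'monk' );
my %chapter = ( title => 'xyz', page => '90' );

# 1. Copy with a hash slice: a snapshot of %book's current values.
my %copy = %chapter;
@copy{ keys %book } = values %book;

# 2. Store a reference: reads go through to the live %book.
my %linked = %chapter;
$linked{book} = \%book;

$book{name} = 'changed';

print "copy:   $copy{name}\n";            # still the old value
print "linked: $linked{book}{name}\n";    # sees the change
```

The copy keeps 'abc', while the reference sees 'changed'.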
To have an interface that allows you to say $chapter{name} and that does reflect changes, some advanced techniques would have to be used (this is fairly trivial with tie magic), but don't go there unless you really have to.
You could write a subroutine to check a list of hashes for a key. This program demonstrates:
use strict;
use warnings;
my %book = (
name => 'abc',
author => 'monk',
isbn => '123-890',
issn => '#issn',
);
my %chapter = (
title => 'xyz',
page => '90',
);
for my $key (qw/ name title bogus / ) {
print '>> ', access_hash($key, \%book, \%chapter), "\n";
}
sub access_hash {
my $key = shift;
for my $hash (@_) {
return $hash->{$key} if exists $hash->{$key};
}
undef;
}
output
Use of uninitialized value in print at E:\Perl\source\ht.pl line 17.
>> abc
>> xyz
>>
I'm just learning how to use Perl hashes and ran into this message. I'm using XML::Simple to parse XML output and exists to check the hash keys.
Message:
Pseudo-hashes are deprecated at ./h2.pl line 53.
Argument "\x{2f}\x{70}..." isn't numeric in exists at ./h2.pl line 53.
Bad index while coercing array into hash at ./h2.pl line 53.
I had the script working earlier with one test directory and then executed the script on another directory for testing when I got this message. How do I resolve/workaround this?
Code that the error references:
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
#my $data = XMLin($xml);
my $data = XMLin($xml, ForceArray => [qw (file) ]);
my $size=0;
if (exists $data->{class}
and $data->{class}=~ /FileNotFound/) {
print "The directory: $Path does not exist\n";
exit;
} elsif (exists $data->{file}->{path}
and $data->{file}->{path} =~/test-out-00/) {
$size=$data->{file}->{size};
if ($size < 1024000) {
print "FILE SIZE:$size BYTES\n";
exit;
}
} else {
exit;
}
print Dumper( $data );
In the working test case, the data structure looks like this:
$VAR1 = {
'recursive' => 'no',
'version' => '0.20.202.1.1101050227',
'time' => '2011-09-30T02:49:39+0000',
'filter' => '.*',
'file' => {
'owner' => 'test_act',
'replication' => '3',
'blocksize' => '134217728',
'permission' => '-rw-------',
'path' => '/source/feeds/customer/test/test-out-00',
'modified' => '2011-09-30T02:48:41+0000',
'size' => '135860644',
'group' => '',
'accesstime' => '2011-09-30T02:48:41+0000'
},
'exclude' => ''
};
recursive:no
version:0.20.202.1.1101050227
time:2011-10-01T07:06:16+0000
filter:.*
file:HASH(0x84c83ec)
path:/source/feeds/customer/test
directory:HASH(0x84c75d8)
exclude:
Data structure when I get the error:
$VAR1 = {
'recursive' => 'no',
'version' => '0.20.202.1.1101050227',
'time' => '2011-10-03T04:49:36+0000',
'filter' => '.*',
'file' => [
{
'owner' => 'test_act',
'replication' => '3',
'blocksize' => '134217728',
'permission' => '-rw-------',
'path' => '/source/feeds/customer/test/20110531/test-out-00',
'modified' => '2011-10-03T04:47:46+0000',
'size' => '121406618',
'group' => 'feeds',
'accesstime' => '2011-10-03T04:47:46+0000'
},
Test XML file:
<?xml version="1.0" encoding="UTF-8"?><listing time="2011-10-03T04:49:36+0000" recursive="no" path="/source/feeds/customer/test/20110531" exclude="" filter=".*" version="0.20.202.1.1101050227"><directory path="/source/feeds/customer/test/20110531" modified="2011-10-03T04:48:19+0000" accesstime="1970-01-01T00:00:00+0000" permission="drwx------" owner="test_act" group="feeds"/><file path="/source/feeds/customer/test/20110531/test-out-00" modified="2011-10-03T04:47:46+0000" accesstime="2011-10-03T04:47:46+0000" size="121406618" replication="3" blocksize="134217728" permission="-rw-------" owner="test_act" group="feeds"/><file path="/source/feeds/customer/test/20110531/test-out-01" modified="2011-10-03T04:48:04+0000" accesstime="2011-10-03T04:48:04+0000" size="127528522" replication="3" blocksize="134217728" permission="-rw-------" owner="test_act" group="feeds"/><file path="/source/feeds/customer/test/20110531/test-out-02" modified="2011-10-03T04:48:19+0000" accesstime="2011-10-03T04:48:19+0000" size="125452919" replication="3" blocksize="134217728" permission="-rw-------" owner="test_act" group="feeds"/></listing>
The "Pseudo-hashes are deprecated" error means you're trying to access an array as a hash, which means that either $data->{file} or $data->{file}{path} is an arrayref.
You can check the data type by using print ref $data->{file}. The Data::Dumper module may also help you to see what is in your data structure (perhaps while setting $Data::Dumper::Maxdepth = N to limit the dump to N number of levels if the structure is big).
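If you don't want to rely on ForceArray, another option is to check the type with ref and normalize to a list yourself. This is a minimal sketch with hard-coded data in the two shapes shown in the question (one file as a hashref, several as an arrayref); file_list is a hypothetical helper name.

```perl
use strict;
use warnings;

# Two shapes XML::Simple can produce for <file> without ForceArray.
my $one = { file => { path => '/source/feeds/customer/test/test-out-00', size => 135860644 } };
my $many = { file => [
    { path => '/source/feeds/customer/test/20110531/test-out-00', size => 121406618 },
    { path => '/source/feeds/customer/test/20110531/test-out-01', size => 127528522 },
] };

# Return a list of file hashes regardless of which shape we got.
sub file_list {
    my ($data) = @_;
    my $f = $data->{file};
    return ()   unless defined $f;
    return @$f  if ref $f eq 'ARRAY';
    return ($f) if ref $f eq 'HASH';
    die "unexpected type for file: " . ref($f) . "\n";
}

for my $data ( $one, $many ) {
    print 'ref: ', ref $data->{file}, "\n";    # HASH, then ARRAY
    print "  $_->{path}\n" for file_list($data);
}
```

Calling code can then always loop over file_list($data) without caring how many file elements there were.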
UPDATE
Now that you are using ForceArray, $data->{file} should always be an arrayref, which may contain several file entries, each with its own path. Here is a modified segment of your code to handle that. Note that the logic of the if-then-exit conditions may have to change.
if (defined $data->{class} and $data->{class}=~ /FileNotFound/) {
print "The directory: $Path does not exist\n";
exit;
}
exit if ! defined $data->{file};
# filter the list for the first file entry named test-out-00
my ( $file ) = grep {
defined $_->{path} && $_->{path} =~ /test-out-00/
} @{ $data->{file} };
exit if ! defined $file;
$size = $file->{size};
if ($size < 1024000) {
print "FILE SIZE:$size BYTES\n";
exit;
}
When using XML::Simple, ForceArray is one of the most important options to understand, especially when your input has nested elements that can occur one or more times. For example:
use XML::Simple;
use Data::Dumper;
my @xml_snippets = (
'<opt> <name x="3" y="4">B</name> <name x="5" y="6">C</name> </opt>',
'<opt> <name x="1" y="2">A</name> </opt>',
);
for my $xs (@xml_snippets){
my $data = XMLin($xs, ForceArray => 0);
print Dumper($data);
}
Output:
$VAR1 = {
'name' => [ # Array ref because there are 2 <name> elements.
{
'y' => '4',
'content' => 'B',
'x' => '3'
},
{
'y' => '6',
'content' => 'C',
'x' => '5'
}
]
};
$VAR1 = {
'name' => { # No intermediate array ref.
'y' => '2',
'content' => 'A',
'x' => '1'
}
};
By activating the ForceArray option, you can direct XML::Simple to produce consistent data structures that always use the intermediate array reference, even when there is only 1 of a particular nested element. You can activate the option globally or for specific tags, as illustrated here:
my $data = XMLin($xs, ForceArray => 1 );                    # Globally.
my $data = XMLin($xs, ForceArray => [qw(name foo bar)]);    # Only for the listed tags.
First, I recommend that you use ForceArray => [qw( file )] as previously discussed. That causes file to always be an array, whether there is one file element or several. This is easier to handle than two possible formats.
As I previously indicated, the problem is that you made no provision for looping over multiple file elements. You said you wanted to exit if the file doesn't exist, so you want something like:
my $found;
for my $file (@{ $data->{file} }) {
if ($file->{path} =~ m{/test-out-00\z}) {
$found = $file;
last;
}
}
die("Test file not found\n") if !$found;
# ... do something with the file data in $found ...
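The same search can be written more compactly with first from List::Util (a core module), which stops at the first match just like last does in the loop. This is a swapped-in equivalent rather than a change in logic; the data is hard-coded for illustration.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hard-coded stand-in for $data as returned with ForceArray => [qw( file )].
my $data = { file => [
    { path => '/source/feeds/customer/test/20110531/test-out-00', size => 121406618 },
    { path => '/source/feeds/customer/test/20110531/test-out-01', size => 127528522 },
] };

# first() returns the first element for which the block is true, or undef.
my $found = first { $_->{path} =~ m{/test-out-00\z} } @{ $data->{file} };
die "Test file not found\n" if !$found;

print "size: $found->{size}\n";
```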
I'm using Data::Dumper to write the contents of a remotely hosted XML file to a local text file, and I get the following:
$VAR1 = {
'resource' => {
'005cd410-41d6-4e3a-a55f-c38732b73a24.xml' => {
'standard' => 'DITA',
'area' => 'holding',
'id' => 'Comp_UKCLRONLINE_UKCLR_2000UKCLR0278',
},
'003c2a5e-4af3-4e70-bf8b-382d0b4edda1.xml' => {
'standard' => 'DITA',
'area' => 'holding',
'id' => 'Comp_UKCLRONLINE_UKCLR_2000UKCLR0278',
},
etc. What I want is to work with just one key and value in each resource, i.e. pick out the id and build a URL from it.
I would normally run a regex over the file and pull out what I need, but I suspect there is an easier, proper way. I can't think of the right term to search for, so I haven't found it.
Here is the code I am using to write this output to a file:
#-----------------------------------------------
sub request_url {
#-----------------------------------------------
my $useragent = LWP::UserAgent->new;
my $request = HTTP::Request->new( GET => "http://digitalessence.net/resource.xml" );
$resource = $useragent->request( $request );
}
#-----------------------------------------------
sub file_write {
#-----------------------------------------------
open OUT, ">$OUT" or Log_message ("\n$DATE - $TIME - Could not create filelist.doc \t");
Log_message ("\n$DATE - $TIME - Opened the output file");
print OUT Dumper (XML::Simple->new()->XMLin( $resource->content ));
Log_message ("\n$DATE - $TIME - Written the output file");
}
thanks
I'm not really understanding your question, but I'm guessing you want to access some data from the hash.
You don't need a regex or other strange stuff; just `do` your data file and get the value from the hashref you get back.
A simple one-liner as an example (assuming your file is called `dumper.out`; the `./` matters because `do` searches @INC for a bare file name, and recent Perls no longer include `.` there):
perl -Mstrict -wE 'my $hashref = do "./dumper.out"; say $hashref->{resource}{"005cd410-41d6-4e3a-a55f-c38732b73a24.xml"}{id}'
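A self-contained version of that idea, as a sketch: it writes a Dumper dump to a temporary file (File::Temp stands in for the question's output file) and reads it back with do. Keep in mind that do executes the file as Perl code, so only use it on dumps you wrote yourself.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

# Write a small structure the same way the question does (print Dumper ...).
my $original = {
    resource => {
        '005cd410-41d6-4e3a-a55f-c38732b73a24.xml' => {
            standard => 'DITA',
            area     => 'holding',
            id       => 'Comp_UKCLRONLINE_UKCLR_2000UKCLR0278',
        },
    },
};
my ( $fh, $path ) = tempfile( UNLINK => 1 );    # full path in the temp dir
print {$fh} Dumper($original);
close $fh or die "close: $!";

# do FILE runs the dump ("$VAR1 = {...};") and returns its last value,
# i.e. the hash reference.
my $hashref = do $path;
die "could not load $path: ", ( $@ || $! ), "\n" unless ref $hashref;

my $id = $hashref->{resource}{'005cd410-41d6-4e3a-a55f-c38732b73a24.xml'}{id};
print "$id\n";
```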
Maybe you want to walk the data structure built by XML::Simple.
The resources are stored in a hash reference under the resource key of the $doc data structure, keyed by file name, so you can loop over its keys:
use XML::Simple;
use LWP;
use Data::Dumper;
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new( GET => "http://digitalessence.net/resource.xml" );
my $res = $ua->request( $req );
my $xs = XML::Simple->new();
my $doc = $xs->XMLin( $res->content );
printf "resources: %s\n", scalar keys %{ $doc->{ resource } };
foreach ( keys %{ $doc->{ resource } } ) {
printf "resource => %s, id => %s\n", $_, $doc->{ resource }->{ $_ }->{ id };
}
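The question also asks about building a URL from each id. A minimal sketch of that last step follows. The real URL pattern isn't given, so the base URL http://example.com/docs/ is purely hypothetical, and the data is a hard-coded subset of $doc so the example runs without fetching anything. The ids in the output contain only word characters, so no URL escaping is done here.

```perl
use strict;
use warnings;

# Hard-coded subset of $doc->{resource} (ids from the output above).
my $doc = {
    resource => {
        '005cd410-41d6-4e3a-a55f-c38732b73a24.xml' => { id => 'Comp_UKCLRONLINE_UKCLR_2000UKCLR0278' },
        '003c2a5e-4af3-4e70-bf8b-382d0b4edda1.xml' => { id => 'Comp_UKCLRONLINE_UKCLR_2002UKCLR0059' },
    },
};

# HYPOTHETICAL URL scheme: substitute whatever your site actually uses.
my $base = 'http://example.com/docs/';

my @urls;
for my $file ( sort keys %{ $doc->{resource} } ) {    # sorted for a stable order
    my $id = $doc->{resource}{$file}{id};
    next unless defined $id;
    push @urls, $base . $id;
}
print "$_\n" for @urls;
```

Sorting the keys makes the order repeatable, since Perl's hash order is not.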
The output is this:
resources: 7
resource => 005cd410-41d6-4e3a-a55f-c38732b73a24.xml, id => Comp_UKCLRONLINE_UKCLR_2000UKCLR0278
resource => 003c2a5e-4af3-4e70-bf8b-382d0b4edda1.xml, id => Comp_UKCLRONLINE_UKCLR_2002UKCLR0059
resource => 0033d4d3-c397-471f-8cf5-16fb588b0951.xml, id => Comp_UKCLRONLINE_UKCLR_navParentTopic_67
resource => 002a770a-db47-41ef-a8bb-0c8aa45a8de5.xml, id => Comp_UKCLRONLINE_UKCLR_navParentTopic_308
resource => 000fff79-45b8-4ac3-8a57-def971790f16.xml, id => Comp_UKCLRONLINE_UKCLR_2002UKCLR0502
resource => 00493372-c090-4734-9a50-8f5a06489591.xml, id => Comp_UKCLRONLINE_COMPCS_2010_10_0002
resource => 004377bf-8e24-4a69-9411-7c6baca80b87.xml, id => Comp_CLJONLINE_CLJ_2002_01_11