Parsing data elements and their attributes from a dataset into an array - Perl

I have a data set with different types of plans. The data below shows the activeplans and pastplans elements (other plan types are omitted from the example).
I think the data can be represented as follows:
$data->activeplans->activeplanloc[activeplan->{}, activeplan->{},
...activeplan->{}] $data->pastplans->pastplanloc[pastplan->{},
pastplan->{}, ...pastplan->{}]
Each plan element has a number of attributes, for instance id, lat, long and numpersons (other attributes are omitted from the example).
My goal is to loop through all the plan items and extract the attributes.
Also note that the outer ...planloc[] element, the lat/long fields it contains, and any empty ...plan[] can be ignored.
This is the loop I tried, but I'm stuck on extracting the activeplan elements. Can you help correct my syntax error? I don't know how to properly load the elements into an array given this data structure.
foreach my $planArrayItem (@{$data->{"activeplans"}->{"activeplanloc"}->{"activeplan"}{}}) {
#...
if (exists $planArrayItem->{numpersons}) {
$tmp .= "<li>Number of personal: $projArrayItem->{numpersons}</li>";
}
#...
}
Oh, and this is the data set.
{ 'updatetime' => '3/24/2021 11:44:19 AM', 'pastplans' =>
{ 'pastplanloc' => [ { 'longitude' => '-29.51502', 'latitude' =>
'32.307558', 'pastplan' => { 'planclass' => 'A', 'longitude' =>
'-29.51502', 'id' => '211', 'latitude' => '32.307558',
'numlocations' => '15' } }, { 'longitude' => '-28.798305',
'latitude' => '32.656135', 'pastplan' => [ { 'id' => '214',
'longitude' => '-28.798305', 'latitude' => '32.656135',
'planclass' => 'E', 'numlocations' => '16' }, { 'longitude' =>
'-28.798305', 'id' => '215', 'latitude' => '32.656135', 'planclass'
=> 'C', 'numlocations' => '21' } ] } ] }, 'activeplans' =>
{ 'activeplanloc' => [ { 'latitude' => '33.132491', 'activeplan'
=> [ { 'planclass' => 'B', 'longitude' => '-25.304968', 'id' =>
'942', 'latitude' => '33.132491', 'numpersons' => '17' },
{ 'numpersons' => '21', 'planclass' => 'G', 'id' => '943',
'longitude' => '-25.304968', 'latitude' => '33.132491' } ],
'longitude' => '-25.304968' }, { 'latitude' => '33.097290',
'activeplan' => { 'numpersons' => '31', 'id' => '944',
'longitude' => '-25.295086', 'latitude' => '33.097290',
'planclass' => 'M' }, 'longitude' => '-25.295086' } ] } };
This is the XML source, in case there is a better way to structure the data while reading it in:
<?xml version="1.0" encoding="utf-8"?>
<plans>
<updatetime>3/24/2021 11:44:19 AM</updatetime>
<pastplans>
<pastplanloc latitude="32.307558" longitude="-29.51502">
<pastplan planclass="A" id="211" numpersons="15" latitude="32.307558" longitude="-29.51502"/>
</pastplanloc>
<pastplanloc latitude="32.656135" longitude="-28.798305">
<pastplan planclass="E" id="214" numpersons="16" latitude="32.656135" longitude="-28.798305"/>
<pastplan planclass="C" id="215" numpersons="21" latitude="32.656135" longitude="-28.798305"/>
</pastplanloc>
</pastplans>
<activeplans>
<activeplanloc latitude="33.132491" longitude="-25.304968">
<activeplan planclass="B" id="942" numpersons="17" latitude="33.132491" longitude="-25.304968"/>
<activeplan planclass="G" id="943" numpersons="21" latitude="33.132491" longitude="-25.304968"/>
</activeplanloc>
<activeplanloc latitude="33.097290" longitude="-25.295086">
<activeplan planclass="M" id="944" numpersons="31" latitude="33.097290" longitude="-25.295086"/>
</activeplanloc>
</activeplans>
</plans>

I am quite certain that:
foreach my $planArrayItem (@{$data->{"activeplans"}->{"activeplanloc"}}) {
#...
if ($planArrayItem->{"activeplan"}{numpersons}) {
$tmp.= "<li>Number of personal: ".$planArrayItem->{"activeplan"}->{numpersons}."</li>";
}
}
is the code you are looking for. As you stated above, "activeplanloc" contains an array whose elements each reference an activeplan, so the outer loop has to iterate over that array.

With your new "data set", the correct code is:
foreach my $planArrayItem (@{$data->{"activeplans"}->{"activeplanloc"}}) {
#...
my $plans = ref $planArrayItem->{"activeplan"} eq "ARRAY" ?
$planArrayItem->{"activeplan"} : [$planArrayItem->{"activeplan"}];
foreach my $plan (@$plans) {
if ($plan->{numpersons}) {
$tmp .= "<li>Number of personal: ".$plan->{numpersons}."</li>";
}
}
}
See https://pastebin.com/QeD6ZwZz for a working example
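The same hash-or-array normalization can be pulled into a small helper and reused for both plan types. What follows is only a minimal sketch, assuming a trimmed copy of the data from the question; as_list and plan_ids are hypothetical names introduced here, not part of any module.

```perl
use strict;
use warnings;

# Trimmed copy of the question's data: 'activeplan' / 'pastplan' is a hash
# reference when a location holds one plan and an array reference otherwise.
my $data = {
    activeplans => { activeplanloc => [
        { activeplan => [ { id => '942', numpersons => '17' },
                          { id => '943', numpersons => '21' } ] },
        { activeplan => { id => '944', numpersons => '31' } },
    ] },
    pastplans => { pastplanloc => [
        { pastplan => { id => '211', numlocations => '15' } },
        { pastplan => [ { id => '214', numlocations => '16' },
                        { id => '215', numlocations => '21' } ] },
    ] },
};

# Hypothetical helper: return the plans as a flat list in either case.
sub as_list {
    my ($value) = @_;
    return ()      unless defined $value;
    return @$value if ref $value eq 'ARRAY';
    return ($value);
}

# Hypothetical helper: collect the ids for one plan type ('active' or 'past').
sub plan_ids {
    my ($data, $kind) = @_;
    my @ids;
    for my $loc ( @{ $data->{"${kind}plans"}{"${kind}planloc"} } ) {
        push @ids, map { $_->{id} } as_list( $loc->{"${kind}plan"} );
    }
    return @ids;
}

my @active_ids = plan_ids($data, 'active');
my @past_ids   = plan_ids($data, 'past');
print "active: @active_ids\n";
print "past: @past_ids\n";
```

If the structure comes from XML::Simple, its ForceArray option can force these elements to always be array references at parse time, which would make the ref check unnecessary.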

Related

Iterate an array reference and convert to hash in perl

I have a hash (printed by Dumper), shown below:
$VAR1 = {
'items' => [
{
'name' => 'test1',
'id' => '1',
'desc' => 'desc1',
},
{
'name' => 'test2',
'id' => '2',
'desc' => 'desc2',
}
],
};
I need to convert "items", which is an array reference, to a hash like the one below ('items' will be a hash of hashes, with the value of 'id' as the key):
$VAR1 = {
'items' => {
'1' =>{
'name' => 'test1',
'id' => '1',
'desc' => 'desc1',
},
'2' => {
'name' => 'test2',
'id' => '2',
'desc' => 'desc2',
}
}
};
Let's start with the code below. (Assume $data represents the original data and %newitems represents the modified items.)
my $data;
my $items = $data->{items};
my %newitems;
foreach my $element (@$items) {
......
}
This looks like an XY problem to me - I'm guessing you're trying to transform some XML, so I'd suggest you want to look upstream to solve this problem.
But on the off chance that you're not, then:
$data -> {items} = { map { $_ -> {id} => $_ } @{$data->{items} } };
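As a quick check of that one-liner, here is a self-contained sketch run against the sample data from the question. The extra parentheses inside the map block are my own addition; they only make it unambiguous to the parser that the braces are a code block.

```perl
use strict;
use warnings;

my $data = {
    items => [
        { name => 'test1', id => '1', desc => 'desc1' },
        { name => 'test2', id => '2', desc => 'desc2' },
    ],
};

# Turn the array of hashes into a hash of hashes keyed by each element's id.
# Each element is reused as-is (no copy); a repeated id would overwrite
# the earlier element.
$data->{items} = { map { ( $_->{id} => $_ ) } @{ $data->{items} } };

for my $id ( sort keys %{ $data->{items} } ) {
    print "$id => $data->{items}{$id}{name}\n";
}
```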

Perl printing of a hash gives ARRAY(xxxxxxx)

I know there are already many questions on this kind of subject, but as far as I know (I'm a Perl beginner, so I could be wrong) I'm not using an array, so I don't understand where this output comes from.
$VAR1 = {
'BridgeMode' => {
'Ten-GigabitEthernet1/0/5' => {
'Description' => 'poort1',
'Duplex' => 'F(a)',
'Interface' => 'Ten-GigabitEthernet1/0/5',
'Link' => 'UP',
'PVID' => '100',
'Speed' => '10G(a)',
'Type' => 'A',
'Vlan100' => {
'UntaggedPorts' => [
'Ten-GigabitEthernet1/0/5'
]
},
'vlanID' => [
'Vlan100'
]
},
Above is the content of my dumper and this is my print statement:
my $untaggedInterface = $data{BridgeMode}{"Ten-GigabitEthernet1/0/5"}{Vlan100}{UntaggedPorts} ;
print "Untagged: $untaggedInterface \n" ;
I would expect that the print statement would print "Ten-GigabitEthernet1/0/5" but instead it shows this:
Untagged: ARRAY(0x24a8ec0)
Edit: it is possible for an interface to have both a tagged and an untagged entry:
$VAR1 = {
'BridgeMode' => {
'Ten-GigabitEthernet1/0/12' => {
'Description' => 'poort5',
'Duplex' => 'F(a)',
'Interface' => 'Ten-GigabitEthernet1/0/12',
'Link' => 'UP',
'PVID' => '100',
'Speed' => '10G(a)',
'Type' => 'H',
'Vlan100' => {
'UntaggedPorts' => [
'Ten-GigabitEthernet1/0/12'
]
},
'Vlan107' => {
'TaggedPorts' => [
'Ten-GigabitEthernet1/0/12'
]
},
Edit: printing the content of the array
my @untaggedInterface = $data{BridgeMode}{"Ten-GigabitEthernet1/0/5"}{Vlan100}{UntaggedPorts} ;
print join(", ", @untaggedInterface) ;
still gives
ARRAY(0x1c03a68)
You would get the expected result if the value were a plain string instead of a reference to an array of strings, i.e.:
'UntaggedPorts' => 'Ten-GigabitEthernet1/0/12'
Otherwise, you must specify the index of the array element:
my $untaggedInterface = $data{BridgeMode}{"Ten-GigabitEthernet1/0/5"}{Vlan100}{UntaggedPorts}[0];
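If the array can hold more than one port, the whole array reference can be dereferenced instead of picking index 0. The following is a minimal sketch, assuming a cut-down copy of the %data hash from the question:

```perl
use strict;
use warnings;

# Cut-down copy of the structure shown in the Dumper output.
my %data = (
    BridgeMode => {
        'Ten-GigabitEthernet1/0/5' => {
            Vlan100 => { UntaggedPorts => ['Ten-GigabitEthernet1/0/5'] },
        },
    },
);

# The hash value is a reference to an array, which is why printing it
# directly shows ARRAY(0x...).
my $ports_ref = $data{BridgeMode}{'Ten-GigabitEthernet1/0/5'}{Vlan100}{UntaggedPorts};

# @{ ... } dereferences the reference into the list of elements it points to.
my @untagged = @{ $ports_ref };
my $line = 'Untagged: ' . join(', ', @untagged);
print "$line\n";
```

The second attempt in the question printed ARRAY(...) again because assigning the reference to an array only creates a one-element array that contains the reference.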

Accessing a nested data structure

I have an array of hashes nested to multiple levels. I need to extract a value from all deeply-nested hashes that have a given value for a different key in the same hash.
This is a collection of entities from our database, and the data represents contacts within each entity and all of their contact values.
There is a hash key contact_method_type_id which refers to an integer defining the type of contact method. The contact_method_type_id that I care about is 1, which is email.
The first contact has three different contact_methods. The first is 4, which is a cell phone; the second is 2, which is an office phone; and the third is 1, which is email.
Within the same hash there is a 'contact_method_value', which is the string representation of their email address.
I need a way to extract just these values into a new array.
Here are the contents of the first element of the array:
$VAR1 = [
{ 'total' => '2',
'results' => [
{ 'contact_type_name' => 'Primary Technical Contact',
'street' => undef,
'state_id' => undef,
'state_name' => undef,
'last_name' => 'Barb',
'entities' => [
{ 'entity_name' => 'XXXXX',
'entity_id' => 'XXXXX'
}
],
'state_abbr_name' => undef,
'city' => undef,
'country_id' => undef,
'latitude' => undef,
'contact_id' => 'XXXXXX',
'contact_type_id' => '1',
'roles' => [],
'contact_methods' => [
{ 'entity_name' => undef,
'contact_method_value' => 'XXXXXXX',
'contact_method_type_id' => '4',
'contact_method_id' => '24041',
'entity_id' => undef,
'contact_method_type_name' => 'Cell Phone'
},
{ 'entity_name' => undef,
'contact_method_value' => 'XXXXXX',
'contact_method_type_id' => '2',
'contact_method_id' => '24051',
'entity_id' => undef,
'contact_method_type_name' => 'Office Phone'
},
{ 'entity_name' => undef,
'contact_method_value' => 'example@example.com',
'contact_method_type_id' => '1',
'contact_method_id' => '24061',
'entity_id' => undef,
'contact_method_type_name' => 'Email'
}
],
'country_name' => undef,
'longitude' => undef,
'country_abbr_name' => undef,
'full_name' => 'NAME',
'networks' => [
{ 'network_name' => 'NET',
'network_id' => 'X'
}
],
'timezone_id' => undef,
'zip' => undef,
'timezone_name' => undef,
'title' => 'MAC/Network Specialist',
'first_name' => 'Terri'
},
{ 'contact_type_name' => 'Primary Technical Contact',
'street' => 'STREET',
'state_id' => undef,
'state_name' => undef,
'last_name' => 'NAME',
'entities' => [
{ 'entity_name' => 'NAME',
'entity_id' => '2679'
}
],
'state_abbr_name' => undef,
'city' => 'CITY',
'country_id' => undef,
'latitude' => undef,
'contact_id' => '7896',
'contact_type_id' => '1',
'roles' => [],
'contact_methods' => [
{ 'entity_name' => undef,
'contact_method_value' => 'example@example.com',
'contact_method_type_id' => '1',
'contact_method_id' => '16796',
'entity_id' => undef,
'contact_method_type_name' => 'Email'
},
{ 'entity_name' => undef,
'contact_method_value' => 'number',
'contact_method_type_id' => '2',
'contact_method_id' => '16797',
'entity_id' => undef,
'contact_method_type_name' => 'Office Phone'
}
],
'country_name' => undef,
'longitude' => undef,
'country_abbr_name' => undef,
'full_name' => 'NAME',
'networks' => [
{ 'network_name' => 'net',
'network_id' => '17'
}
],
'timezone_id' => undef,
'zip' => 'zip',
'timezone_name' => undef,
'title' => 'Infrastructure Manager',
'first_name' => 'name'
}
],
'offset' => '0'
},
...
This looks suspiciously like something that XML::Simple would have generated.
Assuming this is the case, then I would suggest that you've fallen for the classic mistake of assuming XML::Simple actually helps.
Under that assumption, if you instead use XML::Twig:
Taking your $VAR1 (although ideally you would just parse the original source with parse or parsefile):
use XML::Twig;
use XML::Simple;
my $twig = XML::Twig->parse( XMLout($VAR1) );
print $_->att('contact_method_value'), "\n" for $twig->findnodes('//contact_methods[@contact_method_type_name="Email"]');
Which, given your sample (as $VAR1), prints:
example@example.com
example@example.com
Edit: Since you've commented that it's JSON, I wouldn't necessarily do this (although it does actually work, despite that).
If the data structures are all of the same kind, this is very trivial. You just need to iterate all the outer hashrefs (I called those resultsets). Inside those, you need to look at all results, and in each result you need to look at all the contact methods. If one of them has a contact_method_type_id of 1, you take the contact_method_value. And that's it.
my @email_addresses;
foreach my $resultset ( @{$data} ) {
foreach my $result ( @{ $resultset->{results} } ) {
foreach my $contact ( @{ $result->{contact_methods} } ) {
push @email_addresses , [ $contact->{contact_method_value} ]
if $contact->{contact_method_type_id} == 1;
}
}
}
This code assumes your structure is called $data. @email_addresses looks like this when output.
[
[ 'EMAIL' ],
[ 'EMAIL' ]
];
If you have this in a database, then you should use an SQL query to retrieve it, rather than fetching everything into memory and processing what you have.
The output from Data::Dumper shows the contents of your data, but it doesn't explain what you're dealing with in your code. Specifically, you don't have a $VAR1 in your code, but I have no idea what you do have.
In the end, I think I wouldn't start from here. But since it's the only starting point I have to work with, it's a simple matter of recursing through the data structure.
I've assumed that you want
$VAR1->[*]{results}[*]{contact_methods}[*]{contact_method_value}
where
$VAR1->[*]{results}[*]{contact_methods}[*]{contact_method_type_name} eq 'Email'
Update
Since your comments I've altered my code to select the same values where
$VAR1->[*]{results}[*]{contact_methods}[*]{contact_method_type_id} == 1
Since you said nothing about your code at all, I've had to assume a variable $data which contains a reference to the array that you show in your question
for my $item ( @$data ) {
my $results = $item->{results};
for my $result ( @$results ) {
my $methods = $result->{contact_methods} or die;
for my $method ( @$methods ) {
#my $type_name = $method->{contact_method_type_name};
#next unless $type_name eq 'Email';
my $type_id = $method->{contact_method_type_id};
next unless $type_id == 1; ## Email
my $value = $method->{contact_method_value};
print "$value\n";
}
}
}
output
example@example.com
example@example.com
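The nested loops above can also be written as a chain of map and grep that flattens one level at a time. This is only a sketch over a cut-down stand-in for the question's structure; the phone numbers and addresses are placeholder values, and the `|| []` fallbacks are an assumption that some entries might lack a key.

```perl
use strict;
use warnings;

# Cut-down stand-in for the structure in the question (placeholder values).
my $data = [
    { results => [
        { contact_methods => [
            { contact_method_type_id => '4', contact_method_value => '555-0100' },
            { contact_method_type_id => '1', contact_method_value => 'first@example.com' },
        ] },
        { contact_methods => [
            { contact_method_type_id => '1', contact_method_value => 'second@example.com' },
            { contact_method_type_id => '2', contact_method_value => '555-0101' },
        ] },
    ] },
];

# Read bottom-up: flatten result sets into results, results into contact
# methods, keep only type 1 (email), then take the value of each.
my @emails =
    map  { $_->{contact_method_value} }
    grep { $_->{contact_method_type_id} == 1 }
    map  { @{ $_->{contact_methods} || [] } }
    map  { @{ $_->{results} || [] } }
    @$data;

print "$_\n" for @emails;
```

Unlike the first loop version, this yields a flat list of strings rather than a list of one-element array references.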

Elasticsearch (search_context_missing_exception) with Search::Elasticsearch::Scroll

I'm using Search::Elasticsearch and Search::Elasticsearch::Scroll to search and scroll through results on my Elasticsearch server.
While scrolling, for some queries, I see the following errors:
2016/03/22 11:03:38 - 265885 FATAL: [Daemon.pm][8221]: Something gone wrong, error $VAR1 = bless( {
'msg' => '[Missing] ** [http://localhost:9200]-[404] Not Found, called from sub Search::Elasticsearch::Scroll::next at searcher.pl line 92. With vars: {\'body\' => {\'hits\' => {\'hits\' => [],\'max_score\' => \'0\',\'total\' => 5215},\'timed_out\' => bless( do{\\(my $o = 0)}, \'JSON::XS::Boolean\' ),\'_shards\' => {\'failures\' => [{\'index\' => undef,\'reason\' => {\'reason\' => \'No search context found for id [4920053]\',\'type\' => \'search_context_missing_exception\'},\'shard\' => -1},{\'index\' => undef,\'reason\' => {\'reason\' => \'No search context found for id [5051485]\',\'type\' => \'search_context_missing_exception\'},\'shard\' => -1},{\'index\' => undef,\'reason\' => {\'reason\' => \'No search context found for id [4920059]\',\'type\' => \'search_context_missing_exception\'},\'shard\' => -1},{\'index\' => undef,\'reason\' => {\'reason\' => \'No search context found for id [5051496]\',\'type\' => \'search_context_missing_exception\'},\'shard\' => -1},{\'index\' => undef,\'reason\' => {\'reason\' => \'No search context found for id [5051500]\',\'type\' => \'search_context_missing_exception\'},\'shard\' => -1}],\'failed\' => 5,\'successful\' => 0,\'total\' => 5},\'_scroll_id\' => \'c2NhbjswOzE7dG90YWxfaGl0czo1MjE1Ow==\',\'took\' => 2},\'request\' => {\'serialize\' => \'std\',\'path\' => \'/_search/scroll\',\'ignore\' => [],\'mime_type\' => \'application/json\',\'body\' => \'c2Nhbjs1OzQ5MjAwNTM6bHExbENzRDVReEc0OV9UMUgzd3Vkdzs1MDUxNDg1OnJrQ3lsUkRKVHRxRWRWeURoOTB4WVE7NDkyMDA1OTpscTFsQ3NENVF4RzQ5X1QxSDN3dWR3OzUwNTE0OTY6cmtDeWxSREpUdHFFZFZ5RGg5MHhZUTs1MDUxNTAwOnJrQ3lsUkRKVHRxRWRWeURoOTB4WVE7MTt0b3RhbF9oaXRzOjUyMTU7\',\'qs\' => {\'scroll\' => \'1m\'},\'method\' => \'GET\'},\'status_code\' => 404}
',
'stack' => [
[
'searcher.pl',
92,
'Search::Elasticsearch::Scroll::next'
]
],
'text' => '[http://localhost:9200]-[404] Not Found',
'vars' => {
'body' => {
'hits' => {
'hits' => [],
'max_score' => '0',
'total' => 5215
},
'timed_out' => bless( do{\(my $o = 0)}, 'JSON::XS::Boolean' ),
'_shards' => {
'failures' => [
{
'index' => undef,
'reason' => {
'reason' => 'No search context found for id [4920053]',
'type' => 'search_context_missing_exception'
},
'shard' => -1
},
{
'index' => undef,
'reason' => {
'reason' => 'No search context found for id [5051485]',
'type' => 'search_context_missing_exception'
},
'shard' => -1
},
{
'index' => undef,
'reason' => {
'reason' => 'No search context found for id [4920059]',
'type' => 'search_context_missing_exception'
},
'shard' => -1
},
{
'index' => undef,
'reason' => {
'reason' => 'No search context found for id [5051496]',
'type' => 'search_context_missing_exception'
},
'shard' => -1
},
{
'index' => undef,
'reason' => {
'reason' => 'No search context found for id [5051500]',
'type' => 'search_context_missing_exception'
},
'shard' => -1
}
],
'failed' => 5,
'successful' => 0,
'total' => 5
},
'_scroll_id' => 'c2NhbjswOzE7dG90YWxfaGl0czo1MjE1Ow==',
'took' => 2
},
'request' => {
'serialize' => 'std',
'path' => '/_search/scroll',
'ignore' => [],
'mime_type' => 'application/json',
'body' => 'c2Nhbjs1OzQ5MjAwNTM6bHExbENzRDVReEc0OV9UMUgzd3Vkdzs1MDUxNDg1OnJrQ3lsUkRKVHRxRWRWeURoOTB4WVE7NDkyMDA1OTpscTFsQ3NENVF4RzQ5X1QxSDN3dWR3OzUwNTE0OTY6cmtDeWxSREpUdHFFZFZ5RGg5MHhZUTs1MDUxNTAwOnJrQ3lsUkRKVHRxRWRWeURoOTB4WVE7MTt0b3RhbF9oaXRzOjUyMTU7',
'qs' => {
'scroll' => '1m'
},
'method' => 'GET'
},
'status_code' => 404
},
'type' => 'Missing'
}, 'Search::Elasticsearch::Error::Missing' );
The code I'm using (simplified) is the following:
# Retrieve scroll
my $scroll = $self->getScrollBySignature($item);
# Retrieve all affected documents ids
while (my @docs = $scroll->next(500)) {
# Do stuff with @docs
}
The function getScrollBySignature uses the following code to call Elasticsearch:
my $scroll = $self->{ELASTIC}->scroll_helper(
index => $self->{INDEXES},
search_type => 'scan',
ignore_unavailable => 1,
body => {
size => $self->{PAGINATION},
query => {
filtered => {
filter => {
bool => {
must => [{term => {signature_id => $item->{profileId}}}, {terms => {channel_type_id => $type}}]
}
}
}
}
}
);
As you can see, I'm creating the scroll without passing the scroll parameter, so, as the documentation says, the scroll stays alive for 1 minute.
Elasticsearch runs as a cluster of 3 servers, and the query that ends with that error retrieves a bit more than 5000 docs.
My first solution was to raise the scroll lifetime to 5 minutes, and the error no longer appeared.
The question is: as I understand it, every time I call $scroll->next() the lifetime of the scroll is extended by another 1m, so how is it possible to receive these context-related errors?
Am I doing something wrong?
Thank you all.
The first thing that comes to mind is that the timer is not updated. Have you checked this? You can do a query every 10 seconds for example and see if at the 6th query it gives you the error ...
Well, a good rule of thumb is that inside a ->next() loop, a single iteration should not take longer than the lifetime you configured for the scroll.
Between consecutive calls to ->next() you cannot spend more than that configured time. If you do, the scroll may no longer exist and the search_context_missing_exception error will appear.
My solution to this problem was to only store the data in an array/hash structure inside the next loop, and to work with all the data once the scroll process had ended.
Applied to the example in the question:
# Retrieve scroll
my $scroll = $self->getScrollBySignature($item);
# Retrieve all affected documents ids
my @allDocs;
while (my @docs = $scroll->next(500)) {
push @allDocs, map {$_->{_id}} @docs;
}
foreach (@allDocs) {
# Do stuff with doc
}

How do I access values in the data structure returned by XML::Simple?

Hi everyone,
This is very simple for Perl programmers, but not for beginners like me.
I have an XML file which I processed using XML::Simple like this:
my $file="service.xml";
my $xml = new XML::Simple;
my $data = $xml->XMLin("$file", ForceArray => ['Service','SystemReaction',
'Customers', 'Suppliers','SW','HW'],);
Dumping out $data, it looks like this:
$data = {
'Service' => [{
'Suppliers' => [{
'SW' => [
{'Path' => '/work/service.xml', 'Service' => 'b7a'},
{'Path' => '/work/service1.xml', 'Service' => 'b7b'},
{'Path' => '/work/service2.xml', 'Service' => 'b5'}]}
],
'Id' => 'SKRM',
'Customers' =>
[{'SW' => [{'Path' => '/work/service.xml', 'Service' => 'ASOC'}]}],
'Des' => 'Control the current through the pipe',
'Name' => ' Control unit'
},
{
'Suppliers' => [{
'HW' => [{
'Type' => 'W',
'Path' => '/work/hardware.xml',
'Nr' => '18',
'Service' => '1'
},
{
'Type' => 'B',
'Path' => '/work/hardware.xml',
'Nr' => '7',
'Service' => '1'
},
{
'Type' => 'k',
'Path' => '/work/hardware.xml',
'Nr' => '1',
'Service' => '1'
}]}
],
'Id' => 'ADTM',
'Customers' =>
[{'SW' => [{'Path' => '/work/service.xml', 'Service' => 'SDCR'}]}],
'Des' => 'It delivers actual motor speed',
'Name' => ' Motor Drivers and Diognostics'
},
# etc.
],
'Systemreaction' => [
# etc.
],
};
How do I access each element in Service and Systemreaction (not provided)? I am using $data in further processing, so I need to access the Id, Customers and Suppliers values in each service. How do I get a particular value from a service so that I can process it? For example, I need to get all the Id values from Service and create nodes for each of them.
To get the Type and Nr values I tried this:
foreach my $service (@{ $data->{Service}[1]{Suppliers}[0]{HW}[0] }) {
say $service->{Nr};
}
foreach my $service (@{ $data->{Service}[1]{Suppliers}[0]{HW}[0] }) {
say $service->{Type};
}
Can you help me get all the Nr and Type values from Suppliers->HW?
I suggest reading perldoc's Reference Tutorial (perlreftut) and References and Nested Data Structures (perlref). They contain an introduction and a full explanation of how to access data like that.
But, for example, you can access the service ID by doing:
say $data->{Service}[0]{Id}; # prints SKRM
You could go through all the services, printing their ID, with a loop:
foreach my $service (@{ $data->{Service} }) {
say $service->{Id};
}
In response to your edit
$data->{Service}[1]{Suppliers}[0]{HW}[0] is a hash reference (you can check this quickly by using either Data::Dumper or Data::Dump on it, or just the ref function). In particular, it is { Nr => 18, Path => "/work/hardware.xml", Service => 1, Type => "W" }.
In other words, you've almost got it—you just went one level too deep. It should be:
foreach my $service (@{ $data->{Service}[1]{Suppliers}[0]{HW} }) {
say $service->{Nr};
}
Note the lack of the final [0] that you had.
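To collect Nr and Type from every service rather than only from Service[1], the same idea extends to nested loops. The following is a rough sketch over a shortened copy of the dumped structure; the `|| []` fallback is my assumption for suppliers that have no HW entry (such as the SW-only supplier in the first service).

```perl
use strict;
use warnings;

# Shortened copy of the structure dumped in the question.
my $data = {
    Service => [
        { Id        => 'SKRM',
          Suppliers => [ { SW => [ { Path => '/work/service.xml', Service => 'b7a' } ] } ] },
        { Id        => 'ADTM',
          Suppliers => [ { HW => [
              { Type => 'W', Nr => '18' },
              { Type => 'B', Nr => '7'  },
              { Type => 'k', Nr => '1'  },
          ] } ] },
    ],
};

my @rows;
for my $service ( @{ $data->{Service} } ) {
    for my $supplier ( @{ $service->{Suppliers} || [] } ) {
        # Suppliers without an HW entry contribute nothing.
        for my $hw ( @{ $supplier->{HW} || [] } ) {
            push @rows, "$service->{Id}: Type=$hw->{Type} Nr=$hw->{Nr}";
        }
    }
}
print "$_\n" for @rows;
```

Each loop dereferences exactly one array level, which mirrors the ForceArray settings used when the file was read.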