I'm using Search::Elasticsearch and Search::Elasticsearch::Scroll for search and scroll into my elasticsearch server.
In scrolling process, for some querys, I'm seeing the next errors while I'm scrolling the search results:
2016/03/22 11:03:38 - 265885 FATAL: [Daemon.pm][8221]: Something gone wrong, error $VAR1 = bless( {
'msg' => '[Missing] ** [http://localhost:9200]-[404] Not Found, called from sub Search::Elasticsearch::Scroll::next at searcher.pl line 92. With vars: {\'body\' => {\'hits\' => {\'hits\' => [],\'max_score\' => \'0\',\'total\' => 5215},\'timed_out\' => bless( do{\\(my $o = 0)}, \'JSON::XS::Boolean\' ),\'_shards\' => {\'failures\' => [{\'index\' => undef,\'reason\' => {\'reason\' => \'No search context found for id [4920053]\',\'type\' => \'search_context_missing_exception\'},\'shard\' => -1},{\'index\' => undef,\'reason\' => {\'reason\' => \'No search context found for id [5051485]\',\'type\' => \'search_context_missing_exception\'},\'shard\' => -1},{\'index\' => undef,\'reason\' => {\'reason\' => \'No search context found for id [4920059]\',\'type\' => \'search_context_missing_exception\'},\'shard\' => -1},{\'index\' => undef,\'reason\' => {\'reason\' => \'No search context found for id [5051496]\',\'type\' => \'search_context_missing_exception\'},\'shard\' => -1},{\'index\' => undef,\'reason\' => {\'reason\' => \'No search context found for id [5051500]\',\'type\' => \'search_context_missing_exception\'},\'shard\' => -1}],\'failed\' => 5,\'successful\' => 0,\'total\' => 5},\'_scroll_id\' => \'c2NhbjswOzE7dG90YWxfaGl0czo1MjE1Ow==\',\'took\' => 2},\'request\' => {\'serialize\' => \'std\',\'path\' => \'/_search/scroll\',\'ignore\' => [],\'mime_type\' => \'application/json\',\'body\' => \'c2Nhbjs1OzQ5MjAwNTM6bHExbENzRDVReEc0OV9UMUgzd3Vkdzs1MDUxNDg1OnJrQ3lsUkRKVHRxRWRWeURoOTB4WVE7NDkyMDA1OTpscTFsQ3NENVF4RzQ5X1QxSDN3dWR3OzUwNTE0OTY6cmtDeWxSREpUdHFFZFZ5RGg5MHhZUTs1MDUxNTAwOnJrQ3lsUkRKVHRxRWRWeURoOTB4WVE7MTt0b3RhbF9oaXRzOjUyMTU7\',\'qs\' => {\'scroll\' => \'1m\'},\'method\' => \'GET\'},\'status_code\' => 404}
',
'stack' => [
[
'searcher.pl',
92,
'Search::Elasticsearch::Scroll::next'
]
],
'text' => '[http://localhost:9200]-[404] Not Found',
'vars' => {
'body' => {
'hits' => {
'hits' => [],
'max_score' => '0',
'total' => 5215
},
'timed_out' => bless( do{\(my $o = 0)}, 'JSON::XS::Boolean' ),
'_shards' => {
'failures' => [
{
'index' => undef,
'reason' => {
'reason' => 'No search context found for id [4920053]',
'type' => 'search_context_missing_exception'
},
'shard' => -1
},
{
'index' => undef,
'reason' => {
'reason' => 'No search context found for id [5051485]',
'type' => 'search_context_missing_exception'
},
'shard' => -1
},
{
'index' => undef,
'reason' => {
'reason' => 'No search context found for id [4920059]',
'type' => 'search_context_missing_exception'
},
'shard' => -1
},
{
'index' => undef,
'reason' => {
'reason' => 'No search context found for id [5051496]',
'type' => 'search_context_missing_exception'
},
'shard' => -1
},
{
'index' => undef,
'reason' => {
'reason' => 'No search context found for id [5051500]',
'type' => 'search_context_missing_exception'
},
'shard' => -1
}
],
'failed' => 5,
'successful' => 0,
'total' => 5
},
'_scroll_id' => 'c2NhbjswOzE7dG90YWxfaGl0czo1MjE1Ow==',
'took' => 2
},
'request' => {
'serialize' => 'std',
'path' => '/_search/scroll',
'ignore' => [],
'mime_type' => 'application/json',
'body' => 'c2Nhbjs1OzQ5MjAwNTM6bHExbENzRDVReEc0OV9UMUgzd3Vkdzs1MDUxNDg1OnJrQ3lsUkRKVHRxRWRWeURoOTB4WVE7NDkyMDA1OTpscTFsQ3NENVF4RzQ5X1QxSDN3dWR3OzUwNTE0OTY6cmtDeWxSREpUdHFFZFZ5RGg5MHhZUTs1MDUxNTAwOnJrQ3lsUkRKVHRxRWRWeURoOTB4WVE7MTt0b3RhbF9oaXRzOjUyMTU7',
'qs' => {
'scroll' => '1m'
},
'method' => 'GET'
},
'status_code' => 404
},
'type' => 'Missing'
}, 'Search::Elasticsearch::Error::Missing' );
The code I'm using is the next one (simplified) :
# Retrieve scroll
my $scroll = $self->getScrollBySignature($item);
# Retrieve all affected documents ids
while (my #docs = $scroll->next(500)) {
# Do stuff with #docs
}
The function getScrollBySignature have the next code in order to call to elasticSearch
my $scroll = $self->{ELASTIC}->scroll_helper(
index => $self->{INDEXES},
search_type => 'scan',
ignore_unavailable => 1,
body => {
size => $self->{PAGINATION},
query => {
filtered => {
filter => {
bool => {
must => [{term => {signature_id => $item->{profileId}}}, {terms => {channel_type_id => $type}}]
}
}
}
}
}
);
As you can see, I'm doing the scroll without passing scroll parameter then as documentation says, the time that scroll is alive is 1 min.
The elasticSearch is a cluster of 3 servers, and the query that ends with that error retrieves a bit more than 5000 docs.
My first solution was to update the life time for scroll to 5 minutes and the error didn't appear.
The question is, as I understand every time I'm calling $scroll->next() the life time off scroll affected is upgraded 1m more, then how is possible to receive those context related errors?
I'm doing something in a bad manner?
Thank you all.
The first thing that comes to mind is that the timer is not updated. Have you checked this? You can do a query every 10 seconds for example and see if at the 6th query it gives you the error ...
Well, a good rule of thumb is inside a ->next() block, don't stay by iteration more than time that you've configured in scroll.
Between each call of ->next() you cannot stay more than that time configured. If you stay more, the scroll may be not be there and the error earch_context_missing_exception will appear.
My solution for this problem was inside next block only store data into array/hash structure and once the scroll process ended work with all data.
The solution of the question example:
# Retrieve scroll
my $scroll = $self->getScrollBySignature($item);
# Retrieve all affected documents ids
my #allDocs;
while (my #docs = $scroll->next(500)) {
push #allDocs, map {$_->{_id}} #docs
}
foreach (#allDocs) {
# Do stuff with doc
}
Related
I have a data set with different types of plans, the data below shows the activeplans and pastplans elements, (and other plans not included in the example).
I think the data can be represented as follows
$data->activeplans->activeplanloc[activeplan->{}, activeplan->{},
...activeplan->{}] $data->pastplans->pastplanloc[pastplan->{},
pastplan->{}, ...pastplan->{}]
Each plan element has number of attributes, for instance id, lat, long, numpersons (and other attributes not included in the example)
My goal is to loop through all the plan items and extract the attributes.
Also note, the ...planloc[] outer element and the lat/long fields it contains along with the empty ...plan[] - can be ignored.
This is the loop I tried to do it with, but I'm stuck on exacting the activeplan elements, can you help correct my syntax error, I don't now how to properly load the elements into an array given this data stucture?
foreach my $planArrayItem (#{$data->{"activeplans"}->{"activeplanloc"}->{"activeplan"}{}}) {
#...
if (exists $planArrayItem->{numpersons}) {
$tmp .= "<li>Number of personal: $projArrayItem->{numpersons}</li>";
}
#...
}
Oh, and this is the data set.
{ 'updatetime' => '3/24/2021 11:44:19 AM', 'pastplans' =>
{ 'pastplanloc' => [ { 'longitude' => '-29.51502', 'latitude' =>
'32.307558', 'pastplan' => { 'planclass' => 'A', 'longitude' =>
'-29.51502', 'id' => '211', 'latitude' => '32.307558',
'numlocations' => '15' } }, { 'longitude' => '-28.798305',
'latitude' => '32.656135', 'pastplan' => [ { 'id' => '214',
'longitude' => '-28.798305', 'latitude' => '32.656135',
'planclass' => 'E', 'numlocations' => '16' }, { 'longitude' =>
'-28.798305', 'id' => '215', 'latitude' => '32.656135', 'planclass'
=> 'C', 'numlocations' => '21' } ] } ] }, 'activeplans' =>
{ 'activeplanloc' => [ { 'latitude' => '33.132491', 'activeplan'
=> [ { 'planclass' => 'B', 'longitude' => '-25.304968', 'id' =>
'942', 'latitude' => '33.132491', 'numpersons' => '17' },
{ 'numpersons' => '21', 'planclass' => 'G', 'id' => '943',
'longitude' => '-25.304968', 'latitude' => '33.132491' } ],
'longitude' => '-25.304968' }, { 'latitude' => '33.097290',
'activeplan' => { 'numpersons' => '31', 'id' => '944',
'longitude' => '-25.295086', 'latitude' => '33.097290',
'planclass' => 'M' }, 'longitude' => '-25.295086' } ] } };
This is the XML format if there is a better way to format it while reading in perhaps?
<?xml version="1.0" encoding="utf-8"?>
<plans>
<updatetime>3/24/2021 11:44:19 AM</updatetime>
<pastplans>
<pastplanloc latitude="32.307558" longitude="-29.51502">
<pastplan planclass="A" id="211" numpersons="15" latitude="32.307558" longitude="-29.51502"/>
</pastplanloc>
<pastplanloc latitude="32.656135" longitude="-28.798305">
<pastplan planclass="E" id="214" numpersons="16" latitude="32.656135" longitude="-28.798305"/>
<pastplan planclass="C" id="215" numpersons="21" latitude="32.656135" longitude="-28.798305"/>
</pastplanloc>
</pastplans>
<activeplans>
<activeplanloc latitude="33.132491" longitude="-25.304968">
<activeplan planclass="B" id="942" numpersons="17" latitude="33.132491" longitude="-25.304968"/>
<activeplan planclass="G" id="943" numpersons="21" latitude="33.132491" longitude="-25.304968"/>
</activeplanloc>
<activeplanloc latitude="33.097290" longitude="-25.295086">
<activeplan planclass="M" id="944" numpersons="31" latitude="33.097290" longitude="-25.295086"/>
</activeplanloc>
</activeplans>
</plans>
I am quite certain that:
foreach my $planArrayItem (#{$data->{"activeplans"}->{"activeplanloc"}}) {
#...
if ($planArrayItem->{"activeplan"}{numpersons}) {
$tmp.= "<li>Number of personal: ".$planArrayItem->{"activeplan"}->{numpersons}."</li>";
}
}
is the code you are looking for. As you stated above "activeplanloc" contains an array which reference an activeplan. So the outer loop has to iterate over this.
With your new "data set", the correct code is:
foreach my $planArrayItem (#{$data->{"activeplans"}->{"activeplanloc"}}) {
#...
my $plans = ref $planArrayItem->{"activeplan"} eq "ARRAY" ?
$planArrayItem->{"activeplan"} : [$planArrayItem->{"activeplan"}];
foreach my $plan (#$plans) {
if ($plan->{numpersons}) {
$tmp .= "<li>Number of personal: ".$plan->{numpersons}."</li>";
}
}
}
See https://pastebin.com/QeD6ZwZz for a working example
I know there are many questions already with this kind of subject, but as far as I know (perl beginner so I could be wrong), I'm not using an array so I don't understand where this output comes from
$VAR1 = {
'BridgeMode' => {
'Ten-GigabitEthernet1/0/5' => {
'Description' => 'poort1',
'Duplex' => 'F(a)',
'Interface' => 'Ten-GigabitEthernet1/0/5',
'Link' => 'UP',
'PVID' => '100',
'Speed' => '10G(a)',
'Type' => 'A',
'Vlan100' => {
'UntaggedPorts' => [
'Ten-GigabitEthernet1/0/5'
]
},
'vlanID' => [
'Vlan100'
]
},
Above is the content of my dumper and this is my print statement:
my $untaggedInterface = $data{BridgeMode}{"Ten-GigabitEthernet1/0/5"}{Vlan100}{UntaggedPorts} ;
print "Untagged: $untaggedInterface \n" ;
I would expect that the print statement would print "Ten-GigabitEthernet1/0/5" but instead it shows this:
Untagged: ARRAY(0x24a8ec0)
edit - it is possible that there exists an tagged and an untagged:
$VAR1 = {
'BridgeMode' => {
'Ten-GigabitEthernet1/0/12' => {
'Description' => 'poort5',
'Duplex' => 'F(a)',
'Interface' => 'Ten-GigabitEthernet1/0/12',
'Link' => 'UP',
'PVID' => '100',
'Speed' => '10G(a)',
'Type' => 'H',
'Vlan100' => {
'UntaggedPorts' => [
'Ten-GigabitEthernet1/0/12'
]
},
'Vlan107' => {
'TaggedPorts' => [
'Ten-GigabitEthernet1/0/12'
]
},
edit: printing the content of the array
my #untaggedInterface = $data{BridgeMode}{"Ten-GigabitEthernet1/0/5"}{Vlan100}{UntaggedPorts} ;
print join(", ", #untaggedInterface) ;
stil gives
ARRAY(0x1c03a68)
You would get the expected result if you had the following, i.e. a string instead of a string array:
'UntaggedPorts' => 'Ten-GigabitEthernet1/0/12'
Otherwise, you must specify the index of the array element:
my $untaggedInterface = $data{BridgeMode}{"Ten-GigabitEthernet1/0/5"}{Vlan100}{UntaggedPorts}[0];
I have the following output from hash in perl:
$VAR1 = {
'ins_api' => {
'sid' => 'eoc',
'outputs' => {
'output' => [
{
'body' => {
'TABLE_interface' => {
'ROW_interface' => [
{
'vdc_lvl_in_pkts' => 17081772,
'vdc_lvl_in_avg_bits' => 3128,
'eth_autoneg' => 'on',
'eth_speed' => '1000 Mb/s',
'admin_state' => 'up',
'vdc_lvl_out_mcast' => '65247',
'state' => 'up',
'eth_mtu' => '1500',
'eth_hw_addr' => '78ba.f9ad.b248',
'eth_mdix' => 'off',
'interface' => 'mgmt0',
'eth_ip_addr' => '10.56.32.84',
'eth_bw' => 1000000,
'vdc_lvl_in_avg_pkts' => '3',
'vdc_lvl_out_bytes' => '3463952330',
'vdc_lvl_in_ucast' => '7653891',
'eth_ip_prefix' => '10.',
'eth_rxload' => '1',
'eth_txload' => '1',
'eth_reliability' => '255',
'eth_dly' => 10,
'vdc_lvl_in_mcast' => '8742911',
'eth_ip_mask' => 24,
'eth_bia_addr' => '78ba.f9ad.b248',
'eth_duplex' => 'full',
'vdc_lvl_out_pkts' => '8668507',
'vdc_lvl_out_avg_pkts' => '1',
'vdc_lvl_in_bcast' => '684970',
'vdc_lvl_out_avg_bits' => '1840',
'medium' => 'broadcast',
'vdc_lvl_out_bcast' => '5',
'vdc_lvl_out_ucast' => '8603255',
'eth_ethertype' => '0x0000',
'vdc_lvl_in_bytes' => '1985125644',
'eth_hw_desc' => 'GigabitEthernet'
},
{
'eth_babbles' => '0',
'eth_outbytes' => '7362149107971',
'eth_outucast' => '16348249961',
'eth_clear_counters' => 'never',
'eth_watchdog' => '0',
'eth_inpkts' => 8644872191,
'eth_inbytes' => '3415386845315',
'eth_out_flowctrl' => 'off',
'eth_bad_proto' => '0',
'eth_frame' => '0',
----- output omitted -------
},
What would be the best way to loop via ROW_interface array and print some of the elements? I am just trying to get the elements in the ROW_interface array.
my $ROW_Interfaces = $output->{body}{TABLE_interface}{ROW_interface};
for my $ROW_Interfaces (#$ROW_Interfaces) {
...
}
It seems there can be more than one output, so you'll have to locate the appropriate one similarly.
Similar to #ikegami's answer, but handles the multiple output entries. The if defined... is there as the structure isn't complete, and I wasn't sure if each entry had the same keys or not.
for my $output (#{ $VAR1->{ins_api}{outputs}{output} }){
for my $row_int (#{ $output->{body}{TABLE_interface}{ROW_interface} }){
print "$row_int->{eth_frame}\n" if exists $row_int->{eth_frame};
}
}
I have the following code to scrape a form for inputs and get the attributes id and name.
#!/usr/bin/perl
use warnings;
use strict;
use URI;
use Data::Dumper::Simple;
use Web::Scraper;
my $urlToScrape = "http://digitalarkivet.arkivverket.no/finn_kilde";
my $scrap = scraper {
process 'div.listGroup.open > ul.grouped > li.expandable', 'data[]' => scraper {
process 'input', 'id' => '#id', name => '#name';
process 'label', 'label_for' => '#for';
process 'span.listExpander ', 'Text' => 'TEXT';
process 'ul.sublist1', 'sublist[]' => scraper {
process 'input', 'id' => '#id', name => '#name';
process 'label', 'label_for' => '#for';
process 'span', 'label' => 'TEXT';
};
};
};
my $res = $scrap->scrape(URI->new($urlToScrape));
print Dumper($res);
which gives me (shortend $res to fit screen better)
$res = {
'data' => [
{
'label_for' => 'ka0',
'sublist' => [
{
'label' => 'Statlig folketelling',
'label_for' => 'ka0kt0',
'name' => 'kt[]',
'id' => 'ka0kt0'
}
],
'name' => 'ka[]',
'id' => 'ka0>',
'Text' => 'Folketellinger'
},
{
'sublist' => [
{
'label' => 'Manntall',
'name' => 'kt[]',
'label_for' => 'ka1kt0',
'id' => 'ka1kt0'
}
],
'label_for' => 'ka1',
'id' => 'ka1>',
'name' => 'ka[]',
'Text' => 'Manntall'
},
....
{
'label_for' => 'r0',
'sublist' => [
{
'label_for' => 'r0f0',
'id' => 'r0f0',
'name' => 'f[]',
'label' => "01 Østfold"
}
],
'id' => 'r0',
'name' => 'r[]',
'Text' => "Østlandet"
},
{
'Text' => "Sørlandet",
'id' => 'r1',
'sublist' => [
{
'label_for' => 'r1f0',
'name' => 'f[]',
'id' => 'r1f0',
'label' => '09 Aust-Agder'
}
],
'label_for' => 'r1',
'name' => 'r[]'
}
]
};
I' have 2 issues I need to fix. First, I only want to get data for inputs having 'name' = ka[] (at top level).
Second, I only get data for first ul.sublist1 (If you study the page I'm scraping you can see that several "Kildekategori" have subsets of data, which are revealed if expanded/ clicked upon. Putting brackets on Text[] only gets me the sublist textnames, but not their attributes.
I'm thinking I might have to grab data in 2 scrapes instead, since nested values are revealed by id and label_for.
Solved it by scraping three times, foreach "level"
#!/usr/bin/perl
use strict;
use warnings;
use URI;
use Web::Scraper;
use Data::Dumper::Simple;
my %site;
my #res;
my $i;
my $j;
my $label_for;
my #scrape;
$site{'siteID'} = 1;
$site{'url'} = "http://digitalarkivet.arkivverket.no/finn_kilde";
$site{'name'} = "finn_kilde";
open FIL, ">$site{'name'}.csv" or die $!;
my $seperator=";";
$scrape[0] = scraper {
process 'div.listGroup.open > ul.grouped > li.expandable', 'data[]' => scraper {
process 'input',
'id' => '#id',
'value' => '#value',
'type' => '#type',
'name' => '#name';
process 'label', 'label_for' => '#for';
process 'span.listExpander ', 'text' => 'TEXT';
};
};
$scrape[1] = scraper {
process 'ul.sublist1 > li', 'data[]' => scraper {
process 'input',
'id' => '#id',
'value' => '#value',
'type' => '#type',
'name' => '#name';
process 'label', 'label_for' => '#for';
process 'span', 'text' => 'TEXT';
}
};
$scrape[2] = scraper {
process 'ul.sublist2 > li', 'data[]' => scraper {
process 'input',
'id' => '#id',
'value' => '#value',
'type' => '#type',
'name' => '#name';
process 'label', 'label_for' => '#for';
process 'span', 'text' => 'TEXT';
}
};
for $i (0 .. $#scrape){
$res[$i] = $scrape[$i]->scrape(URI->new($site{'url'}));
unless ($i) {
print FIL join($seperator,"label_for","text","name","value","id","type")."\n";
}
for $j (0 .. $#{$res[$i]->{data}}) {
if (defined($res[$i]->{data}[$j]->{label_for})){
$label_for=$res[$i]->{data}[$j]->{label_for};
} else {
$label_for="";
}
if (length($label_for)>0) {
my $name=$res[$i]->{data}[$j]->{name};
my $text=$res[$i]->{data}[$j]->{text};
my $value=$res[$i]->{data}[$j]->{value};
my $id=$res[$i]->{data}[$j]->{id};
my $type=$res[$i]->{data}[$j]->{type};
my #row=($label_for,$text,$name,$value,$id,$type);
print FIL join($seperator,#row);
print FIL "\n";
}
}
sleep(2);
}
close FIL;
print Dumper(\#res);
1;
I'v been trying to figure out if I can change an appender's filter at run-time that I've defined via a configuration file.
log4perl.filter.M1 = Log::Log4perl::Filter::LevelMatch
log4perl.filter.M2 = Log::Log4perl::Filter::LevelMatch
log4perl.filter.M1.LevelToMatch = INFO
log4perl.filter.M1.AcceptOnMatch = true
log4perl.filter.M2.LevelToMatch = WARN
log4perl.filter.M2.AcceptOnMatch = true
log4perl.filter.MyBoolean0 = Log::Log4perl::Filter::Boolean
log4perl.filter.MyBoolean0.logic = M1
log4perl.filter.MyBoolean1 = Log::Log4perl::Filter::Boolean
log4perl.filter.MyBoolean1.logic = M1 || M2
log4perl.appender.SCREEN.Filter = MyBoolean0
I'd like to change this filter from MyBoolean0 for the SCREEN to MyBoolean1, but do it after my program has started running.
Poking at the APPENDER_BY_NAME hash for SCREEN using Data::Dumper shows the following:
$VAR1 = bless( {
'appender' => bless( {
'Filter' => 'MyBoolean0',
'color' => {
...
...
'filter' => bless( {·
'params' => {·
'M3' => bless( {·
'LevelToMatch' => 'ERROR',
'name' => 'M3',
'AcceptOnMatch' => 1
}, 'Log::Log4perl::Filter::LevelMatch' ),
'M1' => bless( {·
'LevelToMatch' => 'INFO',
'name' => 'M1',
'AcceptOnMatch' => 1
}, 'Log::Log4perl::Filter::LevelMatch' ),
'M2' => bless( {·
'LevelToMatch' => 'WARN',
'name' => 'M2',
'AcceptOnMatch' => 1
}, 'Log::Log4perl::Filter::LevelMatch' )
},
'name' => 'MyBoolean0',
'eval_func' => sub { "DUMMY" },
'logic' => 'M1 || M2 || M3'
}, 'Log::Log4perl::Filter::Boolean' ),
'warp_message' => undef,
'name' => 'SCREEN'
}, 'Log::Log4perl::Appender' );
But mucking with this HASH seems hackish to me. Is there a better way to change an appender's filters?
You may use undocumented appender's property filter:
$Log::Log4perl::Logger::APPENDER_BY_NAME{'SCREEN'}->filter(
Log::Log4perl::Filter::by_name('MyBoolean1')
);
Also you may use two appenders:
log4perl.appender.SCREEN0.Filter = MyBoolean0
log4perl.appender.SCREEN1.Filter = MyBoolean1
And change it in runtime:
$logger->remove_appender('SCREEN0', 1);
$logger->add_appender(
Log::Log4perl::Config::create_appender_instance(
$Log::Log4perl::Config::OLD_CONFIG,
'SCREEN1',
\%Log::Log4perl::Logger::APPENDER_BY_NAME
)
);