How can I do a scrolled search on MetaCPAN? - perl

I'm trying to convert this script to use the new Elasticsearch official client instead of the older (now deprecated) ElasticSearch.pm, but I can't get the scrolled search to work. Here's what I've got:
#! /usr/bin/perl
use strict;
use warnings;
use 5.010;
use Elasticsearch ();
use Elasticsearch::Scroll ();
my $es = Elasticsearch->new(
nodes => 'http://api.metacpan.org:80',
cxn => 'NetCurl',
cxn_pool => 'Static::NoPing',
#log_to => 'Stderr',
#trace_to => 'Stderr',
);
say 'Getting all results at once works:';
my $results = $es->search(
index => 'v0',
type => 'release',
body => {
filter => { range => { date => { gte => '2013-11-28T00:00:00.000Z' } } },
fields => [qw(author archive date)],
},
);
foreach my $hit (#{ $results->{hits}{hits} }) {
my $field = $hit->{fields};
say "#$field{qw(date author archive)}";
}
say "\nUsing a scrolled search does not work:";
my $scroller = Elasticsearch::Scroll->new(
es => $es,
index => 'v0',
search_type => 'scan',
size => 100,
type => 'release',
body => {
filter => { range => { date => { gte => '2013-11-28T00:00:00.000Z' } } },
fields => [qw(author archive date)],
},
);
while (my $hit = $scroller->next) {
my $field = $hit->{fields};
say "#$field{qw(date author archive)}";
} # end while $hit
The first search, where I'm just getting all the results in 1 chunk, works fine. But the second search, where I'm trying to scroll through the results, produces:
Using a scrolled search does not work:
[Request] ** [http://api.metacpan.org:80]-[500]
ActionRequestValidationException[Validation Failed: 1: scrollId is missing;],
called from sub Elasticsearch::Transport::try {...}
at .../Try/Tiny.pm line 83. With vars: {'body' =>
'ActionRequestValidationException[Validation Failed: 1: scrollId is missing;]',
'request' => {'path' => '/_search/scroll','serialize' => 'std',
'body' => 'c2Nhbjs1OzE3MjU0NjM2MjowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2NDowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2MTowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2MDowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2MzowakFELUU3VFFibTJIZW1ibUo0SUdROzE7dG90YWxfaGl0czoxNDQ7',
'method' => 'GET','qs' => {'scroll' => '1m'},'ignore' => [],
'mime_type' => 'application/json'},'status_code' => 500}
What am I doing wrong? I'm using Elasticsearch 0.75 and Elasticsearch-Cxn-NetCurl 0.02, and Perl 5.18.1.

I finally got it working with the newer Search::Elasticsearch official client. Here's the short version:
#! /usr/bin/perl
use strict;
use warnings;
use 5.010;
use Search::Elasticsearch ();
my $es = Search::Elasticsearch->new(
cxn_pool => 'Static::NoPing',
nodes => 'api.metacpan.org:80',
);
my $scroller = $es->scroll_helper(
index => 'v0',
type => 'release',
search_type => 'scan',
scroll => '2m',
size => 100,
body => {
fields => [qw(author archive date)],
query => { range => { date => { gte => '2015-02-01T00:00:00.000Z' } } },
},
);
while (my $hit = $scroller->next) {
my $field = $hit->{fields};
say "#$field{qw(date author archive)}";
} # end while $hit
Note that the records are not sorted when you do a scrolled search. I wound up dumping the records into a temporary database and sorting them locally. The updated script is on GitHub.

I don't have a direct answer, but I might have an approach to trouble shooting:
I followed your link to the Elasticsearch::Client and found a scroll() method:
https://metacpan.org/pod/Elasticsearch::Client::Direct#scroll
This method takes scroll and scroll_id as parameters. scroll is the number of minutes that you can keep calling the scroll method before the search expires. scroll_id is a marker to the place where the last call to scroll() ended.
$results = $e->scroll(
scroll => '1m',
scroll_id => $id
);
Elasticsearch::Scroll is an object oriented wrapper around scroll() which hides scroll and scroll_id.
I would run perl -d on your script, and step in to $scroller->next and follow that as far down the rabbit hole as you can. Something in there is trying a search which should be populating scroll_id or scrollId and is failing.
My description here is admittedly pretty rough... I ran across an accurate description of what the scroll id is and does during my googling, but I can't seem to find it again.

Related

Config::IniFiles hash behaves different than manually written hash

I am loading a config file, which ends up as an embedded hash, with Config::IniFiles. After that, I want to modify the resulting hash by, for some keys, bringing its values one level up. In the example below, I am aiming for this as a result:
$VAR1 = {
'max_childrensubtree' => '7',
'port' => '1984',
'user' => 'someuser',
'password' => 'somepw',
'max_width' => '20',
'host' => 'localhost',
'attrs' => {
'subattr2' => 'cat',
'topattr1' => 'cat',
'subattr2_1' => 'pt',
'subattr1' => 'rel'
},
'max_descendants' => '1000'
};
So for the keys params and basex at the highest level, I want to move its contents (key-value pairs) to the highest level - and remove the items themselves. In short:
(
a => {
'key1' => 'ok',
'key2' => 'hello'
}
)
turns into
(
'key1' => 'ok',
'key2' => 'hello'
)
The strange thing is that what I am trying to do does not work on a hash built from a read INI file, but it does work with a manually inserted hash. In other words, this works:
#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use Data::Dumper;
my %ini = (
'params' => {
'max_width' => '20',
'max_childrensubtree' => '7',
'max_descendants' => '1000'
},
'attrs' => {
'topattr1' => 'cat',
'subattr1' => 'rel',
'subattr2' => 'cat',
'subattr2_1' => 'pt',
},
'basex' => {
'host' => 'localhost',
'port' => '1984',
'user' => 'someuser',
'password' => 'somepw'
}
);
&_parse_ini(\%ini);
sub _parse_ini {
my $ref = shift;
foreach (('params', 'basex')) {
foreach my $k (keys %{$ref->{$_}}) {
$ref->{$k} = $ref->{$_}->{$k};
}
delete $ref->{$_};
}
print Dumper($ref);
}
But this does not:
#!/usr/bin/perl
use utf8;
use strict;
use warnings;
use Data::Dumper;
use Config::IniFiles;
# Load config file
tie my %ini, 'Config::IniFiles', (-file => $ARGV[0]);
&_parse_ini(\%ini);
sub _parse_ini {
my $ref = shift;
foreach (('params', 'basex')) {
foreach my $k (keys %{$ref->{$_}}) {
$ref->{$k} = $ref->{$_}->{$k};
}
delete $ref->{$_};
}
print Dumper($ref);
}
The input ini file for this example would be:
[params]
max_width = 20
max_childrensubtree = 7
max_descendants = 1000
[attrs]
topattr1 = cat
subattr1 = rel
subattr2 = cat
subattr2_1 = pt
[basex]
host = localhost
port = 1984
user = admin
password = admin
I have been looking in the documentation and on SO for similar issues but have found none. It appears that the hashes are identical (Config::IniFiles doesn't seem to add something specific), so I have no idea why it works for 'manual' hashes, and not for read-in ones.
The two hashes are not identical at all, although they may appear to be from the point of view of the data they contain.
The first one is a regular hash. You can do whatever you like with it.
The second one is a tied hash. It becomes an object of Config::IniFiles, but with a hash like interface. So whilst it appears to be a hash, the package can override the methods for storing or fetching information in the hash however it likes.
In this particular case, it looks like Config::IniFiles will only store a new key value in the hash if the value is hash ref. So you can't flatten out the tied hash as you want. Instead you'll have to create a new hash and copy the data in to it to do what you want.

Example for creating split-transaction in Perl Finance::QIF

I need to import my bank-exported transactions (CSV) into GNUcash.
I am almost finished with the perl script using Finance::QIF
I parse the CSV and write it out like this:
my $record = {
header => "Type:Bank",
date => $outdatum,
memo => $outtext,
transaction => $outbetrag,
};
$out->header( $record->{header} );
$out->write($record);
....
But my problem is creating a split.
http://finance-qif.sourceforge.net/ says " If the transaction contains splits this will be defined and consist of an array of hash references. With each split potentially having the following values." - so I tried this:
my $record = {
header => "Type:Bank",
date => $outdatum,
memo => $outtext,
transaction => $outbetrag,
#splits = (
{
category => "Gesundheit:Arzt:Kind1",
memo => "L",
amount => "-161,66"
},
{
category => "Gesundheit:Arzt:Kind2",
memo => "F",
amount => "-162,66"
}
)
};
This leads to the error:
Unsupported field 'HASH(0x221c9e8)' found in record ignored in file '>_TESTqif.qif' line 22 at convert_bank_CSV.pl line 195.
Unfortunately, I nowhere found an example for creating a split, just for a normal transaction.
Can someone please help how Finance::QIF can be used to create split-transactions?
I know nothing about Finance::QIF but your #splits code makes no sense.
Try this instead:
my $record = {
header => "Type:Bank",
date => $outdatum,
memo => $outtext,
transaction => $outbetrag,
splits => [
{
category => "Gesundheit:Arzt:Kind1",
memo => "L",
amount => "-161,66",
},
{
category => "Gesundheit:Arzt:Kind2",
memo => "F",
amount => "-162,66",
}
],
};
See perldoc perlreftut for more information about references and data structures in Perl.

Perl- Get Hash Value from Multi level hash

I have a 3 dimension hash that I need to extract the data in it. I need to extract the name and vendor under vuln_soft-> prod. So far, I manage to extract the "cve_id" by using the following code:
foreach my $resultHash_entry (keys %hash){
my $cve_id = $hash{$resultHash_entry}{'cve_id'};
}
Can someone please provide a solution on how to extract the name and vendor. Thanks in advance.
%hash = {
'CVE-2015-6929' => {
'cve_id' => 'CVE-2015-6929',
'vuln_soft' => {
'prod' => {
'vendor' => 'win',
'name' => 'win 8.1',
'vers' => {
'vers' => '',
'num' => ''
}
},
'prod' => {
'vendor' => 'win',
'name' => 'win xp',
'vers' => {
'vers' => '',
'num' => ''
}
}
},
'CVE-2015-0616' => {
'cve_id' => 'CVE-2015-0616',
'vuln_soft' => {
'prod' => {
'name' => 'unity_connection',
'vendor' => 'cisco'
}
}
}
}
First, to initialize a hash, you use my %hash = (...); (note the parens, not curly braces). Using {} declares a hash reference, which you have done. You should always use strict; and use warnings;.
To answer the question:
for my $resultHash_entry (keys %hash){
print "$hash{$resultHash_entry}->{vuln_soft}{prod}{name}\n";
print "$hash{$resultHash_entry}->{vuln_soft}{prod}{vendor}\n";
}
...which could be slightly simplified to:
for my $resultHash_entry (keys %hash){
print "$hash{$resultHash_entry}{vuln_soft}{prod}{name}\n";
print "$hash{$resultHash_entry}{vuln_soft}{prod}{vendor}\n";
}
because Perl always knows for certain that any deeper entries than the first one is always a reference, so the deref operator -> isn't needed here.

perl XML::Simple for repeated elements

I have the following xml code
<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "http://www.kegg.jp/kegg/xml/KGML_v0.7.1_.dtd">
<!-- Creation date: Aug 26, 2013 10:02:03 +0900 (GMT+09:00) -->
<pathway name="path:ko01200" >
<reaction id="14" name="rn:R01845" type="irreversible">
<substrate id="108" name="cpd:C00447"/>
<product id="109" name="cpd:C05382"/>
</reaction>
<reaction id="15" name="rn:R01641" type="reversible">
<substrate id="109" name="cpd:C05382"/>
<substrate id="104" name="cpd:C00118"/>
<product id="110" name="cpd:C00117"/>
<product id="112" name="cpd:C00231"/>
</reaction>
</pathway>
I am trying to print the substrate id and product id with following code which I am stuck for the one that have more than one ID. Tried to use dumper to see the data structure but I don't know how to proceed. I have already used XML simple for the rest of my parsing script (this part is a small part of my whole script ) and I can not change that now
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $xml=new XML::Simple;
my $data=$xml->XMLin("test.xml",KeyAttr => ['id']);
print Dumper($data);
foreach my $reaction ( sort keys %{$data->{reaction}} ) {
print $data->{reaction}->{$reaction}->{substrate}->{id}."\n";
print $data->{reaction}->{$reaction}->{product}->{id}."\n";
}
Here is the output
$VAR1 = {
'name' => 'path:ko01200',
'reaction' => {
'15' => {
'substrate' => {
'104' => {
'name' => 'cpd:C00118'
},
'109' => {
'name' => 'cpd:C05382'
}
},
'name' => 'rn:R01641',
'type' => 'reversible',
'product' => {
'112' => {
'name' => 'cpd:C00231'
},
'110' => {
'name' => 'cpd:C00117'
}
}
},
'14' => {
'substrate' => {
'name' => 'cpd:C00447',
'id' => '108'
},
'name' => 'rn:R01845',
'type' => 'irreversible',
'product' => {
'name' => 'cpd:C05382',
'id' => '109'
}
}
}
};
108
109
Use of uninitialized value in concatenation (.) or string at line 12.
Use of uninitialized value in concatenation (.) or string at line 13.
First of all, don't use XML::Simple. it is hard to predict what exact data structure it will produce from a bit of XML, and it's own documentation mentions it is deprecated.
Anyway, your problem is that you want to access an id field in the product and substrate subhashes – but they don't exist in one of the reaction subhashes
'15' => {
'substrate' => {
'104' => {
'name' => 'cpd:C00118'
},
'109' => {
'name' => 'cpd:C05382'
}
},
'name' => 'rn:R01641',
'type' => 'reversible',
'product' => {
'112' => {
'name' => 'cpd:C00231'
},
'110' => {
'name' => 'cpd:C00117'
}
}
},
Instead, the keys are numbers, and each value is a hash containing a name. The other reaction has a totally different structure, so special-case code would have been written for both. This is why XML::Simple shouldn't be used – the output is just to unpredictable.
Enter XML::LibXML. It is not extraordinary, but it implememts standard APIs like the DOM and XPath to traverse your XML document.
use XML::LibXML;
use feature 'say'; # assuming perl 5.010
my $doc = XML::LibXML->load_xml(file => "test.xml") or die;
for my $reaction_item ($doc->findnodes('//reaction/product | //reaction/substrate')) {
say $reaction_item->getAttribute('id');
}
Output:
108
109
109
104
110
112

Grab the fsynclock status of a mongo db in perl

I'm trying to build a nagios check to check for how long a mongoDB has been locked using fsyncLock() for backup purposes (if the iSCSI snapshotting script blows up and the mongo is not being unlocked for example)
I was thinking about using a simple
$currentLock->run_command({currentOp => 1})
$isLocked = $currentLock->{fsyncLock}
But it seems like run_command() doesn't support currentOp yet. (As seen in there: https://github.com/MLstate/opalang/blob/master/lib/stdlib/apis/mongo/commands.opa)
Woudl anybody have an advice on how to check if a mongo is locked with a perl script? If not, I guess I'll go for some bash. I was thinking about using a db.eval('db.currentOp()') but I'm getting a bit lost.
Thanks!
You are right that run_command does not support doing a currentOp directly. However, if we look at the implementation of db.currentOp in the mongo shell, we can see how it works under the hood:
> db.currentOp
function (arg) {
var q = {};
if (arg) {
if (typeof arg == "object") {
Object.extend(q, arg);
} else if (arg) {
q.$all = true;
}
}
return this.$cmd.sys.inprog.findOne(q);
}
So we can query the special collection $cmd.sys.inprog on the Perl side to get the same inprog array that would be returned in the shell.
use strict;
use warnings;
use MongoDB;
my $db = MongoDB::MongoClient->new->get_database( 'test' );
my $current_op = $db->get_collection( '$cmd.sys.inprog' )->find_one;
When the server is not locked, it will return a structure in $current_op that looks something like this:
{
'inprog' => [
{
'connectionId' => 53,
'insert' => {},
'active' => bless( do{\(my $o = 0)}, 'boolean' ),
'lockStats' => {
'timeAcquiringMicros' => {
'w' => 1,
'r' => 0
},
'timeLockedMicros' => {
'w' => 9,
'r' => 0
}
},
'numYields' => 0,
'locks' => {
'^' => 'w',
'^test' => 'W'
},
'waitingForLock' => $VAR1->{'inprog'}[0]{'active'},
'ns' => 'test.fnoof',
'client' => '127.0.0.1:50186',
'threadId' => '0x105a81000',
'desc' => 'conn53',
'opid' => 7152352,
'op' => 'insert'
}
]
};
During an fsyncLock(), you'll get an empty inprog array but you will have a helpful info field and the expected fsyncLock boolean:
{
'info' => 'use db.fsyncUnlock() to terminate the fsync write/snapshot lock',
'fsyncLock' => bless( do{\(my $o = 1)}, 'boolean' ), # <--- that's true
'inprog' => []
};
So, putting it all together, we get:
use strict;
use warnings;
use MongoDB;
my $db = MongoDB::MongoClient->new->get_database( 'fnarf' );
my $current_op = $db->get_collection( '$cmd.sys.inprog' )->find_one;
if ( $current_op->{fsyncLock} ) {
print "fsync lock is currently ON\n";
} else {
print "fsync lock is currently OFF\n";
}
I actually decided to switch for a solution in bash (easier for what I want to do with the data later):
currentOp=`mongo --port $port --host $host --eval "printjson(db.currentOp())"`
Then some sort of grep -Po '"fsyncLock" : \d'
Thanks for the Perl insight though, it worked perfectly