Export ElasticSearch objects to CSV with Perl

I need to export some objects from an ElasticSearch db in the form of CSV "tables".
I just need to retrieve all records from a specified index.
I've tried this code from clintongormley, but I'm facing issues. The Perl code is:
#!/usr/bin/perl
use ElasticSearch;
use Text::CSV_XS;
my $csv_file = 'output.csv';
open my $fh, '>:encoding(utf8)', $csv_file or die $!;
my $csv = Text::CSV_XS->new;
my $e = ElasticSearch->new(servers => '127.0.0.1:9200');
my $s = $e->scrolled_search(
index => 'myindex',
type => 'mytype',
query => { match_all => '' }
);
my @field_names = qw(title name foo bar);
while (my $doc = $s->next) {
    my @cols = map {$doc->$_} @field_names;
    $csv->print($fh, \@cols);
}
close $fh or die $!;
I get the following:
[_na] query malformed, no field after start_object];
I think the problem is in the es query.
Any suggestions?

Mea culpa. I wrote that code very quickly without testing it :)
Also, the code is very old and refers to the now-deprecated ElasticSearch.pm module. The new module is Elasticsearch.pm (note the lowercase s).
Here is the code rewritten to use the new module, so that it actually works:
#!/usr/bin/perl
use Elasticsearch;
use Elasticsearch::Scroll;
use Text::CSV_XS;
my $csv_file = 'output.csv';
open my $fh, '>:encoding(utf8)', $csv_file or die $!;
my $csv = Text::CSV_XS->new;
$csv->eol("\r\n");
my $es = Elasticsearch->new( nodes => '127.0.0.1:9200' );
my $s = Elasticsearch::Scroll->new(
es => $es,
index => 'myindex',
type => 'mytype',
body => { query => { match_all => {} } }
);
my @field_names = qw(title count);
while ( my $doc = $s->next ) {
    my @cols = map { $doc->{_source}{$_} } @field_names;
    $csv->print( $fh, \@cols );
}
close $fh or die $!;
To test it, you can run these curl commands to set up an index with some data:
# delete the test index in case it already exists
curl -XDELETE localhost:9200/myindex
# create some sample docs
curl -XPOST localhost:9200/myindex/mytype/_bulk -d'
{"index": {}}
{"title": "Doc one", "count": 1}
{"index": {}}
{"title": "Doc two", "count": 2}
{"index": {}}
{"title": "Doc three", "count": 3}
'
If you then run the Perl code, the file output.csv will look like this:
"Doc two",2
"Doc three",3
"Doc one",1
Apologies for the bad original example code
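The "query malformed" error in the question most likely comes from the query body itself: `match_all => ''` serialises to an empty JSON string, while Elasticsearch expects `match_all` to be an (empty) JSON object, which is what `match_all => {}` produces. The sketch below only illustrates the two encodings; it uses the core JSON::PP module as an assumption for illustration, since the Elasticsearch clients do their own serialisation and the exact request the old ElasticSearch.pm built is not shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# canonical() sorts hash keys so the encoded output is stable
my $json = JSON::PP->new->canonical;

# What the question sent: match_all with an empty *string*
my $bad  = $json->encode( { query => { match_all => '' } } );

# What the corrected answer sends: match_all with an empty *hash* (a JSON object)
my $good = $json->encode( { query => { match_all => {} } } );

print "$bad\n";     # {"query":{"match_all":""}}
print "$good\n";    # {"query":{"match_all":{}}}
```

The only difference is `""` versus `{}` as the value of `match_all`, which is why the corrected code uses an empty hashref.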

Related

Perl parsing file content

I am trying to parse the content of a text file that consists of 3 categories (columns) and access it in the main code. I learned that a hash may be a good way, but since no column in the input file is unique (a name can be repeated or different), I wonder whether there is another way to do it. I appreciate any reply.
#!/usr/bin/perl
use strict;
use warnings;
my $file = "/path/to/text.txt";
my %info = parseCfg($file);
#get first line data in text file (Eric cat xxx)
#get second line data in text file (Michelle dog yyy)
#so on
sub parseCfg {
my $file = shift;
my %data;
return if !(-e $file);
open(my $fh, "<", $file) or die "Can't open < $file: $!";
my $msg = "-I-: Reading from config file: $file\n";
while (<$fh>) {
if (($_=~/^#/)||($_=~/^\s+$/)) {next;}
my @fields = split(" ", $_);
my ($name, $son, $address) = @fields;
#return something
}
close $fh;
}
Input file format (basically 3 columns):
#Name pet address
Eric cat xxx
Michelle dog yyy
Ben horse zzz
Eric cat aaa
The question doesn't make clear how the data will be used in the code.
The following code sample demonstrates how the data can be read and stored in an anonymous hash referenced by $href. Each $href is then stored in an anonymous array referenced by $aref, which is returned by the parse_cnf() subroutine. (The postfix dereference syntax $ref->@{...} and $ref->@* requires Perl 5.24 or later.)
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my $fname = 'pet_data.txt';
my $data = parse_cnf($fname);
say Dumper($data);
printf "Name: %-12s Pet: %-10s Address: %s\n", $_->@{qw/name pet address/} for $data->@*;
exit 0;
sub parse_cnf {
    my $fname = shift;
    my $aref;
    open my $fh, '<', $fname
        or die "Couldn't open $fname: $!";
    while( <$fh> ) {
        next if /(^\s*$|^#)/;
        my $href->@{qw/name pet address/} = split;
        push $aref->@*, $href;
    }
    close $fh;
    return $aref;
}
Output
$VAR1 = [
{
'address' => 'xxx',
'pet' => 'cat',
'name' => 'Eric'
},
{
'pet' => 'dog',
'name' => 'Michelle',
'address' => 'yyy'
},
{
'name' => 'Ben',
'pet' => 'horse',
'address' => 'zzz'
},
{
'address' => 'aaa',
'pet' => 'cat',
'name' => 'Eric'
}
];
Name: Eric         Pet: cat        Address: xxx
Name: Michelle     Pet: dog        Address: yyy
Name: Ben          Pet: horse      Address: zzz
Name: Eric         Pet: cat        Address: aaa
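Since names can repeat, another option is a hash of arrays keyed by name: each name maps to a list of records, so both "Eric" lines are kept and you can still look people up by name. This is a minimal sketch assuming whitespace-separated columns and `#` comment lines; the sub name `parse_by_name` is hypothetical, and the sample input is read through an in-memory filehandle so the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse "name pet address" lines into a hash of arrays keyed by name,
# so repeated names (e.g. two "Eric" lines) keep every record.
sub parse_by_name {
    my ($fh) = @_;
    my %by_name;
    while ( my $line = <$fh> ) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my ( $name, $pet, $address ) = split ' ', $line;
        push @{ $by_name{$name} }, { pet => $pet, address => $address };
    }
    return \%by_name;
}

# Sample input from the question
my $input = <<'END';
#Name pet address
Eric cat xxx
Michelle dog yyy
Ben horse zzz
Eric cat aaa
END

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $by_name = parse_by_name($fh);
close $fh;

# Print every record, grouped by name
for my $name ( sort keys %$by_name ) {
    for my $rec ( @{ $by_name->{$name} } ) {
        printf "%-10s %-6s %s\n", $name, $rec->{pet}, $rec->{address};
    }
}
```

To use a real file, replace the in-memory handle with `open my $fh, '<', $file or die ...`.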

Elastic Search bulk indexing using perl

I have tried the bulk API of the Perl client to index content in Elasticsearch, but I get an error on the bulk-ingestion line. Here is the code:
my $ifileid=0;
my $dir = '/home/bala/input_files/output';
opendir(DIR, $dir) or die $!;
my @arfiles = readdir (DIR);
closedir(DIR);
print scalar @arfiles." Total files\n";
foreach(@arfiles)
{
    my $file = $_;
    if ($ifileid>1)
    {
        $doc = {index => 'my_index', type => 'blog_post', id => $ifileid, body => {filename => $file, content => 'bala'}};
        push @docs, { create => $doc };
        if ($ibulkid==100)
        {
            # bulk index docs
            my $res = $e->bulk(\@docs);
            if ( $res->{errors} )
            {
                die "Bulk index had issues: " . $json->encode( $res->{errors} );
            }
            $ibulkid=0;
        }
        $ibulkid++;
    }
    $ifileid++;
}
I am getting the following error:
Error => Not a HASH reference at /usr/local/share/perl5/Search/Elasticsearch/Role/Client/Direct.pm line 15.
The usage of bulk above is wrong. bulk() expects named parameters, where body is a reference to an array of alternating actions and documents.
For example, something along these lines should work:
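foreach my $file (@arfiles)
{
    if ($ifileid>1)
    {
        # each document is preceded by its action line
        my $action = {index => {_index => 'my_index', _type => 'blog_post', _id => $ifileid}};
        my $doc = {filename => $file, content => 'bala'};
        push @docs, $action, $doc;
        if ($ibulkid==100)
        {
            # bulk index docs
            my $res = $e->bulk(body => \@docs);
            if ( $res->{errors} )
            {
                die "Bulk index had issues: " . $json->encode( $res->{errors} );
            }
            @docs = ();    # start a new batch so documents are not sent twice
            $ibulkid=0;
        }
        $ibulkid++;
    }
    $ifileid++;
}
# remember to send any documents still left in @docs after the loop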
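The shape of the bulk body and the batching logic can be checked without a running cluster. In this minimal sketch, `send_batch()` is a hypothetical stand-in for `$e->bulk(body => \@body)` that just records what each request would contain, and the batch size is shrunk to 2 (the question uses 100). The `_type` key only applies to older Elasticsearch versions that still have mapping types. Search::Elasticsearch also ships a `bulk_helper` that does this batching for you; check its documentation before rolling your own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @files      = map { "file$_.txt" } 1 .. 5;
my $batch_size = 2;    # documents per request (100 in the question)

# Hypothetical stand-in for $e->bulk(body => \@body): records each request
my @sent;
sub send_batch {
    my ($body) = @_;
    push @sent, [@$body];    # copy, because the caller clears and reuses the array
}

my @body;
my $id = 0;
for my $file (@files) {
    $id++;
    # the bulk body is a flat list: action hash, then its document
    push @body,
        { index => { _index => 'my_index', _type => 'blog_post', _id => $id } },
        { filename => $file, content => 'bala' };

    # two entries (action + document) per indexed document
    if ( @body / 2 == $batch_size ) {
        send_batch( \@body );
        @body = ();    # start a fresh batch
    }
}
send_batch( \@body ) if @body;    # flush the final partial batch

printf "request %d: %d entries\n", $_ + 1, scalar @{ $sent[$_] } for 0 .. $#sent;
```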

Read ini files without section names

I want to make a configuration file that holds some objects, like this (where, of course, none of the parameters can be considered a primary key):
param1=abc
param2=ghj

param1=bcd
param2=hjk
; always the same parameters
This file could be read, let's say, with Config::IniFiles, because it has a direct transcription into an INI file, like this:
[0]
param1=abc
param2=ghj
[1]
param1=bcd
param2=hjk
using, for example, something like
perl -pe 'print "[", ($section++ || 0), "]\n" if m/^\s*$/ || !$section'
and then finish with
open my $fh, '<', "/path/to/config_file.ini" or die $!;
$cfg = Config::IniFiles->new( -file => $fh );
(...parse here the sections starting with 0.)
But this is becoming quite complex, so I have some questions:
(A) Is there a way to transform $fh, so that it is not necessary to run the Perl one-liner BEFORE reading the file sequentially? That is, to transform the file while Perl is actually reading it.
or
(B) Is there a module to read my wonderful flat database, or something approaching it? I have been told that GNU coreutils does this kind of flat-file reading, but I cannot remember how.
You can create a simple subclass of Config::INI::Reader:
package MyReader;
use strict;
use warnings;
use base 'Config::INI::Reader';
sub new {
    my $class = shift;
    my $self = $class->SUPER::new( @_ );
    $self->{section} = 0;
    return $self;
}
sub starting_section { 0 };
sub can_ignore { 0 };
sub parse_section_header {
    my ( $self, $line ) = @_;
    return $line =~ /^\s*$/ ? ++$self->{section} : undef ;
}
1;
With your input this gives:
% perl -MMyReader -MData::Dumper -e 'print Dumper( MyReader->read_file("cfg") )'
$VAR1 = {
'1' => {
'param2' => 'hjk',
'param1' => 'bcd'
},
'0' => {
'param2' => 'ghj',
'param1' => 'abc'
}
};
You can use a variable reference instead of a file name to create a filehandle that reads from it:
use strict;
use warnings;
use autodie;
use Config::IniFiles;
my $config = "/path/to/config_file.ini";
my $content = do {
local $/;
open my $fh, "<", $config;
"\n". <$fh>;
};
# one liner replacement
my $section = 0;
$content =~ s/^\s*$/ "\n[". $section++ ."]" /mge;
open my $fh, '<', \$content;
my $cfg = Config::IniFiles->new( -file => $fh );
# ...
You can store the modified data in a real file or a string variable, but I suggest that you use paragraph mode by setting the input record separator $/ to the empty string, like this:
use strict;
use warnings;
{
local $/ = ''; # Read file in "paragraphs"
my $section = 0;
while (<DATA>) {
printf "[%d]\n", $section++;
print;
}
}
__DATA__
param1=abc
param2=ghj

param1=bcd
param2=hjk
output
[0]
param1=abc
param2=ghj
[1]
param1=bcd
param2=hjk
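For question (B), paragraph mode can also skip the INI step entirely and build the records directly. This is a minimal sketch, not a CPAN module: it assumes one `key=value` pair per line, records separated by blank lines, and `;` comment lines; the sub name `read_records` is hypothetical, and the data is read from an in-memory filehandle so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read "key=value" records separated by blank lines into an array of hashes
sub read_records {
    my ($fh) = @_;
    my @records;
    local $/ = '';    # paragraph mode: each read returns one blank-line-separated block
    while ( my $para = <$fh> ) {
        my %rec;
        for my $line ( split /\n/, $para ) {
            next if $line =~ /^\s*(?:;|$)/;              # skip ';' comments and blank lines
            my ( $key, $value ) = split /=/, $line, 2;    # split on the first '=' only
            $rec{$key} = $value;
        }
        push @records, \%rec if %rec;
    }
    return \@records;
}

my $data = <<'END';
param1=abc
param2=ghj

param1=bcd
param2=hjk
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $records = read_records($fh);
close $fh;

# Show each record with its position, like the [0] and [1] sections above
for my $i ( 0 .. $#$records ) {
    my $r = $records->[$i];
    print "[$i] ", join( ', ', map { "$_=$r->{$_}" } sort keys %$r ), "\n";
}
```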
Update
If you read the file into a string, adding section identifiers as above, then you can read the result directly into a Config::IniFiles object using a string reference, for instance
my $config = Config::IniFiles->new(-file => \$modified_contents);
This example shows the tie interface, which results in a Perl hash that contains the configuration information. I have used Data::Dump only to show the structure of the resultant hash.
use strict;
use warnings;
use Config::IniFiles;
my $config;
{
open my $fh, '<', 'config_file.ini' or die "Couldn't open config file: $!";
my $section = 0;
local $/ = '';
while (<$fh>) {
$config .= sprintf "[%d]\n", $section++;
$config .= $_;
}
};
tie my %config, 'Config::IniFiles', -file => \$config;
use Data::Dump;
dd \%config;
output
{
# tied Config::IniFiles
"0" => {
# tied Config::IniFiles::_section
param1 => "abc",
param2 => "ghj",
},
"1" => {
# tied Config::IniFiles::_section
param1 => "bcd",
param2 => "hjk",
},
}
You may want to perform operations on a stream of objects (as in PowerShell) instead of a stream of text, like this:
use strict;
use warnings;
use English;
sub operation {
# do something with objects
...
}
{
local $INPUT_RECORD_SEPARATOR = '';
# object are separated with empty lines
while (<STDIN>) {
# key value
my %object = ( m/^ ([^=]+) = ([[:print:]]*) $ /xmsg );
# key cannot have = included, which is the delimiter
# value are printable characters (one line only)
operation( \%object );
}
}
I also like the other answers.

Perl Hash + File + While

The idea is to extract the URLs from a file, together with their descriptions, and store them in a hash.
This is the content of the file /home/opmeitle/files-pl/bookmarks2:
}, {
"date_added": "12989744094664781",
"id": "1721",
"name": "Perl DBI - dbi.perl.org",
"type": "url",
"url": "http://dbi.perl.org/"
}, {
"date_added": "12989744373130384",
"id": "1722",
"name": "DBD::mysql - MySQL driver for the Perl5 Database Interface (DBI) - metacpan.org",
"type": "url",
"url": "https://metacpan.org/module/DBD::mysql"
}, {
Now, the Perl code:
use strict;
open(FILE, '/home/opmeitle/files-pl/bookmarks2');
my @lines = <FILE>;
my @list55;
my $count = 1;
my $n = 0;
my %hash=(); #$hash{$lines[$n]}=$lines[$n];
while ($lines[$n]) {
if ($lines[$n] =~ /(http:|https:|name)/) {
if ($lines[$n] =~ s/("|: |,|id|url|name|\n)//g) {
if ($lines[$n] =~ s/^\s+//){
if ($lines[$n] =~ /http:|https/){
$hash{$lines[$n]} = '';
}
else {
$hash{$n} = $lines[$n];
}
}
}
}
$n++;
$count++;
}
close(FILE);
# print hash
my $key;
my $value;
while( ($key,$value) = each %hash){
print "$key = $value\n";
}
Result after executing the script:
http://dbi.perl.org/ =
https://metacpan.org/module/DBD::mysql =
3 = Perl DBI - dbi.perl.org
9 = DBD::mysql - MySQL driver for the Perl5 Database Interface (DBI) - metacpan.org
But I need something like this:
http://dbi.perl.org/ = Perl DBI - dbi.perl.org
Perl DBI - dbi.perl.org = DBD::mysql - MySQL driver for the Perl5 Database Interface (DBI) - metacpan.org
Thanks for your answers.
As @amon hinted, Chrome bookmarks are stored in JSON format, for which there are several good modules on CPAN.
use strict;
use warnings;
use JSON;
my $file = '/home/opmeitle/files-pl/bookmarks2';
open my $fh, '<', $file or die "$file: $!\n";
my $inhash = decode_json(join '', <$fh>);
close $fh;
my %outhash = map traverse($_), values %{ $inhash->{roots} };
sub traverse
{
my $hashref = shift;
if (exists $hashref->{children}) {
return map traverse($_), @{ $hashref->{children} };
} else {
return $hashref->{url} => $hashref->{name};
}
}
Now %outhash has the data you wanted.
EDIT: to help understand what's going on here:
use Data::Dumper;
print Dumper($inhash); # pretty-print the structure returned by decode_json
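To see the recursion work end to end, here is a self-contained version of the same `traverse` idea run on a tiny, hand-made sample shaped like a Chrome bookmarks file. The sample is an assumption (real files contain many more fields, which `traverse` simply ignores), and it uses the core JSON::PP module instead of JSON so it runs without CPAN installs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Hand-made sample: folders have "children", leaves have "url" and "name"
my $text = <<'END';
{
  "roots": {
    "bookmark_bar": {
      "name": "Bookmarks bar", "type": "folder",
      "children": [
        { "name": "Perl DBI - dbi.perl.org", "type": "url", "url": "http://dbi.perl.org/" },
        { "name": "Perl", "type": "folder", "children": [
          { "name": "DBD::mysql - metacpan.org", "type": "url",
            "url": "https://metacpan.org/module/DBD::mysql" }
        ] }
      ]
    },
    "other": { "name": "Other bookmarks", "type": "folder", "children": [] }
  }
}
END

# Folders are walked recursively; leaves yield url => name pairs
sub traverse {
    my $node = shift;
    return map { traverse($_) } @{ $node->{children} } if exists $node->{children};
    return ( $node->{url} => $node->{name} );
}

my $inhash  = decode_json($text);
my %outhash = map { traverse($_) } values %{ $inhash->{roots} };

print "$_ = $outhash{$_}\n" for sort keys %outhash;
```

Nested folders are handled because each folder simply returns the flattened list from its children.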
As others have said, the best thing to do is to load the JSON data into a Perl datastructure. This is easily done using the JSON module. Before we can do this, we need to read in the file. There are two ways to do this. The non-CPAN way:
# always ...
use strict;
use warnings;
my $file = '/home/opmeitle/files-pl/bookmarks2';
my $text = do {
open my $fh, '<', $file or die "Cannot open $file: $!\n";
local $/; #enable slurp
<$fh>;
};
or the CPAN way
# always ...
use strict;
use warnings;
use File::Slurp;
my $file = '/home/opmeitle/files-pl/bookmarks2';
my $text = read_file $file;
Once you have the file read in, then decode
use JSON;
my $data = decode_json $text;
Please post a whole file and a better description of what you want and I would be glad to comment on a more formal way of traversing the datastructure.

passing variable to mongodb query in perl

I want to query MongoDB for a list of nicknames read from a text file.
#!/usr/bin/perl
use strict;
use warnings;
use MongoDB;
# Open file
print "--> Read file\n";
open( INPUT, "<userlist.txt" ) or die "Could not open the file: $!\n";
print "--> Read INPUT OK\n";
open( OUTPUT, ">>outfile.txt" ) or die "Could not open the file: $!\n";
print "--> Read OUTPUT OK\n";
# MongoDB parameter
my $mongoHost = localhost;
my $mongoPort = 12345;
my $conn = MongoDB::Connection->new( "host" => "$mongoHost", "port" => $mongoPort ); # Open connection
my $db = $conn->mylist; # Connect to database
my $user_stats = $db->user_stats; # Choose a collection
print "--> Connect to MongoDB\n";
# Read through line
foreach my $line ( <INPUT> ) {
# Extract content
chomp( $line ); # Remove newline
print "$line\n";
my $statsResult = $user_stats->find( { nickname => '$line' } );
while ( my $obj = $statsResult->next ) {
print $obj->{ "nickname" } . ";";
print $obj->{ "total" } . "\n";
}
}
close( OUTPUT );
close( INPUT );
print "--> End of Code\n";
exit 0;
It seems to fail to recognise the variable $line in the line my $statsResult = $user_stats->find( { nickname => '$line' } );
It works if I replace $line with a string like mynickname. The print statement just before it works OK.
Am I missing something here?
You're using single quotes in your line:
my $statsResult = $user_stats->find( { nickname => '$line' } );
This means that the database is searched for the literal string $line, not the contents of the variable. Remove the single quotes (nickname => $line) and you should be fine.
Also, here's a nice tutorial on the different forms of quoting in Perl, which explains why single quotes are different from double quotes, what qq means, etc.
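The difference is easy to see without MongoDB at all. This small sketch builds the query hash both ways; `nickname` and `mynickname` come from the question, and the hashes stand in for the argument passed to `find`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'mynickname';

my $single = '$line';    # single quotes: no interpolation, the literal text $line
my $double = "$line";    # double quotes: interpolates the variable's value

# The query hash as built in the question vs. the fix
my %wrong = ( nickname => '$line' );
my %right = ( nickname => $line );    # no quotes needed around a plain variable

print "single: $single\n";
print "double: $double\n";
print "wrong query value: $wrong{nickname}\n";
print "right query value: $right{nickname}\n";
```

With the single-quoted version, MongoDB is asked for documents whose nickname is literally `$line`, which matches nothing.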