im interested in performing multiple database actions in parallel. I have played with Perl Parallel::ForkManager but not used it with any databases yet. I have read that database connectivity is not supported very well with this. Does anyone have experience with this?
As an example i would probably be spawning a system call(which does the DBI work) NOT raw code, i.e.
#!/opt/local/bin/perl -w
use strict;
use Parallel::ForkManager;
$| = 1;
my $max_procs = 10;
my $pm = new Parallel::ForkManager($max_procs);
for (my $var = 0; $var <= 10; $var++) {
my $pid = $pm->start('proc'.$var) and next;
sleep ( 2 );
system( "./DBworker.pl $var" );
$pm->finish(0);
}
print "Waiting for child procs\n";
$pm->wait_all_children;
print "complete!\n";
If the work is being done by other programs, there is no danger to forking. The danger comes when you open a connection to the database and then fork. The child can't reuse the parents connection; however, take a look at DBIx::Connector, it handles the things you need to do after forking for you and running multiple programs is generally not the right answer.
#!/usr/bin/perl
use strict;
use warnings;
use DBIx::Connector;
use Parallel::ForkManager;
my $dsn = "dbi:SQLite:dbname=foo.db";
my $user = "";
my $pass = "";
my $conn = DBIx::Connector->new($dsn, $user, $pass,
{
AutoCommit => 0,
PrintError => 0,
RaiseError => 1,
ChopBlanks => 1,
FetchHashKeyName => 'NAME_lc',
}
);
END { unlink "foo.db" }
#setup table
$conn->run(fixup => sub {
my $dbh = $_;
$dbh->do("create table foo ( id integer, name char(35) )");
my $sth = $dbh->prepare("insert into foo (id, name) values (?, ?)");
while (<DATA>) {
chomp;
$sth->execute(split /,/);
}
});
my $pm = Parallel::ForkManager->new(3);
my $sth = $conn->dbh->prepare("select * from foo where id = ?");
for my $id (1 .. 3) {
next if $pm->start;
$sth->execute($id);
while (my $row = $sth->fetchrow_hashref) {
print "$id saw $row->{id} => $row->{name}\n";
}
$pm->finish;
}
$pm->wait_all_children;
print "done\n";
__DATA__
1,foo
2,bar
3,baz
Related
I'm trying to get rid of the if statements at the end of a subroutine I'm writing to handle SELECT queries:
sub select_query {
my ($params, $query, $return_type) = #_;
my $qh = $dbx->prepare($query);
my $param_count = 1;
foreach my $param (#$params) {
$qh->bind_param($param_count++, $param);
}
$qh->execute;
if ($return_type eq 'fetchrow_array') {
return $qh->fetchrow_array;
}
if ($return_type eq 'fetchall_arrayref') {
return $qh->fetchall_arrayref;
}
... AND SO ON ...
}
I'm familiar with the idea of a dispatch table to call different subroutines. What code could I use to efficiently call the various dbi methods on the $qh handle?
All you need is $qh->$return_type(...).
If you wish to validate, you can use a trivial lookup table.
my #valid_return_types = qw( fetchrow_array ... );
my %valid_return_types = map { $_ => 1 } #valid_return_types;
die "..." if !$valid_return_types{$return_type};
You will need to define the dispatch table after creating the statement handle.
use strict;
use warnings;
use DBI;
use Data::Dumper;
my $dbh = DBI->connect("DBI:mysql:mysql:localhost", 'user', 'password');
my $sth = $dbh->prepare("select * from user");
my $return_type = 'fetchall_arrayref';
my %dispatch = (fetchall_arrayref => sub {return $sth->fetchall_arrayref},
fetchrow_array => sub {return $sth->fetchrow_array});
$sth->execute;
print Dumper $dispatch{$return_type}->();
UPDATE:
Actually, let me make a correction. You can, if you wish, define the dispatch table prior creating the statement handle.
my %dispatch = (fetchall_arrayref => sub {return $_[0]->fetchall_arrayref},
fetchall_hashref => sub {return $_[0]->fetchall_hashref('User')},
fetchrow_array => sub {return $_[0]->fetchrow_array});
my $dbh = DBI->connect("DBI:mysql:mysql:localhost", 'root', 'password') or die;
my $sth = $dbh->prepare("select * from user");
foreach my $type (qw(fetchall_arrayref fetchall_hashref fetchrow_array)) {
$sth->execute;
print Dumper $dispatch{$type}->($sth);
}
I need to query different LDAP servers in perl via Net::LDAP. I have something that works well. However, in an attempt to speed up things, I tried to query the different servers in parallel, using Parallel::Forkmanager - and things do not work when I do that.
I get the following types of errors:
decode error 02<=>30 0 8 at /Users/myname/perl5/perlbrew/perls/perl-5.18.2/lib/site_perl/5.18.2/Convert/ASN1/_decode.pm line 113, <> line 18.
decode error 43<=>30 0 8 at /Users/myname/perl5/perlbrew/perls/perl-5.18.2/lib/site_perl/5.18.2/Convert/ASN1/_decode.pm line 113, <> line 25.
at the line that gets the search response:
$mesg = $ldap->search( base => $dn, filter => '(CN=*)');
I am puzzled.
Telling it in other other words, why does this fail:
use Net::LDAP;
use Parallel::Forkmanager;
...; # bind LDAP servers
while (<>) {
chop;
my $dn = $_;
foreach my $ldap (#servers) {
my $pid;
$pid = $pm->start and next; # do the fork
print $dn, $pid;
my $mesg;
try {
$mesg = $ldap->search( base => $dn, filter => '(CN=*)');
} catch {
...;
}
$pm->finish;
}
}
while this:
use Net::LDAP;
...; # bind LDAP servers
while (<>) {
chop;
my $dn = $_;
foreach my $ldap (#servers) {
print $dn;
my $mesg;
try {
$mesg = $ldap->search( base => $dn, filter => '(CN=*)');
} catch {
...;
}
}
}
works perfectly?
Whilst forking doesn't have quite the same thread safety problems of threading - there are still a few places you have gotchas. I think this is what's biting you - your Net::LDAP objects are created in the parent thread, but then (effectively) cloned to each when you fork.
Which means in your code - there's a very real possibility that if you've got the list of names coming in fast enough, that a new fork will try to reuse an existing Net::LDAP connection before a previous one is finished with it.
The easy way of preventing this is call wait_all_children to ensure all your parallel LDAP queries are finished before the next one starts.
If you put your LDAP bind within the ForkManager loop, do you still have the same problem? I appreciate that's a potential overhead as you'll be binding each iteration, but if that addresses it, I'd suggest that it's because Net::LDAP is sharing the same file descriptors between forks.
The next best solution there would be to adopt a 'worker' model, where you've got a bunch of 'workers' each with their on LDAP connections to do the querying. That's easier with threading, than forking - goes a bit like this:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
sub server_worker {
my ( $hostname, $q_ref ) = #_;
## connect LDAP to $hostname;
while ( my $dn = $$q_ref->dequeue ) {
#query $dn
}
}
my #queues;
foreach my $server (#list_of_servers) {
my $server_q = Threads::Queue->new();
push( #queues, $server_q );
threads->create( \&server_worker, $hostname, \$server_q );
}
while ( my $dn = <STDIN> ) {
chomp($dn);
foreach my $q (#queues) {
$q->enqueue($dn);
}
}
foreach my $q ( #queues ) {
$q -> end;
}
foreach my $thr ( threads->list ) {
$thr->join();
}
Doing something similar with forking should work:
#!/usr/bin/perl
use strict;
use warnings;
use IO::Pipe;
use Parallel::ForkManager;
use Net::LDAP;
my #list_of_servers = qw ( servername servenama anotherserver );
my $pm = Parallel::ForkManager -> new ( scalar #list_of_servers );
my %pipe_for;
foreach my $server ( #list_of_servers ) {
my $pipe = IO::Pipe -> new();
my $pid = pm -> start;
if ( $pid ) {
print "$$: parent\n";
$pipe -> writer -> autoflush;
$pipe_for{$server} = $pipe;
}
else {
print "$$ child connecting to $server\n";
$pipe -> reader -> autoflush;
close ( STDIN ); #because this is a child.
#Net::LDAP setup
while ( my $item = <$pipe> ) {
chomp ( $item );
#ldap_search $item;
}
}
$pm -> finish;
}
And then send stuff:
for my $number ( 1..10 ) {
foreach my $pipe ( values %pipe_for ) {
print {$pipe} "test$number\n";
}
}
$pm -> wait_all_children();
Edit: Note - autoflush is important, otherwise the IO buffers and doesn't look like it's working. I'm pretty sure closing STDIN is probably a good idea in the child, but perhaps not vitally necessary if they don't use it.
I have the following script that runs a command and puts the data in a DB. I need to account for the possibility of being asked for a password "password:" some of the time. How do I wrap an expect call into this?
#!/usr/software/bin/perl
use strict;
use DatabaseLib;
use Data::Dumper;
use Expect;
#Connect to database
my $dbh = DBI->connect($DB_CONNECT_STRING, $DB_USER, $DB_PASSWORD, { RaiseError => 1, AutoCommit => 1 })
or die "failed to connect to database: $DB_CONNECT_STRING";
my $expect = Expect->new;
my %burtHash;
my #cols = qw/date_create sub_by impact date-lastmod lastmod-by bug_rel case_score state s p type subtype subteam found_by target_release/;
my #burtInfo = `mycommand`;
my $timeout = 20;
my $password = "password";
while(my $ele = shift(#burtInfo)){
my ($index, #data) = split(/\s+/, $ele);
for my $i(0 .. $#cols){
$burtHash{$index}->{$cols[$i]} = shift(#data);
}
for my $id (keys %burtHash){
my %burt_details;
for my $col (keys %{$burtHash{$id}} ) {
$burt_details{$col} = $burtHash{$id}->{$col};
}
if ( $id =~ /\d+/) {
burt_update(
$dbh,
$id ,
\%burt_details,
);
}
}
}
I think I just need to put in something like this and call it, but i'm not sure where/how:
$expect->expect($timeout,
[ qr/password:/i, #/
sub {
my $self = shift;
$self->send("$password\n");
exp_continue;
}
]);
You're not using $expect anywhere there. You have to run your command via $expect->spawn so that your Expect object can handle things. And then you'll need some way of gathering its output (I'm thinking using $expect->log_file(...) to set the log to a string filehandle or something).
Once you're using $expect->spawn, then you can insert your password check. But there's no way you can do this with qx (the backticks).
I am trying to have a pool of shared connections that can be accessed by Net::Server instances. Unfortunately IPC::Shareable does not allow me to store the connections as they are code references. This is a stripped down version of the code:
use IPC::Shareable (':lock');
use parent 'Net::Server::Fork';
use MyConnectClass;
sub login {
return MyConnectClass->new();
};
my %connection;
tie %connection, 'IPC::Shareable', 'CONN', {
'create' => 1,
'exclusive' => 0,
'mode' => 0666,
'destroy' => 'yes',
}
or croak 'Can not tie connection variable';
sub add_connection {
my $id = shift(#_);
my $con = shift(#_);
$connection{$id} = $con;
};
sub get_connection {
my $id = # .. find unused connection
return $connection{$id};
}
sub process_request {
my $self = shift(#_);
eval {
my $connection = get_connection();
my $line = <STDIN>;
# .. use $connection to fetch data for user
};
};
for (my $i=0; $i<10; $i++) {
add_connection($i, &login);
};
main->run(
'host' => '*',
'port' => 7000,
'ipv' => '*',
'max_server' => 3,
};
Unfortunately the program dies after the first login: 'Can't store CODE items at ../../lib/Storable.pm'. This happens even when hiding $connection in an anonymous array. I am looking for an alternative to utilize the pool.
I appreciate your support
I am unable to propose an alternative module, but make a suggestion which may or not be of use. While you cannot store CODE, you can store strings which can be evaluated to run. would it be possible to pass a reference to the string q!&login! which you can dereference call after being assigned to $connection. ?
#!/usr/bin/perl
use warnings;
use strict;
use Storable;
my $codestring = q'sub { q^japh^ };' ;
#my $codestring = q'sub { return MyConnectClass->new(); }';
#
# for (0..9){ add_connection($i, $codestring) }
open my $file, '>', '.\filestore.dat' or die $!;
store \ $codestring, $file;
close $file;
open $file, '<', '.\filestore.dat' or die " 2 $!";
my $stringref = retrieve $file; # my $con = get_connection()
close $file;
print &{ eval $$stringref } ; # &{eval $$con} ;
exit 0; # my $line = <STDIN>; ...
Does one of these two ways have to be preferred or is it only a matter of taste?
#!/usr/bin/env perl
use warnings;
use strict;
use DBI;
my $db = 'sqlite_db';
#################### A ####################
sub get_database_handle {
my ( $db ) = #_;
my $dbh;
eval {
$dbh = DBI->connect( "DBI:SQLite:$db", '', '', {...} )
or die DBI->errstr;
};
if ( $# ) {
print $#;
return;
}
return $dbh;
}
DATABASES: while ( 1 ) {
# choose a database from a list of databases
# ...
my $dbh = get_database_handle( $db );
next DATABASES if not defined $dbh;
# ...
# do something with the database
}
#################### B ####################
sub get_database_handle {
my ( $db ) = #_;
my $dbh = DBI->connect( "DBI:SQLite:$db", '', '', {...} )
or die DBI->errstr;
return $dbh;
}
DATABASES: while ( 1 ) {
# choose a database from a list of databases
# ...
my $dbh;
eval { $dbh = get_database_handle( $db ); };
if ( $# ) {
print $#;
next DATABASES;
}
# ...
# do something with the database
}
Why eval at all? What are you going to do when you can't get a database handle?
At the very least, the subroutine should either return a database handle or die, so no eval in there.
If you really have something productive to do when there is a database error, then eval outside of the subroutine. Not necessary around just the subroutine, could be even wider scope, as appropriate for your error handling logic.
But if all you want is terminate the program and print an error, just let the exception bubble up.
It really makes no sense to use RaiseError => 1 at all if your code is going to look like that. If you want to keep that code layout, use RaiseError => 0 instead. (You can always turn it on later using $dbh->{RaiseError} = 1;`.)
sub get_database_handle {
my ( $db ) = #_;
return DBI->connect("DBI:SQLite:$db", '', '', {
...
RaiseError => 0,
PrintError => 1,
});
}
for my $db ( ... ) {
my $dbh = get_database_handle( $db )
or next;
...
}
That said, I suggest that you continue using RaiseError => 1, but that you change your code layout instead. Specifically, you should widen the scope of the eval.
sub get_database_handle {
my ( $db ) = #_;
return DBI->connect("DBI:SQLite:$db", '', '', {
...
RaiseError => 1,
PrintError => 0,
});
}
for my $db ( ... ) {
if (!eval {
my $dbh = get_database_handle( $db );
...
1 # No exception
}) {
warn("Error processing database $db: $#");
}
}
This will catch any errors, not just database connection errors.
It depends on the preferred way of handling errors in the rest of your project.
If you plan to use exceptions, let your function throw them.
If you are going to handle errors manually via conditionals, don't throw (eval inside).
I myself prefer exceptions. They are loud (you know something is broken!), plus stack traces via Carp::confess/Carp::longmess and $SIG{__DIE__} are a nice bonus in case of a large codebase.
Here's an (un)success story. A month or two ago we rolled out broken code that silently corrupted data due to unchecked return value from DB-handling function. It was in production for a day. Resurrecting the data was a world of pain.