Perl running simultaneous routines - perl

I am running trying to run two sub routines at once in perl. What is the best way I can about doing that? For example:
sub 1{
print "im running";
}
sub 2{
print "o hey im running too";
}
How can I execute both routines at once?

Use threads.
use strict;
use warnings;
use threads;
sub first {
my $counter = shift;
print "I'm running\n" while $counter--;
return;
}
sub second {
my $counter = shift;
print "And I'm running too!\n" while $counter--;
return;
}
my $firstThread = threads->create(\&first,15); # Prints "I'm running" 15 times
my $secondThread = threads->create(\&second,15); # Prints "And I'm running too!"
# ... 15 times also
$_->join() foreach ( $firstThread, $secondThread ); # Cleans up thread upon exit
What you should pay attention to is how the printing is interleaved irregularly. Don't try to base any calculations on the false premise that the execution order is well-behaved.
Perl threads can intercommunicate using:
shared variables (use threads::shared;)
queues (use Thread::Queue;)
semaphores (use Thread::Semaphore;)
See perlthrtut for more information and an excellent tutorial.

I actually didn't realise that Perl can do this, but what you need is multithreading support:
http://search.cpan.org/perldoc?threads
Either that, or fork two processes, but that would be a bit harder to isolate the invocation of the subroutine.

Related

Trying to use fork to do a seemingly simple task, but failing miserably

So, basically I have a very large array that I need to read data from. I want to be able to do this in parallel; however, when I tried, I failed miserably. For the sake of simplicity, let's say I have an array with 100 elements in it. My idea was to partition the array into 10 equals parts and try to read them in parallel (10 is arbitrary, but I don't know how many processes I could run at once and 10 seemed low enough). I need to return a computation (new data structure) based off of my readings from each partition, but I am NOT modifying anything in the original array.
Instead of trying the above exactly, I tried something simpler, but I did it incorrectly, because it didn't work in any capacity. So, then I tried to simply use child processes to push to a an array. The code below is using Time::HiRes to see how much faster I can get this to run using forking as opposed to not, but I'm not at that point yet (I'm going to be testing that when I have closer to a few million entries in my array):
use strict;
use warnings;
use Time::HiRes;
print "Starting main program\n";
my %child;
my #array=();
my $counter=0;
my $start = Time::HiRes::time();
for (my $count = 1; $count <= 10; $count++)
{
my $pid = fork();
if ($pid)
{
$child{$pid}++;
}
elsif ($pid == 0)
{
addToArray(\$counter,\#array);
exit 0;
}
else
{
die "couldnt fork: $!\n";
}
}
while (keys %child)
{
my $pid = waitpid(-1,0);
delete $child{$pid};
}
my $stop = Time::HiRes::time();
my $duration = $stop-$start;
print "Time spent: $duration\n";
print "Size of array: ".scalar(#array)."\n";
print "End of main program\n";
sub addToArray
{
my $start=shift;
my $count=${$start};
${$start}+=10;
my $array=shift;
for (my $i=$count; $i<$count +10; $i++)
{
push #{$array}, $i;
}
print scalar(#{$array})."\n";
}
NB: I used push in lieu of ${$array}[$i]=$i, because I realized that my $counter wasn't actually updating, so that would never work with this code.
I assume that this doesn't work because the children are all copies of the original program and I'm never actually adding anything to the array in my "original program". On that note, I'm very stuck. Again, the actual problem that I'm actually trying to solve is how to partition my array (with data in it) and try to read them in parallel and return a computation based off of my readings (NOTE: I'm not going to modify the original array), but I'm never going to be able to do that if I can't figure out how to actually get my $counter to update. I'd also like to know how to get the code above to do what I want it to do, but that's a secondary goal.
Once I can get my counter to update correctly, is there any chance that another process would start before it updates and I wouldn't actually be reading in the entire array? If so, how do I account for this?
Please, any help would be much appreciated. I'm very frustrated/stuck. I hope there is an easy fix. Thanks in advance.
EDIT: I attempted to use Parallel::ForkManager, but to no avail:
#!/usr/local/roadm/bin/perl
use strict;
use warnings;
use Time::HiRes;
use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new(10);
for (my $count = 1; $count <= 10; $count++)
{
my $pid = $pm->start and next;
sub1(\$counter,\#array);
$pm->finish; # Terminates the child process
}
$pm->wait_all_children;
I didn't include the other extraneous stuff, see above for missing code/sub... Again, help would be much appreciated. I'm very new to this and kind of need someone to hold my hand. I also tried to do something with run_on_start and run_on_finish, but they didn't work either.
Your code has two issues: Your child processes share no data, and you would have a race condition if forked processes would share data. The solution is to use threads. Any possibility for race conditions can be eliminated by partitioning the data in the parent thread, and of course, by not using shared data.
Threads
Threads in Perl behave similar to forking: by default, there is no shared memory. This makes using threads quite easy. However, each thread runs it own perl interpreter, which makes threads quite costly. Use sparingly.
First, we have to activate threading support via use threads. To start a thread, we do threads->create(\&code, #args), which returns a thread object. The code will then run in a separate thread, and will be invoked with the given arguments. After the thread has finished execution, we can collect the return value by calling $thread->join. Note: The context of the threaded code is determined by the create method, not by join.
We could mark variables with the :shared attribute. Your $counter and #array would be examples for this, but it is generally better to pass explicit copies of data around than to use shared state (disclaimer: from a theoretical standpoint, that is). To avoid race conditions with the shared data, you'd actually have to protect your $counter with a semaphore, but again, there is no need for shared state.
Here is a toy program showing how you could use threads to parallelize a calculation:
use strict;
use warnings;
use threads;
use 5.010; # for `say`, and sane threads
use Test::More;
# This program calculates differences between elements of an array
my #threads;
my #array = (1, 4, 3, 5, 5, 10, 7, 8);
my #delta = ( 3, -1, 2, 0, 5, -3, 1 );
my $number_of_threads = 3;
my #partitions = partition( $#array, $number_of_threads );
say "partitions: #partitions";
for (my $lower_bound = 0; #partitions; $lower_bound += shift #partitions) {
my $upper_bound = $lower_bound + $partitions[0];
say "spawning thread with [#array[$lower_bound .. $upper_bound]]";
# pass copies of the values in the array slice to new thread:
push #threads, threads->create(\&differences, #array[$lower_bound .. $upper_bound]);
# note that threads->create was called in list context
}
my #received;
push #received, $_->join for #threads; # will block until all are finished
is_deeply \#received, \#delta;
done_testing;
# calculates the differences. This doesn't need shared memory.
# note that #array could have been safely accessed, as it is never written to
# If I had written to a (unshared) variable, these changes would have been thread-local
sub differences {
say "Hi from a worker thread, I have ", 0+#_, " elements to work on";
return map $_[$_] - $_[$_-1], 1 .. $#_;
# or more readable:
# my #d;
# for my $i (1 .. $#_) {
# push #d, $_[$i] - $_[$i-1];
# }
# return #d;
}
# divide workload into somewhat fair parts, giving earlier threads more work
sub partition {
my ($total, $parts) = #_;
my $base_size = int($total / $parts);
my #partitions = ($base_size) x $parts;
$partitions[$_-1]++ for 1 .. $total - $base_size*$parts;
return #partitions;
}
A note on the number of threads: This should depend on the number of processors of your system. If you have four cores, more than four threads don't make much sense.
If you're going to use child processes after forking, each child process is autonomous and has its own copy of the data in the program as of the time it was forked from the main program. The changes made by the child in its own memory have no effect on the parent's memory. If you need that, either you need a threading Perl and to use threads, or you need to think again — maybe using shared memory, but locating Perl data into the shared memory might be tricky.
So, one option is to read all the data into memory before forking off and having the children work on their own copies of the data.
Depending on the structure of the problem, another possibility might be to have each child read and work on a portion of the data. This won't work if each child must have access to all the data.
It isn't clear how much speed up you'll get through threading or forking if the threads or processes are all tied up reading the same file. Getting the data into memory may be best treated as a single-threaded (single-tasking) operation; the parallelism can spring into effect — and yield benefits — once the data is in memory.
There are some CPAN modules that makes your life easier. One of them is Parallel::ForkManager, which is a simple parallel processing fork manager
So, after my struggle, here's the fix:
EDIT: THIS DOES NOT ACCOMPLISH WHAT I WANTED TO DO
#!/usr/local/roadm/bin/perl
use strict;
use warnings;
use Time::HiRes;
use Parallel::ForkManager;
print "Starting main program\n";
my #array=();
my $counter=0;
my $start = Time::HiRes::time();
my $max_processes=20;
my $partition=10;
my $max_elements=100;
my $pm = Parallel::ForkManager->new($max_processes);
$pm->run_on_start( sub {
my ($pid, $exit_code, $ident) = #_;
sub1(\$counter,\#array);
});
while ($counter < $max_elements)
{
my $pid = $pm->start and next;
$pm->finish; # Terminates the child process
}
$pm->wait_all_children;
my $stop = Time::HiRes::time();
my $duration = $stop-$start;
print "Time spent: $duration\n";
print "Size of array: ".scalar(#array)."\n";
print "\nEnd of main program\n";
sub sub1 {
my $start=shift;
my $count=${$start};
${$start}+=$partition;
my $array=shift;
for (my $i=$count; $i<$count + $partition; $i++)
{
push #{$array}, $i;
}
return #{$array};
}

Problems with joining threads

I've got some issue with a part of my perl script, bothering me for days now. To summarize the purpose is to read in a large file in chunks and do some operation on the input stream (not relevant for my question). When I first implemented it, I just looped over the file and then did some stuff on it, like this:
while (read FILE, $buffer, $chunksize){
callSomeOperation($buffer);
# Do some other stuff
}
Unfortunately the file is really big and the operation somehow complex with many function calls, therefore this led to steadily increasing Memory perl couldn't allocate memory anymore and the script failed. So I did some investigation and tried several things to minimize the memory overhead (defined variables outside the loop, set to undef and so on), which led the allocated memory size increasing slower, but at the end still failed. (And if I figured out right, perl giving back memory to the OS is sth. that won't happen in practice.)
So I decided to nest the function call and all its definition in a subthread, wait for its finish, join and then call the thread again with the next chunk:
while (read FILE, $buffer, $chunksize){
my $thr = threads->create(\&thrWorker,$buffer);
$thr->join();
}
sub thrWorker{
# Do the stuff here!
}
Which might have been a solution, if the thread would join! But it actually does not. If I run it with $thr->detach(); everything works fine, besides I get hundrets of threads at the same time, which is not a good idea, and in this case, I need to run them consecutively.
So I took some Investigation on this join issue and got some voices that ther might be an issue with perl 5.16.1 so I updated to 5.16.2 but it still never joins. Anywhere in a Mailing list I cant remember I read from somebody managed to get Threads to join with CPAN module Thread::Queue but this didn't worked for me either.
So I gave up with threads and tried to fork this thing. But with fork it seems like the total number of "forks" is limited? Anyway it went fine till the 13th to 20th iteration and then gave up with the message it couldn't fork anymore.
my $pid = fork();
if( $pid == 0 ){
thrWorker($buffer);
exit 0;
}
I also tried it with CPAN modules Parallel::ForkManager and Proc::Fork but that didn't help.
So now I'm somehow stuck and cant help myself out. Maybe somebody else can! Any suggestions greatly appreciated!
How can I get this thing to work with threads or child processes?
Or at least how can I force perl freeing memory so I can do this in the same process?
Some additional information on my system:
OS: Windows 7 64bit / Ubuntu Server 12.10
Perl on Windows: Strawberry Perl 5.16.2 64bit
One of my first posts on Stackoverflow. Hope I did it right :-)
I recommend reading: this
I usually use Thread::Queue to manage the input of thread.
Sample code:
my #threads = {};
my $Q = new Thread::Queue;
# Start the threads
for (my $i=0; $i<NUM_THREADS; $i++) {
$threads[$i] =
threads->new(\&insert_1_thread, $Q);
}
# Get the list of sites and put in the work queue
foreach $row ( #{$ref} ) {
$Q->enqueue( $row->[0] );
#sleep 1 while $Q->pending > 100;
} # foreach $row
# Signal we are done
for (my $i=0; $i<NUM_THREADS; $i++) {
$Q->enqueue( undef ); }
$count = 0;
# Now wait for the threads to complete before going on to the next step
for (my $i=0; $i<NUM_THREADS; $i++) {
$count += $threads[$i]->join(); }
And for the worker thread:
sub insert_1_thread {
my ( $Q ) = #_;
my $tid = threads->tid;
my $count = 0;
Log("Started thread #$tid");
while( my $row = $Q->dequeue ) {
PROCESS ME...
$count++;
} # while
Log("Thread#$tid, done");
return $count;
} # sub insert_1_thread
I don't know if it is a solution for you, but you could create an array of chunk objects and process them in parallel like this:
#!/usr/bin/perl
package Object; {
use threads;
use threads::shared;
sub new(){
my $class=shift;
share(my %this);
return(bless(\%this,$class));
}
sub set {
my ($this,$value)=#_;
lock($this);
# $this->{"data"}=shared_clone($value);
$this->{"data"}=$value;
}
sub get {
my $this=shift;
return $this->{"data"};
}
}
package main; {
use strict;
use warnings;
use threads;
use threads::shared;
my #objs;
foreach (0..2){
my $o = Object->new();
$o->set($_);
push #objs, $o;
}
threads->create(\&run,(\#objs))->join();
sub run {
my ($obj) = #_;
$$obj[$_]->get() foreach(0..2);
}
}

Making an IRC bot - how can I let people !eval perl/javascript code?

I'm working on a bot in Perl (based on POE) and so far so good, but I can't figure out how can I add a !js or !perl command to evaluate respective code and return one line of output to be printed into the channel. I found App::EvalServer but I don't get how to use it.
Thanks for any help!
The App::EvalServer module comes with a binary to run as a standalone application. You do not put it in your program but rather run it on it's own. It opens a port where you can hand it code as a json string. This does not sound like a good idea to me either.
There is another module you might want to look at called Safe. I suggest you read through the complete documentation as well as the one to Opcode (linked in the doc) before you do anything with this. YOU CAN DO SERIOUS DAMAGE IF YOU EVALUATE ARBITRARY CODE! Never forget that.
UPDATE:
Here's an example of how to capture the output of print or say from your evaled code. You can use open with a variable to make printed output always go to that variable. If you switch back afterwards you can work with the captured output in your var. This is called an in-memory file.
use strict; use warnings;
use feature 'say';
use Safe;
# Put our STDOUT into a variable
my $printBuffer;
open(my $buffer, '>', \$printBuffer);
# Everything we say and print will go into $printBuffer until we change it back
my $stdout = select($buffer);
# Create a new Safe
my $compartment = new Safe;
$compartment->permit(qw(print)); # for testing
# This is where the external code comes in:
my $external_code = qq~print "Hello World!\n"~;
# Execute the code
my $ret = $compartment->reval($external_code, 1);
# Go back to STDOUT
select($stdout);
printf "The return value of the reval is: %d\n", $ret;
say "The reval's output is:";
say $printBuffer;
# Now you can do whatever you want with your output
$printBuffer =~ s/World/Earth/;
say "After I change it:";
say $printBuffer;
Disclaimer: Use this code at your own risk!
Update 2: After a lengthy discussion in chat, here's what we came up with. It implements a kind of timeout to stop the execution if the reval is taking to long, e.g. because of an infinite loop.
#!/usr/bin/perl
use warnings;
use strict;
use Safe;
use Benchmark qw(:hireswallclock);
my ($t0, $t1); # Benchmark
my $timedOut = 0;
my $userError = 0;
my $printBuffer;
open (my $buffer, '>', \$printBuffer);
my $stdout = select($buffer);
my $cpmt = new Safe;
$cpmt->permit_only(qw(:default :base_io sleep));
eval
{
local $SIG{'ALRM'} = sub { $timedOut = 1; die "alarm\n"};
$t0 = Benchmark->new;
alarm 2;
$cpmt->reval('print "bla\n"; die "In the user-code!";');
# $cpmt->reval('print "bla\n"; sleep 50;');
alarm 0;
$t1 = Benchmark->new;
if ($#)
{
$userError = "The user-code died! $#\n";
}
};
select($stdout);
if ($timedOut)
{
print "Timeout!\n";
my $td = timediff($t1, $t0);
print timestr($td), "\n";
print $printBuffer;
}
else
{
print "There was no timeout...\n";
if ($userError)
{
print "There was an error with your code!\n";
print $userError;
print "But here's your output anyway:\n";
print $printBuffer;
}
else
{
print $printBuffer;
}
}
Take a look at perl eval(), you can pass it variables/strings and it will evaluate it as if it's perl code. Likewise in javascript, there's also an eval() function that performs similarly.
However, DO NOT EVALUATE ARBITRARY CODE in either perl or javascript unless you can run it in a completely closed environment (and even then, it's still a bad idea). Lot's of people spend lots of time preventing just this from happening. So that's how you'd do it, but you don't want to do it, really at all.

Perl, fork, semaphores, processes

I need to create a program that would run 3 processes at the same time in random sequence from a list and lock those processes with semaphore one by one so to avoid duplicates.
For example, you have a list of 3 programs:
#array = ( 1,2,3);
perl script.pl runs 2 at first;
By random tries to run 2 again and receives an error (because 2 is now locked with semaphore).
Runs 1.
Runs 3.
script.pl waits all of 1,2,3 to end work and then exit itself.
Here's my code so far:
#!/usr/bin/perl -w
use IPC::SysV qw(IPC_PRIVATE S_IRUSR S_IWUSR IPC_CREAT);
use IPC::Semaphore;
use Carp ();
print "Program started\n";
sub sem {
#semaphore lock code here
}
sub chooseProgram{
#initialise;
my $program1 = "./program1.pl";
my $program2 = "./program2.pl";
my $program3 = "./program3.pl";
my $ls = "ls";
my #programs = ( $ls, $program1, $program2, $program3 );
my $random = $programs[int rand($#programs+1)];
print $random."\n";
return $random;
}
#parent should fork child;
#child should run random processes;
#avoid process clones with semaphore;
sub main{
my $pid = fork();
if ($pid){
#parent here
}
elsif (defined($pid)){
#child here
print "$$ Child started:\n";
#simple cycle to launch and lock programs
for (my $i = 0; $i<10; $i++){
# semLock(system(chooseProgram()); #run in new terminal window
# so launched programs are locked and cannot be launched again
}
}
else {
die("Cannot fork: $!\n");
}
waitpid($pid, 0);
my $status = $?;
#print $status."\n";
}
main();
exit 0;
Problems:
Need to lock file; (I don't know how to work with semaphore. Failed some attempts to lock files so excluded that code.)
Child waits until first program ends before second start. How can I start three of programs at the same time with one child? (Is it possible or should I create one child for one program?).
Programs are non-gui and should run in terminal. How to run a program in new terminal window(tab)?
No correct check if all programs of #programs were launched yet. -- less important.
Your randomness requirement is very strange, but if I understood your requirements correctly, you don't need any sort of locking to do what you want. (So 1) in your question is gone)
Start by shuffling the program array, then start each command of that shuffled array (this deals with your 4)). Then only waitpid after you've started everything (which deals with your 2)).
The code below does that, starting various sleep instances in new terminals (I use urxvt, adapt depending on what terminal you want to spawn - this deals with your 3)).
#! /usr/bin/perl -w
use strict;
use warnings;
my #progs = ("urxvt -e sleep 5", "urxvt -e sleep 2", "urxvt -e sleep 1");
my #sgrop;
my #pids;
# Shuffle the programs
while (my $cnt = scalar(#progs)) {
push #sgrop, splice #progs, int(rand($cnt)), 1;
}
# Start the progs
foreach my $prog (#sgrop) {
my $pid = fork();
if (!$pid) {
exec($prog);
# exec does not return
} else {
print "Started '$prog' with pid $pid\n";
push #pids, $pid;
}
}
# Wait for them
map {
waitpid($_, 0);
print "$_ done!\n";
} (#pids);
Not sure the shuffling is the best out there, but it works. The idea behind it is just to pick one element at random from the initial (sorted) list, remove it from the there and add it to the shuffled one. Repeat until the initial list is empty.
If you're trying to lock the programs system wide (i.e. no other process in your system should be able to start them), then I'm sorry but that's not possible unless the programs protect themselves from concurrent execution.
If your question was about semaphores, then I'm sorry I missed your point. The IPC documentation has sample code for that. I don't really think it's necessary to go to that complexity for what you're trying to do though.
Here's how you could go about it using the IPC::Semaphore module for convenience.
At the start of your main, create a semaphore set with as many semaphores as required:
use IPC::SysV qw(S_IRUSR S_IWUSR IPC_CREAT IPC_NOWAIT);
use IPC::Semaphore;
my $numprocs = scalar(#progs);
my $sem = IPC::Semaphore->new(1234, # this random number is the semaphore key. Use something else
$numprocs, # number of semaphores you want under that key
S_IRUSR | S_IWUSR | IPC_CREAT);
Check for errors, then initialize all the semaphores to 1.
$sem->setall( (1) x $numprocs) || die "can't set sems $!";
In the code that starts your processes, before you start (after the fork though), try to grab the semaphore:
if ($sem->op($proc_number, -1, IPC_NOWAIT)) {
# here, you got the semaphore - so nothing else is running this program
# run the code
# and once the code is done:
$sem->op($proc_number, 1, 0); # release the semaphore
exit(0);
} else {
# someone else is running this program already
exit(1); # or something
}
In the above, $proc_number must be unique for each program (could be it's index in your programs array for instance). Don't use exec to start the program. Use system instead for example.
Note that you will have to deal with the exit code of the child process in this case. If the exit code is zero, you can mark that program as having run. If not, you need to retry. (This is going to get messy, you'll need to track which program was run or not. I'd suggest a hash with the program number ($proc_number) where you'd store whether it already completed or not, and the current pid running (or trying to run) that code. You can use that hash to figure out what program still needs to be executed.)
Finally after all is done and you've waited for all the children, you should clean up after yourself:
$sem->remove;
This code lacks proper error checking, will work strangely (i.e. not well at all) if the cleanup was not done correctly (i.e. semaphores are already laying around when the code starts). But it should get you started.

How do I make a time counter as child process that works in parallel to program?

How can I make a live time counter that ticks on the screen in parallel to part of a working program?
let's say I have the following little sub code which run an internal program for a couple of minutes:
system (`compile command`);
exec "simu -sh";
While waiting it to end, can I open a fork or something that output to stdout a running clock time?
Another question may be, how can I output to screen the ALARM counter without hurt the rest of the script?
It is important to give context to your question. You already have two processes: a parent and a child. The child is replacing itself with the exec, so you can't use the child to do any form of monitoring, but the parent is available. We just need to make the waitpid call non-blocking (i.e. it won't wait to be successful, it will fail right away). This also gets rid of the need for the eval and alarm functions:
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ":sys_wait_h";
my $timeout = 180;
my $program = "simulator --shell";
die "could not fork: $!" unless defined (my $pid = fork);
#this is the child process
unless ($pid) {
exec $program;
#if we reach this code the exec failed
die "exec of simulator failed: $!";
}
#this is the parent process
my $tries = 0;
#check to see if $pid is done, but don't block if it isn't
until (waitpid(-1, WNOHANG) == $pid) {
#put what you want to print while waiting here:
print scalar localtime, "\n";
if ($tries++ > $timeout) {
warn "timed out, sending SIGKILL to simulator\n";
kill 9, $pid;
waitpid($pid, 0);
last;
}
} continue {
sleep 1;
}
How about spawning it as a thread and then waiting for a value to be set (assuming you have a thread enabled perl):
# Modules to be used
use strict;
use warnings;
# Threads module
use Thread;
# Share out the variable so it can be set and
# view by main thread and spawned thread
my $value:shared = 0; # value to be set when completed
# Create a thread with a subroutine to compile and set the passed in reference
# to 1 when complete. Pass in the reference to value
my $t = Thread->new(sub {`compile command`; ${$_[0]} = 1;}, \$value);
# Counter to count
my $count = 0;
# Loop until the routine set the value
while ( $value == 0 )
{
# Increment the count and print it out.
$count++;
print "$count\n";
# Sleep for second to let the other thread process
sleep 1;
}
# Thread as completed so join back together
$t->join();
# Indicate items have completed.
print "Done $count\n";
I ran the example above in ActiveState PERL 5.10 on Windows XP.
This will give some indication in seconds of how long it took to
do the command. Hopefully you are not looking for more then a second of granularity. You could substitute localtime() for the counter if you wanted the actual time.
I am not locking the reference as I am only concerned when it is set, which is at the end of the routine, to it will complete and join back up.
For more information on perl threads.
Or look at Perlmonks.