How to run a background process with mod perl

How to run a background process with mod perl - perl

I am using perl to return data sets in XML. Now I have come across a situation where I need to run some clean up after sending a dataset to the client. But some where, in the chain of mod perl and Apache, the output gets held onto until my method returns.
I have attempted to clear the buffers with commands like.
$| =1;
STDOUT->flush(); # flush the buffer so the content is sent to the client and the finish hook can carry on, with out delaying the return.
if ($mod_perl_io){
$mod_perl_io->rflush;
}
Yet I still get no output until my method returns. I then found out that my browser my be waiting for the connection to close and found that setting the content type in the header should fix this.
rint $cgi->header(-type => "text/plain; charset=UTF-8", -cookie => $config->{'cookie'});
Still no luck, in fact I had always been sending the correct headers.
So I though the best option is to simply start a new thread and let my method return. But when I create a new thread.
use threads ('yield',
'stack_size' => 64*4096,
'exit' => 'threads_only',
'stringify');
my $thr = threads->create('doRebuild', $dbconnect, $dbusername, $dbpassword, $bindir);
sub doRebuild {
my ($dbconnect, $dbusername, $dbpassword, $bindir ) = #_;
1;
}
I get a segfault
[Fri Feb 22 10:16:47 2013] [notice] child pid 26076 exit signal Segmentation fault (11)
From what I have read this is done by mod perl to ensure thread safe operation. Not sure if this is correct.
So I thought I'd try using {exe }
{exec 'perl', "$bindir/rebuild_needed_values.pl", qw('$dbconnect' '$dbusername' '$dbpassword');}
From what I gather this is taking over the process from mod perl and not letting it return anything.
I know this isn't as specific as a question on stack overflow should be, but this sort of thing must be a common problem how have others solved it?

You could use fork(), however I like to recommend http://gearman.org/ for background processing.
A solution like Gearman is much better, because your background process is not in Apache's process chain.
Your process will survive an Apache restart if implemented using gearman. It is also more secure, as the Gearman environment can be run in a chroot jail.
A nice side effect of using Gearman is that your background process becomes callable from other machines and even other languages.
Gearman makes it easy to collect the data from your process at a later time as well, and you can feed back progress information to your web app rather easily.

Related

Tcl/Tk - How to keep other buttons useable while separate function still running?

I'm very new to Tcl/Tk and have been dealing with an issue for the last couple of days. Basically I have a server written in C and a client GUI written in Tcl/Tk. So far it doesn't do a ton. To test it, I start up the server so that it's listening for connections, then run my GUI. When I click one of the buttons, the GUI should open up a separate toplevel window with a text widget embedded in it. (This part works.) Then, my client connects to the server and gives it a couple of settings, and through this the server decides what info to send back. The server's response is what gets printed to that second window's text widget.
What I'm trying to add in now is a Stop button. Right now, my server is set up to wait a couple of seconds, then write the same message to the client. This is set up inside a loop that is waiting to hear a "Stop" command from my client. I have a Stop button in the GUI with a command set up to write that command to the server when clicked. However, all of my buttons get frozen as soon as I hit the begin button and messages are written to the client.
Basically, how can I keep allowing my server to write to my client while still keeping the rest of my GUI usable? I want my client to write a new line to the text widget on my separate window whenever it receives a new message from the server, but I still want the main GUI window that has all my command buttons to behave independently.

In general, it depends on whether what you are doing is CPU-intensive (where reading from a plain file counts as CPU-intensive) or I/O-intensive (where running things in another process counts as I/O-intensive; database calls often count as CPU-intensive here despite not really needing to). I'm only going to mention summaries of what's going on as you aren't quite providing enough information.
For I/O-intensive code, you want to structure your code to be event-driven. Tcl has good tools for this, in that fileevent works nicely on sockets, terminals and pipelines on all supported platforms. The coroutine system of Tcl 8.6 can help a lot with preventing the callbacks required from turning your code into a tangled mess!
For CPU-intensive code, the main option is to run in another thread. That thread won't be able to touch the GUI directly (which in turn will be free to be responsive), but will be able to do all the work and send messages back to the main thread with whatever UI updates it wants done. (Technically you can do this with I/O-intensive code too, but it's more irritating than using a coroutine.) Farming things out to a subprocess is just another variation on this where the communications are more expensive (but much isolation is enforced by the OS).
If you're dealing with sockets, you're probably I/O-intensive. Assume that until you show otherwise. Here's a simple example:
proc gets_async {sock} {
set sock [lindex $args end]
fileevent $sock readable [info coroutine]
while {[gets $sock data] < 0 && [fblocked $sock]} {
yield
}
fileevent $sock readable {}
return $data
}
proc handler {socket} {
set n 0
while {![eof $socket]} {
# Write to the server
puts $socket "this is message [incr n] to the server"
# Read from the server
puts [gets_async $socket]
}
close $socket
}
proc launchCommunications {host port} {
set sock [socket $host $port]
fconfigure $sock -blocking 0 -encoding utf-8
coroutine comms($host:$port) handler $socket
}
Note that gets_async is much like coroutine::util gets in Tcllib.

What's the most reliable method for cross-platform alarm signal handling or execution timeouts in Perl?

I've added advisory locking to Sqitch, using Postgres advisory locks and MySQL GET_LOCK(). This feature prevents more than one instance of Sqitch from deploying to a database at one time. This works great, but I wanted to add a lock timeout, too, so that one never finds a CI/CD process hung for hours or days because something went amiss.
MySQL's GET_LOCK() supports a timeout argument, but Postgres advisory locks do not. Since I thought it likely that other database engines would also not have timeouts, I thought it best to implement the timeout in Perl. Following the DBI manual, I used Sys::SigAction to set and handle the timeout:
# Try waiting for the lock.
require App::Sqitch::SigAction;
return $self->_locked(1) unless App::Sqitch::SigAction::timeout_call($wait, sub {
$self->wait_lock
});
I also added tests to confirm it works with both MySQL and Postgres. So far so good.
Alas, Sys::SigAction does not work on Windows. I took a stab and testing it on Windows, but since Windows Perl is not compiled with d_sigaction, which Sys::SigAction also requires, I didn't get far. I tried implementing the Perl-standard alarm/$SIG{ALRM} pattern, but it failed to send the signal while waiting on the Postgres lock.
Which has led me here and to my question: What is the best cross-platform pattern for timing out some execution in Perl? Ideally it has a straight-forward interface, works on *nix and Windows, and effectively handles breaking out of a database query.

I ended up ditching Sys::SigAction following discussion here and elsewhere, and instead switched to:
Letting the database handle the timeout, as MySQL's get_lock() does
Adding a simple interface for polling with exponential backoff and timeout that engines can use to poll for a lock instead of waiting (similar to Retry::Backoff)
Switching the Postgres implementation to use the async query support in DBD::Pg to send off the lock request, and uses the backoff/timeout interface to check to see if it has returned and cancel the query if it times out
I was especially pleased to realize I could do #3, as I originally used the timeout/backoff interface to poll with pg_try_advisory_lock( key ), which just feels heavy. Better to asynchronously call pg_advisory_lock ( key ) and poll for its response. It looks like this:
sub wait_lock {
my $self = shift;
# Asyncronouslly request a lock with an indefinite wait.
my $dbh = $self->dbh;
$dbh->do(
'SELECT pg_advisory_lock(75474063)',
{ pg_async => DBD::Pg::PG_ASYNC() },
);
# Use _timeout to periodically check for the result.
return 1 if $self->_timeout(sub { $dbh->pg_ready && $dbh->pg_result });
# Timed out, cancel the query and return false.
$dbh->pg_cancel;
return 0;
}
Of course the MySQL implementation is simpler, since get_lock() does all the work:
sub wait_lock {
my $self = shift;
$self->dbh->selectcol_arrayref(
q{SELECT get_lock('sqitch working', ?)},
undef, $self->lock_timeout
)->[0]
}

PERL - Net::Websocket::Server with external periodic event

In my server program I need to have ability to iterate every 5 minutes through all opened connections and see which is really "active" or not.
I know that the best approach is to use "heart beat", but then, the server need to have somehow ability to check weather the connection is "off" in order to delete the associated "user parameters" that is attached to the connection.
My first approach was to use "Async" module, but this works in a separate process - so I cannot delete any element from the main process unless I use a technique to invoke a subroutine from the main process called from the child process (I don't know how, any help will be warmly welcomed).
Another possibility using Async is create a static client that is all the time on (also in the server) and sending "commands" to the server, but to me it looks "too exaggerating" to create such "wasting memory" client in the server, and also "eat" CPU time (I think much more than simple event like equivalent to "setTimeout" in JS).
Yet another approach is to use EV: But when I call EV::run it will NOT RUN anything ELSE than this "periodic event" - means that it will not reach the next line where the ->Start for the server is.
Placing it after the ->Start will make this event useless too: As the server works the program will not go behind the ->Start line.
Using EV::run EV::RUN_NOWAIT; will make the server work, but the EV will somehow not work, for a strange reason (Anyone know how can I still make it work?)
I prefer to not use Net::Websocket::EV, because as per their script, it doesn't do the handshake automatically, and many things (as well as SSL connection that I have) I will need to do manually and for me it is change a lot in my program.
PROBLEM SUMMARY:
How to make the code in EV run every 5 minutes, together with the server?
my %CON; # Connections (And user data) hash
my $__ConChk=EV::periodic 0, 300, 0, sub {
my #l=keys %CON;
for(my $i=0 ; $i<#l ; $i++) {
if($CON{$l[$i]}{"T"}+3600<time()) { # I give one hour time to be completely offline (for different reasons)
$CON{$l[$i]}{"C"}->disconnect(); delete $CON{$l[$i]};
}
}
};
EV::run EV::RUN_NOWAIT; # This is not working - Seems to be ignored!
Net::WebSocket::Server->new(
listen => $ssl, # Earlier preset
silence_max=>60, # Time to just shut the connection off, but don't delete user data
on_connect => sub {
my($serv,$conn)=#_;
my $cid; # Connection ID (for the hash)
$conn->on(
handshake => sub {
my($conn,$handshake)=#_;
# Create user data in $CON{$cid}
},
binary => sub {
$CON{$cid}{"T"}=time();
# Handling of single incomming message (command)
},
disconnect => sub {
# Do NOT DELETE THE ENTRY!! Maybe the connection drop due to instability!!
}
);
}
)->start; # This will run - but ignoring EV::run - what to do....?
undef $__ConChk;

Perl IPC - FIFO and daemons & CPU Usage

I have a mail parser perl script which is called every time a mail arrives for a user (using .qmail). It extracts a calendar attachment out of the mail and places the "path" of the file in a FIFO queue implemented using the Directory::Queue module.
Another perl script which reads the path of the calendar attachment and performs certain file operations on the local system as well as on the remote CalDAV server, is being run as a daemon, as explained here. So basically this script looks like:
my $declarations
sub foo {
.
.
}
sub bar {
.
.
}
while ($keep_running) {
for(keep-checking-the-queue-for-new-entries) {
sub caldav_logic1 {
.
.
}
sub caldav_logic2 {
.
.
}
}
}
I am using Proc::Daemon for running the script as a daemon. Now the problem is, this process has almost 100% CPU usage. What are the suggested ways to implement the daemon in a more standard, safer way ? I am using pretty much the same code as mentioned in the link mentioned for usage of Proc::Daemon.

I bet it is your for loop and checking for new queue entries.
There are ways to watch a directory for file changes. These ways are OS dependent but there might be a Perl module that wraps them up for you. Use that instead of busy looping. Even with a sleep delay, the looping is inefficient when you can have your program told exactly when to wake up by an OS event.
File::ChangeNotify looks promising.

Maybe you don't want truly continuous polling. Is keep-checking-the-queue-for-new-entries a CPU-intensive part of the code, even when the queue is empty? That would explain why your processor is always busy.
Try putting a sleep 1 statement at the very top (or very bottom) of the while loop to let the processor rest between queue checks. If that doesn't degrade the program performance too much (i.e., if everyone can tolerate waiting an extra second before the company calendars get updated) and if the CPU usage still seems high, try sleep 2, sleep 5, etc.

cpan Linux::Inotify2
The kernel knows when files change and sends this information to your program which runs the sub. Maybe this will be better because the program will run the sub only when the file is changed.

How do I call a perl process that is already running from another script?

Problem:
scriptA.cgi is sitting in an infinite loop and handling an open socket to a Flash client.
scriptB.cgi is called from the web, does what it needs to do and then needs to inform scriptA to send a message to the client.
Is this possible? I'm stuck on how to have scriptB identify the instance of scriptA that is sitting there with the socket connection, rather than launching one of its own.
all thoughts appreciated.

If the communication needs are simple, this is a good application for signals.
Edited to store process id from scriptA and read it in scriptB -- scripts A and B have to agree on a name.
# script B
do_scriptB_job();
if (open(my $PID_FILE, "<", "scriptA.pid.file")) {
$process_id_for_scriptA = <$PID_FILE>;
close $PID_FILE;
kill 'USR1', $process_id_for_scriptA; # makes scriptA run the SIGUSR1 handler
}
# script A
open(my $PID_FILE, ">", "scriptA.pid.file");
print $PID_FILE $$;
close $PID_FILE;
my $signaled = 0;
$SIG{"USR1"} = \sub { $signaled = 1 } # simple SIGUSR1 handler, set a variable
while ( in_infinite_loop ) {
if ($signaled) {
# this block runs only if SIGUSR1 was received
# since last time this block was run
send_a_message_to_the_client();
$signaled = 0;
} else {
do_something_else();
}
}
unlink "scriptA.pid.file"; # cleanup
When script A receives a SIGUSR1 signal, the script will be interrupted to run the USR1 signal handler, setting $signaled. The thread of execution will then resume and the script can use the information.

Have scriptA store it's pid somwhere (in a db with some kind of id), then scriptB can look up the pid in the db and send a signal to scriptA.
Edit:
Answering question asked in comment
The pid of the process can be found using perls builtin variables $$ or $PID or $PROCESS_ID depending on how old your perl is.
See perlvar for details.
I hope this is the ID you where looking for. If not you'll have to find a way to separate the different scriptA instances. (Perhaps by session id, or socket. Here I cant help you further)

Other people have mentioned how to get the PID (if you didn't fork() it yourself, just have the other-process write it... somewhere... that both processes know how to get it. or walk the process table, but that's a horrible solution and completely unscalable beyond a singleton).
Since you note that any thoughts are welcome, note that perldoc perlipc explains a variety of mechanisms you might use for the actual communication:
NAME
perlipc - Perl interprocess communication (signals, fifos, pipes, safe
subprocesses, sockets, and semaphores)
DESCRIPTION
The basic IPC facilities of Perl are built out of the good old Unix
signals, named pipes, pipe opens, the Berkeley socket routines, and SysV
IPC calls. Each is used in slightly different situations.

Domain sockets: http://www.perl.com/doc/FMTEYEWTK/IPC/unix.html
?

I was tempted to answer, 'send signals' or 'use some kind of IPC to talk between apps' but, a far easier and scalable approach is to use a sqlite (or other) database that all scripts can talk to,
ScriptA.cgi would poll the database by doing something like 'SELECT event FROM events WHERE clientID=?'.
ScriptB.cgi would simply insert a row into the events table with the right clientID.
That avoids all of the 'find the pid' mess and also mean that you don't get the blocking IO problems you would get with named pipes or if one script crashed.