Perl script running a periodic (main) task and providing a REST interface - rest

I am working on a Perl script which does some periodic processing based on file-system contents.
The overall structure is like this:
# ... initialization...
while(1) {
# ... scan filesystem, perform actions depending on changes detected ...
sleep 5;
}
I would like to add the ability to input some data into this process by means of exposing an interface through HTTP. E.g. I would like to add an endpoint to skip the sleep, but also some means to input data that is processed in the next iteration. Additionally, I would like to be able to query some of the program's status through HTTP (i.e. a simple fork() to run the webserver-part in a separate process is insufficient?)
So far I have already used the Dancer2 framework once but it has a start; call that blocks and thus does not allow any other tasks (like my loop) to run. Additionally, I could of course move the code which is currently inside the loop to an endpoint exposed through Dancer2 but then I would need to call that periodically (though an external program?) which seems to be quite an obscure indirection compared to just having the webserver-part running in background.
Is it possible to unobtrusively (i.e. without blocking the program) add a REST-server capability to a Perl script? If yes: Which modules would be used for the purpose? If no: Should I really implement an external process to periodically invoke a certain endpoint or pursue a different solution altogether?
(I have tried to add a dancer2 tag, but could not do so due to insufficient reputation. Do not be mislead by this: I have so far only tried with Dancer2 not the Dancer (v.1))

You could try to launch your processing loop in a background thread, before you run start;.
See man perlthrtut
You probably want use threads::shared; to declare some variables shared between the REST part and the background thread. Or use dedicated queues/event mechanisms.

Related

FastCGI or PSGI Interface to NGINX in 2021

This question I asked has resulted in me exploring directly interfacing my FastCGI script to NGINX, rather than using a reverse proxy to Apache. I successfully modified my FastCGI script to run as a daemon using some code I found online:
my $s = FCGI::OpenSocket(':9000',20);
my $request = FCGI::Request( \*STDIN, \*STDOUT, \*STDERR, \%ENV, $s);
# Remaining code stays just as it does when using with Apache's mod_fcgid
while($request->Accept() >= 0) {
# Call core app subroutines.
}
It works, but near as I can tell this has a distinct disadvantage over mod_fcgid: I have one process running which will handle one request at a time and if that process dies, there's nothing to start it back up. There are references on Stack Overflow to code that properly spun off workers, but the sites referenced inevitably seem to have gone offline, much like FastCGI's own site.
So, I'm trying to figure out what I need to add and also -- pardon the pun -- figure out if I need to take a fork in this road. Here are the options that I am trying to consider, if I understand my issues correctly:
Directly implement some sort of forking mechanism, ideally it seems like it should (1) toss off the request to a process/thread/worker -- perhaps one that can stay alive for multiple requests -- and move on to being ready for the next request and (2) be independent enough from the workers that if something goes wrong with a worker, it doesn't bring down the whole system until I catch it and restart the main process (e.g. autorestart processes). If this can be done simply and reliably, this seems to have a huge appeal since the code already works with FastCGI.
Give up on direct FastCGI and convert to PSGI and use an application server to handle these things. Given that I'm using Perl, I'd guess Starman is the logical option, although I've been reading on uwsgi's PSGI support and it sounds almost ideal in "tyrant Emperor" mode, where it could run processes with different privileges, auto restart missing processes, etc.
Option 1 seems intriguing since it requires the least modification to my existing code and a FastCGI script started up without FastCGI still works like a normal CGI script. (I'm not running this code under FastCGI when it is used by sites that are very low traffic).
Option 2, though, feels like it might be more "modern." At least PSGI documentation seems to still be online, for example, and using Starman or uwsgi seem like they take care of the background stuff I need probably better than I would cooking up my own system. Downside: I'd need two startup scripts for my code: one to be used by the PSGI enabled sites and one for sites still running in CGI.
Update: Continuing to explore option 1, I read through this tutorial on Perl fork() which seems somewhat relevant. Would using fork to break off each FastCGI request be a good approach if I go with option 1? I assume I'd be at risk of fork bombing, although if I kept track of the number of forks and issued wait() if ($forks > 10); perhaps that would be a safe approach? (Or perhaps using Parallel::ForkManager to do that process watching.) Or would it be safer and/or more efficient using something like Thread::Queue and passing FastCGI request objects to a set a threads that are reliably already established? There seem to be plenty of pitfalls I might overlook, which then returns me to whether I should opt for Option 2.

Sharing variables\data between Powershell processes

I would like to come up with a mechanism by which I can share 'data' between different Powershell processes. This would be in order to implement a kind of job system, whereby a function can be run in one Powershell process, complete and then someone communicate its status to a function run from another (distinct) Powershell process...
I guess what I'd ideally like psjob results to be shareable between sessions, but this does not seem to be possible.
I can think of a few dirty ways of achieving this (like O/S environment variables), but am I missing an semi-elegant way?
For example:
Function giveMeNumber
{
$return_vlaue = Get-Random -Minimum -100 -Maximum 100
Return $return_vlaue
}
What are some ways i could get this function to store it's return somewhere and then grab it from another Powershell session (without using a database).
Cheers.
The QA mentioned by Keith refers to using MSMQ, a message queueing feature optionally available on desktop, mobile & server OS's from Microsoft.
It doesn't run by default on desktop OS's so you would have to ensure that the appropriate service was started. Seems like serious overkill to me unless you wanted something pretty beefy.
Of course, the most common choice for this type of task would be a simple shared file.
Alternatively, you could create a TCP listener in each of the jobs that you want to have accept external info. Not done this myself in PowerShell though I know it is possible. Node.JS would be a more familiar environment or Python. Seems like overkill if a shared file would do the job!
Another way would be to use the registry. Though you might consider that cheating since it is actually a database (of a very broken and simplistic sort).
I'm actually not sure that environment variables would work since I know that they can be picky about the parent environment scope (for example setting an env variable in a cmd doesn't make it available outside of the cmd scope by default.
UPDATE: Doh, missed a few! Some of them very obvious. Microsoft have a list:
Clipboard
COM
Data Copy
DDE
File Mapping
Mailslots
Pipes
RPC
Windows Sockets
Pipes was the one I was trying to remember. Windows sockets would be similar to a TCP listener.

Running perl function in background or separate process

I'm running in circles. I have webpage that creates a huge file. This file takes forever to be created and is in a subroutine.
What is the best way for my page to run this subroutine but not wait for it to be created/processed? Are there any issues with apache processes since I'm doing this from a webpage?
The simplest way to perform this task is to simply use fork() and have the long-running subroutine run in the child process. Meanwhile, have the parent return to Apache. You indicate that you've tried this already, but absent more information on exactly what the code looks like and what is failing it's hard to help you move forward on this path.
Another option is to have run a separate process that is responsible for managing the long-running task. Have the webpage send a unit of work to the long-running process using a local socket (or by creating a file with the necessary input data), and then your web script can return immediately while the separate process takes care of completing the long running task.
This method of decoupling the execution is fairly common and is often called a "task queue" (if there is some mechanism in place for queuing requests as they come in). There are a number of tools out there that will help you design this sort of solution (but for simple cases with filesystem-based communication you may be fine without them).
I think you want to create a worker grandchild of Apache -- that is:
Apache -> child -> grandchild
where the child dies right after forking the grandchild, and the grandchild closes STDIN, STDOUT, and STDERR. (The grandchild then creates the file.) These are the basic steps in creating a zombie daemon (a parent-less worker process unconnected with the webserver).

threads in Dancer

I'm using Dancer 1.31, in a standard configuration (plackup/Starman).
In a request I wished to call a perl function asynchronously, so that the request returns inmmediately. Think of the typical "long running operation" scenario, in which one wants to return a "processing page" with a refresh+redirect.
I (naively?) tried with a thread:
sub myfunc {
sleep 9; # just for testing a slow operation
}
any '/test1' => sub {
my $thr = threads->create('myfunc');
$thr->detach();
return "done" ;
};
I does not work, the server seems to freeze, and the error log does not show anything. I guess manual creation of threads are forbidden inside Dancer? It's an issue with PSGI? Which is the recommended way?
I would stay away from perl threads especially in a web server environment. It will most likely crash your server when you join or detach them.
I usually create a few threads (thread pool) BEFORE initializing other modules and keep them around for the entire life time of the application. Thread::Queue nicely provides communication between the workers and the main thread.
The best asynchronous solution I find in Perl is POE. In Linux I prefer using POE::Wheel::Run to run executables and subroutines asynchronously. It uses fork and has a beautiful interface allowing communication with the child process. (In Windows it's not usable due to thread dependency)
Setting up Dancer and POE inside the same application/script may cause problems and POE's event loop may be blocked. A single worker thread dedicated to POE may come handy, or I would write another server based on POE and just communicate with the Dancer application via sockets.
Threads are definitively iffy with Perl. It might be possible to write some threaded Dancer code, but to be honest I don't think we ever tried it. And considering that Dancer 1's core use simpleton classes, it might also be very tricky.
As Ogla says, there are other ways to implement asynchronous behavior in Dancer. You say that you are using Starman, which is a forking engine. But there is also Twiggy, which is AnyEvent-based. To see how to leverage it to write asynchronous code, have a gander at Dancer::Plugin::Async.

MonoDevelop: macros or running commands sequentially

I'm trying to create an add-in for MonoDevelop, which will run commands, triggered from external tools (ex: updating source, building and running project on incoming message from Jabber). Since I could not find macros, I use "commands", by calling them through IdeApp.CommandService.DispatchCommand(). For single action this works great, but when I try to run several commands sequentially, they are executed simultaneously.
So, how to implement the command queue, where one command waits completion of previous?
DispatchCommand is synchronous, however some of the commands that it runs may start asynchronous operations, and the commands have no way to return a handle to those operations.
For those particular commands, I'd recommend that you don't dispatch them as commands but instead directly call the high-level APIs to perform those operations. For example, IdeApp.ProjectOperations.Build returns an IAsyncOperation handle that you can block on using its WaitForCompleted method. You can use IdeApp.Workspace to open projects and get handles to opened projects, set the active configuration, etc.