How to spawn pre-loaded mojolicious applications from a mojolicious application? - perl

I work on a fairly large Mojolicious application, which takes many seconds to compile.
Parts of the test suite of that application are written using Playwright, which currently sets up a pristine database for each test case and spins up an instance of the Mojolicious application using @mojolicious/server-starter.
The compile-time of the application is starting to make it impractical to significantly expand the playwright test suite, and I'd like to address that without having to give up on the isolation that separate test databases and mojolicious application instances currently afford me.
In order to achieve that, the idea I'm currently pursuing is to have a small perl application that can pre-load the larger Mojolicious application, and which can be asked by the playwright test-suite to spawn a new instance of the larger application with a pristine database on an open port.
I'd like to communicate with that small perl application using HTTP, mainly out of convenience, and I'd like that small perl application to use Mojolicious to perform the HTTP communication, because that's also convenient and consistent with the rest of the code-base.
I've tried some naive approaches to implementing this idea, which looked roughly like this:
use TheBigApp;

$app->routes->post('/spawn-child')->to(cb => sub ($c) {
    my ($sock, $port) = new_listen_socket();
    if (my $pid = fork()) {
        # parent: record pid to later be able to shut it down or whatever
        $c->render(json => { url => "http://localhost:$port" });
    }
    else {
        # child: serve the big app on the freshly created socket
        my $bapp = TheBigApp->new($c->req->json);
        my $s = Mojo::Server::Daemon->new(listen => ...);
        $s->app($bapp);
        $bapp->start;
        return;
    }
});
All of the implementations I've tried along these lines ran into issues with the various singletons, such as the IOLoop. Even when I overrode IOLoop->singleton to return a new instance with a new reactor inside the sub-process, the forked-off child processes still appeared to be listening on the same socket as the parent process that spawned them.
Are there perhaps lower-level Mojolicious APIs that could make this use-case work? Would it perhaps be simpler to implement the small parent process without Mojolicious to sidestep the issue entirely?
Thanks!

Digging a little into the Mojo Server/Daemon code, it always(?) sets up 'ReusePort' on the IO::Socket.
On Linux (not macOS or BSD; no idea about Windows) that means TCP connections to the same IP and port combination are 'load balanced' across multiple server instances by the kernel.
It is unclear from your post whether the spawned processes are listening on the same port or not.
Assuming you are running Linux, changing the listening port for each spawned child might help.
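For what it's worth, here is an untested sketch of that: each child is given its own free port via Mojo::IOLoop::Server->generate_port, and it resets the event loop it inherited from the parent before starting its own daemon (the %children hash and the use of $app follow the snippet in the question):

use Mojo::Base -strict, -signatures;
use Mojo::IOLoop;
use Mojo::IOLoop::Server;
use Mojo::Server::Daemon;
use TheBigApp;

my %children;    # pid => port, so children can be shut down later

$app->routes->post('/spawn-child')->to(cb => sub ($c) {
    # Give every child its own free port, so no listen socket is ever shared
    my $port = Mojo::IOLoop::Server->generate_port;
    my $pid  = fork // die "fork failed: $!";

    if ($pid) {
        # parent: remember the child and tell the test suite where it lives
        $children{$pid} = $port;
        return $c->render(json => { url => "http://127.0.0.1:$port", pid => $pid });
    }

    # child: throw away the event loop state inherited from the parent
    Mojo::IOLoop->reset;
    my $daemon = Mojo::Server::Daemon->new(
        app    => TheBigApp->new($c->req->json),
        listen => ["http://127.0.0.1:$port"],
    );
    $daemon->run;    # blocks until this child is terminated
    exit 0;
});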

Related

FastCGI or PSGI Interface to NGINX in 2021

This question I asked has resulted in me exploring directly interfacing my FastCGI script to NGINX, rather than using a reverse proxy to Apache. I successfully modified my FastCGI script to run as a daemon using some code I found online:
my $s = FCGI::OpenSocket(':9000', 20);
my $request = FCGI::Request(\*STDIN, \*STDOUT, \*STDERR, \%ENV, $s);
# Remaining code stays just as it does when used with Apache's mod_fcgid
while ($request->Accept() >= 0) {
    # Call core app subroutines.
}
It works, but near as I can tell this has a distinct disadvantage over mod_fcgid: I have one process running which will handle one request at a time and if that process dies, there's nothing to start it back up. There are references on Stack Overflow to code that properly spun off workers, but the sites referenced inevitably seem to have gone offline, much like FastCGI's own site.
So, I'm trying to figure out what I need to add and also -- pardon the pun -- figure out if I need to take a fork in this road. Here are the options that I am trying to consider, if I understand my issues correctly:
Directly implement some sort of forking mechanism, ideally it seems like it should (1) toss off the request to a process/thread/worker -- perhaps one that can stay alive for multiple requests -- and move on to being ready for the next request and (2) be independent enough from the workers that if something goes wrong with a worker, it doesn't bring down the whole system until I catch it and restart the main process (e.g. autorestart processes). If this can be done simply and reliably, this seems to have a huge appeal since the code already works with FastCGI.
Give up on direct FastCGI and convert to PSGI and use an application server to handle these things. Given that I'm using Perl, I'd guess Starman is the logical option, although I've been reading on uwsgi's PSGI support and it sounds almost ideal in "tyrant Emperor" mode, where it could run processes with different privileges, auto restart missing processes, etc.
Option 1 seems intriguing since it requires the least modification to my existing code and a FastCGI script started up without FastCGI still works like a normal CGI script. (I'm not running this code under FastCGI when it is used by sites that are very low traffic).
Option 2, though, feels like it might be more "modern." At least PSGI documentation seems to still be online, for example, and using Starman or uwsgi seem like they take care of the background stuff I need probably better than I would cooking up my own system. Downside: I'd need two startup scripts for my code: one to be used by the PSGI enabled sites and one for sites still running in CGI.
Update: Continuing to explore option 1, I read through this tutorial on Perl fork() which seems somewhat relevant. Would using fork to break off each FastCGI request be a good approach if I go with option 1? I assume I'd be at risk of fork bombing, although if I kept track of the number of forks and issued wait() if ($forks > 10), perhaps that would be a safe approach? (Or perhaps using Parallel::ForkManager to do that process watching.) Or would it be safer and/or more efficient to use something like Thread::Queue and pass FastCGI request objects to a set of threads that are reliably already established? There seem to be plenty of pitfalls I might overlook, which then returns me to whether I should opt for Option 2.
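For concreteness, a pre-forking variant of option 1 (a fixed pool of workers managed by FCGI::ProcManager, rather than a fork per request) might look roughly like this, reusing the :9000 socket from above; this is only an untested sketch:

use FCGI;
use FCGI::ProcManager;

my $socket  = FCGI::OpenSocket(':9000', 20);
my $request = FCGI::Request(\*STDIN, \*STDOUT, \*STDERR, \%ENV, $socket);

# Fork a fixed pool of workers; the manager process restarts any that die
my $manager = FCGI::ProcManager->new({ n_processes => 5 });
$manager->pm_manage;

while ($request->Accept() >= 0) {
    $manager->pm_pre_dispatch;
    # ... call core app subroutines, as before ...
    $manager->pm_post_dispatch;
}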

Perl script running a periodic (main) task and providing a REST interface

I am working on a Perl script which does some periodic processing based on file-system contents.
The overall structure is like this:
# ... initialization...
while (1) {
    # ... scan filesystem, perform actions depending on changes detected ...
    sleep 5;
}
I would like to add the ability to input some data into this process by means of exposing an interface through HTTP. E.g. I would like to add an endpoint to skip the sleep, but also some means to input data that is processed in the next iteration. Additionally, I would like to be able to query some of the program's status through HTTP (i.e. a simple fork() to run the webserver part in a separate process would presumably be insufficient?).
So far I have used the Dancer2 framework once, but it has a start; call that blocks and thus does not allow any other tasks (like my loop) to run. Additionally, I could of course move the code which is currently inside the loop to an endpoint exposed through Dancer2, but then I would need to call that periodically (through an external program?), which seems to be quite an obscure indirection compared to just having the webserver part running in the background.
Is it possible to unobtrusively (i.e. without blocking the program) add a REST-server capability to a Perl script? If yes: Which modules would be used for the purpose? If no: Should I really implement an external process to periodically invoke a certain endpoint or pursue a different solution altogether?
(I have tried to add a dancer2 tag, but could not do so due to insufficient reputation. Do not be misled by this: I have so far only tried Dancer2, not Dancer (v.1).)
You could try launching your processing loop in a background thread, before you call start;.
See man perlthrtut.
You probably want use threads::shared; to declare some variables as shared between the REST part and the background thread, or use dedicated queues/event mechanisms.
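A rough, untested sketch of that idea with Dancer2 and Thread::Queue (the /data and /status endpoints and the 'payload' parameter are made up; it also assumes a threaded perl and a Thread::Queue new enough to have dequeue_timed, which conveniently replaces the sleep and lets an enqueue wake the loop early):

use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use Dancer2;

my $queue = Thread::Queue->new;      # REST handlers feed the loop through this
my $status :shared = 'starting';     # simple shared state for /status

# The original periodic loop, moved into a background thread
threads->create(sub {
    while (1) {
        $status = 'scanning';
        # ... scan filesystem, perform actions depending on changes detected ...
        $status = 'idle';

        # Waits up to 5 seconds (replacing sleep 5), but wakes early
        # whenever the REST side enqueues something
        if (defined(my $item = $queue->dequeue_timed(5))) {
            # ... process the data submitted over HTTP ...
        }
    }
})->detach;

post '/data' => sub {
    $queue->enqueue(body_parameters->get('payload'));
    return to_json({ queued => 1 });
};

get '/status' => sub {
    return to_json({ status => $status });
};

start;    # blocks here, serving HTTP while the loop keeps running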

Should I disconnect() if I'm using Apache::DBI's connect_cached()?

My mod_perl2-based intranet app uses DBI->connect_cached(), which is supposedly overridden by Apache::DBI's version of the same. It has normally worked quite well, but just recently we started having an issue on our testing server (which had only two users connected) whereby our app would sometimes, but not always, die with 'FATAL: sorry, too many clients already' when connecting to our Postgres 9.0 backend while reloading a page, despite all of the existing connections being <IDLE> when I look at the stats in pgadmin3.
The backend is separate from our development and production backends, but they're all configured with max_connections = 100. Likewise the httpd services are all separate, but configured with
StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 99
MaxClients 99
MaxRequestsPerChild 4000
....
PerlModule Apache::DBI
I had been under the impression that I shouldn't call disconnect() on my database handles if I wanted them to actually benefit from caching. Was I wrong about that? If not, I guess I'll ask about the above error separately. Just wanted to make sure it wasn't this setup...
Apache::DBI's docs say:
When loading the DBI module (do not confuse this with the Apache::DBI
module) it checks if the environment variable 'MOD_PERL' has been set
and if the module Apache::DBI has been loaded. In this case every
connect request will be forwarded to the Apache::DBI module.
....
There is no need to remove the disconnect statements from your code.
They won't do anything because the Apache::DBI module overloads the
disconnect method.
If you are developing new code that is strictly for use in mod_perl,
you may choose to use DBI->connect_cached() instead, but consider
adding an automatic rollback after each request, as described above.
So I guess for my mod_perl2-only app, I don't need Apache::DBI because Apache::DBI's devs recommend using DBI->connect_cached. And I don't need disconnect statements.
But then DBI's docs say:
Note that the behaviour of [ connect_cached ] differs in several
respects from the behaviour of persistent connections implemented by
Apache::DBI. However, if Apache::DBI is loaded then connect_cached
will use it.
This makes it sound like Apache::DBI will actually affect connect_cached, in that instead of getting DBI->connect_cached behaviour when I call that, I'll get Apache::DBI->connect behaviour. And Apache::DBI's docs recommend against that.
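For reference, the "automatic rollback after each request" that the quoted docs recommend alongside connect_cached() might look roughly like this under mod_perl2 (untested sketch; the DSN, credentials and handler layout are placeholders):

use Apache2::Const -compile => qw(OK);
use Apache2::RequestUtil ();
use DBI;

sub handler {
    my $r = shift;

    # Placeholder DSN/credentials; AutoCommit off so the rollback below matters
    my $dbh = DBI->connect_cached(
        'dbi:Pg:dbname=myapp;host=dbhost', 'someuser', 'somepass',
        { AutoCommit => 0, RaiseError => 1 },
    );

    # Roll back whatever this request leaves uncommitted, once it is done
    $r->push_handlers(PerlCleanupHandler => sub {
        $dbh->rollback;
        return Apache2::Const::OK;
    });

    # ... normal request handling using $dbh ...
    return Apache2::Const::OK;
}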
UPDATE: I've set the first 5 parameters in the above config all to 1, and my app is still using up more and more connections as I hit its pages. This I don't understand at all--it should only have one process, and that one process should be re-using its connection.
Unless you plan on dropping Apache::DBI, the answer is a firm no, because Apache::DBI's override really does nothing:
# overload disconnect
{
    package Apache::DBI::db;
    no strict;
    @ISA = qw(DBI::db);
    use strict;

    sub disconnect {
        my $prefix = "$$ Apache::DBI ";
        Apache::DBI::debug(2, "$prefix disconnect (overloaded)");
        1;
    };
}

Pause/Resume embedded python interpreter

Is there any way to pause/resume the work of an embedded Python interpreter at the place where I need it? For example:
C++ pseudo-code part:
main()
{
    script = "python_script.py";
    ...
    RunScript(script);  // -- python script runs till the command 'stop'
    while (true)
    {
        // ... read values from some variables in python-script
        // ... do some work ...
        // ... write new value to some other variables in python-script
        ResumeScript(script);  // -- python script resumes its work where
                               //    it was stopped. Not from the beginning!
    }
    ...
}
Python script pseudo-code part:
# ... do some init-work
while True:
    # ... do some work
    stop  # - here script stops and C++-function RunScript()
          #   returns control to C++-part
    # ... after calling C++-function ResumeScript,
    #     the work continues from this line
Is this possible to do with Python/C API?
Thanks
I too have recently been searching for a way to manually "drive" an embedded language and I came across this question and figured I'd share a potential workaround.
I would implement the "blocking" behavior either through a socket or some kind of messaging system. Instead of actually stopping the whole Python interpreter, just have it block while it is waiting for C++ to do its evaluations.
C++ will start the embedded runtime, then enter a loop of some sort that waits for Python to "throw the signal" that it's ready. For instance: C++ listens on port 5000 and starts Python; Python does its work and then connects to port 5000 on localhost; C++ sees the connection, grabs the data from Python, performs work on it, then shuffles the data back over the socket to Python, where Python receives the data and leaves the blocking loop.
I still need a way to fully pause the virtual runtime, but in your case you could achieve the same thing with a socket and some blocking behavior that uses the socket to coordinate the two pieces of code.
Good luck :)
EDIT: You may be able to hook into the "injection" functionality used in this answer to completely stop Python. Just modify it to inject a wait loop, perhaps:
Stopping embedded Python

threads in Dancer

I'm using Dancer 1.31, in a standard configuration (plackup/Starman).
In a request I wished to call a Perl function asynchronously, so that the request returns immediately. Think of the typical "long running operation" scenario, in which one wants to return a "processing page" with a refresh+redirect.
I (naively?) tried with a thread:
sub myfunc {
    sleep 9;    # just for testing a slow operation
}

any '/test1' => sub {
    my $thr = threads->create('myfunc');
    $thr->detach();
    return "done";
};
It does not work: the server seems to freeze, and the error log does not show anything. I guess manual creation of threads is forbidden inside Dancer? Is it an issue with PSGI? What is the recommended way?
I would stay away from perl threads especially in a web server environment. It will most likely crash your server when you join or detach them.
I usually create a few threads (a thread pool) BEFORE initializing other modules and keep them around for the entire lifetime of the application. Thread::Queue nicely provides communication between the workers and the main thread.
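A minimal, untested sketch of that pattern with Dancer 1 (the pool size, the queue payload and the sleep are placeholders; for brevity the pool is created at plain startup rather than literally before everything else is loaded, and under a forking deployment like Starman each worker process would get its own pool):

use strict;
use warnings;
use threads;
use Thread::Queue;
use Dancer;

sub myfunc {
    my ($job) = @_;
    sleep 9;    # stand-in for the real long-running operation
}

# One shared queue and a small pool of workers, created once at startup
my $jobs = Thread::Queue->new;
my @pool = map {
    threads->create(sub {
        while (defined(my $job = $jobs->dequeue)) {
            myfunc($job);
        }
    });
} 1 .. 4;

any '/test1' => sub {
    $jobs->enqueue(time);    # hand the work to a pool thread
    return "done";           # the request returns immediately
};

dance;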
The best asynchronous solution I find in Perl is POE. In Linux I prefer using POE::Wheel::Run to run executables and subroutines asynchronously. It uses fork and has a beautiful interface allowing communication with the child process. (In Windows it's not usable due to thread dependency)
Setting up Dancer and POE inside the same application/script may cause problems, and POE's event loop may be blocked. A single worker thread dedicated to POE may come in handy, or I would write another server based on POE and just communicate with the Dancer application via sockets.
Threads are definitely iffy with Perl. It might be possible to write some threaded Dancer code, but to be honest I don't think we ever tried it. And considering that Dancer 1's core uses singleton classes, it might also be very tricky.
As Ogla says, there are other ways to implement asynchronous behavior in Dancer. You say that you are using Starman, which is a forking engine. But there is also Twiggy, which is AnyEvent-based. To see how to leverage it to write asynchronous code, have a gander at Dancer::Plugin::Async.