My mod_perl2-based intranet app uses DBI->connect_cached(), which is supposedly overridden by Apache::DBI's version of the same. It has normally worked quite well, but recently we started having an issue on our testing server, which had only two users connected: the app would sometimes, but not always, die when reloading a page with 'FATAL: sorry, too many clients already' from our PostgreSQL 9.0 backend, even though all of the connections show as <IDLE> when I look at the stats in pgadmin3.
The backend is separate from our development and production backends, but they're all configured with max_connections = 100. Likewise the httpd services are all separate, but configured with
StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 99
MaxClients 99
MaxRequestsPerChild 4000
....
PerlModule Apache::DBI
I had been under the impression that I shouldn't call disconnect() on my database handles if I wanted them to actually benefit from caching. Was I wrong about that? If not, I guess I'll ask about the above error separately. Just wanted to make sure it wasn't this setup...
Apache::DBI's docs say:
When loading the DBI module (do not confuse this with the Apache::DBI
module) it checks if the environment variable 'MOD_PERL' has been set
and if the module Apache::DBI has been loaded. In this case every
connect request will be forwarded to the Apache::DBI module.
....
There is no need to remove the disconnect statements from your code.
They won't do anything because the Apache::DBI module overloads the
disconnect method.
If you are developing new code that is strictly for use in mod_perl,
you may choose to use DBI->connect_cached() instead, but consider
adding an automatic rollback after each request, as described above.
So I guess for my mod_perl2-only app, I don't need Apache::DBI because Apache::DBI's devs recommend using DBI->connect_cached. And I don't need disconnect statements.
But then DBI's docs say:
Note that the behaviour of [ connect_cached ] differs in several
respects from the behaviour of persistent connections implemented by
Apache::DBI. However, if Apache::DBI is loaded then connect_cached
will use it.
This makes it sound like Apache::DBI will actually affect connect_cached, in that instead of getting DBI->connect_cached behaviour when I call that, I'll get Apache::DBI->connect behaviour. And Apache::DBI's docs recommend against that.
UPDATE: I've set the first 5 parameters in the above config all to 1, and my app is still using up more and more connections as I hit its pages. This I don't understand at all--it should only have one process, and that one process should be re-using its connection.
Unless you plan on dropping Apache::DBI, the answer is a firm no, because Apache::DBI's override really does nothing:
# overload disconnect
{
    package Apache::DBI::db;
    no strict;
    @ISA = qw(DBI::db);
    use strict;

    sub disconnect {
        my $prefix = "$$ Apache::DBI ";
        Apache::DBI::debug(2, "$prefix disconnect (overloaded)");
        1;
    }
}
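For completeness, the "automatic rollback after each request" that the Apache::DBI docs mention could be wired up as a mod_perl2 cleanup handler, roughly like this. This is only a sketch: the package name, DSN, and credentials are placeholders, and the DSN/attributes must match whatever your app passes to connect_cached so the same cached handle comes back.

package My::RollbackCleanup;

use strict;
use warnings;
use DBI;
use Apache2::Const -compile => 'OK';

sub handler {
    my $r = shift;

    # connect_cached returns the handle already cached for this DSN/attrs
    # in this child process (DSN and credentials here are placeholders).
    my $dbh = DBI->connect_cached(
        'dbi:Pg:dbname=intranet', 'appuser', 'secret',
        { AutoCommit => 0, RaiseError => 1 },
    );

    # Roll back anything the request left open so the cached handle is
    # clean for the next request served by this process.
    $dbh->rollback if $dbh && !$dbh->{AutoCommit};

    return Apache2::Const::OK;
}

1;

It would be enabled with PerlCleanupHandler My::RollbackCleanup in httpd.conf.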
I work on a fairly large Mojolicious application, which takes many seconds to compile.
Parts of the test suite of that application are written using playwright, which currently sets up a pristine database for each test case and spins up an instance of the mojolicious application using @mojolicious/server-starter.
The compile-time of the application is starting to make it impractical to significantly expand the playwright test suite, and I'd like to address that without having to give up on the isolation that separate test databases and mojolicious application instances currently afford me.
In order to achieve that, the idea I'm currently pursuing is to have a small perl application that can pre-load the larger Mojolicious application, and which can be asked by the playwright test-suite to spawn a new instance of the larger application with a pristine database on an open port.
I'd like to communicate with that small perl application using HTTP, mainly out of convenience, and I'd like that small perl application to use Mojolicious to perform the HTTP communication, because that's also convenient and consistent with the rest of the code-base.
I've tried some naive approaches to implementing this idea, which looked roughly like this:
use feature 'signatures';
no warnings 'experimental::signatures';

use Mojo::Server::Daemon;
use TheBigApp;

# $app is the small pre-loading Mojolicious application
$app->routes->post('/spawn-child')->to(cb => sub ($c) {
    my ($sock, $port) = new_listen_socket();
    if (my $pid = fork()) {
        # parent: record pid to later be able to shut it down or whatever
        $c->render(json => { url => "http://localhost:$port" });
    }
    else {
        # child: serve a fresh instance of the big app on the new port
        my $bapp = TheBigApp->new($c->req->json);
        my $s    = Mojo::Server::Daemon->new(listen => ["http://localhost:$port"]);
        $s->app($bapp);
        $s->run;
        exit;
    }
});
All of the implementations I've tried along these lines seemed to run into issues due to the various singletons, such as the IOLoop, and even when overriding IOLoop->singleton to return a new instance with a new reactor within the sub-process, it appeared as if the forked-off child processes were still listening on the same socket as the parent-process that spawned them.
Are there perhaps lower-level Mojolicious APIs that could make this use-case work? Would it perhaps be simpler to implement the small parent process without Mojolicious to sidestep the issue entirely?
Thanks!
Digging a little into the Mojo Server/Daemon code, it always(?) sets up 'ReusePort' on the IO::Socket.
On Linux (not macOS or BSD; Windows I have no idea) that means that TCP connections to the same IP and port combination are 'load balanced' across multiple server instances by the kernel.
It is unclear from your post whether the spawned processes are listening on the same port or not.
Assuming you are running Linux, changing the listening port for each spawned child might help.
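To illustrate, here is a sketch of that suggestion. TheBigApp and the spawn_big_app helper are placeholders from the question; Mojo::IOLoop::Server->generate_port just picks a free local port, and resetting the IOLoop singleton in the child is my assumption about clearing listeners inherited from the parent.

use Mojo::IOLoop;
use Mojo::IOLoop::Server;
use Mojo::Server::Daemon;
use TheBigApp;

# Give every spawned child its own port so the kernel never balances
# connections between the parent and its children.
sub spawn_big_app {
    my ($config) = @_;

    my $port = Mojo::IOLoop::Server->generate_port;    # free local port
    my $pid  = fork() // die "fork failed: $!";
    return ($pid, $port) if $pid;    # parent: hand the URL back to the test suite

    # child: start from a clean event loop (assumption: this drops anything
    # inherited from the parent), then serve a fresh app instance
    Mojo::IOLoop->singleton->reset;
    my $daemon = Mojo::Server::Daemon->new(listen => ["http://127.0.0.1:$port"]);
    $daemon->app(TheBigApp->new($config));
    $daemon->run;
    exit 0;
}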
This question I asked has resulted in me exploring directly interfacing my FastCGI script to NGINX, rather than using a reverse proxy to Apache. I successfully modified my FastCGI script to run as a daemon using some code I found online:
use FCGI;

# Listen on TCP port 9000 with a backlog of 20
my $s = FCGI::OpenSocket(':9000', 20);
my $request = FCGI::Request(\*STDIN, \*STDOUT, \*STDERR, \%ENV, $s);

# Remaining code stays just as it does when used with Apache's mod_fcgid
while ($request->Accept() >= 0) {
    # Call core app subroutines.
}
It works, but as near as I can tell this has a distinct disadvantage over mod_fcgid: I have one process running which handles one request at a time, and if that process dies, there's nothing to start it back up. There are references on Stack Overflow to code that properly spun off workers, but the sites referenced inevitably seem to have gone offline, much like FastCGI's own site.
So, I'm trying to figure out what I need to add and also -- pardon the pun -- figure out if I need to take a fork in this road. Here are the options that I am trying to consider, if I understand my issues correctly:
Directly implement some sort of forking mechanism. Ideally it should (1) toss off the request to a process/thread/worker -- perhaps one that can stay alive for multiple requests -- and move on to being ready for the next request, and (2) be independent enough from the workers that if something goes wrong with a worker, it doesn't bring down the whole system until I catch it and restart the main process (e.g. auto-restart processes). If this can be done simply and reliably, it has a huge appeal since the code already works with FastCGI.
Give up on direct FastCGI and convert to PSGI and use an application server to handle these things. Given that I'm using Perl, I'd guess Starman is the logical option, although I've been reading on uwsgi's PSGI support and it sounds almost ideal in "tyrant Emperor" mode, where it could run processes with different privileges, auto restart missing processes, etc.
Option 1 seems intriguing since it requires the least modification to my existing code and a FastCGI script started up without FastCGI still works like a normal CGI script. (I'm not running this code under FastCGI when it is used by sites that are very low traffic).
Option 2, though, feels like it might be more "modern." At least PSGI documentation seems to still be online, for example, and using Starman or uwsgi seem like they take care of the background stuff I need probably better than I would cooking up my own system. Downside: I'd need two startup scripts for my code: one to be used by the PSGI enabled sites and one for sites still running in CGI.
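For what it's worth, the PSGI side of option 2 can be a fairly thin wrapper. Here is a sketch of a minimal app.psgi, where core_app() is a placeholder for the existing request-handling subroutine; Starman would then serve it with something like starman --workers 5 app.psgi.

# app.psgi
use strict;
use warnings;
use Plack::Request;

my $app = sub {
    my $env = shift;
    my $req = Plack::Request->new($env);

    # core_app() stands in for the existing request-handling code,
    # adapted to take a request object and return the response body.
    my $body = core_app($req);

    return [ 200, [ 'Content-Type' => 'text/html' ], [ $body ] ];
};

$app;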
Update: Continuing to explore option 1, I read through this tutorial on Perl fork(), which seems somewhat relevant. Would using fork to break off each FastCGI request be a good approach if I go with option 1? I assume I'd be at risk of fork bombing, although if I kept track of the number of forks and issued wait() if ($forks > 10), perhaps that would be a safe approach? (Or perhaps using Parallel::ForkManager to do that process watching.) Or would it be safer and/or more efficient to use something like Thread::Queue and pass FastCGI request objects to a set of threads that are reliably already established? There seem to be plenty of pitfalls I might overlook, which then returns me to whether I should opt for option 2.
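If you do stay with option 1, the usual way to avoid both a fork bomb and hand-rolled process watching is the prefork pattern from FCGI::ProcManager rather than forking per request: the manager forks a fixed pool of workers up front, each worker runs its own Accept() loop on the shared socket, and workers that die are respawned. A sketch, reusing the :9000 socket from above (the worker count is arbitrary):

use FCGI;
use FCGI::ProcManager;

my $socket  = FCGI::OpenSocket(':9000', 20);
my $request = FCGI::Request(\*STDIN, \*STDOUT, \*STDERR, \%ENV, $socket);

# The manager forks n_processes workers and keeps that many alive;
# each worker then runs the familiar Accept() loop below.
my $manager = FCGI::ProcManager->new({ n_processes => 5 });
$manager->pm_manage();

while ($request->Accept() >= 0) {
    $manager->pm_pre_dispatch();
    # Call core app subroutines.
    $manager->pm_post_dispatch();
}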
I have initialized a Cache::Memcached::Fast object in the mod_perl startup for re-use by the scripts.
For example, in startup.pl:
$GLOBAL::memc = Cache::Memcached::Fast->new({ servers => ['192.168.1.1:11211'] });
I notice that when multiple scripts call $GLOBAL::memc->get() simultaneously, the data for one process is sometimes copied into the results of another.
How can I make sure the memc handles are multi-process-safe?
This link explains a different problem, that the memcached handle dies, but I guess it comes down to the same cause.
What is the best way to create persistent memcached connections under mod_perl?
Is Memcached get and put methods are thread safe
Use the cas (compare-and-set) function to get proper synchronization between processes/threads.
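A sketch of what that looks like with Cache::Memcached::Fast's gets/cas pair; the key and the increment logic are just placeholders.

use Cache::Memcached::Fast;

my $memc = Cache::Memcached::Fast->new({ servers => ['192.168.1.1:11211'] });

sub increment_safely {
    my ($key) = @_;
    while (1) {
        # gets() returns [$cas_token, $value], or nothing if the key is missing
        my $res = $memc->gets($key) or return;
        my ($cas, $value) = @$res;

        # cas() only stores if nobody else changed the value since our gets()
        return $value + 1 if $memc->cas($key, $cas, $value + 1);

        # another process won the race; retry
    }
}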
I'm using Dancer 1.31, in a standard configuration (plackup/Starman).
In a request I wished to call a perl function asynchronously, so that the request returns immediately. Think of the typical "long running operation" scenario, in which one wants to return a "processing page" with a refresh+redirect.
I (naively?) tried with a thread:
use threads;

sub myfunc {
    sleep 9;    # just for testing a slow operation
}

any '/test1' => sub {
    my $thr = threads->create('myfunc');
    $thr->detach();
    return "done";
};
It does not work: the server seems to freeze, and the error log does not show anything. I guess manual creation of threads is forbidden inside Dancer? Is it an issue with PSGI? What is the recommended way?
I would stay away from Perl threads, especially in a web server environment. They will most likely crash your server when you join or detach them.
I usually create a few threads (a thread pool) BEFORE initializing other modules and keep them around for the entire lifetime of the application. Thread::Queue nicely provides communication between the workers and the main thread.
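For example (a sketch: the queue payload and the pool size of 4 are arbitrary, and sleep 9 stands in for the real long-running work):

use threads;
use Thread::Queue;

my $queue = Thread::Queue->new;

# Create the pool up front, before other modules set up state that
# doesn't survive being duplicated into new threads.
my @workers = map {
    threads->create(sub {
        while (defined(my $job = $queue->dequeue)) {
            sleep 9;    # stand-in for the real long-running operation
        }
    });
} 1 .. 4;

# Inside the route handler, just hand the work off and return immediately:
# any '/test1' => sub { $queue->enqueue({ some => 'job' }); return "done"; };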
The best asynchronous solution I find in Perl is POE. In Linux I prefer using POE::Wheel::Run to run executables and subroutines asynchronously. It uses fork and has a beautiful interface allowing communication with the child process. (In Windows it's not usable due to thread dependency)
Setting up Dancer and POE inside the same application/script may cause problems and POE's event loop may be blocked. A single worker thread dedicated to POE may come handy, or I would write another server based on POE and just communicate with the Dancer application via sockets.
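In case it helps, this is roughly what running a slow job through POE::Wheel::Run looks like on its own; the event names and the 9-second job are placeholders.

use POE qw(Wheel::Run);

POE::Session->create(
    inline_states => {
        _start => sub {
            my ($kernel, $heap) = @_[KERNEL, HEAP];
            # Run the slow job in a forked child; its stdout comes back as events.
            $heap->{job} = POE::Wheel::Run->new(
                Program     => sub { sleep 9; print "finished\n" },
                StdoutEvent => 'job_stdout',
                CloseEvent  => 'job_done',
            );
            # Reap the child so it doesn't linger as a zombie.
            $kernel->sig_child($heap->{job}->PID, 'job_reaped');
        },
        job_stdout => sub { print "child says: $_[ARG0]\n" },
        job_done   => sub { delete $_[HEAP]{job} },
        job_reaped => sub { },
    },
);

POE::Kernel->run;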
Threads are definitely iffy with Perl. It might be possible to write some threaded Dancer code, but to be honest I don't think we ever tried it. And considering that Dancer 1's core uses singleton classes, it might also be very tricky.
As Ogla says, there are other ways to implement asynchronous behavior in Dancer. You say that you are using Starman, which is a forking engine. But there is also Twiggy, which is AnyEvent-based. To see how to leverage it to write asynchronous code, have a gander at Dancer::Plugin::Async.
I'm planning to move from Class::DBI to Rose::DB::Object due to its nice structure and the word that RDBO is faster compared to CDBI and DBIC.
However, on my machine (Linux 2.6.9-89, Perl 5.8.9) RDBO compile time is much slower than CDBI:
$ time perl -MClass::DBI -e0
real 0m0.233s
user 0m0.208s
sys 0m0.024s
$ time perl -MRose::DB::Object -e0
real 0m1.178s
user 0m1.097s
sys 0m0.078s
That's a lot different...
Anyone experiences similar behaviour here?
Cheers.
@manni and @john: thanks for the explanation about the modules referenced by RDBO; it certainly answers why the compile time is slower than CDBI's.
The application is not running in a persistent environment. In fact it's invoked by several simultaneous cron jobs that run at 2-minute, 5-minute, and x-minute intervals, so yes, compile time is crucial here...
Jonathan Rockway's App::Persistent seems interesting; however, its (current) limitation of allowing only one application to run at a time is not suitable for my purpose. It also has an issue where, if we kill the client, the server process keeps running...
Rose::DB::Object simply contains (or references from other modules) much more code than Class::DBI. On the bright side, it also has many more features and is much faster at runtime than Class::DBI. If compile time is a concern for you, then your best bet is to load as little code as possible (or get faster disks).
Another option is to set auto_load_related_classes to false in your Metadata objects. To do this early enough and globally will probably require you to make a Metadata subclass and then set that as the meta_class in your common Rose::DB::Object base class.
Turning auto_load_related_classes off means that you'd have to manually load related classes that you actually want to use in your script. That's a bit of a pain, but it lets you control how many classes get loaded. (If you have heavily interrelated classes, loading a single one can end up pulling all the other ones in.)
You could, perhaps, have an environment variable to control the behavior. Example metadata class:
package My::DB::Object::Metadata;

use base 'Rose::DB::Object::Metadata';

# New class method to handle the default
sub default_auto_load_related_classes
{
  return $ENV{'RDBO_AUTO_LOAD_RELATED_CLASSES'} ? 1 : 0;
}

# Override existing object method, honoring the new class-defined default
sub auto_load_related_classes
{
  my($self) = shift;

  return $self->SUPER::auto_load_related_classes(@_)  if(@_);

  if(defined(my $value = $self->SUPER::auto_load_related_classes))
  {
    return $value;
  }

  # Initialize to default
  return $self->SUPER::auto_load_related_classes(ref($self)->default_auto_load_related_classes);
}
And here's how it's tied to your common object base class:
package My::DB::Object;
use base 'Rose::DB::Object';
use My::DB::Object::Metadata;
sub meta_class { 'My::DB::Object::Metadata' }
Then set RDBO_AUTO_LOAD_RELATED_CLASSES to true when you're running in a persistent environment, and leave it false (and don't forget to explicitly load related classes) for command-line scripts.
Again, this will only help if you're currently loading more classes than you strictly need in a particular script due to the default true value of the auto_load_related_classes Metadata attribute.
If compile time is an issue, there are methods to lessen the impact. One is PPerl which makes a normal Perl script into a daemon that is compiled once. The only change you need to make (after installing it, of course) is to the shebang line:
#!/usr/bin/pperl
Another option is to write a client/server model program where the bulk of the work is done by a server that loads the expensive modules, and a thin script just interacts with the server over sockets or pipes.
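A sketch of the server half of that client/server idea; the socket path and the run_job() entry point are placeholders.

use strict;
use warnings;
use Socket qw(SOCK_STREAM);
use IO::Socket::UNIX;
use Rose::DB::Object;    # the expensive compilation happens once, here

unlink '/tmp/rdbo-worker.sock';
my $server = IO::Socket::UNIX->new(
    Type   => SOCK_STREAM,
    Local  => '/tmp/rdbo-worker.sock',
    Listen => 5,
) or die "listen: $!";

while (my $client = $server->accept) {
    chomp(my $job = <$client>);
    # run_job() does the real work with the classes that are already loaded
    print {$client} run_job($job), "\n";
    close $client;
}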
You should also look at App::Persistent and this article, both of which were written by Jonathan Rockway (aka jrockway).
This looks almost as dramatic over here:
time perl -MClass::DBI -e0
real 0m0.084s
user 0m0.080s
sys 0m0.004s
time perl -MRose::DB::Object -e0
real 0m0.391s
user 0m0.356s
sys 0m0.036s
I'm afraid part of the difference can simply be explained by the number of dependencies in each module:
perl -MClass::DBI -le 'print scalar keys %INC'
46
perl -MRose::DB::Object -le 'print scalar keys %INC'
95
Of course, you should ask yourself how much compilation time really matters for your particular problem. And which source code would be easier for you to maintain.