I'm trying to integrate a basic 'hello world' jruby sinatra application with sinatra-synchrony and keep running into errors.
app.rb:
require 'sinatra/synchrony'
class App < Sinatra::Base
register Sinatra::Synchrony
get '/' do
'Hello world!'
end
end
config.ru:
require 'sinatra'
require 'app.rb'
run App
I've tried running this on a few different web servers and get varying errors to do with threads or memory leaks.
Synchrony libraries for Ruby are designed around using Fibers in event loops a la Eventmachine. For this particular case, you should consider using MRI and Goliath.io as your rack server.
However, Jruby is growing by leaps and bounds. I've been using it for last few months and avoiding the event loop paradigm altogether. Try removing the Synchrony library from your example and running it using puma.io.
Keep in mind the JVM takes a bit to "warm up". Hit it a few thousand times to optimize speed.
Related
I work on a fairly large Mojolicious application, which takes many seconds to compile.
Parts of the test suite of that application are written using playwright, which currently sets up a pristine database for each test case, and spins up an instance of the mojolicious application using #mojolicious/server-starter.
The compile-time of the application is starting to make it impractical to significantly expand the playwright test suite, and I'd like to address that without having to give up on the isolation that separate test databases and mojolicious application instances currently afford me.
In order to achieve that, the idea I'm currently pursuing is to have a small perl application that can pre-load the larger Mojolicious application, and which can be asked by the playwright test-suite to spawn a new instance of the larger application with a pristine database on an open port.
I'd like to communicate with that small perl application using HTTP, mainly out of convenience, and I'd like that small perl application to use Mojolicious to perform the HTTP communication, because that's also convenient and consistent with the rest of the code-base.
I've tried some naive approaches to implementing this idea, which looked roughly like this:
use TheBigApp;
$app->routes->post('/spawn-child')->to(cb => sub ($c) {
my ($sock, $port) = new_listen_socket();
if (my $pid = fork()) {
# record pid to later be able to shut it down or whatever
$c->render(json => { url => "http://localhost:$port" })
} else {
my $bapp = TheBigApp->new($c->req->json);
my $s = Mojo::Server::Daemon->new(listen => ...);
$s->app($bapp);
$bapp->start;
return;
}
});
All of the implementations I've tried along these lines seemed to run into issues due to the various singletons, such as the IOLoop, and even when overriding IOLoop->singleton to return a new instance with a new reactor within the sub-process, it appeared as if the forked-off child processes were still listening on the same socket as the parent-process that spawned them.
Are there perhaps lower-level Mojolicious APIs that could make this use-case work? Would it perhaps be simpler to implement the small parent process without Mojolicious to sidestep the issue entirely?
Thanks!
Digging a little in the Mojo Server/Daemon code, it always(?) sets up 'ReusePort' on the IO::Socket.
On linux (not macOS or BSD, windows I have no idea) that means that TCP connections to the same IP and port combination are 'load balanced' across multiple server instances by the kernel.
It is unclear from you post if the spawned processes are listening on the same port or not.
Assuming you are running linux, changing the listening port for each spawned child might help.
This question I asked has resulted in me exploring directly interfacing my FastCGI script to NGINX, rather than using a reverse proxy to Apache. I successfully modified my FastCGI script to run as a daemon using some code I found online:
my $s = FCGI::OpenSocket(':9000',20);
my $request = FCGI::Request( \*STDIN, \*STDOUT, \*STDERR, \%ENV, $s);
# Remaining code stays just as it does when using with Apache's mod_fcgid
while($request->Accept() >= 0) {
# Call core app subroutines.
}
It works, but near as I can tell this has a distinct disadvantage over mod_fcgid: I have one process running which will handle one request at a time and if that process dies, there's nothing to start it back up. There are references on Stack Overflow to code that properly spun off workers, but the sites referenced inevitably seem to have gone offline, much like FastCGI's own site.
So, I'm trying to figure out what I need to add and also -- pardon the pun -- figure out if I need to take a fork in this road. Here are the options that I am trying to consider, if I understand my issues correctly:
Directly implement some sort of forking mechanism, ideally it seems like it should (1) toss off the request to a process/thread/worker -- perhaps one that can stay alive for multiple requests -- and move on to being ready for the next request and (2) be independent enough from the workers that if something goes wrong with a worker, it doesn't bring down the whole system until I catch it and restart the main process (e.g. autorestart processes). If this can be done simply and reliably, this seems to have a huge appeal since the code already works with FastCGI.
Give up on direct FastCGI and convert to PSGI and use an application server to handle these things. Given that I'm using Perl, I'd guess Starman is the logical option, although I've been reading on uwsgi's PSGI support and it sounds almost ideal in "tyrant Emperor" mode, where it could run processes with different privileges, auto restart missing processes, etc.
Option 1 seems intriguing since it requires the least modification to my existing code and a FastCGI script started up without FastCGI still works like a normal CGI script. (I'm not running this code under FastCGI when it is used by sites that are very low traffic).
Option 2, though, feels like it might be more "modern." At least PSGI documentation seems to still be online, for example, and using Starman or uwsgi seem like they take care of the background stuff I need probably better than I would cooking up my own system. Downside: I'd need two startup scripts for my code: one to be used by the PSGI enabled sites and one for sites still running in CGI.
Update: Continuing to explore option 1, I read through this tutorial on Perl fork() which seems somewhat relevant. Would using fork to break off each FastCGI request be a good approach if I go with option 1? I assume I'd be at risk of fork bombing, although if I kept track of the number of forks and issued wait() if ($forks > 10); perhaps that would be a safe approach? (Or perhaps using Parallel::ForkManager to do that process watching.) Or would it be safer and/or more efficient using something like Thread::Queue and passing FastCGI request objects to a set a threads that are reliably already established? There seem to be plenty of pitfalls I might overlook, which then returns me to whether I should opt for Option 2.
I have a sinatra app, which is performing considerably slower than I would like. My first suspicion is that it is my own code that is the bottleneck, so I extracted it to a standalone benchmarking script.
THREADS = 100
ITERATIONS = 1
def make_calls
ITERATIONS.times do
# ... my stuff here
end
end
1.upto(THREADS) do |n|
Benchmark.bm do |bm|
threads = []
n.times do
threads << Thread.new { make_calls }
end
bm.report("#{n} threads:") { threads.each { |t| t.value } }
end
end
Where make_calls calls my own code. I'm pleased to say that by the time we have reached 100 threads, the cumulative times of make_calls in all threads is 0.6 seconds, which is fast enough for my purposes. The reason I am wrapping the make_calls method in threads above is because my own code uses threads (java native threads via a java.concurrent.FixedThreadPool(500)) /ExecutorService and I wanted to make sure that this was behaving nicely in an environment that potentially uses other threading models. A single iteration in a single thread runs in about 0.02 seconds once the jruby has warmed up.
So the above is good, but when I add this to a sinatra web server with the following:
require 'sinatra'
get '/' do
# ... my stuff here
end
The response time on a request to this endpoint is approx .5 seconds - Increase the number of concurrent requests and the response time goes up in a linear fashion. I've used both jetty-rackup and trinidad to try this, using jruby 1.7 on both linux and solaris.
I have tried to optimise the trinidad instance to no avail (max/min runtimes etc). The best performance we have seen is by running either server in threadsafe! mode, and both servers show comparative performance in this mode.
Can anyone explain to me where the time is being consumed or how to improve this setup?
It doesn't sound like the server is the limiting factor. Perhaps it's a problem with rack or Sinatra, but
# ... my stuff here
doesn't really give much away. Try using a profiling tool (VisualVM is OK to start with) to check if you're allocating masses of objects you don't need, whether all your threads are waiting, e.t.c then change or eliminate what you think is the bottleneck.
Repeat this until you think it's fast enough ;)
I'm using Dancer 1.31, in a standard configuration (plackup/Starman).
In a request I wished to call a perl function asynchronously, so that the request returns inmmediately. Think of the typical "long running operation" scenario, in which one wants to return a "processing page" with a refresh+redirect.
I (naively?) tried with a thread:
sub myfunc {
sleep 9; # just for testing a slow operation
}
any '/test1' => sub {
my $thr = threads->create('myfunc');
$thr->detach();
return "done" ;
};
I does not work, the server seems to freeze, and the error log does not show anything. I guess manual creation of threads are forbidden inside Dancer? It's an issue with PSGI? Which is the recommended way?
I would stay away from perl threads especially in a web server environment. It will most likely crash your server when you join or detach them.
I usually create a few threads (thread pool) BEFORE initializing other modules and keep them around for the entire life time of the application. Thread::Queue nicely provides communication between the workers and the main thread.
The best asynchronous solution I find in Perl is POE. In Linux I prefer using POE::Wheel::Run to run executables and subroutines asynchronously. It uses fork and has a beautiful interface allowing communication with the child process. (In Windows it's not usable due to thread dependency)
Setting up Dancer and POE inside the same application/script may cause problems and POE's event loop may be blocked. A single worker thread dedicated to POE may come handy, or I would write another server based on POE and just communicate with the Dancer application via sockets.
Threads are definitively iffy with Perl. It might be possible to write some threaded Dancer code, but to be honest I don't think we ever tried it. And considering that Dancer 1's core use simpleton classes, it might also be very tricky.
As Ogla says, there are other ways to implement asynchronous behavior in Dancer. You say that you are using Starman, which is a forking engine. But there is also Twiggy, which is AnyEvent-based. To see how to leverage it to write asynchronous code, have a gander at Dancer::Plugin::Async.
Various Perl scripts (Server Side Includes) are calling a Perl module with many functions on a website.
EDIT:
The scripts are using use lib to reference the libraries from a folder.
During busy periods the scripts (not the libraries) become zombies and overload the server.
The server lists:
319 ? Z 0:00 [scriptname1.pl] <defunct>
320 ? Z 0:00 [scriptname2.pl] <defunct>
321 ? Z 0:00 [scriptname3.pl] <defunct>
I have hundreds of instances of each.
EDIT:
We are not using fork, system or exec, apart form the SSI directive
<!--#exec cgi="/cgi-bin/scriptname.pl"-->
As far as I know, in this case httpd itself will be the owner of the process.
MaxRequestPerChild is set to 0 which should not let the parents die before the child process is finished.
So far we figured that temporarily suspending some of the scripts help the server coping with the defunct processes and prevent it from falling over however zombie processes are still forming without a doubt.
Apparently gbacon seems to be the closest to the truth with his theory that the server is not being able to cope with the load.
What could lead to httpd abandoning these processes?
Is there any best practice to prevent these from happening?
Thanks
Answer:
The point goes to Rob.
As he says, CGI scripts that generate SSI's will not have those SSI's handled. The evaluation of SSI's happens before the running of CGI's in the Apache 1.3 request cycle. This was fixed with Apache 2.0 and later so that CGI's can generate SSI commands.
Since we were running on Apache 1.3, for every page view the SSI's turned into defunct processes. Although the server was trying to clear them it was way too busy with the running tasks to be able to succeed. As a result, the server fell over and become unresponsive.
As a short term solution we reviewed all SSI's and moved some of the processes to client side to free up server resources and give it time to clean up.
Later we upgraded to Apache 2.2.
More Band-Aid than best practice, but sometimes you can get away with simple
$SIG{CHLD} = "IGNORE";
According to the perlipc documentation
On most Unix platforms, the CHLD (sometimes also known as CLD) signal has special behavior with respect to a value of 'IGNORE'. Setting $SIG{CHLD} to 'IGNORE' on such a platform has the effect of not creating zombie processes when the parent process fails to wait() on its child processes (i.e., child processes are automatically reaped). Calling wait() with $SIG{CHLD} set to 'IGNORE' usually returns -1 on such platforms.
If you care about the exit statuses of child processes, you need to collect them (commonly referred to as "reaping") by calling wait or waitpid. Despite the creepy name, a zombie is merely a child process that has exited but whose status has not yet been reaped.
If your Perl programs themselves are the child processes becoming zombies, that means their parents (the ones that are forking-and-forgetting your code) need to clean up after themselves. A process cannot stop itself from becoming a zombie.
I just saw your comment that you are running Apache 1.3 and that may be associated with your problem.
SSI's can run CGI's. But CGI scripts that generate SSI's will not have those SSI's handled. The evaluation of SSI's happens before the running of CGI's in the Apache 1.3 request cycle. This was fixed with Apache 2.0 and later so that CGI's can generate SSI commands.
As I'd suggested above, try running your scripts on their own and have a look at the output. Are they generating SSI's?
Edit: Have you tried launching a trivial Perl CGI script to simply printout a Hello World type HTTP response?
Then if this works add a trivial SSI directives such as
<!--#printenv -->
and see what happens.
Edit 2: Just realised what is probably happening. Zombies occur when a child process exits and isn't reaped. These processes are hanging around and slowly using up resources within the process table. A process without a parent is an orphaned process.
Are you forking off processes within your Perl script? If so, have you added a waitpid() call to the parent?
Have you also got the correct exit within the script?
CORE::exit(0);
As you have all the bits yourself, I'd suggest running the individual scripts one at a time from the command line to see if you can spot the ones that are hanging.
Does a ps listing show an inordinate number of instances of one particular script running?
Are you running the CGI's using mod_perl?
Edit: Just saw your comments regarding SSI's. Don't forget that SSI directives can run Perl scripts themselves. Have a look to see what the CGI's are trying to run?
Are they dependent on yet another server or service?