How can a PSGI application be served with many concurrent connections? I have tried event-based and preforking webservers but the number of concurrent connections seems to be limited by the number of worker processes. I've heard that for instance Node.js scales to several thousand parallel connections, can you achieve similar in Perl?
Here is a sample application that keeps connection open infinitely. The point is not to have infinite connections but to keep connections open long enough to hit connection limits:
my $app = sub {
    my $env = shift;
    return sub {
        my $responder = shift;
        my $writer = $responder->([200, ['Content-Type' => 'text/plain']]);
        my $counter = 0;
        while (1) {
            $writer->write(++$counter . "\n");
            sleep 1; # or a non-blocking sleep such as Coro::AnyEvent::sleep
        }
        $writer->close;
    };
};
I don't think you're supposed to have infinite loops inside apps; instead you're supposed to set up a recurring timer and notify/message/write from its callback. See Plack::App::WebSocket - WebSocket server as a PSGI application and Re^4: real-time output from Mojolicious WebSockets?
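Roughly, a timer-based version of the sample app might look like the sketch below under an event-driven PSGI server such as Twiggy (untested here; the ten-tick stop condition is arbitrary and only there so the example terminates):
use AnyEvent;

my $app = sub {
    my $env = shift;
    return sub {
        my $responder = shift;
        my $writer  = $responder->([200, ['Content-Type' => 'text/plain']]);
        my $counter = 0;

        my $timer;
        $timer = AnyEvent->timer(
            after    => 1,
            interval => 1,
            cb       => sub {
                # the callback closes over $timer, which keeps the timer alive
                $writer->write(++$counter . "\n");
                if ($counter >= 10) {   # arbitrary stop condition for the sketch
                    $writer->close;
                    undef $timer;
                }
            },
        );
    };
};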
While I have not yet tried it, I came across this question whilst searching for a solution to a problem I faced when using a socket server to report on the progress etc. of long-running jobs. Initially I was thinking of an approach along the lines of ParallelUserAgent, except as a server and not a client.
I returned to the problem a few days later after realising that Net::WebSocket::Server blocks new connection requests if a long-running block of code runs within the new-connection handler callback.
My next approach will be to split the long-running functionality out into a newly spawned shell process and use a DB to track its progress, which can then be read as required within the server without lengthy blocking.
Thought I'd throw up my approach in case it helps anyone walking a similar path.
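For what it's worth, a bare-bones sketch of that split (the run_long_job.pl worker script and the $job_id argument are made up for illustration; the worker is assumed to record its own progress in the database):
use POSIX qw(setsid);

sub spawn_long_job {
    my ($job_id) = @_;

    defined(my $pid = fork()) or die "fork failed: $!";
    return if $pid;    # parent: go straight back to serving connections

    # child: detach from the server and hand off to the worker script,
    # which writes its progress rows to the DB under $job_id
    setsid();
    exec('perl', 'run_long_job.pl', $job_id) or die "exec failed: $!";
}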
I have a question about how I should be using IO::Socket. I have a script that should run constantly, monitoring an Asterisk server for certain events. When these events happen, the script sends data from the event off to another server via a TCP socket. I've found that occasionally the socket will close. My question is whether I should use a single socket and keep it open forever (and figure out why it closes and prevent it), or open and close a new socket for each bit of data sent out?
My experience with this sort of thing is very minimal, and I've read all the documentation without finding the answer I'm looking for. Below is a sample of what I've got so far:
#!/usr/bin/perl
use Asterisk::AMI;
use IO::Socket;
use strict;
use warnings;
my %call;    # call state keyed on Uniqueid (declared here for completeness; the full script has more setup)

my $sock = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => '1234',
    Proto    => 'tcp',
);

sub newchannel {
    my ($ami, $event) = @_;
    if ($event->{'Context'} eq "from-trunk") {
        my $unique_id = $event->{'Uniqueid'};
        my $this_call = $call{$unique_id};
        $this_call->{caller_name}   = $event->{'CallerIDName'};
        $this_call->{caller_number} = $event->{'CallerIDNum'};
        $this_call->{dnis}          = $event->{'Exten'};
        $call{$unique_id} = $this_call;
    }
}
sub ringcheck {
    my ($ami, $event) = @_;
    if ($event->{SubEvent} eq 'Begin') {
        my $unique_id = $event->{UniqueID};
        if (exists $call{$unique_id}) {
            my $this_call = $call{$unique_id};
            $this_call->{system_extension} = $event->{Dialstring};
            $this_call->{dest_uniqueid}    = $event->{DestUniqueID};
            printf $sock "R|%s|%s|%s||%s\n",
                $this_call->{caller_name},
                $this_call->{caller_number},
                $this_call->{system_extension},
                $this_call->{dnis};
            $this_call->{status} = "ringing";
        }
    }
}
There's a bit more to it than that, but this shows where I feel I should be starting/stopping a new socket (within the ringcheck sub).
Let me know if you need me to clarify or add anything.
Thanks!
Whether it is better to establish a new connection for each message or to keep the connection open depends on a few factors:
Is the overhead associated with establishing connections significant? This depends on factors such as the frequency with which messages need to be sent, and the quality of the network connection.
If the remote end is 'localhost', as in your sample script above, then this is not likely to be an issue, and in that case I would recommend using a Unix domain socket instead anyway (a brief sketch follows these points).
Is the remote end sending anything back? Sporadic connections are much harder to manage if either side may have asynchronous messages to send, though that does not sound like the case for you.
Are there any significant resources which you would be holding up by keeping the connection open?
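For the local-only case mentioned above, connecting over a Unix domain socket might look like this (the /tmp/asterisk-events.sock path is purely illustrative):
use IO::Socket::UNIX;
use Socket qw(SOCK_STREAM);

my $sock = IO::Socket::UNIX->new(
    Type => SOCK_STREAM,
    Peer => '/tmp/asterisk-events.sock',
) or die "Cannot connect to local socket: $!";

print $sock "hello\n";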
Note that I don't consider random connection dropouts a good reason to argue for making a new connection each time. If possible, it is better to diagnose that problem in any case; otherwise you might get unreliable performance no matter which approach you take.
In my experience, a very common reason for seemingly random dropouts in long-held TCP connections is intermediate connection-tracking firewalls. Such firewalls will drop a connection if they don't see any activity on it for a period of time, to conserve their own resources. One way to combat this, which I use in some of my tools, is to set the socket option SO_KEEPALIVE on the socket, like this:
use Socket;
...
setsockopt($sock, SOL_SOCKET, SO_KEEPALIVE, 1);
This has a couple of benefits: it results in the kernel sending keepalive messages on your connection at regular intervals, even if all is quiet, which by itself is enough to keep some firewalls happy. Also, if your connection does drop, your program can find out straight away instead of the next time you want to write to it (although you may not notice unless you are regularly checking your sockets for errors).
Perhaps your best approach is to set SO_KEEPALIVE and keep your socket open, but also check for errors whenever you try to write to it, and if you get an error, close and re-open the connection.
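A rough sketch of that approach against the TCP endpoint from your script (the reconnect() and send_line() helpers are made up for illustration):
use IO::Socket::INET;
use Socket qw(SOL_SOCKET SO_KEEPALIVE);

my $sock;

sub reconnect {
    $sock = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1',
        PeerPort => '1234',
        Proto    => 'tcp',
    ) or die "connect failed: $!";
    setsockopt($sock, SOL_SOCKET, SO_KEEPALIVE, 1);
}

sub send_line {
    my ($line) = @_;
    local $SIG{PIPE} = 'IGNORE';    # a failed write should return false, not kill the process

    reconnect() unless $sock && $sock->connected;
    unless (print $sock $line) {
        # write failed: drop the stale socket, reconnect once, and retry
        close $sock;
        reconnect();
        print $sock $line or warn "send failed after reconnect: $!";
    }
}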
This question may also be of use to you.
Here's my situation: I'm developing a web application using the Dancer framework, and I would like to insert some data into the database on the server side from the browser side. The problem is, when the data is large the upload takes so long that I'm considering displaying a progress bar.
I implemented this by sending two requests: one for posting the data, and the other for polling the status. But it seems that once the first request is being handled, the other won't be served until the first finishes, so the status returns nothing and then suddenly 100%. To manage this, I create a thread while handling the first request, so the main thread can return to handle the second, polling request. This works quite well until I have to kill some child processes spawned in that thread (which is another question).
So my question is: are there any ideas for dealing with multiple simultaneous requests other than multithreading? How do web programmers normally handle this situation?
You should have no problem handling multiple requests simultaneously.
How do you run your app? If you use the built-in server (perl your_app.pl), then by default it is single-threaded and will only process one request at a time.
You might want to use a multiprocess/multithreaded deployment option, for example Starman. This is described in https://metacpan.org/module/YANICK/Dancer-1.3113/lib/Dancer/Deployment.pod#Running-on-Perl-webservers-with-plackup
I'd start by gluing Dancer to AnyEvent and using Twiggy to host the app. A Google search turns up the following, which look like a good starting point:
http://blogs.perl.org/users/mstplbg/2010/12/using-anyevent-and-dancer.html
http://blogs.perl.org/users/mstplbg/2010/12/anyevent-and-dancer-condvars.html
http://blogs.perl.org/users/mstplbg/2011/03/long-running-requests-with-progress-bar-in-dancer-anyevent.html
You can use Dancer with plackup and Starman; here is an example:
foo.psgi:
#!/usr/bin/perl
use strict;
use warnings;
use Dancer2;

$| = 1;

get '/foo' => sub {
    sleep 5;    # block this worker for five seconds to simulate a slow request
    'ok';
};

to_app;
Run the program with plackup:
$ plackup -s Starman foo.psgi
Resolved [*]:5000 to [0.0.0.0]:5000, IPv4
Binding to TCP port 5000 on host 0.0.0.0 with IPv4
Setting gid to "0 0 0"
Starman: Accepting connections at http://*:5000/
Then run the following for loop:
for i in $(seq 1 3)
> do
> time curl http://localhost:5000/foo &
> done
Output:
ok
real 0m5.077s
user 0m0.004s
sys 0m0.010s
ok
real 0m5.079s
user 0m0.001s
sys 0m0.012s
ok
real 0m5.097s
user 0m0.009s
sys 0m0.004s
You can see that Dancer2 now handles multiple requests concurrently: all three requests completed in about five seconds, rather than five, ten and fifteen.
In my Catalyst app I have a very important connection to a remote server using SOAP with WSDL.
Everything works fine, but when the remote server goes down for any reason, ALL of my app waits until the timeout expires. EVERYTHING. ALL the controllers and processes, ALL the clients!!
If I set a 15-second timeout for the SOAP::Lite transport error, everything waits for 15 seconds.
No page from any user or connection can be displayed during the timeout wait.
I use FastCGI and Nginx for the Catalyst app. If I use multiple FCGI processes, when one waits the others take care of the connections, but if all of them try to access the faulty SOAP service... they all wait and wait for an answer until they reach their timeouts. When all of them are waiting, no more connections are allowed.
Looking for answers, I have read somewhere that SOAP::Lite is "single threaded".
Is it true? Does it mean that ALL my app, with ALL its visitors, can only use one SOAP connection? That is hard to believe.
This is my code for the call:
sub check_result {
    my ($self, $code, $IP, $PORT) = @_;

    my $soap = SOAP::Lite->new( proxy => "http://$IP:$PORT/REMOTE_SOAP" );
    $soap->autotype(0);
    $soap->default_ns('http://REMOTENAMESPACE/namespace/default');
    $soap->transport->timeout(15);
    $soap->on_fault(sub {
        my ($soap, $res) = @_;
        eval { die ref $res ? $res->faultstring : $soap->transport->status };
        return ref $res ? $res : SOAP::SOM->new;
    });

    my $som = $soap->call("remote_function",
        SOAP::Data->name('Entry1')->value($code),
    );

    return $som->paramsout;
}
I also tried this slightly different approach kindly suggested at PerlMonks, but nothing got better.
Please, can someone point me in the right direction?
Migue
This is not a problem with SOAP::Lite or Catalyst per se. Pretty much any resource you query will most likely wait for the return (e.g. a file read on disk, or a database access). If the resource blocks for a long time, there's a chance that you could "starve" other requests while waiting for this return.
There's no easy answer to this problem, but you could create a "job queue" that a separate process executes. Instead of calling the other service directly, you add an entry to the queue and get back a token. When the job finishes, the queue stores the result associated with that token, and your app then checks, in a separate request, whether the token you want has a result yet.
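A minimal sketch of that token pattern, assuming a DBI handle $dbh and a jobs table with token, status, payload and result columns (all of these names are illustrative, not part of the original setup):
use Data::UUID;

sub enqueue_job {
    my ($dbh, $payload) = @_;
    my $token = Data::UUID->new->create_str;
    $dbh->do('INSERT INTO jobs (token, status, payload) VALUES (?, ?, ?)',
             undef, $token, 'pending', $payload);
    return $token;    # hand this back to the caller for later polling
}

sub poll_job {
    my ($dbh, $token) = @_;
    my ($status, $result) = $dbh->selectrow_array(
        'SELECT status, result FROM jobs WHERE token = ?', undef, $token);
    return { status => $status, result => $result };
}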
There are specialized "job queue" frameworks, such as RabbitMQ, Apache ActiveMQ and even some solutions on top of Redis. If your web application uses rich JavaScript, you could even have the "job queue" notification reach the JavaScript client using, for instance, WebSockets; otherwise, just poll every second or so to see whether there is a response yet.
I'm using $r->pool->cleanup_register(\&cleanup); to run a subroutine after a page has been processed and printed to the client. My hope was that the client would see the complete page, and Apache could continue doing some processing in the background that takes a few seconds.
But the client browser hangs until the cleanup sub has returned. Is there a way to get Apache to finalise the connection with the client before all my code has returned?
I'm convinced I've done this before, but I can't find it again.
Use a job queue system and do the long operation completely asynchronously -- just schedule the operation during the web request. A job queue also handles peak load situations better than doing something expensive within the web server processes themselves.
You want to flush the buffer. It doesn't finalize the connection, but your client will see the output before the task completes.
use Apache2::RequestRec ();                 # for $r->content_type
use Apache2::RequestIO  ();                 # for $r->rflush and $r->print
use Apache2::Const -compile => qw(OK);

sub handler {
    my $r = shift;
    $r->content_type('text/html');
    $r->rflush;                       # send the headers out to the client now
    $r->print(long_operation());      # the long-running work happens here
    return Apache2::Const::OK;
}
I'm writing an internal service that needs to touch a mod_perl2 instance for a long-running process. The job is fired from an HTTP POST, and the mod_perl handler picks it up and does the work. It could take a long time, and is ready to be handled asynchronously, so I was hoping I could terminate the HTTP connection while it is running.
PHP has a function, ignore_user_abort(), which, when combined with the right headers, can close the HTTP connection early while leaving the process running (this technique is mentioned here on SO a few times).
Does Perl have an equivalent? I haven't been able to find one yet.
Ok, I figured it out.
mod_perl has the 'opposite' problem to PHP here. By default, mod_perl processes are left running, even if the connection is aborted, whereas PHP by default closes the process.
The Practical mod_perl book says how to deal with aborted connections.
(BTW, for the purposes of this specific problem, a job queue was lower on the list than a 'disconnecting' http process)
use Apache2::Connection ();                        # for $r->connection->keepalive
use Apache2::Const -compile => qw(CONN_CLOSE);

# set up the headers
$r->content_type('text/html');

my $s = some_sub_returns_string();

# tell Apache not to keep the connection alive, and tell the client
# exactly how much data to expect
$r->connection->keepalive(Apache2::Const::CONN_CLOSE);
$r->headers_out()->{'Content-Length'} = length($s);

$r->print($s);
$r->rflush();

#
# !!! at this point, the connection to the client will close
#

# do the long-running stuff
do_long_running_sub();
You may want to look at using a job queue for this. Here is one provided by Zend that will let you start background processing jobs. There should be a number of these to choose from for PHP and Perl.
Here's another thread that talks about this problem, and an article on some PHP options. I'm no Perl monk, so I'll leave suggestions on those tools to others.