Catalyst comet/long polling - perl
I've been battling this question for a while, as surely many Catalyst programmers have. Now we have some examples from John about non-blocking applications running on Twiggy.
But I don't think Twiggy is the best option for running my whole application. So I want to decouple things: run my app behind nginx, for example, and forward my comet traffic to Twiggy.
The main problem I see is authentication. I see several possible options:
move authentication to a front-end server
move authentication to a separate catalyst app
use session ids when communicating with Twiggy
?? Using Plack sessions ??
The first option is not really good, because it takes away flexibility if I ever change the front-end server. The second one also has considerable costs. The third one seems the easiest to me, given that Catalyst::Plugin::Session::Store::DBI is used as the session backend.
These are the options that came to my mind. Surely I am missing something. Has anyone encountered the same problems? I would be grateful to anyone who can give me a hint or broaden my view of this problem. It would also be helpful to see the pros and cons of each option.
Catalyst and long-polling (comet) applications
Overview
When, in the middle of 2013, I decided to add reverse AJAX (hereafter Comet) functionality to my Catalyst application, I found surprisingly little information about it on the Web. So I started to collect the information piece by piece, which forced me to dig deeper into the Catalyst framework. Then some good examples of non-blocking code appeared from John Napiorkowski (current maintainer of Catalyst), which clarified a lot about this topic. So I wrote a simple server, running on Twiggy and providing long-lived websocket connections for clients. This solution does not pretend to be the best or even a good one. It is just a working solution. The code has not been refactored and is provided as is. It can be used as a basis for building more robust and reliable comet applications, and it can be improved in many ways. So if you see mistakes or have suggestions for improvement, please let me know.
Introduction
In this section I would like to outline the background for this code.
I have a Catalyst application implementing social networking functionality. It mostly uses AJAX requests to get data from the server. Every minute it makes an AJAX request to get data updates for the logged-in user. If the session has expired, the user is logged out and redirected to the log-in page. There is one part of my application where a logged-in user needs to get periodic updates of data. It is not critical to use comet on this page; I could easily use AJAX (an easier option, at the cost of network latency, bandwidth and unnecessary requests), but I decided to experiment a bit.
If you run your Catalyst app on a preforking server, you have a number of server processes serving your clients. A long-lived connection in your Catalyst app blocks one instance of your app for as long as the connection stays open. If you are going to have only a few clients and a lot of hardware resources, you may be able to prefork your app. However, if you want hundreds, let alone thousands, of concurrent connections, this solution may not be suitable, because you run out of resources very fast. This means that either your Catalyst application must run on a non-blocking server, or your client (browser) should communicate with another server which does not consume many hardware resources and can allocate an instance of itself for each client while keeping the connection open. Or the client can be connected to a server which runs an event loop and responds to new data for the user in an asynchronous way.
The only non-blocking server for my Catalyst app I was able to find was Twiggy. It is based on AnyEvent, a framework for event-driven programming in Perl, which lets Twiggy serve clients in a non-blocking, asynchronous way. This is good, for example, for requests where it takes some time to get the data for a user. The server does not block and wait for the data to become available; instead it continues to listen for new incoming requests, and as soon as the data for some user is ready it is sent to that user.
It is probably not the best idea to run your whole Catalyst app on Twiggy. One may want to use a robust, well-tested server (nginx, Apache or whatever) and run the Catalyst app behind that front-end server as FastCGI processes, for instance (the option I chose). So I decided to run a Twiggy instance and direct Comet traffic to it (I tried to use Twiggy behind an nginx proxy for websocket connections but it didn't work somehow, so I dropped it without further investigation).
The first problem was authentication. Because it is done in the Catalyst app and not on the front-end server, my Comet app must somehow know whether the user is authenticated.
Authentication
Catalyst's module for authentication is Catalyst::Plugin::Authentication. It takes care of authentication and authorization of a user in your app. You most certainly use the session module Catalyst::Plugin::Session together with it. This preserves application data across HTTP requests (including the fact that the user is authenticated). The session module is split into two parts: state and store. The first one lets you choose HOW you preserve the app's state across different HTTP requests (most probably with cookies). The second one lets you choose WHERE you store the user's session data.
I use Session::State::Cookie for the state and Session::Store::FastMmap for the store. This means that when a user is authenticated, he gets a session id, a secret string, which he sends in an HTTP header with every HTTP request to the server. This session id is valid for some time, and as long as it is valid it is uniquely assigned to that user. On every incoming request the user's data is restored from a mmap'ed file through Session::Store::FastMmap. This file acts as a shared-memory interprocess cache. This solution (FastMmap) is good if your whole app runs on a single server, but if you do load balancing, for example, you may want to use another solution (like Catalyst::Plugin::Session::Store::DBI).
So I decided to hack on this session data. In my Comet app I can access the session data and check whether the user is authenticated. This is done in the following sub.
sub _check_session {
my ($sid, $this_user_id) = @_;
my $return = 0;
my $user_session = $session->get("session:$sid");
if ( $user_session ) {
## Check user realm existence
return $return unless ( $user_session->{__user_realm} );
## Check user presence
return $return unless ( $user_session->{__user} );
## Check session expiration time
my $session_expires_time = $session->get("expires:$sid");
my $now = time();
if ( $now > $session_expires_time ) {
return $return;
}
## Check if it is still the same user
if ( $this_user_id && ($this_user_id ne $user_session->{__user}->{id} ) ) {
return $return;
}
else {
$return = $user_session->{__user};
}
}
return $return;
}
Looking through Catalyst::Plugin::Session and Catalyst::Plugin::Authentication I concluded that it is necessary to check at least the following keys in the session data:
__user_realm: if user is authenticated in at least one realm, this key is present in the session hash
__user: if user is authenticated, this key represents user data (which comes from the ::Store part of the Authentication module, most probably from DBIx::Class)
"expires:$sid" represents a timestamp when the session expires
$session is an object allowing access to our mmap'ed file:
my $session = Cache::FastMmap->new( raw_values => 0, share_file => ('/tmp/myapp/session_data') );
We are interested in two pieces of data which can be looked up in the session file: "session:$sid" is the key for the session data and "expires:$sid" is the timestamp of the session's expiration.
So now, when a browser tries to establish a websocket connection with our Comet app, we have to call this sub. If the user is authenticated, a websocket connection is established with the server. Although my application automatically closes the websocket connection when the user logs out or navigates away from the Comet page in his browser, I nevertheless check the session id periodically (the timer fires every $interval seconds). So if a malicious user opens a websocket connection on his own, he will get no use out of it.
There is also the case where user A logs out and user B logs in using the same session id as user A, all before the next session check. The session will still be active but will belong to another user. So we have to check that the session corresponds to the user who initially established the websocket connection:
if ( $this_user_id && ($this_user_id ne $user_session->{__user}->{id} ) ) {
return $return;
}
else {
$return = $user_session->{__user};
}
PSGI, Plack::Builder, Plack::Request
It was a natural choice to implement my Comet app as a PSGI application. I assume you are familiar with this specification.
Say you want your app to map different URLs to different applications, for example when you have several areas in your website, each requiring its own comet logic. You can achieve this by using Plack::Builder:
use Plack::Builder;
...
## 1st app entrance point
my $psgi_app = sub {
my $env = shift;
...
};
...
builder {
## mount 1st app
mount "/comet/first_app" => $psgi_app;
}
Now you can mount as many applications as you want, each corresponding to a different path (URL).
As you know, the first argument to a PSGI app is $env, the environment: a hash reference containing keys that describe the HTTP request and keys defined by the PSGI specification. Using it we can create a Plack request object, which gives us access to the request data and cookies. One of the cookies contains the session id, which is the starting point for the authentication check.
## Request object
my $req = Plack::Request->new($env);
## session id
my $sid = $req->cookies->{myapp_session};
## HTTP origin header
my $req_base = $env->{HTTP_ORIGIN};
Delayed and streaming response
As you know, a PSGI app normally returns a three-element array reference (HTTP status, HTTP headers and body). But to enable server push, an app should return a callback as its response. This callback is then executed by the underlying server, and you can use an event loop in your app to stream data to the client.
To implement a websocket server one has to use the PSGI extension psgix.io, which gives access to the raw socket, so that one has full control over the streamed data. The websocket specification requires an upgrade from HTTP to the ws protocol during the initial handshake, and that requires low-level access to the socket.
my $psgi_app = sub {
my $env = shift;
my $fh = $env->{'psgix.io'} or return [500, [], []];
## Create websocket handshake
my $hs = Protocol::WebSocket::Handshake::Server->new_from_psgi($env);
$hs->parse($fh) or return [400, [], [$hs->error]];
return sub {
my $responder = shift;
...
}
};
So we create an object which takes care of data format of messages for the websocket protocol, which are exchanged between client and server. This object is initialized with our raw internet socket so that it can fulfil the HTTP upgrade. And afterwards we return a callback which will be our delayed response.
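The websocket code above never calls $responder, so here is a minimal sketch of how a plain delayed, streaming PSGI response uses it. It is only an illustration: FakeWriter is a hypothetical stand-in for the writer object a real server such as Twiggy would hand back, added so the snippet can be run without any server.

```perl
use strict;
use warnings;

# A PSGI app that returns a delayed, streaming response.
my $app = sub {
    my $env = shift;

    # Fall back to a plain response if the server cannot stream.
    return [ 200, [ 'Content-Type' => 'text/plain' ], ["no streaming\n"] ]
        unless $env->{'psgi.streaming'};

    return sub {
        my $responder = shift;

        # Status and headers without a body: the responder returns a writer.
        my $writer = $responder->( [ 200, [ 'Content-Type' => 'text/plain' ] ] );

        # In a real app these writes would happen inside event-loop callbacks.
        $writer->write("chunk $_\n") for 1 .. 3;
        $writer->close;
    };
};

# Hypothetical stand-in for the server side, only to exercise the app here.
{
    package FakeWriter;
    sub new   { bless { chunks => [], closed => 0 }, shift }
    sub write { push @{ $_[0]{chunks} }, $_[1] }
    sub close { $_[0]{closed} = 1 }
}

my $writer = FakeWriter->new;
my $head;
my $res = $app->( { 'psgi.streaming' => 1 } );
$res->( sub { $head = shift; return $writer } );
```

Under a real server you would only keep the $app part; the server supplies $env, the responder and the writer.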
The comet app, server-side
So here is the whole comet PSGI app:
use Plack::Builder;
use Plack::Request;
use AnyEvent;
use AnyEvent::Handle;
use Protocol::WebSocket::Handshake::Server;
use Cache::FastMmap;
use JSON;
use Template;
use Log::Dispatch;
use Data::Dumper;
use DateTime;
use FindBin qw($Bin);
use lib "$Bin/../lib";
use myapp::Schema;
use warnings;
use strict;
## Session data
my $interval = 3;
my $session = Cache::FastMmap->new( raw_values => 0, share_file => ('/tmp/myapp/session_data') );
## Database connection, for example with a Postgres DB
my $db_schema = myapp::Schema->connect( {
dsn => 'dbi:Pg:dbname=myapp_test',
user => 'my_login',
password => 'my_passwd',
pg_enable_utf8 => 1
} );
## Logging object
my $log = Log::Dispatch->new( outputs => [ [ 'File', min_level => 'debug', filename => '/var/log/myapp_test/comet' ] ] );
## Adjust this sub correspondingly if Session::Store has been changed.
sub _check_session {
my ($sid, $this_user_id) = @_;
my $return = 0;
my $user_session = $session->get("session:$sid");
## Proceed only if session data exists for this sid
if ( $user_session ) {
## Check user realm existence
return $return unless ( $user_session->{__user_realm} );
## Check user presence
return $return unless ( $user_session->{__user} );
## Check session expiration time
my $session_expires_time = $session->get("expires:$sid");
my $now = time();
if ( $now > $session_expires_time ) {
return $return;
}
## Check if it is still the same user
if ( $this_user_id && ($this_user_id ne $user_session->{__user}->{id} ) ) {
return $return;
}
else {
$return = $user_session->{__user};
}
}
return $return;
}
## 1st app entrance point
my $psgi_app = sub {
my $env = shift;
my $fh = $env->{'psgix.io'} or return [500, [], []];
## Create websocket handshake
my $hs = Protocol::WebSocket::Handshake::Server->new_from_psgi($env);
$hs->parse($fh) or return [400, [], [$hs->error]];
return sub {
my $responder = shift;
## App data
my ($w, $hd, $input_params, $req, $sid, $user_id, $ret, $time_lapsed, $req_base);
## Clean up the websocket local environment
my $clean_up = sub {
$log->debug("\nCleaning up...\n");
## Destroy websocket
$hd->destroy;
## Remove timer from event loop
undef $w;
};
$hd = AnyEvent::Handle->new(
fh => $fh,
on_error => sub {
my ($hd, $fatal, $msg) = @_;
$clean_up->();
}
);
## Send server websocket handshake
$hd->push_write($hs->to_string);
## Websockets connection is initialized and is ready for data to be sent
#$hd->push_write( $hs->build_frame( buffer => encode_json( { 'status' => "Connection init..." } ) )->to_bytes );
## Get request data
$req = Plack::Request->new($env);
$sid = $req->cookies->{myapp_session};
$req_base = $env->{HTTP_ORIGIN};
## Check if user is authenticated
unless ( $ret = _check_session($sid, undef) ) {
$clean_up->();
## Not authenticated: stop here, otherwise $ret->{language} below would die
return;
}
else {
$user_id = $ret->{id};
}
$time_lapsed = 0;
## Template toolkit
my $template = Template->new({
INCLUDE_PATH => "$Bin/../root/templates",
VARIABLES => {
req_base => $req_base,
user_id => $user_id,
user_lang => $ret->{language}
},
ENCODING => 'utf8',
});
## Read input parameters and receive the user's data.
$hd->on_read(sub {
(my $frame = $hs->build_frame)->append($_[0]->rbuf);
while (my $message = $frame->next) {
my $decoded_data = eval { decode_json $message };
## If it's not a valid json - exit
if ($@) {
$clean_up->();
return;
}
else {
## New connection
if ( $decoded_data->{is_new} ) {
$input_params = $decoded_data;
my $stash = {
template_data => "some data"
};
my $tt_output;
$template->process( "template_path", $stash, \$tt_output );
$hd->push_write( $hs->build_frame( buffer => encode_json( { 'init_data' => $tt_output } ), max_payload_size => 200000 )->to_bytes );
}
## Else - additional data are sent from the client
else {
}
}
}
});
## THIS APP'S MAIN LOGIC
## As an example, let's track if user has changed his/her name and return a message to the browser
my $app_logic = sub {
my $this_params = shift;
if ( $user_id ) {
my $rs = $db_schema->resultset('User')->search( { id => $user_id } )->single;
if ( $rs && $rs->first_name ne $ret->{first_name} ) {
$hd->push_write( $hs->build_frame( buffer => encode_json( { 'data' => "User changed his name!" } ) )->to_bytes );
}
}
};
## Any event logic
$w = AnyEvent->timer (
interval => $interval,
after => $interval,
cb => sub {
## Check every half a minute if the user is still authenticated
if ( $time_lapsed > 30 ) {
$time_lapsed = 0;
unless ( $ret = _check_session($sid, $user_id) ) {
$clean_up->();
}
else {
## Check if user' object has been changed (e.g. his language etc.)
}
}
## Execute main logic
$app_logic->($input_params);
$time_lapsed += $interval;
}
);
};
};
builder {
## mount 1st app
mount "/comet/myapp" => $psgi_app;
}
So we start the program by initializing some common objects like the database handle and the session object.
When a websocket connection is terminated we don't want to respond to events pertaining to it, so we remove them from the event loop. That is what the sub reference $clean_up does. Then we define an AnyEvent::Handle object and set its on_read() callback, which fires every time new data arrives from the client.
Because I want to be able to use the same template for generating HTML both in my Catalyst app and in my comet app, I create a Template object and initialize it with variables which must be the same as in the Catalyst counterpart. The first time the on_read() callback is called is when a client opens a websocket connection. In the JavaScript part we define a special key for this (is_new), and on such a request we send the client its initial data (in my case it is the data for which I later get comet updates).
Additionally, we create an AnyEvent timer object which periodically executes our main logic, $app_logic. It also checks whether the user is still authenticated and is allowed to get data updates from the server.
Don't forget: if you change some user data in your database through a Catalyst controller and this change must be reflected in the session hash, you have to persist it by calling
$c->persist_user();
The comet app, client-side
I use the module pattern for my JavaScript modules to create a separate namespace for each of them. Here is one to handle communication with the comet server.
var myapp = (function() {
// Context data, private
var data_loaded = false;
var this_page = true;
return {
init: function() {
data_loaded = false;
this_page = true;
// No websockets in safari, somehow they don't work there
if ( navigator.userAgent.indexOf('Safari') != -1 && navigator.userAgent.indexOf('Chrome') == -1 ) {
myapp.myapp_load_ajax();
}
else {
// Create a websocket
websockets["myapp_socket"] = new WebSocket('ws://my-domain-name:5000/comet/myapp');
var input_hash = {};
input_hash["is_new"] = 1;
websockets["myapp_socket"].addEventListener("open", function(e) {
websockets["myapp_socket"].send(JSON.stringify(input_hash));
data_loaded = true;
});
websockets["myapp_socket"].addEventListener("message", function(e) {
var this_obj = JSON.parse(e.data);
// Connection is initialized
if ( this_obj.init_data ) {
// Make necessary initializations
myapp.init_after_loading();
}
// Websockets data update from server
else if ( this_obj.data ) {
// Do something meaningful on data update
}
});
websockets["myapp_socket"].addEventListener("close", function(e) {
//Connection has been closed
});
// In case when a websocket cannot be created, fall back to an AJAX request
websockets["myapp_socket"].addEventListener("error", function(e) {
// Unless the data have already been loaded, load it here for the first time, because
// this method will be also invoked when connection is dropped.
if ( !data_loaded && this_page ) {
myapp.myapp_load_ajax();
}
});
}
},
myapp_load_ajax: function() {
jQuery.ajax({
type: "POST",
url: "my_catalyst_app_load_ajax_data_path",
dataType: "json",
cache: false,
complete: function (xhr, textStatus) {
if ( xhr.responseText ) {
var this_data = jQuery.parseJSON(xhr.responseText);
if ( !this_data ) {
// An error happened
}
else {
// Make necessary initializations after your data has been inserted into the DOM
myapp.init_after_loading();
}
}
}
});
},
init_after_loading: function() {
// If you insert some data into DOM, initialize it here
},
close_socket: function() {
if ( websockets["myapp_socket"] ) {
websockets["myapp_socket"].close();
delete websockets["myapp_socket"];
}
},
};
})();
You can read about the module pattern in detail elsewhere on the Internet. In short, it makes an object out of your JavaScript module, and your methods are accessible as this object's properties. This allows you to create a separate namespace for your module and to define private variables.
In order to be able to access my websocket from another module, which may be necessary, I declare the object which holds it (websockets) as a global one.
What we have to do is define event handlers for our websocket, which include "open", "close" etc. If for some reason we cannot establish a websocket connection to our comet server (the server is down, it does not accept new connections etc.), we fall back to AJAX. Additionally, if Safari tries to create a websocket connection with a dead server, it doesn't report this through an "error" event, so we simply prohibit websockets in Safari.
So, we start by creating a new websocket connection for the URL that we mounted in the Plack builder in the comet server. Then we use the websocket's "open" event to take care of a new connection, notifying the server about the new client. The "message" event fires when a message arrives from the comet server; "close" is called whenever the connection with the server is closed; "error" is called in case of problems with the connection, for instance when it cannot be established, has been broken, or the server has closed the connection or died. And that's it. Now we will get data updates from our comet server.
Starting comet server
Now what is left is to start our server. The current server code assumes that it runs on the same machine as your Catalyst app. Some other possibilities are discussed in the final notes section.
We use the command line utility plackup to start our server:
TWIGGY_DEBUG=1 plackup -s Twiggy comet.psgi
I use TWIGGY_DEBUG env var to see debug info from the Twiggy server.
Final notes
The first thing to remember is that the Twiggy server will exit as soon as a die statement is executed. It means that you have to program defensively and wrap every statement that can lead to this in an eval block.
The premise for the current comet server is that it runs on the same machine as your Catalyst app. If you are planning to do load balancing and to run it on another machine, you have to take care of some things. First, you have to change your session plugin to Session::Store::DBI or something else suited for distribution across several machines (and afterwards adjust _check_session() to get data not from a file but from the database). Then change the dsn for the database connection to include a hostname and a port number.
Another thing to note is that the main logic in our server checks every N seconds whether the user's name has changed. It is inefficient to query a database that often if your server has many clients. There are better solutions. The first option, if you run the Catalyst app and the comet server on a single machine, is to use the Cache::FastMmap file as a mediator between them: the comet server gets notified that new data is available and only then queries the database. In this case you make database queries only to get the new data. It means that in your Catalyst controller you have to write into the cache file every time you change, through your controller, the data that your comet server tracks. For example, you have a User controller and a User model. Whenever a user changes his name, the User controller is called, which in turn calls the User model to change the user's data. So in the User controller you additionally write into the cache file that this has happened. Then the comet server knows when to get data updates from the database.
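The cache-as-mediator idea can be sketched roughly as follows. This is not code from the app above: the key name "changed:$user_id", the sub names and the version counter are assumptions made for illustration, and MemCache is a hypothetical in-memory stand-in so the snippet runs on its own. In production both processes would open the same Cache::FastMmap share_file, whose get() and set() methods are called the same way.

```perl
use strict;
use warnings;

# Hypothetical stand-in offering the get/set calls used below;
# swap in a Cache::FastMmap object shared by both processes.
{
    package MemCache;
    sub new { bless {}, shift }
    sub get { $_[0]{ $_[1] } }
    sub set { $_[0]{ $_[1] } = $_[2]; 1 }
}

# Catalyst side: call this in the controller after changing the user's data.
# Note: get followed by set is not atomic across processes.
sub mark_user_changed {
    my ( $cache, $user_id ) = @_;
    my $version = $cache->get("changed:$user_id") || 0;
    $cache->set( "changed:$user_id", $version + 1 );
    return $version + 1;
}

# Comet side: read the current version; 0 means "never changed".
sub user_version {
    my ( $cache, $user_id ) = @_;
    return $cache->get("changed:$user_id") || 0;
}

my $cache = MemCache->new;

# Remembered per websocket connection when it is opened.
my $seen = user_version( $cache, 42 );

# Done by the Catalyst controller when user 42 changes his name.
mark_user_changed( $cache, 42 );

# Checked in the timer callback: query the database only when this is true.
my $needs_query = user_version( $cache, 42 ) != $seen;
```

Each connection keeps its own $seen value, so several open connections of the same user all notice the change; after querying the database the connection would set $seen to the new version.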
You can use a similar approach if you do load balancing and your Catalyst and comet apps run on different machines. But in this case you have to use your database as the mediator. You can, for example, create a new table which is periodically queried by the comet server. Each column of the table could correspond to some application domain of your comet server. The column can be of type timestamp and mark the time of the last change of the data that you track. In your Catalyst controller you write into the corresponding column whenever the data in question changes; the comet server then checks this column and knows whether to query the database for data updates or not. Thus you avoid a lot of unnecessary queries.
A better choice, though more complicated, would involve sockets. In this case, when a user logs in we create a new socket for him and write all the data updates that we want to track directly into the socket. In the comet app, instead of using an AnyEvent timer, we define another AnyEvent::Handle which we initialize with the user's socket. Using its on_read() callback we get updates as they come and return them immediately to the user. In this case we bypass database queries and it should work really fast. But this solution requires a lot of additional work in the Catalyst controller.
Another thing to note is that the current comet server does not support the secure websockets (wss) protocol, because Twiggy does not support TLS/SSL. The solution would be to use an SSL tunnel in front of your server which transparently encrypts and decrypts messages (take a look at https://github.com/vti/app-tlsme).
And the final note: I have tried to run nginx as a front-end proxy in front of my comet server. But somehow nginx could not propagate messages to Twiggy. So the client's browser communicates directly with the comet server.
If you plan to have thousands and thousands of users, then websocket load balancing is a topic to think about.
If you find any mistakes or have any ideas for improvement, please comment or write me an email (dhyana1981@yahoo.de).
I think the better option is 3 or 4.
Set up your nginx to serve distinct locations for the Catalyst app and the Twiggy app.
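A rough sketch of what such an nginx setup could look like is shown below. The host name, ports and socket path are assumptions for illustration only. The Upgrade and Connection headers together with HTTP/1.1 are what nginx needs in order to proxy websockets (supported since nginx 1.3.13); a Catalyst FastCGI deployment usually needs a few extra fastcgi_param adjustments, described in the Catalyst deployment manual, which are left out here.

```nginx
server {
    listen 80;
    server_name my-domain-name;           # hypothetical host name

    # Comet/websocket traffic goes to Twiggy (assumed to listen on 127.0.0.1:5000).
    location /comet/ {
        proxy_pass http://127.0.0.1:5000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 3600s;         # keep long-lived connections open
    }

    # Everything else goes to the Catalyst FastCGI processes (assumed socket path).
    location / {
        include fastcgi_params;
        fastcgi_pass unix:/tmp/myapp.socket;
    }
}
```

With a setup like this the browser would connect to ws://my-domain-name/comet/myapp instead of addressing port 5000 directly.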
And, if you are using Twiggy, you probably want speed, so instead of using DBI I suggest you save, restore and check sessions via a memory-based store like Memcached or Redis, so that you can scale this up later if you move to AWS or something like that.
You can do the job with Catalyst::Plugin::Session::Store::Memcached::Fast or Catalyst::Plugin::Session::Store::Cache and others, but if you know how to create a secure session token and how to store and restore it, you can do this yourself, so you will know how to restore it in Twiggy, Catalyst or anything else (even other languages).
Have a good day!
Related
Using Mojo::Pg::Pubsub under hypnotoad
I have a table with tokens in a Pg database, and to avoid overloading the DB with constant SELECTs I decided to cache the tokens in RAM using a few simple mojo helpers in my app (for checking whether a token is valid, and for deleting and adding tokens to the cache). I'm using Mojo::Pg::Pubsub and the Pg notification system (I have a trigger that notifies about token insertion/deletion) to catch token creation and deletion events in the DB. All workers have a scheduled sub in their ioloops that runs an SQL DELETE on tokens which have been invalidated in the DB. With the Pg notifications I want all hypnotoad workers to have the same pool of tokens in memory, because all of them are notified about any change. But the problem is that only one hypnotoad worker (a random one from the pool, a different one each time) catches the event. I understand that the Mojo::Pg object is probably duplicated because the workers are forks. I also found that Mojo::Server::Prefork, which is used somewhere under the hood of a mojo app, emits an event called 'spawn', described in the docs as 'Emitted when a worker process is spawned.' The solution that I think would be acceptable is to subscribe to this event and recreate the Mojo::Pg object for each newly forked worker, but I can't find a way to access the server object to subscribe to this event. How can I do it? Or maybe I'm just doing something wrong and there are other ways to solve the problem?
Here is the code in my mojo app for working with the DB:
my $pg = connect_mojo($app->config, $app->mode);
$app->helper(pg => sub { return $pg; });
$app->helper(db => sub { return $app->pg->db; });
And here is the code that is responsible for catching notifications from Pg:
# Postgres notifies about token deletion through its notification system
$app->pg->pubsub->listen(token_deleted => sub {
    my ($this, $token) = @_;
    $app->log->info("Notification for deleting token from Pg received: $token");
    $app->token_mem->invalidate($token);
});
# Postgres notifies about token creation through its notification system
$app->pg->pubsub->listen(token_created => sub {
    my ($this, $token) = @_;
    $app->log->info("Notification for adding token from Pg received: $token");
    $app->token_mem->add($token);
});
Sending an unbuffered response in Plack
I'm working in a section of a Perl module that creates a large CSV response. The server runs on Plack, on which I'm far from expert. Currently I'm using something like this to send the response:
$res->content_type('text/csv');
my $body = '';
query_data (
    parameters => \%query_parameters,
    callback   => sub {
        my $row_object = shift;
        $body .= $row_object->to_csv;
    },
);
$res->body($body);
return $res->finalize;
However, that query_data function is not a fast one and retrieves a lot of records. In there, I'm just concatenating each row into $body and, after all rows are processed, sending the whole response. I don't like this for two obvious reasons: First, it takes a lot of RAM until $body is destroyed. Second, the user sees no response activity until that method has finished working and actually sends the response with $res->body($body). I tried to find an answer to this in the documentation without finding what I need. I also tried calling $res->body($row_object->to_csv) in my callback section, but it seems that ends up sending only the last call I made to $res->body, overriding all previous ones. Is there a way to send a Plack response that flushes the content on each row, so the user starts receiving content in real time as the data is gathered and without having to accumulate all data into a variable first? Thanks in advance for any comments!
You can't use Plack::Response because that class is intended for representing a complete response, and you'll never have a complete response in memory at one time. What you're trying to do is called streaming, and PSGI supports it even if Plack::Response doesn't. Here's how you might go about implementing it (adapted from your sample code):
my $env = shift;
if (!$env->{'psgi.streaming'}) {
    # do something else...
}
# Immediately start the response and stream the content.
return sub {
    my $responder = shift;
    my $writer = $responder->([200, ['Content-Type' => 'text/csv']]);
    query_data(
        parameters => \%query_parameters,
        callback   => sub {
            my $row_object = shift;
            $writer->write($row_object->to_csv);
            # TODO: Need to call $writer->close() when there is no more data.
        },
    );
};
Some interesting things about this code:
Instead of returning a Plack::Response object, you can return a sub. This subroutine will be called some time later to get the actual response. PSGI supports this to allow for so-called "delayed" responses.
The subroutine we return gets an argument that is a coderef (in this case, $responder) that should be called and passed the real response.
If the real response does not include the "body" (i.e. what is normally the 3rd element of the arrayref), then $responder will return an object that we can write the body to. PSGI supports this to allow for streaming responses.
The $writer object has two methods, write and close, which both do exactly as their names suggest. Don't forget to call the close method to complete the response; the above code doesn't show this because how it should be called is dependent on how query_data and your other code works.
Most servers support streaming like this. You can check $env->{'psgi.streaming'} to be sure that yours does.
Plack is middleware. Are you using a web application framework on top of it, like Mojolicious or Dancer2, or something like Apache or Starman server below it? That would affect how the buffering works. The link above shows an example by Plack's author: https://metacpan.org/source/MIYAGAWA/Plack-1.0037/eg/dot-psgi/echo-stream-sync.psgi Or you can do it easily by using Dancer2 on top of Plack and Starman or Apache: https://metacpan.org/pod/distribution/Dancer2/lib/Dancer2/Manual.pod#Delayed-responses-Async-Streaming Regards, Peter
Some reading material for you :)

https://metacpan.org/pod/PSGI#Delayed-Response-and-Streaming-Body
https://metacpan.org/pod/Plack::Middleware::BufferedStreaming
https://metacpan.org/source/MIYAGAWA/Plack-1.0037/eg/dot-psgi/echo-stream.psgi
https://metacpan.org/source/MIYAGAWA/Plack-1.0037/eg/dot-psgi/nonblock-hello.psgi

So copy/paste/adapt and report back please
Logging of ip address, transaction id or session id in Mojo::Log
I use simple logging in a Mojolicious application and want to extend the log entries with some extra information, such as the IP address, transaction id or session id. What I did before was to write one helper per log level, like this:

    $self->helper('info' => sub {
        my $self = shift;
        my $msg  = shift;
        my $ip   = $self->tx->remote_address;
        $self->app->log->info("[$ip] $msg");
    });
    ...
    $self->info("Login failed of user $user.");

I would like to change the format of the log output instead, so that I can use the generic log functions, which would add any additional values I need without a lot of helpers for each log level. A basic call of:

    $self->app->log->info("Login failed of user $user.");

should then give log entries like:

    [Sun Jun 8 11:09:12 2014] [info] [127.0.0.1] Login failed of user Tim.

I tried to do it by changing the log format, but anything I do is ignored:

    $self->app->log->format(sub {
        my ($time, $level, @lines) = @_;
        return "[$time] [$level] [$self->tx->remote_address] @lines.\n";
    });

I know Log4perl can be combined with Mojolicious, but I want to keep it as simple as possible.
I got this going pretty quickly using Mojolicious::Lite. To start off, move the log to someplace you can find quickly:

    use Mojo::Log;
    my $log = Mojo::Log->new(path => '~/log/mojo.log');

Then try this: set the remote address variable outside of the sub first.

    # $r_ip = remote ip address
    my $r_ip = $self->tx->remote_address;
    $self->app->log->format(sub {
        my ($time, $level, @lines) = @_;
        return "[" . localtime(time) . "] [$level] [$r_ip] " . join("\n", @lines) . "\n";
    });

The format can be seen at: http://mojolicio.us/perldoc/Mojo/Log
There is a thread on the Mojolicious mailing list, "possible to log a unique id per request/response?", where Sebastian/sri, the author of Mojolicious, answers:

> Simple answer: You can't, Mojolicious is async by design and there may be thousands of concurrent requests active at any given time.
>
> Complex answer: If you limit your application (through the server) to handling only one request at a time and don't use any real-time web features (not that it would make much sense if you can't handle concurrent requests), you could subclass Mojo::Log to customize the message format and store a global unique id for every new request.

The other (currently accepted) answer does exactly that: it removes all concurrency and uses a global variable. That will start breaking down when you start using the real-time Mojolicious features.
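One way around the global variable is to keep the context in a small per-request object instead of in the shared log format. The sketch below is only an illustration of that idea in plain Perl. ContextLog and MemoryLog are hypothetical names, not part of Mojolicious, and the wrapper assumes nothing about the underlying logger except that it has the usual level methods (debug, info, warn, error, fatal).

```perl
use strict;
use warnings;

# Hypothetical wrapper: prefixes every message with per-request context
# and forwards it to any logger object that has the usual level methods.
package ContextLog;

sub new {
    my ( $class, %args ) = @_;
    return bless { log => $args{log}, context => $args{context} }, $class;
}

# Generate one forwarding method per level instead of one helper each.
for my $level (qw(debug info warn error fatal)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$level" } = sub {
        my ( $self, @lines ) = @_;
        return $self->{log}->$level( "[$self->{context}] " . join( "\n", @lines ) );
    };
}

# Hypothetical in-memory logger, used here only to make the demo runnable.
package MemoryLog;

sub new { return bless { lines => [] }, shift }

for my $level (qw(debug info warn error fatal)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$level" } = sub {
        my ( $self, $message ) = @_;
        push @{ $self->{lines} }, "[$level] $message";
        return;
    };
}

package main;

my $sink = MemoryLog->new;

# One wrapper per request: the context lives in the wrapper, not in a
# global, so concurrent requests cannot overwrite each other's values.
my $log_a = ContextLog->new( log => $sink, context => '127.0.0.1' );
my $log_b = ContextLog->new( log => $sink, context => '10.0.0.7' );

$log_a->info('Login failed of user Tim.');
$log_b->warn('Session expired.');

print "$_\n" for @{ $sink->{lines} };
```

In a Mojolicious application the wrapper could presumably be built once per request from $c->app->log and $c->tx->remote_address, for example in a helper. That wiring is an assumption and is not shown here.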
Catalyst event loops only reaching a single client at a time
I'm working on a Catalyst/PSGI application that would make great use of asynchronous streaming. However, beyond a simple timer (like here: http://www.catalystframework.org/calendar/2013/13), I'm a little stumped on how to implement more "global" events. By global events, I mean things like:

1. a periodic timer that is the same for all clients
2. the visit to a given page by a single client (but updates all clients)
3. a file stat watcher that will update all clients when a file changes.

Correct me if I'm wrong, but to me these all seem very different from the example linked above, which will give each client a different counter. I would like to have events that happen "across the board." An example of what I've tried (using #2 from my list above):

    has 'write_fh' => ( is => 'rw', predicate => 'has_write_fh' );

    sub events : Path('/stream') Args(0) {
        my ( $self, $c ) = @_;
        $c->res->body("");
        $c->res->content_type('text/event-stream');
        $self->write_fh( $c->res->write_fh() );
    }

    sub trigger : Path('/trigger') : Args(0) {
        my ( $self, $c ) = @_;
        $self->write_fh->write( *the event string* );
    }

When I run this, it actually gets further than I would expect: the event does get triggered, but unreliably. With two browsers open, sometimes the event is sent to one, and sometimes to the other.

Now, I think I understand why this would never work: the client who hits /trigger has no knowledge of all the other clients who are watching /stream, and so the write_fh I'm trying to use is not useful. But if each client's request is in its own contained bubble, how am I to access their stream from some other request? Or am I completely on the wrong track...?
Your problem with write_fh is that the event is unicast: once one connection has received it, no other connection will. So one of the connections catches it and the other simply doesn't. You need to broadcast your events. Take a look at AnyEvent::IRC to see how it can be done. (Note that it was written for an old version of AnyEvent, but it should still work.)
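A minimal sketch of what broadcasting could look like: keep every client's writer in a registry and write each event to all of them. This assumes all streaming clients are served by a single process (for example under one event-loop server) and that each writer has a write method that dies when the client is gone. Broadcaster and FakeWriter are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical in-process registry of all currently connected writers.
package Broadcaster;

sub new { return bless { writers => {}, next_id => 1 }, shift }

# Remember a writer and return an id that can be used to remove it later.
sub subscribe {
    my ( $self, $writer ) = @_;
    my $id = $self->{next_id}++;
    $self->{writers}{$id} = $writer;
    return $id;
}

sub unsubscribe {
    my ( $self, $id ) = @_;
    delete $self->{writers}{$id};
    return;
}

# Send the same message to every writer; returns the number of deliveries.
sub broadcast {
    my ( $self, $message ) = @_;
    my $delivered = 0;
    for my $id ( sort { $a <=> $b } keys %{ $self->{writers} } ) {
        if ( eval { $self->{writers}{$id}->write($message); 1 } ) {
            $delivered++;
        }
        else {
            # Assumed policy: a failed write means the client went away.
            delete $self->{writers}{$id};
        }
    }
    return $delivered;
}

# Stand-in for a per-client writer object, for demonstration only.
package FakeWriter;

sub new {
    my ( $class, %args ) = @_;
    return bless { got => [], broken => $args{broken} }, $class;
}

sub write {
    my ( $self, $data ) = @_;
    die "client gone\n" if $self->{broken};
    push @{ $self->{got} }, $data;
    return;
}

package main;

my $hub = Broadcaster->new;
my ( $alice, $bob, $gone ) =
    ( FakeWriter->new, FakeWriter->new, FakeWriter->new( broken => 1 ) );
$hub->subscribe($_) for $alice, $bob, $gone;

# Every live subscriber receives the same server-sent event.
my $delivered = $hub->broadcast("data: page visited\n\n");
print "delivered to $delivered clients\n";
```

In the controller from the question, the /stream action would subscribe each new writer instead of overwriting a single attribute, and /trigger would call broadcast. With several worker processes an in-memory registry like this is not enough, because each process only sees its own clients.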
Safe design for SET SESSION AUTHORIZATION multi-user mod_perl2 connection caching
I have a mod_perl 2.0.4 / Apache 2.2 web app running on CentOS 6.4 with PostgreSQL 9.0. Until recently, I had this setup: Apache::DBI and DBI->connect_cached for all connections, which was starting to give FATAL: sorry, too many clients already even in my development area where I'm the only user.

In an effort to debug this, I have removed all references to Apache::DBI, upgraded to the latest DBI, and replaced all occurrences of connect_cached with plain DBI->connect. It seems to me now that somewhat fewer connections are made and then left <IDLE>. However, I realize that I haven't been calling disconnect() on all of my handles, because it had sounded like under Apache::DBI it wouldn't have made a difference.

My connections currently all connect as the same user, then lower their privileges based on which user it is via SET SESSION AUTHORIZATION. I do it this way because some other apps that use the database allow for a passworded login, which can pass the credentials directly to the database, but this particular web app uses an honour system login screen whereby you just click your name to log in. So it's future-security-ready but convenience-enabled at the moment. Also, database triggers for history and such rely on the session user being set correctly to track who did what.

Because I was concerned about a database handle being reused with the wrong session user, I pass { private_user_login => $login_role_name, PrintError => 0, RaiseError => 1, AutoCommit => 1} to connect_cached to differentiate each connection by user. But since I always set the session authorization immediately after connecting, I suppose that all the private_user_login hash does is make it so that for a given Apache process, there might be at least as many DB connections created and left idle as there are users, if eventually every user manages to randomly use a given Apache process. Meanwhile, because I don't disconnect any handles, they eventually get used up.
My question is, is it safe to take out the private_user_login to make all the connection handles look the same, to cut down on the number of connections left open, or is it possible that a connection handle could be re-used in the middle of a script (after setting the session user) by a different user, thus creating a race condition? Also, although Apache::DBI's docs say I needn't remove disconnect() calls, should I still have such a call at the end of every one of my scripts so that Apache::DBI can decide whether to disconnect? In other words, without my private connection variable, do SET SESSION AUTHORIZATION's effects persist when the next Apache::DBI->connect() reuses the existing connection? If so, is it ever possible that a connection is re-used by another request while one request is currently executing but not currently using the database handle?
I recommend a somewhat different tack, if you can. Keep it simple in Apache. Use private sessions per user, if that's what's easiest to make safe and reliable. Then put a PgBouncer between the PostgreSQL server and your Apache instance. Set it to transaction pooling mode. It'll happily multiplex your connections, and it'll take care of calling DISCARD ALL whenever a connection switches between users. I think you can still use SET SESSION AUTHORIZATION on connections made via PgBouncer.
It seems safe. To "verify" you can make an artificial race condition like this:

    use Apache2::RequestUtil;
    use Apache2::RequestRec;
    my $r = Apache2::RequestUtil->request;
    $r->headers_out->add('Cache-control' => "must-revalidate, no-cache, no-store");
    require Apache2::Request;
    my $req = Apache2::Request->new($r);
    $r->content_type("text/html");
    my $login_role_name = $req->param('u');
    $r->print($login_role_name);
    $r->print('<br>' . $$);
    use DBI;
    my $dbh = DBI->connect_cached("dbi:Pg:dbname=......,{ RaiseError => 1, AutoCommit => 1});
    $dbh->do("set session authorization ?; ", undef, $login_role_name);
    {
        use warnings NONFATAL => 'all';
        my $rows = $dbh->selectall_arrayref('select pg_backend_pid(), current_user::text');
        warn "pg ${$$rows[0]}[0] mp $$ auth: ${$$rows[0]}[1] original auth: $login_role_name";
        sleep 10;
        $rows = $dbh->selectall_arrayref('select pg_backend_pid(), current_user::text');
        warn "pg ${$$rows[0]}[0] mp $$ auth: ${$$rows[0]}[1] original auth: $login_role_name";
    }

...and then hit it with two different '?u=...' URLs. The auth will always match the original auth, because the dbh is not available to be given away while it's still in a script that's executing.