I'm working in a section of a Perl module that creates a large CSV response. The server runs on Plack, on which I'm far from expert.
Currently I'm using something like this to send the response:
$res->content_type('text/csv');
my $body = '';
query_data (
parameters => \%query_parameters,
callback => sub {
my $row_object = shift;
$body .= $row_object->to_csv;
},
);
$res->body($body);
return $res->finalize;
However, that query_data function is not a fast one and retrieves a lot of records. In there, I'm just concatenating each row into $body and, after all rows are processed, sending the whole response.
I don't like this for two obvious reasons: First, it takes a lot of RAM until $body is destroyed. Second, the user sees no response activity until that method has finished working and actually sends the response with $res->body($body).
I tried to find an answer to this in the documentation without finding what I need.
I also tried calling $res->body($row_object->to_csv) on my callback section, but seems like that ends up sending only the last call I made to $res->body, overriding all previous ones.
Is there a way to send a Plack response that flushes the content on each row, so the user starts receiving content in real time as the data is gathered and without having to accumulate all data into a veriable first?
Thanks in advance for any comments!
You can't use Plack::Response because that class is intended for representing a complete response, and you'll never have a complete response in memory at one time. What you're trying to do is called streaming, and PSGI supports it even if Plack::Response doesn't.
Here's how you might go about implementing it (adapted from your sample code):
my $env = shift;
if (!$env->{'psgi.streaming'}) {
# do something else...
}
# Immediately start the response and stream the content.
return sub {
my $responder = shift;
my $writer = $responder->([200, ['Content-Type' => 'text/csv']]);
query_data(
parameters => \%query_parameters,
callback => sub {
my $row_object = shift;
$writer->write($row_object->to_csv);
# TODO: Need to call $writer->close() when there is no more data.
},
);
};
Some interesting things about this code:
Instead of returning a Plack::Response object, you can return a sub. This subroutine will be called some time later to get the actual response. PSGI supports this to allow for so-called "delayed" responses.
The subroutine we return gets an argument that is a coderef (in this case, $responder) that should be called and passed the real response. If the real response does not include the "body" (i.e. what is normally the 3rd element of the arrayref), then $responder will return an object that we can write the body to. PSGI supports this to allow for streaming responses.
The $writer object has two methods, write and close which both do exactly as their names suggest. Don't forget to call the close method to complete the response; the above code doesn't show this because how it should be called is dependent on how query_data and your other code works.
Most servers support streaming like this. You can check $env->{'psgi.streaming'} to be sure that yours does.
Plack is middleware. Are you using a web application framework on top of it, like Mojolicious or Dancer2, or something like Apache or Starman server below it? That would affect how the buffering works.
The link above shows an example by Plack's author:
https://metacpan.org/source/MIYAGAWA/Plack-1.0037/eg/dot-psgi/echo-stream-sync.psgi
Or you can do it easily by using Dancer2 on top of Plack and Starman or Apache:
https://metacpan.org/pod/distribution/Dancer2/lib/Dancer2/Manual.pod#Delayed-responses-Async-Streaming
Regards, Peter
Some reading material for you :)
https://metacpan.org/pod/PSGI#Delayed-Response-and-Streaming-Body
https://metacpan.org/pod/Plack::Middleware::BufferedStreaming
https://metacpan.org/source/MIYAGAWA/Plack-1.0037/eg/dot-psgi/echo-stream.psgi
https://metacpan.org/source/MIYAGAWA/Plack-1.0037/eg/dot-psgi/nonblock-hello.psgi
So copy/paste/adapt and report back please
Related
My middleware need is to:
add an extra query param to requests made by a REST API client derived from GuzzleHttp\Command\Guzzle\GuzzleClient
I cannot do this directly when invoking APIs through the client because GuzzleClient uses an API specification and it only passes on "legal" query parameters. Therefore I must install a middleware to intercept HTTP requests after the API client prepares them.
The track I am currently on:
$apiClient->getHandlerStack()-push($myMiddleware)
The problem:
I cannot figure out the RIGHT way to assemble the functional Russian doll that $myMiddleware must be. This is an insane gazilliardth-order function scenario, and the exact right way the function should be written seems to be different from the extensively documented way of doing things when working with GuzzleHttp\Client directly. No matter what I try, I end up having wrong things passed to some layer of the matryoshka, causing an argument type error, or I end up returning something wrong from a layer, causing a type error in Guzzle code.
I made a carefully weighted decision to give up trying to understand. Please just give me a boilerplate solution for GuzzleHttp\Command\Guzzle\GuzzleClient, as opposed to GuzzleHttp\Client.
The HandlerStack that is used to handle middleware in GuzzleHttp\Command\Guzzle\GuzzleClient can either transform/validate a command before it is serialized or handle the result after it comes back. If you want to modify the command after it has been turned into a request, but before it is actually sent, then you'd use the same method of Middleware as if you weren't using GuzzleClient - create and attach middleware to the GuzzleHttp\Client instance that is passed as the first argument to GuzzleClient.
use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Command\Guzzle\GuzzleClient;
use GuzzleHttp\Command\Guzzle\Description;
class MyCustomMiddleware
{
public function __invoke(callable $handler) {
return function (RequestInterface $request, array $options) use ($handler) {
// ... do something with request
return $handler($request, $options);
}
}
}
$handlerStack = HandlerStack::create();
$handlerStack->push(new MyCustomMiddleware);
$config['handler'] = $handlerStack;
$apiClient = new GuzzleClient(new Client($config), new Description(...));
The boilerplate solution for GuzzleClient is the same as for GuzzleHttp\Client because regardless of using Guzzle Services or not, your request-modifying middleware needs to go on GuzzleHttp\Client.
You can also use
$handler->push(Middleware::mapRequest(function(){...});
Of sorts to manipulate the request. I'm not 100% certain this is the thing you're looking for. But I assume you can add your extra parameter to the Request in there.
private function createAuthStack()
{
$stack = HandlerStack::create();
$stack->push(Middleware::mapRequest(function (RequestInterface $request) {
return $request->withHeader('Authorization', "Bearer " . $this->accessToken);
}));
return $stack;
}
More Examples here: https://hotexamples.com/examples/guzzlehttp/Middleware/mapRequest/php-middleware-maprequest-method-examples.html
I'm working on a Catalyst/psgi application that would make great use of asychronous streaming, however beyond a simple timer (like here: http://www.catalystframework.org/calendar/2013/13), I'm a little stumped on how to implement more "global" events.
By global events, I mean things like:
a periodic timer that is the same for all clients
the visit to a given page by a single client (but updates all clients)
a file stat watcher that will update all clients when a file changes.
Correct me if I'm wrong, but to me these all seem very different from the example linked above, which will give each client a different counter. I would like to have events that happen "across the board."
An example of what I've tried (using #2 from my list above):
has 'write_fh' => ( is => 'rw', predicate => 'has_write_fh' );
sub events : Path('/stream') Args(0) {
my ( $self, $c ) = #_;
$c->res->body("");
$c->res->content_type('text/event-stream');
$self->write_fh( $c->res->write_fh() );
}
sub trigger : Path('/trigger') : Args(0) {
my ( $self, $c ) = #_;
$self->write_fh->write( *the event string* );
}
When I run this, it actually gets further than I would expect - the event does get triggered, but unreliably. With two browsers open, sometimes the event is sent to one, and sometimes to the other.
Now, I think I understand why this would never work - the client who hits /trigger, has no knowledge of all the other clients who are watching /stream, and so the write_fh I'm trying to use is not useful.
But if each client's request is in its own contained bubble, how am I to access their stream from some other request?
Or am I completely on the wrong track...?
Your problem with write_fh is that this event is singlecast - once it was received by anyone, it won't be received anymore. so one of the connections catch it, and the other simply don't.
you need to broadcast your events. Take a look at AnyEvent::IRC to see how it can be done.
(note that it was written for an old version of AnyEvent, but it should still work)
I am trying to get the output data from a package_state in my IRC bot, which uses POE::Component::IRC as a base. But I just cannot seem to do it.
Basically, in a subroutine outside of the POE session, I wish to get the data from an event subroutine fired by POE when it receives the data from the server.
I've tried saving the data in a global array and even external file, but the outer subroutine will read the old data from it before that data gets updated.
More specifically, I am trying to get this bot to check if someone is 'ison' and if they are, return true (or get all data ( #_ ) from irc_303).
Something like this:
sub check_ison {
my $who = "someguy";
$irc->yield(ison => $who);
$data = (somehow retrieve data from irc_303);
return $data; #or true if $data
}
It sounds like you want a synchronous solution to an asynchronous problem. Due to the asynchronous nature of IRC (and POE, for that matter ...), you'll need to issue your ISON query and handle the numeric response as it comes in.
As far as I know, most client NOTIFY implementations issue an ISON periodically (POE::Component::IRC provides timer sugar via POE::Component::Syndicator), update their state, and tell the user if something changes.
You have options...
You could issue ISONs on a timer, save state appropriately in your numeric response handler, and provide a method to query the state. If your application looks more like a client (the user/something needs to be notified when something changes, that is) your numeric response handler could do some basic list comparison and issue appropriate events for users appearing/disappearing.
Otherwise, you could simply have a 'check_ison' that issues the ISON and yields some sort of 'response received' event from the numeric response handler, letting you know fresh data is available.
I'm working with SOAP::WSDL and another company's custom WSDL file. Every time they make a change for me and I recreate my modules, something breaks. Finding the problem is rather tedious because I don't find a proper way to access the actual request that is sent to the SOAP server.
The only way to get to the request so far has been to use tcpdump in conjunction with wireshark to extract the request and result. That works, but since I don't have root privileges on the dev machine I have to get an admin over every time I want to do that. I feel there must be another way to get to the HTTP::Request object inside the SOAP::WSDL thing. But if the server returns a fault, I don't even have a response object, but rather a SOAP::WSDL::SOAP::Typelib::Fault11 object that has no visible relation to the request.
I've also tried using the debugger but I'm having trouble finding the actual request part. I've not yet understood how to tell the debuger to skip to a specific part deep inside a complex number of packages.
I stumbled across this, having the same problem myself. I found the answer is using both options that raina77ow listed.
$service->outputxml(1);
returns the whole SOAP envelope xml, but this needs to be combined with
$service->no_dispatch(1);
With no_dispatch set, the SOAP request is printed, instead of the reply from the request. Hopefully this can help others.
Have you tried to use SOAP::WSDL::Client tracing methods - and outputxml in particular? It returns the raw SOAP envelope which is to be sent to the server.
You can also use no_dispatch configuration method of SOAP::WSDL package:
When set, call() returns the plain request XML instead of dispatching
the SOAP call to the SOAP service. Handy for testing/debugging.
I found a way to at least print out the generated XML code.
First, I looked at SOAP::WSDL::Client as raina77ow suggested. That wasn't what I needed, though. But then I came across SOAP::WSDL::Factory::Serializer. There, it says:
Serializer objects may also be passed directly to SOAP::WSDL::Client
by using the set_serializer method.
A little fidgeting and I came up with a wrapper class for SOAP::WSDL::Serializer::XSD which is the default serializer used by SOAP::WSDL. A look at the code helped, too.
Here's the module I wrote. It uses SOAP::WSDL::Serializer::XSD as a base class and overloads the new and serialize methods. While it only passes arguments to new, it grabs the returned XML from serialize and prints it, which suffices for debugging. I'm not sure if there's a way to put it somewhere I can easily get it from.
package MySerializer;
use strict;
use warnings;
use base qw(SOAP::WSDL::Serializer::XSD);
sub new {
my $self = shift;
my $class = ref($self) || $self;
return $self if ref $self;
# Create the base object and return it
my $base_object = $class->SUPER::new(#_);
return bless ($base_object, $class);
}
sub serialize {
my ($self, $args_of_ref) = #_;
# This is basically a wrapper function that calls the real Serializer's
# serialize-method and grabs and prints the returned XML before it
# giving it back to the caller
my $xml = ref($self)->SUPER::serialize($args_of_ref);
print "\n\n$xml\n\n"; # here we go
return $xml;
}
1;
And here's how I call it:
my $serializer = MySerializer->new();
$self->{'_interface'} = Lib::Interfaces::MyInterface->new();
$self->{'_interface'}->set_serializer($serializer); # comment out to deactivate
It's easy to deactivate. Only put a comment in the set_serializer line.
Of course printing a block of XML to the command line is not very pretty, but it gets the job done. I only need it once in a while why coding/testing, so this is fine I guess.
I am very new to Perl and i am learning on the fly while i try to automate some projects for work. So far its has been a lot of fun.
I am working on generating a report for a customer. I can get this report from a web page i can access.
First i will need to fill a form with my user name, password and choose a server from a drop down list, and log in.
Second i need to click a link for the report section.
Third a need to fill a form to create the report.
Here is what i wrote so far:
my $mech = WWW::Mechanize->new();
my $url = 'http://X.X.X.X/Console/login/login.aspx';
$mech->get( $url );
$mech->submit_form(
form_number => 1,
fields =>{
'ctl00$ctl00$cphVeriCentre$cphLogin$txtUser' => 'someone',
'ctl00$ctl00$cphVeriCentre$cphLogin$txtPW' => '12345',
'ctl00$ctl00$cphVeriCentre$cphLogin$ddlServers' => 'Live',
button => 'Sign-In'
},
);
die unless ($mech->success);
$mech->dump_forms();
I dont understand why, but, after this i look at the what dump outputs and i see the code for the first login page, while i belive i should have reached the next page after my successful login.
Could there be something with a cookie that can effect me and the login attempt?
Anythings else i am doing wrong?
Appreciate you help,
Yaniv
This is several months after the fact, but I resolved the same issue based on a similar questions I asked. See Is it possible to automate postback from the client side? for more info.
I used Python's Mechanize instead or Perl, but the same principle applies.
Summarizing my earlier response:
ASP.NET pages need a hidden parameter called __EVENTTARGET in the form, which won't exist when you use mechanize normally.
When visited by a normal user, there is a __doPostBack('foo') function on these pages that gives the relevant value to __EVENTTARGET via a javascript onclick event on each of the links, but since mechanize doesn't use javascript you'll need to set these values yourself.
The python solution is below, but it shouldn't be too tough to adapt it to perl.
def add_event_target(form, target):
#Creates a new __EVENTTARGET control and adds the value specified
#.NET doesn't generate this in mechanize for some reason -- suspect maybe is
#normally generated by javascript or some useragent thing?
form.new_control('hidden','__EVENTTARGET',attrs = dict(name='__EVENTTARGET'))
form.set_all_readonly(False)
form["__EVENTTARGET"] = target
You can only mechanize stuff that you know. Before you write any more code, I suggest you use a tool like Firebug and inspect what is happening in your browser when you do this manually.
Of course there might be cookies that are used. Or maybe your forgot a hidden form parameter? Only you can tell.
EDIT:
WWW::Mechanize should take care of cookies without any further intervention.
You should always check whether the methods you called were successful. Does the first get() work?
It might be useful to take a look at the server logs to see what is actually requested and what HTTP status code is sent as a response.
If you are on Windows, use Fiddler to see what data is being sent when you perform this process manually, and then use Fiddler to compare it to the data captured when performed by your script.
In my experience, a web debugging proxy like Fiddler is more useful than Firebug when inspecting form posts.
I have found it very helpful to use Wireshark utility when writing web automation with WWW::Mechanize. It will help you in few ways:
Enable you realize whether your HTTP request was successful or not.
See the reason of failure on HTTP level.
Trace the exact data which you pass to the server and see what you receive back.
Just set an HTTP filter for the network traffic and start your Perl script.
The very short gist of aspx pages it that they hold all of the local session information within a couple of variables prefixed by "__" in the general aspxform. Usually this is a top level form and all form elements will be part of it, but I guess that can vary by implementation.
For the particular implementation I was dealing with I needed to worry about 2 of these state variables, specifically:
__VIEWSTATE
__EVENTVALIDATION.
Your goal is to make sure that these variables are submitted into the form you are submitting, since they might be part of that main form aspxform that I mentioned above, and you are probably submitting a different form than that.
When a browser loads up an aspx page a piece of javascript passes this session information along within the asp server/client interaction, but of course we don't have that luxury with perl mechanize, so you will need to manually post these yourself by adding the elements to the current form using mechanize.
In the case that I just solved I basically did this:
my $browser = WWW::Mechanize->new( );
# fetch the login page to get the initial session variables
my $login_page = 'http://www.example.com/login.aspx';
$response = $browser->get( $login_page);
# very short way to find the fields so you can add them to your post
$viewstate = ($browser->find_all_inputs( type => 'hidden', name => '__VIEWSTATE' ))[0]->value;
$validation = ($browser->find_all_inputs( type => 'hidden', name => '__EVENTVALIDATION' ))[0]->value;
# post back the formdata you need along with the session variables
$browser->post( $login_page, [ username => 'user', password => 'password, __VIEWSTATE => $viewstate, __EVENTVALIDATION => $validation ]);
# finally get back the content and make sure it looks right
print $response->content();