How to redirect from one CGI to another - perl

I am sending data from A.cgi to B.cgi. B.cgi updates the data in the database and is supposed to redirect back to A.cgi, at which point A.cgi should display the updated data. I added the following code to B.cgi to do the redirect, immediately after the database update:
$url = "http://Travel/cgi-bin/A.cgi/";
print "Location: $url\n\n";
exit();
After successfully updating the database, the page simply prints
Location: http://Travel/cgi-bin/A.cgi/
and stays on B.cgi, without ever getting redirected to A.cgi. How can I make the redirect work?

Location: is a header, and headers must come before all ordinary output; that's probably your problem. But doing this manually is unnecessarily complicated anyway; you would be better off using the redirect function of CGI.pm.

Use CGI's redirect method:
use CGI;
my $url = "http://Travel/cgi-bin/A.cgi";
my $q = CGI->new;
print $q->redirect($url);
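If CGI.pm is not an option, the same response can be written by hand, as long as it is the very first thing the script outputs. The following is only a minimal sketch: redirect_headers is a hypothetical helper name, and the 302 default is an assumption about the status you want.

```perl
use strict;
use warnings;

# Hypothetical helper: build the header block for a CGI redirect.
# The status defaults to "302 Found" (an assumption; pass another if needed).
sub redirect_headers {
    my ($url, $status) = @_;
    $status //= '302 Found';
    # Header lines end with CRLF; an empty line terminates the header block.
    return "Status: $status\r\nLocation: $url\r\n\r\n";
}

# This must be the first output of the script. If a Content-Type header and
# its blank line were already printed, the Location line becomes page text.
print redirect_headers('http://Travel/cgi-bin/A.cgi');
```

If B.cgi prints a Content-Type header before the database update, that earlier print is the thing to remove or move.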

Related

Sending an unbuffered response in Plack

I'm working in a section of a Perl module that creates a large CSV response. The server runs on Plack, on which I'm far from expert.
Currently I'm using something like this to send the response:
$res->content_type('text/csv');
my $body = '';
query_data(
    parameters => \%query_parameters,
    callback   => sub {
        my $row_object = shift;
        $body .= $row_object->to_csv;
    },
);
$res->body($body);
return $res->finalize;
However, that query_data function is not a fast one and retrieves a lot of records. In there, I'm just concatenating each row into $body and, after all rows are processed, sending the whole response.
I don't like this for two obvious reasons: First, it takes a lot of RAM until $body is destroyed. Second, the user sees no response activity until that method has finished working and actually sends the response with $res->body($body).
I tried to find an answer to this in the documentation without finding what I need.
I also tried calling $res->body($row_object->to_csv) in my callback section, but it seems that only the last call I made to $res->body gets sent, overriding all previous ones.
Is there a way to send a Plack response that flushes the content on each row, so the user starts receiving content in real time as the data is gathered, without having to accumulate all the data into a variable first?
Thanks in advance for any comments!
You can't use Plack::Response because that class is intended for representing a complete response, and you'll never have a complete response in memory at one time. What you're trying to do is called streaming, and PSGI supports it even if Plack::Response doesn't.
Here's how you might go about implementing it (adapted from your sample code):
my $env = shift;
if (!$env->{'psgi.streaming'}) {
    # do something else...
}
# Immediately start the response and stream the content.
return sub {
    my $responder = shift;
    my $writer = $responder->([200, ['Content-Type' => 'text/csv']]);
    query_data(
        parameters => \%query_parameters,
        callback   => sub {
            my $row_object = shift;
            $writer->write($row_object->to_csv);
            # TODO: Need to call $writer->close() when there is no more data.
        },
    );
};
Some interesting things about this code:
Instead of returning a Plack::Response object, you can return a sub. This subroutine will be called some time later to get the actual response. PSGI supports this to allow for so-called "delayed" responses.
The subroutine we return gets an argument that is a coderef (in this case, $responder) that should be called and passed the real response. If the real response does not include the "body" (i.e. what is normally the 3rd element of the arrayref), then $responder will return an object that we can write the body to. PSGI supports this to allow for streaming responses.
The $writer object has two methods, write and close which both do exactly as their names suggest. Don't forget to call the close method to complete the response; the above code doesn't show this because how it should be called is dependent on how query_data and your other code works.
Most servers support streaming like this. You can check $env->{'psgi.streaming'} to be sure that yours does.
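To make the close step concrete, here is a minimal sketch of a complete streaming app. It assumes the data source is synchronous, i.e. it returns only after the callback has been called for every row; fake_query_data and its two rows are hypothetical stand-ins for your query_data, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical stand-in for query_data: calls the callback once per row
# and returns when there are no more rows.
sub fake_query_data {
    my %args = @_;
    $args{callback}->($_) for ([1, 'alice'], [2, 'bob']);
    return;
}

# PSGI application returning a delayed, streaming response.
my $app = sub {
    my $env = shift;
    return sub {
        my $responder = shift;
        # No body element is passed, so the server hands back a writer.
        my $writer = $responder->([200, ['Content-Type' => 'text/csv']]);
        fake_query_data(
            callback => sub {
                my $row = shift;
                # Each row is sent as soon as it is produced.
                $writer->write(join(',', @$row) . "\n");
            },
        );
        # The data source has returned, so every row has been written.
        $writer->close;
    };
};
```

In a .psgi file, $app would be the last value of the file. If your query_data is asynchronous instead, close has to be called from whatever signals the end of the data.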
Plack is middleware. Are you using a web application framework on top of it, like Mojolicious or Dancer2, or something like Apache or Starman server below it? That would affect how the buffering works.
The link below shows an example by Plack's author:
https://metacpan.org/source/MIYAGAWA/Plack-1.0037/eg/dot-psgi/echo-stream-sync.psgi
Or you can do it easily by using Dancer2 on top of Plack and Starman or Apache:
https://metacpan.org/pod/distribution/Dancer2/lib/Dancer2/Manual.pod#Delayed-responses-Async-Streaming
Regards, Peter
Some reading material for you :)
https://metacpan.org/pod/PSGI#Delayed-Response-and-Streaming-Body
https://metacpan.org/pod/Plack::Middleware::BufferedStreaming
https://metacpan.org/source/MIYAGAWA/Plack-1.0037/eg/dot-psgi/echo-stream.psgi
https://metacpan.org/source/MIYAGAWA/Plack-1.0037/eg/dot-psgi/nonblock-hello.psgi
So copy/paste/adapt and report back please

Mojolicious not following redirection from webarchive.org

I'm using Mojolicious DOM and UserAgent to get the source of a page from Webarchive.org, parse it, and import it into a Dotclear database (using webarchive as a backup).
In the source, there are "Previous" and "Next" links that lead to the other posts originally published on the blog.
The Perl script I have developed is supposed to follow those links to import every page of this blog's snapshot.
It first gets the source of the first post of the blog, parses it, puts the result in a local DB, and takes the link under "Next" to do the same thing on the next post, until there are no more "Next" posts.
So much for the basics.
The catch is that the link I get from the source is not the link Webarchive has.
Webarchive's links to snapshots look like this:
http://web.archive.org/web/20131012182412/http://www.mytarget.com/post?mypost
The big number between "web" and the original URL is (I guess) the date the snapshot was made. The problem is that it changes with each snapshot: although it is valid for one post, the next post was snapshotted on another date, so the URL won't match.
When I click the link I get from the source, it takes me to webarchive.org, which automatically searches for the page I passed and redirects me to it.
But when I try to get the source via the get() function of Mojolicious, it just gets Webarchive's "Page not found" page.
So, here is my question: is there a way to make Mojolicious follow Webarchive's redirection? I set max_redirects(5) on my UserAgent, but it still does the same.
Here is my code:
sub main {
    my ($url) = @_;
    my $ua = Mojo::UserAgent->new;
    $ua = $ua->max_redirects(5);
    my $dom = $ua->get($url)->res->dom;
    # ...Treatment and parsing of the source ...
    return $nextUrl;
}
my $nextUrl="http://web.archive.org/web/20131012182412/http://www.mytarget.com/post?mypost";
my $secondUrl;
while ($nextUrl){
$secondUrl = main($nextUrl);
$nextUrl = $secondUrl;
}
Thanks in advance...
I've finally found a workaround.
I use this piece of code to follow the URL and get the URL that is finally reached:
use LWP::UserAgent qw();
my $ua = LWP::UserAgent->new;
my $ret = $ua->get($url);
$url = $ret->request->uri ."";
print "URL returned: ".$url."\n";
Then I use that URL to fetch the source code.

feedpp and session ID

We are using Perl and the CPAN module XML::FeedPP to parse RSS feeds.
The Perl script runs through the items of the RSS feeds and saves each link to the database, like this:
my $response = $ua->get($url);
if ($response->is_success) {
my $feed = XML::FeedPP->new( $response->content, -type => 'string' );
foreach my $item ( $feed->get_item() ) {
my $link = $item->link();
[...]
$url contains the URL of an RSS feed, like http://my.domain/RSS/feeds.xml
In this case, $item->link() will contain links to the RSS article, like http://my.domain/topic/myarticle.html
The problem is that some web servers (which provide the RSS feeds) issue an HTTP redirect in order to add a session ID to the URL, like this: http://my.domain/RSS/feeds.xml;jsessionid=4C989B1DB91D706C3E46B6E30427D5CD.
The strange thing is that FeedPP seems to add this session ID to the link of every item. So $item->link() contains links to the RSS article like http://my.domain/topic/myarticle.html;jsessionid=4C989B1DB91D706C3E46B6E30427D5CD
even if the original link does not contain a session ID.
Is there a way to turn off that behavior of FeedPP?
Thank you for any kind of help.
I took a look through http://metacpan.org/pod/XML::FeedPP but didn't see any way to have the link() method trim those session IDs for you. (I'm using XML::FeedPP in one of my scripts and the site I happen to be parsing doesn't use session IDs.)
So I think the answer is no, not currently. You could try contacting the author or filing a bug.
IMHO, the behavior is correct: URI components that follow a semicolon are defined as part of the path (a configuration parameter for its interpretation), so when the URI is used to turn a relative URL into an absolute URI, they need to be copied as well.
You expect compatible behavior with '&' parameters, but they are not equal.
https://rt.cpan.org/Ticket/Display.html?id=73895
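Since the module itself offers no switch for this, one workaround is to strip the parameter from each link after the fact. This is just a sketch: strip_jsessionid is a hypothetical helper, not part of XML::FeedPP, and it assumes the session ID never contains ';', '?', '#' or '/'.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::FeedPP): remove a ";jsessionid=..."
# path parameter from a link, leaving any query string or fragment intact.
sub strip_jsessionid {
    my ($link) = @_;
    $link =~ s/;jsessionid=[^;?#\/]*//i;
    return $link;
}
```

Inside the loop from the question this would be used as my $link = strip_jsessionid($item->link());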

How to read a web page content which may itself be redirected to another url?

I'm using this code to read the web page content:
my $ua = new LWP::UserAgent;
my $response= $ua->post($url);
if ($response->is_success){
my $content = $response->content;
...
But if $url points to a page that has moved, $response->is_success returns false. How can I easily get the content of the redirected page?
You need to chase the redirect yourself.
if ($response->is_redirect()) {
$url = $response->header('Location');
# goto try_again
}
You may want to put this in a while loop and use "next" instead of "goto". You may also want to log it, limit the number of redirections you are willing to chase, etc.
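A loop with a hop limit could look roughly like this. It is a sketch under assumptions: chase_redirects is a hypothetical name, $fetch is a coderef you supply that takes a URL and returns a hashref with 'status' and (for redirects) 'location', and Location values are assumed to be absolute URLs.

```perl
use strict;
use warnings;

# Follow redirects by hand, giving up after $max hops.
# $fetch is a caller-supplied coderef: URL in, { status, location } out.
sub chase_redirects {
    my ($fetch, $url, $max) = @_;
    for my $hop (0 .. $max) {
        my $res = $fetch->($url);
        # Anything that is not a redirect status is the final response.
        return ($url, $res) unless $res->{status} =~ /^30[12378]$/;
        defined $res->{location}
            or die "Redirect from $url without a Location header\n";
        $url = $res->{location};
    }
    die "Too many redirects (limit $max)\n";
}
```

With LWP, the coderef would wrap whatever request you are making and copy the response code and the Location header into that hashref.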
[update]
OK I just noticed there is an easier way to do this. From the man page of LWP::UserAgent:
$ua->requests_redirectable
$ua->requests_redirectable( \@requests )
This reads or sets the object's list of request names that
"$ua->redirect_ok(...)" will allow redirection for. By default,
this is "['GET', 'HEAD']", as per RFC 2616. To change to include
'POST', consider:
push @{ $ua->requests_redirectable }, 'POST';
So yeah, maybe just do that. :-)

How can I access parameters passed in the URL when a form is POSTed to my script?

I've run into an issue with mod_rewrite when submitting forms to our site's Perl scripts. If someone does a GET request on a page with a URL such as http://www.example.com/us/florida/page-title, I rewrite that using the following rewrite rule, which works correctly:
RewriteRule ^us/(.*)/(.*)$ /cgi-bin/script.pl?action=Display&state=$1&page=$2 [NC,L,QSA]
Now, if that page had a form on it I'd like to do a form post to the same url and have Mod Rewrite use the same rewrite rule to call the same script and invoke the same action. However, what's happening is that the rewrite rule is being triggered, the correct script is being called and all form POST variables are being posted, however, the rewritten parameters (action, state & page in this example) aren't being passed to the Perl script. I'm accessing these variables using the same Perl code for both the GET and POST requests:
use CGI;
$query = new CGI;
$action = $query->param('action');
$state = $query->param('state');
$page = $query->param('page');
I included the QSA flag since I figured that might resolve the issue but it didn't. If I do a POST directly to the script URL then everything works correctly. I'd appreciate any help in figuring out why this isn't currently working. Thanks in advance!
If you're doing a POST query, you need to use $query->url_param('action') etc. to get parameters from the query string. You don't need or benefit from the QSA modifier.
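The reason is that on a POST the form fields arrive in the request body, while the rewritten parameters still sit in the QUERY_STRING environment variable. As a rough illustration of what reading them involves, here is a sketch with a hypothetical parse_query_string helper; in real code url_param already does this for you.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a query string into a hash (last value wins).
sub parse_query_string {
    my ($qs) = @_;
    my %params;
    for my $pair (split /[&;]/, $qs // '') {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        for ($key, $value) {
            next unless defined;
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # percent-decoding
        }
        $params{$key} = $value // '';
    }
    return %params;
}

# On a POST, the rewritten action/state/page values are found here,
# not among the form fields read from the request body.
my %url_params = parse_query_string($ENV{QUERY_STRING});
```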
Change your script to:
use CGI;
use Data::Dumper;
my $query = CGI->new; # even though I'd rather call the object $cgi
print $query->header('text/plain'), Dumper($query);
and take a look at what is being passed to your script and update your question with that information.