Tricky Issue Handling File Upload in Perl

We use CGI.pm to help us handle file uploading on our website and through our API which is used by our Android & iPhone Apps. We recently noticed that CGI.pm seems to be returning no params for almost 50% of the files being uploaded via our iPhone App. We haven't seen a similar issue with those files being uploaded via our website.
We can't replicate the problem in testing but in production the cgi_error() method of CGI.pm isn't reporting any errors in those cases where the CGI.pm params are missing. We have confirmed that the iPhone App is always including the correct params when POSTing the files for upload.
Quick background on the setup: we have the application deployed on Amazon EC2 servers, which are load balanced using Amazon Elastic Load Balancers. We also have $CGI::POST_MAX=(1024*100000); so the POST max size is roughly 100 MB, and we have confirmed that all uploads are under this limit.
I'm not sure where to go next. Any ideas on what the issue might be and how to resolve it would be greatly appreciated. Also helpful would be any ideas on how to identify the root of the issue so we can start troubleshooting.
Thanks in advance for your help!

The loss of params with no error logged is exactly the symptom of the CGI module encountering an error while processing the POST data - such as the POST exceeding $CGI::POST_MAX. Are you using CGI.pm in functional mode, calling param(), or in object-oriented mode, calling $cgi->param()? Regarding cgi_error(), perldoc CGI warns: "When using the function-oriented interface, errors may only occur the first time you call param(). Be ready for this!"
As for debugging, if you suspect CGI.pm is masking errors from you, try looking at the CGI object before doing anything else:
use CGI;
use Data::Dumper;

my $cgi = CGI->new();
warn Dumper($cgi);    # dump the whole object, including any internal '.cgi_error' key
Within the dump of the CGI object you would see an error like this: '.cgi_error' => '413 Request entity too large' - which is what cgi_error() would return when POST_MAX is exceeded.
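If you want this checked routinely in production rather than only when dumping the object, a small hedged sketch along these lines might help; the extra fields logged here are just an example:

use CGI;

my $q = CGI->new;
if (my $err = $q->cgi_error) {
    # e.g. '413 Request entity too large' when POST_MAX is exceeded
    warn sprintf "CGI error '%s' (%s %s, Content-Length: %s)\n",
        $err,
        $ENV{REQUEST_METHOD} || '-',
        $ENV{REQUEST_URI}    || '-',
        $ENV{CONTENT_LENGTH} || '-';
}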
Also, if using mod_perl, be aware that CGI.pm can hold onto values such as $CGI::POST_MAX between requests to different apps. (But since you are setting POST_MAX in yours, this wouldn't appear to be your problem.)

We too are seeing this same behaviour with CGI.pm, although we had thought the problem was restricted to just IE. Solved by adding $CGI::POST_MAX=5000000. Overkill, as it is only a 50 KB file being passed back and forth.

Related

Is it possible to enforce a max upload size in Plack::Middleware without reading the entire body of the request?

I've just converted a PageKit (mod_perl) application to Plack. This means that I now need some way to enforce the POST_MAX/MAX_BODY that Apache2::Request would have previously handled. The easiest way to do this would probably be just to put nginx in front of the app, but the app is already sitting behind HAProxy and I don't see how to do this with HAProxy.
So, my question is how I might go about enforcing a maximum body size in Plack::Middleware without reading the entire body of the request first?
Specifically I'm concerned with file uploads. Checking size via Plack::Request::Upload is too late, since the entire body would have been read at this point. The app will be deployed via Starman, so psgix.streaming should be true.
I got a response from Tatsuhiko Miyagawa via Twitter. He says, "if you deploy with Starman it's too late even with the middleware because the buffering is on. I'd do it with nginx".
This answers my particular question as I'm dealing with a Starman deployment.
He also noted that "rejecting a bigger upload before reading it on the backend could cause issues in general".
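For reference, the Content-Length check the question asks about might look roughly like the following middleware (the package name and limit are made up; as noted above, with Starman's buffering the body may already have been read by the time this runs):

package Plack::Middleware::MaxBody;
use strict;
use warnings;
use parent 'Plack::Middleware';

# Hypothetical limit; use whatever POST_MAX/MAX_BODY you need.
my $MAX_BODY = 10 * 1024 * 1024;

sub call {
    my ($self, $env) = @_;
    my $len = $env->{CONTENT_LENGTH};
    if (defined $len && $len > $MAX_BODY) {
        # Reject before handing the request to the wrapped app.
        return [ 413, [ 'Content-Type' => 'text/plain' ],
                 [ "Request Entity Too Large\n" ] ];
    }
    return $self->app->($env);
}

1;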

Perl Website with Dancer2 - how can I log user activity, history, etc?

We have a Perl web interface that I am slowly converting to Dancer2 and PSGI instead of our slow old plain-vanilla CGI model.
In our old model, we stored everything in sessions -- the history of what the users did, the call stacks, the data inputs, and so on -- you get the idea.
We do not want to do it that way anymore so that we can keep the sessions small and efficient. BUT, we'd still like to log just what the users have been doing (that way when an error gets reported we can see what they did to get to the error, what input(s) they put in, etc).
I looked at the Logging section of the Dancer2 documentation, but this doesn't seem to quite get to what we need - it would only record Dancer2's own messages plus whatever messages I add.
Dancer2::Logger, which I also found, doesn't seem to quite cut it either.
What other libraries could I use to do what I need? I seriously doubt that Perl does NOT have something that does this, so...
Just off the top of my head, I can think of Log::Log4perl and Log::Dispatch, though there are myriad others.
You can use them to establish your own log files, separate from Dancer's log.
As for the best way, most logging interfaces have the same API for logging but differ in run-time instantiation and configuration syntax. So read the docs on a few of them and maybe try a couple on for size.
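As a rough illustration of the separate-log-file idea (the file path, category name, and logged fields below are assumptions, not anything Dancer2-specific):

use Log::Log4perl;

Log::Log4perl->init(\<<'CONF');
log4perl.logger.UserActivity            = INFO, ActivityFile
log4perl.appender.ActivityFile          = Log::Log4perl::Appender::File
log4perl.appender.ActivityFile.filename = /var/log/myapp/user_activity.log
log4perl.appender.ActivityFile.layout   = Log::Log4perl::Layout::PatternLayout
log4perl.appender.ActivityFile.layout.ConversionPattern = %d %p %m%n
CONF

my $activity = Log::Log4perl->get_logger('UserActivity');

# Then, inside a Dancer2 route handler (names here are hypothetical):
# $activity->info('user=' . (session('user') || 'anon') . ' path=' . request->path);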

Recreate a site from a tcpdump?

It's a long story, but I am trying to save an internal website from the pointy hair bosses who see no value from it anymore and will be flicking the switch at some point in the future. I feel the information contained is important and future generations will want to use it. No, it's not some adult site, but since it's some big corp, I can't say any more.
The problem is, the site is a mess of ASP and Flash that only works under IE7, is buggy under IE8, and is 32-bit only at that. All the URLs are session style and are gibberish. The Flash objects themselves pull extra information with GET requests to ASP objects. It's really poorly designed for scraping. :)
So my idea is to do a tcpdump as I navigate the entire site. Then somehow dump the result of every GET into an SQL database. Then, with a little messing with the hosts file, redirect every request to some CGI script that will look for a matching GET request in the database and return the data. So the entire site will be located in an SQL database as URL/data key pairs. A flat file may also work.
In theory, I think this is the only way to go about it. The only problem I see is if they do some client-side ActiveX/Flash stuff that generates session URLs that will be different each time.
Anyway, I know Perl, and the idea seems simple with the right modules, so I think I can do most of the work in that, but I am open to any other ideas before I get started. Maybe this exists already?
Thanks for any input.
To capture, I wouldn't use tcpdump, but either a crawler or a web proxy that can be tweaked to save everything, e.g. Fiddler, Squid, or mod_proxy.
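Once the URL-to-response pairs are captured somewhere, the replay side described in the question could be a small CGI script roughly like this (the database path and table/column names are made up):

#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use DBI;

my $q   = CGI->new;
# Match on the full request URI, including the query string.
my $url = $ENV{REQUEST_URI} || $q->url(-absolute => 1, -query => 1);

my $dbh = DBI->connect('dbi:SQLite:dbname=/var/mirror/site.db', '', '',
                       { RaiseError => 1 });
my ($type, $body) = $dbh->selectrow_array(
    'SELECT content_type, body FROM pages WHERE url = ?', undef, $url);

if (defined $body) {
    print $q->header(-type => $type || 'text/html');
    print $body;
}
else {
    print $q->header(-status => '404 Not Found', -type => 'text/plain');
    print "No captured response for $url\n";
}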

Perl application move causing my head to explode...please help

I'm attempting to move a web app we have (written in Perl) from an IIS6 server to an IIS7.5 server.
Everything seems to be parsing correctly, I'm just having some issues getting the app to actually work.
The app is basically a couple forms. You fill the first one out, click submit, it presents you with another form based on what checkboxes you selected (using includes and such).
I can get past the first form once... but then after that it stops working and pops up the generated error message. After looking into the code and such, it basically states that there aren't any checkboxes selected.
I know the app writes data into .dat files... (at what point, I'm not sure yet), but I don't see those being created. I've looked at file/directory permissions and seemingly I have MORE permissions on the new server than I did on the last. The user/group for the files/dirs are different though...
Would that have anything to do with it? Why would it pass me on to the next form, displaying the correct "modules" I checked the first time and then not any other time after that? (it seems to reset itself after a while)
I know this is complicated so if you have any questions for me, please ask and I'll answer to the best of my ability :).
Btw, total idiot when it comes to Perl.
EDIT AGAIN
I've removed the source as to not reveal any security vulnerabilities... Thanks for pointing that out.
I'm not sure what else to do to show exactly what's going on with this though :(.
I'd recommend verifying, step by step, that what you think is happening is really happening. Start by watching the HTTP request from your browser to the web server - are the arguments your second perl script expects actually being passed to the server? If not, you'll need to fix the first script.
(start edit)
There's lots of tools to watch the network traffic.
Wireshark will read the traffic as it passes over the network (you can run it on the sending or receiving system, or any system on the collision domain).
You can use a proxy server, like WebScarab (free), Burp, Paros, etc. You'll have to configure your browser to send traffic to the proxy server, which will then forward the requests to the server. These particular proxies are intended to aid testing, in that you'll be able to mess with the requests as they go by (and much more).
As Sinan indicates, you can use browser add-ons like Firefox's LiveHTTPHeaders or Tamper Data, or Internet Explorer's developer tools (IIRC).
(end edit)
Next, you should print out all CGI arguments that the second Perl script receives. That way, you'll know what the script really thinks it gets.
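For example, a throwaway version of the second script that just echoes what it received might look like this (deliberately minimal):

#!/usr/bin/perl
use strict;
use warnings;
use CGI;

my $q = CGI->new;
print $q->header('text/plain');
for my $name ($q->param) {
    my @values = $q->param($name);    # a parameter can have multiple values (e.g. checkboxes)
    print "$name = @values\n";
}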
Then, you can enable verbose logging in IIS, so that it logs the full HTTP request.
This will get you closer to the source of the problem - you'll know if it's (a) the first script not creating correct HTML, resulting in an incomplete HTTP request from the browser, (b) the IIS server not receiving the CGI arguments for some odd reason, or (c) the arguments not getting from the IIS server into the Perl script (or, possibly, the Perl script not correctly accessing the arguments).
Good luck!
What you need to do is clear. There is a lot of weird excess baggage in the script: there seem to be no subroutines, just one long series of commands with global variables.
It is time to start refactoring.
Get one thing running at a time.
I saw HTML::Template there but you still had raw HTML mixed in with code. Separate code from presentation.
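As a small illustration of that last point (the template filename and variable names are invented):

use strict;
use warnings;
use HTML::Template;

# form_step2.tmpl is a hypothetical template file holding all of the raw HTML.
my $template = HTML::Template->new(filename => 'form_step2.tmpl');
$template->param(
    user_name => 'example',
    modules   => [ map { { name => $_ } } qw(alpha beta) ],   # feeds a TMPL_LOOP
);
print "Content-Type: text/html\n\n", $template->output;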

Is there any way to allow failed uploads to resume with a Perl CGI script?

The application is simple: an HTML form that posts to a Perl script. The problem is we sometimes have our customers upload very large files (> 500 MB) and their internet connections can be unreliable at times.
Is there any way to resume a failed transfer like in WinSCP or is this something that can't be done without support for it in the client?
AFAIK, it must be supported by the client. Basically, the client and the server need to negotiate which parts of the file (likely defined as parts in a "multipart/form-data" POST) have already been uploaded, and then the server code needs to be able to merge the newly uploaded data with the existing data.
The best solution is to have custom uploader code, usually implemented in Java, though I think this may be possible in Flash as well. You might even be able to do this via JavaScript - see the two sections with examples below.
Here's an example of how Google did it with YouTube: http://code.google.com/apis/youtube/2.0/developers_guide_protocol_resumable_uploads.html
It uses a "308 Resume Incomplete" HTTP response, with which the server sends a Range: bytes=0-408 header to indicate what was already uploaded.
For additional ideas on the topic:
http://code.google.com/p/gears/wiki/ResumableHttpRequestsProposal
Someone implemented this using Google Gears on the client side and PHP on the server side (the latter you could easily port to Perl).
http://michaelshadle.com/2008/11/26/updates-on-the-http-file-upload-front/
http://michaelshadle.com/2008/12/03/updates-on-the-http-file-upload-front-part-2/
It's a shame that your clients can't use FTP uploads, since FTP already includes resume support. There is also "chunked transfer encoding" in HTTP; I don't know which Perl modules might already support it.
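To make the negotiation described above a bit more concrete, a very rough server-side sketch might look like this (the identifiers, form field names, and spool directory are all invented, and real code would need authentication and validation):

#!/usr/bin/perl
use strict;
use warnings;
use CGI;

my $q    = CGI->new;
my $id   = $q->param('upload_id') || 'example';   # hypothetical upload identifier
my $path = "/tmp/partial-$id";                    # assumed spool location

if ($q->param('status')) {
    # Tell the client how many bytes we already have so it can resume from there.
    my $size = -e $path ? -s $path : 0;
    print $q->header(-type => 'text/plain'), "$size\n";
}
else {
    # Append the newly uploaded chunk to whatever was stored previously.
    my $fh = $q->upload('chunk');                 # hypothetical form field
    open my $out, '>>', $path or die "open $path: $!";
    binmode $out;
    binmode $fh;
    my $buf;
    print {$out} $buf while read($fh, $buf, 65536);
    close $out;
    print $q->header(-type => 'text/plain'), "ok\n";
}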