Is there any way to allow failed uploads to resume with a Perl CGI script?

The application is simple: an HTML form that POSTs to a Perl script. The problem is that our customers sometimes upload very large files (> 500 MB) and their internet connections can be unreliable at times.
Is there any way to resume a failed transfer, like in WinSCP, or is this something that can't be done without support for it in the client?

AFAIK, it must be supported by the client. Basically, the client and the server need to negotiate which parts of the file (likely defined as parts in a "multipart/form-data" POST) have already been uploaded, and then the server code needs to be able to merge the newly uploaded data with the existing data.
The best solution is to have custom uploader code, usually implemented in Java, though I think this may be possible in Flash as well. You might even be able to do it via JavaScript; see the two examples linked below.
Here's an example of how Google did it with YouTube: http://code.google.com/apis/youtube/2.0/developers_guide_protocol_resumable_uploads.html
It uses a "308 Resume Incomplete" HTTP response, with which the server sends a header such as Range: bytes=0-408 to indicate which bytes have already been uploaded.
For additional ideas on the topic:
http://code.google.com/p/gears/wiki/ResumableHttpRequestsProposal
Someone implemented this using Google Gears on the client side and PHP on the server side (the latter you could easily port to Perl):
http://michaelshadle.com/2008/11/26/updates-on-the-http-file-upload-front/
http://michaelshadle.com/2008/12/03/updates-on-the-http-file-upload-front-part-2/

It's a shame that your clients can't use FTP uploads, since FTP already supports resuming transfers. There is also "chunked transfer encoding" in HTTP; I don't know which Perl modules might already support it.
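As a hedged aside on the chunked idea: LWP will switch to chunked transfer encoding when you hand the request a code reference as its content. A sketch, with a hypothetical URL and file path:

#!/usr/bin/perl
# Sketch: stream a large file as a chunked POST body with LWP. Because the
# content is a code ref and no Content-Length is set, LWP sends it with
# "Transfer-Encoding: chunked". URL and path are placeholders.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

open my $fh, '<', '/tmp/big.bin' or die "open: $!";
binmode $fh;

my $req = HTTP::Request->new(POST => 'http://example.com/upload');
$req->content(sub {
    my $n = read $fh, my $buf, 64 * 1024;   # hand LWP 64 kB at a time
    return $n ? $buf : undef;               # undef marks the end of the body
});

print LWP::UserAgent->new->request($req)->status_line, "\n";

Note that chunked encoding only streams the body; it does not by itself make an interrupted upload resumable.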

Related

How is my/the user's web browser displaying a web page built in Perl?

This isn't a specific programming question, but more of a conceptual/software-engineering question.
I'm a new web dev hire at a small local company, who was given a really cool chance to learn and grow as a professional. They were kind enough to give me a chance, and I'd like to be proactive in learning as much about how their back-end system is working as I can, considering it's what I'll be working in most of the time.
From what I've gathered, their entire in-house job tracking interface is built in Perl (with the aid of CSS, JS, and SQL), where the HTML pages are generated and spat out as the user asks for them.
For example, if I want to access a specific job, the URL will look like this: https://tracking.ourcompanywebsite/jobtracker/job/1234
On the internal side, I know we have a "viewing" script, called something like "JobView", that queries all of the relevant fields and structures an HTML page around the data we are requesting.
My question is: how the fudge is this happening? How does a user entering that address into the URL bar trigger a Perl script to run on our server and generate a page that is spat back out to the user?
I guess that's my main curiosity. In your average bare-bones web development course in college, I learned to make HTML, CSS, and JS files. When you want to view a web page, you simply put in the path of that HTML file, and the browser constructs everything from that.
When you put the path to a Perl file in a browser, it will just show the raw Perl code, haha.
I'm sure there may be some modules and various add-ons in our software that allow this all to work, which I may be missing, so please forgive me.
I know you guys don't have the codebase in front of you, but I figured conceptually there is something to be learned that doesn't necessarily need all of the specifics.
I hope that this question could be used for any other amateur devs having the same questions.
Consider the following two snippets:
cat file | program
printf 'foo\n' | cat | program
In the first snippet, cat reads its input from a file. In the second, it gets its input from another program. But program doesn't care about any of that. It just reads whatever was provided on its STDIN.
The web browser is like program. It doesn't care where the web server got the HTML or image or whatever it requested. It sends a URL, and it receives a response with a document from the web server.
The web server, like cat, can obtain what it needs from multiple sources. Specifically, it can be configured to get the requested document in a few different ways.
The "default" would be to map the URL to a directory and return the file found there. But that's not the only option. There are two other major options commonly found in web servers:
Common Gateway Interface (CGI)
Some web servers can be configured to run a program based on the URL received. Information about the request is passed to the program, which is tasked with producing a response. The web server simply returns the output of this program to the requesting browser.
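A minimal sketch of what such a program looks like in Perl. The markup is invented, but the CGI environment variables and the headers-then-body protocol are the real mechanism:

#!/usr/bin/perl
# Minimal CGI sketch: the web server puts request details into environment
# variables, runs this script, and relays whatever it prints to the browser.
use strict;
use warnings;

my $path = $ENV{PATH_INFO} // '/';         # e.g. "/job/1234" from the URL
print "Content-Type: text/html\n\n";       # headers, a blank line, then the body
print "<html><body><h1>You asked for $path</h1></body></html>\n";

This is why a URL like /jobtracker/job/1234 can return a freshly built page: the server is configured to run a script for that URL instead of looking up a file.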
FastCGI
It can be quite wasteful to spawn a new child process for each request. FastCGI allows a web server to talk to an existing persistent process, or pool of processes, that listens for requests from the web server. Again, the web server simply returns the response from this process to the requesting browser.
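For comparison, a sketch of the same idea as a persistent FastCGI worker, using the FCGI module from CPAN:

#!/usr/bin/perl
# FastCGI sketch: one long-lived process accepts request after request,
# instead of being spawned fresh each time as plain CGI would be.
use strict;
use warnings;
use FCGI;

my $request = FCGI::Request();
while ($request->Accept() >= 0) {   # blocks until the web server hands us a request
    print "Content-Type: text/html\n\n";
    print "<html><body><h1>Served by persistent process $$</h1></body></html>\n";
}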

Tornado server does not receive big files

Hi, I'm trying to build a Tornado server with the goal of receiving very big binary files (~1 GB) in the POST body. The following code works for small files, but the server stops answering if I try to send big files (~100 MB).
class ReceiveLogs(tornado.web.RequestHandler):
    def post(self):
        file1 = self.request.body
        output_file = open('./output.zip', 'wb')
        output_file.write(file1)
        output_file.close()
        self.finish("file is uploaded")
Do you know any solutions?
I don't have a real implementation as an answer, but one or two remarks that hopefully point in the right direction.
First of all, there is a 100 MB upload limit, which can be increased by setting
self.request.connection.set_max_body_size(size)
in the initialization of the request handler (taken from this answer).
The problem is that Tornado handles all file uploads in memory (and that HTTP is not a very reliable protocol for handling large file uploads).
This is a quote from a member of the Tornado team from 2014 (see the GitHub issue here):
... You can adjust this limit with the max_buffer_size argument to the
HTTPServer constructor, although I don't think it would be a good idea
to set this larger than say 100MB.
Tornado does not currently support very large file uploads. Better
support is coming (#1021) and the nginx upload module is a popular
workaround in the meantime. However, I would advise against doing 1GB+
uploads in a single HTTP POST in any case, because HTTP alone does not
have good support for resuming a partially-completed upload (in
addition to the aforementioned error problem). Consider a multi-step
upload process like Dropbox's chunked_upload and commit_chunked_upload
(https://www.dropbox.com/developers/core/docs#chunked-upload)
As stated, I would recommend doing one of the following:
If nginx can handle and route requests to Tornado, look at the nginx upload module (see the nginx wiki here).
If it must be a plain Tornado solution, use tornado.web.stream_request_body, which came with Tornado 4. This streams the uploaded file to disk instead of first trying to hold it all in memory (see the Tornado 4 release notes and this solution on Stack Overflow).

Limit File Upload Size on client side GWT only

Is there any provision by which I can limit my file uploads to some size?
I'm using FileUploadField in my GWT screen.
Is there any way I can apply a check that only allows uploading files up to a maximum of 10 MB?
TIA !
That is the job of the server. JavaScript (and thus abstractions of JavaScript such as GWT) is not allowed access to the file being uploaded. The server side should check the file size and throw an exception.
According to http://www.artofsolving.com/node/50, detecting the error client-side is tricky. You have to actually parse the HTML result in the iframe used for the upload in the onSubmitComplete event.
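This question's backend is a Java servlet, but since the rest of this page is Perl, here is a hedged sketch of that server-side size check as a Perl CGI script; the limit and the message are illustrative:

#!/usr/bin/perl
# Sketch of the server-side guard: refuse the request when the declared
# body size exceeds 10 MB, before reading any of the upload.
use strict;
use warnings;

my $limit = 10 * 1024 * 1024;
if (($ENV{CONTENT_LENGTH} // 0) > $limit) {
    print "Status: 413 Request Entity Too Large\n";
    print "Content-Type: text/plain\n\n";
    print "File exceeds the 10 MB limit.\n";
    exit;
}
# ...otherwise handle the upload normally...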
As the above answer stated, it can't be done client-side for security reasons. It is possible via ActiveX, but I am in no way recommending that.
So you cannot check it on the front end, but you could make it seem like you do.
Your servlet in this instance would use a push technology such as Comet to send the status of the file, such as "too big" or "completed", back to the UI.

Is it possible to send a file from the client computer on a perl web application without uploading it to the server first?

I've looked for this around the internet without getting any good answer so far, so here's the issue:
I have a Perl web application used by a small group of people (around 100 users, accessing it with their web browsers on Windows computers) on an intranet (the application is on a Red Hat Apache server). The application takes the users' input and uses WWW::Mechanize to send everything to a page on a different server (which shouldn't be used directly), process a form, and return the result (I know it may not sound optimal, but it was done according to what was required). The issue here is that I need the users to be able to send a file (most likely an image of ~500 kB, either through WWW::Mechanize along with the other form data that gets submitted, or by an email with an attachment; either option is equally acceptable). I know the file can be sent/attached if it's already on the server, so my question is simple:
Is it possible to send a file from the client computer (running the Perl web application in the browser) without uploading it to the server (that will send it) first?
P.S. This is not one of those "give me the code" questions; I'm not asking for any specific code. I just want to know whether this is something that can be done (and if so, to have an idea how), or whether I absolutely have to upload the file to the server running the Perl application first (I already have a script for that). If it's not possible, that's OK; I just want to be sure whether I need to upload to the server before sending the file.
Assuming I understand you correctly, yes, you can upload a file through WWW::Mechanize. See the pb-upload example.
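For illustration, a minimal sketch of such an upload with WWW::Mechanize. The URL, form name, and field names are hypothetical, and note that Mechanize runs on the web server, so the file path is a path on that server:

#!/usr/bin/perl
# Sketch: fill in a form that has a file input and submit it. Setting a
# file field's value to a local path makes Mechanize upload that file.
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new;
$mech->get('http://other-server.example/form');   # hypothetical target form
$mech->form_name('upload_form');                  # hypothetical form name
$mech->field(comment => 'sent on behalf of the user');
$mech->field(image   => '/tmp/user_image.jpg');   # file field: path to upload
$mech->submit;
print $mech->status, "\n";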
Yes, it can be done if the client computer has a mail client that is configurable enough to allow a command of the type "mail $TO $SUBJECT -attachment $ATT_FILE". Even then, you need the user to cooperate by launching the action.
Otherwise, no. You can't do it via JavaScript AFAIK due to sandbox restrictions.
I don't know enough about Flash to know if that's an option.

Perl application move causing my head to explode...please help

I'm attempting to move a web app we have (written in Perl) from an IIS6 server to an IIS7.5 server.
Everything seems to be parsing correctly, I'm just having some issues getting the app to actually work.
The app is basically a couple of forms. You fill the first one out, click submit, and it presents you with another form based on which checkboxes you selected (using includes and such).
I can get past the first form once... but after that it stops working and pops up the generated error message. After looking into the code, the error basically states that no checkboxes were selected.
I know the app writes data into .dat files (at what point, I'm not sure yet), but I don't see those being created. I've looked at file/directory permissions, and seemingly I have MORE permissions on the new server than I did on the last. The user/group for the files/dirs are different, though...
Would that have anything to do with it? Why would it pass me on to the next form, displaying the correct "modules" I checked the first time and then not any other time after that? (it seems to reset itself after a while)
I know this is complicated so if you have any questions for me, please ask and I'll answer to the best of my ability :).
Btw, I'm a total idiot when it comes to Perl.
EDIT AGAIN
I've removed the source so as not to reveal any security vulnerabilities... Thanks for pointing that out.
I'm not sure what else to do to show exactly what's going on with this though :(.
I'd recommend verifying, step by step, that what you think is happening is really happening. Start by watching the HTTP request from your browser to the web server: are the arguments your second Perl script expects actually being passed to the server? If not, you'll need to fix the first script.
(start edit)
There's lots of tools to watch the network traffic.
Wireshark will read the traffic as it passes over the network (you can run it on the sending or receiving system, or any system on the collision domain).
You can use a proxy server, like WebScarab (free), Burp, Paros, etc. You'll have to configure your browser to send traffic to the proxy server, which will then forward the requests to the server. These particular proxies are intended to aid testing, in that you'll be able to mess with the requests as they go by (and much more).
As Sinan indicates, you can use browser add-ons like Firefox's LiveHttpHeaders or Tamper Data, or Internet Explorer's developer kit (IIRC).
(end edit)
Next, you should print out all the CGI arguments that the second Perl script receives. That way, you'll know what the script really thinks it gets.
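A hedged sketch of such a dump, assuming the script uses CGI.pm:

#!/usr/bin/perl
# Debugging sketch: echo back every CGI parameter the script received,
# so you can compare it against what the first form was supposed to send.
use strict;
use warnings;
use CGI;
use Data::Dumper;

my $q = CGI->new;
print $q->header('text/plain');
print Dumper({ map { $_ => [ $q->multi_param($_) ] } $q->multi_param });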
Then, you can enable verbose logging in IIS, so that it logs the full HTTP request.
This will get you closer to the source of the problem: you'll know whether it's (a) the first script not creating correct HTML, resulting in an incomplete HTTP request from the browser; (b) the IIS server not receiving the CGI arguments for some odd reason; or (c) the arguments not getting from the IIS server into the Perl script (or, possibly, the Perl script not correctly accessing the arguments).
Good luck!
What you need to do is clear.
There is a lot of weird excess baggage in the script. There seemed to be no subroutines, just one long series of commands using global variables.
It is time to start refactoring.
Get one thing running at a time.
I saw HTML::Template there, but you still had raw HTML mixed in with the code. Separate code from presentation.