Tornado server does not receive big files - REST

Hi, I'm trying to build a Tornado server whose goal is to receive very big binary files (~1 GB) in the POST body. The following code works for small files, but does not respond if I try to send big files (~100 MB).
    import tornado.web

    class ReceiveLogs(tornado.web.RequestHandler):
        def post(self):
            # the entire request body is buffered in memory before post() runs
            file1 = self.request.body
            output_file = open('./output.zip', 'wb')
            output_file.write(file1)
            output_file.close()
            self.finish("file is uploaded")
Do you know any solutions?

I don't have a real implementation as an answer, but one or two remarks that hopefully point in the right direction.
First of all, there is a 100 MB upload limit, which can be increased by calling

    self.request.connection.set_max_body_size(size)

during initialization of the request handler (taken from this answer).
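For context, a minimal sketch of raising the limit at the server level instead (the 1 GB value, port, and route are illustrative assumptions; ReceiveLogs is the handler defined above):

    import tornado.httpserver
    import tornado.ioloop
    import tornado.web

    MAX_SIZE = 1024 * 1024 * 1024  # 1 GB, assumed for illustration

    application = tornado.web.Application([(r"/", ReceiveLogs)])
    server = tornado.httpserver.HTTPServer(application, max_body_size=MAX_SIZE)
    server.listen(8888)
    tornado.ioloop.IOLoop.current().start()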
The problem is that Tornado handles all file uploads in memory (and that HTTP is not a very reliable protocol for handling large file uploads).
This is a quote from a member of the tornadoweb team from 2014 (see the GitHub issue here):
... You can adjust this limit with the max_buffer_size argument to the
HTTPServer constructor, although I don't think it would be a good idea
to set this larger than say 100MB.
Tornado does not currently support very large file uploads. Better
support is coming (#1021) and the nginx upload module is a popular
workaround in the meantime. However, I would advise against doing 1GB+
uploads in a single HTTP POST in any case, because HTTP alone does not
have good support for resuming a partially-completed upload (in
addition to the aforementioned error problem). Consider a multi-step
upload process like Dropbox's chunked_upload and commit_chunked_upload
(https://www.dropbox.com/developers/core/docs#chunked-upload)
As stated, I would recommend one of the following:
If nginx can be placed in front to handle and route requests to Tornado, look at the nginx upload module (see the nginx wiki here).
If it must be a plain Tornado solution, use tornado.web.stream_request_body, which came with Tornado 4. This streams the uploaded file to disk instead of first buffering it entirely in memory (see the Tornado 4 release notes and this solution on Stack Overflow); a sketch follows below.
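For illustration, a minimal streaming handler might look like this (the output path, port, and 1 GB limit are assumptions, not part of the original answer):

    import tornado.ioloop
    import tornado.web

    @tornado.web.stream_request_body
    class StreamingReceiveLogs(tornado.web.RequestHandler):
        def prepare(self):
            # prepare() runs before the body arrives, so the limit can be raised here
            self.request.connection.set_max_body_size(1024 * 1024 * 1024)
            self.output = open('./output.zip', 'wb')

        def data_received(self, chunk):
            # called repeatedly with pieces of the body as they arrive
            self.output.write(chunk)

        def post(self):
            self.output.close()
            self.finish("file is uploaded")

    application = tornado.web.Application([(r"/", StreamingReceiveLogs)])
    application.listen(8888)
    tornado.ioloop.IOLoop.current().start()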

Related

Performance issues when serialising JSON with larger base64 strings

A 3rd-party integration only supports file uploading via cURL, which we only found out after completing the implementation as a normal multipart form and running into CORS issues. We then had to proxy the requests via our API, but took a simpler approach of attaching the base64 encoded string for the file(s), as opposed to creating a new multipart endpoint; this also solved/simplified other things.
We base64 encode the file's byte data and attach it as part of the POST payload. We then noticed some unusual performance issues: a 310 kB file takes 20 seconds before the network request is actually started.
After some investigation and CPU profiling, we narrowed it down to the move method of Dart's Iterable in the Dio chain. Given that the majority of our network requests include lists, this made little sense.
To confirm, I pulled in the http package and made the network request using that; the time dropped from 2-25s to less than 2s, so there is definitely something fishy going on with Dio in our case.
We are on v4.0.0 of Dio and 2.0.1 of Retrofit, and owing to other dependencies that we cannot update at this time, we cannot upgrade these packages further, so it's possible this edge case has since been solved. However, I see no issues on Dio or Retrofit that look related, so it's also possible they are not directly or indirectly causing this.
A solution would be fantastic, as I would prefer not to be running two network services.
Here is a link to the CPU profile, if that helps any. https://drive.google.com/file/d/1Dh7rEg7c04xxUvTmJB42Zs0RqEmdVPjQ/view?usp=sharing

Is it possible to enforce a max upload size in Plack::Middleware without reading the entire body of the request?

I've just converted a PageKit (mod_perl) application to Plack. This means that I now need some way to enforce the POST_MAX/MAX_BODY that Apache2::Request would have previously handled. The easiest way to do this would probably be just to put nginx in front of the app, but the app is already sitting behind HAProxy and I don't see how to do this with HAProxy.
So, my question is how I might go about enforcing a maximum body size in Plack::Middleware without reading the entire body of the request first?
Specifically I'm concerned with file uploads. Checking size via Plack::Request::Upload is too late, since the entire body would have been read at this point. The app will be deployed via Starman, so psgix.streaming should be true.
I got a response from Tatsuhiko Miyagawa via Twitter. He says, "if you deploy with Starman it's too late even with the middleware because the buffering is on. I'd do it with nginx".
This answers my particular question as I'm dealing with a Starman deployment.
He also noted that "rejecting a bigger upload before reading it on the backend could cause issues in general".
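For what it's worth, the general idea (reject based on the Content-Length header before touching the body) fits in a few lines. PSGI is modeled on WSGI, so here is a rough Python/WSGI analogue as a sketch; the class name and 413 response are illustrative, and, as noted above, a buffering front end like Starman defeats it:

    class MaxBodySize:
        # WSGI middleware: reject requests whose declared Content-Length
        # exceeds a limit, without reading the body
        def __init__(self, app, max_bytes):
            self.app = app
            self.max_bytes = max_bytes

        def __call__(self, environ, start_response):
            try:
                length = int(environ.get('CONTENT_LENGTH') or 0)
            except ValueError:
                length = 0
            if length > self.max_bytes:
                start_response('413 Payload Too Large',
                               [('Content-Type', 'text/plain')])
                return [b'Upload too large\n']
            return self.app(environ, start_response)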

Using CouchDB as an interface. Is it an appropriate approach?

Our devices (microscopes with cameras) produce images and additional information for each image.
Now a middleware supplier wants to connect these devices to a lab automation system. They have to acquire the data, and we have to provide it. What astonished me was their interface suggestion: a very cryptic token-separated format (ASTM E1394-97). Unfortunately, they can't even accommodate images in their protocol and are aiming to get file paths instead.
This did not strike me as an up-to-date approach. While looking for alternatives, I came across CouchDB.
So my idea was that our devices would import their data, including images, into CouchDB, and the middleware could get the data from there. It even seems that, using mustache, we could produce the format they want (ASCII text), placing URLs as image references instead of paths.
My question is: has anyone already applied CouchDB to such a use case? It seems like a bit of a misuse of CouchDB, as the main intention here is an interface, not data storage. Another point that worries me is that the inventor of CouchDB moved on to another project, Couchbase. Could that mean a lack of support for CouchDB in the future?
Thank you very much for any insights and suggestions!
It's an OK use case, and we're actually using CouchDB in exactly that way: as proxying middleware between medical laboratory analyzers and a LIS. Some of the analyzers publish images or PDF data to shared folders, and we just load those files into the related documents as attachments.
Moreover, you may like to know that CouchDB is able to manage external processes (aka os_daemons) and take care of their lifespan: restarting them if one terminates, and starting them right after you update the config options through the HTTP interface. This helps to set up ASTM client and server processes (since that protocol is different from HTTP, which is native for CouchDB) that communicate with the devices and create documents as regular CouchDB clients. In the same way you can set up daemons that monitor shared folders for specific files. And all of this is just CouchDB with a few loosely coupled plugins.
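To make the attachment part concrete, here is a minimal sketch of loading an image into a CouchDB document over the plain HTTP API (database name, document id, file name, and URL are illustrative assumptions):

    import requests

    BASE = 'http://localhost:5984/microscope'  # assumed database URL

    # create the document describing the measurement
    doc = {'device': 'scope-01', 'sample': 'S-123'}
    resp = requests.put(BASE + '/image-0001', json=doc)
    rev = resp.json()['rev']

    # attach the image; CouchDB then serves it back at this same URL,
    # which is what you would hand out instead of a file path
    with open('capture.png', 'rb') as fh:
        requests.put(BASE + '/image-0001/capture.png',
                     params={'rev': rev},
                     data=fh,
                     headers={'Content-Type': 'image/png'})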

Getting the progress of the sent data through Zend Rest service

I have the following situation: we are using Zend Framework to create a web application that communicates with its database through REST services.
The problem I'm facing is that when a user tries to upload a big video file, for example, the service takes some time (sometimes a few minutes) to receive the request (which includes the video file encoded with PHP's base64_encode function) before returning a success or error response.
My idea is to track how much of the data has been sent and show the user a JS progress bar, which would be useful in these cases.
Does anyone have an idea how I can track how much of the data has been sent through the service, so that I can show a progress bar based on that?
Zend provides progress bar functionality that can be paired with a JavaScript/jQuery client.
You will easily find some example implementations like this one:
https://github.com/marcinwol/zfupload
However, I don't think that REST services are the best solution for uploading videos, as base64 encoding makes files bigger and slower to upload.
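A quick way to see the overhead (illustrative Python, since base64 behaves the same in any language): the encoding emits 4 output bytes for every 3 input bytes, roughly 33% inflation.

    import base64

    raw = bytes(3 * 1024 * 1024)        # 3 MB of payload
    encoded = base64.b64encode(raw)
    print(len(encoded) / len(raw))      # -> ~1.33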
Check out Zend_File_Transfer that might be better suited to your needs:
http://framework.zend.com/manual/1.12/en/zend.file.transfer.introduction.html

Is there any way to allow failed uploads to resume with a Perl CGI script?

The application is simple: an HTML form that posts to a Perl script. The problem is that our customers sometimes upload very large files (> 500 MB), and their internet connections can be unreliable at times.
Is there any way to resume a failed transfer like in WinSCP or is this something that can't be done without support for it in the client?
AFAIK, it must be supported by the client. Basically, the client and the server need to negotiate which parts of the file (likely defined as parts in a "multipart/form-data" POST) have already been uploaded, and then the server code needs to be able to merge the newly uploaded data with the existing data.
The best solution is custom uploader code, usually implemented in Java, though I think this may be possible in Flash as well. You might even be able to do this via JavaScript; see the two sections with examples below.
Here's an example of how Google did it with YouTube: http://code.google.com/apis/youtube/2.0/developers_guide_protocol_resumable_uploads.html
It uses a "308 Resume Incomplete" HTTP response in which the server sends a header such as Range: bytes=0-408 to indicate what has already been uploaded; a client-side sketch follows below.
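To make the handshake concrete, here is a minimal client-side sketch in Python (the upload URL is a placeholder, and the server is assumed to implement the 308/Range protocol described above):

    import os
    import requests

    UPLOAD_URL = 'https://example.com/resumable/session-123'  # placeholder
    path = 'big_file.bin'
    total = os.path.getsize(path)

    # ask the server how much of the file it already has
    probe = requests.put(UPLOAD_URL,
                         headers={'Content-Range': 'bytes */%d' % total})
    offset = 0
    if probe.status_code == 308 and 'Range' in probe.headers:
        # e.g. "bytes=0-408" means bytes 0..408 arrived; resume at 409
        offset = int(probe.headers['Range'].split('-')[1]) + 1

    # send only the remaining bytes
    with open(path, 'rb') as fh:
        fh.seek(offset)
        requests.put(UPLOAD_URL,
                     data=fh,
                     headers={'Content-Range':
                              'bytes %d-%d/%d' % (offset, total - 1, total)})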
For additional ideas on the topic:
http://code.google.com/p/gears/wiki/ResumableHttpRequestsProposal
Someone implemented this using Google Gears on the client side and PHP on the server side (the latter you can easily port to Perl):
http://michaelshadle.com/2008/11/26/updates-on-the-http-file-upload-front/
http://michaelshadle.com/2008/12/03/updates-on-the-http-file-upload-front-part-2/
It's a shame that your clients can't use FTP uploads, since FTP already includes resume abilities like that. There is also "chunked transfer encoding" in HTTP; I don't know which Perl modules might already support it, but a short client-side illustration follows below.
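For illustration (Python rather than Perl): the requests library switches to chunked transfer encoding when the body is a generator of unknown length. The URL and chunk size here are assumptions.

    import requests

    def chunks(path, size=64 * 1024):
        # yield the file piece by piece; no Content-Length is known up front
        with open(path, 'rb') as fh:
            while True:
                block = fh.read(size)
                if not block:
                    return
                yield block

    # requests sends Transfer-Encoding: chunked for generator bodies
    requests.post('https://example.com/upload', data=chunks('big_file.bin'))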