RESTful server behavior when uploading the same file multiple times

I am implementing REST endpoints for uploading large (multiple-GB) files using multipart requests. For uploading I have POST /files and PUT /files/{sha256_file_hash}. With the latter endpoint, the client can calculate the file's hash locally before the upload and then PUT directly to that hash. The idea behind this is a performance optimization: the upload does not need to happen if a file with the given hash was already uploaded to the server before.
When uploading a file that already exists, my server responds with 200 OK and then simply cancels the multipart upload. Tools like curl do not like this and exit with an error, complaining that they did not manage to upload all of the multipart data.
What would be RESTful behavior for the server here? Is this performance optimization that I have implemented on my server even RESTful? Should the server always accept all data that the client wants to upload? Or should the server rely on the client to first GET /files and check on its own whether the file it wants to upload is already there?
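For concreteness, the probe-then-upload flow described above might look like the client-side sketch below. The requests library, the example.com host, and the HEAD probe against /files/{hash} are assumptions, not part of the original API:

```python
import hashlib

import requests  # assumed third-party HTTP client

BASE = "https://example.com"  # hypothetical server

def sha256_of(path, chunk_size=1 << 20):
    # Hash in 1 MB chunks so a multi-GB file never sits fully in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def upload(path):
    digest = sha256_of(path)
    # Cheap probe: if the server already has this hash, skip the transfer.
    if requests.head(f"{BASE}/files/{digest}").status_code == 200:
        return digest
    with open(path, "rb") as f:
        requests.put(f"{BASE}/files/{digest}", data=f).raise_for_status()
    return digest
```

Alternatively, a client that sends Expect: 100-continue (curl already does this for large request bodies) can receive a final status such as 200 or 409 before transmitting the body, which lets the server short-circuit duplicate uploads without the client reporting a failed transfer.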

How would you build a back-end that accepts video .mp4's from an input form on the front-end?

React/Next front-end + Node.js and MongoDB on the back?
How would video storage work?
You can use POST requests to send files to your remote server. Your backend code should read the request data and then store the file on disk or in object storage such as S3. Most backend web frameworks have libraries for storing files received in an HTTP request directly to S3.
Most web development frameworks can guess the MIME type, but since you already know it's video/mp4 here, you can just save it.
I must warn you: if you are trying to upload huge files, it might be a better idea to use chunked uploads. That gives you the ability to pause and resume, and it is robust to network failures.
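As a minimal sketch of the first suggestion, assuming Flask as the framework and "video" as the form field name:

```python
from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)

@app.route("/videos", methods=["POST"])
def upload_video():
    f = request.files["video"]  # form field name is an assumption
    # Werkzeug spools large request bodies to a temporary file, so the
    # whole .mp4 is not held in memory; secure_filename guards against
    # path traversal in the client-supplied name.
    f.save(f"/var/uploads/{secure_filename(f.filename)}")
    return "", 201
```

Swapping the save(...) call for boto3's upload_fileobj would stream the same file handle to S3 instead of the local disk.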

HTTP GET file extension support

I discovered HTTP as a nice way to handle my files on my server. I write C programs based on the sockets interface.
When I issue an HTTP GET, I can easily download files, but only files with known extensions. A (backup) file with the extension XXX is "not found": the response status code is actually 200 ("OK"), but the response content is an HTML page containing the error message (404 = not found).
How can I make sure that the web server sends any file I ask for? I have experimented with the Accept header in the HTTP GET request, but that does not help (or I am making a mistake).
I do not own the server, so I cannot alter the server settings. On the client side, I do not use a browser, only the sockets interface (see above).
I think it is important to understand that HTTP does not really have a concept of "files" and "directories." Instead, the protocol operates on locations and resources. While they can represent files and directories, they are absolutely not guaranteed to be the same.
The server in question seems to be configured to serve 404 error pages when it encounters unknown extensions. This is a bit weird and definitely not up to the standard, though it may happen when a web application firewall is deployed. Again, HTTP does not trust file extensions in any way; it relies on metadata in the form of MIME media types instead. That is also what goes (more or less) into the Accept header of a request.
How can I make sure that the web server sends any file I ask for?
Well, you can't. While the client may express preferences, the server is the ultimate authority on what gets sent in which way.
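For illustration, here is roughly what such a request looks like at the socket level, in Python rather than C for brevity (host and path are placeholders). The Accept line only states a preference; the server is still free to answer however it is configured:

```python
import socket

request = (
    "GET /backups/data.XXX HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Accept: */*\r\n"          # a preference only; the server still decides
    "Connection: close\r\n"
    "\r\n"
)

with socket.create_connection(("example.com", 80)) as s:
    s.sendall(request.encode("ascii"))
    response = b""
    while chunk := s.recv(4096):
        response += chunk

print(response.split(b"\r\n", 1)[0])  # status line, e.g. b'HTTP/1.1 200 OK'
```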

Server architecture / security for file server

Suppose I have the following server/software architecture for my application:
Server serves mainly as file server. Security by obscurity is achieved by putting the data into folders like /[random A-Z0-9]/mydata001.zip.
Clients request data over a REST API. They send their secret token (access rights are checked against a database) and tags describing the desired data.
The server responds with JSON containing URLs to the zip files for the requested data.
The client can now download the data over plain HTTP.
So the main load comes from the downloads, right? How can I scale such an architecture to, say, three servers? Only by duplicating the data?
Thanks for some advice.
If security is a concern, then at least use HTTPS for the authentication. When the files need to be stored on multiple servers without duplicating the data, I can think of several options:
Store the files for a specific user always on the same server. This means each user profile records the server where that user's files are stored. This only distributes the files evenly when there are a lot of users.
Randomly store the file on one of the file servers and save the file location in the database.
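A minimal sketch of the first option, deterministic user-to-server assignment (hostnames are placeholders):

```python
import hashlib

# Placeholder hostnames for the three file servers.
FILE_SERVERS = ["files1.example.com", "files2.example.com", "files3.example.com"]

def server_for_user(user_id: str) -> str:
    # A stable hash keeps all of a user's files on one host, so no extra
    # database lookup is needed to find them later.
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    return FILE_SERVERS[int.from_bytes(digest[:4], "big") % len(FILE_SERVERS)]
```

Note that plain modulo hashing remaps most users whenever a server is added or removed; if the pool is expected to grow, consistent hashing avoids that.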

How can I POST directly to a database (similar to S3 post method)?

S3 allows you to POST directly from the browser to S3, bypassing your web server (http://doc.s3.amazonaws.com/proposals/post.html). How can I upload files to a database in a similar fashion? I don't want to first stage the file on the web server in a temporary file and then upload it from there to the database.
If I cannot avoid the web server, how do I use it only for streaming, so that the file never lands on the web server before being loaded into the database?
Thanks.
A handful of DBMSes provide an HTTP interface, but this is the exception rather than the rule.
That said, you can make the HTTP server a thin layer over a more traditional database. Exposing the database directly is probably a bad idea, though, because most databases assume that anything that can reach them has full privilege to execute queries, and an application (read: "web server") normally acts as the gatekeeper between the database and obnoxious or malicious clients.
Basically, you're going to do best using a database engine that does all of these things at a fine-grained level, expressly designed for it. MongoDB mostly addresses this exact use case. Otherwise, you'll just have to write an application that sits between HTTP and the raw database connection.
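As a sketch of that in-between application, assuming Flask plus MongoDB's GridFS (the route, database name, and connection string are illustrative), the request body can be streamed straight into the database without ever touching the web server's disk:

```python
import gridfs
from flask import Flask, request
from pymongo import MongoClient

app = Flask(__name__)
# Connection string and database name are placeholders.
fs = gridfs.GridFS(MongoClient("mongodb://localhost:27017")["uploads"])

@app.route("/files/<name>", methods=["POST"])
def store(name):
    # request.stream is a file-like view of the raw request body; GridFS
    # reads it chunk by chunk, so the file never lands on this server's disk.
    file_id = fs.put(request.stream, filename=name)
    return str(file_id), 201
```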

iPhone: Strategies for uploading large files from phone to server

We're running into issues uploading hi-res images from the iPhone to our backend (cloud) service. The call is a simple HTTP file upload, and the issue appears to be the connection breaking before the upload is complete; on the server side we're getting IOError: Client read error (Timeout?).
This happens sporadically: most of the time it works, but sometimes it fails. When a good connection is present (i.e. wifi), it always works.
We've tuned various timeout parameters on the client library to make sure we're not hitting any of them. The issue actually seems to be unreliable mobile connectivity.
I'm thinking about strategies for making the upload reliable even when faced with poor connectivity.
The first thing that came to mind was to break the file into smaller chunks and transfer it in pieces, increasing the likelihood of each piece getting there. But that introduces a fair bit of complexity on both the client and server side.
Do you have a cleverer approach? How would you tackle this?
I would use the ASIHTTPRequest library. It has some great features, like bandwidth throttling, and it can upload files directly from disk instead of loading them into memory first. I would also break the photo into about 10 parts, so for a 5 MB photo each part would be about 500 KB, and upload each part through a queue. When the app goes into the background, it can complete the part it is currently uploading; if it cannot finish uploading all the parts in the allocated time, just post a local notification reminding the user that the upload is not complete. Then, after all the parts have been sent to your server, you would make a final request that combines the parts back into your photo on the server side.
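For illustration, the part-by-part scheme described above might look like the following, in Python for brevity (the answer itself targets ASIHTTPRequest on iOS); the endpoint layout, part size, and retry count are all assumptions:

```python
import requests  # assumed HTTP client

PART_SIZE = 512 * 1024  # ~500 KB, per the 5 MB / 10 parts suggestion
BASE = "https://example.com/uploads"  # hypothetical endpoint

def upload_in_parts(path, upload_id):
    with open(path, "rb") as f:
        index = 0
        while part := f.read(PART_SIZE):
            for attempt in range(3):  # resend a part that times out
                try:
                    requests.put(f"{BASE}/{upload_id}/parts/{index}",
                                 data=part, timeout=30).raise_for_status()
                    break
                except requests.RequestException:
                    if attempt == 2:
                        raise
            index += 1
    # Final call asks the server to stitch the parts back together.
    requests.post(f"{BASE}/{upload_id}/complete").raise_for_status()
```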
Yeah, timeouts are tricky in general, and get more complex when dealing with mobile connections.
Here are a couple ideas:
Attempt to upload to your cloud service as you are doing. After a few failures (timeouts), mark the file and ask the user to connect their phone to a wifi network, or wait until they connect the phone to a computer and have them upload manually via the web. This isn't ideal, as it pushes more work onto your users, but the upside is that, implementation-wise, it's pretty straightforward.
Instead of doing an HTTP upload, do a raw socket send. Using raw sockets, you can send binary data in chunks pretty easily, and if any chunk send times out, you resend it until the entire image file has been sent. This is "more complex" in that you have to manage the binary socket transfer yourself, but I think it's easier than trying to chunk files through an HTTP upload (a rough sketch follows below).
Anyway that's how I would approach it.
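A rough sketch of that raw-socket approach, again in Python for brevity: the length-prefixed framing, the one-byte ACK, and the reconnect-per-retry policy are all invented for illustration, and a matching server implementation is assumed.

```python
import socket
import struct

def send_file(path, host, port, chunk_size=64 * 1024):
    with open(path, "rb") as f:
        seq = 0
        while chunk := f.read(chunk_size):
            # Frame: 4-byte sequence number, 4-byte length, then the data.
            frame = struct.pack("!II", seq, len(chunk)) + chunk
            while True:  # resend this chunk until the server acks it
                try:
                    with socket.create_connection((host, port), timeout=10) as s:
                        s.sendall(frame)
                        if s.recv(1) == b"\x06":  # server acks the chunk
                            break
                except OSError:  # timeout or dropped connection: retry
                    pass
            seq += 1
```

Reconnecting for each retry keeps the framing unambiguous after a partial send; in practice you would also cap the number of retries per chunk.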