Creating best API: Upload N files and JSON metadata - REST

I am creating an API.
I am unsure what the API should look like.
Several BLOB files (PDF, JPG, ZIP, ...) should get uploaded and
some JSON which contains meta data.
What is the most up-to-date way to design the API?
There are two cases:
the upload was successful. Then I think 201 (Created) would be appropriate
the upload was not successful (for example, invalid metadata). Then 422 (Unprocessable Entity) should be returned.
Example:
Three PDF files should be uploaded at once, associated with some JSON metadata.

What you often see is that you have one resource for handling the BLOBs and one for the metadata - Facebook and Twitter do this for images and videos.
For example /files would take your BLOB data and return an ID for the uploaded BLOB data.
The metadata would be sent to another resource, called /posts, which could consume application/json.
In the application I currently work on, we had the same issue and decided to use one endpoint consuming multipart/form-data - here you can send the BLOBs and the metadata within different boundaries and have everything in one resource.
Another way would be to base64-encode the BLOBs, which results in roughly 33 % overhead, so I do not recommend it. But with base64 you could do all of your work in one application/json resource.
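To make the multipart option concrete, here is a minimal client-side sketch in Python. The /posts endpoint, the field names, and the response handling are assumptions for illustration, not a prescribed contract:

import json
import requests

# Hypothetical endpoint that accepts the BLOBs and the metadata in one multipart/form-data request.
url = "https://api.example.com/posts"

metadata = {"title": "Quarterly reports", "tags": ["finance", "2024"]}

parts = [
    ("files", ("report1.pdf", open("report1.pdf", "rb"), "application/pdf")),
    ("files", ("report2.pdf", open("report2.pdf", "rb"), "application/pdf")),
    ("files", ("report3.pdf", open("report3.pdf", "rb"), "application/pdf")),
    # The metadata travels as its own part with an application/json content type.
    ("metadata", ("metadata.json", json.dumps(metadata), "application/json")),
]

response = requests.post(url, files=parts)

if response.status_code == 201:
    print("Created:", response.headers.get("Location"))
elif response.status_code == 422:
    print("Invalid metadata:", response.text)

On the server each boundary arrives as a separate part, so the metadata can be validated (and a 422 returned) before the files are persisted.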

Related

ASP.NET Core Web API - PUT vs PATCH

I've been learning how to build an ASP.NET Core Web API using the official documentation. However, under the PUT section it says:
According to the HTTP specification, a PUT request requires the client to send the entire updated entity, not just the changes. To support partial updates, use HTTP PATCH.
I know that POST is used to create a new resource and PUT is used to update a particular resource. But I cannot understand what is meant by "entire updated entity, not just the changes".
I cannot understand what is meant by "entire updated entity, not just the changes".
You should review the HTTP specification, which describes the semantics of PUT.
A successful PUT of a given representation would suggest that a subsequent GET on that same target resource will result in an equivalent representation being sent in a 200 (OK) response.
Representation here refers to the bytes that would appear as the payload of the response to that GET. For a web page, we're talking about the HTML. For a resource that responds with application/json, we're talking about the entire JSON document.
In other words, if you want to create/edit a json document, and save it on the web server, then you use HTTP PUT, and the payload of the request should be the entire JSON document.
For cases where the JSON document is very big (much bigger than the HTTP headers) and the changes you are making are small, you might instead choose to send the changes in a PATCH request, where the payload of the request is a patch document (probably JSON Patch or JSON Merge Patch).
Now, here's the magic trick -- everybody on the web understands HTTP requests the same way. PUT always means PUT, PATCH always means PATCH. So you can use PUT and PATCH to make changes to things on the web that aren't really documents, but might instead be something else (like an entity).
That's pretty much the point of an HTTP API - it's a facade that (when seen from the outside) makes our service look like just another web server.
The fact that your resource is really rows in a database instead of a document on a file system is an implementation detail that is hidden behind the Web API.
Suppose you have JSON data like below:
{
"Id":123,
"FirstName":"John",
"MiddleName":"Random",
"LastName":"Doe"
}
Suppose you want to change the middle name here; say you want to remove it. You can use either PUT or PATCH to do so.
PUT
Here you have to send the whole new entity (the changed state of the JSON model) in your request. Even though you are only changing the middle name, the full JSON document is sent in the request payload:
PUT /user?id=123
{
"Id":123,
"FirstName":"John",
"MiddleName":"", //Only this is being changed.
"LastName":"Doe"
}
PATCH
In this case, you send only the diff between the old model and the new model. Here only the changes are sent, not the whole updated entity:
PATCH /user?id=123
{
"MiddleName":"", //Only this is being changed.
}
Using PATCH can be handy when you want to edit only a few properties of a huge object.
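For illustration, the two requests from the example above could be sent like this with Python's requests library; the merge-patch and json-patch media types are assumptions about what the server is set up to accept:

import requests

base = "https://api.example.com/user"

# PUT: send the entire updated representation; it replaces what is stored.
full_document = {"Id": 123, "FirstName": "John", "MiddleName": "", "LastName": "Doe"}
requests.put(base, params={"id": 123}, json=full_document)

# PATCH with JSON Merge Patch: send only the members that change.
requests.patch(base, params={"id": 123}, json={"MiddleName": ""},
               headers={"Content-Type": "application/merge-patch+json"})

# PATCH with JSON Patch (RFC 6902): send an explicit list of operations instead.
operations = [{"op": "replace", "path": "/MiddleName", "value": ""}]
requests.patch(base, params={"id": 123}, json=operations,
               headers={"Content-Type": "application/json-patch+json"})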

ASP.NET Web API Best Practice for Creating, Polling, Delivering long-running jobs

We are working on a new RESTful API using ASP.NET Web API. Many of our customers receive a nightly data feed from us. For each feed we run a scheduled SQL Agent job that fires off a stored procedure, which executes an SSIS package and delivers files via email/FTP. Several customers would benefit from being able to run this job on demand and then receive either their binary file (xml, xls, csv, txt, etc.) or a direct transfer of the data in JSON or XML.
The main issue is that the feeds generally take a while to run. Most finish within a few minutes, but there are a couple that can take 20 minutes (part of the project is optimizing these jobs). I need some help finding a best practice for setting up this API.
Here are our actions and proposed REST calls
Create Feed Request
POST ./api/feedRequest
Status 201 Created
Returns feedID in the body (JSON or XML)
We thought POST would be the correct request type because we're creating a new request.
Poll Feed Status
GET ./api/feedRequest/{feedID}
Status 102 Processing (feed is processing)
Status 200 OK (feed is completed)
Cancel Feed Request
DELETE ./api/feedRequest/{feedID}
Status 204 No Content
Cancels feed request.
Get Feed
GET ./api/feed/{feedID}
Status 200 OK
This will return the feed data. We'll probably pass parameters in the headers to specify how they want their data. Setting feedType to "direct" would require a JSON or XML setting in Content-Type. Setting feedType to "xml", "xls", "csv", etc., will transfer a binary data file back to the user. For some feeds this is a custom template that is already specified in the feed definition stored in our tables.
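A minimal client-side sketch of the proposed flow, using Python only to illustrate the sequence of calls; the request body, the feedID field name, the feedType header, and the polling interval are assumptions:

import time
import requests

base = "https://api.example.com/api"

# 1. Create the feed request; the proposal above returns 201 Created with the feedID in the body.
created = requests.post(f"{base}/feedRequest", json={"feedDefinition": "nightly-sales"})
feed_id = created.json()["feedID"]

# 2. Poll the feed request; per the proposal, 102 means still processing and 200 means complete.
while True:
    status = requests.get(f"{base}/feedRequest/{feed_id}")
    if status.status_code == 200:
        break
    time.sleep(30)

# 3. Fetch the finished feed, asking for a CSV file via the proposed feedType header.
feed = requests.get(f"{base}/feed/{feed_id}", headers={"feedType": "csv"})
with open(f"feed_{feed_id}.csv", "wb") as out:
    out.write(feed.content)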
Questions
Does it appear that we're on the right track? Any immediate suggestions or concerns?
We are trying to decide whether to have a /feed resource and a /feedRequest resource, or whether to keep it all under /feed. The above scenario is the two-resource approach. The single-resource approach would POST /feed to start the request, PUT /feed to check the status, and GET /feed when it's done. The PUT doesn't feel right, and right now we're leaning towards the solution stated above. Does this seem right?
We're concerned about very large dataset returns. Should we be breaking these into pieces, or will the REST service handle these large responses? Some feeds can be in excess of 100 MB.
We also have images that may be generated to accompany the feed, they're zipped up in a separate file when the feed stored procedure and package are called. We can keep this all in the same request and call GET /feed/{feedID}/images on the return.
Does anyone know of a Best Practice or a good GitHub example we could look at that does something similar to this with MS technologies? (We considered moving to ASP.NET Core as well.)

Google Cloud Storage: Setting incorrect MIME-type

I have a Node.js server running on a Google Compute Engine virtual instance. The server streams incoming files to Google Cloud Storage (GCS). My code is here: Node.js stream upload directly to Google Cloud Storage
I'm passing Content-Type in the XML headers and it's working just fine for image/jpeg MIME-types, but for video/mp4 GCS is writing files as application/octet-stream.
There's not much to this, so I'm totally at a loss for what could be wrong ... any ideas are welcome!
Update/Solution
The problem was due to the fact that the multiparty module was creating content-type: octet-stream headers on the 'part' object that I was passing into the pipe to GCS. This caused GCS to receive two content types, of which the octet-stream one came last, so GCS used it for the inbound file.
OK, looking at your HTTP request and response, it seems like the content type is specified in the URL returned as part of the initial HTTP request. The initial HTTP request should return the endpoint which can be used to upload the file. I'm not sure why it is specified there, but looking at the documentation (https://developers.google.com/storage/docs/json_api/v1/how-tos/upload - start a resumable session) it says that X-Upload-Content-Type needs to be specified, along with some other headers. This doesn't seem to be specified in the HTTP requests mentioned above. There might be an issue with the library used, but the returned endpoint does not look like what is specified in the documentation.
Have a look at https://developers.google.com/storage/docs/json_api/v1/how-tos/upload, "Example: Resumable session initiation request" and see if you still have the same issue if you specify the same headers as suggested there.
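For reference, here is a rough sketch of what that session-initiation request might look like over plain HTTP from Python, following the resumable-upload documentation linked above; the bucket name, object name, and token handling are placeholders:

import requests

bucket = "my-bucket"            # placeholder
object_name = "video.mp4"       # placeholder
access_token = "ya29..."        # obtained from your normal auth flow

# Step 1: start a resumable session and declare the final content type up front.
init = requests.post(
    f"https://storage.googleapis.com/upload/storage/v1/b/{bucket}/o"
    f"?uploadType=resumable&name={object_name}",
    headers={
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json; charset=UTF-8",
        "X-Upload-Content-Type": "video/mp4",
    },
)
session_url = init.headers["Location"]

# Step 2: upload the bytes to the session URL, again with the video content type.
with open("video.mp4", "rb") as f:
    requests.put(session_url, data=f, headers={"Content-Type": "video/mp4"})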
Google Cloud Storage is content-type agnostic, i.e., it treats any kind of content in the same way (videos, music, zip files, documents, you name it).
But just to give some idea:
First, I believe the video you are uploading ends up more or less the same size after being uploaded, so it falls under application/<sub type> (similar to section 3.3 of RFC 4337).
To make this correct, I believe you need to deal with the mp4 metadata that is stored before and after the file content.
Please let us know your solution.
A solution that worked for me in a similar situation is below. TLDR: save video from a web app to GCS with content type video/mp4 instead of application/octet-stream.
Here is the situation. You want to record video in the browser and save it to Google Cloud Storage with the content type set to video/mp4 instead of application/octet-stream. The user records a video and clicks a button to send the video file to your server, and the server then sends the video file to Google Cloud Storage for saving.
You successfully save the video to Google Cloud Storage and by default GCS assigns a content type of application/octet-stream to the video.
To assign a content type video/mp4 instead of application/octet-stream, here is some server-side Python code that works.
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(destination_blob_name)
blob.upload_from_file(file_obj, rewind=True)
# The upload defaults to application/octet-stream, so set the type and patch the object metadata.
blob.content_type = 'video/mp4'
blob.patch()
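If I recall correctly, the same client library also accepts the content type directly on the upload call, which avoids the separate patch() round trip:
blob.upload_from_file(file_obj, rewind=True, content_type='video/mp4')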
Here are some links that might help.
https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python
https://stackoverflow.com/a/33320634/19829260
https://stackoverflow.com/a/64274097/19829260
NOTE: at the time of this writing, the Google Docs about editing metadata don't work for me because they say to set metadata but metadata seems to be read-only (see SO post https://stackoverflow.com/a/33320634/19829260)
https://cloud.google.com/storage/docs/viewing-editing-metadata#edit

Post JSON data to Facebook

In the app, I want to post a photo and some text. I am able to post when I use locally stored data from resources, but when the data (in JSON format) comes from the server at run time, I am not able to post the image and text it contains.
Is there any way to post data at run time, or do I have to store the data on the client side? In that case the app would become bulky, because the data can differ from location to location.
I am not sure, but you may be asking about posting an image using a URL instead of assuming the data is local. If so, see this blog post - https://developers.facebook.com/blog/post/526/ - which introduced the ability to post an image by passing a "url" parameter through the Graph API.
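A minimal sketch of that approach against the Graph API from Python; the access token, target ID, and image URL are placeholders, and your app still needs the appropriate publish permissions:

import requests

access_token = "EAAB..."  # placeholder user or page access token
image_url = "https://example.com/images/photo.jpg"  # image already hosted on your server

# Post a photo by URL instead of uploading binary data stored on the client.
response = requests.post(
    "https://graph.facebook.com/me/photos",
    data={
        "url": image_url,
        "caption": "Text received from the server in the JSON feed",
        "access_token": access_token,
    },
)
print(response.json())  # contains the id of the new photo on success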

upload small thumbnail images from iPhone app to Amazon S3

I noticed a couple of threads on this already, and they even provided sample code.
http://brunofuster.wordpress.com/2010/11/03/uploading-an-image-from-iphone-with-uiimagepicker-and-asihttprequests3/
But what baffles me is that there seems to be no response to handle. Is that because S3 doesn't return any response? I am expecting to receive at least a URL to the image on S3 - how could I get that?
If you look at the S3 REST object PUT documentation you will see the response that is returned from S3.
When you post to S3 you know the bucket name you are putting the image into plus you know the filename. These two pieces of information should be all you need to get a url to the image.
The documentation states that in addition to the PUT response header(s) you can see some of the common headers too.
This implementation of the operation can include the following response headers in addition to the response headers common to all responses. For more information, see Common Response Headers.
If you look at the ASIHTTPRequest Amazon Simple Storage Service (S3) support you will see how to get a response from the ASIS3ObjectRequest object.
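As the answer above notes, the URL can be derived from the bucket and key alone. A minimal sketch (in Python for brevity), assuming a publicly readable object and no special characters in the key:

def s3_url(bucket: str, key: str) -> str:
    # Virtual-hosted-style URL; works when the object is publicly readable.
    return f"https://{bucket}.s3.amazonaws.com/{key}"

print(s3_url("my-thumbnails", "thumbs/photo123.jpg"))
# -> https://my-thumbnails.s3.amazonaws.com/thumbs/photo123.jpg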
Tom,
If you just wish to get your S3 image URL, you don't need the response information, considering you already know the image name and the bucket (assuming there was no error).
Anyway, you can get the response from a sync request by using [request responseString|responseData].
But the right thing to do is an async call using operation queues and delegates to get the response success or error. My blog post just provided a minimal sample. I will look into that and improve the post itself.
Thanks!
In addition to the answers already provided, you might also want to look at Amazon's recently released AWS SDK for iOS, which includes sample code for uploading images, etc. to S3.