Is the S3 REST PUT (direct upload) operation chunked?

In the S3 REST API, how does the PUT operation (a direct upload, not a multipart upload) actually send large files, i.e. gigabytes, over HTTP? Is the direct upload also chunked (like the multipart upload), with some internally defined part size?
When I tried a PUT (direct upload) using the S3 REST API, the maximum I could upload was around 5 GB, which is also the limit Amazon documents for direct uploads. But when I tried uploading a file larger than that limit, it threw the exception "Your proposed upload exceeds the maximum allowed size", and the HTTP response that came back had the header 'Transfer-Encoding: chunked'.

Here's a randomly-selected error response from S3.
< HTTP/1.1 412 Precondition Failed
< x-amz-request-id: 207CAFB3CEXAMPLE
< x-amz-id-2: EXAMPLE/DCHbRTTnpavsMQIg/KRRnoEXAMPLEBJQrqR1TuaRy0SHEXAMPLE5otPHRZw4EXAMPLE=
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Fri, 23 Jun 2017 19:51:52 GMT
< Server: AmazonS3
<
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>...
The Transfer-Encoding: chunked response header only indicates that the error response body S3 is sending back to you will use chunked transfer encoding.
This is unrelated to what is permitted for uploads, and the presence of Transfer-Encoding: chunked in either direction (request or response) of an HTTP transaction is independent of whether it is present or supported in the opposite direction.
The PUT object REST API call does not support Transfer-Encoding: chunked on the request. It requires Content-Length: in the request headers, which precludes using chunked transfer encoding.
There is no chunking, blocking, or similar mechanism involved at the HTTP layer in standard uploads, and no meaningful internal "part size", because there are no parts: the body is a continuous TCP stream of exactly Content-Length un-encoded octets (bytes), with retries and network errors handled by TCP, and HTTP unaware of those mechanisms.
If the Content-Length header you send exceeds the maximum allowed upload, you get the error about your proposed upload exceeding the maximum allowed size. If the connection is accidentally or intentionally severed before Content-Length number of octets are received by S3, the uploaded data is discarded, because partial objects are never created.
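For illustration (not something the original answer spells out), here is a minimal Java sketch of such a direct upload using HttpURLConnection; the URL is a placeholder and the request is left unsigned, so a real S3 call would still need authentication. The point is that setFixedLengthStreamingMode() makes the client send a Content-Length header and stream the body as one unencoded sequence of octets, with no chunked transfer encoding (transferTo needs Java 9+).
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DirectPut {
    public static void main(String[] args) throws Exception {
        Path file = Paths.get("big-file.bin");
        long length = Files.size(file);

        // Placeholder URL; a real request must carry S3 authentication.
        URL url = new URL("https://examplebucket.s3.amazonaws.com/big-file.bin");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("PUT");
        // Fixed-length streaming: the client sends Content-Length and never
        // falls back to Transfer-Encoding: chunked for the request body.
        conn.setFixedLengthStreamingMode(length);

        try (InputStream in = Files.newInputStream(file);
             OutputStream out = conn.getOutputStream()) {
            in.transferTo(out); // one continuous stream of exactly `length` octets
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}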

Related

ReST low latency - how should I reply to a GET while an upload is pending?

I am designing a ReST API which follows the basic CRUD pattern.
My API can receive a request to update a resource which may take a short time to process. Ideally I would like to inform clients that a new version is about to be available and that there is some uncertainty over when the version I have cached actually expires.
So the process I intend to use is something like this (improvements welcome):
client: GET /some/item
myapi: 200 OK
last-modified: time-stamp-of-v1
etag: some-hash-relating-to-v1-of-my-item-in-this-format
content: json or whatever
data/for/some/item/v1...
client: PUT /some/item
if-match: some-hash-relating-to-v1-of-my-item-in-this-format
content: json or whatever
data/for/some/item/v2...
myapi: 202 ACCEPTED,
content: json or whatever
time-accepted: time-stamp-after-v1-but-before-v2
your item will be at /some/item
here is a URI /some/taskid to track progress
while upload is pending:
client: GET /some/item
myapi: 200 OK
some/item ...
last-modified: time-stamp-of-v1
etag: some-hash-relating-to-v1-of-my-item-in-this-format
>>>> expires: time-stamp-after-v1-but-before-v2 <<<
>>>> warning: 110 Response is stale <<<<
content: json or whatever
data/for/some/item/v1...
client: GET /some/task/id
myapi: 200 OK
content: json or whatever
time-accepted: time-stamp-after-v1-but-before-v2
your item will be at /some/item
status/of/upload/v2...
after task completed:
client: GET /some/item
myapi: 200 OK
some/item/v2 ...
last-modified: time-stamp-of-v2
etag: some-hash-relating-to-v2-of-my-item-in-this-format
content: json or whatever
data/for/some/item/v2...
client: GET /some/task/id
myapi: 303 SEE OTHER
look-here: /some/item
If you are a proxy and know your content is stale, you can put "warning: 110 - response is stale" in the header.
However, in this case the data is not actually invalid yet.
I would like to say that I can guarantee it is valid up until the time I received and passed on the upload request (time-stamp-after-v1-but-before-v2, or later, as I am in contact with the upload server). It hasn't really expired at the time I receive the upload request; I just expect it is going to.
(In fact if the request fails it might not be updated at all).
Now the default choice is just to serve the old content and let the client catch up on its own. This has high latency. If possible, I would like to do better.
For example, if the client knows the document is about to expire it could poll more often or it could try to upgrade the connection to a web-socket and get sent an update the moment I get it (would that still count as ReST?)
There is another case where using expired data must be avoided at all costs. For that scenario I think I want to tell the client that the resource is temporarily unavailable. Using the warning and expires fields as I have above seems correct there. Though it might be better to send a 503 with a suitable retry-after header.
So the question is: how should I reply to a GET while the upload of a new version is pending?
In anticipation of answers along the lines of "use a messaging framework like AMQP or ZeroMQ instead for low latency", I should point out this API is acting as an AMQP gateway/proxy for clients unwilling to use AMQP directly. Information on using webhooks or websockets would still be interesting.
Some related useful content is:
How to proper design a restful API to invalidate a cache?
https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
HTTP status code for temporarily unavailable pages
http://www.albertoleal.me/posts/how-to-prevent-race-conditions-in-restful-apis.html
(the etag prevents races from simultaneous uploads)
TL;DR:
While the upload is pending, send:
client: GET /some/item
myapi: 200 OK
some/item ...
last-modified: time-stamp-of-v1
etag: some-hash-relating-to-v1-of-my-item-in-this-format
expires: time-stamp-after-v1-but-before-v2
cache-control: stale-while-revalidate=100
warning: 110 Response is stale
content: json or whatever
data/for/some/item/v1...
At first sight it looks like using Warning is not correct. See https://www.rfc-editor.org/rfc/rfc7234#section-5.5.0
In this case the server is acting as a proxy (though not an HTTP proxy).
It is not disconnected from AMQP, and a cache "MUST NOT send stale responses" unless it is disconnected.
This is annoying as it looked like the right thing to do here.
4.2.4. Serving Stale Responses
A "stale" response is one that either has explicit expiry
information or is allowed to have heuristic expiry calculated, but
is not fresh according to the calculations in Section 4.2.
A cache MUST NOT generate a stale response if it is prohibited by
an explicit in-protocol directive (e.g., by a "no-store" or
"no-cache" cache directive, a "must-revalidate"
cache-response-directive, or an applicable "s-maxage" or
"proxy-revalidate" cache-response-directive; see Section 5.2.2).
**> A cache MUST NOT send stale responses unless it is disconnected
(i.e., it cannot contact the origin server or otherwise find a
forward path) or doing so is explicitly allowed (e.g., by the
max-stale request directive; see Section 5.2.1).**
A cache SHOULD generate a Warning header field with the 110
warn-code (see Section 5.5.1) in stale responses. Likewise, a
cache SHOULD generate a 112 warn-code (see Section 5.5.3) in stale
responses if the cache is disconnected.
A cache SHOULD NOT generate a new Warning header field when
forwarding a response that does not have an Age header field, even if
the response is already stale. A cache need not validate a response
that merely became stale in transit.
Also
4.4. Invalidation
Because unsafe request methods (Section 4.2.1 of [RFC7231]) such as
PUT, POST or DELETE have the potential for changing state on the
origin server, intervening caches can use them to keep their contents
up to date.
**> A cache MUST invalidate the effective Request URI (Section 5.5 of
[RFC7230]) as well as the URI(s) in the Location and Content-Location
response header fields (if present) when a non-error status code is
received in response to an unsafe request method.**
However a warning is required if stale-while-revalidate is used (see https://www.rfc-editor.org/rfc/rfc5861)
The stale-while-revalidate Cache-Control Extension
When present in an HTTP response, the stale-while-revalidate Cache-
Control extension indicates that caches MAY serve the response in
which it appears after it becomes stale, up to the indicated number
of seconds.
stale-while-revalidate = "stale-while-revalidate" "=" delta-seconds
If a cached response is served stale due to the presence of this
extension, the cache SHOULD attempt to revalidate it while still
serving stale responses (i.e., without blocking).
I thought this was unclear, so I submitted an erratum. It was rejected (though at the time of writing it is still showing as reported) on the grounds that the cache-control extensions in RFC 5861 override the MUST NOT in RFC 7234 ("doing so is explicitly allowed", see above).
It is okay to use Expires, but it's not very helpful, as it doesn't imply anything.
5.3. Expires
The "Expires" header field gives the date/time after which the
response is considered stale. See Section 4.2 for further discussion
of the freshness model.
**> The presence of an Expires field does not imply that the original
resource will change or cease to exist at, before, or after that
time.**
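To make the TL;DR concrete, here is a rough JAX-RS sketch of serving the stale v1 representation while the v2 update is pending. This is my own illustration, not part of the question: the pendingUpdate flag, the v1 body, and the timestamps are hypothetical stand-ins for whatever state the gateway actually tracks.
import java.util.Date;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.EntityTag;
import javax.ws.rs.core.Response;

@Path("/some/item")
public class ItemResource {

    // Hypothetical state: the cached v1 representation and whether a v2 update is in flight.
    private volatile String itemV1Json = "{\"data\": \"v1\"}";
    private volatile Date lastModifiedV1 = new Date();
    private volatile Date acceptedV2At = new Date();   // time-stamp-after-v1-but-before-v2
    private volatile boolean pendingUpdate = true;

    @GET
    public Response get() {
        Response.ResponseBuilder rb = Response.ok(itemV1Json, "application/json")
                .lastModified(lastModifiedV1)
                .tag(new EntityTag("hash-of-v1"));

        if (pendingUpdate) {
            // Mark the representation as about to become stale and allow caches
            // to keep serving it briefly while they revalidate (RFC 5861).
            rb.expires(acceptedV2At)
              .header("Cache-Control", "stale-while-revalidate=100")
              .header("Warning", "110 - \"Response is stale\"");
        }
        return rb.build();
    }
}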

REST API Design: Respond with 406 or 404 if a resource is not available in a requested representation

We have a REST API to fetch binary files from the server.
The requests look like
GET /documents/e62dd3f6-18b0-4661-92c6-51c7258f9550 HTTP/1.1
Accept: application/octet-stream
For every response indicating an error, we'd like to give a reason in JSON.
The problem now is that such an error response is not of the content type the client requested.
But what kind of response should the server produce?
Currently, it responds with a
HTTP/1.1 406 Not Acceptable
Content-Type: application/json
{
reason: "blabla"
...
}
This seems wrong to me, as the underlying issue is that the resource does not exist, not that the client requested the wrong content type.
But the question is, what would be the right way to deal with such situations?
Is it OK to respond with 404 + application/json although application/octet-stream was requested?
Is it OK to respond with 406 + application/json, as the client did not specify application/json as an acceptable type?
Should our spec be extended so that the client uses the q-param, for example application/octet-stream, application/json;q=0.1?
Other options?
If no representation can be found for the requested resource (because it doesn't exist or because the server wishes to "hide" its existence), the server should return 404.
If the client requests a particular representation in the Accept header and the server is not able to provide that representation, the server could either:
Return 406 along with a list of the available representations. (see note** below)
Simply ignore the Accept header and return a default representation of the resource.
See the following quote from RFC 7231, the document that defines the semantics and content of the HTTP/1.1 protocol:
A request without any Accept header field implies that the user agent will accept any media type in response. If the header field is present in a request and none of the available representations for the response have a media type that is listed as acceptable, the origin server can either honor the header field by sending a 406 (Not Acceptable) response or disregard the header field by treating the response as if it is not subject to content negotiation.
Mozilla also recommends the following regarding 406:
In practice, this error is very rarely used. Instead of responding using this error code, which would be cryptic for the end user and difficult to fix, servers ignore the relevant header and serve an actual page to the user. It is assumed that even if the user won't be completely happy, they will prefer this to an error code.
** Regarding the list of available representations, see this answer.
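As a rough sketch of the second option (ignoring the Accept header for the error case), a JAX-RS resource could return 404 with a JSON reason even though the client asked only for application/octet-stream. The in-memory store below is a hypothetical stand-in for the real document lookup.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/documents")
public class DocumentResource {

    // Hypothetical stand-in for the real binary document store.
    private static final Map<String, byte[]> STORE = new ConcurrentHashMap<>();

    @GET
    @Path("/{id}")
    public Response get(@PathParam("id") String id) {
        byte[] content = STORE.get(id);
        if (content == null) {
            // The resource does not exist, so the status is 404; the JSON body
            // describes the error and deliberately ignores the Accept header,
            // since it is not a representation of the requested document.
            String reason = "{\"reason\": \"document " + id + " not found\"}";
            return Response.status(Response.Status.NOT_FOUND)
                           .type(MediaType.APPLICATION_JSON)
                           .entity(reason)
                           .build();
        }
        return Response.ok(content, MediaType.APPLICATION_OCTET_STREAM).build();
    }
}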

How do I send a POST request without Transfer-Encoding: chunked from Jersey ReST Client 2.22.2

When I send a POST request through the Jersey ReST client, it automatically uses the header Transfer-Encoding: chunked.
Is there any way to force the use of Content-Length instead of Transfer-Encoding?
WebTarget webTarget = client.target(connection.getServerUrl());
Invocation.Builder builder = webTarget.request(MediaType.APPLICATION_XML);
Response response = builder.post(Entity.xml(requestBroker));
After adding the Content-Length property too, the behavior is the same:
WebTarget webTarget = client.target(connection.getServerUrl());
Invocation.Builder builder = webTarget.request(MediaType.APPLICATION_XML);
Entity entity = Entity.xml(requestBroker);
client.property("Content-Length", entity.toString().getBytes().length);
Response response = builder.post(Entity.xml(requestBroker));
HTTP 1.1 introduced chunked transfer encoding, which Jersey uses by default for POST: data is sent as chunks, so the sender can begin transmitting dynamically generated content before knowing its total size. The size of each chunk is sent right before the chunk itself, so the receiver can tell when it has finished receiving data for that chunk. The data transfer is terminated by a final chunk of length zero.
Is there any way to force use of content-length: instead of transfer-encoding?
Set the Content-Length header before sending your POST request. But this will only work with HTTP 1.0, and if you set the content length and the POST body is larger than that length, the received data will be truncated.
In the version 1.1 of the HTTP protocol, the chunked transfer mechanism is considered to be always and anyways acceptable, even if not listed in the TE (transfer encoding) request header field, and when used with other transfer mechanisms, should always be applied last to the transferred data and never more than one time. Source Wikipedia - Chunked Transfer Encoding
Whereas in the response, we can avoid Transfer-Encoding by setting the buffer size on the response using response.setBufferSize(). But if the response size goes beyond that buffer size, it falls back to Transfer-Encoding: chunked.
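If you are on Jersey 2.x, one thing worth trying (my suggestion, not something covered above) is to ask the client to buffer the request entity, so it can compute Content-Length itself instead of streaming with chunked encoding:
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.client.Entity;
import javax.ws.rs.client.WebTarget;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import org.glassfish.jersey.client.ClientProperties;
import org.glassfish.jersey.client.RequestEntityProcessing;

public class BufferedPostExample {
    public static void main(String[] args) {
        Client client = ClientBuilder.newClient();
        // Buffer the whole request entity in memory so Jersey can compute and
        // send Content-Length instead of using Transfer-Encoding: chunked.
        client.property(ClientProperties.REQUEST_ENTITY_PROCESSING,
                        RequestEntityProcessing.BUFFERED);

        WebTarget target = client.target("https://example.org/api"); // placeholder URL
        Response response = target.request(MediaType.APPLICATION_XML)
                                  .post(Entity.xml("<requestBroker/>")); // placeholder payload
        System.out.println(response.getStatus());
    }
}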
Different Transfer Mechanisms
More Info:
Content-Length header versus chunked encoding
Remove Transfer-Encoding:chunked in the POST request?
avoiding chunked encoding of HTTP/1.1 response
Hope it Helps!

InvalidXmlRequest error for Azure REST API request

I am using the Azure REST API to create an Azure storage account using the documentation at: http://msdn.microsoft.com/en-us/library/hh264518.aspx
I keep getting the 400 error with code InvalidXmlRequest ("The request body's XML was invalid or not correctly specified."). The only related thread seemed to be at Management API - The request body XML was invalid or not correctly specified - I have tried several variations on my request (like removing xml header, removing empty elements from body, etc.) but still see the same error.
There is no requestId in the response header either (to get more info using GET OperationStatus).
The complete RAW request and response (for one of my trials) is below.
Any ideas on what I am missing here?
Request:
POST https://management.core.windows.net/<mysubscriptionid>/services/storageservices HTTP/1.1
x-ms-version: 2011-06-01
Content-Type: application/xml
Host: management.core.windows.net
Content-Length: 350
Expect: 100-continue
<?xml version="1.0" encoding="utf-8"?><CreateStorageServiceInput xmlns="http://schemas.microsoft.com/windowsAzure"><ServiceName>gjhgkjhgkgk</ServiceName><Description /><Label>gjhgkjhgkgk</Label><AffinityGroup /><Location>North Central US</Location><GeoReplicationEnabled>true</GeoReplicationEnabled><ExtendedProperties /></CreateStorageServiceInput>
Response:
HTTP/1.1 400 Bad Request
Content-Length: 228
Content-Type: application/xml; charset=utf-8
Server: Microsoft-HTTPAPI/2.0
Date: Sun, 20 Oct 2013 02:33:08 GMT
<Error xmlns="http://schemas.microsoft.com/windowsazure" xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><Code>InvalidXmlRequest</Code><Message>The request body's XML was invalid or not correctly specified.</Message></Error>
Two things I noticed:
The value in the Label element should be base64-encoded, as mentioned in the documentation here: http://msdn.microsoft.com/en-us/library/windowsazure/hh264518.aspx.
Label Required. A name for the storage account specified as a
base64-encoded string. The name may be up to 100 characters in length.
The name can be used to identify the storage account for your tracking
purposes.
Not related to your problem per se, but you're trying to create a storage account in the North Central US region. Please note that you can't create new resources in the North Central and South Central US regions.
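For the first point, a quick sketch (mine, using the account name from the question) of base64-encoding the label before it goes into the request body:
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class StorageAccountLabel {
    public static void main(String[] args) {
        String name = "gjhgkjhgkgk"; // storage account name from the request above
        // Label must be the base64-encoded form of the (up to 100 character) name.
        String label = Base64.getEncoder()
                             .encodeToString(name.getBytes(StandardCharsets.UTF_8));

        String body =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
            "<CreateStorageServiceInput xmlns=\"http://schemas.microsoft.com/windowsazure\">" +
            "<ServiceName>" + name + "</ServiceName>" +
            "<Description/>" +
            "<Label>" + label + "</Label>" +
            "<Location>West US</Location>" + // example region; see the second point above
            "<GeoReplicationEnabled>true</GeoReplicationEnabled>" +
            "</CreateStorageServiceInput>";
        System.out.println(body);
    }
}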

HTTP Accept-Encoding and sending unencoded data

I'm building a module for compressing HTTP output. Reading the spec, I haven't found a clear answer on a couple of things:
Accept-Encoding:
Should this be treated the same as Accept-Encoding: *, or as if no header were present?
Or what if I don't support gzip, but I get a header like this:
Accept-Encoding: gzip
Should I return a 406 error or just return the data unencoded?
EDIT:
I've read over the spec a few times. It mentions my first case, but it doesn't define what the behavior of the server should be.
Should I treat this case as if the header were not present? Or should I return a 406 error because there's no way to encode something given the field value ('' isn't a valid encoding)?
Everything is written in the spec: 14.3 Accept-Encoding:
The special "*" symbol in an Accept-Encoding field matches any
available content-coding not explicitly listed in the header
field.
If an Accept-Encoding field is present in a request, and if the server cannot send a response which is acceptable according to the Accept-Encoding header, then the server SHOULD send an error response with the 406 (Not Acceptable) status code.
edit:
If the Accept-Encoding field-value is empty, then only the "identity"
encoding is acceptable.
In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.
What is "identity"
identity
The default (identity) encoding; the use of no transformation whatsoever. This content-coding is used only in the Accept- Encoding header, and SHOULD NOT be used in the Content-Encoding header.
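To make that concrete, here is a small sketch (my own reading of the spec, not part of the quotes above) of how a server that supports no compression could decide between sending identity and returning 406:
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class EncodingChooser {

    /**
     * Decides how a server that supports no compression (identity only)
     * should answer, per RFC 2616 section 14.3.
     * Returns "identity" to send the data unencoded, or null to signal 406.
     * acceptEncoding is the raw header value, or null if the header is absent.
     */
    public static String choose(String acceptEncoding) {
        if (acceptEncoding == null) {
            // Header absent: the client accepts any coding, including identity.
            return "identity";
        }
        String value = acceptEncoding.trim();
        if (value.isEmpty()) {
            // Empty field value: only "identity" is acceptable.
            return "identity";
        }
        List<String> codings = Arrays.stream(value.split(","))
                .map(s -> s.split(";")[0].trim().toLowerCase()) // ignore q-values for simplicity
                .collect(Collectors.toList());
        if (codings.contains("identity") || codings.contains("*")) {
            return "identity";
        }
        // Only codings we cannot produce (e.g. gzip) are listed and identity is not
        // mentioned: one reading of the spec allows a 406 here, though serving the
        // data unencoded is common in practice.
        return null;
    }
}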