Amazon S3: Cache-Control and Expires date difference, and setting them through the REST API

I want to improve my site's loading speed, so I use http://gtmetrix.com/ to check what I could improve. One of the lowest ratings I got was for "Leverage browser caching": I found that my files (mainly images) have the problem "expiration not specified".
Okay, the problem is clear, I thought. I started googling and found that Amazon S3 prefers Cache-Control metadata over an Expires date (I have lost the link, so now I think maybe I misunderstood something). Anyway, I started looking for how to add Cache-Control metadata to an S3 object and found
this page: http://www.bucketexplorer.com/documentation/amazon-s3--how-to-set-cache-control-header-for-s3-object.html
I learned that I must add a header to my PUT request:
x-amz-meta-Cache-Control: max-age=<value in seconds> (note: there must be no space between the equals sign and the digits; I made that mistake at first).
I used Cache-Control: max-age=1296000 and it works fine.
After that I read
https://developers.google.com/speed/docs/best-practices/caching
This article told me: 1) "Set Expires to a minimum of one month, and preferably up to one year, in the future."
2) "We prefer Expires over Cache-Control: max-age because it is more widely supported." (in the Recommendations section).
So I started looking for a way to set an Expires date on an S3 object.
I found this:
http://www.bucketexplorer.com/documentation/amazon-s3--set-object-expiration-on-amazon-s3-objects-put-get-delete-bucket-lifecycle.html
And what I found: "Using Amazon S3 Object Lifecycle Management, you can define Object Expiration on Amazon S3 objects. Once the lifecycle defined for the S3 object expires, Amazon S3 will delete such objects. So, when you want to keep your data on S3 for a limited time only and you want it to be deleted automatically by Amazon S3, you can set Object Expiration."
I don't want to delete my files from S3. I just want to add cache metadata for a maximum cache time and/or a file expiry time.
I am completely confused by this. Can somebody explain what I should use: Object Expiration or Cache-Control?

S3 lets you specify the max-age and Expires headers for cache control, while CloudFront lets you specify the Minimum TTL, Maximum TTL, and Default TTL for a cache behavior.
These headers just tell when the validity of an object expires in the cache (be it the CloudFront cache or the browser cache). To read how they are related, see the following link:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html#ExpirationDownloadDist
To leverage browser caching, just specify the Cache-Control header for all the objects on S3.
Steps for adding cache control to the existing objects in your bucket:
git clone https://github.com/s3tools/s3cmd
Run s3cmd --configure
(You will be asked for the two keys - copy and paste them from your confirmation email or from your Amazon account page. Be careful when copying them! They are case sensitive and must be entered accurately or you'll keep getting errors about invalid signatures or similar. Remember to add the s3:ListAllMyBuckets permission to the keys, or you will get an AccessDenied error while testing access.)
./s3cmd --recursive modify --add-header="Cache-Control: public, max-age=31536000" s3://your_bucket_name/
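For reference, here is what the s3cmd command above does per object, sketched with boto3. S3 object metadata is immutable, so each object is copied onto itself with the REPLACE metadata directive; the bucket and key names are placeholders, and the real client call is shown commented out since it needs credentials.

```python
ONE_YEAR = 365 * 24 * 60 * 60  # 31536000 seconds

def cache_control_copy_args(bucket, key, max_age=ONE_YEAR):
    """Build the boto3 copy_object arguments that rewrite Cache-Control in place."""
    return {
        "Bucket": bucket,
        "Key": key,
        # Copying an object onto itself is the only way to change its metadata.
        "CopySource": {"Bucket": bucket, "Key": key},
        "MetadataDirective": "REPLACE",
        "CacheControl": f"public, max-age={max_age}",
    }

# With the real client (requires AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# s3.copy_object(**cache_control_copy_args("your_bucket_name", "images/logo.png"))
```
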

Your files won't be deleted; they just won't be cached after the expiration date.
The Amazon docs say:
After the expiration date and time in the Expires header passes, CloudFront gets the object again from the origin server every time an edge location receives a request for the object.
We recommend that you use the Cache-Control max-age directive instead of the Expires header field to control object caching. If you specify values both for Cache-Control max-age and for Expires, CloudFront uses only the value of max-age.

"Amazon S3 Object Lifecycle Management" deletes objects from your bucket based on a rule you define. It's only about storage.
What you want is to set the Expires header the same way you set the Cache-Control header: you just have to add this header to your PUT request.
Expires doesn't work like Cache-Control: Expires takes a date. For instance: Sat, 31 Jan 2013 23:59:59 GMT
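The difference between the two header styles can be sketched with the Python standard library: Cache-Control carries a relative lifetime in seconds, while Expires carries an absolute HTTP-date in the RFC 7231 format.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def caching_headers(max_age_seconds):
    """Build equivalent Cache-Control and Expires headers for a PUT request."""
    expires_at = datetime.now(timezone.utc) + timedelta(seconds=max_age_seconds)
    return {
        # Relative lifetime, in seconds:
        "Cache-Control": f"max-age={max_age_seconds}",
        # Absolute HTTP-date; usegmt=True emits the RFC 7231 IMF-fixdate
        # form, e.g. "Thu, 31 Jan 2013 23:59:59 GMT"
        "Expires": format_datetime(expires_at, usegmt=True),
    }
```
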
You may read this: https://web.archive.org/web/20130531222309/http://www.newvem.com/how-to-add-caching-headers-to-your-objects-using-amazon-s3/

Related

Setting "default" metadata for all+new objects in a GCS bucket?

I run a static website (blog) on Google Cloud Storage.
I need to set a default metadata header for cache-control header for all existing and future objects.
However, the edit-object-metadata instructions show the gsutil setmeta -h "cache-control: ..." command, which doesn't seem to apply to "future" objects in the bucket, nor does it give me a way to set a bucket-wide policy that is inherited by existing/future objects (since the command is executed per object).
This is surprising to me because there are features like gsutil defacl, which let you set a policy for the bucket that is inherited by objects created in the future.
Q: Is there a metadata policy for the entire bucket that would apply to all existing and future objects?
There is no way to set default metadata on GCS objects. You have to set the metadata at write time, or you can update it later (e.g., using gsutil setmeta).
Extracted from this question
According to the documentation, if an object does not have a Cache-Control entry, the default value when serving that object is public,max-age=3600, provided the object is publicly readable.
If you still want to modify this metadata, you can do so using the JSON API inside a Cloud Function that is triggered every time a new object is created or an existing one is overwritten.
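A sketch of that Cloud Function approach, assuming a google.storage.object.finalize trigger. The policy string is a placeholder, and the google-cloud-storage client calls are shown commented out since they need a real project:

```python
DESIRED_CACHE_CONTROL = "public, max-age=31536000"  # placeholder policy

def apply_cache_control(event, context=None):
    """Entry point for a GCS-triggered Cloud Function.

    `event` is the object-change payload; on a finalize trigger it carries
    the bucket and object names of the newly written object.
    """
    bucket_name, blob_name = event["bucket"], event["name"]
    # With the real client:
    # from google.cloud import storage
    # blob = storage.Client().bucket(bucket_name).blob(blob_name)
    # blob.cache_control = DESIRED_CACHE_CONTROL
    # blob.patch()  # pushes only the changed metadata field
    return f"{bucket_name}/{blob_name} -> {DESIRED_CACHE_CONTROL}"
```
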

Mapping a no-args function to REST

Let's imagine you have a REST resource Restriction (think of a roadblock) that has some filters (e.g. street, direction etc.). The restriction has an expiry which is a datetime. It is only considered in the application logic, when the expiry is in the future or unset/null (no expiry).
Now with procedural style, I could just have a method on the restriction saying expire() which would set the expiry to the current time.
With REST we modify the state of resources instead. I am torn between these more or less functionally equivalent API definitions:
PATCH /restrictions/{id}
data = {
"expiry": 1558654742
}
Client explicitly sets the expiry. This bears the risk of user error with time zones, wrong host clocks etc. Also the client is not supposed to have any choice other than current time.
PATCH /restrictions/{id}
data = {
"expired": true
}
The expired field is a transient virtual property that is translated on the backend to expiry = now. This might be confusing for clients. Also the value for expired can be only true, so there's some redundancy here.
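The second option can be sketched server-side: the transient expired flag in the PATCH body is translated into expiry = now, so clients never pick the timestamp themselves. The field and function names here are illustrative, not from any particular framework.

```python
import time

def apply_patch(restriction, patch, now=None):
    """Translate the virtual `expired: true` flag into a concrete expiry."""
    now = now if now is not None else int(time.time())
    if patch.get("expired") is True:
        restriction = dict(restriction, expiry=now)  # server picks the time
    return restriction

def is_active(restriction, now):
    """A restriction counts only while expiry is unset or in the future."""
    expiry = restriction.get("expiry")
    return expiry is None or expiry > now
```
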
DELETE /restrictions/{id}
Resource stays persisted (soft delete), but is not returned by GET on the collection, which only returns non-expired restrictions. There is no GET on individual restrictions.
PUT /restrictions/{id}/expiry
data = {}
Creates a new virtual resource (no other methods on this path) which represents the expiry. Not sure whether PUTs without any data are idiomatic though.
Right now I do not plan on returning individual restrictions, and the list of all restrictions will return only the non-expired ones by default.
Which one of those methods would you consider the most idiomatic and obvious for a RESTful web service?
If the resource will return a 404 after expiring, DELETE is a great method for this.
Which one of those methods would you consider the most idiomatic and obvious for a RESTful web service?
If you want idiomatic REST, think about how you would do it with a web site.
You'd probably start with GET /restrictions/{id}, and in addition to data about the restriction there would be a form -- maybe embedded in the representation, maybe in the representation of another resource but available via a link. You would then submit that form, which would bundle up the fields into an application/x-www-form-urlencoded document included in a POST request. You could use any URI as the action of the form (and therefore the target URI of the POST request), but the most useful would probably be POST /restrictions/{id}, because HTTP-compliant clients will know to invalidate any previously cached representations of /restrictions/{id}.
Right now I do not plan on returning individual restrictions, and the list of all restrictions will return only the non-expired ones by default.
Same game, but instead of using an identifier for the individual restriction, you would use the uri for the-list-of-non-expired-restrictions. Your resource model doesn't have to match your data model.
There's no rule that says the content type of a POST must be application/x-www-form-urlencoded. You could post other representations, including your own custom types if that makes things easier (of course, you have to document the type, and the only clients that are going to send it are those that have implemented it; the big advantage of the standard media types is that you get lots of clients "for free").
PUT and PATCH are acceptable alternatives if modifying representations directly seems reasonable. PUT (and probably PATCH, by inference) doesn't actually require that the server accept the request as-is:
A successful PUT of a given representation would suggest that a subsequent GET on that same target resource will result in an equivalent representation being sent in a 200 (OK) response.
However, there is no guarantee that such a state change will be observable, since the target resource might be acted upon by other user agents in parallel, or might be subject to dynamic processing by the origin server, before any subsequent GET is received. A successful response only implies that the user agent's intent was achieved at the time of its processing by the origin server.
There are conventions to be followed in the response that distinguish "I accepted your representation as is" from other responses.
One of the few constraints REST has is cacheability. The constraints, as the name implies, are, according to Fielding, not optional, even though he talked about hypermedia in this context; the general rule applies here as well.
Caching allows local or intermediary applications, so-called caches, to store the response for a certain URI for a certain amount of time; if a further safe request for that same URI hits the cache, the cache serves the client its stored response rather than routing the request to the server.
In case of your expiration you need to include thoughts about the expiration of such cached values as well, otherwise caches might still serve clients with data that shouldn't exist further.
HTTP discusses caching in detail in RFC 7234. Responses may tell a cache how long a resource should be considered fresh via either the Cache-Control header or the Expires header; if both are present, the former wins. For example, a response header such as Cache-Control: max-age=3600 states that the response should be considered fresh for up to 3600 seconds, while Expires uses the date-time format specified in RFC 7231, such as Expires: Fri, 24 May 2019 05:20:00 GMT.
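That precedence rule can be sketched as a small freshness calculator; this is an illustrative helper, not a full RFC 7234 implementation, and it assumes well-formed GMT dates in the headers.

```python
from email.utils import parsedate_to_datetime

def freshness_lifetime(headers):
    """Return freshness in seconds, or None if neither header is usable.

    Cache-Control: max-age takes precedence; Expires is only consulted
    as a fallback, relative to the response's Date header.
    """
    cc = headers.get("Cache-Control", "")
    for directive in cc.split(","):
        name, _, value = directive.strip().partition("=")
        if name == "max-age" and value.isdigit():
            return int(value)
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return int((expires - date).total_seconds())
    return None
```
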
Unfortunately, RFC 7234 does not describe how a client can set such a directive actively, as this is considered a server task. Cache-Control does provide some request directives, though these are more indications that a client either will or won't accept stale data, not instructions to the server to set a particular expiration date. Usually, if a client does not want a certain resource to remain available, it should DELETE the resource. If you read up on DELETE, though, you might be astonished that it does not actually guarantee the resource will ever be removed at all. All it states is that, after such a request is processed successfully, the mapping of the URI to the resource is removed; whether the same resource is accessible through a different URI is a different story. "Customizing" DELETE with some kind of parameter to remove the resource after a certain amount of time might work for your API, but might not be understood by other kinds of APIs, so it is not recommended in general.
With PATCH, the information on the expiration timestamp needs to be part of the resource itself. Usually such information is considered metadata of a resource and not part of the actual data. Therefore I'm not in favor of PATCH either, though this is clearly an opinionated take.
If none of the other HTTP methods fit the bill, POST should be used, as here the server processes the request according to its own semantics. It may apply different heuristics upon receiving different payloads on the same endpoint. If you had to design such a feature on the Web, you might have an edit page for an entry with an option to set an expiration date. Upon clicking the submit button of the form, your browser performs a POST request including the expiration date, and the server knows what to do based on its own heuristics. I.e., the presence of the expiration-date field in the request may instruct the server to queue a removal of the entry, update the expiration metadata of the target resource, and return an updated Cache-Control: max-age=... or Expires: ... header on incoming requests, to also inform caches not to serve cached responses of that resource past that point in time.
Usually, unsafe operations like POST, PUT, or DELETE invalidate cached responses for the target resource by default. But consider a case where two users perform cacheable GET requests against a server, each behind a different intermediary cache. If user 1 now expires the origin resource, user 2 may still be served by "his" intermediary cache, even though the resource was already deleted on the origin server, because that cache still considers its stored response fresh enough. The cached response in user 1's cache will already have been invalidated by the initial POST request, and any cacheable response may have carried an updated cache header leading to expiration at the specified point in time. It is therefore important to set cache lifetimes neither too far into the future nor so short that caching becomes useless.
For resources whose removal is critical and which must not be served by caches, it is probably best to specify Cache-Control: no-cache in general, to prevent the case described above: such entries are then not stored by caches, and requests are processed directly by the API/server itself.
To sum up this post: something like an expiration time point should be considered metadata of a resource, not its main data. While DELETE may sound great at first, it does not support removal after some time, another API might perform such a request immediately, and it does not guarantee that the resource is really removed at all. POST, as the all-purpose tool of HTTP, or other HTTP operations such as PUT or PATCH, may be used here as well, even though the latter "work" under the premise that the body of a request belongs to the actual data of the resource. You should also design for caching and use either Cache-Control: max-age=... or Expires: ... if your resource is non-critical, or Cache-Control: no-cache for resources that must never (for whatever reason) return outdated information to clients. Regardless of the HTTP method you use, you should also think about how the server allows the client to set this option in general. As on the Web, a form-based solution avoids out-of-band information and thus simplifies interaction with the API, since all the information is already provided or obtainable via further links.

Usage of nbf in json web tokens

nbf: Defines the time before which the JWT MUST NOT be accepted for processing
I found this definition of nbf in JSON Web Tokens, but I am still wondering what the usage of nbf is. Why do we use it? Is it related to security?
Any idea would be appreciated.
It is entirely up to how you interpret the time.
One possible scenario I could make up is literal: when a token must last from one particular point in time until another.
Say you're selling some API or resource, and a client purchased access that lasts for one hour, starting tomorrow at midday.
So you issue a JWT with:
iat set to now
nbf set to tomorrow 12:00pm
exp set to tomorrow 1:00pm
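A verifier then accepts the token only inside the [nbf, exp) window. A minimal sketch of that check (claim values are Unix timestamps; real JWT libraries also allow for clock skew, which is omitted here):

```python
import time

def token_window_ok(claims, now=None):
    """Accept the token only between nbf (inclusive) and exp (exclusive)."""
    now = now if now is not None else int(time.time())
    if "nbf" in claims and now < claims["nbf"]:
        return False  # too early: MUST NOT be accepted for processing yet
    if "exp" in claims and now >= claims["exp"]:
        return False  # too late: token has expired
    return True
```
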
There is one more thing to add to what @zerkms said: if you want the token to be usable from now on, then
nbf also needs to be the current time (now).
Otherwise you'll get an error like "the token cannot be used prior to this particular time".
'nbf' means 'Not Before'. It can be set a few seconds after the token's creation time to keep the token from being usable immediately.
Note that nbf is an absolute Unix timestamp, not a relative offset: to make a token unusable for the first 3 seconds after creation, you would set nbf = iat + 3, not nbf = 3. Delaying validity this way can make automated replay of freshly issued tokens harder.

Why would ETags set to a MUST requirement if you already have the resource?

Why would you set ETags to a "MUST requirement level"?
You obtain the resource before the ETag is returned...
I'm working on a project where I am the client sending HTTP requests to a server that returns an HTTP Cache-Control header with an ETag, so the response can be cached (on each additional request the stored ETag is sent in the If-None-Match header to determine whether the data is stale and a new response body is needed). In my current project, the ETag mechanism uses the conditional GET architecture with the MUST requirement level as specified in RFC 2119.
MUST This word, or the terms "REQUIRED" or "SHALL", mean that the definition is an absolute requirement of the specification.
I don't understand the intent of using a conditional GET with the MUST requirement level. From my understanding, the MUST requirement is there to limit (is that right?) the resources provided to the client that makes the request; however, the client (me in this case) already has the resource from the first request, and I can continue obtaining the same resource (or a fresher one if it gets updated) as much as I want, with or without sending the If-None-Match and ETag header fields.
What would be the purpose of setting it to the MUST requirement level in this case if it's not limiting the resources returned, aside from being able to cache and limiting the number of requests to the server? (I'm asking from the client's point of view; yes, I know I can cache it, but why the MUST requirement?) Isn't this only used for limiting resources?
So basically, doesn't the fact that I can obtain the resources with or without it make this MUST requirement not a requirement? Am I missing something here?
My Question is not asking the what and how Etags, Cache-Control, or If-None-Match headers work.
Thanks in advance, cheers!
Why would ETags set to a MUST requirement if you already have the resource?
A client MUST use a conditional GET to reduce the data traffic.
Aside from being able to cache and limiting the amount of requests to the server
The number of requests stays the same, but the total amount of data transferred changes.
Using ETags in if-none-matched GET requests (conditional GET)
When you make an API call, the response header includes an ETag whose value is a hash of the data returned by the call. You store this ETag value for use in the next request.
The next time you make the same API call, you include the If-None-Match request header with the ETag value stored from the first step.
If the data has not changed, the response status code will be 304 – Not Modified and no data is returned.
If the data has changed since the last query, the data is returned as usual with a new ETag. The game starts again: you store the new ETag value and use it for subsequent requests.
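The round trips above can be simulated locally with an in-memory stand-in for the server (no real HTTP involved; the class and the SHA-256-based ETag are illustrative, as servers may compute ETags however they like):

```python
import hashlib

class FakeServer:
    """In-memory stand-in for a server handling conditional GETs."""

    def __init__(self, body):
        self.body = body

    def get(self, if_none_match=None):
        etag = '"%s"' % hashlib.sha256(self.body).hexdigest()
        if if_none_match == etag:
            return 304, etag, b""        # Not Modified: body is not re-sent
        return 200, etag, self.body      # full response plus a fresh ETag

server = FakeServer(b"hello")
status, etag, body = server.get()                        # first request: 200 + body
status2, _, body2 = server.get(if_none_match=etag)       # revalidation: 304, empty body
server.body = b"changed"
status3, etag3, body3 = server.get(if_none_match=etag)   # data changed: 200 again
```
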
Why?
The main reason for using conditional GET requests is to reduce data traffic.
Isn't this only used for limiting resources?
No...
You can ask an API for multiple resources in one request.
(OK, that's also limiting resources, by saving the other requests.)
You can prevent a method (e.g. PUT) from modifying an existing resource when the client believes the resource does not exist (replace protection).
I can obtain the resources with or without it?
When you ignore the "MUST use conditional GET" then (a) the traffic will increase and (b) you lose the "resource has changed" indication coming from server-side. You would have to implement the comparison handling on client side: is the resource of the second request newer than the one from the first request.
I found my question wasn't asking the "right question", due to my merging my understanding of other headers (thanks to @dcerecedo's comment for getting me pointed in the right direction) that were affecting my understanding of why MUST was being used.
The MUST is more relevant to other headers, in my case private, max-age=3600 and must-revalidate.
Where
Cache-Control: private restricts proxy servers from caching the response; this helps you keep your data off a server you don't trust, and prevents a proxy from caching user-specific data that's not relevant to everyone (like a user profile).
Cache-Control: "max-age=3600, must-revalidate" tells both client caches and proxy caches that once the content is stale (older than 3600 seconds), they must revalidate with the origin server before they can serve the content again. This should be the default behavior of caching systems, but the must-revalidate directive makes the requirement unambiguous.
So, after the max-age expires, the client should revalidate. It might revalidate using the If-Match or If-None-Match headers with an ETag, or the If-Modified-Since or If-Unmodified-Since headers with a date. After expiration, the browser checks with the server whether the file has been updated; if not, the server responds with 304 Not Modified and nothing is downloaded.

How can I change key/name of Amazon S3 object using REST or SOAP?

How can I change key/name of Amazon S3 object using REST or SOAP?
The only way to rename an object is to copy the old object to a new object, giving the copy the new name (and then deleting the original).
The REST call you need is detailed here.
Syntax
PUT /destinationObject HTTP/1.1
Host: destinationBucket.s3.amazonaws.com
x-amz-copy-source: /source_bucket/sourceObject
x-amz-metadata-directive: metadata_directive
x-amz-copy-source-if-match: etag
x-amz-copy-source-if-none-match: etag
x-amz-copy-source-if-unmodified-since: time_stamp
x-amz-copy-source-if-modified-since: time_stamp
<request metadata>
Authorization: signatureValue
Date: date
This implementation of the PUT operation creates a copy of an object
that is already stored in Amazon S3. A PUT copy operation is the same
as performing a GET and then a PUT. Adding the request header,
x-amz-copy-source, makes the PUT operation copy the source object into
the destination bucket.
Keep in mind the existing ACLs, however:
When copying an object, you can preserve most of the metadata
(default) or specify new metadata. However, the ACL is not preserved
and is set to private for the user making the request.
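The copy-then-delete "rename" pattern can be sketched against an in-memory stand-in for a bucket; with boto3 the same two steps would be copy_object(..., CopySource=...) followed by delete_object(...), and the key names below are placeholders.

```python
def rename_object(bucket, old_key, new_key):
    """Rename by copy + delete, since S3 has no native rename operation."""
    if old_key not in bucket:
        raise KeyError(old_key)
    # Step 1: PUT copy, with x-amz-copy-source pointing at the old key.
    # NB: per the docs above, the ACL is not preserved; the copy defaults
    # to private for the user making the request.
    bucket[new_key] = bucket[old_key]
    # Step 2: delete the original only after the copy has succeeded.
    del bucket[old_key]

bucket = {"photos/old-name.jpg": b"\xff\xd8..."}
rename_object(bucket, "photos/old-name.jpg", "photos/new-name.jpg")
```
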