Cache-Control header to bust device cache but allow CDN

I am implementing an HTTP polling mechanism to detect device network status. I am planning to make a periodic GET request to a static file /static/byte.txt to validate the device's internet access.
I am using the Cache-Control: no-cache request header to make sure I am not served with a cached copy of the file on the device (which defeats the purpose). But I would like to still use any cached copy of the file on the CDN, as there is no need to download the file from the origin (my servers) every time. Does anyone know of a way to set the cache control headers to achieve that? Thanks!

The Cache-Control request header is a poor fit for this use case, as both the client HTTP library and the CDN will assign the same meaning to whatever Cache-Control directive you choose.
Instead, I recommend using a Cache-Control response header. In the response, you can use something like Cache-Control: max-age=0, s-maxage=604800, which tells the client to treat its copy as immediately stale (so it revalidates on every poll) while allowing a shared cache such as the CDN to serve it for up to a week (604,800 seconds).
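For illustration, here is a minimal sketch of an origin that serves the probe file with that split policy, assuming a Node/Express server (the route comes from the question; everything else is hypothetical):
import express from "express";
const app = express();
// max-age=0 makes the device treat its copy as immediately stale, so every
// poll revalidates; s-maxage=604800 lets the shared CDN cache serve the file
// for up to a week without contacting the origin.
app.get("/static/byte.txt", (_req, res) => {
  res.set("Cache-Control", "max-age=0, s-maxage=604800");
  res.type("text/plain").send("1");
});
app.listen(3000);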

Related

Using different content-encoding for Cloud Storage

I want to be able to support Brotli and Gzip encoding for static assets hosted with Google Cloud Storage. To do this I want to encode files before they are uploaded as <filename>, <filename>.gz and <filename>.br. The issue is that I can't find a way to route a request to the correct file based on its Accept-Encoding header.
I have looked into using:
- Cloud Functions to somehow redirect incoming requests (similar to AWS CloudFront's Lambda@Edge), but this does not seem to be supported
- a Load Balancer to redirect requests to different buckets, but as far as I can see it can only route to different buckets based on hostname/path
- Cloud CDN, but it does not seem to have any functionality that helps with this
Request/response examples
Assume bucket example-bucket contains the following files:
library.js
library.js.gz
library.js.br
Example 1
Request:
GET http://storage.googleapis.com/example-bucket/library.js
Accept-Encoding: gzip, deflate, br
Response:
Content-Encoding: br
<Contents of http://storage.googleapis.com/example-bucket/library.js.br>
Example 2
Request:
GET http://storage.googleapis.com/example-bucket/library.js
Accept-Encoding: gzip, deflate
Response:
Content-Encoding: gzip
<Contents of http://storage.googleapis.com/example-bucket/library.js.gz>
Is there a way to accomplish the above in a manner that is simple, performant and cost effective? I realize it's possible to just host my own server via App Engine, and let that take care of routing requests to the bucket, but is this the only way?
Lambda@Edge is one way to do it, but the simpler way is to whitelist the Accept-Encoding header on CloudFront so that CloudFront passes it to the origin, and then configure the origin to serve the file based on the Accept-Encoding value.
If you are planning to use Lambda@Edge, I would suggest using an origin request function:
1. Whitelist the Accept-Encoding header.
2. Use an origin request function to read the header value and rewrite the request URI path, as in the sketch below.
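For reference, a minimal sketch of such an origin request function in TypeScript, assuming the bucket layout from the question (the handler wiring and extension-picking logic are illustrative):
// Lambda@Edge origin request handler: rewrites the URI so the origin serves
// the pre-compressed variant that matches the client's Accept-Encoding.
// Assumes objects are stored as <name>, <name>.gz and <name>.br.
import type { CloudFrontRequestEvent, CloudFrontRequest } from "aws-lambda";
export const handler = async (
  event: CloudFrontRequestEvent
): Promise<CloudFrontRequest> => {
  const request = event.Records[0].cf.request;
  const accept = request.headers["accept-encoding"]?.[0]?.value ?? "";
  // Prefer Brotli, then gzip; otherwise fall back to the uncompressed object.
  if (accept.includes("br")) {
    request.uri += ".br";
  } else if (accept.includes("gzip")) {
    request.uri += ".gz";
  }
  return request;
};
Note that the origin still has to return a matching Content-Encoding header (for Cloud Storage this can be set as object metadata at upload time), and Accept-Encoding must be whitelisted so cached responses vary correctly.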

Browser doesn't return If-None-Match header

I am trying to implement caching for dynamic API calls where the data is near-static. The approach I have taken is to use an ETag, returning an ETag header in the Web API response headers. However, the browser doesn't seem to return the If-None-Match header at all for me to validate the subsequent calls.
Please note that I am using HTTPS and have a valid SSL certificate installed. Has anyone had this issue, and are there any potential clues?
Found the root cause of the issue: it was due to the wrong cache headers being sent by the server, particularly Cache-Control: no-store.
After changing the response headers, the browser is now able to send the If-None-Match request header.
My current response headers are along the lines shown below, which is enough to get the browser to revalidate.
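For illustration, a response shaped like the following (the values are hypothetical) lets the browser cache the entry but forces revalidation, so If-None-Match is sent on subsequent requests; the key point is that no-store must not be present:
HTTP/1.1 200 OK
Cache-Control: private, no-cache
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
Content-Type: application/json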

Service worker JavaScript update frequency (every 24 hours?)

As per this doc on MDN:
After that it is downloaded every 24 hours or so. It may be downloaded more frequently, but it must be downloaded every 24h to prevent bad scripts from being annoying for too long.
Is the same true for Firefox and Chrome? Or does an update to the service worker JavaScript only happen when the user navigates to the site?
Note: As of Firefox 57, and Chrome 68, as well as the versions of Safari and Edge that support service workers, the default behavior has changed to account for the updated service worker specification. In those browsers, HTTP cache directives will, by default, be ignored when checking the service worker script for updates. The description below still applies to earlier versions of Chrome and Firefox.
Every time you navigate to a new page that's under a service worker's scope, Chrome will make a standard HTTP request for the JavaScript resource that was passed in to the navigator.serviceWorker.register() call. Let's assume it's named service-worker.js. This request is only made in conjunction with a navigation or when a service worker is woken up via, e.g., a push event. There is not a background process that refetches each service worker script every 24 hours, or anything automated like that.
This HTTP request will obey standard HTTP cache directives, with one exception (which is covered in the next paragraph). For instance, if your server set appropriate HTTP response headers that indicated the cached response should be used for 1 hour, then within the next hour, the browser's request for service-worker.js will be fulfilled by the browser's cache. Note that we're not talking about the Cache Storage API, which isn't relevant in this situation, but rather standard browser HTTP caching.
The one exception to standard HTTP caching rules, and this is where the 24 hours thing comes in, is that browsers will always go to the network if the age of the service-worker.js entry in the HTTP cache is greater than 24 hours. So, functionally, there's no difference in using a max-age of 1 day or 1 week or 1 year—they'll all be treated as if the max-age was 1 day.
Browser vendors want to ensure that developers don't accidentally roll out a "broken" or buggy service-worker.js that gets served with a max-age of 1 year, leaving users with what might be a persistent, broken web experience for a long period of time. (You can't rely on your users knowing to clear out their site data or to shift-reload the site.)
Some developers prefer to explicitly serve their service-worker.js with response headers causing all HTTP caching to be disabled, meaning that a network request for service-worker.js is made for each and every navigation. Another approach might be to use a very, very short max-age—say a minute—to provide some degree of throttling in case there is a very large number of rapid navigations from a single user. If you really want to minimize requests and are confident you won't be updating your service-worker.js anytime soon, you're free to set a max-age of 24 hours, but I'd recommend going with something shorter on the off chance you unexpectedly need to redeploy.
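As an illustration of the short max-age approach, a hypothetical Express route might look like this (all names and values are placeholders):
import express from "express";
import path from "path";
const app = express();
// Serve the worker script with a one-minute lifetime: repeat navigations
// within 60 seconds are answered from the HTTP cache, yet a redeployed
// service-worker.js is picked up within a minute.
app.get("/service-worker.js", (_req, res) => {
  res.set("Cache-Control", "max-age=60");
  res.sendFile(path.join(__dirname, "service-worker.js"));
});
app.listen(8080);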
Some developers prefer to explicitly serve their service-worker.js with response headers causing all HTTP caching to be disabled, meaning that a network request for service-worker.js is made for each and every navigation.
This no-cache strategy may prove useful in a fast-paced «agile» environment.
Here is how.
Simply place the following hidden .htaccess file in the server directory containing service-worker.js:
# DISABLE CACHING
<IfModule mod_headers.c>
    # Default for everything in this directory and below: no caching.
    Header set Cache-Control "no-cache, no-store, must-revalidate"
    Header set Pragma "no-cache"
    Header set Expires 0
</IfModule>
<FilesMatch "\.(html|js)$">
    <IfModule mod_expires.c>
        ExpiresActive Off
    </IfModule>
    <IfModule mod_headers.c>
        # Strip every caching/validation header, then reassert no-cache so
        # .html and .js files are always fetched from the network.
        FileETag None
        Header unset ETag
        Header unset Pragma
        Header unset Cache-Control
        Header unset Last-Modified
        Header set Pragma "no-cache"
        Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate"
        Header set Expires "Thu, 01 Jan 1970 00:00:00 GMT"
    </IfModule>
</FilesMatch>
This will disable caching for all .js and .html files in this server directory and those below, which is more than service-worker.js alone.
Only these two file types were selected because they are the non-static files of my PWA that may affect users who are running the app in a browser window without having installed it (yet) as a full-fledged, automatically updating PWA.
More details about service worker behaviour are available from Google Web Fundamentals.

How do I convert a REST HttpRequest captured via a web proxy tool into a link a user can click?

I have used a proxy tool to capture a certain REST HttpRequest. The request is an HTTP PUT command to an extremely long REST link, along with specific data that gets sent to the server.
In the proxy tool it looks something like this:
Header:
PUT http://XXX.XXX.XXX.XXX:8080/rest/blah/blah/.../ HTTP/1.1
Host: XXX.XXX.XXX.XXX:8080
User-Agent: Mozilla/5.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-us,en;q=0.5
Proxy-Connection: keep-alive
Content-Type: XMLHttpRequest
Referer: http://XXX.XXX.XXX.XXX:8080/plugins/blah/blah
Content-Length: 11156
Cookie: JSESSIONID=<really long alpha numeric>
Body:
{"links":{"self":"/rest/plugins/1.0/blah/blah.....
...
... lots and lots of JSON text
}
}
So the proxy tool has been helpful in identifying what the request looks like.
But the only way to generate this request is by clicking a button on the webpage. I would like to send the exact same request on my own (like creating a custom link that, when clicked, generates a request similar to the one shown above). How do I do this?
Also, anything I type into the web browser's URL bar is automatically sent as a GET. How do I force a PUT?
Cookie: JSESSIONID=
This clearly indicates that the API you want to use is not a REST API, because it violates the stateless constraint of REST.
How do I force a PUT?
Probably you don't have a way to do that; it depends on the security settings of the web API. If you want to do this with AJAX from the browser, and your domain is different from the API's domain, then you need a CORS header from the API that allows your domain to read cross-origin responses. For a PUT, the browser sends a preflight request first, and if it cannot read the response, it will never send the real PUT. The browser's security policy and related headers are there to block exactly this kind of cross-site scripting, so you probably don't have a way to do this from the browser.
You can do this from your server by copying the request details and obtaining the session id somehow, as in the sketch below.
If the API allows access to third-party clients, then I suggest you contact them. If not, it is 99% certain that you won't be able to do this.
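For illustration, replaying the captured request from server-side code, outside the browser's restrictions, might look like this (the URL, cookie, and body are placeholders from the capture; obtaining a valid JSESSIONID is the hard part):
// Replay the captured PUT from a server-side script (Node 18+ ships a
// built-in fetch). Every value here is a placeholder from the capture.
async function replayPut(): Promise<void> {
  const response = await fetch("http://XXX.XXX.XXX.XXX:8080/rest/blah/blah/", {
    method: "PUT",
    headers: {
      "Content-Type": "application/json",
      // A valid session cookie has to be acquired first, e.g. by logging in.
      Cookie: "JSESSIONID=<really long alpha numeric>",
    },
    body: JSON.stringify({ links: { self: "/rest/plugins/1.0/blah/blah" } }),
  });
  console.log(response.status);
}
replayPut();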

Prevent an HTTP client from hitting a server with cache (iphone)

Ok, I'm confused. I'm trying to send back the magic headers from my server that will prevent a client from hitting the server again until a resource is stale.
I understand how ETag or Last-Modified works (validation): the client will ALWAYS still hit the server, and the server needs to validate the date or ETag against the current value to know whether to bother serving up a new one.
Cache-Control and Expires, however, I don't think I understand. I've set the following:
Cache-Control: max-age=86400, must-revalidate
No matter what I do, my client (my browser, curl, NSURLConnection) always hits the server again on the second request. Is this a client thing? What headers should I send back to get the client to use its private cache for a certain length of time?
As Nathan hints at in his answer, clients can issue a subsequent request with an If-Modified-Since header to determine whether or not their cache is stale. If the client receives a 304 Not Modified response, it will serve the content out of the local cache.
According to RFC 2616 (the HTTP 1.1 specification), must-revalidate in the Cache-Control header does not force revalidation while the response is still fresh; it only forbids a cache from serving the response once it has become stale (here, after the 86,400-second max-age expires) without first revalidating with the origin server. So a conforming client may serve from its private cache for a day and only then revalidate; if that is the behavior you want, Cache-Control: max-age=86400 is sufficient on its own. Note also that some of the clients you list do not cache at all: curl, for instance, has no HTTP cache and will hit the server on every request regardless of headers.
For future reference - Mark Nottingham has written a great guide to HTTP caching:
http://www.mnot.net/cache_docs/#CACHE-CONTROL
The server needs to check the If-Modified-Since header and return a 304 Not Modified response if it wants the browser to keep caching.
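A minimal sketch of that validation step, assuming a Node/Express server (the route and timestamp are hypothetical):
import express from "express";
const app = express();
// Hypothetical fixed timestamp for the resource being served.
const lastModified = new Date("2015-01-01T00:00:00Z");
app.get("/resource", (req, res) => {
  const since = req.get("If-Modified-Since");
  // If the client's copy is still current, answer 304 with no body so the
  // browser keeps serving the resource from its private cache.
  if (since && new Date(since) >= lastModified) {
    res.status(304).end();
    return;
  }
  res.set("Last-Modified", lastModified.toUTCString());
  res.set("Cache-Control", "max-age=86400");
  res.send("fresh content");
});
app.listen(3000);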