Avoiding getting cached content from raw.githubusercontent.com - github

I noticed that when using curl to get content from github using this format:
https://raw.githubusercontent.com/${org}/${repo}/${branch}/path/to/file
It will sometimes return cached/stale content. For example with this sequence of operations:
1. curl https://raw.githubusercontent.com/${org}/${repo}/${branch}/path/to/file
2. Push a new commit to that branch
3. curl https://raw.githubusercontent.com/${org}/${repo}/${branch}/path/to/file
Step 3 will return the same content as step 1 and not reflect the new commit.
How can I avoid getting a stale version?
I noticed that the GitHub web UI adds a token to the URL, e.g. ?token=AABCIPALAGOZX5R, which presumably avoids getting cached content. What's the nature of this token, and how can I emulate it? Would tacking on ?token=$(date +%s) work?
Also, I'm looking for a way to avoid the stale content without having to switch to a commit hash in the URL, since that would require more changes. However, if that's the only way to achieve it, then I'll go that route.

GitHub caches this data because otherwise every request for a frequently accessed file would have to be served by the backend service, which is more expensive than serving a cached copy. Using a CDN provides improved performance and speed. You cannot bypass it.
The token you're seeing in the URL is a temporary token that is issued for the logged-in user. You cannot use a random token, since that won't pass authentication.
If you need the version of that file in a specific commit, then you'll need to explicitly specify that commit. However, do be aware that you should not do this with some sort of large-scale automated process as a way to bypass caching. For example, you should not try to do this to always get the latest version of a file for the purposes of a program you're distributing or multiple instances of a service you're running. You should provide that data yourself, using a CDN if necessary. That way, you can decide for yourself when the cache needs to be expired and get both good performance and the very latest data.
If you run such a process anyway, you may cause an outage or overload, and your repository or account may be suspended or blocked.
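If you do end up pinning to a commit, here is a minimal sketch of what that looks like, assuming a public repository and Node 18+ run as an ES module (so global fetch and top-level await are available); org, repo, sha, and the file path are placeholders:
const org = 'my-org';                  // placeholder organization
const repo = 'my-repo';                // placeholder repository
const sha = 'abc123...';               // full commit SHA, e.g. from `git rev-parse HEAD` or the GitHub API
const url = `https://raw.githubusercontent.com/${org}/${repo}/${sha}/path/to/file`;
const res = await fetch(url);          // the SHA in the URL changes with every commit, so the CDN cache key does too
if (!res.ok) throw new Error(`HTTP ${res.status}`);
console.log(await res.text());         // the file exactly as it was at that commit
Because the commit SHA is immutable, a cached copy of this URL can never be stale.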

Related

Is Precaching with Workbox mandatory for PWA?

I added a few workbox.routing.registerRoute calls using staleWhileRevalidate to my app, and so far it has passed most Lighthouse tests under PWA. I am not currently using precaching at all. My question is: is it mandatory? What am I missing without precaching? workbox.routing.registerRoute is already caching everything I need. Thanks!
Nothing is mandatory. :-)
Using stale-while-revalidate for all of your assets, as well as for your HTML, is definitely a legitimate approach. It means that you don't have to do anything special as part of your build process, for instance, which could be nice in some scenarios.
Whenever you're using a strategy that reads from the cache, whether it's via precaching or stale-while-revalidate, there's going to be some sort of revalidation step to ensure that you don't end up serving out of date responses indefinitely.
If you use Workbox's precaching, that revalidation is efficient, in that the browser only needs to make a single request for your generated service-worker.js file, and that response serves as the source of truth for whether anything precached actually changed. Assuming your precached assets don't change that frequently, the majority of the time your service-worker.js will be identical to the last time it was retrieved, and there won't be any further bandwidth or CPU cycles used on updating.
If you use runtime caching with a stale-while-revalidate policy for everything, then that "while-revalidate" step happens for each and every response. You'll get the "stale" response back to the page almost immediately, so your overall performance should still be good, but you're incurring extra requests made by your service worker "in the background" to re-fetch each URL, and update the cache. There's an increase in bandwidth and CPU cycles used in this approach.
Apart from using additional resources, another reason you might prefer precaching to stale-while-revalidate is that you can populate your full cache ahead of time, without having to wait for each asset to be requested for the first time. If there are certain assets that are only used on a subsection of your web app, and you'd like those assets to be cached ahead of time, that would be trickier to do if you're only doing runtime caching.
And one more advantage offered by precaching is that it will update your cache en masse. This helps avoid scenarios where, e.g., one JavaScript file was updated by virtue of being requested on a previous page, but then when you navigate to the next page, the newer JavaScript isn't compatible with the DOM provided by your stale HTML. Precaching everything reduces the chances of these versioning mismatches happening. (Especially if you do not enable skipWaiting.)
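For concreteness, here is a rough sketch of the two approaches side by side in a service worker, assuming Workbox v5 loaded from the CDN via workbox-sw; the asset URLs and revision strings are placeholders (normally a build tool such as workbox-build generates the precache manifest for you):
importScripts('https://storage.googleapis.com/workbox-cdn/releases/5.1.2/workbox-sw.js');
// Option 1: precaching. The manifest is revalidated via updates to service-worker.js itself.
workbox.precaching.precacheAndRoute([
  { url: '/app.js', revision: 'abc123' },      // placeholder entries
  { url: '/styles.css', revision: 'def456' },
]);
// Option 2: runtime caching with stale-while-revalidate. No build step,
// but every cached response triggers a background re-fetch.
workbox.routing.registerRoute(
  ({ url }) => url.origin === self.location.origin,
  new workbox.strategies.StaleWhileRevalidate({ cacheName: 'runtime-cache' })
);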

Timeout when deleting Azure Batch certificate

I receive the following in the portal:
There was an error while deleting [THUMBPRINT HERE]. The server returned 500 error. Do you want to try again?
I suspect that there is an Azure Batch pool/node hanging on to the certificate; however, the pool/nodes using that certificate have already been deleted (at least they are not visible in the portal).
Is there a way to force-delete the certificate? In normal operation my release pipeline relies on being able to delete the certificate.
Intercepting Azure PowerShell with Fiddler, I can see this in the HTTP response, so it appears to be timing out:
{
  "odata.metadata": "https://ttmdpdev.northeurope.batch.azure.com/$metadata#Microsoft.Azure.Batch.Protocol.Entities.Container.errors/#Element",
  "code": "OperationTimedOut",
  "message": {
    "lang": "en-US",
    "value": "Operation could not be completed within the specified time.\nRequestId:[REQUEST ID HERE]\nTime:2017-08-23T16:54:23.1811814Z"
  }
}
I have also deleted any corresponding tasks and schedules; still no luck.
(Disclosure: At the time of writing, I work on the Azure Batch team, though not on the core service.)
500 errors are usually transient and may represent heavy load on Batch internals (as opposed to 503s which represent heavy load on the Batch API itself). The internal timeout error reflects this. It's possible there was an unexpected spike in demand on specific APIs which are high-cost but are normally low-usage. We monitor and mitigate these, but sometimes an extremely high load with an unusual usage pattern can impact service responsiveness. I'd suggest you keep trying every 10-15 minutes, and if it doesn't clear itself in a few hours then try raising a support ticket.
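If you want to automate the "keep retrying" part, a rough sketch in Node.js might look like the following; deleteBatchCertificate() is a hypothetical wrapper around whatever client you are already using (Azure PowerShell, the Batch REST API, or an SDK), and the delay and attempt counts simply mirror the 10-15 minutes / few hours suggestion above:
const DELAY_MS = 15 * 60 * 1000;   // wait 15 minutes between attempts
const MAX_ATTEMPTS = 12;           // give up after roughly 3 hours and raise a support ticket
async function deleteWithRetry(thumbprint) {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      await deleteBatchCertificate(thumbprint);   // hypothetical helper around your Batch client
      return;                                     // success: certificate deleted
    } catch (err) {
      console.warn(`Attempt ${attempt} failed: ${err.message}`);
      if (attempt === MAX_ATTEMPTS) throw err;    // still failing after a few hours
      await new Promise((resolve) => setTimeout(resolve, DELAY_MS));
    }
  }
}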
There is currently no way to force-delete the certificate. This is an internal safety mechanism to ensure that Batch is never in a position where it has to deploy a certificate of which it no longer has a copy. You could request such a feature via the Batch UserVoice.
Finally, regarding your specific scenario, you could see whether it's feasible to rejig your workflow so it doesn't have the dependency on certificate deletion. You could, for example, have a garbage collection tool (perhaps running on Azure Functions or Azure Scheduler) that periodically cleans out old certificates. Arguably this adds more complexity (and arguably shouldn't be necessary), but it improves resilience and in other ways simplifies the solution, as your main path no longer needs to worry so much about delays and timeouts. If you want to explore this path, then perhaps post on the Batch forums and kick off a discussion with the team about possible design approaches.

CQ Dispatcher flushing vs invalidation

I want to find out: is there any difference between a CQ Dispatcher cache flush (from the publish instance) and Dispatcher cache invalidation?
Any help please?
Dispatcher is a reverse proxy server that can cache data from an HTTP source. In the case of AEM, that source is normally the publish or author instance, although in theory it can be any resource provider. This backend is called a "renderer".
Cache invalidation is an HTTP operation, triggered by the publisher, that marks the cached copy of a resource as invalid on the dispatcher. This operation only deletes the cached resource(s); it does not re-fetch them.
Flush is the workflow associated with publishing a page: it invalidates the cache from the publish/author instance when new content or a new resource is published. It is a very common scenario to invalidate the cache during publish so that new content is available on your site.
There are scenarios where you want to refresh the cache without re-publishing the content. For example, after a release you might want to regenerate all the pages from the publisher because the changes are not editorial, and hence none of the authors will want to re-publish the content. In this case, you simply invalidate the cache without using the publish workflow. In practice it's often easier to zap the cache directory on the dispatcher rather than flushing all the pages, but that's a preference. This is where the separation of flush and invalidation really matters; apart from that, nothing is really different, as the end result is almost the same.
This Adobe article seems to use "flush" and "invalidate" interchangeably.
It says:
Manually Invalidating the Dispatcher Cache
To invalidate (or flush) the Dispatcher cache without activating a page, you can issue an HTTP request to the dispatcher. For example, you can create a CQ application that enables administrators or other applications to flush the cache.
The HTTP request causes Dispatcher to delete specific files from the cache. Optionally, the Dispatcher then refreshes the cache with a new copy.
It also talks about configuring a "Dispatcher Flush" agent, and the configuration for that agent invokes an HTTP request that has "invalidate.cache" in the URL.
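For reference, this is roughly the request such a flush agent sends, and you can issue it yourself. A sketch using the Node.js http module, where the dispatcher hostname, port, and content path are placeholders and the header names assume a standard dispatcher flush configuration:
const http = require('http');
const req = http.request({
  host: 'dispatcher-host',                 // placeholder: your dispatcher/web server
  port: 80,
  path: '/dispatcher/invalidate.cache',
  method: 'POST',
  headers: {
    'CQ-Action': 'Activate',               // replication action type
    'CQ-Handle': '/content/mysite/en',     // placeholder: path whose cached files should be invalidated
    'Content-Length': 0,
  },
}, (res) => {
  console.log('Dispatcher responded with', res.statusCode);   // 200 means the invalidation was accepted
});
req.end();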
CQ basically calls the "Dispatcher Flush Rule Service" from OSGi, which uses the replication action type "Invalidate Cache". So, to flush the cache, CQ replication agents invoke the action called Invalidate Cache.
The terminology is a little confusing, but it's just a service and action combination in OSGi.
There are two things through which the cache is modified:
1. Content update
2. Auto-invalidation
A content update comes into the picture when an AEM page is modified.
Auto-invalidation is used when there are many automatically generated pages: the dispatcher flush agent marks cached files as out of date by touching the stat file, and any cached file older than the stat file is fetched again from the renderer the next time it is requested.
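As a toy illustration (this is not dispatcher code, just the idea), the staleness check amounts to comparing a cached file's modification time against the nearest .stat file; the paths below are placeholders:
const fs = require('fs');
// A cached file is considered out of date if the relevant .stat file
// has been touched more recently than the file itself.
function isStale(cachedFilePath, statFilePath) {
  const fileTime = fs.statSync(cachedFilePath).mtimeMs;
  const statTime = fs.statSync(statFilePath).mtimeMs;
  return statTime > fileTime;
}
console.log(isStale('/cache/content/mysite/en.html', '/cache/content/mysite/.stat'));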

How do you do continuous deployment in an AJAX application with lots of client-side interaction and local data?

We have an app that is written in PHP. The front end uses javascript heavily. Generally, for normal applications that require page reloads, continuous deployment is not really an issue, because:
The app can be deployed with build tags: myapp-4-3-2013-b1, myapp-4-3-2013-b2, etc.
When the user loads a page (we are using the front controller pattern), we can inject the buildtag and the files are loaded from the app directory with the correct build tag.
We do not need to keep the older builds around for too long because as the older requests finish, they will move to the newer build tags.
The risk of database and user data being incompatible is not very high, as we move people to the newer builds after their requests finish (more on this later).
Now, the problem with our app is that it uses AJAX heavily for smooth page loads. In addition, because there is no page refresh at all when people navigate through the application, people can keep unsaved data in their current browser session and revisit it as long as the browser has not been refreshed.
This leads to bigger problems if we want to achieve continuous deployment:
We can keep the user's buildtag in their session (set when they make the first request) and only switch to newer buildtags after they log out and log in again. This is obviously bad, because if things like the database schema or the format of files written to disk change in a newer build, there is no way to reconcile this.
We can force all new requests onto a newer build tag, but if we have changed client-side JavaScript, there is a possibility we will break a lot of things by forcing everyone with a session onto the new build tags immediately.
Obviously, the above won't occur with every build we push and hopefully will not happen a lot, but we want to build a foolproof process so that every build that passes our tests can be deployed. At the same time, we want to make sure that a deployed, test-passing build does not inadvertently break clients with running sessions and cause a whole bunch of problems.
I have done some investigation, and what Google does (at least in Google Groups) is push a message out to the clients to refresh the application (browser window). However, in their case, all unsaved client-side data (like an unsaved message, etc.) would be lost.
Given that applications that use AJAX and local data are very common these days, what are some more intelligent ways of handling this that will provide minimal disruption to users/clients?
Let me preface this by saying that I hadn't ever thought of continuous deployment before reading your post, but it does sound like quite a good idea! I've got a few examples where this would be nice.
My thoughts on solving your problem though would be to go for your first suggestion (which is cleaner), and get around the database schema changes like this:
Implement an API service layer in your application that handles the database or file access, and that sits outside of your build tag environment. For example, you'd have myapp-4-3-2013-b1 and db-services folders.
db-services would provide all interaction with the database through a series of versioned services. For example, registerNewUser2() or processOrder3().
When you needed to change the database schema, you'd provide a new version of that service and upgrade your build tag environment to look at the new version. You'd also provide a legacy service that handles the old schema to new schema upgrade.
For example, say you registered new users like this:
function registerNewUser2(username, password, fullname) {
  // v2 of the service: writes straight through to the current schema.
  writeToDB(username, password, fullname);
}
And you needed to update the schema to add the user's date of birth:
function registerNewUser3(username, password, fullname, dateofbirth) {
  // v3 targets the updated schema, which has a date-of-birth column.
  writeToDB(username, password, fullname, dateofbirth);
}
function registerNewUser2(username, password, fullname) {
  // Legacy v2 entry point kept for old build tags: it forwards to v3
  // with a null date of birth so both build tags keep working.
  registerNewUser3(username, password, fullname, null);
}
The new build tag will be changed to call registerNewUser3(), while the previous build tag is still using registerNewUser2().
So the old build tag will continue to work, just that any new users registered will have a NULL date of birth. When an updated build tag is used, the date of birth is written to the database correctly.
You would need to update db-services immediately, as soon as you roll out the new build tag - or even before you roll out the build tag I guess.
Once you're sure that everyone is using the new version, you can just delete registerNewUser2() from the next version of db-services.
It will be quite complicated to make sure that you are correctly handling the conversion between old and new API calls, but it might be feasible if you're already handling continuous deployment.

Developing with backbone.js, how can I detect when multiple users (browsers) attempt to update?

I am very new to backbone.js (and MVC with JavaScript), and while reading several resources about backbone.js to adopt it in my project, I now have a question: how can I detect when multiple users (browsers) attempt to update, and how can I prevent it?
My project is a tool for editing surveys/polls for users who want to create and distribute their own surveys. So far, my web app maintains a list of edit commands fired by the browser and sends it to the server, and the server does a batch update.
What I did was this: each survey maintains a version number, and the browser must include that version number when it requests an update. If the request's version number does not match the one on the server, the request fails and the user must reload his page (you know, implementing concurrent editing is not easy for everyone). Of course, when the browser's update is successful, it gets the new version number from the server in the AJAX response, and one browser can request another update only once its previous update request is done.
Now I am interested in RESTful APIs and MV* patterns, but I'm having a hard time solving this issue. What is the best / common approach for this?
There is a common trick: instead of using versions, use a TIMESTAMP column in your DB and then try UPDATE ... WHERE timestamp = model.timestamp. If the update affects zero rows, return an appropriate HTTP 409 (Conflict) response and ask the user to refresh the page in the save() error callback. You can even use local storage to merge changes and compare the conflicting versions side by side.
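On the Backbone side, a minimal sketch of handling that 409 in the save() error callback might look like this; Survey, its urlRoot, and the timestamp attribute are placeholders for your own model, and the server is assumed to return the new timestamp on a successful save:
var Survey = Backbone.Model.extend({ urlRoot: '/api/surveys' });   // placeholder model
var survey = new Survey({ id: 42, title: 'Customer feedback', timestamp: 1502900000 });
survey.save(null, {
  success: function (model, response) {
    // Keep the latest timestamp so the next save is validated against it.
    model.set('timestamp', response.timestamp);
  },
  error: function (model, xhr) {
    if (xhr.status === 409) {
      // Someone else saved first: ask the user to reload (or merge from local storage).
      alert('This survey was changed by someone else. Please reload the page.');
    }
  }
});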