Is Precaching with Workbox mandatory for PWA? - progressive-web-apps

I added a few workbox.routing.registerRoute using staleWhileRevalidate to my app and so far it has passed most lighthouse tests under PWA. I am not currently using Precaching at all. My question is, is it mandatory? What am I missing without Precaching? workbox.routing.registerRoute is already caching everything I need. Thanks!

Nothing is mandatory. :-)
Using stale-while-revalidate for all of your assets, as well as for your HTML, is definitely a legitimate approach. It means that you don't have to do anything special as part of your build process, for instance, which could be nice in some scenarios.
Whenever you're using a strategy that reads from the cache, whether it's via precaching or stale-while-revalidate, there's going to be some sort of revalidation step to ensure that you don't end up serving out of date responses indefinitely.
If you use Workbox's precaching, that revalidation is efficient, in that the browser only needs to make a single request for your generated service-worker.js file, and that response serves as the source of truth for whether anything precached actually changed. Assuming your precached assets don't change that frequently, the majority of the time your service-worker.js will be identical to the last time it was retrieved, and there won't be any further bandwidth or CPU cycles used on updating.
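To make the "single source of truth" idea concrete, here is a minimal sketch of how comparing two precache manifests can identify what needs refetching. It is not Workbox's actual implementation; the `urlsToRefetch` helper, the URLs, and the revision strings are all made up for illustration, assuming each manifest entry pairs a URL with a revision.

```javascript
// Hypothetical precache manifests: each entry pairs a URL with a revision hash.
// The values are invented; a build tool would normally generate them.
const previousManifest = [
  { url: "/index.html", revision: "a1" },
  { url: "/app.js", revision: "b2" },
  { url: "/styles.css", revision: "c3" },
];

const currentManifest = [
  { url: "/index.html", revision: "a1" },
  { url: "/app.js", revision: "b9" }, // changed since the last build
  { url: "/styles.css", revision: "c3" },
];

// Return only the URLs whose revision differs from what is already cached.
function urlsToRefetch(previous, current) {
  const cached = new Map(previous.map((entry) => [entry.url, entry.revision]));
  return current
    .filter((entry) => cached.get(entry.url) !== entry.revision)
    .map((entry) => entry.url);
}

console.log(urlsToRefetch(previousManifest, currentManifest)); // [ '/app.js' ]
console.log(urlsToRefetch(currentManifest, currentManifest)); // []
```

When nothing changed, the list is empty and no further requests are needed, which is the cheap case described above.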
If you use runtime caching with a stale-while-revalidate policy for everything, then that "while-revalidate" step happens for each and every response. You'll get the "stale" response back to the page almost immediately, so your overall performance should still be good, but you're incurring extra requests made by your service worker "in the background" to re-fetch each URL, and update the cache. There's an increase in bandwidth and CPU cycles used in this approach.
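The per-request cost can be seen in a small simulation. This is only a sketch with an in-memory `Map` standing in for the cache and a synchronous fake network function; `createStaleWhileRevalidate` is a hypothetical helper, not a Workbox API, and a real service worker would do the refetch asynchronously in the background.

```javascript
// Simulated stale-while-revalidate handler. The cache and network are stand-ins.
function createStaleWhileRevalidate(fetchFromNetwork) {
  const cache = new Map();
  let networkRequests = 0;
  return {
    handle(url) {
      const cached = cache.get(url);
      // Every request triggers a refetch, even when the cache can answer.
      networkRequests += 1;
      const fresh = fetchFromNetwork(url);
      cache.set(url, fresh);
      // Serve the stale copy if there is one; otherwise use the network result.
      return cached !== undefined ? cached : fresh;
    },
    get networkRequests() {
      return networkRequests;
    },
  };
}

let version = 1;
const swr = createStaleWhileRevalidate((url) => `${url}@v${version}`);
const urls = ["/index.html", "/app.js", "/styles.css"];

urls.forEach((url) => swr.handle(url)); // first visit fills the cache
version = 2; // the site is redeployed
const secondVisit = urls.map((url) => swr.handle(url)); // served stale, refetched anyway

console.log(secondVisit[0]); // /index.html@v1
console.log(swr.networkRequests); // 6
```

Two visits to three URLs cost six network requests, and the second visit still saw the old version; the update only shows up on the visit after that.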
Apart from using additional resources, another reason you might prefer precaching to stale-while-revalidate is that you can populate your full cache ahead of time, without having to wait for the first time they're accessed. If there are certain assets that are only used on a subsection of your web app, and you'd like those assets to be cached ahead of time, that would be trickier to do if you're only doing runtime caching.
And one more advantage offered by precaching is that it will update your cache en masse. This helps avoid scenarios where, e.g., one JavaScript file was updated by virtue of being requested on a previous page, but then when you navigate to the next page, the newer JavaScript isn't compatible with the DOM provided by your stale HTML. Precaching everything reduces the chances of these versioning mismatches from happening. (Especially if you do not enable skipWaiting.)

Related

Precaching with service worker, why does it matter? What did I miss?

I was looking at service worker practices and workbox.
There are many articles talking about precaching, and Workbox even provides a dedicated method, precacheAndRoute(), for just that. I guess I understand the conceptual difference between precaching and runtime caching, but what confuses me is why precaching is treated as so special.
All the articles I've read about precaching emphasize how it makes a web app available when the client is offline. Isn't that what a cache (even if it's not a precache) is for? It seems that a runtime cache can achieve just that if configured properly. Does it have to be a precache for the web app to work offline?
The only obvious difference is when the caches are created. If the client is offline, no cache can be created, whether it is a precache or a runtime cache. And if the caches were created during the last visit, when the client was online, why does it matter whether the cache that responds on the current visit is a precache or a runtime cache?
Consider two abstract cases for comparison. Say we have two different service workers: one (/precache/sw.js) only precaches and the other (/runtime/sw.js) only does runtime caching, where /precache and /runtime host the same web app (meaning the same assets to be cached).
Under what scenario could the web apps at /precache and /runtime behave differently because of their different service worker setups?
In my understanding:
If the cache cannot be created (e.g. offline on first visit), then precache and runtime cache shouldn't be any different.
If the precache can be created successfully (i.e. the client is online on the first visit), the runtime cache can be too. (Let's not go too wild with cases like the client being online only for a brief moment; the two should still behave the same in my examples.)
If the caches are already available, then neither precache nor runtime cache has anything left to do, so they are still the same.
The only scenario I can think of where precaching shows an advantage is when the cache needs to be updated on the current visit, where precaching makes sure the current visit gets up-to-date content. If that is the case, wouldn't a NetworkFirst runtime cache do just about the same? And still, this has nothing to do with "offline", which almost every article I've read about service worker precaching mentions.
How does being online or offline make precaching the hero?
What did I miss here? What's so special about precaching?
One scenario where it is different could be the following.
What the app is like:
You have a landing page for your app.
You have a handful of routes that can be navigated to
Cache Strat:
If the user goes to the landing page, only the landing page assets would get cached.
Pre-cache Strat:
If the user goes to the landing page, all of the configured pre-cached assets would get cached.
Difference:
So if the user only goes to the landing page, and then later goes offline, the pre-cache strat would allow them to navigate and interact in some way with the other routes of your app, while the cache strat would not allow navigation to any route they had not already visited.
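The landing-page scenario can be modelled in a few lines. This is just a sketch: the route names, asset lists, and helper functions are all hypothetical, and real caches would of course be populated by a service worker rather than by plain sets.

```javascript
// Hypothetical route-to-assets map for a small app.
const routes = {
  "/": ["/index.html", "/app.js"],
  "/settings": ["/settings.html", "/app.js", "/settings.js"],
  "/about": ["/about.html", "/app.js"],
};

// Runtime caching stores only what the visited routes actually requested.
function runtimeCacheAfterVisiting(visited) {
  return new Set(visited.flatMap((route) => routes[route]));
}

// Precaching stores every configured asset at install time, whatever was visited.
function precacheAfterInstall() {
  return new Set(Object.values(routes).flat());
}

// A route works offline only if all of its assets are in the cache.
function worksOffline(route, cache) {
  return routes[route].every((asset) => cache.has(asset));
}

const runtimeCache = runtimeCacheAfterVisiting(["/"]);
const precache = precacheAfterInstall();

console.log(worksOffline("/", runtimeCache)); // true
console.log(worksOffline("/settings", runtimeCache)); // false
console.log(worksOffline("/settings", precache)); // true
```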
First, your side by side service workers are restricted to those folders or paths. So they are isolated from each other.
Second, you should define a caching strategy for your application that mixes precached assets with dynamically cached ones, plus an invalidation routine/logic.
You want to preCache as much as possible without breaking any dynamic nature of your application. So cache common JS, CSS, images, fonts and pages that are used over and over.
Of course have an invalidation strategy in place to keep these up to date.
Next handle non-cached network addressable resources (URLs) from the fetch event handler. Cache them as it makes sense. And invalidate cached assets as it makes sense.
For some applications I cache the entire thing. They are usually on the small side, a few dozen to a few hundred pages for example. For a site like Amazon I would never do that LOL. No matter how much is cached, I always have an invalidation and update strategy that makes sense for the application/site.

Any optimizations in reducing the number of disk accesses for inode number lookup by web-servers?

Web-servers typically have a document root denoting the filesystem sub-tree visible via the web. For example, if the document root is /home/foouser/public_html/, then the web-server would map a request for http://www.foo.com/pics/foo.jpg to /home/foouser/public_html/pics/foo.jpg. This results in a series of disk requests to obtain the inode number of foo.jpg.
Do web-servers do any optimizations to reduce the number of disk accesses, or is it the role of the server admin to set the document root as close to "/" as possible, to reduce the number of disk accesses in the filename to inode number translation?
I know this isn't directly the answer to your question, but by setting up a caching strategy you can drastically reduce disk reads. Especially if your static content is not hosted on your server.
Options:
Host static content on a CDN:
Pros: Off-load all load onto someone else's network. Cost?
Cons: Potentially less control. Cost?
Use Cotendo/Akamai, which is also a CDN, but with some differences.
Pros: You host your content, but after the first read the CDN will handle caching based on the headers you send with your content (static or not)
Cons: Sometimes headers are really annoying to manage. Cache busting (breaking your own cache) can be annoying to handle when you want to replace old content.
Cache things locally. If you are making a DB request, for instance, you can cache the result. The next time your code runs, check your in-memory cache first (as opposed to making a DB request immediately). You could also cache entire pages: at the application controller/route level, check whether there is a cached version of the page/asset and serve that.
Pros: Lots of control. You can cache almost anything.
Cons: A ton of work to set up caching on every little thing. You need a strategy for every part of your website.
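As a rough illustration of the "cache things locally" option, here is a minimal sketch of an in-memory cache in front of a lookup function. Everything in it is an assumption for demonstration: `createCachedLookup`, the fake database function, the TTL, and the injected clock are hypothetical and not tied to any particular server or framework.

```javascript
// Wrap a slow lookup with an in-memory cache that expires entries after ttlMs.
// The clock is injectable so the expiry behaviour can be shown deterministically.
function createCachedLookup(loadFromDatabase, ttlMs, now = Date.now) {
  const entries = new Map();
  return function lookup(key) {
    const hit = entries.get(key);
    if (hit !== undefined && now() - hit.storedAt < ttlMs) {
      return hit.value; // fresh enough: skip the database
    }
    const value = loadFromDatabase(key);
    entries.set(key, { value, storedAt: now() });
    return value;
  };
}

// Fake database that counts how often it is actually queried.
let dbCalls = 0;
const fakeDb = (id) => {
  dbCalls += 1;
  return { id, name: `user-${id}` };
};

let clock = 0;
const getUser = createCachedLookup(fakeDb, 1000, () => clock);

getUser(7);
getUser(7);
console.log(dbCalls); // 1

clock = 1500; // the cached entry is now older than the TTL
getUser(7);
console.log(dbCalls); // 2
```

The TTL here plays the role of the invalidation strategy: without one, stale entries would be served forever.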
My recommendation is to start out by moving your assets to Amazon S3 or Rackspace or something. Joyent has something for this as well. You could then enable CloudFront for S3, which will turn on the CDN and cache things in various regions. This is a really cheap solution (depending on how many files you have).
You could also go the Cotendo route.
The caching on the application side route takes quite a bit of work and completely depends on your server/language/db/configuration.

Does retaining modified pages in the page cache make more sense (over unmodified)?

I am working on a page cache replacement policy. I have read about many existing algorithms, and most of them prefer to retain modified pages in the cache. I don't really understand the reason behind it. Is it due to eviction cost, or do modified pages have a higher chance of being used again?
Out of the many different policies, the LRU (least recently used) policy provides good results with hardware support.
Is it due to eviction cost or modified pages have higher chance of being used again?
Yes
So, according to locality of reference, a recently modified page has a higher chance of being referenced again.
One more reason for retaining modified pages in the cache is that every replacement of a modified page (which has a higher chance of being referenced again) requires two transfers: first the page is written to disk, and then the requested page is brought into main memory. This is very costly. Replacing an unmodified page (which has a lower chance of being referenced) takes only one transfer, i.e. the requested page is brought into memory.
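The transfer counts can be written down as a tiny cost model. This is only a sketch: the page objects and both helpers are hypothetical, and `pickVictim` shows one simple way a policy could prefer clean pages (oldest clean page first, falling back to plain LRU), not how any particular kernel does it.

```javascript
// Disk transfers needed to evict a victim page and bring in the requested one.
function transfersToReplace(victim) {
  const writeBack = victim.dirty ? 1 : 0; // a modified page must be written to disk first
  const readIn = 1; // the requested page is always read in
  return writeBack + readIn;
}

// Prefer the least recently used clean page; fall back to the LRU page overall.
function pickVictim(pages) {
  const byAge = [...pages].sort((a, b) => a.lastUsed - b.lastUsed);
  return byAge.find((page) => !page.dirty) || byAge[0];
}

const pages = [
  { id: 1, lastUsed: 10, dirty: true },
  { id: 2, lastUsed: 20, dirty: false },
  { id: 3, lastUsed: 30, dirty: true },
];

const victim = pickVictim(pages);
console.log(victim.id); // 2
console.log(transfersToReplace(victim)); // 1
console.log(transfersToReplace(pages[0])); // 2
```

Plain LRU would have evicted page 1 at a cost of two transfers; preferring the clean page halves the I/O for this replacement.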

Does facebook have its own cache?

While developing Facebook applications I have run into this problem many times: if I delete an image, it still appears in the application while testing. Even if I delete the whole file, it is still executed successfully. So I want to know: "Does Facebook have its own cache from where files are executed?"
If so, is there any solution to this problem?
If not, then why is this happening?
Best Regards & Thanks in advance
Not sure about image files (they reside on a CDN), but Facebook uses Memcached servers to cache their stuff.
It's not that it has cache but that its main backing store doesn't provide any more coherency than is strictly necessary. Coherency has a cost, so if you don't need it, it makes sense not to pay the cost.
When operations have no enforced order between them, they may complete as if they were executed in either order. If your retrieval and your delete have no enforced order, then they may complete as if they were executed in either order. This applies even if one operation receives its response before the other operation was sent.
My understanding was that there was a cache. Especially for images and styles.
I have frequently made changes to CSS and updated images, only to be left wondering why I cannot see these updates.
I always change my css url to be something like styles/styles.css?time= which remedies everything.
As for the images, right-click on the image in the application and view it in the browser. Refresh to get the updated image, and then go back to your application.
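The `?time=` trick mentioned above can be wrapped in a small helper. This is a sketch only: `withCacheBuster` is a hypothetical function, and the timestamp used in the example is an arbitrary made-up value, since the original URL does not say what goes after `time=`.

```javascript
// Append a changing query parameter so caches treat the URL as a new resource.
function withCacheBuster(path, version) {
  const separator = path.includes("?") ? "&" : "?";
  return `${path}${separator}time=${encodeURIComponent(version)}`;
}

// The version value below is arbitrary; a build timestamp or file hash would also work.
console.log(withCacheBuster("styles/styles.css", 1700000000));
// styles/styles.css?time=1700000000
console.log(withCacheBuster("styles/styles.css?theme=dark", 1700000000));
// styles/styles.css?theme=dark&time=1700000000
```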

enableviewstatemac=true

Does setting enableviewstatemac to "true" affect the site's performance? Could you give me some explanation?
Yes, it will affect the site's performance, straight from MSDN:
A view-state MAC is an encrypted version of the hidden variable that a page's view state is persisted to when sent to the browser. If true, the encrypted view state is checked to verify that it has not been tampered with on the client. Do not set EnableViewStateMac to true if performance is a key consideration.
That check has to do something, and something is more expensive than nothing. The larger the viewstate you're dealing with, the more overhead this will put on your requests. That being said, unless you're a really high traffic site or have really large viewstate in your pages, you probably won't notice a thing server-side. On the client however, they will be getting a larger page, which will probably have more of an impact than anything. That means they're uploading more to the server on postback...that's most likely your pain point caused by enabling this.
Keep in mind just how many things happen when the server executes a page. All of these options are "drop in the bucket" scenarios in most cases, though there are of course exceptions. Current servers are powerful enough that settings like this typically don't make a noticeable impact individually, but there are cases where they do, for example if you have megabytes of viewstate for some reason.
The enableviewstatemac property specifies that, on receipt of each client request, a check is performed to ensure that the client has not tampered with the control / hidden data it was served.
This is important because ASP.NET is stateless between requests and relies on the client-side changes being passed back to the page on postback to determine which changes / events have fired. If clients were able to tamper with this data with impunity, they could potentially alter the page's behaviour to their own ends.
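For a feel of what such a tamper check involves, here is a generic sketch using an HMAC from Node's built-in crypto module. It is not how ASP.NET implements its view-state MAC; the `sign` and `verify` helpers, the secret, and the state string are all hypothetical and only illustrate the general technique of attaching a keyed hash and re-checking it on the way back.

```javascript
const crypto = require("crypto");

// Hypothetical server-side secret; a real deployment would load this from configuration.
const secret = "demo-secret-not-for-production";

// Attach a keyed hash to the state before sending it to the client.
function sign(state) {
  const mac = crypto.createHmac("sha256", secret).update(state).digest("hex");
  return `${state}.${mac}`;
}

// Recompute the hash on postback; return the state only if it is untouched.
function verify(signed) {
  const index = signed.lastIndexOf(".");
  if (index < 0) return null;
  const state = signed.slice(0, index);
  const given = Buffer.from(signed.slice(index + 1), "hex");
  const expected = crypto.createHmac("sha256", secret).update(state).digest();
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  const ok = given.length === expected.length && crypto.timingSafeEqual(given, expected);
  return ok ? state : null;
}

const signed = sign("cart=3;role=user");
console.log(verify(signed)); // cart=3;role=user
console.log(verify(signed.replace("role=user", "role=admin"))); // null
```

The cost is one hash over the state per request, which is why the overhead grows with the size of the data being protected.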