The mapping between the dispatcher and the publisher is very important when designing the application. There are two ways:
One to One -> one publisher is connected to one dispatcher
One to Many -> one publisher is connected to three or more dispatchers
I could not understand which one should be selected, and when. Can anyone tell me the pros and cons of each option?
In general, publisher and dispatcher have different roles in your setup. You need as many of each as your load requires. In theory you can start with two of them. Whenever they cannot handle the load (CPU or disk at 100%), you add another one. (AEM as a Cloud Service actually does this dynamically.)
With some experience you can forecast the number of required dispatchers and publishers.
The following scenarios will cause a high load on the dispatchers:
many static pages (which seldom change), and a lot of static assets (images, pdf, ...)
few pages and extremely high traffic for those
In general, this means your site is very cacheable, because the dispatcher is a cache in front of the "CMS". In that case you probably need several dispatchers for each publisher = one to many (good caching is great, because a dispatcher is cheaper and can handle more load than a publisher).
The following scenarios will cause a higher load on the publisher; then you will have a one-to-one scenario:
There is a CDN in front of the CMS. The CDN does a lot of static caching, so the cache hit ratio of the dispatcher will go down.
A lot of static content is already handled outside of the CMS (e.g. images are served elsewhere, such as via Adobe Dynamic Media).
You have many dynamic pages (rendered for each user separately, e.g. a banking application).
PS: you will have at least one dispatcher for each publisher. As a reverse proxy it has an important security function. It is also a major backup to avoid downtime. I know a customer that, during maintenance, runs only the dispatchers for up to 24 hours; they then just serve the static content like a normal Apache web server.
I was looking at service worker practices and Workbox.
There are many articles talking about precaching; Workbox even provides a dedicated method, precacheAndRoute(), for just that. I guess I understand the conceptual difference between precache and runtime cache, but what confuses me is why precache is treated so specially.
All the articles I've read about precaching emphasize how it makes the web app available when the client is offline. Isn't that what a cache (even if it's not a precache) is for? I mean, it seems that a runtime cache can also achieve just that if configured properly. Does it have to be a precache for the web app to work offline?
The only obvious difference is when the caches are created. Well, if the client is offline, no cache can be created at all, no matter whether it is a precache or a runtime cache; and if the caches were created during the last visit, when the client was online, how does it matter whether the cache that responds on the current visit is a precache or a runtime cache?
Consider two abstract cases for comparison. Say we have two different service workers: one (/precache/sw.js) only does precaching and the other (/runtime/sw.js) only does runtime caching, where /precache and /runtime host the same web app (meaning the same assets to be cached).
Under what scenario could the /precache and /runtime web apps behave differently due to their different service worker setups?
In my understanding,
If no cache can be created (e.g. offline on the first visit), then precache and runtime cache shouldn't be any different.
If the precache can be created successfully (i.e. the client is online on the first visit), the runtime cache should be too. (Let's not go too wild with cases like the client being online only at certain moments; they should still be the same in my examples.)
If the caches are available, then neither the precache nor the runtime cache has anything to do, hence they are still the same.
The only scenario I could think of where precache shows an advantage is when the cache needs to be updated on the current visit, where precaching makes sure the current visit gets up-to-date info. If that is the case, wouldn't a NetworkFirst runtime cache do just about the same? And still, that has nothing to do with "offline", which is what almost every article I've read about service worker precaching mentions.
How does online/offline make precache a hero?
What did I miss here? What's so special about precaching?
One scenario where it is different could be the following.
What the app is like:
You have a landing page for your app.
You have a handful of routes that can be navigated to
Cache Strat:
If the user goes to the landing page, only the landing page assets would get cached.
Pre-cache Strat:
If the user goes to the landing page, all of the configured pre-cached assets would get cached.
Difference:
So if the user only visits the landing page and later goes offline, the pre-cache strat would allow them to navigate to and interact in some way with the other routes of your app, while the cache strat would not allow any navigation to the other routes.
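To make the difference concrete, here is a minimal sketch of the two service workers (Workbox-style imports are assumed; the file list and revision hashes are made up for illustration):

```js
// /precache/sw.js -- every listed asset is fetched and cached at install time,
// so routes the user has never visited still work offline.
import { precacheAndRoute } from 'workbox-precaching';

precacheAndRoute([
  { url: '/index.html', revision: 'abc123' },            // landing page
  { url: '/routes/settings.html', revision: 'def456' },  // never-visited route, cached anyway
  { url: '/app.js', revision: 'ghi789' },
]);
```

```js
// /runtime/sw.js -- an asset is cached only after it has been requested once,
// so a route the user never opened is not available offline.
import { registerRoute } from 'workbox-routing';
import { StaleWhileRevalidate } from 'workbox-strategies';

registerRoute(
  ({ request }) => ['document', 'script', 'style'].includes(request.destination),
  new StaleWhileRevalidate({ cacheName: 'runtime-cache' })
);
```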
First, your side-by-side service workers are restricted to their respective folders or paths (their scopes), so they are isolated from each other.
Second, you should define a caching strategy for your application that mixes precached assets with runtime-cached dynamic ones, plus an invalidation routine/logic.
You want to precache as much as possible without breaking the dynamic nature of your application. So cache common JS, CSS, images, fonts, and pages that are used over and over.
Of course have an invalidation strategy in place to keep these up to date.
Next, handle non-cached, network-addressable resources (URLs) from the fetch event handler. Cache them where it makes sense, and invalidate cached assets where it makes sense.
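As a rough illustration of that fetch-handler approach, here is a hand-rolled sketch without Workbox (the cache name 'runtime-v1' is just a placeholder): responses are cached as they are fetched, and renaming the cache invalidates everything the next time the worker activates.

```js
const RUNTIME_CACHE = 'runtime-v1'; // bump this name to invalidate everything cached under it

// Runtime caching: serve from cache if present, otherwise fetch and keep a copy.
self.addEventListener('fetch', (event) => {
  if (event.request.method !== 'GET') return;
  event.respondWith(
    caches.match(event.request).then((cached) => {
      if (cached) return cached;
      return fetch(event.request).then((response) => {
        if (response.ok) {
          const copy = response.clone();
          caches.open(RUNTIME_CACHE).then((cache) => cache.put(event.request, copy));
        }
        return response;
      });
    })
  );
});

// Invalidation: when a new service worker version activates, delete older caches.
self.addEventListener('activate', (event) => {
  event.waitUntil(
    caches.keys().then((keys) =>
      Promise.all(keys.filter((key) => key !== RUNTIME_CACHE).map((key) => caches.delete(key)))
    )
  );
});
```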
For some applications I cache the entire thing. They are usually on the small side, a few dozen to a few hundred pages for example. For a site like Amazon I would never do that LOL. No matter how much is cached, I always have an invalidation and update strategy that makes sense for the application/site.
My website will have users from all around the globe, not one location. I know that I can put my file assets on a globally distributed CDN, and those files will be served from the location closest to the user, which will lower the latency.
Is it possible to do the same thing for a (Mongo) database? Or does one still need to pick one location for the database and just put up with increased latency for users who are far away?
It is possible, but you may want to pay special attention to the headers that your DB/service will return, especially the ones regarding caching, like Cache-Control: max-age=<int>; you want to avoid the CDN behaving just like a plain proxy with no caching capabilities.
In many cases the CDN obeys origin headers, but if the values are too small they may be overridden by some defaults that, depending on the plan/price, can be adjusted.
Some CDNs allow prefetching on a custom schedule, keeping the data always up to date. This could be useful in some cases and avoids flooding the DB with too many connections, besides also allowing you to restrict access to the service/DB to only the CDN and trusted sources.
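As an example, here is a minimal sketch (an Express-style handler is assumed; the route, the hypothetical findProduct() helper, and the max-age values are only illustrative) of an API in front of the database returning explicit caching headers, so the CDN can actually cache responses instead of acting as a plain proxy:

```js
const express = require('express');
const app = express();

app.get('/api/products/:id', async (req, res) => {
  const product = await findProduct(req.params.id); // hypothetical database lookup
  // Without an explicit Cache-Control header, many CDNs fall back to their own
  // defaults or simply pass API responses through uncached.
  res.set('Cache-Control', 'public, max-age=60, s-maxage=300'); // edge may cache for 5 minutes
  res.json(product);
});

app.listen(3000);
```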
I have a RESTful API whose resources update once a week. That is, for each resource, I update it once a week and allow clients to access it. It's an ever-changing calculator.
There are probably 10,000 resources which could be requested.
Is it possible to put something like this behind a CDN? Traditionally CDNs are for undeniably static content, i.e. images. I'm not sure where my situation sits on the spectrum of dynamic <-> static.
Cheers
90% of the resources might not even get called, and if they are, they will get called a few times only. It won't be a mass of repetitive calls.
Right there in your comments, you just showed me that a CDN is not beneficial to you.
Usually how a CDN works is that on the first call the content is downloaded from the main server to the regional CDN node and then delivered to the client, meaning the first GET sees no improvement. The following GETs to the same regional node will get the speed improvement. If you have little to no repetitive calls, then you will not see any noticeable improvement.
As I said in the comments, for small files, clients are probably spending as much time on the DNS lookup as they are on the download. Look into a global DNS solution (like Anycast) to reduce connection times. This is easy to set up and requires little to no maintenance.
I think it's entirely reasonable to put it behind a CDN if you think your content will reach the appropriate level of scale. As long as the cache-control headers are set such that the latest content is loaded when the cached version may be stale, you'll be fine.
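For instance, here is a hedged sketch (again an Express-style handler; the route, the loadResource() helper, and the header values are illustrative) of headers for weekly-updated resources: the CDN may reuse its copy for a day, then revalidates cheaply against a version tag instead of re-downloading the body.

```js
const express = require('express');
const app = express();

app.get('/resources/:id', async (req, res) => {
  const resource = await loadResource(req.params.id); // hypothetical lookup
  const etag = `"${resource.updatedAt}"`;             // changes once a week with the data

  res.set('Cache-Control', 'public, s-maxage=86400'); // edge may reuse for a day, then revalidate
  res.set('ETag', etag);

  // Conditional request: if the CDN already holds the current version,
  // answer 304 Not Modified instead of resending the body.
  if (req.headers['if-none-match'] === etag) {
    return res.sendStatus(304);
  }
  res.json(resource);
});

app.listen(3000);
```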
The main benefit of CDNs comes when resources are requested from a variety of different sources, and so siteY.com can use the same cached version of a resource as siteX.com. Do you anticipate your resources will be requested from various different sources?
I am trying to understand the concept of public render parameters in JSR 286 portlets.
http://publib.boulder.ibm.com/infocenter/wpexpdoc/v6r1/index.jsp?topic=/com.ibm.wp.exp.doc_v6101/dev/pltcom_pubrndrprm.html
Now, inter-portlet communication can happen like this: Portlet 1 publishes an event, Portlet 2 processes it, generates a response, and puts it in session scope. Portlet 1 can now see it too, since both portlets share the same session object. So what is the purpose of public render parameters as a way of sharing information between portlets?
Both have their advantages. Generally, public render parameters are a lightweight communication mechanism. The following are some of the important features of both.
Public render parameters:
They are limited to simple string values.
They do not require explicit administration to set up coordination.
They cause no performance overhead as the number of portlets sharing information grows.
They can be set from links encoded in portal themes and skins.
Portlet events:
They can contain complex information.
They allow fine-grained control by setting up different sorts of wires between portlets (on-page or cross-page, public or private).
They can trigger cascaded updates with different information. For example, portlet A can send event X to portlet B, which in turn sends a different event Y to portlet C.
They cause increasing processing overhead as the number of communication links grows.
The reason I ask is that Stack Overflow has been Slashdotted, and Redditted.
First, what kind of effect does this have on the servers that power a website? Second, what can system administrators do to ensure that their sites remain up and running as well as possible?
Unfortunately, if you haven't planned for this before it happens, it's probably too late and your users will have a poor experience.
Scalability is your first immediate concern. You may start getting more hits per second than you were getting per month. Your first line of defense is good programming and design. Make sure you're not doing anything stupid like reloading data from a database multiple times per request instead of caching it. Before the spike happens, you need to do some fairly realistic load tests to see where the bottlenecks are.
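As a trivial illustration of the "cache instead of reloading" point (plain JavaScript with a MongoDB-style driver is assumed; all names are made up), a per-request memo means only the first lookup for a given user actually hits the database while that request is being rendered:

```js
// Cache of DB lookups, keyed by the request object so entries disappear with the request.
const userLookups = new WeakMap();

function getUser(db, request, userId) {
  let memo = userLookups.get(request);
  if (!memo) {
    memo = new Map();
    userLookups.set(request, memo);
  }
  if (!memo.has(userId)) {
    // Only the first call per request queries the database;
    // later calls reuse the same pending or resolved promise.
    memo.set(userId, db.collection('users').findOne({ _id: userId }));
  }
  return memo.get(userId);
}

// Usage while handling one request: both calls share a single database query.
// const user = await getUser(db, req, id);
// const sameUser = await getUser(db, req, id);
```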
For absurdly high traffic, consider the ability to switch some dynamic pages over to static pages.
Having a server architecture that can scale also helps. Shared hosts generally don't scale. A single dedicated machine generally doesn't scale. Using something like Amazon's EC2 to host can help, especially if you plan for a cluster of servers from the beginning (even if your cluster is a single computer).
Your next major concern is security. You're suddenly a much bigger target for the bad guys. Make sure you have a good security plan in place. This is something you should always have, but it becomes more important with high usage.
Firstly, ask whether you really want to spend weeks and thousands of dollars planning for something that might not even happen and, if it does happen, lasts about five hours.
The easiest solution is to have a good way to switch to a page that simply allows a signup. People will sign up, and you can email them when the storm has passed.
More elaborate solutions rely on being able to scale quickly. That's firstly a software issue (can you connect to a DB on another server? can you do load balancing?). Secondly, your hosting solution needs to support fast expansion. Amazon EC2 comes to mind, or maybe Slicehost. With both services you can easily start new instances ("Let's move the database to a different server") and expand your instances ("Let's upgrade the DB server to 4 GB RAM").
If you keep all data in the DB (including sessions), you can easily have multiple front-end servers. For the database I'd usually try a single server with the highest resources available, but only because I haven't worked with DB replication and it used to be quite hard to do, at least with MySQL. Things might have improved.
The app designer needs to think about scaling up (larger machines with more cores and higher performance) and/or scaling out (distributing workload across multiple systems). The IT guy needs to work out how to best support that. The network is what you look at first, because obviously everything rides on top of it. Starting at the border, that usually means network load balancers and redundant routers being served by multiple providers. You can also look at geographic caching services and apps such as cachefly.
You want to reduce your bottlenecks as much as possible. You also want to design the environment so that it can be scaled out as needed without much work. Do the design work up front and it'll mean fewer headaches when you do get dugg.
Some ideas (from what I have used in past and current projects):
To boost performance (if needed) you can put a reverse-proxying, caching Squid in front of your server. Of course that only works if you don't use session keys and if the pages are somewhat static (meaning they change only once an hour or so) and not personalised.
With Squid you can speed up a bloated and slow CMS like TYPO3, getting the performance of static websites with the comfort of a CMS.
You can outsource large files to external services like Amazon S3, saving your server's bandwidth.
And if you are able to spend some money (three figures per month), you can also use a Content Delivery Network. With that in place you automatically get scaling, high availability, and low latency for your users. Of course, your pages must be cacheable, so session keys and personalised pages are a no-no. If designed carefully and with CDNs in mind, you can at least cache SOME content, like pictures, videos, and static stuff.
The load goes up, as other answers have mentioned.
You'll also get an influx of new users/blog comments/votes from bored folks who are only really interested in vandalism. This is mostly a problem for blogs which allow completely anonymous commenting, where some dreadful stuff will be entered. The blog platform might have spam filters sufficient to block it, but manual intervention is frequently required to clean up remaining drivel.
Even a little barrier to entry, like requiring a user name or email address even if no verification is done, will dramatically reduce the volume of the vandalism.