Pass Varnish backend response to Solr - command-line

Does anybody know if there is an option to combine Varnish with Solr?
What I'd like to do is:
User requests a URL
Varnish doesn't have a cached version, or only has an outdated one
Varnish calls the backend and finally receives the response
This is the point where I'd like to hook in and pass the backend response to "./bin/solr post ..." so that my Solr index is updated immediately every time a new content version is delivered.
Is this possible?
Thanks in advance
Boris

This is a twofold issue. First, sticking Varnish in front of Solr works as you'd expect: it takes load away from Solr and allows Varnish to return content without having to query Solr.
However, you should keep your indexing process separate from the Varnish pipeline, otherwise you could have a very bad time if multiple threads start asking for an indexing run within a short period of time. The proper way of doing this is to set a sensible time to live for responses in Varnish and expire content explicitly (through a ban or something similar in Varnish, or by attaching an index version identifier to your requests to Solr), but launch and perform indexing outside of Varnish's delivery of documents.
When indexing completes, you issue a ban to Varnish that tells it any existing cached responses are invalid; this makes Varnish start querying your backend again. This way Varnish can do what it's great at, caching content, and you keep the logic that decides when an index update is necessary outside of Varnish.
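A minimal sketch of what that invalidation step could look like from the command line (the "/search" URL pattern is an assumption; adjust it to wherever your Solr-backed pages live):
# push the new content into Solr first (as in the question)
./bin/solr post ...
# then tell Varnish that any cached responses under /search are now stale
varnishadm ban req.url '~' '^/search'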
And while Solr does effective caching of its own, Varnish does a far better job for repeated queries (it only has to consider the cached response, without looking at anything further down the chain) and can thus take the load of repetitive queries off Solr.

Related

How to resolve degradation / high response time on AEM Author server

When the team runs a performance test against the AEM pre-prod author servers, the author server load immediately increases sharply and response times go up. We need a way to reduce the author server response time while these test cases are executing.
Add a dispatcher in front of the author server. Though usually intended for publish instances, dispatchers can be set up for authors as well, and they take load off the server by returning common files from the cache. Of course, make sure to configure it properly so that editable content is not cached.

Is it possible to enforce a max upload size in Plack::Middleware without reading the entire body of the request?

I've just converted a PageKit (mod_perl) application to Plack. This means that I now need some way to enforce the POST_MAX/MAX_BODY that Apache2::Request would have previously handled. The easiest way to do this would probably be just to put nginx in front of the app, but the app is already sitting behind HAProxy and I don't see how to do this with HAProxy.
So, my question is how I might go about enforcing a maximum body size in Plack::Middleware without reading the entire body of the request first?
Specifically I'm concerned with file uploads. Checking size via Plack::Request::Upload is too late, since the entire body would have been read at this point. The app will be deployed via Starman, so psgix.streaming should be true.
I got a response from Tatsuhiko Miyagawa via Twitter. He says, "if you deploy with Starman it's too late even with the middleware because the buffering is on. I'd do it with nginx".
This answers my particular question as I'm dealing with a Starman deployment.
He also noted that "rejecting a bigger upload before reading it on the backend could cause issues in general".
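For reference, the nginx-side limit is a single directive; a minimal sketch (the 10m limit and the upstream address are placeholders, not part of the original setup):
server {
    listen 80;
    # reject request bodies larger than 10 MB with 413 Request Entity Too Large
    client_max_body_size 10m;
    location / {
        proxy_pass http://127.0.0.1:5000;   # wherever Starman is listening
    }
}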

Developing with backbone.js, how can I detect when multiple users(browsers) attempt to update?

I am very new to backbone.js (and MVC with JavaScript), and while reading several resources about backbone.js to adopt it in my project, I now have a question: how can I detect when multiple users (browsers) attempt to update the same data (and prevent it)?
My project is a tool for editing surveys/polls for users who want to create and distribute their own surveys. So far, my web app maintains a list of edit commands fired by the browser, sends it to the server, and the server performs a batch update.
What I did was: each survey maintains a version number, and a browser must include that version number in its update request. If the request's version number does not match the one on the server, the request fails and the user must reload the page (implementing real concurrent editing is not easy). When a browser's update succeeds, it receives the new version number from the server in the Ajax response, and a browser may only send a new update request once its previous one has completed.
Now I am interested in RESTful APIs and MV* patterns, but I'm having a hard time solving this issue. What is the best/most common approach?
There is a common trick: instead of using version numbers, use timestamps in your DB and then try to UPDATE ... WHERE timestamp = model.timestamp. If the update affects zero rows, return an HTTP 409 (Conflict) response and ask the user to refresh the page in the save() error callback. You can even use local storage to merge changes and show the conflicting versions side by side.
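A rough sketch of that conditional update in SQL (the surveys table, column names, and placeholder syntax are made up for illustration):
-- succeeds only if the row hasn't changed since the client last read it
UPDATE surveys
   SET body = :new_body, updated_at = NOW()
 WHERE id = :id AND updated_at = :client_timestamp;
-- zero rows affected -> someone else updated first -> respond with 409 Conflict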

nginx - clear cache on http PUT or POST

I am testing nginx as a reverse proxy cache in front of REST resources (Spring MVC + ETag). Every GET is cached fine.
Is it possible to clear the nginx cache for a specific resource whenever it gets updated through an HTTP PUT or POST?
PS: I am also testing Varnish Cache, and I have the same question there.
Thanks!
I had the same question and found that Nginx allows you to "purge" the cached requests:
proxy_cache_path /tmp/cache keys_zone=mycache:10m levels=1:2 inactive=60s;

# map the PURGE request method to a flag that enables purging for that request
map $request_method $purge_method {
    PURGE   1;
    default 0;
}

server {
    listen 80;
    server_name www.example.com;

    location / {
        proxy_pass http://localhost:8002;
        proxy_cache mycache;
        proxy_cache_purge $purge_method;
    }
}
Then
curl -X PURGE -D - "http://www.example.com/*"
See:
https://www.nginx.com/products/nginx/caching/#proxy_cache_purge
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_purge
So I guess you could make those calls after each POST/PUT/PATCH/DELETE.
I see 4 caveats:
Seems way more complicated (and probably slower) than the invalidation process offered by other caching strategies. So if you need to invalidate very often I'd probably recommend another caching strategy (e.g. caching the database queries in Redis)
I didn't try it myself and I'm not sure if it's available with the basic version of Nginx (the doc talks only about Nginx Plus, which is their paid product)
What's the granularity offered by this solution? Like is it possible to invalidate a request like GET /cars?year=2010&color=red and keep GET /cars?year=2020&color=green cached?
I don't know if it's a really common thing to do, I couldn't find more about it.
You have not specified what sort of caching you are implementing as there are several options within Nginx.
From your query, I assume you are referring to static files like images which are uploaded to your site.
Proxy Caching
This is where Nginx caches the response from a backend server. There is no point in activating this for static files in the first place. The proxy cache is simply a store on your hard disc, and the cost of retrieving such files is the same as if you just let Nginx serve them from their actual locations on the filesystem.
FastCGI Caching
Same as Proxy Caching. No point for the type of files that may be uploaded using POST or PUT.
Memcache
Here, the items are stored in RAM, and there is a benefit to this. There are the basic Memcached module and the extended Memc module, both of which have procedures for adding items to and removing them from the cache.
Your query, however, suggests you are using one of the first two, and as said, there is absolutely no benefit in doing this for the type of files that may be uploaded using POST or PUT. When cached by Nginx, they will be read from the disc location they are kept in, just as they would be if served from their original disc location; there is also the overhead of copying them from the original disc location to another disc location.
Except of course if I am missing something.

Memcache(d) vs. Varnish for speeding up 3 tier web architecture

I'm trying to speed up my benchmark (3 tier web architecture), and I have some general questions related to Memcache(d) and Varnish.
What is the difference?
It seems to me that Varnish is behind the web server, caching web pages, and doesn't require changes to code, just configuration.
On the other hand, Memcached is a general-purpose caching system, mostly used to cache results from the database, and it does require changes to code (the get method does a cache lookup first).
Can I use both? Varnish in front of the web server and Memcached for database caching?
What is a better option?
(scenario 1 - mostly write,
scenario 2 - mostly read,
scenario 3 - read and write are similar)
Varnish is in front of the web server; it works as a caching reverse HTTP proxy.
You can use both.
Mostly write -- Varnish will need to have affected pages purged. This will result in an overhead and little benefit for modified pages.
Mostly read -- Varnish will probably cover most of it.
Similar read & write -- Varnish will serve a lot of the pages for you, and Memcache will provide data for pages that mix known and new content, allowing you to generate those pages faster.
An example that could apply to stackoverflow.com: adding this comment invalidated the page cache, so this page would have to be cleared from Varnish (and also my profile page, which probably isn't worth caching to begin with; remembering to invalidate all affected pages may be a bit of an issue). All the comments, however, are still in Memcache, so the database only has to write this comment. Nothing else needs to be done by the database to generate the page. All the comments are pulled from Memcache, and the page is re-cached until somebody affects it again (perhaps by voting my answer up). Again, the database writes the vote, all other data is pulled from Memcache, and life is fast.
Memcache saves your DB from doing a lot of read work; Varnish saves your dynamic web server CPU by letting it generate pages less frequently (and lightens the DB load a bit as well, beyond what Memcache already covers).
My experience comes from using Varnish with Drupal. In as simple terms as possible, here's how I'd answer:
In general, Varnish handles unauthenticated traffic (detected via cookie) and Memcached will cache data for authenticated traffic.
So use both.
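To illustrate the "unauthenticated via cookie" split, here is a rough VCL sketch (assuming Varnish 4+ and a Drupal-style session cookie whose name starts with SESS; adjust the check to your application):
sub vcl_recv {
    # logged-in users carry a session cookie: send them straight to the backend
    if (req.http.Cookie ~ "SESS") {
        return (pass);
    }
    # anonymous traffic: drop cookies so responses can be cached and shared
    unset req.http.Cookie;
}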