Clearing APC cache with postgres trigger - postgresql

I was looking for a suitable caching solution for a PHP app.
I decided to let the application do all the "first of all I must land on the right server within the cluster", so I use the much faster APC cache, rather then memcache.
It does include an overhead, to find(in terms of improving the caching) the right server to land on, but I kind of like it.
I heard there was a project pgmemcache, to for example clear outdated memcached entries from within postgres triggers.
I do handle outdated date my own way, but Im still curious if theres something out there to acces APC cache from within postgres triggers.
Thanks in advance,
kriscom

I don't see any equivalent of pgmemcache for APC. Pgmemcache is open source, so you could use it as a basis for creating an APC equivalent: https://github.com/Ormod/pgmemcache.
If it is OK for your cache to be a little stale, you could create a table in Postgres to function as an invalidation/update queue. Use a trigger to insert a row when the cache needs to be updated. Then create a PHP script that constantly polls the queue and performs the cache manipulations.
I would not suggest spreading your cache management across layers. Either do it all in you data access layer or all at the database layer but don't mix them.

Related

How to do caching if you can't afford misses at all?

I'm developing an app that is processing incoming data and currently needs to hit the database for each incoming datapoint. The problem is twofold:
the database can't keep up with the load
the database returns results for less than 5% of the queries
The first idea is to cache the data from the relational database into something like Redis to improve lookup speed. But all the regular caching strategies rely on the fact that you can fall back to the database if needed and fetch data from there. This is problematic in my case because for 95% of the queries there is nothing in the database and I don't have anything to store in the cache. I can of course store the empty results in the cache but that would mean that 95% (or even more, depending on the composition of data) of my cache storage would be rubbish.
The preferred way to do it would be to implement a caching system that doesn't have any misses: everything from the database is always present in the cache and therefore if it's not in the cache, then it's not in the database. After looking around though I found that the consistency of Redis does not seem reliable enough to always make that assumption - if the key doesn't exist in Redis, how can I be 100% sure that it doesn't exist in the database (assuming that we're not in the midst of an update)? It is a strong requirement that if there is a row in the database about an incoming datapoint, then it needs to be found and can't just be missed out on.
How do I go about designing a caching system that will always have the same data as the relational database - without having a fallback to look the data up in the database? Redis might not be the tool but what would you recommend? Is there a pattern or a keyword that I should look up that I haven't thought of?
There already is such a cache in the database: shared buffers. So all you have to do is to set shared_buffers big enough to contain the whole database and restart. Soon the whole database will be cached, and reading will cause no more I/O and will be fast.
That also works if you cannot cache the whole database, as long as you only need to access part of it: PostgreSQL will then just cache those 8kB-pages that are in use.
In my opinion, adding another external caching system can never do better than that. That is particularly true if data are ever modified: any external caching system would have to make sure that its data are not stale, which would introduce an additional overhead.

Should I have two seperate mongoDB instances?

My server queries the db often.
But more often than not, the query retrieves unchanged data.
Therefore I would like to create and store a cached result.
My main mongoDB is stored in a remote address, and therefore takes slightly longer to respond as compared to a local mongoDB instance. I thought it would be beneficial to have therefore an additional, smaller, more static mongoDB running on localhost.
Such that, real-time queries will run on the remote main DB, and smaller, time efficient queries will run on the cached collections in localhost for optimizing speed.
Is this something that can be done?
Is it something people recommend to avoid?
How would I set two connections, one to my main remote server and one
to my local server?
This seems wrong to me
var mongooseMain = require ('mongoose');
var mongooseLocal = require ('mongoose');
mongooseMain.connect(mainDBInfo.url);
mongooseLocal.connect(localDBInfo.url);
In principal, you have the right idea! Caching is a big part of building performant web applications.
First of all, MongoDB wants to cache everything it's using in memory and has a very well designed system of deciding what to keep in memory and what to toss out of it's cache. When an object is asked for that is not in it's cache, it has to read it from disk. When MongoDB reads from disk instead of memory it's called a page fault.
Of course, this memory cache is on a remote server so you still have network latency to deal with.
To eliminate this latency, I would recommend saving the serialized objects you read from often, but rarely write to, in Redis. This is what Redis was built to do. It's basically a dictionary (key:value) which you can easily SET and GET from. You can run redis-server easily on your local machine and even use SETEX to set your objects to the dictionary with some unique key and an expiry for when it should be evicted from the cache.
You can also manually evict objects from the cache whenever they do get updated (I would recommend re-writing them to the cache at this moment). Then, whenever you need an object, just make sure you always try to read from your cache first and fall back to MongoDB if the cache returns null for a key.
Check it out and good luck with your application!

using postgres server-side cursor for caching

To speed page generation for pages based on large postgres collections, we cache query results in memcache. However, for immutable collections that are very large, or that are rarely accessed, I'm wondering if saving server side cursors in postgres would be a viable alternate caching strategy.
The idea is that after having served a page in the middle of a collection "next" and "prev" links are much more likely to be used than a random query somewhere else in the collection. Could I have a cursor "WITH HOLD" in the neighborhood to avoid the (seemingly unavoidable) large startup costs of the query?
I wonder about resource consumption on the server. If the collection is immutable, saving the cursor shouldn't need very many resources, but I wonder how optimized postgres is in this respect. Any links to further documentation would be appreciated.
You're going to run into a lot of issues.
You 'd have to ensure the same user gets the same sql connection
You have to create a cleanup strategy
The cursors will be holding up vacuum operations.
You have to convince the connection pool to not clear the cursors
Probably other issues I have not mentioned.
In short: don't do it. How about precalculating the next/previous page in background, and storing it in memcached?
A good answer to this has previously been made Best way to fetch the continuous list with PostgreSQL in web
The questions are similar, essentially you store a list of PKs on the server with a pagination-token and an expiration.

Using memcache infront of a mongodb server

I am trying to understand how mongo's internal cache works and if it does eliminate using memcache. Our database size is around 200G and index fits in the memory but after the index not much free memory left on the server.
One of my colleague says mongo's internal cache will be as fast as memcache so no need to introduce another level of complexity by using memcache.
The scenario in my head is when we read the data from db, it's saved in memcache and next time it's directly read from the cache instead of going back to db server. If the data is changed and needs to be saved/updated, it's done on both memcache server and database server.
I have been reading about this but couldn't convince myself yet. So I'd really appreciate if someone could shed some light on this.
First thing is that a cache storage is different to a database. So MongoDB and SQL are different in purpose and usage when compared to Memcache.
Memcache is really good at lowering working set sizes for queries. For example: imagine a huge aggregated query with subselects and CASE statements and what not in SQL (think of the most complex query you can), doing this query in realtime all the time could cause the computer(s) to "thrash" (not to mention the problems client side).
However as everyone knows you need only summarise this query to another collection/table for it to be instantly faster. The real speed of memcache comes from the fact that it is a in memory key value store. This is where MongoDB could fail in speed because it is not memory stored, it is memory mapped but not stored.
MongoDB does no self caching, providing the query is "hot" and in LRU (this is where your working set comes in) you shouldn't notice much of a difference in response times. A good way to ensure a query is "hot" is to run it. Some people have a script of their biggest queries that they run to warm up the cache.
As I said memcache is a cache layer this is why:
If the data is changed and needs to be saved/updated, it's done on both memcache server and database server.
Makes me die a little inside. Many do blur the line between the DB and the cache layer.

why memcached instead of hashmap

I am trying to understand what would be the need to go with a solution like memcached. It may seem like a silly question - but what does it bring to the table if all I need is to cache objects? Won't a simple hashmap do ?
Quoting from the memcache web site, memcache is…
Free & open source, high-performance,
distributed memory object caching
system, generic in nature, but
intended for use in speeding up
dynamic web applications by
alleviating database load.
Memcached is an in-memory key-value
store for small chunks of arbitrary
data (strings, objects) from results
of database calls, API calls, or page
rendering. Memcached is simple yet
powerful. Its simple design promotes
quick deployment, ease of development,
and solves many problems facing large
data caches. Its API is available for
most popular languages.
At heart it is a simple Key/Value
store
A key word here is distributed. In general, quoting from the memcache site again,
Memcached servers are generally
unaware of each other. There is no
crosstalk, no syncronization, no
broadcasting. The lack of
interconnections means adding more
servers will usually add more capacity
as you expect. There might be
exceptions to this rule, but they are
exceptions and carefully regarded.
I would highly recommend reading the detailed description of memcache.
Where are you going to put this hashmap? That's what it's doing for you. Any structure you implement on PHP is only there until the request ends. If you throw stuff in a persistent cache, you can fetch it back out for other requests, instead of rebuilding the data.
I know that this question is rather old, but in addition to being able to share a cache across multiple servers, there is also another aspect that is not mentioned in other answers and is the values expiration.
If you store the values in a HashMap, and that HashMap is bound to the Application context, it will keep growing in size, unless you expire items in some ways. Memcached expires object lazily for maximum performance.
When an item is added to the memcache, it can have an expiration time, for instance 600 seconds. After the object is expired it will just remain there, but if another object asks for it, it will purge it and return null.
Similarly, when memcached memory is full, it will look for the first expired item of adequate size and expire it to make room for the new item. Lastly, it can also happen that the cache is full and there isn't any item to expire, in which case it will replace the least used items.
Using a fully flagded cache system usually allow you to replicate the cache on many servers, or just scale to many server just to scale a lot of parallel requestes, all this remaining acceptable fast in term of reply.
There is an (old) article that compares different caching systems used by php:
https://www.percona.com/blog/2006/08/09/cache-performance-comparison/
Basically, file caching is faster than memcached.
So to answer the question, I believe you would have better performances using a file based cache system.
Here are the results from the tests of the article:
Cache Type Cache Gets/sec
Array Cache 365000
APC Cache 98000
File Cache 27000
Memcached Cache (TCP/IP) 12200
MySQL Query Cache (TCP/IP) 9900
MySQL Query Cache (Unix Socket) 13500
Selecting from table (TCP/IP) 5100
Selecting from table (Unix Socket) 7400