I haven't coded/tested this yet, but assume that you have two memcached instances of very small size, e.g. 1 MB each.
What happens when each cache is full and you try to "put" another object into cache? Assume all objects in cache have infinite expiry. Any ideas?
When the cache is full, subsequent inserts cause older data to be purged in least recently used (LRU) order.
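A quick, untested sketch that illustrates this, assuming the PHP Memcached extension and a local memcached started with memcached -m 1 (a 1 MB cache):

$m = new Memcached();
$m->addServer('127.0.0.1', 11211);
$payload = str_repeat('x', 100000); // ~100 KB per value
for ($i = 0; $i < 50; $i++) {
    $m->set("item:$i", $payload);   // ~5 MB total, no expiry (0 is the default)
}
// The cache can only hold a fraction of that, so the oldest / least
// recently used items have been evicted to make room for newer ones.
var_dump($m->get('item:0'));  // very likely false (evicted)
var_dump($m->get('item:49')); // very likely still present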
I have a part of a large table in PostgreSQL 12 that I would like to be cached at all times. The queries are such that the same rows would (almost) never be read twice, so I can't rely on the automatic caching. I'm reading up on pg_prewarm, which seems suitable for loading the cache, but I can't find anything about preventing it from being overwritten over time. Any hints? Thanks!
AFAIK there is no documented feature in PostgreSQL to lock pages for a specific object in the database cache.
That is true. The only way you can make sure that the table stays cached is to have shared_buffers big enough to contain the whole database.
But in practice that is not a big problem: if a block isn't used regularly, it can drop out of the cache, and you have to read it in again when it is next used. A single block contains many rows, though, so it only drops out if none of those rows are needed.
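For what it's worth, loading the table into the cache with pg_prewarm is a one-liner; here is a minimal sketch using PDO (connection details and table name are placeholders, and the extension must already be installed with CREATE EXTENSION pg_prewarm):

$pdo = new PDO('pgsql:host=localhost;dbname=mydb', 'user', 'password');
// Read the table's blocks into shared_buffers; returns the number of blocks.
$blocks = $pdo->query("SELECT pg_prewarm('my_big_table')")->fetchColumn();
echo "Prewarmed $blocks blocks\n";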
Items stored in memcached seem to disappear without reason (TTL: 86400, but sometimes gone within 60 seconds). However, there is enough free space, and the stats report zero evictions.
The items that get lost seem to be the larger ones, and they seem to disappear after some other big items are added. Could it be that the slab for larger items is full and items are being evicted without being reported?
Memcached version 1.4.5.
Keys can get evicted before their expiration in memcached; this is a side effect of how memcached handles memory (see this answer for more details).
If the items you are storing are large enough that this is becoming a problem, memcached may be the wrong tool for the task you are trying to perform. You essentially have 2 practical options in this scenario:
break down the data you're trying to cache into smaller chunks (see the sketch after this list)
if this isn't feasible for any reason, you will have to use some sort of permanent storage, the nature of which will depend on the nature of the data you're trying to store (choices include Redis, MongoDB, a SQL database, the filesystem, etc.)
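For the first option, here is a rough, untested sketch of chunking a large string value, assuming the PHP Memcached extension (the 900 KB chunk size keeps each item under memcached's default 1 MB limit; key names are made up):

function cache_set_chunked(Memcached $m, $key, $data, $ttl = 0, $chunkSize = 900000) {
    $chunks = str_split($data, $chunkSize);
    // Store the chunk count so the reader knows how many pieces to fetch.
    $m->set("$key:count", count($chunks), $ttl);
    foreach ($chunks as $i => $chunk) {
        $m->set("$key:$i", $chunk, $ttl);
    }
}

function cache_get_chunked(Memcached $m, $key) {
    $count = $m->get("$key:count");
    if ($count === false) return false;
    $data = '';
    for ($i = 0; $i < $count; $i++) {
        $chunk = $m->get("$key:$i");
        if ($chunk === false) return false; // a chunk was evicted; treat as a miss
        $data .= $chunk;
    }
    return $data;
}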
Below is the scenario:
There is an access statistics system, much like Blogger's overview stats feature.
Statistics are stored persistently in a database (such as MySQL), while a key-value cache (currently memcached) holds the access counts; each access only updates the value in the cache.
Now the question is how to sync the latest count values back to the database.
A normal solution is to write back at some interval, but memcached will discard items when there is not enough space, so some updates may be lost.
So I think a better solution would be if memcached could send a message (like JMS) when discarding an item; then I could sync that item to the database.
It seems that memcached does not provide this feature. Is there any other key-value cache that can do this?
Or is there any better solution?
Memcached is a cache, so you need to use it as one. When you update the access counts in memcached, you should also enqueue the updates so they can be written asynchronously to the database. That way, counts that fall out of the cache can be reloaded from the database.
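A rough sketch of that pattern, assuming the PHP Memcached extension, PDO, and a hypothetical count_updates queue table that a background worker periodically aggregates and flushes back to the posts table:

function record_view(Memcached $m, PDO $pdo, int $postId) {
    $key = "views:$postId";
    // Fast path: atomic server-side increment of the cached count.
    if ($m->increment($key) === false) {
        // Key missing (evicted or never set): reseed it from the database.
        $stmt = $pdo->prepare('SELECT view_count FROM posts WHERE id = ?');
        $stmt->execute([$postId]);
        $m->add($key, ((int)$stmt->fetchColumn()) + 1);
    }
    // Enqueue the update; the worker writes totals back asynchronously,
    // so counts that fall out of the cache can be reloaded from the database.
    $pdo->prepare('INSERT INTO count_updates (post_id) VALUES (?)')->execute([$postId]);
}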
I like the idea of memcached enqueuing items that are about to be discarded, but it's probably not going to happen in the main project due to performance considerations.
We are using Enterprise Library 5 and the CacheManager that it provides in our web application. Everything seems to be working fine up to the point where we start a heavy load test on the application.
We are caching records from the database using a key based on their ID. We don't always request a single item from the cache; sometimes we need to get a list of items. For this we have a LINQ query that does a Select(e => CacheManager.GetData(id_from_list)) and returns the list of items from the cache.

Most of the time this works fine, but under heavy load the GetData method becomes a bottleneck because of the locking the cache manager performs on both read and write operations: essentially only one thread can read data from the cache at a time. We did create several cache managers based on the type of the items, which allows several threads to read from different cache managers, but the issue remains under heavy load (one bottleneck per cache manager). This improved the application up to a point, but not enough.
Has anyone else encountered the same problem, and did you find a way to overcome it?
NOTE: We tried caching lists of items and composing the key from the IDs of the items in the list. This did solve the problem, and CacheManager.GetData is no longer a bottleneck ... BUT ... obviously it is not a good solution, as we could end up with each item stored thousands of times in the cache across many lists.
You may consider adapting the CacheManager to use a read/write lock (which I think is much more suitable for this situation) instead of the exclusive locking that it uses now.
http://msdn.microsoft.com/en-us/library/system.threading.readerwriterlock.aspx
Basically, a read/write lock is appropriate when multiple reader threads need simultaneous access to the data, and only the occurrence of a write will cause incoming readers to block.
These have other problems when put under load, however, such as writer starvation: depending on the reader/writer lock implementation, a write will always wait for all reads to finish first, so with a constant stream of reads a write may never get a chance to happen.
We are trying to update memcached objects when we write to the database to avoid having to read them from database after inserts/updates.
For our forum post object we have a ViewCount field containing the number of times a post is viewed.
We are afraid that we are introducing a race condition by updating the memcached object, as the same post could be viewed at the same time on another server in the farm.
Any idea how to deal with this kind of issue? It would seem that some sort of locking is needed, but how do you do it reliably across servers in a farm?
If you're dealing with data that doesn't necessarily need to be updated in real time, and to me the view count is one such case, then you could add an expires field to the objects that are stored in memcached.
Once that expiration happens, it'll go back to the database and read the new value, but until then it will leave it alone.
Of course for new posts you may want this updated more often, but you can code for this.
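In its simplest form that's just a TTL on the cached object. A sketch, assuming the PHP Memcached extension and PDO (the key name and the 60-second TTL are examples):

function get_post(Memcached $m, PDO $pdo, int $id) {
    $post = $m->get("post:$id");
    if ($post === false) {
        // Expired (or evicted): re-read the fresh row, ViewCount included.
        $stmt = $pdo->prepare('SELECT * FROM posts WHERE id = ?');
        $stmt->execute([$id]);
        $post = $stmt->fetch(PDO::FETCH_ASSOC);
        $m->set("post:$id", $post, 60); // refreshed at most once a minute
    }
    return $post;
}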
Memcache only stores one copy of your object in one of its instances, not in many of them, so I wouldn't worry about object locking or anything. That is for the database to handle, not your cache.
Edit:
Memcached offers no guarantee that, when you're getting and setting from multiple servers, your data won't get clobbered.
From memcache docs:
A series of commands is not atomic. If you issue a 'get' against an item, operate on the data, then wish to 'set' it back into memcached, you are not guaranteed to be the only process working on that value. In parallel, you could end up overwriting a value set by something else.
Race conditions and stale data
One thing to keep in mind as you design your application to cache data, is how to deal with race conditions and occasional stale data.
Say you cache the latest five comments for display in a sidebar in your application. You decide that the data only needs to be refreshed once per minute. However, you neglect to remember that this sidebar is rendered 50 times per second! Thus, once 60 seconds roll around and the cache expires, suddenly 10+ processes are running the same SQL query to repopulate the cache. Every time the cache expires, a sudden burst of SQL traffic results.
Worse yet, you have multiple processes updating the same data, and the wrong one ends up updating the cache. Then you have stale, outdated data floating about.
One should be mindful of possible issues in populating or repopulating the cache. Remember that the process of checking memcached, fetching from SQL, and storing into memcached is not atomic at all!
I'm thinking: could a solution be to store the view count separately from the Post object, and then do an INCR on it? Of course this would require reading two separate values from memcached when displaying the information.
Memcached operations are atomic: the server process will queue the requests and serve each one completely before going on to the next, so there's no need for locking.

Edit: memcached has an increment command, which is atomic. You just have to store the counter as a separate value in the cache.
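A minimal sketch of that approach, assuming the PHP Memcached extension (the key name is made up):

$m = new Memcached();
$m->addServer('127.0.0.1', 11211);
// Seed the counter once; add() does nothing if the key already exists.
$m->add('viewcount:123', 0);
// Atomic server-side increment: safe even when several servers in the
// farm bump the same counter concurrently.
$views = $m->increment('viewcount:123');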
We encountered this in our system. We modified get() so that:
If the value is unset, it sets it to a flag ('g') with an 8-second TTL, and returns false so the calling function generates the value.
If the value is not flagged (!== 'g'), then unserialize and return it.
If the value is flagged (=== 'g'), then wait one second and try again until it is no longer flagged. It will eventually be set by the other process, or expire via the TTL.
Our database load dropped by a factor of 100 when we implemented this.
function get(Memcached $m, $key) {
    $value = $m->get($key);
    if ($value === false) {
        // Cache miss: plant the 'g' flag with an 8-second TTL and return
        // false so the caller regenerates (and then stores) the value.
        $m->set($key, 'g', 8);
    } else {
        // Another process is regenerating the value: poll until the flag
        // is replaced with real data or expires via the TTL.
        while ($value === 'g') {
            sleep(1);
            $value = $m->get($key);
        }
    }
    return $value;
}
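A hypothetical call site (build_report() and the 300-second TTL are made up):

$m = new Memcached();
$m->addServer('127.0.0.1', 11211);

$report = get($m, 'report:today');
if ($report === false) {
    // We planted the 'g' flag, so it's our job to regenerate and store.
    $report = build_report(); // hypothetical expensive generator
    $m->set('report:today', $report, 300);
}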