I've been using memcache and decided to try out APC, but I'm having problems with it actually reading values and respecting expiry dates. I can set a 10 minute expiry on a piece of data. On a refresh, the page runs a MySQL query and caches the result under a key. On the next load, it checks whether the key is set, and if it is, it grabs the data from the cache instead of the DB. Except it doesn't always do that: it still runs the query about half the time, regardless of whether the key is set or not. The keys that are set don't always expire when they're supposed to either, and the command that deletes a key from the cache doesn't always do that either.
I haven't had these problems with memcache, which performed like clockwork.
Make sure APC isn't full -- it's possible that your keys are being pushed out of memory. The default configuration on many systems only allocates 32 megabytes, which is actually extremely easy to fill with PHP bytecode alone.
The best way to gain visibility into your APC cache utilization is via the apc.php script that ships with APC.
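As a quick check (a sketch only; the 64M value below is just an example), you can compare allocated versus free shared memory from PHP and raise apc.shm_size in php.ini if the cache is running out of room:
// apc_sma_info() reports APC's shared memory segments and the free space left.
$sma   = apc_sma_info();
$total = $sma['num_seg'] * $sma['seg_size'];
printf("APC memory: %d MB total, %d MB free\n",
       $total / 1048576, $sma['avail_mem'] / 1048576);
// If it is nearly full, something like apc.shm_size=64M in php.ini gives the
// user cache room to breathe.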
MongoDB was accidentally corrupted and is now in 'repair' (WiredTiger, old version 3.6).
In my case the repair really only matters for some of the instances, so if there is an option for it, I would like to use it and, to start with, skip the less necessary indexes, probably the erroneous ones.
(But the repair job seems to process all of an instance's data, especially with WiredTiger, which keeps data in a sort of 'interleaved' layout, so there may be no way to prioritize.)
Second, repair seems to run longer on instances with more indexes, and longer on instances with more data (even with fewer indexes).
In any case, the progress log messages go to standard output, and estimating the time seems difficult (it just keeps going, sometimes with no log output for four hours). The instances range in size from under 100 GB to over 1 TB.
Can the logs shown on screen be saved to a file?
If a particular instance has problems (e.g. complexity, poor structure, etc. that caused the crash in the first place), could the repair leave some of the data broken while rescuing the rest?
And if the repair finally fails, is there practically no other way to recover the instances?
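On the logging question, one common approach (an assumption about your setup, not something taken from your environment) is to redirect mongod's output during the repair so it both stays on screen and gets saved to a file:
mongod --dbpath /data/db --repair 2>&1 | tee repair.log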
I've been looking around for a way to increase the expiration date for all keys stored in a memcached instance.
The reasoning behind that is simple:
I have memcache caching results from DB queries for a period of 300 seconds.
I sometimes need to perform DB operations that require me to shut down the MySQL instance for a couple of minutes.
To achieve that, I usually go into my configuration file, increase the "lifetime" setting for memcache to 24 hours, let some time pass, and then shut down MySQL.
My problem is that some of the items that were stored for 300 seconds are not re-pulled from the SQL DB during those "few minutes", and therefore not cached, which leads to errors for my end users.
What I would like to achieve is to tell memcache to increase all currently stored keys' lifetime by a specific amount.
Is that possible?
Thanks.
Advice: Don't. You are currently trying to use memcache as a substitute for your DB while the DB is down.
Your DB should NEVER be down. If you need to do maintenance, look into having two DB servers (master-master) so you can take one of them down and do the maintenance while the other keeps working.
Memcache is meant to be used to speed things up, not as a hacky way to solve other problems.
I understand that using memcache for this probably looks like a simple and good idea, but trust me, it is not.
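For completeness: memcached has no bulk "extend everything" command, only a per-key touch. A minimal sketch with the PHP memcached extension (the $trackedKeys list is hypothetical, since memcached cannot enumerate its own keys, and older libmemcached builds need the binary protocol for touch):
$m = new Memcached();
$m->setOption(Memcached::OPT_BINARY_PROTOCOL, true);
$m->addServer('127.0.0.1', 11211);
// $trackedKeys must be maintained by your application; memcached can't list keys.
foreach ($trackedKeys as $key) {
    $m->touch($key, time() + 86400);   // push the expiry out to 24 hours
}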
My application needs only read access to all of its databases. One of those databases (db_1) hosts a collection coll_1 whose entire contents* need to be replaced periodically**.
My goal is to have no or very little effect on read performance for servers currently connected to the database.
Approaches I could think of so far:
1. renameCollection
Build a temporary collection coll_tmp, then use renameCollection with dropTarget: true to move its contents over to coll_1 (a sketch of this command follows the list below). The downside of this approach is that, as far as I can tell, renameCollection does not copy indexes, so once the collection is renamed, coll_1 would need reindexing. While I don't have a good estimate of how long this would take, I would think that query performance will be significantly affected until reindexing is complete.
2. TTL Index
Instead of straight up replacing, use a time-to-live index to expire documents after the chosen replacement period. Insert new data every time period. This seems like a decent solution to me, except that for our specific application, old data is better than no data. In this scenario, if the cron job to repopulate the database fails for whatever reason, we could potentially be left with an empty coll_1 which is undesirable. I think this might have a negligible effect, but this solution also requires on-the-fly indexing as every document is inserted.
3. Communicate current database to read-clients
Simply use two different databases (or collections?) and inform connected clients which one is more recent. This solution would allow for finishing indexing the new coll_1_alt (and then coll_1 again) before making it available. I personally dislike the solution since it couples the read clients very closely to the database itself, and of course communication channels are always imperfect.
4. copyDatabase
Use copyDatabase to rename (designate) an alternate database db_tmp as db_1. db_tmp would also have a collection coll_1. Once reindexing is complete on db_tmp.coll_1, copyDatabase could be used to simply rename db_tmp to db_1. It seems that this would require dropping db_1 before renaming, leaving a window in which data won't be accessible.
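A minimal sketch of the renameCollection command from option 1, using the mongodb/mongodb PHP library (the connection string and all names are assumptions):
$client = new MongoDB\Client('mongodb://localhost:27017');
// renameCollection must be run against the admin database; dropTarget: true
// drops the existing coll_1 and puts coll_tmp in its place.
$client->selectDatabase('admin')->command([
    'renameCollection' => 'db_1.coll_tmp',
    'to'               => 'db_1.coll_1',
    'dropTarget'       => true,
]);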
Ideally (and naively), I'd just set db_1 to be something akin to a symlink, switching to the most current database as needed.
Does anyone have good suggestions on how to achieve the desired effect?
*There are about 10 million documents in coll_1.
** The current plan is to replace the collection once every 24 hours. The replacement interval might get as low as once every 30 minutes, but not lower.
The problem that you point out in option 4, you will also have with option 1: dropTarget also means that the collection is not available.
Another alternative could be to keep both the old and the new data in the same collection, with a "version ID" that you then still have to communicate to your clients so they can include it in their queries. That at least saves you from the reindexing you pointed out for option 1.
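A minimal sketch of that version-ID pattern with the mongodb/mongodb PHP library (the meta collection, field names, and connection string are assumptions):
$client = new MongoDB\Client('mongodb://localhost:27017');
// The loader bumps this document only after the new batch is fully inserted
// and indexed (a compound index starting with versionId helps the reads).
$meta = $client->db_1->coll_meta->findOne(['_id' => 'current']);
// Readers only ever see the active version; superseded documents can be
// deleted in the background later.
$cursor = $client->db_1->coll_1->find(['versionId' => $meta['versionId']]);
foreach ($cursor as $doc) {
    // ... render $doc
}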
I think your best bet is actually option 3, and it's the most equivalent to changing a symlink, except it is on the client side.
I have a web application which would perform better if some database operations were cached. The data is static, and new data is added every day. To reduce database read operations I'll be using memcached.
What will happen if I don't give an expiry time for the data I put in memcached? Will it affect performance by consuming more RAM? Is it fine to ditch the expiry time when adding data to the cache?
PS: We deploy the webapp on AWS with nginx, PHP, and MySQL.
Presumably when your app is still running in the year 2050, some things that you put in cache way back in 2012 will no longer be relevant. If you don't provide some sort of expiration, only a cache reset (e.g. server restart) will end up flushing the cache.
Unless you have infinite memory (and I'm pretty sure AWS doesn't provide that ;-) it is wise to add some expiration time to cached items.
Even though Memcached will expire items based on a least recently used mechanism (thanks @mikewied for pointing that out), the cache will still fill entirely before memcache begins evicting items based on LRU. Unfortunately, Memcache's LRU algorithm is per slab, not global. This means that LRU-based evictions can be less than optimal. See Memcached Memory Allocation and Optimization.
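A minimal illustration with the PHP memcached extension (the key name, TTL, and $reportRows data are assumptions): give items an explicit expiration so stale entries age out on their own instead of waiting for slab-level LRU eviction.
$m = new Memcached();
$m->addServer('127.0.0.1', 11211);
// $reportRows is hypothetical data from the daily MySQL query.
// A third argument of 0 would mean "never expire" and rely purely on LRU;
// an explicit TTL lets yesterday's data drop out by itself.
$m->set('report:2012-06-01', $reportRows, 86400);   // expire after 24 hours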
We are trying to update memcached objects when we write to the database to avoid having to read them from database after inserts/updates.
For our forum post object we have a ViewCount field containing the number of times a post is viewed.
We are afraid that we are introducing a race condition by updating the memcached object, as the same post could be viewed at the same time on another server in the farm.
Any idea how to deal with this kind of issue? It would seem that some sort of locking is needed, but how do you do that reliably across servers in a farm?
If you're dealing with data that doesn't necessarily need to be updated realtime, and to me the view count is one of them, then you could add an expires field to the objects that are stored in memcache.
Once that expiration happens, it'll go back to the database and read the new value, but until then it will leave it alone.
Of course for new posts you may want this updated more often, but you can code for this.
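A minimal sketch of that soft "expires" field (assuming $m is the connected Memcached instance; the key name, 300 second window, and load_post_from_db() helper are all hypothetical):
$key    = 'post:1234';
$cached = $m->get($key);
if ($cached === false || $cached['expires'] < time()) {
    $post   = load_post_from_db(1234);               // hypothetical DB helper
    $cached = ['data' => $post, 'expires' => time() + 300];
    $m->set($key, $cached);                          // the field, not a hard TTL, decides freshness
}
$post = $cached['data'];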
Memcache only stores one copy of your object in one of its instances, not in many of them, so I wouldn't worry about object locking or anything. That is for the database to handle, not your cache.
Edit:
Memcache offers no guarantee that, when you're getting and setting from varied servers, your data won't get clobbered.
From memcache docs:
A series of commands is not atomic. If you issue a 'get' against an item, operate on the data, then wish to 'set' it back into memcached, you are not guaranteed to be the only process working on that value. In parallel, you could end up overwriting a value set by something else.
Race conditions and stale data
One thing to keep in mind as you design your application to cache data, is how to deal with race conditions and occasional stale data.
Say you cache the latest five comments for display on a sidebar in your application. You decide that the data only needs to be refreshed once per minute. However, you neglect to remember that this sidebar display is rendered 50 times per second! Thus, once 60 seconds roll around and the cache expires, suddenly 10+ processes are running the same SQL query to repopulate that cache. Every time the cache expires, a sudden burst of SQL traffic will result.
Worse yet, you have multiple processes updating the same data, and the wrong one ends up updating the cache. Then you have stale, outdated data floating about.
One should be mindful of the possible issues in populating or repopulating the cache. Remember that the process of checking memcached, fetching from SQL, and storing into memcached is not atomic at all!
I'm thinking: could a solution be to store the view count separately from the Post object, and then do an INCR on it? Of course this would require reading two separate values from memcached when displaying the information.
Memcached operations are atomic. The server process will queue the requests and serve each one completely before going on to the next, so there's no need for locking.
Edit: memcached has an increment command, which is atomic. You just have to store the counter as a separate value in the cache.
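A minimal sketch of keeping the counter outside the Post object (the key name is an assumption):
$m = new Memcached();
$m->addServer('127.0.0.1', 11211);
$key = 'post:1234:views';
// add() only succeeds if the key does not exist yet, so concurrent
// web servers cannot reset an existing counter.
$m->add($key, 0);
// increment() is atomic on the memcached server; no cross-server locking needed.
$m->increment($key);
$views = (int) $m->get($key);   // the second read mentioned above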
We encountered this in our system. We modified get so that:
If the value is unset, it sets it with a flag ('g') and an 8 second TTL, and returns false so the calling function generates it.
If the value is not flagged (!== 'g') then unserialize and return it.
If the value is flagged (==='g') then wait 1 second and try again until it's not flagged. It will eventually be set by the other process, or expired by the TTL.
Our database load dropped by a factor of 100 when we implemented this.
function get($m, $key) {
    $value = $m->get($key);
    if ($value === false) {
        // Nothing cached yet: flag the key so other processes wait,
        // then return false so the caller regenerates the value.
        $m->set($key, 'g', 8);   // 8 second TTL on the flag
        return false;
    }
    while ($value === 'g') {
        // Another process is generating the value; wait until it is
        // set, or until the flag's TTL expires.
        sleep(1);
        $value = $m->get($key);
    }
    return $value;
}
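A usage sketch under the same assumptions (the key name and query helper are hypothetical):
$m = new Memcached();
$m->addServer('127.0.0.1', 11211);
$posts = get($m, 'forum:front_page_posts');
if ($posts === false) {
    $posts = fetch_front_page_posts();               // hypothetical expensive query
    $m->set('forum:front_page_posts', $posts, 300);  // overwrites the 'g' flag
}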