I have a web application that would perform well if some database operations were cached. The data is static, and new data is added every day. To reduce database reads, I'll be using memcached.
What will happen if I don't give an expiry time for the data I put in memcached? Will it affect performance by consuming more RAM? Is it a good idea to omit the expiry time when adding data to the cache?
PS: We use AWS to deploy the web app with nginx, PHP, and MySQL.
Presumably when your app is still running in the year 2050, some things that you put in cache way back in 2012 will no longer be relevant. If you don't provide some sort of expiration, only a cache reset (e.g. server restart) will end up flushing the cache.
Unless you have infinite memory (and I'm pretty sure AWS doesn't provide that ;-) it is wise to add some expiration time to cached items.
Even though memcached will evict items based on a least-recently-used (LRU) mechanism (thanks @mikewied for pointing that out), the cache will still fill entirely before memcached begins evicting items based on LRU. Unfortunately, memcached's LRU algorithm is per slab, not global. This means that LRU-based evictions can be less than optimal. See Memcached Memory Allocation and Optimization.
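To make the eviction behaviour concrete, here is a minimal sketch of a fixed-size LRU cache. It is a toy stand-in, not memcached's actual per-slab allocator; the class name `TinyLRUCache` and the capacity of 2 are assumptions chosen for illustration.

```python
from collections import OrderedDict


class TinyLRUCache:
    """Toy fixed-size cache; NOT memcached's real per-slab allocator."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


cache = TinyLRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now more recently used than "b"
cache.set("c", 3)  # the cache is full, so "b" is evicted
```

Nothing is removed until the cache is full, which is why items stored without an expiry can linger until memory pressure forces them out.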
My server queries the db often.
But more often than not, the query retrieves unchanged data.
Therefore I would like to create and store a cached result.
My main MongoDB is hosted at a remote address and therefore takes slightly longer to respond than a local MongoDB instance would. I therefore thought it would be beneficial to run an additional, smaller, more static MongoDB on localhost.
That way, real-time queries would run against the remote main DB, while smaller, time-sensitive queries would run against the cached collections on localhost to optimize speed.
Is this something that can be done?
Is it something people recommend to avoid?
How would I set up two connections, one to my main remote server and one to my local server?
This seems wrong to me:
var mongooseMain = require ('mongoose');
var mongooseLocal = require ('mongoose');
mongooseMain.connect(mainDBInfo.url);
mongooseLocal.connect(localDBInfo.url);
In principle, you have the right idea! Caching is a big part of building performant web applications.
First of all, MongoDB wants to cache everything it's using in memory, and it has a very well-designed system for deciding what to keep in memory and what to toss out of its cache. When an object is requested that is not in its cache, it has to read it from disk. When MongoDB reads from disk instead of memory, it's called a page fault.
Of course, this memory cache is on a remote server so you still have network latency to deal with.
To eliminate this latency, I would recommend saving the serialized objects you read from often, but rarely write to, in Redis. This is what Redis was built to do. It's basically a dictionary (key:value) which you can easily SET and GET from. You can run redis-server easily on your local machine and even use SETEX to set your objects to the dictionary with some unique key and an expiry for when it should be evicted from the cache.
You can also manually evict objects from the cache whenever they do get updated (I would recommend re-writing them to the cache at this moment). Then, whenever you need an object, just make sure you always try to read from your cache first and fall back to MongoDB if the cache returns null for a key.
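The read-from-cache-first, fall-back-to-the-database flow described above could be sketched roughly as follows. To keep the example runnable without a server, `FakeRedis` is an in-memory stand-in that mimics only SETEX, GET and DEL; `load_user_from_db`, the `user:<id>` key format and the 300-second TTL are all assumptions made for illustration.

```python
import json
import time


class FakeRedis:
    """In-memory stand-in for the few Redis commands used here (SETEX, GET, DEL)."""

    def __init__(self):
        self._data = {}

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]
            return None
        return value

    def delete(self, key):
        self._data.pop(key, None)


cache = FakeRedis()
db_reads = []  # records every trip to the (hypothetical) remote database


def load_user_from_db(user_id):
    """Hypothetical slow lookup against the remote MongoDB."""
    db_reads.append(user_id)
    return {"id": user_id, "name": "user-%d" % user_id}


def get_user(user_id, ttl_seconds=300):
    key = "user:%d" % user_id
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)      # cache hit: no database trip
    user = load_user_from_db(user_id)  # cache miss: fall back to the database
    cache.setex(key, ttl_seconds, json.dumps(user))
    return user


def update_user(user_id, user):
    """After writing to the database (omitted here), rewrite the cached copy."""
    cache.setex("user:%d" % user_id, 300, json.dumps(user))


first = get_user(7)   # miss: reads the database and fills the cache
second = get_user(7)  # hit: served from the cache
```

With a real Redis client the same three calls would go over the network to redis-server; the control flow stays the same.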
Check it out and good luck with your application!
Below is the scenario:
This is an access statistics system, similar to Blogger's overview stats feature.
Statistics data is stored persistently in a database (such as MySQL), while a key-value cache (currently memcached) holds the access counts. Each access only updates the value in the cache.
Now the question is: how do I sync the latest count values back to the database?
A common solution is to write back at some interval, but memcached discards items when there is not enough space, so some updates may be lost.
So I think a better solution would be for memcached to send a message (like JMS) when it discards an item, so that I can sync that item to the database.
It seems that memcached does not provide this feature. Is there any other key-value cache that can do this?
Or is there a better solution?
Memcached is a cache, so you need to use it as one. When you update the access counts in memcached, you should also enqueue the updates so they can be written asynchronously to the database. That way, counts that fall out of the cache can be reloaded from the database.
I like the idea of memcached enqueuing items that are about to be discarded, but it's probably not going to happen in the main project due to performance considerations.
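A minimal sketch of the enqueue-and-flush approach suggested above, assuming plain in-process structures as stand-ins: `cache_counts` for memcached, `pending` for a durable queue, and `db_counts` for the MySQL table. All three names, and the helper functions, are hypothetical.

```python
from collections import Counter, deque

cache_counts = {}      # stand-in for the memcached counters
pending = deque()      # stand-in for a durable queue of updates
db_counts = Counter()  # stand-in for the MySQL statistics table


def record_access(page_id):
    """Bump the cached counter and enqueue the update for the database."""
    cache_counts[page_id] = cache_counts.get(page_id, 0) + 1
    pending.append(page_id)


def flush_pending():
    """Drain the queue and apply the aggregated increments to the database."""
    batch = Counter()
    while pending:
        batch[pending.popleft()] += 1
    db_counts.update(batch)  # one aggregated write per page
    return batch


def get_count(page_id):
    """Read from the cache, reloading from the database after an eviction."""
    if page_id not in cache_counts:
        cache_counts[page_id] = db_counts[page_id]
    return cache_counts[page_id]


for page in ["home", "home", "about"]:
    record_access(page)
flush_pending()
cache_counts.pop("home")  # simulate memcached evicting the counter
```

In this sketch a counter reloaded before the queue is flushed would miss the pending increments, so a real system would need to flush first or add the queued amounts on reload.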
I have read a few tutorials on memcached and I have a few questions, in order to ease the pain of requests to the default database.
What is being instantiated to allow memcached to operate?
Is it virtual operating systems with say mysql installed or is the database in its entirety being stored in ram?
My other question: say I have a blog that uses memcached, and a user requests data from the browser. The request first checks memcached for the data, sees that the data exists, and displays it to that user.
What if the data being requested doesn't match what is in the original database because I have updated it myself? How will the cache know that I changed it?
Is it always checking to see if the data on the db is the same as what is cached?
From the memcached front-page:
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
Although memcached is frequently used with MySQL, it has no particular ties to MySQL or any other database. It is just a simple key-value store providing constant time (O(1)) access to data cached by key. The data is stored in memory by the memcached process. (Much of this is explained on the FAQ).
Regarding your second question, it is really your application's responsibility (that is, yours) to ensure that memcached is notified of any changes. You can do this via reasonable expiration periods on your cached data or by using a script or the command line interface to manually purge stale entries. Some frameworks will handle notifying memcached of changes, provided the change is made through the framework. Ultimately, if you need to ensure that users always have access to the latest data in real time, then caching is not a good solution for your problem. Caching works on the principle that it's OK to occasionally serve up stale data -- you should construct your application so that it caches data that can be stale, but always uses look-ups to authoritative sources for data that must be fresh.
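One common way for the application to notify the cache is to delete the cached entry whenever it writes to the database. The sketch below assumes two plain dictionaries as stand-ins for the database and for memcached; `read_post`, `update_post` and the `post:1` key are hypothetical names.

```python
database = {"post:1": "first draft"}  # stand-in for the authoritative store
cache = {}                            # stand-in for memcached


def read_post(key):
    """Serve from the cache when possible, otherwise load and cache the value."""
    if key in cache:
        return cache[key]
    value = database[key]
    cache[key] = value
    return value


def update_post(key, value):
    """Write to the database, then drop the stale cached copy."""
    database[key] = value
    cache.pop(key, None)  # the next read repopulates the cache


before = read_post("post:1")
update_post("post:1", "edited")
after = read_post("post:1")
```

The cache never checks the database on its own; it only looks fresh because every write path goes through `update_post`. A change made directly in the database would stay invisible until the entry expired or was purged.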
1
You start a memcached server on every machine you need, assigning each one an amount of memory to dedicate to memcached.
Then, through the memcached client library, your application uses the combined memory of all those servers as a single cache.
NB: In practice you neither choose nor need to know which server stores a given object; the client library decides based on the key.
2
Stale copies are handled simply: you can set a timeout for the object. When the timeout elapses, the system deletes that object.
To store an object, you assign it a key, typically a hash, because you do not want two objects to share the same key.
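As a rough illustration of both points, the sketch below derives a key by hashing and then maps that key to one server out of a list. The server addresses, `make_key` and `pick_server` are hypothetical, and the simple modulo scheme is an assumption: real client libraries often use consistent hashing instead.

```python
import hashlib

# Hypothetical server list; real deployments configure this in the client.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def make_key(namespace, identifier):
    """Build a short, whitespace-free key by hashing the identifier."""
    digest = hashlib.md5(str(identifier).encode("utf-8")).hexdigest()
    return "%s:%s" % (namespace, digest)


def pick_server(key, servers=SERVERS):
    """Map a key to a server with a simple hash-modulo scheme (an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


key = make_key("article", "SELECT * FROM articles WHERE id = 42")
server = pick_server(key)
```

The same key always maps to the same server, so any web server using the same list finds the object without the memcached servers talking to each other.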
I am trying to understand why one would need a solution like memcached. It may seem like a silly question, but what does it bring to the table if all I need is to cache objects? Won't a simple hashmap do?
Quoting from the memcached web site, memcached is…
"Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering. Memcached is simple yet powerful. Its simple design promotes quick deployment, ease of development, and solves many problems facing large data caches. Its API is available for most popular languages.
At heart it is a simple Key/Value store"
A key word here is distributed. In general, quoting from the memcached site again,
"Memcached servers are generally unaware of each other. There is no crosstalk, no synchronization, no broadcasting. The lack of interconnections means adding more servers will usually add more capacity as you expect. There might be exceptions to this rule, but they are exceptions and carefully regarded."
I would highly recommend reading the detailed description of memcache.
Where are you going to put this hashmap? That's what it's doing for you. Any structure you implement in PHP only exists until the request ends. If you put data in a persistent cache, you can fetch it back out for other requests instead of rebuilding it.
I know that this question is rather old, but in addition to being able to share a cache across multiple servers, there is another aspect that the other answers do not mention: value expiration.
If you store the values in a HashMap, and that HashMap is bound to the application context, it will keep growing in size unless you expire items in some way. Memcached expires objects lazily for maximum performance.
When an item is added to memcached, it can have an expiration time, for instance 600 seconds. After the object has expired it will just remain there, but if a client asks for it, memcached will purge it and return nothing (null or false, depending on the client library).
Similarly, when memcached's memory is full, it will look for the first expired item of adequate size and reclaim it to make room for the new item. Lastly, it can also happen that the cache is full and there isn't any expired item to reclaim, in which case it will replace the least recently used items.
Using a full-fledged cache system usually allows you to replicate the cache across many servers, or simply to scale out to many servers to handle many parallel requests, all while keeping response times acceptably fast.
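Lazy expiration as described above can be sketched in a few lines. This is a toy model, not memcached's implementation: `LazyExpiringCache` is a hypothetical class, and the clock is injected so the example can move time forward without sleeping.

```python
class LazyExpiringCache:
    """Toy cache that checks expiry only when a key is read."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in seconds
        self._items = {}

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # purge only now that someone asked for it
            return None
        return value

    def stored_keys(self):
        return set(self._items)


now = [0.0]  # fake clock so the example does not need to sleep
cache = LazyExpiringCache(clock=lambda: now[0])
cache.set("greeting", "hello", ttl_seconds=600)
now[0] = 601.0  # the item has expired, but nothing has touched it yet
still_stored = "greeting" in cache.stored_keys()
value = cache.get("greeting")  # this read purges the expired item
```

No background sweep is needed, which keeps writes and reads cheap; the cost is that expired items occupy memory until they are read or reclaimed.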
There is an (old) article that compares different caching systems used by php:
https://www.percona.com/blog/2006/08/09/cache-performance-comparison/
Basically, in that benchmark, file caching is faster than memcached.
So to answer the question, I believe you would get better performance using a file-based cache system.
Here are the results from the article's tests:

| Cache Type | Cache Gets/sec |
|---|---|
| Array Cache | 365000 |
| APC Cache | 98000 |
| File Cache | 27000 |
| Memcached Cache (TCP/IP) | 12200 |
| MySQL Query Cache (TCP/IP) | 9900 |
| MySQL Query Cache (Unix Socket) | 13500 |
| Selecting from table (TCP/IP) | 5100 |
| Selecting from table (Unix Socket) | 7400 |
I added memcached to my website.
And the site started running very slowly.
If I disable memcached, the application goes back to working quickly.
Why is this happening, and how can I avoid it?
Thanks,
kukuwka
That is impossible to answer without knowing how you are using it and what data you are storing. For example, if you are using it as the HttpCache provider (if you are using ASP.NET), and you were previously using the in-process cache provider, then it will behave very differently; the in-process provider has no serialization or network costs, so you might be storing some insanely large objects in the cache. That is fine when it is in-process, but for any other provider this is very very bad; you will have to transfer and deserialize for every usage (and serialize and transfer for every storage).
There are ways to improve the serialization/deserialization/network times, but it sounds like you are simply storing too much data (or inappropriate data) in the cache at the moment. I'd address that first, and then look at tuning it.
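To get a feel for the serialization cost, one could compare how many bytes a small and a large object turn into before they are sent to an out-of-process cache. The sketch below uses Python's `pickle` as a stand-in for whatever serializer the cache provider uses; the two sample objects are made up for illustration.

```python
import pickle

# Made-up objects: one small record and one large result set.
small = {"id": 1, "name": "alice"}
large = {"rows": [{"id": i, "payload": str(i) * 20} for i in range(10000)]}

# An in-process cache stores references; an out-of-process cache must
# serialize on every store and deserialize on every fetch.
small_bytes = pickle.dumps(small)
large_bytes = pickle.dumps(large)
restored = pickle.loads(large_bytes)

print(len(small_bytes), len(large_bytes))
```

Every byte counted here also has to cross the network on each read, so measuring the serialized size of what you cache is a reasonable first diagnostic.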
Memcached doesn't mean "make things faster." It provides fast and very scalable access to a shared cache of something that is otherwise expensive to acquire.
If you add caching to something that's cheap, it may end up being slower.
For example, if it takes you five seconds to do something and you can cache that, then you'll save almost five seconds on each subsequent request assuming the results are still useful.
If it takes you a few nanoseconds to do it, then it'll slow you down considerably to fetch the results over the network.
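The trade-off can be put into rough numbers. The sketch below assumes every request checks the cache first, a 1 ms cache round trip, and a 90% hit rate; all three figures, and the function `expected_cost`, are assumptions for illustration.

```python
def expected_cost(compute_seconds, cache_seconds, hit_rate):
    """Average time per request when every request checks the cache first.

    Each request pays the cache round trip; misses also pay the computation.
    """
    return cache_seconds + (1.0 - hit_rate) * compute_seconds


# Expensive operation (5 s): caching cuts the average to about half a second.
slow_op = expected_cost(compute_seconds=5.0, cache_seconds=0.001, hit_rate=0.9)

# Cheap operation (1 microsecond): the cache round trip dominates the cost.
cheap_op = expected_cost(compute_seconds=0.000001, cache_seconds=0.001, hit_rate=0.9)
```

Under these assumptions the five-second operation drops to roughly 0.5 s on average, while the microsecond operation becomes about a thousand times slower than simply recomputing it.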