Are Rose::DB::Object::Cached objects cached in memory across different processes? - perl

Are RDBOC objects cached across different processes? I'm thinking of running it under mod_perl, and it would factor into things, even though it would mostly be used on things that don't change (much).
Also, do relationships referencing RDBOC objects use the cache when you'd intuitively expect them to?

Rose::DB::Object::Cached caches objects in plain-old (non-shared) memory. Under mod_perl, this means that each Apache process has its own cache. You could, however, cache your objects on server startup. All of those cached objects would then be shared with each Apache child process. This is most useful for read-only objects that you don't ever expect to change for the life of the server.
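To make that pre-fork pattern concrete, here is a rough analogy in Python rather than Perl (the loader and its data are made up): the parent process warms a plain in-process cache before forking, and every child inherits a copy of it.

import os

CACHE = {}  # plain per-process memory, analogous to Rose::DB::Object::Cached's cache

def load_countries_from_db():
    # stand-in for a real database query
    return {"US": "United States", "FR": "France"}

# the parent warms the cache once, before forking any workers
CACHE["countries"] = load_countries_from_db()

for _ in range(2):
    if os.fork() == 0:
        # each child inherits a copy of the already-warm cache:
        # reads cost no database access, but any write stays private to this child
        print(os.getpid(), CACHE["countries"]["FR"])
        os._exit(0)

for _ in range(2):
    os.wait()

In mod_perl terms, the warming step is what you would do in startup.pl before Apache forks its children.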
For more flexible caching options, check out Rose::DBx::Object::Cached::CHI.
As for your second question, Rose::DB::Object::Cached only reads from and writes to the cache on load() and save(). Most relationship methods use Manager queries to get objects and so will not read from the Rose::DB::Object::Cached cache.

Related

How to do caching if you can't afford misses at all?

I'm developing an app that is processing incoming data and currently needs to hit the database for each incoming datapoint. The problem is twofold:
the database can't keep up with the load
the database returns results for less than 5% of the queries
The first idea is to cache the data from the relational database into something like Redis to improve lookup speed. But all the regular caching strategies rely on the fact that you can fall back to the database if needed and fetch data from there. This is problematic in my case because for 95% of the queries there is nothing in the database and I don't have anything to store in the cache. I can of course store the empty results in the cache but that would mean that 95% (or even more, depending on the composition of data) of my cache storage would be rubbish.
The preferred way to do it would be to implement a caching system that doesn't have any misses: everything from the database is always present in the cache and therefore if it's not in the cache, then it's not in the database. After looking around though I found that the consistency of Redis does not seem reliable enough to always make that assumption - if the key doesn't exist in Redis, how can I be 100% sure that it doesn't exist in the database (assuming that we're not in the midst of an update)? It is a strong requirement that if there is a row in the database about an incoming datapoint, then it needs to be found and can't just be missed out on.
How do I go about designing a caching system that will always have the same data as the relational database - without having a fallback to look the data up in the database? Redis might not be the tool but what would you recommend? Is there a pattern or a keyword that I should look up that I haven't thought of?
There already is such a cache in the database: shared buffers. So all you have to do is to set shared_buffers big enough to contain the whole database and restart. Soon the whole database will be cached, and reading will cause no more I/O and will be fast.
That also works if you cannot cache the whole database, as long as you only need to access part of it: PostgreSQL will then just cache those 8kB pages that are in use.
In my opinion, adding another external caching system can never do better than that. That is particularly true if data are ever modified: any external caching system would have to make sure that its data are not stale, which would introduce an additional overhead.
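For example, here is a minimal sketch of raising shared_buffers via SQL, assuming superuser access and the psycopg2 driver (the connection string and size are placeholders):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")  # placeholder connection string
conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
with conn.cursor() as cur:
    # size this to hold the whole database, or at least the working set
    cur.execute("ALTER SYSTEM SET shared_buffers = '16GB'")
conn.close()
# shared_buffers only takes effect after a full server restart, e.g.:
#   pg_ctl restart -D /path/to/data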

Should I have two separate MongoDB instances?

My server queries the db often.
But more often than not, the query retrieves unchanged data.
Therefore I would like to create and store a cached result.
My main MongoDB instance is stored at a remote address, and therefore takes slightly longer to respond than a local MongoDB instance would. I therefore thought it would be beneficial to run an additional, smaller, more static MongoDB instance on localhost.
That way, real-time queries would run on the remote main DB, while smaller, time-critical queries would run against the cached collections on localhost for speed.
Is this something that can be done?
Is it something people recommend to avoid?
How would I set up two connections, one to my main remote server and one to my local server?
This seems wrong to me:
var mongooseMain = require ('mongoose');
var mongooseLocal = require ('mongoose');
mongooseMain.connect(mainDBInfo.url);
mongooseLocal.connect(localDBInfo.url);
In principle, you have the right idea! Caching is a big part of building performant web applications.
First of all, MongoDB wants to cache everything it's using in memory and has a very well designed system for deciding what to keep in memory and what to toss out of its cache. When an object that is not in its cache is asked for, it has to be read from disk. When MongoDB reads from disk instead of memory, it's called a page fault.
Of course, this memory cache is on a remote server so you still have network latency to deal with.
To eliminate this latency, I would recommend saving the serialized objects you read from often, but rarely write to, in Redis. This is what Redis was built to do. It's basically a dictionary (key:value) which you can easily SET and GET from. You can run redis-server easily on your local machine and even use SETEX to add your objects to the dictionary with some unique key and an expiry for when they should be evicted from the cache.
You can also manually evict objects from the cache whenever they do get updated (I would recommend re-writing them to the cache at this moment). Then, whenever you need an object, just make sure you always try to read from your cache first and fall back to MongoDB if the cache returns null for a key.
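Here is a minimal sketch of that read-through pattern in Python, using the redis and pymongo packages (the connection URLs, collection name, and key scheme are assumptions):

import json
import redis
from pymongo import MongoClient

cache = redis.Redis(host="localhost", port=6379)         # local cache
db = MongoClient("mongodb://remote-host:27017")["mydb"]  # remote main DB (placeholder URL)

def get_article(article_id):
    key = f"article:{article_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no round trip to the remote DB
    doc = db.articles.find_one({"_id": article_id}, {"_id": 0})
    if doc is not None:
        cache.setex(key, 600, json.dumps(doc))  # SETEX: store with a 600 s expiry
    return doc

def update_article(article_id, fields):
    db.articles.update_one({"_id": article_id}, {"$set": fields})
    cache.delete(f"article:{article_id}")  # evict (or re-write) on update

Note that this also answers the two-connections question: you simply hold one client object per server, rather than two copies of the same driver module.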
Check it out and good luck with your application!

Use a global NSCache or one per class?

I'm wondering if it's better to use one global instance of NSCache or several, one for each component that needs it.
For example, I have several view subclasses that draw content into images and cache them to avoid regenerating them all the time.
Is it better to have one instance of NSCache per class or just one centralized NSCache for the whole app?
I don't mean one cache per instance of an object! I'm talking about one instance of NSCache for each class.
It obviously depends, but I would generally vote for one cache for each type of cached object. That way, you can have different countLimit values for each, e.g. to specify things like "keep the 50 most recently rendered thumbnails", "keep the 5 most recently downloaded large images", "keep the 10 most recently downloaded PDFs", etc.
For really computationally expensive tasks, I also employ two tier caching, NSCache for optimal performance, and saving to a temporary/cache directory in the local file system to avoid costly download cycles while not consuming RAM.
If cached items might be shared across components, you might as well have a unified cache - lookups won't be significantly more expensive and you at least have a chance of reducing redundant copies of your cached objects.
But if the cached items are unique per component, mixing them in a cache is pointless and from a code readability perspective likely confusing. Keep the caches separate, in that case. That also lets you more precisely control them - e.g. you could evict more aggressively in caches from components not being immediately used.
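NSCache itself is an Apple API, but the per-type idea is easy to sketch in any language. Here is a rough Python analogue with a made-up LRU class, one instance per object type, each with its own count limit:

from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache with a count limit, loosely mimicking NSCache's countLimit."""
    def __init__(self, count_limit):
        self.count_limit = count_limit
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.count_limit:
            self._items.popitem(last=False)  # evict the least recently used item

# one cache per type of cached object, each with its own limit
thumbnail_cache = LRUCache(count_limit=50)   # the 50 most recently rendered thumbnails
large_image_cache = LRUCache(count_limit=5)  # the 5 most recently downloaded large images
pdf_cache = LRUCache(count_limit=10)         # the 10 most recently downloaded PDFs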

is memcached just instantiating another virtual operating system?

I have read a few tutorials on memcached, and I have a few questions about using it to ease the burden of requests on the backing database.
What is being instantiated to allow memcached to operate?
Is it virtual operating systems with, say, MySQL installed, or is the database in its entirety being stored in RAM?
My other question: say I have a blog that uses memcached, and a user's browser requests data; the request first checks memcached for the data, sees that the data exists, and it is displayed to that user.
What if the data being requested doesn't match what is in the original database because I had updated it myself? How will the cache know that I changed it?
Is it always checking whether the data in the db is the same as what is cached?
From the memcached front-page:
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
Although memcached is frequently used with MySQL, it has no particular ties to MySQL or any other database. It is just a simple key-value store providing constant time (O(1)) access to data cached by key. The data is stored in memory by the memcached process. (Much of this is explained on the FAQ).
Regarding your second question, it is really your application's responsibility to ensure that memcached is notified of any changes. You can do this via reasonable expiration periods on your cached data, or by using a script or the command-line interface to manually purge stale entries. Some frameworks will handle notifying memcached of changes, provided the change is made through the framework. Ultimately, if you need to ensure that users always have access to the latest data in real time, then caching is not a good solution for your problem. Caching works on the principle that it's OK to occasionally serve up stale data -- you should construct your application so that it caches data that can be stale, but always uses look-ups to authoritative sources for data that must be fresh.
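As a concrete sketch of that responsibility in Python, using the pymemcache client (the key scheme, TTL, and the stand-in "database" are just examples):

from pymemcache.client.base import Client

mc = Client(("localhost", 11211))
DB = {}  # stand-in for the real database

def get_post(post_id):
    key = f"post:{post_id}"
    post = mc.get(key)           # note: memcached hands values back as bytes
    if post is None:
        post = DB.get(post_id)                # fall back to the database
        if post is not None:
            mc.set(key, post, expire=300)     # cache for five minutes
    return post

def update_post(post_id, body):
    DB[post_id] = body                        # write to the database...
    mc.delete(f"post:{post_id}")              # ...and notify the cache of the change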
1.
You start a memcached server on every machine you want to use, assigning each an amount of memory to dedicate to memcached.
Then, through the memcached client library, your application uses the combined memory of all of those servers.
NB: there is no way to know which single server a given object will be stored on.
2.
The expiry mechanism is simple: you can set a timeout on each object, and when the timeout elapses the system will delete that object.
When you store an object you assign it a key, typically a hash, because you don't want two objects to end up with the same key.
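Point 1 is, for example, what pymemcache's HashClient exposes in Python: you list every server, and the client hashes each key to decide where it lives (the host names below are placeholders):

from pymemcache.client.hash import HashClient

# the client, not the servers, decides which server owns each key
mc = HashClient([("cache1.example.com", 11211),
                 ("cache2.example.com", 11211)])
mc.set("session:42", "some serialized state", expire=600)
print(mc.get("session:42"))  # routed to whichever server the key hashes to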

why memcached instead of hashmap

I am trying to understand what the need would be to go with a solution like memcached. It may seem like a silly question - but what does it bring to the table if all I need is to cache objects? Won't a simple hashmap do?
Quoting from the memcache web site, memcache is…
Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering. Memcached is simple yet powerful. Its simple design promotes quick deployment, ease of development, and solves many problems facing large data caches. Its API is available for most popular languages.
At heart it is a simple key/value store.
A key word here is distributed. In general, quoting from the memcache site again,
Memcached servers are generally unaware of each other. There is no crosstalk, no synchronization, no broadcasting. The lack of interconnections means adding more servers will usually add more capacity as you expect. There might be exceptions to this rule, but they are exceptions and carefully regarded.
I would highly recommend reading the detailed description of memcache.
Where are you going to put this hashmap? That's what memcached is doing for you. Any structure you implement in PHP lives only until the request ends. If you throw stuff in a persistent cache, you can fetch it back out for other requests, instead of rebuilding the data.
I know that this question is rather old, but in addition to being able to share a cache across multiple servers, there is another aspect that is not mentioned in other answers: value expiration.
If you store the values in a HashMap, and that HashMap is bound to the application context, it will keep growing in size unless you expire items in some way. Memcached expires objects lazily for maximum performance.
When an item is added to memcached, it can have an expiration time, for instance 600 seconds. After the object has expired it will just remain there, but if another client asks for it, memcached will purge it and return null.
Similarly, when memcached's memory is full, it will look for the first expired item of adequate size and evict it to make room for the new item. Lastly, it can also happen that the cache is full and there isn't any item to expire, in which case it will replace the least recently used items.
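A toy sketch of that lazy-expiration idea in Python (this is the concept, not how memcached is actually implemented):

import time

class LazyExpiringCache:
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._store[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]  # expired: purged lazily, only when accessed
            return None
        return value

cache = LazyExpiringCache()
cache.set("greeting", "hello", ttl=600)  # kept for 600 seconds, then purged on access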
Using a fully fledged cache system also lets you replicate the cache across many servers, or simply scale out to handle a lot of parallel requests, all while keeping replies acceptably fast.
There is an (old) article that compares different caching systems used by PHP:
https://www.percona.com/blog/2006/08/09/cache-performance-comparison/
Basically, file caching is faster than memcached.
So to answer the question, I believe you would get better performance using a file-based cache system.
Here are the results from the tests of the article:
Cache Type                          Cache Gets/sec
Array Cache                                 365000
APC Cache                                    98000
File Cache                                   27000
Memcached Cache (TCP/IP)                     12200
MySQL Query Cache (TCP/IP)                    9900
MySQL Query Cache (Unix Socket)              13500
Selecting from table (TCP/IP)                 5100
Selecting from table (Unix Socket)            7400