memcache, multiple instances vs single instance

memcache, multiple instances vs single instance - memcached

I've got an website running php/mysql.
The app will be 'copied' multiple times. each instance will get its own database, set of php files (wildcard)domain, you get the idea.
Now some parts of the app is cached using memcache and I am wondering if it'd be better/faster/more convient to use
one memcache instance per app instance
or one big memcache instance and prefix each key with some unique token per app instance
I would guess the advantage to using a single memcache instance would be a large 'index table'
where as multiple memcache instances would cause more overhead
Thanks for any help

Use one memcached instance and prefix your keys -- this is the easiest way to ensure optimal resource utilization. If you have a separate memcached instance per application instance, an unused, idle instance claims memory that your active instances can't use.

Related

Different Resources allocations for different databases on same Postgres instance

I have a postgres instance with multiple databases. Can I limit the amount of resources one database (lower priority) will use, but allow the other database (higher priority) to use as many resources as it wants?

This is a limit on Postgres itself, as mentioned here there is no way to prioritize resources to one database over the other. This would have to be done through the OS.
Since you are using Cloud SQL which is a managed service, you won't be able to do such things as temper with the OS in order to achieve this. If you want to do this, you would need to use a normal Compute Engine instance so you are able to manage the resources on it.
Hope you find this information useful!

Should I have two seperate mongoDB instances?

My server queries the db often.
But more often than not, the query retrieves unchanged data.
Therefore I would like to create and store a cached result.
My main mongoDB is stored in a remote address, and therefore takes slightly longer to respond as compared to a local mongoDB instance. I thought it would be beneficial to have therefore an additional, smaller, more static mongoDB running on localhost.
Such that, real-time queries will run on the remote main DB, and smaller, time efficient queries will run on the cached collections in localhost for optimizing speed.
Is this something that can be done?
Is it something people recommend to avoid?
How would I set two connections, one to my main remote server and one
to my local server?
This seems wrong to me
var mongooseMain = require ('mongoose');
var mongooseLocal = require ('mongoose');
mongooseMain.connect(mainDBInfo.url);
mongooseLocal.connect(localDBInfo.url);

In principal, you have the right idea! Caching is a big part of building performant web applications.
First of all, MongoDB wants to cache everything it's using in memory and has a very well designed system of deciding what to keep in memory and what to toss out of it's cache. When an object is asked for that is not in it's cache, it has to read it from disk. When MongoDB reads from disk instead of memory it's called a page fault.
Of course, this memory cache is on a remote server so you still have network latency to deal with.
To eliminate this latency, I would recommend saving the serialized objects you read from often, but rarely write to, in Redis. This is what Redis was built to do. It's basically a dictionary (key:value) which you can easily SET and GET from. You can run redis-server easily on your local machine and even use SETEX to set your objects to the dictionary with some unique key and an expiry for when it should be evicted from the cache.
You can also manually evict objects from the cache whenever they do get updated (I would recommend re-writing them to the cache at this moment). Then, whenever you need an object, just make sure you always try to read from your cache first and fall back to MongoDB if the cache returns null for a key.
Check it out and good luck with your application!

G-WAN Key-Value Store

What would you consider the best solution to store via G-WAN Key-Value Store my values in RAM and multi-threaded, and able to be used by all my scripts (from other virtual servers or not) ?
Thank you in advance.

I wish to store different values in different "storages" so as to be able to recover each one via a "key" (type char).
The G-WAN KV store does that (for any type of data: binary too).
Once your application will have millions of concurrent users, one way to speed-up lookups will be to use different G-WAN servers to host either a partitioned data set or a redundant data set (it all depends on the type of your application).
The G-WAN reverse-proxy featuring an elastic load-balancer makes such things this almost transparent for developers.
I do not care that the data is lost when you restart g-wan.
Then you won't have to use a persistant layer like mySQL, etc.
So it would be fine (I think) to have a persistent pointer but I'm not sure that this is the most suitable solution
Look at the persistence.c example for about how to share common data among all worker threads in G-WAN.
But you can avoid that if you are using G-WAN with one single worker thread (./gwan -w 1). One thread is more than enough to start developing and even to operate your application until the point you will need to process more requests.
With one single thread, you can just use a static pointer to your G-WAN KV store (unless different scripts need to access it).

is memcached just instantiating another virtual operating system?

I have read a few tutorials on memcached and I have a few questions, in order to ease the pain of requests to the default database.
What is being instantiated to allow memcached to operate?
Is it virtual operating systems with say mysql installed or is the database in its entirety being stored in ram?
My other question is say i have a blog and using memcache and a user comes to request data from the browser and the request first checks the memcache for the data and sees that the data exists and is displayed to that user.
What if the data being requested doesn't match what is on the original database because i had updated it myself. how will the cache know that i changed it?
Is it always checking to see if the data on the db is the same as what is cached?

From the memcached front-page:
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
Although memcached is frequently used with MySQL, it has no particular ties to MySQL or any other database. It is just a simple key-value store providing constant time (O(1)) access to data cached by key. The data is stored in memory by the memcached process. (Much of this is explained on the FAQ).
Regarding your second question, it is really your application / your responsibility to ensure that memcached is notified of any changes. You can do this via reasonable expiration periods on your cached data or by using a script or the command line interface to manually purge stale entries. Some frameworks will handle notifying memcached of changes provided the change is made through the framework. Ultimately, if you need to ensure that users always have access to the latest data in real-time, than caching is not a good solution for your problem. Caching works on the principle that it's ok to occasionally serve up stale data -- you should construct your application so that it caches data that can be stale, but always uses look-ups to authoritative sources for data that must be fresh.

1
You will start a memcached server in every machine you need, assigning an amount of memory to dedicate to memcached.
Then with the library memcached you will use the amount of memory on every single server.
NB There is no manner to know in which server a single object will be stored.
2
The mechanism of duplicates is easy: you can set a timeout for the object. When the timeout elapses the system will delete that object.
To store an object you will assign to that object a key as an hash because you don t want that 2 object have the same key.

why memcached instead of hashmap

I am trying to understand what would be the need to go with a solution like memcached. It may seem like a silly question - but what does it bring to the table if all I need is to cache objects? Won't a simple hashmap do ?

Quoting from the memcache web site, memcache is…
Free & open source, high-performance,
distributed memory object caching
system, generic in nature, but
intended for use in speeding up
dynamic web applications by
alleviating database load.
Memcached is an in-memory key-value
store for small chunks of arbitrary
data (strings, objects) from results
of database calls, API calls, or page
rendering. Memcached is simple yet
powerful. Its simple design promotes
quick deployment, ease of development,
and solves many problems facing large
data caches. Its API is available for
most popular languages.
At heart it is a simple Key/Value
store
A key word here is distributed. In general, quoting from the memcache site again,
Memcached servers are generally
unaware of each other. There is no
crosstalk, no syncronization, no
broadcasting. The lack of
interconnections means adding more
servers will usually add more capacity
as you expect. There might be
exceptions to this rule, but they are
exceptions and carefully regarded.
I would highly recommend reading the detailed description of memcache.

Where are you going to put this hashmap? That's what it's doing for you. Any structure you implement on PHP is only there until the request ends. If you throw stuff in a persistent cache, you can fetch it back out for other requests, instead of rebuilding the data.

I know that this question is rather old, but in addition to being able to share a cache across multiple servers, there is also another aspect that is not mentioned in other answers and is the values expiration.
If you store the values in a HashMap, and that HashMap is bound to the Application context, it will keep growing in size, unless you expire items in some ways. Memcached expires object lazily for maximum performance.
When an item is added to the memcache, it can have an expiration time, for instance 600 seconds. After the object is expired it will just remain there, but if another object asks for it, it will purge it and return null.
Similarly, when memcached memory is full, it will look for the first expired item of adequate size and expire it to make room for the new item. Lastly, it can also happen that the cache is full and there isn't any item to expire, in which case it will replace the least used items.

Using a fully flagded cache system usually allow you to replicate the cache on many servers, or just scale to many server just to scale a lot of parallel requestes, all this remaining acceptable fast in term of reply.

There is an (old) article that compares different caching systems used by php:
https://www.percona.com/blog/2006/08/09/cache-performance-comparison/
Basically, file caching is faster than memcached.
So to answer the question, I believe you would have better performances using a file based cache system.
Here are the results from the tests of the article:
Cache Type Cache Gets/sec
Array Cache 365000
APC Cache 98000
File Cache 27000
Memcached Cache (TCP/IP) 12200
MySQL Query Cache (TCP/IP) 9900
MySQL Query Cache (Unix Socket) 13500
Selecting from table (TCP/IP) 5100
Selecting from table (Unix Socket) 7400