Is memcache related to a database? - memcached

I have been browsing through a lot of websites and need expert advice on this one.
Can anyone please explain what exactly memcache is?
From what I understand, it is a distributed memory caching system used for dynamic web apps, but my main question is: do we need a database when we say 'memcache', or does the term 'memcache' not imply a database?
Please answer. Thank you.

No, you don't need a traditional database to use memcache. It is an in-memory hash table (dictionary) of key/value pairs, so it resides in RAM as a lookup table. Because of this, it is not persistent: whenever you restart your server, memcache is reset.

memcached is a specific program that runs a server that other programs can use to keep things in memory. It's something like an in-memory database, depending on your definition of database.
Caching something in memory can also be done generically, without memcached (pronounced memcache dee).
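To illustrate what "caching something in memory generically" can mean, here is a minimal sketch of an in-process key/value cache with optional expiry. The class name `TinyCache` and its methods are made up for this example; this is not how memcached is implemented, only the same basic idea (a dictionary in RAM that is gone when the process exits).

```python
import time


class TinyCache:
    """A toy in-memory key/value cache (hypothetical, not memcached itself)."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry timestamp or None)

    def set(self, key, value, ttl=None):
        # ttl is in seconds; None means the entry never expires
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._store[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return default
        return value


cache = TinyCache()
cache.set("user:42", {"name": "Ada"})
print(cache.get("user:42"))
print(cache.get("missing"))
```

Nothing here is written to disk, which is why such a cache starts empty after every restart.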

Related

mongodb performance for large document

I have a document that holds a big data structure in certain fields inside an array. It is slowing down my application because that data is read so frequently. I am thinking of a few solutions, but I need advice before I proceed, and possibly an even better solution. Here are my thoughts/questions:
Would it help to cache the data?
Should I use memcached or Redis as the caching engine, and why?
Would it help to read single fields from this document instead of reading all of it every time?
Should I do something else?
Caching will help because it keeps your database from being hit too often.
Memcached or Redis: it's up to you. I prefer Redis, but if you already have memcached running, that's fine.
If you have a cluster of servers, consider whether you need a centralized cache.
Caching a full document won't help with fetching a single field, because you cache the result of a query without knowing what it contains.
Your question needs more clarification. For example, how big is the data you are talking about: a couple of megabytes, or gigabytes? Factors like this change the solution. But assuming you have a couple of megabytes and want to avoid calling the database every time, the best solution is a cache. Which cache to choose also depends entirely on your situation. If your web application runs on one server, you can use an in-process cache such as the ASP.NET cache, which is very fast. This cache is stored in your application's heap, so you can put any object into it without serialization. But keep in mind that whenever your application is restarted, as happens in most deployments, the heap is discarded and everything cached in it is lost.
If you have more than one server, you can start to think about an out-of-process cache, because two servers do not share heap memory, and giving each its own in-memory cache is wasteful: it duplicates the data, and invalidation becomes a nightmare. An out-of-process cache is also more reliable, because it does not live in the application's heap and therefore survives application restarts. However, whatever you put in this kind of cache must be serializable, because the object is transferred over a network connection, so you cannot put just any object in the cache. Both Redis and memcached can be used for this purpose. Redis is more complicated and has more functionality than memcached, but for your purpose memcached is quite good.
Whatever caching system you choose, take a broad view. Design a caching layer into your application, because over time you will need to cache more things, so it is better to prepare for that now.
Another very important point about caching is that whenever you put something in the cache, you have to consider when you are going to invalidate it.
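The points above about serialization and invalidation can be sketched as a simple cache-aside pattern. This is only an illustration under stated assumptions: the two dictionaries stand in for the real database and for an out-of-process cache such as memcached or Redis, and the names `load_document`, `save_document`, and `stats` are hypothetical.

```python
import pickle

database = {"doc:1": {"title": "report", "items": [1, 2, 3]}}  # stand-in for the real DB
cache = {}                # stand-in for memcached/Redis; holds bytes only
stats = {"db_reads": 0}   # counts how often a read falls through to the database


def load_document(key):
    """Cache-aside read: try the cache first, fall back to the database."""
    blob = cache.get(key)
    if blob is not None:
        return pickle.loads(blob)
    stats["db_reads"] += 1
    doc = database[key]
    cache[key] = pickle.dumps(doc)  # objects must be serialized to leave the process
    return doc


def save_document(key, doc):
    """Write to the database, then invalidate the cached copy."""
    database[key] = doc
    cache.pop(key, None)


load_document("doc:1")   # miss: reads the database
load_document("doc:1")   # hit: served from the cache
save_document("doc:1", {"title": "report v2", "items": []})
print(load_document("doc:1")["title"])  # miss again, because the write invalidated it
print(stats["db_reads"])
```

Invalidating on write is only one possible policy; a time-based expiry is another, and which fits depends on how stale the data is allowed to be.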
Whether or not caching will help depends on how the document is accessed. If the document is accessed frequently, an external cache may not help much, because MongoDB already keeps frequently accessed data in memory.
First, you need to understand your data access patterns.

Keeping postgres entirely in memory

I am running various tests that spend a lot of time in the database.
I'd like to keep it all in memory and have it not touch the disk; hopefully that would speed things up. Like using sqlite3's in-memory option. I don't need persistence/durability/whatnot, everything is immediately discarded after the test.
Is that possible? I tried tweaking my postgres memory-related vars (as in the solution below), but that doesn't seem to affect the number of db writes it performs, and I couldn't find anything that looks like an 'in-memory' option.
https://dba.stackexchange.com/questions/18484/tuning-postgresql-for-large-amounts-of-ram
I wrote a detailed post on this some time ago:
Optimise PostgreSQL for fast testing
You may find it informative; it covers options for making PostgreSQL run without durability and other tweaks that're useful for running tests.
You do not actually need in-memory operation. If PostgreSQL is set not to flush changes to disk then in practice there'll be little difference for DBs that fit in RAM, and for DBs that don't fit in RAM it won't crash.
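As a rough illustration of "set not to flush changes to disk", the sketch below builds a `postgres` command line that turns off the durability-related settings `fsync`, `synchronous_commit`, and `full_page_writes` via `-c name=value`. Which settings the linked post recommends is an assumption here, the data directory path is hypothetical, and these values are only appropriate for throwaway test databases, since a crash can corrupt the cluster.

```python
# Settings that trade crash safety for speed; only for disposable test databases.
NON_DURABLE_SETTINGS = {
    "fsync": "off",
    "synchronous_commit": "off",
    "full_page_writes": "off",
}


def postgres_command(data_dir):
    """Build a `postgres` command line that applies the settings via -c."""
    args = ["postgres", "-D", data_dir]
    for name, value in NON_DURABLE_SETTINGS.items():
        args += ["-c", f"{name}={value}"]
    return args


# "/tmp/test_pgdata" is a made-up path for the example
print(" ".join(postgres_command("/tmp/test_pgdata")))
```

The same three settings could instead be placed in the test cluster's postgresql.conf.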
You should test with the same database engine you're using in production. Testing with SQLite, Derby, H2, etc then deploying live on PostgreSQL doesn't make tons of sense... as any Heroku/Rails user can tell you from experience.

Does it make sense to use both redis and mongodb?

We have a lot of data, decided to use mongodb and it works great.
We started using redis to track the active users in our real-time app. We also started doing some pub/sub channel stuff with redis.
Our next move might be to use mongodb for dormant data and redis for active data. For example, all of our users are stored in mongodb, but when they are logged in we will move a copy of that data to redis for fast access. We also store things like their game activity in redis and use the data accordingly. When the user logs out, we will save anything needed in mongo, where it will live until it's needed again and loaded into redis.
One thing we have been looking into is preservation of redis on crash. User activity on the system is meaningful data that we wouldn't want to lose on crash, and if we are only logging data after the fact, should we save a back up of important data in mongo after every event? Then on crash redis can restore from mongo?
Is there a better way to go about the things we are trying to achieve?
Thanks!
OK, so there are several angles from which to attack this question. The first thing to point out is that redis does have user-configurable persistence.
User activity on the system is meaningful data that we wouldn't want to lose on crash, and if we are only logging data after the fact, should we save a back up of important data in mongo after every event?
To be fair, the default setup with MongoDB is to flush to disk every 60 seconds. So you still have a 60 second window of data loss.
You can use journaling and drop that window to 100ms, but that will tax the IO more heavily.
You can also configure your writers to wait on that journal to flush (WriteConcern: fsync), but that's going to slow down writes significantly.
Is there a better way to go about the things we are trying to achieve?
Really depends on what you're trying to achieve.
What type of data loss can you handle?
Redis has replication, are you using that? Does that solve most of your data loss worries?
You say you're using PubSub features, how many nodes does this cover? Is your data adequately replicated just as a result of this?
Either way, it's a somewhat complex problem. MongoDB may kind of solve your problems, but replication may solve those problems just as well. Depends on your comfort level.

Is running postgresql in memory a good idea?

We are currently migrating our software from a general-purpose PC server to a kind of embedded system that uses a Disk on Module (DOM) instead of a hard disk drive.
My colleague insists that because the DOM can only support about 1 million write operations, we should run our database entirely on a RAM disk and back up the database to the DOM.
There are 3 ways to trigger the backup:
User trigger
Every 30 minutes
Every time there is an add/update/delete operation in the database
As we expect that users will only modify the database when the system is installed, I think PostgreSQL might not write that often.
But I don't know much about PostgreSQL, so I cannot judge whether it is worth all this trouble or which approach is better.
What do you think about it?
The problem of wearing out SSDs can be alleviated by whatever firmware the SSD has. Sometimes those chipsets don't do it well, or leave the responsibility to someone else. In this case, you can use a filesystem designed to do wear levelling by itself. UBIFS or LogFS are suitable filesystems.
Assuming that the claim about the DOM write cycles is true, which I can't comment on, then this won't work very well. PostgreSQL assumes that it can write whatever it wants whenever it wants (even if no logical updates are happening), and you have no real chance of making it go along with the 3 triggers that you mention.
What you could do is have the entire thing run on a RAM disk and have some operating system process back this up atomically to permanent storage. This needs careful file system and kernel support. This could work if your device is on most of the time, but probably not so well if it's the sort of thing that you switch on and off like a TV, because the recovery times could be annoying.
Alternatives are using either a more embedded-like RDBMS such as SQLite, or using a storage system that can handle PostgreSQL, like the recent solid state drives, although some SSDs have bogus cache settings that might make them unsuitable for PostgreSQL.

How to link Memcached servers together?

I'm looking into using Memcached for a web application I am developing, and after researching Memcached over the past few days, I have come across a question I could not find the answer to.
How do you link Memcached servers together, or how do you replicate data between Memcached servers?
Additionally: is this functionality controlled by the servers or the clients, and how?
When you configure several servers, the client libraries hash each key to pick the server that stores that key/data pair. That means there is no replication, and also that every client has to use the same set of servers.
Pros:
Almost zero overhead; storage and bandwidth grow linearly.
Server code is kept simple and reliable.
Cons:
Any change in the set of servers (one goes down, or you add a new one) suddenly invalidates (almost) the whole cache.
You have to be sure to use the same algorithm on every client.
If you have control over the client's code, you can simply store each key/data pair twice, on two servers. Just be sure to look in the same places when reading from a different client.
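The client-side server selection described above can be sketched as a hash of the key modulo the number of servers. This is a simplified illustration, not the algorithm of any particular memcached client; the function `pick_server` and the server addresses are made up for the example. It also shows why changing the server list invalidates most of the cache.

```python
import hashlib


def pick_server(key, servers):
    """Map a key to one server using a stable hash modulo the server count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


three = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]  # hypothetical hosts
four = three + ["cache-d:11211"]

# Every client that uses the same function and server list agrees on the location.
print(pick_server("user:1", three))

# Adding one server changes where most keys are looked up.
keys = [f"user:{i}" for i in range(1000)]
moved = sum(pick_server(k, three) != pick_server(k, four) for k in keys)
print(f"{moved} of {len(keys)} keys map to a different server after adding one")
```

Some client libraries offer consistent hashing to reduce how many keys move when the server list changes; whether yours does is something to check in its documentation.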
I've used BeITMemcached; with it, you create an instance of MemcacheClient and set the servers you want to use, just as strings.
At that point the client itself determines which of the available servers to put each item into. You never know which server an item will be in.
Check here to see how the servers handle failover.
The easiest thing is to have a repopulate mechanism. In my case, I store several hundred objects in memcache which come out of a database. I can just call repopulate and put them all back in there. Whenever I add, update or delete them in the database, I make those same calls to memcache.
http://repcached.lab.klab.org/
Also, the PHP PECL memcache client can replicate data to multiple servers, see memcache.redundancy.
It sounds like you want caches that can cope with machines rebooting and so on. If so:
In a lot of cases (assuming you are not writing Facebook) an RDBMS is fast enough for caching. Just create a table that has a key column and a blob column. If the RDBMS server has enough RAM, all the data will be in RAM and is written to disk only to allow recovery.
Remember this could be a separate server (or servers) from your main database server.
If you wish to get fancier and are using a high-end RDBMS, you may be able to set up change notifications on the queries that are used to build the "cached data", and use them to delete out-of-date rows from the cache.
Sometimes you can set up triggers to clear invalid rows from the cache; however, this can get very complex very quickly.
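A table with a key column and a blob column could look roughly like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database purely so the example is self-contained; in practice the connection would point at your actual RDBMS, and the table name `cache` and the helper functions are hypothetical.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real RDBMS connection
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, value BLOB NOT NULL)")


def cache_set(key, obj):
    """Insert or overwrite the row for this key."""
    conn.execute(
        "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
        (key, pickle.dumps(obj)),
    )
    conn.commit()


def cache_get(key):
    """Return the cached object, or None on a miss."""
    row = conn.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
    return pickle.loads(row[0]) if row else None


def cache_delete(key):
    """Remove an out-of-date row."""
    conn.execute("DELETE FROM cache WHERE key = ?", (key,))
    conn.commit()


cache_set("report:7", {"total": 120})
print(cache_get("report:7"))
cache_delete("report:7")
print(cache_get("report:7"))
```

`INSERT OR REPLACE` is SQLite syntax; other databases spell the upsert differently.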
Memcached does not provide replication. When you add a server, you add it to the memcached client's server list, and the data to be stored on that particular server then has to be fetched from the DB.
You should seriously consider CouchBase. It uses the memcached protocol, provides nearly the same speed, and delivers the automatic replication you're looking for. It also persists to disk so your cache will never be cold.