How to link Memcached servers together? - memcached

I'm looking into using Memcached for a web application I am developing, and after researching it over the past few days I have come across a question I could not find the answer to.
How do you link Memcached servers together, or how do you replicate data between Memcached servers?
Additionally: is this functionality controlled by the servers or by the clients, and how?

when you configure several servers, the client library hashes each key to pick the one server where that key/data pair is stored. that means that there's no replication, and also that every client has to use the same set of servers.
pros:
almost zero overhead, storage and bandwidth grow linearly.
server code is kept simple and reliable.
cons:
any change in the set of servers (one goes down, or you add a new one) suddenly invalidates (almost) the whole cache.
you have to be sure to use the same algorithm on every client.
if you have control over the client's code, you can simply store each key/data pair twice, on two servers. just be sure to look in the same places when reading from a different client.
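A minimal sketch of the idea, assuming a plain modulo scheme (real client libraries differ in the hash they use, and the server addresses here are made up). It shows how a client maps a key to a server, and how many keys land on a different server once a fourth server is added:

```python
import hashlib

def pick_server(key, servers):
    """Map a key to one server by hashing the key (simple modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

def remapped_fraction(keys, old_servers, new_servers):
    """Fraction of keys that land on a different server after the list changes."""
    moved = sum(
        1 for key in keys
        if pick_server(key, old_servers) != pick_server(key, new_servers)
    )
    return moved / len(keys)

# Hypothetical server addresses, for illustration only.
servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
keys = [f"user:{i}" for i in range(10000)]

# Adding a fourth server moves most keys, which is why the cache looks
# (almost) empty right after a change in the server list.
fraction = remapped_fraction(keys, servers, servers + ["10.0.0.4:11211"])
```

Every client has to run the same `pick_server` logic over the same server list, otherwise two clients will look for the same key on different servers.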

I've used BeITMemcached and in that you create an instance of MemcacheClient and set the servers you want to use, just as strings.
At that point the client itself determines which of the available servers to put each item into. You never know which server an item will be on.
Check here to see how the servers handle failover.
The easiest thing is to have a repopulate mechanism. In my case, I store several hundred objects in memcache which come out of a database. I can just call repopulate and put them all back in there. Whenever I add, update or delete them in the database, I make those same calls to memcache.
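A rough sketch of that repopulate approach. Plain dicts stand in for both the database and the memcached client here, and the class and method names are made up for illustration, not taken from BeITMemcached:

```python
class ObjectStore:
    """Keeps a cache in step with a database: every write goes to both."""

    def __init__(self, database, cache):
        self.database = database  # stand-in for the real database (a dict here)
        self.cache = cache        # stand-in for a memcached client (a dict here)

    def save(self, key, value):
        self.database[key] = value
        self.cache[key] = value

    def delete(self, key):
        self.database.pop(key, None)
        self.cache.pop(key, None)

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # refill the cache on a miss
        return value

    def repopulate(self):
        """Reload every object from the database, e.g. after a cache server restart."""
        self.cache.clear()
        self.cache.update(self.database)

store = ObjectStore(database={}, cache={})
store.save("product:1", {"name": "widget"})
store.cache.clear()   # simulate a memcached server losing its contents
store.repopulate()    # everything is back in the cache
```

This only works comfortably when the full data set is small enough to reload in one go, as with the several hundred objects mentioned above.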

http://repcached.lab.klab.org/
Also, the PHP PECL memcache client can replicate data to multiple servers, see memcache.redundancy.

It sounds like you wish to have caches that can cope with machines rebooting etc. If so…
In a lot of cases (assuming you are not writing Facebook) an RDBMS is fast enough for caching. Just create a table that has a key column and a blob column. If the RDBMS server has enough RAM, all the data will be held in RAM and only written to disk so as to allow recovery.
Remember this could be a separate server(s) from your main database server.
If you wish to get more fancy and are using a high-end RDBMS, you may be able to set up change notifications on the queries that are used to build the “cached data”, and have them delete out-of-date rows from the cache.
Sometimes you can set up triggers to clear invalid rows from the cache; however, this can get very complex very quickly.
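As a rough illustration of the key/blob table, here is a sketch using Python's built-in sqlite3 module as a stand-in for whatever RDBMS you run. The table and function names are invented for the example, and `INSERT OR REPLACE` is SQLite syntax, so another database would need its own upsert statement:

```python
import sqlite3

# An in-memory SQLite database stands in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache (cache_key TEXT PRIMARY KEY, payload BLOB NOT NULL)"
)

def cache_put(key, payload):
    """Store (or overwrite) the blob for a key."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO cache (cache_key, payload) VALUES (?, ?)",
            (key, payload),
        )

def cache_get(key):
    """Return the cached blob, or None on a miss."""
    row = conn.execute(
        "SELECT payload FROM cache WHERE cache_key = ?", (key,)
    ).fetchone()
    return row[0] if row else None

def cache_delete(key):
    """Drop an out-of-date row from the cache."""
    with conn:
        conn.execute("DELETE FROM cache WHERE cache_key = ?", (key,))

cache_put("page:/home", b"<html>...</html>")
```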

Memcached does not provide replication. To work around that, you add the new server to the memcached client's server list and then hit the DB for the data that should be stored on that particular server.

You should seriously consider Couchbase. It uses the memcached protocol, provides nearly the same speed, and delivers the automatic replication you're looking for. It also persists to disk, so your cache does not start out cold after a restart.

Related

How to connect a Postgres read replica for reads without affecting the application source code

I have an application which has inline read and write queries in the code, and I am facing a challenge pointing the read and write queries to their respective databases. Is there a best way of doing this for a Go application?
My thought is to have two ORMs up, one for the read database and one for the write database, and select the appropriate one based on the operation. e.g: ReadDbMap.Select("query"); WriteDbMap.Update("query");
But this change affects the entire application; that is the concern I have.
I am afraid that there is no simpler way.
Streaming replication is not primarily a load balancing feature. For one, you'll have to be aware that a change you made on the primary server is not immediately visible on the standby, so your application will have to deal with these temporary inconsistencies.
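One way to keep the change in a single place is a small routing wrapper that every query goes through. The sketch below is in Python only to show the shape of the idea (the question is about Go, where the same structure applies); `RoutingDb`, `choose` and `needs_fresh_data` are made-up names, and the keyword check is deliberately naive:

```python
READ_KEYWORDS = ("SELECT", "SHOW")

class RoutingDb:
    """Picks the replica for plain reads and the primary for everything else."""

    def __init__(self, primary, replica):
        self.primary = primary  # connection/handle to the read-write server
        self.replica = replica  # connection/handle to the streaming standby

    def choose(self, sql, needs_fresh_data=False):
        words = sql.split()
        first_word = words[0].upper() if words else ""
        # Reads that must see a write made a moment ago go to the primary,
        # because the standby may lag behind it.
        if first_word in READ_KEYWORDS and not needs_fresh_data:
            return self.replica
        # Naive on purpose: statements such as SELECT ... FOR UPDATE or
        # writable CTEs also need the primary and are not detected here.
        return self.primary

# Strings stand in for real connections in this illustration.
db = RoutingDb(primary="primary-conn", replica="replica-conn")
target = db.choose("SELECT * FROM orders WHERE id = 1")
```

Call sites still have to say when they need fresh data, which is the inconsistency handling described above; the wrapper does not make that problem go away.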

mongodb performance for large document

I have a document that holds a big data structure in certain fields inside an array. It is slowing down my application because that data is read very frequently. I am thinking of a few solutions to implement, but I need advice before I proceed, and possibly even a better solution. Here are my thoughts/questions:
would it help to cache data?
should I use memcached or redis as a caching engine and why?
would it help to read single fields from this document instead of reading it all every time?
should I do something else?!
Caching will help because it keeps your db from being hit too often.
Memcached or Redis, it's up to you. I prefer Redis, but if you already have a memcached it's fine.
If you have a cluster of servers, think if you need a centralized cache or not
Caching a full document won't help for getting a single field because you cache the result of a query without knowing what it contains.
Your question needs more clarification. For example, how big is the data you are speaking of: a couple of megabytes, or gigabytes? These factors change the solution. But if we assume that you have a couple of megabytes and you want to avoid calling the database every time, the best solution is a cache. How to choose a cache also depends completely on your situation. If your web application runs on one server, you can use an in-memory cache like the ASP.NET cache, which is very fast. This cache is stored in your heap, so you can put any of your objects in it without serialization. But consider that whenever your application is restarted, as happens in most deployments, the heap is discarded and everything cached in it is lost.
If you have more than one server, then you can start to think about an out-of-process cache, because two servers do not share heap memory; using in-memory caches on each of them is wasteful because it duplicates the data, and invalidation is a nightmare. An out-of-process cache is also more reliable: it does not live in the application's heap, so it outlives application restarts. But whatever you want to put in this kind of cache has to be serializable, because the object is transferred over a network connection, so you cannot put every object in the cache. Both Redis and memcached can be used for this purpose. Redis is more complicated, with more functionality than memcached, but for your purpose memcached is quite good.
Whatever caching system you choose, approach it from a wide perspective. Design a caching layer into your application, because over time you will need to put more things in the cache, so it is better to prepare for that now.
Another thing which is very important with a cache is that whenever you put something in it, you have to consider when you are going to invalidate it.
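To make the invalidation point concrete, here is a minimal in-process sketch, assuming a time-based expiry plus explicit invalidation. `TtlCache`, `get_or_load` and the loader function are illustrative names, not part of any library mentioned here:

```python
import time

class TtlCache:
    """In-process cache where every entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested without sleeping
        self.entries = {}    # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or after expiry."""
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)  # e.g. the database query you want to avoid repeating
        self.entries[key] = (now + self.ttl_seconds, value)
        return value

    def invalidate(self, key):
        """Drop an entry right away, e.g. just after the document is updated."""
        self.entries.pop(key, None)

def load_document(key):
    # Placeholder for the real database read.
    return {"id": key, "payload": "big structure"}

cache = TtlCache(ttl_seconds=60)
document = cache.get_or_load("doc:1", load_document)
```

The same two decisions (how long an entry may live, and which writes must invalidate it) apply whether the cache is in-process like this one or a separate Redis or memcached server.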
Whether or not caching will help depends on how the document is accessed. If the same document is being accessed over and over, then an extra cache may not help much, because of how MongoDB's own in-memory caching works.
First, you need to understand your data access patterns.

Caching strategy to reduce load on web application server

What is a good tool for applying a layer of caching between a webserver and an application server?
Basic Requirements:
The application server needs a way to remove items from the cache and put items in the cache with an expiration date.
The webserver needs a way to pull items out of the cache in a very light-weight, fast manner without requiring thread allocation on the application server.
It does not necessarily need to be a distributed cache (accessible from multiple machines), but it wouldn't hurt.
Strategies I have considered:
Static file caching. Request comes in, gets hashed, if a file exists we serve it, if not we route the request to the app server. Would high I/O, or file locking under concurrency, be a problem? Is it accurate that the file system is actually very fast due to kernel-level caching in memory?
Using a key-value DB like mongodb, or redis. This would store the finished HTML/JSON fragments in db. The webserver would be equipped to read from the DB and route to the app server if needed. The app server would be equipped to insert/remove from the DB.
A memory cache like memcached or Varnish (don't know much about Varnish). My only concern with memcached is that I'm going to want to cache 3 - 10 gigabytes of data at any given time, which is more than I can safely allocate in memory. Does memcached have a method to spill to the filesystem?
Any thoughts on some techniques and pitfalls when trying this type of caching layer?
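For reference, the static file strategy could look roughly like the sketch below. The directory and function names are made up, expiration is left out, and the write goes through a temporary file plus rename so that a reader never sees a half-written file:

```python
import hashlib
import os
import tempfile

# Hypothetical cache location; a real setup would use a fixed directory.
CACHE_DIR = tempfile.mkdtemp(prefix="fragment-cache-")

def cache_path(request_url):
    """Hash the request to get a safe, fixed-length file name."""
    name = hashlib.sha256(request_url.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_DIR, name)

def read_cached(request_url):
    """Webserver side: return the cached body, or None on a miss."""
    try:
        with open(cache_path(request_url), "rb") as handle:
            return handle.read()
    except FileNotFoundError:
        return None  # miss: route the request to the app server

def write_cached(request_url, body):
    """App server side: write to a temp file, then rename it into place."""
    fd, temp_name = tempfile.mkstemp(dir=CACHE_DIR)
    with os.fdopen(fd, "wb") as handle:
        handle.write(body)
    os.replace(temp_name, cache_path(request_url))  # atomic on one filesystem

def remove_cached(request_url):
    """App server side: drop an entry; a missing file is not an error."""
    try:
        os.remove(cache_path(request_url))
    except FileNotFoundError:
        pass

write_cached("/products?page=2", b'{"items": []}')
body = read_cached("/products?page=2")
```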
You can also use the GigaSpaces XAP in-memory data grid for caching and even for hosting your web application. You can choose just the caching option, or combine the power of the two and gain single management of your environment, among other things.
Unlike the key-value pair approach you suggested, with GigaSpaces XAP you'll be able to run complex queries such as SQL, object-based templates and much more. In your caching scenario you should check out more specifically the local cache related features.
Local Cache
Web Container
Disclaimer, I am a developer in GigaSpaces.
Eitan
Just to answer this from the POV of using Coherence (http://coherence.oracle.com/):
1. The application server needs a way to remove items from the cache and put items in the cache with an expiration date.
// remove one item from cache
cache.remove(key);
// remove multiple items from cache
cache.keySet().removeAll(keylist);
2. The webserver needs a way to pull items out of the cache in a very light-weight, fast manner without requiring thread allocation on the application server.
// access one item from cache
Object value = cache.get(key);
// access multiple items from cache
Map mapKV = cache.getAll(keylist);
3. It does not necessarily need to be a distributed cache (accessible from multiple machines), but it wouldn't hurt.
Elastic. Just add nodes. Auto-discovery. Auto-load-balancing. No data loss. No interruption. Every time you add a node, you get more data capacity and more throughput.
Automatic high availability (HA). Kill a process, no data loss. Kill a server, no data loss.
A memory cache like memcached or Varnish (don't know much about Varnish). My only concern with memcached is that I'm going to want to cache 3 - 10 gigabytes of data at any given time, which is more than I can safely allocate in memory. Does memcached have a method to spill to the filesystem?
Use both RAM and flash. Transparently. Easily handle 10s or even 100s of gigabytes per Coherence node (e.g. up to a TB or more per physical server).
For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.

Is there any value in using core data for iPhone apps?

Can people give me examples of why they would use Core Data in an application?
I ask this because most apps are just clients to a central server where an API of some sort gives you the information you need.
In my case I'm writing a timesheet application for a web app which has an API, and I'm debating if there is any value in replicating the data structure on my server in Core Data (SQLite)
e.g
Project has many timesheets
employee has many timesheets
It seems to me that I can just connect to the API on every call for lists of projects or existing timesheets for example.
I realize for some kind of offline mode you could store locally in core data but this creates way more problems because you now have a big problem with syncing that data back to the web server when you get connection again.. e.g. the project selected for a timesheet no longer exists.
Can any experienced developer shed some light on their experiences of when Core Data is the best practice approach?
EDIT
I realise of course there is value in local persistence, but the key-value storage of user defaults seems to cover most applications I can think of.
You shouldn't think of CoreData simply as an SQLite database. It's not JUST an SQLite database. Sure, SQLite is an option, but there are other options as well, such as in-memory and, as of iOS5, a whole slew of custom data stores. The biggest benefit with CoreData is persistence, obviously. But even if you are using an in-memory data store, you get the benefits of a very well structured object graph, and all of the heavy lifting with regards to pulling information out of or putting information into the data store is handled by CoreData for you, without you necessarily needing to concern yourself with what is backing that data store.
Sure, today you don't care too much about persistence, so you could use an in-memory data store. What happens if tomorrow, or in a month, or a year, you decide to add a feature that would really benefit from persistence? With CoreData, you simply change or add a persistent data store, and all of your methods to get information out or in remain unchanged. The overhead for that sort of addition is minimal in comparison to if you were trying to access SQLite or some other data store directly. IMHO, that's the biggest benefit: abstraction. And, in essence, abstraction is one of the most powerful things behind OOP.
Granted, building the Data Model just for in-memory storage could be overkill for your app, depending on how involved the app is. But, just as a side note, you may want to consider what is faster: requesting information from your web service every time you want to perform some action, or requesting the information once, storing it in memory, and acting on that stored value for the remainder of the session. An in-memory data store wouldn't persist beyond that particular session.
Additionally, with CoreData you get a lot of other great features like saving, fetching, and undo-redo.
There are basically two kinds of apps. Those that provide you with local functionality (games, professional applications, navigation systems...) and those that grant access to a remote service.
Your app seems to be in the second category. If you access remote services, your users will want to access new or real-time data (you don't want to read 2 week old Facebook posts) but in some cases, local caching makes sense (e.g. reading your mails when you're on the train with unstable network).
I assume that the value of accessing cached entries when not connected to a network is pretty low for your customers (internal or external) compared to the importance of accessing real-time-data. So local storage might be not necessary at all.
If you don't have hundreds of entries in your timetable, "normal" serialization (NSCoding-protocol) might be enough. If you only access some "dashboard-data", you will be able to get along with simple request/response-caching (NSURLCache can do a lot of things...).
Core Data does make more sense if you have complex data structures which should be synchronized with a server. This adds a lot of synchronization logic to your project as well as complexity from Core Data integration (concurrency, thread-safety, in-app-conflicts...).
If you want to create a "client"-app with a server driven user experience, local storage is not necessary at all so my suggestion is: Keep it as simple as possible unless there is a real need for offline storage.
It's ideal for if you want to store data locally on the phone.
Seriously though, if you can't see a need for it for your timesheet app, then don't worry about it and don't use it.
Solving the sync problems that you would have with an "offline" mode would be detailed in your design of your app. For example - don't allow projects to be deleted. Why would you? Wouldn't you want to go back in time and look at previous data for particular projects? Instead just have a marker on the project to show it as inactive and a date/time that it was made inactive. If the data that is being synced from the device is for that project and is before the date/time that it was marked as inactive, then it's fine to sync. Otherwise display a message and the user will have to sort it.
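The inactive-marker rule could be checked with something as small as the sketch below. The function and parameter names are made up, and the dates are arbitrary examples:

```python
from datetime import datetime

def can_sync(entry_time, project_inactive_at):
    """A timesheet entry may sync if its project is still active, or if the
    entry was made before the project was marked inactive."""
    if project_inactive_at is None:  # project is still active
        return True
    return entry_time < project_inactive_at

inactive_at = datetime(2012, 3, 1, 12, 0)

# Entry recorded offline before the project was closed: fine to sync.
early_ok = can_sync(datetime(2012, 2, 28, 9, 0), inactive_at)

# Entry recorded after the project was closed: show a message instead.
late_ok = can_sync(datetime(2012, 3, 2, 9, 0), inactive_at)
```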
It depends purely on your application's design whether you need to store some data locally or not, and whether it is a real problem or just a thin GUI client around your web service. Apart from "offline" mode, the other reason to cache server data on the client side might be to take traffic load off your server. Just think what it means for your server to send the whole timesheet data to the client every time, rather than just the changes. Yes, it means more implementation on both sides, but in some cases it has serious advantages.
EDIT: example added
You have 1000 records per user in your timesheet application and one record is about 1 KB. In this case every time a user starts your application, it has to fetch ~1 MB of data from your server. If you cache the data locally, the server can tell you that, let's say, two records were updated since your last update, so you'll have to download only 2 KB. Now scale this up to several tens of thousands of users and you will immediately notice the difference in server bandwidth and CPU usage.
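The arithmetic, spelled out. The 20,000 users figure is an assumption picked to stand for "several tens of thousands", and 1 KB is taken as 1,000 bytes:

```python
RECORD_BYTES = 1_000      # "about 1 KB" per record
RECORDS_PER_USER = 1_000
CHANGED_RECORDS = 2       # records updated since the last sync
USERS = 20_000            # assumption: "several tens of thousands"

# Bytes downloaded per application start, per user.
full_per_user = RECORDS_PER_USER * RECORD_BYTES   # 1,000,000 bytes, ~1 MB
delta_per_user = CHANGED_RECORDS * RECORD_BYTES   # 2,000 bytes, 2 KB

# Bytes the server sends if every user starts the app once.
full_total = full_per_user * USERS     # 20,000,000,000 bytes, ~20 GB
delta_total = delta_per_user * USERS   # 40,000,000 bytes, ~40 MB
```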

Best strategy for synching data in iPhone app

I am working on a regular iPhone app which pulls data from a server (XML, JSON, etc...), and I'm wondering what is the best way to implement synching data. Criteria are speed (less network data exchange), robustness (data recovery in case update fails), offline access and flexibility (adaptable when the structure of the database changes slightly, like a new column). I know it varies from app to app, but can you guys share some of your strategy/experience?
For me, I'm thinking of something like this:
1) Store Last Modified Date in iPhone
2) Upon launching, send a message like getNewData.php?lastModifiedDate=...
3) Server will process and send back only modified data from last time.
4) This data is formatted as so:
<+><data id="..."></data></+> // add this to SQLite/CoreData
<-><data id="..."></data></-> // remove this
<%><data id="..."><attribute>newValue</attribute></data></%> // new modified value
I don't want to make <+>, <->, <%>... for each attribute as well, because it would be too complicated, so probably when I receive a <%> field, I would just remove the data with the specified id and then add it again (assuming id here is not some automatically auto-incremented field).
5) Once everything is downloaded and updated, I will update the Last Modified Date field.
The main problem with this strategy is: If the network goes down when I am updating something => the Last Modified Date is not yet updated => next time I relaunch the app, I will have to go through the same thing again. Not to mention potential inconsistent data. If I use a temporary table for update and make the whole thing atomic, it would work, but then again, if the update is too long (lots of data change), the user has to wait a long time until new data is available. Should I use Last-Modified-Date for each of the data field and update data gradually?
I would start by making the update routine atomic, since you'll have enough on your hands figuring out how to get the client-server communication working properly.
After that is a good time to consider tweaking it to be incremental, but only after you do some testing to figure out if it's really necessary. If you're tuning your update protocol to be as low bandwidth as possible, you might discover that even a "big" update is downloaded fast enough.
Another way to look at it is to ask yourself, how often is there going to be network trouble when an average user is doing a sync? You probably don't want to tune for unlikely scenarios.
If you are trying to optimize (minimize) the data transfer you may want to consider a different format than XML, since XML is fairly verbose. Or at least you may want to trade in XML readability for space by making each element name and attribute as small as possible, and eliminate all unnecessary whitespace.
Your basic scheme is good. The thing you need to do is to somehow make your updates idempotent so that you can restart a partially-completed transfer without risk. This is a better way to go than to try to implement some sort of true atomic commit (though you could do that too, using, eg, the SQLite database).
In our experience fairly large updates (10s of KB) can be downloaded quite rapidly, if the server is fast enough. No great need to break updates up into tiny bits. But certainly it won't hurt to try to minimize the amount of data transferred by keeping more granular info on "last update".
(And definitely you should use JSON rather than XML as your transmitted data representation.)
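A sketch of what an idempotent update could look like, using SQLite and a JSON payload. The payload layout (`upserts`, `deletes`, `last_modified`) and the table names are invented for this example; the point is that applying the same update twice leaves the same rows behind, and that the last-modified marker only moves when the whole batch has been applied:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # the app would use a file-backed database
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
conn.execute("CREATE TABLE sync_state (name TEXT PRIMARY KEY, value TEXT NOT NULL)")

def apply_update(payload_json):
    """Apply one server response. Safe to re-run with the same payload."""
    payload = json.loads(payload_json)
    with conn:  # one transaction: commits on success, rolls back on an exception
        for item in payload["upserts"]:
            # Insert-or-replace by id covers both "added" and "modified" records.
            conn.execute(
                "INSERT OR REPLACE INTO items (id, body) VALUES (?, ?)",
                (item["id"], json.dumps(item["data"])),
            )
        for item_id in payload["deletes"]:
            # Deleting an id that is already gone is a no-op.
            conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
        # The marker is written in the same transaction as the data.
        conn.execute(
            "INSERT OR REPLACE INTO sync_state (name, value) VALUES ('last_modified', ?)",
            (payload["last_modified"],),
        )

update = json.dumps({
    "last_modified": "2012-01-01T00:00:00Z",
    "upserts": [{"id": "a", "data": {"title": "first"}}],
    "deletes": ["b"],
})
apply_update(update)
apply_update(update)  # replaying after a dropped connection changes nothing
```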
I wonder if you have considered using a sync framework to manage the synchronization. If that interests you, take a look at the open source project OpenMobster and its Sync service. You can do the following sync operations:
two-way
one-way client
one-way device
bootup
Besides that, all modifications are automatically tracked and synced with the Cloud. Your app can keep working offline when the network connection is down. It will track any changes and automatically synchronize them with the cloud in the background when the connection returns. It also provides iCloud-like synchronization across multiple devices.
Also, modifications in the Cloud are synched using Push notifications, so the data is always current even if it is stored locally.
In your case,
Criteria are speed (less network data exchange), robustness (data recovery in case update fails), offline access
Speed: Only the changes are sent across the network in both directions
Robustness: It stores data in a transactional store like sqlite and any failed updates are communicated in the SyncML payload. Only the successful operations are processed while the failed operations are re-tried during the next sync
Here is a link to the open source project: http://openmobster.googlecode.com
Here is a link to iPhone App Sync: http://code.google.com/p/openmobster/wiki/iPhoneSyncApp