How to achieve replication of partial data for the following use case - mongodb

I want to build a system with the following data replication requirements.
In the image attached:
Node 1 has 2 entities, Entity 1 and Entity 2.
Each entity has multiple rows of data, say (Row1, Row2, Row3).
Node 2 and Node 3 are full replicas of Node 1 and possibly in the same data center.
Node 4 sits in a different place altogether and has only Row 1 from Entity 1 and Entity 2.
Node 5 sits in another place and has only Row 2 from Entity 1 and Entity 2.
The idea is that Node 4 and Node 5 will be in the geographic vicinity of the consumer system, so the consumer can fall back to the local copies on Node 4 and Node 5 if the network is down.
On a normal business day it is acceptable to limit all writes to Node 1 and allow Node 4 or Node 5 to accept writes only when Node 1 is down.
I am not sure which database can support this without extensive management through code.
(Image: Data Model Replication)
So far I have found this:
Cassandra can do keyspace-based replication, but it might be tricky since I have 2000+ remote locations needing partial data. I can imagine using, say, 200 keyspaces with 10 locations sharing the same keyspace to keep the overhead down, even though the data copied to local nodes will not always be useful to them (see the sketch after this list).
MongoDB has an open feature request for this (https://jira.mongodb.org/browse/SERVER-1559).
Couchbase has XDCR-based filtering, which looks like a potential solution.
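For reference, this is roughly what the keyspace-per-region idea would look like in Cassandra. This is only a sketch using the Python cassandra-driver; the keyspace name, contact point, and data-center names ("Central", "Region01") are made up for illustration.

```python
# Sketch only: per-keyspace replication in Cassandra.
# Assumes data centers named "Central" and "Region01" exist in the topology.
from cassandra.cluster import Cluster

cluster = Cluster(["node1.example.com"])   # hypothetical contact point
session = cluster.connect()

# One keyspace shared by a group of ~10 remote locations: 3 full copies in
# the central DC plus 1 copy in the remote DC that needs this slice of data.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS region01_data
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'Central': 3,
        'Region01': 1
    }
""")
```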
Can you please suggest if my understanding is correct?

Yes, Couchbase XDCR is a viable solution. You could:
1. Set up Node 1, Node 4, and Node 5 as three separate data clusters.
2. Set up a uni-directional XDCR from Node 1 to Node 4 with a filtering expression that matches only Row 1.
3. Set up a uni-directional XDCR from Node 1 to Node 5 with a filtering expression that matches only Row 2.
For more information, please refer to https://docs.couchbase.com/server/6.0/learn/clusters-and-availability/xdcr-overview.html.
XDCR filtering is at: https://docs.couchbase.com/server/6.0/learn/clusters-and-availability/xdcr-filtering.html
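As a rough, untested illustration of those steps in Python, using the XDCR REST endpoints described in the Couchbase docs: the host names, credentials, bucket names, and filter expression below are all placeholders, and the filter syntax differs between server versions.

```python
# Sketch: create a remote-cluster reference and a filtered XDCR replication
# from Node 1 to Node 4 via the Couchbase REST API. Hosts, credentials,
# bucket names, and the filter expression are placeholders.
import requests

NODE1 = "http://node1.example.com:8091"
AUTH = ("Administrator", "password")

# 1. Register the remote (Node 4) cluster on Node 1.
requests.post(f"{NODE1}/pools/default/remoteClusters", auth=AUTH, data={
    "name": "node4",
    "hostname": "node4.example.com:8091",
    "username": "Administrator",
    "password": "password",
}).raise_for_status()

# 2. Create a uni-directional replication that only ships "Row 1" documents.
requests.post(f"{NODE1}/controller/createReplication", auth=AUTH, data={
    "fromBucket": "entities",
    "toCluster": "node4",
    "toBucket": "entities",
    "replicationType": "continuous",
    "filterExpression": "row = 1",   # filter syntax depends on server version
}).raise_for_status()
```

The same pattern with a different remote cluster and filter covers Node 5 / Row 2.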

Sharding with replication

I have a multi-tenant database with 3 tables (store, products, purchases) on 5 server nodes. Suppose I have 3 stores in my store table and I am going to shard it by storeId.
I need the data for all shards (1, 2, 3) available on nodes 1 and 2, but node 3 would contain only the shard for store #1, node 4 only the shard for store #2, and node 5 only the shard for store #3. It is like sharding with 3 replicas.
Is this possible at all? What database engines can be used for this purpose (preferably SQL DBs)? Do you have any experience with this?
Regards
I have a feeling you have not adequately explained why you are trying this strange topology.
Anyway, I will point out several things relating to MySQL/MariaDB.
A Galera cluster already embodies multiple nodes (minimum of 3), but does not directly support "sharding". You can have multiple Galera clusters, one per "shard".
As with my comment about Galera, other forms of MySQL/MariaDB can have replication between nodes of each shard.
If you are thinking of having one server with all the data but replicating only parts of it to read-only replicas, there are the replicate_do_db / replicate_ignore_db settings (a sketch follows below). I emphasize "read-only" because changes to these pseudo-shards cannot easily be sent back to the primary server. (However, see "multi-source replication".)
Sharding is used primarily when there is simply too much traffic to handle on a single server. Are you saying that the 3 tenants cannot coexist because of excessive writes? (Excessive reads can be handled by replication.)
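A minimal sketch of that replication-filter idea, assuming MySQL 5.7+ (where the filter can be changed at runtime) and the mysql-connector-python package; the host, credentials, and schema name are placeholders:

```python
# Sketch: configure a read-only replica to apply only one tenant's schema.
# Assumes MySQL 5.7+ with CHANGE REPLICATION FILTER support; host/user/schema
# names are placeholders.
import mysql.connector

replica = mysql.connector.connect(host="node3.example.com",
                                  user="root", password="secret")
cur = replica.cursor()

cur.execute("STOP SLAVE SQL_THREAD")
# Only apply replicated events for the store_1 schema on this replica.
cur.execute("CHANGE REPLICATION FILTER REPLICATE_DO_DB = (store_1)")
cur.execute("START SLAVE SQL_THREAD")
cur.execute("SET GLOBAL read_only = ON")   # keep this pseudo-shard read-only
replica.close()
```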
A tentative solution:
Have all data on all servers. Use the same Galera cluster for all nodes.
Advantage: When "most" or all of the network is working, all data is quickly replicated bidirectionally.
Potential disadvantage: If half or more of the nodes go down, you have to step in manually to get the cluster going again.
Likely solution for the 'disadvantage': "Weight" the nodes differently (see the sketch below). Give a high weight to the 3 in HQ; give a much smaller (but non-zero) weight to each branch node. That way, most of the branches could go offline without losing the system as a whole.
But... I fear that an offline branch node will automatically become read-only.
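Galera exposes that weighting through the pc.weight provider option. A sketch of setting it at runtime follows; the host names and weight values are purely illustrative.

```python
# Sketch: give HQ nodes a large quorum weight and branch nodes a small one,
# so the cluster keeps quorum even if many branches drop off.
# Assumes Galera (MySQL/MariaDB + wsrep) and mysql-connector-python.
import mysql.connector

def set_weight(host, weight):
    conn = mysql.connector.connect(host=host, user="root", password="secret")
    cur = conn.cursor()
    # pc.weight can also be set in my.cnf via
    # wsrep_provider_options="pc.weight=N" so it survives restarts.
    cur.execute(f"SET GLOBAL wsrep_provider_options = 'pc.weight={weight}'")
    conn.close()

for hq in ("hq1", "hq2", "hq3"):
    set_weight(hq, 10)       # the 3 HQ nodes dominate quorum
for branch in ("branch1", "branch2", "branch3"):
    set_weight(branch, 1)    # branches count, but only a little
```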
Another plan:
Switch to NDB. The network is allowed to be fragile. Consistency is maintained by "eventual consistency" instead of the "[virtually] synchronous replication" of Galera+InnoDB.
NDB allows you to immediately write on any node. Then the write is sent to the other nodes. If there is a conflict, one of the values is declared the "winner". You choose which algorithm determines the winner. An easy-to-understand one is "whichever write was 'first'".

Ideal number of replicas for database

I am looking for some good reading on best practices to help me decide on the number of replicas needed for MongoDB. I am aware of the MongoDB docs that talk about things like having an odd number of nodes, when the need for an arbiter arises, etc.
In our case the requirement for reads won't be so high that reads become a bottleneck. Neither are we targeting sharding at this moment. However, we are going to run MongoDB in a Docker Swarm, and there could be multiple instances of certain services trying to write. Our swarm cluster most likely won't be very large either.
So how do I find logical answers to these:
Why not create one local mongo instance per physical node and tie it to that node?
For any number of physical nodes, as long as read/write is not a bottleneck, 3 or 5 replicas are always going to be ideal for fault recovery and high availability. But why is 3 or 5 a good number? Why not 7 if I have, say, 10 physical nodes?
I am trying to find some good reads to be able to decide on how to arrive at a number. Any pointers?
To give you an answer: it all depends on many criteria:
What is your budget?
How big is your data?
What do you want to use your replica sets for?
etc...
As an example, in my case:
We have 3 data centers across the country.
One of them is very small.
We found our sweet spot in terms of number of nodes to be 5:
1 Primary + 1 Secondary in DC1
1 Arbiter in DC2
2 secondaries in DC3
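For what it's worth, that layout maps directly onto a replica set configuration. Here is a minimal sketch using pymongo; the host names and replica set name are placeholders, and the priorities just nudge the primary toward DC1.

```python
# Sketch: 5-member replica set spread over 3 data centers
# (2 data-bearing members in DC1, an arbiter in DC2, 2 members in DC3).
# Hostnames and the replica-set name are placeholders.
from pymongo import MongoClient

client = MongoClient("dc1-a.example.com", 27017, directConnection=True)
client.admin.command("replSetInitiate", {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "dc1-a.example.com:27017", "priority": 2},  # preferred primary
        {"_id": 1, "host": "dc1-b.example.com:27017", "priority": 1},
        {"_id": 2, "host": "dc2-arb.example.com:27017", "arbiterOnly": True},
        {"_id": 3, "host": "dc3-a.example.com:27017", "priority": 1},
        {"_id": 4, "host": "dc3-b.example.com:27017", "priority": 1},
    ],
})
```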

MongoDB sharding, how does it rebalance when adding new nodes?

I'm trying to understand MongoDB and the concept of sharding. If we start with 2 nodes and partition, say, customer data based on last name, where A through M is stored on node 1 and N through Z is stored on node 2, what happens when we want to scale out and add more nodes? I just don't see how that will work.
If you have 2 nodes it doesn't mean that the data is partitioned into 2 chunks. It can be partitioned into, let's say, 10 chunks, with 6 of them on server 1 and the rest on server 2.
When you add another server, MongoDB is able to redistribute those chunks among the nodes of the new configuration.
You can read more in official docs:
http://www.mongodb.org/display/DOCS/Sharding+Introduction
http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key
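To make the original example concrete, here is a minimal sketch (Python/pymongo against a mongos router; the database, collection, and host names are made up) of sharding a customer collection by last name. MongoDB then splits the data into many chunks rather than exactly two.

```python
# Sketch: enable sharding and shard the customers collection by last name.
# Run against a mongos router; all names are placeholders.
from pymongo import MongoClient

mongos = MongoClient("mongos.example.com", 27017)
mongos.admin.command("enableSharding", "crm")
mongos.admin.command("shardCollection", "crm.customers",
                     key={"lastName": 1})   # range-based shard key
```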
If there are multiple shards available, MongoDB will start migrating data to other shards once you have a sufficient number of chunks. This migration is called balancing and is performed by a process called the balancer. The balancer moves chunks from one shard to another. For a balancing round to occur, a shard must have at least nine more chunks than the least-populous shard. At that point, chunks will be migrated off of the crowded shard until it is even with the rest of the shards.
When you add a new node to the cluster, MongoDB redistributes those chunks among the nodes of the new configuration.
This is a short extract; to get a complete understanding of how it rebalances when adding a new node, read chapter 2, "Understanding Sharding", of Kristina Chodorow's book "Scaling MongoDB".
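A hedged sketch of the "add a node" step itself (the host and replica-set names are placeholders): once the shard is registered, the balancer starts moving chunks onto it without further intervention.

```python
# Sketch: add a new shard to an existing cluster and check chunk distribution.
# Run against a mongos router; host and replica-set names are placeholders.
from pymongo import MongoClient

mongos = MongoClient("mongos.example.com", 27017)

# Register the new shard (here a one-member replica set "rs3").
mongos.admin.command("addShard", "rs3/newnode.example.com:27018")

# The balancer now migrates chunks in the background; this prints the spread.
config = mongos["config"]
for doc in config.chunks.aggregate([{"$group": {"_id": "$shard",
                                                "chunks": {"$sum": 1}}}]):
    print(doc)
```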

Cassandra = Does ReadRepair prevent Scaling Reads?

Cassandra has the option to enable "read repair". A read is sent to all replicas, and if one is stale, it will be fixed/updated. But because all replicas receive the read, there will come a point when the nodes reach IO saturation. Since ALL replica nodes always receive the read, adding further nodes will not help, as they also receive all reads (and will be saturated at once)?
Or does Cassandra offer some "tunability" to configure read repair so that it does not go to all of the nodes (or offer any other "replication" that allows true read scaling)?
thanks!!
jens
Update:
A concrete example, as I still do not understand how it will work in practice:
9 Cassandra boxes/servers
3 replicas (N=3) => every "row" is written to 2 additional nodes, so 3 boxes hold the data in total
Read repair enabled
The row in question (let's say Customer1) is highly trafficked
1.) The first time I write the row "Customer1" to Cassandra, it will eventually be available on all 3 nodes.
2.) Now I query the system with 1000s of requests per second for Customer1 (and, to make it clearer, with any caching disabled).
3.) The read will always be dispatched to all 3 nodes. (The first request, to the nearest node, will be a full request for the data, and the two additional requests will only be "checksum requests".)
4.) As we are querying with 1000s of requests, we reach the IO limit of all replicas! (The IO is the same on all 3 nodes!! Only the bandwidth is much smaller on the checksum nodes.)
5.) I add 3 further boxes (so we have 12 boxes in total):
A) These boxes do NOT have the data yet (to help scale linearly). I first have to get the Customer1 record onto at least one of these new boxes.
=> This means I have to change the replication factor to 4
(OR is there any other option to get the data to another box?)
And now we have the same problem: the replication factor is now 4, and all 4 boxes will receive the read (repair) request for this highly trafficked Customer1 row. It does not scale this way. Scaling would only work if we had a copy that does NOT receive the read repair request.
What is wrong in my understanding? My conclusion: with standard read repair the system will NOT scale linearly (for a single highly trafficked row), as adding further boxes also means that these boxes receive the read repair requests (for this trafficked row)...
Thanks very much!!! Jens
Adding further nodes will help (in most situations). There will only be N read repair "requests" during a read, where N is the replication factor (the number of replicas, nb. not the number of nodes in the entire cluster). So the new node(s) will only be included in a read / read repair if the data you request falls within the node's key range (or the node holds a replica of the data).
There is also the read_repair_chance tunable per column family (sketched below), but that is a more advanced topic and doesn't change the fundamental equation: you should scale reads by adding more nodes, rather than by de-tuning read repair.
You can read more about replication and consistency in Ben's slides.
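For completeness, here is a sketch of the read_repair_chance tunable mentioned above. The keyspace and table names are made up, and the option only exists in Cassandra versions before 4.0, where it was removed.

```python
# Sketch: lower probabilistic read repair on a hot table instead of
# disabling it cluster-wide. Keyspace/table/host names are placeholders;
# read_repair_chance applies only to Cassandra versions before 4.0.
from cassandra.cluster import Cluster

session = Cluster(["cass1.example.com"]).connect()
session.execute("""
    ALTER TABLE shop.customers
    WITH read_repair_chance = 0.1
    AND dclocal_read_repair_chance = 0.1
""")
```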

Do I absolutely need a minimum of 3 nodes/servers for a Cassandra cluster or will 2 suffice?

Surely one can run a single-node cluster, but I'd like some level of fault tolerance.
At present I can afford to lease two servers (8GB RAM, private VLAN, 1 GigE) but not 3.
My understanding is that 3 nodes is the minimum needed for a Cassandra cluster because there's no possible majority between 2 nodes, and a majority is required for resolving versioning conflicts. Oh wait, am I thinking of "vector clocks" and Riak? Ack! Cassandra uses timestamps for conflict resolution.
For 2 nodes, what is the recommended read/write strategy? Should I generally write to ALL (both) nodes and read from ONE (N=2; W=N/2+1; W=2/2+1=2)? Cassandra will use hinted-handoff as usual even for 2 nodes, yes?
These 2 servers are located in the same data center FWIW.
Thanks!
If you need availability on an RF=2, cluster-size=2 system, then you can't use ALL, or you will not be able to write when a node goes down.
That is why people recommend 3 nodes instead of 2: then you can do quorum reads + writes and still have both strong consistency and availability if a single node goes down.
With just 2 nodes you get to choose between strong consistency (write with ALL) and availability in the face of a single node failure (write with ONE), but not both. Of course, if you write with ONE, Cassandra will do hinted handoff etc. as needed to make the data eventually consistent.
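In client terms, that choice is just the consistency level on each statement. A sketch with the Python cassandra-driver follows; the keyspace, table, and host names are made up.

```python
# Sketch: choosing consistency per statement on a 2-node, RF=2 cluster.
# Keyspace/table/host names are placeholders.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["node-a.example.com"]).connect("app")

# Strong consistency: the write fails if either replica is down.
write_all = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ALL)

# Availability: succeeds with one node up; the other catches up later via
# hinted handoff / read repair (eventual consistency).
write_one = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ONE)

session.execute(write_all, (1, "alice"))
session.execute(write_one, (2, "bob"))
```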