Can Postgres-XL shard, replicate and auto-balance at the same time?

For example if I have 5 servers (A, B, C, D, E)
Can we distribute the data with a replication factor of 3? (For example, one record goes to A, B, C; another record goes to A, B, D; another to A, B, E; and so on.) That way, if node C has a hardware failure, every record still exists somewhere.
Can we also add a new node and then rebalance the stored data onto it without downtime?

Yes, it can do that, but not in the way you are thinking. What you are describing would be a NoSQL-style setup. Postgres-XL is an MPP database.
When you create a table you define its DISTRIBUTE BY option, which can be REPLICATION, ROUNDROBIN, HASH, or MODULO. You will need to review the details of each option. You can also define tablespaces to live on specific nodes.
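For illustration, a minimal sketch of the table-level syntax (table and column names here are hypothetical):

    -- Shard rows across the data nodes by hashing the distribution column
    CREATE TABLE orders (
        order_id  bigint,
        store_id  int,
        total     numeric
    ) DISTRIBUTE BY HASH (order_id);

    -- Keep a full copy of a small lookup table on every data node
    CREATE TABLE stores (
        store_id  int,
        name      text
    ) DISTRIBUTE BY REPLICATION;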
Your setup would be something like:
Node1: Transaction Manager (GTM)
Node2: Transaction Manager Proxy (GTM Proxy)
Node3: Coordinator 1 & Data Node 1
Node4: Coordinator 2 & Data Node 2
Node5: Data Node 3
NOTE: It is important to point out that, as I just discovered, Postgres-XL has no built-in HA or failover support. If a single node fails, the database is down and will require manual intervention. Worse, if you are using the round robin, hash, or modulo sharding options and you lose the disk on a single node, you have lost your database entirely.
You can have standby nodes that mirror each of your nodes, but this doubles the number of nodes you need, and it will still not fail over automatically. You will have to manually reconfigure the cluster to use the standby node and restart it.


Sharding with replication

I have a multi-tenant database with 3 tables (store, products, purchases) across 5 server nodes. Suppose I have 3 stores in my store table and I am going to shard it by storeId.
I need the data for all shards (1, 2, 3) available on nodes 1 and 2, but node 3 would contain only the shard for store #1, node 4 only the shard for store #2, and node 5 only the shard for store #3. It is like sharding with 3 replicas.
Is this possible at all? Which database engines can be used for this purpose (preferably SQL databases)? Do you have any experience with this?
Regards
I have a feeling you have not adequately explained why you are trying this strange topology.
Anyway, I will point out several things relating to MySQL/MariaDB.
A Galera cluster already embodies multiple nodes (minimum of 3), but does not directly support "sharding". You can have multiple Galera clusters, one per "shard".
As with my comment about Galera, other forms of MySQL/MariaDB can have replication between nodes of each shard.
If you are thinking of having a server with all data, but replicating only parts to read-only replicas, there are settings such as replicate_do_db / replicate_ignore_db. I emphasize "read-only" because changes to these pseudo-shards cannot easily be sent back to the primary server. (However, see "multi-source replication".) A sketch of such a filter follows after these points.
Sharding is used primarily when there is simply too much traffic to handle on a single server. Are you saying that the 3 tenants cannot coexist because of excessive writes? (Excessive reads can be handled by replication.)
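Regarding the replication filters mentioned above, a minimal sketch, assuming MySQL 5.7+ where the filter can be changed at runtime (the database name tenant1 is hypothetical; on MariaDB you would instead set the replicate_do_db server option):

    -- On the replica, with the replication SQL thread stopped:
    STOP SLAVE SQL_THREAD;
    CHANGE REPLICATION FILTER REPLICATE_DO_DB = (tenant1);
    START SLAVE SQL_THREAD;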
A tentative solution:
Have all data on all servers. Use the same Galera cluster for all nodes.
Advantage: When "most" or all of the network is working all data is quickly replicated bidirectionally.
Potential disadvantage: If half or more of the nodes go down, you have to manually step in to get the cluster going again.
Likely solution for the 'disadvantage': "Weight" the nodes differently. Give a high weight to the 3 nodes in HQ; give a much smaller (but non-zero) weight to each branch node. That way, most of the branches could go offline without losing the system as a whole. (See the sketch below.)
But... I fear that an offline branch node will automatically become readonly.
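Galera exposes this weighting through the pc.weight provider option; a minimal sketch, with hypothetical weight values:

    -- On each of the 3 HQ nodes:
    SET GLOBAL wsrep_provider_options = 'pc.weight=4';

    -- On each branch node:
    SET GLOBAL wsrep_provider_options = 'pc.weight=1';

Quorum is then computed over the sum of weights rather than the node count, so the HQ nodes together outvote any combination of branch nodes.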
Another plan:
Switch to NDB. The network is allowed to be fragile. Consistency is maintained by "eventual consistency" instead of the "[virtually] synchronous replication" of Galera+InnoDB.
NDB allows you to immediately write on any node. Then the write is sent to the other nodes. If there is a conflict one of the values is declared the "winner". You choose which algorithm for determining the winner. An easy-to-understand one is "whichever write was 'first'".

How to use physical and logical replication in Patroni together?

I want to construct a cluster like this:
3 MAIN nodes linked by physical replication;
N OTHER nodes receiving data from the MAIN nodes via logical replication.
I successfully configured physical replication between the 3 MAIN nodes, but I didn't get much further.
I should note that I set wal_level = logical on all nodes in my cluster.
But when I try to create a subscription on any OTHER node, I get an error like this: "logical decoding cannot be used while in recovery".
Can anybody help me please?
I solved it.
For the subscription connection, use the primary node's address.
It looks something like this:
You have 3 MAIN nodes linked by Patroni physical HA;
You also have N OTHER nodes, each of which is its own primary (independent of the MAIN nodes).
On each OTHER node you must create a subscription to the current primary among the MAIN nodes.
Note that the subscription user needs the REPLICATION privilege.
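A minimal SQL sketch of that setup (the role, publication, table, and connection parameters are all hypothetical, and the table must already exist on the subscriber):

    -- On the current MAIN primary:
    CREATE ROLE logical_repl WITH LOGIN REPLICATION PASSWORD 'secret';
    CREATE PUBLICATION main_pub FOR TABLE my_table;

    -- On each OTHER node (itself a primary):
    CREATE SUBSCRIPTION other_sub
        CONNECTION 'host=main-primary port=5432 dbname=mydb user=logical_repl password=secret'
        PUBLICATION main_pub;

The error in the question appears when the subscription's connection string points at a standby: logical decoding can only run on a primary, so the connection must target whichever MAIN node currently holds that role.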

Standard availability for a MongoDB replica set cluster of 3 nodes

I have set up a MongoDB replica set with one primary, one secondary, and one arbiter node, with MongoDB installed on three independent AWS instances. I need to document the overall availability of a replica set cluster formed with the aforementioned configuration, but I don't have any reliable/standard data with which to establish it.
Is there any standard data that can be referred to in order to establish the availability of the overall cluster / individual nodes in the above case?
Your configuration will guarantee continued availability even after one node goes down. However, availability after that depends on how quickly you can replace the downed node, and that is up to your monitoring and maintenance abilities.
If you do not notice for a while that a node is down, or if your procedure for replacing the node takes a long time (you may need to commission a new VM, install MongoDB, reconfigure the replica set, and allow time for the new node to sync), then another node may go down and leave you with no availability.
So your actual availability depends on the answers to four questions:
Which replica set configuration do you use? This determines how many nodes need to go down before the replica set stops being available.
How likely is it that any single node will go down or lose its connection to the rest?
How good is your monitoring, so you notice there is a problem?
How fast are your processes for repairing the problem?
The answer to the first one is straightforward; you have decided on the minimum of two data-bearing nodes and one arbiter.
The answer to the second one is not quite straightforward; it depends on the reliability of each node, and the connections between them, and whether two or more are likely to go down together (perhaps if they are in the same availability zone).
The third and fourth, we can't help you with; you'll have to assess those for yourself.
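To make the second question concrete, here is a toy model (it assumes node failures are independent, which, as noted above, is often untrue): if each of the 3 members is up with probability a, the replica set keeps a voting majority whenever at least 2 of the 3 are up, so

    P(available) = a^3 + 3 * a^2 * (1 - a)

With a = 0.99 this gives 0.970299 + 0.029403 = 0.999702, i.e. roughly three and a half nines, before accounting for detection and repair time.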

Why does a mongodb replica set need an odd number of voting members?

I find the replica set requirement a bit confusing, and I'm probably missing something obvious (like under which conditions elections take place).
I understand that in normal operations you need quorum, and when a vote takes place you need an odd number of machines to get a majority.
But since we use a replica set for failover, if the master dies we are left with an even number of voting members, which based on my limited experience lengthens the time needed to elect a primary.
Also, according to the documentation, adding a voting member doesn't start an election, so it would seem that starting (booting) your replica set with an even number of nodes would make more sense?
So if we start say with 4 machines in the replica set, and one machine dies, there is a re-election with 3 machines, fast quorum. We add a machine back to get back to our normal operation state, no re-election and we are back to our normal operation conditions.
Can someone shed a light on this?
TL;DR: With single-master systems, an even split makes it impossible to determine which side still has a majority, taking both sides down.
Let N be a cluster of four machines:
One machine dies, the others resume operation. Good.
Two machines die, we're offline because we no longer get a majority. Bad.
Let M be a cluster of three machines:
One machine dies, the others resume operation. Good.
Two machines die, we're offline because we no longer get a majority. Bad.
=> Same result at 3/4 of the cost.
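The underlying arithmetic: a majority of N voting members is floor(N/2) + 1, so the number of failures tolerated is N - (floor(N/2) + 1):

    N = 3: majority 2, tolerates 1 failure
    N = 4: majority 3, tolerates 1 failure
    N = 5: majority 3, tolerates 2 failures
    N = 6: majority 4, tolerates 2 failures

Adding an even-numbered member never increases fault tolerance; it only adds cost.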
Now, let's add an assumption or two:
We're also going to operate some kind of server application that uses the database
The network can be partitioned
Let's say you have two datacenters, one with two database instances and the backend server machines. If the connection to the backup center (which has one MongoDB instance) fails, you're still online.
Now if you added a second MongoDB instance at the backup data center, a network partition would, despite seemingly higher redundancy, yield lower availability since we'd lose the majority in case of a network partition and can't continue to operate.
=> Less availability at higher cost. But that doesn't answer the question yet.
Let's say you're really worried about availability: You have two data centers, with backend servers in both datacenters, anycast IPs, the whole deal. Now the network between the two DCs is partitioned, but some clients connect to DC A while other reach DC B. How do you now determine which datacenter may accept writes? It's not possible - this is why the odd number is necessary.
You don't actually need anycast IPs, BGP, or any fancy stuff for the problem to become real: any writing application (a worker, a stale request, anything) would require merging the divergent writes later, which is a completely different concurrency scheme.

Do I absolutely need a minimum of 3 nodes/servers for a Cassandra cluster or will 2 suffice?

Surely one can run a single-node cluster, but I'd like some level of fault tolerance.
At present I can afford to lease two servers (8 GB RAM, private VLAN, 1 GigE) but not three.
My understanding is that 3 nodes is the minimum needed for a Cassandra cluster, because there's no possible majority between 2 nodes, and a majority is required for resolving versioning conflicts. Oh wait, am I thinking of "vector clocks" and Riak? Ack! Cassandra uses timestamps for conflict resolution.
For 2 nodes, what is the recommended read/write strategy? Should I generally write to ALL (both) nodes and read from ONE (N=2; W=N/2+1; W=2/2+1=2)? Will Cassandra use hinted handoff as usual even with 2 nodes?
These 2 servers are located in the same data center FWIW.
Thanks!
If you need availability on an RF=2, cluster-size=2 system, then you can't use ALL, or you will not be able to write when a node goes down.
That is why people recommend 3 nodes instead of 2: then you can do quorum reads + writes and still have both strong consistency and availability if a single node goes down.
With just 2 nodes you get to choose between strong consistency (write with ALL) and availability in the face of a single node failure (write with ONE), but not both. Of course, if you write with ONE, Cassandra will do hinted handoff etc. as needed to make the data eventually consistent.
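A cqlsh sketch of the two options on a hypothetical two-node cluster (the keyspace and table names are invented for illustration):

    CREATE KEYSPACE demo
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
    CREATE TABLE demo.kv (k text PRIMARY KEY, v int);

    -- Strong consistency: every write must reach both replicas,
    -- so writes start failing as soon as either node is down.
    CONSISTENCY ALL;
    INSERT INTO demo.kv (k, v) VALUES ('key1', 1);

    -- Availability: one replica suffices to acknowledge the write;
    -- the other catches up later via hinted handoff or repair.
    CONSISTENCY ONE;
    INSERT INTO demo.kv (k, v) VALUES ('key1', 2);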