Configuring replica set in a multi-data center - mongodb

We have the following multi-data-center scenario:
Node1 ------- Node3
  |             |
  |             |
  |             |
 ---           ---
Node2         Node4
Node1 and Node3 form a replica set (of sorts), for high availability.
Node2 and Node4 are priority 0 members: they should never become primary and exist solely for reads.
Caveat: what is the best way to design this, given that Node2 and Node4 cannot reach one another because of the way our VPN and firewalls are configured?
That essentially rules out any heartbeat between Node2 and Node4.
Thanks much.

Here's what I have in mind:
Don't keep an even number of voting members in a set. So you need to add an arbiter, or make one of node 2/4 a non-voting member.
I'm using the C# driver, and I'm not sure whether you use the same technology for your application. In any case, the C# driver obtains the complete list of available servers from the seeds (the servers you provide in the connection string) and tries to load-balance requests across all of them. In your situation, I guess you have application servers running in all 3 data centers. However, you probably don't want, for example, node 1 to accept connections from a different data center, as that would significantly slow down the application. So you need some further settings:
Set node 3/4 as hidden members.
For applications running in the same data center as node 3/4, don't set the replicaSet parameter in the connection string, but do set readPreference=secondary. If you need to write, you'll have to configure a separate connection string that points to the primary node.
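To make the two connection strings concrete, here is a minimal Python sketch that only builds the URIs. The host names and the replica set name `rs0` are made up for illustration, and `mongo_uri` is a hypothetical helper, not part of any driver. Whether a single-host URI without replicaSet results in a direct connection depends on your driver and its version, so check the driver documentation before relying on it.

```python
from urllib.parse import urlencode


def mongo_uri(hosts, **options):
    """Build a mongodb:// URI from host:port strings and query options."""
    query = urlencode(options)
    return "mongodb://" + ",".join(hosts) + ("/?" + query if query else "/")


# Hypothetical host names; replace them with your own.
# Reads from an application in the same data center as a hidden member:
# no replicaSet parameter, only readPreference=secondary.
local_reads = mongo_uri(["node4.dc2.example.com:27017"], readPreference="secondary")

# Separate connection string for writes, which have to reach the primary.
writes = mongo_uri(
    ["node1.dc1.example.com:27017", "node3.dc2.example.com:27017"],
    replicaSet="rs0",
)

print(local_reads)
print(writes)
```

The application then keeps two clients: one built from `local_reads` for queries and one built from `writes` for anything that modifies data.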

If you also set the votes of nodes 2 and 4 to 0, then in a failover the set should act as though only nodes 1 and 3 are eligible. If you set them to hidden, you have to connect to them explicitly; MongoDB drivers will intentionally avoid them otherwise.
Other than that, nodes 2 and 4 have direct access to whichever node is the primary, so I see no other problem.
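Putting the suggestions from both answers together, the replica set configuration document could look roughly like the sketch below. The set name, host names and the arbiter are assumptions for illustration; `priority`, `votes` and `arbiterOnly` are regular member fields of a replica set configuration. The two helper functions are hypothetical and only check that the number of voters is odd and that only nodes 1 and 3 can be elected.

```python
# Hypothetical replica set name and host names.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "node1.example.com:27017"},
        {"_id": 1, "host": "node3.example.com:27017"},
        # Read-only members: never primary, no vote in elections.
        {"_id": 2, "host": "node2.example.com:27017", "priority": 0, "votes": 0},
        {"_id": 3, "host": "node4.example.com:27017", "priority": 0, "votes": 0},
        # Assumed extra arbiter so that the number of voters is odd.
        {"_id": 4, "host": "arbiter.example.com:27017", "arbiterOnly": True},
    ],
}


def voters(cfg):
    """Hosts that take part in elections (votes defaults to 1)."""
    return [m["host"] for m in cfg["members"] if m.get("votes", 1) > 0]


def electable(cfg):
    """Hosts that may become primary (priority defaults to 1, arbiters never)."""
    return [
        m["host"]
        for m in cfg["members"]
        if m.get("priority", 1) > 0 and not m.get("arbiterOnly", False)
    ]


print(voters(config))
print(electable(config))
```

With this layout nodes 2 and 4 never need to exchange votes with each other, though whether the missing heartbeat between them causes other problems is something to test in your own environment.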

Related

How to prevent data inconsistency when one node lose network connectivity in kubernetes

I have a cluster running a service (call it A1) whose data lives on remote storage, CephFS in my case. The service has 1 replica. Assume the cluster has 5 nodes and A1 runs on node 1. Something happens to node 1's network, and it loses connectivity both to the CephFS cluster and to my Kubernetes (or Docker Swarm) cluster. The cluster marks the node as unreachable and starts a new instance of the service (call it A2) on node 2 to keep the replica count at 1. After, say, 15 minutes, node 1's network is fixed and node 1 rejoins the cluster with A1 still running (assume it didn't crash while it had no connectivity to the remote storage).
I have worked with Docker Swarm and recently switched to Kubernetes. I see that Kubernetes has a feature called StatefulSet, but from what I read it doesn't answer my question (or I may have missed something).
Question A: What does the cluster do? Does it keep A2 and shut down A1, or let A1 keep working and shut down A2? (Logically it should shut down A1.)
Question B (my primary question): Assume the cluster wants to shut down one of these services (for example A1), and the service saves its state to storage when it shuts down. In that case A1 saves its old state to disk, even though A2, holding newer state, already saved something before A1's network was fixed.
There must be some kind of lock when the volume is mounted into a container, so that while it is attached to one container no other container can write to it (so that A1 fails when it tries to save its old state to disk).
The way it works, using Docker Swarm terminology:
You have a service. A service is a description of some image you'd like to run, how many replicas, and so on. Assuming the service specifies that at least 1 replica should be running, it will create a task that will schedule a container on a swarm node.
So the service is associated with 0 to many tasks, where each task has 0 containers (if it's still starting) or 1 container (if the task is running or stopped), which is on a node.
So, when swarm (the orchestrator) detects a node going offline, it essentially sees that a number of tasks associated with a service have lost their containers, so the replication (in terms of running tasks) is no longer correct for the service, and it creates new tasks, which in turn will schedule new containers on the available nodes.
On the disconnected node, the swarm worker notices that it has lost connection to the swarm managers so it cleans up all the tasks it is holding onto as it no longer has current information about them. In the process of cleaning the tasks up, the associated containers get stopped.
This is good because when the node finally reconnects there is no race condition where there are two tasks running. Only "A2" is running and "A1" has been shut down.
This is bad if you have a situation where nodes can lose connectivity to the managers frequently, but you need the services to keep running on those nodes regardless, as they will be shut down each time the workers detach.
The process on K8s is pretty much the same; just change the terminology.
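As a rough illustration of the sequence described above, here is a toy Python model. It is not how Swarm or Kubernetes are implemented; the function names, the task naming scheme and the "first reachable node" placement are all assumptions made up for this sketch.

```python
def manager_reconcile(tasks, reachable, desired=1):
    """Manager view: forget tasks on unreachable nodes, then schedule replacements.

    tasks maps a task name to the node it runs on.
    """
    alive = {name: node for name, node in tasks.items() if node in reachable}
    candidates = sorted(reachable)
    counter = len(tasks)
    while len(alive) < desired and candidates:
        counter += 1
        alive[f"A{counter}"] = candidates[0]  # naive placement: first reachable node
    return alive


def worker_cleanup(local_containers, connected_to_managers):
    """Worker view: a worker cut off from the managers stops its containers."""
    return list(local_containers) if connected_to_managers else []


# node1 loses its network: the managers only see node2..node5.
tasks = {"A1": "node1"}
reachable = {"node2", "node3", "node4", "node5"}

tasks_after = manager_reconcile(tasks, reachable)
node1_containers = worker_cleanup(["A1"], connected_to_managers=False)

print(tasks_after)       # replacement task scheduled on a reachable node
print(node1_containers)  # nothing left running on the isolated node
```

When node1 reconnects in this model, it has no containers left to report, so only the replacement task exists and there are never two writers at the same time.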

Does MongoDB have a centralized way to get node status for sharded replica sets?

I have a mongodb cluster running 11 shards across 25 host machines. Each shard is based on a replica set spread across 3 instances (2 data + 1 arbiter).
Is there some easy centralized way I can get node status via mongos? I like the data output by sh.status(), but it doesn't tell me if any of the nodes are down.
I know that I can log into 11 different nodes and run rs.status() on each (if I know which ones are working), but it seems like it would be good to have some centralized way of getting status for both the shards and their underlying replica sets. Is there one?

How do I model a PostgreSQL failover cluster with Docker/Kubernetes?

I'm still wrapping my head around Kubernetes and how that's supposed to work. Currently, I'm struggling to understand how to model something like a PostgreSQL cluster with streaming replication, scaling out and automatic failover/failback (pgpool-II, repmgr, pick your poison).
My main problem with the approach is the dual nature of a PostgreSQL instance, configuration-wise -- it's either a master or a cold/warm/hot standby. If I increase the number of replicas, I'd expect them all to come up as standbys, so I'd imagine creating a postgresql-standby replication controller separately from a postgresql-master pod. However, I'd also expect one of those standbys to become a master in case the current master is down, so it's a common postgresql replication controller after all.
The only idea I've had so far is to put the replication configuration on an external volume and manage the state and state changes outside the containers.
(in case of PostgreSQL the configuration would probably already be on a volume inside its data directory, which itself is obviously something I'd want on a volume, but that's beside the point)
Is that the correct approach, or is there any other cleaner way?
There's an example in OpenShift: https://github.com/openshift/postgresql/tree/master/examples/replica The principle is the same in pure Kube (it's not using anything truly OpenShift specific, and you can use the images in plain docker)
You can give PostDock a try, either with docker-compose or Kubernetes. Currently I have tried it in our project with docker-compose, with the schema as shown below:
pgmaster (primary node1)  --|
|- pgslave1 (node2)       --|
|  |- pgslave2 (node3)    --|----pgpool (master_slave_mode stream)----client
|- pgslave3 (node4)       --|
|- pgslave4 (node5)       --|
I have tested the following scenarios, and they all work very well:
Replication: changes made at the primary (i.e., master) node will be replicated to all standby (i.e., slave) nodes
Failover: stop the primary node, and a standby node (e.g., node4) will automatically take over the primary role.
Prevention of two primary nodes: resurrect the previous primary node (node1), node4 will continue as the primary node, while node1 will be in sync but as a standby node.
As for the client application, these changes are all transparent. The client just points to the pgpool node, and keeps working fine in all the aforementioned scenarios.
Note: In case you have problems getting PostDock up and running, you could try my forked version of PostDock.
Pgpool-II with Watchdog
A problem with the aforementioned architecture is that pgpool is the single point of failure. So I have also tried enabling Watchdog for pgpool-II with a delegated virtual IP, so as to avoid the single point of failure.
master (primary node1)  --\
|- slave1 (node2)       ---\     / pgpool1 (active)  \
|  |- slave2 (node3)    ----|---|                     |----client
|- slave3 (node4)       ---/     \ pgpool2 (standby) /
|- slave4 (node5)       --/
I have tested the following scenarios, and they all work very well:
Normal scenario: both pgpools start up, with the virtual IP automatically applied to one of them, in my case, pgpool1
Failover: shut down pgpool1. The virtual IP will be automatically applied to pgpool2, which hence becomes active.
Start failed pgpool: start pgpool1 again. The virtual IP will stay with pgpool2, and pgpool1 now works as standby.
As for the client application, these changes are all transparent. The client just points to the virtual IP, and keeps working fine in all the aforementioned scenarios.
You can find this project at my GitHub repository on the watchdog branch.
Kubernetes's StatefulSet is a good base for setting up a stateful service. You will still need some work to configure the correct membership among PostgreSQL replicas.
Kubernetes has one example of it: http://blog.kubernetes.io/2017/02/postgresql-clusters-kubernetes-statefulsets.html
You can look at one of the open-source PostgreSQL tools below:
1. Crunchy Data PostgreSQL
2. Patroni PostgreSQL

Three nodes using replica-set in MongoDb and 2 are down

In a 3-node replica set, why does the third node become SECONDARY rather than PRIMARY when 2 nodes are down?
I want to have 2 mongod instances inside a data center and one outside, so that if the data center fails, the third mongod outside becomes the primary.
Is it possible without an arbiter?
OK, found the answer:
http://tebros.com/2010/11/mongodb-arbiters-with-only-two-replicas/
What happened?! It turns out that when a mongod instance is isolated, it cannot vote for itself to be primary. This makes sense when you think about it. If a network link went down and separated your two replicas, you wouldn’t want them both to elect themselves as primary. So in my case, when rep1-1 noticed that it was isolated from the rest of the replica set, it made itself secondary and stopped accepting writes.
Whenever you end up with (cluster_participants/2) + 1 nodes down (using integer division and assuming an odd number of participants), the cluster enters read-only mode. A candidate node needs the votes of a majority of all nodes to be elected primary.
For example, if you have a 5-node cluster and 3 nodes go down, the others will stay secondary, because none of them can get 3 votes.
For more information: http://docs.mongodb.org/manual/core/replication-internals/#replica-set-election-internals
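The majority rule above is easy to check with a few lines of Python. This is only the arithmetic from the answer, with hypothetical function names; it assumes every member has one vote.

```python
def majority(total_voting_members: int) -> int:
    """Votes needed to elect a primary: more than half of ALL members, up or down."""
    return total_voting_members // 2 + 1


def can_elect_primary(total_voting_members: int, reachable: int) -> bool:
    """True if the members that can still see each other form a majority."""
    return reachable >= majority(total_voting_members)


# 3-node set with 2 nodes down: the survivor alone cannot reach 2 votes.
print(can_elect_primary(3, 1))
# 5-node set with 3 nodes down: the 2 survivors cannot reach 3 votes.
print(can_elect_primary(5, 2))
# 5-node set with 2 nodes down: the 3 survivors can elect a primary.
print(can_elect_primary(5, 3))
```

Note that the majority is computed from the configured set size, not from the number of nodes currently up, which is why a lone survivor of a 3-node set stays SECONDARY.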

MongoDB share-nothing slaves

I'd like to use mongodb to distribute a cached database to some distributed worker nodes I'll be firing up in EC2 on demand. When a node goes up, a local copy of mongo should connect to a master copy of the database (say, mongomaster.mycompany.com) and pull down a fresh copy of the database. It should continue to replicate changes from the master until the node is shut down and released from the pool.
The requirements are that the master need not know about each individual slave being fired up, nor should the slave have any knowledge of other nodes outside the master (mongomaster.mycompany.com).
The slave should be read only, the master will be the only node accepting writes (and never from one of these ec2 nodes).
I've looked into replica sets, and this doesn't seem to be possible. I've done something similar to this before with a master/slave setup, but it was unreliable. The master/slave replication was prone to sudden catastrophic failure.
Regarding replica sets: while I don't imagine you could have a set member invisible to the primary (and other nodes), due to the need for replication, you can tailor a particular node to come pretty close to what you want:
Set the newly-launched node to priority 0 (meaning it cannot become primary)
Set the newly-launched node to 'hidden'
Here are links to more info on priority 0 and hidden nodes.
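As a sketch of the two settings above, the following Python function adds a priority 0, hidden member to an existing replica set configuration document. The function name, set name and host names are hypothetical; `priority`, `hidden` and `version` are regular fields of the configuration document. The new configuration still has to be applied on the primary (for example with a reconfig), so the set does end up knowing about each worker.

```python
import copy


def add_hidden_member(config, host):
    """Return a new config with `host` added as a priority 0, hidden member."""
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    next_id = max(m["_id"] for m in new_config["members"]) + 1
    new_config["members"].append(
        {"_id": next_id, "host": host, "priority": 0, "hidden": True}
    )
    # A changed configuration needs a higher version number than the old one.
    new_config["version"] = new_config.get("version", 1) + 1
    return new_config


# Hypothetical current configuration with a single master node.
current = {
    "_id": "rs0",
    "version": 1,
    "members": [{"_id": 0, "host": "mongomaster.mycompany.com:27017"}],
}

updated = add_hidden_member(current, "worker-1.example.com:27017")
print(updated["members"][-1])
```

Each worker that is shut down would also have to be removed from the configuration again, which is one reason this only comes close to the share-nothing setup asked for.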