In master-master/multi-master replication, who is the secondary? - mongodb

Silly question, when we talk about secondaries in the context of failover behaviour, with regards to master-master/multi-master, is that basically any node other than the one that we are currently reading from or writing to?

In master-master replication, both nodes are primary and secondary at the same time. In multi-master replication, every node is a secondary, and some or all of them are also primaries.
Multi-master means there are many database servers that accept writes. To stay in sync with the other nodes, each server has to read and apply the writes made on all the others, and in that role it behaves as a secondary. In master-slave replication there is only one master and many slaves. The master is the only write-enabled node, so it never needs to read changes from anyone else and behaves as a primary only.
For example, MySQL 5.6 replication supports master-master replication but not multi-master replication, whereas MySQL 5.7 replication also supports multi-master replication. MongoDB supports only master-slave replication.

Related

Postgres Streaming and Logical Replication

Currently we are using Postgres streaming replication to replicate databases from the primary to a replica server. We are planning to use an application to sync data from our replica server to our data warehouse. It needs logical replication to be enabled so it can track changes and sync them from the replica to the data warehouse. Can we enable logical replication on top of streaming replication? Is it possible, or good practice, to enable both on the same server or database? If so, will there be any performance impact, and what considerations or best practices should be followed?
There is no problem with using streaming (physical) replication and logical replication at the same time, but a physical standby server cannot be a logical primary. So you will have to use the same server as primary server for both physical and logical replication. But that shouldn't be a problem, since streaming replication primary and standby are physically identical anyway.

Kafka Connect MongoDB Source Connector failure scenario

I need to use Kafka Connect to monitor changes to a MongoDB cluster with one primary and 2 replicas.
I see there is the official MongoDB connector, and I want to understand what would be the connector's behaviour, in case the primary replica would fail. Will it automatically read from one of the secondary replicas which will become the new primary? I couldn't find information for this in the official docs.
I've seen this post related to the tasks.max configuration, which I thought might be related to this scenario, but the answer implies that it always defaults to 1.
I've also looked at Debezium's implementation of the connector, which seems to support this scenario automatically:
The MongoDB connector is also quite tolerant of changes in membership
and leadership of the replica sets, of additions or removals of shards
within a sharded cluster, and network problems that might cause
communication failures. The connector always uses the replica set’s
primary node to stream changes, so when the replica set undergoes an
election and a different node becomes primary, the connector will
immediately stop streaming changes, connect to the new primary, and
start streaming changes using the new primary node.
Also, Debezium's version of the tasks.max configuration property states that:
The maximum number of tasks that should be created for this connector.
The MongoDB connector will attempt to use a separate task for each
replica set, [...] so that the work for each replica set can be
distributed by Kafka Connect.
The question is - can I get the same default behaviour with the default connector - as advertised for the Debezium one? Because of external reasons, I can't use the Debezium one for now.
In a PSS deployment:
If one node is not available, the other two nodes can elect a primary
If two nodes are not available, there can be no primary
The quote you referenced suggests the connector may be using primary read preference, which means as long as two nodes are up it will be working and if only one node is up it will not retrieve any data.
Therefore, bring down two of the three nodes and observe whether you are able to query.
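The majority rule behind the PSS cases above can be written out as a short check. This is only a sketch of the arithmetic, not MongoDB or connector code; the function names `votes_needed` and `can_elect_primary` are made up for illustration, and every node is assumed to be a voting member.

```python
def votes_needed(voting_members: int) -> int:
    """Strict majority of the voting members of the replica set."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable: int) -> bool:
    """True if the reachable members form a majority and can elect a primary."""
    return reachable >= votes_needed(voting_members)


# Walk through the three-node (PSS) cases: 0, 1 and 2 nodes unavailable.
for down in range(3):
    up = 3 - down
    print(f"{down} node(s) down: primary possible = {can_elect_primary(3, up)}")
```

With three members the majority is two, which is why losing one node still leaves a primary and losing two does not.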

Workaround for excluding a table in streaming replication in postgres

I have 2 database nodes working as master-slave with streaming replication in place. In one of our use cases, we need to exclude a table from being replicated to the slave. Is there a way or a workaround to exclude a table from being copied to the slave if I have to stay on this WAL-based streaming replication?
It is not possible to do this using physical replication. You could create a role on the master with no privileges on this table, then only allow that role to connect on the replica. This would require you to trust the admin of the replica to respect your wishes, and it wouldn't help if the goal is to reduce the size of the replica.

What is the consistency of Postgresql HA cluster with Patroni?

My understanding is that because the fail-over uses a consensus store (etcd or ZooKeeper), the system will stay consistent under a network partition.
Does this mean that transactions running under the serializable isolation level will also provide linearizability?
If not, which consistency will I get: sequential consistency, causal consistency, ...?
You shouldn't mix up consistency between the primary and the replicas and consistency within the database.
A PostgreSQL database running in a Patroni cluster is a normal database with streaming replicas, so it provides the eventual consistency of streaming replication (all replicas will eventually show the same values as the primary).
Serializability guarantees that you can establish an order in the database transactions that ran against the primary such that the outcome of a serialized execution in that order is the same as the workload had in reality.
If I read the definition right, that is just the same as “linearizability”.
Since only one of the nodes in the Patroni cluster can be written to (the primary), this stays true, no matter if the database is in a Patroni cluster or not.
In a distributed context, where we have multiple replicas of an object's state, a schedule is linearizable if it is as if they were all updated at once at a single point in time.
Once a write completes, all later reads (wall-clock time) from any replica should see the value of that write or the value of a later write.
Since PostgreSQL version 9.6 it's possible to have multiple synchronous standby nodes. This means that if we have 3 servers and use num_sync = 2, the primary will always wait for each write to reach both standbys before acknowledging the commit.
This should satisfy the constraint of linearizable schedule even with failover.
Since version 1.2 of Patroni, when synchronous mode is enabled, Patroni will automatically fail over only to a standby that was synchronously replicating at the time of the master failure.
This effectively means that no user visible transaction gets lost in such a case.
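The reasoning about num_sync and failover can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not PostgreSQL or Patroni code: the `commit` function, the standby names `s1` and `s2`, and the idea of a log as a plain list of transaction IDs are all hypothetical, and a failed commit is simplified to leaving no trace on any standby.

```python
def commit(txid: int,
           standby_logs: dict[str, list[int]],
           reachable: set[str],
           num_sync: int) -> bool:
    """Acknowledge txid only if at least num_sync standbys confirm it."""
    confirmed = [name for name in standby_logs if name in reachable]
    if len(confirmed) < num_sync:
        return False  # the client never sees this commit succeed
    for name in confirmed:
        standby_logs[name].append(txid)
    return True


# Three servers: one primary plus two synchronous standbys, num_sync = 2.
logs = {"s1": [], "s2": []}
acknowledged = []

if commit(1, logs, reachable={"s1", "s2"}, num_sync=2):
    acknowledged.append(1)
if commit(2, logs, reachable={"s1"}, num_sync=2):  # s2 is unreachable
    acknowledged.append(2)

# Whichever standby is promoted, it holds every acknowledged transaction.
for name, log in logs.items():
    print(name, "has all acknowledged transactions:",
          all(tx in log for tx in acknowledged))
```

In this model the second transaction is never acknowledged because only one standby could confirm it, so promoting either standby loses nothing the client was told had committed.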

MongoDB replica set secondary node data loss

I have two mongod instances without replication, each having the same collection name but different data. I have now initialized replication between them. The secondary machine copied all data from the primary machine and lost its original data. Can I recover the original data that was present on the secondary machine?
This is the expected behaviour with MongoDB replica sets: data from the primary is replicated to the secondaries. When you add a server as a new secondary, it does an "initial sync" which copies data from the primary. Replica sets are designed for failover and redundancy; your secondary nodes should have data consistent with the primary, subject to their current replication lag.
If you have overwritten your previous database, your only option is to restore from a backup.
See also:
Backup and Restoration Strategies
Replica Set Internals Part V: Initial Sync