Aerospike : Asynchronous Replication : Success at Master and Failure at Replica - distributed-computing

Aerospike supports ACID semantics in a clustered environment with a replication factor greater than 1: a write is applied to both the master and the replica, and only then is it reported as a success to the client.
However, this default behaviour can be changed by setting write.commit_level from all to master.
In that case, suppose the write/update succeeds on the master node and the client is notified, but the write fails on the replica node. What would happen?
Will Aerospike hold inconsistent data for the same key in the cluster?
Or will the write be retried on the replica?
Or will the write on the master be rolled back?
Note that the replica node is not down; the write simply failed for some reason, for example because the stop-writes percentage was breached on the replica node.

If you choose write.commit_level=master and the prole (replica) write fails, the client will not be notified of the failure. The replica will stay inconsistent with the master, and the master write will not be rolled back. The replica will be fixed by the next write that replicates successfully, i.e. it will be overwritten with the latest record.
By the way, an important thing to note is that stop-writes is honored at the master and not at the replica; it would be a bad idea to fail the replica write because of it. As long as you have some headroom in terms of memory (no malloc failures) and disk, a replica write is very unlikely to fail when the node itself has not gone down.
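
For reference, the commit level is chosen per write through the client's write policy. Below is a minimal sketch using the Aerospike Python client; the host, namespace, set, key, and bin names are placeholder assumptions:

    import aerospike

    config = {'hosts': [('127.0.0.1', 3000)]}
    client = aerospike.client(config).connect()

    key = ('test', 'demo', 'user1')
    bins = {'name': 'alice', 'visits': 1}

    # POLICY_COMMIT_LEVEL_MASTER: the client is acknowledged as soon as the
    # master has applied the write; replication to the prole completes
    # asynchronously after that acknowledgement.
    write_policy = {'commit_level': aerospike.POLICY_COMMIT_LEVEL_MASTER}

    client.put(key, bins, policy=write_policy)
    client.close()

The default, aerospike.POLICY_COMMIT_LEVEL_ALL, gives the behaviour described at the top of the question: the client is only notified after both master and replica hold the record.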

Related

Redis master wipes out Redis slave data on restart

Sorry, this is my first time working with Redis. I have a Redis master deployment and a Redis slave deployment (via K8s). Replication from master to slave works as expected. However, when I kill the master altogether and bring it back up again, the sync wipes out the data on the slave as well.
I have tried enabling appendonly on each and on both, but had no luck.
Question # 1: How can I preserve the data in slave when the master node comes back to life?
Question # 2: Is it a practice to sync data back from slave into master?
Yes, the correct practice would be to promote the slave to master and then slave the restarted node to it to sync the state. If you bring up an empty node that is declared as the master, the slave will faithfully replicate whatever is - or isn't - on it.
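A rough sketch of that promotion with redis-py (the hostnames are placeholders for your K8s service names):

    import redis

    old_slave = redis.Redis(host='redis-slave', port=6379)
    restarted = redis.Redis(host='redis-master', port=6379)

    # SLAVEOF NO ONE: promote the surviving slave to a standalone master.
    old_slave.slaveof()

    # Re-slave the restarted (empty) node to the promoted master so it pulls
    # the existing data instead of wiping the slave.
    restarted.slaveof('redis-slave', 6379)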
You can configure periodic saving to disk, so that you can restart a master node and have it load the state as of the last save to disk. You can also manually cause a save to disk via the SAVE command. See the persistence chapter in the manual. If you SAVE to disk, then immediately restart the master node, the state as saved to disk will be loaded back up. Any writes that occur between the last SAVE and node shutdown will be lost.
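As a sketch of those persistence knobs with redis-py (the hostname is a placeholder; in production you would also set them in redis.conf so they survive restarts):

    import redis

    r = redis.Redis(host='redis-master', port=6379)

    # Turn on AOF persistence and an RDB snapshot schedule at runtime.
    r.config_set('appendonly', 'yes')
    r.config_set('save', '900 1 300 10')

    # Force a synchronous snapshot to disk before a planned restart.
    r.save()   # or r.bgsave() for a non-blocking background snapshot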
Along these lines, Redis HA is often done with Redis Sentinel, which manages auto-promotion and discovery of master nodes within a replicated cluster, so that the cluster can survive and auto-heal from the loss of the current master. This lets slaves replicate from the active master, and on the loss of the master (or a network partition that causes a quorum of sentinels to lose visibility to the master), the Sentinel quorum will elect a new master and coordinate the re-slaving of other nodes to it to ensure uptime. This is an AP system: Redis replication is eventually consistent, and therefore it can lose writes that were not replicated to a slave or flushed to disk before node shutdown.
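For clients, Sentinel also handles master discovery. A minimal sketch with redis-py's Sentinel support; the sentinel endpoints and the service name 'mymaster' are assumptions based on a typical sentinel.conf:

    from redis.sentinel import Sentinel

    sentinel = Sentinel(
        [('sentinel-0', 26379), ('sentinel-1', 26379), ('sentinel-2', 26379)],
        socket_timeout=0.5,
    )

    # Sentinel tells the client where the current master and replicas are,
    # so a failover does not require reconfiguring the application.
    master = sentinel.master_for('mymaster', socket_timeout=0.5)
    replica = sentinel.slave_for('mymaster', socket_timeout=0.5)

    master.set('greeting', 'hello')
    print(replica.get('greeting'))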

Does my master server crash using Log-Shipping Synchronous Replication in PostgreSQL when the replica is down?

I'm searching for HA solutions without load balancing in the master-slave model, using PostgreSQL. My favorite solution so far is log-shipping synchronous replication. But I have one main concern: if my slave server becomes unavailable, will my master server continue its operation? Or will it wait for the acknowledgment of my slave server until it's up again?
If you have only one standby, the master will halt (by design).
The master will still serve read-only statements, but all writes will be blocked until the standby comes back.
You can avoid this scenario by providing multiple candidates in synchronous_standby_names.
See SYNCHRONOUS-REPLICATION in the PostgreSQL Docs.
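A sketch of that configuration, applied with ALTER SYSTEM through psycopg2 (the standby names are placeholders, and the ANY n (...) syntax requires PostgreSQL 9.6 or later):

    import psycopg2

    conn = psycopg2.connect("host=master-db dbname=postgres user=postgres")
    conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
    cur = conn.cursor()

    # Accept acknowledgement from ANY one of the listed standbys, so commits
    # keep flowing as long as at least one standby is reachable.
    cur.execute(
        "ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby1, standby2)'"
    )
    cur.execute("SELECT pg_reload_conf()")
    conn.close()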
I found another way to prevent the master halting when the slave crashes: set wal_sender_timeout in the master's postgresql.conf so that the master disconnects from the slave once it detects that the slave has crashed.

Mongo behaviour once master is down?

Consider the diagram below of a MongoDB deployment.
I have two scenarios.
Scenario 1:
The router directs the write call to the master. It is written to the master, but the master goes down before it is replicated to the slaves (I am using synchronous replication mode).
Will the router select one slave as master and also write the above request to both slaves?
Scenario 2:
The router directs the write call to the master. It is written to the master, but then the network link between it and one slave is broken (using synchronous replication mode).
Will the router select another slave (which is connected to all other nodes) as master and also write the above request to the slave?
Let's first use MongoDB terminology: primary instead of master and secondary instead of slave.
Scenario 1: Will the router select one slave as master and also write the above request to both slaves?
A secondary can become a primary. If the current primary becomes unavailable, the replica set holds an election to choose which of the secondaries becomes the new primary. See also Replica Set Elections.
In scenario 1, if the primary had accepted write operations that the secondaries had not successfully replicated before the primary stepped down, then a rollback will revert the write operations on a former primary when the node rejoins the replica set. See also RollBacks During Replica Set Failover.
You can run all voting members with journaling enabled and use writeConcern majority to prevent rollbacks. See also Avoid Replica Set Rollbacks.
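A sketch of such a write with PyMongo; the connection string, database, and collection names are placeholders:

    from pymongo import MongoClient, WriteConcern

    client = MongoClient(
        "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0"
    )

    # w='majority' waits until a majority of voting members acknowledge the
    # write, and j=True requires it to be journaled, so a later failover
    # cannot roll it back.
    orders = client.shop.get_collection(
        "orders", write_concern=WriteConcern(w="majority", j=True)
    )
    orders.insert_one({"item": "book", "qty": 1})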
Scenario 2: Will the router select another slave (which is connected to all other nodes) as master and also write the above request to the slave?
There are two parts here. The first part is the replica set election: in this case, because the primary and one of the secondaries still form a majority, no election will be held. The primary will remain primary and keep replicating to one of the secondaries.
The second part is about replication of data. Secondary members copy the oplog from their sync source and apply these operations in an asynchronous process. A secondary's sync source may change automatically as needed, based on changes in the ping time and the state of other members' replication. See also Replica Set Data Synchronization.
In scenario 2, the secondary may change its sync source to the other secondary.
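To see which member each node is currently syncing from, you can inspect the replica set status. A sketch with PyMongo; the connection string is a placeholder, and the exact field name (syncSourceHost vs. the older syncingTo) varies between MongoDB versions:

    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0"
    )

    status = client.admin.command("replSetGetStatus")
    for member in status["members"]:
        # The sync source is empty for the primary.
        print(member["name"], member["stateStr"], member.get("syncSourceHost", ""))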
You may also find the following useful:
Replica Set High Availability
Replica Set Deployment Architectures
Replica Set Distributed Across Two or More Data Centers

Replication on PostgreSQL pauses when querying and replication are happening simultaneously

PostgreSQL follows MVCC rules, so a query run on a table does not conflict with writes happening on that table; the query returns results based on a snapshot taken when the query started.
Now I have a master and a slave. The slave is used by analysts to run queries and perform analysis. When the slave is replicating and analysts are running their queries at the same time, I can see the replication lag grow for a long time. If the queries are long-running, replication lags for a long duration, and if the number of writes on the master happens to be pretty high, I end up losing the WAL files and replication can no longer proceed; I just have to spin up another slave. Why does this happen? How do I allow queries and replication to happen simultaneously on PostgreSQL? Is there any parameter setting I can apply to make this happen?
The replica can't apply more WAL from the master because the master might have overwritten row versions that are still needed by queries running on the replica, since those queries are older than any still running on the master. The replica needs older row versions than the master does. It is exactly because of MVCC that this pause is necessary.
You probably set a high max_standby_streaming_delay to avoid "canceling statement due to conflict with recovery" errors.
If you turn hot_standby_feedback on, the replica can instead tell the master to keep those rows. But the master can't clean up free space as efficiently then, and it might run out of space in pg_xlog if the standby gets way too far behind.
See PostgreSQL manual: Handling Query Conflicts.
As for the WAL retention part: enable WAL archiving and a restore_command for your standbys. You should really be using it anyway, for point-in-time recovery. PgBarman now makes this easy with the barman get-wal command. If you don't want WAL archiving, you can instead set your replica servers up to use a replication slot to connect to the master, so the master knows to retain the WAL they need indefinitely. Of course, that can cause the master to run out of space in pg_xlog and stop running, so you need to monitor more closely if you do that.
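A sketch of those two remedies with psycopg2; the hostnames and the slot name are placeholders:

    import psycopg2

    # On the replica: report its oldest running query back to the master so
    # the row versions it still needs are not vacuumed away (at the cost of
    # some bloat on the master).
    replica = psycopg2.connect("host=replica-db dbname=postgres user=postgres")
    replica.autocommit = True
    rcur = replica.cursor()
    rcur.execute("ALTER SYSTEM SET hot_standby_feedback = on")
    rcur.execute("SELECT pg_reload_conf()")

    # On the master: a physical replication slot makes it retain WAL the
    # replica has not yet consumed (monitor pg_xlog/pg_wal disk usage).
    master = psycopg2.connect("host=master-db dbname=postgres user=postgres")
    master.autocommit = True
    mcur = master.cursor()
    mcur.execute("SELECT pg_create_physical_replication_slot('analytics_replica_slot')")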

MongoDB share-nothing slaves

I'd like to use mongodb to distribute a cached database to some distributed worker nodes I'll be firing up in EC2 on demand. When a node goes up, a local copy of mongo should connect to a master copy of the database (say, mongomaster.mycompany.com) and pull down a fresh copy of the database. It should continue to replicate changes from the master until the node is shut down and released from the pool.
The requirements are that the master need not know about each individual slave being fired up, nor should the slave have any knowledge of other nodes outside the master (mongomaster.mycompany.com).
The slave should be read only, the master will be the only node accepting writes (and never from one of these ec2 nodes).
I've looked into replica sets, and this doesn't seem to be possible. I've done something similar to this before with a master/slave setup, but it was unreliable. The master/slave replication was prone to sudden catastrophic failure.
Regarding replica sets: while I don't imagine you could have a set member invisible to the primary (and other nodes), due to the need for replication, you can tailor a particular node to come pretty close to what you want:
Set the newly-launched node to priority 0 (meaning it cannot become primary)
Set the newly-launched node to 'hidden'
Here are links to more info on priority 0 and hidden nodes.
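As a rough illustration of those two settings, here is a reconfiguration sketch with PyMongo run against the primary; the member index, hostnames, and replica set name are assumptions, and many people do the same thing from the mongo shell with rs.reconfig():

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongomaster.mycompany.com:27017/?replicaSet=rs0")

    # Fetch the current replica set configuration.
    cfg = client.admin.command("replSetGetConfig")["config"]

    # Mark the newly-launched EC2 member (assumed to be members[2]) as a
    # hidden, priority-0 node: it keeps replicating and can serve local reads,
    # but it can never become primary and is invisible to clients of the set.
    cfg["members"][2]["priority"] = 0
    cfg["members"][2]["hidden"] = True
    cfg["version"] += 1

    client.admin.command("replSetReconfig", cfg)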