mongosync fails while synchronizing data between source and destination sharded clusters - MongoDB

I am trying to synchronize data from a source cluster to a destination cluster using mongosync, but I am hitting the error below.
{"level":"fatal","serverID":"891fbc43","mongosyncID":"myShard_0","error":"(InvalidOptions) The 'clusteredIndex' option is not supported for namespace mongosync_reserved_for_internal_use.lastWriteStates.footballdb","time":"2022-12-16T02:21:15.209841065-08:00","message":"Error during replication"}
Source cluster details:
3 shards, where each shard is a single-node replica set
Dataset: one database with one collection sharded across the shards
Destination cluster details:
3 shards, where each shard is a single-node replica set
No user data
All the mongosync prerequisites, including RBAC, have been verified successfully.
I am unable to diagnose the error here: The 'clusteredIndex' option is not supported for namespace mongosync_reserved_for_internal_use.lastWriteStates.footballdb
I tried the same use case with the same dataset using an N-node replica set source (without sharding) and a replica set destination cluster, and the synchronization worked fine.
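Not part of the original question, but since the option in the error is applied to one of mongosync's internal collections, and support for the clusteredIndex collection option depends on the server version and featureCompatibilityVersion, one detail worth recording alongside the topology above is the exact version and FCV of every shard on both clusters. A minimal pymongo sketch, with placeholder connection strings:

```python
# Hedged diagnostic sketch: print server version and featureCompatibilityVersion
# for each shard. The connection strings are placeholders for the real shard hosts.
from pymongo import MongoClient

SHARD_URIS = [
    "mongodb://shard0-host:27018",  # placeholder: destination shard 0
    "mongodb://shard1-host:27018",  # placeholder: destination shard 1
    "mongodb://shard2-host:27018",  # placeholder: destination shard 2
]

for uri in SHARD_URIS:
    client = MongoClient(uri, directConnection=True)
    version = client.admin.command("buildInfo")["version"]
    fcv = client.admin.command(
        {"getParameter": 1, "featureCompatibilityVersion": 1}
    )["featureCompatibilityVersion"]["version"]
    print(f"{uri}: server {version}, FCV {fcv}")
    client.close()
```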

Related

Deploy Mongo Database with a single master and two read replicas in a Kubernetes cluster of at least 3 worker nodes

Deploy Mongo Database with a single master and two read replicas in a Kubernetes cluster of at least 3 worker nodes that are available in different availability zones.
Points to keep in mind while deploying the DB:
Each DB replica should be deployed on a separate worker node in a different availability zone (for high availability).
Autoscale the read replicas if needed.
Data should be persistent.
Try to run the containers in non-privileged mode if possible.
Use best practices as much as you can.
Push the task into a separate branch with a proper README file.
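This is a task statement rather than a worked answer, but a minimal sketch of how the node/zone spreading, persistence, and non-privileged requirements could be expressed is shown below, using the official kubernetes Python client. All names, the namespace, the image tag, and the storage size are illustrative placeholders, and the replica set itself would still have to be initiated separately.

```python
# Hedged sketch: a StatefulSet for 3 MongoDB pods (1 primary + 2 secondaries),
# spread across zones and nodes, with persistent storage and a non-root user.
# Everything here (names, namespace, image, sizes) is a placeholder.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run in-cluster

mongodb_statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "mongodb", "namespace": "db"},
    "spec": {
        "serviceName": "mongodb",
        "replicas": 3,
        "selector": {"matchLabels": {"app": "mongodb"}},
        "template": {
            "metadata": {"labels": {"app": "mongodb"}},
            "spec": {
                # One pod per zone and per node, for high availability.
                "topologySpreadConstraints": [
                    {
                        "maxSkew": 1,
                        "topologyKey": "topology.kubernetes.io/zone",
                        "whenUnsatisfiable": "DoNotSchedule",
                        "labelSelector": {"matchLabels": {"app": "mongodb"}},
                    },
                    {
                        "maxSkew": 1,
                        "topologyKey": "kubernetes.io/hostname",
                        "whenUnsatisfiable": "DoNotSchedule",
                        "labelSelector": {"matchLabels": {"app": "mongodb"}},
                    },
                ],
                # Non-privileged: run as a non-root user.
                "securityContext": {"runAsNonRoot": True, "runAsUser": 999, "fsGroup": 999},
                "containers": [
                    {
                        "name": "mongod",
                        "image": "mongo:6.0",
                        "args": ["--replSet", "rs0", "--bind_ip_all"],
                        "ports": [{"containerPort": 27017}],
                        "securityContext": {"allowPrivilegeEscalation": False},
                        "volumeMounts": [{"name": "data", "mountPath": "/data/db"}],
                    }
                ],
            },
        },
        # Persistence: one PersistentVolumeClaim per pod.
        "volumeClaimTemplates": [
            {
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "10Gi"}},
                },
            }
        ],
    },
}

client.AppsV1Api().create_namespaced_stateful_set(namespace="db", body=mongodb_statefulset)
```

Autoscaling the read replicas is the one requirement this does not cover; that is usually handled by an operator or by scaling a separate read-only workload, since members added by an autoscaler still have to be added to the replica set configuration.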

MongoDB cluster with ELB endpoint as DNS

This is not so much a technical question as an architectural one.
I have followed this blog to set up the MongoDB cluster. We have 2 private subnets in which I have configured a 3-member MongoDB replica set. Now I want to use a single DNS name, like mongod.some_subdomain.example.com, for the whole cluster.
I do not have access to Route 53, and setting/updating DNS records takes at least 2 hours in my case since I depend on our cloud support team for it. I am also not sure which server primarily responds to application requests in a MongoDB cluster.
So is there a way to put the whole cluster behind an ELB and use the ELB endpoint as the DNS name to route traffic to the primary, such that after a failover the new primary (any member except the arbiter node) is still reachable through the ELB?
The driver will attempt to connect to all nodes in the replica set configuration. If you put the nodes behind proxies, the driver will bypass the proxies and try to talk to the nodes directly.
You can proxy standalone and sharded cluster deployments, as the driver doesn't need a direct connection to the data nodes in those, but mapping multiple mongos instances to a single address can create problems with retryable reads/writes, sessions, transactions, etc. This is not a supported configuration.
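To illustrate the first point, here is a minimal pymongo sketch (hostnames and replica set name are placeholders): given a seed list and the replica set name, the driver discovers the current primary itself and keeps following it across failovers, so no load balancer is needed in front of the members.

```python
# Hedged illustration: the driver is given a seed list plus the replica set name
# and does its own topology discovery, routing writes to whichever member is
# currently primary. Hostnames and the replica set name are placeholders.
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

uri = (
    "mongodb://mongo-a.internal:27017,"
    "mongo-b.internal:27017,"
    "mongo-c.internal:27017/"
    "?replicaSet=rs0"
)

client = MongoClient(uri)
try:
    client.admin.command("ping")
    print("current primary:", client.primary)  # (host, port) of the elected primary
except ConnectionFailure as exc:
    print("cannot reach the replica set:", exc)
```

If a single stable hostname is still wanted, the usual route is a DNS seed list (mongodb+srv://) rather than a load balancer, but that again requires control over DNS records.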

Standalone MongoDB installation for Production

I want to deploy MongoDB to a Kubernetes cluster with 2 nodes; there is no chance of adding another node in the future.
I want to deploy MongoDB as a standalone instance because both nodes will be able to access the same disk space via NFS, and I don't have requirements for replication or high availability. However, the MongoDB docs clearly state that a standalone deployment is not suitable for a production environment.
MongoDB Deploy Standalone
You can deploy a standalone MongoDB instance for Cloud Manager to manage. Use standalone instances for testing and development. Do not use these deployments for production systems as they lack replication and high availability.
What kind of drawbacks can I face? Should I deploy as a replica set with an arbiter instance? If yes, why?
Of course you can deploy a standalone MongoDB for production. But if this node fails, your application is not available anymore. If you don't have any requirement for availability, then go for a standalone MongoDB.
However, running 2 MongoDB services which access the same physical disk (i.e. dbPath) will not work. Each MongoDB instance needs to have a dedicated data folder.
In your case, I would suggest a replica set. All data from one node will be replicated to the other one. If one node fails, the application goes into read-only mode.
You can deploy an arbiter instance on the primary node. If the secondary node goes down, then the application is still fully available.
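A minimal pymongo sketch of the primary-secondary-arbiter layout suggested above, with placeholder hostnames (the same can be done from mongosh with rs.initiate()):

```python
# Hedged sketch: initiate a 2-data-node replica set plus an arbiter.
# Hostnames and ports are placeholders.
from pymongo import MongoClient

# Connect directly to the member that should become the initial primary.
client = MongoClient("mongodb://node-1.internal:27017", directConnection=True)

rs_config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "node-1.internal:27017"},                       # data-bearing member
        {"_id": 1, "host": "node-2.internal:27017"},                       # data-bearing member
        {"_id": 2, "host": "node-1.internal:27018", "arbiterOnly": True},  # arbiter, no data
    ],
}

client.admin.command("replSetInitiate", rs_config)
```

The trade-off of co-locating the arbiter with a data-bearing member is that losing that node takes out two of the three voters at once, so only the failure of the other node is survived with full write availability.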
Deploying as a replica set is always recommended for production. However, if you deploy as standalone and you have 2 Kubernetes nodes, Kubernetes can ensure there is always 1 running instance attached to the NFS storage on one of the available nodes. The risk is that if the data on the storage gets corrupted, you will have nowhere to replicate from, unless you take frequent backups and don't mind losing some recently inserted data.

Can etcd detect problems and elect leaders for other clusters?

To my knowledge, etcd uses Raft as its consensus and leader election algorithm to maintain a leader that is in charge of keeping the ensemble of etcd nodes in sync with data changes within the etcd cluster. Among other things, this allows etcd to recover from node failures in the cluster where etcd runs.
But what about etcd managing other clusters, i.e. clusters other than the one where etcd runs?
For example, say we have an etcd cluster and separately, a DB (e.g. MySQL or Redis) cluster comprised of master (read and write) node/s and (read-only) replicas. Can etcd manage node roles for this other cluster?
More specifically:
Can etcd elect a leader for clusters other than the one running etcd and make that information available to other clusters and nodes?
To make this more concrete, using the example above, say a master node in the MySQL DB cluster mentioned in the above example goes down. Note again, that the master and replicas for the MySQL DB are running on a different cluster from the nodes running and hosting etcd data.
Does etcd provide capabilities to automatically detect this type of node failure on clusters other than its own? If yes, how is this done in practice (e.g. for a MySQL DB or any other cluster where nodes can take on different roles)?
After detecting such a failure, can etcd re-arrange node roles in this separate cluster (i.e. designate a new master and replicas), and would it use the Raft leader election algorithm for this as well?
Once it has done so, can etcd also notify the client (application) nodes that depend on this DB so they can adjust their configuration accordingly?
Finally, does any of the above require Kubernetes? Or can etcd manage external clusters all by its own?
In case it helps, here's a similar question for Zookeeper.
etcd's master election is strictly for electing a leader for etcd itself.
Other clusters, however, can use a distributed, strongly consistent key-value store (such as etcd) to implement their own failure detection and leader election, and to allow clients of that cluster to respond.
Etcd doesn't manage clusters other than its own. It's not magic awesome sauce.
If you want to use etcd to manage a MySQL cluster, you will need a component that manages the MySQL nodes and stores the cluster state in etcd. Clients can then watch for changes in that cluster state and adjust, as in the sketch below.
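A minimal sketch of such a component, using the python-etcd3 client (assumed to be installed); the key name, endpoint, and node id are placeholders:

```python
# Hedged sketch: a small manager process runs next to each MySQL node and
# campaigns for a key under a TTL lease. Whoever writes the key first is the
# master; if that manager (or its node) dies, the lease expires, the key is
# deleted, and another candidate can claim it. Key, endpoint and node id are
# placeholders.
import time
import etcd3

MASTER_KEY = "/mysql/cluster-a/master"
NODE_ID = "mysql-node-1.internal:3306"

client = etcd3.client(host="etcd.internal", port=2379)

def campaign_for_master() -> None:
    """Claim the master key if nobody holds it, then keep the claim alive."""
    lease = client.lease(ttl=10)  # expires unless refreshed, e.g. on node failure
    succeeded, _ = client.transaction(
        compare=[client.transactions.version(MASTER_KEY) == 0],   # key must not exist yet
        success=[client.transactions.put(MASTER_KEY, NODE_ID, lease=lease)],
        failure=[],
    )
    if not succeeded:
        lease.revoke()
        return  # another node is master; fall back to watching the key
    while True:  # this node is master: promote the local MySQL, then keep the lease alive
        lease.refresh()
        time.sleep(3)

def follow_master() -> None:
    """What clients (and the other managers) do: watch the key and reconfigure."""
    events, cancel = client.watch(MASTER_KEY)
    for event in events:
        # An empty value means the key was deleted, i.e. the master is gone.
        print("master is now:", event.value.decode() or "<none>")
```

etcd only provides the consistent storage, leases, and watches here; actually promoting a MySQL replica and repointing the application is still the job of the manager and the clients.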

SolrCloud role assignment

I have a SolrCloud with 3 shards, 3 replicas and a Zookeeper ensemble with 5 members.
Replica 2 is being retired; the root device is EBS-backed and it has an attached EBS volume. I'm assuming that on restart it will migrate to new hardware with new public and private IPs.
I'm also assuming I'll have to restart all the shards and replicas. What's the best way to do this so that the new replica is assigned to the same slot as the old replica? Aren't the shard/replica roles assigned to each host on the very first SolrCloud startup, and aren't those assignments stored in Zookeeper?
Replica 2 restarted with new public and private IPs, as expected. I stopped Tomcat on all SOLR hosts and restarted them in the normal order:
shard1
shard2
shard3
replica1
replica2
replica3
This did not work, as replica2 assigned itself to shard1 on repeated SolrCloud restarts. The shard and replica assignments are (as I thought) maintained in the binary files under the version-2 directory on every Zookeeper host. The following procedure was successful:
stop Tomcat on all SOLR hosts
stop all Zookeeper hosts
delete the version-2 directory on all Zookeeper hosts
start all Zookeeper hosts
re-upload the SOLR conf directory using the CLI tool
start all SOLR hosts in above order
This produced the correct assignments.
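Not part of the original answer, but once everything is back up, the assignments Zookeeper holds can be confirmed through the Collections API (CLUSTERSTATUS, available in Solr 4.8+) instead of inspecting the version-2 files. Host, port, and collection name below are placeholders:

```python
# Hedged sketch: list which node each shard replica landed on after the restart.
# Host, port and collection name are placeholders.
import requests

resp = requests.get(
    "http://solr-host:8983/solr/admin/collections",
    params={"action": "CLUSTERSTATUS", "collection": "mycollection", "wt": "json"},
    timeout=10,
)
resp.raise_for_status()

shards = resp.json()["cluster"]["collections"]["mycollection"]["shards"]
for shard_name, shard in shards.items():
    for replica_name, replica in shard["replicas"].items():
        print(shard_name, replica_name, replica["node_name"], replica["state"])
```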