How to shard using OrientDB

How do I achieve sharding in OrientDB?
Suppose I have three nodes, viz. node1, node2 and node3, and two clusters, viz. zip_india and zip_usa.
I tried to set the servers up such that zip_india lives on node1 and node2, and zip_usa lives on node3.
I configured default-distributed-db-config.json before creating the database.
After I create the database and connect the other nodes, a number of additional clusters are formed automatically, viz. _studio, _studio_node2, _studio_node3, etc.
Now, when I connect to node1 and insert a record into the zip_india cluster, I can see the record on node3 as well. Is the data actually being stored on node3, or is OrientDB fetching it from node1 when I query node3 for that record in the zip_india cluster?
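For reference, a minimal sketch of what such a default-distributed-db-config.json might look like, assuming the layout documented for OrientDB's distributed mode; the node and cluster names come from the question, and the quorum values are illustrative:

    {
      "autoDeploy": true,
      "readQuorum": 1,
      "writeQuorum": "majority",
      "clusters": {
        "internal": {},
        "zip_india": { "servers": ["node1", "node2"] },
        "zip_usa":   { "servers": ["node3"] },
        "*":         { "servers": ["<NEW_NODE>"] }
      }
    }

The "*" entry is the catch-all for clusters not listed explicitly (which is where the auto-created _studio* clusters fall), and "<NEW_NODE>" is OrientDB's placeholder for any server that joins later. With an owner list like this, records in zip_india should be stored only on node1 and node2, and a query issued on node3 should be forwarded to one of the owning servers rather than served from a local copy.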

Related

mongosync fails while synchronising data between source and destination sharded cluster

I am trying to synchronize data from a source cluster to a destination cluster using mongosync, but I am hitting the error below.
{"level":"fatal","serverID":"891fbc43","mongosyncID":"myShard_0","error":"(InvalidOptions) The 'clusteredIndex' option is not supported for namespace mongosync_reserved_for_internal_use.lastWriteStates.footballdb","time":"2022-12-16T02:21:15.209841065-08:00","message":"Error during replication"}
Source cluster details:
3 shards, where each shard is a single-node replica set
Dataset: one database with one collection, sharded across the shards
Destination cluster details:
3 shards, where each shard is a single-node replica set
No user data
All the mongosync prerequisites, including RBAC, have been verified successfully.
I am unable to diagnose the error: The 'clusteredIndex' option is not supported for namespace mongosync_reserved_for_internal_use.lastWriteStates.footballdb.
I tried the same use case with the same dataset for an N-node replica-set source (without sharding) and destination cluster, and the synchronisation worked fine.
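Not a full diagnosis, but one thing worth ruling out: mongosync creates its internal collections (such as the mongosync_reserved_for_internal_use namespace in the error) with the clusteredIndex option, and clustered collections are only accepted when the cluster's featureCompatibilityVersion is new enough (mongosync itself requires 6.0+). A quick check in mongosh, run against each shard's primary on both clusters (the parameter lives on mongod, not mongos):

    // Report the effective feature compatibility version; if this shows
    // e.g. "5.0" on a 6.0 binary, options like clusteredIndex are rejected.
    db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })

If it is lagging behind the binary version, db.adminCommand({ setFeatureCompatibilityVersion: "6.0" }) against the mongos brings it up.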

MongoDB replicaset loses database when updating primary replica (Kubernetes + Bitnami Helm Chart)

I am using MicroK8s and the Bitnami Helm chart here.
I set up a replica set of 3 replicas:
mongo-0 (by definition this is the primary), mongo-1 and mongo-2.
Bitnami makes the replica set always use mongo-0 (if available) as the primary replica. However, the following can happen: say I find out I need to update the nodes, for example to increase storage. To do this, I would need to:
Drain the node running mongo-0. This automatically triggers a new election; let's say mongo-1 becomes the new primary.
Add a new node (with more capacity) to the cluster.
This makes the replica set assign a mongo-0 pod to the new node. However, the new node is empty, so the persistent volume where I store the database (let's say /mnt/mongo) is empty.
I would expect the current primary replica to finish populating the database onto the new replica (mongo-0, and therefore its persistent volume), and ONLY when that is done, make mongo-0 the primary.
However, I saw that mongo-0 becomes primary without any data being copied to it from the previous primary, effectively deleting the whole database, since the primary now states that the database is empty.
How is that possible? What am I missing here?
I am not familiar with your exact management tools, but the scaling process you described is wrong. You should not be removing 1 out of 3 nodes from the replica set at any point, at least not in a production environment.
To replace a RS node (a mongosh sketch of these steps follows this answer):
1. Add a fourth node with the desired parameters.
2. Set node priorities such that the node you want to remove has a lower priority than any of the other nodes.
3. Wait for the newly added node to have an acceptable replication lag.
4. Ensure the primary is not the node you want to remove.
5. Remove the node in question from the RS.
Expecting generic software to automatically figure out when #3 completes and move on to #4 correctly is, I would say, rather optimistic. Maybe MongoDB Ops Manager would do that.
Your post contains a number of other incorrect statements about how MongoDB operates. For example, a node that has no data cannot become a primary in a replica set with data. Perhaps you have other configuration issues going on and you actually have independent nodes/multiple replica sets in what you think is a single deployment.
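A rough mongosh sketch of those five steps, with placeholder host names and member indexes (adjust these to your own rs.conf()):

    // 1. Add the replacement node; priority 0 keeps it from being elected
    //    while it performs its initial sync.
    rs.add({ host: "mongo-new.example:27017", priority: 0, votes: 1 })

    // 2. Deprioritize the node being replaced (member index 0 assumed here).
    cfg = rs.conf()
    cfg.members[0].priority = 0.5   // lower than every other member
    rs.reconfig(cfg)

    // 3. Watch replication lag until the new node has caught up acceptably.
    rs.printSecondaryReplicationInfo()

    // ...then give the new node a normal priority (index 3 assumed).
    cfg = rs.conf()
    cfg.members[3].priority = 1
    rs.reconfig(cfg)

    // 4. If the node being replaced is still primary, step it down.
    rs.stepDown()

    // 5. Remove the old node from the replica set.
    rs.remove("mongo-old.example:27017")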

Can etcd detect problems and elect leaders for other clusters?

To my knowledge, etcd uses Raft as its consensus and leader-election algorithm to maintain a leader that is in charge of keeping the ensemble of etcd nodes in sync with data changes within the etcd cluster. Among other things, this allows etcd to recover from node failures in the cluster where etcd runs.
But what about etcd managing other clusters, i.e. clusters other than the one where etcd runs?
For example, say we have an etcd cluster and, separately, a DB cluster (e.g. MySQL or Redis) composed of master (read-write) node(s) and (read-only) replicas. Can etcd manage node roles for this other cluster?
More specifically:
Can etcd elect a leader for clusters other than the one running etcd and make that information available to other clusters and nodes?
To make this more concrete, using the example above, say a master node in the MySQL DB cluster goes down. Note again that the master and replicas for the MySQL DB are running on a different cluster from the nodes running and hosting etcd data.
Does etcd provide capabilities to detect this type of node failure in clusters other than its own automatically? If yes, how is this done in practice (e.g. for a MySQL DB or any other cluster where nodes can take on different roles)?
After detecting such a failure, can etcd re-arrange node roles in this separate cluster (i.e. designate a new master and replicas), and would it use the Raft leader-election algorithm for this as well?
Once it has done so, can etcd also notify the client (application) nodes that depend on this DB so they can update their configuration accordingly?
Finally, does any of the above require Kubernetes? Or can etcd manage external clusters all by its own?
In case it helps, here's a similar question for ZooKeeper.
etcd's master election is strictly for electing a leader for etcd itself.
Other clusters, however, can use a distributed, strongly consistent key-value store (such as etcd) to implement their own failure detection and leader election, and to allow clients of that cluster to respond.
Etcd doesn't manage clusters other than its own. It's not magic awesome sauce.
If you want to use etcd to manage a MySQL cluster, you will need a component that manages the MySQL nodes and stores cluster state in etcd. Clients can watch for changes in that cluster state and adjust.
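To make that concrete, here is a minimal sketch of such a component using etcd's off-the-shelf election recipe (the go.etcd.io/etcd/client/v3/concurrency package); the endpoint, key prefix and node name are made up for the example:

    package main

    import (
        "context"
        "log"

        clientv3 "go.etcd.io/etcd/client/v3"
        "go.etcd.io/etcd/client/v3/concurrency"
    )

    func main() {
        // Each MySQL node would run a small agent like this.
        cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"etcd1:2379"}})
        if err != nil {
            log.Fatal(err)
        }
        defer cli.Close()

        // The session's lease doubles as failure detection: if this agent
        // dies, the lease expires and its claim on the key is released.
        sess, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
        if err != nil {
            log.Fatal(err)
        }
        defer sess.Close()

        // Campaign to become the MySQL master; Campaign blocks until elected.
        e := concurrency.NewElection(sess, "/db/mysql/master")
        if err := e.Campaign(context.Background(), "mysql-node-1"); err != nil {
            log.Fatal(err)
        }
        log.Println("elected: promote the local MySQL to master here")

        // Clients and replicas can follow the current master via
        // e.Observe(ctx), which streams the election's winning value.
    }

Note that etcd only provides the coordination primitive here; the agent still has to do all the MySQL-specific work itself (health checks, promoting a replica, repointing clients).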

How do I configure a Kubernetes Replication Controller to ensure there is a replica on each worker node/minion?

Is there a way to configure an RC such that I have a single replica on each of my worker nodes?
I just created a two-replica RC for Elasticsearch, and it has placed both instances onto just one of my worker nodes. I would prefer to have one instance on each worker node.
This is particularly important for an application like Elasticsearch that uses persistent storage on the Docker host - having two Elasticsearch instances using the same datastore would likely cause issues.
How is this possible to achieve?
Environment:
1x Kubernetes master - physical server running CoreOS
2x Kubernetes nodes - physical servers running CoreOS
You can't choose nodes directly for pods created by scaling up a replication controller. The scheduler assigns nodes based on constraints. You can artificially prevent pods from landing on the same node by making them use a resource that a node only has one of, such as a hostPort.
The daemon controller proposal (https://github.com/kubernetes/kubernetes/pull/13368) sounds more like what you want; it would let you spread pods across nodes.
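A sketch of the hostPort trick from the first paragraph, for an Elasticsearch-style RC (the image tag is illustrative); since only one pod per node can bind the host port, the scheduler is forced to spread the two replicas:

    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: elasticsearch
    spec:
      replicas: 2
      selector:
        app: elasticsearch
      template:
        metadata:
          labels:
            app: elasticsearch
        spec:
          containers:
          - name: elasticsearch
            image: elasticsearch:1.7
            ports:
            - containerPort: 9200
              hostPort: 9200   # only one pod per node can claim this port

For what it's worth, the daemon controller proposal later shipped as the DaemonSet, which is now the idiomatic way to run exactly one pod per node.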

How does clustering of ejabberd nodes work?

Does it work in a master-slave fashion, where XMPP clients connect to a master node and the master node uses the slave nodes to distribute the load?
If not, how can load balancing be done after clustering the ejabberd nodes?
All nodes are equal and there is no master. State (like the roster table, sessions, etc.) is kept in Mnesia or MySQL. The configuration is replicated across all nodes.
Usually this means there is a load balancer in front of the whole cluster. One cluster is represented by one domain; you can have more clusters and federate them.
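For reference, a sketch of how a second node typically joins such an all-equal cluster with the stock tooling (node names are illustrative; join_cluster is available in ejabberd 14.07 and later):

    # On node2, with the same .erlang.cookie and matching configuration:
    ejabberdctl start
    ejabberdctl join_cluster ejabberd@node1   # pulls the replicated Mnesia tables

    # Verify that both nodes see each other:
    ejabberdctl list_cluster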