Dynamic construction of zookeeper cluster - apache-zookeeper

I'd like to programatically setup zookeeper cluster. My goal is to use machines with CoreOS and dynamically deploy three nodes in form of docker containers and setup them to one zoo cluster.
Except common setup in manual (/zookeeperReconfig.html) which shows how to add another nodes to existing three nodes cluster, I found a conversation which say how to do that from the beginning when I have no running nodes in existing cluster. Unfortunately, this set of steps does not work for me. I'm talking about http://mail-archives.apache.org/mod_mbox/zookeeper-user/201408.mbox/%3CCAPSuxQipZFH2582bEMid2tCVBFk%3DC31hwCYjbFgSwChBF%2BZQVQ%40mail.gmail.com%3E
Here is a list of steps I did:
Run first node with standaloneEnabled=false and the only entry in zoo.cfg.dynamic.
server.1=localhost:2381:2281:participant;0.0.0.0:2181
Run second node with following dynamic cfg:
server.1=localhost:2381:2281:participant;0.0.0.0:2181
server.2=localhost:2382:2282:observer;0.0.0.0:2182
Note that there is no difference in resulting behavior when I'd change "observer" to "participant" for second node.
Now I have two running instances. I can use ./zkCli.sh to log into first node. When I try to add second node using following command:
reconfig -add server.2=localhost:2382:2282:participant;0.0.0.0:2182
... it fails with:
KeeperErrorCode = NewConfigNoQuorum for
However, after some research I found solution. But it's tricky and I don't think that it's the only correct solution.
What is working for me? I can do step #3 on first node again but now with "observer". This command causes that even first node knows about second node. When I type 'config' to console in zkCli, it seems that it's working. Next step is to log into second node using zkCli and than exec commands:
reconfig -remove 2 <- next step is not working w/o this
reconfig -add server.2=localhost:2382:2282:participant;0.0.0.0:2182
Well, now I have working cluster for two nodes. Finally, it's interesting that now I can add third node using regular scenario I've mentioned in first paragraph.
Do someone have some idea what I'm doing wrong?

Related

Running kubectl commands in parallel with different credentials

I'm currently running two Kubernetes clusters one on Google cloud and one on IBM cloud. To manage them I use kubectl. I've made a script that executes some commands on one of the clusters then switches to the other and does some other work there.
This works fine as long as the script only runs in one process, however when run in parallel the credentials are sometimes overwritten by one process when in use by another and this obviously causes issues.
I therefore want to know if I can supply kubectl with a credentials file for every call, instead of storing it in a environmental variable with kubectl config set-credentials.
Any help/solution is much appreciated.
If I need to work with multiple clusters using kubectl I am splitting my terminal and setting KUBECONFIG for each split:
For my first split:
export KUBECONFIG=~/.kube/cluster1
For the second split
export KUBECONFIG=~/.kube/cluster2
It is working pretty well, but this approach has one issue:
If you are using some kind of prompt with the current Kubernetes context it will give you different output and it might be missing leading.
For scripts, I am just changing value of KUBECONFIG in for loop, to loop over each cluster.
You need to use Kubefed in order to manage multiple clusters.
It will take one cluster as the main one, and execute all the same requests to the second cluster.

Service Fabric - Cannot do Config upgrade to add or remove nodes

I've got an on-premise Service Fabric consisting of 18 nodes (9 are seed nodes) - secured via gMSA windows security. Cluster code version 6.4.622.9590
Unfortunately I have to rebuild 6 of these nodes (3 Seed nodes). They all live in one data center (cluster spans 3 DCs). As such, I wish to remove these 6 nodes, rebuild them and then re-add them.
As per MSDOCs, adding/removing of nodes is performed via config upgrades. Note: I've already used this process recently to add 12 nodes so understand the concept of SF config upgrades well.
Unfortunately, I'm unable to do ANY config upgrades on this cluster until I remove the nodes - this is due to ValidationExceptions reported by the Start-ServiceFabricClusterConfigurationUpgradepowershell command:
If I don't add the 6 nodes to the "NodesToBeRemoved" section, I get validation error that not all removed nodes are in this field
If I do add the 6 nodes, I get the following validation error:
Start-ServiceFabricClusterConfigurationUpgrade :
System.Runtime.InteropServices.COMException (-2147017627)
ValidationException: Model validation error. Removing a non-seed node and changing reliability level in the same
upgrade is not supported. Initiate an upgrade to remove node first and then change the reliability level.
At line:1 char:1
+ Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath "AL ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (Microsoft.Servi...usterConnection:ClusterConnection) [Start-ServiceFa
...gurationUpgrade], FabricException
+ FullyQualifiedErrorId : StartClusterConfigurationUpgradeErrorId,Microsoft.ServiceFabric.Powershell.StartClusterC
onfigurationUpgrade
So, we're stuck! I've also already removed node states, thus leaving all 6 nodes in the "Invalid State". The Get-ServiceFabricClusterConfiguration does not return these 6 nodes, but they are still shown in SF Explorer and listed in the cluster manifest XML file.
As far as reliability level is concerned - I'm pretty sure one can no longer change this in SF; i.e. older versions of SF allowed you to configure bronze/silver/gold in config file, but in recent versions (+6.0??) - this is a calculated field and managed internally by SF. In any case - because the seed nodes will be decreased from 9 to 6, I suspect the internal calculated reliability level will drop (presumably from Gold to silver).
I've also come across a hack that someone has used to remove nodes in a cluster... but in my scenario, nodes are still listed in manifest file... Nonetheless, the words hack and production should never meet!
So, how do I get our production cluster out of this situation? Rebuilding the cluster is not an option (that's the whole reason for clusters...high availability!).
I discovered that the above errors are primarily a symptom of lack of clearly documented procedures as well as bad/misleading error messages when doing service fabric configuration upgrades.
I performed quite a bit of my own testing to make sure I can confidently add/remove several nodes from a cluster. I also removed enough nodes to drop the Seed nodes from 9 to 6.
So, to resolve the above issue, here's what I had to do to remove nodes:
Use the SF explorer to remove node state - this changed node state from Error to Invalid
Get latest json config via Get-ServiceFabricClusterConfiguration
Remove the node from Nodes section
Completely remove the NodesToBeRemoved json section (i.e. you'll get the inconsistent error if you have an empty list of nodes to be removed - so just remove the containing json block
Do a config update
Note: Initially I tried just doing 2-5 above - but it didn't work and the node remained in error state.
That said, from my experience, please also note the following when removing nodes (this info is not clear in MSDOC:
You can remove multiple Seed nodes at once (I wanted to do this to try and replicate above scenario)
You can add multiple nodes at once too - just be aware you may not see any activity/indication via SF config upgrade status tooling that
anything is happening... be prepared to wait at least +15 minutes
(depends on how many nodes you're adding...afterall, SF is copying
installation files to the nodes)
Sometimes, when removing one or more nodes, the node won't be successfully removed - but left in an Error status. If this is the
case, use the SF Explorer (or powershell) to remove node state. Status
will change to Invalid. At this point, do another config upgrade
ensuring that:
The removed node(s) are not in Nodes section
The removed node(s) are not in the NodesToBeRemoved list
As per above, if the value of NodesToBeRemoved is (or should be) empty, remove this whole JSON block otherwise you'll get a misleading/vague warning about NodesToBeRemoved parameter contains inconsistent information.
The latter part really is the confusing part that tripped me up last time. The thing to also remember is that, once you successfully remove nodes, the Get-ServiceFabricClusterConfiguration will STILL return the removed nodes in the NodesToBeRemoved parameter. This will likely confuse/trip you up with any subsequent attempts to do a config upgrade. As such, I recommend you do another final config upgrade with this section completely removed.
As a final note: If you re-add a node that has previously been removed, it may come back in a Deactivated status. Simply activate this node and all should be fine.

wait for other deployments to start running before other can be created?

I am creating the deployments/services using REST APIs. I send POST request with bodies which contain the JSON objects which create the applications on Openshift. After I call all the APIs, these objects get instantiated.
I have 2 deployments which are dependent on mongodb deployment but this mongodb takes a little longer to start running, while the two deployments which are dependent on mongodb start running earlier. This breaks the code inside the 2 deployments as the mongodb connection fails(since it is not up yet).
There could be 2 possible way I can fix this problem.
I put a delay after i create mongodb deployment and recursively call the API to check it's status if it is running or not.
Just like we make changes in docker-compose, with the key, depends-on which tell the docker-compose that all the dependencies should be started first and then the dependent container.
Is there any way this could be achieved in openshift?
Instead of implementing complex logic for dependency handling, use health checking mechanism of Kubernetes. If your application starts and doesn't see Mongo DB, let it crash. Kubernetes will keep restarting it until Mongo DB comes online, and your application becomes healthy and serving as well. Kubernetes won't send traffic to not yet healthy instances.
Docs: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
Just like we make changes in docker-compose, with the key, depends-on which tell the docker-compose that all the dependencies should be started first and then the dependent container.
You might want to look into Init Containers for dependent container. They run to completion before container is actually started. Below excerpt is taken from referenced documentation (given below) for use cases that might be applicable to your issue:
They run to completion before any app Containers start, whereas app Containers run in parallel, so Init Containers provide an easy way to block or delay the startup of app Containers until some set of preconditions are met.
Examples
Here are some ideas for how to use Init Containers:
Wait for a service to be created with a shell command like:
for i in {1..100}; do sleep 1; if dig myservice; then exit 0; fi; done; exit 1
Register this Pod with a remote server from the downward API with a command like:
curl -X POST http://$MANAGEMENT_SERVICE_HOST:$MANAGEMENT_SERVICE_PORT/register -d ‘instance=$()&ip=$()’
Wait for some time before starting the app Container with a command like sleep 60.
Reference documentation:
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
Alex has pointed out correct practice to follow with kubernetes. But if you still want directly depend on other pod phase you can use this pod-dependency-init-container that I have build. This will check if any pod with given labels is running before starting your pod.

Statefulset - Possible to Skip creation of pod 0 when it fails and proceed with the next one?

I currently do have a problem with the statefulset under the following condition:
I have a percona SQL cluster running with persistent storage and 2 nodes
now i do force both pods to fail.
first i will force pod-0 to fail
Afterwards i will force pod-1 to fail
Now the cluster is not able to recover without manual interference and possible dataloss
Why:
The statefulset is trying to bring pod-0 up first, however this one will not be brought online because of the following message:
[ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1
What i could do alternatively, but what i dont really like:
I could change ".spec.podManagementPolicy" to "Parallel" but this could lead to race conditions when forming the cluster. Thus i would like to avoid that, i basically like the idea of starting the nodes one after another
What i would like to have:
the possibility to have ".spec.podManagementPolicy":"OrderedReady" activated but with the possibility to adjust the order somehow
to be able to put specific pods into "inactive" mode so they are being ignored until i enable them again
Is something like that available? Does someone have any other ideas?
Unfortunately, nothing like that is available in standard functions of Kubernetes.
I see only 2 options here:
Use InitContainers to somehow check the current state on relaunch.
That will allow you to run any code before the primary container is started so you can try to use a custom script in order to resolve the problem etc.
Modify the database startup script to allow it to wait for some Environment Variable or any flag file and use PostStart hook to check the state before running a database.
But in both options, you have to write your own logic of startup order.

how to update cluster status in dataproc

I changed my initialization script after creating a cluster with 2 worker nodes for spark. Then I changed the script a bit and tried to update the cluster with 2 more worker nodes. The script failed because I simply forgot to apt-get update before apt-get install, so dataproc reports error and the cluster's status changed to ERROR. When I try to reduce the size back to 2 nodes again, it doesn't work anymore with the following message
ERROR: (gcloud.dataproc.clusters.update) Cluster 'cluster-1' must be running before it can be updated, current cluster state is 'ERROR'.
The two worker nodes are still added, but they don't seem to be detected by a running spark application at first because no more executors are added. I manually reset the two instances on the Google Compute Engine page, and then 4 executors are added. So it seems everything is working fine again except that the cluster's status is still ERROR, and I cannot increase or decrease the number of worker nodes anymore.
How can I update the cluster status back to normal (RUNNING)?
In your case ERROR indicates that workflow to re-configure the cluster has failed, and Dataproc is not sure of its health. At this point Dataproc cannot guarantee that another reconfigure attempt will succeed so further updates are disallowed. You can however submit jobs.
Your best bet is to delete it and start over.