How to repair bad IP addresses in a standalone Service Fabric cluster

We've just shipped a standalone Service Fabric cluster to a customer site with a misconfiguration. Our setup:
Service Fabric 6.4
2 Windows servers, each running 3 Hyper-V virtual machines that host the cluster
We configured the cluster locally using static IP addresses for the nodes. When the servers arrived, the IP addresses of the Hyper-V machines were changed to conform to the customer's available IP addresses. Now we can't connect to the cluster, since every IP in the clusterConfig is wrong. Is there any way we can recover from this without re-installing the cluster? We'd prefer to keep the new IPs assigned to the VMs if possible.

I've tested this only in my test environment (I've never done this in production before, so do it at your own risk), but since you can't connect to the cluster anyway, I think it is worth a try.
Connect to each virtual machine that is part of the cluster and perform the following steps:
Locate the Service Fabric cluster files (usually C:\ProgramData\SF\{nodeName}\Fabric)
Copy the ClusterManifest.current.xml file to a temp folder (for example C:\temp)
Go to the Fabric.Data subfolder and copy the InfrastructureManifest.xml file to the same temp folder
In each file you have copied, change the node IP addresses to the correct values
Stop the FabricHostSvc service by running the net stop FabricHostSvc command in PowerShell
After it has stopped successfully, run this PowerShell command (in admin mode) to update the node's cluster configuration:
New-ServiceFabricNodeConfiguration -ClusterManifestPath C:\temp\ClusterManifest.current.xml -InfrastructureManifestPath C:\temp\InfrastructureManifest.xml
Once the configuration is updated, start FabricHostSvc again: net start FabricHostSvc
Do this for each node and pray for the best.
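For reference, here is a minimal PowerShell sketch that strings these steps together on one node. The node name and the old/new address pairs in $ipMap are placeholders for your environment, and the find-and-replace assumes the old addresses appear verbatim in the manifests, so review the edited files before pushing the configuration.

# Run locally on each node in an elevated PowerShell session.
# $nodeName and $ipMap (old IP -> new IP) are placeholders for your environment.
$nodeName = "vm0"
$ipMap = @{
    "10.0.0.10" = "192.168.1.10"   # node 1
    "10.0.0.11" = "192.168.1.11"   # node 2
    # ...one entry per cluster node
}

$fabricDir = "C:\ProgramData\SF\$nodeName\Fabric"
$tempDir   = "C:\temp"
New-Item -ItemType Directory -Path $tempDir -Force | Out-Null

# Copy both manifests to the temp folder
Copy-Item "$fabricDir\ClusterManifest.current.xml" $tempDir
Copy-Item "$fabricDir\Fabric.Data\InfrastructureManifest.xml" $tempDir

# Rewrite every old node IP to its new value in both copies
foreach ($file in "ClusterManifest.current.xml", "InfrastructureManifest.xml") {
    $content = Get-Content "$tempDir\$file" -Raw
    foreach ($oldIp in $ipMap.Keys) {
        $content = $content -replace [regex]::Escape($oldIp), $ipMap[$oldIp]
    }
    Set-Content -Path "$tempDir\$file" -Value $content
}

# Stop the Fabric host, push the corrected configuration, then start it again
net stop FabricHostSvc
New-ServiceFabricNodeConfiguration `
    -ClusterManifestPath "$tempDir\ClusterManifest.current.xml" `
    -InfrastructureManifestPath "$tempDir\InfrastructureManifest.xml"
net start FabricHostSvc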

Related

Zookeeper for High availability

How does ZooKeeper work in the following situation?
Suppose I have 3 VMs (1, 2, 3), with different services running at their endpoints. My entire administration setup (TAC) is available only on the 1st VM, which means that whenever a client wants to connect, it connects to the first VM by default. My other 2 VMs are just running a bunch of services. This entire cluster setup is maintained by ZooKeeper.
My question is: what if the 1st VM fails? I know ZooKeeper maintains high availability by electing another VM as the master, but the client by default points only to the 1st VM, not to the other two. Is there any way I can overcome this situation by getting the IP of the first node (since my admin setup is present entirely on that node), or by any other method?

Kubernetes vSphere Setup IPv4 issue

I'm following the Kubernetes VM setup guide. After uploading the vmdk successfully, the kubernetes-master VM is started up, but it has no IPv4 address. Therefore, the next step in the script fails because it tries to SSH to the Kubernetes master to run the setup of the nodes. Other VMs created via the web client correctly receive an IPv4 address, but any VM created from govc does not.
This was a customer's setup, and it turns out they did not have DHCP set up on the selected network; therefore, no IP.
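For anyone hitting the same symptom, a quick way to confirm it is to ask vSphere what the guest reports, using the same govc CLI the setup script uses. A minimal sketch, assuming govc is configured; the vCenter URL and credentials are placeholders, and vm.ip will simply wait forever if the network has no DHCP server to hand out an address.

# Placeholders: point GOVC_URL/GOVC_USERNAME/GOVC_PASSWORD at your vCenter
export GOVC_URL='vcenter.example.com'
export GOVC_USERNAME='administrator@vsphere.local'
export GOVC_PASSWORD='changeme'
export GOVC_INSECURE=1

# Power state and the guest-reported IP (empty if DHCP never answered)
govc vm.info kubernetes-master

# Waits until the guest reports an IP address
govc vm.ip kubernetes-master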

Restart remote service from DirectAdmin

I have 8 servers connected together into a kind of cluster. On one of the servers I have DirectAdmin, which controls all the configs (the config files are mounted over NFS). Everything works correctly except restarting services after config updates.
How to force service restart on all 8 servers?
To restart services on a server you need to log in to it. I would suggest creating a bash script on one of your servers and executing it from there. You can set up SSH keys so that this server can log in to the others, and the script will then be able to restart the services on all the other servers, as sketched below.
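A minimal sketch of that approach, assuming key-based SSH as root from the DirectAdmin box to each server; the hostnames and the service list are placeholders for your environment.

#!/bin/bash
# Restart the listed services on every other server over SSH.
# SERVERS and SERVICES are placeholders; adjust to your cluster.
SERVERS="web2 web3 web4 web5 web6 web7 web8"
SERVICES="httpd exim dovecot"

for host in $SERVERS; do
    for svc in $SERVICES; do
        echo "Restarting $svc on $host"
        ssh -o BatchMode=yes "root@$host" "service $svc restart"
    done
done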

How to re-connect to Amazon kubernetes cluster after stopping & starting instances?

I created a cluster for trying out Kubernetes using cluster/kube-up.sh in Amazon EC2. I then stop it to save money when I'm not using it. The next time I start the master & minion instances in Amazon, ~/.kube/config has the old IPs for the cluster master, since EC2 assigns new public IPs to the instances.
So far I haven't found a way to provide Elastic IPs to cluster/kube-up.sh so that the IPs stay consistent between stopping and starting instances. Also, the certificate in ~/.kube/config is for the old IP, so manually changing the IP doesn't work either:
Running: ./cluster/../cluster/aws/../../cluster/../_output/dockerized/bin/darwin/amd64/kubectl get pods --context=aws_kubernetes
Error: Get https://52.24.72.124/api/v1beta1/pods?namespace=default: x509: certificate is valid for 54.149.120.248, not 52.24.72.124
How can I make kubectl query the same Kubernetes master after a restart, when it is running on a different IP?
If the only thing that has changed about your cluster is the IP address of the master, you can manually modify the master location by editing the file ~/.kube/config (look for the line that says "server" with an IP address).
This use case (pausing/resuming a cluster) isn't something that we commonly test for so you may encounter other issues once your cluster is back up and running. If you do, please file an issue on the GitHub repository.
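For reference, the same edit can be done with kubectl itself rather than a text editor. A sketch, assuming the cluster entry in ~/.kube/config is named aws_kubernetes to match the --context shown in the question.

# Point the existing cluster entry at the master's new public IP
kubectl config set-cluster aws_kubernetes --server=https://52.24.72.124
# Confirm the server address was updated
kubectl config view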
I'm not sure which version of Kubernetes you were using, but in v1.0.6 you can pass the MASTER_RESERVED_IP environment variable to kube-up.sh to assign a given Elastic IP to the Kubernetes master node.
You can check all the available options for kube-up.sh in the config-default.sh file for AWS in the Kubernetes repository.
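A sketch of what that looks like, assuming you have already allocated an Elastic IP in EC2; the address below is a placeholder.

# Bring the cluster up with a pre-allocated Elastic IP so the master's address
# survives stop/start cycles (placeholder address shown)
export KUBERNETES_PROVIDER=aws
export MASTER_RESERVED_IP=203.0.113.10
./cluster/kube-up.sh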

Windows Failover Cluster not online during creation of SQL Always On Availability Group

I've been following this tutorial to create an Azure SQL AlwaysOn Availability Group using Powershell:
Tutorial: AlwaysOn Availability Groups in Windows Azure (PowerShell)
When I get to the command that invokes the CreateAzureFailoverCluster PowerShell script, I check the state of the failover cluster. In Failover Cluster Manager, it is shown as "the cluster network name is not online".
When I look at the Cluster Events, I see this:
Cluster network name resource 'Cluster Name' cannot be brought online. Ensure that the network adapters for dependent IP address resources have access to at least one DNS server. Alternatively, enable NetBIOS for dependent IP addresses.
Each of the 3 servers in the cluster can reach the DC via ping. All of the preceding setup steps execute correctly. The servers are all in the 10.10.2.x/24 IP range except the DC, which is on the 10.10.0.0/16 network (with an IP of 10.10.0.4).
All of the settings have been validated by prior execution of the tutorial on a different Azure subscription to create a failover cluster that works fine.
Cluster validation reveals this warning:
The "Cluster Group" does not contain an Cluster IP Address resource. This is a required resource for the group. It may be difficult to manage the cluster with this resource missing
(sic)
How do I add a Cluster IP Address resource?
There was nothing wrong with the configuration or the steps taken.
It took the cluster 3 hours to come online.
Attempting to bring the cluster online manually failed for at least 20 minutes after creating the cluster.
The Windows Event logs on all 4 servers showed nothing to indicate when the cluster moved into the online state.
It seems the correct solution was to work on something else until the cluster started under its retry policy.
Did you set up a fixed IP address in the cluster, using the cluster manager? There is a bug where DHCP will assign the cluster the IP address of one of the SQL Server instances. I just assigned a high enough number (x.x.x.171, I think), and the problem went away.
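If you want to check for or apply that fix from PowerShell rather than the cluster manager, here is a hedged sketch using the FailoverClusters module; the resource names and the 10.10.2.171 address are examples only and may differ in your cluster.

Import-Module FailoverClusters

# See whether the core cluster group contains an IP Address resource and whether it is online
Get-ClusterGroup "Cluster Group" | Get-ClusterResource

# Give the cluster a static address instead of the DHCP-assigned one
Get-ClusterResource "Cluster IP Address" |
    Set-ClusterParameter -Multiple @{ Address = "10.10.2.171"; SubnetMask = "255.255.255.0"; EnableDhcp = 0 }

# Cycle the network name resource so it comes online with the new address
Stop-ClusterResource "Cluster Name"
Start-ClusterResource "Cluster Name"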