Service Fabric Cluster Upgrade Failing - azure-service-fabric

I have an on-premises, secure development cluster that I wish to upgrade. The current version is 5.7.198.9494. I've followed the steps listed here.
At the time of writing, the latest version of SF is 6.2.283.9494. However, running Get-ServiceFabricRuntimeUpgradeVersion -BaseVersion 5.7.198.9494 shows that I must first upgrade to 6.0.232.9494 before upgrading to 6.2.283.9494.
I run the following in PowerShell, and the upgrade does start:
Copy-ServiceFabricClusterPackage -Code -CodePackagePath .\MicrosoftAzureServiceFabric.6.0.232.9494.cab -ImageStoreConnectionString "fabric:ImageStore"
Register-ServiceFabricClusterPackage -Code -CodePackagePath MicrosoftAzureServiceFabric.6.0.232.9494.cab
Start-ServiceFabricClusterUpgrade -Code -CodePackageVersion 6.0.232.9494 -Monitored -FailureAction Rollback
However, after a few minutes the following happens:
The PowerShell ISE crashes
The Service Fabric Cluster becomes unreachable
Service Fabric Local Cluster Manager disappears from the task bar
Event Viewer logs the events; see below.
Quite some time later, the VM reboots. Service Fabric Local Cluster Manager then only offers the options to Setup or Restart the local cluster.
Event Viewer has logs under Applications and Services Logs -> Microsoft-Service Fabric -> Operational. Most are informational entries about opening, closing, and aborting one of the upgrade domains. There are some warnings about a VM failing to open an upgrade domain with the error: Lease Failed.
This behavior happens consistently, and I've not yet been able to upgrade the cluster. My guess is that a development cluster cannot be upgraded, but I haven't found an article that states that.
Am I doing something incorrectly here, or is it impossible to upgrade a development cluster?

I will assume you have a development cluster with a single node or multiple nodes in a single VM.
As described in the first section of the documentation at the same link you provided:
service-fabric-cluster-upgrade-windows-server
You can upgrade your cluster to the new version only if you're using a
production-style node configuration, where each Service Fabric node is
allocated on a separate physical or virtual machine. If you have a
development cluster, where more than one Service Fabric node is on a
single physical or virtual machine, you must re-create the cluster
with the new version.
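If you want to confirm which layout you have before re-creating anything, listing the nodes and the machines they run on makes it obvious. This is a minimal sketch, assuming an unsecured local endpoint (adjust the connection parameters for a secure cluster):

Connect-ServiceFabricCluster -ConnectionEndpoint "localhost:19000"
# If several nodes report the same IpAddressOrFQDN, this is a development-style
# cluster and must be re-created on the new version rather than upgraded in place.
Get-ServiceFabricNode | Select-Object NodeName, IpAddressOrFQDN, NodeStatus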

Related

Kubernetes cluster in Azure (AKS) upgrade 1.24.9 in fail state with pods facing intermittent DNS issues

I upgraded AKS using the Azure portal from 1.23.5 to 1.24.9. This part finished properly (or so I assumed) based on the status shown in the Azure portal.
I continued from 1.24.9 to 1.25.5. This time it worked only partly: the Azure portal shows 1.25.5 for the node pool with provisioning state "Failed", while the nodes are still at 1.24.9.
I found that some nodes were having issues connecting to the network, both to outside endpoints (e.g. GitHub) and to internal "services". For some reason it is an intermittent issue: on the same node it sometimes works and sometimes does not. (I had pods with Python running on each node.)
Each node has the cluster IP in resolv.conf.
One of the questions on SO had a hint about ingress-nginx compatibility. I found that I had an incompatible version, so I upgraded it to 1.6.4, which is compatible with both 1.24 and 1.25.
But this network issue still persists. I am not sure if this is because of the AKS provisioning state of "Failed". The connectivity check for this cluster in the Azure portal is Success; the only issue reported in Azure portal diagnostics is the node pool provisioning state.
Is there anything I need to do after the ingress-nginx upgrade for all nodes/pods to pick up the new config?
Or is there a way to re-trigger this upgrade? I am not sure why it would help, but I'm assuming it may reset the configs on all nodes and might work.
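There is no accepted answer in this thread, but as an illustrative sketch, these are the kinds of commands one might try for the two questions above. The resource group, cluster name, and deployment names are placeholders, and whether this clears the intermittent DNS issue is not established here:

# Re-run the node-pool/control-plane upgrade to the same target version.
az aks upgrade --resource-group my-rg --name my-aks --kubernetes-version 1.25.5

# Restart workloads so existing pods are recreated with the new ingress-nginx/DNS configuration.
kubectl -n ingress-nginx rollout restart deployment ingress-nginx-controller
kubectl -n kube-system rollout restart deployment coredns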

Service Fabric Single Node SingleNodeClusterUpdateNotAllowed

I've got a single-node Service Fabric instance hosted in Azure, just for testing purposes. When I try to upgrade the Service Fabric version from 6.5 to 7.0, I get the message:
SingleNodeClusterUpdateNotAllowed
Is there anything I can do to allow this?
The short answer is no.
The reason for this is that in order to upgrade, Service Fabric has to take down a node, update it, and restart it. This is repeated for all nodes until the update is complete. In a single-node cluster this would mean taking the cluster offline completely, which is not allowed by the Service Fabric rules (at the very least one node must be available).
A single-node 'cluster' therefore cannot update the platform or the applications running on it.
The only way you can update a single-node cluster is to delete and reinstall it. The same goes for applications (delete the application type before deploying an updated version). Depending on where you have the software deployed (a development box, a server, Azure), I would recommend scripting as much as possible; this lets you easily delete and redeploy. I am using a combination of an Azure Resource Manager (ARM) template, a DevOps pipeline, and a script to initialise and load some default data into the application.
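As an illustration only, a delete-and-redeploy cycle for an application on such a cluster might look like the sketch below; the application name, type, versions, and package path are placeholders, not from the original post:

Connect-ServiceFabricCluster -ConnectionEndpoint "localhost:19000"

# Remove the running application and unregister its current type/version.
Remove-ServiceFabricApplication -ApplicationName "fabric:/MyApp" -Force
Unregister-ServiceFabricApplicationType -ApplicationTypeName "MyAppType" -ApplicationTypeVersion "1.0.0" -Force

# Copy the updated package to the image store, register it, and create the application again.
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath .\MyAppPkg -ImageStoreConnectionString "fabric:ImageStore" -ApplicationPackagePathInImageStore "MyAppPkg"
Register-ServiceFabricApplicationType -ApplicationPathInImageStore "MyAppPkg"
New-ServiceFabricApplication -ApplicationName "fabric:/MyApp" -ApplicationTypeName "MyAppType" -ApplicationTypeVersion "1.1.0"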

Minikube out of resources

Our company uses Kubernetes in all our environments, as well as on our local MacBooks using minikube.
We have many microservices, and most of them run on the JVM, which requires a large amount of memory. We started to face an issue where we cannot run our stack on minikube because the local machine runs out of memory.
We thought about multiple solutions:
The first was to create a k8s cloud development environment: when a developer works on a single microservice on their local MacBook, they redirect the outbound traffic to the cloud instead of the local minikube. But this solution creates new problems:
How will a pod inside the cloud dev env send data to the local developer machine? It's not just a single request/response scenario.
We have many developers, and they can overlap each other with different versions of each service they need to deploy to the cloud. (We could give each developer a separate namespace, but we would need a huge cluster to support it.)
The second solution was to use a tool like Skaffold or Draft to deploy our current code into the cloud development environment. That would solve issue #1, but again we see problems:
Slow development cycle: building a Java image, pushing it to the remote cloud, and waiting for initialization takes too much time for a developer to work effectively.
And we still face issue #2.
Another thought was: Kubernetes supports multiple nodes, so why not just add another node, a remote node that sits in the cloud, to our local minikube? The main issue is that minikube is a single-node solution. Also, we didn't find any resources for this on the web.
The last thought was to connect the minikube Docker daemon to a remote machine, so we would use minikube on the local machine while Docker runs the containers on a remote cloud server. But no luck so far: minikube crashes when we attempt this, and we didn't find any resources for it on the web either.
Any thought how to solve our issue? Thank you!
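None of this is from the original thread, but as a stopgap for the memory pressure itself, one option is simply to give minikube more resources and to cap each JVM; the sizes and deployment name below are arbitrary placeholders:

# Recreate the local cluster with a larger memory/CPU budget.
minikube delete
minikube start --memory=12288 --cpus=6

# Cap a Java service so one pod cannot exhaust the node (values are examples only).
kubectl set resources deployment my-service --limits=memory=512Mi --requests=memory=384Mi
kubectl set env deployment my-service JAVA_TOOL_OPTIONS="-Xmx384m"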

Service Fabric Cluster Upgrade While it is Running

I set up a cluster in my production environment with 3 VMs and deployed some applications there. Now when I open Service Fabric Explorer, I see a warning on the cluster:
Unhealthy event: SourceId='System.UpgradeOrchestrationService',
Property='ClusterVersionSupport', HealthState='Warning',
ConsiderWarningAsError=false. The current cluster version 5.4.164.9494
support ends 6/10/2017 12:00:00 AM. Please view available upgrades
using Get-ServiceFabricRegisteredClusterCodeVersion and upgrade using
Start-ServiceFabricClusterUpgrade.
My application is still running there.
I haven't updated the cluster so far. My question is: can we directly update the cluster without unregistering the applications? Will it have any impact on the already running applications while we update the cluster configuration?
Thanks,
Divya
As #LodeRunner28 stated, you can upgrade the cluster version while it is running, and in Azure you can even set up your cluster to be updated automatically as new versions of Service Fabric become available.
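For a manual upgrade, the warning itself names the relevant cmdlets. A minimal sketch, assuming you substitute a version actually returned by the query and a valid connection to your cluster:

Connect-ServiceFabricCluster -ConnectionEndpoint "mycluster:19000"   # placeholder endpoint/credentials

# List the code versions already registered with the cluster.
Get-ServiceFabricRegisteredClusterCodeVersion

# Start a monitored upgrade to one of those versions; applications keep running while
# nodes are upgraded one upgrade domain at a time.
Start-ServiceFabricClusterUpgrade -Code -CodePackageVersion "<version from the previous command>" -Monitored -FailureAction Rollback

# Track progress.
Get-ServiceFabricClusterUpgrade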

How to restart unresponsive kubernetes master in GKE

The kubernetes master in one of my GKE clusters became unresponsive last night following the infrastructure issue in us-central1-a.
Whenever I run "kubectl get pods" in the default namespace I get the following error message:
Error from server: an error on the server has prevented the request from succeeding
If I run "kubectl get pods --namespace=kube-system", I only see the kube-proxy and the fluentd-logging daemon.
I have tried scaling the cluster down to 0 and then scaling it back up. I have also tried downgrading and upgrading the cluster, but that seems to apply only to the nodes (not the master). Is there any GKE/K8s API command to issue a restart to the Kubernetes master?
There is not a command that will allow you to restart the Kubernetes master in GKE (since the master is considered a part of the managed service). There is automated infrastructure (and then an oncall engineer from Google) that is responsible for restarting the master if it is unhealthy.
In this particular case, restarting the master had no effect on restoring it to normal behavior because Google Compute Engine Incident #16011 caused an outage on 2016-06-28 for GKE masters running in us-central1-a (even though that isn't indicated on the Google Cloud Status Dashboard). During the incident, many masters were unavailable.
If you had tried to create a GCE cluster using kube-up.sh during that time, you would have similarly seen that it would be unable to create a functional master VM due to the SSD Persistent disk latency issues.
I try to keep at least one version available to upgrade to: if you upgrade the master, it will restart and be working again within a few minutes. Otherwise you may have to wait around three days for the Google team to reboot it. E-mail and phone won't help, and unless you have paid support (the transition to which takes a few days), they won't give you much attention.
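As an illustrative follow-up to that last point, once the underlying incident is over, requesting a master upgrade is the usual way to get GKE to recreate the control plane; the cluster name and zone below are placeholders:

# See which versions are available in the zone.
gcloud container get-server-config --zone us-central1-a

# Upgrade (and thereby recreate/restart) only the master; node pools are left alone.
gcloud container clusters upgrade my-cluster --zone us-central1-a --master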