I set up a cluster in my production environment with 3 VMs and deployed some applications there. Now when I open the explorer, I get a warning in the cluster:
Unhealthy event: SourceId='System.UpgradeOrchestrationService',
Property='ClusterVersionSupport', HealthState='Warning',
ConsiderWarningAsError=false. The current cluster version 5.4.164.9494
support ends 6/10/2017 12:00:00 AM. Please view available upgrades
using Get-ServiceFabricRegisteredClusterCodeVersion and upgrade using
Start-ServiceFabricClusterUpgrade.
My application is still running there.
I haven't updated the cluster so far. My question is: can we update the cluster directly without unregistering the applications? Will it have any impact on the applications that are already running while we update the cluster configuration?
Thanks,
Divya
As #LodeRunner28 stated, you can upgrade the cluster version while it is running, and in Azure you can actually set up your cluster to be updated automatically as new versions of Service Fabric become available.
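If you manage the upgrade yourself, the flow the health warning itself suggests looks roughly like this; a minimal sketch, where the connection endpoint and target version are placeholders (pick a version returned by the first cmdlet):
# Connect to the cluster (endpoint is a placeholder)
Connect-ServiceFabricCluster -ConnectionEndpoint mycluster.westeurope.cloudapp.azure.com:19000
# List the fabric code versions the cluster can be upgraded to
Get-ServiceFabricRegisteredClusterCodeVersion
# Start a monitored upgrade to one of the versions listed above
Start-ServiceFabricClusterUpgrade -Code -CodePackageVersion <targetVersion> -Monitored -FailureAction Rollback
The upgrade rolls through the nodes one upgrade domain at a time, so applications keep running as long as their services are replicated across the nodes.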
I upgraded AKS using the Azure portal from 1.23.5 to 1.24.9. This part finished properly (or so I assumed) based on the status shown in the Azure portal.
I continued from 1.24.9 to 1.25.5. This time it only partly worked: the Azure portal shows 1.25.5 for the node pool with provisioning state "Failed", while the nodes are still at 1.24.9.
I found that some nodes were having issues connecting to the network, both externally (e.g. GitHub) and to internal "services". For some reason it is an intermittent issue: on the same node it sometimes works and sometimes doesn't. (I had pods with Python running on each node.)
Each node has the cluster IP in resolv.conf.
One of the questions on SO had a hint about ingress-nginx compatibility. I found that I had an incompatible version, so I upgraded it to 1.6.4, which is compatible with both 1.24 and 1.25.
But this network issue still persists. I am not sure if this is because of the AKS provisioning state of "Failed". The connectivity check for this cluster in the Azure portal is Success; the only issue reported in Azure portal diagnostics is the node pool provisioning state.
Is there anything I need to do after the ingress-nginx upgrade so that all nodes/pods pick up the new config?
Or is there a way to re-trigger this upgrade? I am not sure why it would help, but I am assuming it may reset the configs on all nodes and might work.
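For reference, I assume re-triggering would look something like the commands below (the resource group, cluster and node pool names are placeholders), but I'm not sure whether that is the right approach:
# Check the current provisioning state of the node pool
az aks nodepool show --resource-group my-rg --cluster-name my-aks --name nodepool1 --query provisioningState
# Re-run the node pool upgrade to the target version
az aks nodepool upgrade --resource-group my-rg --cluster-name my-aks --name nodepool1 --kubernetes-version 1.25.5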
What are the steps for upgrading Kubernetes offline via kubeadm? I have a vanilla Kubernetes cluster running with no access to the internet. When the kubeadm upgrade plan command is executed, it reaches out to the internet for the plan.
The version of Kubernetes used is 22.1.2.
CNI used: flannel.
Cluster size: 3 masters, 5 workers.
Managing an offline Kubernetes cluster is a time-consuming process, because you need to set up your own package repositories and image registries. Once the nodes and registries are set up, you can upgrade the cluster based on your requirements. There are a lot of resources available online that explain how to manage the repositories for each OS distribution.
You can build your own images based on your requirements and push them to the registry; these images are later used to create the Pods. You also need to set up your own CA certificates, because container engines require SSL. See this example SSL setup.
For more information, refer to this K8s community discussion forum.
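As a rough sketch of that flow (v1.22.2, the archive name and the registry setup are placeholders/assumptions; it also assumes kubeadm, kubelet and kubectl packages come from your local OS repositories):
# On a machine with internet access: list and pull the control-plane images for the target version
kubeadm config images list --kubernetes-version v1.22.2
kubeadm config images pull --kubernetes-version v1.22.2
# Export the images and copy the archive to the nodes (or push them to your private registry)
docker save $(kubeadm config images list --kubernetes-version v1.22.2) -o k8s-v1.22.2-images.tar
# On each node: load the images so they are available locally
docker load -i k8s-v1.22.2-images.tar
# On the first control-plane node: pass the version explicitly so kubeadm does not look up releases online
kubeadm upgrade plan v1.22.2
kubeadm upgrade apply v1.22.2
On the remaining control-plane and worker nodes you would then run kubeadm upgrade node and upgrade the kubelet packages from your local repository.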
I have a Kubernetes cluster on GCP that hosts a Flask application and some more services.
Before upgrading the master node to version 1.15 (it was 1.14.x) I saw every log from the Flask application under Stackdriver's GKE Container logs; now I don't get any logs.
Searching through the release notes I noticed that from 1.15 they:
disabled stackdriver logging agent to prevent node startup failures
I'm not entirely sure that's the reason, but I am sure that the logging stopped after upgrading the master and node versions to 1.15; there has been no code change in the application core.
My question is how can I reactivate the logs I saw before?
I actually found the solution: as stated in the release notes, the Stackdriver agent becomes disabled by default in 1.15.
To activate it again you need to edit the cluster following these instructions, setting "System and workload logging and monitoring" under "Stackdriver Kubernetes Engine Monitoring".
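If you prefer the command line to the console, I believe the equivalent at the time was the --enable-stackdriver-kubernetes flag (the cluster name and zone below are placeholders):
gcloud container clusters update my-cluster --zone europe-west1-b --enable-stackdriver-kubernetes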
After that, I could no longer use the legacy Stackdriver Monitoring, and I found my logs weren't under the resource "GKE Container" anymore but under "Kubernetes Container".
I also had to update every log-based metric that had a filter on resource.type="container", changing it to resource.type="k8s_container".
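For example, a metric filter that matched the Flask container roughly changed like this (the container name is a placeholder):
Before:
resource.type="container"
resource.labels.container_name="flask-app"
After:
resource.type="k8s_container"
resource.labels.container_name="flask-app"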
I've got a single-node Service Fabric instance hosted in Azure, just for testing purposes. When I try to upgrade the Service Fabric version from 6.5 to 7.0, I get the message:
SingleNodeClusterUpdateNotAllowed
Is there anything I can do to allow this?
The short answer is no.
The reason for this is that in order to upgrade, Service Fabric has to take down a node, update it and restart it. This is repeated for all nodes until the update is complete. In a single-node cluster this would mean taking the cluster offline completely, which is not allowed by the Service Fabric rules (at the very least one node must be available).
A single node 'cluster' therefore cannot update the platform or applications running on it.
The only way you can update a single-node cluster is to delete and reinstall it. The same goes for applications (delete the application type before deploying an updated version). Depending on where you have the software deployed (development box, a server, Azure), I would recommend scripting as much as possible. This will allow you to easily delete and redeploy. I am using a combination of an Azure Resource Manager (ARM) template, a DevOps pipeline and a script to initialise and load some default data into the application.
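As a minimal sketch of that approach (the resource group, location and template file names are placeholders, and it assumes the Az PowerShell module):
# Tear down the resource group containing the single-node cluster
Remove-AzResourceGroup -Name sf-test-rg -Force
# Recreate it and redeploy the cluster from the ARM template
New-AzResourceGroup -Name sf-test-rg -Location westeurope
New-AzResourceGroupDeployment -ResourceGroupName sf-test-rg -TemplateFile .\cluster.json -TemplateParameterFile .\cluster.parameters.json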
I've an on-premise, secure, development cluster that I wish to upgrade. The current version is 5.7.198.9494. I've followed the steps listed here.
At the time of writing, the latest version of SF is 6.2.283.9494. However, running Get-ServiceFabricRuntimeUpgradeVersion -BaseVersion 5.7.198.9494 shows that I must first update to 6.0.232.9494 before upgrading to 6.2.283.9494.
I run the following in PowerShell, and the upgrade does start:
Copy-ServiceFabricClusterPackage -Code -CodePackagePath .\MicrosoftAzureServiceFabric.6.0.232.9494.cab -ImageStoreConnectionString "fabric:ImageStore"
Register-ServiceFabricClusterPackage -Code -CodePackagePath MicrosoftAzureServiceFabric.6.0.232.9494.cab
Start-ServiceFabricClusterUpgrade -Code -CodePackageVersion 6.0.232.9494 -Monitored -FailureAction Rollback
However, after a few minutes the following happens:
PowerShell IDE crashes
The Service Fabric Cluster becomes unreachable
Service Fabric Local Cluster Manager disappears from the task bar
Event Viewer will log the events, see below.
Quite some time later, the VM will reboot. The Service Fabric Local Cluster Manager will then only give options to Setup or Restart the local cluster.
Event Viewer has logs under Applications and Services Logs -> Microsoft-Service Fabric -> Operational. Most are information about opening, closing, and aborting one of the upgrade domains. There are some warnings about a VM failing to open an upgrade domain, stating the error: Lease Failed.
This behavior happens consistently, and I've not yet been able to update the cluster. My guess is that we are not able to upgrade a development cluster, but I've not found an article that states that.
Am I doing something incorrectly here, or is it impossible to upgrade a development cluster?
I will assume you have a development cluster with a single node or multiple nodes in a single VM.
As described in the first section of the documentation at the same link you provided:
service-fabric-cluster-upgrade-windows-server
You can upgrade your cluster to the new version only if you're using a
production-style node configuration, where each Service Fabric node is
allocated on a separate physical or virtual machine. If you have a
development cluster, where more than one Service Fabric node is on a
single physical or virtual machine, you must re-create the cluster
with the new version.
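In practice, re-creating means removing the existing cluster and running the setup script that ships with the standalone package for the new version. A rough sketch, assuming the scripts from the standalone package and a cluster config file named ClusterConfig.json:
# From the existing standalone package folder: remove the current development cluster
.\RemoveServiceFabricCluster.ps1 -ClusterConfigFilePath .\ClusterConfig.json
# From the new standalone package folder: create the cluster again on the same VM
.\CreateServiceFabricCluster.ps1 -ClusterConfigFilePath .\ClusterConfig.json -AcceptEULA
Your applications will need to be redeployed afterwards.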