I have received a Warning that a new version of Service Fabric is available, however when I tried to upgrade it, the process was stuck at PreUpgradeSafetyCheck on node Rep_247. I've tried -Force and -ForceRestart but it hasn't helped.
Cluster Map
This issue is likely happening because Service Fabric can't safely take down a service in order to upgrade the node or application.
Whenever a node is upgraded, the services activated on that node must first move to another node, so that the node can be restarted without affecting the availability of your applications\services.
In this case, doing so may cause a quorum loss because the service can't be placed on another node: there may be no other node available, placement constraints on the service may prevent it, or there may be only one instance of the service.
Because SF can't guarantee the reliability of the service, it halts the upgrade process until the problem is fixed and the process can continue.
From your cluster map and the message it is possible to identify the issue: your cluster has only one node of type 'Rep_247 ReportServerType'. I am assuming you have services with placement constraints that restrict them to this node type. Taking down the node will make these services unavailable, because the placement constraints prevent them from moving to another node type.
If the services are not constrained to that node type, the problem might be:
The service is failing to activate on other nodes (for example, because dependencies are missing on those nodes), so it cannot reach the minimum replica count.
The service has only one instance available, and taking it down will make the service unavailable.
PS: the same applies to the node MR_236 MRType
PreUpgradeSafetyCheck
An UpgradePhase of PreUpgradeSafetyCheck means there were issues
preparing the upgrade domain before it was performed. The most common
issues in this case are service errors in the close or demotion from
primary code paths.
Possible solutions for this case are:
Add more replicas\instances of the service so that the minimum quorum is met.
Remove the placement constraints of the service so that it can move to other nodes.
Add an extra node of the same node type so that the service can move out safely.
Take down the service and recreate it once the node is updated (a last resort, and only if the service is not stateful; otherwise you will lose data).
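To make the reasoning above concrete, here is a minimal Python sketch of the kind of check involved. It is not Service Fabric's actual algorithm, just a toy model: the Service fields, the blocking_services helper, the service names and the FE_* nodes are all assumptions made up for illustration. Only Rep_247 and MR_236 come from the question.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Service:
    name: str
    min_replicas: int                # replicas that must stay up during the upgrade
    replica_nodes: List[str]         # nodes currently hosting a replica
    node_type: Optional[str] = None  # placement constraint; None means any node type


def blocking_services(nodes: Dict[str, str], services: List[Service], node: str) -> List[str]:
    """Return the services that would block taking `node` down.

    `nodes` maps node name -> node type (hypothetical model of the cluster map).
    """
    blocked = []
    for svc in services:
        if node not in svc.replica_nodes:
            continue  # this service has nothing on the node being upgraded
        survivors = [n for n in svc.replica_nodes if n != node]
        # Nodes the displaced replica could move to: allowed by the placement
        # constraint and not already hosting a replica of this service.
        candidates = [
            n for n, ntype in nodes.items()
            if n != node
            and n not in svc.replica_nodes
            and (svc.node_type is None or ntype == svc.node_type)
        ]
        if len(survivors) < svc.min_replicas and not candidates:
            blocked.append(svc.name)
    return blocked


nodes = {
    "Rep_247": "ReportServerType",
    "MR_236": "MRType",
    "FE_1": "FrontEndType",  # hypothetical extra nodes
    "FE_2": "FrontEndType",
}
services = [
    Service("Reports", 1, ["Rep_247"], node_type="ReportServerType"),
    Service("Web", 1, ["FE_1", "FE_2"], node_type="FrontEndType"),
]

print(blocking_services(nodes, services, "Rep_247"))  # ['Reports']

# Adding a second node of the same type gives the replica somewhere to go.
nodes["Rep_248"] = "ReportServerType"
print(blocking_services(nodes, services, "Rep_247"))  # []
```

In this model the single-node node type is what blocks the upgrade, and adding a node of the same type (or dropping the constraint by setting node_type to None) clears it.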
You might be interested to see related issues:
Github Issue #1279
Github Issue #377
Related
Kubernetes tends to assume apps are small/lightweight/stateless microservices which can be stopped on one node and restarted on another node with no downtime.
We have a slow-starting (20 min) legacy (stateful) application which, once running as a set of pods, should not be rescheduled without due cause. The reason is that all user sessions will be killed and the users will have to log in again. There is NO way to serialize the sessions and externalize them. We want 3 instances of the pod.
Can we tell k8s not to move a pod unless absolutely necessary (i.e. it dies)?
Additional information:
The app is a tomcat/java monolith
Assume for the sake of argument we would like to run it in Kubernetes
We do have a liveness test endpoint available
There is no benefit, if you tell k8s to use only one pod. That is not the "spirit" of k8s. In this case, it might be better to use a dedicated machine for your app.
But you can assign a pod to a specific node - Assigning Pods to Nodes. This should be necessary only when there are special hardware requirements (e.g. the AI microservice needs a GPU, which is only on node xy).
k8s doesn't restart your pod for fun. It will restart it when there is a reason (node died, app died, ...) and I have never noticed a "random reschedule" in a cluster. It is hard to say what exactly happened to you without any further information (like deployment, logs, cluster).
And for your comment: There are different types of recreation; one of them starts a fresh instance and kills the old one only once the startup was successful. Look here: Kubernetes deployment strategies
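As a rough illustration of the "start a fresh instance first, then kill the old one" strategy, here is a Python sketch that builds a Deployment manifest with a rolling update configured that way. The app name, image, /health path, port 8080 and probe timings are placeholders I made up, not values from the question.

```python
import json


def rolling_update_deployment(name: str, image: str, replicas: int = 3) -> dict:
    """Build a Deployment manifest that starts a new pod before stopping an old one."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Keep every old pod until its replacement is ready,
                # and bring up at most one extra pod at a time.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "livenessProbe": {
                            # Assumed endpoint and port; the delay covers the 20 min startup.
                            "httpGet": {"path": "/health", "port": 8080},
                            "initialDelaySeconds": 20 * 60,
                            "periodSeconds": 30,
                        },
                    }],
                },
            },
        },
    }


# Hypothetical name and image, for illustration only.
manifest = rolling_update_deployment("legacy-app", "example.com/legacy-app:1.0")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, which accepts JSON as well as YAML. Note that this only controls the order of replacement during a rollout. Sessions held by a pod that is replaced are still lost.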
All points together:
Don't force your app onto a specific node - k8s will select the node "smartly".
There are normally no planned reschedules in k8s.
k8s will recreate pods only if there is a reason. Maybe your app didn't answer on the liveness endpoint? Or is someone/something deleting your pod?
We have a service fabric cluster with one scale set (primary) with 5 nodes. There was a memory leak in one of our services which drained all of the available memory on the nodes and eventually other services failed. For instance some Powershell commands don't work now. In the Service Fabric Explorer everything is healthy and we don't have any errors or warnings. Is it possible to restart the machines and what is the best way to do it so we could restore the machines to their initial state where all of the services are working?
In the scale set when scaling down it removes the node with the highest index, so it won't help to follow the documentation, scale up and then remove the nodes that are faulty.
What would happen if we restart the scale set nodes one by one? I see that Service Fabric handles it - it disables the node and activates it afterwards. But according to the documentation, in the Silver tier we need to have 5 nodes up and running all the time. So before restarting any of the nodes, should we scale up, add one more node and then proceed with the restart?
If the failing nodes still have healthy services running, the best approach is to disable the node first with the Disable-ServiceFabricNode command, so that any healthy services are moved off the node with as little impact as possible.
Once the services are moved, in some cases just a Restart-ServiceFabricNode command can kill all locked services and bring the node back healthy, without actually restarting the VM.
As a last resort, you might need to restart the VM via PowerShell or the Azure Portal to give the node a fresh start.
If your cluster is running under a high-density load, you might need to scale up first to add capacity to the cluster so the services can be reallocated.
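The order of operations described above can be sketched as a small planning function. This is only an outline in Python, not something that talks to a cluster: restart_plan, the node names and the step labels are made up for illustration. It also takes the strict reading that the cluster must never drop below the minimum node count, which is an assumption on my side.

```python
from typing import List, Tuple


def restart_plan(nodes: List[str], unhealthy: List[str], min_nodes_up: int) -> List[Tuple[str, str]]:
    """Plan a one-node-at-a-time restart that never drops below min_nodes_up."""
    plan: List[Tuple[str, str]] = []
    # Restarting one node leaves len(nodes) - 1 up, so add capacity first if needed.
    extra = max(0, min_nodes_up + 1 - len(nodes))
    if extra:
        plan.append(("scale-up", str(extra)))
    for node in unhealthy:
        plan.append(("disable", node))  # Disable-ServiceFabricNode: move services off the node
        plan.append(("restart", node))  # Restart-ServiceFabricNode, or restart the VM as a last resort
        plan.append(("enable", node))   # Enable-ServiceFabricNode: let services be placed on it again
    return plan


# Hypothetical node names; 5 is the node count from the question.
nodes = ["node0", "node1", "node2", "node3", "node4"]
for step in restart_plan(nodes, unhealthy=["node1", "node3"], min_nodes_up=5):
    print(step)
```

With 5 nodes and a minimum of 5, the first step is to scale up by one node. With 6 nodes the plan goes straight to disabling the first unhealthy node.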
Provided you have 'Silver' durability for your cluster, to restart an underlying Service Fabric VM, just go to the VMSS in Azure portal, select the VM and click 'Restart'. With 'Silver' tier, Service Fabric uses the Infrastructure Service to orchestrate disabling and restarting the nodes so you don't have to do all this manually.
Please note, you should not restart all VMs in the scaleset at the same time, or go below the number of VMs needed to be up per your durability level. This could lead to quorum loss and ultimately the demise of your cluster!
I am thinking of creating a Service Fabric cluster with nodes that span multiple locations, for example, one cluster that has nodes in eastus and westus2. Do you know how I can do it? Are there any ARM template examples? I saw that an MSDN document mentions this under Service Fabric cluster disaster recovery, but I found nothing else useful.
This is not officially supported at this time. The main problem is designating VM scale sets with their proper fault domains. You need to have a way to make sure the Stateful Service & Actors data is always replicated to the other region, so you can indeed do fail-over.
I've been trying to figure out what happens when the Kubernetes master fails in a cluster that only has one master. Do web requests still get routed to pods if this happens, or does the entire system just shut down?
According to the OpenShift 3 documentation, which is built on top of Kubernetes, (https://docs.openshift.com/enterprise/3.2/architecture/infrastructure_components/kubernetes_infrastructure.html), if a master fails, nodes continue to function properly, but the system loses its ability to manage pods. Is this the same for vanilla Kubernetes?
In typical setups, the master nodes run both the API and etcd and are either largely or fully responsible for managing the underlying cloud infrastructure. When they are offline or degraded, the API will be offline or degraded.
In the event that they, etcd, or the API are fully offline, the cluster ceases to be a cluster and is instead a bunch of ad-hoc nodes for this period. The cluster will not be able to respond to node failures, create new resources, move pods to new nodes, etc., until both:
Enough etcd instances are back online to form a quorum and make progress (for a visual explanation of how this works and what these terms mean, see this page).
At least one API server can service requests
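For the quorum condition, the arithmetic is simple majority voting: a cluster of n etcd members needs more than half of them healthy. A tiny Python sketch (the function names are mine, not etcd's):

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def can_make_progress(members: int, healthy: int) -> bool:
    """True if enough members are healthy to form a quorum."""
    return healthy >= quorum(members)


for n in (1, 3, 5):
    print(f"members={n} quorum={quorum(n)} tolerated_failures={n - quorum(n)}")
# members=1 quorum=1 tolerated_failures=0
# members=3 quorum=2 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
```

This is why a single-master setup has no slack at all: with one member, losing it means no quorum.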
In a partially degraded state, the API server may be able to respond to requests that only read data.
However, in any case, life for applications will continue as normal unless nodes are rebooted or there is a dramatic failure of some sort during this time, because TCP/UDP services, load balancers, DNS, the dashboard, etc. should all continue to function for at least some time. Eventually, these things will all fail on different timescales. In single-master setups or complete API failure, DNS failure will probably happen first as caches expire (on the order of minutes, though the exact timing is configurable; see the coredns cache plugin documentation). This is a good reason to consider a multi-master setup: DNS and service routing can continue to function indefinitely in a degraded state, even if etcd can no longer make progress.
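To show what "fails as caches expire" means in practice, here is a toy TTL cache in Python. It is not how CoreDNS is implemented. The class, the 30 second TTL, the service name and the address are all invented for the example, and the clock is injected so the behaviour can be shown without waiting.

```python
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Answers lookups from memory until each entry's time-to-live runs out."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float]):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str) -> None:
        self.entries[name] = (address, self.clock() + self.ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self.entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: with the upstream source gone there is nothing to refresh from.
            del self.entries[name]
            return None
        return address


now = [0.0]  # fake clock, in seconds
cache = TtlCache(ttl_seconds=30, clock=lambda: now[0])
cache.put("db.default.svc.cluster.local", "10.0.0.12")  # made-up name and address

now[0] = 10.0
print(cache.get("db.default.svc.cluster.local"))  # 10.0.0.12

now[0] = 31.0
print(cache.get("db.default.svc.cluster.local"))  # None
```

While the entry is fresh, lookups keep working even though nothing upstream can answer. Once the TTL passes, the name stops resolving.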
There are actions that you could take as an operator which would accelerate failures, especially in a fully degraded state. For instance, rebooting a node would cause DNS queries, and in fact probably all pod and service networking functionality, to fail until at least one master comes back online. Restarting DNS pods or kube-proxy would also be bad.
If you'd like to test this out yourself, I recommend kubeadm-dind-cluster, kind or, for more exotic setups, kubeadm on VMs or bare metal. Note: kubectl proxy will not work during API failure, as that routes traffic through the master(s).
A Kubernetes cluster without a master is like a company running without a manager.
No one else can instruct the workers (k8s components) other than the manager (master node). Even you, the owner of the cluster, can only instruct the manager.
Everything works as usual until the work is finished or something stops the workers (because the master node died after assigning the work).
As there is no manager to re-assign any work to them, the workers will wait and wait until the manager comes back.
The best practice is to assign multiple managers (masters) to your cluster.
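The manager analogy can be turned into a toy simulation. This Python sketch is not Kubernetes code. ToyCluster and its methods are invented to show the one idea that running pods keep running, but lost pods are only replaced while the master is reachable.

```python
class ToyCluster:
    """Toy model: pods keep running without a master, but nothing replaces lost ones."""

    def __init__(self, desired: int):
        self.desired = desired
        self.running = desired
        self.master_up = True

    def pod_dies(self) -> None:
        self.running = max(0, self.running - 1)

    def reconcile(self) -> int:
        """Bring running pods back to the desired count, only if the master is up."""
        if self.master_up:
            self.running = self.desired
        return self.running


cluster = ToyCluster(desired=3)

cluster.master_up = False
print(cluster.reconcile())  # 3: existing pods are unaffected

cluster.pod_dies()
print(cluster.reconcile())  # 2: nobody is there to schedule a replacement

cluster.master_up = True
print(cluster.reconcile())  # 3: the manager is back and re-assigns the work
```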
Although your data plane and running applications do not immediately start breaking, there are several scenarios where cluster admins will wish they had a multi-master setup. The key to understanding the impact is understanding which components talk to the master, for what purpose, how, and, more importantly, when they will fail if the master fails.
Although the application pods running on the data plane will not be impacted immediately, imagine a very possible scenario: your traffic suddenly surges and your horizontal pod autoscaler kicks in. The autoscaling would not work, because Metrics Server collects resource metrics from kubelets and exposes them in the Kubernetes API server through the Metrics API for use by the Horizontal Pod Autoscaler and Vertical Pod Autoscaler (but your API server is already dead). If your pod's memory shoots up because of high load, the pod will eventually be killed by the OOM killer. If any of the pods die, the controller manager and scheduler, which talk to the API server to watch the current state of pods, will fail too. In short, a new pod will not be scheduled and your application may stop responding.
One thing to highlight is that Kubernetes system components communicate only with the API server. They don't talk to each other directly, so their own functionality could fail as well, I guess. An unavailable master plane can mean several things: failure of any or all of these components (API server, etcd, kube-scheduler, controller manager) or, worst of all, the entire node having crashed.
If the API server is unavailable, no one can use kubectl, as generally all commands talk to the API server. This means you cannot connect to the cluster and cannot log in to any pods to check anything on the container file system. You will not be able to see application logs unless you have an additional centralized log management system.
If the etcd database fails or gets corrupted, your entire cluster state data is gone and the admins will want to restore it from backups as early as possible.
In short, a failed single-master control plane may not immediately impact traffic-serving capability, but it cannot be relied on for serving your traffic.
I've spun up a Service Fabric cluster in Azure and suddenly my node is in error state - Unfortunate but this can happen.
When I check Service Fabric Explorer I can see that the node is in error state but the error doesn't really give me any hints since I really didn't do anything.
I haven't found a way to fix it and worst-case scenario was to restart the node but I was unable to find this capability.
Did anybody have this issue before or can anybody help me with this?
In Service Fabric Explorer, there is a Nodes view below the cluster. You can select the node and choose Details to see more information about it. You may be able to see something that indicates what is wrong. There are also 5 actions that can be taken on the node: Activate, Deactivate (pause), Deactivate (restart), Deactivate (remove data) and Remove node state.