My goal was to change the SKU of all of the nodes in my VMSS.
I removed all of the nodes in the VMSS associated with my Service Fabric cluster.
I changed the SKU and updated via PowerShell.
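For reference, this is roughly the sequence I followed, expressed with the Azure CLI instead of PowerShell (the resource names, capacity, and SKU here are placeholders, not my actual values):

# Remove all instances by scaling the scale set to zero
az vmss scale --resource-group myRG --name myVMSS --new-capacity 0

# Change the SKU on the scale-set model
az vmss update --resource-group myRG --name myVMSS --set sku.name=Standard_D4s_v3

# Scale back up so new instances are created with the new SKU
az vmss scale --resource-group myRG --name myVMSS --new-capacity 5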
The VMSS correctly shows my nodes running, green across the board
The problem now is that when I view the overview of my Service Fabric Cluster in the Azure portal it is not detecting any nodes.
The existing virtual network correctly sees the nodes.
Any ideas on what the issue could be?
Microsoft support confirmed that what I'm doing is not supported. To change the SKU of the nodes in a scale set you need to remove them all, but removing all of the nodes loses the association with the Service Fabric cluster. So while you can get the scale set up and running with the new SKU and nodes, Service Fabric will not see them.
For now, you need to create a new cluster. Apparently they are working on adding that functionality.
Related
I updated my Azure AKS nodepool size from within the Azure Portal to go from 2 to 4 nodes. When I run az aks nodepool show ..., I see that the count has correctly been updated. However, when I run kubectl get nodes, I still only see the two nodes that previously existed.
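Concretely, the commands I'm comparing look like this (resource names are placeholders):

az aks nodepool show --resource-group myRG --cluster-name myAKSCluster --name nodepool1 --query count
# reports 4 after the scale-up
kubectl get nodes
# still lists only the original 2 nodes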
According to the Kubernetes documentation on node management,
There are two main ways to have Nodes added to the API server:
The kubelet on a node self-registers to the control plane
You, or another human user, manually add a Node object
(Emphasis mine)
My expectation, therefore, is that having scaled up my node pool, these new nodes should automatically register, and kubectl get nodes should just pick them up, but this appears to not be the case.
Now that my nodepool has more nodes, how do I get my AKS cluster to recognize and utilize them? Once kubectl get nodes shows them, will applying an updated manifest (with more replicas) be all I need to do to use the additional hardware?
It's difficult to tell without access to your setup, but you can check the following:
Check that the control plane hasn't been automatically upgraded to a new version that is incompatible with the kubelet version in your nodepool when it registers with the cluster. (Best if the versions match)
Connect to the nodes that are not registering (via SSH) and check the logs for why the kubelet is not starting, e.g. systemctl status kubelet.
Check that you can reach the IP address and port (e.g. 8443) that your kube-apiserver is listening on from the nodes that are not registering, e.g. curl <ip-address>:8443. (See the sketch after this list.)
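A rough diagnostic sketch of those three checks (names, IPs, and the port are placeholders; adjust to your environment):

# 1. Compare the control-plane version with each node's kubelet version
az aks show --resource-group myRG --name myAKSCluster --query kubernetesVersion
kubectl get nodes    # the VERSION column shows each node's kubelet version

# 2. On a node that is not registering (over SSH)
systemctl status kubelet
journalctl -u kubelet | tail -n 50    # recent kubelet log lines

# 3. From that node, check connectivity to the kube-apiserver
curl -k https://<api-server-ip>:8443/healthz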
Possible solution:
Upgrade the VM image of your node pool to one compatible with the control plane (see the sketch after this list).
Remove any firewall rule preventing your nodes from accessing the kube-apiserver.
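For the node-image upgrade, a sketch of how it could look with the Azure CLI (names are placeholders; verify the flags against your CLI version):

az aks nodepool upgrade --resource-group myRG --cluster-name myAKSCluster --name nodepool1 --node-image-only
# or bring the pool to the same Kubernetes version as the control plane:
az aks nodepool upgrade --resource-group myRG --cluster-name myAKSCluster --name nodepool1 --kubernetes-version <control-plane-version>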
will applying an updated manifest (with more replicas) be all I need to do to use the additional hardware?
Yes, that should work. Once the new nodes show up as Ready, the scheduler will place the additional replicas on them.
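For example (the deployment name and replica count are hypothetical):

kubectl scale deployment my-app --replicas=8
# or edit spec.replicas in your manifest and re-apply it:
kubectl apply -f my-app-deployment.yaml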
✌️
I need to downsize a cluster from 3 to 2 nodes.
I have critical pods running on some nodes (0 and 1). As I found that the last node (2) in the cluster is the one running the non-critical pods, I have "cordoned" it so it won't get any new ones.
What I wonder is whether I can make sure that this last node (2) is the one that will be removed when I go to the Azure portal and downsize my cluster to 2 nodes (it is the last node and it is cordoned).
I have read that if I manually delete the node, the system will still consider there are 3 nodes running so it's important to use the cluster management to downsize it.
You cannot control which node will be removed when scaling down the AKS cluster.
However, there are some workarounds for that:
Delete the cordoned node manually via the portal and then launch an upgrade. It will try to add the node back, but with no success because the subnet has no space left.
Another option is to:
Set up cluster autoscaler with two nodes
Scale up the number of nodes in the UI
Drain the node you want to delete and wait for the autoscaler to do its job (see the sketch after this list)
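A rough sketch of that second workaround (resource names, the node name, and the counts are placeholders):

# Enable the cluster autoscaler with a floor of 2 nodes
az aks update --resource-group myRG --name myAKSCluster --enable-cluster-autoscaler --min-count 2 --max-count 3

# Drain (and keep cordoned) the node you want removed; once it is empty and the
# node count is above the minimum, the autoscaler can scale it away
kubectl drain aks-nodepool1-12345678-vmss000002 --ignore-daemonsets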
Here are some sources and useful info:
Scale the node count in an Azure Kubernetes Service (AKS) cluster
Support selection of nodes to remove when scaling down
az aks scale
Please let me know if that helped.
I am running Service Fabric across a three-node standalone cluster. Each node is on a separate virtual machine in a corporate enterprise cloud environment. Recently two of the virtual machines on which the nodes reside were deleted (one of them being the machine from which the cluster was created). After this deletion, I attempted to access Service Fabric Explorer on the remaining machine, only to get a "Page cannot be found" error. Furthermore, the Connect-ServiceFabricCluster (for attempting to connect to the remaining node) and Get-ServiceFabricApplication PowerShell commands fail, stating:
"A communication error caused the operation to fail."
and
"No cluster endpoint is reachable, please check if there is a connectivity/firewall/DNS issue."
respectively.
Under what conditions does Service Fabric's automatic failover capability work on a standalone cluster? Are there any steps that can be taken so that I would still be able to access Service Fabric from the remaining node(s) on a standalone cluster if several nodes suddenly go down at once?
The cluster services run as stateful services on the cluster. For a stateful service you need a minimum number of nodes running to guarantee its availability and its ability to preserve state. The minimum number of nodes is equal to the target replica set count of the partition/service.
If fewer than the minimum number of nodes are available, your (cluster) services will stop working.
More info here.
The cluster size is determined by your business needs. However, you must have a minimum cluster size of three nodes (machines or virtual machines).
Kubernetes 1.2 supports multi-node across multiple service providers. Right now the master node is running on my laptop, and I want to add two worker nodes, one in Amazon and one in Vagrant. How can I achieve this?
Kubernetes 1.2 supports multi-node across multiple service providers
Where did you see this? It isn't actually true. In 1.2 we added support for nodes across multiple availability zones within the same region on the same service provider (e.g. us-central1-a and us-central1-b in the us-central1 region in GCP). But there is no support for running nodes across regions in the same service provider much less spanning a cluster across service providers.
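For reference, this is roughly how the 1.2 multi-zone support is used, going from memory of the multiple-zones documentation of that release, so treat the exact variable names as assumptions. It adds nodes in a second zone of the same provider and region to an existing cluster:

KUBE_USE_EXISTING_MASTER=true MULTIZONE=true KUBERNETES_PROVIDER=gce KUBE_GCE_ZONE=us-central1-b NUM_NODES=3 cluster/kube-up.sh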
now the master node is running on my laptop, and I want to add two worker nodes, one in Amazon and one in Vagrant
The worker nodes must be able to connect directly to the master node. I wouldn't suggest exposing your laptop to the internet directly so that it can be reached from an Amazon data center, but would instead advise you to run the master node in the cloud.
Also note that if you are running nodes in the same cluster across multiple environments (AWS, GCP, Vagrant, bare metal, etc) then you are going to have a difficult time getting networking configured properly so that all pods can reach each other.
How can I specify a specific VM type for the cluster master? (I don't want to use a high-memory instance for a relatively inactive node.)
Also, is there any way to add nodes to a cluster while choosing the type of VM? (That would solve the first problem.)
Update November 2015:
Now that Google Container Engine is no longer in alpha, you don't need to worry about the size of your cluster master, as it is part of the managed service.
You can now easily add/remove nodes from your cluster through the cloud console UI, but they will all be the same machine type that you originally chose for your cluster.
If you are running OSS Kubernetes on GCE, then you can set the MASTER_SIZE environment variable in cluster/gce/config-default.sh before creating your cluster.
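A minimal sketch of that approach (the machine type is just an example; if I remember the script correctly, config-default.sh falls back to the environment value, so exporting the variable works as well as editing the file):

export MASTER_SIZE=n1-standard-1    # small instance for a mostly idle master
cluster/kube-up.sh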
If you are running on GKE, we unfortunately don't yet offer the option to customize the size of your master differently than the size of your nodes. We hope to offer more flexibility in cluster provisioning soon.
There is currently not a way to resize your cluster after you create it. I'm actually working on this for OSS Kubernetes in Issue #3168.