eksctl stuck on Waiting for Nodes to join the cluster - kubernetes

When using eksctl to create a Kubernetes cluster on AWS EKS, the process gets stuck waiting for the nodes to join the cluster:
nodegroup "my-cluster" has 0 node(s)
waiting for at least 3 node(s) to become ready in "my-cluster"
timed out (after 25m0s) waiting for at least 3 nodes to join the cluster and become ready in "my-cluster"
The message is displayed, without any additional logs, until the process eventually times out. It looks like, behind the scenes, the newly created nodes are unable to communicate with the Kubernetes cluster.

When using an existing VPC network, you have to make sure that the VPC conforms to all EKS-specific requirements [1, 2]. The blog post by logz.io provides detailed guidance on setting up a VPC network, as well as an example AWS CloudFormation template that you can use as a basis [3].
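As a concrete starting point, here is a minimal sketch of pointing eksctl at an existing VPC. It assumes eksctl's ClusterConfig vpc/subnets schema; the cluster name, region, VPC ID, subnet IDs, and availability zones are all placeholders to replace with your own values:
# All IDs and names below are placeholders.
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
vpc:
  id: vpc-0123456789abcdef0
  subnets:
    private:
      us-east-1a: { id: subnet-aaaa1111 }
      us-east-1b: { id: subnet-bbbb2222 }
    public:
      us-east-1a: { id: subnet-cccc3333 }
      us-east-1b: { id: subnet-dddd4444 }
EOF
eksctl create cluster -f cluster.yaml
Note that with user-supplied subnets like this, the VPC requirements from [1] and [2] (routes, NAT, subnet tags) still have to be satisfied by you.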
Missing IAM Policies
The AmazonEKSWorkerNodePolicy and AmazonEKS_CNI_Policy policies [4] are required for the EKS worker nodes to be able to communicate with the cluster.
By default, eksctl automatically generates a role containing these policies. However, when you use the attachPolicyARNs property to attach specific policies by ARN, you have to include these policies explicitly [5]:
nodeGroups:
  - name: my-special-nodegroup
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
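A quick, hedged way to double-check which policies actually ended up on the node role is sketched below; the role name is a placeholder, and the real one can be found in the nodegroup's CloudFormation stack or the IAM console:
# Role name below is a placeholder
aws iam list-attached-role-policies \
    --role-name eksctl-my-cluster-nodegroup-NodeInstanceRole
# Attach a missing policy if needed
aws iam attach-role-policy \
    --role-name eksctl-my-cluster-nodegroup-NodeInstanceRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy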
[1] https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html
[2] https://eksctl.io/usage/vpc-networking
[3] https://logz.io/blog/amazon-eks
[4] https://docs.aws.amazon.com/eks/latest/userguide/worker_node_IAM_role.html
[5] https://eksctl.io/usage/iam-policies/

I faced the same issue and found the root cause.
The problem was with the VPC I had created. I had created the VPC and subnets, but I had not created the route table and internet gateway, which is what caused the issue.
The issue was resolved once I associated the route table and attached the internet gateway.
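If you hit the same gap, the wiring can be done with the AWS CLI along these lines. This is a rough sketch: every ID below is a placeholder, and for private subnets the corresponding piece would be a NAT gateway rather than an internet gateway:
# Placeholders throughout: vpc-..., igw-..., rtb-..., subnet-...
# Create and attach an internet gateway to the VPC
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id igw-0abc0abc0abc0abc0 --vpc-id vpc-0123456789abcdef0
# Create a route table with a default route through the gateway
aws ec2 create-route-table --vpc-id vpc-0123456789abcdef0
aws ec2 create-route --route-table-id rtb-0def0def0def0def0 \
    --destination-cidr-block 0.0.0.0/0 --gateway-id igw-0abc0abc0abc0abc0
# Associate the route table with each public subnet
aws ec2 associate-route-table --route-table-id rtb-0def0def0def0def0 --subnet-id subnet-aaaa1111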

Related

Worker Node group doesn't join the EKS cluster

I have followed this blog to set up open5GS on AWS: https://aws.amazon.com/blogs/opensource/open-source-mobile-core-network-implementation-on-amazon-elastic-kubernetes-service/
I've set up the infrastructure using open5gs-infra.yaml
I've configured the bastion host and run step 5 properly (by providing the correct ARN value)
I've initialised the DocumentDB
I updated the CoreDNS configmap and restarted coredns pods
I then ran the CloudFormation YAML file to create the worker node group. However, the worker node group doesn't join the cluster. I've double-checked the parameters that I feed to the CloudFormation template. I've even tried to edit the authConfig manually after the worker node group was created so that the worker nodes can join the cluster, but that doesn't work.
Since there are no worker nodes, the pods can't be scheduled and the cluster is unusable. What can I do so that the worker node group joins the cluster?
The debugging steps below helped me resolve it; a sketch of the aws-auth mapping mentioned in the question follows the list.
1. Verify that the security groups allow connectivity between the API server and the worker nodes, e.g. using "Run Reachability Analyzer".
2. Add the required policies to the IAM role eksAdminRoleForLambda-v5G-Core:
   - AmazonEKSWorkerNodePolicy
   - AmazonEKS_CNI_Policy
   - AmazonEC2ContainerRegistryReadOnly
3. Run the "TroubleshootEKSWorkerNode" automation runbook.

Can I add nodes running on my machine to an AWS EKS cluster?

Well, I read the user guide of the AWS EKS service. I created a managed node group for the EKS cluster successfully.
I don't know how to add nodes running on my own machine to the EKS cluster, or whether EKS supports this at all; I didn't find any clue in its documentation. I read the 'self-managed node group' chapter, which covers adding self-managed EC2 instances and an Auto Scaling group to the EKS cluster, rather than a private node running on another cloud (Azure, Google Cloud) or on my machine.
Does EKS support this? If so, how do I do it?
This is not possible. It is (implicitly) called out on this page: all worker nodes need to be deployed in the same VPC where you deployed the control plane (not necessarily the same subnets, though). EKS Anywhere (to be launched later this year) will allow you to deploy a complete EKS cluster (control plane + workers) outside of an AWS region, but it won't allow running the control plane in AWS and the workers locally.
As far as I know, the EKS service doesn't support adding your own nodes to the cluster. The 'EKS Anywhere' service will, but it is not available yet; it should be soon.

How to add a remote VM instance as a worker node in a Kubernetes cluster

I'm new to Kubernetes and trying to explore new things in it. So, my question is:
Suppose I have an existing Kubernetes cluster with 1 master node and 1 worker node, and this setup is on AWS. Now I have 1 more VM instance available on Oracle Cloud Platform, and I want to configure that VM as a worker node and attach it to the existing cluster.
Is it possible to do so? Does anybody have any suggestions regarding this?
I would instead divide your clusters up by region (unless you have a good VPN between your Oracle and AWS infrastructure).
You can then run applications across clusters. If you absolutely must have one cluster that is geographically separated, I would create a master (etcd host) in each region that you have a worker node in.
Communication between worker nodes and master nodes is very critical for a Kubernetes cluster. Adding nodes from on-prem to a cloud provider, or from a different cloud provider, will cause lots of issues from a network perspective, as a VPN connection between AWS and Oracle Cloud is needed and every time a worker node (probably) has to cross an ocean to reach the master node.
EDIT: From the Kubernetes docs: clusters cannot span clouds or regions (this functionality will require full federation support).
https://kubernetes.io/docs/setup/best-practices/multiple-zones/

GCP kubernetes cluster node error NetworkUnavailable

I am trying to set up a Kubernetes cluster on GCP with Terraform. The Terraform script has the VPC (firewall, subnet, default route) and Kubernetes.
I randomly get a "NetworkUnavailable" condition on a cluster node, but the same Terraform script works fine on the next run.
So there is no issue with the Terraform itself, and I don't know how to resolve this. If I run the script 10 times, provisioning fails 4 or 5 times:
Error waiting for creating GKE NodePool: All cluster resources were brought up, but the cluster API is reporting that: only 3 nodes out of 4 have registered; cluster may be unhealthy.
Please help me.
Thanks
Shrwan
This is a fairly common issue when using Terraform to create GKE clusters. If you create the cluster manually through the GKE API, you won't have the same error.
Note that when creating a GKE cluster, you only need to create the cluster itself. It is not necessary to create firewall rules or routes, as the GKE API creates them during cluster creation.
Most of the time, this error message means that the nodes are unable to communicate with the master node, which is usually linked to an issue with the network configuration.
If you are creating a zonal cluster, you might be hitting this issue. I'll add this third one as well, which has a third root cause for the same symptom.
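If the zonal setup is the suspect, one low-effort experiment is to create a regional cluster instead. A hedged gcloud sketch, where the cluster name and region are placeholders:
# Name and region are placeholders; --num-nodes is per zone for regional clusters
gcloud container clusters create my-test-cluster \
    --region us-central1 \
    --num-nodes 1
This only changes the control-plane topology; whether it actually avoids the NetworkUnavailable condition depends on the real root cause.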

How can Kubernetes auto-scale nodes?

I am using Kubernetes to manage a Docker cluster. Right now, I can set up pod autoscaling using the Horizontal Pod Autoscaler, and that is fine.
Now I think the next step is to autoscale nodes. With HPA, automatically created pods are only started on already existing nodes; but if all the available nodes are fully utilized and there is no resource left for any more pods, I think the next step is to automatically create a node and have it join the k8s master.
I googled a lot and there are very limited resources introducing this topic.
Can anyone please point me to any resource on how to implement this requirement?
Thanks
One way to do this on AWS, with your own Kubernetes cluster, is by following these steps:
1. Create an instance larger than t2.micro (this will be the master node).
2. Initialize the Kubernetes cluster using a tool like kubeadm. After the initialization completes, you get a join command, which needs to be run on every node that should join the cluster.
3. Now create an Auto Scaling group on AWS with a start/boot script (user data) containing that join command; a sketch follows below.
4. Now, whenever the utilization threshold you specified in the Auto Scaling group is breached, scaling happens and the new node(s) automatically join the Kubernetes cluster. This allows Kubernetes to schedule pods on the newly joined nodes based on the HPA.
(I would suggest using Flannel as the pod network, as it automatically removes a node from the Kubernetes cluster when it is not available.)
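A minimal sketch of that boot script, assuming kubeadm: the API server endpoint, token, and CA hash below are fake placeholder values that you would obtain from running "kubeadm token create --print-join-command" on the master.
#!/bin/bash
# User data for the Auto Scaling group launch configuration/template.
# All values below are placeholders; generate the real join command with
#   kubeadm token create --print-join-command
# on the master node.
kubeadm join 10.0.0.10:6443 \
    --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:0000000000000000000000000000000000000000000000000000000000000000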
Kubernetes Operations (kops) helps you create, destroy, upgrade, and maintain production-grade, highly available Kubernetes clusters from the command line (a short sketch follows the feature list below).
Features:
Automates the provisioning of Kubernetes clusters in AWS and GCE
Deploys Highly Available (HA) Kubernetes Masters
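For example, a hedged sketch of creating a small AWS cluster with kops; the state bucket, cluster name, and zones are placeholders, and the .k8s.local suffix uses gossip-based discovery so no DNS zone is required:
# Placeholders: state bucket, cluster name, zones
export KOPS_STATE_STORE=s3://my-kops-state-bucket
kops create cluster my-cluster.k8s.local \
    --zones us-east-1a,us-east-1b \
    --node-count 2 \
    --yes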
Most managed Kubernetes service providers offer autoscaling of nodes:
- Elastic Kubernetes Service (EKS): configure the cluster autoscaler (a hedged eksctl sketch follows below)
- Google Kubernetes Engine (GKE): cluster autoscaler
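On the EKS side, a hedged eksctl sketch of a nodegroup sized for autoscaling. The cluster name, region, and sizes are placeholders; this only sets the Auto Scaling group bounds and the IAM add-on policy, while the Cluster Autoscaler itself is deployed separately:
# Placeholders throughout; written as an eksctl config file
cat > autoscaling-nodegroup.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
nodeGroups:
  - name: autoscaled-workers
    instanceType: m5.large
    minSize: 1
    maxSize: 10
    desiredCapacity: 3
    iam:
      withAddonPolicies:
        autoScaler: true
EOF
eksctl create nodegroup -f autoscaling-nodegroup.yaml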
The autoscaling feature needs to be supported by the underlying cloud provider. Google Cloud supports autoscaling during cluster creation or update by passing the flags --enable-autoscaling, --min-nodes, and --max-nodes to the corresponding gcloud commands.
Examples:
gcloud container clusters create mytestcluster --zone=us-central1-b --enable-autoscaling --min-nodes=3 --max-nodes=10 --num-nodes=5
gcloud container clusters update mytestcluster --enable-autoscaling --min-nodes=1 --max-nodes=15
The link below may be helpful:
https://medium.com/kubecost/understanding-kubernetes-cluster-autoscaling-675099a1db92