I am trying to set up a Kubernetes cluster on GCP with Terraform. The Terraform script creates a VPC (firewall, subnet, default route) and the Kubernetes cluster.
Randomly I get a "NetworkUnavailable" condition on a cluster node, but the same Terraform script works fine on the next run.
So there is no issue with the Terraform code itself, and I don't know how to resolve this. If I run the script 10 times, provisioning fails 4 to 5 times with the following error:
Error waiting for creating GKE NodePool: All cluster resources were brought up, but the cluster API is reporting that: only 3 nodes out of 4 have registered; cluster may be unhealthy.
Please help me.
Thanks
Shrwan
This is a fairly common issue when using Terraform to create GKE clusters. If you create the cluster manually through the GKE API, you won't see the same error.
Note that when creating a GKE cluster, you only need to create the cluster itself. It is not necessary to create firewall rules or routes, as the GKE API creates them during cluster creation.
Most of the time, this error message means that the nodes are unable to communicate with the master node, which usually points to a problem with the network configuration.
Zonal clusters seem especially prone to this, and there are several different root causes that can produce the same error message.
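For comparison, a cluster created directly through the GKE API (here via gcloud) needs no pre-created firewall rules or routes, since GKE provisions them itself. This is only a minimal sketch; the cluster name, zone, node count, and network names below are placeholders:

# Minimal sketch: let GKE create its own firewall rules and routes
# instead of managing them in Terraform (all values are placeholders).
gcloud container clusters create my-cluster \
    --zone us-central1-a \
    --num-nodes 4 \
    --network default \
    --subnetwork default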
Well, I read the user guide for the AWS EKS service and successfully created a managed node group for an EKS cluster.
I don't know how to add nodes running on my own machine to the EKS cluster, or whether EKS even supports this; I didn't find any clue in its documentation. I read the 'self-managed node group' chapter, which covers adding self-managed EC2 instances and an Auto Scaling group to the EKS cluster, rather than a private node running on another cloud instance (Azure, Google Cloud) or on my own machine.
Does EKS support this? If so, how do I do it?
This is not possible. It is (implicitly) called out on this page. All worker nodes need to be deployed in the same VPC where you deployed the control plane (though not necessarily in the same subnets). EKS Anywhere (to be launched later this year) will allow you to deploy a complete EKS cluster (control plane + workers) outside of an AWS region, but it won't allow running the control plane in AWS and the workers locally.
As far as I know, the EKS service doesn't support adding your own nodes to the cluster. But the 'EKS Anywhere' offering will; it is not available yet, but will be soon.
I am trying to install a Kubernetes cluster with one master node and two worker nodes.
I acquired 3 VMs for this purpose, running Ubuntu 21.10. On the master node, I installed kubeadm:1.21.4, kubectl:1.21.4, kubelet:1.21.4 and docker-ce:20.4.
I followed this guide to install the cluster. The only difference was in my init command, where I did not specify --control-plane-endpoint. I used the Calico CNI v3.19.1 and Docker as the CRI runtime.
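For reference, with Docker as the runtime, Calico as the CNI, and no --control-plane-endpoint, the install sequence presumably looked roughly like the sketch below; the pod CIDR is Calico's documented default, and the manifest file name stands in for the v3.19.1 manifest from the Calico docs:

# Initialise the control plane without --control-plane-endpoint,
# using Calico's default pod network CIDR:
sudo kubeadm init --pod-network-cidr=192.168.0.0/16

# Install the Calico CNI ("calico.yaml" stands in for the v3.19.1 manifest):
kubectl apply -f calico.yaml

# Join each worker using the kubeadm join command printed by the init step.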
After I installed the cluster, I deployed a MinIO pod and exposed it as a NodePort service.
The pod got deployed on a worker node (10.72.12.52); my master node's IP is 10.72.12.51.
For the first two hours, I was able to access the login page via all three IPs (10.72.12.51:30981, 10.72.12.52:30981, 10.72.13.53:30981). However, after two hours, I lost access to the service via 10.72.12.51:30981 and 10.72.13.53:30981. Now I can only access the service from the node it is running on (10.72.12.52).
I have disabled the firewall and added a calico.conf file inside /etc/NetworkManager/conf.d with the following content:
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:tunl*;interface-name:vxlan.calico
What am I missing in the setup that might cause this issue?
This is a community wiki answer posted for better visibility. Feel free to expand it.
As mentioned by @AbhinavSharma, the problem was solved by switching from Calico to the Flannel CNI.
More information regarding Flannel itself can be found here.
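For anyone landing here, switching a kubeadm cluster to Flannel roughly follows the sketch below. The manifest URL is the one documented in the Flannel README and 10.244.0.0/16 is Flannel's default pod CIDR, so treat this as an outline under those assumptions rather than a full migration guide:

# Flannel assumes the 10.244.0.0/16 pod CIDR by default, so the cluster
# should be initialised (or re-initialised) with it:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# Apply the Flannel manifest (check the Flannel README for the current URL):
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml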
When I use the command kubectl get nodes, I get a list of nodes with ROLES. Is there any way I can find out which nodes are the masters?
Use the following command for this purpose:
kubectl get node --selector='node-role.kubernetes.io/master'
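Note that on newer Kubernetes versions the master role label has been replaced: node-role.kubernetes.io/master was deprecated in 1.20 and removed in 1.24 in favour of node-role.kubernetes.io/control-plane. Depending on the cluster version you may need the newer selector instead:

# Newer clusters label control-plane nodes with the control-plane role:
kubectl get node --selector='node-role.kubernetes.io/control-plane'

# Alternatively, just inspect the ROLES column of the full listing:
kubectl get nodes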
In EKS, according to the AWS Documentation:
The control plane runs in an account managed by AWS, and the Kubernetes API is exposed via the Amazon EKS endpoint associated with your cluster.
As mentioned in my comment above, you don't have access to the master node in an EKS cluster, as it is managed by AWS.
The idea behind it is to "make your life easier" and let you worry only about the workloads that will run on the worker nodes.
There is also this documentation page, which may help with understanding EKS.
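To illustrate the point, you only ever talk to the managed control plane through the endpoint EKS exposes; a quick way to see this (the cluster name below is a placeholder) is:

# Print the managed Kubernetes API endpoint for the cluster ("my-cluster" is a placeholder):
aws eks describe-cluster --name my-cluster --query "cluster.endpoint" --output text

# kubectl only ever lists the worker nodes; the managed control plane never shows up here:
kubectl get nodes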
When using eksctl to create a Kubernetes cluster on AWS EKS, the process gets stuck waiting for the nodes to join the cluster:
nodegroup "my-cluster" has 0 node(s)
waiting for at least 3 node(s) to become ready in “my-cluster”
timed out (after 25m0s) waiting for at least 3 nodes to join the cluster and become ready in "my-cluster"
The message is displayed, without any additional logs, until the process eventually times out. It looks like, behind the scenes, the newly created nodes are unable to communicate with the Kubernetes cluster.
When using an existing VPC network, you have to make sure that the VPC conforms to all EKS-specific requirements [1, 2]. The blog post by logz.io provides detailed guidance on setting up a VPC network, as well as an example AWS CloudFormation template that you can use as a basis [3].
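As one concrete example of those requirements, EKS and its load balancers discover subnets through tags; a hedged sketch of adding them with the AWS CLI (the subnet IDs and cluster name are placeholders) looks like this:

# Tag the subnets with the cluster they belong to
# (subnet IDs and the cluster name are placeholders):
aws ec2 create-tags --resources subnet-0aaa1111 subnet-0bbb2222 \
    --tags Key=kubernetes.io/cluster/my-cluster,Value=shared

# Public subnets intended for internet-facing load balancers:
aws ec2 create-tags --resources subnet-0aaa1111 \
    --tags Key=kubernetes.io/role/elb,Value=1

# Private subnets intended for internal load balancers:
aws ec2 create-tags --resources subnet-0bbb2222 \
    --tags Key=kubernetes.io/role/internal-elb,Value=1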
Missing IAM Policies
The AmazonEKSWorkerNodePolicy and AmazonEKS_CNI_Policy policies [4] are required for the EKS worker nodes to be able to communicate with the cluster.
By default, eksctl automatically generates a role containing these policies. However, when you use the attachPolicyARNs property to attach specific policies by ARN, you have to include these policies explicitly [5]:
nodeGroups:
  - name: my-special-nodegroup
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
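Assuming the snippet above is part of a cluster config file (the file name here is a placeholder), it is applied with:

# Create the cluster and node groups from the config file ("cluster.yaml" is a placeholder):
eksctl create cluster -f cluster.yaml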
[1] https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html
[2] https://eksctl.io/usage/vpc-networking
[3] https://logz.io/blog/amazon-eks
[4] https://docs.aws.amazon.com/eks/latest/userguide/worker_node_IAM_role.html
[5] https://eksctl.io/usage/iam-policies/
I faced the same issue and found its real cause.
The issue was with the VPC I had created: I had created the VPC and subnets, but not the route table and internet gateway, which was causing the problem.
The issue was resolved once the internet gateway was attached and the route table was associated with the subnets.
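For anyone hitting the same thing, the missing pieces can be created with the AWS CLI roughly as follows; all of the IDs are placeholders, and this is only a sketch of the idea, not a complete network setup:

# Create an internet gateway and attach it to the VPC (IDs are placeholders):
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id igw-0abc1234 --vpc-id vpc-0def5678

# Create a route table with a default route through the gateway:
aws ec2 create-route-table --vpc-id vpc-0def5678
aws ec2 create-route --route-table-id rtb-0123abcd \
    --destination-cidr-block 0.0.0.0/0 --gateway-id igw-0abc1234

# Associate the route table with each subnet the worker nodes use:
aws ec2 associate-route-table --route-table-id rtb-0123abcd --subnet-id subnet-0aaa1111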
I have a cluster with several workloads and different configurations on GCP's Kubernetes Engine.
I want to create a clone of this existing cluster along with cloning all the workloads in it. It turns out, you can clone a cluster but not the workloads.
So, at this point, I'm copying the deployment YAMLs of the workloads from the cluster that works fine and using them for the newly created workloads in the new cluster.
When I deploy the pods of these newly created workloads, the pods are stuck in the Pending state.
In the logs of the container, I can see that the error has something to do with Redis.
The error it shows is, Error: Redis connection to 127.0.0.1:6379 failed - connect ECONNREFUSED 127.0.0.1:6379 at TCPConnectWrap.afterConnect [as oncomplete].
Also, when I'm connected to the first cluster and run the command
kubectl get secrets -n=development, it shows me a bunch of secrets which are supposed to be used by my workload.
However, when I'm connected to the second cluster and run the same kubectl command, I just see one service-related secret.
My question is: how do I make the workloads of the newly created cluster use the configuration of the already existing cluster?
I think there are a few things that can be done here:
Try to use the kubectl config command and set up contexts for both of your clusters.
You can find more info here and here.
You may also try to use Kubernetes Cluster Federation, but bear in mind that it is still in alpha.
Remember that keeping your config in a version control system is generally a very good idea; that way you have the manifests as originally written, before the cluster fills in its defaults during an export. See the sketch below for copying the existing secrets and config across.
Please let me know if that helped.
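As for copying the missing secrets and configs across, here is a rough sketch, assuming both clusters are configured as kubectl contexts ("old-cluster", "new-cluster", and the namespace are placeholders):

# Export the resources from the original cluster:
kubectl --context=old-cluster -n development get secrets -o yaml > dev-secrets.yaml
kubectl --context=old-cluster -n development get configmaps -o yaml > dev-configmaps.yaml

# Review the files and strip cluster-specific fields (resourceVersion, uid,
# creationTimestamp, ...), then apply them to the new cluster:
kubectl --context=new-cluster apply -f dev-secrets.yaml
kubectl --context=new-cluster apply -f dev-configmaps.yaml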