Cannot access Cluster IP from a pod on the same node - Kubernetes

I'm deploying a k8s cluster on CentOS, Kubernetes version v1.5.1.
It has three nodes:
kube-01(master)
kube-02
kube-03
There is a deployment with one pod, named Deployment-A, with pod IP Pod-A-IP, deployed on kube-03.
There is a deployment with two pods, named Deployment-B. Each worker node has one pod: call them Pod-B-02 on kube-02 and Pod-B-03 on kube-03.
Exposing Deployment-A with type NodePort, I get a cluster IP Service-A-IP.
Pod-B-02 accessing Service-A-IP: OK
Pod-B-03 accessing Service-A-IP: time out
kube-02 accessing Service-A-IP: OK
kube-03 accessing Service-A-IP: OK
It seems that accessing a Service from a pod on the same node as the Service's backend runs into this problem.
Updated on Mon Feb 20 16:22:47 CST 2017
I have captured the network traffic on Pod-B-03
10.244.1.10 is the pod ip of Pod-B-03
10.107.25.245 is Service-A-IP
10.244.1.2 is Pod-A-IP
I'm using flannel. I suspect there is something wrong with flannel?

The problem you describe is something I had in the past, if I remember correctly, but I had many network problems with many different error sources. If it really is the same issue, then setting net.bridge.bridge-nf-call-iptables and net.bridge.bridge-nf-call-ip6tables to 1 may help. You could try this on all hosts first:
sysctl -w net.bridge.bridge-nf-call-iptables=1
sysctl -w net.bridge.bridge-nf-call-ip6tables=1
Then check the service networking again without rebooting the machine. If this helps, persist the change in /etc/sysctl.conf or /etc/sysctl.d/.
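For example, a minimal way to persist it might look like this (the file name is just an example; you may also need to load the br_netfilter module first, otherwise these keys don't exist):
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/99-k8s-bridge.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sudo sysctl --system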
Kubernetes network problems tend to have countless sources of error, making it really hard to debug this without enough information. It would be good if you could provide some additional information about how you set up the cluster (kube-up, kargo, kops, kubeadm, ...), which cloud you use (or bare-metal?) and which network solution you chose (weave, calico, cloud provider based, ...).
It may also help to see the output of iptables -L -t nat and the kube-proxy logs, as most service-related problems can be debugged with this information.
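For example, something along these lines (the service name is a placeholder, and kube-proxy may run as pods or as a systemd service depending on your setup):
sudo iptables -t nat -L -n | grep -i <your-service-name>   # placeholder service name
kubectl -n kube-system logs -l k8s-app=kube-proxy          # if kube-proxy runs as pods
journalctl -u kube-proxy                                   # if it runs as a systemd service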
EDIT: I've just found the Kubernetes issue this solution came from: https://github.com/kubernetes/kubernetes/issues/33798

Related

k8s ClusterIP:port accessible only within the node running the pod

I created 3 Ubuntu VMs in AWS, used kubeadm to set up the cluster on the master node, and opened port 6443. Then I applied the flannel network via the command below:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Then I joined the other two nodes to the cluster via the join command:
kubeadm join 172.31.5.223:6443
Then I applied the two YAMLs below to deploy my deployment and svc.
Here comes the issue. I listed all resources on the k8s master:
I can only access clusterip:port from inside node/ip-172-31-36-90, as it is the node running the pod.
Which also means:
I can only access <node IP>:NodePort using the IP of node/ip-172-31-36-90, as it is the node running the pod.
I can curl <external/internal IP of node/ip-172-31-36-90>:nodeport from the other nodes, but this IP can only be the one of ip-172-31-36-90.
If I try the above two using the IP of the master node or of node/ip-172-31-41-66, I get a timeout. Note: NodePort 30000 is open on all nodes via the AWS security group.
Can anyone help me with this network issue? I am really bad at debugging network stuff.
2. Second question: if I try curl <external/internal IP of node/ip-172-31-36-90>:nodeport from my local machine, it gives this error:
curl: (56) Recv failure: Connection reset by peer
It really bothers me. k8s expert please save me!!!
----------------Update---------------------------
After days of debugging, I noticed it is related to the IPs of docker0 and flannel.1: they are not in the same subnet:
But I still don't know where I went wrong and how to sync them. Any experts here, please!
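In case it helps anyone reproduce the check, this is roughly how I compared them (illustrative; the flannel lease path may differ on your setup):
ip -4 addr show docker0      # address of the docker bridge
ip -4 addr show flannel.1    # address of the flannel VXLAN interface
cat /run/flannel/subnet.env  # the subnet flannel allocated to this node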

How to reconfigure the IP of a k8s node

I created a k8s cluster installed with k0s on an AWS EC2 instance. In order to deliver new clusters faster, I tried to make an AMI of it.
However, when I started a new EC2 instance from it, the internal IP changed and the node became NotReady:
ubuntu@ip-172-31-26-46:~$ k get node
NAME STATUS ROLES AGE VERSION
ip-172-31-18-145 NotReady <none> 95m v1.21.1-k0s1
ubuntu@ip-172-31-26-46:~$
Would it be possible to reconfigure it?
Workaround
I found a workaround to make the AWS AMI work.
Short answer
Install the node with kubelet's extra args.
Update the kube-api address to the new IP and restart the kubelet.
Details :: 1
In a Kubernetes cluster, the kubelet plays the role of the node agent. It tells kube-api "Hey, I am here and my name is XXX".
The name of a node is its hostname and cannot be changed after the node is created. It can be set with --hostname-override.
If you don't change the node name, kube-api will try to use the hostname and then get errors because the old node name is not found.
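For example, with k0s the kubelet flags can be passed when installing the worker, roughly like this (flag names from memory, so verify against k0s install worker --help; the token path and node name are just placeholders):
sudo k0s install worker --token-file /path/to/token --kubelet-extra-args "--hostname-override=my-node-01"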
Details :: 2
For k0s, the kubelet's KUBECONFIG is written to /var/lib/k0s/kubelet.conf, which contains the kube-api server location:
server: https://172.31.18.9:6443
In order to connect to the new kube-api location, update it.
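For example (the new address and the service name are assumptions; adjust them to your install):
# point the kubelet at the new kube-api address (172.31.26.46 is just an example)
sudo sed -i 's#https://172.31.18.9:6443#https://172.31.26.46:6443#' /var/lib/k0s/kubelet.conf
# restart so the kubelet picks up the change (unit name depends on how k0s was installed)
sudo systemctl restart k0sworker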
Did you check the kubelet logs? Most likely it's a problem with certificates. You cannot just make an existing node into an AMI and hope it will work, since certificates are signed for a specific IP.
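With k0s the kubelet runs under the k0s service, so something like this should show its logs (the unit name may differ depending on how you installed it):
journalctl -u k0sworker -f    # use k0scontroller on a controller node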
Check out the awslabs/amazon-eks-ami repo on GitHub. You can see how AWS builds its k8s AMI.
There is a files/bootstrap.sh file in the repo that is run to bootstrap an instance. It does all sorts of things that are instance-specific, including getting certificates.
If you want to "make delivery of new clusters faster", I'd recommend creating an AMI with all dependencies but without the actual k8s bootstrapping. Install k8s (or k0s in your case) after you start the instance from the AMI, not before. (Or figure out how to regenerate the certs and configs that are node-specific.)

Kubectl connection refused existing cluster

Hope someone can help me.
To describe the situation in short, I have a self managed k8s cluster, running on 3 machines (1 master, 2 worker nodes). In order to make it HA, I attempted to add a second master to the cluster.
After some failed attempts, I found out that I needed to add a controlPlaneEndpoint setting to the kubeadm-config ConfigMap. So I did, with masternodeHostname:6443.
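Roughly the kind of edit I made (from memory):
kubectl -n kube-system edit configmap kubeadm-config
# and under ClusterConfiguration I added:
#   controlPlaneEndpoint: "masternodeHostname:6443"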
I generated the certificate and join command for the second master, and after running it on the second master machine, it failed with
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
Checking the first master now, I get connection refused for the IP on port 6443. So I cannot run any kubectl commands.
Tried recreating the .kube folder, with all the config copied there, no luck.
Restarted kubelet, docker.
The containers running on the cluster seem ok, but I am locked out of any cluster configuration (dashboard is down, kubectl commands not working).
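For what it's worth, this is roughly how I looked at the containers (commands from memory; adapt to your runtime):
docker ps -a | grep -E 'kube-apiserver|etcd'   # or crictl ps -a when going through containerd
docker logs <kube-apiserver-container-id>      # placeholder container id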
Is there any way I can make it work again, without losing any of the configuration or the deployments already present?
Thanks! Sorry if it’s a noob question.
Cluster information:
Kubernetes version: 1.15.3
Cloud being used: (put bare-metal if not on a public cloud) bare-metal
Installation method: kubeadm
Host OS: RHEL 7
CNI and version: weave 0.3.0
CRI and version: containerd 1.2.6
This is an old, known problem with Kubernetes 1.15 [1,2].
It is caused by a short etcd timeout period. As far as I'm aware it is a hard-coded value in the source and cannot be changed (a feature request to make it configurable is open for version 1.22).
Your best bet would be to upgrade to a newer version, and recreate your cluster.

Failed to create pod sandbox kubernetes cluster

I have a Weave network plugin.
Inside my folder /etc/cni/net.d there is a 10-weave.conf:
{
    "name": "weave",
    "type": "weave-net",
    "hairpinMode": true
}
My Weave pods are running and the DNS pod is also running.
But when I want to run a pod, like a simple nginx which will pull an nginx image, the pod gets stuck at ContainerCreating; describe pod gives me the error "failed create pod sandbox".
When I run journalctl -u kubelet I get this error:
cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Is my network plugin not configured correctly?
I used this command to configure my Weave network:
kubectl apply -f https://git.io/weave-kube-1.6
After this didn't work I also tried this command:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
I even tried flannel and that gives me the same error.
The system I am setting Kubernetes up on is a Raspberry Pi.
I am trying to build a Raspberry Pi cluster with 3 nodes and 1 master with Kubernetes.
Does anyone have ideas on this?
Thank you all for responding to my question. I have solved my problem now. For anyone who comes to this question in the future, the solution was as follows.
I cloned my Raspberry Pi images because I wanted a basicConfig.img for when I needed to add a new node to my cluster or when one goes down.
Weave (the network plugin I used) got confused because on every node and the master the OS had the same machine-id. When I deleted the machine-id and created a new one (and rebooted the nodes) my error was fixed. The commands to do this were:
sudo rm /etc/machine-id
sudo rm /var/lib/dbus/machine-id
sudo dbus-uuidgen --ensure=/etc/machine-id
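Afterwards you can quickly verify the result on each node:
cat /etc/machine-id    # should now print a different value on every node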
Once again my patience was being tested, because my Kubernetes setup was normal and my Raspberry Pi OS was normal. I found this with the help of someone in the Kubernetes community. This again shows how important and great our IT community is. To the people of the future who come to this question: I hope this solution fixes your error and reduces the time you spend searching for a stupidly small thing.
Looking at the pertinent code in Kubernetes and in CNI, the specific error you see seems to indicate that it cannot find any files ending in .json, .conf or .conflist in the directory given.
This makes me think it could be something like the conf file not being present on all the hosts, so I would verify that as a first step.
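A quick way to check on each host would be something like:
ls -l /etc/cni/net.d/    # should list 10-weave.conf (or another .conf/.conflist/.json file)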

worker added but not present in kubectl get nodes

I'm setting up a 2-node Kubernetes system, following the Docker Multi-Node instructions.
My problem is that kubectl get nodes only shows the master, not the worker node as well.
The setup appears to have worked, with all the expected containers running (as far as I know).
I've confirmed that networking works via flannel.
The subnet of the worker node appears in the master's subnet list.
So everything looks good, except the node isn't showing up.
My questions:
Am I right in thinking the worker node should now be visible from 'get nodes'?
Does it matter whether the MASTER_IP used to do the setup was the master node's public IP address, or the docker IP? (I've tried both..)
Where do I start with debugging this?
Any pointers gratefully accepted...
Versions:
Ubuntu Trusty 14.04 LTS on both master and worker
Kubernetes v1.1.4
hyperkube:v1.0.3
Answering my own #cloudplatform question...
It turned out to be a problem in worker.sh in Kubernetes v1.1.4.
kubelet is called with "--hostname-override=$(hostname -i)"
On this machine, that returns the IPv6 address.
The K8s code tries to validate that as a DNS subdomain name, and fails.
So looking at the log file for the kubelet container, we see this:
I0122 15:57:33.891577 1786 kubelet.go:1942] Recording NodeReady event message for node 2001:41c9:1:41f::131
I0122 15:57:33.891599 1786 kubelet.go:790] Attempting to register node 2001:41c9:1:41f::131
I0122 15:57:33.894076 1786 kubelet.go:793] Unable to register 2001:41c9:1:41f::131 with the apiserver: Node "2001:41c9:1:41f::131" is invalid: [metadata.name: invalid value '2001:41c9:1:41f::131': must be a DNS subdomain (at most 253 characters, matching regex [a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*): e.g. "example.com", metadata.labels: invalid value '2001:41c9:1:41f::131': must have at most 63 characters, matching regex (([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?: e.g. "MyValue" or ""]
So that's my problem. Take that out and it all works well.
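For anyone who wants to keep the flag rather than drop it, one way to force an IPv4 value would be something like this (illustrative, not necessarily the exact change I made):
# instead of:  --hostname-override=$(hostname -i)
# use the first address reported by the host, which is usually the IPv4 one:
--hostname-override=$(hostname -I | awk '{print $1}')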
So in answer to my 3 questions:
Yes, the worker node should be visible immediately in 'get nodes'.
I don't think it matters for getting it to work; it may matter for security reasons.
First step after checking that the basic networking is right and the containers are running: look at the log file for the new node's kubelet container.
Update: I wrote this blog post to explain how I got it working: http://blog.willmer.org/2016/11/kubernetes-bytemark/