kuberentes communication is not working cross nodes - kubernetes

I have a procedure of installing kubernetes cluster via kubeadm and it worked multiple times.
for some reason now I have a cluster which I installed and for some reason the nodes are having trouble communicating.
the problem reflect in couple of ways :
sometimes the cluster is unable to resolve global dns records such as mirrorlist.centos.org
sometimes one pod from a specific node has no connectivity to another pod in different node
my kubernetes version is 1.9.2
my hosts are centOS 7.4
I use flannel as cni plugin in version 0.9.1
my cluster is built on AWS
mt debugging so far was :
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}' - to see subnets
10.244.0.0/24 10.244.1.0/24
I tried adding configurations to kubedns ( even though it is needed in all my other clusters ) like https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#configure-stub-domain-and-upstream-dns-servers
I tried installing busybox and ding nslookup to cluster kubernetes.default and it only works of busybox is on the same node as the dns ( tried this link https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
I even tried creating an AMI from other running environments and deploying it as a node to this cluster and it still fails.
I tried checking if some port is missing so I even opened all ports between nodes
I also disabled iptables and firewall and all nodes just to make sure it is not the reason
nothing helps.
please any tip would help
edit :
I added my flannel configuration:
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: flannel
rules:
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- apiGroups:
- ""
resources:
- nodes
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- nodes/status
verbs:
- patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: flannel
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: flannel
subjects:
- kind: ServiceAccount
name: flannel
namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: flannel
namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
name: kube-flannel-cfg
namespace: kube-system
labels:
tier: node
app: flannel
data:
cni-conf.json: |
{
"name": "cbr0",
"type": "flannel",
"delegate": {
"isDefaultGateway": true
}
}
net-conf.json: |
{
"Network": "10.244.0.0/16",
"Backend": {
"Type": "vxlan"
}
}
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: kube-flannel-ds
namespace: kube-system
labels:
tier: node
app: flannel
spec:
template:
metadata:
labels:
tier: node
app: flannel
spec:
hostNetwork: true
nodeSelector:
beta.kubernetes.io/arch: amd64
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni
image: quay.io/coreos/flannel:v0.9.1-amd64
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conf
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
image: quay.io/coreos/flannel:v0.9.1-amd64
command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
securityContext:
privileged: true
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: run
mountPath: /run
- name: flannel-cfg
mountPath: /etc/kube-flannel/
volumes:
- name: run
hostPath:
path: /run
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
name: kube-flannel-cfg

the issues was that the AWS machines were provisioned not by me and the team that provisioned the machines assured that all internal traffic is opened.
after a lot of debugging with nmap I found out that UDP ports are not opened and since flannel requires UDP traffic the communication was not working properly.
once UDP was opened issues got solved.

Related

ErrImageNeverPull: Container image "myjenkins:latest" is not present with pull policy of Never

I'm going through a tutorial on running jenkins on your kubernetes cluster. In the tutorial they're using minikube and for my existing cluster it's running on eks. When I apply my jenkins.yaml file, the pod it creates gets this error
Normal Scheduled 27m default-scheduler Successfully assigned default/jenkins-799666d8db-ft642 to ip-192-168-84-126.us-west-2.compute.internal
Warning Failed 24m (x12 over 27m) kubelet Error: ErrImageNeverPull
Warning ErrImageNeverPull 114s (x116 over 27m) kubelet Container image "myjenkins:latest" is not present with pull policy of Never
This was from describing the pod ^
Here's my jenkins.yaml file that I'm using to try to run jenkins on my cluster
apiVersion: v1
kind: ServiceAccount
metadata:
name: jenkins
namespace: default
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: jenkins
namespace: default
rules:
- apiGroups: [""]
resources: ["pods","services"]
verbs: ["create","delete","get","list","patch","update","watch"]
- apiGroups: ["apps"]
resources: ["deployments"]
verbs: ["create","delete","get","list","patch","update","watch"]
- apiGroups: [""]
resources: ["pods/exec"]
verbs: ["create","delete","get","list","patch","update","watch"]
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get","list","watch"]
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["create","delete","get","list","patch","update","watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: jenkins
namespace: default
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: jenkins
subjects:
- kind: ServiceAccount
name: jenkins
---
# Allows jenkins to create persistent volumes
# This cluster role binding allows anyone in the "manager" group to read secrets in any namespace.
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: jenkins-crb
subjects:
- kind: ServiceAccount
namespace: default
name: jenkins
roleRef:
kind: ClusterRole
name: jenkinsclusterrole
apiGroup: rbac.authorization.k8s.io
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
# "namespace" omitted since ClusterRoles are not namespaced
name: jenkinsclusterrole
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["create","delete","get","list","patch","update","watch"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: jenkins
namespace: default
spec:
selector:
matchLabels:
app: jenkins
replicas: 1
template:
metadata:
labels:
app: jenkins
spec:
containers:
- name: jenkins
image: myjenkins:latest
env:
- name: JAVA_OPTS
value: -Djenkins.install.runSetupWizard=false
ports:
- name: http-port
containerPort: 8080
- name: jnlp-port
containerPort: 50000
volumeMounts:
- name: jenkins-home
mountPath: /var/jenkins_home
- name: docker-sock-volume
mountPath: "/var/run/docker.sock"
imagePullPolicy: Never
volumes:
# This allows jenkins to use the docker daemon on the host, for running builds
# see https://stackoverflow.com/questions/27879713/is-it-ok-to-run-docker-from-inside-docker
- name: docker-sock-volume
hostPath:
path: /var/run/docker.sock
- name: jenkins-home
hostPath:
path: /mnt/jenkins-store
serviceAccountName: jenkins
---
apiVersion: v1
kind: Service
metadata:
name: jenkins
namespace: default
spec:
type: NodePort
ports:
- name: ui
port: 8080
targetPort: 8080
nodePort: 31000
- name: jnlp
port: 50000
targetPort: 50000
selector:
app: jenkins
Edit:
So far I tried removing imagePullPolicy: Never and tried it again and got a different error
Warning Failed 17s (x2 over 32s) kubelet Failed to pull image "myjenkins:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied for myjenkins, repository does not exist or may re
quire 'docker login': denied: requested access to the resource is denied
I tried running docker login and logging in and I'm still getting this same error ^. I tried changing imagePullPolicy: Never to Always and received the same error
After changing the image to jenkins/jenkins:lts it's still crashing and when I describe, this is what it says
Normal Scheduled 4m37s default-scheduler Successfully assigned default/jenkins-776574886b-x2l8p to ip-192-168-77-17.us-west-2.compute.internal
Normal Pulled 4m26s kubelet Successfully pulled image "jenkins/jenkins:lts" in 11.07948886s
Normal Pulled 4m22s kubelet Successfully pulled image "jenkins/jenkins:lts" in 908.246481ms
Normal Pulled 4m7s kubelet Successfully pulled image "jenkins/jenkins:lts" in 885.936781ms
Normal Created 3m39s (x4 over 4m23s) kubelet Created container jenkins
Normal Started 3m39s (x4 over 4m23s) kubelet Started container jenkins
Normal Pulled 3m39s kubelet Successfully pulled image "jenkins/jenkins:lts" in 895.651242ms
Warning BackOff 3m3s (x8 over 4m20s) kubelet Back-off restarting failed container
When I try to run "kubectl logs" on that pod I even get an error for that, which I've never received before when getting logs
touch: cannot touch '/var/jenkins_home/copy_reference_file.log': Permission denied
Can not write to /var/jenkins_home/copy_reference_file.log. Wrong volume permissions?
Also had to change my volumemount for jenkins to this and it worked!
I found another resource online saying to change my jenkins volume mount to this to fix the permissions issue and my container works now
`
volumeMounts:
- mountPath: /var
name: jenkins-volume
subPath: jenkins_home`
As you already did, removing imagePullPolicy: Never would solve your first problem. Your second problem comes from the fact that you are trying to pull an image called myjenkins:latest, which doesn't exist. What you most likely want is this image.
Change
image: myjenkins:latest
to
image: jenkins/jenkins:lts

The connection to the server x.x.x.x:6443 was refused - did you specify the right host or port? Kubernetes

Here is the scenario -
I have deployed kubernetes cluster and inside that I have created an application( say test) with "kubectl:1.20" image. I have created required clusterrole, rolebindings and sa for this test application to have the authorization for managing deployments.
Now my requirement is that i should be able to fetch and patch the deployments which are running in the kubernetes cluster from inside the test pod as part of a cronjob.
When i am running the cronjob, i am getting the below mentioned error in the test pod
"The connection to the server x.x.x.x:6443 was refused - did you specify the right host or port?"
I have passed the kubeconfig file of the cluster as configmap to the test pod.
below is the cronjob yml file -
kind: CronJob
apiVersion: batch/v1beta1
metadata:
name: test
spec:
schedule: "*/2 * * * *"
jobTemplate:
spec:
template:
spec:
serviceAccountName: local-kubectl-scale-deploy
containers:
- name: mycron-container
image: bitnami/kubectl:1.20
imagePullPolicy: IfNotPresent
env:
- name: KUBECONFIG
value: "/tmp/kubeconfig"
command: [ "/bin/sh" ]
args: [ "/var/httpd-init/script", "namespace" ]
tty: true
volumeMounts:
- name: script
mountPath: "/var/httpd-init/"
- name: kubeconfig
mountPath: "/tmp/"
volumes:
- name: script
configMap:
name: cronscript
defaultMode: 0777
- name: kubeconfig
configMap:
name: kubeconfig
restartPolicy: OnFailure
terminationGracePeriodSeconds: 0
concurrencyPolicy: Replace
below is the kubeconfig file which i am passing as configmap inside the pod.
kind: ConfigMap
apiVersion: v1
metadata:
name: kubeconfig
labels:
app: kubeconfig
data:
kubeconfig: |
apiVersion: v1
kind: Config
clusters:
- name: default-cluster
cluster:
certificate-authority-data: xxxxxxxxxxx
server: https://x.x.x.x:6443
contexts:
- name: default-context
context:
cluster: default-cluster
namespace: default
user: default-user
current-context: default-context
users:
- name: default-user
user:
token: xxxxxx
Am I missing some configuration from the pod to communicate to this x.x.x.x:6443 IP?
To diagnose the communication error to the address we would need a lot more detail from your network configuration, but instead we can fix the cronjob.
The proper way of achieving that is to set a service account with prober RBAC similar to https://stackoverflow.com/a/58378834/3930971

Kubernetes metrics server API

I am running a Kuberentes cluster in dev environment. I executed deployment files for metrics server, my pod is up and running without any error message. See the output here:
root#master:~/pre-release# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
metrics-server-568697d856-9jshp 1/1 Running 0 10m 10.244.1.5 worker-1 <none> <none>
Next when I am checking API service status, it shows up as below
Name: v1beta1.metrics.k8s.io
Namespace:
Labels: k8s-app=metrics-server
Annotations: <none>
API Version: apiregistration.k8s.io/v1
Kind: APIService
Metadata:
Creation Timestamp: 2021-03-29T17:32:16Z
Resource Version: 39213
UID: 201f685d-9ef5-4f0a-9749-8004d4d529f4
Spec:
Group: metrics.k8s.io
Group Priority Minimum: 100
Insecure Skip TLS Verify: true
Service:
Name: metrics-server
Namespace: pre-release
Port: 443
Version: v1beta1
Version Priority: 100
Status:
Conditions:
Last Transition Time: 2021-03-29T17:32:16Z
Message: failing or missing response from https://10.105.171.253:443/apis/metrics.k8s.io/v1beta1: Get "https://10.105.171.253:443/apis/metrics.k8s.io/v1beta1": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Reason: FailedDiscoveryCheck
Status: False
Type: Available
Events: <none>
Here the metric server deployment code
containers:
- args:
- --cert-dir=/tmp
- --secure-port=443
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
- --kubelet-use-node-status-port
image: k8s.gcr.io/metrics-server/metrics-server:v0.4.2
Here the complete code
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: pre-release
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
rbac.authorization.k8s.io/aggregate-to-admin: "true"
rbac.authorization.k8s.io/aggregate-to-edit: "true"
rbac.authorization.k8s.io/aggregate-to-view: "true"
name: system:aggregated-metrics-reader
rules:
- apiGroups:
- metrics.k8s.io
resources:
- pods
- nodes
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
rules:
- apiGroups:
- ""
resources:
- pods
- nodes
- nodes/stats
- namespaces
- configmaps
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
labels:
k8s-app: metrics-server
name: metrics-server-auth-reader
namespace: pre-release
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: pre-release
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: metrics-server:system:auth-delegator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:auth-delegator
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: pre-release
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:metrics-server
subjects:
- kind: ServiceAccount
name: metrics-server
namespace: pre-release
---
apiVersion: v1
kind: Service
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: pre-release
spec:
ports:
- name: https
port: 443
protocol: TCP
targetPort: https
selector:
k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: pre-release
spec:
selector:
matchLabels:
k8s-app: metrics-server
strategy:
rollingUpdate:
maxUnavailable: 0
template:
metadata:
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=443
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls
- --kubelet-use-node-status-port
image: k8s.gcr.io/metrics-server/metrics-server:v0.4.2
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /livez
port: https
scheme: HTTPS
periodSeconds: 10
name: metrics-server
ports:
- containerPort: 443
name: https
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /readyz
port: https
scheme: HTTPS
periodSeconds: 10
securityContext:
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- mountPath: /tmp
name: tmp-dir
nodeSelector:
kubernetes.io/os: linux
priorityClassName: system-cluster-critical
serviceAccountName: metrics-server
volumes:
- emptyDir: {}
name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
labels:
k8s-app: metrics-server
name: v1beta1.metrics.k8s.io
spec:
group: metrics.k8s.io
groupPriorityMinimum: 100
insecureSkipTLSVerify: true
service:
name: metrics-server
namespace: pre-release
version: v1beta1
versionPriority: 100
latest error
I0330 09:02:31.705767 1 secure_serving.go:116] Serving securely on [::]:4443
E0330 09:04:01.718135 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:worker-2: unable to fetch metrics from Kubelet worker-2 (worker-2): Get https://worker-2:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup worker-2 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:master: unable to fetch metrics from Kubelet master (master): Get https://master:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup master on 10.96.0.10:53: read udp 10.244.2.23:41419->10.96.0.10:53: i/o timeout, unable to fully scrape metrics from source kubelet_summary:worker-1: unable to fetch metrics from Kubelet worker-1 (worker-1): Get https://worker-1:10250/stats/summary?only_cpu_and_memory=true: dial tcp: i/o timeout]
Could someone please help me to fix the issue.
Following container arguments work for me in our development cluster
containers:
- args:
- /metrics-server
- --cert-dir=/tmp
- --secure-port=443
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls
Result of kubectl describe apiservice v1beta1.metrics.k8s.io:
Status:
Conditions:
Last Transition Time: 2021-03-29T19:19:20Z
Message: all checks passed
Reason: Passed
Status: True
Type: Available
Give it a try.
As I mentioned in the comment section, this may be fixed by adding hostNetwork:true to the metrics-server Deployment.
According to kubernetes documentation:
HostNetwork - Controls whether the pod may use the node network namespace. Doing so gives the pod access to the loopback device, services listening on localhost, and could be used to snoop on network activity of other pods on the same node.
spec:
hostNetwork: true <---
containers:
- args:
- /metrics-server
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP
- --kubelet-use-node-status-port
- --kubelet-insecure-tls
There is an example with information how can you edit your deployment to include that hostNetwork:true in your metrics-server deployment.
Also related github issue.

how to deploy a statefulset when a pod security policy is in place in kubernetes

I am trying to play around with PodSecurityPolicies in kubernetes so pods can't be created if they are using the root user.
This is my psp definition:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: eks.restrictive
spec:
hostNetwork: false
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
runAsUser:
rule: MustRunAsNonRoot
fsGroup:
rule: RunAsAny
volumes:
- '*'
and this is my statefulset definition
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
selector:
matchLabels:
app: nginx # has to match .spec.template.metadata.labels
serviceName: "nginx"
replicas: 3 # by default is 1
template:
metadata:
labels:
app: nginx # has to match .spec.selector.matchLabels
spec:
securityContext:
#only takes integers.
runAsUser: 1000
terminationGracePeriodSeconds: 10
containers:
- name: nginx
image: k8s.gcr.io/nginx-slim:0.8
ports:
- containerPort: 80
name: web
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
volumeClaimTemplates:
- metadata:
name: www
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "my-storage-class"
resources:
requests:
storage: 1Gi
When trying to create this statefulset I get
create Pod web-0 in StatefulSet web failed error: pods "web-0" is forbidden: unable to validate against any pod security policy:
It doesn't specify what policy am I violating, and since I am specifying I want to run this on user 1000, I am not running this as root (Hence my understanding is that this statefulset pod definition is not violating any rules defined in the PSP). There is no USER specified in the Dockerfile used for this image.
The other weird part, is that this works fine for standard pods (kind: Pod, instead of kind:Statefulset), for example, this works just fine, when the same PSP exists:
apiVersion: v1
kind: Pod
metadata:
name: my-nodejs
spec:
securityContext:
runAsUser: 1000
containers:
- name: my-node
image: node
ports:
- name: web
containerPort: 80
protocol: TCP
command:
- /bin/sh
- -c
- |
npm install http-server-g
npx http-server
What am I missing / doing wrong?
You seems to have forgotten about binding this psp to a service account.
You need to apply the following:
cat << EOF | kubectl apply -f-
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: psp-role
rules:
- apiGroups: ['policy']
resources: ['podsecuritypolicies']
verbs: ['use']
resourceNames:
- eks.restrictive
EOF
cat << EOF | kubectl apply -f-
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: psp-role-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: psp-role
subjects:
- kind: ServiceAccount
name: default
namespace: default
EOF
If you dont want to use the default account you can create a separate service account and bind the role to it.
Read more about it k8s documentation - pod-security-policy.

calico/node is not ready: BIRD is not ready: BGP not established

I'm running Kubernetes 1.13.2, setup using kubeadm and struggling with getting calico 3.5 up and running. The cluster is run on top of KVM.
Setup:
kubeadm init --apiserver-advertise-address=10.255.253.20 --pod-network-cidr=192.168.0.0/16
modified calico.yaml file to include:
- name: IP_AUTODETECTION_METHOD
value: "interface=ens.*"
applied rbac.yaml, etcd.yaml, calico.yaml
Output from kubectl describe pods:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 23m default-scheduler Successfully assigned kube-system/calico-node-hjwrc to k8s-master-01
Normal Pulling 23m kubelet, k8s-master-01 pulling image "quay.io/calico/cni:v3.5.0"
Normal Pulled 23m kubelet, k8s-master-01 Successfully pulled image "quay.io/calico/cni:v3.5.0"
Normal Created 23m kubelet, k8s-master-01 Created container
Normal Started 23m kubelet, k8s-master-01 Started container
Normal Pulling 23m kubelet, k8s-master-01 pulling image "quay.io/calico/node:v3.5.0"
Normal Pulled 23m kubelet, k8s-master-01 Successfully pulled image "quay.io/calico/node:v3.5.0"
Warning Unhealthy 23m kubelet, k8s-master-01 Readiness probe failed: calico/node is not ready: felix is not ready: Get http://localhost:9099/readiness: dial tcp [::1]:9099: connect: connection refused
Warning Unhealthy 23m kubelet, k8s-master-01 Liveness probe failed: Get http://localhost:9099/liveness: dial tcp [::1]:9099: connect: connection refused
Normal Created 23m (x2 over 23m) kubelet, k8s-master-01 Created container
Normal Started 23m (x2 over 23m) kubelet, k8s-master-01 Started container
Normal Pulled 23m kubelet, k8s-master-01 Container image "quay.io/calico/node:v3.5.0" already present on machine
Warning Unhealthy 3m32s (x23 over 7m12s) kubelet, k8s-master-01 Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with 10.255.253.22
Output from calicoctl node status:
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+---------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+---------+
| 10.255.253.22 | node-to-node mesh | start | 16:24:44 | Passive |
+---------------+-------------------+-------+----------+---------+
IPv6 BGP status
No IPv6 peers found.
Output from ETCD_ENDPOINTS=http://localhost:6666 calicoctl get nodes -o yaml:
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
kind: Node
metadata:
annotations:
projectcalico.org/kube-labels: '{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/hostname":"k8s-master-01","node-role.kubernetes.io/master":""}'
creationTimestamp: 2019-01-31T16:08:56Z
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/hostname: k8s-master-01
node-role.kubernetes.io/master: ""
name: k8s-master-01
resourceVersion: "28"
uid: 82fee4dc-2572-11e9-8ab7-5254002c725d
spec:
bgp:
ipv4Address: 10.255.253.20/24
ipv4IPIPTunnelAddr: 192.168.151.128
orchRefs:
- nodeName: k8s-master-01
orchestrator: k8s
- apiVersion: projectcalico.org/v3
kind: Node
metadata:
annotations:
projectcalico.org/kube-labels: '{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/hostname":"k8s-worker-01"}'
creationTimestamp: 2019-01-31T16:24:44Z
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/hostname: k8s-worker-01
name: k8s-worker-01
resourceVersion: "170"
uid: b7c2c5a6-2574-11e9-aaa4-5254007d5f6a
spec:
bgp:
ipv4Address: 10.255.253.22/24
ipv4IPIPTunnelAddr: 192.168.36.192
orchRefs:
- nodeName: k8s-worker-01
orchestrator: k8s
kind: NodeList
metadata:
resourceVersion: "395"
Output from ETCD_ENDPOINTS=http://localhost:6666 calicoctl get bgppeers:
NAME PEERIP NODE ASN
Ouput from kubectl logs:
2019-01-31 17:01:20.519 [INFO][48] int_dataplane.go 751: Applying dataplane updates
2019-01-31 17:01:20.519 [INFO][48] ipsets.go 223: Asked to resync with the dataplane on next update. family="inet"
2019-01-31 17:01:20.519 [INFO][48] ipsets.go 254: Resyncing ipsets with dataplane. family="inet"
2019-01-31 17:01:20.523 [INFO][48] ipsets.go 304: Finished resync family="inet" numInconsistenciesFound=0 resyncDuration=3.675284ms
2019-01-31 17:01:20.523 [INFO][48] int_dataplane.go 765: Finished applying updates to dataplane. msecToApply=4.124166000000001
bird: BGP: Unexpected connect from unknown address 10.255.253.14 (port 36329)
bird: BGP: Unexpected connect from unknown address 10.255.253.14 (port 52383)
2019-01-31 17:01:23.182 [INFO][48] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
bird: BGP: Unexpected connect from unknown address 10.255.253.14 (port 39661)
2019-01-31 17:01:25.433 [INFO][48] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
bird: BGP: Unexpected connect from unknown address 10.255.253.14 (port 57359)
bird: BGP: Unexpected connect from unknown address 10.255.253.14 (port 47151)
bird: BGP: Unexpected connect from unknown address 10.255.253.14 (port 39243)
2019-01-31 17:01:30.943 [INFO][48] int_dataplane.go 751: Applying dataplane updates
2019-01-31 17:01:30.943 [INFO][48] ipsets.go 223: Asked to resync with the dataplane on next update. family="inet"
2019-01-31 17:01:30.943 [INFO][48] ipsets.go 254: Resyncing ipsets with dataplane. family="inet"
2019-01-31 17:01:30.945 [INFO][48] ipsets.go 304: Finished resync family="inet" numInconsistenciesFound=0 resyncDuration=2.369997ms
2019-01-31 17:01:30.946 [INFO][48] int_dataplane.go 765: Finished applying updates to dataplane. msecToApply=2.8165820000000004
bird: BGP: Unexpected connect from unknown address 10.255.253.14 (port 60641)
2019-01-31 17:01:33.190 [INFO][48] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
Note: the above unknown address (10.255.253.14) is the IP under br0 on the KVM host, not too sure why it's made an appearance.
I got the solution :
The first preference of ifconfig(in my case) through that it will try to connect to the worker-nodes which is not the right ip.
Solution:Change the calico.yaml file to override that ip to etho-ip by using the following steps.
Need to open port Calico networking (BGP) - TCP 179
# Specify interface
- name: IP_AUTODETECTION_METHOD
value: "interface=eth1"
calico.yaml
---
# Source: calico/templates/calico-config.yaml
# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
apiVersion: v1
metadata:
name: calico-config
namespace: kube-system
data:
# Typha is disabled.
typha_service_name: "none"
# Configure the backend to use.
calico_backend: "bird"
# Configure the MTU to use
veth_mtu: "1440"
# The CNI network configuration to install on each node. The special
# values in this config will be automatically populated.
cni_network_config: |-
{
"name": "k8s-pod-network",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "calico",
"log_level": "info",
"datastore_type": "kubernetes",
"nodename": "__KUBERNETES_NODE_NAME__",
"mtu": __CNI_MTU__,
"ipam": {
"type": "calico-ipam"
},
"policy": {
"type": "k8s"
},
"kubernetes": {
"kubeconfig": "__KUBECONFIG_FILEPATH__"
}
},
{
"type": "portmap",
"snat": true,
"capabilities": {"portMappings": true}
}
]
}
---
# Source: calico/templates/kdd-crds.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: felixconfigurations.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: FelixConfiguration
plural: felixconfigurations
singular: felixconfiguration
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: ipamblocks.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: IPAMBlock
plural: ipamblocks
singular: ipamblock
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: blockaffinities.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: BlockAffinity
plural: blockaffinities
singular: blockaffinity
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: ipamhandles.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: IPAMHandle
plural: ipamhandles
singular: ipamhandle
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: ipamconfigs.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: IPAMConfig
plural: ipamconfigs
singular: ipamconfig
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: bgppeers.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: BGPPeer
plural: bgppeers
singular: bgppeer
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: bgpconfigurations.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: BGPConfiguration
plural: bgpconfigurations
singular: bgpconfiguration
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: ippools.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: IPPool
plural: ippools
singular: ippool
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: hostendpoints.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: HostEndpoint
plural: hostendpoints
singular: hostendpoint
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: clusterinformations.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: ClusterInformation
plural: clusterinformations
singular: clusterinformation
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: globalnetworkpolicies.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: GlobalNetworkPolicy
plural: globalnetworkpolicies
singular: globalnetworkpolicy
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: globalnetworksets.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: GlobalNetworkSet
plural: globalnetworksets
singular: globalnetworkset
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: networkpolicies.crd.projectcalico.org
spec:
scope: Namespaced
group: crd.projectcalico.org
version: v1
names:
kind: NetworkPolicy
plural: networkpolicies
singular: networkpolicy
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: networksets.crd.projectcalico.org
spec:
scope: Namespaced
group: crd.projectcalico.org
version: v1
names:
kind: NetworkSet
plural: networksets
singular: networkset
---
# Source: calico/templates/rbac.yaml
# Include a clusterrole for the kube-controllers component,
# and bind it to the calico-kube-controllers serviceaccount.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: calico-kube-controllers
rules:
# Nodes are watched to monitor for deletions.
- apiGroups: [""]
resources:
- nodes
verbs:
- watch
- list
- get
# Pods are queried to check for existence.
- apiGroups: [""]
resources:
- pods
verbs:
- get
# IPAM resources are manipulated when nodes are deleted.
- apiGroups: ["crd.projectcalico.org"]
resources:
- ippools
verbs:
- list
- apiGroups: ["crd.projectcalico.org"]
resources:
- blockaffinities
- ipamblocks
- ipamhandles
verbs:
- get
- list
- create
- update
- delete
# Needs access to update clusterinformations.
- apiGroups: ["crd.projectcalico.org"]
resources:
- clusterinformations
verbs:
- get
- create
- update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: calico-kube-controllers
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: calico-kube-controllers
subjects:
- kind: ServiceAccount
name: calico-kube-controllers
namespace: kube-system
---
# Include a clusterrole for the calico-node DaemonSet,
# and bind it to the calico-node serviceaccount.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: calico-node
rules:
# The CNI plugin needs to get pods, nodes, and namespaces.
- apiGroups: [""]
resources:
- pods
- nodes
- namespaces
verbs:
- get
- apiGroups: [""]
resources:
- endpoints
- services
verbs:
# Used to discover service IPs for advertisement.
- watch
- list
# Used to discover Typhas.
- get
- apiGroups: [""]
resources:
- nodes/status
verbs:
# Needed for clearing NodeNetworkUnavailable flag.
- patch
# Calico stores some configuration information in node annotations.
- update
# Watch for changes to Kubernetes NetworkPolicies.
- apiGroups: ["networking.k8s.io"]
resources:
- networkpolicies
verbs:
- watch
- list
# Used by Calico for policy information.
- apiGroups: [""]
resources:
- pods
- namespaces
- serviceaccounts
verbs:
- list
- watch
# The CNI plugin patches pods/status.
- apiGroups: [""]
resources:
- pods/status
verbs:
- patch
# Calico monitors various CRDs for config.
- apiGroups: ["crd.projectcalico.org"]
resources:
- globalfelixconfigs
- felixconfigurations
- bgppeers
- globalbgpconfigs
- bgpconfigurations
- ippools
- ipamblocks
- globalnetworkpolicies
- globalnetworksets
- networkpolicies
- networksets
- clusterinformations
- hostendpoints
verbs:
- get
- list
- watch
# Calico must create and update some CRDs on startup.
- apiGroups: ["crd.projectcalico.org"]
resources:
- ippools
- felixconfigurations
- clusterinformations
verbs:
- create
- update
# Calico stores some configuration information on the node.
- apiGroups: [""]
resources:
- nodes
verbs:
- get
- list
- watch
# These permissions are only requried for upgrade from v2.6, and can
# be removed after upgrade or on fresh installations.
- apiGroups: ["crd.projectcalico.org"]
resources:
- bgpconfigurations
- bgppeers
verbs:
- create
- update
# These permissions are required for Calico CNI to perform IPAM allocations.
- apiGroups: ["crd.projectcalico.org"]
resources:
- blockaffinities
- ipamblocks
- ipamhandles
verbs:
- get
- list
- create
- update
- delete
- apiGroups: ["crd.projectcalico.org"]
resources:
- ipamconfigs
verbs:
- get
# Block affinities must also be watchable by confd for route aggregation.
- apiGroups: ["crd.projectcalico.org"]
resources:
- blockaffinities
verbs:
- watch
# The Calico IPAM migration needs to get daemonsets. These permissions can be
# removed if not upgrading from an installation using host-local IPAM.
- apiGroups: ["apps"]
resources:
- daemonsets
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: calico-node
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: calico-node
subjects:
- kind: ServiceAccount
name: calico-node
namespace: kube-system
---
# Source: calico/templates/calico-node.yaml
# This manifest installs the calico-node container, as well
# as the CNI plugins and network config on
# each master and worker node in a Kubernetes cluster.
kind: DaemonSet
apiVersion: apps/v1
metadata:
name: calico-node
namespace: kube-system
labels:
k8s-app: calico-node
spec:
selector:
matchLabels:
k8s-app: calico-node
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
template:
metadata:
labels:
k8s-app: calico-node
annotations:
# This, along with the CriticalAddonsOnly toleration below,
# marks the pod as a critical add-on, ensuring it gets
# priority scheduling and that its resources are reserved
# if it ever gets evicted.
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
nodeSelector:
beta.kubernetes.io/os: linux
hostNetwork: true
tolerations:
# Make sure calico-node gets scheduled on all nodes.
- effect: NoSchedule
operator: Exists
# Mark the pod as a critical add-on for rescheduling.
- key: CriticalAddonsOnly
operator: Exists
- effect: NoExecute
operator: Exists
serviceAccountName: calico-node
# Minimize downtime during a rolling upgrade or deletion; tell Kubernetes to do a "force
# deletion": https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods.
terminationGracePeriodSeconds: 0
priorityClassName: system-node-critical
initContainers:
# This container performs upgrade from host-local IPAM to calico-ipam.
# It can be deleted if this is a fresh installation, or if you have already
# upgraded to use calico-ipam.
- name: upgrade-ipam
image: calico/cni:v3.8.2
command: ["/opt/cni/bin/calico-ipam", "-upgrade"]
env:
- name: KUBERNETES_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: CALICO_NETWORKING_BACKEND
valueFrom:
configMapKeyRef:
name: calico-config
key: calico_backend
volumeMounts:
- mountPath: /var/lib/cni/networks
name: host-local-net-dir
- mountPath: /host/opt/cni/bin
name: cni-bin-dir
# This container installs the CNI binaries
# and CNI network config file on each node.
- name: install-cni
image: calico/cni:v3.8.2
command: ["/install-cni.sh"]
env:
# Name of the CNI config file to create.
- name: CNI_CONF_NAME
value: "10-calico.conflist"
# The CNI network config to install on each node.
- name: CNI_NETWORK_CONFIG
valueFrom:
configMapKeyRef:
name: calico-config
key: cni_network_config
# Set the hostname based on the k8s node name.
- name: KUBERNETES_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
# CNI MTU Config variable
- name: CNI_MTU
valueFrom:
configMapKeyRef:
name: calico-config
key: veth_mtu
# Prevents the container from sleeping forever.
- name: SLEEP
value: "false"
volumeMounts:
- mountPath: /host/opt/cni/bin
name: cni-bin-dir
- mountPath: /host/etc/cni/net.d
name: cni-net-dir
# Adds a Flex Volume Driver that creates a per-pod Unix Domain Socket to allow Dikastes
# to communicate with Felix over the Policy Sync API.
- name: flexvol-driver
image: calico/pod2daemon-flexvol:v3.8.2
volumeMounts:
- name: flexvol-driver-host
mountPath: /host/driver
containers:
# Runs calico-node container on each Kubernetes node. This
# container programs network policy and routes on each
# host.
- name: calico-node
image: calico/node:v3.8.2
env:
# Use Kubernetes API as the backing datastore.
- name: DATASTORE_TYPE
value: "kubernetes"
# Wait for the datastore.
- name: WAIT_FOR_DATASTORE
value: "true"
# Set based on the k8s node name.
- name: NODENAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
# Choose the backend to use.
- name: CALICO_NETWORKING_BACKEND
valueFrom:
configMapKeyRef:
name: calico-config
key: calico_backend
# Cluster type to identify the deployment type
- name: CLUSTER_TYPE
value: "k8s,bgp"
# Specify interface
- name: IP_AUTODETECTION_METHOD
value: "interface=eth1"
# Auto-detect the BGP IP address.
- name: IP
value: "autodetect"
# Enable IPIP
- name: CALICO_IPV4POOL_IPIP
value: "Always"
# Set MTU for tunnel device used if ipip is enabled
- name: FELIX_IPINIPMTU
valueFrom:
configMapKeyRef:
name: calico-config
key: veth_mtu
# The default IPv4 pool to create on startup if none exists. Pod IPs will be
# chosen from this range. Changing this value after installation will have
# no effect. This should fall within `--cluster-cidr`.
- name: CALICO_IPV4POOL_CIDR
value: "192.168.0.0/16"
# Disable file logging so `kubectl logs` works.
- name: CALICO_DISABLE_FILE_LOGGING
value: "true"
# Set Felix endpoint to host default action to ACCEPT.
- name: FELIX_DEFAULTENDPOINTTOHOSTACTION
value: "ACCEPT"
# Disable IPv6 on Kubernetes.
- name: FELIX_IPV6SUPPORT
value: "false"
# Set Felix logging to "info"
- name: FELIX_LOGSEVERITYSCREEN
value: "info"
- name: FELIX_HEALTHENABLED
value: "true"
securityContext:
privileged: true
resources:
requests:
cpu: 250m
livenessProbe:
httpGet:
path: /liveness
port: 9099
host: localhost
periodSeconds: 10
initialDelaySeconds: 10
failureThreshold: 6
readinessProbe:
exec:
command:
- /bin/calico-node
- -bird-ready
- -felix-ready
periodSeconds: 10
volumeMounts:
- mountPath: /lib/modules
name: lib-modules
readOnly: true
- mountPath: /run/xtables.lock
name: xtables-lock
readOnly: false
- mountPath: /var/run/calico
name: var-run-calico
readOnly: false
- mountPath: /var/lib/calico
name: var-lib-calico
readOnly: false
- name: policysync
mountPath: /var/run/nodeagent
volumes:
# Used by calico-node.
- name: lib-modules
hostPath:
path: /lib/modules
- name: var-run-calico
hostPath:
path: /var/run/calico
- name: var-lib-calico
hostPath:
path: /var/lib/calico
- name: xtables-lock
hostPath:
path: /run/xtables.lock
type: FileOrCreate
# Used to install CNI.
- name: cni-bin-dir
hostPath:
path: /opt/cni/bin
- name: cni-net-dir
hostPath:
path: /etc/cni/net.d
# Mount in the directory for host-local IPAM allocations. This is
# used when upgrading from host-local to calico-ipam, and can be removed
# if not using the upgrade-ipam init container.
- name: host-local-net-dir
hostPath:
path: /var/lib/cni/networks
# Used to create per-pod Unix Domain Sockets
- name: policysync
hostPath:
type: DirectoryOrCreate
path: /var/run/nodeagent
# Used to install Flex Volume Driver
- name: flexvol-driver-host
hostPath:
type: DirectoryOrCreate
path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: calico-node
namespace: kube-system
---
# Source: calico/templates/calico-kube-controllers.yaml
# See https://github.com/projectcalico/kube-controllers
apiVersion: apps/v1
kind: Deployment
metadata:
name: calico-kube-controllers
namespace: kube-system
labels:
k8s-app: calico-kube-controllers
spec:
# The controllers can only have a single active instance.
replicas: 1
selector:
matchLabels:
k8s-app: calico-kube-controllers
strategy:
type: Recreate
template:
metadata:
name: calico-kube-controllers
namespace: kube-system
labels:
k8s-app: calico-kube-controllers
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
nodeSelector:
beta.kubernetes.io/os: linux
tolerations:
# Mark the pod as a critical add-on for rescheduling.
- key: CriticalAddonsOnly
operator: Exists
- key: node-role.kubernetes.io/master
effect: NoSchedule
serviceAccountName: calico-kube-controllers
priorityClassName: system-cluster-critical
containers:
- name: calico-kube-controllers
image: calico/kube-controllers:v3.8.2
env:
# Choose which controllers to run.
- name: ENABLED_CONTROLLERS
value: node
- name: DATASTORE_TYPE
value: kubernetes
readinessProbe:
exec:
command:
- /usr/bin/check-status
- -r
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: calico-kube-controllers
namespace: kube-system
---
# Source: calico/templates/calico-etcd-secrets.yaml
---
# Source: calico/templates/calico-typha.yaml
---
# Source: calico/templates/configure-canal.yaml
As an addition to this it can also be set using the following kubectl command;
kubectl set env daemonset/calico-node -n kube-system
IP_AUTODETECTION_METHOD=interface=eth1
I restarted the docker service on that node, and the issue has been fixed.
the best solution :
Make sure that TCP port 179 is open https://projectcalico.docs.tigera.io/getting-started/kubernetes/requirements
firewall-cmd --zone=public --add-port=179/tcp --permanent
firewall-cmd --reload
2. Then put this command
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=eth.*
In my case i just uncommented - name: CALICO_IPV4POOL_CIDR value: "192.168.0.0/16"
or you can change the CALICO_IPV4POOL_CIDR value to the same range as kubernete's --pod-network-cidr value.
Disable Firewall
sudo systemctl stop firewalld.service
sudo systemctl disable firewalld.service
And restart your pod
kubectl delete pod <pod-name> -n <name-ns>
check
watch kubectl get pod -n <names-ns>
This is normal behavior. There must be some not working node in normal status. You can use kubectl get node --all-namespaces to check it. After you recover the problem node, the problem will do away.