Pod is in pending stat - kubernetes

I have made a custom scheduler like below:
root#kmaster:~# cat /etc/kubernetes/manifests/my-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
component: kube-scheduler
tier: control-plane
name: my-scheduler
namespace: kube-system
spec:
containers:
- command:
- kube-scheduler
- --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
- --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
- --bind-address=127.0.0.1
- --kubeconfig=/etc/kubernetes/scheduler.conf
- --leader-elect=false
- --port=10261
- --secure-port=10269
env:
- name: no_proxy
value: ',10.74.46.2,10.74.46.3,10.74.46.4'
- name: NO_PROXY
value: ',10.74.46.2,10.74.46.3,10.74.46.4'
- name: HTTPS_PROXY
value: http://127.0.0.1:3129
- name: HTTP_PROXY
value: http://127.0.0.1:3129
image: k8s.gcr.io/kube-scheduler:v1.22.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
path: /healthz
port: 10269
scheme: HTTPS
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 15
name: my-scheduler
resources:
requests:
cpu: 100m
startupProbe:
failureThreshold: 24
httpGet:
host: 127.0.0.1
path: /healthz
port: 10269
scheme: HTTPS
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 15
volumeMounts:
- mountPath: /etc/kubernetes/scheduler.conf
name: kubeconfig
readOnly: true
hostNetwork: true
priorityClassName: system-node-critical
securityContext:
seccompProfile:
type: RuntimeDefault
volumes:
- hostPath:
path: /etc/kubernetes/scheduler.conf
type: FileOrCreate
name: kubeconfig
status: {}
root#kmaster:~#
My customer scheduler pod is running successfully:
root#kmaster:~# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-78fcd69978-6dcft 1/1 Running 10 (2d19h ago) 47d
coredns-78fcd69978-8224d 1/1 Running 10 (2d19h ago) 47d
etcd-kmaster 1/1 Running 4 (21h ago) 15d
kube-apiserver-kmaster 1/1 Running 68 (2d19h ago) 47d
kube-controller-manager-kmaster 1/1 Running 33 (21h ago) 47d
kube-flannel-ds-nvpgz 1/1 Running 11 (2d19h ago) 47d
kube-flannel-ds-xnvvw 1/1 Running 11 (2d19h ago) 47d
kube-flannel-ds-ztgql 1/1 Running 4 (2d19h ago) 47d
kube-proxy-h2t7s 1/1 Running 4 (2d19h ago) 47d
kube-proxy-pq9t4 1/1 Running 10 (2d19h ago) 47d
kube-proxy-vgcw7 1/1 Running 8 (2d19h ago) 47d
kube-scheduler-kmaster 1/1 Running 0 21h
my-scheduler-kmaster 1/1 Running 0 7h17m
Then created a pod using that scheduler:
root#kmaster:~# cat my_pod.yaml
apiVersion: v1
kind: Pod
metadata:
labels:
run: webapp-color
name: webapp-color
spec:
containers:
- image: 10.74.46.13:5000/webapp-color:v1
name: webapp-color
schedulerName: my-scheduler
root#kmaster:~#
But my pod is in pending state:
root#kmaster:~# kubectl get pod
NAME READY STATUS RESTARTS AGE
webapp-color 0/1 Pending 0 4h12m
root#kmaster:~#
Describe the pod:
root#kmaster:~# kubectl describe pod webapp-color
Name: webapp-color
Namespace: default
Priority: 0
Node: <none>
Labels: run=webapp-color
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
webapp-color:
Image: 10.74.46.13:5000/webapp-color:v1
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mkh4z (ro)
Volumes:
kube-api-access-mkh4z:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
root#kmaster:~#
Kubernetes version:
root#kmaster:~# kubectl version --short
Client Version: v1.22.0
Server Version: v1.22.0
root#kmaster:~#
Please help me to find why pod is in pending state.
Not getting any clue.
Pending means custom scheduler is not working. But not getting any reason.
Regards

The job of the scheduler is to decide what node a pod is going to run on. Until the scheduler decides this, the pod will be in pending state. It seems that your scheduler is not working properly.
Try looking into the logs with: kubectl logs kube-scheduler-kmaster -n kube-system
To see if the scheduler is making any decisions.

Related

Remove nodeSelectorTerms param in manifest deployment

I use this manifest configuration to deploy a registry into 3 mode Kubernetes cluster:
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv1
namespace: registry-space
spec:
capacity:
storage: 5Gi # specify your own size
volumeMode: Filesystem
persistentVolumeReclaimPolicy: Retain
local:
path: /opt/registry # can be any path
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- kubernetes2
accessModes:
- ReadWriteMany # only 1 node will read/write on the path.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pv1-claim
namespace: registry-space
spec: # should match specs added in the PersistenVolume
accessModes:
- ReadWriteMany
volumeMode: Filesystem
resources:
requests:
storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: private-repository-k8s
namespace: registry-space
labels:
app: private-repository-k8s
spec:
replicas: 1
selector:
matchLabels:
app: private-repository-k8s
template:
metadata:
labels:
app: private-repository-k8s
spec:
volumes:
- name: certs-vol
hostPath:
path: /opt/certs
type: Directory
- name: task-pv-storage
persistentVolumeClaim:
claimName: pv1-claim # specify the PVC that you've created. PVC and Deployment must be in same namespace.
containers:
- image: registry:2
name: private-repository-k8s
imagePullPolicy: IfNotPresent
env:
- name: REGISTRY_HTTP_TLS_CERTIFICATE
value: "/opt/certs/registry.crt"
- name: REGISTRY_HTTP_TLS_KEY
value: "/opt/certs/registry.key"
ports:
- containerPort: 5000
volumeMounts:
- name: certs-vol
mountPath: /opt/certs
- name: task-pv-storage
mountPath: /opt/registry
I manually created directories on every node under /opt/certs and /opt/registry.
But when I try to deploy the manifest without hardcoded nodeSelectorTerms on tha control plane I get error:
kubernetes#kubernetes1:/opt/registry$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-58dbc876ff-fsjd5 1/1 Running 1 (74m ago) 84m
kube-system calico-node-5brzt 1/1 Running 1 (73m ago) 84m
kube-system calico-node-nph9n 1/1 Running 1 (76m ago) 84m
kube-system calico-node-pcd74 1/1 Running 1 (74m ago) 84m
kube-system calico-node-ph2ht 1/1 Running 1 (76m ago) 84m
kube-system coredns-565d847f94-7pswp 1/1 Running 1 (74m ago) 105m
kube-system coredns-565d847f94-tlrfr 1/1 Running 1 (74m ago) 105m
kube-system etcd-kubernetes1 1/1 Running 2 (74m ago) 105m
kube-system kube-apiserver-kubernetes1 1/1 Running 2 (74m ago) 105m
kube-system kube-controller-manager-kubernetes1 1/1 Running 2 (74m ago) 105m
kube-system kube-proxy-4slm4 1/1 Running 1 (76m ago) 86m
kube-system kube-proxy-4tnx2 1/1 Running 2 (74m ago) 105m
kube-system kube-proxy-9dgsj 1/1 Running 1 (73m ago) 85m
kube-system kube-proxy-cgr44 1/1 Running 1 (76m ago) 86m
kube-system kube-scheduler-kubernetes1 1/1 Running 2 (74m ago) 105m
registry-space private-repository-k8s-6d5d954b4f-xkmj5 0/1 Pending 0 4m55s
kubernetes#kubernetes1:/opt/registry$
Do you know how I can let Kubernetes to decide where to deploy the pod?
It seems like your node has taints hence pods are not getting scheduled. Can you try using this command to remove taints from your node ?
kubectl taint nodes <node-name> node-role.kubernetes.io/master-
or
kubectl taint nodes --all node-role.kubernetes.io/master-
To get the node name use kubectl get nodes
User was able to get the pod scheduled after running below command:
kubectl taint nodes kubernetes1 node-role.kubernetes.io/control-plane:NoSchedule-
Now pod is failing due to crashloopbackoff this implies the pod has been scheduled.
Can you please check if this pod is getting scheduled and running properly ?
apiVersion: v1
kind: Pod
metadata:
name: nginx1
namespace: test
spec:
containers:
- name: webserver
image: nginx:alpine
ports:
- containerPort: 80
resources:
requests:
memory: "64Mi"
cpu: "200m"
limits:
memory: "128Mi"
cpu: "350m"

How to deploy Mongodb replicaset on microk8s cluster

I'm trying to deploy a Mongodb ReplicaSet on microk8s cluster. I have installed a VM running on Ubuntu 20.04. After the deployment, the mongo pods do not run but crash. I've enabled microk8s storage, dns and rbac add-ons but still the same problem persists. Can any one help me find the reason behind it? Below is my manifest file:
apiVersion: v1
kind: Service
metadata:
name: mongodb-service
labels:
name: mongo
spec:
ports:
- port: 27017
targetPort: 27017
clusterIP: None
selector:
role: mongo
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mongo
spec:
selector:
matchLabels:
role: mongo
environment: test
serviceName: mongodb-service
replicas: 3
template:
metadata:
labels:
role: mongo
environment: test
replicaset: MainRepSet
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: replicaset
operator: In
values:
- MainRepSet
topologyKey: kubernetes.io/hostname
terminationGracePeriodSeconds: 10
volumes:
- name: secrets-volume
secret:
secretName: shared-bootstrap-data
defaultMode: 256
containers:
- name: mongod-container
#image: pkdone/mongo-ent:3.4
image: mongo
command:
- "numactl"
- "--interleave=all"
- "mongod"
- "--wiredTigerCacheSizeGB"
- "0.1"
- "--bind_ip"
- "0.0.0.0"
- "--replSet"
- "MainRepSet"
- "--auth"
- "--clusterAuthMode"
- "keyFile"
- "--keyFile"
- "/etc/secrets-volume/internal-auth-mongodb-keyfile"
- "--setParameter"
- "authenticationMechanisms=SCRAM-SHA-1"
resources:
requests:
cpu: 0.2
memory: 200Mi
ports:
- containerPort: 27017
volumeMounts:
- name: secrets-volume
readOnly: true
mountPath: /etc/secrets-volume
- name: mongodb-persistent-storage-claim
mountPath: /data/db
volumeClaimTemplates:
- metadata:
name: mongodb-persistent-storage-claim
spec:
storageClassName: microk8s-hostpath
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 5Gi
Also, here are the pv, pvc and sc outputs:
yyy#xxx:$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
mongodb-persistent-storage-claim-mongo-0 Bound pvc-1b3de8f7-e416-4a1a-9c44-44a0422e0413 5Gi RWO microk8s-hostpath 13m
yyy#xxx:$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-5b75ddf6-abbd-4ff3-a135-0312df1e6703 20Gi RWX Delete Bound container-registry/registry-claim microk8s-hostpath 38m
pvc-1b3de8f7-e416-4a1a-9c44-44a0422e0413 5Gi RWO Delete Bound default/mongodb-persistent-storage-claim-mongo-0 microk8s-hostpath 13m
yyy#xxx:$ kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
microk8s-hostpath (default) microk8s.io/hostpath Delete Immediate false 108m
yyy#xxx:$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
metrics-server-8bbfb4bdb-xvwcw 1/1 Running 1 148m
dashboard-metrics-scraper-78d7698477-4qdhj 1/1 Running 0 146m
kubernetes-dashboard-85fd7f45cb-6t7xr 1/1 Running 0 146m
hostpath-provisioner-5c65fbdb4f-ff7cl 1/1 Running 0 113m
coredns-7f9c69c78c-dr5kt 1/1 Running 0 65m
calico-kube-controllers-f7868dd95-wtf8j 1/1 Running 0 150m
calico-node-knzc2 1/1 Running 0 150m
I have installed the cluster using this command:
sudo snap install microk8s --classic --channel=1.21
Output of mongodb deployment:
yyy#xxx:$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/mongo-0 0/1 CrashLoopBackOff 5 4m18s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 109m
service/mongodb-service ClusterIP None <none> 27017/TCP 4m19s
NAME READY AGE
statefulset.apps/mongo 0/3 4m19s
Pod logs:
yyy#xxx:$ kubectl logs pod/mongo-0
{"t":{"$date":"2021-09-07T16:21:13.191Z"},"s":"F", "c":"CONTROL", "id":20574, "ctx":"-","msg":"Error during global initialization","attr":{"error":{"code":2,"codeName":"BadValue","errmsg":"storage.wiredTiger.engineConfig.cacheSizeGB must be greater than or equal to 0.25"}}}
yyy#xxx:$ kubectl describe pod/mongo-0
Name: mongo-0
Namespace: default
Priority: 0
Node: citest1/192.168.9.105
Start Time: Tue, 07 Sep 2021 16:17:38 +0000
Labels: controller-revision-hash=mongo-66bd776569
environment=test
replicaset=MainRepSet
role=mongo
statefulset.kubernetes.io/pod-name=mongo-0
Annotations: cni.projectcalico.org/podIP: 10.1.150.136/32
cni.projectcalico.org/podIPs: 10.1.150.136/32
Status: Running
IP: 10.1.150.136
IPs:
IP: 10.1.150.136
Controlled By: StatefulSet/mongo
Containers:
mongod-container:
Container ID: containerd://458e21fac3e87dcf304a9701da0eb827b2646efe94cabce7f283cd49f740c15d
Image: mongo
Image ID: docker.io/library/mongo#sha256:58ea1bc09f269a9b85b7e1fae83b7505952aaa521afaaca4131f558955743842
Port: 27017/TCP
Host Port: 0/TCP
Command:
numactl
--interleave=all
mongod
--wiredTigerCacheSizeGB
0.1
--bind_ip
0.0.0.0
--replSet
MainRepSet
--auth
--clusterAuthMode
keyFile
--keyFile
/etc/secrets-volume/internal-auth-mongodb-keyfile
--setParameter
authenticationMechanisms=SCRAM-SHA-1
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 07 Sep 2021 16:24:03 +0000
Finished: Tue, 07 Sep 2021 16:24:03 +0000
Ready: False
Restart Count: 6
Requests:
cpu: 200m
memory: 200Mi
Environment: <none>
Mounts:
/data/db from mongodb-persistent-storage-claim (rw)
/etc/secrets-volume from secrets-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-b7nf8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
mongodb-persistent-storage-claim:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mongodb-persistent-storage-claim-mongo-0
ReadOnly: false
secrets-volume:
Type: Secret (a volume populated by a Secret)
SecretName: shared-bootstrap-data
Optional: false
kube-api-access-b7nf8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 7m53s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 7m52s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 7m50s default-scheduler Successfully assigned default/mongo-0 to citest1
Normal Pulled 7m25s kubelet Successfully pulled image "mongo" in 25.215669443s
Normal Pulled 7m21s kubelet Successfully pulled image "mongo" in 1.192994197s
Normal Pulled 7m6s kubelet Successfully pulled image "mongo" in 1.203239709s
Normal Pulled 6m38s kubelet Successfully pulled image "mongo" in 1.213451175s
Normal Created 6m38s (x4 over 7m23s) kubelet Created container mongod-container
Normal Started 6m37s (x4 over 7m23s) kubelet Started container mongod-container
Normal Pulling 5m47s (x5 over 7m50s) kubelet Pulling image "mongo"
Warning BackOff 2m49s (x23 over 7m20s) kubelet Back-off restarting failed container
The logs you provided show that you have an incorrectly set parameter wiredTigerCacheSizeGB. In your case it is 0.1, and according to the message
"code":2,"codeName":"BadValue","errmsg":"storage.wiredTiger.engineConfig.cacheSizeGB must be greater than or equal to 0.25"
it should be at least 0.25.
In the section containers:
containers:
- name: mongod-container
#image: pkdone/mongo-ent:3.4
image: mongo
command:
- "numactl"
- "--interleave=all"
- "mongod"
- "--wiredTigerCacheSizeGB"
- "0.1"
- "--bind_ip"
- "0.0.0.0"
- "--replSet"
- "MainRepSet"
- "--auth"
- "--clusterAuthMode"
- "keyFile"
- "--keyFile"
- "/etc/secrets-volume/internal-auth-mongodb-keyfile"
- "--setParameter"
- "authenticationMechanisms=SCRAM-SHA-1"
you should change in this place
- "--wiredTigerCacheSizeGB"
- "0.1"
the value "0.1" to any other greather or equal "0.25".
Additionally I have seen another error:
1 pod has unbound immediate PersistentVolumeClaims
It should related to what I wrote earlier. However, you may find alternative ways to solve it here, here and here.

k3s on arch linux ARM worker service not responding

Current setup:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
cl01mtr01 Ready master 104m v1.18.2+k3s1 10.1.1.1 <none> Debian GNU/Linux 10 (buster) 4.19.0-9-amd64 containerd://1.3.3-k3s2
cl01wkr01 Ready <none> 9m20s v1.18.2+k3s1 10.1.1.101 <none> Arch Linux ARM 5.4.40-1-ARCH containerd://1.3.3-k3s2
Master installed with:
export INSTALL_K3S_VERSION="v1.18.2+k3s1"
curl -sSLf https://get.k3s.io | sh -s - server \
--write-kubeconfig-mode 644 \
--cluster-cidr 172.20.0.0/16 \
--service-cidr 172.21.0.0/16 \
--cluster-dns 172.21.0.10 \
--disable traefik
Worker installed with:
export INSTALL_K3S_VERSION="v1.18.2+k3s1"
curl -sSLf https://get.k3s.io | sh -s - agent \
--server https://10.1.1.1:6443 \
--token <token from master>
I also tried with a raspberry pi as master running arch linux and raspbian and a rock pi 64 with armbian.
I tried with k3s versions:
v1.17.4+k3s1
v1.17.5+k3s1
v1.18.2+k3s1
I also tested with docker and the --docker install option in k3s.
The nodes get discovered (as shown above), but I cannot access the service on my worker node(s) (raspberry pi 3 with arch linux arm) via http://10.1.1.1:30001 although, it can be accessed via kubectl exec.
I always get a connection timeout
This site can’t be reached
10.1.1.1 took too long to respond.
When the pod runs on the master node, or if the worker is an amd64 node, it can be accessed via http://10.1.1.1:30001.
This is the resource I try to load and access:
apiVersion: v1
kind: Namespace
metadata:
name: nginx
---
apiVersion: v1
kind: ConfigMap
metadata:
name: nginx-default-configmap
namespace: nginx
data:
default.conf: |
server {
listen 80;
listen [::]:80;
#server_name localhost;
location / {
root /usr/share/nginx/html;
index index.html index.htm;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
}
}
---
apiVersion: v1
kind: Service
metadata:
name: nginx-service
namespace: nginx
spec:
ports:
- name: http
targetPort: 80
port: 80
nodePort: 30001
- name: https
targetPort: 443
port: 443
nodePort: 30002
selector:
app: nginx
type: NodePort
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: nginx-daemonset
namespace: nginx
labels:
app: nginx
spec:
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-role.kubernetes.io/master
operator: NotIn
values:
- "true"
containers:
- name: nginx
image: nginx:stable
imagePullPolicy: Always
env:
- name: TZ
value: "Europe/Brussels"
ports:
- name: http
containerPort: 80
- name: https
containerPort: 443
volumeMounts:
- name: default-conf
mountPath: /etc/nginx/conf.d/default.conf
subPath: default.conf
readOnly: true
restartPolicy: Always
volumes:
- name: default-conf
configMap:
name: nginx-default-configmap
Some extra info:
> kubectl get all -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system pod/local-path-provisioner-6d59f47c7-d477m 1/1 Running 0 116m 172.20.0.4 cl01mtr01 <none> <none>
kube-system pod/metrics-server-7566d596c8-fbb7b 1/1 Running 0 116m 172.20.0.2 cl01mtr01 <none> <none>
kube-system pod/coredns-8655855d6-gnbsm 1/1 Running 0 116m 172.20.0.3 cl01mtr01 <none> <none>
nginx pod/nginx-daemonset-l4j7s 1/1 Running 0 52s 172.20.1.3 cl01wkr01 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 172.21.0.1 <none> 443/TCP 116m <none>
kube-system service/kube-dns ClusterIP 172.21.0.10 <none> 53/UDP,53/TCP,9153/TCP 116m k8s-app=kube-dns
kube-system service/metrics-server ClusterIP 172.21.152.234 <none> 443/TCP 116m k8s-app=metrics-server
nginx service/nginx-service NodePort 172.21.14.185 <none> 80:30001/TCP,443:30002/TCP 52s app=nginx
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
nginx daemonset.apps/nginx-daemonset 1 1 1 1 1 <none> 52s nginx nginx:stable app=nginx
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
kube-system deployment.apps/local-path-provisioner 1/1 1 1 116m local-path-provisioner rancher/local-path-provisioner:v0.0.11 app=local-path-provisioner
kube-system deployment.apps/metrics-server 1/1 1 1 116m metrics-server rancher/metrics-server:v0.3.6 k8s-app=metrics-server
kube-system deployment.apps/coredns 1/1 1 1 116m coredns rancher/coredns-coredns:1.6.3 k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
kube-system replicaset.apps/local-path-provisioner-6d59f47c7 1 1 1 116m local-path-provisioner rancher/local-path-provisioner:v0.0.11 app=local-path-provisioner,pod-template-hash=6d59f47c7
kube-system replicaset.apps/metrics-server-7566d596c8 1 1 1 116m metrics-server rancher/metrics-server:v0.3.6 k8s-app=metrics-server,pod-template-hash=7566d596c8
kube-system replicaset.apps/coredns-8655855d6 1 1 1 116m coredns rancher/coredns-coredns:1.6.3 k8s-app=kube-dns,pod-template-hash=8655855d6

Why do pods remain in 'pending' status?

I am very confused about why my pods are staying in pending status.
Vitess seems have problem scheduling the vttablet pod on nodes. I built a 2-worker-node Kubernetes cluster (nodes A & B), and started vttablets on the cluster, but only two vttablets start normally, the other three is stay in pending state.
When I allow the master node to schedule pods, then the three pending vttablets all start on the master (first error, then running normally), and I create tables, two vttablet failed to execute.
When I add two new nodes (nodes C & D) to my kubernetes cluster, tear down vitess and restart vttablet, I find that the three vttablet pods still remain in pending
state, also if I kick off node A or node B, I get vttablet lost, and it will not restart on new node. I tear down vitess, and also tear down k8s cluster, rebuild it, and this time I use nodes C & D to build a 2-worker-node k8s cluster, and all vttablet now remain in pending status.
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default etcd-global-5zh4k77slf 1/1 Running 0 46m 192.168.2.3 t-searchredis-a2 <none>
default etcd-global-f7db9nnfq9 1/1 Running 0 45m 192.168.2.5 t-searchredis-a2 <none>
default etcd-global-ksh5r9k45l 1/1 Running 0 45m 192.168.1.4 t-searchredis-a1 <none>
default etcd-operator-6f44498865-t84l5 1/1 Running 0 50m 192.168.2.2 t-searchredis-a2 <none>
default etcd-test-5g5lmcrl2x 1/1 Running 0 46m 192.168.2.4 t-searchredis-a2 <none>
default etcd-test-g4xrkk7wgg 1/1 Running 0 45m 192.168.1.5 t-searchredis-a1 <none>
default etcd-test-jkq4rjrwm8 1/1 Running 0 45m 192.168.2.6 t-searchredis-a2 <none>
default vtctld-z5d46 1/1 Running 0 44m 192.168.1.6 t-searchredis-a1 <none>
default vttablet-100 0/2 Pending 0 40m <none> <none> <none>
default vttablet-101 0/2 Pending 0 40m <none> <none> <none>
default vttablet-102 0/2 Pending 0 40m <none> <none> <none>
default vttablet-103 0/2 Pending 0 40m <none> <none> <none>
default vttablet-104 0/2 Pending 0 40m <none> <none> <none>
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: 2018-11-27T07:25:19Z
labels:
app: vitess
component: vttablet
keyspace: test_keyspace
shard: "0"
tablet: test-0000000100
name: vttablet-100
namespace: default
resourceVersion: "22304"
selfLink: /api/v1/namespaces/default/pods/vttablet-100
uid: 98258046-f215-11e8-b6a1-fa163e0411d1
spec:
containers:
- command:
- bash
- -c
- |-
set -e
mkdir -p $VTDATAROOT/tmp
chown -R vitess /vt
su -p -s /bin/bash -c "/vt/bin/vttablet -binlog_use_v3_resharding_mode -topo_implementation etcd2 -topo_global_server_address http://etcd-global-client:2379 -topo_global_root /global -log_dir $VTDATAROOT/tmp -alsologtostderr -port 15002 -grpc_port 16002 -service_map 'grpc-queryservice,grpc-tabletmanager,grpc-updatestream' -tablet-path test-0000000100 -tablet_hostname $(hostname -i) -init_keyspace test_keyspace -init_shard 0 -init_tablet_type replica -health_check_interval 5s -mysqlctl_socket $VTDATAROOT/mysqlctl.sock -enable_semi_sync -enable_replication_reporter -orc_api_url http://orchestrator/api -orc_discover_interval 5m -restore_from_backup -backup_storage_implementation file -file_backup_storage_root '/usr/local/MySQL_DB_Backup/test'" vitess
env:
- name: EXTRA_MY_CNF
value: /vt/config/mycnf/master_mysql56.cnf
image: vitess/lite
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /debug/vars
port: 15002
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
name: vttablet
ports:
- containerPort: 15002
name: web
protocol: TCP
- containerPort: 16002
name: grpc
protocol: TCP
resources:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 500m
memory: 1Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /dev/log
name: syslog
- mountPath: /vt/vtdataroot
name: vtdataroot
- mountPath: /etc/ssl/certs/ca-certificates.crt
name: certs
readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-7g2jb
readOnly: true
- command:
- sh
- -c
- |-
mkdir -p $VTDATAROOT/tmp && chown -R vitess /vt
su -p -c "/vt/bin/mysqlctld -log_dir $VTDATAROOT/tmp -alsologtostderr -tablet_uid 100 -socket_file $VTDATAROOT/mysqlctl.sock -init_db_sql_file $VTROOT/config/init_db.sql" vitess
env:
- name: EXTRA_MY_CNF
value: /vt/config/mycnf/master_mysql56.cnf
image: vitess/lite
imagePullPolicy: Always
name: mysql
resources:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 500m
memory: 1Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /dev/log
name: syslog
- mountPath: /vt/vtdataroot
name: vtdataroot
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-7g2jb
readOnly: true
dnsPolicy: ClusterFirst
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- hostPath:
path: /dev/log
type: ""
name: syslog
- emptyDir: {}
name: vtdataroot
- hostPath:
path: /etc/ssl/certs/ca-certificates.crt
type: ""
name: certs
- name: default-token-7g2jb
secret:
defaultMode: 420
secretName: default-token-7g2jb
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2018-11-27T07:25:19Z
message: '0/3 nodes are available: 1 node(s) had taints that the pod didn''t tolerate,
2 Insufficient cpu.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: Guaranteed
As you can see down at the bottom:
message: '0/3 nodes are available: 1 node(s) had taints that the pod didn''t tolerate,
2 Insufficient cpu.'
Meaning that your two worker nodes are out of resources based on the limits you specified in the pod. You will need more workers, or smaller CPU requests.

kube-dns keeps restarting with kubenetes on coreos

I have Kubernetes installed on Container Linux by CoreOS alpha (1353.1.0)
using hyperkube v1.5.5_coreos.0 using my fork of coreos-kubernetes install scripts at https://github.com/kfirufk/coreos-kubernetes.
I have two ContainerOS machines.
coreos-2.tux-in.com resolved as 192.168.1.2 as controller
coreos-3.tux-in.com resolved as 192.168.1.3 as worker
kubectl get pods --all-namespaces returns
NAMESPACE NAME READY STATUS RESTARTS AGE
ceph ceph-mds-2743106415-rkww4 0/1 Pending 0 1d
ceph ceph-mon-check-3856521781-bd6k5 1/1 Running 0 1d
kube-lego kube-lego-3323932148-g2tf4 1/1 Running 0 1d
kube-system calico-node-xq6j7 2/2 Running 0 1d
kube-system calico-node-xzpp2 2/2 Running 4560 1d
kube-system calico-policy-controller-610849172-b7xjr 1/1 Running 0 1d
kube-system heapster-v1.3.0-beta.0-2754576759-v1f50 2/2 Running 0 1d
kube-system kube-apiserver-192.168.1.2 1/1 Running 0 1d
kube-system kube-controller-manager-192.168.1.2 1/1 Running 1 1d
kube-system kube-dns-3675956729-r7hhf 3/4 Running 3924 1d
kube-system kube-dns-autoscaler-505723555-l2pph 1/1 Running 0 1d
kube-system kube-proxy-192.168.1.2 1/1 Running 0 1d
kube-system kube-proxy-192.168.1.3 1/1 Running 0 1d
kube-system kube-scheduler-192.168.1.2 1/1 Running 1 1d
kube-system kubernetes-dashboard-3697905830-vdz23 1/1 Running 1246 1d
kube-system monitoring-grafana-4013973156-m2r2v 1/1 Running 0 1d
kube-system monitoring-influxdb-651061958-2mdtf 1/1 Running 0 1d
nginx-ingress default-http-backend-150165654-s4z04 1/1 Running 2 1d
so I can see that kube-dns-782804071-h78rf keeps restarting.
kubectl describe pod kube-dns-3675956729-r7hhf --namespace=kube-system returns:
Name: kube-dns-3675956729-r7hhf
Namespace: kube-system
Node: 192.168.1.2/192.168.1.2
Start Time: Sat, 11 Mar 2017 17:54:14 +0000
Labels: k8s-app=kube-dns
pod-template-hash=3675956729
Status: Running
IP: 10.2.67.243
Controllers: ReplicaSet/kube-dns-3675956729
Containers:
kubedns:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:kubedns
Image: gcr.io/google_containers/kubedns-amd64:1.9
Image ID: rkt://sha512-c7b7c9c4393bea5f9dc5bcbe1acf1036c2aca36ac14b5e17fd3c675a396c4219
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-map=kube-dns
--v=2
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: False
Restart Count: 981
Liveness: http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables:
PROMETHEUS_PORT: 10055
dnsmasq:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:dnsmasq
Image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4.1
Image ID: rkt://sha512-8c5f8b40f6813bb676ce04cd545c55add0dc8af5a3be642320244b74ea03f872
Ports: 53/UDP, 53/TCP
Args:
--cache-size=1000
--no-resolv
--server=127.0.0.1#10053
--log-facility=-
Requests:
cpu: 150m
memory: 10Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: True
Restart Count: 981
Liveness: http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables: <none>
dnsmasq-metrics:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:dnsmasq-metrics
Image: gcr.io/google_containers/dnsmasq-metrics-amd64:1.0.1
Image ID: rkt://sha512-ceb3b6af1cd67389358be14af36b5e8fb6925e78ca137b28b93e0d8af134585b
Port: 10054/TCP
Args:
--v=2
--logtostderr
Requests:
memory: 10Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: True
Restart Count: 981
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables: <none>
healthz:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:healthz
Image: gcr.io/google_containers/exechealthz-amd64:v1.2.0
Image ID: rkt://sha512-3a85b0533dfba81b5083a93c7e091377123dac0942f46883a4c10c25cf0ad177
Port: 8080/TCP
Args:
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
--url=/healthz-dnsmasq
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
--url=/healthz-kubedns
--port=8080
--quiet
Limits:
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: True
Restart Count: 981
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-zqbdp:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-zqbdp
QoS Class: Burstable
Tolerations: CriticalAddonsOnly=:Exists
No events.
which shows that kubedns-amd64:1.9 is in Ready: false
this is my kude-dns-de.yaml file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
spec:
strategy:
rollingUpdate:
maxSurge: 10%
maxUnavailable: 0
selector:
matchLabels:
k8s-app: kube-dns
template:
metadata:
labels:
k8s-app: kube-dns
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
spec:
containers:
- name: kubedns
image: gcr.io/google_containers/kubedns-amd64:1.9
resources:
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
livenessProbe:
httpGet:
path: /healthz-kubedns
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /readiness
port: 8081
scheme: HTTP
initialDelaySeconds: 3
timeoutSeconds: 5
args:
- --domain=cluster.local.
- --dns-port=10053
- --config-map=kube-dns
# This should be set to v=2 only after the new image (cut from 1.5) has
# been released, otherwise we will flood the logs.
- --v=2
env:
- name: PROMETHEUS_PORT
value: "10055"
ports:
- containerPort: 10053
name: dns-local
protocol: UDP
- containerPort: 10053
name: dns-tcp-local
protocol: TCP
- containerPort: 10055
name: metrics
protocol: TCP
- name: dnsmasq
image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4.1
livenessProbe:
httpGet:
path: /healthz-dnsmasq
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- --cache-size=1000
- --no-resolv
- --server=127.0.0.1#10053
- --log-facility=-
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
# see: https://github.com/kubernetes/kubernetes/issues/29055 for details
resources:
requests:
cpu: 150m
memory: 10Mi
- name: dnsmasq-metrics
image: gcr.io/google_containers/dnsmasq-metrics-amd64:1.0.1
livenessProbe:
httpGet:
path: /metrics
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- --v=2
- --logtostderr
ports:
- containerPort: 10054
name: metrics
protocol: TCP
resources:
requests:
memory: 10Mi
- name: healthz
image: gcr.io/google_containers/exechealthz-amd64:v1.2.0
resources:
limits:
memory: 50Mi
requests:
cpu: 10m
memory: 50Mi
args:
- --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
- --url=/healthz-dnsmasq
- --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
- --url=/healthz-kubedns
- --port=8080
- --quiet
ports:
- containerPort: 8080
protocol: TCP
dnsPolicy: Default
and this is my kube-dns-svc.yaml:
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "KubeDNS"
spec:
selector:
k8s-app: kube-dns
clusterIP: 10.3.0.10
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
any information regarding the issue would be greatly appreciated!
update
rkt list --full 2> /dev/null | grep kubedns shows:
744a4579-0849-4fae-b1f5-cb05d40f3734 kubedns gcr.io/google_containers/kubedns-amd64:1.9 sha512-c7b7c9c4393b running 2017-03-22 22:14:55.801 +0000 UTC 2017-03-22 22:14:56.814 +0000 UTC
journalctl -m _MACHINE_ID=744a45790849b1f5cb05d40f3734 provides:
Mar 22 22:17:58 kube-dns-3675956729-sthcv kubedns[8]: E0322 22:17:58.619254 8 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: connect: network is unreachable
I tried to add - --proxy-mode=userspace to /etc/kubernetes/manifests/kube-proxy.yaml but the results are the same.
kubectl get svc --all-namespaces provides:
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ceph ceph-mon None <none> 6789/TCP 1h
default kubernetes 10.3.0.1 <none> 443/TCP 1h
kube-system heapster 10.3.0.2 <none> 80/TCP 1h
kube-system kube-dns 10.3.0.10 <none> 53/UDP,53/TCP 1h
kube-system kubernetes-dashboard 10.3.0.116 <none> 80/TCP 1h
kube-system monitoring-grafana 10.3.0.187 <none> 80/TCP 1h
kube-system monitoring-influxdb 10.3.0.214 <none> 8086/TCP 1h
nginx-ingress default-http-backend 10.3.0.233 <none> 80/TCP 1h
kubectl get cs provides:
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
my kube-proxy.yaml has the following content:
apiVersion: v1
kind: Pod
metadata:
name: kube-proxy
namespace: kube-system
annotations:
rkt.alpha.kubernetes.io/stage1-name-override: coreos.com/rkt/stage1-fly
spec:
hostNetwork: true
containers:
- name: kube-proxy
image: quay.io/coreos/hyperkube:v1.5.5_coreos.0
command:
- /hyperkube
- proxy
- --cluster-cidr=10.2.0.0/16
- --kubeconfig=/etc/kubernetes/controller-kubeconfig.yaml
securityContext:
privileged: true
volumeMounts:
- mountPath: /etc/ssl/certs
name: "ssl-certs"
- mountPath: /etc/kubernetes/controller-kubeconfig.yaml
name: "kubeconfig"
readOnly: true
- mountPath: /etc/kubernetes/ssl
name: "etc-kube-ssl"
readOnly: true
- mountPath: /var/run/dbus
name: dbus
readOnly: false
volumes:
- hostPath:
path: "/usr/share/ca-certificates"
name: "ssl-certs"
- hostPath:
path: "/etc/kubernetes/controller-kubeconfig.yaml"
name: "kubeconfig"
- hostPath:
path: "/etc/kubernetes/ssl"
name: "etc-kube-ssl"
- hostPath:
path: /var/run/dbus
name: dbus
this is all the valuable information I could find. any ideas? :)
update 2
output of iptables-save on the controller ContainerOS at http://pastebin.com/2GApCj0n
update 3
I ran curl on the controller node
# curl https://10.3.0.1 --insecure
Unauthorized
means it can access it properly, i didn't add enough parameters for it to be authorized right ?
update 4
thanks to #jaxxstorm I removed calico manifests, updated their quay/cni and quay/node versions and reinstalled them.
now kubedns keeps restarting, but I think that now calico works. because for the first time it tries to install kubedns on the worker node and not on the controller node, and also when I rkt enter the kubedns pod and try to wget https://10.3.0.1 I get:
# wget https://10.3.0.1
Connecting to 10.3.0.1 (10.3.0.1:443)
wget: can't execute 'ssl_helper': No such file or directory
wget: error getting response: Connection reset by peer
which clearly shows that there is some kind of response. which is good right ?
now kubectl get pods --all-namespaces shows:
kube-system kube-dns-3675956729-ljz2w 4/4 Running 88 42m
so.. 4/4 ready but it keeps restarting.
kubectl describe pod kube-dns-3675956729-ljz2w --namespace=kube-system output at http://pastebin.com/Z70U331G
so it can't connect to http://10.2.47.19:8081/readiness, i'm guessing this is the ip of kubedns since it uses port 8081. don't know how to continue investigating this issue further.
thanks for everything!
Lots of great debugging info here, thanks!
This is the clincher:
# curl https://10.3.0.1 --insecure
Unauthorized
You got an unauthorized response because you didn't pass a client cert, but that's fine, it's not what we're after. This proves that kube-proxy is working as expected and is accessible. Your rkt logs:
Mar 22 22:17:58 kube-dns-3675956729-sthcv kubedns[8]: E0322 22:17:58.619254 8 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: connect: network is unreachable
Are indicating that the containers that there's network connectivity issues inside the containers, which indicates to me that you haven't configured container networking/CNI.
Please have a read through this document: https://coreos.com/rkt/docs/latest/networking/overview.html
You may also have to reconfigure calico, there's some more information that here: http://docs.projectcalico.org/master/getting-started/rkt/
kube-dns has a readiness probe that tries resolving trough the Service IP of kube-dns. Is it possible that there is a problem with your Service network?
Check out the answer and solution here:
kubernetes service IPs not reachable