failed to set up pod network: Unhandled Exception killed plugin - kubernetes

I'm trying to play with kubernetes 1.4 install with rkt containers on CoreOS beta (1185.1.0).
In general I have two CoreOS pc machines at home that are configured with etcd2 tls certificates.
I patched the coreos-kubernetes automated generic install script to support etcd2 tls certificates. the latest versions of the worker and controller install scripts are posted at https://github.com/kfirufk/coreos-kubernetes-multi-node-generic-install-script
I used the following environment variables for the controller coreos installation script (ip:10.79.218.2,domain:coreos-2.tux-in.com)
ADVERTISE_IP=10.79.218.2
ETCD_ENDPOINTS="https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379"
K8S_VER=v1.4.1_coreos.0
HYPERKUBE_IMAGE_REPO=quay.io/coreos/hyperkube
POD_NETWORK=10.2.0.0/16
SERVICE_IP_RANGE=10.3.0.0/24
K8S_SERVICE_IP=10.3.0.1
DNS_SERVICE_IP=10.3.0.10
USE_CALICO=true
CONTAINER_RUNTIME=rkt
ETCD_CERT_FILE="/etc/ssl/etcd/etcd1.pem"
ETCD_KEY_FILE="/etc/ssl/etcd/etcd1-key.pem"
ETCD_TRUSTED_CA_FILE="/etc/ssl/etcd/ca.pem"
ETCD_CLIENT_CERT_AUTH=true
OVERWRITE_ALL_FILES=true
CONTROLLER_HOSTNAME="coreos-2.tux-in.com"
ETCD_CERT_ROOT_DIR="/etc/ssl/etcd"
ETCD_SCHEME="https"
ETCD_AUTHORITY="coreos-2.tux-in.com:2379"
IS_MASK_UPDATE_ENGINE=false
and these are the environment variables I used for the worker coreos installation script (ip:10.79.218.3,domain:coreos-3.tux-in.com)
ETCD_AUTHORITY=coreos-3.tux-in.com:2379
ETCD_ENDPOINTS="https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379"
CONTROLLER_ENDPOINT=https://coreos-2.tux-in.com
K8S_VER=v1.4.1_coreos.0
HYPERKUBE_IMAGE_REPO=quay.io/coreos/hyperkube
DNS_SERVICE_IP=10.3.0.10
USE_CALICO=true
CONTAINER_RUNTIME=rkt
OVERWRITE_ALL_FILES=true
ADVERTISE_IP=10.79.218.3
ETCD_CERT_FILE="/etc/ssl/etcd/etcd2.pem"
ETCD_KEY_FILE="/etc/ssl/etcd/etcd2-key.pem"
ETCD_TRUSTED_CA_FILE="/etc/ssl/etcd/ca.pem"
ETCD_SCHEME="https"
IS_MASK_UPDATE_ENGINE=false
after installing kubernetes on both machines, and configuring kubectl properly, when I type kubectl get nodes I get:
NAME STATUS AGE
10.79.218.2 Ready,SchedulingDisabled 1h
10.79.218.3 Ready 1h
kubectl get pods --namespace=kube-system returns
NAME READY STATUS RESTARTS AGE
heapster-v1.2.0-3646253287-j951o 0/2 ContainerCreating 0 1d
kube-apiserver-10.79.218.2 1/1 Running 0 1d
kube-controller-manager-10.79.218.2 1/1 Running 0 1d
kube-dns-v20-u3pd0 0/3 ContainerCreating 0 1d
kube-proxy-10.79.218.2 1/1 Running 0 1d
kube-proxy-10.79.218.3 1/1 Running 0 1d
kube-scheduler-10.79.218.2 1/1 Running 0 1d
kubernetes-dashboard-v1.4.1-ehiez 0/1 ContainerCreating 0 1d
so heapster-v1.2.0-3646253287-j951o, kube-dns-v20-u3pd0 and kubernetes-dashboard-v1.4.1-ehiez are stuck in ContainerCreating status.
when I run kubectl describe on any of them, I basically get the same error: Error syncing pod, skipping: failed to SyncPod: failed to set up pod network: Unhandled Exception killed plugin.
for example, kubectl describe pods kubernetes-dashboard-v1.4.1-ehiez --namespace kube-system returns:
Name: kubernetes-dashboard-v1.4.1-ehiez
Namespace: kube-system
Node: 10.79.218.3/10.79.218.3
Start Time: Mon, 17 Oct 2016 23:31:43 +0300
Labels: k8s-app=kubernetes-dashboard
kubernetes.io/cluster-service=true
version=v1.4.1
Status: Pending
IP:
Controllers: ReplicationController/kubernetes-dashboard-v1.4.1
Containers:
kubernetes-dashboard:
Container ID:
Image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.4.1
Image ID:
Port: 9090/TCP
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Liveness: http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-svbiv (ro)
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-svbiv:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-svbiv
QoS Class: Guaranteed
Tolerations: CriticalAddonsOnly=:Exists
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1d 25s 9350 {kubelet 10.79.218.3} Warning FailedSync Error syncing pod, skipping: failed to SyncPod: failed to set up pod network: Unhandled Exception killed plugin
I'm guessing that pod networking isn't working because of faulty calico configuration..
so I tried to install calicoctl rkt container, but had problems with that. but that's a different stackoverflow question :) starting calicoctl container on coreos
so I can't really check if calico works properly.
this is the calico-network systemd service file for the controller node:
[Unit]
Description=Calico per-host agent
Requires=network-online.target
After=network-online.target
[Service]
Slice=machine.slice
Environment=CALICO_DISABLE_FILE_LOGGING=true
Environment=HOSTNAME=10.79.218.3
Environment=IP=10.79.218.3
Environment=FELIX_FELIXHOSTNAME=10.79.218.3
Environment=CALICO_NETWORKING=true
Environment=NO_DEFAULT_POOLS=true
Environment=ETCD_ENDPOINTS=https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379
Environment=ETCD_AUTHORITY=coreos-3.tux-in.com:2379
Environment=ETCD_SCHEME=https
Environment=ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem
Environment=ETCD_CERT_FILE=/etc/ssl/etcd/etcd2.pem
Environment=ETCD_KEY_FILE=/etc/ssl/etcd/etcd2-key.pem
ExecStart=/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci --volume=var-run-calico,kind=host,source=/var/run/calico --volume=modules,kind=host,source=/lib/modules,readOnly=false --mount=volume=modules,target=/lib/modules --volume=dns,kind=host,source=/etc/resolv.conf,readOnly=true --volume=etcd-tls-certs,kind=host,source=/etc/ssl/etcd,readOnly=true --mount=volume=dns,target=/etc/resolv.conf --mount=volume=etcd-tls-certs,target=/etc/ssl/etcd --mount=volume=var-run-calico,target=/var/run/calico --trust-keys-from-https quay.io/calico/node:v0.22.0
KillMode=mixed
Restart=always
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
and is the calico-node service file for the worker node:
[Unit]
Description=Calico per-host agent
Requires=network-online.target
After=network-online.target
[Service]
Slice=machine.slice
Environment=CALICO_DISABLE_FILE_LOGGING=true
Environment=HOSTNAME=10.79.218.2
Environment=IP=10.79.218.2
Environment=FELIX_FELIXHOSTNAME=10.79.218.2
Environment=CALICO_NETWORKING=true
Environment=NO_DEFAULT_POOLS=false
Environment=ETCD_ENDPOINTS=https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379
ExecStart=/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci --volume=var-run-calico,kind=host,source=/var/run/calico --volume=modules,kind=host,source=/lib/modules,readOnly=false --mount=volume=modules,target=/lib/modules --volume=dns,kind=host,source=/etc/resolv.conf,readOnly=true --volume=etcd-tls-certs,kind=host,source=/etc/ssl/etcd,readOnly=true --mount=volume=dns,target=/etc/resolv.conf --mount=volume=etcd-tls-certs,target=/etc/ssl/etcd --mount=volume=var-run-calico,target=/var/run/calico --trust-keys-from-https quay.io/calico/node:v0.22.0
KillMode=mixed
Environment=ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem
Environment=ETCD_CERT_FILE=/etc/ssl/etcd/etcd1.pem
Environment=ETCD_KEY_FILE=/etc/ssl/etcd/etcd1-key.pem
Restart=always
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
and this is the content of /etc/kubernetes/cni/net.d/10-calico.conf of the controller node:
{
"name": "calico",
"type": "flannel",
"delegate": {
"type": "calico",
"etcd_endpoints": "https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379",
"etcd_key_file": "/etc/ssl/etcd/etcd1-key.pem",
"etcd_cert_file": "/etc/ssl/etcd/etcd1.pem",
"etcd_ca_cert_file": "/etc/ssl/etcd/ca.pem",
"log_level": "none",
"log_level_stderr": "info",
"hostname": "10.79.218.2",
"policy": {
"type": "k8s",
"k8s_api_root": "http://127.0.0.1:8080/api/v1/"
}
}
}
and this is the /etc/kubernetes/cni/net.d/10-calico.conf of the worker node:
{
"name": "calico",
"type": "flannel",
"delegate": {
"type": "calico",
"etcd_endpoints": "https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379",
"etcd_key_file": "/etc/ssl/etcd/etcd2-key.pem",
"etcd_cert_file": "/etc/ssl/etcd/etcd2.pem",
"etcd_ca_cert_file": "/etc/ssl/etcd/ca.pem",
"log_level": "debug",
"log_level_stderr": "info",
"hostname": "10.79.218.3",
"policy": {
"type": "k8s",
"k8s_api_root": "https://coreos-2.tux-in.com:443/api/v1/",
"k8s_client_key": "/etc/kubernetes/ssl/worker-key.pem",
"k8s_client_certificate": "/etc/kubernetes/ssl/worker.pem"
}
}
}
now idea how to investigate the issue further.
I understand that since new calico-cni was moved to go, it doesn't store log information in a log file anymore, so i'm lost from here.
any information regarding the issue would be greatly appreciated.
thanks!

The "Unhandled Exception Killed plugin" error message is being generated by the Calico CNI plugin. From my experience that means it is unlikely to be something wrong with the calico-node.service causing that error.
As such it is probably something subtly wrong with you CNI network configuration. Could you share that file?
The CNI plugin should also emit more detailed logging information - either to stderr or to /var/log/calico/cni/calico.log based on how its configured in your CNI network config. I suspect that file will give you more clues into exactly what is going wrong.
All that said, the "Unhandled Exception" error is coming from the Python version of the CNI plugin, which is rather old at this point. I'd recommend upgrading to the latest stable release from here: https://github.com/projectcalico/calico-cni/releases

Related

CockroachDB Cluster on Kubernetes Pods Crashing

I'm trying to install a CockroachDB Helm chart on a 2 node Kubernetes cluster using this command:
helm install my-release --set statefulset.replicas=2 stable/cockroachdb
I have already created 2 persistent volumes:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv00001 100Gi RWO Recycle Bound default/datadir-my-release-cockroachdb-0 11m
pv00002 100Gi RWO Recycle Bound default/datadir-my-release-cockroachdb-1 11m
I'm getting a weird error and I'm new to Kubernetes so I'm not sure what I'm doing wrong. I've tried creating a StorageClass and using it with my PVs but then the CockroachDB PVCs won't bind to them. I suspect there may be something wrong with my PV setup?
I've tried using kubectl logs but the only error I'm seeing is this:
standard_init_linux.go:211: exec user process caused "exec format
error"
and the pods are crashing over and over:
NAME READY STATUS RESTARTS AGE
my-release-cockroachdb-0 0/1 Pending 0 11m
my-release-cockroachdb-1 0/1 CrashLoopBackOff 7 11m
my-release-cockroachdb-init-tfcks 0/1 CrashLoopBackOff 5 5m29s
Any idea why the pods are crashing?
Here's kubectl describe for the init pod:
Name: my-release-cockroachdb-init-tfcks
Namespace: default
Priority: 0
Node: axon/192.168.1.7
Start Time: Sat, 04 Apr 2020 00:22:19 +0100
Labels: app.kubernetes.io/component=init
app.kubernetes.io/instance=my-release
app.kubernetes.io/name=cockroachdb
controller-uid=54c7c15d-eb1c-4392-930a-d9b8e9225a45
job-name=my-release-cockroachdb-init
Annotations: <none>
Status: Running
IP: 10.44.0.1
IPs:
IP: 10.44.0.1
Controlled By: Job/my-release-cockroachdb-init
Containers:
cluster-init:
Container ID: docker://82a062c6862a9fd5047236feafe6e2654ec1f6e3064fd0513341a1e7f36eaed3
Image: cockroachdb/cockroach:v19.2.4
Image ID: docker-pullable://cockroachdb/cockroach#sha256:511b6d09d5bc42c7566477811a4e774d85d5689f8ba7a87a114b96d115b6149b
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
while true; do initOUT=$(set -x; /cockroach/cockroach init --insecure --host=my-release-cockroachdb-0.my-release-cockroachdb:26257 2>&1); initRC="$?"; echo $initOUT; [[ "$initRC" == "0" ]] && exit 0; [[ "$initOUT" == *"cluster has already been initialized"* ]] && exit 0; sleep 5; done
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 04 Apr 2020 00:28:04 +0100
Finished: Sat, 04 Apr 2020 00:28:04 +0100
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-cz2sn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-cz2sn:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-cz2sn
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/my-release-cockroachdb-init-tfcks to axon
Normal Pulled 5m9s (x5 over 6m45s) kubelet, axon Container image "cockroachdb/cockroach:v19.2.4" already present on machine
Normal Created 5m8s (x5 over 6m45s) kubelet, axon Created container cluster-init
Normal Started 5m8s (x5 over 6m44s) kubelet, axon Started container cluster-init
Warning BackOff 92s (x26 over 6m42s) kubelet, axon Back-off restarting failed container
When Pods get crashed, the most important thing to troubleshoot is their descriptions(kubectl describe) and logs.
Logs of the failed Pod show that the arch of the cockroach image doesn't match to the nodes.
Run kubectl get po -o wide to get nodes where cockroach runs and check their arch.
A 2-node CockroachDB cluster is an anti-pattern. You need 3 or more nodes to avoid data or cluster-wide unavailability when a single node fails. Consider checking out these videos explaining how data in CockroachDB is organized and then how the nodes in a cluster work together to keep data available in the face of node failure.
Only if you have 3 nodes (or more), you will not risk losing data if any of the notes gets corrupted. Apart from it, its easier to explain how to do it right, than finding out what went wrong, and to find out what went wrong, one must go through the logs.
If you attach the log, I can take a look.
I also wrote a detailed guide that may address the "doing it right" part of my answer. I elaborated even more about the entire process here.

Dashboard not running

I have setup kubenertes on ubuntu server using this link.
Then I installed kubernetes dashboard using:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-rc6/aio/deploy/recommended.yaml
Then I changed the ClusterIP to NodePort 32323, the service to NodePort.
But container is not running.
uday#dockermaster:~$ kubectl -n kubernetes-dashboard get all
NAME READY STATUS RESTARTS AGE
pod/dashboard-metrics-scraper-779f5454cb-pqfrj 1/1 Running 0 50m
pod/kubernetes-dashboard-64686c4bf9-5jkwq 0/1 CrashLoopBackOff 14 50m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dashboard-metrics-scraper ClusterIP 10.103.22.252 <none> 8000/TCP 50m
service/kubernetes-dashboard NodePort 10.102.48.80 <none> 443:32323/TCP 50m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/dashboard-metrics-scraper 1/1 1 1 50m
deployment.apps/kubernetes-dashboard 0/1 1 0 50m
NAME DESIRED CURRENT READY AGE
replicaset.apps/dashboard-metrics-scraper-779f5454cb 1 1 1 50m
replicaset.apps/kubernetes-dashboard-64686c4bf9 1 1 0 50m
uday#dockermaster:~$ kubectl -n kubernetes-dashboard describe svc kubernetes-dashboard
Name: kubernetes-dashboard
Namespace: kubernetes-dashboard
Labels: k8s-app=kubernetes-dashboard
Annotations: Selector: k8s-app=kubernetes-dashboard
Type: NodePort
IP: 10.102.48.80
Port: <unset> 443/TCP
TargetPort: 8443/TCP
NodePort: <unset> 32323/TCP
Endpoints:
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
Other apps are working fine with NodePort, whether Tomcat/nginx/databases.
But here, it is failing with container creation.
C:\Users\uday\Desktop>kubectl.exe get pods -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
dashboard-metrics-scraper-779f5454cb-pqfrj 1/1 Running 1 20h
kubernetes-dashboard-64686c4bf9-g9z2k 0/1 CrashLoopBackOff 84 18h
C:\Users\uday\Desktop>kubectl.exe describe pod kubernetes-dashboard-64686c4bf9-g9z2k -n kubernetes-dashboard
Name: kubernetes-dashboard-64686c4bf9-g9z2k
Namespace: kubernetes-dashboard
Priority: 0
Node: slave-node/10.0.0.6
Start Time: Sat, 28 Mar 2020 14:16:54 +0000
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=64686c4bf9
Annotations: <none>
Status: Running
IP: 182.244.1.12
IPs:
IP: 182.244.1.12
Controlled By: ReplicaSet/kubernetes-dashboard-64686c4bf9
Containers:
kubernetes-dashboard:
Container ID: docker://470ee8c61998c3c3dda86c58ad17817468f55aa73cd4feecf3b018977ce13ca3
Image: kubernetesui/dashboard:v2.0.0-rc6
Image ID: docker-pullable://kubernetesui/dashboard#sha256:61f9c378c427a3f8a9643f83baa9f96db1ae1357c67a93b533ae7b36d71c69dc
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
--namespace=kubernetes-dashboard
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Sun, 29 Mar 2020 09:01:31 +0000
Finished: Sun, 29 Mar 2020 09:02:01 +0000
Ready: False
Restart Count: 84
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-pzfbl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kubernetes-dashboard-token-pzfbl:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-token-pzfbl
Optional: false
QoS Class: BestEffort
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 4m49s (x501 over 123m) kubelet, slave-node Back-off restarting failed container
kubectl.exe logs kubernetes-dashboard-64686c4bf9-g9z2k -n kubernetes-dashboard
2020/03/29 09:01:31 Starting overwatch
2020/03/29 09:01:31 Using namespace: kubernetes-dashboard
2020/03/29 09:01:31 Using in-cluster config to connect to apiserver
2020/03/29 09:01:31 Using secret token for csrf signing
2020/03/29 09:01:31 Initializing csrf token from kubernetes-dashboard-csrf secret
panic: Get https://10.96.0.1:443/api/v1/namespaces/kubernetes-dashboard/secrets/kubernetes-dashboard-csrf: dial tcp 10.96.0.1:443: i/o timeout
goroutine 1 [running]:
github.com/kubernetes/dashboard/src/app/backend/client/csrf.(*csrfTokenManager).init(0xc0004e2dc0)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/csrf/manager.go:40 +0x3b0
github.com/kubernetes/dashboard/src/app/backend/client/csrf.NewCsrfTokenManager(...)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/csrf/manager.go:65
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).initCSRFKey(0xc00043ae80)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/manager.go:499 +0xc6
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).init(0xc00043ae80)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/manager.go:467 +0x47
github.com/kubernetes/dashboard/src/app/backend/client.NewClientManager(...)
/home/travis/build/kubernetes/dashboard/src/app/backend/client/manager.go:548
main.main()
/home/travis/build/kubernetes/dashboard/src/app/backend/dashboard.go:105 +0x20d
The Problem
The reason the application is not coming up is because the Dashboard container itself is not running. If you look at the output you provided you can see this:
pod/kubernetes-dashboard-64686c4bf9-5jkwq 0/1 CrashLoopBackOff 14
So how do we troubleshoot this? Well, there are three principle ways. One of which you'll probably be using more than the other two.
Describe
Describe is a is a command that allows you to fetch details about a resource in kubernetes. This could be metadata information, the amount of replicas you assigned, or even some events depicting why a resource is failing to start. For example, a referenced Container Image in your Pod manifest can not be found in the usable container registries. The syntax for using Describe is like so:
kubectl describe pod -n kubernetes-dashboard kubernetes-dashboard-64686c4bf9-5jkwq
Here are some great docs on the tool as well.
Logs
The next troubleshooting step you'll likely take advantage of in Kubernetes is using the logging architecture. As you're probably aware, when a Docker container is spawned it is common practice to have the logs that are produced by the application be redirected to STDOUT or STDERR for the process. Kubernetes them captures this log data for you and provides an API abstraction layer with which you can interact with it. Sometimes your Describe events won't have any indication of why a process isn't running. However, you can then proceed with grabbing logs from the process to determine what is going wrong. An example syntax might look like:
kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-64686c4bf9-5jkwq
Exec
The last common troubleshooting technique is Exec. Exec effectively allows you to attach to the a shell in a running container so that you can interact with the live environment to troubleshoot the application. This allows you to do things like see if configuration files were properly staged on the container's filesystem, determine if environment variables were properly expanded and set, etc. An example syntax for Exec might look like:
kubectl exec -it -n kubernetes-dashboard kubernetes-dashboard-64686c4bf9-5jkwq sh
In this case, however, your pod is in a CrashLoopBackoff state. This means that you will not be able to exec into it due to the fact that the container is not running. The Kubernetes API Server recognized a pattern of failures and automatically reduced its attempts at scheduling the workload accordingly.
Here is a good thread on how to troubleshoot pods that enter this state.
Summary
So, now that I've said all of this. How do we answer your question? Well, we can't answer it directly. But I sort of did with my summary above. Because the real answer you're looking for is how to properly troubleshoot linux containers running in Kubernetes. These issues will be a reoccurring theme in your experience with Kubernetes so it's essential to develop debugging skills in the ecosystem as soon as possible.
If the Describe, Logs, and Exec command are unable to help you find out why the Dashboard pod is failing to come up, feel free to add a comment on this answer requesting additional support and I'll be happy to help where I can!

Unable to setup Istio with minikube

I followed Istio's official documentation to setup Istio for sample bookinfo app with minikube. but I'm getting Unable to connect to the server: net/http: TLS handshake timeout error. these are the steps that I have followed(I have kubectl & minikube installed).
minikube start
curl -L https://git.io/getLatestIstio | sh -
cd istio-1.0.3
export PATH=$PWD/bin:$PATH
kubectl apply -f install/kubernetes/helm/istio/templates/crds.yaml
kubectl apply -f install/kubernetes/istio-demo-auth.yaml
kubectl get pods -n istio-system
This is the terminal output I'm getting
$ kubectl get pods -n istio-system
NAME READY STATUS RESTARTS AGE
grafana-9cfc9d4c9-xg7bh 1/1 Running 0 4m
istio-citadel-6d7f9c545b-lwq8s 1/1 Running 0 3m
istio-cleanup-secrets-69hdj 0/1 Completed 0 4m
istio-egressgateway-75dbb8f95d-k6xj2 1/1 Running 0 4m
istio-galley-6d74549bb9-mdc97 0/1 ContainerCreating 0 4m
istio-grafana-post-install-xz9rk 0/1 Completed 0 4m
istio-ingressgateway-6bd4957bc-vhbct 1/1 Running 0 4m
istio-pilot-7f8c49bbd8-x6bmm 0/2 Pending 0 4m
istio-policy-6c65d8cff4-hx2c7 2/2 Running 0 4m
istio-security-post-install-gjfj2 0/1 Completed 0 4m
istio-sidecar-injector-74855c54b9-nnqgx 0/1 ContainerCreating 0 3m
istio-telemetry-65cdd46d6c-rqzfw 2/2 Running 0 4m
istio-tracing-ff94688bb-hgz4h 1/1 Running 0 3m
prometheus-f556886b8-chdxw 1/1 Running 0 4m
servicegraph-778f94d6f8-9xgw5 1/1 Running 0 3m
$kubectl describe pod istio-galley-6d74549bb9-mdc97
Error from server (NotFound): pods "istio-galley-5bf4d6b8f7-8s2z9" not found
pod describe output
$ kubectl -n istio-system describe pod istio-galley-6d74549bb9-mdc97
Name: istio-galley-6d74549bb9-mdc97
Namespace: istio-system
Node: minikube/172.17.0.4
Start Time: Sat, 03 Nov 2018 04:29:57 +0000
Labels: istio=galley
pod-template-hash=1690826493
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
sidecar.istio.io/inject=false
Status: Pending
IP:
Controlled By: ReplicaSet/istio-galley-5bf4d6b8f7
Containers:
validator:
Container ID:
Image: gcr.io/istio-release/galley:1.0.0 Image ID:
Ports: 443/TCP, 9093/TCP Host Ports: 0/TCP, 0/TCP
Command: /usr/local/bin/galley
validator --deployment-namespace=istio-system
--caCertFile=/etc/istio/certs/root-cert.pem
--tlsCertFile=/etc/istio/certs/cert-chain.pem
--tlsKeyFile=/etc/istio/certs/key.pem
--healthCheckInterval=2s
--healthCheckFile=/health
--webhook-config-file
/etc/istio/config/validatingwebhookconfiguration.yaml
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 10m
Liveness: exec [/usr/local/bin/galley probe --probe-path=/health --interval=4s] delay=4s timeout=1s period=4s #success=1 #failure=3
Readiness: exec [/usr/local/bin/galley probe --probe-path=/health --interval=4s] delay=4s timeout=1s period=4s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/istio/certs from certs (ro)
/etc/istio/config from config (ro)
/var/run/secrets/kubernetes.io/serviceaccount from istio-galley-service-account-token-9pcmv(ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
certs:
Type: Secret (a volume populated by a Secret)
SecretName: istio.istio-galley-service-account
Optional: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-galley-configuration
Optional: false
istio-galley-service-account-token-9pcmv:
Type: Secret (a volume populated by a Secret)
SecretName: istio-galley-service-account-token-9pcmv
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned istio-galley-5bf4d6b8f7-8t8qz to minikube
Normal SuccessfulMountVolume 1m kubelet, minikube MountVolume.SetUp succeeded for volume "config"
Normal SuccessfulMountVolume 1m kubelet, minikube MountVolume.SetUp succeeded for volume "istio-galley-service-account-token-9pcmv"
Warning FailedMount 27s (x7 over 1m) kubelet, minikube MountVolume.SetUp failed for volume "certs" : secrets "istio.istio-galley-service-account" not found
after some time :-
$ kubectl describe pod istio-galley-6d74549bb9-mdc97
Unable to connect to the server: net/http: TLS handshake timeout
so I wait for istio-sidecar-injector and istio-galley containers to get created. If I again run kubectl get pods -n istio-system or any other kubectl commands gives Unable to connect to the server: net/http: TLS handshake timeout error.
Please help me with this issue.
ps: I'm running minikube on ubuntu 16.04
Thanks in advance.
Looks like you are running into this and this the secret istio.istio-galley-service-account is missing in your istio-system namespace. You can try the workaround as described:
Install as outlined in the docs: https://istio.io/docs/setup/kubernetes/minimal-install/ the missing secret is created by the citadel pod which isn't running due to the --set security.enabled=false flag, setting that to true starts citadel and the secret is created.
Problem resolved. when I run minikube start --memory=4048. maybe it was a memory issue.
When using either the istio-demo.yaml or istio-demo-auth.yaml, you'll find that a minimum of 4GB RAM is required to run Istio (particularly when you deploy its sample app, BookInfo, too). This is true whether your running MiniKube or Docker Desktop and is one of the gotchas that Meshery identifies and attempts to help those deploying Istio or other service meshes circumvent.

Unable to launch Alluxio on Kubernetes

I am trying alluxio 1.7.1 with docker 1.13.1, kubernetes 1.9.6, 1.10.1
I created the alluxio docker image as per the instructions on https://www.alluxio.org/docs/1.7/en/Running-Alluxio-On-Docker.html
Then I followed the https://www.alluxio.org/docs/1.7/en/Running-Alluxio-On-Kubernetes.html guide to run alluxio on kubernetes. I was able to bring up the alluxio master pod properly, but when I try to bring up alluxio worker I get the error that Address in use. I have not modified anything in the yamls which I downloaded from alluxio git. Only change I did was for alluxio docker image name and api version in yamls for k8s to match properly.
I checked ports being used in my k8s cluster setup, and even on the nodes also. There are no ports that alluxio wants being used by any other process, but I still get address in use error. I am unable to understand what I can do to debug further or what I should change to make this work. I don't have any other application running on my k8s cluster setup. I tried with single node k8s cluster setup and multi node k8s cluster setup also. I tried k8s version 1.9 and 1.10 also.
There is definitely some issue from alluxio worker side which I am unable to debug.
This is the log that I get from worker pod:
[root#vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# kubectl logs po/alluxio-worker-knqt4
Formatting Alluxio Worker # vm-sushil-scrum1-08062018-alluxio-1
2018-06-08 10:09:55,723 INFO Configuration - Configuration file /opt/alluxio/conf/alluxio-site.properties loaded.
2018-06-08 10:09:55,845 INFO Format - Formatting worker data folder: /alluxioworker/
2018-06-08 10:09:55,845 INFO Format - Formatting Data path for tier 0:/dev/shm/alluxioworker
2018-06-08 10:09:55,856 INFO Format - Formatting complete
2018-06-08 10:09:56,357 INFO Configuration - Configuration file /opt/alluxio/conf/alluxio-site.properties loaded.
2018-06-08 10:09:56,549 INFO TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=10.194.11.7, rack=null)
2018-06-08 10:09:56,866 INFO BlockWorkerFactory - Creating alluxio.worker.block.BlockWorker
2018-06-08 10:09:56,866 INFO FileSystemWorkerFactory - Creating alluxio.worker.file.FileSystemWorker
2018-06-08 10:09:56,942 WARN StorageTier - Failed to verify memory capacity
2018-06-08 10:09:57,082 INFO log - Logging initialized #1160ms
2018-06-08 10:09:57,509 INFO AlluxioWorkerProcess - Domain socket data server is enabled at /opt/domain.
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address in use
at alluxio.worker.AlluxioWorkerProcess.<init>(AlluxioWorkerProcess.java:164)
at alluxio.worker.WorkerProcess$Factory.create(WorkerProcess.java:45)
at alluxio.worker.WorkerProcess$Factory.create(WorkerProcess.java:37)
at alluxio.worker.AlluxioWorker.main(AlluxioWorker.java:56)
Caused by: java.lang.RuntimeException: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address in use
at alluxio.util.CommonUtils.createNewClassInstance(CommonUtils.java:224)
at alluxio.worker.DataServer$Factory.create(DataServer.java:45)
at alluxio.worker.AlluxioWorkerProcess.<init>(AlluxioWorkerProcess.java:159)
... 3 more
Caused by: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address in use
at io.netty.channel.unix.Errors.newIOException(Errors.java:117)
at io.netty.channel.unix.Socket.bind(Socket.java:259)
at io.netty.channel.epoll.EpollServerDomainSocketChannel.doBind(EpollServerDomainSocketChannel.java:75)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:504)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1226)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:495)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:480)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:213)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:305)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at java.lang.Thread.run(Thread.java:748)
-----------------------
[root#vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# kubectl get all
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ds/alluxio-worker 1 1 0 1 0 <none> 42m
ds/alluxio-worker 1 1 0 1 0 <none> 42m
NAME DESIRED CURRENT AGE
statefulsets/alluxio-master 1 1 44m
NAME READY STATUS RESTARTS AGE
po/alluxio-master-0 1/1 Running 0 44m
po/alluxio-worker-knqt4 0/1 Error 12 42m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/alluxio-master ClusterIP None <none> 19998/TCP,19999/TCP 44m
svc/kubernetes ClusterIP 10.254.0.1 <none> 443/TCP 1h
---------------------
[root#vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# kubectl describe po/alluxio-worker-knqt4
Name: alluxio-worker-knqt4
Namespace: default
Node: vm-sushil-scrum1-08062018-alluxio-1/10.194.11.7
Start Time: Fri, 08 Jun 2018 10:09:05 +0000
Labels: app=alluxio
controller-revision-hash=3081903053
name=alluxio-worker
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 10.194.11.7
Controlled By: DaemonSet/alluxio-worker
Containers:
alluxio-worker:
Container ID: docker://40a1eff2cd4dff79d9189d7cb0c4826a6b6e4871fbac65221e7cdd341240e358
Image: alluxio:1.7.1
Image ID: docker://sha256:b080715bd53efc783ee5f54e7f1c451556f93e7608e60e05b4615d32702801af
Ports: 29998/TCP, 29999/TCP, 29996/TCP
Command:
/entrypoint.sh
Args:
worker
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 08 Jun 2018 11:01:37 +0000
Finished: Fri, 08 Jun 2018 11:02:02 +0000
Ready: False
Restart Count: 14
Limits:
cpu: 1
memory: 2G
Requests:
cpu: 500m
memory: 2G
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_WORKER_HOSTNAME: (v1:status.hostIP)
Mounts:
/dev/shm from alluxio-ramdisk (rw)
/opt/domain from alluxio-domain (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-7xlz7 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
alluxio-ramdisk:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
alluxio-domain:
Type: HostPath (bare host directory volume)
Path: /tmp/domain
HostPathType: Directory
default-token-7xlz7:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-7xlz7
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulMountVolume 56m kubelet, vm-sushil-scrum1-08062018-alluxio-1 MountVolume.SetUp succeeded for volume "alluxio-domain"
Normal SuccessfulMountVolume 56m kubelet, vm-sushil-scrum1-08062018-alluxio-1 MountVolume.SetUp succeeded for volume "alluxio-ramdisk"
Normal SuccessfulMountVolume 56m kubelet, vm-sushil-scrum1-08062018-alluxio-1 MountVolume.SetUp succeeded for volume "default-token-7xlz7"
Normal Pulled 53m (x5 over 56m) kubelet, vm-sushil-scrum1-08062018-alluxio-1 Container image "alluxio:1.7.1" already present on machine
Normal Created 53m (x5 over 56m) kubelet, vm-sushil-scrum1-08062018-alluxio-1 Created container
Normal Started 53m (x5 over 56m) kubelet, vm-sushil-scrum1-08062018-alluxio-1 Started container
Warning BackOff 1m (x222 over 55m) kubelet, vm-sushil-scrum1-08062018-alluxio-1 Back-off restarting failed container
[root#vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# lsof -n -i :19999 | grep LISTEN
java 8949 root 29u IPv4 12518521 0t0 TCP *:dnp-sec (LISTEN)
[root#vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# lsof -n -i :19998 | grep LISTEN
java 8949 root 19u IPv4 12520458 0t0 TCP *:iec-104-sec (LISTEN)
[root#vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# lsof -n -i :29998 | grep LISTEN
[root#vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# lsof -n -i :29999 | grep LISTEN
[root#vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# lsof -n -i :29996 | grep LISTEN
The alluxio-worker container is always restarting and failing again and again for the same error.
Please guide me what I can do to solve this.
Thanks
The problem was short circuit unix domain socket path. I was using whatever was present by default in alluxio git. In the default integration/kubernetes/conf/alluxio.properties.template the address for ALLUXIO_WORKER_DATA_SERVER_DOMAIN_SOCKET_ADDRESS was not complete. This is properly explained in https://www.alluxio.org/docs/1.7/en/Running-Alluxio-On-Docker.html for enabling short circuit reads in alluxio worker containers using unix domain sockets.
Just because of a missing complete path for unix domain socket the alluxio worker was not able to come up in kubernetes when short circuit read was enabled for alluxio worker.
When I corrected the path in integration/kubernetes/conf/alluxio.properties for ALLUXIO_WORKER_DATA_SERVER_DOMAIN_SOCKET_ADDRESS=/opt/domain/d
Then things started wokring properly. Now also some tests are failing but alteast the alluxio setup is properly up. Now I will debug why some tests are failing.
I have submitted this fix in alluxio git for them to merge it in master branch.
https://github.com/Alluxio/alluxio/pull/7376
On the node where your worker is running, it seems that you have a port already in use.
Try to find which process is using it:
sudo lsof -n -i :80 | grep LISTEN
I read the alluxio configuration files: try with ports 19998, 19999, 29996, 29998, 29999 substituting 80 in the above command.

Error in starting pods- kubernetes. Pods remain in ContainerCreating state

I have installed kubernetes trial version with minikube on my desktop running ubuntu. However there seem to be some issue with bringing up the pods.
Kubectl get pods --all-namespaces shows all the pods in ContainerCreating state and it doesn't shift to Ready.
Even when i do a kubernetes-dahboard, i get
Waiting, endpoint for service is not ready yet.
Minikube version : v0.20.0
Environment:
OS (e.g. from /etc/os-release): Ubuntu 12.04.5 LTS
VM Driver "DriverName": "virtualbox"
ISO version "Boot2DockerURL":
"file:///home/nszig/.minikube/cache/iso/minikube-v0.20.0.iso"
I have installed minikube and kubectl on Ubuntu. However i cannot access the dashboard both through the CLI and through the GUI.
http://127.0.0.1:8001/ui give the below error
{ "kind": "Status", "apiVersion": "v1", "metadata": {}, "status": "Failure", "message": "no endpoints available for service "kubernetes-dashboard"", "reason": "ServiceUnavailable", "code": 503 }
And minikube dashboard on the CLI does not open the dashboard: Output
Waiting, endpoint for service is not ready yet...
Waiting, endpoint for service is not ready yet...
Waiting, endpoint for service is not ready yet...
Waiting, endpoint for service is not ready yet...
.......
Could not find finalized endpoint being pointed to by kubernetes-dashboard: Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
kubectl version: Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-29T23:15:59Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"dirty", BuildDate:"2017-06-22T04:31:09Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
minikube logs also reports the errors below:
.....
Jul 10 08:46:12 minikube localkube[3237]: I0710 08:46:12.901880 3237 kuberuntime_manager.go:458] Container {Name:php-redis Image:gcr.io/google-samples/gb-frontend:v4 Command:[] Args:[] WorkingDir: Ports:[{Name: HostPort:0 ContainerPort:80 Protocol:TCP HostIP:}] EnvFrom:[] Env:[{Name:GET_HOSTS_FROM Value:dns ValueFrom:nil}] Resources:{Limits:map[] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:} s:100m Format:DecimalSI} memory:{i:{value:104857600 scale:0} d:{Dec:} s:100Mi Format:BinarySI}]} VolumeMounts:[{Name:default-token-gqtvf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it. Jul 10 08:46:14 minikube localkube[3237]: E0710 08:46:14.139555 3237 remote_runtime.go:86] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = unable to pull sandbox image "gcr.io/google_containers/pause-amd64:3.0": Error response from daemon: Get https://gcr.io/v1/_ping: x509: certificate signed by unknown authority ....
Name: kubernetes-dashboard-2039414953-czptd Namespace: kube-system
Node: minikube/192.168.99.102 Start Time: Fri, 14 Jul 2017 09:31:58
+0530 Labels: k8s-app=kubernetes-dashboard pod-template-hash=2039414953
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kubernetes-dashboard-2039414953","uid":"2eb39682-6849-11e7-8...
Status: Pending IP: Created
By: ReplicaSet/kubernetes-dashboard-2039414953 Controlled
By: ReplicaSet/kubernetes-dashboard-2039414953 Containers:
kubernetes-dashboard:
Container ID:
Image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.1
Image ID:
Port: 9090/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Liveness: http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-12gdj (ro) Conditions: Type Status
Initialized True Ready False PodScheduled True Volumes:
kubernetes-dashboard-token-12gdj:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-token-12gdj
Optional: false QoS Class: BestEffort Node-Selectors: Tolerations: node-role.kubernetes.io/master:NoSchedule Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ ------- 1h 11s 443 kubelet, minikube Warning FailedSync Error syncing
pod, skipping: failed to "CreatePodSandbox" for
"kubernetes-dashboard-2039414953-czptd_kube-system(2eb57d9b-6849-11e7-8a56-080027206461)"
with CreatePodSandboxError: "CreatePodSandbox for pod
\"kubernetes-dashboard-2039414953-czptd_kube-system(2eb57d9b-6849-11e7-8a56-080027206461)\"
failed: rpc error: code = 2 desc = unable to pull sandbox image
\"gcr.io/google_containers/pause-amd64:3.0\": Error response from
daemon: Get https://gcr.io/v1/_ping: x509: certificate signed by
unknown authority"
It's quite possible that the Pod container images are being downloaded. The images are not very large so the images should get downloaded pretty quickly on a decent internet connection.
You can use kubectl describe pod --namespace kube-system <pod-name> to know more details on the pod bring up status. Take a look at the Events section of the output.
Until all the kubernetes components in the kube-system namespace are in READY state, you will not be able to access the dashboard.
You can also try SSH'ing into the minikube vm with minikube ssh to debug the issue.
I was able to resolve this issue by doing a clean install using a VPN connection as i had restrictions in my corporate network. This was blocking the site from where the install was trying to pull the sandbox image.
Try using:
kubectl config use-context minikube
..as a preexisting configuration may have be initiated.
guys i did these and it worked for me
ON MASTER ONLY
####################
kubeadm init --apiserver-advertise-address=0.0.0.0 --pod-network-cidr=10.244.0.0/16
(copy join)
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
ON WORKER NODE ##
###################
kubeadm reset
EXECUTE THE JOIN COMMAND WHICH YOU GOT FROM MASTER AFTER KUBEADM INIT.
#kubeadm join