Readiness fails in the Eclipse Hono pods of the Cloud2Edge package - kubernetes

I am a bit desperate, and I hope someone can help me. A few months ago I installed the Eclipse Cloud2Edge package on a Kubernetes cluster by following the installation instructions, creating a PersistentVolume and running the helm install command with these options.
helm install -n $NS --wait --timeout 15m $RELEASE eclipse-iot/cloud2edge --set hono.prometheus.createInstance=false --set hono.grafana.enabled=false --dependency-update --debug
The YAML of the PersistentVolume is the following, and I create it in the same namespace in which I install the package.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-device-registry
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Mi
  hostPath:
    path: /mnt/
    type: Directory
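For reference, the device registry only gets this volume if the chart's PersistentVolumeClaim can bind to it, which requires the PV to offer the requested access mode and at least the requested capacity. A hypothetical claim that would match the PV above (the name and namespace are illustrative; the real claim is created by the Hono chart):

```yaml
# Hypothetical PVC matching the PV above -- name/namespace are placeholders,
# the actual claim is created by the Hono chart.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: device-registry-claim
  namespace: digitaltwins
spec:
  accessModes:
    - ReadWriteOnce   # must be offered by the PV
  resources:
    requests:
      storage: 1Mi    # must not exceed the PV's capacity
```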
Everything worked perfectly, and all pods were ready and running, until the other day, when the cluster crashed and some pods stopped working.
The kubectl get pods -n $NS output is as follows:
NAME READY STATUS RESTARTS AGE
ditto-mongodb-7b78b468fb-8kshj 1/1 Running 0 50m
dt-adapter-amqp-vertx-6699ccf495-fc8nx 0/1 Running 0 50m
dt-adapter-http-vertx-545564ff9f-gx5fp 0/1 Running 0 50m
dt-adapter-mqtt-vertx-58c8975678-k5n49 0/1 Running 0 50m
dt-artemis-6759fb6cb8-5rq8p 1/1 Running 1 50m
dt-dispatch-router-5bc7586f76-57dwb 1/1 Running 0 50m
dt-ditto-concierge-f6d5f6f9c-pfmcw 1/1 Running 0 50m
dt-ditto-connectivity-f556db698-q89bw 1/1 Running 0 50m
dt-ditto-gateway-589d8f5596-59c5b 1/1 Running 0 50m
dt-ditto-nginx-897b5bc76-cx2dr 1/1 Running 0 50m
dt-ditto-policies-75cb5c6557-j5zdg 1/1 Running 0 50m
dt-ditto-swaggerui-6f6f989ccd-jkhsk 1/1 Running 0 50m
dt-ditto-things-79ff869bc9-l9lct 1/1 Running 0 50m
dt-ditto-thingssearch-58c5578bb9-pwd9k 1/1 Running 0 50m
dt-service-auth-698d4cdfff-ch5wp 1/1 Running 0 50m
dt-service-command-router-59d6556b5f-4nfcj 0/1 Running 0 50m
dt-service-device-registry-7cf75d794f-pk9ct 0/1 Running 0 50m
The pods that fail all have the same error when running kubectl describe pod POD_NAME -n $NS.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 53m default-scheduler Successfully assigned digitaltwins/dt-service-command-router-59d6556b5f-4nfcj to node1
Normal Pulled 53m kubelet Container image "index.docker.io/eclipse/hono-service-command-router:1.8.0" already present on machine
Normal Created 53m kubelet Created container service-command-router
Normal Started 53m kubelet Started container service-command-router
Warning Unhealthy 52m kubelet Readiness probe failed: Get "https://10.244.1.89:8088/readiness": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 2m58s (x295 over 51m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
According to this, the readinessProbe fails. In the YAML definition of the affected deployments, the readinessProbe is defined as:
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /readiness
    port: health
    scheme: HTTPS
  initialDelaySeconds: 45
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
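For experimenting with these values without editing the deployment manifest by hand, one option is a strategic-merge patch; this is only a sketch, with the deployment and container names taken from the outputs above:

```yaml
# probe-patch.yaml -- sketch of a strategic-merge patch that loosens the probe.
# Apply with something like:
#   kubectl -n $NS patch deployment dt-service-command-router --patch "$(cat probe-patch.yaml)"
spec:
  template:
    spec:
      containers:
        - name: service-command-router   # container name from the events output
          readinessProbe:
            initialDelaySeconds: 600
            timeoutSeconds: 10
```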
I have tried increasing these values, raising the delay to 600 and the timeout to 10. I have also tried uninstalling the package and installing it again, but nothing changes: the installation fails because the pods are never ready and the timeout pops up. I have also exposed port 8088 (health) and called /readiness with wget, and the result is still 503. On the other hand, I have tested whether the livenessProbe works, and it works fine. I have also tried resetting the cluster: first I manually deleted everything in it, and then I used the following commands:
sudo kubeadm reset
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
sudo systemctl stop kubelet
sudo systemctl stop docker
sudo rm -rf /var/lib/cni/
sudo rm -rf /var/lib/kubelet/*
sudo rm -rf /etc/cni/
sudo ifconfig cni0 down
sudo ifconfig flannel.1 down
sudo ifconfig docker0 down
sudo ip link set cni0 down
sudo brctl delbr cni0
sudo systemctl start docker
sudo kubeadm init --apiserver-advertise-address=192.168.44.11 --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl --kubeconfig $HOME/.kube/config apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
The cluster seems to work fine, because the Eclipse Ditto part has no problem; it's just the Eclipse Hono part. I'll add a little more information in case it may be useful.
The kubectl logs dt-service-command-router-b654c8dcb-s2g6t -n $NS output:
12:30:06.340 [vert.x-eventloop-thread-1] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin /10.244.1.101:44142 failed to connect over ssl: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
12:30:06.756 [vert.x-eventloop-thread-1] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin /10.244.1.100:46550 failed to connect over ssl: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
12:30:07.876 [vert.x-eventloop-thread-1] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin /10.244.1.102:40706 failed to connect over ssl: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.client.impl.HonoConnectionImpl - starting attempt [#258] to connect to server [dt-service-device-registry:5671, role: Device Registration]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - OpenSSL [available: false, supports KeyManagerFactory: false]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - using JDK's default SSL engine
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.3]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.2]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - connecting to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Device Registration]
12:30:08.339 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - can't connect to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Device Registration]: Failed to create SSL connection
12:30:08.339 [vert.x-eventloop-thread-1] WARN o.e.h.client.impl.HonoConnectionImpl - attempt [#258] to connect to server [dt-service-device-registry:5671, role: Device Registration] failed
javax.net.ssl.SSLHandshakeException: Failed to create SSL connection
The kubectl logs dt-adapter-amqp-vertx-74d69cbc44-7kmdq -n $NS output:
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.client.impl.HonoConnectionImpl - starting attempt [#19] to connect to server [dt-service-device-registry:5671, role: Credentials]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - OpenSSL [available: false, supports KeyManagerFactory: false]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - using JDK's default SSL engine
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.3]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.2]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - connecting to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Credentials]
12:19:36.711 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - can't connect to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Credentials]: Failed to create SSL connection
12:19:36.712 [vert.x-eventloop-thread-0] WARN o.e.h.client.impl.HonoConnectionImpl - attempt [#19] to connect to server [dt-service-device-registry:5671, role: Credentials] failed
javax.net.ssl.SSLHandshakeException: Failed to create SSL connection
The kubectl version output is as follows:
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.16", GitCommit:"e37e4ab4cc8dcda84f1344dda47a97bb1927d074", GitTreeState:"clean", BuildDate:"2021-10-27T16:20:18Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Thanks in advance!

Based on the iconic "Failed to create SSL connection" output in the logs, I assume that you have run into the dreaded "the demo certificates included in the Hono chart have expired" problem.
The Cloud2Edge package chart is currently being updated (https://github.com/eclipse/packages/pull/337) with the most recent versions of the Ditto and Hono charts (which include fresh certificates that are valid for two more years to come). As soon as that PR is merged and the Eclipse Packages chart repository has been rebuilt, you should be able to do a helm repo update and then (hopefully) successfully install the c2e package.
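The expiry is easy to confirm with openssl. The snippet below generates a throwaway self-signed certificate as a stand-in for the chart's demo certs and checks it; against the live service you would instead fetch the presented certificate with openssl s_client -connect &lt;registry-host&gt;:5671 -showcerts and inspect its notAfter date:

```shell
# Generate a throwaway self-signed cert (stand-in for the chart's demo certs).
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo" \
  -keyout /tmp/demo.key -out /tmp/demo.crt -days 1 2>/dev/null

# Show the expiry date of the certificate.
openssl x509 -in /tmp/demo.crt -noout -enddate   # prints notAfter=<date>

# -checkend N exits 0 if the cert is still valid N seconds from now.
if openssl x509 -in /tmp/demo.crt -noout -checkend 0 >/dev/null; then
  echo "certificate still valid"
else
  echo "certificate EXPIRED"
fi
```

An expired certificate makes TLS clients abort the handshake with a certificate_unknown alert, which is exactly what the adapter and command-router logs above show.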

Related

How to expose traefik v2 dashboard in k3d/k3s via configuration?

*Cross-posted to k3d github discussions, to a thread in Rancher forums, and to traefik's community discussion board
Tutorials from 2020 refer to editing the traefik configmap. Where did it go?
The traefik installation instructions refer to a couple of ways to expose the dashboard:
This works, but isn't persistent: Using a 1-time command kubectl -n kube-system port-forward $(kubectl -n kube-system get pods --selector "app.kubernetes.io/name=traefik" --output=name) 9000:9000
I cannot get this to work: Creating an "IngressRoute" yaml file and applying it to the cluster. This might be due to the Klipper LB and/or my ignorance.
No configmap in use by traefik deployment
For a 2-server, 2-agent cluster... kubectl -n kube-system describe deploy traefik does not show any configmap:
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Priority Class Name: system-cluster-critical
No "traefik" configmap
And, kubectl get -n kube-system cm shows:
NAME DATA AGE
chart-content-traefik 0 28m
chart-content-traefik-crd 0 28m
chart-values-traefik 1 28m
chart-values-traefik-crd 0 28m
cluster-dns 2 28m
coredns 2 28m
extension-apiserver-authentication 6 28m
k3s 0 28m
k3s-etcd-snapshots 0 28m
kube-root-ca.crt 1 27m
local-path-config 4 28m
No configmap in use by traefik pods
Describing the pod doesn't turn up anything either: kubectl -n kube-system describe pod traefik-... shows no configmap.
Traefik ports in use, but not responding
However, I did see arguments to the traefik pod with ports in use:
--entryPoints.traefik.address=:9000/tcp
--entryPoints.web.address=:8000/tcp
--entryPoints.websecure.address=:8443/tcp
However, these ports are not exposed. So, I tried port-forward with kubectl port-forward pods/traefik-97b44b794-r9srz 9000:9000 8000:8000 8443:8443 -n kube-system --address 0.0.0.0, but when I curl -v localhost:9000 (or 8000 or 8443) and curl -v localhost:9000/dashboard, I get "404 Not Found".
After getting a terminal in the traefik container, I discovered which local ports are open, but it seems nothing is responding:
/ $ netstat -lntu
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 :::8443 :::* LISTEN
tcp 0 0 :::8000 :::* LISTEN
tcp 0 0 :::9000 :::* LISTEN
/ $ wget localhost:9000
Connecting to localhost:9000 ([::1]:9000)
wget: server returned error: HTTP/1.1 404 Not Found
/ $ wget localhost:8000
Connecting to localhost:8000 ([::1]:8000)
wget: server returned error: HTTP/1.1 404 Not Found
/ $ wget localhost:8443
Connecting to localhost:8443 ([::1]:8443)
wget: server returned error: HTTP/1.1 404 Not Found
Versions
k3d version v4.4.7
k3s version v1.21.2-k3s1 (default)
I found a solution, and hopefully someone finds a better one soon.
You need to control your k3s cluster from your PC rather than SSHing into the master node, so merge /etc/rancher/k3s/k3s.yaml into your local ~/.kube/config (in order to port-forward to your PC in the last step).
Now get your pod name as follows:
kubectl get pod -n kube-system
and search for traefik-something-somethingElse; mine was traefik-97b44b794-bsvjn.
Now this part must be run from your local PC:
kubectl port-forward traefik-97b44b794-bsvjn -n kube-system 9000:9000
Open http://localhost:9000/dashboard/ in your favorite browser, and don't forget the second slash.
Enjoy the dashboard.
Please note that you have to enable the dashboard first in /var/lib/rancher/k3s/server/manifests/traefik.yaml by adding:
dashboard:
  enabled: true
Jakub's answer is pretty good, but one unfortunate thing about it is that if k3s restarts, the config gets reset. According to the k3s docs, if you create a custom file called /var/lib/rancher/k3s/server/manifests/traefik-config.yaml, k3s' traefik will automatically pick up this new config and use its values. Here is what I have:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    dashboard:
      enabled: true
    ports:
      traefik:
        expose: true # this is not recommended in production deployments, but I want to be able to see my dashboard locally
    logs:
      access:
        enabled: true
With this setup, you can skip the port-forwarding and just go to http://localhost:9000/dashboard/ directly!
For the current latest version of k3s (1.21.4):
according to traefik's installation guide (https://doc.traefik.io/traefik/getting-started/install-traefik/#exposing-the-traefik-dashboard), create dashboard.yaml with the proper CRD content, and run
kubectl apply -f dashboard.yaml
Then create a DNS record, or modify your hosts file, with the hostname-to-IP mapping for what you set up in the previous step.
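A sketch of what that dashboard.yaml can look like, per the linked traefik guide; the apiVersion matches traefik v2.x as shipped with k3s 1.21, and the hostname is a placeholder that must resolve to the cluster (the DNS/hosts-file step above):

```yaml
# dashboard.yaml -- sketch of an IngressRoute for the traefik v2 dashboard.
# The Host is a placeholder; point it at your cluster via DNS or the hosts file.
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: traefik-dashboard
  namespace: kube-system
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`traefik.example.com`) && (PathPrefix(`/dashboard`) || PathPrefix(`/api`))
      kind: Rule
      services:
        - name: api@internal      # traefik's built-in dashboard/API service
          kind: TraefikService
```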

New kubernetes install has remnants of old cluster

I did a complete tear down of a v1.13.1 cluster and am now running v1.15.0 with calico cni v3.8.0. All pods are running:
[gms#thalia0 ~]$ kubectl get po --namespace=kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-59f54d6bbc-2mjxt 1/1 Running 0 7m23s
calico-node-57lwg 1/1 Running 0 7m23s
coredns-5c98db65d4-qjzpq 1/1 Running 0 8m46s
coredns-5c98db65d4-xx2sh 1/1 Running 0 8m46s
etcd-thalia0.ahc.umn.edu 1/1 Running 0 8m5s
kube-apiserver-thalia0.ahc.umn.edu 1/1 Running 0 7m46s
kube-controller-manager-thalia0.ahc.umn.edu 1/1 Running 0 8m2s
kube-proxy-lg4cn 1/1 Running 0 8m46s
kube-scheduler-thalia0.ahc.umn.edu 1/1 Running 0 7m40s
But, when I look at the endpoint, I get the following:
[gms#thalia0 ~]$ kubectl get ep --namespace=kube-system
NAME ENDPOINTS AGE
kube-controller-manager <none> 9m46s
kube-dns 192.168.16.194:53,192.168.16.195:53,192.168.16.194:53 + 3 more... 9m30s
kube-scheduler <none> 9m46s
If I look at the log for the apiserver, I get a ton of TLS handshake errors, along the lines of:
I0718 19:35:17.148852 1 log.go:172] http: TLS handshake error from 10.x.x.160:45042: remote error: tls: bad certificate
I0718 19:35:17.158375 1 log.go:172] http: TLS handshake error from 10.x.x.159:53506: remote error: tls: bad certificate
These IP addresses were from nodes in a previous cluster. I had deleted them and done a kubeadm reset on all nodes, including master, so I have no idea why these are showing up. I would assume this is why the endpoints for the controller-manager and the scheduler are showing up as <none>.
In order to completely wipe your cluster, you should do the following:
1) Reset the cluster:
$ sudo kubeadm reset (or the command appropriate to your cluster)
2) Wipe your local config directory:
$ rm -rf ~/.kube/
3) Remove /etc/kubernetes/:
$ sudo rm -rf /etc/kubernetes/
4) And, one of the main points, get rid of your previous etcd state:
$ sudo rm -rf /var/lib/etcd/

“Error: forwarding ports: Upgrade request required” Error in helm of a kubernetes cluster

I have a kubernetes cluster built using kubespray and imported to Rancher.
The nodes are configured with
CentOS Linux 7 3.10.0-957.12.1.el7.x86_64
Docker version : 18.9.5
Kubelet version : v1.14.1
Tiller version : v2.14.1 ( got this version from the tiller pod's image gcr.io/kubernetes-helm/tiller:v2.14.1 )
All the tiller resources are working fine:
$ kubectl get all -n kube-system | findstr tiller
pod/tiller-deploy-57ff77d846-frtb7 1/1 Running 0 12d
service/tiller-deploy ClusterIP 10.233.49.112 <none> 44134/TCP 16d
deployment.apps/tiller-deploy 1 1 1 1 16d
replicaset.apps/tiller-deploy-57ff77d846 1 1 1 12d
replicaset.apps/tiller-deploy-69d5cd79bb 0 0 0 16d
But when I run the helm commands, I am getting this error:
$ helm version
Client: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
Error: forwarding ports: error upgrading connection: Upgrade request required
$ helm ls
Error: forwarding ports: error upgrading connection: Upgrade request required
I tried:
The Tiller version is 2.14.1, so I upgraded the helm client from 2.11.0 to version 2.14.1, but that doesn't solve the issue.
Can someone help me to solve this error?
Each time you invoke a Helm command, a specific port on the host machine is forwarded to the target Tiller pod's port 44134; this is simply an inherited kubectl port-forward, and you can even find the Go package portforward.go that the Helm client uses to initiate the connection to the server. Therefore, the issue you are describing here is most likely a port-forwarding (tunneling) problem between the Helm client and server parties.
I would first establish a manual port-forward check:
kubectl -n kube-system port-forward <tiller-deploy-Pod> <some_port>:44134
and also verify whether the Tiller service is listening on port 44134:
kubectl exec -it <tiller-deploy-Pod> -n kube-system -- ./tiller
Find more information about Helm implementation in the official documentation.
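To separate a tunneling failure from a Tiller failure, you can also run the port-forward yourself and point the Helm v2 client at it via its HELM_HOST variable (the pod name below is copied from your kubectl get all output). If this works while plain helm version fails, the automatic port-forward path is the problem, often an HTTP proxy or intermediate hop that strips the connection-upgrade request:

```shell
# Forward a local port to Tiller's gRPC port (44134); pod name from the output above.
kubectl -n kube-system port-forward tiller-deploy-57ff77d846-frtb7 44134:44134 &

# Point the Helm v2 client at the existing tunnel instead of creating its own.
export HELM_HOST=127.0.0.1:44134
helm version
helm ls
```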

Minikube does not start, kubectl connection to server was refused

Scouring Stack Overflow for solutions to similar problems did not resolve my issue, so I'm sharing what I'm currently experiencing in the hope of getting help debugging it.
A small preface: I initially installed minikube/kubectl a couple of days back. I went ahead and tried following the minikube tutorial today and am now experiencing issues. I'm following the minikube getting started guide.
I am on MacOS. My versions:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:22:21Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Unable to connect to the server: net/http: TLS handshake timeout
$ minikube version
minikube version: v0.26.1
$ vboxmanage --version
5.1.20r114629
The following is a string of commands I've tried, to check the responses:
$ minikube start
Starting VM...
Getting VM IP address...
Moving files into cluster...
E0503 11:08:18.654428 20197 start.go:234] Error updating cluster: downloading binaries: transferring kubeadm file: &{BaseAsset:{data:[] reader:0xc4200861a8 Length:0 AssetName:/Users/philipyoo/.minikube/cache/v1.10.0/kubeadm TargetDir:/usr/bin TargetName:kubeadm Permissions:0641}}: Error running scp command: sudo scp -t /usr/bin output: : wait: remote command exited without exit status or exit signal
$ minikube status
cluster: Running
kubectl: Correctly Configured: pointing to minikube-vm at 192.168.99.103
Edit:
I don't know what happened, but checking the status again returned "Misconfigured". I ran the recommended command $ minikube update-context and now the $ minikube ip points to "172.17.0.1". Pinging this IP returns request timeouts, 100% packet loss. Double-checked context and I'm still using "minikube" both for context and cluster:
$ kubectl config get-cluster
$ kubectl config get-context
$ kubectl get pods
The connection to the server 192.168.99.103:8443 was refused - did you specify the right host or port?
Reading github issues, I ran into this one: kubernetes#44665. So...
$ ls /etc/kubernetes
ls: /etc/kubernetes: No such file or directory
Only the last few entries
$ minikube logs
May 03 18:10:48 minikube kubelet[3405]: E0503 18:10:47.933251 3405 event.go:209] Unable to write event: 'Patch https://192.168.99.103:8443/api/v1/namespaces/default/events/minikube.152b315ce3475a80: dial tcp 192.168.99.103:8443: getsockopt: connection refused' (may retry after sleeping)
May 03 18:10:49 minikube kubelet[3405]: E0503 18:10:49.160920 3405 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:465: Failed to list *v1.Service: Get https://192.168.99.103:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.99.103:8443: getsockopt: connection refused
May 03 18:10:51 minikube kubelet[3405]: E0503 18:10:51.670344 3405 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.99.103:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dminikube&limit=500&resourceVersion=0: dial tcp 192.168.99.103:8443: getsockopt: connection refused
May 03 18:10:53 minikube kubelet[3405]: W0503 18:10:53.017289 3405 status_manager.go:459] Failed to get status for pod "kube-controller-manager-minikube_kube-system(c801aa20d5b60df68810fccc384efdd5)": Get https://192.168.99.103:8443/api/v1/namespaces/kube-system/pods/kube-controller-manager-minikube: dial tcp 192.168.99.103:8443: getsockopt: connection refused
May 03 18:10:53 minikube kubelet[3405]: E0503 18:10:52.595134 3405 rkt.go:65] detectRktContainers: listRunningPods failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
I'm not exactly sure how to ping an https URL, but if I ping the IP:
$ ping 192.168.99.103
PING 192.168.99.103 (192.168.99.103): 56 data bytes
64 bytes from 192.168.99.103: icmp_seq=0 ttl=64 time=4.632 ms
64 bytes from 192.168.99.103: icmp_seq=1 ttl=64 time=0.363 ms
64 bytes from 192.168.99.103: icmp_seq=2 ttl=64 time=0.826 ms
^C
--- 192.168.99.103 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.363/1.940/4.632/1.913 ms
Looking at kube config file...
$ cat ~/.kube/config
apiVersion: v1
clusters:
- cluster:
insecure-skip-tls-verify: true
server: https://localhost:6443
name: docker-for-desktop-cluster
- cluster:
certificate-authority: /Users/philipyoo/.minikube/ca.crt
server: https://192.168.99.103:8443
name: minikube
contexts:
- context:
cluster: docker-for-desktop-cluster
user: docker-for-desktop
name: docker-for-desktop
- context:
cluster: minikube
user: minikube
name: minikube
current-context: minikube
kind: Config
preferences: {}
users:
- name: docker-for-desktop
user:
client-certificate-data: <removed>
client-key-data: <removed>
- name: minikube
user:
client-certificate: /Users/philipyoo/.minikube/client.crt
client-key: /Users/philipyoo/.minikube/client.key
And to make sure my key/crts are there:
$ ls ~/.minikube
addons/ ca.pem* client.key machines/ proxy-client.key
apiserver.crt cache/ config/ profiles/
apiserver.key cert.pem* files/ proxy-client-ca.crt
ca.crt certs/ key.pem* proxy-client-ca.key
ca.key client.crt logs/ proxy-client.crt
Any help in debugging is super appreciated!
For posterity, the solution to this problem was to delete the .minikube directory in the user's home directory and then try again. This often fixes strange minikube problems.
I had the same issue when I started minikube.
OS
MacOs HighSierra
Minikube
minikube version: v0.33.1
kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.11", GitCommit:"637c7e288581ee40ab4ca210618a89a555b6e7e9", GitTreeState:"clean", BuildDate:"2018-11-26T14:38:32Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:28:14Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}
Solution 1
I just changed the permissions on the kubeadm file and started minikube as below; then it works fine.
sudo chmod 777 /Users/buddhi/.minikube/cache/v1.13.2/kubeadm
In general, you have to do
sudo chmod 777 <PATH_TO_THE_KUBEADM_FILE>
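As a side note, chmod 777 also makes the file world-writable; read and execute permissions are likely all minikube needs to transfer and run the cached binary, so a narrower mode works too. A small illustration, using a temporary file as a stand-in for the cached kubeadm:

```shell
# Stand-in for ~/.minikube/cache/<version>/kubeadm.
f=$(mktemp)

# 755 = owner rwx, group/others r-x: readable and executable, but not world-writable.
chmod 755 "$f"
ls -l "$f" | cut -c1-10   # prints -rwxr-xr-x

rm -f "$f"
```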
Solution 2
If you no longer need the existing minikube cluster, you can try this:
minikube stop
minikube delete
minikube start
Here you stop and delete the existing minikube cluster and create another one. Hope this helps someone.

Error in starting pods- kubernetes. Pods remain in ContainerCreating state

I have installed a Kubernetes trial version with minikube on my desktop running Ubuntu. However, there seems to be some issue with bringing up the pods.
kubectl get pods --all-namespaces shows all the pods in ContainerCreating state, and they never shift to Ready.
Even when I try the kubernetes-dashboard, I get:
Waiting, endpoint for service is not ready yet.
Minikube version : v0.20.0
Environment:
OS (e.g. from /etc/os-release): Ubuntu 12.04.5 LTS
VM Driver "DriverName": "virtualbox"
ISO version "Boot2DockerURL":
"file:///home/nszig/.minikube/cache/iso/minikube-v0.20.0.iso"
I have installed minikube and kubectl on Ubuntu. However, I cannot access the dashboard either through the CLI or through the GUI.
http://127.0.0.1:8001/ui gives the below error:
{ "kind": "Status", "apiVersion": "v1", "metadata": {}, "status": "Failure", "message": "no endpoints available for service "kubernetes-dashboard"", "reason": "ServiceUnavailable", "code": 503 }
And minikube dashboard on the CLI does not open the dashboard. Output:
Waiting, endpoint for service is not ready yet...
Waiting, endpoint for service is not ready yet...
Waiting, endpoint for service is not ready yet...
Waiting, endpoint for service is not ready yet...
.......
Could not find finalized endpoint being pointed to by kubernetes-dashboard: Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
kubectl version: Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-29T23:15:59Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"dirty", BuildDate:"2017-06-22T04:31:09Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
minikube logs also reports the errors below:
.....
Jul 10 08:46:12 minikube localkube[3237]: I0710 08:46:12.901880 3237 kuberuntime_manager.go:458] Container {Name:php-redis Image:gcr.io/google-samples/gb-frontend:v4 Command:[] Args:[] WorkingDir: Ports:[{Name: HostPort:0 ContainerPort:80 Protocol:TCP HostIP:}] EnvFrom:[] Env:[{Name:GET_HOSTS_FROM Value:dns ValueFrom:nil}] Resources:{Limits:map[] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:} s:100m Format:DecimalSI} memory:{i:{value:104857600 scale:0} d:{Dec:} s:100Mi Format:BinarySI}]} VolumeMounts:[{Name:default-token-gqtvf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it. Jul 10 08:46:14 minikube localkube[3237]: E0710 08:46:14.139555 3237 remote_runtime.go:86] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = unable to pull sandbox image "gcr.io/google_containers/pause-amd64:3.0": Error response from daemon: Get https://gcr.io/v1/_ping: x509: certificate signed by unknown authority ....
Name:           kubernetes-dashboard-2039414953-czptd
Namespace:      kube-system
Node:           minikube/192.168.99.102
Start Time:     Fri, 14 Jul 2017 09:31:58 +0530
Labels:         k8s-app=kubernetes-dashboard
                pod-template-hash=2039414953
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kubernetes-dashboard-2039414953","uid":"2eb39682-6849-11e7-8...
Status:         Pending
IP:
Created By:     ReplicaSet/kubernetes-dashboard-2039414953
Controlled By:  ReplicaSet/kubernetes-dashboard-2039414953
Containers:
kubernetes-dashboard:
Container ID:
Image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.1
Image ID:
Port: 9090/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Liveness: http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment:
Mounts:
    /var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-12gdj (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  kubernetes-dashboard-token-12gdj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubernetes-dashboard-token-12gdj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule
Events:
  FirstSeen  LastSeen  Count  From               SubObjectPath  Type     Reason      Message
  ---------  --------  -----  ----               -------------  ----     ------      -------
  1h         11s       443    kubelet, minikube                 Warning  FailedSync  Error syncing pod, skipping: failed to "CreatePodSandbox" for "kubernetes-dashboard-2039414953-czptd_kube-system(2eb57d9b-6849-11e7-8a56-080027206461)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kubernetes-dashboard-2039414953-czptd_kube-system(2eb57d9b-6849-11e7-8a56-080027206461)\" failed: rpc error: code = 2 desc = unable to pull sandbox image \"gcr.io/google_containers/pause-amd64:3.0\": Error response from daemon: Get https://gcr.io/v1/_ping: x509: certificate signed by unknown authority"
It's quite possible that the Pod container images are being downloaded. The images are not very large so the images should get downloaded pretty quickly on a decent internet connection.
You can use kubectl describe pod --namespace kube-system <pod-name> to get more details on the pod's bring-up status. Take a look at the Events section of the output.
Until all the kubernetes components in the kube-system namespace are in READY state, you will not be able to access the dashboard.
You can also try SSH'ing into the minikube vm with minikube ssh to debug the issue.
I was able to resolve this issue by doing a clean install using a VPN connection, as I had restrictions in my corporate network that were blocking the site from which the install was trying to pull the sandbox image.
Try using:
kubectl config use-context minikube
...as a preexisting configuration may have been initiated.
I did these steps and it worked for me.
ON MASTER ONLY
####################
kubeadm init --apiserver-advertise-address=0.0.0.0 --pod-network-cidr=10.244.0.0/16
(copy the join command)
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
ON WORKER NODE ##
###################
kubeadm reset
Then execute the join command you got from the master after kubeadm init:
#kubeadm join