Error starting pods in Kubernetes: pods remain in ContainerCreating state

I have installed a trial version of Kubernetes with minikube on my desktop running Ubuntu. However, there seems to be some issue with bringing up the pods.
kubectl get pods --all-namespaces shows all the pods in ContainerCreating state, and they never shift to Ready.
Even when I run minikube dashboard, I get
Waiting, endpoint for service is not ready yet.
Minikube version : v0.20.0
Environment:
OS (e.g. from /etc/os-release): Ubuntu 12.04.5 LTS
VM Driver "DriverName": "virtualbox"
ISO version "Boot2DockerURL":
"file:///home/nszig/.minikube/cache/iso/minikube-v0.20.0.iso"
I have installed minikube and kubectl on Ubuntu. However, I cannot access the dashboard either through the CLI or through the GUI.
http://127.0.0.1:8001/ui gives the error below:
{ "kind": "Status", "apiVersion": "v1", "metadata": {}, "status": "Failure", "message": "no endpoints available for service \"kubernetes-dashboard\"", "reason": "ServiceUnavailable", "code": 503 }
And minikube dashboard on the CLI does not open the dashboard. Output:
Waiting, endpoint for service is not ready yet...
Waiting, endpoint for service is not ready yet...
Waiting, endpoint for service is not ready yet...
Waiting, endpoint for service is not ready yet...
.......
Could not find finalized endpoint being pointed to by kubernetes-dashboard: Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
kubectl version: Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-29T23:15:59Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"dirty", BuildDate:"2017-06-22T04:31:09Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
minikube logs also reports the errors below:
.....
Jul 10 08:46:12 minikube localkube[3237]: I0710 08:46:12.901880 3237 kuberuntime_manager.go:458] Container {Name:php-redis Image:gcr.io/google-samples/gb-frontend:v4 Command:[] Args:[] WorkingDir: Ports:[{Name: HostPort:0 ContainerPort:80 Protocol:TCP HostIP:}] EnvFrom:[] Env:[{Name:GET_HOSTS_FROM Value:dns ValueFrom:nil}] Resources:{Limits:map[] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:} s:100m Format:DecimalSI} memory:{i:{value:104857600 scale:0} d:{Dec:} s:100Mi Format:BinarySI}]} VolumeMounts:[{Name:default-token-gqtvf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it. Jul 10 08:46:14 minikube localkube[3237]: E0710 08:46:14.139555 3237 remote_runtime.go:86] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = unable to pull sandbox image "gcr.io/google_containers/pause-amd64:3.0": Error response from daemon: Get https://gcr.io/v1/_ping: x509: certificate signed by unknown authority ....
Name: kubernetes-dashboard-2039414953-czptd
Namespace: kube-system
Node: minikube/192.168.99.102
Start Time: Fri, 14 Jul 2017 09:31:58 +0530
Labels: k8s-app=kubernetes-dashboard pod-template-hash=2039414953
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kubernetes-dashboard-2039414953","uid":"2eb39682-6849-11e7-8...
Status: Pending
IP:
Created By: ReplicaSet/kubernetes-dashboard-2039414953
Controlled By: ReplicaSet/kubernetes-dashboard-2039414953
Containers:
kubernetes-dashboard:
Container ID:
Image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.1
Image ID:
Port: 9090/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Liveness: http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-12gdj (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
kubernetes-dashboard-token-12gdj:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-token-12gdj
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node-role.kubernetes.io/master:NoSchedule
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1h 11s 443 kubelet, minikube Warning FailedSync Error syncing pod, skipping: failed to "CreatePodSandbox" for "kubernetes-dashboard-2039414953-czptd_kube-system(2eb57d9b-6849-11e7-8a56-080027206461)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kubernetes-dashboard-2039414953-czptd_kube-system(2eb57d9b-6849-11e7-8a56-080027206461)\" failed: rpc error: code = 2 desc = unable to pull sandbox image \"gcr.io/google_containers/pause-amd64:3.0\": Error response from daemon: Get https://gcr.io/v1/_ping: x509: certificate signed by unknown authority"

It's quite possible that the Pod container images are still being downloaded. The images are not very large, so they should download fairly quickly on a decent internet connection.
You can use kubectl describe pod --namespace kube-system <pod-name> to get more details on the pod's startup status. Take a look at the Events section of the output.
Until all the Kubernetes components in the kube-system namespace are in the READY state, you will not be able to access the dashboard.
You can also try SSH'ing into the minikube VM with minikube ssh to debug the issue.
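For example, a quick debugging loop might look like this (the dashboard pod name below is the one from the question; substitute the pod you are inspecting):
# List everything and note which pods are not Ready
kubectl get pods --all-namespaces
# Inspect the stuck pod; the Events section at the bottom usually names the failing step
kubectl describe pod --namespace kube-system kubernetes-dashboard-2039414953-czptd
# Open a shell in the minikube VM and check whether the Docker daemon there can reach gcr.io at all
minikube ssh
docker pull gcr.io/google_containers/pause-amd64:3.0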

I was able to resolve this issue by doing a clean install over a VPN connection, as I had restrictions in my corporate network. The network was blocking the site from which the install was trying to pull the sandbox image.
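If you suspect the same kind of interception (the x509: certificate signed by unknown authority error in the logs is a typical symptom of a corporate proxy re-signing TLS traffic), a rough check from the host is to look at who actually issues the certificate for gcr.io:
# If the issuer shown is your corporate proxy rather than a public CA,
# the registry traffic is being intercepted and the pull will keep failing
openssl s_client -connect gcr.io:443 -servername gcr.io </dev/null 2>/dev/null | openssl x509 -noout -issuer -subject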

Try using:
kubectl config use-context minikube
...as a pre-existing configuration may have been left active.
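Before switching, you can check which context kubectl is currently pointed at (standard kubectl commands, nothing specific to this setup):
# Show the active context and all contexts known to your kubeconfig
kubectl config current-context
kubectl config get-contexts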

I did the following and it worked for me.
ON MASTER ONLY
####################
kubeadm init --apiserver-advertise-address=0.0.0.0 --pod-network-cidr=10.244.0.0/16
(copy the join command that kubeadm init prints)
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
ON WORKER NODE ##
###################
kubeadm reset
Execute the join command which you got from the master after kubeadm init (its general form is sketched below).
#kubeadm join
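For reference, the join command printed by kubeadm init generally has the shape below; the address, token, and hash here are placeholders, not values from any particular cluster:
# Run on the worker node, using the exact values printed by kubeadm init on the master
kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>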

Related

Readiness fails in the Eclipse Hono pods of the Cloud2Edge package

I am a bit desperate and I hope someone can help me. A few months ago I installed the Eclipse Cloud2Edge package on a Kubernetes cluster by following the installation instructions, creating a PersistentVolume, and running the helm install command with these options:
helm install -n $NS --wait --timeout 15m $RELEASE eclipse-iot/cloud2edge --set hono.prometheus.createInstance=false --set hono.grafana.enabled=false --dependency-update --debug
The YAML of the PersistentVolume is the following, and I create it in the same namespace where I install the package.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-device-registry
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Mi
  hostPath:
    path: /mnt/
    type: Directory
Everything worked perfectly, and all pods were ready and running, until the other day when the cluster crashed and some pods stopped working.
The kubectl get pods -n $NS output is as follows:
NAME READY STATUS RESTARTS AGE
ditto-mongodb-7b78b468fb-8kshj 1/1 Running 0 50m
dt-adapter-amqp-vertx-6699ccf495-fc8nx 0/1 Running 0 50m
dt-adapter-http-vertx-545564ff9f-gx5fp 0/1 Running 0 50m
dt-adapter-mqtt-vertx-58c8975678-k5n49 0/1 Running 0 50m
dt-artemis-6759fb6cb8-5rq8p 1/1 Running 1 50m
dt-dispatch-router-5bc7586f76-57dwb 1/1 Running 0 50m
dt-ditto-concierge-f6d5f6f9c-pfmcw 1/1 Running 0 50m
dt-ditto-connectivity-f556db698-q89bw 1/1 Running 0 50m
dt-ditto-gateway-589d8f5596-59c5b 1/1 Running 0 50m
dt-ditto-nginx-897b5bc76-cx2dr 1/1 Running 0 50m
dt-ditto-policies-75cb5c6557-j5zdg 1/1 Running 0 50m
dt-ditto-swaggerui-6f6f989ccd-jkhsk 1/1 Running 0 50m
dt-ditto-things-79ff869bc9-l9lct 1/1 Running 0 50m
dt-ditto-thingssearch-58c5578bb9-pwd9k 1/1 Running 0 50m
dt-service-auth-698d4cdfff-ch5wp 1/1 Running 0 50m
dt-service-command-router-59d6556b5f-4nfcj 0/1 Running 0 50m
dt-service-device-registry-7cf75d794f-pk9ct 0/1 Running 0 50m
The pods that fail all have the same error when running kubectl describe pod POD_NAME -n $NS.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 53m default-scheduler Successfully assigned digitaltwins/dt-service-command-router-59d6556b5f-4nfcj to node1
Normal Pulled 53m kubelet Container image "index.docker.io/eclipse/hono-service-command-router:1.8.0" already present on machine
Normal Created 53m kubelet Created container service-command-router
Normal Started 53m kubelet Started container service-command-router
Warning Unhealthy 52m kubelet Readiness probe failed: Get "https://10.244.1.89:8088/readiness": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 2m58s (x295 over 51m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
According to this, the readinessProbe fails. In the YAML definition of the affected deployments, the readinessProbe is defined as:
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /readiness
    port: health
    scheme: HTTPS
  initialDelaySeconds: 45
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
I have tried increasing these values, raising the delay to 600 and the timeout to 10. I have also tried uninstalling the package and installing it again, but nothing changes: the installation fails because the pods are never ready and the timeout pops up. I have also exposed port 8088 (health) and called /readiness with wget, and the result is still 503. On the other hand, I have tested that the livenessProbe works, and it works fine. I have also tried resetting the cluster: first I manually deleted everything in it and then used the following commands:
sudo kubeadm reset
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
sudo systemctl stop kubelet
sudo systemctl stop docker
sudo rm -rf /var/lib/cni/
sudo rm -rf /var/lib/kubelet/*
sudo rm -rf /etc/cni/
sudo ifconfig cni0 down
sudo ifconfig flannel.1 down
sudo ifconfig docker0 down
sudo ip link set cni0 down
sudo brctl delbr cni0
sudo systemctl start docker
sudo kubeadm init --apiserver-advertise-address=192.168.44.11 --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl --kubeconfig $HOME/.kube/config apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
The cluster seems to work fine because the Eclipse Ditto part has no problem, it's just the Eclipse Hono part. I add a little more information in case it may be useful.
The kubectl logs dt-service-command-router-b654c8dcb-s2g6t -n $NS output:
12:30:06.340 [vert.x-eventloop-thread-1] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin /10.244.1.101:44142 failed to connect over ssl: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
12:30:06.756 [vert.x-eventloop-thread-1] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin /10.244.1.100:46550 failed to connect over ssl: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
12:30:07.876 [vert.x-eventloop-thread-1] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin /10.244.1.102:40706 failed to connect over ssl: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.client.impl.HonoConnectionImpl - starting attempt [#258] to connect to server [dt-service-device-registry:5671, role: Device Registration]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - OpenSSL [available: false, supports KeyManagerFactory: false]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - using JDK's default SSL engine
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.3]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.2]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - connecting to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Device Registration]
12:30:08.339 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - can't connect to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Device Registration]: Failed to create SSL connection
12:30:08.339 [vert.x-eventloop-thread-1] WARN o.e.h.client.impl.HonoConnectionImpl - attempt [#258] to connect to server [dt-service-device-registry:5671, role: Device Registration] failed
javax.net.ssl.SSLHandshakeException: Failed to create SSL connection
The kubectl logs dt-adapter-amqp-vertx-74d69cbc44-7kmdq -n $NS output:
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.client.impl.HonoConnectionImpl - starting attempt [#19] to connect to server [dt-service-device-registry:5671, role: Credentials]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - OpenSSL [available: false, supports KeyManagerFactory: false]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - using JDK's default SSL engine
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.3]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.2]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - connecting to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Credentials]
12:19:36.711 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - can't connect to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Credentials]: Failed to create SSL connection
12:19:36.712 [vert.x-eventloop-thread-0] WARN o.e.h.client.impl.HonoConnectionImpl - attempt [#19] to connect to server [dt-service-device-registry:5671, role: Credentials] failed
javax.net.ssl.SSLHandshakeException: Failed to create SSL connection
The kubectl version output is as follows:
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.16", GitCommit:"e37e4ab4cc8dcda84f1344dda47a97bb1927d074", GitTreeState:"clean", BuildDate:"2021-10-27T16:20:18Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Thanks in advance!
Based on the iconic Failed to create SSL connection output in the logs, I assume that you have run into the dreaded The demo certificates included in the Hono chart have expired problem.
The Cloud2Edge package chart is currently being updated (https://github.com/eclipse/packages/pull/337) with the most recent versions of the Ditto and Hono charts (which include fresh certificates that are valid for two more years to come). As soon as that PR is merged and the Eclipse Packages chart repository has been rebuilt, you should be able to do a helm repo update and then (hopefully) successfully install the c2e package.
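Once the updated chart is published, the reinstall would presumably look something like this, reusing the $NS, $RELEASE, and chart options from the question (a sketch, not an official procedure):
# Refresh the chart index, remove the broken release, then reinstall with the same options
helm repo update
helm uninstall $RELEASE -n $NS
helm install -n $NS --wait --timeout 15m $RELEASE eclipse-iot/cloud2edge --set hono.prometheus.createInstance=false --set hono.grafana.enabled=false --dependency-update --debug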

Cannot pull any image in minikube

I'm on macOS and I'm using minikube with the hyperkit driver: minikube start --driver=hyperkit
and everything seems ok...
with minikube status:
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
with minikube version:
minikube version: v1.24.0
with kubectl version:
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:48:33Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:35:25Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}
and with kubectl get no:
NAME STATUS ROLES AGE VERSION
minikube Ready control-plane,master 13m v1.22.3
My problem is that when I deploy anything, it won't pull any image.
For instance:
kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.4
then kubectl get pods:
NAME READY STATUS RESTARTS AGE
hello-minikube-6ddfcc9757-nfc64 0/1 ImagePullBackOff 0 13m
Then I tried to figure out what the problem is:
k describe pod/hello-minikube-6ddfcc9757-nfc64
here is the result:
Name: hello-minikube-6ddfcc9757-nfc64
Namespace: default
Priority: 0
Node: minikube/192.168.64.8
Start Time: Sun, 16 Jan 2022 10:49:27 +0330
Labels: app=hello-minikube
pod-template-hash=6ddfcc9757
Annotations: <none>
Status: Pending
IP: 172.17.0.5
IPs:
IP: 172.17.0.5
Controlled By: ReplicaSet/hello-minikube-6ddfcc9757
Containers:
echoserver:
Container ID:
Image: k8s.gcr.io/echoserver:1.4
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-k5qql (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-k5qql:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned default/hello-minikube-6ddfcc9757-nfc64 to minikube
Normal Pulling 16m (x4 over 18m) kubelet Pulling image "k8s.gcr.io/echoserver:1.4"
Warning Failed 16m (x4 over 18m) kubelet Failed to pull image "k8s.gcr.io/echoserver:1.4": rpc error: code = Unknown desc = Error response from daemon: Get "https://k8s.gcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Failed 16m (x4 over 18m) kubelet Error: ErrImagePull
Warning Failed 15m (x6 over 18m) kubelet Error: ImagePullBackOff
Normal BackOff 3m34s (x59 over 18m) kubelet Back-off pulling image "k8s.gcr.io/echoserver:1.4"
Then I tried to get some logs:
k logs pod/hello-minikube-6ddfcc9757-nfc64 and k logs deploy/hello-minikube
Both return the same result:
Error from server (BadRequest): container "echoserver" in pod "hello-minikube-6ddfcc9757-nfc64" is waiting to start: trying and failing to pull image
This deployment was an example from the minikube documentation,
but I have no idea why it doesn't pull any image.
I had exactly the same problem.
I found out that my internet connection was slow;
the timeout to pull an image is 120 seconds, so the image could not be pulled in under 120 seconds.
First use minikube to pull the image you need,
for example:
minikube image load k8s.gcr.io/echoserver:1.4
and then everything will work, because the cluster will now use the image that is stored locally.
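To confirm the image is actually cached in the cluster before redeploying (minikube image ls should be available in a minikube as recent as the v1.24.0 used here):
# List images already present inside the minikube node; echoserver should now appear
minikube image ls | grep echoserver
# Recreate the deployment; with a locally present image and the default IfNotPresent pull policy, no pull is needed
kubectl delete deployment hello-minikube
kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.4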
According to this article:
The status ImagePullBackOff means that a Pod couldn’t start, because Kubernetes couldn’t pull a container image. The ‘BackOff’ part means that Kubernetes will keep trying to pull the image, with an increasing delay (‘back-off’).
Here is also a handbook about pushing images into a minikube cluster.
This handbook describes your issue:
Unable to pull images..Client.Timeout exceeded while awaiting headers
Unable to pull images, which may be OK:
failed to pull image "k8s.gcr.io/kube-apiserver:v1.13.3": output: Error response from daemon:
Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection
(Client.Timeout exceeded while awaiting headers)
This error indicates that the container runtime running within the VM does not have access to the internet.
See possible workarounds.
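One way to confirm this on your side (assuming the default Docker runtime inside the minikube VM) is to SSH into the VM and try the pull manually; if it times out there too, the VM itself has no route to the registry:
# From the host: open a shell inside the minikube VM
minikube ssh
# Inside the VM: pull the image directly with the container runtime
docker pull k8s.gcr.io/echoserver:1.4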
I encountered a similar issue; it was fixed by using echoserver:1.10 instead of echoserver:1.4.
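If you want to try that, only the tag in the deployment from the question changes (assuming the 1.10 tag is reachable from your network):
kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.10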

Helm2 could not find tiller, but the tiller pod is okay

before helm2 init, I've run:
kubectl delete all --all -n tiller-ns
rm -rf -- ~/.helm
then I run this:
helm2 init --tiller-namespace tiller-ns --stable-repo-url https://charts.helm.sh/stable
out:
Creating /root/.helm
Creating /root/.helm/repository
Creating /root/.helm/repository/cache
Creating /root/.helm/repository/local
Creating /root/.helm/plugins
Creating /root/.helm/starters
Creating /root/.helm/cache/archive
Creating /root/.helm/repository/repositories.yaml
Adding stable repo with URL: https://charts.helm.sh/stable
Adding local repo with URL: http://127.0.0.1:8879/charts
$HELM_HOME has been configured at /root/.helm.
Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://v2.helm.sh/docs/securing_installation/
then, check it:
helm2 version
out:
Client: &version.Version{SemVer:"v2.16.12", GitCommit:"47f0b88409e71fd9ca272abc7cd762a56a1c613e", GitTreeState:"clean"}
Error: could not find tiller
see the tiller pod:
kubectl get all -n tiller-ns
out:
NAME READY STATUS RESTARTS AGE
pod/tiller-deploy-86b4fc48c9-x6tk9 1/1 Running 0 2m24s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/tiller-deploy ClusterIP 10.107.83.126 <none> 44134/TCP 2m24s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/tiller-deploy 1/1 1 1 2m24s
NAME DESIRED CURRENT READY AGE
replicaset.apps/tiller-deploy-86b4fc48c9 1 1 1 2m24s
How about describing it?
kubectl describe po tiller-deploy-86b4fc48c9-x6tk9 -n tiller-ns
out:
Name: tiller-deploy-86b4fc48c9-x6tk9
Namespace: tiller-ns
Priority: 0
Node: node-04/192.168.3.144
Start Time: Mon, 22 Nov 2021 18:57:44 +0800
Labels: app=helm
name=tiller
pod-template-hash=86b4fc48c9
Annotations: cni.projectcalico.org/podIP: 100.103.62.181/32
cni.projectcalico.org/podIPs: 100.103.62.181/32
Status: Running
IP: 100.103.62.181
IPs:
IP: 100.103.62.181
Controlled By: ReplicaSet/tiller-deploy-86b4fc48c9
Containers:
tiller:
Container ID: docker://2eaac0dfcea9a77a591b0c612549265a59df126c855f65089e1ba3402305b518
Image: gcr.io/kubernetes-helm/tiller:v2.16.12
Image ID: docker://sha256:c11838703bf83ca52d95fe66b0db8d6a8797581fdd132925c27cbc9a70e1d9e1
Ports: 44134/TCP, 44135/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Mon, 22 Nov 2021 18:57:48 +0800
Ready: True
Restart Count: 0
Liveness: http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
Environment:
TILLER_NAMESPACE: tiller-ns
TILLER_HISTORY_MAX: 0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-cppv9 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-cppv9:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-cppv9
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m18s default-scheduler Successfully assigned tiller-ns/tiller-deploy-86b4fc48c9-x6tk9 to node-04
Normal Pulled 3m14s kubelet Container image "gcr.io/kubernetes-helm/tiller:v2.16.12" already present on machine
Normal Created 3m14s kubelet Created container tiller
Normal Started 3m13s kubelet Started container tiller
And the logs?
kubectl logs tiller-deploy-86b4fc48c9-x6tk9 -n tiller-ns
out:
[main] 2021/11/22 10:57:49 Starting Tiller v2.16.12 (tls=false)
[main] 2021/11/22 10:57:49 GRPC listening on :44134
[main] 2021/11/22 10:57:49 Probes listening on :44135
[main] 2021/11/22 10:57:49 Storage driver is ConfigMap
[main] 2021/11/22 10:57:49 Max history per release is 0
then I run again:
helm2 version
out:
Client: &version.Version{SemVer:"v2.16.12", GitCommit:"47f0b88409e71fd9ca272abc7cd762a56a1c613e", GitTreeState:"clean"}
Error: could not find tiller
what the hell ??
If you're using the now-unsupported Helm 2, and you configure it for a non-default Tiller namespace, it doesn't remember that. You either need to include the --tiller-namespace option in every helm invocation, or set the $TILLER_NAMESPACE environment variable.
export TILLER_NAMESPACE=tiller-ns
helm2 version
(As of this writing, Helm 2 has been out of support for over a year, and several community charts have moved over to Helm-3-only syntax. You should upgrade if you can, though I acknowledge that changing deployment systems in a production environment is tricky. My experience has been that most charts written for Helm 2 work unmodified in Helm 3.)
Well, I've found the way to get it right:
helm2 version --tiller-namespace tiller-ns
out:
Client: &version.Version{SemVer:"v2.16.12", GitCommit:"47f0b88409e71fd9ca272abc7cd762a56a1c613e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.16.12", GitCommit:"47f0b88409e71fd9ca272abc7cd762a56a1c613e", GitTreeState:"clean"}
I found this way via: https://stackoverflow.com/a/51662259/15861205

Helm install or upgrade release failed on Kubernetes cluster: the server could not find the requested resource or UPGRADE FAILED: no deployed releases

I use Helm to deploy charts on my Kubernetes cluster. Since one day, I can't deploy a new chart or upgrade an existing one.
Indeed, each time I use Helm I get an error message telling me that it is not possible to install or upgrade resources.
If I run helm install --name foo . -f values.yaml --namespace foo-namespace I have this output:
Error: release foo failed: the server could not find the requested
resource
If I run helm upgrade --install foo . -f values.yaml --namespace foo-namespace or helm upgrade foo . -f values.yaml --namespace foo-namespace I have this error:
Error: UPGRADE FAILED: "foo" has no deployed releases
I don't really understand why.
This is my helm version:
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
On my kubernetes cluster I have tiller deployed with the same version, when I run kubectl describe pods tiller-deploy-84b... -n kube-system:
Name: tiller-deploy-84b8...
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: k8s-worker-1/167.114.249.216
Start Time: Tue, 26 Feb 2019 10:50:21 +0100
Labels: app=helm
name=tiller
pod-template-hash=84b...
Annotations: <none>
Status: Running
IP: <IP_NUMBER>
Controlled By: ReplicaSet/tiller-deploy-84b8...
Containers:
tiller:
Container ID: docker://0302f9957d5d83db22...
Image: gcr.io/kubernetes-helm/tiller:v2.12.3
Image ID: docker-pullable://gcr.io/kubernetes-helm/tiller#sha256:cab750b402d24d...
Ports: 44134/TCP, 44135/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Tue, 26 Feb 2019 10:50:28 +0100
Ready: True
Restart Count: 0
Liveness: http-get http://:44135/liveness delay=1s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:44135/readiness delay=1s timeout=1s period=10s #success=1 #failure=3
Environment:
TILLER_NAMESPACE: kube-system
TILLER_HISTORY_MAX: 0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from helm-token-... (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
helm-token-...:
Type: Secret (a volume populated by a Secret)
SecretName: helm-token-...
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 26m default-scheduler Successfully assigned kube-system/tiller-deploy-84b86cbc59-kxjqv to worker-1
Normal Pulling 26m kubelet, k8s-worker-1 pulling image "gcr.io/kubernetes-helm/tiller:v2.12.3"
Normal Pulled 26m kubelet, k8s-worker-1 Successfully pulled image "gcr.io/kubernetes-helm/tiller:v2.12.3"
Normal Created 26m kubelet, k8s-worker-1 Created container
Normal Started 26m kubelet, k8s-worker-1 Started container
Has anyone faced the same issue?
Update:
This the folder structure of my actual chart named foo:
structure folder of the chart:
> templates/
> deployment.yaml
> ingress.yaml
> service.yaml
> .helmignore
> Chart.yaml
> values.yaml
I have already tried to delete the failed chart using the delete command helm del --purge foo, but the same errors occurred.
Just to be more precise, the chart foo is in fact a custom chart using my own private registry. The imagePullSecrets are set up as usual.
I have run these two commands, helm upgrade foo . -f values.yaml --namespace foo-namespace --force and helm upgrade --install foo . -f values.yaml --namespace foo-namespace --force, and I still get an error:
UPGRADE FAILED
ROLLING BACK
Error: failed to create resource: the server could not find the requested resource
Error: UPGRADE FAILED: failed to create resource: the server could not find the requested resource
Notice that foo-namespace already exists, so the error doesn't come from the namespace name or the namespace itself. Indeed, if I run helm list, I can see that the foo chart is in a FAILED status.
Tiller stores all releases as ConfigMaps in Tiller's namespace (kube-system in your case). Try to find the broken release and delete its ConfigMap using these commands:
$ kubectl get cm --all-namespaces -l OWNER=TILLER
NAMESPACE NAME DATA AGE
kube-system nginx-ingress.v1 1 22h
$ kubectl delete cm nginx-ingress.v1 -n kube-system
Next, delete all release objects (deployments, services, ingresses, etc.) manually and reinstall the release using Helm again, roughly as sketched below.
If that doesn't help, you may try to download a newer release of Helm (v2.14.3 at the moment) and update/reinstall Tiller.
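A rough sketch of that cleanup, using the release name foo and namespace from the question (the ConfigMap name assumes the first revision, and the label selector assumes the chart labels its resources with the release name, which is common but not guaranteed):
# Find and remove the broken release record kept by Tiller
kubectl get cm -n kube-system -l OWNER=TILLER
kubectl delete cm foo.v1 -n kube-system
# Delete the release's Kubernetes objects, then reinstall
kubectl delete deployment,service,ingress -l release=foo -n foo-namespace
helm install --name foo . -f values.yaml --namespace foo-namespace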
I had the same issue, but the cleanup did not help, and trying the same Helm chart on a brand-new k8s cluster did not help either.
It turned out that a missing apiVersion caused the problem. I found it by running
helm install xyz --dry-run
copying the output to a new test.yaml file, and using
kubectl apply -f test.yaml
There I saw the error (the apiVersion line had been moved onto a comment line).
I had the same problem, but not due to broken releases; it started after upgrading Helm. It seems newer versions of Helm handle the --wait parameter badly. So for anyone facing the same issue: just removing --wait (and keeping --debug) from the helm upgrade parameters solved my issue.
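In other words, an upgrade invocation along the lines of the one in the question would become something like the following (names reused from the question; drop --wait only if you can live without Helm blocking until resources are ready):
helm upgrade --install foo . -f values.yaml --namespace foo-namespace --debug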
I had this issue when I tried to deploy a custom chart with a CronJob instead of a Deployment. The error occurred at this step in the deploy script. To resolve it, I needed to add the environment variable ROLLOUT_STATUS_DISABLED=true; it is solved in this issue.

failed to set up pod network: Unhandled Exception killed plugin

I'm trying to play with a Kubernetes 1.4 install with rkt containers on CoreOS beta (1185.1.0).
In general, I have two CoreOS PC machines at home that are configured with etcd2 TLS certificates.
I patched the coreos-kubernetes automated generic install script to support etcd2 TLS certificates. The latest versions of the worker and controller install scripts are posted at https://github.com/kfirufk/coreos-kubernetes-multi-node-generic-install-script
I used the following environment variables for the controller CoreOS installation script (IP: 10.79.218.2, domain: coreos-2.tux-in.com):
ADVERTISE_IP=10.79.218.2
ETCD_ENDPOINTS="https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379"
K8S_VER=v1.4.1_coreos.0
HYPERKUBE_IMAGE_REPO=quay.io/coreos/hyperkube
POD_NETWORK=10.2.0.0/16
SERVICE_IP_RANGE=10.3.0.0/24
K8S_SERVICE_IP=10.3.0.1
DNS_SERVICE_IP=10.3.0.10
USE_CALICO=true
CONTAINER_RUNTIME=rkt
ETCD_CERT_FILE="/etc/ssl/etcd/etcd1.pem"
ETCD_KEY_FILE="/etc/ssl/etcd/etcd1-key.pem"
ETCD_TRUSTED_CA_FILE="/etc/ssl/etcd/ca.pem"
ETCD_CLIENT_CERT_AUTH=true
OVERWRITE_ALL_FILES=true
CONTROLLER_HOSTNAME="coreos-2.tux-in.com"
ETCD_CERT_ROOT_DIR="/etc/ssl/etcd"
ETCD_SCHEME="https"
ETCD_AUTHORITY="coreos-2.tux-in.com:2379"
IS_MASK_UPDATE_ENGINE=false
and these are the environment variables I used for the worker CoreOS installation script (IP: 10.79.218.3, domain: coreos-3.tux-in.com):
ETCD_AUTHORITY=coreos-3.tux-in.com:2379
ETCD_ENDPOINTS="https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379"
CONTROLLER_ENDPOINT=https://coreos-2.tux-in.com
K8S_VER=v1.4.1_coreos.0
HYPERKUBE_IMAGE_REPO=quay.io/coreos/hyperkube
DNS_SERVICE_IP=10.3.0.10
USE_CALICO=true
CONTAINER_RUNTIME=rkt
OVERWRITE_ALL_FILES=true
ADVERTISE_IP=10.79.218.3
ETCD_CERT_FILE="/etc/ssl/etcd/etcd2.pem"
ETCD_KEY_FILE="/etc/ssl/etcd/etcd2-key.pem"
ETCD_TRUSTED_CA_FILE="/etc/ssl/etcd/ca.pem"
ETCD_SCHEME="https"
IS_MASK_UPDATE_ENGINE=false
After installing Kubernetes on both machines and configuring kubectl properly, when I type kubectl get nodes I get:
NAME STATUS AGE
10.79.218.2 Ready,SchedulingDisabled 1h
10.79.218.3 Ready 1h
kubectl get pods --namespace=kube-system returns
NAME READY STATUS RESTARTS AGE
heapster-v1.2.0-3646253287-j951o 0/2 ContainerCreating 0 1d
kube-apiserver-10.79.218.2 1/1 Running 0 1d
kube-controller-manager-10.79.218.2 1/1 Running 0 1d
kube-dns-v20-u3pd0 0/3 ContainerCreating 0 1d
kube-proxy-10.79.218.2 1/1 Running 0 1d
kube-proxy-10.79.218.3 1/1 Running 0 1d
kube-scheduler-10.79.218.2 1/1 Running 0 1d
kubernetes-dashboard-v1.4.1-ehiez 0/1 ContainerCreating 0 1d
So heapster-v1.2.0-3646253287-j951o, kube-dns-v20-u3pd0, and kubernetes-dashboard-v1.4.1-ehiez are stuck in ContainerCreating status.
When I run kubectl describe on any of them, I basically get the same error: Error syncing pod, skipping: failed to SyncPod: failed to set up pod network: Unhandled Exception killed plugin.
For example, kubectl describe pods kubernetes-dashboard-v1.4.1-ehiez --namespace kube-system returns:
Name: kubernetes-dashboard-v1.4.1-ehiez
Namespace: kube-system
Node: 10.79.218.3/10.79.218.3
Start Time: Mon, 17 Oct 2016 23:31:43 +0300
Labels: k8s-app=kubernetes-dashboard
kubernetes.io/cluster-service=true
version=v1.4.1
Status: Pending
IP:
Controllers: ReplicationController/kubernetes-dashboard-v1.4.1
Containers:
kubernetes-dashboard:
Container ID:
Image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.4.1
Image ID:
Port: 9090/TCP
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Liveness: http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-svbiv (ro)
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-svbiv:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-svbiv
QoS Class: Guaranteed
Tolerations: CriticalAddonsOnly=:Exists
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1d 25s 9350 {kubelet 10.79.218.3} Warning FailedSync Error syncing pod, skipping: failed to SyncPod: failed to set up pod network: Unhandled Exception killed plugin
I'm guessing that pod networking isn't working because of a faulty Calico configuration,
so I tried to install the calicoctl rkt container, but had problems with that. That's a different Stack Overflow question, though :) starting calicoctl container on coreos
So I can't really check whether Calico works properly.
This is the calico-network systemd service file for the controller node:
[Unit]
Description=Calico per-host agent
Requires=network-online.target
After=network-online.target
[Service]
Slice=machine.slice
Environment=CALICO_DISABLE_FILE_LOGGING=true
Environment=HOSTNAME=10.79.218.3
Environment=IP=10.79.218.3
Environment=FELIX_FELIXHOSTNAME=10.79.218.3
Environment=CALICO_NETWORKING=true
Environment=NO_DEFAULT_POOLS=true
Environment=ETCD_ENDPOINTS=https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379
Environment=ETCD_AUTHORITY=coreos-3.tux-in.com:2379
Environment=ETCD_SCHEME=https
Environment=ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem
Environment=ETCD_CERT_FILE=/etc/ssl/etcd/etcd2.pem
Environment=ETCD_KEY_FILE=/etc/ssl/etcd/etcd2-key.pem
ExecStart=/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci --volume=var-run-calico,kind=host,source=/var/run/calico --volume=modules,kind=host,source=/lib/modules,readOnly=false --mount=volume=modules,target=/lib/modules --volume=dns,kind=host,source=/etc/resolv.conf,readOnly=true --volume=etcd-tls-certs,kind=host,source=/etc/ssl/etcd,readOnly=true --mount=volume=dns,target=/etc/resolv.conf --mount=volume=etcd-tls-certs,target=/etc/ssl/etcd --mount=volume=var-run-calico,target=/var/run/calico --trust-keys-from-https quay.io/calico/node:v0.22.0
KillMode=mixed
Restart=always
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
and this is the calico-node service file for the worker node:
[Unit]
Description=Calico per-host agent
Requires=network-online.target
After=network-online.target
[Service]
Slice=machine.slice
Environment=CALICO_DISABLE_FILE_LOGGING=true
Environment=HOSTNAME=10.79.218.2
Environment=IP=10.79.218.2
Environment=FELIX_FELIXHOSTNAME=10.79.218.2
Environment=CALICO_NETWORKING=true
Environment=NO_DEFAULT_POOLS=false
Environment=ETCD_ENDPOINTS=https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379
ExecStart=/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci --volume=var-run-calico,kind=host,source=/var/run/calico --volume=modules,kind=host,source=/lib/modules,readOnly=false --mount=volume=modules,target=/lib/modules --volume=dns,kind=host,source=/etc/resolv.conf,readOnly=true --volume=etcd-tls-certs,kind=host,source=/etc/ssl/etcd,readOnly=true --mount=volume=dns,target=/etc/resolv.conf --mount=volume=etcd-tls-certs,target=/etc/ssl/etcd --mount=volume=var-run-calico,target=/var/run/calico --trust-keys-from-https quay.io/calico/node:v0.22.0
KillMode=mixed
Environment=ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem
Environment=ETCD_CERT_FILE=/etc/ssl/etcd/etcd1.pem
Environment=ETCD_KEY_FILE=/etc/ssl/etcd/etcd1-key.pem
Restart=always
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
and this is the content of /etc/kubernetes/cni/net.d/10-calico.conf of the controller node:
{
  "name": "calico",
  "type": "flannel",
  "delegate": {
    "type": "calico",
    "etcd_endpoints": "https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379",
    "etcd_key_file": "/etc/ssl/etcd/etcd1-key.pem",
    "etcd_cert_file": "/etc/ssl/etcd/etcd1.pem",
    "etcd_ca_cert_file": "/etc/ssl/etcd/ca.pem",
    "log_level": "none",
    "log_level_stderr": "info",
    "hostname": "10.79.218.2",
    "policy": {
      "type": "k8s",
      "k8s_api_root": "http://127.0.0.1:8080/api/v1/"
    }
  }
}
and this is the /etc/kubernetes/cni/net.d/10-calico.conf of the worker node:
{
  "name": "calico",
  "type": "flannel",
  "delegate": {
    "type": "calico",
    "etcd_endpoints": "https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379",
    "etcd_key_file": "/etc/ssl/etcd/etcd2-key.pem",
    "etcd_cert_file": "/etc/ssl/etcd/etcd2.pem",
    "etcd_ca_cert_file": "/etc/ssl/etcd/ca.pem",
    "log_level": "debug",
    "log_level_stderr": "info",
    "hostname": "10.79.218.3",
    "policy": {
      "type": "k8s",
      "k8s_api_root": "https://coreos-2.tux-in.com:443/api/v1/",
      "k8s_client_key": "/etc/kubernetes/ssl/worker-key.pem",
      "k8s_client_certificate": "/etc/kubernetes/ssl/worker.pem"
    }
  }
}
I have no idea how to investigate the issue further.
I understand that since the new calico-cni was moved to Go, it doesn't store log information in a log file anymore, so I'm lost from here.
Any information regarding the issue would be greatly appreciated.
Thanks!
The "Unhandled Exception Killed plugin" error message is being generated by the Calico CNI plugin. From my experience that means it is unlikely to be something wrong with the calico-node.service causing that error.
As such it is probably something subtly wrong with you CNI network configuration. Could you share that file?
The CNI plugin should also emit more detailed logging information - either to stderr or to /var/log/calico/cni/calico.log based on how its configured in your CNI network config. I suspect that file will give you more clues into exactly what is going wrong.
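For example (the log path is the default mentioned above; your setup may place it elsewhere), you could set the delegate's log_level to debug in /etc/kubernetes/cni/net.d/10-calico.conf on the affected node and watch the file while a pod is being scheduled:
# Tail the Calico CNI plugin log on the node where the pod is stuck
sudo tail -f /var/log/calico/cni/calico.log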
All that said, the "Unhandled Exception" error is coming from the Python version of the CNI plugin, which is rather old at this point. I'd recommend upgrading to the latest stable release from here: https://github.com/projectcalico/calico-cni/releases