Can't remove replication controller manager within deployment in kubernetes

I am having a bit of a problem. I deleted a pod of a replication controller and now I want to recreate it. I tried:
kubectl create -f kube-controller-manager.yaml
Error from server: error when creating "kube-controller-manager.yaml": deployments.extensions "kube-controller-manager" already exists
So I figured I would run:
kubectl delete deployment kube-controller-manager --namespace=kube-system -v=8
This loops for a while, giving this response:
GET https://k8s-k8s.westfield.io:443/apis/extensions/v1beta1/namespaces/kube-system/deployments/kube-controller-manager
I0112 17:33:53.334288 44607 round_trippers.go:303] Request Headers:
I0112 17:33:53.334301 44607 round_trippers.go:306] Accept: application/json, */*
I0112 17:33:53.334310 44607 round_trippers.go:306] User-Agent: kubectl/v1.4.7 (darwin/amd64) kubernetes/92b4f97
I0112 17:33:53.369422 44607 round_trippers.go:321] Response Status: 200 OK in 35 milliseconds
I0112 17:33:53.369445 44607 round_trippers.go:324] Response Headers:
I0112 17:33:53.369450 44607 round_trippers.go:327] Content-Type: application/json
I0112 17:33:53.369454 44607 round_trippers.go:327] Date: Fri, 13 Jan 2017 01:33:53 GMT
I0112 17:33:53.369457 44607 round_trippers.go:327] Content-Length: 1688
I0112 17:33:53.369518 44607 request.go:908] Response Body: {"kind":"Deployment","apiVersion":"extensions/v1beta1","metadata":{"name":"kube-controller-manager","namespace":"kube-system","selfLink":"/apis/extensions/v1beta1/namespaces/kube-system/deployments/kube-controller-manager","uid":"830c83d0-d860-11e6-80d5-066fd61aec22","resourceVersion":"197967","generation":5,"creationTimestamp":"2017-01-12T00:46:10Z","labels":{"k8s-app":"kube-controller-manager"},"annotations":{"deployment.kubernetes.io/revision":"1"}},"spec":{"replicas":0,"selector":{"matchLabels":{"k8s-app":"kube-controller-manager"}},"template":{"metadata":{"creationTimestamp":null,"labels":{"k8s-app":"kube-controller-manager"}},"spec":{"volumes":[{"name":"secrets","secret":{"secretName":"kube-controller-manager","defaultMode":420}},{"name":"ssl-host","hostPath":{"path":"/usr/share/ca-certificates"}}],"containers":[{"name":"kube-controller-manager","image":"quay.io/coreos/hyperkube:v1.4.7_coreos.0","command":["./hyperkube","controller-manager","--root-ca-file=/etc/kubernetes/secrets/ca.crt","--service-account-private-key-file=/etc/kubernetes/secrets/service-account.key","--leader-elect=true","--cloud-provider=aws","--configure-cloud-routes=false"],"resources":{},"volumeMounts":[{"name":"secrets","readOnly":true,"mountPath":"/etc/kubernetes/secrets"},{"name":"ssl-host","readOnly":true,"mountPath":"/etc/ssl/certs"}],"terminationMessagePath":"/dev/termination-log","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"Default","securityContext":{}}},"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":1,"maxSurge":1}},"revisionHistoryLimit":0,"paused":true},"status":{"observedGeneration":3}}
I0112 17:33:54.335302 44607 round_trippers.go:296] GET https://k8s-k8s.westfield.io:443/apis/extensions/v1beta1/namespaces/kube-system/deployments/kube-controller-manager
It then fails, saying that it timed out waiting for an API response.
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.7", GitCommit:"92b4f971662de9d8770f8dcd2ee01ec226a6f6c0", GitTreeState:"clean", BuildDate:"2016-12-10T04:49:33Z", GoVersion:"go1.7.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.7+coreos.0", GitCommit:"0581d1a5c618b404bd4766544bec479aedef763e", GitTreeState:"clean", BuildDate:"2016-12-12T19:04:11Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
I originally had client version 1.5.2 and downgraded to see if that would help. It didn't.

A replication controller defines what a pod looks like and how many replicas should exist in your cluster. The controller-manager's job is to make sure enough replicas are healthy and running; if not, it asks the scheduler to place new pods onto hosts.
If you delete a pod, a new one should get spun up automatically. You would just have to run: kubectl delete po <podname>
It's interesting that you are trying to delete the controller manager; typically, after creating it, you shouldn't have to touch it.
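For example (the pod name is whatever kubectl shows for the controller-manager pod; the label selector is taken from the deployment in your debug output), you can delete the pod and watch the replacement come up:
kubectl -n kube-system get pods -l k8s-app=kube-controller-manager
kubectl -n kube-system delete po <kube-controller-manager-pod-name>
kubectl -n kube-system get pods -l k8s-app=kube-controller-manager -w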

You can delete a replication controller using the following command:
kubectl delete rc kube-controller-manager
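In this case, though, the debug output above shows the object is actually a Deployment in the kube-system namespace, so the namespaced form from the question is the one that applies, e.g.:
kubectl delete deployment kube-controller-manager --namespace=kube-system
kubectl get deployments --namespace=kube-system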

Related

Error: k8s doesn't do anything after executing "kubectl create -f mypod.yaml"

I am a beginner in Kubernetes and have been using the kubectl command to create pods for several months. However, I recently encountered a problem where Kubernetes did not create a pod after I executed the kubectl create -f mypod.yaml command. When I run kubectl get pods, the mypod does not appear in the list of pods and I am unable to access it by name as if it does not exist. However, if I try to create it again, I receive a message saying that the pod has already been created.
To illustrate my point, let me give you an example. I frequently generate pods using a YAML file called tpcds-25-query.yaml. The contents of this file are as follows:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
name: tpcds-25-query
namespace: default
spec:
type: Scala
mode: cluster
image: registry.cn-beijing.aliyuncs.com/kube-ai/ack-spark-benchmark:1.0.1
imagePullPolicy: Always
sparkVersion: 2.4.5
mainClass: com.aliyun.spark.benchmark.tpcds.BenchmarkSQL
mainApplicationFile: "local:///opt/spark/jars/ack-spark-benchmark-assembly-0.1.jar"
arguments:
# TPC-DS data localtion
- "oss://spark/data/tpc-ds-data/150g"
# results location
- "oss://spark/result/tpcds-25-query"
# Path to kit in the docker image
- "/tmp/tpcds-kit/tools"
# Data Format
- "parquet"
# Scale factor (in GB)
- "150"
# Number of iterations
- "1"
# Optimize queries
- "false"
# Filter queries, will run all if empty - "q70-v2.4,q82-v2.4,q64-v2.4"
- "q1-v2.4,q11-v2.4,q14a-v2.4,q14b-v2.4,q16-v2.4,q17-v2.4,q22-v2.4,q23a-v2.4,q23b-v2.4,q24a-v2.4,q24b-v2.4,q25-v2.4,q28-v2.4,q29-v2.4,q4-v2.4,q49-v2.4,q5-v2.4,q51-v2.4,q64-v2.4,q74-v2.4,q75-v2.4,q77-v2.4,q78-v2.4,q80-v2.4,q9-v2.4"
# Logging set to WARN
- "true"
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
restartPolicy:
type: Never
timeToLiveSeconds: 86400
hadoopConf:
# OSS
"fs.oss.impl": "OSSFileSystem"
"fs.oss.endpoint": "oss.com"
"fs.oss.accessKeyId": "DFDSMGDNDFMSNGDFMNGCU"
"fs.oss.accessKeySecret": "secret"
sparkConf:
"spark.kubernetes.allocation.batch.size": "200"
"spark.sql.adaptive.join.enabled": "true"
"spark.eventLog.enabled": "true"
"spark.eventLog.dir": "oss://spark/spark-events"
driver:
cores: 4
memory: "8192m"
labels:
version: 2.4.5
spark-app: spark-tpcds
role: driver
serviceAccount: spark
nodeSelector:
beta.kubernetes.io/instance-type: ecs.g6.13xlarge
executor:
cores: 48
instances: 1
memory: "160g"
memoryOverhead: "16g"
labels:
version: 2.4.5
role: executor
nodeSelector:
beta.kubernetes.io/instance-type: ecs.g6.13xlarge
After I executed the kubectl create --validate=false -f tpcds-25-query.yaml command, k8s returned this:
sparkapplication.sparkoperator.k8s.io/tpcds-25-query created
which means the pod has been created. However, when I executed kubectl get pods, it gave me this:
No resources found in default namespace.
When I created the pod again, it gave me this:
Error from server (AlreadyExists): error when creating "tpcds-25-query.yaml": sparkapplications.sparkoperator.k8s.io "tpcds-25-query" already exists
I know the -v=8 option prints more detailed logs, so I executed kubectl create --validate=false -f tpcds-25-query.yaml -v=8. Its output is:
I0219 05:50:17.121661 2148722 loader.go:372] Config loaded from file: /root/.kube/config
I0219 05:50:17.124735 2148722 round_trippers.go:432] GET https://172.16.0.212:6443/apis/metrics.k8s.io/v1beta1?timeout=32s
I0219 05:50:17.124747 2148722 round_trippers.go:438] Request Headers:
I0219 05:50:17.124753 2148722 round_trippers.go:442] Accept: application/json, */*
I0219 05:50:17.124759 2148722 round_trippers.go:442] User-Agent: kubectl/v1.22.3 (linux/amd64) kubernetes/9377577
I0219 05:50:17.132864 2148722 round_trippers.go:457] Response Status: 503 Service Unavailable in 8 milliseconds
I0219 05:50:17.132876 2148722 round_trippers.go:460] Response Headers:
I0219 05:50:17.132881 2148722 round_trippers.go:463] X-Kubernetes-Pf-Prioritylevel-Uid: e75a0286-dd47-4533-a65c-79d95dac5bb1
I0219 05:50:17.132890 2148722 round_trippers.go:463] Content-Length: 20
I0219 05:50:17.132894 2148722 round_trippers.go:463] Date: Sun, 19 Feb 2023 05:50:17 GMT
I0219 05:50:17.132898 2148722 round_trippers.go:463] Audit-Id: 3ab06f73-0c88-469a-834d-54ec06e910f1
I0219 05:50:17.132902 2148722 round_trippers.go:463] Cache-Control: no-cache, private
I0219 05:50:17.132906 2148722 round_trippers.go:463] Content-Type: text/plain; charset=utf-8
I0219 05:50:17.132909 2148722 round_trippers.go:463] X-Content-Type-Options: nosniff
I0219 05:50:17.132913 2148722 round_trippers.go:463] X-Kubernetes-Pf-Flowschema-Uid: 7f136704-82ad-4f6c-8c86-b470a972fede
I0219 05:50:17.134365 2148722 request.go:1181] Response Body: service unavailable
I0219 05:50:17.135255 2148722 request.go:1372] body was not decodable (unable to check for Status): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
I0219 05:50:17.135265 2148722 cached_discovery.go:78] skipped caching discovery info due to the server is currently unable to handle the request
I0219 05:50:17.136050 2148722 request.go:1181] Request Body: {"apiVersion":"sparkoperator.k8s.io/v1beta2","kind":"SparkApplication","metadata":{"name":"tpcds-25-query","namespace":"default"},"spec":{"arguments":["oss://lfpapertest/spark/data/tpc-ds-data/150g","oss://lfpapertest/spark/result/tpcds-runc-150g-48core-160g-1pod-25-query","/tmp/tpcds-kit/tools","parquet","150","1","false","q1-v2.4,q11-v2.4,q14a-v2.4,q14b-v2.4,q16-v2.4,q17-v2.4,q22-v2.4,q23a-v2.4,q23b-v2.4,q24a-v2.4,q24b-v2.4,q25-v2.4,q28-v2.4,q29-v2.4,q4-v2.4,q49-v2.4,q5-v2.4,q51-v2.4,q64-v2.4,q74-v2.4,q75-v2.4,q77-v2.4,q78-v2.4,q80-v2.4,q9-v2.4","true"],"dnsPolicy":"ClusterFirstWithHostNet","driver":{"cores":4,"labels":{"role":"driver","spark-app":"spark-tpcds","version":"2.4.5"},"memory":"8192m","nodeSelector":{"beta.kubernetes.io/instance-type":"ecs.g6.13xlarge"},"serviceAccount":"spark"},"executor":{"cores":48,"instances":1,"labels":{"role":"executor","version":"2.4.5"},"memory":"160g","memoryOverhead":"16g","nodeSelector":{"beta.kubernetes.io/instance-type":"ecs.g6.13xlarge"}},"hadoopConf":{"fs.oss.acce [truncated 802 chars]
I0219 05:50:17.136091 2148722 round_trippers.go:432] POST https://172.16.0.212:6443/apis/sparkoperator.k8s.io/v1beta2/namespaces/default/sparkapplications?fieldManager=kubectl-create
I0219 05:50:17.136098 2148722 round_trippers.go:438] Request Headers:
I0219 05:50:17.136104 2148722 round_trippers.go:442] Accept: application/json
I0219 05:50:17.136108 2148722 round_trippers.go:442] Content-Type: application/json
I0219 05:50:17.136113 2148722 round_trippers.go:442] User-Agent: kubectl/v1.22.3 (linux/amd64) kubernetes/9377577
I0219 05:50:17.144313 2148722 round_trippers.go:457] Response Status: 201 Created in 8 milliseconds
I0219 05:50:17.144327 2148722 round_trippers.go:460] Response Headers:
I0219 05:50:17.144332 2148722 round_trippers.go:463] X-Kubernetes-Pf-Prioritylevel-Uid: e75a0286-dd47-4533-a65c-79d95dac5bb1
I0219 05:50:17.144337 2148722 round_trippers.go:463] Content-Length: 2989
I0219 05:50:17.144341 2148722 round_trippers.go:463] Date: Sun, 19 Feb 2023 05:50:17 GMT
I0219 05:50:17.144345 2148722 round_trippers.go:463] Audit-Id: 8eef9d08-04c0-44f7-87bf-e820853cd9c6
I0219 05:50:17.144349 2148722 round_trippers.go:463] Cache-Control: no-cache, private
I0219 05:50:17.144352 2148722 round_trippers.go:463] Content-Type: application/json
I0219 05:50:17.144356 2148722 round_trippers.go:463] X-Kubernetes-Pf-Flowschema-Uid: 7f136704-82ad-4f6c-8c86-b470a972fede
I0219 05:50:17.144396 2148722 request.go:1181] Response Body: {"apiVersion":"sparkoperator.k8s.io/v1beta2","kind":"SparkApplication","metadata":{"creationTimestamp":"2023-02-19T05:50:17Z","generation":1,"managedFields":[{"apiVersion":"sparkoperator.k8s.io/v1beta2","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{".":{},"f:arguments":{},"f:driver":{".":{},"f:cores":{},"f:labels":{".":{},"f:role":{},"f:spark-app":{},"f:version":{}},"f:memory":{},"f:nodeSelector":{".":{},"f:beta.kubernetes.io/instance-type":{}},"f:serviceAccount":{}},"f:executor":{".":{},"f:cores":{},"f:instances":{},"f:labels":{".":{},"f:role":{},"f:version":{}},"f:memory":{},"f:memoryOverhead":{},"f:nodeSelector":{".":{},"f:beta.kubernetes.io/instance-type":{}}},"f:hadoopConf":{".":{},"f:fs.oss.accessKeyId":{},"f:fs.oss.accessKeySecret":{},"f:fs.oss.endpoint":{},"f:fs.oss.impl":{}},"f:image":{},"f:imagePullPolicy":{},"f:mainApplicationFile":{},"f:mainClass":{},"f:mode":{},"f:restartPolicy":{".":{},"f:type":{}},"f:sparkConf":{".":{},"f:spark.eventLog.dir":{},"f:spark.eventLog.enabled":{},"f:spark.kubernetes. [truncated 1965 chars]
sparkapplication.sparkoperator.k8s.io/tpcds-25-query created
From the logs, the only error I can see is "Response Status: 503 Service Unavailable in 8 milliseconds", and I don't know what it means.
So I want to ask what might cause this, and how I would diagnose the problem. Any help is appreciated!
There might be multiple reasons for this; first, let's check whether the pod was really created or not. As ehmad11 suggested, use kubectl get pods --all-namespaces to list pods in all namespaces. However, in your case it might not help, because your application is being deployed directly into the default namespace. Regarding the error "Response Status: 503 Service Unavailable in 8 milliseconds": once you are able to locate the pod, use kubectl describe <pod> to find logs specific to your pod, and follow the troubleshooting steps provided in this document to rectify it.
Note: the referenced document is from the Komodor site, where each troubleshooting step is articulated in a detailed and understandable manner.
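It is also worth remembering that tpcds-25-query is a SparkApplication custom resource, not a Pod, so kubectl get pods will only show something once the Spark operator has reconciled it into driver/executor pods. A quick way to check the object and the operator itself (the operator namespace and deployment name below are assumptions; adjust them to your installation):
kubectl -n default get sparkapplications
kubectl -n default describe sparkapplication tpcds-25-query
kubectl -n spark-operator logs deploy/spark-operator --tail=100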

GKE throws invalid certificate when fetching logs

I'm trying to fetch the logs from a pod running in GKE, but I get this error:
I0117 11:42:54.468501 96671 round_trippers.go:466] curl -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.26.0 (darwin/arm64) kubernetes/b46a3f8" 'https://x.x.x.x/api/v1/namespaces/pleiades/pods/pleiades-0/log?container=server'
I0117 11:42:54.569122 96671 round_trippers.go:553] GET https://x.x.x.x/api/v1/namespaces/pleiades/pods/pleiades-0/log?container=server 500 Internal Server Error in 100 milliseconds
I0117 11:42:54.569170 96671 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 100 ms Duration 100 ms
I0117 11:42:54.569186 96671 round_trippers.go:577] Response Headers:
I0117 11:42:54.569202 96671 round_trippers.go:580] Content-Type: application/json
I0117 11:42:54.569215 96671 round_trippers.go:580] Content-Length: 226
I0117 11:42:54.569229 96671 round_trippers.go:580] Date: Tue, 17 Jan 2023 19:42:54 GMT
I0117 11:42:54.569243 96671 round_trippers.go:580] Audit-Id: a25a554f-c3f5-4f91-9711-3f2970376770
I0117 11:42:54.569332 96671 round_trippers.go:580] Cache-Control: no-cache, private
I0117 11:42:54.571392 96671 request.go:1154] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get \"https://10.6.128.40:10250/containerLogs/pleiades/pleiades-0/server\": x509: certificate is valid for 127.0.0.1, not 10.6.128.40","code":500}
I0117 11:42:54.572267 96671 helpers.go:246] server response object: [{
"metadata": {},
"status": "Failure",
"message": "Get \"https://10.6.128.40:10250/containerLogs/pleiades/pleiades-0/server\": x509: certificate is valid for 127.0.0.1, not 10.6.128.40",
"code": 500
}]
How do I prevent this from happening?
One of the reasons for this error could be that both metrics-server and the kubelet listen on port 10250. This is usually not a problem because metrics-server runs in its own namespace, but if metrics-server is running in the host network, the port conflict prevents it from starting.
You can confirm this behavior by running the following command:
$ kubectl -n kube-system get pods -l k8s-app=metrics-server -o yaml | grep 10250
- --secure-port=10250
- containerPort: 10250
If you can see hostPort: 10250 in the metrics-server YAML, run the following command to delete the metrics-server deployment on that cluster:
$ kubectl -n kube-system delete deployment -l k8s-app=metrics-server
Metrics server will be recreated correctly by GKE infrastructure. It should be recreated in ~15 seconds on clusters with a new addon manager, but could take up to 15 minutes on very old clusters.
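To confirm the fix took effect, you can watch the pods come back and re-run the grep from above to make sure the hostPort is gone (just a sanity check, not a GKE-specific procedure):
$ kubectl -n kube-system get pods -l k8s-app=metrics-server -w
$ kubectl -n kube-system get pods -l k8s-app=metrics-server -o yaml | grep 10250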

Kubectl is not showing the Component Status

I am trying to install Kubernetes 1.16.2 from the binaries, and I see this issue when I try to check the component status.
The response object shows everything as Healthy, but the table below shows <unknown>.
root@instance:/opt/configs# kubectl get cs -v=8
I1104 05:54:48.554768 25209 round_trippers.go:420] GET http://localhost:8080/api/v1/componentstatuses?limit=500
I1104 05:54:48.555186 25209 round_trippers.go:427] Request Headers:
I1104 05:54:48.555453 25209 round_trippers.go:431] Accept: application/json;as=Table;v=v1beta1;g=meta.k8s.io, application/json
I1104 05:54:48.555735 25209 round_trippers.go:431] User-Agent: kubectl/v1.16.2 (linux/amd64) kubernetes/c97fe50
I1104 05:54:48.567372 25209 round_trippers.go:446] Response Status: 200 OK in 11 milliseconds
I1104 05:54:48.567388 25209 round_trippers.go:449] Response Headers:
I1104 05:54:48.567392 25209 round_trippers.go:452] Cache-Control: no-cache, private
I1104 05:54:48.567395 25209 round_trippers.go:452] Content-Type: application/json
I1104 05:54:48.567397 25209 round_trippers.go:452] Date: Mon, 04 Nov 2019 05:54:48 GMT
I1104 05:54:48.567400 25209 round_trippers.go:452] Content-Length: 661
I1104 05:54:48.567442 25209 request.go:968] Response Body: {"kind":"ComponentStatusList","apiVersion":"v1","metadata":{"selfLink":"/api/v1/componentstatuses"},"items":[{"metadata":{"name":"etcd-0","selfLink":"/api/v1/componentstatuses/etcd-0","creationTimestamp":null},"conditions":[{"type":"Healthy","status":"True","message":"{\"health\":\"true\"}"}]},{"metadata":{"name":"controller-manager","selfLink":"/api/v1/componentstatuses/controller-manager","creationTimestamp":null},"conditions":[{"type":"Healthy","status":"True","message":"ok"}]},{"metadata":{"name":"scheduler","selfLink":"/api/v1/componentstatuses/scheduler","creationTimestamp":null},"conditions":[{"type":"Healthy","status":"True","message":"ok"}]}]}
I1104 05:54:48.567841 25209 table_printer.go:44] Unable to decode server response into a Table. Falling back to hardcoded types: attempt to decode non-Table object into a v1beta1.Table
I1104 05:54:48.567879 25209 table_printer.go:44] Unable to decode server response into a Table. Falling back to hardcoded types: attempt to decode non-Table object into a v1beta1.Table
I1104 05:54:48.567888 25209 table_printer.go:44] Unable to decode server response into a Table. Falling back to hardcoded types: attempt to decode non-Table object into a v1beta1.Table
NAME AGE
etcd-0 <unknown>
controller-manager <unknown>
scheduler <unknown>
There appears to be an issue with the table converter for component status, specifically with k8s version 1.16.2. There is already a PR raised to address this issue; follow and track the link
--> https://github.com/kubernetes/kubernetes/issues/83024
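Until you are on a fixed version, a workaround that should work is to ask for raw output instead of the table view, since the health data is clearly present in the response body above:
kubectl get componentstatuses -o yaml
kubectl get cs -o json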
Check your client and server versions. If they do not match, you'll hit this issue.
kubectl version --short
Client Version: v1.13.5
Server Version: v1.13.5
$ kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-2 Healthy {"health": "true"}
etcd-0 Healthy {"health": "true"}
etcd-1 Healthy {"health": "true"}
AND
$ kubectl version --short
Client Version: v1.16.0
Server Version: v1.13.5
$ kubectl get cs
NAME AGE
controller-manager <unknown>
scheduler <unknown>
etcd-2 <unknown>
etcd-0 <unknown>
etcd-1 <unknown>
By the way, it has been resolved in v1.17.0

Kubernetes v1.12 Problems with kubectl exec

I’ve been learning about Kubernetes using Kelsey Hightower’s excellent kubernetes-the-hard-way-guide.
Using this guide I’ve installed v1.12 on GCE. Everything works perfectly apart from kubectl exec:
$ kubectl exec -it shell-demo -- /bin/bash --kubeconfig=/root/certsconfigs/admin.kubeconfig
error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)
Note that I have set KUBECONFIG=/root/certsconfigs/admin.kubeconfig.
Apart from exec all other kubectl functions work as expected with this admin.kubeconfig file, so from that I deduce it valid for use with my cluster.
I'm pretty sure I have made a beginner's mistake somewhere, but if somebody could advise where I have gone wrong, I should be most grateful.
TIA
Shaun
I have double checked that no .kube/config file exists anywhere on my master controller:
root@controller-1:/root/deployment/kubernetes# kubectl get pods
NAME READY STATUS
shell-demo 1/1 Running 0 23m
Here is the output with -v8:
root@controller-1:/root/deployment/kubernetes# kubectl -v8 exec -it shell-demo -- /bin/bash
I1118 15:18:16.898428 11117 loader.go:359] Config loaded from file /root/certsconfigs/admin.kubeconfig
I1118 15:18:16.899531 11117 loader.go:359] Config loaded from file /root/certsconfigs/admin.kubeconfig
I1118 15:18:16.900611 11117 loader.go:359] Config loaded from file /root/certsconfigs/admin.kubeconfig
I1118 15:18:16.902851 11117 round_trippers.go:383] GET https://127.0.0.1:6443/api/v1/namespaces/default/pods/shell-demo
I1118 15:18:16.902946 11117 round_trippers.go:390] Request Headers:
I1118 15:18:16.903016 11117 round_trippers.go:393] Accept: application/json, */*
I1118 15:18:16.903091 11117 round_trippers.go:393] User-Agent: kubectl/v1.12.0 (linux/amd64) kubernetes/0ed3388
I1118 15:18:16.918699 11117 round_trippers.go:408] Response Status: 200 OK in 15 milliseconds
I1118 15:18:16.918833 11117 round_trippers.go:411] Response Headers:
I1118 15:18:16.918905 11117 round_trippers.go:414] Content-Type: application/json
I1118 15:18:16.918974 11117 round_trippers.go:414] Content-Length: 2176
I1118 15:18:16.919053 11117 round_trippers.go:414] Date: Sun, 18 Nov 2018 15:18:16 GMT
I1118 15:18:16.919218 11117 request.go:942] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"shell-demo","namespace":"default","selfLink":"/api/v1/namespaces/default/pods/shell-demo","uid":"99f320f8-eb42-11e8-a053-42010af0000b","resourceVersion":"13213","creationTimestamp":"2018-11-18T14:59:51Z"},"spec":{"volumes":[{"name":"shared-data","emptyDir":{}},{"name":"default-token-djprb","secret":{"secretName":"default-token-djprb","defaultMode":420}}],"containers":[{"name":"nginx","image":"nginx","resources":{},"volumeMounts":[{"name":"shared-data","mountPath":"/usr/share/nginx/html"},{"name":"default-token-djprb","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"Always"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"default","serviceAccount":"default","nodeName":"worker-1","securityContext":{},"schedulerName":"default-scheduler","tolerations":[{"key":"node.kubernet [truncated 1152 chars]
I1118 15:18:16.925240 11117 round_trippers.go:383] POST …
error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)
According to your logs, the connection between kubectl and the apiserver is fine and is being authenticated correctly.
To satisfy an exec request, the apiserver contacts the kubelet running the pod, and that connection is what is being forbidden.
Your kubelet is configured to authenticate/authorize requests, and the apiserver credential is not authorized to make the exec request against the kubelet's API.
Based on the forbidden message, your apiserver is authenticating as the "kubernetes" user to the kubelet.
You can grant that user full permissions to the kubelet API with the following command:
kubectl create clusterrolebinding apiserver-kubelet-admin --user=kubernetes --clusterrole=system:kubelet-api-admin
See the following docs for more information
https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-authentication-authorization/#overview
https://kubernetes.io/docs/reference/access-authn-authz/rbac/#other-component-roles
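To sanity-check the binding once it's created, you can ask the apiserver whether the kubernetes user is now allowed on the nodes/proxy subresource; this only checks apiserver-side RBAC, which is what the kubelet's webhook authorizer consults (and --as requires you to have impersonation rights):
kubectl get clusterrolebinding apiserver-kubelet-admin -o yaml
kubectl auth can-i create nodes/proxy --as=kubernetes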

Kubectl delete -f deployments/ --grace-period=0 --force does not work

What happened:
Force terminate does not work:
[root@master0 manifests]# kubectl delete -f prometheus/deployment.yaml --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
deployment.extensions "prometheus-core" force deleted
^C <---- Manual Quit due to hanging. Waited over 5 minutes with no change.
[root@master0 manifests]# kubectl -n monitoring get pods
NAME READY STATUS RESTARTS AGE
alertmanager-668794449d-6dppl 0/1 Terminating 0 22h
grafana-core-576c68c58d-7nvbt 0/1 Terminating 0 22h
kube-state-metrics-69b9d65dd5-rl8td 0/1 Terminating 0 3h
node-directory-size-metrics-6hcfc 2/2 Running 0 3h
node-directory-size-metrics-w7zxh 2/2 Running 0 3h
node-directory-size-metrics-z2m5j 2/2 Running 0 3h
prometheus-core-59778c7987-vh89h 0/1 Terminating 0 3h
prometheus-node-exporter-27fjg 1/1 Running 0 3h
prometheus-node-exporter-2t5v6 1/1 Running 0 3h
prometheus-node-exporter-hhxmv 1/1 Running 0 3h
Then
What you expected to happen:
Pod to be deleted
How to reproduce it (as minimally and precisely as possible):
We feel that there might have been an IO error with the storage on the pods. Kubernetes has its own dedicated direct storage, all hosted on AWS, using t3.xl instances.
Anything else we need to know?:
It seems to happen randomly, but often enough that we have to reboot the entire cluster. Pods stuck in termination can be OK to deal with, but having no logs and no way to really force-remove them and start again is frustrating.
Environment:
- Kubernetes version (use kubectl version):
kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:08:34Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration:
AWS
- OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Kernel (e.g. uname -a):
Linux 3.10.0-862.6.3.el7.x86_64 #1 SMP Tue Jun 26 16:32:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Install tools:
Kubernetes was deployed with Kubespray, with GlusterFS as a container volume and Weave as its networking.
Others:
2 master 1 node setup. We have redeployed the entire setup and still get hit by the same issue.
I have posted this question on their issues page:
https://github.com/kubernetes/kubernetes/issues/68829
But no reply.
Logs from API:
[root@master0 manifests]# kubectl -n monitoring delete pod prometheus-core-59778c7987-bl2h4 --force --grace-period=0 -v9
I0919 13:53:08.770798 19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.771440 19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.772681 19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.780266 19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.780943 19973 loader.go:359] Config loaded from file /root/.kube/config
I0919 13:53:08.781609 19973 loader.go:359] Config loaded from file /root/.kube/config
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
I0919 13:53:08.781876 19973 request.go:897] Request Body: {"gracePeriodSeconds":0,"propagationPolicy":"Foreground"}
I0919 13:53:08.781938 19973 round_trippers.go:386] curl -k -v -XDELETE -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: kubectl/v1.11.0 (linux/amd64) kubernetes/91e7b4f" 'https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4'
I0919 13:53:08.798682 19973 round_trippers.go:405] DELETE https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4 200 OK in 16 milliseconds
I0919 13:53:08.798702 19973 round_trippers.go:411] Response Headers:
I0919 13:53:08.798709 19973 round_trippers.go:414] Content-Type: application/json
I0919 13:53:08.798714 19973 round_trippers.go:414] Content-Length: 3199
I0919 13:53:08.798719 19973 round_trippers.go:414] Date: Wed, 19 Sep 2018 13:53:08 GMT
I0919 13:53:08.798758 19973 request.go:897] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"prometheus-core-59778c7987-bl2h4","generateName":"prometheus-core-59778c7987-","namespace":"monitoring","selfLink":"/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4","uid":"7647d17a-bc11-11e8-bd71-06b8eceafd88","resourceVersion":"676465","creationTimestamp":"2018-09-19T13:39:41Z","deletionTimestamp":"2018-09-19T13:40:18Z","deletionGracePeriodSeconds":0,"labels":{"app":"prometheus","component":"core","pod-template-hash":"1533473543"},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"prometheus-core-59778c7987","uid":"75aba047-bc11-11e8-bd71-06b8eceafd88","controller":true,"blockOwnerDeletion":true}],"finalizers":["foregroundDeletion"]},"spec":{"volumes":[{"name":"config-volume","configMap":{"name":"prometheus-core","defaultMode":420}},{"name":"rules-volume","configMap":{"name":"prometheus-rules","defaultMode":420}},{"name":"api-token","secret":{"secretName":"api-token","defaultMode":420}},{"name":"ca-crt","secret":{"secretName":"ca-crt","defaultMode":420}},{"name":"prometheus-k8s-token-trclf","secret":{"secretName":"prometheus-k8s-token-trclf","defaultMode":420}}],"containers":[{"name":"prometheus","image":"prom/prometheus:v1.7.0","args":["-storage.local.retention=12h","-storage.local.memory-chunks=500000","-config.file=/etc/prometheus/prometheus.yaml","-alertmanager.url=http://alertmanager:9093/"],"ports":[{"name":"webui","containerPort":9090,"protocol":"TCP"}],"resources":{"limits":{"cpu":"500m","memory":"500M"},"requests":{"cpu":"500m","memory":"500M"}},"volumeMounts":[{"name":"config-volume","mountPath":"/etc/prometheus"},{"name":"rules-volume","mountPath":"/etc/prometheus-rules"},{"name":"api-token","mountPath":"/etc/prometheus-token"},{"name":"ca-crt","mountPath":"/etc/prometheus-ca"},{"name":"prometheus-k8s-token-trclf","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"prometheus-k8s","serviceAccount":"prometheus-k8s","nodeName":"master1.infra.cde","securityContext":{},"schedulerName":"default-scheduler"},"status":{"phase":"Pending","conditions":[{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z"},{"type":"Ready","status":"False","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z","reason":"ContainersNotReady","message":"containers with unready status: [prometheus]"},{"type":"ContainersReady","status":"False","lastProbeTime":null,"lastTransitionTime":null,"reason":"ContainersNotReady","message":"containers with unready status: [prometheus]"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z"}],"hostIP":"10.1.1.187","startTime":"2018-09-19T13:39:41Z","containerStatuses":[{"name":"prometheus","state":{"terminated":{"exitCode":0,"startedAt":null,"finishedAt":null}},"lastState":{},"ready":false,"restartCount":0,"image":"prom/prometheus:v1.7.0","imageID":""}],"qosClass":"Guaranteed"}}
pod "prometheus-core-59778c7987-bl2h4" force deleted
I0919 13:53:08.798864 19973 round_trippers.go:386] curl -k -v -XGET -H "Accept: application/json" -H "User-Agent: kubectl/v1.11.0 (linux/amd64) kubernetes/91e7b4f" 'https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4'
I0919 13:53:08.801386 19973 round_trippers.go:405] GET https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4 200 OK in 2 milliseconds
I0919 13:53:08.801403 19973 round_trippers.go:411] Response Headers:
I0919 13:53:08.801409 19973 round_trippers.go:414] Content-Type: application/json
I0919 13:53:08.801415 19973 round_trippers.go:414] Content-Length: 3199
I0919 13:53:08.801420 19973 round_trippers.go:414] Date: Wed, 19 Sep 2018 13:53:08 GMT
I0919 13:53:08.801465 19973 request.go:897] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"prometheus-core-59778c7987-bl2h4","generateName":"prometheus-core-59778c7987-","namespace":"monitoring","selfLink":"/api/v1/namespaces/monitoring/pods/prometheus-core-59778c7987-bl2h4","uid":"7647d17a-bc11-11e8-bd71-06b8eceafd88","resourceVersion":"676465","creationTimestamp":"2018-09-19T13:39:41Z","deletionTimestamp":"2018-09-19T13:40:18Z","deletionGracePeriodSeconds":0,"labels":{"app":"prometheus","component":"core","pod-template-hash":"1533473543"},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"prometheus-core-59778c7987","uid":"75aba047-bc11-11e8-bd71-06b8eceafd88","controller":true,"blockOwnerDeletion":true}],"finalizers":["foregroundDeletion"]},"spec":{"volumes":[{"name":"config-volume","configMap":{"name":"prometheus-core","defaultMode":420}},{"name":"rules-volume","configMap":{"name":"prometheus-rules","defaultMode":420}},{"name":"api-token","secret":{"secretName":"api-token","defaultMode":420}},{"name":"ca-crt","secret":{"secretName":"ca-crt","defaultMode":420}},{"name":"prometheus-k8s-token-trclf","secret":{"secretName":"prometheus-k8s-token-trclf","defaultMode":420}}],"containers":[{"name":"prometheus","image":"prom/prometheus:v1.7.0","args":["-storage.local.retention=12h","-storage.local.memory-chunks=500000","-config.file=/etc/prometheus/prometheus.yaml","-alertmanager.url=http://alertmanager:9093/"],"ports":[{"name":"webui","containerPort":9090,"protocol":"TCP"}],"resources":{"limits":{"cpu":"500m","memory":"500M"},"requests":{"cpu":"500m","memory":"500M"}},"volumeMounts":[{"name":"config-volume","mountPath":"/etc/prometheus"},{"name":"rules-volume","mountPath":"/etc/prometheus-rules"},{"name":"api-token","mountPath":"/etc/prometheus-token"},{"name":"ca-crt","mountPath":"/etc/prometheus-ca"},{"name":"prometheus-k8s-token-trclf","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"prometheus-k8s","serviceAccount":"prometheus-k8s","nodeName":"master1.infra.cde","securityContext":{},"schedulerName":"default-scheduler"},"status":{"phase":"Pending","conditions":[{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z"},{"type":"Ready","status":"False","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z","reason":"ContainersNotReady","message":"containers with unready status: [prometheus]"},{"type":"ContainersReady","status":"False","lastProbeTime":null,"lastTransitionTime":null,"reason":"ContainersNotReady","message":"containers with unready status: [prometheus]"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2018-09-19T13:39:41Z"}],"hostIP":"10.1.1.187","startTime":"2018-09-19T13:39:41Z","containerStatuses":[{"name":"prometheus","state":{"terminated":{"exitCode":0,"startedAt":null,"finishedAt":null}},"lastState":{},"ready":false,"restartCount":0,"image":"prom/prometheus:v1.7.0","imageID":""}],"qosClass":"Guaranteed"}}
I0919 13:53:08.801758 19973 round_trippers.go:386] curl -k -v -XGET -H "Accept: application/json" -H "User-Agent: kubectl/v1.11.0 (linux/amd64) kubernetes/91e7b4f" 'https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods?fieldSelector=metadata.name%3Dprometheus-core-59778c7987-bl2h4&resourceVersion=676465&watch=true'
I0919 13:53:08.803409 19973 round_trippers.go:405] GET https://10.1.1.28:6443/api/v1/namespaces/monitoring/pods?fieldSelector=metadata.name%3Dprometheus-core-59778c7987-bl2h4&resourceVersion=676465&watch=true 200 OK in 1 milliseconds
I0919 13:53:08.803424 19973 round_trippers.go:411] Response Headers:
I0919 13:53:08.803430 19973 round_trippers.go:414] Date: Wed, 19 Sep 2018 13:53:08 GMT
I0919 13:53:08.803436 19973 round_trippers.go:414] Content-Type: application/json
After some investigation and help from the Kubernetes community over on GitHub, we found the solution: in 1.11.0 there is a known bug related to this issue. After upgrading to 1.12.0, the issue was resolved. The fix is noted as also landing in 1.11.1.
Thanks to cduchesne https://github.com/kubernetes/kubernetes/issues/68829#issuecomment-422878108
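If upgrading isn't an option right away, one workaround worth knowing (use it carefully, since it skips normal cleanup) is to clear the stuck pod's finalizers; the response body above shows a foregroundDeletion finalizer still attached, which is what keeps the object from going away:
kubectl -n monitoring patch pod prometheus-core-59778c7987-bl2h4 -p '{"metadata":{"finalizers":null}}'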
Sometimes Kubernetes workers have problems like zombie processes, kernel panics, or IO waits. When you want to delete a pod that uses storage and generates a lot of IOPS, like the Prometheus DB, the worker may not be able to kill those pods.
I had the same situation as you, but on Container Linux without any cloud platform like AWS or GCloud. I just rebooted my broken worker and after that deleted the pods normally, without --grace-period=0. --grace-period=0 is a bad option to use when your nodes and pods are otherwise running without problems.
Workers can be rebooted when you run Kubernetes; that is one of the nice features of K8s.
To run Prometheus without IO problems, you should run several Prometheus instances with different configs, or use federation to scale Prometheus.
After you issue the kubectl delete, I would log into the nodes where the pods are running and debug with docker commands (assuming your runtime is Docker):
docker logs <container-with-issue>
docker exec -it <container-with-issue> bash # maybe the application is hanging
Are you mounting any volumes for Prometheus? It could be that it's trying to release an EBS volume and the AWS API is unresponsive.
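A quick way to check for volume-related hangs is to look at the pod's events and any PVCs in the namespace (the pod name below is just one of the stuck pods from your output):
kubectl -n monitoring describe pod prometheus-core-59778c7987-vh89h
kubectl -n monitoring get pvc
kubectl -n monitoring get events --sort-by=.metadata.creationTimestamp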
Hope it helps!