OpenShift: How to Delete or Manage a Specific Pod

I spun up an OpenShift 4.6 cluster on AWS (3 masters, 2 workers) for learning and play. Since it was just for learning, I shut down all nodes at once using the AWS web console. When I brought them back up a few days later, the console would not come up.
So I followed this doc on restarting the cluster. That didn't work, so I started looking at how the console's own pods were doing. I ran oc get pods -n openshift-console -o wide -w and got:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
console-59f557f67d-ks5kq 0/1 Pending 0 13m <none> <none> <none> <none>
console-59f557f67d-q6zrc 0/1 UnexpectedAdmissionError 0 4d6h <none> ip-10-0-139-41.us-west-2.compute.internal <none> <none>
console-59f557f67d-w4q7l 0/1 UnexpectedAdmissionError 0 56m <none> ip-10-0-131-234.us-west-2.compute.internal <none> <none>
console-59f557f67d-zvxzn 0/1 UnexpectedAdmissionError 0 59m <none> ip-10-0-131-234.us-west-2.compute.internal <none> <none>
downloads-55f4ff79-lqdj7 1/1 Running 0 59m 10.131.0.4 ip-10-0-208-19.us-west-2.compute.internal <none> <none>
downloads-55f4ff79-mrfzn 1/1 Running 0 59m 10.131.0.13 ip-10-0-208-19.us-west-2.compute.internal <none> <none>
Seeing that a few pods were in a bad state, I wanted to look at their logs, but I don't know how to target a specific pod by the NAME column above. I tried each of these:
oc get pods/console-59f557f67d-q6zrc
oc get podtemplates/console-59f557f67d-q6zrc
I consistently get Error from server (NotFound): pods "console-59f557f67d-q6zrc" not found.
I then found the command oc get pods -n openshift-console -o name which reveals:
pod/console-59f557f67d-ks5kq
pod/console-59f557f67d-q6zrc
pod/console-59f557f67d-w4q7l
pod/console-59f557f67d-zvxzn
pod/downloads-55f4ff79-lqdj7
pod/downloads-55f4ff79-mrfzn
So I was right, it's a "pod", but then if I try to run anything like oc logs <name> it returns the same error that it can't be found. Is this a bug? Does Openshift think there are Pods around that no longer exist and is routing to those Pods despite not existing?
If not, what resource type is the thing under the NAME column? How do I target it with say an oc logs or oc delete command?

I discovered the correct syntax in this Red Hat Bugzilla issue.
The key is to pass the namespace along with the pod name. Without -n openshift-console, oc looks for the pod in your current project, which is why it reports NotFound.
Examples:
oc describe pod -n openshift-console console-59f557f67d-zvxzn
oc logs -n openshift-console console-59f557f67d-zvxzn
oc delete pod -n openshift-console console-59f557f67d-zvxzn
I'll follow up and update when I find this reference in the official docs or command line help.
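For the UnexpectedAdmissionError pods specifically, the kubelet rejected them before any container started, so there is probably nothing for oc logs to show; oc describe is the more useful command, and afterwards the pods can be cleaned up. The sketch below assumes such rejected pods sit in the Failed phase and that every Failed pod in the namespace is safe to remove; cleanup_failed_pods is a hypothetical helper name, not an oc command.

```shell
# cleanup_failed_pods (hypothetical helper): list, then delete, every pod in
# namespace $1 whose phase is Failed. Pods rejected at kubelet admission
# (e.g. UnexpectedAdmissionError) are assumed to be in that phase.
cleanup_failed_pods() {
  local ns="$1"
  if [ -z "$ns" ]; then
    echo "usage: cleanup_failed_pods NAMESPACE" >&2
    return 1
  fi
  # Show exactly which pods match before deleting anything.
  oc get pods -n "$ns" --field-selector=status.phase=Failed -o name || return 1
  # Delete the same set; Running and Pending pods are not selected.
  oc delete pods -n "$ns" --field-selector=status.phase=Failed
}
```

Usage: cleanup_failed_pods openshift-console. A ReplicaSet does not count Failed pods as active replicas, so deleting them should not change how many console pods it tries to keep.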

Related

Rook ceph broken on kubernetes?

Using Ceph v1.14.10, Rook v1.3.8 on k8s 1.16 on-premise. After 10 days without any trouble, we decided to drain some nodes; since then, none of the moved pods can attach to their PVs any more. It looks like the Ceph cluster is broken:
My ConfigMap rook-ceph-mon-endpoints is referencing 2 missing mon pod IPs:
csi-cluster-config-json: '[{"clusterID":"rook-ceph","monitors":["10.115.0.129:6789","10.115.0.4:6789","10.115.0.132:6789"]}]'
But
kubectl -n rook-ceph get pod -l app=rook-ceph-mon -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-mon-e-56b849775-4g5wg 1/1 Running 0 6h42m 10.115.0.2 XXXX <none> <none>
rook-ceph-mon-h-fc486fb5c-8mvng 1/1 Running 0 6h42m 10.115.0.134 XXXX <none> <none>
rook-ceph-mon-i-65666fcff4-4ft49 1/1 Running 0 30h 10.115.0.132 XXXX <none> <none>
Is this normal, or must I run some kind of "reconciliation" task to update the CM with the new mon pod IPs?
(could be related to https://github.com/rook/rook/issues/2262)
I had to manually update:
secret rook-ceph-config
cm rook-ceph-mon-endpoints
cm rook-ceph-csi-config
As @travisn said:
The operator owns updating that configmap and secret. It's not expected to update them manually unless there is some disaster recovery situation as described at https://rook.github.io/docs/rook/v1.4/ceph-disaster-recovery.html.
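Before touching those objects by hand, a read-only check can show which running mon IPs a ConfigMap is missing. A minimal sketch, assuming the rook-ceph namespace, the rook-ceph-mon-endpoints ConfigMap and the app=rook-ceph-mon label from the question, and assuming the ConfigMap lists monitors as IP:port strings; check_mon_ips is a hypothetical name.

```shell
# check_mon_ips (hypothetical helper): for every running Rook mon pod, report
# whether its IP appears as "IP:" in the given ConfigMap. Read-only.
check_mon_ips() {
  local ns="${1:-rook-ceph}" cm="${2:-rook-ceph-mon-endpoints}" cm_yaml ips ip
  cm_yaml=$(kubectl -n "$ns" get configmap "$cm" -o yaml) || return 1
  ips=$(kubectl -n "$ns" get pod -l app=rook-ceph-mon \
    -o jsonpath='{.items[*].status.podIP}') || return 1
  for ip in $ips; do
    # grep -F matches the literal "IP:" prefix of an "IP:port" entry.
    if printf '%s\n' "$cm_yaml" | grep -qF "$ip:"; then
      echo "ok      $ip"
    else
      echo "missing $ip"
    fi
  done
}
```

Since the operator owns these objects, a "missing" line is a hint to look at the operator logs first; manual edits fit the disaster recovery case described in the linked page.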

Kubernetes coredns pods stuck in Pending status. Cannot start the dashboard

I am building a Kubernetes cluster following this tutorial, and I have trouble accessing the Kubernetes dashboard. I already created another question about it that you can see here, but while digging into my cluster I think the problem might be somewhere else, which is why I am creating a new question.
I start my master by running the following commands:
> kubeadm reset
> kubeadm init --apiserver-advertise-address=[MASTER_IP] > file.txt
> tail -2 file.txt > join.sh # I keep this file for later
> kubectl apply -f https://git.io/weave-kube/
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 2m46s
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 2m46s
etcd-kubemaster 1/1 Running 0 93s
kube-apiserver-kubemaster 1/1 Running 0 93s
kube-controller-manager-kubemaster 1/1 Running 0 113s
kube-proxy-lxhvs 1/1 Running 0 2m46s
kube-scheduler-kubemaster 1/1 Running 0 93s
Here we can see two coredns pods stuck in the Pending state forever, and when I run the command:
> kubectl -n kube-system describe pod coredns-fb8b8dccf-kb2zq
I can see the following Warning in the Events section:
Failed Scheduling : 0/1 nodes are available 1 node(s) had taints that the pod didn't tolerate.
Since it is a Warning and not an Error, and since, as a Kubernetes newbie, taints do not mean much to me, I tried to connect a node to the master (using the previously saved command):
> cat join.sh
kubeadm join [MASTER_IP]:6443 --token [TOKEN] \
--discovery-token-ca-cert-hash sha256:[ANOTHER_TOKEN]
> ssh [USER]@[WORKER_IP] 'bash' < join.sh
This node has joined the cluster.
On the master, I check that the node is connected:
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster NotReady master 13m v1.14.1
kubeslave1 NotReady <none> 31s v1.14.1
And I check my pods :
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 14m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 14m
etcd-kubemaster 1/1 Running 0 13m
kube-apiserver-kubemaster 1/1 Running 0 13m
kube-controller-manager-kubemaster 1/1 Running 0 13m
kube-proxy-lxhvs 1/1 Running 0 14m
kube-proxy-xllx4 0/1 ContainerCreating 0 2m16s
kube-scheduler-kubemaster 1/1 Running 0 13m
We can see that another kube-proxy pod has been created and is stuck in the ContainerCreating status.
When I run describe again:
kubectl -n kube-system describe pod kube-proxy-xllx4
I can see multiple identical Warnings in the Events section:
Failed create pod sandbox : rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Get https://k8s.gcr.io/v1/_ping: dial tcp: lookup k8s.gcr.io on [::1]:53 read udp [::1]:43133->[::1]:53: read: connection refused
Here are my local images:
docker image ls
REPOSITORY TAG
k8s.gcr.io/kube-proxy v1.14.1
k8s.gcr.io/kube-apiserver v1.14.1
k8s.gcr.io/kube-controller-manager v1.14.1
k8s.gcr.io/kube-scheduler v1.14.1
k8s.gcr.io/coredns 1.3.1
k8s.gcr.io/etcd 3.3.10
k8s.gcr.io/pause 3.1
As for the dashboard, I tried to start it with the command:
> kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended/kubernetes-dashboard.yaml
But the dashboard pod is stuck in Pending state.
kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 40m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 40m
etcd-kubemaster 1/1 Running 0 38m
kube-apiserver-kubemaster 1/1 Running 0 38m
kube-controller-manager-kubemaster 1/1 Running 0 39m
kube-proxy-lxhvs 1/1 Running 0 40m
kube-proxy-xllx4 0/1 ContainerCreating 0 27m
kube-scheduler-kubemaster 1/1 Running 0 38m
kubernetes-dashboard-5f7b999d65-qn8qn 1/1 Pending 0 8s
So, even though my original problem was that I could not access my dashboard, I guess the real problem is deeper than that.
I know I have put a lot of information here, but I am a k8s beginner and I am completely lost.
I ran into coredns pods stuck in Pending when setting up my own cluster, and I resolved it by adding a pod network.
It looks like, because no network add-on is installed, the nodes are tainted as not-ready. Installing the add-on removes the taints so the pods can be scheduled. In my case, adding flannel fixed the issue.
EDIT: There is a note about this in the official k8s documentation - Create cluster with kubeadm:
The network must be deployed before any applications. Also, CoreDNS
will not start up before a network is installed. kubeadm only
supports Container Network Interface (CNI) based networks (and does
not support kubenet).
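In practice that means: apply a CNI manifest, then wait for the nodes to turn Ready, at which point the node.kubernetes.io/not-ready taint should be lifted and CoreDNS can schedule. A minimal sketch; install_cni_and_wait is a hypothetical helper, the 300s default is arbitrary, and the manifest URL is whatever your chosen network add-on documents.

```shell
# install_cni_and_wait (hypothetical helper): apply a CNI manifest, then block
# until every node reports the Ready condition or the timeout expires.
install_cni_and_wait() {
  local manifest="$1" timeout="${2:-300s}"
  if [ -z "$manifest" ]; then
    echo "usage: install_cni_and_wait MANIFEST_URL [TIMEOUT]" >&2
    return 1
  fi
  kubectl apply -f "$manifest" || return 1
  # Ready nodes should lose the node.kubernetes.io/not-ready taint.
  kubectl wait --for=condition=Ready node --all --timeout="$timeout"
}
```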
Actually, this is the opposite of a deep or serious issue; it is a trivial one. Whenever you see a pod stuck in the Pending state, the scheduler cannot find a suitable node for it, most often because the nodes do not have enough resources.
In your case, the node has a taint that your pod does not tolerate. Describe the node to find the taint:
kubectl describe node | grep -i taints
Note: a node may have more than one taint, and grep only shows the first line of the Taints field, so you may prefer kubectl describe node NODE and read the whole Taints section.
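Because a single grep match can hide additional taints, kubectl's custom-columns output can print all taint keys and effects per node on one line. A small sketch; list_node_taints is a hypothetical wrapper and the column names are arbitrary.

```shell
# list_node_taints (hypothetical wrapper): one line per node with all taint
# keys and effects, comma-separated, instead of grep's first match only.
list_node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINT_KEYS:.spec.taints[*].key,EFFECTS:.spec.taints[*].effect'
}
```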
Once you have the taint, which looks something like hello=world:NoSchedule (that is, key=value:effect), add a tolerations section to your Deployment. Here is an example Deployment showing how it should look:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        ports:
        - containerPort: 80
          name: http
      tolerations:
      - effect: NoExecute # or NoSchedule / PreferNoSchedule; tolerationSeconds only applies to NoExecute
        key: node
        operator: Equal
        value: not-ready
        tolerationSeconds: 3600
As you can see, there is a tolerations section in the YAML. So if I had a node with the node=not-ready:NoExecute taint, no pod could be scheduled on that node unless it had this toleration.
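Translating a taint string from kubectl describe node into a matching toleration entry is mechanical, so it can be scripted. A minimal POSIX shell sketch, assuming the key=value:effect and key:effect forms that kubectl describe prints; taint_to_toleration is a hypothetical helper, and mapping a taint without a value to operator Exists is a design choice of this sketch.

```shell
# taint_to_toleration (hypothetical helper): turn a taint as printed by
# `kubectl describe node` into a toleration list entry.
#   key=value:effect -> operator Equal with that value
#   key:effect       -> operator Exists (matches any value)
taint_to_toleration() {
  local taint="$1" kv effect
  case "$taint" in
    *:*) ;;
    *) echo "expected key[=value]:effect, got '$taint'" >&2; return 1 ;;
  esac
  effect=${taint##*:}   # text after the last colon
  kv=${taint%:*}        # everything before it
  case "$kv" in
    *=*)
      printf '%s\n' "- key: ${kv%%=*}" "  operator: Equal" \
        "  value: ${kv#*=}" "  effect: $effect" ;;
    *)
      printf '%s\n' "- key: $kv" "  operator: Exists" "  effect: $effect" ;;
  esac
}
```

For example, taint_to_toleration node=not-ready:NoExecute prints an entry equivalent to the one in the Deployment above, ready to paste under tolerations:.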
You can also remove the taint if you don't need it. To remove a taint, describe the node, get the key of the taint, and run:
kubectl taint node NODE key-
Hope it makes sense. Just add this section to your deployment, and it will work.
Set up the flannel network add-on by running:
$ sysctl net.bridge.bridge-nf-call-iptables=1
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
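Flannel's stock manifest assumes the pod network 10.244.0.0/16, so on a fresh kubeadm master the cluster is usually initialized with a matching --pod-network-cidr. A sketch of the whole sequence, assuming sudo access and kubeadm's default admin kubeconfig path, with the flannel manifest URL passed in by you; init_master_with_flannel is a hypothetical name, and you may still need --apiserver-advertise-address as in the question.

```shell
# init_master_with_flannel (hypothetical helper): enable bridge netfilter,
# initialise kubeadm with flannel's default pod CIDR, then apply the manifest.
init_master_with_flannel() {
  local manifest="$1"
  if [ -z "$manifest" ]; then
    echo "usage: init_master_with_flannel FLANNEL_MANIFEST_URL" >&2
    return 1
  fi
  # Let iptables see bridged pod traffic (same setting as in the answer).
  sudo sysctl -w net.bridge.bridge-nf-call-iptables=1 || return 1
  # 10.244.0.0/16 is the network flannel's stock manifest expects.
  sudo kubeadm init --pod-network-cidr=10.244.0.0/16 || return 1
  # admin.conf is the kubeconfig kubeadm writes by default.
  sudo kubectl --kubeconfig /etc/kubernetes/admin.conf apply -f "$manifest"
}
```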

Error from server (NotFound): podmetrics.metrics.k8s.io "mem-example/memory-demo" not found

I am following this tutorial: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
I have created the memory pod demo and I am trying to get the metrics from the pod but it is not working.
I installed the metrics server by cloning: https://github.com/kubernetes-incubator/metrics-server
And then running this command from top level:
kubectl create -f deploy/1.8+/
I am using kubernetes version 1.10.11.
The pod is definitely created:
λ kubectl get pod memory-demo --namespace=mem-example
NAME READY STATUS RESTARTS AGE
memory-demo 1/1 Running 0 6m
But the metrics command does not work and gives an error:
λ kubectl top pod memory-demo --namespace=mem-example
Error from server (NotFound): podmetrics.metrics.k8s.io "mem-example/memory-demo" not found
What did I do wrong?
The metrics-server deployment needs a few patches before the metrics will work. Follow these steps:
kubectl delete -f deploy/1.8+/
Wait until the metrics server has been removed, then run the command below:
kubectl create -f https://raw.githubusercontent.com/epasham/docker-repo/master/k8s/metrics-server.yaml
master $ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-78fcdf6894-6zg78 1/1 Running 0 2h
coredns-78fcdf6894-gk4sb 1/1 Running 0 2h
etcd-master 1/1 Running 0 2h
kube-apiserver-master 1/1 Running 0 2h
kube-controller-manager-master 1/1 Running 0 2h
kube-proxy-f5z9p 1/1 Running 0 2h
kube-proxy-ghbvn 1/1 Running 0 2h
kube-scheduler-master 1/1 Running 0 2h
metrics-server-85c54d44c8-rmvxh 2/2 Running 0 1m
weave-net-4j7cl 2/2 Running 1 2h
weave-net-82fzn 2/2 Running 1 2h
master $ kubectl top pod -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-78fcdf6894-6zg78 2m 11Mi
coredns-78fcdf6894-gk4sb 2m 9Mi
etcd-master 14m 90Mi
kube-apiserver-master 24m 425Mi
kube-controller-manager-master 26m 62Mi
kube-proxy-f5z9p 2m 19Mi
kube-proxy-ghbvn 3m 17Mi
kube-scheduler-master 8m 14Mi
metrics-server-85c54d44c8-rmvxh 1m 19Mi
weave-net-4j7cl 2m 59Mi
weave-net-82fzn 1m 60Mi
Check that the metrics-server deployment manifest contains the lines below:
command:
- /metrics-server
- --metric-resolution=30s
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls
On Minikube, I had to wait 20-25 minutes after enabling the metrics-server addon. I got the same error during that time, but afterwards the output appeared without my trying any fix.
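Rather than re-running kubectl top by hand while metrics-server warms up, you can wait for the metrics APIService to report Available and then retry. A sketch under assumptions: the APIService is named v1beta1.metrics.k8s.io (the name metrics-server registers), and 30 attempts 10 seconds apart is an arbitrary budget; wait_for_pod_metrics is a hypothetical helper.

```shell
# wait_for_pod_metrics (hypothetical helper): wait for the metrics APIService,
# then retry `kubectl top pod` until it succeeds or attempts run out.
wait_for_pod_metrics() {
  local ns="$1" pod="$2" tries="${3:-30}" i=0
  kubectl wait --for=condition=Available apiservice/v1beta1.metrics.k8s.io \
    --timeout=300s >/dev/null || return 1
  while [ "$i" -lt "$tries" ]; do
    # "podmetrics ... not found" is expected until the pod is first scraped.
    if kubectl top pod "$pod" -n "$ns" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "no metrics for $ns/$pod after $tries attempts" >&2
  return 1
}
```

Usage with the pod from the question: wait_for_pod_metrics mem-example memory-demo.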
I faced a similar issue:
Error from server (NotFound): podmetrics.metrics.k8s.io "default/apple-app" not found
I resolved it with two steps.
Download the latest customized components.yaml, the official file used for easy deployment.
Add the lines
# - /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
to the command section of the deployment specification. I have commented the first line because it is the entrypoint of the image used by kubernetes metrics-server.
$ docker image inspect k8s.gcr.io/metrics-server-amd64:v0.3.6 -f {{.ContainerConfig.Entrypoint}}
[/metrics-server]
Whether you keep that line or not doesn't matter.
Note: you may have to wait a few seconds for it to work properly.
After this running the top command will work for you.
$ kubectl top pod apple-app
NAME CPU(cores) MEMORY(bytes)
apple-app 1m 3Mi
I know this is an old thread, but maybe someone will find this answer useful.
You have to check out the following repo:
https://github.com/kubernetes-incubator/metrics-server
Go to the root of the repo and check out release-0.3.2.
Remove the default metrics server with:
kubectl delete -f deploy/1.8+/
Download components.yaml:
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
Edit components.yaml and add the following two lines to the container's args section:
args:
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls=true
There is only one args parameter in that file.
Deploy your pod/deployment and you should be able to do:
kubectl top pod <pod-name>
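If metrics-server is already deployed, the same two flags can be appended in place with a JSON patch instead of editing the file. A sketch, assuming the Deployment is kube-system/metrics-server, the flags belong to its first container, and that container already has an args list (the JSON patch path args/- fails otherwise); patch_metrics_server_args is a hypothetical name. --kubelet-insecure-tls turns off verification of the kubelets' serving certificates, so it is best kept to test clusters.

```shell
# patch_metrics_server_args (hypothetical helper): append the two flags from
# the answers to container 0's args of an existing Deployment.
patch_metrics_server_args() {
  local ns="${1:-kube-system}" deploy="${2:-metrics-server}"
  kubectl -n "$ns" patch deployment "$deploy" --type=json -p '[
    {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"},
    {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-preferred-address-types=InternalIP"}
  ]'
}
```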

How to get the endpoint for kubernetes-dashboard

I have installed Kubernetes using minikube on an Ubuntu 16.04 machine.
I have also installed kubernetes-dashboard.
When I try accessing the dashboard, I get:
Waiting, endpoint for service is not registered yet
Waiting, endpoint for service is not ready yet...
Waiting, endpoint for service is not ready yet...
Waiting, endpoint for service is not ready yet...
.....
Could not find finalized endpoint being pointed to by kubernetes-dashboard: Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
Temporary Error: Endpoint for service is not ready yet
However, when I run kubectl get pods --all-namespaces, I get the output below:
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-addon-manager-minikube 1/1 Running 0 11m
kube-system kube-dns-1301475494-xtb3b 3/3 Running 0 8m
kube-system kubernetes-dashboard-2039414953-dvv3m 1/1 Running 0 9m
kube-system kubernetes-dashboard-2crsk 1/1 Running 0 8m
kubectl get endpoints --all-namespaces
NAMESPACE NAME ENDPOINTS AGE
default kubernetes 10.0.2.15:8443 11m
kube-system kube-controller-manager <none> 6m
kube-system kube-dns 172.17.0.4:53,172.17.0.4:53 8m
kube-system kube-scheduler <none> 6m
kube-system kubernetes-dashboard <none> 9m
How can I fix this issue? I don't understand what is wrong; I am completely new to Kubernetes.
You need to run minikube dashboard. You shouldn't install dashboard separately; it comes with minikube.
Some useful minikube commands:
./minikube.exe version
./minikube.exe delete
./minikube.exe start --help
./minikube get-k8s-versions
./minikube.exe status
./minikube.exe ip
./minikube.exe dashboard --url=true
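The <none> under ENDPOINTS for kubernetes-dashboard means no Ready pod currently matches that Service's selector, which is what the "endpoint for service is not ready" messages are waiting on. A small sketch that prints a Service's ready endpoint IPs and fails when there are none; service_endpoint_ips is a hypothetical helper.

```shell
# service_endpoint_ips (hypothetical helper): print the ready endpoint IPs of
# Service $2 in namespace $1, or fail if there are none (ENDPOINTS <none>).
service_endpoint_ips() {
  local ns="$1" svc="$2" ips
  ips=$(kubectl -n "$ns" get endpoints "$svc" \
    -o jsonpath='{.subsets[*].addresses[*].ip}') || return 1
  if [ -z "$ips" ]; then
    echo "$ns/$svc has no ready endpoints; compare its selector with the pod labels" >&2
    return 1
  fi
  echo "$ips"
}
```

If service_endpoint_ips kube-system kubernetes-dashboard fails, kubectl -n kube-system describe service kubernetes-dashboard shows the Selector, and kubectl -n kube-system get pods --show-labels shows whether any pod carries those labels.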

What is POD and SERVICE in kubectl commands?

I am probably missing some of the basics. The usage of the kubectl logs command is:
"kubectl logs [-f] [-p] POD [-c CONTAINER] [options]"
My list of pods is:
ubuntu@master:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-master 1/1 Running 0 24m
kube-system kube-apiserver-master 1/1 Running 0 24m
kube-system kube-controller-manager-master 1/1 Running 0 24m
kube-system kube-discovery-982812725-3kt85 1/1 Running 0 24m
kube-system kube-dns-2247936740-kimly 3/3 Running 0 24m
kube-system kube-proxy-amd64-gwv99 1/1 Running 0 20m
kube-system kube-proxy-amd64-r08h9 1/1 Running 0 24m
kube-system kube-proxy-amd64-szl6w 1/1 Running 0 14m
kube-system kube-scheduler-master 1/1 Running 0 24m
kube-system kubernetes-dashboard-1655269645-x3uyt 1/1 Running 0 24m
kube-system weave-net-4g1g8 1/2 CrashLoopBackOff 7 14m
kube-system weave-net-8zdm3 1/2 CrashLoopBackOff 8 20m
kube-system weave-net-qm3q5 2/2 Running 0 24m
I assume POD in the logs command is any value from the second column (NAME) above, so I tried the following commands:
ubuntu@master:~$ kubectl logs etcd-master
Error from server: pods "etcd-master" not found
ubuntu@master:~$ kubectl logs weave-net-4g1g8
Error from server: pods "weave-net-4g1g8" not found
ubuntu@master:~$ kubectl logs weave-net
Error from server: pods "weave-net" not found
ubuntu@master:~$ kubectl logs weave
Error from server: pods "weave" not found
So, what is the POD in the logs command?
I have got the same question about services as well. How to identify a SERVICE to supply into a command, for example for 'describe' command?
ubuntu@master:~$ kubectl get services --all-namespaces
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes 100.64.0.1 <none> 443/TCP 40m
kube-system kube-dns 100.64.0.10 <none> 53/UDP,53/TCP 39m
kube-system kubernetes-dashboard 100.70.83.136 <nodes> 80/TCP 39m
ubuntu@master:~$ kubectl describe service kubernetes-dashboard
Error from server: services "kubernetes-dashboard" not found
ubuntu@master:~$ kubectl describe services kubernetes-dashboard
Error from server: services "kubernetes-dashboard" not found
Also, is it normal that weave-net-8zdm3 is in CrashLoopBackOff state? It seems I have got one for each connected worker. If it is not normal, how can I fix it? I have found similar question here: kube-dns and weave-net not starting but it does not give any practical answer.
Thanks for your help!
It seems your pods are running in a namespace other than default; the NAMESPACE column shows kube-system.
ubuntu@master:~$ kubectl get pods --all-namespaces returns your pods, but ubuntu@master:~$ kubectl logs etcd-master returns not found because kubectl logs only looks in the current namespace. kubectl logs does not accept --all-namespaces, so pass the namespace explicitly: kubectl logs etcd-master --namespace=kube-system (or -n kube-system).
The same goes for your services: kubectl describe service kubernetes-dashboard --namespace=kube-system.
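When you only know a pod or service name, its namespace can be looked up first, because metadata.name works as a field selector for any resource type. A sketch; find_namespace is a hypothetical helper that returns the first namespace found if the same name exists in several.

```shell
# find_namespace (hypothetical helper): print the namespace of the first
# object of kind $1 named $2, searching all namespaces.
find_namespace() {
  local kind="$1" name="$2" ns
  ns=$(kubectl get "$kind" --all-namespaces \
    --field-selector="metadata.name=$name" \
    -o jsonpath='{.items[*].metadata.namespace}') || return 1
  ns=${ns%% *}   # keep only the first match
  if [ -z "$ns" ]; then
    echo "no $kind named $name in any namespace" >&2
    return 1
  fi
  echo "$ns"
}
```

Usage with the names from the question:
kubectl logs -n "$(find_namespace pods etcd-master)" etcd-master
kubectl describe service -n "$(find_namespace services kubernetes-dashboard)" kubernetes-dashboard
The weave-net pods run more than one container (READY shows 1/2 and 2/2), so kubectl logs will also want -c with a container name; check the names with kubectl describe pod rather than assuming them.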