Debugging kubernetes pods in Completed status forever but not ready - kubernetes

Would there be any reason if I'm trying to run a pod on a k8s cluster which stays in the 'Completed' status forever but is never in ready status 'Ready 0/1' ... although there are core-dns , kube-proxy pods,etc running successfully scheduled under each node in the nodepool assigned to a k8s cluster ... all the worker nodes seems to be in healthy state

This sounds like the pod lifecycle has ended and the reasons is probably because your pod have finished the task it is meant for.
Something like next example will not do anything, it will be created successfully then started, will pull the image and then it will be marked as completed.
apiVersion: v1
kind: Pod
metadata:
name: my-pod
spec:
containers:
- name: test-pod
image: busybox
resources:
Here is how it will look:
NAMESPACE NAME READY STATUS RESTARTS AGE
default my-pod 0/1 Completed 1 3s
default testapp-1 0/1 Pending 0 92m
default web-0 1/1 Running 0 3h2m
kube-system event-exporter-v0.2.5-7df89f4b8f-mc7kt 2/2 Running 0 7d19h
kube-system fluentd-gcp-scaler-54ccb89d5-9pp8z 1/1 Running 0 7d19h
kube-system fluentd-gcp-v3.1.1-gdr9l 2/2 Running 0 123m
kube-system fluentd-gcp-v3.1.1-pt8zp 2/2 Running 0 3h2m
kube-system fluentd-gcp-v3.1.1-xwhkn 2/2 Running 5 172m
kube-system fluentd-gcp-v3.1.1-zh29w 2/2 Running 0 7d19h
For this case, I will recommend you to check your yaml, check what kind of pod you are running? and what is it meant for?
Even, and with test purposes, you can add a command argument to keep it running.
command: ["/bin/sh","-c"]
args: ["command one; command two && command three"]
something like:
args: ["-c", "while true; do echo hello; sleep 10;done"]
The yaml with commands added would look like this:
apiVersion: v1
kind: Pod
metadata:
name: my-pod
spec:
containers:
- name: test-pod
image: busybox
resources:
command: ["/bin/sh","-c"]
args: ["-c", "while true; do echo hello; sleep 10;done"]
Here is how it will look:
NAMESPACE NAME READY STATUS RESTARTS AGE
default my-pod 1/1 Running 0 8m56s
default testapp-1 0/1 Pending 0 90m
default web-0 1/1 Running 0 3h
kube-system event-exporter-v0.2.5-7df89f4b8f-mc7kt 2/2 Running 0 7d19h
kube-system fluentd-gcp-scaler-54ccb89d5-9pp8z 1/1 Running 0 7d19h
kube-system fluentd-gcp-v3.1.1-gdr9l 2/2 Running 0 122m
Another thing that will help is a kubectl describe pod $POD_NAME, to analyze this further.

This issue got nothing to do with k8s cluster or setup. It's mostly with your Dockerfile.
For the pod to be in running state, you need to have the container as an executable, for that usually an Entrypoint that specifies the instructions would solve the problem you are facing.
A better approach to resolve this or debug this particular issue would be, by running the respective docker container on a localmachine before deploying that onto k8s cluster.

The message Completed, most likely implies that the k8s started your container, then the container subsequently exited.
In Dockerfile, the ENTRYPOINT defines the process with pid 1 which the docker container should hold, otherwise it terminates, so whatever the pod is trying to do, you need to check whether the ENTRYPOINT command is looping, and doesn't die.
For side notes, Kubernetes will try to restart the pod, once it terminates but after few tries the pod status may turn into CrashLoopBackOff and you will see message something similar to Back-off restarting failed container.

Related

Controll job deletions

We have a cronjob monitoring in our cluster. If a pod did not appear in 24 hours, It means that the cronjob haven't ran and we need to alert. But sometimes, due to some garbage collection, pod is deleted (but job completed successfully). How to keep all pods and avoid garbage collection? I know about finalizers, but looks like It's not working in this case.
Posting this as answer since it's a reason why it can happen.
Answer
Cloud kubernetes clusters have nodes autoscaling policies. Or sometimes node pools can be scaled down/up manually.
Cronjob creates job for each run which in turn creates a corresponding pod. Pods are assigned to exact nodes. And if for any reason node with assigned to it pod(s) was removed due to node autoscaling/manual scaling, pods will be gone. However jobs will be preserved since they are stored in etcd.
There are two flags which control amount of jobs stored in the history:
.spec.successfulJobsHistoryLimit - which is by default set to 3
.spec.failedJobsHistoryLimit - set by default to 1
If setting up 0 then everything will be removed right after jobs finish.
Jobs History Limits
How it happens in fact
I have a GCP GKE cluster with two nodes:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-cluster-xxxx Ready <none> 15h v1.21.3-gke.2001
gke-cluster-yyyy Ready <none> 3d20h v1.21.3-gke.2001
cronjob.yaml for testing:
apiVersion: batch/v1
kind: CronJob
metadata:
name: test-cronjob
spec:
schedule: "*/2 * * * *"
successfulJobsHistoryLimit: 5
jobTemplate:
spec:
template:
spec:
containers:
- name: test
image: busybox
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- -c
- date; echo Hello from the Kubernetes cluster
restartPolicy: OnFailure
Pods created:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-cronjob-27253914-mxnzg 0/1 Completed 0 8m59s 10.24.0.22 gke-cluster-4-xxxx <none> <none>
test-cronjob-27253916-88cjn 0/1 Completed 0 6m59s 10.24.0.25 gke-cluster-4-xxxx <none> <none>
test-cronjob-27253918-hdcg9 0/1 Completed 0 4m59s 10.24.0.29 gke-cluster-4-xxxx <none> <none>
test-cronjob-27253920-shnnp 0/1 Completed 0 2m59s 10.24.1.15 gke-cluster-4-yyyy <none> <none>
test-cronjob-27253922-cw5gp 0/1 Completed 0 59s 10.24.1.18 gke-cluster-4-yyyy <none> <none>
Scaling down one node:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-cluster-4-xxxx NotReady,SchedulingDisabled <none> 16h v1.21.3-gke.2001
gke-cluster-4-yyyy Ready <none> 3d21h v1.21.3-gke.2001
And getting pods now:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-cronjob-27253920-shnnp 0/1 Completed 0 7m47s 10.24.1.15 gke-cluster-4-yyyy <none> <none>
test-cronjob-27253922-cw5gp 0/1 Completed 0 5m47s 10.24.1.18 gke-cluster-4-yyyy <none> <none>
Previously completed pods on the first node are gone now.
Jobs are still in place:
$ kubectl get jobs
NAME COMPLETIONS DURATION AGE
test-cronjob-27253914 1/1 1s 13m
test-cronjob-27253916 1/1 2s 11m
test-cronjob-27253918 1/1 1s 9m55s
test-cronjob-27253920 1/1 34s 7m55s
test-cronjob-27253922 1/1 2s 5m55s
How it can be solved
Changing monitoring alert to look for jobs completion is much more precise method and independent to any cluster nodes scaling actions.
E.g. I still can retrieve a result from job test-cronjob-27253916 where corresponding pod to it is deleted:
$ kubectl get job test-cronjob-27253916 -o jsonpath='{.status.succeeded'}
1
Useful links:
Jobs history limits
Garbage collection in kubernetes
TTL controller for finished resources

kubernetes pending pod priority

I have the following pods on my kubernetes (1.18.3) cluster:
NAME READY STATUS RESTARTS AGE
pod1 1/1 Running 0 14m
pod2 1/1 Running 0 14m
pod3 0/1 Pending 0 14m
pod4 0/1 Pending 0 14m
pod3 and pod4 cannot start because the node has capacity for 2 pods only. When pod1 finishes and quits, then the scheduler picks either pod3 or pod4 and starts it. So far so good.
However, I also have a high priority pod (hpod) that I'd like to start before pod3 or pod4 when either of the running pods finishes and quits.
So I created a priorityclass can be found in the kubernetes docs:
kind: PriorityClass
metadata:
name: high-priority-no-preemption
value: 1000000
preemptionPolicy: Never
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
I've created the following pod yaml:
apiVersion: v1
kind: Pod
metadata:
name: hpod
labels:
app: hpod
spec:
containers:
- name: hpod
image: ...
resources:
requests:
cpu: "500m"
memory: "500Mi"
limits:
cpu: "500m"
memory: "500Mi"
priorityClassName: high-priority-no-preemption
Now the problem is that when I start the high prio pod with kubectl apply -f hpod.yaml, then the scheduler terminates a running pod to allow the high priority pod to start despite I've set 'preemptionPolicy: Never'.
The expected behaviour would be to postpone starting hpod until a currently running pod finishes. And when it does, then let hpod start before pod3 or pod4.
What am I doing wrong?
Prerequisites:
This solution was tested on Kubernetes v1.18.3, docker 19.03 and Ubuntu 18.
Also text editor is required (i.e. sudo apt-get install vim).
In Kubernetes documentation under How to disable preemption you can find Note:
Note: In Kubernetes 1.15 and later, if the feature NonPreemptingPriority is enabled, PriorityClasses have the option to set preemptionPolicy: Never. This will prevent pods of that PriorityClass from preempting other pods.
Also under Non-preempting PriorityClass you have information:
The use of the PreemptionPolicy field requires the NonPreemptingPriority feature gate to be enabled.
Later if you will check thoses Feature Gates info, you will find that NonPreemptingPriority is false, so as default it's disabled.
Output with your current configuration:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-normal 1/1 Running 0 32s
nginx-normal-2 1/1 Running 0 32s
$ kubectl apply -f prio.yaml
pod/nginx-priority created$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-normal-2 1/1 Running 0 48s
nginx-priority 1/1 Running 0 8s
To enable preemptionPolicy: Never you need to apply --feature-gates=NonPreemptingPriority=true to 3 files:
/etc/kubernetes/manifests/kube-apiserver.yaml
/etc/kubernetes/manifests/kube-controller-manager.yaml
/etc/kubernetes/manifests/kube-scheduler.yaml
To check if this feature-gate is enabled you can check by using commands:
ps aux | grep apiserver | grep feature-gates
ps aux | grep scheduler | grep feature-gates
ps aux | grep controller-manager | grep feature-gates
For quite detailed information, why you have to edit thoses files please check this Github thread.
$ sudo su
# cd /etc/kubernetes/manifests/
# ls
etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml
Use your text editor to add feature gate to those files
# vi kube-apiserver.yaml
and add - --feature-gates=NonPreemptingPriority=true under spec.containers.command like in example bellow:
spec:
containers:
- command:
- kube-apiserver
- --feature-gates=NonPreemptingPriority=true
- --advertise-address=10.154.0.31
And do the same with 2 other files. After that you can check if this flags were applied.
$ ps aux | grep apiserver | grep feature-gates
root 26713 10.4 5.2 565416 402252 ? Ssl 14:50 0:17 kube-apiserver --feature-gates=NonPreemptingPriority=true --advertise-address=10.154.0.31
Now you have redeploy your PriorityClass.
$ kubectl get priorityclass
NAME VALUE GLOBAL-DEFAULT AGE
high-priority-no-preemption 1000000 false 12m
system-cluster-critical 2000000000 false 23m
system-node-critical 2000001000 false 23m
$ kubectl delete priorityclass high-priority-no-preemption
priorityclass.scheduling.k8s.io "high-priority-no-preemption" deleted
$ kubectl apply -f class.yaml
priorityclass.scheduling.k8s.io/high-priority-no-preemption created
Last step is to deploy pod with this PriorityClass.
TEST
$ kubectl get po
NAME READY STATUS RESTARTS AGE
nginx-normal 1/1 Running 0 4m4s
nginx-normal-2 1/1 Running 0 18m
$ kubectl apply -f prio.yaml
pod/nginx-priority created
$ kubectl get po
NAME READY STATUS RESTARTS AGE
nginx-normal 1/1 Running 0 5m17s
nginx-normal-2 1/1 Running 0 20m
nginx-priority 0/1 Pending 0 67s
$ kubectl delete po nginx-normal-2
pod "nginx-normal-2" deleted
$ kubectl get po
NAME READY STATUS RESTARTS AGE
nginx-normal 1/1 Running 0 5m55s
nginx-priority 1/1 Running 0 105s

Kubernetes coredns pods stuck in Pending status. Cannot start the dashboard

I am building a Kubernetes cluster following this tutorial, and I have troubles to access the Kubernetes dashboard. I already created another question about it that you can see here, but while digging up into my cluster, I think that the problem might be somewhere else and that's why I create a new question.
I start my master, by running the following commands:
> kubeadm reset
> kubeadm init --apiserver-advertise-address=[MASTER_IP] > file.txt
> tail -2 file.txt > join.sh # I keep this file for later
> kubectl apply -f https://git.io/weave-kube/
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 2m46s
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 2m46s
etcd-kubemaster 1/1 Running 0 93s
kube-apiserver-kubemaster 1/1 Running 0 93s
kube-controller-manager-kubemaster 1/1 Running 0 113s
kube-proxy-lxhvs 1/1 Running 0 2m46s
kube-scheduler-kubemaster 1/1 Running 0 93s
Here we can see that I have two coredns pods stuck in Pending state forever, and when I run the command :
> kubectl -n kube-system describe pod coredns-fb8b8dccf-kb2zq
I can see in the Events part the following Warning :
Failed Scheduling : 0/1 nodes are available 1 node(s) had taints that the pod didn't tolerate.
Since it is a Warning and not and Error, and that as a Kubernetes newbie, taints does not mean much to me, I tried to connect a node to the master (using the previously saved command) :
> cat join.sh
kubeadm join [MASTER_IP]:6443 --token [TOKEN] \
--discovery-token-ca-cert-hash sha256:[ANOTHER_TOKEN]
> ssh [USER]#[WORKER_IP] 'bash' < join.sh
This node has joined the cluster.
On the master, I check that the node is connected:
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster NotReady master 13m v1.14.1
kubeslave1 NotReady <none> 31s v1.14.1
And I check my pods :
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 14m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 14m
etcd-kubemaster 1/1 Running 0 13m
kube-apiserver-kubemaster 1/1 Running 0 13m
kube-controller-manager-kubemaster 1/1 Running 0 13m
kube-proxy-lxhvs 1/1 Running 0 14m
kube-proxy-xllx4 0/1 ContainerCreating 0 2m16s
kube-scheduler-kubemaster 1/1 Running 0 13m
We can see that another kube-proxy pod have been created and is stuck in ContainerCreating status.
And when I am doing a describe again :
kubectl -n kube-system describe pod kube-proxy-xllx4
I can see in the Events part multiple identical Warnings :
Failed create pod sandbox : rpx error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Get https://k8s.gcr.io/v1/_ping: dial tcp: lookup k8s.gcr.io on [::1]:53 read up [::1]43133->[::1]:53: read: connection refused
Here are my repositories :
docker image ls
REPOSITORY TAG
k8s.gcr.io/kube-proxy v1.14.1
k8s.gcr.io/kube-apiserver v1.14.1
k8s.gcr.io/kube-controller-manager v1.14.1
k8s.gcr.io/kube-scheduler v1.14.1
k8s.gcr.io/coredns 1.3.1
k8s.gcr.io/etcd 3.3.10
k8s.gcr.io/pause 3.1
And so, for the dashboard part, I tried to start it with the command
> kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended/kubernetes-dashboard.yaml
But the dashboard pod is stuck in Pending state.
kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 40m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 40m
etcd-kubemaster 1/1 Running 0 38m
kube-apiserver-kubemaster 1/1 Running 0 38m
kube-controller-manager-kubemaster 1/1 Running 0 39m
kube-proxy-lxhvs 1/1 Running 0 40m
kube-proxy-xllx4 0/1 ContainerCreating 0 27m
kube-scheduler-kubemaster 1/1 Running 0 38m
kubernetes-dashboard-5f7b999d65-qn8qn 1/1 Pending 0 8s
So, event though my problem originaly was that I cannot access to my dashboard, I guess that the real problem is deeper thant that.
I know that I just put a lot of information here, but I am a k8s beginner and I am completely lost on this.
There is an issue I experienced with coredns pods stuck in a pending mode when setting up your own cluster; which I resolve by adding pod network.
Looks like because there is no Network Addon installed, the nodes are taint as not-ready. Installing the Addon would remove the taints and the Pods will be able to schedule. In my case adding flannel fixed the issue.
EDIT: There is a note about this in the official k8s documentation - Create cluster with kubeadm:
The network must be deployed before any applications. Also, CoreDNS
will not start up before a network is installed. kubeadm only
supports Container Network Interface (CNI) based networks (and does
not support kubenet).
Actually it is the opposite of a deep or serious issue. This is a trivial issue. Always you see a pod stuck on Pending state, it means the scheduler is having a hard time to schedule the pod; mostly because there are no enough resources on the node.
In your case it is a taint that has the node, and your pod doesn't have the toleration. What you have to do is to describe the node and get the taint:
kubectl describe node | grep -i taints
Note: you might have more then one taint. So you might want to do kubectl describe no NODE since with grep you will only see one taint.
Once you get the taint, that will be something like hello=world:NoSchedule; which means key=value:effect, you will have to add a toleration section in your Deployment. This is an example Deployment so you can see how it should look like:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: nginx
labels:
app: nginx
spec:
replicas: 10
strategy:
type: Recreate
template:
metadata:
labels:
app: nginx
spec:
containers:
- image: nginx
name: nginx
ports:
- containerPort: 80
name: http
tolerations:
- effect: NoExecute #NoSchedule, PreferNoSchedule
key: node
operator: Equal
value: not-ready
tolerationSeconds: 3600
As you can see there is the toleration section in the yaml. So, if I would have a node with node=not-ready:NoExecute taint, no pod would be able to be scheduled on that node, unless would have this toleration.
Also you can remove the taint, if you don need it. To remove a taint you would describe the node, get the key of the taint and do:
kubectl taint node NODE key-
Hope it makes sense. Just add this section to your deployment, and it will work.
Set up the flannel network tool.
Running commands:
$ sysctl net.bridge.bridge-nf-call-iptables=1
$ kubectl apply -f
https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml

Error from server (NotFound): podmetrics.metrics.k8s.io "mem-example/memory-demo" not found

I am following this tutorial: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
I have created the memory pod demo and I am trying to get the metrics from the pod but it is not working.
I installed the metrics server by cloning: https://github.com/kubernetes-incubator/metrics-server
And then running this command from top level:
kubectl create -f deploy/1.8+/
I am using kubernetes version 1.10.11.
The pod is definitely created:
λ kubectl get pod memory-demo --namespace=mem-example
NAME READY STATUS RESTARTS AGE
memory-demo 1/1 Running 0 6m
But the metics command does not work and gives an error:
λ kubectl top pod memory-demo --namespace=mem-example
Error from server (NotFound): podmetrics.metrics.k8s.io "mem-example/memory-demo" not found
What did I do wrong?
There are some patches to be done to metrics server deployment to get the metrics working.
Follow the below steps
kubectl delete -f deploy/1.8+/
wait till the metrics server gets undeployed
run the below command
kubectl create -f https://raw.githubusercontent.com/epasham/docker-repo/master/k8s/metrics-server.yaml
master $ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-78fcdf6894-6zg78 1/1 Running 0 2h
coredns-78fcdf6894-gk4sb 1/1 Running 0 2h
etcd-master 1/1 Running 0 2h
kube-apiserver-master 1/1 Running 0 2h
kube-controller-manager-master 1/1 Running 0 2h
kube-proxy-f5z9p 1/1 Running 0 2h
kube-proxy-ghbvn 1/1 Running 0 2h
kube-scheduler-master 1/1 Running 0 2h
metrics-server-85c54d44c8-rmvxh 2/2 Running 0 1m
weave-net-4j7cl 2/2 Running 1 2h
weave-net-82fzn 2/2 Running 1 2h
master $ kubectl top pod -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-78fcdf6894-6zg78 2m 11Mi
coredns-78fcdf6894-gk4sb 2m 9Mi
etcd-master 14m 90Mi
kube-apiserver-master 24m 425Mi
kube-controller-manager-master 26m 62Mi
kube-proxy-f5z9p 2m 19Mi
kube-proxy-ghbvn 3m 17Mi
kube-scheduler-master 8m 14Mi
metrics-server-85c54d44c8-rmvxh 1m 19Mi
weave-net-4j7cl 2m 59Mi
weave-net-82fzn 1m 60Mi
Check and verify the below lines in metrics server deployment manifest.
command:
- /metrics-server
- --metric-resolution=30s
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls
On Minikube, I had to wait for 20-25 minutes after enabling the metrics-server addon. I was getting the same error for 20-25 minutes but later I could see the output without attempting for any solution.
I faced the similar issue of
Error from server (NotFound): podmetrics.metrics.k8s.io "default/apple-app" not found
I followed two steps and I was able to resolve the issue.
Download the latest customized components.yaml, which is their official file used for easy deployment.
Update the change
# - /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
to the command section of the deployment specification. I have commented the first line because it is the entrypoint of the image used by kubernetes metrics-server.
$ docker image inspect k8s.gcr.io/metrics-server-amd64:v0.3.6 -f {{.ContainerConfig.Entrypoint}}
[/metrics-server]
Even If you use it or not, it doesn't matter.
Note: You have to wait for few seconds for it to properly work.
After this running the top command will work for you.
$ kubectl top pod apple-app
NAME CPU(cores) MEMORY(bytes)
apple-app 1m 3Mi
I know this is an old thread may be someone will find this answer useful.
You have to checkout the following repo:
https://github.com/kubernetes-incubator/metrics-server
Go to the root of the repo and checkout release-0.3.2.
Remove default metrics server by:
kubectl delete -f deploy/1.8+/
Download the container yaml
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
Edit the container.yaml by adding the following lines to the argument section. You will see these two lines there
args:
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls=true
There is only one args parameter in that file.
Deploy your pod/deployment and you should be able to do:
kubectl top pod <pod-name>

Kubernetes pod not starting

I have a kubernetes cluster with 5 nodes. When I add a simple nginx pod it will be scheduled to one of the nodes but it will not start up. It will not even pull the image.
This is the nginx.yaml file:
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
when I describe the pod there is one event: Successfully assigned busybox to up02 When I log in to the up02 and check to see if there are any images pulled I see it didn't get pulled so I pulled it manually (I thought maybe it needs some kick start ;) )
The pod will allways stay in the Container creating state. It's not only with this pod, the problem is with any pod I try to add.
There are some pods running on the machine which is necessary for Kubernetes to operate:
up#up01:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default busybox 0/1 ContainerCreating 0 11m
default nginx 0/1 ContainerCreating 0 22m
kube-system dummy-2088944543-n1cd5 1/1 Running 0 5d
kube-system etcd-up01 1/1 Running 0 5d
kube-system kube-apiserver-up01 1/1 Running 0 5d
kube-system kube-controller-manager-up01 1/1 Running 0 5d
kube-system kube-discovery-1769846148-xfpls 1/1 Running 0 5d
kube-system kube-dns-2924299975-5rzz8 4/4 Running 0 5d
kube-system kube-proxy-17bpl 1/1 Running 2 3d
kube-system kube-proxy-3pk63 1/1 Running 0 3d
kube-system kube-proxy-h3wrj 1/1 Running 0 5d
kube-system kube-proxy-wzqv4 1/1 Running 0 3d
kube-system kube-proxy-z3xxx 1/1 Running 0 3d
kube-system kube-scheduler-up01 1/1 Running 0 5d
kube-system kubernetes-dashboard-3203831700-3xfbd 1/1 Running 0 5d
kube-system weave-net-6c0nr 2/2 Running 0 3d
kube-system weave-net-dchhf 2/2 Running 0 5d
kube-system weave-net-hshvg 2/2 Running 4 3d
kube-system weave-net-n684c 2/2 Running 1 3d
kube-system weave-net-r5319 2/2 Running 0 3d
You can do
kubectl describe pods <pod>
to get more info on what's happening.
Can you recreate the nginx pod again in namespace kube-system?
kubectl create --namespace kube-system -f nginx.yaml
this should fix your problem.
Second, do you have proxy in your environment, take a look as well.
make sure that your namespace and service account information is correct. if you've configured your services or deployments to use a namespace or service account, that namespace needs to exist.
If you configured it to use a non - default service account then that has to exist as well, and the service account should be created after the namespace.
you shouldn't be necessarily using the kube system namespace. namespaces exist so there can be more than one of them and control the flow of traffic inside of a cluster.
you should also set potentially set permissions for your namespace. read this here.
https://kubernetes.io/docs/reference/access-authn-authz/rbac/#service-account-permissions