Pod failed to schedule during OpenStack-on-Kubernetes installation - kubernetes

I am new to Kubernetes and am trying to deploy OpenStack on a Kubernetes cluster, following the OpenStack docs. Below is the error I see when I try to deploy:
kube-system ingress-error-pages-56b4446784-crl85 0/1 Pending 0 1d
kube-system ingress-error-pages-56b4446784-m7jrw 0/1 Pending 0 5d
I have a Kubernetes cluster with one master and one node running on Debian 9. I encountered this error during the OpenStack installation on Kubernetes.
kubectl describe pod shows the events below:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m (x7684 over 1d) default-scheduler 0/2 nodes are available: 1 PodToleratesNodeTaints, 2 MatchNodeSelector.
All I see is a failed scheduling event. Even the container logs for the kube-scheduler only show that it failed to schedule a pod, without saying why. I have been stuck at this step for the past few hours trying to debug it.
PS: I am running debian9, kube version: v1.9.2+coreos.0, Docker - 17.03.1-ce
Any help appreciated ....

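The scheduler message names two failed checks. PodToleratesNodeTaints means one node (often the master) has a taint that your Pod doesn't tolerate, and MatchNodeSelector means neither node carries the labels your Pod asks for in its nodeSelector. It would help to post the definition for your Ingress and its corresponding Deployment or DaemonSet.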
You would generally taint your node(s) like this (the effect must be NoSchedule, PreferNoSchedule, or NoExecute):
kubectl taint nodes <your-node> key=value:NoSchedule
Then on your PodSpec something like this:
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"
It could also be because of missing labels on your node that your Pod needs in the nodeSelector field:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    cpuType: haswell
Then you'd add the matching label to your node:
kubectl label nodes kubernetes-foo-node-1 cpuType=haswell
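To see how the two reasons in the scheduler message add up, here is a toy Python model of the two checks. It is a sketch and not the scheduler's code: the master taint and the label name example-label are assumptions chosen for illustration, since the question doesn't show the actual node or Pod definitions.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)
    taints: list = field(default_factory=list)  # (key, value, effect) tuples


@dataclass
class Pod:
    node_selector: dict = field(default_factory=dict)
    tolerations: list = field(default_factory=list)  # dicts, as in a PodSpec


def tolerates(toleration, taint):
    """Return True if one toleration matches one taint."""
    key, value, effect = taint
    if toleration.get("effect") and toleration["effect"] != effect:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key with Exists tolerates every taint.
        return toleration.get("key", "") in ("", key)
    return toleration.get("key") == key and toleration.get("value", "") == value


def failed_checks(pod, node):
    """List the (simplified) scheduling checks this node fails for this pod."""
    reasons = []
    blocking = [t for t in node.taints if t[2] in ("NoSchedule", "NoExecute")]
    if any(not any(tolerates(tol, t) for tol in pod.tolerations) for t in blocking):
        reasons.append("PodToleratesNodeTaints")
    if any(node.labels.get(k) != v for k, v in pod.node_selector.items()):
        reasons.append("MatchNodeSelector")
    return reasons


# Assumed setup: a tainted master, an unlabeled worker, a Pod with a nodeSelector.
master = Node("master", taints=[("node-role.kubernetes.io/master", "", "NoSchedule")])
worker = Node("worker")
pod = Pod(node_selector={"example-label": "enabled"})

summary = Counter(r for node in (master, worker) for r in failed_checks(pod, node))
print(dict(summary))
```

Under these assumptions the summary counts one node failing the taint check and two failing the selector check, the same shape as the message in the question. Labeling the worker so that it matches the nodeSelector would leave only the master excluded.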
Hope it helps!

Related

Kubernetes add Toleration from CLI

I'm using the Oracle Cloud Infrastructure with Kubernetes and Docker. I've got the following pod:
$ kubectl describe pod $podname -n $namespace
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 19m default-scheduler 0/1 nodes are available: 1 node(s) had taint {nvidia.com/gpu: }, that the pod didn't tolerate.
Warning FailedScheduling 18m default-scheduler 0/1 nodes are available: 1 node(s) had taint {nvidia.com/gpu: }, that the pod didn't tolerate.
I want to add a toleration to this pod. Is there a command to do so without creating the pod config YAML file? This pod is created by some other system that I don't want to edit; I just want to add the toleration to resolve this issue.
Thanks.
====================
gpu-config.yaml
apiVersion: v1 # What version of the Kubernetes API to use
kind: Pod # What kind of object you want to create
metadata: # Data that helps uniquely identify the object, including a name, string, UID and optional namespace
  name: nvidia-gpu-workload
spec: # What state you desire for the object, differs for every type of Kubernetes object.
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: k8s.gcr.io/cuda-vector-add:v0.1
      resources:
        limits:
          nvidia.com/gpu: 1
  tolerations:
    - key: "nvidia.com/gpu"
      operator: "Equal"
      effect: "NoSchedule"
# Update command
$ kubectl create -f ./gpu-config.yaml
# All this seems to do is create a pod by the name of nvidia-gpu-workload-v2, and it doesn't add these configurations to the pod that I require.
Just to note that this issue is occurring on a pod called hook-image-awaiter-5tq5 and I don't think I should re-create that pod with a different config as it seems to be configured by part of the system.

Can a Pod tolerate one of a set of taints

Consider a cluster in which each node has a given taint (let's say NodeType) and a Pod can tolerate a set of NodeType. For example, there are nodes tainted NodeType=A, NodeType=B and NodeType=C.
I'd like to be able to specify for example that some Pods tolerate NodeType=A or NodeType=C, but not NodeType=B. Other Pods (in different Deployments) would tolerate different sets. Is there a way to do this?
Yes, this is possible: add multiple tolerations with the same key to the pod's spec. The official docs give an example of this.
Here is a demo I tried which works to produce the desired result.
The cluster has three nodes:
kubectl get nodes
NAME STATUS AGE VERSION
dummy-0 Ready 3m17s v1.17.14
dummy-1 Ready 26m v1.17.14
dummy-2 Ready 26m v1.17.14
I tainted them as mentioned in the question using the kubectl taint command:
kubectl taint node dummy-0 NodeType=A:NoSchedule
kubectl taint node dummy-1 NodeType=B:NoSchedule
kubectl taint node dummy-2 NodeType=C:NoSchedule
Created a Deployment with three replicas with the matching tolerations:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx-nfs
      tolerations:
      - key: "NodeType"
        operator: "Equal"
        value: "A"
        effect: "NoSchedule"
      - key: "NodeType"
        operator: "Equal"
        value: "B"
        effect: "NoSchedule"
From the kubectl get pods command, we can see that the pods of the Deployment were scheduled only on the nodes dummy-0 and dummy-1 and not on dummy-2 which has a different taint:
kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
nginx-deployment-5fc8f985d8-2pfvm 1/1 Running 0 8s 100.96.2.11 dummy-0
nginx-deployment-5fc8f985d8-hkrcz 1/1 Running 0 8s 100.96.6.10 dummy-1
nginx-deployment-5fc8f985d8-xfxsx 1/1 Running 0 8s 100.96.6.11 dummy-1
Note that taints and tolerations only keep pods off nodes whose taints they don't tolerate; they don't attract pods to any particular node.
To make sure pods are scheduled onto a particular node, use node affinity (affinity and anti-affinity rules).
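The demo result can be reproduced with a small Python sketch of toleration matching. It mirrors the three tainted nodes and the two tolerations from the Deployment above; the matching function is a simplified model written for illustration, not the scheduler's own implementation.

```python
# Taints from the demo, one per node: (key, value, effect).
NODE_TAINTS = {
    "dummy-0": ("NodeType", "A", "NoSchedule"),
    "dummy-1": ("NodeType", "B", "NoSchedule"),
    "dummy-2": ("NodeType", "C", "NoSchedule"),
}

# Tolerations from the Deployment: same key, two different values.
TOLERATIONS = [
    {"key": "NodeType", "operator": "Equal", "value": "A", "effect": "NoSchedule"},
    {"key": "NodeType", "operator": "Equal", "value": "B", "effect": "NoSchedule"},
]


def matches(toleration, taint):
    """Return True if this toleration tolerates this taint."""
    key, value, effect = taint
    operator = toleration.get("operator", "Equal")
    tol_key = toleration.get("key", "")
    if toleration.get("effect") and toleration["effect"] != effect:
        return False
    if operator == "Exists":
        # Exists ignores the value; an empty key matches any taint key.
        return tol_key in ("", key)
    return tol_key == key and toleration.get("value", "") == value


def schedulable_nodes(node_taints, tolerations):
    """Names of nodes whose taint is tolerated by at least one toleration."""
    return sorted(
        name
        for name, taint in node_taints.items()
        if any(matches(tol, taint) for tol in tolerations)
    )


print(schedulable_nodes(NODE_TAINTS, TOLERATIONS))
```

A taint only needs one matching toleration, which is why listing several tolerations with the same key behaves like an "or" over the values. Replacing the value "B" with "C" gives the A-or-C case from the question.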

Kubernetes available schedulers

How would I display available schedulers in my cluster in order to use non default one using the schedulerName field?
Any link to a document describing how to "install" and use a custom scheduler is highly appreciated :)
Thx in advance
Schedulers can be found among your kube-system pods. You can then filter the output to your needs with kube-scheduler as the search key:
➜ ~ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-6955765f44-9wfkp 0/1 Completed 15 264d
coredns-6955765f44-jmz9j 1/1 Running 16 264d
etcd-acid-fuji 1/1 Running 17 264d
kube-apiserver-acid-fuji 1/1 Running 6 36d
kube-controller-manager-acid-fuji 1/1 Running 21 264d
kube-proxy-hs2qb 1/1 Running 0 177d
kube-scheduler-acid-fuji 1/1 Running 21 264d
You can retrieve the yaml file with:
➜ ~ kubectl get pods -n kube-system <scheduler pod name> -oyaml
If you bootstrapped your cluster with kubeadm, you may also find the YAML files in /etc/kubernetes/manifests:
➜ manifests sudo cat /etc/kubernetes/manifests/kube-scheduler.yaml
---
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    image: k8s.gcr.io/kube-scheduler:v1.17.6
    imagePullPolicy: IfNotPresent
---------
The location for minikube is similar, but you have to log in to minikube's virtual machine first with minikube ssh.
For further reading, have a look at how to configure multiple schedulers and how to write custom schedulers.
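Once a second scheduler is running, a Pod opts into it through the schedulerName field mentioned in the question. The sketch below builds such a manifest in Python and prints it as JSON, which kubectl accepts in place of YAML. The name my-custom-scheduler is a placeholder for whatever name your own scheduler registers under.

```python
import json


def pod_with_scheduler(name, image, scheduler_name):
    """Build a minimal Pod manifest that requests a specific scheduler."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Must match the name the custom scheduler runs under.
            "schedulerName": scheduler_name,
            "containers": [{"name": name, "image": image}],
        },
    }


# "my-custom-scheduler" is a hypothetical name used only for illustration.
manifest = pod_with_scheduler("nginx", "nginx", "my-custom-scheduler")
print(json.dumps(manifest, indent=2))
```

If no scheduler with that name is running, the Pod simply stays Pending, so it is worth checking the name against the scheduler pods listed earlier.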
You can try this one:
kubectl get pods --all-namespaces | grep scheduler

FailedScheduling: 0/3 nodes are available: 3 Insufficient pods

I'm trying to deploy my NodeJS application to EKS and run 3 pods with exactly the same container.
Here's the error message:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
cm-deployment-7c86bb474c-5txqq 0/1 Pending 0 18s
cm-deployment-7c86bb474c-cd7qs 0/1 ImagePullBackOff 0 18s
cm-deployment-7c86bb474c-qxglx 0/1 ImagePullBackOff 0 18s
public-api-server-79b7f46bf9-wgpk6 0/1 ImagePullBackOff 0 2m30s
$ kubectl describe pod cm-deployment-7c86bb474c-5txqq
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 23s (x4 over 2m55s) default-scheduler 0/3 nodes are available: 3 Insufficient pods.
So it says that 0/3 nodes are available. However, if I run kubectl get nodes --watch:
$ kubectl get nodes --watch
NAME STATUS ROLES AGE VERSION
ip-192-168-163-73.ap-northeast-2.compute.internal Ready <none> 6d7h v1.14.6-eks-5047ed
ip-192-168-172-235.ap-northeast-2.compute.internal Ready <none> 6d7h v1.14.6-eks-5047ed
ip-192-168-184-236.ap-northeast-2.compute.internal Ready <none> 6d7h v1.14.6-eks-5047ed
All 3 nodes are Ready.
Here are my configurations:
aws-auth-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: [MY custom role ARN]
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cm-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cm-literal
  template:
    metadata:
      name: cm-literal-pod
      labels:
        app: cm-literal
    spec:
      containers:
        - name: cm
          image: docker.io/cjsjyh/public_test:1
          imagePullPolicy: Always
          ports:
            - containerPort: 80
          #imagePullSecrets:
          # - name: regcred
          env:
            [my environment variables]
I applied both .yaml files
How can I solve this?
Thank you
My guess, without running your manifests, is that the tag 1 doesn't exist for your image, so you're getting ImagePullBackOff, which usually means that the container runtime can't find the image to pull.
Looking at the Docker Hub page, there's no 1 tag there, just latest.
So either removing the tag or replacing 1 with latest may resolve your issue.
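Removing the tag works because an image reference without a tag resolves to latest. A minimal Python sketch of that defaulting rule follows; it is a simplification that ignores digest references of the form name@sha256:..., and split_image is a helper written for this illustration.

```python
def split_image(ref):
    """Split an image reference into (repository, tag); the tag defaults to 'latest'."""
    # A colon only marks a tag if it comes after the last slash;
    # otherwise it belongs to a registry port such as localhost:5000.
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > last_slash:
        return ref[:colon], ref[colon + 1:]
    return ref, "latest"


print(split_image("docker.io/cjsjyh/public_test:1"))
print(split_image("docker.io/cjsjyh/public_test"))
```

The first call keeps the tag 1 from the manifest, while the second falls back to latest, which is the tag the answer says exists on Docker Hub.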
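I experienced this issue with AWS instance types with low resources. "Insufficient pods" means every node has already reached its maximum pod count, and on EKS smaller instance types allow fewer pods per node.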
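One common cause of "Insufficient pods" on EKS is the per-node pod limit that the AWS VPC CNI derives from the instance type's network interfaces. The sketch below applies the published formula for the default (secondary IP) mode. The instance types and their ENI limits are illustrative assumptions, since the question doesn't say which type the nodes use; check the current AWS limits table for real values.

```python
def max_pods(enis, ipv4_per_eni):
    """Approximate pod capacity of a node under the AWS VPC CNI default mode."""
    # Each ENI's primary address is not handed out to pods;
    # the +2 accounts for pods that use host networking.
    return enis * (ipv4_per_eni - 1) + 2


# (ENIs, IPv4 addresses per ENI): example values, assumed for illustration.
EXAMPLE_LIMITS = {
    "t3.micro": (2, 2),
    "t3.small": (3, 4),
    "t3.medium": (3, 6),
}

for instance_type, (enis, ips) in EXAMPLE_LIMITS.items():
    print(instance_type, max_pods(enis, ips))
```

System pods such as kube-proxy, aws-node, and CoreDNS count against the same limit, so on the smallest instance types only a couple of slots per node may be left for application pods. kubectl describe node shows the actual pod capacity of each node.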

Trying to create a Kubernetes deployment but it shows 0 pods available

I'm new to k8s, so some of my terminology might be off. But basically, I'm trying to deploy a simple web api: one load balancer in front of n pods (where right now, n=1).
However, when I try to visit the load balancer's IP address it doesn't show my web application. When I run kubectl get deployments, I get this:
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
tl-api 1 1 1 0 4m
Here's my YAML file. Let me know if anything looks off--I'm very new to this!
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: tl-api
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: tl-api
    spec:
      containers:
      - name: tl-api
        image: tlk8s.azurecr.io/devicecloudwebapi:v1
        ports:
        - containerPort: 80
      imagePullSecrets:
      - name: acr-auth
      nodeSelector:
        beta.kubernetes.io/os: windows
---
apiVersion: v1
kind: Service
metadata:
  name: tl-api
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: tl-api
Edit 2: When I try using ACS (which supports Windows), I get this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned tl-api-3466491809-vd5kg to dc9ebacs9000
Normal SuccessfulMountVolume 11m kubelet, dc9ebacs9000 MountVolume.SetUp succeeded for volume "default-token-v3wz9"
Normal Pulling 4m (x6 over 10m) kubelet, dc9ebacs9000 pulling image "tlk8s.azurecr.io/devicecloudwebapi:v1"
Warning FailedSync 1s (x50 over 10m) kubelet, dc9ebacs9000 Error syncing pod
Normal BackOff 1s (x44 over 10m) kubelet, dc9ebacs9000 Back-off pulling image "tlk8s.azurecr.io/devicecloudwebapi:v1"
I then try examining the failed pod:
PS C:\users\<me>\source\repos\DeviceCloud\DeviceCloud\1- Presentation\DeviceCloud.Web.API> kubectl logs tl-api-3466491809-vd5kg
Error from server (BadRequest): container "tl-api" in pod "tl-api-3466491809-vd5kg" is waiting to start: trying and failing to pull image
When I run docker images I see the following:
REPOSITORY TAG IMAGE ID CREATED SIZE
devicecloudwebapi latest ee3d9c3e231d 24 hours ago 7.85GB
tlk8s.azurecr.io/devicecloudwebapi v1 ee3d9c3e231d 24 hours ago 7.85GB
devicecloudwebapi dev bb33ab221910 25 hours ago 7.76GB
Your problem is that the container image tlk8s.azurecr.io/devicecloudwebapi:v1 is in a private container registry. See the events at the bottom of the following command:
$ kubectl describe po -l=app=tl-api
The official Kubernetes docs describe how to resolve this issue; see Pull an Image from a Private Registry. Essentially:
Create a secret with kubectl create secret docker-registry
Reference it in your deployment, under the pod template's spec.imagePullSecrets key
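For reference, the Secret that kubectl create secret docker-registry produces can be sketched in Python as below. The secret name acr-auth and the registry tlk8s.azurecr.io come from the question; the username and password are placeholders, and docker_registry_secret is a helper written for this illustration.

```python
import base64
import json


def docker_registry_secret(name, server, username, password):
    """Build a Secret manifest of type kubernetes.io/dockerconfigjson."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {
        "auths": {
            server: {"username": username, "password": password, "auth": auth}
        }
    }
    encoded = base64.b64encode(json.dumps(config).encode()).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": encoded},
    }


# Placeholder credentials, for illustration only.
secret = docker_registry_secret(
    "acr-auth", "tlk8s.azurecr.io", "example-user", "example-password"
)
print(json.dumps(secret, indent=2))
```

The secret must live in the same namespace as the Pod, and its name must match the entry under imagePullSecrets. If the pull still fails, decoding .dockerconfigjson from the existing secret is a quick way to check that the registry host and credentials are the ones you expect.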