nodeSelector doesn't match the target node - kubernetes

I want to deploy a simple nginx on my master node.
Basically, if I use tolerations combined with nodeName, everything is fine:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: myapp
  name: myapp-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - image: nginx
        name: myapp-container
      tolerations:
      - effect: NoExecute
        operator: Exists
      nodeName: master
The results:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myapp-deployment-56d5887b9-fw5mj 1/1 Running 0 50s 100.32.0.4 master <none> <none>
But the problem is when I add a type=master label to my node and use nodeSelector instead of nodeName: the deployment stays in the Pending state!
Here are my steps:
Add label to my node: k label node master type=master
Check the node label:
$ k get no --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master Ready control-plane 65d v1.24.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=,type=master
Apply my new YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: myapp
  name: myapp-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - image: nginx
        name: myapp-container
      tolerations:
      - effect: NoExecute
        operator: Exists
      nodeSelector:
        type: master
Check the state:
$ k get po
NAME READY STATUS RESTARTS AGE
myapp-deployment-544784ff98-2qf7z 0/1 Pending 0 3s
Describe it:
Name: myapp-deployment-544784ff98-2qf7z
Namespace: default
Priority: 0
Node: <none>
Labels: app=myapp
pod-template-hash=544784ff98
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/myapp-deployment-544784ff98
Containers:
myapp-container:
Image: nginx
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lbtsv (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-lbtsv:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: type=master
Tolerations: :NoExecute op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 111s default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Where am I wrong? What is my problem?
P.S.: Kubernetes version:
Client Version: v1.24.1
Kustomize Version: v4.5.4
Server Version: v1.24.1

Check your master node; it might have a taint set to NoSchedule:
kubectl describe node <Node name> | grep Taint
If you want to run a Pod on the master node, use this config:
tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: "NoSchedule"
nodeSelector:
  node-role.kubernetes.io/master: ""
Read more about the concept of taints and tolerations: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
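(If you would rather not add a toleration at all, another option is to remove the control-plane/master taint from the node itself. A quick sketch, using the node name from the question; use whichever taint key your node actually has:)
# See which taints are currently set
kubectl describe node master | grep Taints
# Remove the taint; the trailing "-" deletes it
kubectl taint nodes master node-role.kubernetes.io/control-plane:NoSchedule-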

Well, thanks to @Harsh, I finally found the answer:
First, I got the taint on my master node:
$ kubectl describe node master | grep Taint
Taints: node-role.kubernetes.io/control-plane:NoSchedule
As you can see, the taint effect here is NoSchedule, NOT the NoExecute I used before!
So, the configuration would be like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: myapp
  name: myapp-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - image: nginx
        name: myapp-container
      tolerations:
      - effect: "NoSchedule" # just change this
        operator: "Exists"
      nodeSelector:
        type: master
And now you can see everything is good!
NAME READY STATUS RESTARTS AGE
myapp-deployment-79676c54d4-grm94 1/1 Running 0 7s
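(Side note, not part of the original answer: a toleration that specifies only operator: Exists, with no key and no effect, tolerates every taint, which sidesteps this NoSchedule-vs-NoExecute mismatch entirely.)
tolerations:
- operator: "Exists"   # empty key and empty effect: matches all taints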

Related

PodTopologyConstraints - DoNotSchedule not working as expected

I tried to use PodTopologySpreadConstraint to organize our deployment engine, and to spread replicas across zones.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 4
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nginx
      tolerations:
      - effect: NoSchedule
        key: platform
        operator: Equal
        value: "true"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodes
                operator: In
                values:
                - platform
I expected that, since I have just one node labeled "platform", the scheduler would run just one replica of nginx and leave the others in a Pending state, because the DoNotSchedule constraint cannot be satisfied.
But the deployment runs all 4 replicas on the same node/zone, even with DoNotSchedule defined.
➜ kubectl get pods -o wide | grep nginx-deployment
nginx-deployment-6cbbb547c-6qvrg 1/1 Running 0 15m 192.168.33.28 ip-192-168-42-2.sa-east-1.compute.internal <none> <none>
nginx-deployment-6cbbb547c-cgbcf 1/1 Running 0 15m 192.168.37.251 ip-192-168-42-2.sa-east-1.compute.internal <none> <none>
nginx-deployment-6cbbb547c-dlqvd 1/1 Running 0 15m 192.168.47.173 ip-192-168-42-2.sa-east-1.compute.internal <none> <none>
nginx-deployment-6cbbb547c-hkgzc 1/1 Running 0 15m 192.168.39.27 ip-192-168-42-2.sa-east-1.compute.internal <none> <none>
and I just have one node labeled "platform":
➜ kubectl get nodes -l app=platform
NAME STATUS ROLES AGE VERSION
ip-192-168-42-2.sa-east-1.compute.internal Ready <none> 4h45m v1.22.15-eks-fb459a0
Does anyone have any idea what I did wrong?
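(Not an answer, just a check that may help: verify the nodes actually carry the labels the manifest relies on — the zone key used by the spread constraint and the key used in the node affinity.)
kubectl get nodes -L topology.kubernetes.io/zone
kubectl get nodes -L nodes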

GKE Pod not scheduled in different namespace

I want to deploy a monstache deployment in my already existing namespace "test-namespace". When I deploy it in "default" namespace it works but when I deploy it in "test-namespace" the pod does not schedule.
kubectl get pods -n test-namespace monstache-74466dc7-5tnrr -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
monstache-74466dc7-5tnrr 0/1 Pending 0 57m <none> <none> <none> <none>
and:
kubectl describe pods -n test-namespace monstache-74466dc7-5tnrr
Name: monstache-74466dc7-5tnrr
Namespace: test-namespace
Priority: 0
Node: <none>
Labels: app=monstache
pod-template-hash=74466dc7
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/monstache-74466dc7
Containers:
monstache:
Image: rwynn/monstache:latest
Port: <none>
Host Port: <none>
Command:
/bin/monstache
-f
/app/monstache.test.config.toml
Environment:
MONSTACHE_DIRECT_READ_NS: xxx.XXX
MONSTACHE_CHANGE_STREAM_NS: xxx.XXX
MONSTACHE_MONGO_URL: mongodb://xxx?replicaSet=rs0
MONSTACHE_ES_USER: elastic
MONSTACHE_ES_PASS: XXX
Mounts:
/app from monstache-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qmcwm (ro)
Volumes:
monstache-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: monstache-config
Optional: false
default-token-qmcwm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qmcwm
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
and:
kubectl get events -n test-namespace
LAST SEEN TYPE REASON OBJECT MESSAGE
55m Normal SuccessfulCreate replicaset/monstache-74466dc7 Created pod: monstache-74466dc7-snrdb
55m Normal SuccessfulCreate replicaset/monstache-74466dc7 Created pod: monstache-74466dc7-5tnrr
55m Normal ScalingReplicaSet deployment/monstache Scaled up replica set monstache-74466dc7 to 1
55m Normal ScalingReplicaSet deployment/monstache Scaled up replica set monstache-74466dc7 to 1
and:
This is my monstache deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monstache
  namespace: test-namespace
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: monstache
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: monstache
    spec:
      containers:
      - command:
        - /bin/monstache
        - -f
        - /app/monstache.test.config.toml
        env:
        - name: MONSTACHE_DIRECT_READ_NS
          value: xxx.xxx
        - name: MONSTACHE_CHANGE_STREAM_NS
          value: xxx.xxx
        - name: MONSTACHE_MONGO_URL
          value: mongodb://mongodb-service:27017/xxx?replicaSet=rs0
        - name: MONSTACHE_ES_USER
          value: elastic
        - name: MONSTACHE_ES_PASS
          value: XXXX
        image: rwynn/monstache:latest
        imagePullPolicy: Always
        name: monstache
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /app
          name: monstache-config
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: monstache-config
        name: monstache-config
---
apiVersion: v1
data:
  monstache.test.config.toml: |
    resume = true
    gzip = true
    elasticsearch-urls = ["https://elasticsearch:9200"]
    elasticsearch-max-conns = 10
    elasticsearch-max-seconds = 1
    elasticsearch-max-docs = 1
    #namespace-regex = '*'
    verbose = false
    enable-http-server = true
    elasticsearch-validate-pem-file = false
    [[mapping]]
    namespace = "XXX.XXX"
    index = "XXX"
kind: ConfigMap
metadata:
  name: monstache-config
  namespace: test-namespace
A few more things to know:
I already have pods scheduled in that namespace
I tried to delete the deployment and re-create it
I even created a new nodepool and tried to schedule the deployment there - that also didn't work.
I searched for a pod count limit and pod quota, and it does not conflict.
I have 12 namespaces in that GKE cluster
I have total 113 pods in that GKE cluster
I have some successfully scheduled monstache deployments in other namespaces in that cluster.
It happens in the 2 most recent namespaces I've created.
Any clues?
It was a bug. Re-deploying it with GKE version 1.17.14-gke.1200 solved the problem.

Some Kubernetes pods consistently not able to resolve internal DNS on only one node

I have just moved my first cluster from minikube up to AWS EKS. All went pretty smoothly so far, except I'm running into some DNS issues I think, but only on one of the cluster nodes.
I have two nodes in the cluster running v1.14, with 4 pods of one type and 4 of another. 3 of each work, but 1 of each - both on the same node - start and then error (CrashLoopBackOff), because the script inside the container can't resolve the hostname for the database. Deleting the errored pod, or even all pods, results in one pod on the same node failing every time.
The database is in its own pod and has a service assigned, none of the other pods of the same type have problems resolving the name or connecting. The database pod is on the same node as the pods that can't resolve the hostname. I'm not sure how to migrate the pod to a different node, but that might be worth trying to see if the problem follows.
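(Side note on the "migrate the pod to a different node" part: one way to do that is to cordon the suspect node and delete the pod so it gets rescheduled elsewhere. Node and pod names below are taken from the outputs further down.)
kubectl cordon ip-192-168-87-230.us-east-2.compute.internal
kubectl delete pod pod1-85f7968f7-k9xv2   # the ReplicaSet recreates it on a schedulable node
kubectl uncordon ip-192-168-87-230.us-east-2.compute.internal   # once finished testing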
No errors in the coredns pods. I'm not sure where to start looking to discover the issue from here, and any help or suggestions would be appreciated.
Providing the configs below. As mentioned, they all work on Minikube, and also they work on one node.
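(A possible place to start, not something from the original post: check where the coredns replicas and the application pods landed, and try the lookup by hand from inside a pod, using whatever resolver tool the image has available.)
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
kubectl get pods -o wide
kubectl exec -it pod1-85f7968f7-2cjwt -- getent hosts postgresql   # or nslookup, if the image has it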
kubectl get pods - note the age: all pod1 pods were deleted at the same time and were recreated; 3 work fine, the 4th does not.
NAME READY STATUS RESTARTS AGE
pod1-85f7968f7-2cjwt 1/1 Running 0 34h
pod1-85f7968f7-cbqn6 1/1 Running 0 34h
pod1-85f7968f7-k9xv2 0/1 CrashLoopBackOff 399 34h
pod1-85f7968f7-qwcrz 1/1 Running 0 34h
postgresql-865db94687-cpptb 1/1 Running 0 3d14h
rabbitmq-667cfc4cc-t92pl 1/1 Running 0 34h
pod2-94b9bc6b6-6bzf7 1/1 Running 0 34h
pod2-94b9bc6b6-6nvkr 1/1 Running 0 34h
pod2-94b9bc6b6-jcjtb 0/1 CrashLoopBackOff 140 11h
pod2-94b9bc6b6-t4gfq 1/1 Running 0 34h
postgresql service
apiVersion: v1
kind: Service
metadata:
  name: postgresql
spec:
  ports:
  - port: 5432
  selector:
    app: postgresql
pod1 deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod1
spec:
  replicas: 4
  selector:
    matchLabels:
      app: pod1
  template:
    metadata:
      labels:
        app: pod1
    spec:
      containers:
      - name: pod1
        image: us.gcr.io/gcp-project-8888888/pod1:latest
        env:
        - name: rabbitmquser
          valueFrom:
            secretKeyRef:
              name: rabbitmq-secrets
              key: rmquser
        volumeMounts:
        - mountPath: /data/files
          name: datafiles
      volumes:
      - name: datafiles
        persistentVolumeClaim:
          claimName: datafiles-pv-claim
      imagePullSecrets:
      - name: container-readonly
pod2 deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod2
spec:
  replicas: 4
  selector:
    matchLabels:
      app: pod2
  template:
    metadata:
      labels:
        app: pod2
    spec:
      containers:
      - name: pod2
        image: us.gcr.io/gcp-project-8888888/pod2:latest
        env:
        - name: rabbitmquser
          valueFrom:
            secretKeyRef:
              name: rabbitmq-secrets
              key: rmquser
        volumeMounts:
        - mountPath: /data/files
          name: datafiles
      volumes:
      - name: datafiles
        persistentVolumeClaim:
          claimName: datafiles-pv-claim
      imagePullSecrets:
      - name: container-readonly
CoreDNS config map to forward DNS to an external service if it doesn't resolve internally. This is the only place I can think of that could be causing the issue - but as said, it works for one node.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . 8.8.8.8
        cache 30
        loop
        reload
        loadbalance
    }
Errored Pod output. Same for both pods, as it occurs in library code common to both. As mentioned, this does not occur for all pods so the issue likely doesn't lie with the code.
Error connecting to database (psycopg2.OperationalError) could not translate host name "postgresql" to address: Try again
Errored Pod1 description:
Name: xyz-94b9bc6b6-jcjtb
Namespace: default
Priority: 0
Node: ip-192-168-87-230.us-east-2.compute.internal/192.168.87.230
Start Time: Tue, 15 Oct 2019 19:43:11 +1030
Labels: app=pod1
pod-template-hash=94b9bc6b6
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 192.168.70.63
Controlled By: ReplicaSet/xyz-94b9bc6b6
Containers:
pod1:
Container ID: docker://f7dc735111bd94b7c7b698e69ad302ca19ece6c72b654057627626620b67d6de
Image: us.gcr.io/xyz/xyz:latest
Image ID: docker-pullable://us.gcr.io/xyz/xyz#sha256:20110cf126b35773ef3a8656512c023b1e8fe5c81dd88f19a64c5bfbde89f07e
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 16 Oct 2019 07:21:40 +1030
Finished: Wed, 16 Oct 2019 07:21:46 +1030
Ready: False
Restart Count: 139
Environment:
xyz: <set to the key 'xyz' in secret 'xyz-secrets'> Optional: false
Mounts:
/data/xyz from xyz (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-m72kz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
xyz:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: xyz-pv-claim
ReadOnly: false
default-token-m72kz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-m72kz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 2m22s (x3143 over 11h) kubelet, ip-192-168-87-230.us-east-2.compute.internal Back-off restarting failed container
Errored Pod 2 description:
Name: xyz-85f7968f7-k9xv2
Namespace: default
Priority: 0
Node: ip-192-168-87-230.us-east-2.compute.internal/192.168.87.230
Start Time: Mon, 14 Oct 2019 21:19:42 +1030
Labels: app=pod2
pod-template-hash=85f7968f7
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 192.168.84.69
Controlled By: ReplicaSet/pod2-85f7968f7
Containers:
pod2:
Container ID: docker://f7c7379f92f57ea7d381ae189b964527e02218dc64337177d6d7cd6b70990143
Image: us.gcr.io/xyz-217300/xyz:latest
Image ID: docker-pullable://us.gcr.io/xyz-217300/xyz#sha256:b9cecdbc90c5c5f7ff6170ee1eccac83163ac670d9df5febd573c2d84a4d628d
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 16 Oct 2019 07:23:35 +1030
Finished: Wed, 16 Oct 2019 07:23:41 +1030
Ready: False
Restart Count: 398
Environment:
xyz: <set to the key 'xyz' in secret 'xyz-secrets'> Optional: false
Mounts:
/data/xyz from xyz (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-m72kz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
xyz:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: xyz-pv-claim
ReadOnly: false
default-token-m72kz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-m72kz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 3m28s (x9208 over 34h) kubelet, ip-192-168-87-230.us-east-2.compute.internal Back-off restarting failed container
At the suggestion of a k8s community member, I applied the following change to my coredns configuration to be more in line with the best practice:
Line: proxy . 8.8.8.8 changed to forward . /etc/resolv.conf 8.8.8.8
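For reference, the relevant Corefile block then looks roughly like this (a sketch of the change described above, not the exact file):
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       upstream
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf 8.8.8.8
    cache 30
    loop
    reload
    loadbalance
}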
I then deleted the pods, and after they were recreated by k8s, the issue did not appear again.
EDIT:
Turns out, that was not the issue at all as shortly afterwards the issue re-occurred and persisted. In the end, it was this: https://github.com/aws/amazon-vpc-cni-k8s/issues/641
Rolled back to 1.5.3 as recommended by Amazon, restarted the cluster, and the issue was resolved.
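(If you want to confirm which VPC CNI version your EKS cluster is running before rolling back, something like this should show it, assuming the default aws-node DaemonSet name:)
kubectl -n kube-system describe daemonset aws-node | grep -i image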

Unable to scale application - containercreating

I am using this deployment template (is that what it's called?). Two pods are running but two are stuck at ContainerCreating. If I scale to 2 replicas, 1 is running and 1 is stuck at ContainerCreating. How do I get all 4 pods running?
This declaration creates a PV in AWS and attaches the PVC. Data is persistent. But I am unable to get past the ContainerCreating issue.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
activemq-deployment-58cc677d85-497xt 1/1 Running 0 2m
activemq-deployment-58cc677d85-b4tpx 0/1 ContainerCreating 0 1m
activemq-deployment-58cc677d85-hprpv 1/1 Running 0 1m
activemq-deployment-58cc677d85-vdtcs 0/1 ContainerCreating 0 1m
Describe gives this:
$ kubectl describe deployments activemq-deployment
Name: activemq-deployment
Namespace: default
CreationTimestamp: Wed, 27 Feb 2019 21:49:11 -0800
Labels: app=activemq
Annotations: deployment.kubernetes.io/revision: 1
Selector: app=activemq
Replicas: 4 desired | 4 updated | 4 total | 2 available | 2 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=activemq
Containers:
activemq:
Image: activemq:1.0
Port: 8161/TCP
Host Port: 0/TCP
Environment: <none>
Mounts:
/opt/apache-activemq-5.15.6/data from activemq-data (rw)
Volumes:
activemq-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: amq-pv-claim
ReadOnly: false
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available False MinimumReplicasUnavailable
OldReplicaSets: <none>
NewReplicaSet: activemq-deployment-58cc677d85 (4/4 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 3m8s deployment-controller Scaled up replica set activemq-deployment-58cc677d85 to 1
Normal ScalingReplicaSet 103s deployment-controller Scaled up replica set activemq-deployment-58cc677d85 to 4
Declaration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: activemq-deployment
  labels:
    app: activemq
spec:
  replicas: 1
  selector:
    matchLabels:
      app: activemq
  template:
    metadata:
      labels:
        app: activemq
    spec:
      securityContext:
        fsGroup: 2000
      containers:
      - name: activemq
        image: activemq:1.0
        ports:
        - containerPort: 8161
        volumeMounts:
        - name: activemq-data
          mountPath: /opt/apache-activemq-5.15.6/data
          readOnly: false
      volumes:
      - name: activemq-data
        persistentVolumeClaim:
          claimName: amq-pv-claim
---
apiVersion: v1
kind: Service
metadata:
  name: amq-nodeport-service
spec:
  selector:
    app: activemq
  ports:
  - port: 8161
    targetPort: 8161
  type: NodePort
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: amq-pv-claim
spec:
  #storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
Use a StatefulSet for persisting the state of the container. A Deployment is not recommended for running stateful containers.
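A rough sketch of what that could look like for this workload (assumptions: same image, mount path, and storage size as in the question; with volumeClaimTemplates every replica gets its own PersistentVolumeClaim instead of all replicas sharing one claim):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: activemq
spec:
  serviceName: amq-nodeport-service
  replicas: 4
  selector:
    matchLabels:
      app: activemq
  template:
    metadata:
      labels:
        app: activemq
    spec:
      securityContext:
        fsGroup: 2000
      containers:
      - name: activemq
        image: activemq:1.0
        ports:
        - containerPort: 8161
        volumeMounts:
        - name: activemq-data
          mountPath: /opt/apache-activemq-5.15.6/data
  volumeClaimTemplates:
  - metadata:
      name: activemq-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 2Gi
With an EBS-backed ReadWriteOnce volume, a single PVC can only be attached to one node at a time, which is a common reason extra Deployment replicas sit in ContainerCreating.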

Kubernetes deployment not auto-terminating after successful run of command

I have a Kubernetes cluster in which I've created a deployment to run a pod. Unfortunately, after running it the pod does not self-terminate; instead it enters a continuous restart/CrashLoopBackOff cycle.
The command (on the entry point) runs correctly when first deployed, and I want it to run only one time.
I am programmatically deploying the Docker image with the entrypoint configured, using the Python K8s API. Here is my deployment YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kio
  namespace: kmlflow
  labels:
    app: kio
    name: kio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kio
      name: kio
  template:
    metadata:
      labels:
        app: kio
        name: kio
    spec:
      containers:
      - name: kio-ingester
        image: MY_IMAGE
        command: ["/opt/bin/kio"]
        args: ["some", "args"]
        imagePullPolicy: Always
      restart: Never
      backofflimit: 0
Thanks for any help
Output from kubectl describe pod is:
Name: ingest-160-779874b676-8pgv5
Namespace: kmlflow
Priority: 0
PriorityClassName: <none>
Node: 02-w540-02.glebe.kinetica.com/172.30.255.205
Start Time: Thu, 11 Oct 2018 13:31:20 -0400
Labels: app=kio
name=kio
pod-template-hash=3354306232
Annotations: <none>
Status: Running
IP: 10.244.0.228
Controlled By: ReplicaSet/ingest-160-779874b676
Containers:
kio-ingester:
Container ID: docker://b67a682d04e69c2dc5c1be7e02bf2e4cf7a12a7557dfbe642dfb531ca4b03f07
Image: kinetica/kinetica-intel
Image ID: docker-pullable://docker.io/kinetica/kinetica-intel#sha256:eefbb6595eb71822300ef97d5cbcdac7ec58f2041f8190d3a2ba9cffd6a0d87c
Port: <none>
Host Port: <none>
Command:
/opt/gpudb/bin/kio
Args:
--source
kinetica://172.30.50.161:9191::dataset_iris
--destination
kinetica://172.30.50.161:9191::iris5000
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 11 Oct 2018 13:33:27 -0400
Finished: Thu, 11 Oct 2018 13:33:32 -0400
Ready: False
Restart Count: 4
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-69wkn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-69wkn:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-69wkn
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m39s default-scheduler Successfully assigned kmlflow/ingest-160-779874b676-8pgv5 to 02-w540-02.glebe.kinetica.com
Normal Created 89s (x4 over 2m28s) kubelet, 02-w540-02.glebe.kinetica.com Created container
Normal Started 89s (x4 over 2m28s) kubelet, 02-w540-02.glebe.kinetica.com Started container
Warning BackOff 44s (x7 over 2m15s) kubelet, 02-w540-02.glebe.kinetica.com Back-off restarting failed container
Normal Pulling 33s (x5 over 2m28s) kubelet, 02-w540-02.glebe.kinetica.com pulling image "kinetica/kinetica-intel"
Normal Pulled 33s (x5 over 2m28s) kubelet, 02-w540-02.glebe.kinetica.com Successfully pulled image "kinetica/kinetica-intel"
There is no output from kubectl logs <crashing-pod> because a successful run of the KIO command with the injected parameters does not print anything to standard output.
If you'd like to run your task one time and have it finish after a successful completion, you should consider using Kubernetes Jobs or CronJobs.
Something like this:
apiVersion: batch/v1
kind: Job
metadata:
  name: kio
  namespace: kmlflow
  labels:
    app: kio
    name: kio
spec:
  backoffLimit: 4
  template:
    metadata:
      labels:
        app: kio
        name: kio
    spec:
      containers:
      - name: kio-ingester
        image: MY_IMAGE
        command: ["/opt/bin/kio"]
        args: ["some", "args"]
        imagePullPolicy: Always
      restartPolicy: Never
To delete the Jobs automatically, if you have Kubernetes 1.12 or later you can use ttlSecondsAfterFinished. Unfortunately, if you are using Kubernetes 1.11 or earlier you will have to delete them manually, or you can set up a CronJob to do it.
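A sketch of that option (the field goes on the Job spec; the value here is illustrative):
apiVersion: batch/v1
kind: Job
metadata:
  name: kio
  namespace: kmlflow
spec:
  ttlSecondsAfterFinished: 100   # the Job and its pods are cleaned up 100s after completion
  backoffLimit: 4
  template:
    spec:
      containers:
      - name: kio-ingester
        image: MY_IMAGE
        command: ["/opt/bin/kio"]
        args: ["some", "args"]
      restartPolicy: Never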