Kubernetes CronJob in GKE stops scheduling the job after a few weeks

I have this YAML for a CronJob running in Google Kubernetes Engine:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  creationTimestamp: 2019-04-22T18:20:51Z
  name: cron-field-velocity-field-details-manager
  namespace: master
  resourceVersion: "73643714"
  selfLink: /apis/batch/v1beta1/namespaces/master/cronjobs/cron-field-velocity-field-details-manager
  uid: 5be9e8d5-652b-11e9-bf91-42010a9600af
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: cron-field-velocity-field-details-manager
            chart: field-velocity-field-details-manager-0.0.1
            heritage: Tiller
            release: master-field-velocity-field-details-manager
        spec:
          containers:
          - args:
            - ./field-velocity-field-details-manager.dll
            command:
            - dotnet
            image: taranisag/field-velocity-field-details-manager:master.993b179
            imagePullPolicy: IfNotPresent
            name: cron-field-velocity-field-details-manager
            resources:
              requests:
                cpu: "2"
                memory: 2Gi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          imagePullSecrets:
          - name: regsecret
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: '* 2,14 * * *'
  successfulJobsHistoryLimit: 3
  suspend: false
status:
  lastScheduleTime: 2019-06-20T02:00:00Z
It was working for a few weeks, meaning the job was running twice a day, but it stopped running a week ago.
There was no indication of an error, and the last run completed successfully.
Is there something wrong with the YAML I defined?
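One thing worth checking here is whether an old Job is still listed in the CronJob's .status.active: with concurrencyPolicy: Forbid, a Job that never finishes blocks every later run, and the CronJob's events usually say why a run was skipped. A minimal diagnostic sketch, reusing the names from the manifest above:
kubectl -n master get cronjob cron-field-velocity-field-details-manager -o yaml   # inspect .status.active and .status.lastScheduleTime
kubectl -n master get jobs,pods                                                   # is an old Job or Pod still hanging around?
kubectl -n master describe cronjob cron-field-velocity-field-details-manager      # events mention skipped or missed runs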

Related

Kubernetes failed job with no pods

I see a failed Job that created no pods. There is also no information in the events, and since there are no pods I could not check the logs.
Here is the description of the Job that failed:
kubectl describe job time-limited-rbac-1604010900 -n add-ons
Name:                     time-limited-rbac-1604010900
Namespace:                add-ons
Selector:                 controller-uid=0816b9b3-814c-4802-83cf-5d5f3456701d
Labels:                   controller-uid=0816b9b3-814c-4802-83cf-5d5f3456701d
                          job-name=time-limited-rbac-1604010900
Annotations:              <none>
Controlled By:            CronJob/time-limited-rbac
Parallelism:              1
Completions:              <unset>
Start Time:               Thu, 29 Oct 2020 15:35:08 -0700
Active Deadline Seconds:  280s
Pods Statuses:            0 Running / 0 Succeeded / 1 Failed
Pod Template:
  Labels:           controller-uid=0816b9b3-814c-4802-83cf-5d5f3456701d
                    job-name=time-limited-rbac-1604010900
  Service Account:  time-limited-rbac
  Containers:
   time-limited-rbac:
    Image:        bitnami/kubectl:latest
    Port:         <none>
    Host Port:    <none>
    Command:
      /bin/bash
    Args:
      /var/tmp/time-limited-rbac.sh
    Environment:  <none>
    Mounts:
      /var/tmp/ from script (rw)
  Volumes:
   script:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      time-limited-rbac-script
    Optional:  false
Events:        <none>
Here is the definition of the CronJob:
apiVersion: v1
items:
- apiVersion: batch/v1beta1
  kind: CronJob
  metadata:
    annotations:
      meta.helm.sh/release-name: time-limited-rbac
      meta.helm.sh/release-namespace: add-ons
    labels:
      app.kubernetes.io/name: time-limited-rbac
    name: time-limited-rbac
  spec:
    concurrencyPolicy: Replace
    failedJobsHistoryLimit: 1
    jobTemplate:
      metadata:
        creationTimestamp: null
      spec:
        activeDeadlineSeconds: 280
        backoffLimit: 3
        parallelism: 1
        template:
          metadata:
            creationTimestamp: null
          spec:
            containers:
            - args:
              - /var/tmp/time-limited-rbac.sh
              command:
              - /bin/bash
              image: bitnami/kubectl:latest
              imagePullPolicy: Always
              name: time-limited-rbac
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /var/tmp/
                name: script
            dnsPolicy: ClusterFirst
            restartPolicy: Never
            schedulerName: default-scheduler
            securityContext: {}
            serviceAccount: time-limited-rbac
            serviceAccountName: time-limited-rbac
            terminationGracePeriodSeconds: 0
            volumes:
            - configMap:
                defaultMode: 356
                name: time-limited-rbac-script
              name: script
    schedule: '*/5 * * * *'
    successfulJobsHistoryLimit: 3
    suspend: false
Is there any way to tune this CronJob to avoid such scenarios? We are hitting this issue at least once or twice every day.
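A hedged starting point, assuming the 280-second deadline or the Replace policy is involved: a Job that reaches activeDeadlineSeconds is marked Failed with reason DeadlineExceeded and its pods are terminated, and with concurrencyPolicy: Replace the controller deletes the still-running Job when the next schedule fires (the schedule here is every 300 seconds). The namespace events usually show which of the two happened:
kubectl -n add-ons get events --sort-by=.lastTimestamp | grep time-limited-rbac
kubectl -n add-ons get job time-limited-rbac-1604010900 -o jsonpath='{.status.conditions}'
If it is the deadline, raising jobTemplate.spec.activeDeadlineSeconds (or switching concurrencyPolicy to Forbid) would be the knobs to experiment with.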

Deployment Yaml file

I'm learning SQL Server BDC on minikube using this article as a guide. I tried deploying the YAML file below by running: kubectl apply -f deployment.yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: mssql
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mssql
        image: microsoft/mssql-server-linux
        ports:
        - containerPort: 1433
        securityContext:
          privileged: true
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mssql
              key: SA_PASSWORD
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: pvc0001
It errored due to the v1beta1 apiVersion. I converted the YAML by running kubectl convert -f deployment.yaml and got the script below:
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  name: mssql-deployment
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector: null
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: mssql
    spec:
      containers:
      - env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          valueFrom:
            secretKeyRef:
              key: SA_PASSWORD
              name: mssql
        image: microsoft/mssql-server-linux
        imagePullPolicy: Always
        name: mssql
        ports:
        - containerPort: 1433
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 10
status: {}
But when I deploy the above script I get:
Error validating "deployment.yaml": error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec; if you choose to ignore these errors, turn validation off with --validate=false
It is related to matchLabels/matchExpressions, but I'm not able to resolve it. Can someone point me in the right direction?
You need to add a selector in the spec section of the Deployment; it's a mandatory field. The .spec.selector field defines how the Deployment finds which Pods to manage. In this case, you simply select a label that is defined in the Pod template (app: mssql). However, more sophisticated selection rules are possible, as long as the Pod template itself satisfies the rule.
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  name: mssql-deployment
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: mssql
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: mssql
    spec:
      containers:
      - env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          valueFrom:
            secretKeyRef:
              key: SA_PASSWORD
              name: mssql
        image: microsoft/mssql-server-linux
        imagePullPolicy: Always
        name: mssql
        ports:
        - containerPort: 1433
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 10
status: {}
missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec
You need a selector so the Deployment knows which Pods it is supposed to manage; it has to match the labels in the Pod template.
Solution:
  selector:
    matchLabels:
      app: mssql
  template:
    metadata:
      labels:
        app: mssql
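To confirm the selector fix before applying it, a client-side dry run (kubectl 1.18 or newer) runs the same schema validation that produced the error above without touching the cluster:
kubectl apply -f deployment.yaml --dry-run=client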

/var/mqm folder is getting overwritten with PV in kubernetes deployment of ACE-MQ with ibmcom/ace-mq image

Trying to deploy MQ with AppConnect in AWS Cloud, with EFS as the persistent storage for an HA solution. When I keep /var/mqm (I also tried /mnt/mqm) as the mount path, it gets overwritten with the blank persistent storage contents, so once the pod is up I can't see the queue manager or any other files inside /var/mqm.
This is the deployment.yml used
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  generation: 1
  labels:
    run: acemq01
  name: acemq01
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      run: acemq01
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: ~
      labels:
        run: acemq01
    spec:
      containers:
      - env:
        - name: LICENSE
          value: accept
        - name: DOMAIN
          value: cluster
        - name: MQ_QMGR_NAME
          value: MQAWS
        image: ibmcom/ace-mq
        imagePullPolicy: IfNotPresent
        name: acemq01
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/mqm
          name: pv-volumeef
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: pv-volumeef
        persistentVolumeClaim:
          claimName: efs
status: {}
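This is the usual empty-volume behavior in Kubernetes: the EFS volume starts out blank, and mounting it over /var/mqm hides whatever the ibmcom/ace-mq image placed there. One pattern sometimes used for this (a sketch only; the image's internal layout and whether a one-off copy is appropriate for MQ data are assumptions) is an initContainer that seeds the volume from the image before the main container mounts it:
    spec:
      initContainers:
      - name: seed-var-mqm                  # hypothetical name
        image: ibmcom/ace-mq
        command: ["sh", "-c", "[ -z \"$(ls -A /mnt/seed)\" ] && cp -a /var/mqm/. /mnt/seed/ || true"]
        volumeMounts:
        - mountPath: /mnt/seed              # mount the PV elsewhere so the image's /var/mqm stays visible in this container
          name: pv-volumeef
      containers:
      - name: acemq01
        image: ibmcom/ace-mq
        volumeMounts:
        - mountPath: /var/mqm               # the PV now already contains the seeded files
          name: pv-volumeef
      volumes:
      - name: pv-volumeef
        persistentVolumeClaim:
          claimName: efs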

How to ensure kubernetes cronjob does not restart on failure

I have a cronjob that sends out emails to customers. It occasionally fails for various reasons. I do not want it to restart, but it still does.
I am running Kubernetes on GKE. To get it to stop, I have to delete the CronJob and then kill all the pods it creates manually.
This is bad, for obvious reasons.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  creationTimestamp: 2018-06-21T14:48:46Z
  name: dailytasks
  namespace: default
  resourceVersion: "20390223"
  selfLink: [redacted]
  uid: [redacted]
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
          - command:
            - kubernetes/daily_tasks.sh
            env:
            - name: DB_HOST
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
            envFrom:
            - secretRef:
                name: my-secrets
            image: [redacted]
            imagePullPolicy: IfNotPresent
            name: dailytasks
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: 0 14 * * *
  successfulJobsHistoryLimit: 3
  suspend: true
status:
  active:
  - apiVersion: batch
    kind: Job
    name: dailytasks-1533218400
    namespace: default
    resourceVersion: "20383182"
    uid: [redacted]
  lastScheduleTime: 2018-08-02T14:00:00Z
It turns out that you have to set backoffLimit: 0 in combination with restartPolicy: Never and concurrencyPolicy: Forbid.
backoffLimit is the number of times the Job is retried before it is considered failed. The default is 6.
concurrencyPolicy set to Forbid means at most one Job runs at a time; a new run is not started while the previous one is still active.
restartPolicy set to Never means the pod won't be restarted on failure.
You need all three of these, or your cronjob may run more than once.
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      [ADD THIS -->]backoffLimit: 0
      template:
        ... MORE STUFF ...
The Kubernetes CronJob resource has a suspend field in its spec.
It won't happen by default, but if you want to make sure the job doesn't run again, you could update the script that sends the emails and have it patch the CronJob resource to set suspend: true when it fails.
Something like this:
kubectl patch cronjob <name> -p '{"spec": { "suspend": true }}'
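For example, the task script from the manifest could be wrapped so the patch is applied automatically on failure (a sketch; it assumes kubectl is available in the image and the pod's service account is allowed to patch cronjobs):
#!/bin/sh
# Run the real task; if it fails, suspend the CronJob so no further runs are scheduled.
if ! kubernetes/daily_tasks.sh; then
  kubectl patch cronjob dailytasks -n default -p '{"spec": {"suspend": true}}'
  exit 1
fi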

Kubernetes: Error kubectl edit deployment

I'm trying to edit a deployment in Kubernetes with:
kubectl -n <namespace> edit deployment <deployment_name>
After entering the command, a vi window opens for editing and I make some changes, for example in the command section or in the volumeMounts section,
but I get the following error:
A copy of your changes has been stored to "/tmp/kubectl-edit-hv5dh.yaml"
error: map: map[] does not contain declared merge key: name
Can someone help with this?
Attached is the apiserver deployment as opened by kubectl edit:
kubectl -n federation-system edit deployment apiserver
(the lines between ** ** are the ones I added)
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    federation.alpha.kubernetes.io/federation-name: fed
  creationTimestamp: 2018-04-01T13:26:40Z
  generation: 1
  labels:
    app: federated-cluster
  name: apiserver
  namespace: federation-system
  resourceVersion: "393140"
  selfLink: /apis/extensions/v1beta1/namespaces/federation-system/deployments/apiserver
  uid: <uid>
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: federated-cluster
      module: federation-apiserver
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        federation.alpha.kubernetes.io/federation-name: fed
      creationTimestamp: null
      labels:
        app: federated-cluster
        module: federation-apiserver
      name: apiserver
    spec:
      containers:
      - command:
        - /fcp
        - federation-apiserver
        - --admission-control=NamespaceLifecycle
        - --advertise-address=<master-ip>
        - --bind-address=0.0.0.0
        - --client-ca-file=/etc/federation/apiserver/ca.crt
        - --etcd-servers=http://localhost:2379
        - --secure-port=8443
        - --tls-cert-file=/etc/federation/apiserver/server.crt
        - --tls-private-key-file=/etc/federation/apiserver/server.key
        **- --enable-admission-plugins=SchedulingPolicy
        - --admission-control-config-file=/etc/kubernetes/admission/config.yml**
        image: gcr.io/k8s-jkns-e2e-gce-federation/fcp-amd64:v1.9.0-alpha.3
        imagePullPolicy: IfNotPresent
        name: apiserver
        ports:
        - containerPort: 8443
          name: https
          protocol: TCP
        - containerPort: 8080
          name: local
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/federation/apiserver
          name: apiserver-credentials
          readOnly: true
        **volumeMounts:
        - mountPath: /etc/kubernetes/admission
          name: admission-config**
      - command:
        - /usr/local/bin/etcd
        - --data-dir
        - /var/etcd/data
        image: gcr.io/google_containers/etcd:3.1.10
        imagePullPolicy: IfNotPresent
        name: etcd
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - {}
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: apiserver-credentials
        secret:
          defaultMode: 420
          secretName: apiserver-credentials
      **- name: admission-config
        configMap:
          name: admission**
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2018-04-01T13:26:40Z
    lastUpdateTime: 2018-04-01T13:26:40Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: 2018-04-01T13:26:40Z
    lastUpdateTime: 2018-04-01T13:27:20Z
    message: ReplicaSet "apiserver-8484fd45f8" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
This happened after I created the ConfigMap:
kubectl create -f scheduling-policy-admission.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: admission
  namespace: federation-system
data:
  config.yml: |
    apiVersion: apiserver.k8s.io/v1alpha1
    kind: AdmissionConfiguration
    plugins:
    - name: SchedulingPolicy
      path: /etc/kubernetes/admission/scheduling-policy-config.yml
  scheduling-policy-config.yml: |
    kubeconfig: /etc/kubernetes/admission/opa-kubeconfig
  opa-kubeconfig: |
    clusters:
    - name: opa-api
      cluster:
        server: http://opa.federation-system.svc.cluster.local:8181/v0/data/kubernetes/placement
    users:
    - name: scheduling-policy
      user:
        token: deadbeefsecret
    contexts:
    - name: default
      context:
        cluster: opa-api
        user: scheduling-policy
    current-context: default
I'm trying to configure Admission Controller in the Federation API.
Thanks,
      dnsPolicy: ClusterFirst
      # DELETE imagePullSecrets:
      # DELETE - {}
      restartPolicy: Always
I would strongly recommend removing that imagePullSecrets block. Since those objects have a mergeKey of name, but that object has no name, it would very easily cause the error you are experiencing. If the YAML was given to your editor in that condition, then I am almost certain that is a kubernetes bug: it should always(?) allow round-tripping YAML via kubectl edit, if for no other reason than this situation right here.