Creating Kubernetes Pod per Kubernetes Job and Cleanup

I'm trying to create a Kubernetes Job with the following requirements:
Only one pod can be created for each job at most
If the pod failed - the job will fail
Max run time of the pod will be 1 hour
If the job finished successfully - delete the job
I tried the following configuration:
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      containers:
        - name: {{ .Release.Name }}
          image: {{ .Values.image }}
          env:
            - name: ARG1
              value: {{ required "ARG1 is mandatory" .Values.ENV.ARG1 }}
            - name: GITLAB_USER_EMAIL
              value: {{ .Values.ENV.GITLAB_USER_EMAIL }}
          envFrom:
            - secretRef:
                name: {{ .Release.Name }}
      restartPolicy: Never
  backoffLimit: 1
  activeDeadlineSeconds: 3600
But it's not working as expected. Any ideas?
Thanks!

Only one pod can be created for each job at most
The requested parallelism (.spec.parallelism) can be set to any non-negative value. If it is unspecified, it defaults to 1. If it is specified as 0, then the Job is effectively paused until it is increased.
For CronJobs, successfulJobsHistoryLimit: 0 and failedJobsHistoryLimit: 0 can be helpful: they remove the pods as soon as they fail or succeed, so no history or pod sticks around and only one pod gets created or runs at a time.
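For the Job itself, a minimal sketch of the relevant spec fields, pinning everything to a single pod attempt (backoffLimit: 0 disables the replacement pod that backoffLimit: 1 would still allow):
spec:
  parallelism: 1   # at most one pod running at a time (this is also the default)
  completions: 1   # the Job is done after one successful pod (also the default)
  backoffLimit: 0  # no replacement pod after a failure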
If the pod failed - the job will fail
That is the default behavior, and with restartPolicy: Never the failed container won't be restarted in place. Note, though, that backoffLimit: 1 still lets the Job create one replacement pod after a failure; set backoffLimit: 0 if you want the Job to fail immediately with at most one pod.
Max run time of the pod will be 1 hour
You have already covered this with activeDeadlineSeconds: 3600.
If the job finished successfully - delete the job
ttlSecondsAfterFinished: 100 will solve your issue: the Job (and its pods) are deleted automatically 100 seconds after it finishes.
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      containers:
        - name: {{ .Release.Name }}
          image: {{ .Values.image }}
          env:
            - name: ARG1
              value: {{ required "ARG1 is mandatory" .Values.ENV.ARG1 }}
            - name: GITLAB_USER_EMAIL
              value: {{ .Values.ENV.GITLAB_USER_EMAIL }}
          envFrom:
            - secretRef:
                name: {{ .Release.Name }}
      restartPolicy: Never
  backoffLimit: 1
  ttlSecondsAfterFinished: 100
  activeDeadlineSeconds: 3600
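Two caveats worth noting. The chart already sets "helm.sh/hook-delete-policy": hook-succeeded, so Helm itself deletes the hook Job when it succeeds; ttlSecondsAfterFinished additionally cleans up failed runs. Also, TTL-based cleanup depends on the TTL-after-finished controller, which is only enabled by default on newer clusters (beta in Kubernetes 1.21, stable in 1.23); on older clusters the field is ignored unless the feature gate is turned on.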

Related

HorizontalPodAutoscaler scales up pods but then terminates them instantly

So I have a HorizontalPodAutoscaler set up for my backend (an fpm-server and an Nginx server for a Laravel application).
The problem is that when the HPA is under load, it scales up the pods but it terminates them instantly, not even letting them get into the Running state.
The metrics are good and the scale-up behavior is as expected; the only problem is that the pods get terminated right after scaling.
What could be the problem?
Edit: The same HPA is used on the frontend and it's working as expected, the problem seems to be only on the backend.
Edit 2: I have Cluster Autoscaler enabled; it does its job, nodes get added when they are needed and then cleaned up, so it's not an issue of available resources.
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fpm-server
  labels:
    tier: backend
    layer: fpm
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      tier: backend
      layer: fpm
  template:
    metadata:
      labels:
        tier: backend
        layer: fpm
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: fpm
          image: "{{ .Values.fpm.image.repository }}:{{ .Values.fpm.image.tag }}"
          ports:
            - name: http
              containerPort: 9000
              protocol: TCP
          env:
            {{- range $name, $value := .Values.env }}
            - name: {{ $name }}
              value: "{{ $value }}"
            {{- end }}
          envFrom:
            - secretRef:
                name: backend-secrets
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
hpa.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: fpm-server-hpa
  labels:
    tier: backend
    layer: fpm
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fpm-server
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        targetAverageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
It turned out the problem was the replicas: {{ .Values.replicaCount }} line in the Deployment. It seems that when an HPA manages a workload, you shouldn't set replicas in the manifest: re-applying the chart resets the replica count to that fixed value, undoing whatever the HPA had scaled to. I removed this line and the HPA started scaling.
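A common way to express this in a chart (the pattern newer helm create scaffolds generate), assuming an autoscaling.enabled flag in values.yaml, is to render replicas only when autoscaling is off:
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}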

Fetching docker image and tag as key/value pairs from values.yaml in helm k8s

I have a list of docker images which I want to pass as an environment variable to deployment.yaml
values.yaml
contributions_list:
  - image: flogo-aws
    tag: 36
  - image: flogo-awsec2
    tag: 37
  - image: flogo-awskinesis
    tag: 18
  - image: flogo-chargify
    tag: 19
deployment.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: container-image-extractor
  namespace: local-tibco-tci
  labels:
    app.cloud.tibco.com/name: container-image-extractor
spec:
  backoffLimit: 0
  template:
    metadata:
      labels:
        app.cloud.tibco.com/name: container-image-extractor
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Never
      containers:
        - name: container-image-extractor
          image: reldocker.tibco.com/stratosphere/container-image-extractor
          imagePullPolicy: IfNotPresent
          env:
            - name: SOURCE_DOCKER_IMAGE
              value: "<docker_image>:<docker_tag>" # docker image from which contents to be copied
My questions are as follows.
Is this the correct way to pass an array of docker images and tags as an argument to deployment.yaml?
How would I replace <docker_image> and <docker_tag> in deployment.yaml from values.yaml, so that a Job is triggered for each docker image and tag?
This is how I would do it, creating a Job for every image in your list:
{{- range .Values.contributions_list }}
--- # the separator is required so each iteration renders as its own YAML document
apiVersion: batch/v1
kind: Job
metadata:
  name: "container-image-extractor-{{ .image }}-{{ .tag }}"
  namespace: local-tibco-tci
  labels:
    app.cloud.tibco.com/name: container-image-extractor
spec:
  backoffLimit: 0
  template:
    metadata:
      labels:
        app.cloud.tibco.com/name: container-image-extractor
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Never
      containers:
        - name: container-image-extractor
          image: reldocker.tibco.com/stratosphere/container-image-extractor
          imagePullPolicy: IfNotPresent
          env:
            - name: SOURCE_DOCKER_IMAGE
              value: "{{ .image }}:{{ .tag }}" # docker image from which contents to be copied
{{- end }}
If you use a value from outside of this contributions list (release name, env, whatever), do not forget to change the scope, e.g. {{ $.Values.myjob.limits.cpu | quote }}. The $. is important :)
Edit: if you don't change the name at each iteration of the loop, each iteration will overwrite the previous configuration. With different names, you will have multiple Jobs created.
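For illustration, a tiny sketch of both points together (docker_registry here is a hypothetical root-scope value, not something from the original chart):
{{- range .Values.contributions_list }}
---
# "." is the current list element inside the loop; "$" still points at the root context
image: "{{ $.Values.docker_registry }}/{{ .image }}:{{ .tag }}"
{{- end }}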
You need to fix deployment.yaml as below:
{{- range $contribution := .Values.contributions_list }}
--- # separator so each iteration renders as its own YAML document
apiVersion: batch/v1
kind: Job
metadata:
  # the image/tag suffix keeps the Job names unique across iterations
  name: container-image-extractor-{{ $contribution.image }}-{{ $contribution.tag }}
  namespace: local-tibco-tci
  labels:
    app.cloud.tibco.com/name: container-image-extractor
spec:
  backoffLimit: 0
  template:
    metadata:
      labels:
        app.cloud.tibco.com/name: container-image-extractor
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Never
      containers:
        - name: container-image-extractor
          image: reldocker.tibco.com/stratosphere/container-image-extractor
          imagePullPolicy: IfNotPresent
          env:
            - name: SOURCE_DOCKER_IMAGE
              value: "{{ $contribution.image }}:{{ $contribution.tag }}"
{{- end }}
If you want to learn the Helm template syntax, see this document.

how to shut down cloud-sql-proxy in a helm chart pre-install hook

I am running a django application in a Kubernetes cluster on gcloud. I implemented the database migration as a helm pre-install hook that launches my app container and does the database migration. I use cloud-sql-proxy in a sidecar pattern as recommended in the official tutorial: https://cloud.google.com/sql/docs/mysql/connect-kubernetes-engine
Basically this launches my app and a cloud-sql-proxy container within the pod described by the job. The problem is that cloud-sql-proxy never terminates after my app has completed the migration, causing the pre-install job to time out and cancel my deployment. How do I gracefully exit the cloud-sql-proxy container after my app container completes, so that the job can complete?
Here is my helm pre-install hook template definition:
apiVersion: batch/v1
kind: Job
metadata:
  name: database-migration-job
  labels:
    app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
    app.kubernetes.io/instance: {{ .Release.Name | quote }}
    app.kubernetes.io/version: {{ .Chart.AppVersion }}
    helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
  annotations:
    # This is what defines this resource as a hook. Without this line, the
    # job is considered part of the release.
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-1"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
spec:
  activeDeadlineSeconds: 230
  template:
    metadata:
      name: "{{ .Release.Name }}"
      labels:
        app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
        app.kubernetes.io/instance: {{ .Release.Name | quote }}
        helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    spec:
      restartPolicy: Never
      containers:
        - name: db-migrate
          image: {{ .Values.my-project.docker_repo }}{{ .Values.backend.image }}:{{ .Values.my-project.image.tag }}
          imagePullPolicy: {{ .Values.my-project.image.pullPolicy }}
          env:
            - name: DJANGO_SETTINGS_MODULE
              value: "{{ .Values.backend.django_settings_module }}"
            - name: SENDGRID_API_KEY
              valueFrom:
                secretKeyRef:
                  name: sendgrid-api-key
                  key: sendgrid-api-key
            - name: DJANGO_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: django-secret-key
                  key: django-secret-key
            - name: DB_USER
              value: {{ .Values.postgresql.postgresqlUsername }}
            - name: DB_PASSWORD
              {{- if .Values.postgresql.enabled }}
              value: {{ .Values.postgresql.postgresqlPassword }}
              {{- else }}
              valueFrom:
                secretKeyRef:
                  name: database-password
                  key: database-pwd
              {{- end }}
            - name: DB_NAME
              value: {{ .Values.postgresql.postgresqlDatabase }}
            - name: DB_HOST
              {{- if .Values.postgresql.enabled }}
              value: "postgresql"
              {{- else }}
              value: "127.0.0.1"
              {{- end }}
          workingDir: /app-root
          command: ["/bin/sh"]
          args: ["-c", "python manage.py migrate --no-input"]
        {{- if eq .Values.postgresql.enabled false }}
        - name: cloud-sql-proxy
          image: gcr.io/cloudsql-docker/gce-proxy:1.17
          command:
            - "/cloud_sql_proxy"
            - "-instances=<INSTANCE_CONNECTION_NAME>=tcp:<DB_PORT>"
            - "-credential_file=/secrets/service_account.json"
          securityContext:
            # fsGroup: 65532
            runAsNonRoot: true
            runAsUser: 65532
          volumeMounts:
            - name: db-con-mnt
              mountPath: /secrets/
              readOnly: true
      volumes:
        - name: db-con-mnt
          secret:
            secretName: db-service-account-credentials
      {{- end }}
Funnily enough, if I kill the job with "kubectl delete jobs database-migration-job" after the migration is done, the helm upgrade completes and my new app version gets installed.
Well, I have a solution which will work but might be hacky. First of all, this is a feature Kubernetes is lacking, and it is being discussed in this issue.
Since Kubernetes v1.17, containers in the same Pod can share a process namespace. This enables the app container to kill the proxy container's process once the migration is done; since this is a Kubernetes Job, the simplest way is to make that kill the last command the app container runs before exiting.
With this solution, when your app finishes (normally or abnormally), its final step kills the proxy process, and the Job completes with success or failure depending on how the app exited: the app's exit code becomes the container's exit code, which in turn determines the Job's result.
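A minimal sketch of that approach, showing only the relevant parts of the Job's pod spec (it assumes pkill is available in the app image and that the app container is permitted to signal the proxy's process, e.g. because it runs as root):
spec:
  template:
    spec:
      shareProcessNamespace: true  # containers in this pod can see and signal each other's processes
      restartPolicy: Never
      containers:
        - name: db-migrate
          # ... image, env, etc. as above ...
          command: ["/bin/sh", "-c"]
          args:
            - |
              python manage.py migrate --no-input
              status=$?
              # terminate the sidecar so the pod, and thus the Job, can complete
              pkill cloud_sql_proxy
              exit $status
        - name: cloud-sql-proxy
          # ... as above ...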

How to get a pod index inside a helm chart

I'm deploying a Kubernetes stateful set and I would like to get the pod index inside the helm chart so I can configure each pod with this pod index.
For example, in the following template I'm using the variable {{ .Values.podIndex }} to retrieve the pod index in order to configure my app.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{ .Values.name }}
spec:
  replicas: {{ .Values.replicaCount }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 50%
  template:
    metadata:
      labels:
        app: {{ .Values.name }}
    spec:
      containers:
        - image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
          imagePullPolicy: Always
          name: {{ .Values.name }}
          command: ["launch"]
          args: ["-l", "{{ .Values.podIndex }}"]
          ports:
            - containerPort: 4000
      imagePullSecrets:
        - name: gitlab-registry
You can't do this in the way you're describing.
Probably the best path is to change your Deployment into a StatefulSet. Each pod launched from a StatefulSet has an identity, and each pod's hostname gets set to the name of the StatefulSet plus an index. If your launch command looks at hostname, it will see something like name-0 and know that it's the first (index 0) pod in the StatefulSet.
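As an illustration, a sketch of how the container could derive its index from its hostname (the names are hypothetical):
containers:
  - name: {{ .Values.name }}
    image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
    command: ["/bin/sh", "-c"]
    args:
      - |
        # in a StatefulSet the hostname is <statefulset-name>-<ordinal>, e.g. myapp-0
        INDEX="${HOSTNAME##*-}"
        exec launch -l "$INDEX"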
A second path would be to create n single-replica Deployments using Go templating. This wouldn't be my preferred path, but you can
{{ range $podIndex := until .Values.replicaCount -}}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  # inside the range "." is rebound to the list element, so use "$" for root values
  name: {{ $.Values.name }}-{{ $podIndex }}
spec:
  replicas: 1
  selector:  # apps/v1 Deployments require a selector
    matchLabels:
      app: {{ $.Values.name }}-{{ $podIndex }}
  template:
    metadata:
      labels:
        app: {{ $.Values.name }}-{{ $podIndex }}
    spec:
      containers:
        - name: {{ $.Values.name }}
          command: ["launch"]
          args: ["-l", "{{ $podIndex }}"]
{{ end -}}
The actual flow here is that Helm reads in all of the template files and produces a block of YAML files, then submits these to the Kubernetes API server (with no templating directives at all), and the Kubernetes machinery acts on it. You can see what's being submitted by running helm template. By the time a Deployment is creating a Pod, all of the template directives have been stripped out; you can't make fields in the pod spec dependent on things like which replica it is or which node it got scheduled on.

error converting YAML to JSON: yaml: line 16: mapping values are not allowed in this context

I am trying to create a Kubernetes cronjob. During the deployment, I get this error:
Error: UPGRADE FAILED: YAML parse error on lemming-metrics/templates/lemming-metrics-cronjob.yaml: error converting YAML to JSON: yaml: line 16: mapping values are not allowed in this context
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: {{ .Values.name }}
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: {{ .Values.lemming_metrics.kubeServiceAccount }}
          containers:
            - name: {{ .Values.name }}
              image: {{ .Values.image.repository }}
              tag: latest
              imagePullPolicy: Always
              resources: {{ toYaml .Values.resources }}
              args:
                - /usr/bin/python
                - /opt/lemming_metrics.py
              env:
                - name: REGIONS
                  value: {{ .Values.lemming_metrics.regions }}
                - name: ECS_CLUSTER
                  value: {{ .Values.lemming_metrics.ecs_cluster }}
          restartPolicy: OnFailure
      backoffLimit: 2
      activeDeadlineSeconds: 90
Thanks in advance for any help!
Looks like you need to fix the spacing of the indents (use a consistent two-space indent). More generally, I've found that you can get this error whenever a service/pod/deployment is defined incorrectly, and the specific line it points to doesn't always matter. Two likely culprits in this template: tag: latest is not a valid container field (the tag belongs in the image string), and resources: {{ toYaml .Values.resources }} renders a multi-line mapping inline on one line, which produces exactly this "mapping values are not allowed in this context" error.
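A sketch of how the container block could look after those fixes (the nindent value must match the actual indentation depth in your template):
containers:
  - name: {{ .Values.name }}
    image: "{{ .Values.image.repository }}:latest"
    imagePullPolicy: Always
    resources:
      {{- toYaml .Values.resources | nindent 6 }}
    args:
      - /usr/bin/python
      - /opt/lemming_metrics.py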