Run a container on a pod failure in Kubernetes

I have a cronjob that runs and does things regularly. I want to send a slack message with the technosophos/slack-notify container when that cronjob fails.
Is it possible to have a container run when a pod fails?

There is nothing built in for this that I am aware of. You could use a webhook to get notified when a pod changes and inspect the pod's state in the payload. But you would have to build the plumbing yourself or look for an existing third-party tool.
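For illustration, a minimal polling sketch of that plumbing (not a real webhook; the notifier image's EMAIL variable is taken from the question, and the loop does not deduplicate pods it has already reported):

while true
do
  # Find pods in the current namespace whose phase is Failed.
  for pod in $(kubectl get pods --field-selector=status.phase=Failed -o name)
  do
    # Launch a one-off notifier pod for each failure.
    kubectl run "notify-$RANDOM" --restart=Never \
      --image=technosophos/slack-notify \
      --env="EMAIL=failure@yourdomain.com"
  done
  sleep 30
done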

Pods and Jobs are different things. If you want to wait for a job to fail and send a notification when it does, you can do something like this in bash:
while true
do
  # Block until the Job reports the Failed condition; a negative timeout
  # waits indefinitely (the default is only 30 seconds).
  kubectl wait --for=condition=failed job/myjob --timeout=-1s
  # kubectl run needs a pod name, and --restart=Never makes it a bare pod.
  kubectl run "myjob-notify-$(date +%s)" --restart=Never \
    --image=technosophos/slack-notify \
    --env="EMAIL=failure@yourdomain.com"
done

To the question: Is it possible to have a container run when a pod fails?
Yes, although there is nothing out of the box right now, you can define a health check.
Then you can write a cron job, a Jenkins job, or a custom Kubernetes cluster service/controller that probes that health check regularly; if the health check fails, you can run a container in response.
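As a hedged sketch of such a periodic check, assuming you expose an HTTP health endpoint (the URL, image, and environment variable are placeholders):

#!/bin/sh
# Run this from cron, a Jenkins job, or a Kubernetes CronJob.
if ! curl -fsS --max-time 5 http://my-app.my-namespace.svc:8080/healthz
then
  # The health check failed: run a one-off container in response.
  kubectl run "healthcheck-alert-$(date +%s)" --restart=Never \
    --image=technosophos/slack-notify \
    --env="EMAIL=oncall@yourdomain.com"
fi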

Related

Iguazio job is stuck on 'Pending' status

I have a job I am running in Iguazio. It starts and then the status is "Pending" and the icon is blue. It stays like this indefinitely and there is nothing in the logs that describes what is going on. How do I fix this?
A job stuck in this status is usually a Kubernetes issue. The reason there are no logs for the job in the Iguazio dashboard is that the pod never started, and the pod is where the logs come from. You can navigate to the web shell / Jupyter service in Iguazio and use kubectl commands to find out what is going on in Kubernetes. Usually, I see this when there is an issue with the docker image for the pod: it either can't be found or has bugs.
In a terminal, run kubectl get pods and find your pod. It usually shows ImagePullBackOff, CrashLoopBackOff, or some similar error. Check the docker image, which is usually the culprit. You can kill the pod in Kubernetes, which in turn will error the job out. You can also "abort" the job from the menu in the dashboard under that specific job.
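For example, a typical triage session from the web shell looks like this (names are placeholders):

kubectl get pods -n my-namespace               # find the stuck pod and its status
kubectl describe pod my-pod -n my-namespace    # the Events section explains ImagePullBackOff etc.
kubectl delete pod my-pod -n my-namespace      # kill the pod, which errors the job out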

GCP Alerting Policy for failed GKE CronJob

What would be the best way to set up a GCP monitoring alert policy for a Kubernetes CronJob failing? I haven't been able to find any good examples out there.
Right now, I have an OK solution based on monitoring logs in the Pod with ERROR severity. I've found this to be quite flaky, however. Sometimes a job will fail for some ephemeral reason outside my control (e.g., an external server returning a temporary 500) and on the next retry, the job runs successfully.
What I really need is an alert that is only triggered when a CronJob is in a persistent failed state. That is, Kubernetes has tried rerunning the whole thing, multiple times, and it's still failing. Ideally, it could also handle situations where the Pod wasn't able to come up either (e.g., downloading the image failed).
Any ideas here?
Thanks.
First of all, confirm the GKE version that you are running. The following commands will help you identify the default GKE version and the available versions:
Default version:
gcloud container get-server-config --flatten="channels" --filter="channels.channel=RAPID" \
--format="yaml(channels.channel,channels.defaultVersion)"
Available versions:
gcloud container get-server-config --flatten="channels" --filter="channels.channel=RAPID" \
--format="yaml(channels.channel,channels.validVersions)"
Now that you know your GKE version: since what you want is an alert that is only triggered when a CronJob is in a persistent failed state, GKE Workload Metrics used to be the GCP solution for this. It provided a fully managed and highly configurable way to send all Prometheus-compatible metrics emitted by GKE workloads (such as a CronJob or a Deployment) to Cloud Monitoring. However, it was deprecated in GKE 1.24 and replaced with Google Cloud Managed Service for Prometheus, so that is now the best option you've got inside of GCP: it lets you monitor and alert on your workloads using Prometheus, without having to manually manage and operate Prometheus at scale.
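As a hedged sketch of what an alerting rule for Managed Service for Prometheus could look like, assuming kube-state-metrics is being scraped so the kube_job_status_failed series exists (names and durations are placeholders):

kubectl apply -f - <<EOF
apiVersion: monitoring.googleapis.com/v1
kind: Rules
metadata:
  name: cronjob-failure-alerts
  namespace: my-namespace
spec:
  groups:
  - name: cronjob-alerts
    interval: 60s
    rules:
    - alert: CronJobPersistentlyFailing
      # Fire only when a Job has reported failed pods for 15 minutes,
      # i.e. retries have been exhausted or keep failing.
      expr: kube_job_status_failed > 0
      for: 15m
EOF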
Plus, you have two options from outside of GCP: a self-managed Prometheus, and the Prometheus Pushgateway.
Finally, and just FYI, this can also be done manually by querying for the job, checking its start time, and comparing that to the current time, like this in bash:
START_TIME=$(kubectl -n=your-namespace get job your-job-name -o json | jq '.status.startTime')
echo $START_TIME
Or you can get the job's current status as a JSON blob, as follows:
kubectl -n=your-namespace get job your-job-name -o json | jq '.status'
You can see the following thread for more reference too.
Taking the "Failed" state as the core of your requirement, setting up a bash script with kubectl that sends an email when it sees a job in the "Failed" state can be useful. Here are some examples:
while true; do if kubectl get jobs myjob -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}' | grep -q True; then mail -s jobfailed email@address; break; else sleep 1; fi; done
For newer K8s:
while true; do kubectl wait --for=condition=failed job/myjob; mail -s jobfailed email@address; done

kubernetes schedule job after cron

I have a cron named "cronX" and a job named "JobY".
How can I configure Kubernetes to run "JobY" after "cronX" finishes?
I know I can do it with an API call from "cronX" to start "JobY", but I don't want to use an API call.
Is there any Kubernetes configuration to schedule this?
Is it possible for this pod to contain two containers, with one of them running only after the other container finishes?
Negative, more details here. If you only have two containers to run, you can place the first one under initContainers and the other under containers and schedule the pod.
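A minimal sketch of that layout (images and commands are placeholders):

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: two-step
spec:
  restartPolicy: Never
  initContainers:
  - name: step-one              # runs first and must exit successfully
    image: busybox
    command: ["sh", "-c", "echo step one done"]
  containers:
  - name: step-two              # starts only after step-one completes
    image: busybox
    command: ["sh", "-c", "echo step two done"]
EOF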
No built-in K8s configuration available to do workflow orchestration. You can try Argo workflow to do this.

Designing K8s pods and processes for initialization

I have a problem statement wherein there is a Kubernetes cluster and I have some pods running on it.
Now, I want some functions/processes to run once per deployment, independent of the number of replicas.
These processes use the same image as the one in the deployment yaml.
I cannot use initContainers or sidecars, because those run alongside the main container for each replica.
I tried to create a new image and then a pod out of it. But this pod keeps on running, which wastes cluster resources, as it should be destroyed after it has done its job. Also, the main container depends on the completion of this process in order to run the "command" part of the K8s spec.
Looking for suggestions on how to tackle this?
Theoretically, you could write an admission controller webhook that intercepts create/update operations on Deployments and triggers your functions as you want. If the outcome of your functions needs to gate the request, use a ValidatingWebhookConfiguration to validate the process and then deny or accept the request.
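As a rough sketch, the registration side of such a webhook could look like this (the service name, namespace, and path are placeholders; the webhook server that receives the AdmissionReview requests still has to be written and deployed separately):

kubectl apply -f - <<EOF
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: deployment-functions-hook
webhooks:
- name: deployments.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["deployments"]
  clientConfig:
    service:
      name: my-webhook-service
      namespace: default
      path: /validate
EOF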

How to ensure that a pod does not restart?

I have a pod that is meant to run a code excerpt and exit afterwards. I do not want this pod to restart after exiting, but apparently it is not possible to set a restart policy in Kubernetes (see here and here).
Therefore my question is: how can I implement a pod that runs only once?
Thank you
You need to deploy a Job. A Deployment is meant to keep its containers running all the time. Take a look at:
https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
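For reference, a minimal run-once Job could look like this (name, image, and command are placeholders):

kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: run-once
spec:
  backoffLimit: 0           # do not retry the pod on failure
  template:
    spec:
      restartPolicy: Never  # a Job's pods must use Never or OnFailure
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "echo running once"]
EOF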