Handling cronjobs in a Pod with multiple containers - kubernetes

I have a requirement to create a CronJob in Kubernetes whose pod has multiple containers (with a single container it works fine).
Is it possible?
The requirement is something like this:
1. First container: Run the shell script to do a job.
2. Second container: run fluentbit conf to parse the log and send it.
Previously I used a Deployment for this and it works fine, but since that Deployment only runs a 10-minute job, I thought I would turn it into a CronJob.
Any help is really appreciated.
Also, I am not sure whether a CronJob's pod can run multiple containers like this.
Thank you,
Sunny

Yes, you can create a CronJob with multiple containers. A CronJob is an abstraction on top of a Pod (it creates Jobs, which in turn create Pods), so the Pod spec inside its jobTemplate can list multiple containers, just like a normal Pod. As an example:
apiVersion: batch/v1  # batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: hello
  namespace: default
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          - name: app
            image: alpine
            command:
            - echo
            - Hello World!
          restartPolicy: OnFailure

I agree with the answer provided by @Arghya Sadhu. It shows how you can run a multi-container Pod with a CronJob. Before my answer, I would like to draw more attention to the comment provided by @Chris Stryczynski:
It's not clear whether the containers are run in parallel or sequentially
It is not entirely clear whether the workload that you are trying to run:
The requirement is something like this:
First container: Run the shell script to do a job.
Second container: run fluentbit conf to parse the log and send it.
can run in parallel (both at the same time) or requires a sequential approach (run Y only after X has completed successfully).
If the workload can run in parallel, the answer provided by @Arghya Sadhu is correct. However, if one workload depends on another, I reckon you should be using initContainers instead of a multi-container Pod.
An example of a CronJob that uses an initContainer could look like this:
apiVersion: batch/v1  # batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: ubuntu
            image: ubuntu
            command: [/bin/bash]
            args: ["-c","cat /data/hello_there.txt"]
            volumeMounts:
            - name: data-dir
              mountPath: /data
          initContainers:
          - name: echo
            image: busybox
            command: ["/bin/sh"]
            args: ["-c", "echo 'General Kenobi!' > /data/hello_there.txt"]
            volumeMounts:
            - name: data-dir
              mountPath: "/data"
          volumes:
          - name: data-dir
            emptyDir: {}
This CronJob writes a specific text to a file with an initContainer, and the "main" container then displays it. It's worth mentioning that the main container will not start unless the initContainer completes successfully.
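The init-then-main ordering and the file hand-off through the shared volume can be mimicked locally. This is only a simulation, not how the kubelet is implemented: run_job_pod is a hypothetical helper, a temporary directory exported as DATA_DIR stands in for the emptyDir volume, and each sh -c call plays one container.

```shell
#!/bin/sh
# run_job_pod INIT_CMD MAIN_CMD (hypothetical helper, local simulation only)
# The init command must exit 0 before the main command starts, and both
# see the same scratch directory, which stands in for the emptyDir volume.
run_job_pod() {
  init_cmd=$1
  main_cmd=$2
  DATA_DIR=$(mktemp -d)
  export DATA_DIR

  if sh -c "$init_cmd"; then
    # Init succeeded: start the "main container"; its status is the pod's result.
    status=0
    sh -c "$main_cmd" || status=$?
  else
    # Init failed: the main container is never started.
    echo "init container failed; main container not started" >&2
    status=1
  fi

  rm -rf "$DATA_DIR"
  return "$status"
}

# Same hand-off as the CronJob above: prints "General Kenobi!".
run_job_pod \
  'echo "General Kenobi!" > "$DATA_DIR/hello_there.txt"' \
  'cat "$DATA_DIR/hello_there.txt"'
```

If the first argument is changed to something that fails (for example exit 1), the second command is never run, which is the behaviour that makes initContainers a fit for the sequential case.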
$ kubectl logs hello-1234567890-abcde
General Kenobi!
Additional resources:
Linchpiner.github.io: K8S multi container pods

What about a sidecar container for logging as the second container? It keeps running and never exits, so even if the job itself runs fine, the Job is still not marked as successful.

Related

Ensure pod deletion when one of its containers terminates [duplicate]

I have a Kubernetes JOB that does database migrations on a CloudSQL database.
One way to access the CloudSQL database from GKE is to use the CloudSQL-proxy container and then connect via localhost. Great - that's working so far. But because I'm doing this inside a K8s JOB the job is not marked as successfully finished because the proxy keeps on running.
$ kubectl get po
NAME                   READY   STATUS      RESTARTS   AGE
db-migrations-c1a547   1/2     Completed   0          1m
Even though the output says 'completed' one of the initially two containers is still running - the proxy.
How can I make the proxy exit on completing the migrations inside container 1?
The best way I have found is to share the process namespace between containers and use the SYS_PTRACE securityContext capability to allow you to kill the sidecar.
apiVersion: batch/v1
kind: Job
metadata:
  name: my-db-job
spec:
  template:
    spec:
      restartPolicy: OnFailure
      shareProcessNamespace: true
      containers:
      - name: my-db-job-migrations
        image: <your migration image>
        command: ["/bin/sh", "-c"]
        args:
        - |
          <your migration commands>
          # keep the migration's exit status so a failed migration still fails the Job
          status=$?
          sql_proxy_pid=$(pgrep cloud_sql_proxy) && kill -INT $sql_proxy_pid
          exit $status
        securityContext:
          capabilities:
            add:
            - SYS_PTRACE
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.17
        command:
        - "/cloud_sql_proxy"
        args:
        - "-instances=$(DB_CONNECTION_NAME)=tcp:5432"
One possible solution would be a separate cloudsql-proxy deployment with a matching service. You would then only need your migration container inside the job, which connects to your proxy service.
This comes with some downsides:
higher network latency, since there is no Pod-local MySQL communication
a possible security issue if you expose the SQL port to your whole Kubernetes cluster
If you want to open cloudsql-proxy to the whole cluster, you have to replace tcp:3306 with tcp:0.0.0.0:3306 in the -instances parameter of the cloudsql-proxy.
There are 3 ways of doing this.
1- Use a private IP to connect your K8s job to Cloud SQL, as described by @newoxo in one of the answers. To do that, your cluster needs to be a VPC-native cluster. Mine wasn't, and I was not willing to move all my stuff to a new cluster, so I wasn't able to do this.
2- Put the Cloud SQL Proxy container in a separate deployment with a service, as described by @Christian Kohler. This looks like a good approach, but it is not recommended by Google Cloud Support.
I was about to head in this direction (solution #2) but I decided to try something else.
And here is the solution that worked for me:
3- You can communicate between different containers in the same Pod/Job using the file system. The idea is to tell the Cloud SQL Proxy container when the main job is done, and then kill the cloud sql proxy. Here is how to do it:
In the YAML file (my-job.yaml):
apiVersion: v1
kind: Pod
metadata:
  name: my-job-pod
  labels:
    app: my-job-app
spec:
  restartPolicy: OnFailure
  containers:
  - name: my-job-app-container
    image: my-job-image:0.1
    command: ["/bin/bash", "-c"]
    args:
    - |
      trap "touch /lifecycle/main-terminated" EXIT
      { your job commands here }
    volumeMounts:
    - name: lifecycle
      mountPath: /lifecycle
  - name: cloudsql-proxy-container
    image: gcr.io/cloudsql-docker/gce-proxy:1.11
    command: ["/bin/sh", "-c"]
    args:
    - |
      /cloud_sql_proxy -instances={ your instance name }=tcp:3306 -credential_file=/secrets/cloudsql/credentials.json &
      PID=$!
      while true
      do
        if [ -f "/lifecycle/main-terminated" ]
        then
          kill $PID
          exit 0
        fi
        sleep 1
      done
    securityContext:
      runAsUser: 2  # non-root user
      allowPrivilegeEscalation: false
    volumeMounts:
    - name: cloudsql-instance-credentials
      mountPath: /secrets/cloudsql
      readOnly: true
    - name: lifecycle
      mountPath: /lifecycle
  volumes:
  - name: cloudsql-instance-credentials
    secret:
      secretName: cloudsql-instance-credentials
  - name: lifecycle
    emptyDir: {}
Basically, when your main job is done, it will create a file in /lifecycle that will be identified by the watcher added to the cloud-sql-proxy container, which will kill the proxy and terminate the container.
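The handshake can be tried locally before putting it into a Pod. This is a simulation under stated assumptions: a temporary directory stands in for the lifecycle emptyDir, sleep 300 stands in for /cloud_sql_proxy, each subshell plays one container, and lifecycle_handshake_demo is a hypothetical name.

```shell
#!/bin/sh
# Local simulation of the lifecycle-file handshake (illustration only).
lifecycle_handshake_demo() {
  lifecycle_dir=$(mktemp -d)   # stands in for the shared "lifecycle" emptyDir

  # "cloudsql-proxy-container": start the proxy, then poll for the marker file.
  (
    sleep 300 &                # stands in for /cloud_sql_proxy ... &
    proxy_pid=$!
    while [ ! -f "$lifecycle_dir/main-terminated" ]; do
      sleep 1
    done
    kill "$proxy_pid"
    echo "sidecar: main finished, proxy stopped"
  ) &
  sidecar_pid=$!

  # "my-job-app-container": the EXIT trap creates the marker however the job ends.
  (
    trap 'touch "$lifecycle_dir/main-terminated"' EXIT
    echo "main: running job"
    sleep 1
    echo "main: job done"
  )

  # The "pod" is finished only once the sidecar has exited too.
  wait "$sidecar_pid"
  rm -rf "$lifecycle_dir"
  echo "pod: all containers exited"
}

lifecycle_handshake_demo
```

Because the marker is written from an EXIT trap, the proxy is also stopped when the job commands fail, so a failed run does not leave the Pod hanging.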
I hope it helps! Let me know if you have any questions.
Based on: https://stackoverflow.com/a/52156131/7747292
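At the time of writing, Kubernetes could not do this on its own; you would need to kill the proxy yourself once the migration exits. A similar question was asked here: Sidecar containers in Kubernetes Jobs? (Newer Kubernetes releases add native sidecar containers, declared as initContainers with restartPolicy: Always and enabled by default since 1.29; they are stopped once the Pod's main containers have finished, so the Job can complete.)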
Cloud SQL supports private IP connectivity. If the Cloud SQL instance and the Kubernetes cluster are in the same region, you can connect to Cloud SQL without using the Cloud SQL Proxy.
https://cloud.google.com/sql/docs/mysql/connect-kubernetes-engine#private-ip
A possible solution would be to set concurrencyPolicy: Replace in the CronJob spec (it is a CronJob field, not a Job field). When the next scheduled run is due while the previous Job is still running, that Job and its Pod are replaced by a new one. However, you have to make sure that consecutive cron runs are far enough apart for each run to finish its work.
Unfortunately the other answers weren't working for me because of CloudSQLProxy running in a distroless environment where there is no shell.
I managed to get around this by bundling a CloudSQLProxy binary with my deployment and running a bash script to start up CloudSQLProxy followed by my app.
Dockerfile:
FROM golang:1.19.4
RUN apt update
COPY . /etc/mycode/
WORKDIR /etc/mycode
RUN chmod u+x ./scripts/run_migrations.sh
RUN chmod u+x ./bin/cloud_sql_proxy.linux-amd64
RUN go install
ENTRYPOINT ["./scripts/run_migrations.sh"]
Shell Script (run_migrations.sh):
#!/bin/sh
# This script is run from the parent directory
dbConnectionString=$1
cloudSQLProxyPort=$2
echo "Starting Cloud SQL Proxy"
./bin/cloud_sql_proxy.linux-amd64 -instances=${dbConnectionString}=tcp:5432 -enable_iam_login -structured_logs &
CHILD_PID=$!
echo "CloudSQLProxy PID: $CHILD_PID"
echo "Migrating DB..."
go run ./db/migrations/main.go
MAIN_EXIT_CODE=$?
kill $CHILD_PID;
echo "Migrations complete.";
exit $MAIN_EXIT_CODE
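The essential part of run_migrations.sh is that the container's exit status comes from the migration, not from the proxy, and that the proxy is stopped either way. A stripped-down sketch of that pattern which runs without Cloud SQL: sleep 300 is an assumed stand-in for the proxy binary, and run_with_background_proxy is a hypothetical name.

```shell
#!/bin/sh
# run_with_background_proxy CMD [ARGS...] (hypothetical helper)
# Starts a background "proxy", runs CMD in the foreground, stops the proxy,
# and returns CMD's exit status so the Job succeeds or fails with the migration.
run_with_background_proxy() {
  sleep 300 &                # stand-in for: ./bin/cloud_sql_proxy.linux-amd64 ... &
  proxy_pid=$!

  main_exit_code=0
  "$@" || main_exit_code=$?  # stand-in for: go run ./db/migrations/main.go

  kill "$proxy_pid" 2>/dev/null || true
  wait "$proxy_pid" 2>/dev/null || true   # reap it so no stray process is left
  return "$main_exit_code"
}

run_with_background_proxy echo "Migrating DB..."
echo "exit code: $?"
```

With restartPolicy: "Never" and backoffLimit: 4 as in the Pulumi Job below, a non-zero status returned this way is what lets Kubernetes count the attempt as failed and retry it.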
K8s (via Pulumi):
import * as k8s from '@pulumi/kubernetes'
import * as pulumi from '@pulumi/pulumi'

// namespaceName, appLabels, gcpProject, migrationsId, migrationsVersion,
// dbConnectionString, k8sSAMigration and clusterProvider are defined elsewhere in the program.
const jobDBMigrations = new k8s.batch.v1.Job("job-db-migrations", {
    metadata: {
        namespace: namespaceName,
        labels: appLabels,
    },
    spec: {
        backoffLimit: 4,
        template: {
            spec: {
                containers: [
                    {
                        image: pulumi.interpolate`gcr.io/${gcpProject}/${migrationsId}:${migrationsVersion}`,
                        name: "server-db-migration",
                        args: [
                            dbConnectionString,
                        ],
                    },
                ],
                restartPolicy: "Never",
                serviceAccount: k8sSAMigration.metadata.name,
            },
        },
    },
},
{
    provider: clusterProvider,
});

Checking result of command in helm chart (helm-hooks)

I am trying to run a pre-install job using a Helm chart hook. Can someone help me get the result of the command (the command parameter in the YAML file) below?
apiVersion: batch/v1
kind: Job
metadata:
  name: pre-install-job
  annotations:
    "helm.sh/hook": "pre-install"
spec:
  template:
    spec:
      containers:
      - name: pre-install
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'touch somefile.txt && echo $PWD && sleep 15']
      restartPolicy: OnFailure
      terminationGracePeriodSeconds: 0
  backoffLimit: 3
  completions: 1
  parallelism: 1
I want to know where somefile.txt is created and where the echo output is printed. I know the command runs because "sleep 15" works: I see a 15-second difference between the start and end times of the pod.
Any file you create in a container environment is created inside the container filesystem. Unless you've mounted some storage into the container, the file will be lost as soon as the container exits.
Anything a Kubernetes process writes to its stdout will be captured by the Kubernetes log system. You can retrieve it using kubectl logs pre-install-job-... -c pre-install.

Copy file inside Kubernetes pod from another container

I need to copy a file into my pod at creation time. I don't want to use ConfigMaps or Secrets. I am trying to create a volumeMount and copy the source file using the kubectl cp command. My manifest looks like this:
apiVersion: v1
kind: Pod
metadata:
  name: copy
  labels:
    app: hello
spec:
  containers:
  - name: init-myservice
    image: bitnami/kubectl
    command: ['kubectl','cp','./test.json','init-myservice:./data']
    volumeMounts:
    - name: my-storage
      mountPath: data
  - name: init-myservices
    image: nginx
    volumeMounts:
    - name: my-storage
      mountPath: data
  volumes:
  - name: my-storage
    emptyDir: {}
But I am getting a CrashLoopBackOff error. Any help or suggestion is highly appreciated.
It's not possible.
Let me explain: think of it as two different machines. Your local machine is the one where the file exists, and you want to copy it to another machine (the Pod) with cp. That is what you are trying to do here, but the kubectl cp command runs inside the Pod, where your local ./test.json does not exist, so it cannot work.
What you can do instead is build your own Docker image for an init container, adding the file you want to the image when you build it. The init container can then copy that file into a shared volume, where the other container can use it.
I agree with the answer provided by H.R. Emon; it explains why you can't just run kubectl cp inside the container. I also think some resources could be added to show how you can tackle this particular setup.
For this particular use case it is recommended to use an initContainer.
initContainers - specialized containers that run before app containers in a Pod. Init containers can contain utilities or setup scripts not present in an app image.
Kubernetes.io: Docs: Concepts: Workloads: Pods: Init-containers
You could use the example from the official Kubernetes documentation (assuming that downloading your test.json is feasible):
apiVersion: v1
kind: Pod
metadata:
  name: init-demo
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: workdir
      mountPath: /usr/share/nginx/html
  # These containers are run during pod initialization
  initContainers:
  - name: install
    image: busybox
    command:
    - wget
    - "-O"
    - "/work-dir/index.html"
    - http://info.cern.ch
    volumeMounts:
    - name: workdir
      mountPath: "/work-dir"
  dnsPolicy: Default
  volumes:
  - name: workdir
    emptyDir: {}
-- Kubernetes.io: Docs: Tasks: Configure Pod Initialization: Create a pod that has an initContainer
You can also modify the above example to your specific needs.
Also, referring to your particular example, there are some things that you will need to be aware of:
To use kubectl inside of a Pod you will need the required permissions to access the Kubernetes API. You can grant them by using a serviceAccount with the appropriate permissions. More can be found in these links:
Kubernetes.io: Docs: Reference: Access authn authz: Authentication: Service account tokens
Kubernetes.io: Docs: Reference: Access authn authz: RBAC
Your bitnami/kubectl container will run into CrashLoopBackOff errors because you're passing a single command that runs to completion. After that the container terminates and, since this Pod uses the default restartPolicy: Always, it is restarted over and over, resulting in the aforementioned CrashLoopBackOff. To avoid that you would need to use an initContainer.
You can read more about what is happening in your setup by following this answer (connected with previous point):
Stackoverflow.com: Questions: What happens one of the container process crashes in multiple container POD?
Additional resources:
Kubernetes.io: Pod lifecycle
A side note:
I also consider it important to include the reason why Secrets and ConfigMaps cannot be used in this particular setup.

Issue Deleting Temporary pods

I am trying to delete temporary pods and other artifacts using helm delete, and I want this to run on a schedule. Here is my standalone command, which works:
helm delete --purge $(helm ls -a -q temppods.*)
However, if I try to run this on a schedule as below, I run into issues.
Here is what mycron.yaml looks like:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cronbox
  namespace: mynamespace
spec:
  successfulJobsHistoryLimit: 1
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cron-z
          containers:
          - name: cronbox
            image: alpine/helm:2.9.1
            args: ["delete", "--purge", "$(helm ls -a -q temppods.*)"]
            env:
            - name: TILLER_NAMESPACE
              value: mynamespace-build
            - name: KUBECONFIG
              value: /kube/config
            volumeMounts:
            - mountPath: /kube
              name: kubeconfig
          restartPolicy: OnFailure
          volumes:
          - name: kubeconfig
            configMap:
              name: cronjob-kubeconfig
I ran
oc create -f ./mycron.yaml
This created the CronJob.
Every 5 minutes a pod is created and the helm command that is part of the CronJob runs.
I expect the artifacts/pods whose names begin with temppods to be deleted.
What I see in the logs of the pod is:
Error: invalid release name, must match regex ^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])+$ and the length must not longer than 53
The CronJob container spec is trying to delete a release named (literally):
$(helm ls -a -q temppods.*)
This release doesn't exist, and its name fails Helm's naming conventions.
Why
The alpine/helm:2.9.1 container image has an entrypoint of helm. This means any arguments are passed directly to the helm binary via exec. No shell expansion ($()) occurs, as there is no shell running.
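The difference between exec form and shell form can be reproduced locally. In this sketch show_argv is a hypothetical stand-in for the helm binary that prints each argument it receives in angle brackets, and echo stands in for helm ls:

```shell
#!/bin/sh
# show_argv: hypothetical stand-in for the helm binary; prints each argument it receives.
show_argv() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

# Exec form, like args: on an image whose entrypoint is helm.
# No shell is involved, so $(...) arrives as literal text.
show_argv delete --purge '$(helm ls -a -q temppods.*)'

# Shell form: sh performs the command substitution before the program runs.
sh -c 'printf "<%s>\n" delete --purge $(echo temppods-a temppods-b)'
```

The first call prints <$(helm ls -a -q temppods.*)> as one single argument, which is exactly the "release name" helm rejected; the second prints one argument per release.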
Fix
To do what you are expecting you can use sh which is available in alpine images.
sh -uexc 'releases=$(helm ls -a -q temppods.*); helm delete --purge $releases'
In a Pod spec this translates to:
spec:
  containers:
  - name: cronbox
    command: ['sh']
    args:
    - '-uexc'
    - 'releases=$(helm ls -a -q temppods.*); helm delete --purge $releases;'
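A possible refinement beyond the fix above: because of -e, the script stops with an error if helm delete is called with no release names, which would likely happen on every run where nothing matches temppods.* (Helm 2 expects at least one release name). This sketch guards against that. The helm function is a hypothetical stub so the logic can be tried locally; it must not be copied into the container, where the real helm binary should run.

```shell
#!/bin/sh
# Hypothetical stub standing in for Helm 2's CLI (local testing only).
helm() {
  case "$1" in
    ls)     printf '%s\n' temppods-a temppods-b ;;
    delete) shift; echo "would run: helm delete $*" ;;
  esac
}

delete_temp_releases() {
  # Quoted so the shell does not try to glob-match temppods.* against files.
  releases=$(helm ls -a -q 'temppods.*')
  if [ -z "$releases" ]; then
    echo "no releases match temppods.*; nothing to delete"
    return 0
  fi
  # Deliberately unquoted: each release name becomes its own argument.
  helm delete --purge $releases
}

delete_temp_releases
```

With the stub above this prints "would run: helm delete --purge temppods-a temppods-b"; with an empty listing it prints the "nothing to delete" message and exits 0, so the Job is not marked as failed.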
Helm
As a side note, Helm is not the most reliable tool when clusters or releases get into vague states. Running multiple helm commands against the same release at the same time usually spells disaster, and on the surface that seems likely here. It may be worth asking whether there are other ways to achieve the process you are implementing.

Replication Controller replica ID in an environment variable?

I'm attempting to inject a ReplicationController's randomly generated pod ID extension (i.e. multiverse-{replicaID}) into a container's environment variables. I could manually get the hostname and extract it from there, but I'd prefer if I didn't have to add the special case into the script running inside the container, due to compatibility reasons.
If a pod is named multiverse-nffj1, INSTANCE_ID should equal nffj1. I've scoured the docs and found nothing.
apiVersion: v1
kind: ReplicationController
metadata:
  name: multiverse
spec:
  replicas: 3
  template:
    spec:
      containers:
      - env:
        - name: INSTANCE_ID
          value: $(replicaID)
I've tried adding a command into the controller's template configuration to create the environment variable from the hostname, but couldn't figure out how to make that environment variable available to the running script.
Is there a variable I'm missing, or does this feature not exist? If it doesn't, does anyone have any ideas on how to make this to work without editing the script inside of the container?
There is an answer provided by Anton Kostenko about inserting DB credentials into container environment variables, and it can be applied to your case as well. It is all about the content of the InitContainer spec.
You can use an InitContainer to get the hash from the container's hostname and write it to a file on a shared volume that you mount into the container.
In this example the InitContainer puts the Pod name into the INSTANCE_ID environment variable, but you can modify it according to your needs:
Create the init.yaml file with the content:
apiVersion: v1
kind: Pod
metadata:
  name: init-test
spec:
  containers:
  - name: init-test
    image: ubuntu
    args: [bash, -c, 'source /data/config && echo $INSTANCE_ID && while true ; do sleep 1000; done ']
    volumeMounts:
    - name: config-data
      mountPath: /data
  initContainers:
  - name: init-init
    image: busybox
    command: ["sh","-c","echo -n INSTANCE_ID=$(hostname) > /data/config"]
    volumeMounts:
    - name: config-data
      mountPath: /data
  volumes:
  - name: config-data
    emptyDir: {}
Create the pod using the following command:
kubectl create -f init.yaml
Check if Pod initialization is done and is Running:
kubectl get pod init-test
Check the logs to see the results of this example configuration:
$ kubectl logs init-test
init-test
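The example above stores the full Pod name (init-test). To get only the generated suffix the question asks for (nffj1 from multiverse-nffj1), the init container can strip everything up to the last dash with the shell's ${name##*-} parameter expansion, and write an export line so that a script started from the main container's shell also sees INSTANCE_ID in its environment. A sketch, assuming the Pod's hostname equals its name (the default when hostname is not set in the Pod spec); instance_id and write_config are hypothetical helper names, and inside the init container the second argument would be $(hostname):

```shell
#!/bin/sh
# instance_id POD_NAME: print everything after the last "-" (multiverse-nffj1 -> nffj1).
instance_id() {
  printf '%s\n' "${1##*-}"
}

# write_config FILE POD_NAME: write an exported INSTANCE_ID line, so child
# processes of the shell that sources FILE inherit the variable.
write_config() {
  printf 'export INSTANCE_ID=%s\n' "$(instance_id "$2")" > "$1"
}

cfg=$(mktemp)                   # stands in for /data/config on the shared volume
write_config "$cfg" multiverse-nffj1
. "$cfg"                        # same as "source /data/config" in the main container
echo "$INSTANCE_ID"             # prints: nffj1
rm -f "$cfg"
```

In the init container this boils down to a command like sh -c 'h=$(hostname); echo "export INSTANCE_ID=${h##*-}" > /data/config', and the main container sources /data/config before starting the unmodified script.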