Kubernetes with multiple jobs counter

New to Kubernetes, I'm trying to move an existing pipeline that currently runs on a queuing system outside of k8s.
I have a Perl script that generates a batch Job manifest (YAML file) for each of the samples I have to process.
Then I run kubectl apply --recursive -f 16S_jobscripts/
Each sample needs to be treated sequentially and go through several processing steps, for example:
SampleA -> clean -> quality -> some_calculation
SampleB -> clean -> quality -> some_calculation
and so on for 300 samples.
So the idea is to prepare all the YAML files and run them sequentially. This is working.
BUT, with this approach I need to wait until all samples have finished a given step (let's say that all the clean jobs need to complete before I run the next quality jobs).
What would be the best approach in such a case? Run each sample independently? How?
The YAML below describes one sample for one job. You can see that I'm using a counter (mergereads-1 for sample 1 (A)).
apiVersion: batch/v1
kind: Job
metadata:
  name: merge-reads-1
  namespace: namespace-id-16s
  labels:
    jobgroup: mergereads
spec:
  template:
    metadata:
      name: mergereads-1
      labels:
        jobgroup: mergereads
    spec:
      containers:
      - name: mergereads-$idx
        image: .../bbmap:latest
        command: ['sh', '-c']
        args: ['
          cd workdir &&
          bbmerge.sh -Xmx1200m in1=files/trimmed/1.R1.trimmed.fq.gz in2=files/trimmed/1.R2.trimmed.fq.gz out=files/mergedpairs/1.merged.fq.gz merge=t mininsert=300 qtrim2=t minq=27 ratiomode=t &&
          ls files/mergedpairs/
          ']
        resources:
          limits:
            cpu: 1
            memory: 2000Mi
          requests:
            cpu: 0.8
            memory: 1500Mi
        volumeMounts:
        - mountPath: '/workdir'
          name: db
      volumes:
      - name: db
        persistentVolumeClaim:
          claimName: workdir
      restartPolicy: Never

If I understand you correctly, you can use parallel Jobs with the Job patterns described in the Kubernetes docs.
It does support parallel processing of a set of independent but
related work items.
You can also consider using Argo.
https://github.com/argoproj/argo
Argo Workflows is an open source container-native workflow engine for
orchestrating parallel jobs on Kubernetes. Argo Workflows is
implemented as a Kubernetes CRD (Custom Resource Definition).
Please let me know if that helps.
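If you go the Argo route, each sample can be modelled as its own sequential chain of steps while all samples fan out in parallel, so a slow clean job for one sample no longer blocks the quality step of another. Below is only a minimal sketch of that idea; the clean.sh/quality.sh/calc.sh commands, the sample list and the image are placeholders to adapt to your pipeline:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pipeline-16s-
  namespace: namespace-id-16s
spec:
  entrypoint: all-samples
  templates:
  - name: all-samples
    steps:
    - - name: sample                      # one branch per sample, all branches run in parallel
        template: per-sample
        arguments:
          parameters:
          - name: sample
            value: "{{item}}"
        withItems: ["1", "2", "3"]        # expand to your 300 sample IDs
  - name: per-sample                      # steps inside a branch run one after another
    inputs:
      parameters:
      - name: sample
    steps:
    - - name: clean
        template: run-step
        arguments:
          parameters:
          - name: cmd
            value: "clean.sh {{inputs.parameters.sample}}"      # hypothetical command
    - - name: quality
        template: run-step
        arguments:
          parameters:
          - name: cmd
            value: "quality.sh {{inputs.parameters.sample}}"    # hypothetical command
    - - name: some-calculation
        template: run-step
        arguments:
          parameters:
          - name: cmd
            value: "calc.sh {{inputs.parameters.sample}}"       # hypothetical command
  - name: run-step
    inputs:
      parameters:
      - name: cmd
    container:
      image: .../bbmap:latest             # placeholder image taken from the question
      command: [sh, -c]
      args: ["{{inputs.parameters.cmd}}"]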

Related

Getting JAR file from S3 using Flink Kubernetes operator

I'm experimenting with the new Flink Kubernetes operator and I've been able to do pretty much everything that I need besides one thing: getting a JAR file from the S3 file system.
Context
I have a Flink application running in an EKS cluster in AWS and have all the information saved in S3 buckets. Things like savepoints, checkpoints, high-availability metadata and JAR files are all stored there.
I've been able to save the savepoints, checkpoints and high availability information in the bucket, but when trying to get the JAR file from the same bucket I get the error:
Could not find a file system implementation for scheme 's3'. The scheme is directly supported by Flink through the following plugins: flink-s3-fs-hadoop, flink-s3-fs-presto.
I was able to get to this thread, but I wasn't able to get the resource fetcher to work correctly. Also, the solution is not ideal, and I was looking for a more direct approach.
Deployment files
Here's the files that I'm deploying in the cluster:
deployment.yml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: flink-deployment
spec:
  podTemplate:
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-template
    spec:
      containers:
        - name: flink-main-container
          env:
            - name: ENABLE_BUILT_IN_PLUGINS
              value: flink-s3-fs-presto-1.15.3.jar;flink-s3-fs-hadoop-1.15.3.jar
          volumeMounts:
            - mountPath: /flink-data
              name: flink-volume
      volumes:
        - name: flink-volume
          hostPath:
            path: /tmp
            type: Directory
  image: flink:1.15
  flinkVersion: v1_15
  flinkConfiguration:
    state.checkpoints.dir: s3://kubernetes-operator/checkpoints
    state.savepoints.dir: s3://kubernetes-operator/savepoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3://kubernetes-operator/ha
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  serviceAccount: flink
session-job.yml
apiVersion: flink.apache.org/v1beta1
kind: FlinkSessionJob
metadata:
  name: flink-session-job
spec:
  deploymentName: flink-deployment
  job:
    jarURI: s3://kubernetes-operator/savepoints/flink.jar
    parallelism: 3
    upgradeMode: savepoint
    savepointTriggerNonce: 0
The Flink Kubernetes operator version that I'm using is 1.3.1
Is there anything that I'm missing or doing wrong?
The download of the jar happens in the flink-kubernetes-operator pod. So, when you apply a FlinkSessionJob, the flink-operator recognizes the CRD, tries to download the jar from the jarURI location, constructs a JobGraph and submits the session job to the FlinkDeployment. The Flink Kubernetes Operator also has Flink running inside it to build the JobGraph.
So you will have to add flink-s3-fs-hadoop-1.15.3.jar to the location /opt/flink/plugins/s3-fs-hadoop/ inside the flink-kubernetes-operator pod.
You can add the jar either by extending the ghcr.io/apache/flink-kubernetes-operator image (curl the jar and copy it to the plugins location),
or
you can write an initContainer which downloads the jar to a volume and mounts that volume:
volumes:
  - name: s3-plugin
    emptyDir: { }
initContainers:
  - name: busybox
    image: busybox:latest
    # Download the plugin jar into the shared volume. The Maven Central URL is
    # assumed from the standard repository layout; adjust it to your Flink version.
    command: ["sh", "-c"]
    args:
      - wget -O /opt/flink/plugins/s3-fs-hadoop/flink-s3-fs-hadoop-1.15.3.jar
        https://repo1.maven.org/maven2/org/apache/flink/flink-s3-fs-hadoop/1.15.3/flink-s3-fs-hadoop-1.15.3.jar
    volumeMounts:
      - mountPath: /opt/flink/plugins/s3-fs-hadoop
        name: s3-plugin
containers:
  - image: 'ghcr.io/apache/flink-kubernetes-operator:95128bf'
    name: flink-kubernetes-operator
    volumeMounts:
      - mountPath: /opt/flink/plugins/s3-fs-hadoop
        name: s3-plugin
Also, if you are using a serviceAccount for S3 authentication, add the config below to flinkConfiguration:
fs.s3a.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
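For reference, in the FlinkDeployment above that entry would sit alongside the existing keys under spec.flinkConfiguration, roughly like this (a sketch reusing the bucket paths from the question):
spec:
  flinkConfiguration:
    state.checkpoints.dir: s3://kubernetes-operator/checkpoints
    state.savepoints.dir: s3://kubernetes-operator/savepoints
    high-availability.storageDir: s3://kubernetes-operator/ha
    fs.s3a.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider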

The active users count doesn't match on execution through 2 containers in Kubernetes

We are running a JMX test through Taurus using 2 containers in Kubernetes.
We are seeing only 50 users in the results instead of 100 (50 * 2 containers).
Can anyone please throw some light on whether we are missing something here?
We get two JTL files, and checking them individually or combined, the total number of users is still only 50. Is it related to the same thread names being generated and logged in the JTL files, or something else?
Here are the YAML details:
apiVersion: v1
kind: ConfigMap
metadata:
  name: joba
  namespace: AAA
data:
  protocol: "https"
  serverUrl: "testurl"
  users: "50"
  duration: "1m"
  nodeName: "Nodename"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: perftest
  namespace: dev
spec:
  template:
    spec:
      containers:
      - args: ["split -l ${users} --numeric-suffixes Test.csv Test-; /bin/bash ./Shellscripttoread_assignvariables.sh;"]
        command: ["/bin/bash", "-c"]
        env:
        - name: JobNumber
          value: "00"
        envFrom:
        - configMapRef:
            name: job-multi
        image: imagepath
        name: ubuntu-00
        resources:
          limits:
            memory: "8000Mi"
            cpu: "2880m"
      - args: ["split -l ${users} --numeric-suffixes Test.csv Test-; /bin/bash ./Shellscripttoread_assignvariables.sh;"]
        command: ["/bin/bash", "-c"]
        env:
        - name: JobNumber
          value: "01"
        envFrom:
        - configMapRef:
            name: job-multi
        image: imagepath
        name: ubuntu-01
        resources:
          limits:
            memory: "8000Mi"
            cpu: "2880m"
Your YAML is very nice, but it doesn't tell us anything about how you launch JMeter or what the shell scripts you invoke are doing.
If you just kick off 2 separate JMeter instances by means of k8s, JMeter will look at the number of active threads from the .jtl file, and given that the sampler/transaction names are the same, JMeter "thinks" the tests were executed on one engine.
The workaround is to add e.g. the __machineName() or __machineIP() function to sampler/transaction labels; this way JMeter will distinguish the results coming from different instances and you will see the real number of active threads.
The other solution would be running your JMeter test in Distributed Mode, so the master runs in one pod, the slaves in their own pods, and the master is responsible for transferring the .jmx script to the slaves and collecting the results from them.
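Since the test is driven through Taurus, distributed mode can be described in the Taurus config itself. This is only a sketch under assumptions not in the question: it presumes JMeter server (slave) pods are already running and reachable under the DNS names shown (e.g. a StatefulSet named jmeter-slave behind a headless Service jmeter-slaves), and that Test.jmx is the script your shell scripts run; adjust both to your setup:
execution:
- scenario: perf-test
  distributed:
  - jmeter-slave-0.jmeter-slaves:1099   # hypothetical pod DNS names of the JMeter servers
  - jmeter-slave-1.jmeter-slaves:1099
scenarios:
  perf-test:
    script: Test.jmx                    # assumed name of the .jmx script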

Volume shared between two containers "is busy or locked"

I have a deployment that runs two containers. One of the containers attempts to build (during deployment) a javascript bundle that the other container, nginx, tries to serve.
I want to use a shared volume to place the javascript bundle after it's built.
So far, I have the following deployment file (with irrelevant pieces removed):
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
spec:
  ...
  template:
    ...
    spec:
      hostNetwork: true
      containers:
        - name: personal-site
          image: wheresmycookie/personal-site:3.1
          volumeMounts:
            - name: build-volume
              mountPath: /var/app/dist
        - name: nginx-server
          image: nginx:1.19.0
          volumeMounts:
            - name: build-volume
              mountPath: /var/app/dist
      volumes:
        - name: build-volume
          emptyDir: {}
To the best of my ability, I have followed these guides:
https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
https://kubernetes.io/docs/tasks/access-application-cluster/communicate-containers-same-pod-shared-volume/
One other thing to point out is that I'm trying to run this locally at the moment using minikube.
EDIT: The Dockerfile I used to build this image is:
FROM node:alpine
WORKDIR /var/app
COPY . .
RUN npm install
RUN npm install -g @vue/cli@latest
CMD ["npm", "run", "build"]
I realize that I do not need to build this when I actually run the image, but my next goal is to insert pod instance information as environment variables, so with JavaScript, unfortunately, I can only build once that information is available to me.
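(As a side note on that later goal: pod instance information can be exposed to a container as environment variables via the Downward API. A minimal sketch, not part of the original setup; the variable names are arbitrary:)
containers:
  - name: personal-site
    image: wheresmycookie/personal-site:3.1
    env:
      - name: POD_NAME                  # arbitrary variable name
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      - name: POD_IP
        valueFrom:
          fieldRef:
            fieldPath: status.podIP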
Problem
The logs from the personal-site container reveal:
- Building for production...
ERROR Error: EBUSY: resource busy or locked, rmdir '/var/app/dist'
Error: EBUSY: resource busy or locked, rmdir '/var/app/dist'
I'm not sure why the build is trying to remove /dist, but also have a feeling that this is irrelevant. I could be wrong?
I thought that maybe this could be related to the lifecycle of containers/volumes, but the docs suggest that "An emptyDir volume is first created when a Pod is assigned to a Node, and exists as long as that Pod is running on that node".
Question
What are some reasons that a volume might not be available to me after the containers are already running? Given that you probably have much more experience than I do with Kubernetes, what would you look into next?
The best way is to customize your image's entrypoint as follows:
Once you finish building the /var/app/dist folder, copy (or move) this folder to another, empty path (e.g. /opt/dist):
cp -r /var/app/dist/* /opt/dist
PAY ATTENTION: this step must be done in the ENTRYPOINT script, not in a RUN layer.
Now use /opt/dist instead:
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
spec:
  ...
  template:
    ...
    spec:
      hostNetwork: true
      containers:
        - name: personal-site
          image: wheresmycookie/personal-site:3.1
          volumeMounts:
            - name: build-volume
              mountPath: /opt/dist # <--- make it consistent with the image's entrypoint logic
        - name: nginx-server
          image: nginx:1.19.0
          volumeMounts:
            - name: build-volume
              mountPath: /var/app/dist
      volumes:
        - name: build-volume
          emptyDir: {}
Good luck!
If it's not clear how to customize the entrypoint, share your image's entrypoint with us and we will implement it.
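For illustration only, here is a minimal sketch of the same idea done directly in the Pod spec instead of the image's entrypoint, so only the Deployment changes. The npm run build command comes from the question's Dockerfile; keeping the container alive with tail -f /dev/null is an assumption, since a Deployment restarts containers that exit:
containers:
  - name: personal-site
    image: wheresmycookie/personal-site:3.1
    command: ["sh", "-c"]
    # Build into the image's own /var/app/dist (not a mount), then copy the
    # result onto the shared emptyDir mounted at /opt/dist.
    args:
      - npm run build &&
        cp -r /var/app/dist/* /opt/dist &&
        tail -f /dev/null
    volumeMounts:
      - name: build-volume
        mountPath: /opt/dist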

Handling cronjobs in a Pod with multiple containers

I have a requirement in which I need to create a CronJob in Kubernetes, but the Pod has multiple containers (with a single container it's working fine).
Is it possible?
The requirement is something like this:
1. First container: Run the shell script to do a job.
2. Second container: run fluentbit conf to parse the log and send it.
Previously I had a Deployment in place and that was working fine, but since that Deployment was only used for 10-minute jobs, I thought of making it a CronJob.
Any help is really appreciated.
Also, regarding the CronJob, I am not sure if a Pod can support multiple containers to do the same.
Thank you,
Sunny
Yes, you can create a CronJob with multiple containers. A CronJob is an abstraction on top of Jobs and Pods, so in the pod spec you can have multiple containers just like you can in a normal Pod. As an example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
  namespace: default
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          - name: app
            image: alpine
            command:
            - echo
            - Hello World!
          restartPolicy: OnFailure
I need to agree with the answer provided by @Arghya Sadhu. It shows how you can run a multi-container Pod with a CronJob. Before that, I would like to give more attention to the comment provided by @Chris Stryczynski:
It's not clear whether the containers are run in parallel or sequentially
It is not entirely clear if the workload that you are trying to run:
The requirement is something like this:
First container: Run the shell script to do a job.
Second container: run fluentbit conf to parse the log and send it.
could be run in parallel (both running at the same time) or requires a sequential approach (after X completes successfully, run Y).
If the workload can run in parallel, the answer provided by @Arghya Sadhu is correct; however, if one workload depends on another, I reckon you should be using initContainers instead of a multi-container Pod.
The example of a CronJob that implements the initContainer could be following:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: ubuntu
            image: ubuntu
            command: [/bin/bash]
            args: ["-c", "cat /data/hello_there.txt"]
            volumeMounts:
            - name: data-dir
              mountPath: /data
          initContainers:
          - name: echo
            image: busybox
            command: ["/bin/sh"]
            args: ["-c", "echo 'General Kenobi!' > /data/hello_there.txt"]
            volumeMounts:
            - name: data-dir
              mountPath: "/data"
          volumes:
          - name: data-dir
            emptyDir: {}
This CronJob will write specific text to a file with an initContainer, and then the "main" container will display the result. It's worth mentioning that the main container will not start if the initContainer does not complete successfully.
$ kubectl logs hello-1234567890-abcde
General Kenobi!
Additional resources:
Linchpiner.github.io: K8S multi container pods
What about a sidecar container for logging as the second container, one which keeps running and never exits? Even if the job itself runs fine, the state of the Job may still end up as failed.
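One common workaround for that (a sketch, not taken from the answers above): let the main container drop a marker file on a shared emptyDir when it is done, and have the logging sidecar poll for that marker and then exit 0, so the Job can complete. Image names and the run-job.sh script are placeholders:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: job-with-sidecar
spec:
  schedule: "*/10 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          volumes:
          - name: shared
            emptyDir: {}
          containers:
          - name: main
            image: busybox                # placeholder for the image running your shell script
            # run-job.sh is hypothetical; touch the marker when the work is done
            command: ["sh", "-c", "./run-job.sh; touch /shared/done"]
            volumeMounts:
            - name: shared
              mountPath: /shared
          - name: log-sidecar
            image: busybox                # placeholder for the fluent-bit image
            # poll for the completion marker, then exit 0 so the Pod (and Job) can finish
            command: ["sh", "-c", "while [ ! -f /shared/done ]; do sleep 5; done"]
            volumeMounts:
            - name: shared
              mountPath: /shared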

Can we create a Pod from two existing YAMLs, each having their own container?

My project has 2 YAMLs to apply, which create 2 Pods.
Can we create a single Pod with 2 containers from these YAMLs, without merging them?
Thanks
Yes, you can run multiple containers inside a single Pod: in a single YAML manifest you can add both container specs and run them.
However, without merging the YAMLs you cannot run multiple containers inside one Pod.
A single-file example:
apiVersion: v1
kind: Pod
metadata:
  name: mc1
spec:
  volumes:
  - name: html
    emptyDir: {}
  containers:
  - name: 1st
    image: nginx
    volumeMounts:
    - name: html
      mountPath: /usr/share/nginx/html
  - name: 2nd
    image: debian
    volumeMounts:
    - name: html
      mountPath: /html
    command: ["/bin/sh", "-c"]
    args:
      - while true; do
          date >> /html/index.html;
          sleep 1;
        done
For more details you can also refer to the official documentation: https://kubernetes.io/docs/tasks/access-application-cluster/communicate-containers-same-pod-shared-volume/
If you don't want to merge the container definitions into the same file and the same containers block, then no, you can't.