Kubernetes - processing an unlimited number of work-items

I need to get a work-item from a work-queue and then sequentially run a series of containers to process each work-item. This can be done using initContainers (https://stackoverflow.com/a/46880653/94078)
What would be the recommended way of restarting the process to get the next work-item?
Jobs seem ideal but don't seem to support an infinite/indefinite number of completions.
Using a single Pod doesn't work because initContainers aren't restarted (https://github.com/kubernetes/kubernetes/issues/52345).
I would prefer to avoid the maintenance/learning overhead of a system like argo or brigade.
Thanks!

Jobs should be used for working with work queues. When using work queues you should not set .spec.completions (or set it to null). In that case Pods will keep getting created until one of the Pods exits successfully. It is a little awkward to exit from the (main) container with a failure state on purpose, but this is the specification. Note that deliberately failed Pods count toward .spec.backoffLimit (default 6), so that limit may need to be raised for this approach. You may set .spec.parallelism to your liking irrespective of this setting; I've set it to 1 as it appears you do not want any parallelism.
In your question you did not specify what you want to do if the work queue becomes empty, so I will give two solutions: one if you want to wait for new items (infinite), and one if you want to end the Job when the work queue becomes empty (finite, but indefinite number of items).
Both examples use redis, but you can apply this pattern to your favorite queue. Note that the part that pops an item from the queue is not safe; if your Pod dies for some reason after having popped an item, that item will remain unprocessed or not fully processed. See the reliable-queue pattern for a proper solution.
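As a rough illustration of the reliable-queue idea mentioned above, the pop step could move the item into a second list instead of removing it outright, and delete it from there only after the last step has finished. The sketch below assumes Redis 6.2 or newer (for LMOVE); the job:processing key and the claim_item / ack_item helper names are made up for this example, and the REDIS_CLI variable is only there so the command can be swapped out when trying it without a server.

```shell
#!/bin/sh
# Assumption: "job" is the queue key used in this answer;
# "job:processing" is a hypothetical second list for in-flight items.
QUEUE=job
PROCESSING=job:processing
REDIS_CLI=${REDIS_CLI:-redis-cli}

# Atomically move the head of the queue to the processing list and print it.
claim_item() {
  $REDIS_CLI -h redis lmove "$QUEUE" "$PROCESSING" LEFT RIGHT
}

# Remove one copy of a finished item from the processing list.
ack_item() {
  $REDIS_CLI -h redis lrem "$PROCESSING" 1 "$1"
}
```

If a Pod dies between claim_item and ack_item, the item is still in job:processing and could be pushed back onto the queue by a separate cleanup step.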
To implement the sequential steps on each work item I've used init containers. Note that this really is a primitive solution, but you have limited options if you don't want to use some framework to implement a proper pipeline.
There is an asciinema if anyone would like to see this at work without deploying redis, etc.
Redis
To test this you'll need to create, at a minimum, a redis Pod and a Service. I am using the example from fine parallel processing work queue. You can deploy that with:
kubectl apply -f https://rawgit.com/kubernetes/website/master/docs/tasks/job/fine-parallel-processing-work-queue/redis-pod.yaml
kubectl apply -f https://rawgit.com/kubernetes/website/master/docs/tasks/job/fine-parallel-processing-work-queue/redis-service.yaml
The rest of this solution expects a Service named redis in the same namespace as your Job that does not require authentication, and a Pod called redis-master.
Inserting items
To insert some items in the work queue use this command (you will need bash for this to work):
echo -ne "rpush job "{1..10}"\n" | kubectl exec -it redis-master -- redis-cli
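If bash brace expansion is not available, the same commands can be generated with a small POSIX shell loop. This is only a sketch; gen_rpush is a hypothetical helper name, and the queue key job matches the one used in the rest of this answer.

```shell
#!/bin/sh
# Print one "rpush job N" command per line for N = 1..$1.
gen_rpush() {
  i=1
  while [ "$i" -le "$1" ]; do
    printf 'rpush job %s\n' "$i"
    i=$((i + 1))
  done
}

# Preview the commands for three items.
gen_rpush 3
```

Piping it as gen_rpush 10 | kubectl exec -i redis-master -- redis-cli should push the items 1 to 10 (only -i is used here, since stdin is a pipe rather than a terminal).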
Infinite version
This version waits if the queue is empty thus it will never complete.
apiVersion: batch/v1
kind: Job
metadata:
  name: primitive-pipeline-infinite
spec:
  parallelism: 1
  completions: null
  template:
    metadata:
      name: primitive-pipeline-infinite
    spec:
      volumes: [{name: shared, emptyDir: {}}]
      initContainers:
      - name: pop-from-queue-unsafe
        image: redis
        command: ["sh","-c","redis-cli -h redis blpop job 0 >/shared/item.txt"]
        volumeMounts: [{name: shared, mountPath: /shared}]
      - name: step-1
        image: busybox
        command: ["sh","-c","echo step-1 working on `cat /shared/item.txt` ...; sleep 5"]
        volumeMounts: [{name: shared, mountPath: /shared}]
      - name: step-2
        image: busybox
        command: ["sh","-c","echo step-2 working on `cat /shared/item.txt` ...; sleep 5"]
        volumeMounts: [{name: shared, mountPath: /shared}]
      - name: step-3
        image: busybox
        command: ["sh","-c","echo step-3 working on `cat /shared/item.txt` ...; sleep 5"]
        volumeMounts: [{name: shared, mountPath: /shared}]
      containers:
      - name: done
        image: busybox
        command: ["sh","-c","echo all done with `cat /shared/item.txt`; sleep 1; exit 1"]
        volumeMounts: [{name: shared, mountPath: /shared}]
      restartPolicy: Never
Finite version
This version stops the job if the queue is empty. Note the trick that the pop init container checks if the queue is empty and all the subsequent init containers and the main container immediately exit if it is indeed empty - this is the mechanism that signals Kubernetes that the Job is completed and there is no need to create new Pods for it.
apiVersion: batch/v1
kind: Job
metadata:
  name: primitive-pipeline-finite
spec:
  parallelism: 1
  completions: null
  template:
    metadata:
      name: primitive-pipeline-finite
    spec:
      volumes: [{name: shared, emptyDir: {}}]
      initContainers:
      - name: pop-from-queue-unsafe
        image: redis
        command: ["sh","-c","redis-cli -h redis lpop job >/shared/item.txt; grep -q . /shared/item.txt || :>/shared/done.txt"]
        volumeMounts: [{name: shared, mountPath: /shared}]
      - name: step-1
        image: busybox
        command: ["sh","-c","[ -f /shared/done.txt ] && exit 0; echo step-1 working on `cat /shared/item.txt` ...; sleep 5"]
        volumeMounts: [{name: shared, mountPath: /shared}]
      - name: step-2
        image: busybox
        command: ["sh","-c","[ -f /shared/done.txt ] && exit 0; echo step-2 working on `cat /shared/item.txt` ...; sleep 5"]
        volumeMounts: [{name: shared, mountPath: /shared}]
      - name: step-3
        image: busybox
        command: ["sh","-c","[ -f /shared/done.txt ] && exit 0; echo step-3 working on `cat /shared/item.txt` ...; sleep 5"]
        volumeMounts: [{name: shared, mountPath: /shared}]
      containers:
      - name: done
        image: busybox
        command: ["sh","-c","[ -f /shared/done.txt ] && exit 0; echo all done with `cat /shared/item.txt`; sleep 1; exit 1"]
        volumeMounts: [{name: shared, mountPath: /shared}]
      restartPolicy: Never
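The empty-queue check used in the finite version can be tried locally without a cluster. The sketch below mimics it with a temporary directory standing in for the shared emptyDir volume; record_pop and step are hypothetical helper names, and an empty string stands in for the blank line that redis-cli writes when lpop finds nothing.

```shell
#!/bin/sh
# $1 = directory standing in for /shared, $2 = value returned by the pop
record_pop() {
  printf '%s\n' "$2" >"$1/item.txt"
  # Same test as the pop container: no non-empty line means the queue was empty.
  grep -q . "$1/item.txt" || : >"$1/done.txt"
}

# One pipeline step: skip silently if the sentinel file exists.
step() {
  [ -f "$1/done.txt" ] && return 0
  echo "working on $(cat "$1/item.txt")"
}

empty_dir=$(mktemp -d)
record_pop "$empty_dir" ""
step "$empty_dir"        # prints nothing, done.txt exists

full_dir=$(mktemp -d)
record_pop "$full_dir" "7"
step "$full_dir"         # prints "working on 7"
```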

The easiest way in this case is to use CronJob. CronJob runs Jobs according to a schedule. For more information go through documentation.
Here is an example (I took it from here and modified it)
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: sequential-jobs
spec:
  schedule: "*/1 * * * *" #Here is the schedule in Linux-cron format
  jobTemplate:
    spec:
      template:
        metadata:
          name: sequential-job
        spec:
          initContainers:
          - name: job-1
            image: busybox
            command: ['sh', '-c', 'for i in 1 2 3; do echo "job-1 `date`" && sleep 5s; done;']
          - name: job-2
            image: busybox
            command: ['sh', '-c', 'for i in 1 2 3; do echo "job-2 `date`" && sleep 5s; done;']
          containers:
          - name: job-done
            image: busybox
            command: ['sh', '-c', 'echo "job-1 and job-2 completed"']
          restartPolicy: Never
This solution, however, has some limitations:
It cannot run more often than once per minute
If you need to process your work-items one-by-one, you need to add a check for running Jobs in an init container
CronJobs are available only in Kubernetes 1.8 and higher

Related

why busybox container is in completed state instead of running state

I am using a busybox container to understand Kubernetes concepts,
but if I run a simple test-pod.yaml with the busybox image, it ends up in the Completed state instead of the Running state.
Can anyone explain the reason?
apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
  - name: test-container
    image: k8s.gcr.io/busybox
    command: [ "/bin/sh", "-c", "ls /etc/config/" ]
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
  volumes:
  - name: config-volume
    configMap:
      # Provide the name of the ConfigMap containing the files you want
      # to add to the container
      name: special-config
  restartPolicy: Never
You should understand the basic concept here: a container keeps running only as long as its main process is alive, and it completes as soon as the main process stops.
Going step-by-step in your case:
You launch the busybox container with the main process "/bin/sh", "-c", "ls /etc/config/", and obviously this process comes to an end. Once the command has listed the directory, the process exits with status 0, the container stops, and you see a Completed pod as a result.
If you want the container to run longer, you should explicitly run some command in the main process that keeps the container running as long as you need.
Possible solutions
#Daniel's answer - the container will execute ls /etc/config/ and stay alive for an additional 3600 sec
Use the sleep infinity option. Be aware that for a long time this option did not work properly with busybox specifically. That was fixed in 2019, more information here. Strictly speaking this is not an infinite loop, but it should be enough for any testing purpose. You can find a detailed explanation in the "Bash: infinite sleep (infinite blocking)" thread.
Example:
apiVersion: v1
kind: Pod
metadata:
  name: busybox-infinity
spec:
  containers:
  - command:
    - /bin/sh
    - "-c"
    - "ls /etc/config/; sleep infinity"
    image: busybox
    name: busybox-infinity
You can use different variations of while loops, tailing, etc. That depends only on your imagination and needs.
Examples:
["sh", "-c", "tail -f /dev/null"]
command: ["/bin/sh"]
args: ["-c", "while true; do echo hello; sleep 10;done"]
command: [ "/bin/bash", "-c", "--" ]
args: [ "while true; do sleep 30; done;" ]
That is because busybox runs the command and exits. You can solve it by updating the command in the containers section so that it keeps running after the listing:
[ "/bin/sh", "-c", "ls /etc/config/; sleep 3600" ]
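One detail worth checking when writing such commands: sh -c takes a single script string, and any further arguments become positional parameters ($0, $1, ...) rather than extra commands. So both commands have to live in the same string, separated by ; or &&. A quick local check, with echo standing in for ls and sleep:

```shell
#!/bin/sh
# The second string becomes $0 of the script and is never executed.
separate=$(sh -c 'echo listed' 'echo slept')

# Both commands in one script string run in order.
joined=$(sh -c 'echo listed; echo slept')

printf 'separate arguments: %s\n' "$separate"
printf 'single string: %s\n' "$joined"
```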

Kubectl wait command for init containers

I am looking for a kubectl wait command for init containers. Basically, I need to wait until the pod's init containers have finished before proceeding to the next step in my script.
I can see a wait option for pods, but nothing specific to init containers.
Any clue how to achieve this?
Please suggest any alternative ways to wait in a script.
You can run multiple commands in the init container or multiple init containers to do the trick.
Multiple commands
command: ["/bin/sh","-c"]
args: ["command one; command two && command three"]
Refer: https://stackoverflow.com/a/33888424/3514300
Multiple init containers
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]
Refer: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#init-containers-in-use
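Another option for the script itself, sketched here under the assumption that your kubectl version supports kubectl wait on pod conditions: a Pod's Initialized condition becomes true once all of its init containers have completed successfully, so the script can block on that. wait_for_init is a hypothetical helper name, and the KUBECTL variable exists only so the command can be replaced when experimenting.

```shell
#!/bin/sh
KUBECTL=${KUBECTL:-kubectl}

# $1 = pod name, $2 = optional timeout (defaults to 120s)
wait_for_init() {
  $KUBECTL wait --for=condition=Initialized "pod/$1" --timeout="${2:-120s}"
}
```

In a script this would be used as wait_for_init myapp-pod 60s || exit 1 before the next step; the command returns non-zero if the timeout expires first.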

How to make the successfully running pods delete themselves within the time set by the user

I am working on a new scheduler's stress test in Kubernetes. I need to open a lot of CPU and memory pods to analyze performance.
I am using image: polinux/stress in my pods.
I would like to ask whether there is any instruction, or something I can set when I write the yaml file, that makes a successfully created pod delete itself after a time that I set.
The following yaml file is the pod I am writing for stress testing. I would like to ask if I can write something here to have the pod deleted after a period of time.
apiVersion: v1
kind: Pod
metadata:
  name: alltest12
  namespace: test
spec:
  containers:
  - name: alltest
    image: polinux/stress
    resources:
      requests:
        memory: "1000Mi"
        cpu: "1"
      limits:
        memory: "1000Mi"
        cpu: "1"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "500M", "--vm-hang", "1"]
If polinux/stress contains a shell, I believe you can have the thing kill itself:
containers:
- image: polinux/stress
  command:
  - sh
  - -c
  - |
    sh -c "sleep 300; kill -9 1" &
    stress --vm 1 --vm-bytes 500M --vm-hang 1
Or the other way around, which avoids signalling PID 1 (the kernel does not deliver SIGKILL to a PID namespace's init process when it is sent from inside that namespace):
  - |
    stress --vm etc etc &
    child_pid=$!
    sleep 300
    kill -9 $child_pid
And you can parameterize that setup using env:
  env:
  - name: LIVE_SECONDS
    value: "300"
  command:
  - sh
  - -c
  - |
    stress --vm etc etc &
    child_pid=$!
    sleep ${LIVE_SECONDS}
    kill -9 $child_pid
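The background-then-kill pattern can be tried outside a container first. In this sketch sleep 30 stands in for the stress command and LIVE_SECONDS defaults to 1 second so the demo finishes quickly; both are assumptions made for the example, not values from the pod spec.

```shell
#!/bin/sh
LIVE_SECONDS=${LIVE_SECONDS:-1}

sleep 30 &                # stand-in for the long-running workload
child_pid=$!

sleep "$LIVE_SECONDS"     # let the workload run for the allotted time
kill -9 "$child_pid"

# Collect the child's exit status without tripping over errexit.
wait "$child_pid" 2>/dev/null && status=0 || status=$?
echo "child exited with status $status"   # 137 = 128 + 9 (SIGKILL)
```

Once the script ends, the container's main process is gone, so the container terminates; whether the Pod object itself is then removed depends on what created it.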

Combining multiple Local-SSD on a node in Kubernetes (GKE)

The data required by my container is too large to fit on one local SSD. I also need to access the SSD's as one filesystem from my container. So I would need to attach multiple ones. How do I combine them (single partition, RAID0, etc) and make them accessible as one volume mount in my container?
This link shares how to mount an SSD https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/local-ssd to a mount path. I am not sure how you would merge multiple.
edit
The question asks how one would "combine" multiple SSD devices, individually mounted, on a single node in GKE.
WARNING
this is experimental and not intended for production use without
knowing what you are doing and only tested on gke version 1.16.x.
The approach uses a DaemonSet with a ConfigMap, and nsenter (with wait tricks) to enter the host namespace with privileged access so you can manage the devices. Specifically for GKE Local SSDs, we can unmount those devices and then combine them into a RAID 0 array. An init container does the dirty work, since this is the kind of task you want to mark as complete, after which privileged container access (or even the Pod) can be removed. Here is how it is done.
The example assumes 16 SSDs; adjust the hardcoded values as necessary. Also check the requirements of your OS image (I use Ubuntu), and make sure the version of GKE you use starts local SSDs at sd[b].
ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-ssds-setup
  namespace: search
data:
  setup.sh: |
    #!/bin/bash
    # returns exit codes: 0 = found, 1 = not found
    isMounted() { findmnt -rno SOURCE,TARGET "$1" >/dev/null;} #path or device
    # existing disks & mounts
    SSDS=(/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq)
    # install mdadm utility
    apt-get -y update && apt-get -y install mdadm --no-install-recommends
    apt-get -y autoremove
    # OPTIONAL: determine what to do with existing, I wipe it here
    if [ -b "/dev/md0" ]
    then
      echo "raid array already created"
      if isMounted "/dev/md0"; then
        echo "already mounted - unmounting"
        umount /dev/md0 &> /dev/null || echo "soft error - assumed device was mounted"
      fi
      mdadm --stop /dev/md0
      mdadm --zero-superblock "${SSDS[@]}"
    fi
    # unmount disks from host filesystem
    for i in {0..15}
    do
      umount "${SSDS[i]}" &> /dev/null || echo "${SSDS[i]} already unmounted"
    done
    if isMounted "/dev/sdb";
    then
      echo ""
      echo "unmount failure - prevent raid0" 1>&2
      exit 1
    fi
    # raid0 array
    yes | mdadm --create /dev/md0 --force --level=0 --raid-devices=16 "${SSDS[@]}"
    echo "raid array created"
    # format
    mkfs.ext4 -F /dev/md0
    # mount, change /mnt/ssd-array to whatever
    mkdir -p /mnt/ssd-array
    mount /dev/md0 /mnt/ssd-array
    chmod a+w /mnt/ssd-array
  wait.sh: |
    #!/bin/bash
    while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do sleep 1; done
DaemonSet pod spec
spec:
  hostPID: true
  nodeSelector:
    cloud.google.com/gke-local-ssd: "true"
  volumes:
  - name: setup-script
    configMap:
      name: local-ssds-setup
  - name: host-mount
    hostPath:
      path: /tmp/setup
  initContainers:
  - name: local-ssds-init
    image: marketplace.gcr.io/google/ubuntu1804
    securityContext:
      privileged: true
    volumeMounts:
    - name: setup-script
      mountPath: /tmp
    - name: host-mount
      mountPath: /host
    command:
    - /bin/bash
    - -c
    - |
      set -e
      set -x
      # Copy setup script to the host
      cp /tmp/setup.sh /host
      # Copy wait script to the host
      cp /tmp/wait.sh /host
      # Give execute priv to the wait script
      /usr/bin/nsenter -m/proc/1/ns/mnt -- chmod u+x /tmp/setup/wait.sh
      # Give execute priv to the setup script
      /usr/bin/nsenter -m/proc/1/ns/mnt -- chmod u+x /tmp/setup/setup.sh
      # Wait for Node updates to complete
      /usr/bin/nsenter -m/proc/1/ns/mnt /tmp/setup/wait.sh
      # Run the setup script in the host mount namespace (host /tmp/setup is mounted at /host here)
      /usr/bin/nsenter -m/proc/1/ns/mnt /tmp/setup/setup.sh
  containers:
  - image: "gcr.io/google-containers/pause:2.0"
    name: pause
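Since setup.sh above hardcodes both the device list and the count of 16, one possible way to keep them in sync is to derive the list from a single number. This is only a sketch: ssd_devices is a hypothetical helper, and it relies on the same assumption as the rest of this answer, namely that local SSDs are named consecutively starting at /dev/sdb.

```shell
#!/bin/sh
# Print the first $1 device paths, starting at /dev/sdb (at most 16).
ssd_devices() {
  n=0
  for letter in b c d e f g h i j k l m n o p q; do
    [ "$n" -ge "$1" ] && break
    printf '/dev/sd%s\n' "$letter"
    n=$((n + 1))
  done
}

# Preview the list for a node with four local SSDs.
ssd_devices 4
```

The resulting list could then feed both the unmount loop and the mdadm --raid-devices value instead of the literal 16.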
For high performance use cases, use the "Ephemeral storage on local SSDs" GKE feature. All local SSDs will be configured as a (striped) RAID 0 array and mounted into the pod.
Quick summary:
Create the node pool or cluster with the option: --ephemeral-storage local-ssd-count=X
Schedule to nodes with cloud.google.com/gke-ephemeral-storage-local-ssd.
Add an emptyDir volume.
Mount it with volumeMounts.
Here's how I used it with a DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      nodeSelector:
        cloud.google.com/gke-ephemeral-storage-local-ssd: "true"
      volumes:
      - name: localssd
        emptyDir: {}
      containers:
      - name: myapp
        image: <IMAGE>
        volumeMounts:
        - mountPath: /scratch
          name: localssd
You can use a DaemonSet yaml file to deploy a pod that runs on startup, assuming you have already created a cluster with 2 local SSDs (this pod will be in charge of creating the RAID 0 disk):
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: ssd-startup-script
  labels:
    app: ssd-startup-script
spec:
  template:
    metadata:
      labels:
        app: ssd-startup-script
    spec:
      hostPID: true
      containers:
      - name: ssd-startup-script
        image: gcr.io/google-containers/startup-script:v1
        imagePullPolicy: Always
        securityContext:
          privileged: true
        env:
        - name: STARTUP_SCRIPT
          value: |
            #!/bin/bash
            sudo curl -s https://get.docker.com/ | sh
            echo Done
The following pod has access to the disk array, which in the above example is at "/mnt/disks/ssd-array":
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: test-container
    image: ubuntu
    volumeMounts:
    - mountPath: /mnt/disks/ssd-array
      name: ssd-array
    args:
    - sleep
    - "1000"
  nodeSelector:
    cloud.google.com/gke-local-ssd: "true"
  tolerations:
  - key: "local-ssd"
    operator: "Exists"
    effect: "NoSchedule"
  volumes:
  - name: ssd-array
    hostPath:
      path: /mnt/disks/ssd-array
After deploying the test-pod, open a shell in the pod from your Cloud Shell or any instance by running:
kubectl exec -it test-pod -- /bin/bash
After that you should be able to see the created file in the ssd-array disk.
cat test-file.txt

PreStop hook in kubernetes never gets executed

I am trying to create a little Pod example with two containers that share data via an emptyDir volume. In the first container I am waiting a couple of seconds before it gets destroyed.
In the postStart I am writing a file to the shared volume with the name "started", in the preStop I am writing a file to the shared volume with the name "finished".
In the second container I am looping for a couple of seconds outputting the content of the shared volume but the "finished" file never gets created. Describing the pod doesn't show an error with the hooks either.
Maybe someone has an idea what I am doing wrong
apiVersion: v1
kind: Pod
metadata:
  name: shared-data-example
  labels:
    app: shared-data-example
spec:
  volumes:
  - name: shared-data
    emptyDir: {}
  containers:
  - name: first-container
    image: ubuntu
    command: ["/bin/bash"]
    args: ["-c", "for i in {1..4}; do echo Welcome $i;sleep 1;done"]
    imagePullPolicy: Never
    env:
    - name: TERM
      value: xterm
    volumeMounts:
    - name: shared-data
      mountPath: /myshareddata
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "echo First container finished > /myshareddata/finished"]
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo First container started > /myshareddata/started"]
  - name: second-container
    image: ubuntu
    command: ["/bin/bash"]
    args: ["-c", "for i in {1..20}; do ls /myshareddata;sleep 1;done"]
    imagePullPolicy: Never
    env:
    - name: TERM
      value: xterm
    volumeMounts:
    - name: shared-data
      mountPath: /myshareddata
  restartPolicy: Never
This happens because the final status of your pod is Completed: the applications inside the containers stopped on their own, without any external request.
Kubernetes runs the preStop hook only if the pod receives an external request to stop, for example when it is deleted. Hooks were made to implement a graceful custom shutdown for applications inside a pod when you need to stop it. In your case, your application has already stopped gracefully by itself, so Kubernetes has no reason to call the hook.
If you want to check how a hook works, you can try to create a Deployment and update its image, for example with kubectl set image. In that case, Kubernetes will stop the old version of the application, and the preStop hook will be called.
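As a small sketch of how such an image update could be triggered from a script: kubectl set image changes the image of one container in a Deployment, which starts a rollout that terminates the old Pods from outside. roll_image is a hypothetical helper, the Deployment, container, and image names in the usage line are placeholders, and the KUBECTL variable is only there so the command can be swapped out when experimenting.

```shell
#!/bin/sh
KUBECTL=${KUBECTL:-kubectl}

# $1 = deployment name, $2 = container name, $3 = new image
roll_image() {
  $KUBECTL set image "deployment/$1" "$2=$3"
}
```

For example, roll_image shared-data-example first-container ubuntu:22.04 would replace the running Pods, and the preStop hook of any container that is still running at that point should be executed before it is stopped.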