Copy files from completed Kubernetes batch jobs - kubernetes

I would like to copy the results from a batch job execution, but kubectl cp pod:/path /path does not work, since the pod is in the Completed state.
I have the data stored in a PV.
How can I copy the results to my computer?
Looking forward to reading your answers, many thanks.
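Since the data survives on the PV, one approach (a minimal sketch — the PVC name, mount path, and helper pod name below are all placeholders, substitute your own) is to start a short-lived helper pod that mounts the same PVC the job wrote to, copy the files out of it with kubectl cp, and then delete the helper:

```shell
# Sketch: spin up a helper pod that mounts the job's PVC, copy the
# results out, then clean up. All names/paths are placeholders.
copy_job_results() {
  local pvc="$1" remote_path="$2" local_path="$3"

  kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pv-extractor
spec:
  restartPolicy: Never
  volumes:
    - name: results
      persistentVolumeClaim:
        claimName: ${pvc}
  containers:
    - name: extractor
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: results
          mountPath: /data
EOF

  # Wait until the helper is up, copy the files, then remove it.
  kubectl wait --for=condition=Ready pod/pv-extractor --timeout=120s
  kubectl cp "pv-extractor:${remote_path}" "${local_path}"
  kubectl delete pod pv-extractor
}
```

For example: copy_job_results results-pvc /data/output ./output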

Related

Extracting files from a k8s pod

I am running a function in a k8s pod, and the function creates a log file once it finishes. How can I copy the log file to the local machine?
I have searched for similar problems and found this. But kubectl cp needs to be run from the master node. The problem I have is that I don't know when the function finishes, and once it has finished the container terminates automatically.
I am looking forward to any idea on how to solve this problem.
You can just create a bash script that runs
scp file_from_container user@master_node_ip:/file
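Since the asker doesn't know when the function finishes, a rough sketch of such a script — assuming you can wrap the container's entrypoint; the log path and scp destination are placeholders — is to poll for the log file and ship it before the container exits:

```shell
# Sketch: poll until the function has written its log file, then scp it
# out before the container exits. Paths and destination are placeholders.
wait_and_ship() {
  local logfile="$1" dest="$2" timeout="${3:-300}"
  local waited=0
  # Poll for the log file's appearance, giving up after $timeout seconds.
  until [ -f "$logfile" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  scp "$logfile" "$dest"
}
```

For example: wait_and_ship /tmp/app.log user@master_node_ip:/file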

Kubernetes cluster running Cronjob triggering only one pod

I am trying to find out how to run a job handled by 2 pods in a cluster.
The job is run by the cronjob scheduler, say every 15 minutes. The job fetches records from a db table and processes them. Only READ permission is provided for accessing the table records. I am trying to see whether there is any way to configure in k8s that only one pod runs the job.
This way I want to prevent duplicate processing.
The alternate is have a temporary lock file in the persistent storage and the application in the pod puts a lock to it and releases after processing.
If there is any out of box solution available with in k8s, please let me know.
This can be implemented using a traditional resource lock mechanism. A lock file is created during processing, and a pod does not run if a lock file already exists.
This way only one pod will run the job at any point in time.
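The lock mechanism described above can be sketched with flock(1) on a file in the shared persistent volume. Note this is an assumption-laden sketch: the lock path is a placeholder, and flock semantics on network filesystems (e.g. NFS-backed PVs) vary, so verify it against your storage backend.

```shell
# Sketch of the lock-file mechanism: take an exclusive flock on a file
# in shared storage; pods that fail to get the lock skip this run.
run_exclusive() {
  local lockfile="$1"; shift
  (
    # -n: fail immediately if another pod already holds the lock.
    flock -n 9 || { echo "another pod holds the lock; skipping"; exit 0; }
    "$@"   # process the records while holding the lock
  ) 9>"$lockfile"
  # The lock is released automatically when the subshell exits, so a
  # crashed process cannot leave a stale lock behind.
}
```

For example: run_exclusive /mnt/shared/job.lock ./process-records.sh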

how to avoid uploading file that is being partially written - Google storage rsync

I'm using Google Cloud Storage with the rsync option.
I created a cronjob that syncs files every minute.
But there's a problem:
when a file is being partially written and the cronjob runs, it syncs the partial file even though the write isn't done.
Is there a way to solve this problem?
The gsutil rsync command doesn't have any way to check that a file is still being written. You will need to coordinate your writing and rsync'ing jobs such that they operate on disjoint parts of the file tree. For example, you could arrange your writing job to write to directory A while your rsync job rsyncs from directory B, and then switch pointers so your writing job writes to directory B while your rsync job writes to directory A. Another option would be to set up a staging area into which you copy all the files that have been written before running your rsync job. If you put it on the same file system as where they were written you could use hard links so the link operation works quickly (without byte copying).
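The staging-area suggestion above can be sketched as follows. It assumes a naming convention not provided by gsutil itself — writers keep a .partial suffix until a file is complete — and that the source and staging directories are on the same filesystem, so hard links work; the bucket name is a placeholder.

```shell
# Sketch: hard-link only completed files into a staging directory
# (no byte copying), then rsync the staging area to the bucket.
stage_and_sync() {
  local src="$1" stage="$2" bucket="$3"
  mkdir -p "$stage"
  # ln -f creates hard links; *.partial files are still being written.
  find "$src" -maxdepth 1 -type f ! -name '*.partial' -exec ln -f {} "$stage"/ \;
  gsutil -m rsync -r "$stage" "gs://${bucket}/"
}
```

For example: stage_and_sync /var/output /var/staging my-bucket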

How to copy directory out of Kubernetes job right when the job completes?

The kubectl cp command only seems to work when the pod is still running.
Is there a way to copy a directory of output files from a completed pod to my local machine?
By definition you can't, from a completed Pod, as those are ephemeral; however, the answer to your question is to change the definition of what "completed" means.
The most straightforward answer to your question is to either mount a network Volume into the Pod, so its files survive termination, or to have the Pod copy its own files out to some extra-cluster location (maybe s3, or an FTP site).
But I suspect you don't mean under those circumstances, or you would have already done so.
One other example might be to have the Pod wait some defined timeout period for the appearance of a sentinel file, so you can inform the Pod that you have successfully copied the files out and that it is now free to terminate as expected. Or, if it's more convenient, have the Pod listen on a socket and stream a tar (or zip) archive to a connection, enabling a more traditional request-response lifecycle, with the Pod shutting down at the end of the response.
Implied in all those work-around steps is that you are notified of the "almost done"-ness of the Pod through another means. Without more information about your setup, it's hard to go into that, and might not even be necessary. So feel free to add clarifications as necessary.
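The sentinel-file variant above can be sketched in two halves — the pod name, output path, and sentinel name are all placeholders:

```shell
# Runs inside the Pod after the real work finishes: block until the
# sentinel appears (or a timeout expires), then let the Pod terminate.
wait_for_sentinel() {
  local sentinel="$1" timeout="${2:-600}" waited=0
  until [ -f "$sentinel" ]; do
    [ "$waited" -ge "$timeout" ] && return 1   # give up eventually
    sleep 2
    waited=$((waited + 2))
  done
  return 0
}

# Run from your machine once the output exists:
#   kubectl cp mypod:/output ./output
#   kubectl exec mypod -- touch /output/.copied   # drop the sentinel
```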

What is a use case for kubernetes job?

I'm looking to fully understand the jobs in kubernetes.
I have successfully created and executed a job, but I do not see the use case.
Not being able to rerun a job, or to actively listen for its completion, makes me think it is a bit difficult to manage.
Anyone using them? Which is the use case?
Thank you.
A job retries pods until they complete, so that you can tolerate errors that cause pods to be deleted.
If you want to run a job repeatedly and periodically, you can use CronJob (alpha) or cronetes.
Some Helm Charts use Jobs to run install, setup, or test commands on clusters, as part of installing services. (Example).
If you save the YAML for the job, then you can re-run it by deleting the old job and creating it again, or by editing the YAML to change the name (or use e.g. sed in a script).
You can watch a job's status with this command:
kubectl get jobs myjob -w
The -w option watches for changes. You are looking for the SUCCESSFUL column to show 1.
Here is a shell command loop to wait for job completion (e.g. in a script):
until kubectl get jobs myjob -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}' | grep True ; do sleep 1 ; done
One use case can be taking a backup of a DB. But as already mentioned, there are some overheads to running a Job: when a Job completes, its Pods are not deleted, so you need to delete the Job manually (which will also delete the Pods created by the Job). So the recommended option would be to use a CronJob instead of a bare Job.
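The manual cleanup chore mentioned above can be sketched in two commands — the job name here is a placeholder:

```shell
# Sketch: wait for the backup Job to finish, then delete it.
cleanup_job() {
  local job="$1"
  kubectl wait --for=condition=complete "job/${job}" --timeout=600s
  kubectl delete "job/${job}"   # deleting the Job also deletes its Pods
}
```

For example: cleanup_job db-backup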