how kubernetes deal with file write locker accross multi pods when hostpath Volumes concerned - kubernetes

I got app that logs to file my_log/1.log, and then I use filebeat to collect the logs from the file
Now I use k8s to deploy it into some nodes, and use hostpath type Volumes to mounts my_log file to the local file syetem, /home/my_log, suddenly I found a subtle situation:
what will it happened if more than one pod deployed on this machine, and then they try to write the log at the same time?
I know that in normal situation, multi-processing try to write to a file at the same time, the system will lock the file,so these processes can write one by one, BUT I am not sure will k8s diffirent pods will not share the same lock space, if so, it will be a disaster.
I try to test this and it seems diffirent pods will still share the file lock,the log file seems normal

how kubernetes deal with file write locker accross multi pods when hostpath Volumes concerned
It doesn't.
Operating System and File System are handling that.
As an example let's take syslog. It handles it by opening a socket, setting the socket to server mode, opening a log file in write mode, being notified of packages, parsing the message and finally writing it to the file.
Logs can also be cached, and the process can be limited to 1 thread, so you should not have many pods writing to one file. This could lead to issues like missing logs or lines being cut.
Your application should handle the file locking to push logs, also if you want to have many pods writing logs, you should have a separate log file for each pod.

Related

How to start a POD in Kubernetes when another blocks an important resource?

I'm getting stuck in the configuration of a deployment. The problem is the following.
The application in the deployment is using a database which is stored in a file. While this database is open, it's locked (there's no way for read/write access for many).
If I delete the running POD the new one can't get in ready state, because the database is still locked. I read about preStop-Hook and tried to use it without success.
I could delete the lock file, which seems to be pretty harsh. What's the right way to solve this in Kubernetes?
This really isn't different than running this process outside of Kubernetes. When the pod is killed, it will be given a chance to shutdown cleanly. So the lock should be cleaned up. If the lock isn't cleaned up, there's not a lot of ways you can determined if the lock remains because an unclean shutdown was made, or a node is unhealthy, or if there is a network partition. So deleting the lock at pod startup does seem to be unwise.
I think the first step for you should be trying to determine why this lock file isn't getting cleaned up correctly. (Rather than trying to address the symptom.)

Proper way for pods to read input files from the same persistent volume?

I'm new to Kubernetes and plan to use Google Kubernetes Engine. Hypothetically speaking, let's say I have a K8s cluster with 2 worker nodes. Each node would have its own pod housing the same application. This application will grab a file from some persistent volume and generate an output file that will be pushed back into a different persistent volume. Both pods in my cluster would be doing this continuously until there are no input files in the persistent volume left to be processed. Do the pods inherently know NOT to grab the same file that one pod is already using? If not, how would I be able account for this? I would like to avoid 2 pods using the same input file.
Do the pods inherently know NOT to grab the same file that one pod is already using?
Pods are just processes. Two separate processes accessing files from a shared directory are going to run into conflicts unless they have some sort of coordination mechanism.
Option 1
Have one process whose job it is to enumerate the available files. Your two workers connect to this process and receive filenames via some sort of queue/message bus/etc. When they finish processing a file, they request the next one, until all files are processed. Because only a single process is enumerating the files and passing out the work, there's no option for conflict.
Option 2
In general, renaming files is an atomic operation. Each worker creates a subdirectory within your PV. To claim a file, it renames the file into the appropriate subdirectory and then processes it. Because renames are atomic, even if both workers happen to pick the same file at the same time, only one will succeed.
Option 3
If your files have some sort of systematic naming convention, you can divide the list of files between your two workers (e.g., "everything that ends in an even number is processed by worker 1, and everything that ends with an odd number is processed by worker 2").
Etc. There are many ways to coordinate this sort of activity. The wikipedia entry on Synchronization may be of interest.

Spring Batch Restartability on Kubernetes for File Operations

I want to learn what is the proper way to reach the processed files when restarting the spring batch application on Kubernetes. Especially if the target type is file, it is being deleted together with the pod after the job failed.
We are considering to use persistent volume or backing up the created file somewhere such as DB or sftp server by implementing a listener.
Is there anyone have the experience of persistent volume usage(nfs or other solutions) for file operations. We are concerned about the performance and unexpected problems. Do you have any suggestions?
Thank you.
You should not rely on the ephemeral file system of a Pod for files that should persist and survive a Job (Pod) failure.
You need to use a persistent volume for that, so that Spring Batch can find the (incomplete) output file in a restart scenario and resume writing where it left off.
If you want data persistence, you may begin by using hostPath volumes first. This will restrict which nodes your pods may be spawned on. But is the simplest and gives you the best performance.
https://kubernetes.io/docs/concepts/storage/volumes/#hostpath
If you want dynamic allocation, you will need to configure storage solutions such as GlusterFS, NFS, CEPH etc.

Copying directories into minikube and persisting them

I am trying to copy some directories into the minikube VM to be used by some of the pods that are running. These include API credential files and template files used at run time by the application. I have found you can copy files using scp into the /home/docker/ directory, however these files are not persisted over reboots of the VM. I have read files/directories are persisted if stored in the /data/ directory on the VM (among others) however I get permission denied when trying to copy files to these directories.
Are there:
A: Any directories in minikube that will persist data that aren't protected in this way
B: Any other ways of doing the above without running into this issue (could well be going about this the wrong way)
To clarify, I have already been able to mount the files from /home/docker/ into the pods using volumes, so it's just the persisting data I'm unclear about.
Kubernetes has dedicated object types for these sorts of things. API credential files you might store in a Secret, and template files (if they aren't already built into your Docker image) could go into a ConfigMap. Both of them can either get translated to environment variables or mounted as artificial volumes in running containers.
In my experience, trying to store data directly on a node isn't a good practice. It's common enough to have multiple nodes, to not directly have login access to those nodes, and for them to be created and destroyed outside of your direct control (imagine an autoscaler running on a cloud provider that creates a new node when all of the existing nodes are 90% scheduled). There's a good chance your data won't (or can't) be on the host where you expect it.
This does lead to a proliferation of Kubernetes objects and associated resources, and you might find a Helm chart to be a good resource to tie them together. You can check the chart into source control along with your application, and deploy the whole thing in one shot. While it has a couple of useful features beyond just packaging resources together (a deploy-time configuration system, a templating language for the Kubernetes YAML itself) you can ignore these if you don't need them and just write a bunch of YAML files and a small control file.
For minikube, data kept in $HOME/.minikube/files directory is copied to / directory in VM host by minikube.

How to copy directory out of Kubernetes job right when the job completes?

The kubectl cp command only seems to work when the pod is still running.
Is there a way to copy a directory of output files from a completed pod to my local machine?
By definition not a completed Pod, as those are ephemeral, however the answer to your question is to change the definition of what "completed" means.
The most straightforward answer to your question is to either mount a network Volume into the Pod, so its files survive termination, or to have the Pod copy its own files out to some extra-cluster location (maybe s3, or an FTP site).
But I suspect you don't mean under those those circumstances, or you would have already done so.
One other example might be to have the Pod wait some defined timeout period for the appearance of a sentinel file so you can inform to the Pod you have successfully copied the files out and that it is now free to terminate as expected. Or if it's more convenient, have the Pod listen on a socket and stream a tar (or zip) to a connection, enabling a more traditional request-response lifecycle, and at the end of the response then the Pod shuts down.
Implied in all those work-around steps is that you are notified of the "almost done"-ness of the Pod through another means. Without more information about your setup, it's hard to go into that, and might not even be necessary. So feel free to add clarifications as necessary.