Spring Batch Restartability on Kubernetes for File Operations

I want to learn the proper way to access the processed files when restarting a Spring Batch application on Kubernetes. In particular, when the target type is a file, it is deleted together with the pod after the job fails.
We are considering using a persistent volume, or backing up the created file somewhere such as a DB or an SFTP server by implementing a listener.
Does anyone have experience using persistent volumes (NFS or other solutions) for file operations? We are concerned about performance and unexpected problems. Do you have any suggestions?
Thank you.

You should not rely on the ephemeral file system of a Pod for files that should persist and survive a Job (Pod) failure.
You need to use a persistent volume for that, so that Spring Batch can find the (incomplete) output file in a restart scenario and resume writing where it left off.
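As a minimal sketch of what that looks like on the Spring Batch side (assuming /mnt/batch-output is the mount path of a PersistentVolumeClaim in the pod spec, and a simple String-based step; both names are placeholders), the writer only needs to point at the mounted path. FlatFileItemWriter already records its write position in the ExecutionContext, so on restart it truncates the file back to the last committed offset and resumes writing:

```java
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemWriterBuilder;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class OutputConfig {

    @Bean
    public FlatFileItemWriter<String> reportWriter() {
        // /mnt/batch-output is assumed to be where the PersistentVolumeClaim is
        // mounted, so the (incomplete) output file survives a pod failure.
        // On restart, FlatFileItemWriter truncates the file to the last committed
        // position stored in the ExecutionContext and continues from there.
        return new FlatFileItemWriterBuilder<String>()
                .name("reportWriter")
                .resource(new FileSystemResource("/mnt/batch-output/report.csv"))
                .lineAggregator(new PassThroughLineAggregator<>())
                .build();
    }
}
```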

If you want data persistence, you can begin with hostPath volumes. This restricts which nodes your pods can be scheduled on, but it is the simplest option and gives you the best performance.
https://kubernetes.io/docs/concepts/storage/volumes/#hostpath
If you want dynamic provisioning, you will need to configure a storage solution such as GlusterFS, NFS, Ceph, etc.

Related

Microservice Application ... Docker Volume for Databases or no Docker Volume?

I have an application (JHipster Gateway, UAA, Registry, 5 microservices), and each application's source builds a Docker image and pushes it to the GitLab registry. Currently I'm running everything on Rancher using a Docker-Compose file. My volumes for the Mongo databases are currently inside each container.
I need advice about volume mounts. Here are my options as I see them.
Leave the data in the containers, and monitor and back it up.
Use external mounts and monitor the volumes on the host.
If I leave the Mongo data in the containers, do I just set up a cluster so that when the internal volumes fill up, the database scales? I am looking for some explanation to help me choose between internal and external (on-host) mounts for the Mongo databases.
Thanks in advance,
David L. Whitehurst
Never store any data you care about directly in containers. There are good arguments in favor of both named volumes (native to Docker, some support in a multi-host Swarm environment, fewer host-specific dependencies) and host bind mounts (much easier to back up and maintain, possible to examine directly if needed) but use some sort of mounted storage.
The most important note here is that it's fairly routine to delete and recreate containers. If the software you're running or its underlying library stack has a security issue, you generally need to get (or build) an updated image, delete your existing container, and rebuild it against the new image. If data is stored only inside a container, then during this very routine delete-and-recreate operation, there's significant risk of losing data.
In principle, if you're really careful, and you have a replicated data store, you can roll this over without external volumes and not lose data. It's tricky, and takes a lot of patience; you'll be forced to take down one replica, wait for its data to be rebalanced across the other replicas, start up a new replica, wait for it to accept some of the data, and so on. If you can take a point release by stopping a container, deleting it, starting a new one with the same data store, and have it come up instantly with populated data, that's much easier to manage.
(The other corollary here is that you don't "back up containers", since they don't have any data you care about. You do back up the data stored on the host or in Docker named volumes, and you can always recreate the container from its image plus the external data.)

Kubernetes - Best way to initialize persistent volume for a database deployment

I have a Neo4j service, but before the deployment starts up, I need to pre-fill it with data (about 2 GB). Currently I have a Kubernetes Job that transforms the data from a CSV and formats it for the database using the neo4j-admin tool. It saves the formatted data to a persistent volume. After waiting for the job to complete, I mount the volume in the Neo4j container, and the container is effectively read-only on this data for the rest of its life.
Is there a better way to do this more automatically?
I don't want to have to wait for the job to complete to run another command to create the Neo4j deployment. I looked into initContainers, but that isn't suitable because I don't want to redo the data filling when a pod is re-created. I just want subsequent pods to read from the same persistent volume. Is there a way to wait for the job to complete first?
As Jobs can't natively spawn new objects once finished (and, even on a graceful exit, using a PreStop hook to invoke further actions won't work), you might want to monitor the API objects instead.
Programmatically accessing the API to determine when the Job is finished, and then creating your Deployment object, might be a feasible, automated way to do it.
Doing it this way, you don't have to worry about redoing the data processing with initContainers, as you can essentially create the Deployment and remount the already existing volume.
Also, using the official Go library allows you to either run within the cluster, in a pod or externally.
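A minimal sketch of that approach, assuming the fabric8 Kubernetes Java client and hypothetical names (namespace default, Job neo4j-import); the same logic applies with the official Go library mentioned above:

```java
import io.fabric8.kubernetes.api.model.batch.v1.Job;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class WaitForImportJob {
    public static void main(String[] args) throws InterruptedException {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            // Poll the Job status until at least one of its pods has succeeded.
            while (true) {
                Job job = client.batch().v1().jobs()
                        .inNamespace("default")
                        .withName("neo4j-import")
                        .get();
                Integer succeeded = (job == null || job.getStatus() == null)
                        ? null : job.getStatus().getSucceeded();
                if (succeeded != null && succeeded > 0) {
                    break;
                }
                Thread.sleep(5_000);
            }
            // At this point the import Job is done: create the Neo4j Deployment
            // (e.g. apply its manifest) and mount the already populated volume.
        }
    }
}
```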
I assume that your Neo4j application data won't be updated from your Neo4j deployment, based on your statement that the deployment mounts the volume as read-only.
If that is the case, why do you want Kubernetes to do the data loading? Use object storage such as S3 or Azure Data Lake, and ensure that some data workflow pipeline keeps the object storage up to date. There are many tools that provide data pipeline features, such as Oozie or Airflow.
In your deployment, you can then refer to the object storage via a Persistent Volume Claim.
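As one possible sketch of consuming that data, pulling it at startup rather than mounting the bucket through a PVC/CSI driver, assuming the AWS SDK for Java v2 and hypothetical bucket, key, and target path names:

```java
import java.nio.file.Paths;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

public class FetchGraphData {
    public static void main(String[] args) {
        // Download the latest formatted dataset (maintained by the data pipeline)
        // into the directory that the Neo4j deployment mounts read-only.
        try (S3Client s3 = S3Client.create()) {
            s3.getObject(
                    GetObjectRequest.builder()
                            .bucket("my-neo4j-data")       // hypothetical bucket
                            .key("graph/latest.dump")      // hypothetical key
                            .build(),
                    Paths.get("/data/graph/latest.dump")); // hypothetical mount path
        }
    }
}
```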

How does Kubernetes deal with file write locking across multiple pods when hostPath volumes are involved?

I have an app that logs to the file my_log/1.log, and I use Filebeat to collect the logs from that file.
Now I use k8s to deploy it onto some nodes, and I use hostPath volumes to mount the my_log directory onto the local file system at /home/my_log. Then I noticed a subtle situation:
What will happen if more than one pod is deployed on this machine and they try to write to the log at the same time?
I know that normally, when multiple processes try to write to a file at the same time, the system will lock the file so the processes can write one by one. But I am not sure whether different pods in k8s share the same lock space; if they don't, it will be a disaster.
I tried to test this, and it seems different pods do still share the file lock; the log file looks normal.
How does Kubernetes deal with file write locking across multiple pods when hostPath volumes are involved?
It doesn't.
The operating system and the file system handle that.
As an example, let's take syslog. It handles this by opening a socket, setting the socket to server mode, opening a log file in write mode, being notified of messages, parsing each message, and finally writing it to the file.
Logs can also be cached, and the process can be limited to one thread, so you should not have many pods writing to one file; this could lead to issues like missing logs or lines being truncated.
Your application should handle the file locking when it pushes logs, and if you want to have many pods writing logs, you should have a separate log file for each pod.
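A minimal sketch of such application-level locking (plain JDK, hypothetical file path): each writer takes an exclusive OS-level lock around every append, and since hostPath volumes on the same node go through the same kernel, pods that follow this protocol will serialize their writes:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class SharedLogWriter {

    // Appends one line to the shared log file while holding an exclusive lock.
    // The lock is advisory: it only protects against writers that also lock.
    public static void appendLine(Path logFile, String line) throws IOException {
        try (FileChannel channel = FileChannel.open(logFile,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
             FileLock lock = channel.lock()) {
            channel.write(StandardCharsets.UTF_8.encode(line + System.lineSeparator()));
        }
    }
}
```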

Kubernetes - application-consistent backup

Is it possible to perform a backup on Kubernetes in an application-consistent manner?
Some of the backup solutions I found mainly rely on freezing the pod and then launching the backup to maintain consistency (Heptio's Ark, for example).
The idea of application-consistent backups is to capture all data in memory and all transactions in process. This is performed by using some type of client software co-resident with the database application to quiesce the database application, flush its memory cache, complete all its writes in order and then perform the backup.
Kubernetes, in turn, operates on specifications of resources (e.g., Deployments, Services, etc.) and their statuses, and at any given time the resource status must match what is defined in the specification. For storing any important data in Kubernetes, persistent volumes are used. In other words, you cannot perform a backup in an application-consistent manner through Kubernetes itself, because its main concern is different.
It is possible that a specific tool exists for a specific database that allows implementing this type of backup, but that is a property of the application, not of Kubernetes itself.

Persistent storage for Apache Mesos

Recently I discovered Apache Mesos.
It all looks amazing in the demos and examples, and I can easily imagine how one would run stateless jobs - that fits the whole idea naturally.
But how does one deal with long-running jobs that are stateful?
Say I have a cluster that consists of N machines (scheduled via Marathon), and I want to run a PostgreSQL server there.
That's it - at first I don't even need it to be highly available, just a single job (Dockerized, actually) that hosts a PostgreSQL server.
1. How would one organize it? Constrain the server to a particular cluster node? Use some distributed FS?
2. DRBD, MooseFS, GlusterFS, NFS, CephFS - which of those play well with Mesos and services like Postgres? (I'm thinking here of the possibility that Mesos/Marathon could relocate the service if it goes down.)
3. Please tell me if my approach is wrong in terms of philosophy (a DFS for data servers and some kind of switchover for servers like Postgres on top of Mesos).
Question largely copied from Persistent storage for Apache Mesos, asked by zerkms on Programmers Stack Exchange.
Excellent question. Here are a few upcoming features in Mesos to improve support for stateful services, and corresponding current workarounds.
Persistent volumes (0.23): When launching a task, you can create a volume that exists outside of the task's sandbox and will persist on the node even after the task dies/completes. When the task exits, its resources -- including the persistent volume -- can be offered back to the framework, so that the framework can launch the same task again, launch a recovery task, or launch a new task that consumes the previous task's output as its input.
Current workaround: Persist your state in some known location outside the sandbox, and have your tasks try to recover it manually. Maybe persist it in a distributed filesystem/database, so that it can be accessed from any node.
Disk Isolation (0.22): Enforce disk quota limits on sandboxes as well as persistent volumes. This ensures that your storage-heavy framework won't be able to clog up the disk and prevent other tasks from running.
Current workaround: Monitor disk usage out of band, and run periodic cleanup jobs.
Dynamic Reservations (0.23): Upon launching a task, you can reserve the resources your task uses (including persistent volumes) to guarantee that they are offered back to you upon task exit, instead of going to whichever framework is furthest below its fair share.
Current workaround: Use the slave's --resources flag to statically reserve resources for your framework upon slave startup.
As for your specific use case and questions:
1a) How would one organize it? You could do this with Marathon, perhaps creating a separate Marathon instance for your stateful services, so that you can create static reservations for the 'stateful' role, such that only the stateful Marathon will be guaranteed those resources.
1b) Constrain a server to a particular cluster node? You can do this easily in Marathon, constraining an application to a specific hostname, or any node with a specific attribute value (e.g. NFS_Access=true). See Marathon Constraints. If you only wanted to run your tasks on a specific set of nodes, you would only need to create the static reservations on those nodes. And if you need discoverability of those nodes, you should check out Mesos-DNS and/or Marathon's HAProxy integration.
1c) Use some distributed FS? The data replication provided by many distributed filesystems would guarantee that your data can survive the failure of any single node. Persisting to a DFS would also provide more flexibility in where you can schedule your tasks, although at the cost of the difference in latency between network and local disk. Mesos has built-in support for fetching binaries from HDFS uris, and many customers use HDFS for passing executor binaries, config files, and input data to the slaves where their tasks will run.
2) DRBD, MooseFS, GlusterFS, NFS, CephFS? I've heard of customers using CephFS, HDFS, and MapRFS with Mesos. NFS would seem an easy fit too. It really doesn't matter to Mesos what you use as long as your task knows how to access it from whatever node where it's placed.
Hope that helps!