How to read file paths into a queue that is in a Kubernetes cluster?

I want to read file paths from a persistent volume and store these file paths into a persistent queue of sorts. This would probably be done with an application contained within a pod. This persistent volume will be updated constantly with new files. This means that I will need to constantly update the queue with new file paths. What if this application that is adding items to the queue crashes? Kubernetes would be able to reboot the application, but I do not want to add in file paths that are already in the queue. The app would need to know what exists in the queue before adding in files, at least I would think. I was leaning on RabbitMQ, but apparently you cannot search a queue for specific items with this tool. What can I do to account for this issue? I am running this cluster on Google Kubernetes Engine, so this would be on the Google Cloud Platform.

What if this application that is adding items to the queue crashes?
Kubernetes would be able to reboot the application, but I do not want
to add in file paths that are already in the queue. The app would need
to know what exists in the queue before adding in files
If you also need to search the queued items, I would suggest using Redis instead of a plain queue. I have had good experience running RabbitMQ on Kubernetes when it comes to scaling and elasticity, and there is also an HA Helm chart for RabbitMQ that you can use.
I would recommend checking out Redis and using it as the backend to store the data. If you still want to create a queue, you can use Bull: https://github.com/OptimalBits/bull
It uses Redis as its backing store, and you create the queue through this library.
Redis can dump its data to disk continuously (every second or so), so there is little chance of losing data. RabbitMQ, on the other hand, lets you keep persistent messages and also provides acknowledgments.
It comes down to the actual requirement you want to implement. If your application needs the items in the list to be strictly ordered, Redis may not be suitable; in that case RabbitMQ would be the better choice.
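The "do not enqueue a path twice after a restart" requirement can be sketched with a Redis set kept next to the list that serves as the queue. This is a minimal sketch under stated assumptions, not a definitive implementation: the key names `paths:seen` and `paths:queue` and the function `enqueue_new_paths` are hypothetical, and `InMemoryClient` is only a stand-in that mirrors the `sadd` and `rpush` calls of a redis-py client so the example runs without a server.

```python
from collections import defaultdict


class InMemoryClient:
    """Stand-in for a Redis client, for demonstration only.

    Mirrors the two redis-py calls used below: sadd() and rpush().
    """

    def __init__(self):
        self.sets = defaultdict(set)
        self.lists = defaultdict(list)

    def sadd(self, name, *values):
        before = len(self.sets[name])
        self.sets[name].update(values)
        return len(self.sets[name]) - before  # number of newly added members

    def rpush(self, name, *values):
        self.lists[name].extend(values)
        return len(self.lists[name])


SEEN_KEY = "paths:seen"    # hypothetical key names
QUEUE_KEY = "paths:queue"


def enqueue_new_paths(client, paths):
    """Push only paths that have not been seen before; return those pushed."""
    pushed = []
    for path in paths:
        # SADD returns 1 only if the member was not already in the set,
        # so a restarted producer skips paths it enqueued earlier.
        if client.sadd(SEEN_KEY, path) == 1:
            client.rpush(QUEUE_KEY, path)
            pushed.append(path)
    return pushed
```

With a real server, a `redis.Redis(...)` client would take the place of `InMemoryClient`. Note that the two calls are not atomic: a crash between them leaves a path marked as seen but never queued. Combining both steps in a server-side Lua script is one way to close that gap.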

Have you heard of KubeMQ? There is a KubeMQ community you can turn to for guides and help.
As an alternative, the official Kubernetes documentation has a useful guide on creating a work queue with Redis.

Related

Write Logfiles to Slow Disk or sending Tomcat Access Logs to ElasticSearch?

My service (tomcat/java) is running on a kubernetes cluster (AKS).
I would like to write the log files (tomcat access logs, application logs with logback) to an AzureFile volume.
I do not want to write the access logs to the stdout, because I do not want to mix the access logs with the application logs.
Question
I expect that all logging is done asynchronously, so that writing to the slow AzureFile volume should not affect the performance.
Is this correct?
Update
In the end I want to collect the logfiles so that I can send all logs to ElasticSearch.
Especially I need a way to collect the access logs.
If you want to send your access logs to Elasticsearch, you just need to extend AbstractAccessLogValve and implement the log method.
AbstractAccessLogValve already contains the logic to format the messages, so you only need to add the logic that sends the formatted message.
Yes, you are right, but it still depends on how you write the logs. Even when you write asynchronously, the writes take a long time if the file system is slow. If it is NFS, there is also the chance of network latency, etc.
I have seen performance issues when attaching an NFS or bucket volume directly to multiple pods.
If the writes are slow, the async thread may take longer to complete its job and consume more resources; however, it still depends on the code and how it is written.
Ideally, people store logs in Elasticsearch for fast retrieval and easy management.
People use different stacks depending on their requirements, but most of them are backed by Elasticsearch, for example Graylog and ELK.
To send logs to these stacks, people often use UDP. I personally prefer GELF over UDP: throw the logs at Graylog and forget about them.
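The "throw a log at Graylog and forget" approach can be sketched as building a GELF JSON payload and sending it as a single UDP datagram. This is a rough sketch, not a production client: the function names are hypothetical, and it assumes the message is small enough to fit in one datagram (larger messages would need GELF chunking, which is not shown).

```python
import json
import socket
import time


def build_gelf_message(host, short_message, level=6, **extra):
    """Return a GELF 1.1 payload as UTF-8 encoded JSON bytes."""
    message = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,  # syslog severity; 6 = informational
    }
    # GELF requires additional fields to be prefixed with an underscore.
    for key, value in extra.items():
        message["_" + key] = value
    return json.dumps(message).encode("utf-8")


def send_gelf_udp(payload, address):
    """Fire-and-forget: send one datagram and do not wait for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, address)
```

A call might look like `send_gelf_udp(build_gelf_message("tomcat-0", "GET /health 200", status=200), ("graylog.example.internal", 12201))`, where the host name is a placeholder and 12201 is the port a Graylog GELF UDP input listens on by default. Because UDP gives no delivery guarantee, a slow or unavailable log server does not block the application, at the cost of possibly losing messages.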

Spring Batch Restartability on Kubernetes for File Operations

I want to learn the proper way to reach the processed files when restarting a Spring Batch application on Kubernetes. In particular, if the target type is a file, it is deleted together with the pod after the job fails.
We are considering using a persistent volume, or backing up the created file somewhere such as a DB or an SFTP server by implementing a listener.
Does anyone have experience using persistent volumes (NFS or other solutions) for file operations? We are concerned about performance and unexpected problems. Do you have any suggestions?
Thank you.
You should not rely on the ephemeral file system of a Pod for files that should persist and survive a Job (Pod) failure.
You need to use a persistent volume for that, so that Spring Batch can find the (incomplete) output file in a restart scenario and resume writing where it left off.
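To illustrate what "resume writing where it left off" involves, here is a conceptual sketch in Python. It is not Spring Batch code and does not use any Spring Batch API; the function names and the sidecar state file are hypothetical. It assumes both the output file and the state file live on the persistent volume, so they are still there when the restarted pod comes up.

```python
import os


def resume_output(output_path, state_path):
    """Open the output for appending at the last committed offset.

    Returns (file object, committed byte offset).
    """
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    # Create the file if missing, then discard anything written after
    # the last commit (a partially written chunk from the failed run).
    out = open(output_path, "ab")
    out.truncate(offset)
    return out, offset


def commit(out, state_path):
    """Flush the output to disk and record its size as the restart point."""
    out.flush()
    os.fsync(out.fileno())
    size = os.fstat(out.fileno()).st_size
    tmp = state_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(size))
    os.replace(tmp, state_path)  # swap the state file in one step
```

The writer calls `commit` after each successfully processed chunk. After a crash, `resume_output` drops any bytes written since the last commit, so the restarted job neither duplicates nor keeps half-written records.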
If you want data persistence, you may begin by using hostPath volumes. This restricts which nodes your pods may be scheduled on, but it is the simplest option and gives you the best performance.
https://kubernetes.io/docs/concepts/storage/volumes/#hostpath
If you want dynamic allocation, you will need to configure a storage solution such as GlusterFS, NFS, Ceph, etc.

Apache flink on Kubernetes - Resume job if jobmanager crashes

I want to run a Flink job on Kubernetes. Using a (persistent) state backend, it seems that crashing TaskManagers are no issue, as they can ask the JobManager which checkpoint they need to recover from, if I understand correctly.
A crashing JobManager seems to be a bit more difficult. On this FLIP-6 page I read that ZooKeeper is needed to know which checkpoint the JobManager needs to use to recover, and for leader election.
Seeing as Kubernetes will restart the JobManager whenever it crashes, is there a way for the new JobManager to resume the job without having to set up a ZooKeeper cluster?
The current solution we are looking at is to create a savepoint when Kubernetes wants to kill the JobManager (because it wants to move it to another VM, for example), but this would only work for graceful shutdowns.
Edit:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-HA-with-Kubernetes-without-Zookeeper-td15033.html seems to be interesting but has no follow-up
Out of the box, Flink requires a ZooKeeper cluster to recover from JobManager crashes. However, I think you can have a lightweight implementation of the HighAvailabilityServices, CompletedCheckpointStore, CheckpointIDCounter and SubmittedJobGraphStore which can bring you quite far.
Given that you have only one JobManager running at all times (not entirely sure whether K8s can guarantee this) and that you have a persistent storage location, you could implement a CompletedCheckpointStore which retrieves the completed checkpoints from the persistent storage system (e.g. reading all stored checkpoint files). Additionally, you would have a file which contains the current checkpoint id counter for CheckpointIDCounter and all the submitted job graphs for the SubmittedJobGraphStore. So the basic idea is to store everything on a persistent volume which is accessible by the single JobManager.
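The idea of keeping the checkpoint ID counter in a file on the persistent volume can be sketched as follows. This is a language-neutral illustration written in Python, not Flink code: the class and method names are hypothetical, and it relies on the same assumption stated above, namely that only a single JobManager writes to the file at any time.

```python
import os


class FileCheckpointIdCounter:
    """Monotonic counter whose value survives process restarts."""

    def __init__(self, path):
        self.path = path

    def _read(self):
        try:
            with open(self.path) as f:
                return int(f.read())
        except FileNotFoundError:
            return 0  # first start: no counter file yet

    def get_and_increment(self):
        """Return the current ID and durably store the next one."""
        current = self._read()
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(current + 1))
            f.flush()
            os.fsync(f.fileno())
        # Replace the old file in one step so a crash never leaves
        # a half-written counter behind.
        os.replace(tmp, self.path)
        return current
```

A restarted process creates a new `FileCheckpointIdCounter` on the same path and continues from the stored value. The completed checkpoints and submitted job graphs could be persisted the same way, as files in a known directory on the volume.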
I implemented a light version of file-based HA, based on Till's answer and Xeli's partial implementation.
You can find the code in this GitHub repo; it runs well in production.
I also wrote a blog series explaining how to run a job cluster on k8s in general and about this file-based HA implementation specifically.
For everyone interested in this, I am currently evaluating and implementing a similar solution that uses Kubernetes ConfigMaps and a blob store (e.g. S3) to persist job metadata so that it outlasts JobManager restarts. There is no need for local storage, as the solution relies on state persisted to the blob store.
GitHub: thmshmm/flink-k8s-ha
There is still some work to do (persisting checkpoint state), but the basic implementation works quite nicely.
If someone would like to use multiple JobManagers, Kubernetes provides an interface for leader election which could be leveraged for this.

Can we consider using containers (and kubernetes) for monolith, stateful web applications?

I'm learning about containers and Kubernetes and was evaluating whether we can move our monolithic, stateful application to Kubernetes.
I was also looking at https://kubernetes.io/blog/2018/03/principles-of-container-app-design/ and "Self-Containment" looks close. We can consider using "storage".
Properties of my application:
1. Runs on a JVM
2. Does not have a database. Saves all its data/content to TAR files on the file-system
3. Should be able to backup and retain state if the container goes down.
In our current setup, we deploy the app to a VM, and our IT teams generally take snapshots of these VMs as backups and restore them if the app fails or if they have to go back to a point where the app was working well. I want to avoid this.
Please advise.
You call it a web application, but based on what it does, it is just a process that writes to the file system.
If you move to k8s, write to NFS or persistent storage from the pod. If you can only run one instance, then you can't use k8s horizontal scaling.

Persistent storage for Apache Mesos

Recently I discovered Apache Mesos.
It all looks amazing in the demos and examples. I can easily imagine how one would run stateless jobs; that fits the whole idea naturally.
But how to deal with long-running jobs that are stateful?
Say, I have a cluster that consists of N machines (and that is scheduled via Marathon). And I want to run a postgresql server there.
That's it - at first I don't even want it to be highly available, but just simply a single job (actually Dockerized) that hosts a postgresql server.
1- How would one organize it? Constrain a server to a particular cluster node? Use some distributed FS?
2- DRBD, MooseFS, GlusterFS, NFS, CephFS: which of those play well with Mesos and services like postgres? (I'm thinking here of the possibility that Mesos/Marathon could relocate the service if it goes down)
3- Please tell me if my approach is wrong in terms of philosophy (DFS for data servers and some kind of switchover for servers like postgres on top of Mesos)
Question largely copied from Persistent storage for Apache Mesos, asked by zerkms on Programmers Stack Exchange.
Excellent question. Here are a few upcoming features in Mesos to improve support for stateful services, and corresponding current workarounds.
Persistent volumes (0.23): When launching a task, you can create a volume that exists outside of the task's sandbox and will persist on the node even after the task dies/completes. When the task exits, its resources -- including the persistent volume -- can be offered back to the framework, so that the framework can launch the same task again, launch a recovery task, or launch a new task that consumes the previous task's output as its input.
Current workaround: Persist your state in some known location outside the sandbox, and have your tasks try to recover it manually. Maybe persist it in a distributed filesystem/database, so that it can be accessed from any node.
Disk Isolation (0.22): Enforce disk quota limits on sandboxes as well as persistent volumes. This ensures that your storage-heavy framework won't be able to clog up the disk and prevent other tasks from running.
Current workaround: Monitor disk usage out of band, and run periodic cleanup jobs.
Dynamic Reservations (0.23): Upon launching a task, you can reserve the resources your task uses (including persistent volumes) to guarantee that they are offered back to you upon task exit, instead of going to whichever framework is furthest below its fair share.
Current workaround: Use the slave's --resources flag to statically reserve resources for your framework upon slave startup.
As for your specific use case and questions:
1a) How would one organize it? You could do this with Marathon, perhaps creating a separate Marathon instance for your stateful services, so that you can create static reservations for the 'stateful' role, such that only the stateful Marathon will be guaranteed those resources.
1b) Constrain a server to a particular cluster node? You can do this easily in Marathon, constraining an application to a specific hostname, or to any node with a specific attribute value (e.g. NFS_Access=true). See Marathon Constraints. If you only wanted to run your tasks on a specific set of nodes, you would only need to create the static reservations on those nodes. And if you need discoverability of those nodes, you should check out Mesos-DNS and/or Marathon's HAProxy integration.
1c) Use some distributed FS? The data replication provided by many distributed filesystems would guarantee that your data can survive the failure of any single node. Persisting to a DFS would also provide more flexibility in where you can schedule your tasks, although at the cost of the difference in latency between network and local disk. Mesos has built-in support for fetching binaries from HDFS uris, and many customers use HDFS for passing executor binaries, config files, and input data to the slaves where their tasks will run.
2) DRBD, MooseFS, GlusterFS, NFS, CephFS? I've heard of customers using CephFS, HDFS, and MapRFS with Mesos. NFS would seem an easy fit too. It really doesn't matter to Mesos what you use as long as your task knows how to access it from whatever node where it's placed.
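As a rough illustration of point 1b, a Marathon app definition that pins a single Dockerized service to one host might be built like this. The helper name, app id, image tag, host name, and resource values are placeholders chosen for the example; only the overall shape of the app definition and the `[field, operator, value]` constraint format are taken from Marathon.

```python
import json


def stateful_app(app_id, image, hostname):
    """Build a Marathon app definition pinned to a single host."""
    return {
        "id": app_id,
        "instances": 1,
        "cpus": 1.0,
        "mem": 1024,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
        # Constraint format: [field, operator, value].
        # CLUSTER on hostname keeps every instance on that node.
        "constraints": [["hostname", "CLUSTER", hostname]],
    }


app = stateful_app("/postgres", "postgres:9.4", "node-1.example.internal")
body = json.dumps(app, indent=2)  # JSON body to submit to Marathon
```

The resulting JSON would be posted to Marathon's `/v2/apps` endpoint. To target any node carrying an attribute instead of one host, the constraint could be written as `["NFS_Access", "CLUSTER", "true"]`, matching the attribute example in 1b.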
Hope that helps!