Keep cache in sync with FastAPI deployed on Kubernetes

We have a FastAPI application that we deploy on AWS using Kubernetes. It uses PostgreSQL for storage. In this application, we need to keep some values in memory; otherwise some calculations take too long. For example, we keep an in-memory cache of 400k mappings between a string and some glossary object.
In our deployment, there might be multiple instances of our FastAPI application.
What are best practices to keep our caches in sync?
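To make the problem concrete, here is a minimal sketch of one possible approach, not a recommendation taken from the post: each instance keeps its own in-memory copy and compares a cheap version counter against a shared store before serving a read. `SharedStore` is a hypothetical stand-in for PostgreSQL (for example, a one-row version table), and every class and method name below is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Hypothetical stand-in for PostgreSQL: source of truth plus a version counter."""
    mappings: dict = field(default_factory=dict)
    version: int = 0

    def update(self, key: str, value: str) -> None:
        self.mappings[key] = value
        self.version += 1  # every write bumps the version


class InstanceCache:
    """Per-process cache, as one FastAPI instance (pod) would hold it."""

    def __init__(self, store: SharedStore) -> None:
        self._store = store
        self._local: dict = {}
        self._loaded_version = -1  # forces a load on the first read

    def get(self, key: str):
        # Cheap check: in a real deployment this would be a small query,
        # so the full set of mappings is reloaded only after a change.
        current = self._store.version
        if current != self._loaded_version:
            self._local = dict(self._store.mappings)
            self._loaded_version = current
        return self._local.get(key)


store = SharedStore()
pod_a, pod_b = InstanceCache(store), InstanceCache(store)

store.update("term", "glossary-v1")
first = (pod_a.get("term"), pod_b.get("term"))

store.update("term", "glossary-v2")  # a write made through any instance
second = (pod_a.get("term"), pod_b.get("term"))
```

The trade-off in this sketch is one extra version lookup per read (or per polling interval) in exchange for never serving data older than the last committed write.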

Related

Manage multiple KieContainers

I have around 50k user-specific Drools files, which we use to compute the responses to REST API requests.
What are the possible ways to manage this many different KieContainers so that I don't have to initialise them on every API call?
Currently I use Guava Cache to cache the KieContainers, but the limitation of this approach is that it creates a separate in-memory cache on each pod. Since the service runs on Kubernetes pods, a KieContainer cached by one pod is not available to the others.

Do I gain anything by using "proper" replicas for a read-only MongoDB database?

I have a web-app that depends on a read-only MongoDB database. Through trial and error, I discovered that by far the fastest way to run the ETL pipeline that populates the database is to run a local copy of MongoDB, populate the database, stop the database, and tarball the state directory.
To deploy a high-availability "cluster," I create multiple instances (or containers) running the app, each with access to a copy of the state in locally mounted storage. Putting these behind a load balancer with regular health checks and autoscaling (or in a Kubernetes cluster as a ReplicaSet), I get isolation, redundancy, easy rollbacks (using versioned storage), and easy setup in virtually any environment.
The key idea here is that because the database is read-only, it is in a sense a "stateless" application. Thus, I can treat it like any other static provider of information.
There are many apparent advantages to this setup. Nevertheless, I have always had a nagging feeling that I was missing something. Given a read-only context, is there still some reason why it might be better to run a "proper" MongoDB cluster?
If you don't mind outages when the single node goes down, and you don't mind taking the system down during upgrades, then this is probably an OK deployment. You might get a safer dump and restore using mongodump and mongorestore rather than tar, but apart from that this setup should work for a read-only deployment.
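As a rough illustration of the mongodump/mongorestore suggestion, the sketch below builds the two command lines from Python. The URI and archive path are placeholders, the helper names are hypothetical, and the choice of flags (`--archive`, `--gzip`, `--drop`) is an assumption about what suits a populate-once, read-only dataset.

```python
import shlex
import subprocess


def dump_command(uri: str, archive: str) -> list[str]:
    """Build a mongodump call that writes one gzip-compressed archive file."""
    return ["mongodump", f"--uri={uri}", f"--archive={archive}", "--gzip"]


def restore_command(uri: str, archive: str) -> list[str]:
    """Build a mongorestore call; --drop replaces existing collections."""
    return ["mongorestore", f"--uri={uri}", f"--archive={archive}", "--gzip", "--drop"]


def run(command: list[str]) -> None:
    """Execute a command and raise if the tool exits with a non-zero status."""
    subprocess.run(command, check=True)


# Placeholder values; nothing is executed here, the commands are only printed.
uri = "mongodb://localhost:27017"
archive = "/backups/app.archive.gz"
print(shlex.join(dump_command(uri, archive)))
print(shlex.join(restore_command(uri, archive)))
```

Calling `run(dump_command(uri, archive))` after the ETL step, and `run(restore_command(uri, archive))` when an instance starts, would take the place of the tarball of the state directory.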

Backup of Ignite StatefulSet in Kubernetes

I’m trying to come up with a strategy to back up the data in my Apache Ignite cache, hosted as a StatefulSet in Google Cloud Kubernetes.
My Ignite deployment uses Ignite native persistence and runs a 3-node Ignite cluster backed by persistent volumes in Kubernetes.
I’m using a binaryConfiguration to store binary objects in the cache.
I’m looking for a reliable way to back up my Ignite data and be able to restore it.
So far I’ve tried backing up just the persistence files and then restoring them back.
It hasn’t worked reliably yet.
The issue I’m facing is that after a restore, cache data that isn’t binary objects, e.g. strings or numbers, is restored properly and I can access it just fine. But binary objects are not accessible. It seems the binary objects are restored, but I’m unable to fetch them.
The weird part is that after the restore, once I add a new binary object to the cache, all the restored data can be accessed normally.
Can anyone please suggest a reliable way to back up and restore Ignite native persistence data?
You should either back up the ${ignite.work.dir}/marshaller directory, or call ignite.binary().type(KeyOrValue.class) for every type you have in the cache to prime the binary marshaller.
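A minimal sketch of the first option, assuming the backup is taken as a plain archive of the work directory: the helper below refuses to produce an archive if any listed subdirectory is missing, so the marshaller directory cannot be silently left out. The function name is hypothetical, and `db` is only an assumed name for where the persistence files live; adjust the list to your own storage configuration.

```python
import tarfile
from pathlib import Path


def archive_work_dir(work_dir: Path, target: Path,
                     subdirs=("db", "marshaller")) -> list[str]:
    """Tar the listed subdirectories of work_dir and return the names archived.

    "marshaller" comes from the answer above; "db" is an assumption about
    the persistence layout and may differ in your deployment.
    """
    # Check everything first so a partial archive is never written.
    for name in subdirs:
        if not (work_dir / name).is_dir():
            raise FileNotFoundError(f"expected directory missing: {work_dir / name}")

    archived = []
    with tarfile.open(target, "w:gz") as tar:
        for name in subdirs:
            # arcname keeps paths relative, so the archive restores anywhere.
            tar.add(work_dir / name, arcname=name)
            archived.append(name)
    return archived
```

For example, `archive_work_dir(Path("/opt/ignite/work"), Path("/backups/ignite.tar.gz"))`, where both paths are placeholders.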
Apache Ignite provides ACID transactions, which are pretty reliable. The cache also uses its own mechanism for primary and backup copies, and, assuming you have its WAL enabled, some state is kept in memory.
The most likely explanation is that after your restore, the first write starts populating memory, which lets you see what is already on disk (the cache). This is not really a supported restore mechanism (there isn't one in the docs), but it could work that way: after the restore, you run a small, irrelevant sample write. I advise testing this thoroughly, though.

Kubernetes - application-consistent backup

Is it possible to perform a backup on Kubernetes in an application-consistent manner?
Some of the backup solutions I found are mainly based on freezing the pod and then launching the backup to maintain consistency (Heptio's Ark, for example).
The idea of application-consistent backups is to capture all data in memory and all transactions in process. This is performed by using some type of client software co-resident with the database application to quiesce the database application, flush its memory cache, complete all its writes in order and then perform the backup.
Kubernetes, in turn, operates on specifications of resources (e.g., Deployments, Services, etc.) and their statuses, and at any given time the resource status must match what is defined in the specification. For storing any important data in Kubernetes, persistent volumes are used. In other words, Kubernetes itself cannot perform a backup in an application-consistent manner, because it is built around a different idea.
It is possible that a specific tool for a specific database exists and allows implementing this type of backup. But that is a matter for that application, not for Kubernetes itself.
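The quiesce, flush, back up sequence described above can be sketched with a toy example. `ToyDatabase` is a hypothetical in-memory stand-in, not a real database client, and the step order is the only thing this sketch is meant to show.

```python
from contextlib import contextmanager


class ToyDatabase:
    """Hypothetical in-memory stand-in for a database with a write buffer."""

    def __init__(self) -> None:
        self.buffer: list[str] = []  # writes held in memory
        self.disk: list[str] = []    # writes that have been persisted
        self.accepting_writes = True

    def write(self, record: str) -> None:
        if not self.accepting_writes:
            raise RuntimeError("database is quiesced")
        self.buffer.append(record)

    def flush(self) -> None:
        self.disk.extend(self.buffer)  # persist pending writes in order
        self.buffer.clear()


@contextmanager
def quiesced(db: ToyDatabase):
    db.accepting_writes = False  # 1. stop accepting new writes
    try:
        db.flush()               # 2. flush memory so all writes complete
        yield db
    finally:
        db.accepting_writes = True  # 4. resume normal operation


def backup(db: ToyDatabase) -> list[str]:
    with quiesced(db):
        return list(db.disk)     # 3. copy the now-consistent state


db = ToyDatabase()
db.write("a")
db.write("b")
snapshot = backup(db)
db.write("c")  # accepted again once the backup has finished
```

In a real deployment, the logic inside `quiesced` would be supplied by the database's own tooling, which is why the capability belongs to the application rather than to Kubernetes.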

In a containerized cluster, should mongodb servers be running on a worker or a core service?

I'm trying to implement an architecture that's similar to CoreOS's production architecture (shown below).
Should I run the database as a central service or on one or more of the workers?
I figured the database needs some kind of replication, which makes me think that putting it in the worker cluster makes more sense, but I'm just not sure.
This should be run as a worker. The central services are the basic things that come with CoreOS (mainly etcd). The workers host your applications, the database being one of them. You do have a persistence issue, because your database has state to remember between restarts. So the bigger question is how you provide that persistence. One way to do it is to use a host file, give the database an affinity to that host, and mount the host file. Another thing you might consider is running more than one database instance (if your DB technology supports that) and replicating the database so you have two (or more) copies on different workers (without affinity). If your database creates transaction logs that can be applied to a backup, you can manage those transaction logs in a worker.
Another thing to consider is not using a container for your database. The database is a weird animal; its care and feeding are not like those of the rest of your applications. So it is reasonable (in my opinion) to have your database managed and maintained outside the scope of your cluster (but still reachable by the cluster).