Helm / Kubernetes: Is it possible to inject what replica number a specific replica is?

So I have a Spring Boot app which runs with two replicas. I want to be able to inject whether the app is replica 1 or 2. I want this because my application runs a process on startup, but I only want one of the replicas to run that startup process.
My test.values.yaml.template:

spring-boot:
  application:
    spring:
      datasource:
        url: url
        username: username
        password: password
      profiles:
        active: prod, agent

In general, if for any reason you need to make your application replicas distinguishable from each other, you should use a StatefulSet instead of a Deployment. Then you can inject the pod name into your container as an environment variable and use it in your application.
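For example, here is a minimal sketch (names and image are placeholders) of a StatefulSet that injects the pod name via the downward API; since StatefulSet pods get stable ordinal names ("myapp-0", "myapp-1"), the application can tell which replica it is:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myapp                     # hypothetical name
spec:
  serviceName: myapp
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:latest     # placeholder Spring Boot image
          env:
            - name: POD_NAME      # ends in the ordinal, e.g. myapp-0
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name

Inside the application you can then read POD_NAME and run the startup process only when it ends in "-0".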

TL;DR:
No, you cannot inject job operations directly into a ReplicaSet.
It's not a Helm issue; it's a core Kubernetes concept:
From the ReplicaSet documentation:
A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.
This actually means that you may never need to manipulate ReplicaSet objects: use a Deployment instead, and define your application in the spec section.
The purpose of a ReplicaSet is to replicate the pods (usually described in a deployment) and ensure the desired number of replicas is always available.
I want to be able to inject whether the app is replica 1 or 2. I want this because my application runs a process on startup, but I only want one of the replicas to run that startup process
Pods are separate hosts; it's not like two instances of an app running on the same computer. Hence, if your application needs a startup job to work, that job needs to run in each one of them.
A Pod represents a unit of deployment: a single instance of an application in Kubernetes, which might consist of either a single container or a small number of containers that are tightly coupled and that share resources.
For that you can use an init container:
Init containers are exactly like regular containers, except:
Init containers always run to completion.
Each init container must complete successfully before the next one starts
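A minimal sketch of the pattern (image names and the command are placeholders); the init container runs the startup work to completion before the Spring Boot container starts:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                                # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      initContainers:
        - name: startup-task
          image: myapp-init:latest           # placeholder image containing the startup logic
          command: ["sh", "-c", "./run-startup-task.sh"]   # hypothetical script
      containers:
        - name: app
          image: myapp:latest                # placeholder Spring Boot image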
I'll leave you some examples of how to use InitContainers:
Kubernetes.io InitContainer Examples
A Spring-boot Use Case with Kubernetes
Kubernetes.io Configure a Pod Initialization
The InitContainer Pattern
If you have any questions, let me know in the comments.

If you have a startup process, one of the best options is to make use of an init container. Please see more details here.

Related

Run different replica count for different containers within same pod

I have a pod with 2 closely related services running as containers. I am running them as a StatefulSet with replicas set to 5, so 5 pods are created, each having both containers.
Now my requirement is to have the second container run in only 1 pod; I don't want it to run in all 5 pods. But my first service should still run in 5 pods.
Is there a way to define this in the deployment yaml file for Kubernetes? Please help.
a "pod" is the smallest entity that is managed by kubernetes, and one pod can contain multiple containers, but you can only specify one pod per deployment/statefulset, so there is no way to accomplish what you are asking for with only one deployment/statefulset.
however, if you want to be able to scale them independently of each other, you can create two deployments/statefulsets to accomplish this. this is imo the only way to do so.
see https://kubernetes.io/docs/concepts/workloads/pods/ for more information.
Containers are like processes,
Pods are like VMs,
and Statefulsets/Deployments are like the supervisor program controlling the VM's horizontal scaling.
The only way for your scenario is to define the second container in a new deployment's pod template, and set its replicas to 1, while keeping the old statefulset with 5 replicas.
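As a rough sketch (name and image are placeholders), the second container could be moved into its own single-replica Deployment alongside the existing 5-replica StatefulSet:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: second-service              # hypothetical name for the single-instance container
spec:
  replicas: 1                       # only one pod runs this container
  selector:
    matchLabels:
      app: second-service
  template:
    metadata:
      labels:
        app: second-service
    spec:
      containers:
        - name: second
          image: second-service:latest   # placeholder image

The first container stays in the StatefulSet with replicas: 5, and each workload can then be scaled on its own.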
Here are some definitions from documentations (links in the references):
Containers are technologies that allow you to package and isolate applications with their entire runtime environment—all of the files necessary to run. This makes it easy to move the contained application between environments (dev, test, production, etc.) while retaining full functionality. [1]
Pods are the smallest, most basic deployable objects in Kubernetes. A Pod represents a single instance of a running process in your cluster. Pods contain one or more containers. When a Pod runs multiple containers, the containers are managed as a single entity and share the Pod's resources. [2]
A deployment provides declarative updates for Pods and ReplicaSets. [3]
StatefulSet is the workload API object used to manage stateful applications. Manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods. [4]
Based on all that information, it is not possible to meet your requirements with a single Deployment/StatefulSet.
I advise you to try the idea @David Maze mentioned in a comment under your question:
If it's possible to have 4 of the main application container not having a matching same-pod support container, then they're not so "closely related" they need to run in the same pod. Run the second container in a separate Deployment/StatefulSet (also with a separate Service) and you can independently control the replica counts.
References:
[1] Documentation about Containers
[2] Documentation about Pods
[3] Documentation about Deployments
[4] Documentation about StatefulSet

Is there a design pattern to periodically update a database in kubernetes without inconsistency?

I have a simple Node.js API service in a Kubernetes cluster and it's connected to a MongoDB. This app will erase all the data from the DB and fill it up with new data every ten minutes.
In my current setup I have a NodePort Service for the API and a ClusterIP Service for the DB. I think everything works well as long as there is a single pod. However, I am afraid that if the number of Node.js pods is not one but, say, 10, the database will be deleted and re-uploaded at 10 different times.
Is there any way to ensure that, no matter how many Node.js pods there are, the database is deleted and re-uploaded only once every 10 minutes?
I see two ways but both require some code changes:
Define an environment variable to enable the deletion job and split your deployment in two: one deployment of a single replica with the deletion enabled and one deployment with all the other replicas but with the deletion disabled.
Use a StatefulSet and run the deletion only on the first pod. You can do this by checking the pod name, which is stable for each pod in a StatefulSet, for example "myapp-0" for the first pod.
Both cases solve your problem but are not that elegant. Something more in line with Kubernetes' design would be to remove the "deletion every 10 minutes" logic from your code and put the deletion behind a CLI command. Then create a Kubernetes CronJob that runs this command every 10 minutes. This way you keep a single, "clean" deployment, and you get all the visibility, features and guarantees of Kubernetes CronJobs.
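A minimal sketch of such a CronJob (the image and the CLI command are placeholders for whatever your refactored deletion command looks like):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: refresh-db                  # hypothetical name
spec:
  schedule: "*/10 * * * *"          # every 10 minutes
  concurrencyPolicy: Forbid         # never run two refreshes at the same time
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: refresh
              image: my-node-api:latest               # placeholder image
              command: ["node", "cli.js", "refresh-db"]   # hypothetical CLI entry point

The API Deployment then only serves requests, no matter how many replicas it has.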

How can I maintain a set of unique number crunching containers in kubernetes?

I want to run a "set" of containers in Kubernetes, each of which only differs in its Docker environment variables (each one searches its own dataset, which is located on network storage and then cached into the container's RAM). For example:
container 1 -> Dataset 1
container 2 -> Dataset 2
Over time, I'll want to add (and sometimes remove) containers from this "set", but don't want to restart ALL of the containers when doing so.
From my (naive) knowledge of kubernetes, the only way I can see to do this is:
Each container could be its own deployment. However, there are thousands of containers, so this would be a pain to modify and manage.
So my questions are:
Can I use a StatefulSet to manage this?
1.1. When a StatefulSet is "updated", must it restart all pods, even if their "spec" is unchanged?
1.2 Do StatefulSets allow for each unique container/pod to have its own environment variable(s)?
Is there any kubernetes concept to "group" deployments into some logical unit?
Any other thoughts about how to implement this in kubernetes?
Would docker swarm (or another container management platform) be better suited to my use case?
According to your description, a StatefulSet is what you need.
1.1. When a StatefulSet is "updated", must it restart all pods, even if their "spec" is unchanged?
You can choose a proper update strategy. I suggest RollingUpdate but you can try whatever suits you.
Also check out this tutorial.
1.2 Do StatefulSets allow for each unique container/pod to have its own environment variable(s)?
Yes, because their naming is consistent (name-0, name-1, name-2, etc.). You can derive an index from the hostname (pod name) and use it to select the environment/dataset for each pod.
Please let me know if that helped.
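As a sketch of the approach from 1.2 (image, paths, and flags are made up), here is a pod template fragment for a StatefulSet whose pods pick their dataset from the trailing ordinal in their own hostname:

# Pod template fragment of the StatefulSet (hypothetical image, paths, and flags):
      containers:
        - name: cruncher
          image: cruncher:latest
          command: ["/bin/sh", "-c"]
          # In a StatefulSet the hostname equals the pod name, e.g. cruncher-0, cruncher-1, ...
          # so the trailing ordinal can select the dataset without per-pod configuration.
          args:
            - >
              ORDINAL="$(hostname)";
              ORDINAL="${ORDINAL##*-}";
              exec ./crunch --dataset "/mnt/datasets/dataset-${ORDINAL}"

Regarding 1.1: scaling the StatefulSet up or down without changing the pod template should not restart the existing pods; only pods whose template actually changed get recreated during a RollingUpdate.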
If you expect your containers to eventually be done with their workload and terminate (as opposed to processing a single item loaded in RAM forever), you should use a job queue such as Celery on top of Kubernetes to manage the execution. In this case Celery will do all the orchestration, including restarting jobs if they fail. This is much more manageable than using Kubernetes directly.
Kubernetes even provides an official example of such a setup.

Set environment variable for a single pod in a cluster

I have n instances of my micro-service running as kubernetes pods but as there's some scheduling logic in the application code, I would like only one of these pods to execute the code.
In Spring applications, a common approach is to activate a scheduled profile, -Dspring.profiles.active=scheduled, for only one instance and leave it deactivated for the remaining instances. I'd like to know how one can accomplish this in Kubernetes.
Note: I am familiar with the approach where a Kubernetes cron job invokes an endpoint so that only one instance, picked by the load balancer, executes the scheduled code. However, I would like to know if it's possible to configure the Kubernetes specification in such a way that only one pod has an environment variable set.
You can create a Deployment with 1 replica that has the required environment variable, and another Deployment with as many replicas as you want without that variable. You may also set the same labels on both Deployments so that a Service can load-balance traffic between the pods from both Deployments, if you need that.
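A minimal sketch of that split (names, image, and profile value are illustrative); both pod templates share the app label that the Service selects on, while a role label keeps the two Deployments' selectors from overlapping:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myservice-scheduler         # the single pod that runs the scheduled code
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myservice
      role: scheduler
  template:
    metadata:
      labels:
        app: myservice              # shared label, so one Service covers both Deployments
        role: scheduler
    spec:
      containers:
        - name: app
          image: myservice:latest   # placeholder image
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: "scheduled"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myservice                   # the remaining replicas, without the scheduled profile
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myservice
      role: worker
  template:
    metadata:
      labels:
        app: myservice
        role: worker
    spec:
      containers:
        - name: app
          image: myservice:latest   # placeholder image

A single Service with selector app: myservice then load-balances across the pods of both Deployments.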

Kubernetes job that consists of two pods (that must run on different nodes and communicate with each other)

I am trying to create a Kubernetes job that consists of two pods that have to be scheduled on separate nodes in our hybrid cluster. Our requirement is that one of the pods runs on a Windows Server node and the other pod runs on a Linux node (thus we cannot just run two Docker containers from the same pod, which I know is possible, but would not work in our scenario). The Linux pod (which you can imagine as a client) will communicate over the network with the Windows pod (which you can imagine as a stateful server), exchanging data while the job runs. When the Linux pod terminates, we want to also terminate the Windows pod. However, if one of the pods fails, we want to fail both pods (as they are designed to be a single job).
Our current design is to write a K8S service that handles the communication between the pods, and then apply the service and the two pods to the cluster to "emulate" a job. However, this is not ideal, since the two pods are not tightly coupled as a single job, and it adds quite a bit of overhead to manually manage this setup (e.g. on failures or when the job finishes, we probably need to manually kill the Service and the Deployment of the Windows pod). Plus we would need to deploy a new service for each "job", as we require the Linux pod to always communicate with the same Windows pod for the duration of the job due to underlying state (thus we cannot use a single service for all Windows pods).
Any thoughts on how this could be best achieved on Kubernetes would be much appreciated! Hopefully this scenario is supported natively, and I would not need to resort in this kind of pod-service-pod setup that I described above.
Many thanks
I am trying to distinguish your distaste for creating and wiring the Pods from your distaste at having to do so manually. Because, in theory, a Job that creates Pods is very similar to what you are describing, and would be able to have almost infinite customization for those kinds of rules. With a custom controller like that, one need not create a Service for the client(s) to speak to their server, as the Job could create the server Pod first, obtain its Pod-specific-IP, and feed that to the subsequently created client Pods.
I would expect one could create such a Job controller using only bash and either curl or kubectl: generate the JSON or YAML that describes the situation you wish to have, feed it to the Kubernetes API (since the Job would have a service account, just like any other in-cluster container), and use normal traps to clean up after itself. Without more of the specific edge cases loaded in my head it's hard to say if that's a good idea or not, but I believe it's possible.
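A very rough sketch of that idea, assuming a service account with RBAC permission to create and delete pods; the image, pod names, node selectors, manifest path, and client/server commands are all placeholders:

apiVersion: batch/v1
kind: Job
metadata:
  name: hybrid-job                        # hypothetical name
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: job-runner      # hypothetical SA allowed to create/delete pods
      restartPolicy: Never
      containers:
        - name: controller
          image: bitnami/kubectl:latest   # any image that has kubectl and a shell
          command: ["/bin/sh", "-c"]
          args:
            - |
              set -eu
              # Create the Windows "server" pod from a placeholder manifest (mounted from a
              # ConfigMap, not shown here) that carries a Windows nodeSelector, then wait for
              # it to become Ready and read its pod IP.
              kubectl apply -f /manifests/windows-server.yaml
              trap 'kubectl delete pod windows-server --ignore-not-found' EXIT
              kubectl wait --for=condition=Ready pod/windows-server --timeout=300s
              SERVER_IP="$(kubectl get pod windows-server -o jsonpath='{.status.podIP}')"
              # Start the Linux "client" pod pointed at that IP, then block until it succeeds.
              # If anything fails or times out, the script exits nonzero, the Job fails, and
              # the trap tears both pods down, so the pair lives and dies together.
              kubectl run linux-client --image=my-linux-client:latest \
                --restart=Never --env="SERVER_ADDR=${SERVER_IP}"
              trap 'kubectl delete pod linux-client windows-server --ignore-not-found' EXIT
              kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/linux-client --timeout=1h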