How to check if cassandra is up and running using initContainers - kubernetes

In my Kubernetes cluster, I have a Cassandra StatefulSet which uses the cassandra:4.0 image. I want my application pods to wait for Cassandra to be up and running first.
I suppose this can be done by adding an initContainer to the application pod deployment. The initContainer would check the status of Cassandra, thereby blocking the application pod from starting until Cassandra is confirmed to be available.
I don't know how to create such an initContainer for checking the status of the Cassandra StatefulSet. I've searched the web but didn't find any examples related to Cassandra.
Any help would be highly appreciated.
Note: I'm using the actual cassandra image, not this example one (gcr.io/google-samples/cassandra:v13).

Using a similar approach to the Cassandra Helm Chart readiness probe:
spec:
  template:
    spec:
      initContainers:
        - name: check-cassandra-ready
          image: cassandra:3.11
          command: ['sh', '-c', 'until nodetool -h <cassandra-host> status | grep -E "^UN\\s+${POD_IP}"; do echo waiting for cassandra; sleep 2; done;']
Your Cassandra version must be the same as the nodetool version.
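Note that ${POD_IP} in the Helm chart's readiness probe is the Cassandra pod's own IP, injected via the downward API; if you reuse the snippet as-is, the variable has to be defined explicitly, roughly like this:
env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
In an application pod's initContainer, though, the pod's own IP will never appear in nodetool's output, so you would more likely grep for the UN (Up/Normal) state of the Cassandra node(s) you depend on.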

You could wait for the Cassandra Service to be created, using a shell one-liner like:
for i in {1..100}; do sleep 1; if dig cassandraservice; then exit 0; fi; done; exit 1
Ref: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
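Wrapped in an initContainer, that wait loop might look like the sketch below (the Service name cassandraservice is an assumption; busybox ships nslookup rather than dig, and seq replaces the bash-only {1..100} expansion):
initContainers:
  - name: wait-for-cassandra-service
    image: busybox:1.28
    command: ['sh', '-c', 'for i in $(seq 1 100); do nslookup cassandraservice && exit 0; sleep 1; done; exit 1']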

Related

External dependency in Pod Readiness and Liveness

I am new to pod health checks with Readiness and Liveness probes. Recently I have been working on Readiness. The scenario is as follows:
The pod is a REST API service; it needs to connect to a database and store information in it. So for the REST API service to offer its service, it needs to make sure the database connection is successful.
So in our pod's Readiness logic, we use an HTTP GET probe and check whether the DB connection is established; if it is okay, the HTTP GET returns OK, otherwise the Readiness probe fails.
Is the above logic reasonable? Or is there another approach for this?
Apart from Readiness, what about Liveness? Do I need to check the DB connection to determine whether Liveness is okay?
Any ideas and suggestions are appreciated.
Readiness and liveness probes are mostly for the service you are running inside the container. There could be a scenario where your DB is up but there is an issue with the application; if readiness only reflects the DB, the pod would still be marked ready even though the application is broken. Ideally, if the application is not working, it should stop accepting traffic.
I would recommend using an init container or a lifecycle hook to check the condition of the database first; if it's up, the process moves ahead and your application or deployment comes into the picture.
If the application works well, your readiness and liveness probes will return HTTP OK and the service will start accepting traffic.
Init container example:
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
    - name: myapp-container
      image: busybox
      command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
    - name: init-myservice
      image: busybox
      command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
    - name: init-mydb
      image: busybox
      command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']
Extra Notes
There is actually no need to check the DB readiness at all.
Since your application will be trying to connect to the database, if the DB is not up your application won't respond HTTP OK, so the pod won't start serving traffic and its readiness probe will keep failing.
As soon as your database comes up, your application will establish a successful connection with the DB, return 200 responses, and the readiness probe will mark the pod ready.
There is no extra requirement to set up a separate readiness check for the DB and start the pod based on that.
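For reference, a minimal readiness probe of the kind discussed above might look like the sketch below (the /healthz path, port 8080, and the image name are assumptions; the application's health endpoint would check its own DB connection internally):
containers:
  - name: restapi
    image: my-restapi:latest        # hypothetical application image
    readinessProbe:
      httpGet:
        path: /healthz              # assumed health endpoint that verifies the DB connection
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10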

How to deploy a microservice (possibly multiple instances) dependent on a database in the same Kubernetes cluster?

I want to run a microservice which uses a DB. The DB needs to be deployed in the same Kubernetes cluster as well, using a PVC/PV. What Kubernetes mechanism/command should I use to implement the following logic:
Deploy the DB instance
If 1 is successful, then deploy the microservice, else return to 1 and retry (if it fails 100 times, stop and alarm)
If 2 is successful, work with it, autoscaling if needed (the Kubernetes autoscaling option)
I am mostly concerned about 1-2: the service cannot work without the DB, but at the same time they need to be in different pods (or am I wrong and is it better to put the 2 containers, DB and service, in the same pod?)
I would say you should add an initContainer to your microservice which waits for the DB Service; once it is ready, the microservice will be started.
e.g.
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
    - name: myapp-container
      image: busybox:1.28
      command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
    - name: init-mydb
      image: busybox:1.28
      command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]
As for the command, simply use kubectl apply with your YAMLs (with the initContainer configured in your application).
If you want to do that in a more automated way, you can think about using FluxCD/ArgoCD.
As for the question from the comments: must the containers that run before the main container and the main container itself be in the same pod?
Yes, they have to be in the same pod. The init container keeps running until, for example, the database service is available; only then does the main container start. There is a great example of that in the initContainer documentation linked above.
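Regarding the "retry 100 times then stop and alarm" part of the question, the wait loop in the initContainer can be bounded so that it exits non-zero after a fixed number of attempts; the pod then shows Init:Error / Init:CrashLoopBackOff, which you can alert on. A minimal sketch, assuming the DB Service is named mydb:
initContainers:
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', 'i=0; until nslookup mydb; do i=$((i+1)); if [ "$i" -ge 100 ]; then echo "mydb not found after 100 attempts"; exit 1; fi; echo waiting for mydb; sleep 2; done']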

Can Kubernetes pods be scaled despite tight coupling to Hana?

We have a Kubernetes cluster with backend services that pull data from an external Hana and send it to Kafka. The import process starts whenever the pod is started and takes around 90 minutes. Because of the tight coupling to Hana, we cannot run multiple pods of these backend services. I have the feeling that this could somehow be improved, but I don't know how.
What could be the way to go to have multiple pods for the backend services without pulling in the same data three times into Kafka?
Any other thoughts on this setup?
It's generally a good idea to have containers that perform only one action.
I would consider the following if you want to run the download & push in parallel:
A running container which downloads the data.
A running container that pushes the data.
Shared volume between the two for the data.
Each of these containers would have their own resources and readiness probes.
If the download & push cannot be done in parallel you could have:
An init container to download the data
A running container to push the data.
Shared volume between the two for the data.
Each of these containers would have their own resources and readiness probes.
This has the extra advantage that if something goes wrong with the push of data, you don't need to download everything again, and the pushing of data will be retried as many times as you want (depending on the readiness probe configuration).
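A minimal sketch of the second variant (init container downloads, main container pushes, shared emptyDir volume; the container names, images, and script paths are assumptions):
apiVersion: v1
kind: Pod
metadata:
  name: importer-pod
spec:
  volumes:
    - name: shared-data
      emptyDir: {}                        # scratch space shared by both containers
  initContainers:
    - name: download
      image: my-downloader:latest         # hypothetical image that pulls data from Hana
      command: ['sh', '-c', '/scripts/download.sh /data']
      volumeMounts:
        - name: shared-data
          mountPath: /data
  containers:
    - name: push
      image: my-pusher:latest             # hypothetical image that pushes data to Kafka
      command: ['sh', '-c', '/scripts/push.sh /data']
      volumeMounts:
        - name: shared-data
          mountPath: /data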
There is a concept of init containers in K8s; please go through the documentation.
In a gist, if the import process is moved to an init container as a separate routine, then on its success the actual services can be started up in multiple instances.
An example pod.yml is given below; it's just an indicative sample to give you an idea.
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
    - name: myapp-container
      image: busybox:1.28
      command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
    - name: init-myservice
      image: busybox:1.28
      command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
    - name: init-mydb
      image: busybox:1.28
      command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]
At the end of it, you will have to break the data-import functionality out into a separate step, after which you can scale horizontally.

Airflow kubernetes executor: Run 2 jobs on the same pod

I'm using Airflow with the Kubernetes executor and the KubernetesPodOperator. I have two jobs:
A: Retrieve data from some source up to 100MB
B: Analyze the data from A.
In order to be able to share the data between the jobs, I would like to run them on the same pod, and then A will write the data to a volume, and B will read the data from the volume.
The documentation states:
The Kubernetes executor will create a new pod for every task instance.
Is there any way to achieve this? And if not, what recommended way there is to pass the data between the jobs?
Sorry, this isn't possible: one job per pod.
You are best off using task 1 to put the data in a well-known location (e.g. in a cloud bucket) and fetching it from the second task. Or just combine the two tasks.
You can absolutely accomplish this using SubDAGs and the SubDagOperator. When you start a SubDAG, the Kubernetes executor creates one pod at the SubDAG level and all subtasks run on that pod.
This behavior does not seem to be documented. We just discovered this recently when troubleshooting a process.
Yes, you can do that using init containers inside a Job: within the same pod, the job container will not start before the init containers complete their task.
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
    - name: myapp-container
      image: busybox:1.28
      command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
    - name: init-myservice
      image: busybox:1.28
      command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
    - name: init-mydb
      image: busybox:1.28
      command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']
This is an example for a Pod; you can apply the same pattern for kind Job, as sketched below.
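A minimal sketch of the same pattern wrapped in a Job (names and images are illustrative):
apiVersion: batch/v1
kind: Job
metadata:
  name: myapp-job
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: init-mydb
          image: busybox:1.28
          command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']
      containers:
        - name: myapp-container
          image: busybox:1.28
          command: ['sh', '-c', 'echo The job is running! && sleep 30']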
You can have 2 separate tasks A and B where data is handed off from A to B. K8s has out-of-the-box support for such volumes.
E.g. https://kubernetes.io/docs/concepts/storage/volumes/#awselasticblockstore.
Here the data generated by one pod will be persistent, so when the pod gets deleted the data won't be lost. The same volume can then be mounted by another pod, which can access the data.
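A minimal sketch of such a persistent handoff via a PersistentVolumeClaim (the claim name and size are assumptions; note that many block-storage volume types such as EBS only support ReadWriteOnce, so the two pods would mount it one after the other rather than at the same time):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-task-data
spec:
  accessModes:
    - ReadWriteOnce          # typical for block storage such as EBS
  resources:
    requests:
      storage: 1Gi
Each task's pod would then declare a volume referencing persistentVolumeClaim: claimName: shared-task-data and mount it at, say, /data.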

How to create a post-init container in Kubernetes?

I'm trying to create a redis cluster on K8s. I need a sidecar container to create the cluster after the required number of redis containers are online.
I've got 2 containers, redis and a sidecar. I'm running them in a statefulset with 6 replicas. I need the sidecar container to run just once for each replica then terminate. It's doing that, but K8s keeps rerunning the sidecar.
I've tried setting a restartPolicy at the container level, but it's invalid. It seems K8s only supports this at the pod level. I can't use this though because I want the redis container to be restarted, just not the sidecar.
Is there anything like a post-init container? My sidecar needs to run after the main redis container to make it join the cluster. So an init container is no use.
What's the best way to solve this with K8s 1.6?
I advise you to use Kubernetes Jobs:
https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
A Job keeps running until it completes successfully once. In this Job you could detect whether all the required nodes are available and then form the cluster.
A better answer is to just make the sidecar enter an infinite sleep loop. If it never exits, it will never be restarted. Resource limits can be used to ensure there's minimal impact on the cluster.
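A minimal sketch of that sidecar, assuming a hypothetical /scripts/join-cluster.sh that performs the one-time cluster setup (image and resource values are illustrative):
containers:
  - name: cluster-init-sidecar
    image: redis               # any image containing your setup tooling
    command: ['sh', '-c', '/scripts/join-cluster.sh; while true; do sleep 3600; done']   # run the setup once, then idle forever
    resources:
      limits:
        cpu: 10m
        memory: 16Mi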
Adding postStart as an optional answer to this problem:
https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/#define-poststart-and-prestop-handlers
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
    - name: lifecycle-demo-container
      image: nginx
      lifecycle:
        postStart:
          exec:
            command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]