We have a Kubernetes cluster with backend services that pull data from an external HANA database and send it to Kafka. The import process starts whenever the pod starts and takes around 90 minutes. Because of the tight coupling to HANA we cannot run multiple pods of these backend services. I have the feeling that this could somehow be improved, but I don't know how.
What could be the way to go to have multiple pods for the backend services without pulling the same data into Kafka three times?
Any other thoughts on this setup?
It's generally a good idea to have containers that perform only one action.
I would consider the following if you want to run the download & push in parallel:
A running container that downloads the data.
A running container that pushes the data.
Shared volume between the two for the data.
Each of these containers would have their own resources and readiness probes.
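A minimal sketch of this layout, assuming hypothetical downloader and pusher images and an emptyDir as the shared volume:
apiVersion: v1
kind: Pod
metadata:
  name: importer
spec:
  containers:
  - name: downloader                                  # pulls the data from HANA and writes it to /data
    image: your-registry/hana-downloader:latest       # hypothetical image
    volumeMounts:
    - name: data
      mountPath: /data
  - name: pusher                                      # reads /data and publishes it to Kafka
    image: your-registry/kafka-pusher:latest          # hypothetical image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    emptyDir: {}                                      # shared scratch space; lives as long as the pod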
If the download & push cannot be done in parallel you could have:
An init container to download the data
A running container to push the data.
Shared volume between the two for the data.
Each of these containers would have their own resources and readiness probes.
This has the extra advantage that if something goes wrong with the push, you don't need to download everything again, and the push will be retried as many times as you want (depending on the pod's restart policy and probe configuration).
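A sketch of this sequential variant, again with hypothetical images and an emptyDir for the shared data:
apiVersion: v1
kind: Pod
metadata:
  name: importer
spec:
  initContainers:
  - name: download                                    # runs to completion before the main container starts
    image: your-registry/hana-downloader:latest       # hypothetical image
    volumeMounts:
    - name: data
      mountPath: /data
  containers:
  - name: push                                        # starts only after the download finished successfully
    image: your-registry/kafka-pusher:latest          # hypothetical image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    emptyDir: {}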
There is a concept of init containers in Kubernetes; please go through the documentation.
In a nutshell, if the import process is moved into an init container as a separate routine, then on its success the actual services can be started up in multiple instances.
An example pod.yml is given below - it's just an indicative sample to give you an idea.
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup myservice.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]
At the end of it, you will have to break the data-import functionality out into a separate step, after which you can scale horizontally.
I am new to pod health checks with readiness and liveness probes. Recently I have been working on readiness. The scenario is as follows:
The pod is a REST API service; it needs to connect to a database and store information in it. So for the REST API service to offer its service, it needs to make sure the database connection is working.
So in our pod's readiness logic, we use an HTTP GET and check whether the DB connection is established: if it is okay, the HTTP GET returns OK; otherwise the readiness check fails.
I'm not sure whether the above logic is reasonable, or is there another approach for this?
Apart from readiness, what about liveness? Do I need to check the DB connection to decide that liveness is okay?
Any ideas and suggestions are appreciated.
Readiness and liveness are mostly about the service you are running inside the container. There could be a scenario where your DB is up but there is an issue with the application; at that time your readiness check would still pass because the DB is running, whereas ideally, if the application is not working, the pod should stop accepting traffic.
I would recommend using an init container or a lifecycle hook to check the condition of the database first; if it's up, the process moves ahead and your application or deployment comes into the picture.
If the application works well, your readiness and liveness probes will return HTTP OK and the service will start accepting traffic.
Init container example:
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
  - name: init-mydb
    image: busybox
    command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']
Extra Notes
There is actually no need to check the DB readiness at all.
Since your application will be trying to connect to the database, if the DB is not up your application won't respond with HTTP OK, so your application won't be marked as started and readiness will keep failing for it.
As soon as your database comes up, your application will establish a connection to the DB, return a 200 response, and readiness will mark the pod ready.
There is no extra requirement to set up a separate readiness check for the DB and start the pod based on that.
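For reference, a minimal sketch of what the probes on the application container could look like, assuming the application exposes a hypothetical /healthz endpoint that only returns 200 once its DB connection is established, and a hypothetical /livez endpoint that only checks the process itself:
containers:
- name: restapi
  image: your-restapi-image        # assumption: your application image
  ports:
  - containerPort: 8080
  readinessProbe:                  # gates traffic: the pod is removed from Service endpoints while failing
    httpGet:
      path: /healthz               # hypothetical endpoint that also checks the DB connection
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 5
  livenessProbe:                   # restarts the container only if the process itself is stuck
    httpGet:
      path: /livez                 # hypothetical endpoint that does not touch the DB
      port: 8080
    initialDelaySeconds: 30
    periodSeconds: 10
A common choice is to keep liveness independent of the DB, so a database outage makes the pod unready rather than putting it into a restart loop.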
I want to run a microservice which uses a DB. The DB needs to be deployed in the same Kubernetes cluster as well, using a PVC/PV. What Kubernetes feature/command should I use to implement the following logic:
Deploy the DB instance
If 1 is successful, then deploy the microservice; otherwise return to 1 and retry (if it fails 100 times - stop and alarm)
If 2 is successful, work with it and autoscale if needed (Kubernetes autoscaling)
I am mostly concerned about 1-2: the service cannot work without the DB, but at the same time they need to be in different pods (or am I wrong, and it's better to put the two containers, DB and service, in the same pod?)
I would say you should add an initContainer to your microservice which waits for the DB service; once it is ready, the microservice will be started.
e.g.
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]
As for the command, simply use kubectl apply with your YAMLs (with the initContainer configured in your application).
If you want to do that in a more automated way, you can think about using fluxCD/argoCD.
As for the question from the comments - do the containers that run before the main container and the main container itself have to be in the same pod?
Yes, they have to be in the same pod. The init container keeps running until, for example, the database service becomes available, and only then does the main container start. There is a great example of that in the initContainers documentation linked above.
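If you also want the bounded retry from point 2 (give up after a number of attempts so you can alarm on the failing pod), the init container's wait loop can be capped. A hedged sketch, assuming the DB Service is called mydb:
initContainers:
- name: wait-for-mydb
  image: busybox:1.28
  # try up to 100 times, 2 seconds apart; exit 1 afterwards so the pod goes into Init:Error / restart backoff,
  # which your monitoring can alert on
  command: ['sh', '-c', 'for i in $(seq 1 100); do nslookup mydb && exit 0; echo waiting for mydb; sleep 2; done; exit 1']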
In my Kubernetes cluster, I have a Cassandra StatefulSet which uses the cassandra:4.0 image. I want my application pods to wait for Cassandra to be up and running first.
I suppose this can be done by adding initContainers to the application pod's deployment. The init container would check the status of Cassandra, thereby blocking the application pod from starting until it has confirmed the availability of Cassandra.
I don't know how to create such an init container for checking the status of the Cassandra StatefulSet; I've searched the web but didn't find any examples related to Cassandra.
Any help would be highly appreciated.
Note: I'm using the actual cassandra image, not this example one (gcr.io/google-samples/cassandra:v13).
Using a similar approach to the Cassandra Helm Chart readiness probe:
spec:
  template:
    spec:
      initContainers:
      - name: check-cassandra-ready
        image: cassandra:3.11
        command: ['sh', '-c', 'until nodetool -h <cassandra-host> status | grep -E "^UN\\s+${POD_IP}"; do echo waiting for cassandra; sleep 2; done;']
Your Cassandra version must be the same as the nodetool version.
You could wait for the cassandra Service to be created, using a shell one-liner like:
for i in {1..100}; do sleep 1; if dig cassandraservice; then exit 0; fi; done; exit 1
Ref: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
I have a service which runs in Apache. The container status shows as Completed and it keeps restarting. Why is the container not staying in the Running state even though the arguments passed have no issues?
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ***
spec:
  selector:
    matchLabels:
      app: ***
  replicas: 1
  template:
    metadata:
      labels:
        app: ***
    spec:
      containers:
      - name: ***
        image: ****
        command: ["/bin/sh", "-c"]
        args: ["echo\ sid\ |\ sudo\ -S\ service\ mysql\ start\ &&\ sudo\ service\ apache2\ start"]
        volumeMounts:
        - mountPath: /var/log/apache2/
          name: apache
        - mountPath: /var/log/***/
          name: ***
      imagePullSecrets:
      - name: regcred
      volumes:
      - name: apache
        hostPath:
          path: "/home/sandeep/logs/apache"
      - name: vusmartmaps
        hostPath:
          path: "/home/sandeep/logs/***"
Soon after executing these arguments, the container shows its status as Completed and goes into a restart loop. What can we do to keep its status as Running?
Please be advised this is not a good practice.
If you really want this working that way, your last process must not end.
For example, add sleep 9999 to your container's args.
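A slightly cleaner variant of the same idea is to keep Apache itself as the foreground process instead of sleeping; a hedged sketch, assuming your image provides apache2ctl (Debian/Ubuntu-style):
command: ["/bin/sh", "-c"]
# start MySQL in the background, then keep Apache in the foreground so the container's main process never exits
args: ["echo sid | sudo -S service mysql start && exec sudo apache2ctl -D FOREGROUND"]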
Still, the best option would be splitting those into two separate Deployments.
First, it would be easy to scale them independently.
Second, the image would be smaller for each Deployment.
Third, Kubernetes would have full control over those Deployments, and you could utilize self-healing and rolling updates.
There is a really good guide and examples on Deploying WordPress and MySQL with Persistent Volumes, which I think would be perfect for you.
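To illustrate the split, a minimal sketch of two separate Deployments using the official images (the names and the MySQL root password handling are just placeholders for illustration; use a Secret in practice):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.6
        env:
        - name: MYSQL_ROOT_PASSWORD    # illustration only; store this in a Secret
          value: "changeme"
        ports:
        - containerPort: 3306
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: apache
  template:
    metadata:
      labels:
        app: apache
    spec:
      containers:
      - name: apache
        image: httpd:alpine
        ports:
        - containerPort: 80
Each Deployment can then get its own Service, and the web tier reaches MySQL through the mysql Service name instead of sharing a pod.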
But if you prefer to use just one pod, then you would need to split your image or use the official Docker images, and your pod might look like this:
apiVersion: v1
kind: Pod
metadata:
  name: app
  labels:
    app: test
spec:
  containers:
  - name: mysql
    image: mysql:5.6
  - name: apache
    image: httpd:alpine
    ports:
    - containerPort: 80
    volumeMounts:
    - name: apache
      mountPath: /var/log/apache2/
  volumes:
  - name: apache
    hostPath:
      path: "/home/sandeep/logs/apache"
You would need to expose the pod using a Service:
$ kubectl expose pod app --type=NodePort --port=80
service "app" exposed
Checking what port it has:
$ kubectl describe service app
...
NodePort: <unset> 31418/TCP
...
Also you should read Communicate Between Containers in the Same Pod Using a Shared Volume.
You want to start apache and mysql in the same container and keep it running, don't you?
Well, let's break down why it exits first. Kubernetes, just like Docker, will run whatever command you give it inside the container. If that command finishes, the container stops. echo sid | sudo -S service mysql start && sudo service apache2 start asks your init process to start both mysql and apache, but the thing is that Kubernetes is not aware of any init system inside your container.
In fact, your command statement runs as the process with PID 1 instead of an init process, overriding whatever default startup command you have in your container image. Whenever the process with PID 1 exits, the container stops.
Therefore in your case you have to start whatever init system you have in your container.
However, this brings us to another problem - Kubernetes already acts as an init system. It starts your pods and supervises them. Therefore all you need is to start two containers instead - one for mysql and another one for apache.
For example, you could use the official Docker Hub images from https://hub.docker.com/_/httpd/ and https://hub.docker.com/_/mysql. They already come configured to start up correctly, so you don't even have to specify command and args in your deployment manifest.
Containers are not tiny VMs. You need two in this case, one running MySQL and another running Apache. Both have standard community images available, which I would probably start with.
I'm using Airflow with kubernetes executor and the KubernetesPodOperator. I have two jobs:
A: Retrieve data from some source up to 100MB
B: Analyze the data from A.
In order to be able to share the data between the jobs, I would like to run them on the same pod, and then A will write the data to a volume, and B will read the data from the volume.
The documentation states:
The Kubernetes executor will create a new pod for every task instance.
Is there any way to achieve this? And if not, what recommended way there is to pass the data between the jobs?
Sorry, this isn't possible - one job per pod.
You are best to use task 1 to put the data in a well-known location (e.g. in a cloud bucket) and get it from the second task. Or just combine the two tasks.
You can absolutely accomplish this using SubDAGs and the SubDagOperator. When you start a SubDAG, the Kubernetes executor creates one pod at the SubDAG level and all subtasks run on that pod.
This behavior does not seem to be documented; we only discovered it recently when troubleshooting a process.
Yes, you can do that using init containers inside the job: within the same pod, the job's main container will not start before the init containers complete their task.
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']
This is an example for a Pod; you can apply the same approach to kind: Job.
You can have 2 separate tasks A and B where data is handed off from A to B. Kubernetes has out-of-the-box support for such volumes.
E.g. https://kubernetes.io/docs/concepts/storage/volumes/#awselasticblockstore.
Here the data generated by one pod will be persisted, so when that pod gets deleted the data won't be lost. The same volume can then be mounted by another pod, which can access the data.
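As a rough sketch of that idea, with illustrative names, assuming the two pods run one after the other (or land on the same node), since the claim below is ReadWriteOnce:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data            # illustrative name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# task A's pod writes its output to the volume
apiVersion: v1
kind: Pod
metadata:
  name: task-a
spec:
  restartPolicy: Never
  containers:
  - name: produce
    image: busybox:1.28
    command: ['sh', '-c', 'echo "data from A" > /data/out.txt']
    volumeMounts:
    - name: shared
      mountPath: /data
  volumes:
  - name: shared
    persistentVolumeClaim:
      claimName: shared-data
---
# task B's pod mounts the same claim and reads the data
apiVersion: v1
kind: Pod
metadata:
  name: task-b
spec:
  restartPolicy: Never
  containers:
  - name: consume
    image: busybox:1.28
    command: ['sh', '-c', 'cat /data/out.txt']
    volumeMounts:
    - name: shared
      mountPath: /data
  volumes:
  - name: shared
    persistentVolumeClaim:
      claimName: shared-data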