How to create a post-init container in Kubernetes?

I'm trying to create a redis cluster on K8s. I need a sidecar container to create the cluster after the required number of redis containers are online.
I've got 2 containers, redis and a sidecar. I'm running them in a statefulset with 6 replicas. I need the sidecar container to run just once for each replica then terminate. It's doing that, but K8s keeps rerunning the sidecar.
I've tried setting a restartPolicy at the container level, but it's invalid. It seems K8s only supports this at the pod level. I can't use this though because I want the redis container to be restarted, just not the sidecar.
Is there anything like a post-init container? My sidecar needs to run after the main redis container to make it join the cluster. So an init container is no use.
What's the best way to solve this with K8s 1.6?

I advise you to use Kubernetes Jobs:
https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
A Job keeps running its pod until it has completed successfully once. In that Job you could detect whether all the required redis nodes are available, and then form the cluster.
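As a minimal sketch (the image, the headless-service name redis and the redis-cli cluster-create invocation are assumptions, not part of the original question), such a Job could look like this:
apiVersion: batch/v1
kind: Job
metadata:
  name: redis-cluster-init
spec:
  backoffLimit: 10                  # keep retrying until the cluster can be formed
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: cluster-init
        image: redis:5              # assumes an image that ships redis-cli with cluster support
        command:
        - sh
        - -c
        - |
          # wait until all 6 pods of the (assumed) headless service "redis" resolve
          for i in 0 1 2 3 4 5; do
            until getent hosts redis-$i.redis; do echo "waiting for redis-$i"; sleep 2; done
          done
          # form the cluster once every node is reachable
          redis-cli --cluster create \
            redis-0.redis:6379 redis-1.redis:6379 redis-2.redis:6379 \
            redis-3.redis:6379 redis-4.redis:6379 redis-5.redis:6379 \
            --cluster-replicas 1 --cluster-yes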

A better answer is to just make the sidecar enter an infinite sleep loop once its one-time work is done. If it never exits, it will never be restarted. Resource requests and limits can be used to ensure it has minimal impact on the cluster.
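A minimal sketch of that idea (the container name, image and setup script are placeholders, not from the original setup):
- name: cluster-init-sidecar          # placeholder name
  image: redis:5                      # placeholder: any image with a shell and your tooling
  command:
  - sh
  - -c
  - |
    /scripts/create-cluster.sh        # hypothetical one-time setup script
    while true; do sleep 3600; done   # never exit, so the kubelet never restarts the container
  resources:
    limits:
      cpu: 10m                        # keep the idle sidecar cheap
      memory: 16Mi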

Adding postStart as an optional answer to this problem:
https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/#define-poststart-and-prestop-handlers
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]

Related

GKE - How to expose pod for cross-cluster communication with MCS

In Google Cloud Platform, my goal is to have a message queue in one cluster and a pod consuming it in another cluster, using MCS (Multi Cluster Service). When trying this out with only one cluster it went fairly smoothly. I used the container name with the port number as the endpoint to connect to the redpanda message queue like this:
Now I want to do this between two clusters, but I'm having trouble configuring stuff right. This is my setup:
I followed this guide to set the clusters up which seemed to work (hard to tell, but no errors), and the redpanda application inside the pod is configured to be on localhost:9092. Unfortunately, I'm getting a Connection Error when running the consumer on my-service-export.my-ns.svc.clusterset.local:9092.
Is it correct to expose the pod with the message queue on its localhost?
Are there ways I can debug or test the connection between pods easier?
Ok, got it working. I obviously misread the setup at one point and had to re-do some stuff to get it working.
Also, the my-service-export should probably have the same name as the service you want to export, in my case redpanda.
A helpful tool to check the connection without spinning up a consumer is the simple dnsutils image. Use this Pod manifest and change the namespace to my-ns:
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: my-ns # <----
spec:
  containers:
  - name: dnsutils
    image: k8s.gcr.io/e2e-test-images/jessie-dnsutils:1.3
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
Then spin it up with apply, exec into it, then run host my-service-export.my-ns.svc.clusterset.local. If you get an IP back you are probably good.
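Concretely, assuming the manifest above is saved as dnsutils.yaml, that amounts to:
kubectl apply -f dnsutils.yaml
kubectl exec -it dnsutils -n my-ns -- host my-service-export.my-ns.svc.clusterset.local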

Kubernetes: start pods in batches on a node

For some applications, starting or restarting needs more resources than steady-state running, for example es/flink. If a node has network jitter, all pods on that node restart at the same time. When this happens, CPU usage on the node becomes very high, which increases resource competition on the node.
Now I want to start the pods on a single node in batches. How can I achieve this?
Kubernetes has auto-healing.
You can let the pods crash and Kubernetes will restart them as soon as sufficient memory or other resources are available.
Alternatively, if you want the deployment to wait and start the pods gradually, one by one,
you can use a sidecar together with the pod lifecycle hooks to delay the start of the main container. This is not the best approach, but it can resolve your issue.
Basic example:
apiVersion: v1
kind: Pod
metadata:
  name: sidecar-starts-first
spec:
  containers:
  - name: sidecar
    image: my-sidecar
    lifecycle:
      postStart:
        exec:
          command:
          - /bin/wait-until-ready.sh
  - name: application
    image: my-application
OR
You can also use an init container that checks the other service's health, so that the main container's pod only starts once a pod of that other service is up.
Init container
I would also recommend checking the PriorityClass: https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/
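A minimal PriorityClass sketch (the name and value are just example choices); pods then opt into it via spec.priorityClassName:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority               # example name
value: 1000000                      # higher value = higher scheduling priority
globalDefault: false
description: "Example class for pods that should be scheduled ahead of the rest."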

How to deploy a microservice (possibly multiple instances) dependent on a database in the same Kubernetes cluster?

I want to run a microservice which uses a DB. The DB needs to be deployed in the same Kubernetes cluster as well, using a PVC/PV. What Kubernetes service/feature or command should I use to implement the following logic:
1. Deploy the DB instance
2. If 1 is successful, deploy the microservice; otherwise return to 1 and retry (if it fails 100 times, stop and raise an alarm)
3. If 2 is successful, work with it, autoscaling if needed (the Kubernetes autoscaling option)
I'm concerned mostly about 1-2: the service cannot work without the DB, but at the same time they need to be in different pods (or am I wrong and it's better to put the 2 containers, DB and service, in the same pod?)
I would say you should add an initContainer to your microservice which waits for the DB service; whenever it becomes ready, the microservice will be started.
e.g.
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-mydb
    image: busybox:1.28
    command: ['sh', '-c', "until nslookup mydb.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for mydb; sleep 2; done"]
As for the command, simply use kubectl apply with your YAMLs (with the initContainer configured in your application).
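For example (the file name here is just a placeholder):
kubectl apply -f myapp-pod.yaml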
If you want to do that in a more automated way, you can think about using FluxCD/ArgoCD.
As for the question from the comments, do the containers that run before the main container and the main container itself have to be in the same pod?
Yes, they have to be in the same pod. The init container keeps running until, for example, the database service is available, and only then does the main container start. There is a great example of that in the initContainers documentation linked above.

Difference between Replication Controller and LivenessProbe in K8s

I'm just confused between the replication controller and livenessProbe in K8s. Could anyone explain this?
ReplicationController and livenessProbe have nothing in common, so it is really hard to confuse them; moreover, the Kubernetes documentation (check the links) has a great explanation of both objects.
The ReplicationController is the older version of ReplicaSets.
A replication controller basically manages the state of the replicas running inside the Kubernetes cluster.
A ReplicationController works at the cluster level.
A liveness probe works at the pod level: it constantly, on a frequent basis, probes an endpoint and checks the liveness of the service; if the service is not live, the kubelet restarts the container.
ReplicationController and livenessProbe have nothing in common.
A Replication Controller in K8s makes sure that a specified number of pod replicas are running at any one time. Those pods are supposed to be always up and available.
If there are too many pods, the ReplicationController terminates the extra pods. If there are too few, the ReplicationController starts more pods. Unlike manually created pods, the pods maintained by a ReplicationController are automatically replaced if they fail, are deleted, or are terminated.
Example Replication Controller config file:
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
Workflow of Replication Controllers:
You can find more information here: Replication Controller.
Useful article: replication controller actions.
Liveness probe in K8s.
A Probe is a diagnostic performed periodically by the kubelet on a Container. To perform a diagnostic, the kubelet calls a Handler implemented by the Container.
The kubelet can optionally perform and react to two kinds of probes on running Containers:
livenessProbe: Indicates whether the Container is running. If the liveness probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
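For illustration, a minimal HTTP liveness probe sketch (the path, port and timings are example values, not from the question):
livenessProbe:
  httpGet:
    path: /healthz            # example health endpoint
    port: 8080
  initialDelaySeconds: 10     # give the app time to start before the first probe
  periodSeconds: 5            # probe every 5 seconds
  failureThreshold: 3         # restart the container after 3 consecutive failures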
You can find more information here: pod lifecycle.
Useful article: Kubernetes probes.

Is there a way to downscale pods only when message is processed (the pod finished its task) with the HorizontalPodAutoscaler in Kubernetes?

I've set up the Kubernetes Horizontal Pod Autoscaler with custom metrics, using the prometheus adapter https://github.com/DirectXMan12/k8s-prometheus-adapter. Prometheus is monitoring RabbitMQ, and I'm watching the rabbitmq_queue_messages metric. The messages from the queue are picked up by the pods, which then do some processing that can last for several hours.
The scale-up and scale-down is working based on the number of messages in the queue.
The problem:
When a pod finishes the processing and acks the message, that lowers the number of messages in the queue, which would trigger the Autoscaler to terminate a pod. If I have multiple pods doing the processing and one of them finishes, then, if I'm not mistaken, Kubernetes could terminate a pod that is still processing its own message. This wouldn't be desirable, as all the processing that pod has done would be lost.
Is there a way to overcome this, or another way this could be achieved?
Here is the Autoscaler configuration:
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: sample-app-rabbitmq
  namespace: monitoring
spec:
  scaleTargetRef:
    # you created above
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      target:
        kind: Service
        name: rabbitmq-cluster
      metricName: rabbitmq_queue_messages_ready
      targetValue: 5
You could consider an approach using a preStop hook.
As per the documentation on Container States and Define postStart and preStop handlers:
Before a container enters into Terminated, preStop hook (if any) is executed.
So you can use in your deployment:
lifecycle:
  preStop:
    exec:
      command: ["your script"]
Update:
I would like to provide more information due to some research:
There is an interesting project:
KEDA allows for fine grained autoscaling (including to/from zero) for event driven Kubernetes workloads. KEDA serves as a Kubernetes Metrics Server and allows users to define autoscaling rules using a dedicated Kubernetes custom resource definition.
KEDA can run on both the cloud and the edge, integrates natively with Kubernetes components such as the Horizontal Pod Autoscaler, and has no external dependencies.
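For illustration, a rough ScaledObject sketch for a RabbitMQ-driven workload (field names follow a recent KEDA release and may differ in older versions; the queue name and host are placeholders):
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sample-app-scaler
spec:
  scaleTargetRef:
    name: sample-app                 # the Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: rabbitmq
    metadata:
      queueName: task-queue          # placeholder queue name
      host: amqp://guest:guest@rabbitmq.monitoring.svc:5672/   # placeholder connection string
      mode: QueueLength              # scale on the number of ready messages
      value: "5"                     # target messages per replica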
For the main question "Kubernetes could terminate a pod that is still doing the processing of its own message".
As per documentation:
"Deployment is a higher-level concept that manages ReplicaSets and provides declarative updates to Pods along with a lot of other useful features"
A Deployment is backed by a ReplicaSet. As per this controller code, there is a function "getPodsToDelete". In combination with "filteredPods" it gives the result: "This ensures that we delete pods in the earlier stages whenever possible."
So as proof of concept:
You can create a deployment with an init container. The init container should check if there is a message in the queue and exit when at least one message appears. This allows the main container to start, take and process that message. In this case we will have two kinds of pods: those which are processing a message and consuming CPU, and those which are still starting, idle and waiting for the next message. The starting pods will be deleted first when the HPA decides to decrease the number of replicas in the deployment.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: complete
  name: complete
spec:
  replicas: 5
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: complete
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: complete
    spec:
      hostname: c1
      containers:
      - name: complete
        command:
        - "bash"
        args:
        - "-c"
        - "wa=$(shuf -i 15-30 -n 1)&& echo $wa && sleep $wa"
        image: ubuntu
        imagePullPolicy: IfNotPresent
        resources: {}
      initContainers:
      - name: wait-for
        image: ubuntu
        command: ['bash', '-c', 'sleep 30']
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
Hope this helps.
Horizontal Pod Autoscaler is not designed for long-running tasks, and will not be a good fit. If you need to spawn one long-running processing tasks per message, I'd take one of these two approaches:
Use a task queue such as Celery. It is designed to solve your exact problem: have a queue of tasks that needs to be distributed to workers, and ensure that the tasks run to completion. Kubernetes even provides an official example of this setup.
If you don't want to introduce another component such as Celery, you can spawn a Kubernetes job for every incoming message by yourself. Kubernetes will make sure that the job runs to completion at least once - reschedule the pod if it dies, etc. In this case you will need to write a script that reads RabbitMQ messages and creates jobs for them by yourself.
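As a rough sketch, such a script could submit a Job like the following for each message it pulls off the queue (the image, args and naming are placeholders):
apiVersion: batch/v1
kind: Job
metadata:
  generateName: process-message-     # one Job per message, with a generated unique name
spec:
  backoffLimit: 3                    # retry the pod a few times if it fails
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: my-worker:latest                  # placeholder worker image
        args: ["--message-id", "example-id"]     # placeholder: hand the message (or its id) to the worker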
In both cases, make sure you also have Cluster Autoscaler enabled so that new nodes get automatically provisioned if your current nodes are not sufficient to handle the load.