I can't find a way to provide a "graceful shutdown" in Nest microservices, in particular using NATS - kubernetes

Hello everyone, I can't find a way to provide a "graceful shutdown" in Nest microservices, in particular when using NATS.
Expected behavior:
The application in Kubernetes receives a SIGTERM signal.
It stops listening for new incoming requests.
Requests that were already accepted are completed and responses are returned.
The application closes all connections and shuts down.

You can set terminationGracePeriodSeconds in the Pod spec to control how long Kubernetes waits for a graceful shutdown.
The default value of terminationGracePeriodSeconds is 30 seconds.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test
        image: ...
      terminationGracePeriodSeconds: 60
Read the best practices: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace
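On the application side, Nest also has to react to the SIGTERM that Kubernetes sends. A minimal sketch of what that could look like for a NATS microservice, assuming @nestjs/microservices (the connection URL and AppModule are placeholders, and the exact option names vary by version):
// main.ts - enable Nest's shutdown hooks so SIGTERM triggers a graceful close
import { NestFactory } from '@nestjs/core';
import { MicroserviceOptions, Transport } from '@nestjs/microservices';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.createMicroservice<MicroserviceOptions>(AppModule, {
    transport: Transport.NATS,
    options: { servers: ['nats://nats:4222'] }, // placeholder; older versions use `url`
  });
  // Without this call Nest does not listen for termination signals at all.
  // With it, SIGTERM/SIGINT trigger app.close(), which closes the NATS
  // connection and runs any onApplicationShutdown hooks you define.
  app.enableShutdownHooks();
  await app.listen();
}
bootstrap();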

Related

How can I run a cli app in a pod inside a Kubernetes cluster?

I have a cli app written in NodeJS [not by me].
I want to deploy this on a k8s cluster like I have done many times with web servers.
I have not deployed something like this before, so I am at a bit of a loss.
I have worked with dockerized cli apps [like Terraform] before, and I know how to use them in a CI/CD pipeline.
But how should I deploy them in a pod so they are always available for usage from another app in the cluster?
Or is there a completely different approach that I need to consider?
#EDIT#
I am using this at the end of my Dockerfile:
# the main executable
ENTRYPOINT ["sleep", "infinity"]
# a default command
CMD ["mycli help"]
That way the pod does not restart, and the cli inside is waiting for commands like mycli do this.
Is this a hacky way that is frowned upon, or a legit solution?
Your edit is one solution. Another, if you do not want to or cannot change the Docker image, is to Define a Command for a Container that loops infinitely; this achieves the same as the Dockerfile ENTRYPOINT but without having to rebuild the image.
Here's an example of such an implementation:
apiVersion: v1
kind: Pod
metadata:
  name: command-demo
  labels:
    purpose: demonstrate-command
spec:
  containers:
  - name: command-demo-container
    image: debian
    command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
  restartPolicy: OnFailure
As for your question about whether this is a legit solution, that is hard to answer; I would say it depends on what your application is designed to do. Kubernetes Pods are designed to be ephemeral, so a good solution would be one that runs until the job is completed; for a web server, for example, the job is never completed because it should be constantly listening for requests.
If your pods are in the same cluster, they are already available to other pods through CoreDNS, an internal DNS service which allows you to access them by their internal DNS name, something like my-cli-app.my-namespace.svc.cluster.local. See DNS for Services and Pods. (A matching Service sketch follows the deployment below.)
You would then create a deployment file with all your apps. Note this doesn't need any ports exposed externally to work and doesn't involve communication through the internet.
#deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
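For the my-cli-app.my-namespace.svc.cluster.local style name mentioned above to resolve, you would also create a Service selecting those pods. A minimal sketch, where the name and namespace are placeholders taken from the DNS example and the selector must match the pod labels of your own deployment (app: nginx in the example above):
#service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-cli-app
  namespace: my-namespace
spec:
  selector:
    app: nginx        # must match the labels of the pods in your deployment
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80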

kubernetes - container busy with request, then request should route to another container

I am very new to k8s and Docker, but I have a task on k8s. Now I am stuck with a use case, which is:
if a container is busy with requests, then incoming requests should be redirected to another container.
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: twopoddeploy
  namespace: twopodns
spec:
  selector:
    matchLabels:
      app: twopod
  replicas: 1
  template:
    metadata:
      labels:
        app: twopod
    spec:
      containers:
      - name: secondcontainer
        image: "docker.io/tamilpugal/angmanualbuild:latest"
        env:
        - name: "PORT"
          value: "24244"
      - name: firstcontainer
        image: "docker.io/tamilpugal/angmanualbuild:latest"
        env:
        - name: "PORT"
          value: "24243"
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: twopodservice
spec:
  type: NodePort
  selector:
    app: twopod
  ports:
  - nodePort: 31024
    protocol: TCP
    port: 82
    targetPort: 24243
From deployment.yaml, I created a pod with two containers using the same image. The idea (and our use case) is that when firstcontainer is not reachable or busy, secondcontainer should handle the incoming requests. So (only to check our use case) I deleted firstcontainer using docker container rm -f id_of_firstcontainer. Now the site is not reachable until Docker recreates firstcontainer. But I need k8s to redirect the requests to secondcontainer instead of waiting for firstcontainer.
Then I googled for a solution and found Ingress and liveness/readiness probes. But Ingress routes requests based on the path rather than container status, and liveness/readiness probes only recreate the container. SO also has some related questions, and some use an nginx server, but no luck. That's why I created a new question.
So my question is: how do I configure the two containers to reduce the downtime?
What keywords should I google to find a solution to try myself?
Thanks,
Pugal.
A service can load-balance between multiple pods. You should delete the second copy of the container inside the deployment spec, but also change it to have replicas: 2. Now your deployment will launch two identical pods, but the service will match both of them, and requests will go to both.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: onepoddeploy
  namespace: twopodns
spec:
  selector: { ... }
  replicas: 2 # not 1
  template:
    metadata: { ... }
    spec:
      containers:
      - name: firstcontainer
        image: "docker.io/tamilpugal/angmanualbuild:latest"
        env:
        - name: "PORT"
          value: "24243"
      # no secondcontainer
This means if two pods aren't enough to handle your load, you can kubectl scale deployment -n twopodns onepoddeploy --replicas=3 to increase the replica count. If you can tell from CPU utilization or another metric when you're getting to "not enough", you can configure the horizontal pod autoscaler to adjust the replica count for you.
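For reference, a minimal HPA sketch for that deployment (autoscaling/v2 syntax; the CPU target is an assumption, pick whatever metric best reflects "busy" for your app):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: onepoddeploy
  namespace: twopodns
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: onepoddeploy
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add replicas when average CPU rises above 70%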
(You will usually want only one container per pod. There's no way to share traffic between containers in the way you're suggesting here, and it's helpful to be able to independently scale components. This is doubly true if you're looking at a stateful component like a database and a stateless component like an HTTP service. Typical uses for multiple containers are things like log forwarders and network proxies, that are somewhat secondary to the main operation of the pod, and can scale or be terminated along with the primary container.)
There are two important caveats to running multiple pod replicas behind a service. The service load balancer isn't especially clever, so if one of your replicas winds up working on intensive jobs and the other is more or less idle, they'll still each get about half the requests. Also, if your pods are configured for HTTP health checks (recommended) and a pod is backed up to the point where it can't handle requests, it will also not be able to answer its health checks, and Kubernetes will kill it off.
You can help Kubernetes here by trying hard to answer all HTTP requests "promptly" (aiming for under 1000 ms always is probably a good target). This can mean returning a "not ready yet" response to a request that triggers a large amount of computation. This can also mean rearranging your main request handler so that an HTTP request thread isn't tied up waiting for some task to complete.
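A sketch of what such an HTTP health check could look like in the container spec (the /healthz path is an assumption; use whatever endpoint your app actually exposes):
containers:
- name: firstcontainer
  image: "docker.io/tamilpugal/angmanualbuild:latest"
  readinessProbe:            # while this fails, the pod is removed from the Service endpoints
    httpGet:
      path: /healthz         # assumed path
      port: 24243
    periodSeconds: 5
    failureThreshold: 3
  livenessProbe:             # if this keeps failing, Kubernetes restarts the container
    httpGet:
      path: /healthz
      port: 24243
    periodSeconds: 10
    failureThreshold: 6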

Can I have a K8s pod per user/firm?

Is there a way we can have a K8s pod per user/per firm? I realise per-user/per-firm grouping mixes business-level semantics with infrastructure, but say I had this need, for regulatory reasons etc., to keep things separate. Is there then a way to create a pod on the fly when a user logs in for the first time, hold this pod reference, and route any further requests to the relevant pod, which will host a set of containers, each running an instance of one of the modules?
Is this even possible?
If possible, what are the identifiers that can be injected into the pod on the fly that I could use to identify that this is USER_A_POD vs USER_B_POD, or FIRM_A_POD vs FIRM_B_POD?
Effectively, I need a pod template that helps me create identical pods of 1 replica each, where the only difference is that each serves traffic related to one user/one firm only.
Generally, if you want to send traffic to a specific pod, say from a Kubernetes Service, you would use labels and selectors. For example, using the selector app: usera-app in the Service:
apiVersion: v1
kind: Service
metadata:
  name: usera-service
spec:
  selector:
    app: usera-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
Then, in the Deployment for your pods, use the label app: usera-app:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: usera-deployment
spec:
  selector:
    matchLabels:
      app: usera-app
  replicas: 2
  template:
    metadata:
      labels:
        app: usera-app
    spec:
      containers:
      - name: myservice
        image: nginx
        ports:
        - containerPort: 80
More info here
How you assign your pods and deployments is up to you and whatever configuration you may use. If you'd like to enforce certain labels on deployments/pods, you can take a look at MutatingAdmissionWebhooks.
If you are looking at projects to facilitate all this you can take a look at:
Gatekeeper, which is an implementation of the Open Policy Agent for Kubernetes admission (still in alpha as of this writing).
Other tools that can help you with attestation and admission mechanisms (they would have to be adapted for labels):
Kritis
Portieris
Yes, you can create a virtual cluster for each user with namespaces.
https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/
Namespaces are the way to divide a cluster between users.
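For example, a per-firm namespace is just (the name and label are placeholders):
apiVersion: v1
kind: Namespace
metadata:
  name: firm-a
  labels:
    firm: firm-a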

How to get size of NATS Streaming Queue?

Background: I'm going to be doing some experiments with OpenFaaS (running in Kubernetes) where I will be invoking several asynchronous execution requests. OpenFaaS uses NATS Streaming to queue these requests for asynchronous function execution.
What I need is a way to determine the size of this NATS Streaming queue so I can know how many items are in the queue. Is there a command to get the size or number of items in a NATS Streaming Queue? I searched Google and the NATS documentation and found nothing of use.
I did find the command kubectl logs deployment/queue-worker -n openfaas from here which displays the logs of the queue; however, this is not quite what I want (I want the number of items left in the queue, not the queue's full logs).
You can enable a monitoring endpoint in NATS Streaming to get a number of general endpoints to query, which can go down to a specific channel (an example query follows the manifests below).
You would then need to expose a Service for that endpoint in Kubernetes for external access, probably via an Ingress if you want more control over which endpoints and how they are exposed.
Have a look at the templates in a nats-streaming-ft helm chart.
Add the monitoring port to your container spec
spec:
  containers:
  - name: nats-streaming
    args:
    - /opt/nats-streaming-server
    - --http_port=8222
And add the chosen monitoring port to the ports list in the Service:
apiVersion: v1
kind: Service
metadata:
  name: nats-monitoring
  labels:
    app: nats
spec:
  selector:
    app: nats
  ports:
  - name: monitoring
    protocol: TCP
    port: 8222
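Once the monitoring port is reachable, the per-channel endpoint reports message counts. Something like the following, where the channel name is an assumption (OpenFaaS's async queue channel is typically faas-request):
curl "http://nats-monitoring:8222/streaming/channelsz?channel=faas-request&subs=1"
The msgs and last_seq fields, together with each subscription's last_sent and pending_count, should give you an idea of how much is left in the queue.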

Is there a way to downscale pods only when message is processed (the pod finished its task) with the HorizontalPodAutoscaler in Kubernetes?

I've set up the Kubernetes Horizontal Pod Autoscaler with custom metrics using the prometheus adapter https://github.com/DirectXMan12/k8s-prometheus-adapter. Prometheus is monitoring RabbitMQ, and I'm watching the rabbitmq_queue_messages metric. The messages from the queue are picked up by the pods, which then do some processing that can last for several hours.
Scale-up and scale-down are working based on the number of messages in the queue.
The problem:
When a pod finishes the processing and acks the message, that lowers the number of messages in the queue, which could trigger the Autoscaler to terminate a pod. If I have multiple pods doing the processing and one of them finishes, then, if I'm not mistaken, Kubernetes could terminate a pod that is still processing its own message. This wouldn't be desirable, as all the processing that the pod is doing would be lost.
Is there a way to overcome this, or another way this could be achieved?
Here is the Autoscaler configuration:
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: sample-app-rabbitmq
  namespace: monitoring
spec:
  scaleTargetRef:
    # you created above
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      target:
        kind: Service
        name: rabbitmq-cluster
      metricName: rabbitmq_queue_messages_ready
      targetValue: 5
You could consider an approach using a preStop hook.
As per the documentation (Container States, Define postStart and preStop handlers):
Before a container enters into Terminated, the preStop hook (if any) is executed.
So you can use in your deployment:
lifecycle:
  preStop:
    exec:
      command: ["your script"]
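For example, a sketch of a preStop hook that waits for the current message to finish. The /tmp/busy flag file is an assumption (something the worker would create while processing and remove after acking), and terminationGracePeriodSeconds has to cover the longest-running job:
spec:
  terminationGracePeriodSeconds: 7200   # must be at least as long as the longest job
  containers:
  - name: worker
    image: your-worker-image            # placeholder
    lifecycle:
      preStop:
        exec:
          # block until the assumed /tmp/busy flag file disappears,
          # i.e. the in-flight message has been processed and acked
          command: ["/bin/sh", "-c", "while [ -f /tmp/busy ]; do sleep 5; done"]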
Update:
I would like to provide more information based on some research:
There is an interesting project:
KEDA allows for fine grained autoscaling (including to/from zero) for event driven Kubernetes workloads. KEDA serves as a Kubernetes Metrics Server and allows users to define autoscaling rules using a dedicated Kubernetes custom resource definition.
KEDA can run on both the cloud and the edge, integrates natively with Kubernetes components such as the Horizontal Pod Autoscaler, and has no external dependencies.
Regarding the main concern, "Kubernetes could terminate a pod that is still doing the processing of its own message":
As per documentation:
"Deployment is a higher-level concept that manages ReplicaSets and provides declarative updates to Pods along with a lot of other useful features"
A Deployment is backed by a ReplicaSet. As per this controller code, there is a function "getPodsToDelete". In combination with "filteredPods", it gives the result: "This ensures that we delete pods in the earlier stages whenever possible."
So as proof of concept:
You can create a deployment with an init container. The init container should check whether there is a message in the queue and exit when at least one message appears. This allows the main container to start, take, and process that message. In this case we will have two kinds of pods: those which are processing a message and consuming CPU, and those which are starting up, idle and waiting for the next message. The starting pods will be deleted first when the HPA decides to decrease the number of replicas in the deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: complete
  name: complete
spec:
  replicas: 5
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: complete
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: complete
    spec:
      hostname: c1
      containers:
      - name: complete
        command:
        - "bash"
        args:
        - "-c"
        - "wa=$(shuf -i 15-30 -n 1)&& echo $wa && sleep $wa"
        image: ubuntu
        imagePullPolicy: IfNotPresent
        resources: {}
      initContainers:
      - name: wait-for
        image: ubuntu
        command: ['bash', '-c', 'sleep 30']
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
Hope this helps.
Horizontal Pod Autoscaler is not designed for long-running tasks, and will not be a good fit. If you need to spawn one long-running processing tasks per message, I'd take one of these two approaches:
Use a task queue such as Celery. It is designed to solve your exact problem: have a queue of tasks that needs to be distributed to workers, and ensure that the tasks run to completion. Kubernetes even provides an official example of this setup.
If you don't want to introduce another component such as Celery, you can spawn a Kubernetes job for every incoming message by yourself. Kubernetes will make sure that the job runs to completion at least once - reschedule the pod if it dies, etc. In this case you will need to write a script that reads RabbitMQ messages and creates jobs for them by yourself.
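A minimal sketch of such a per-message Job, where the image, args, and names are placeholders and your dispatcher script would create one of these for each RabbitMQ message it reads:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: process-message-   # unique name per message
spec:
  backoffLimit: 3                  # retry the pod a few times before giving up
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: your-worker-image               # placeholder
        args: ["--message-id", "some-id"]      # placeholder: hand the message to the worker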
In both cases, make sure you also have Cluster Autoscaler enabled so that new nodes get automatically provisioned if your current nodes are not sufficient to handle the load.