Gracefully draining sessions from Dropwizard application using Kubernetes - kubernetes

I have a Dropwizard application that holds short-lived sessions (phone calls) in memory. There are good reasons for this, and we are not changing this model in the near term.
We will be setting up Kubernetes soon and I'm wondering what the best approach would be for handling shutdowns / rolling updates. The process will need to look like this:
Remove the DW node from the load balancer so no new sessions can be started on this node.
Wait for all remaining sessions to be completed. This should take a few minutes.
Terminate the process / container.
It looks like Kubernetes can handle this if I make step 2 a "preStop hook":
http://kubernetes.io/docs/user-guide/pods/#termination-of-pods
My question is, what will the preStop hook actually look like? Should I set up a DW "task" (http://www.dropwizard.io/0.9.2/docs/manual/core.html#tasks) that just waits until all sessions are completed, and curl it from Kubernetes? Or should I put a bash script in the Docker container with the DW app that polls some sessionCount service until none are left, and execute that?

Assume you don't use the preStop hook, and a pod deletion request has been issued.
The API server processes the deletion request and modifies the pod object.
The endpoints controller observes the change and removes the pod from the list of endpoints.
On the node, a SIGTERM signal is sent to your container/process.
Your process should trap the signal and drain all in-flight requests. Note that this step must not take longer than the terminationGracePeriodSeconds defined in your pod spec.
Alternatively, you can use a preStop hook that blocks until all requests are drained. Most likely, you'll need a script to accomplish this.
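Such a preStop script could be a small polling loop. A minimal sketch, assuming the app exposes some way to print the live session count (the curl target in the comment below is a placeholder, not a real Dropwizard endpoint):

```shell
#!/bin/sh
# drain_wait <poll-interval-seconds> <cmd...>
# Polls <cmd...> (which must print the current session count) until it
# reports 0, then returns. Intended to run as the pod's preStop command.
drain_wait() {
  interval=$1; shift
  while :; do
    count=$("$@") || count=-1        # treat command failure as "not drained"
    [ "$count" = "0" ] && return 0   # all sessions finished; safe to stop
    sleep "$interval"
  done
}

# Example invocation (the admin URL and task name are assumptions):
# drain_wait 5 curl -fs http://localhost:8081/tasks/session-count
```

In the pod spec this would be wired up via `lifecycle.preStop.exec.command`, with `terminationGracePeriodSeconds` set comfortably above the longest expected drain time, since Kubernetes sends SIGKILL once the grace period expires regardless of whether the hook has finished.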

Related

K8s graceful upgrade of service with long-running connections

tl;dr: I have a server that handles WebSocket connections. The nature of the workload is that it is necessarily stateful (i.e., each connection has long-running state). Each connection can last ~20m-4h. Currently, I only deploy new revisions of this service at off hours to avoid interrupting users too much.
I'd like to move to a new model where deploys happen whenever, and the services gracefully drain connections over the course of ~30 minutes (typically the frontend can find a "good" time to make that switch over within 30 minutes, and if not, we just forcibly disconnect them). I can do that pretty easily with K8s by setting gracePeriodSeconds.
However, what's less clear is how to do rollouts such that new connections only go to the most recent deployment. Suppose I have five replicas running. Normal deploys have an undesirable mode where a client is on R1 (replica 1) and then K8s deploys R1' (upgraded version) and terminates R1; frontend then reconnects and gets routed to R2; R2 terminates, frontend reconnects, gets routed to R3.
Is there any easy way to ensure that after the upgrade starts, new clients get routed only to the upgraded versions? I'm already running Istio (though not using very many of its features), so I could imagine doing something complicated with some custom deployment infrastructure (currently just using Helm) that spins up a new deployment, cuts over new connections to the new deployment, and gracefully drains the old deployment... but I'd rather keep it simple (just Helm running in CI) if possible.
Any thoughts on this?
This is already how things work with normal Services. Once a pod is terminating, it has already been removed from the Endpoints. You'll probably need to set maxSurge in the Deployment's rolling update settings to 100%, so that it spawns all of the new pods at once and only then starts shutting down the old ones.
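A sketch of the relevant Deployment settings (values are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 100%       # bring up a full set of new pods at once
      maxUnavailable: 0    # don't kill old pods before new ones are Ready
  template:
    spec:
      terminationGracePeriodSeconds: 1800   # ~30 min to drain connections
```

With maxSurge at 100% and maxUnavailable at 0, the rollout first creates a complete new replica set; terminating old pods are removed from Endpoints immediately, so reconnecting clients land only on the new version while the old pods drain.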

Can I execute a prestop hook only before deletion?

My Go-based custom resource operator needs some cleanup operations before it is deleted. It has to delete a specific znode from the ZooKeeper.
These operations must not run when the resource is merely recreated; they should run only on the user's deletion command. Thus, I can't use an ordinary prestop-hook.
Can I execute a prestop hook only before deletion? Or is there any other way for the operator to execute cleanup logic before the resource is deleted?
Can I execute a prestop hook only before deletion?
This is the whole purpose of the preStop hook. A pre-stop hook is executed immediately before the container is terminated. Once a termination signal arrives from the API, the kubelet runs the pre-stop hook and only afterwards sends the SIGTERM signal to the process.
It was designed to perform arbitrary operations before shutdown without having to implement them in the application itself. This is especially useful if you run a third-party app whose code you can't modify.
Note that the call to terminate the pod and invoke the hook can be triggered by an API request, failed probes, resource contention, and other reasons.
For more reading please visit: Container Lifecycle Hooks
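As a concrete shape, a preStop hook in the pod spec looks like this (the image and cleanup script path are placeholders):

```yaml
containers:
  - name: operator
    image: example.com/my-operator:latest
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "/scripts/cleanup-znode.sh"]
```

Keep in mind that this hook fires on every termination (upgrade, eviction, probe failure), not only on user-initiated deletion, which is the limitation described in the question.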

How do you create a message queue service for the scope of a specific Kubernetes job

I have a parallel Kubernetes job with 1 pod per work item (I set parallelism to a fixed number in the job YAML).
All I really need is an ID per pod to know which work item to do, but Kubernetes doesn't support this yet (if there's a workaround I want to know).
Therefore I need a message queue to coordinate between pods. I've successfully followed the example in the Kubernetes documentation here: https://kubernetes.io/docs/tasks/job/coarse-parallel-processing-work-queue/
However, the example there creates a rabbit-mq service. I typically deploy my tasks as a job. I don't know how the lifecycle of a job compares with the lifecycle of a service.
It seems like that example is creating a permanent message queue service. But I only need the message queue to be in existence for the lifecycle of the job.
It's not clear to me if I need to use a service, or if I should be creating the rabbit-mq container as part of my job (and if so how that works with parallelism).

Which kubernetes mode to chose

I have a situation where each message in a message queue has to be processed by a separate instance (one pod can process one message at a time). Many messages can be processed at once, but there is a limit of parallel executions. Once it's reached, no new messages are being pulled from the queue. Message processing takes about 30 minutes. No state needs to be stored on the pods between calls (all data is read from a database when pod starts processing a message). A new message should spawn a new pod, once the processing finishes, the pod should die.
Should I use Deployments, ReplicaSets, StatefulSets, or Services? (We use Kubernetes on Azure.)
I've tried ReplicaSets, but in a situation where three messages are being processed and one finishes, scaling down a ReplicaSet can kill a working pod, which is definitely not what I need.
Since you don't need to keep state, you can rule out StatefulSets. A Deployment is a higher-level concept that manages ReplicaSets for you, so between the two you should use a Deployment. However, since your processing is on demand, I would instead consider Jobs: once a Job completes its task, it frees its resources and dies. This requires extra code in a helper to create the Jobs, but it can be very handy.
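A sketch of such a Job, as a helper might create it per message (all names and the image are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: process-message-   # helper creates one Job per message
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: example.com/worker:latest
          env:
            - name: MESSAGE_ID    # tells the worker which message to fetch
              value: "set-by-helper"
```

The cap on parallel executions then becomes a matter of the helper counting active Jobs before creating new ones.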

Blocking a Service Fabric service shutdown externally

I'm going to write a quick little SF service to report endpoints from a service to a load balancer. That part is easy and well understood. FabricClient. Find Services. Discover endpoints. Do stuff with load balancer API.
But I'd like to be able to deal with a graceful drain and shutdown situation. Basically, catch and block the shutdown of a SF service until after my app has had a chance to drain connections to it from the pool.
There's no API I can find to accomplish this. But I kinda bet one of the services would let me do this. Resource manager. Cluster manager. Whatever.
Is anybody familiar with how to pull this off?
From what I know, this isn't possible in the way you've described.
A Service Fabric service can be shut down for multiple reasons: re-balancing, errors, outages, upgrades, etc. Depending on the type of service (stateful or stateless) the shutdown routine differs slightly (see more), but in general, if the service replica is shut down gracefully, the OnCloseAsync method is invoked. Inside this method the replica can perform a safe cleanup. There is also a second case: when the replica is forcibly terminated, the OnAbort method is called instead, and the documentation makes no clear statements about the guarantees you have inside OnAbort.
Going back to your case I can suggest the following pattern:
When a replica is about to shut down, it calls lbservice from inside OnCloseAsync or OnAbort and reports that it is going down.
lbservice then reconfigures the load balancer to exclude this replica from request processing.
The replica completes all in-flight requests and shuts down.
Please note that you would also need a startup mechanism, i.e., when a replica starts, it reports to lbservice that it is active.
In the meantime, note that Service Fabric already implements these mechanics. Here is an example of how API Management can be used with Service Fabric, and here is an example of how the Reverse Proxy can be used to access Service Fabric services from the outside.
EDIT 2018-10-08
To receive notifications about service endpoint changes in general, you can try the FabricClient.ServiceManagementClient.ServiceNotificationFilterMatched event.
There is a similar situation solved in this question.