We have a Kubernetes service whose pods take some time to warm up on their first requests. Basically, the first incoming requests read some cached values from Redis and might take a bit longer to process. When these newly created pods become ready and receive full traffic, they can be fairly unresponsive for up to 30 seconds, until everything has been loaded from Redis and cached.
I know we should restructure the application to prevent this; unfortunately, that is not feasible in the near future (we are working on it).
It would be great if it were possible to reduce the weight of newly created pods, so that they receive 1/10 of the traffic at the beginning, with the weight increasing as time passes. This would also be useful for newly deployed versions of our application, to check whether they behave correctly.
Why do you need the cache loading to happen on the first call, instead of in a heartbeat/health check that is hooked to the readiness probe? Another option is to make use of init containers in Kubernetes.
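For the readiness-probe variant, a minimal sketch (the /ready path and port are hypothetical; the idea is that the endpoint only starts returning 200 once the values have been loaded from Redis, so the pod receives no traffic before that):

```yaml
containers:
- name: app
  image: my-service:latest   # hypothetical image
  readinessProbe:
    httpGet:
      path: /ready           # hypothetical endpoint reporting whether the cache is primed
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 5
```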
Until the application can be restructured to do this "priming" internally...
When running on Kubernetes, look into Container Lifecycle Hooks and specifically into the PostStart hook. Documentation here and example here.
It seems that the behavior of "...The Container's status is not set to RUNNING until the postStart handler completes" is what can help you.
There are a few gotchas, like "...there is no guarantee that the hook will execute before the container ENTRYPOINT" because "...The postStart handler runs asynchronously relative to the Container’s code", and "...No parameters are passed to the handler".
Perhaps a custom script can simulate that first request with some retry logic to wait for the application to be started?
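A minimal sketch of such a postStart hook, assuming the image contains curl and a shell, and that a request to a hypothetical /warmup path on port 8080 is enough to prime the cache:

```yaml
containers:
- name: app
  image: my-service:latest   # hypothetical image
  ports:
  - containerPort: 8080
  lifecycle:
    postStart:
      exec:
        command:
        - /bin/sh
        - -c
        # retry the priming request until the app answers (or ~30s have passed)
        - |
          for i in $(seq 1 30); do
            curl -sf http://localhost:8080/warmup && exit 0
            sleep 1
          done
          exit 1
```

Keep in mind that if the handler exits non-zero the kubelet kills the container, so the retry window should be generous.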
Once I register the CRD in the k8s cluster, I can use a .yaml to create a custom resource without the operator running. What happens to these created resources then?
I have seen the operator's Reconciler, but it's more like an asynchronous state transfer. When we create a pod, we can directly get the pod IP from the create result. But I couldn't find a place to write my OnCreate hook. (I only see validating webhooks, but never a hook that is called when the creation request is made, defines how to create the resource, and returns the created resource info to the caller.)
My scenario is that, for one kind of resource, all creations arriving within a time window should be multiplexed onto a single pod. Can you give me some advice?
That's a big topic for the Kubernetes CRD/controller life cycle; I'll try to give a simple representation.
After you register a new CRD and create a CR, kube-api-server does not care whether a related controller exists or not. See the process:
That means the resource (your CR) will be stored in etcd; it does not involve your controller at all.
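For example, the definitions below (the group and kind names are made up) are enough for kubectl apply and kubectl get to work, even though no controller is running:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com   # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        x-kubernetes-preserve-unknown-fields: true
---
# A CR of that kind; the api-server simply validates and stores it in etcd.
apiVersion: example.com/v1
kind: Widget
metadata:
  name: demo
spec:
  size: 1
```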
OK, let's talk about your controller. Your controller will set up a list/watch (actually a long-lived HTTP connection) to the api-server and register hooks (what you are asking for, right?) for the different events: onCreate, onUpdate and onDelete. Actually, you will handle all events in your controller's reconcile (remember the responsibility of a Kubernetes reconcile: move the current state to the desired state). See the diagram:
For the list/watch connection in your controller, you need to set up a separate connection for each kind of resource. For example, if you care about pod events, you set up a pod list/watch; if you care about deployments, you set up a deployment list/watch, and so on.
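As a rough illustration of what such a list/watch with event hooks looks like outside of an operator framework (sketched here with the Python Kubernetes client; a real controller would normally just enqueue the object's key for its reconcile loop instead of acting inside the handlers):

```python
from kubernetes import client, config, watch

# Assumes a local kubeconfig; inside a cluster you would use
# config.load_incluster_config() instead.
config.load_kube_config()
v1 = client.CoreV1Api()

# One list/watch per resource kind: this one watches Pods.
w = watch.Watch()
for event in w.stream(v1.list_pod_for_all_namespaces):
    pod = event["object"]
    # ADDED / MODIFIED / DELETED play the role of onCreate / onUpdate / onDelete.
    if event["type"] == "ADDED":
        print(f"created: {pod.metadata.namespace}/{pod.metadata.name}")
    elif event["type"] == "MODIFIED":
        print(f"updated: {pod.metadata.namespace}/{pod.metadata.name}")
    elif event["type"] == "DELETED":
        print(f"deleted: {pod.metadata.namespace}/{pod.metadata.name}")
```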
We have a service hook created for one of our projects in ADO. It was working fine until last weekend. Suddenly a few webhooks got stuck in the queued state, and I am not sure how to force them to be processed. Can someone tell me whether there is a way to force those items to be processed?
Thanks,
Venu
I am afraid that you cannot get what you want while the process is running.
While the process is running, the queued service hooks will not be picked up again and will not be processed again.
When the main thread, such as a work item, is running, you cannot forcibly intervene in or cancel the content that is already queued.
There is also a similar issue discussing this situation.
Waiting service hooks are also tied to your memory, because they actually run in memory. If there is occasional memory loss or some other problem during execution, there is no guarantee that all service hooks will be executed as expected.
Alternatively, you could interrupt the current process and reduce the number of service hooks for it, but that is not a good solution.
So the best approach would be a feature that can handle the queued service hooks during the process, but currently there is no such feature. Therefore, we recommend that you submit a suggestion ticket to the team to suggest that they add it.
I am implementing auto-scaling in an application using Axon Server, running in k8s.
I have created ReST endpoints in the application itself, which look at the local configuration (for processors and thread counts) and then speak to the Axon Server ReST API in order to split/merge the processors appropriately. The intent is to use container lifecycle hooks to trigger them.
As a result, if a new instance (pod) of an application is launched, configured for 2 threads on ProcessorA, then my code will make 2 requests to the /v1/components/blah/processors/ProcessorA/segments/split?context=default endpoint on the server. This is in order to make full use of the 2 new threads.
Likewise, when the pod is shut down, it makes 2 similar requests to the merge endpoint on the server.
When scaling up I see the processor split twice, as expected. However, on shutdown I don't see the merge twice unless I put a long (5s) wait between requests. This isn't likely to be particularly stable, so I'm wondering if there's something else I need to be doing.
Perhaps I ought to request the merge, then loop waiting for it to occur, then request another. This seems like it's going to be excessively slow.
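For what it's worth, a rough sketch of that "merge, then wait for it to take effect" idea could look like the following. check_segment_count is a hypothetical helper standing in for however you read the processor status back from Axon Server, and the server address and HTTP verb should match whatever your existing split/merge calls already use:

```python
import time
import requests

AXON = "http://axonserver:8024"  # hypothetical Axon Server address
# Merge counterpart of the split endpoint mentioned above.
MERGE_URL = f"{AXON}/v1/components/blah/processors/ProcessorA/segments/merge?context=default"


def check_segment_count() -> int:
    """Hypothetical helper: read the current number of segments for ProcessorA
    back from Axon Server (e.g. via its processor status API)."""
    raise NotImplementedError


def merge_segments(times: int, timeout_s: float = 30.0) -> None:
    for _ in range(times):
        before = check_segment_count()
        # Use the same HTTP verb as your existing split/merge requests.
        requests.patch(MERGE_URL).raise_for_status()
        # Wait until the merge is actually visible before requesting the next
        # one, instead of a fixed 5-second sleep between requests.
        deadline = time.time() + timeout_s
        while check_segment_count() >= before:
            if time.time() > deadline:
                raise TimeoutError("merge did not complete in time")
            time.sleep(0.5)
```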
There was another question on SO somewhat related, Automatically scale Axon's tracking event processors, where Steven commented that there was no inbuilt auto-scaling in Axon Server at that point in time. I've not seen anything in more recent times either.
As it stands, work is underway to improve the split/merge functionality. For one, the result of a split/merge will be returned, which has been resolved under issue #1001.
This should make it so you do not have to wait for the statuses to have been updated, which is the likely cause of why it (seems to) take long. This functionality will be part of Axon Framework / Server 4.4, by the way, which should be released relatively soon.
Subsequently, discussions are still underway to allow for auto-scaling. One requirement deemed important is the capability of a TrackingEventProcessor to process several segments per thread (issue #1434). This will ensure that the TEP can take over several segments to transition the boundary when scaling, for example.
Eventually though, Axon Server should be able to do this for you. It's just not there yet.
So for now I think the most pragmatic solution is indeed to wait for the result to show up in the statuses. As said, I trust 4.4 will improve upon this by returning the result of the split/merge operation once called. Lastly, the Axon team is aware this can be improved upon further, hence why discussions on the matter are underway.
I want to set up an ECS task to schedule various other application tasks.
The "tasks" this task will schedule will mostly involve calling restful endpoints in another load balanced service.
I know there are other ways to do this, using cloudwatch to trigger a lambda etc. However this seems overly complex for what I need.
I was planning to just make a very simple, lightweight Alpine-based image with a crontab to do the triggering of the RESTful calls.
This all seems easy enough. The only concern I have is that I would want to prevent, as far as possible, having multiple instances of this task running, even if only for a short period of time.
If my CI/CD pipeline triggers an update to this cron task, then there may be a short period of time where the old and new tasks are running simultaneously.
There may therefore be a small chance that a cron task could be triggered twice.
What I would like to do, is to have ECS stop the currently running task completely, before attempting to start the new one.
This seems to be contrary to the normal way it wants to work, where it will ensure the new task is up, and healthy before stopping the old one.
Is this possible, and if so, how do I configure it?
It's not a problem if my crons don't run for a period of time, but it could be a problem if any get triggered more than once.
Instead of using an ECS service (which makes sure a particular number of tasks is always running, and deploys via a rolling or blue/green strategy, which is not what you desire), how about using the StopTask and RunTask APIs to control when a task is stopped and started? That gives you complete control.
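A rough sketch of that with boto3 (the cluster and task definition family names are made up, and error handling is omitted):

```python
import boto3

ecs = boto3.client("ecs")
CLUSTER = "my-cluster"   # hypothetical cluster name
FAMILY = "cron-runner"   # hypothetical task definition family


def replace_cron_task() -> None:
    # Stop any currently running instance of the task first...
    running = ecs.list_tasks(cluster=CLUSTER, family=FAMILY, desiredStatus="RUNNING")
    for arn in running["taskArns"]:
        ecs.stop_task(cluster=CLUSTER, task=arn, reason="replaced by new revision")

    # ...and only then start the new one, so the two never overlap.
    ecs.run_task(
        cluster=CLUSTER,
        taskDefinition=FAMILY,  # latest ACTIVE revision of the family
        count=1,
        # add launchType/networkConfiguration here as your cluster requires
    )
```

Note that StopTask only initiates the stop (SIGTERM, then SIGKILL after the stop timeout), so if overlap really matters you would also poll DescribeTasks until the old task reports STOPPED before calling RunTask.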
Instead of using scheduled tasks, you could create an ECS service and use scheduled scaling to scale the desired service count to 1 and back down to zero.
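A sketch of that approach with boto3 and Application Auto Scaling (the cluster/service names and cron expressions are made up):

```python
import boto3

aas = boto3.client("application-autoscaling")
RESOURCE_ID = "service/my-cluster/cron-runner"  # hypothetical cluster/service

# Let Application Auto Scaling manage the service's desired count between 0 and 1.
aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=RESOURCE_ID,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=0,
    MaxCapacity=1,
)

# Scale up to exactly one task at 02:00 UTC...
aas.put_scheduled_action(
    ServiceNamespace="ecs",
    ResourceId=RESOURCE_ID,
    ScalableDimension="ecs:service:DesiredCount",
    ScheduledActionName="scale-up",
    Schedule="cron(0 2 * * ? *)",
    ScalableTargetAction={"MinCapacity": 1, "MaxCapacity": 1},
)

# ...and back down to zero once the work should be done.
aas.put_scheduled_action(
    ServiceNamespace="ecs",
    ResourceId=RESOURCE_ID,
    ScalableDimension="ecs:service:DesiredCount",
    ScheduledActionName="scale-down",
    Schedule="cron(30 2 * * ? *)",
    ScalableTargetAction={"MinCapacity": 0, "MaxCapacity": 0},
)
```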
I need a Marathon app (Docker container in this case) that is created on demand, then goes away (normal SIGTERM) after a configurable amount of time.
What is the best way to implement this? With a health check hack?
I initially turned to the API protos, but found nothing obvious there.