How to tune Netflix Eureka self-preservation to handle autoscaling? - netflix-eureka

The self-preservation feature, under which instances never expire, does not look friendly to cluster auto-scaling.
When we scale down our services after load drops, those shut-down instances can trigger self-preservation.
As I understand it, self-preservation is meant to tolerate short-term network issues. But there are already settings that allow us to tune the tolerance window:
eureka.instance.lease-expiration-duration-in-seconds = 90
eureka.instance.lease-renewal-interval-in-seconds = 30
I have seen advice not to turn self-preservation off, but it seems to bring more pain than gain. Am I missing something?

First, you need to distinguish between a normal shutdown and an unclean termination of a Eureka client. Self-preservation mode only cares about unclean termination.
Namely, when you scale down your servers, if your applications shut down normally (unregister), self-preservation mode will not be activated.
If you're using a Spring Cloud based Eureka client, this normal shutdown is performed when the application shuts down. The problem is that some Spring Cloud releases have had issues sending the shutdown (Eureka unregister) message. So if you want to be sure, send an unregister request for each scaled-down instance to the Eureka server via its REST API right after scaling down.
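For reference, the unregister ("cancel") operation is a plain HTTP DELETE against the registry (a sketch; the host, port and IDs below are placeholders, and on a standalone Netflix Eureka server the path is usually prefixed with /eureka/v2 instead of /eureka):

DELETE http://<eureka-host>:8761/eureka/apps/<APP-NAME>/<instance-id>

A 200 response means the instance was removed from the registry, so its missing heartbeats no longer count against the renewal threshold.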
Another possible approach is simply to decrease the threshold for self-preservation:
eureka:
  server:
    renewal-percent-threshold: 0.50
One more thing: you need to be careful when changing the eureka.instance.leaseRenewalIntervalInSeconds value. The original Eureka server source code assumes this value is 30 seconds when it calculates the threshold for self-preservation mode. I'm not sure whether this hard-coded assumption still exists in the latest Spring Cloud release, so double-check.
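As an illustration of why this matters (numbers assumed, and based on the classic hard-coded behaviour, so verify against your version): with 10 registered instances the server expects 10 × 2 = 20 renewals per minute, because it assumes 2 heartbeats per instance per minute (the 30-second interval). With the default renewal-percent-threshold of 0.85, the self-preservation threshold is 20 × 0.85 = 17. If you raise eureka.instance.leaseRenewalIntervalInSeconds to 60, each instance now sends only 1 heartbeat per minute, i.e. 10 in total, which is already below the threshold of 17, so self-preservation can switch on even though every instance is perfectly healthy.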

Related

Request buffering in Kubernetes clusters

This is a purely theoretical question. A standard Kubernetes cluster is given, with autoscaling in place. If memory goes above a certain targetMemUtilizationPercentage, then a new pod is started and it takes on the flow of requests coming to the contained service. The number of minReplicas is set to 1 and the number of maxReplicas is set to 5.
What happens when the number of pods that are online reaches the maximum (5 in our case) and requests from clients are still coming towards the node? Are these requests buffered somewhere or are they discarded? Can I take any actions to avoid request loss?
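For concreteness, the setup described above corresponds roughly to an autoscaler like the following (just a sketch; the names and the 80% target are assumptions, and the field names follow the autoscaling/v2 API):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service              # hypothetical Deployment backing the service
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80    # the targetMemUtilizationPercentage from the question (value assumed)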
Kubernetes does not natively support message-queue buffering. Depending on the scenario and setup you use, your requests will most likely time out. To manage them efficiently you'll need a custom resource running inside the Kubernetes cluster.
In such situations it is very common to use a message broker, which ensures that communication between microservices is reliable and stable, that messages are managed and monitored within the system, and that messages don't get lost.
RabbitMQ, Kafka and Redis appear to be the most popular, but choosing the right one will depend heavily on your requirements and the features you need.
It is also worth noting that, since Kubernetes essentially runs on Linux, Linux itself manages/limits the requests coming in on a socket. You may want to read more about it here.
Another thing: if you have pod limits set or lack resources, it is likely that pods will be restarted or the cluster will become unstable. Usually you can prevent this by configuring some kind of "circuit breaker" to limit the number of requests that can reach the backend without overloading it. If the number of requests goes beyond the circuit breaker's threshold, the excess requests will be dropped.
It is better to drop some requests than to have a cascading failure.
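As a sketch of that idea, here is what request limiting could look like at the ingress layer, assuming the community NGINX ingress controller (the limit values, host and service names are made up); requests beyond the limits are rejected instead of piling up on the backend:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service-ingress                       # hypothetical name
  annotations:
    # Assumed limits: at most 50 requests/second and 20 concurrent
    # connections per client IP; excess requests are rejected by NGINX.
    nginx.ingress.kubernetes.io/limit-rps: "50"
    nginx.ingress.kubernetes.io/limit-connections: "20"
spec:
  rules:
  - host: my-service.example.com                 # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service                     # hypothetical backend Service
            port:
              number: 80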
I managed to test this scenario, and I got 503 Service Unavailable and 403 Forbidden on the requests that did not get processed.
Knative Serving actually does exactly this. https://github.com/knative/serving/
It buffers requests and informs autoscaling decisions based on in-flight request counts. It can also enforce a per-Pod maximum of in-flight requests and hold on to requests until newly scaled-up Pods come up, at which point Knative proxies the requests to them; it does this with a container named queue-proxy that runs as a sidecar to its workload type called "Service".
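For illustration, that per-Pod in-flight limit is configured on the Knative "Service" itself; a minimal sketch (the name, image and the limit of 10 are assumptions) could look like this:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service                       # hypothetical name
spec:
  template:
    metadata:
      annotations:
        # Soft concurrency target used by the Knative autoscaler (assumed value)
        autoscaling.knative.dev/target: "10"
    spec:
      containerConcurrency: 10           # hard cap on in-flight requests per Pod
      containers:
      - image: example.com/my-image      # hypothetical container image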

Behaviour when reducing instances of a Bluemix application

I have an orchestrator service which keeps track of the instances that are running and what request they are currently dealing with. If a new instance is required, I make a REST call to increase the instances and wait for the new instance to connect to the orchestrator. It's one request per instance.
The orchestrator tracks whether an instance is doing anything and knows which instances can be stopped; however, there is nothing in the API that allows me to reduce the number of instances by stopping a particular instance, which is what I am trying to achieve.
Is there anything I can do to manipulate the platform into deterministically stopping the instances that I want to stop? Perhaps by having long running HTTP requests to the instances I require and killing the request when it's no longer required, then making the API call to reduce the number of instances?
Part of the issue here is that I don't know the specifics of the current behavior...
Assuming you're talking about CloudFoundry/Instant Runtime applications, all of the instances of an application run behind a load balancer which uses round-robin to distribute requests across the instances (unless you have a session-affinity cookie set up). Differentiating between individual instances for incoming requests or manual scaling is not recommended and is an anti-pattern. You cannot control which instance the scale-down task will choose.
If you really want that level of control over each instance, maybe you should deploy them as separate applications: MyApp1, MyApp2, MyApp3, etc. All of your applications can have the same route (myapp.mybluemix.net). Each application can then identify itself by its name (via VCAP_APPLICATION), allowing you to terminate the one you want.
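A sketch of what that could look like in a Cloud Foundry manifest (the layout is an assumption and the exact route syntax depends on your cf CLI / manifest schema version; the app names and route are the ones from above):

applications:
- name: MyApp1
  routes:
  - route: myapp.mybluemix.net
- name: MyApp2
  routes:
  - route: myapp.mybluemix.net
- name: MyApp3
  routes:
  - route: myapp.mybluemix.net

Incoming requests to myapp.mybluemix.net are then balanced across all three applications, while each one can still be stopped or scaled individually by name.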

What do 'Renews' and 'Renews threshold' mean in Eureka

I'm new to Eureka and I see this information on the home page of my Eureka server (localhost:8761/). I didn't find any explanation in the official docs about 'Renews' and 'Renews threshold'. Could anyone please explain these terms? Thanks!
Hope it helps:
Renews: total number of heartbeats the server received from clients
Renews threshold: the minimum number of renewals per minute the server expects to receive. It controls the "self-preservation mode" of Eureka: if "Renews" falls below "Renews threshold", self-preservation mode is turned on.
self-preservation mode:
When the Eureka server comes up, it tries to get all of the instance registry information from a neighboring node. If there is a problem getting the information from a node, the server tries all of the peers before it gives up. If the server is able to successfully get all of the instances, it sets the renewal threshold that it should be receiving based on that information. If at any time the renewals fall below the configured percentage (below 85% within 15 minutes), the server stops expiring instances to protect the current instance registry information.
In Netflix, the above safeguard is called self-preservation mode and is primarily used as a protection in scenarios where there is a network partition between a group of clients and the Eureka server. In these scenarios, the server tries to protect the information it already has. In the case of a mass outage, this may cause clients to get instances that do not exist anymore. The clients must make sure they are resilient to the Eureka server returning an instance that is non-existent or unresponsive. The best protection in these scenarios is to time out quickly and try other servers.
For more details, please refer to the Eureka wiki.
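As a concrete (made-up) example of reading those two numbers together: if 5 client instances each send a heartbeat every 30 seconds, the server expects roughly 10 renewals per minute, so with the default factor of 0.85 the dashboard would show a "Renews threshold" of about 8. If "Renews (last min)" then drops to, say, 6 because several clients died without unregistering, the server enters self-preservation mode and stops evicting stale instances. (The exact numbers can differ slightly depending on your version and whether the server counts its own registration.)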

Application startup and shutdown based on authenticated user activity

There are applications and services in enterprises that do not need to run all the time and that have a limited user base (say a handful of people).
These applications can be shut down and started either based on scheduling or even better user activity. So, we are talking about on-demand service (say wrapped by a container) and node start-up and shut down.
Now, first I should mention that the reason I mention authenticated user activity is that it makes sense to start up and shut down on that basis (i.e. not based on lower-level network traffic). One can imagine corporate SSO (say, OAuth 2 based) being involved.
So, my question is whether anyone has attempted to implement what I have described using Consul or Kubernetes?
In the case of Consul, it could be that the key-value store could be used to give "Micro" (i.e. small user base) class applications a TTL; each time an authenticated user requests access to a given "Micro" class application, its TTL is updated. During the TTL window we want to check the health of the node(s), containers and services - outside of the window we don't (since we want to save on op ex).
This question is similar to this autoscaling question, however different in the sense that this use case is about scaling from 0 nodes and then down to 0 based on an authenticated user base (most likely using SSO).
In the case of Kubernetes, the Horizontal Pod Autoscaling documentation lists the exact use case described under Next steps (i.e. the feature is on the backlog and may be implemented after v1.1 of Kubernetes). The cited feature description (Unidling proposal) is as follows:
Scale the number of pods starting from 0. All pods can be turned-off, and then turned-on when there is a demand for them. When a request to service with no pods arrives, kube-proxy will generate an event for autoscaler to create a new pod.
So basically, it may be possible to do what I've described in the future using Kubernetes, but it is not possible right now. This in itself does not address the requirement to only scale from 0 based on authenticated user activity.
As a cluster-agnostic aside, it's also worth noting on-demand container activation based on systemd. That approach will of course not scale back down to 0 without a controlling process, but it is still worth a look.

Microservice, amqp and service registry / discovery

I'm studying microservices architecture and I'm wondering about something.
I'm quite okay with using (backend) service discovery to make requests to REST-based microservices. I need to know where the service is (or at least the front of the server cluster) to make requests, so it makes sense to be able to discover an ip:port in that case.
But I was wondering what the aim of using a service registry / discovery could be when dealing with AMQP only (without any HTTP calls)?
I mean, using AMQP is just like saying "I need this, and I expect somebody to answer me"; I don't have to know which server sent me back the response.
So what is the aim of using a service registry / discovery with AMQP-based microservices?
Thanks for your help
AMQP (any MOM, actually) provides a way for processes to communicate without having to mind actual IP addresses, communication security, routing, and other concerns. That does not necessarily mean that a process can trust, or even has any information about, the processes it communicates with.
Message queues solve half of the problem: how to reach the remote service. But they do not solve the other half: which service is the right one for me. In other words, which service:
has the resources I need
can be trusted (is hosted on a reliable server, has a satisfactory service implementation, is located in a country where the local laws are compatible with your requirements, etc)
charges what you want to pay (although people rarely discuss cost when it comes to microservices)
will be there during the whole time window needed to process your service -- keep in mind that servers are becoming more and more volatile. Some servers are actually containers that can last for a couple minutes.
Those two problems are largely independent. To solve the second kind of problem, you have resource brokers in grid computing. There is also resource allocation, to make sure the last item above is correctly managed.
There are alternative strategies, such as multicasting the intention to use a service and waiting for replies with offers. You may have a reverse auction in such a case, for instance.
In short, the rule of thumb is that if you do not have a priori knowledge about which service you are going to use (hardcoded or in some configuration file), your agent will have to negotiate, which includes dynamic service discovery.