Kube-proxy interaction with Kubernetes master api: config-sync-period - kubernetes

The kube-proxy gets the Services and Endpoints information from the master api, but how?
According to these links:
http://kubernetes.io/docs/user-guide/services/#proxy-mode-iptables
https://github.com/kubernetes/kubernetes/blob/ee2a0694b649941fc0c3be606746db041b75b91d/cmd/kube-proxy/app/server.go
The proxy seems to be a watcher of the master api, so the update of the proxy information is immediate.
But then, what is the proxy's config-sync-period parameter (How often configuration from the apiserver is refreshed. Must be greater than 0.), which defaults to 15 minutes?
What configuration is refreshed?

The sync period is how often we force-refresh the whole state, rather than just applying incremental deltas. This is a safeguard against potential bugs that might cause the synchronized state to drift.
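To illustrate the distinction, here is a minimal Python sketch of a local cache that applies watch-style deltas and also accepts a periodic full resync. It is a toy model, not kube-proxy's actual code: the ProxyCache class, its method names, and the sample services are all hypothetical.

```python
import copy


class ProxyCache:
    """Toy local view of Services (hypothetical, not kube-proxy code)."""

    def __init__(self):
        self.services = {}

    def apply_delta(self, event_type, name, spec=None):
        # Incremental path: each watch event changes exactly one entry.
        if event_type in ("ADDED", "MODIFIED"):
            self.services[name] = spec
        elif event_type == "DELETED":
            self.services.pop(name, None)

    def full_resync(self, authoritative_state):
        # Safety-net path: replace the whole local view with a fresh listing.
        drifted = self.services != authoritative_state
        self.services = copy.deepcopy(authoritative_state)
        return drifted


# Stand-in for what the API server holds.
server_state = {"web": {"port": 80}, "db": {"port": 5432}}
cache = ProxyCache()
cache.full_resync(server_state)

# Normal case: the change arrives as a delta almost immediately.
server_state["web"] = {"port": 8080}
cache.apply_delta("MODIFIED", "web", {"port": 8080})

# Simulated bug: "db" is deleted on the server but the event is never applied.
del server_state["db"]

# The periodic resync repairs the drift that the deltas missed.
drift_detected = cache.full_resync(server_state)
```

In this model the deltas keep the cache current between resyncs, and the resync only matters when something has gone wrong.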

Related

GridGain server deployment/Statefulset Termination grace period

I deployed a GridGain cluster on Google Kubernetes Engine following [1]. I enabled native persistence using a StatefulSet. In the statefulset.yaml in [2], terminationGracePeriodSeconds is set to 60000. What is the purpose of this large timeout?
Deleting a pod with the kubectl delete pod command takes a very long time. What is a suitable value for terminationGracePeriodSeconds that does not risk losing any data?
[1]. https://www.gridgain.com/docs/latest/installation-guide/kubernetes/gke-deployment
[2]. https://www.gridgain.com/docs/latest/installation-guide/kubernetes/gke-deployment#creating-pod-configuration
I believe it was set to 60000 so that shutdown never depends on this timeout. Prior to Ignite 2.9 there was an issue with the startup script: it didn't pass system signals through to the underlying Java app, making it impossible to perform a graceful shutdown.
If a node is restarted gracefully and IGNITE_WAIT_FOR_BACKUPS_ON_SHUTDOWN is enabled, Ignite ensures that the node leaving won't lead to data loss. A rebalance can sometimes take a while.
With that in mind: the hang can occur on Apache Ignite 2.8 and below. Otherwise, keeping the recommended terminationGracePeriodSeconds should be fine, because in a normal flow the full timeout is never reached.

Kubernetes etcd HighNumberOfFailedHTTPRequests QGET

I run a Kubernetes cluster on AWS, CoreOS-stable-1745.6.0-hvm (ami-401f5e38), all deployed by kops 1.9.1 / terraform.
etcd_version = "3.2.17"
k8s_version = "1.10.2"
This Prometheus alert method=QGET alertname=HighNumberOfFailedHTTPRequests is coming from coreos kube-prometheus monitoring bundle. The alert started to fire from the very beginning of the cluster lifetime and now exists for ~3 weeks without visible impact.
QGET fails for 33% of requests.
NOTE: I have a second cluster in another region, built from scratch on the same versions, and it shows exactly the same behavior. So it's reproducible.
Does anyone know what the root cause might be, and what the impact is if I keep ignoring it?
EDIT:
Later I found this GH issue which describes my case precisely: https://github.com/coreos/etcd/issues/9596
From CoreOS documentation:
For alerts to not appear on arbitrary events it is typically better not to alert directly on a raw value that was sampled, but rather by aggregating and defining a relative threshold rather than a hardcoded value. For example: send a warning if 1% of the HTTP requests fail, instead of sending a warning if 300 requests failed within the last five minutes. A static value would also require a change whenever your traffic volume changes.
Here you can find detailed information on how to Develop Prometheus alerts for etcd.
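The difference between the two alerting styles in the quoted guidance can be sketched in a few lines of Python. This is only an illustration of the idea, not a Prometheus rule: the function names and the sample traffic figures are made up, while the 300-request and 1% thresholds come from the quote.

```python
def static_alert(failed, max_failed=300):
    # Fires on a raw count, regardless of how much traffic there was.
    return failed > max_failed


def relative_alert(failed, total, max_ratio=0.01):
    # Fires on the share of failed requests, so it scales with traffic.
    if total == 0:
        return False
    return failed / total > max_ratio


# Hypothetical five-minute windows.
quiet = {"failed": 50, "total": 1_000}        # 5% of requests fail
busy = {"failed": 400, "total": 1_000_000}    # 0.04% of requests fail

quiet_static = static_alert(quiet["failed"])
quiet_relative = relative_alert(quiet["failed"], quiet["total"])
busy_static = static_alert(busy["failed"])
busy_relative = relative_alert(busy["failed"], busy["total"])
```

The static rule stays silent on the quiet but unhealthy window and fires on the busy but healthy one; the relative rule does the opposite.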
I got the explanation in the GitHub issue thread.
The HTTP metrics/alerts should be replaced with gRPC ones.

How to use the Python Kubernetes client in a way resilient to GKE Kubernetes Master disruptions?

We sometimes use Python scripts to spin up and monitor Kubernetes Pods running on Google Kubernetes Engine using the Official Python client library for kubernetes. We also enable auto-scaling on several of our node pools.
According to this, "Master VM is automatically scaled, upgraded, backed up and secured". The post also seems to indicate that some automatic scaling of the control plane / Master VM occurs when the node count increases from 0-5 to 6+ and potentially at other times when more nodes are added.
It seems like the control plane can go down at times like this, when many nodes have been brought up. In and around when this happens, our Python scripts that monitor pods via the control plane often crash, seemingly unable to find the KubeApi/Control Plane endpoint triggering some of the following exceptions:
ApiException, urllib3.exceptions.NewConnectionError, urllib3.exceptions.MaxRetryError.
What's the best way to handle this situation? Are there any properties of the autoscaling events that might be helpful?
To clarify what we're doing with the Python client: we read the status of the pod of interest via read_namespaced_pod in a loop every few minutes, catching exceptions as in the provided example (we have also tried catching the exceptions from the underlying urllib calls). We have also added retrying with exponential back-off, but things fail to recover and give up after the specified maximum number of retries, even if that number is high (e.g. retrying for more than 5 minutes).
One thing we haven't tried is recreating the kubernetes.client.CoreV1Api object on each retry. Would that make much of a difference?
When a nodepool size changes, depending on the size, this can initiate a change in the size of the master. Here are the nodepool sizes mapped with the master sizes. In the case where the nodepool size requires a larger master, automatic scaling of the master is initiated on GCP. During this process, the master will be unavailable for approximately 1-5 minutes. Please note that these events are not available in Stackdriver Logging.
At this point all API calls to the master will fail, including the ones from the Python API client and kubectl. After 1-5 minutes, however, the master should be available again and calls from both the client and kubectl should work. I tested this by scaling my cluster from 3 nodes to 20 nodes, and the master was unavailable for 1-5 minutes.
I obtained the following errors from the Python API client:
Max retries exceeded with url: /api/v1/pods?watch=False (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at>: Failed to establish a new connection: [Errno 111] Connection refused',))
With kubectl I got:
"Unable to connect to the server: dial tcp"
After 1-5 minutes the master was available and the calls were successful. There was no need to recreate kubernetes.client.CoreV1Api object as this is just an API endpoint.
According to your description, your master was still not accessible after 5 minutes, which signals a potential issue with your master or with the setup of the Python script. To troubleshoot this further, you can check the availability of the master by running any kubectl command on the side while your Python script runs.
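Since the outage is expected to last up to about five minutes, the retry window has to be longer than that. Below is a minimal sketch of a time-bounded exponential back-off helper in plain Python. The helper name call_with_backoff, its parameters, and the fake flaky_read function are hypothetical; the demo injects a recording function in place of a real sleep so it runs instantly.

```python
import time


def call_with_backoff(fn, retriable, max_wait=600.0, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep, clock=time.monotonic):
    """Call fn() until it succeeds or roughly max_wait seconds have passed."""
    deadline = clock() + max_wait
    delay = base_delay
    while True:
        try:
            return fn()
        except retriable:
            # Give up only when the next wait would overrun the time budget.
            if clock() + delay > deadline:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the growth of the delay


# Demo with a stand-in for the API call: it fails three times, then succeeds.
attempts = {"count": 0}


def flaky_read():
    attempts["count"] += 1
    if attempts["count"] < 4:
        raise ConnectionError("connection refused")
    return "Running"


waits = []  # records the requested delays instead of actually sleeping
status = call_with_backoff(flaky_read, (ConnectionError,), sleep=waits.append)
```

With the official client, fn could be a lambda wrapping read_namespaced_pod, and retriable could be a tuple of the exceptions listed in the question. Bounding the retries by elapsed time rather than by attempt count makes it easier to guarantee the loop outlasts a 1-5 minute master resize.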

What happens when Eureka instance skips a heartbeat against a Eureka server with self preservation turned off?

Consider this set-up:
Eureka server with self preservation mode disabled i.e. enableSelfPreservation: false
2 Eureka instances each for 2 services (say service#1 and service#2). Total 4 instances.
And one of the instances (say srv#1inst#1, an instance of service#1) sent a heartbeat, but it did not reach the Eureka server.
AFAIK, following actions take place in sequence on Server side:
ServerStep1: Server observes that a particular instance has missed a heartbeat.
ServerStep2: Server marks the instance for eviction.
ServerStep3: Server's eviction scheduler (which runs periodically) evicts the instance from registry.
Now on instance (srv#1inst#1) side:
InstanceStep1: It skips a heartbeat.
InstanceStep2: It realizes heartbeat did not reach Eureka Server. It retries with exponential back-off.
AFAIK, eviction and registration do not happen immediately. The Eureka server periodically runs a separate scheduler for each task.
I have some questions related to this process:
Are the sequences correct? If not, what did I miss?
Is the assumption about eviction and registration scheduler correct?
An instance of service#2 requests fresh registry copy from server right after ServerStep2.
Will srv#1inst#1 be in the fresh registry copy, because it has not been evicted yet?
If yes, will srv#1inst#1 be marked UP or DOWN?
The retry request from InstanceStep2 of srv#1inst#1 reaches server right after ServerStep2.
Will there be an immediate change in registry?
How will that affect the response to the request for a fresh registry from the instance of service#2? How will it affect the eviction scheduler?
This question was answered by qiangdavidliu in one of the issues of eureka's GitHub repository.
I'm adding his explanations here for the sake of completeness.
Before I answer the questions specifically, here's some high level information regarding heartbeats and evictions (based on default configs):
instances are only evicted if they miss 3 consecutive heartbeats
(most) heartbeats do not retry, they are best effort every 30s. The only time a heartbeat will retry is if there is a thread-level error on the heartbeating thread (i.e. Timeout or RejectedExecution), but this should be very rare.
Let me try to answer your questions:
Are the sequences correct? If not, what did I miss?
A: The sequences are correct, with the above clarifications.
Is the assumption about eviction and registration scheduler correct?
A: The eviction is handled by an internal scheduler. The registration is processed by the handler thread for the registration request.
An instance of service#2 requests fresh registry copy from server right after ServerStep2.
Will srv#1inst#1 be in the fresh registry copy, because it has not been evicted yet?
If yes, will srv#1inst#1 be marked UP or DOWN?
A: There are a few things here:
until the instance is actually evicted, it will be part of the result
eviction does not involve changing the instance's status, it merely removes the instance from the registry
the server holds 30s caches of the state of the world, and it is this cache that's returned. So the exact result as seen by the client, in an eviction scenario, still depends on where the request falls within the cache's update cycle.
The retry request from InstanceStep2 of srv#1inst#1 reaches server right after ServerStep2.
Will there be an immediate change in registry?
How will that affect the response to the request for a fresh registry from the instance of service#2? How will it affect the eviction scheduler?
A: again a few things:
When the actual eviction happens, we check each evictee's time to see if it is eligible to be evicted. If an instance is able to renew its heartbeats before this event, then it is no longer a target for eviction.
The 3 events in question (evaluation of eviction eligibility at eviction time, updating the heartbeat status of an instance, generation of the result to be returned to the read operations) all happen asynchronously and their result will depend on the evaluation of the above described criteria at execution time.
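The eviction rule described in this answer can be modelled with a small Python sketch. This is a simplified, single-threaded toy and not Eureka's implementation: the ToyRegistry class and instance names are hypothetical, and the 90-second limit is simply derived from the answer's "3 missed heartbeats at 30s each".

```python
HEARTBEAT_INTERVAL = 30      # seconds, per the answer
MISSED_BEFORE_EVICTION = 3   # consecutive misses, per the answer


class ToyRegistry:
    """Tracks the last renewal time per instance (hypothetical model)."""

    def __init__(self):
        self.last_renewal = {}

    def renew(self, instance_id, now):
        # Registration and heartbeats are handled directly on arrival.
        self.last_renewal[instance_id] = now

    def evict_expired(self, now):
        # Eligibility is evaluated only when the eviction task runs.
        limit = HEARTBEAT_INTERVAL * MISSED_BEFORE_EVICTION
        expired = [i for i, t in self.last_renewal.items() if now - t > limit]
        for instance_id in expired:
            del self.last_renewal[instance_id]  # removed, not marked DOWN
        return expired


registry = ToyRegistry()
for name in ("a", "b", "c"):
    registry.renew(name, now=0)

# "b" heartbeats on schedule, "a" misses two beats, "c" goes silent for good.
registry.renew("b", now=30)
registry.renew("b", now=60)
early_run = registry.evict_expired(now=60)   # nobody has missed 3 beats yet

registry.renew("a", now=80)                  # a late heartbeat still counts
registry.renew("b", now=90)
late_run = registry.evict_expired(now=100)   # only "c" is past the limit
```

Instance "a" survives because its renewal arrived before the eviction run evaluated it, which matches the point that a single missed heartbeat does not by itself cause eviction.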

How do config tools like Consul "push" config updates to clients?

There is an emerging trend of ripping global state out of traditional "static" config management tools like Chef/Puppet/Ansible, and instead storing configurations in some centralized/distributed tool, of which the main players appear to be:
ZooKeeper (Apache)
Consul (Hashicorp)
Eureka (Netflix)
Each of these tools works differently, but the principle is the same:
Store your env vars and other dynamic configurations (that is, stuff that is subject to change) in these tools as key/value pairs
Connect to these tools/services via clients at startup and pull down your config KV pairs. This typically requires the client to supply a service name ("MY_APP"), and an environment ("DEV", "PROD", etc.).
There is an excellent Consul Java client which explains all of this beautifully and provides ample code examples.
My understanding of these tools is that they are built on top of consensus algorithms such as Zab, Paxos and Gossip that allow config updates to spread almost virally, with eventual consistency, throughout your nodes. So the idea there is that if you have a myapp app that has 20 nodes, say myapp01 through myapp20, if you make a config change to one of them, that change will naturally "spread" throughout the 20 nodes over a period of seconds/minutes.
My problem is: how do these updates actually deploy to each node? In none of the client APIs (the one I linked to above, the ZooKeeper API, or the Eureka API) do I see some kind of callback functionality that can be set up and used to notify the client when the centralized service (e.g. the Consul cluster) wants to push and reload config updates.
So I ask: how is this supposed to work (dynamic config deployment and reload on clients)? I'm interested in any viable answer for any of those 3 tools, though Consul's API seems to be the most advanced IMHO.
You could use cfg4j for that. It's a Java configuration library for distributed services. It supports Consul as one of the configuration sources.
That's a nice question. I can explain how the Consul HTTP client works.
I also initially thought it used a push mechanism, but while exploring Consul recently I found that all Consul clients poll the server for the changes they want to watch. The polling is a bit different, though: Consul supports blocking queries. These are HTTP requests with a maximum timeout of 10 minutes. The query waits until something changes on the watched key/folder and then returns with the latest index. If the index has changed, the client reloads the configuration. For more info: Consul Blocking Query
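The blocking-query loop described above might look roughly like this in Python. Treat it as a sketch under assumptions: the helper names, the agent address http://localhost:8500, and the 5m wait are illustrative choices, and the demo drives the loop with canned responses instead of a real Consul agent. The /v1/kv/ path, the index and wait query parameters, and the X-Consul-Index response header are part of Consul's HTTP API.

```python
import base64
import json
import urllib.request


def fetch_from_consul(key, index, base_url="http://localhost:8500", wait="5m"):
    """One blocking KV read; returns (new_index, decoded_value)."""
    url = f"{base_url}/v1/kv/{key}?index={index}&wait={wait}"
    # The socket timeout must be longer than the server-side wait.
    with urllib.request.urlopen(url, timeout=360) as resp:
        new_index = int(resp.headers["X-Consul-Index"])
        entry = json.load(resp)[0]
    # KV values are returned base64-encoded.
    return new_index, base64.b64decode(entry["Value"] or "").decode()


def watch(fetch, on_change, rounds):
    """Long-poll loop: call on_change only when the index moves forward."""
    index = 0
    for _ in range(rounds):
        new_index, value = fetch(index)
        if new_index < index:
            index = 0          # index went backwards: start over
            continue
        if new_index != index:
            index = new_index
            on_change(value)   # this is where the client reloads its config
    return index


# Demo with canned responses: a change, a timeout with no change, a change.
responses = iter([(7, "timeout=30"), (7, "timeout=30"), (9, "timeout=60")])
seen = []
last_index = watch(lambda index: next(responses), seen.append, rounds=3)
```

Against a real agent, the fetch argument could be something like lambda i: fetch_from_consul("myapp/config", i), where the key name is hypothetical. From the application's point of view this behaves like a push, even though the client initiates every request.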