We want to ensure automatic failover to our secondary region, but our current environment is not set up to use failover routing; we currently use weighted routing. How can I achieve this?
Since it is not possible to use failover routing, I am thinking of using Global Accelerator with 2 static IPs, one attached to each endpoint, so that traffic is redirected when an endpoint is unhealthy. However, this approach is costly. Is it possible to create a CloudWatch alarm that triggers a Lambda function to update the weights when the ALB in the primary region is unhealthy?
I'm wondering what approach to take for our server setup. We have short-lived pods. A minimum of 3 pods is started, and each server waits for a single request, handles it, and then the pod is destroyed. I'm not sure what mechanism destroys the pod, but my question is not about that part anyway.
There is an "active session count" metric that I am envisioning. Each of these pods could make a REST call to a "metrics" pod that we would create for our cluster. The metrics pod would expose a sessionStarted and a sessionEnded endpoint, which would increment and decrement the Kubernetes activeSessions metric. That metric would then be used for horizontal autoscaling of the number of pods needed.
A pod being "up" does not indicate whether it has an active session, so a pod that is merely up counts as zero active sessions. The custom event that starts a session would increment the session count on the metrics server with a REST call, and the count would be decremented again when the session ends.
Is it correct to think that I need this metrics server (and have to write it myself)? Or does Prometheus already support this type of metric, with REST clients for various languages that could modify it?
Looking for guidance and confirmation that I'm on the right track. Thanks!
It's impossible to give a single way to solve this, and your question is rather opinion-based. However, there is a useful similar question on Stack Overflow; please check the comments there, which may give you some tips. If nothing works, you will probably have to write the script yourself. There is no exact solution on the Kubernetes side.
Please also consider Apache Flink. It has a Reactive Mode that works in combination with Kubernetes:
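If the active-session count were exposed as a per-pod custom metric, a Horizontal Pod Autoscaler could in principle target it directly. The following is only a sketch under several assumptions: the Deployment name `session-server`, the metric name `active_sessions`, and the replica bounds are hypothetical, and a custom metrics adapter (for example one backed by Prometheus) must already be serving that metric through the custom metrics API.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: session-server            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: session-server          # hypothetical Deployment running the short-lived servers
  minReplicas: 3                  # the 3-pod minimum mentioned in the question
  maxReplicas: 30                 # assumed upper bound
  metrics:
    - type: Pods
      pods:
        metric:
          name: active_sessions   # hypothetical per-pod metric: 1 while busy, 0 while idle
        target:
          type: AverageValue
          averageValue: "500m"    # scale out once more than about half of the pods are busy
```

With a target below 1, the autoscaler adds pods before every server is occupied, which leaves some idle servers waiting for the next request.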
Reactive Mode allows to run Flink in a mode, where the Application Cluster is always adjusting the job parallelism to the available resources. In combination with Kubernetes, the replica count of the TaskManager deployment determines the available resources. Increasing the replica count will scale up the job, reducing it will trigger a scale down. This can also be done automatically by using a Horizontal Pod Autoscaler.
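As a rough illustration of the last sentence of that quote, an autoscaler for the TaskManager deployment might look like the sketch below. The Deployment name `flink-taskmanager`, the replica bounds, and the CPU target are assumptions made for this example, not values taken from the Flink documentation.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flink-taskmanager         # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flink-taskmanager       # hypothetical TaskManager Deployment
  minReplicas: 1
  maxReplicas: 10                 # assumed upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60  # assumed target; tune for the actual job
```

In Reactive Mode the job parallelism follows the replica count, so changing the number of TaskManager pods is what rescales the job.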
Overview
Kubernetes scheduling errs on the side of 'not shuffling things around once scheduled and happy', which can lead to considerable imbalance in CPU, memory, and container count distribution. It can also mean that affinity and topology rules are sometimes no longer enforced as the state of affairs changes:
With regard to topology spread constraints, introduced in v1.19 (stable):
There's no guarantee that the constraints remain satisfied when Pods are removed. For example, scaling down a Deployment may result in imbalanced Pods distribution.
Context
We are currently making use of pod topology spread constraints, and they are pretty superb, aside from the fact that they only seem to handle skew during scheduling and not during execution (unlike Taints and Tolerations, which can differentiate between the two).
For features such as Node affinity, we're currently waiting on the ability to add RequiredDuringExecution requirements, as opposed to the current IgnoredDuringExecution requirements.
Question
My question is, is there a native way to make Kubernetes re-evaluate and attempt to enforce topology spread skew when a new fault domain (topology) is added, without writing my own scheduler?
Or do I need to wait for Kubernetes to advance a few more releases? ;-) (I'm hoping someone may have a smart answer with regards to combining affinity / topology constraints)
After more research I'm fairly certain that using an outside tool like Descheduler is the best way currently.
There doesn't seem to be a combination of Taints, Affinity rules, or Topology constraints that can work together to achieve the re-evaluation of topology rules during execution.
Descheduler allows you to kill off certain workloads based on user requirements and lets the default kube-scheduler reschedule the killed pods. It can be installed easily with manifests or Helm and run on a schedule. It can even be triggered manually when the topology changes, which is what I think we will implement to suit our needs.
This will be the best means of achieving our goal while waiting for RequiredDuringExecution rules to mature across all feature offerings.
Given our topology rules mark each node as a topological zone, using a Low Node Utilization strategy to spread workloads across new hosts as they appear will be what we go with.
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          "memory": 20
        targetThresholds:
          "memory": 70
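As a possible alternative to balancing by utilization, the same v1alpha1 policy format also has a strategy aimed specifically at topology spread violations. The sketch below is an assumption about how it could be applied here, not the configuration the answer settled on; whether it is available depends on the Descheduler version in use.

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingTopologySpreadConstraint":
    enabled: true
    params:
      # false: only evict for hard constraints (whenUnsatisfiable: DoNotSchedule)
      includeSoftConstraints: false
```

Evicted pods are rescheduled by the default kube-scheduler, which applies the spread constraints again against the current set of topology domains.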
In my cluster there are 30 VMs located on 3 different physical servers. I want to deploy the replicas of each workload on different physical servers.
I know I can use podAntiAffinity to deploy replicas on different VMs, but I can't find any way to guarantee that replicas are spread across different physical servers.
Is there any way to solve this?
I believe you gave the answer ;)
I went to the Kubernetes Patterns book (PDF available for free here) to see if there was something related to that, and found exactly that:
To express how Pods should be spread to achieve high availability, or be packed and co-located together to improve latency, Pod affinity and antiaffinity can be used.
Node affinity works at node granularity, but Pod affinity is not limited to nodes and
can express rules at multiple topology levels. Using the topologyKey field, and the
matching labels, it is possible to enforce more fine-grained rules, which combine
rules on domains like node, rack, cloud provider zone, and region [...]
I really like the k8s docs as well, they are super complete and full of examples, so maybe you can get some ideas from here. I think the main idea will be to create your own affinity/antiaffinity rule.
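A minimal sketch of such a rule, assuming every VM node has been given a label that names its physical server. The label key `example.com/physical-host`, the app name `my-app`, and the image are hypothetical placeholders.

```yaml
# Assumes each node is labelled with its physical server, for example:
#   kubectl label node vm-01 example.com/physical-host=server-a
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                                   # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-app
              # Treat each physical server, not each VM, as one topology domain
              topologyKey: example.com/physical-host
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0 # hypothetical image
```

Because the rule is required, at most one replica can run per physical server, so with 3 servers a fourth replica would stay Pending. Using preferredDuringSchedulingIgnoredDuringExecution instead would relax that limit.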
----------------------------------- EDIT -----------------------------------
There is a new feature within k8s version 1.18 that may be a better solution.
It's called Pod Topology Spread Constraints:
You can use topology spread constraints to control how Pods are spread across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. This can help to achieve high availability as well as efficient resource utilization.
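For the physical-server case, a constraint could look roughly like the pod spec fragment below. It assumes the nodes carry a user-defined label identifying their physical server; the label key `example.com/physical-host` and the `app: my-app` selector are hypothetical.

```yaml
# Fragment of a pod template spec
spec:
  topologySpreadConstraints:
    - maxSkew: 1                                 # replica counts per server may differ by at most 1
      topologyKey: example.com/physical-host     # hypothetical node label naming the physical server
      whenUnsatisfiable: DoNotSchedule           # hard constraint; ScheduleAnyway makes it a preference
      labelSelector:
        matchLabels:
          app: my-app                            # hypothetical label on the workload's pods
```

Unlike a required anti-affinity rule, this allows more replicas than physical servers while still keeping them evenly distributed at scheduling time.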
Based on this link, auto scaling of instances or partitions is provided by Service Fabric.
However, it's unclear whether it can also scale the nodes themselves (the VMs / actual physical environment) in and out, which does not seem to be mentioned explicitly.
Yes, you can auto scale the cluster as well, assuming that you are running in Azure. Scaling is based on performance counter data and works by defining rules on the VM scale set.
Note that in order to scale down gracefully and automatically, it's recommended that you use the Gold or Silver durability level; otherwise you'll be responsible for draining the node before it's taken out of the cluster.
More info here and here.
With the understanding that Ubernetes is designed to fully solve this problem, is it currently possible (not necessarily recommended) to span a single K8/OpenShift cluster across multiple internal corporate datacenters?
Additionally assuming that latency between data centers is relatively low and that infrastructure across the corporate data centers is relatively consistent.
Example: Given 3 corporate DC's, deploy 1..* masters at each datacenter (as a single cluster) and have 1..* nodes at each DC with pods/rc's/services/... being spun up across all 3 DC's.
Has someone implemented something like this as a stopgap solution before Ubernetes drops? If so, how has it worked, and what considerations should be taken into account when running like this?
is it currently possible (not necessarily recommended) to span a
single K8/OpenShift cluster across multiple internal corporate
datacenters?
Yes, it is currently possible. Nodes are given the address of an apiserver and client credentials and then register themselves into the cluster. Nodes don't know (or care) whether the apiserver is local or remote, and the apiserver allows any node to register as long as it has valid credentials, regardless of where the node exists on the network.
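To make the "address plus credentials" point concrete, a node's kubeconfig might look roughly like the sketch below. The server URL, names, and file paths are hypothetical; the point is only that nothing in it depends on which datacenter the apiserver lives in.

```yaml
apiVersion: v1
kind: Config
clusters:
  - name: corp-cluster                                      # hypothetical cluster name
    cluster:
      server: https://apiserver.dc1.example.internal:6443   # hypothetical address; may be in another DC
      certificate-authority: /etc/kubernetes/pki/ca.crt     # hypothetical path
users:
  - name: kubelet
    user:
      client-certificate: /var/lib/kubelet/pki/kubelet-client.crt   # hypothetical path
      client-key: /var/lib/kubelet/pki/kubelet-client.key           # hypothetical path
contexts:
  - name: default
    context:
      cluster: corp-cluster
      user: kubelet
current-context: default
```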
Additionally assuming that latency between data centers is relatively
low and that infrastructure across the corporate data centers is
relatively consistent.
This is important, as many of the settings in Kubernetes assume (either implicitly or explicitly) a high bandwidth, low-latency network between the apiserver and nodes.
Example: Given 3 corporate DC's, deploy 1..* masters at each
datacenter (as a single cluster) and have 1..* nodes at each DC with
pods/rc's/services/... being spun up across all 3 DC's.
The downside of this approach is that if you have one global cluster you have one global point of failure. Even if you have replicated, HA master components, data corruption can still take your entire cluster offline. And a bad config propagated to all pods in a replication controller can take your entire service offline. A bad node image push can take all of your nodes offline. And so on. This is one of the reasons that we encourage folks to use a cluster per failure domain rather than a single global cluster.