split Kubernetes cluster costs between namespaces - kubernetes

We are running a multi-tenant Kubernetes cluster on EKS (in AWS) and I need to come up with an appropriate way of charging all the teams that use the cluster. We have the costs of the EC2 worker nodes, but I don't know how to split these costs up given metrics from Prometheus. To make it trickier, I also need to give the cost per team (or pod/namespace) for the past week and the past month.
Each team currently uses a different namespace, but this will change soon so that each pod will have a label with the team name.
From looking around I can see that I'll need to use the container_spec_cpu_shares and container_memory_working_set_bytes metrics, but how can these two metrics be combined so that we get a percentage of the worker node cost?
Also, I don't know PromQL well enough to know how to get the stats for the past week and the past month from range vectors.
If anyone who has done this already can share a solution, or even point me in the right direction, I would appreciate it.
Thanks
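One possible way to combine the two metrics, as a minimal sketch under stated assumptions rather than an established method: split the node cost into a CPU part and a memory part using a weight, then give each namespace its fraction of the cluster-wide CPU shares and working-set memory. The function name `split_node_cost`, the `cpu_weight` parameter and the sample numbers are all hypothetical. The inputs are assumed to be averages over the billing window, which PromQL can produce with something like `sum by (namespace) (avg_over_time(container_memory_working_set_bytes[7d]))` (or `[30d]` for a month).

```python
def split_node_cost(node_cost, usage, cpu_weight=0.5):
    """Split node_cost between namespaces.

    usage maps namespace -> (cpu_shares, memory_bytes), both averaged over
    the billing window. cpu_weight is an assumed share of the node price
    attributed to CPU; the remainder is attributed to memory.
    """
    total_cpu = sum(cpu for cpu, _ in usage.values())
    total_mem = sum(mem for _, mem in usage.values())
    costs = {}
    for namespace, (cpu, mem) in usage.items():
        # Fraction of each resource consumed by this namespace.
        cpu_frac = cpu / total_cpu if total_cpu else 0.0
        mem_frac = mem / total_mem if total_mem else 0.0
        blended = cpu_weight * cpu_frac + (1 - cpu_weight) * mem_frac
        costs[namespace] = node_cost * blended
    return costs


# Hypothetical weekly averages: (CPU shares, working-set bytes).
usage = {
    "team-a": (2048, 4 * 2**30),
    "team-b": (1024, 12 * 2**30),
}
costs = split_node_cost(100.0, usage)
for namespace, cost in costs.items():
    print(f"{namespace}: {cost:.2f}")
```

Note that this splits the whole node cost between tenants, so idle capacity is charged to the teams in proportion to what they use. Once pods carry a team label, the same calculation would apply with the label in place of the namespace.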

Related

Is there an HPA configuration that could autoscale based on previous CPU usage?

We currently have a GKE environment with several HPAs for different deployments. All of them work just fine out of the box, but sometimes our users still experience some delay during peak hours.
Usually this delay is the time it takes the new instances to start and become ready.
What I'd like is a way to have an HPA that could predict usage and scale eagerly before it is needed.
The simplest implementation I could think of is an HPA that takes the average usage of previous days and scales up or down in advance (say 10 minutes earlier) based on the historic usage for the current time frame.
Is there anything like that in vanilla k8s or GKE? I was unable to find anything like that in GCP's docs.
If you want to scale your applications based on events or custom metrics, you can use KEDA (Kubernetes-based Event Driven Autoscaler), which supports scaling based on GCP Stackdriver, Datadog or Prometheus metrics (and many other scalers).
What you need to do is create queries that return the CPU usage at the moment CURRENT_TIMESTAMP - 23H50M (or the aggregated value for the last week), then define thresholds to scale your application up or down.
If you have trouble doing this with your monitoring tool, you can create a custom metrics API that queries the monitoring API and aggregates the values (with the time shift) before sending them to the metrics-api scaler.
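The time-shift idea above can be sketched as follows. This is only an illustration of the arithmetic a custom metrics endpoint might perform, not KEDA's API: `predicted_replicas`, its parameters and the in-memory `history` dictionary are hypothetical stand-ins for a real monitoring query (in PromQL the shift could be expressed with the `offset` modifier).

```python
import math
from datetime import datetime, timedelta


def predicted_replicas(history, now, lead=timedelta(minutes=10), days=3,
                       target_per_replica=0.5, min_replicas=1, max_replicas=20):
    """Estimate replicas from usage seen at this time of day on previous days.

    history maps a datetime to the total CPU cores used at that moment.
    For each of the last `days` days, look up the sample `lead` ahead of
    the current time of day (i.e. now - 23h50m for the previous day).
    """
    samples = []
    for day in range(1, days + 1):
        shifted = now + lead - timedelta(days=day)
        if shifted in history:
            samples.append(history[shifted])
    if not samples:
        # No history available: fall back to the minimum.
        return min_replicas
    average = sum(samples) / len(samples)
    wanted = math.ceil(average / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


now = datetime(2024, 1, 10, 9, 0)
history = {
    datetime(2024, 1, 9, 9, 10): 3.0,
    datetime(2024, 1, 8, 9, 10): 2.0,
    datetime(2024, 1, 7, 9, 10): 4.0,
}
replicas = predicted_replicas(history, now)
print(replicas)
```

The returned number could be exposed as the metric value, with the thresholds mentioned above deciding when the scaler acts on it.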

In Kubernetes, how many namespaces can you have?

I want to use the Kubernetes namespace for each user of my application. So potentially I'll need to create thousands of namespaces each with kubernetes resources in them. I want to make sure this is scalable, so I want to ensure that I can have millions of namespaces on a Kubernetes Cluster before I use this construct on a per user basis.
I'm building a web hosting application. So I'm giving resources to each user, but I want them separated by namespaces.
Are there any limitations to the number of Kubernetes namespaces you can create?
"In majority of cases, thresholds are NOT hard limits - crossing the limit results in degraded performance and doesn't mean cluster immediately fails over.
Many of the thresholds (for cluster scope) are given for the largest possible cluster. For smaller clusters, the limits are proportionally lower."
#Namespaces = 10000 scope=cluster
source with more data
kube Talk explaining how the data is computed
You'll usually run into limitations with resources and etcd long before you hit a namespace limit.
For scaling, you will probably want to add more clusters, which most companies treat as cattle, rather than build one giant cluster that becomes a pet. The latter is not a scenario you want to be dealing with.

Kubernetes : Disadvantages of an all Master cluster

Hi!
I was wondering if it is possible to replicate a VMware architecture in Kubernetes.
What I mean by that:
Instead of always keeping the control plane separate from the worker nodes, I would like to put them all together, so that in the end we obtain a cluster of master nodes on which we can schedule applications. For now I'm using kata-container with containerd, so all applications are deployed in 'mini' VMs and there isn't the 'escape from the container' problem. The management of the cluster would be done through a dedicated interface (eth0, 1Gb). The users would be able to communicate with the apps deployed within the cluster through another interface (eth1, 10Gb). I would use Keepalived and HAProxy to elect my 'Main Master' and load balance the traffic.
The question might be 'why would you do that?'. The goal is to ensure high availability at all times and reduce the management overhead: instead of having two sets of "entities" to manage (the control plane and the worker nodes), I would have only one. That would avoid problems such as 'fewer than 50% of my masters are online, so no leader can be elected', where I would have to remove master nodes from the cluster until more than 50% of the remaining masters are online. That requires a technical intervention as fast as possible, which invites human error.
Another advantage would be scaling: instead of having two parts of the cluster to scale (masters and workers), there would be only one. I would just add another master/worker node to the cluster and that's it. All the management traffic would be directed to the Main Master through a virtual IP (VIP), and in case of overload the requests would be redirected to another node.
In the end I would have something resembling this:
Photo - Architecture VMWare-like
I am trying to find the disadvantages of this kind of architecture. I know that there would be etcd traffic on each node, but how much impact does it have? I know that resources would be wasted on the control-plane Pods on each node, but since these Pods (except etcd) won't do much besides wait, how much impact would that have? With every node able to take the master role, there wouldn't be any downtime. Right now, if my control plane (3 masters) goes down, I have to reboot the masters or find a fix as fast as possible, before there's a problem with one of the apps that run on the worker nodes.
The topology I'm using right now resembles the following :
Architecture basic Kubernetes
I'm new to Kubernetes, so the question might seem stupid, but I would really like to know the advantages and disadvantages of the two and understand why this wouldn't be a good idea.
Thanks a lot for any help!
There are two reasons for keeping control planes on their own. The big one is that you only want a small number of etcd nodes, usually 3 or 5, and that's usually the bounding factor on the size of the control plane. You usually want the ability to scale worker nodes independently from that. The second issue is that etcd is very sensitive to IOPS brownouts and can suffer bad cascading failures if the machine runs low on IOPS.
And given that you are doing things on top of VMware anyway, the overhead of managing 3 vs 6 VMs is not generally a difference in kind. This seems like a false saving in the long run.
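To make the point about etcd membership concrete, here is a small sketch of the majority arithmetic a Raft-based store such as etcd relies on. The helper names are hypothetical; the only fact assumed is that a majority of members must agree for the cluster to make progress.

```python
def quorum(members):
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


# Cluster size -> (quorum size, failures tolerated).
table = {n: (quorum(n), failures_tolerated(n)) for n in (1, 3, 5, 7, 30)}
for size, (needed, tolerated) in table.items():
    print(f"{size} members: quorum {needed}, tolerates {tolerated} failures")
```

Making every node a master does not remove the majority requirement. With 30 members, 16 of them have to acknowledge each write, which is one reason the member count is normally kept at 3 or 5.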

schedule kubernetes pods on different physical server

In my cluster there are 30 VMs located on 3 different physical servers. I want to deploy the replicas of each workload on different physical servers.
I know I can use podAntiAffinity to deploy replicas on different VMs, but I can't find any way to guarantee that replicas are spread across different physical servers.
Is there any way to solve this?
I believe you gave the answer ;)
I went to the Kubernetes Patterns book (PDF available for free in here) to see if there was something related to that over there, and found exactly that:
To express how Pods should be spread to achieve high availability, or be packed and co-located together to improve latency, Pod affinity and antiaffinity can be used. Node affinity works at node granularity, but Pod affinity is not limited to nodes and can express rules at multiple topology levels. Using the topologyKey field, and the matching labels, it is possible to enforce more fine-grained rules, which combine rules on domains like node, rack, cloud provider zone, and region [...]
I really like the k8s docs as well, they are super complete and full of examples, so maybe you can get some ideas from here. I think the main idea will be to create your own affinity/antiaffinity rule.
----------------------------------- EDIT -----------------------------------
There is a new feature within k8s version 1.18 that may be a better solution.
It's called: Pod Topology Spread Constraints:
You can use topology spread constraints to control how Pods are spread across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. This can help to achieve high availability as well as efficient resource utilization.
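As a rough sketch of how this could apply to the 30-VM, 3-server setup: label every node (VM) with the physical server it runs on, then use that label as the topologyKey. The label key `example.com/physical-host`, the function name and the image are assumptions made for illustration; the manifest is built as a Python dictionary and printed as JSON, which `kubectl apply -f` accepts as well as YAML.

```python
import json

# Hypothetical node label; each VM would need it set beforehand, e.g.
#   kubectl label node <vm-name> example.com/physical-host=server-1
PHYSICAL_HOST_LABEL = "example.com/physical-host"


def spread_deployment(name, image, replicas):
    """Build a Deployment whose Pods are spread evenly over physical hosts."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "topologySpreadConstraints": [{
                        # Pod counts per physical host may differ by at most 1.
                        "maxSkew": 1,
                        "topologyKey": PHYSICAL_HOST_LABEL,
                        # Leave a Pod pending rather than violate the spread.
                        "whenUnsatisfiable": "DoNotSchedule",
                        "labelSelector": {"matchLabels": labels},
                    }],
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


manifest = spread_deployment("web", "nginx:1.25", 3)
print(json.dumps(manifest, indent=2))
```

The same label could also serve as the topologyKey of a podAntiAffinity rule if the cluster predates topology spread constraints.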

Breakdown of GKE bill based on pods or deployments

I need a breakdown of my usage inside a single project, categorized by Pods, Services or Deployments, but the billing section in the console doesn't seem to provide such granular information. Is it possible to get this data somehow? I want to know the network + compute cost per Deployment or per Pod.
Or is it at least possible to have it at the cluster level? Is this breakdown available in BigQuery?
A new feature was recently released in GKE that collects usage metrics inside a cluster. These can be combined with the exported billing data to separate costs per project/environment, making it possible to separate costs per namespace, deployment, and labels, among other criteria.
https://cloud.google.com/blog/products/containers-kubernetes/gke-usage-metering-whose-line-item-is-it-anyway
It's not possible at the moment to break down the billing at the Pod, Service or Deployment level. Kubernetes Engine uses Google Compute Engine instances for the nodes in the cluster. You are billed for each of those instances according to Compute Engine's pricing, until the nodes are deleted. Compute Engine resources are billed on a per-second basis with a 1 minute minimum usage cost.
Export Billing Data to BigQuery lets you export your daily usage and cost estimates automatically throughout the day to a BigQuery dataset you specify. You can then run BigQuery queries on the exported billing data to do some breakdown.
You can also view your usage reports and estimate your Kubernetes charges using the GCP Pricing Calculator. If you want to take this further, you can file a feature request in the Public Issue Tracker (PIT).
You can get this visibility with your GKE Usage Metering dataset and your BigQuery cost exports.
Cost per namespace, per deployment and per node can be obtained by writing queries that combine these tables. If you have labels set, you can drill down by label too. The result shows your spend on CPU, RAM and egress.
Check out economize.cloud - it integrates with your datasets and allows you to slice and dice views. For example, cost per customer or cost per service can be obtained with such granular cost data.
https://www.economize.cloud/blog/gke-usage-cost-monitoring-kubernetes-clusters/
New GCP offering: GKE Cost Allocation lets users natively view and manage the cost of a GKE cluster by cluster, namespace, pod labels and more, right from the Billing page, or export detailed usage cost data to BigQuery:
https://cloud.google.com/kubernetes-engine/docs/how-to/cost-allocations
GKE Cost Allocation is more accurate and robust than GKE Usage Metering.
Kubecost provides Kubernetes cost allocation by any concept, e.g. pod, service, controller, etc. It's open source and is available for GKE, AWS/EKS, and other major providers. https://github.com/kubecost/cost-model