Kubernetes pod replicas show uneven, high CPU usage

We have microservices running on an AWS EKS cluster, and many of them have more than 10 pod replicas. We use Grafana for monitoring. Unfortunately, some pods of the same microservice show very high CPU usage, say 80%, while others sit at around 0.35%. Our understanding was that Kubernetes balances load equally across replicas. What are we missing here?

How traffic is distributed from outside the cluster to your pods depends on the load balancer controller, e.g. the AWS Load Balancer Controller.
But the load balancer controller typically does not take CPU usage into consideration; it only spreads traffic evenly across your replicas.
CPU load typically depends heavily on what your replicas are doing: some HTTP paths may use more CPU while others are easier to handle. You need more insight into the workload before deciding what to do, e.g. adding some caching.
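To illustrate the point, here is a minimal sketch in Python of how evenly spread requests can still produce very uneven CPU load. The paths and their per-request CPU costs are invented for the example, and plain round-robin is an assumption; your load balancer may use a different algorithm.

```python
from itertools import cycle

# Hypothetical CPU cost (in milliseconds) per HTTP path; the numbers are invented.
COST_MS = {"/health": 1, "/search": 40, "/report": 400}


def simulate(requests, replicas):
    """Assign requests round-robin and return (request counts, CPU ms) per replica."""
    counts = [0] * replicas
    cpu_ms = [0] * replicas
    for replica, path in zip(cycle(range(replicas)), requests):
        counts[replica] += 1
        cpu_ms[replica] += COST_MS[path]
    return counts, cpu_ms


# A repeating traffic pattern whose period happens to match the replica count,
# so the expensive path always lands on the same replica.
requests = ["/report", "/health", "/health", "/search"] * 25
counts, cpu_ms = simulate(requests, replicas=4)
print(counts)  # [25, 25, 25, 25]
print(cpu_ms)  # [10000, 25, 25, 1000]
```

Every replica receives the same number of requests, yet one of them does almost all the CPU work. Real traffic is rarely this regular, but the same effect shows up whenever request cost varies a lot.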


Kubernetes - Multiple pods per node vs one pod per node

What is usually preferred in Kubernetes - having a one pod per node configuration, or multiple pods per node?
From a performance standpoint, what are the benefits of having multiple pods per node, if there is an overhead in having multiple pods living on the same node?
From a performance standpoint, wouldn't it be better to have a single pod per node?
The answer to your question is heavily dependent on your workload.
There are very specific scenarios (machine learning, big data, GPU-intensive tasks) where you might have a one pod per node configuration due to an IO or hardware requirement of a single pod. However, this is normally not an efficient use of resources and eliminates a lot of the benefits of containerization.
The benefit of multiple pods per node is a more efficient use of all available resources. Generally speaking, managed Kubernetes clusters will schedule and manage the number of pods that run on a node for you, and many providers offer simple autoscaling solutions to ensure that you are always able to run all your workloads.
Running only a single pod per node has its cons as well. For example, each node needs its own "support" pods, such as metrics, logs, network agents and other system pods, whose resources will most likely not be fully utilized. In terms of performance, this means that choosing the right ratio of node size to pod count may cost less than one pod per node for the same performance.
Conversely, running too many pods on a massive node can exhaust those resources and cause gaps in metrics or logs, lost packets, OOM errors, etc.
Finally, when we also consider autoscaling, starting a couple more pods on an existing node will be a lot more responsive than bringing up a new node for each pod.
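As a rough illustration of the overhead argument, here is a small Python sketch comparing one pod per node with several pods per node. All the figures (20 pods of 1 CPU each, 0.5 CPU of system pods per node, 2-CPU and 8-CPU nodes) are assumptions made up for the example, and the helper `plan` is hypothetical.

```python
import math


def plan(pods, pod_mcpu, node_mcpu, overhead_mcpu):
    """Size a cluster when every node reserves overhead_mcpu for system pods.

    All CPU values are in millicores (1000 = one CPU).
    """
    pods_per_node = (node_mcpu - overhead_mcpu) // pod_mcpu
    if pods_per_node < 1:
        raise ValueError("node is too small to run a single pod")
    nodes = math.ceil(pods / pods_per_node)
    return {
        "nodes": nodes,
        "total_mcpu": nodes * node_mcpu,        # CPU you pay for
        "overhead_mcpu": nodes * overhead_mcpu,  # CPU spent on system pods
    }


# 20 application pods of 1 CPU each, 0.5 CPU of system pods per node.
print(plan(20, 1000, 2000, 500))
# {'nodes': 20, 'total_mcpu': 40000, 'overhead_mcpu': 10000}
print(plan(20, 1000, 8000, 500))
# {'nodes': 3, 'total_mcpu': 24000, 'overhead_mcpu': 1500}
```

Under these assumptions, the small nodes fit only one pod each, so the per-node overhead is paid 20 times; the larger nodes pay it 3 times and need less total CPU for the same workload.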

How can I scale my google compute engine that hosts my kubernetes cluster?

Description :
Ideally (i.e. in a non-Kubernetes scenario where my Compute Engine instances host my application), a load balancer would distribute the load across multiple replicated instances. In my case, however, I am using a single Compute Engine instance as a worker node, with some pods deployed on it.
Question 1 :
What would happen if my worker node (a Google Compute Engine instance) starts receiving a lot of traffic?
Question 2 :
What would be the best (or at least a better) way to scale my current solution so that it can handle more load and distribute that load efficiently?
In Kubernetes you deploy applications as pods. You can deploy multiple replicas of a pod, and Kubernetes will schedule them onto multiple worker node VMs based on the resource requirements of the pods and the available capacity on the nodes. This provides resiliency and availability for applications. Once your workload increases, you can scale the Kubernetes cluster horizontally by adding more worker nodes.
You can use an Ingress or an L7 load balancer to balance user traffic across pods on different nodes. Even without those, Kubernetes provides L4 load balancing via the kube-proxy component.
Kubernetes scales to 5000 nodes. There are also some best practices for large clusters.

What's the maximum number of Kubernetes namespaces?

Is there a maximum number of namespaces supported by a Kubernetes cluster? My team is designing a system to run user workloads via K8s and we are considering using one namespace per user to offer logical segmentation in the cluster, but we don't want to hit a ceiling with the number of users who can use our service.
We are using Amazon's EKS managed Kubernetes service and Kubernetes v1.11.
This is difficult to answer because it depends on many factors. Here are some figures measured on a Kubernetes 1.7 cluster (kubernetes-thresholds): the number of namespaces (ns) is 10000, under a few assumptions.
There are no limits from the code's point of view, because a namespace is just a Go type that gets instantiated as a variable.
In addition to the link that #SureshVishnoi posted, the limits will depend on your setup, but some of the factors that can contribute to how your namespaces (and resources in a cluster) scale are:
Physical or VM hardware size where your masters are running. Unfortunately, EKS doesn't provide that yet (it's a managed service after all).
The number of nodes your cluster is handling.
The number of pods in each namespace.
The number of overall K8s resources (deployments, secrets, service accounts, etc.).
The hardware size of your etcd database: storage (how many resources you can persist) and raw performance (how much memory and CPU you have).
The network connectivity between your master components and the etcd store if they are on different nodes. If they are on the same nodes, then you are bound by the server's memory, CPU and storage.
There is no limit on the number of namespaces. You can create as many as you want. A namespace by itself doesn't actually consume cluster resources like CPU or memory.

Question about 100 pods per node limitation

I'm trying to build a web app where each user gets their own instance of the app, running in its own container. I'm new to kubernetes so I'm probably not understanding something correctly.
I will have a few physical servers to use, which in Kubernetes, as I understand it, are called nodes. For each node, there is a limit of 100 pods. So if I build the app so that each user gets their own pod, will I be limited to 100 users per physical server? (If I have 10 servers, can I only have 1000 users?) I suppose I could run multiple VMs that act as nodes on each physical server, but doesn't that defeat the purpose of containerization?
The main issue with having too many pods on a node is that it degrades node performance and makes managing the containers slower (and sometimes unreliable). Each pod is managed individually, so increasing the number takes more time and more resources.
When you create a pod, the runtime needs to keep constant track of it: running probes (readiness and liveness), monitoring, maintaining routing rules, and many other small bits that add up to the load on the node.
Containers also require processor time to run properly. Even though you can allocate fractions of a CPU, adding too many containers or pods increases context switching and degrades performance when the pods are consuming their quota.
Each platform provider also sets its own limits to provide good quality of service and meet SLAs. Overloading nodes is also a risk, because a node is a single point of failure, and any fault in a high-density node might have a huge impact on the cluster and its applications.
You should consider either:
Using smaller nodes and adding more nodes to the cluster, or
Using actors instead, where each client is one actor and many actors run in a single container. To balance them across the cluster, you partition the actors across multiple container instances.
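One simple way to partition actors is to hash the user ID and take it modulo the number of containers. The Python sketch below assumes that approach; the function name `container_for` and the user IDs are hypothetical, and this is not tied to any particular actor framework.

```python
import hashlib


def container_for(user_id: str, containers: int) -> int:
    """Map a user to a container index with a hash that is stable across runs."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % containers


users = ["alice", "bob", "carol", "dave"]
assignment = {user: container_for(user, containers=3) for user in users}
print(assignment)
```

A cryptographic hash is used here instead of Python's built-in `hash()` because the latter is randomized per process for strings. Note that with plain modulo hashing, changing the number of containers remaps most users; consistent hashing is a common alternative if that matters to you.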
Regarding the limits, this thread has a good discussion about the concerns
Because of the hard limit, if you have 10 servers you're limited to 1000 pods.
You might also want to count control plane pods in your 1000 available pods. Usually located in the kube-system namespace, they can include (but are not limited to):
node log exporters (1 per node)
metrics exporters
kube proxy (usually 1 per node)
kubernetes dashboard
DNS (scaling according to the number of nodes)
controllers like cert-manager
A pretty good rule of thumb could be 80-90 application pods per node, so 10 nodes will be able to handle 800-900 clients considering you don't have any other big deployment on those nodes.
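The arithmetic behind that rule of thumb can be sketched in a few lines of Python. The 100 pods per node limit is the figure from the question (the actual limit depends on your platform), and 10 to 20 system pods per node is an assumption chosen to match the 80-90 estimate.

```python
def application_pod_capacity(nodes, max_pods_per_node, system_pods_per_node):
    """Pods left for the application after reserving room for system pods."""
    return nodes * (max_pods_per_node - system_pods_per_node)


print(application_pod_capacity(10, 100, 0))   # 1000 (hard limit, nothing reserved)
print(application_pod_capacity(10, 100, 10))  # 900
print(application_pod_capacity(10, 100, 20))  # 800
```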
If you're using containers in order to gain performance, creating node VMs will work against your goal. But if you're using containers as a way to deploy consistent environments and scale stateless applications, then using VMs as nodes can make sense.
There are no magic rules and your context will dictate what to do.
Since managing both a virtualization cluster and a Kubernetes cluster may make your infrastructure complexity skyrocket, Kubernetes may not be the most efficient tool for managing your workload.
You may also want to take a look at Nomad, which does not seem to have that kind of limitation and may provide features that are closer to your needs.

Kubernetes Cluster with different CPU configuration

I have created a K8s cluster of 10 machines with different memory and core configurations (4 cores with 32 GB, 4 cores with 8 GB). Now, when I deploy any application on the cluster, it creates pods in a seemingly random manner. It does not place pods on the basis of memory or load.
How does the Kubernetes master distribute pods in the cluster? I have not found any satisfactory answers. How can I configure the cluster to make the best use of resources?
Kubernetes uses a scheduler to decide which pod is started on which node. One improvement is to tell the scheduler the minimum and maximum resources your pods need.
Resources are memory (measured in bytes), CPU (measured in CPU units) and ephemeral storage for things like emptyDir (as of 1.11). When you provide this information for your deployments, Kubernetes can make better decisions about where to run them.
Without this information, an nginx pod will be scheduled the same way as any heavy Java application.
The limits and requests configuration is described here. Setting both requests and limits is a good idea: it makes scheduling easier and keeps pods from running amok and using all node resources.
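To show why requests matter, here is a simplified Python sketch of the "does this pod fit on this node" check. It only models the filtering step on CPU and memory requests; the real scheduler also scores the feasible nodes and considers many other constraints. The `Node` class, node names and sizes are hypothetical and loosely mirror the machines in the question.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_m: int                 # allocatable CPU in millicores
    mem_mi: int                # allocatable memory in MiB
    requested_cpu_m: int = 0   # sum of CPU requests of pods already placed
    requested_mem_mi: int = 0  # sum of memory requests of pods already placed

    def fits(self, cpu_m: int, mem_mi: int) -> bool:
        """True if the new pod's requests fit into what is still unrequested."""
        return (self.requested_cpu_m + cpu_m <= self.cpu_m
                and self.requested_mem_mi + mem_mi <= self.mem_mi)


def feasible_nodes(nodes, cpu_m, mem_mi):
    """Names of nodes that can accept a pod with the given requests."""
    return [node.name for node in nodes if node.fits(cpu_m, mem_mi)]


nodes = [Node("big", 4000, 32768), Node("small", 4000, 8192)]

# A pod requesting 0.5 CPU and 12 GiB only fits on the 32 GB machine.
print(feasible_nodes(nodes, cpu_m=500, mem_mi=12288))  # ['big']

# A pod with no requests "fits" everywhere, so node size cannot guide placement.
print(feasible_nodes(nodes, cpu_m=0, mem_mi=0))  # ['big', 'small']
```

The check works on declared requests, not on measured usage, which is why a pod without requests can land on any node regardless of how heavy it actually is.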
If this is not enough, there is also the possibility of adding a custom scheduler, which is explained in this documentation.