OpenShift/Kubernetes memory handling - kubernetes

What is the correct way of memory handling in OpenShift/Kubernetes?
If I create a project in OKD, how can I determine optimal memory usage of pods? For example, if I use 1 deployment for 1-2 pods and each pod uses 300-500 Mb of RAM - Spring Boot apps. So technically, 20 pods uses around 6-10GB RAM, but as I see, sometimes each project could have around 100-150 containers which needs at least 30-50Gb of RAM.
I also tried with horizontal scale, and/or request/limits but still lot of memory used by each micro-service.
However, to start a pod, it requires around 500-700MB RAM, after spring container has been started they can live with around 300MB as mentioned.
So, I have 2 questions:
Is it able to give extra memory but only for the first X minutes for each pod start?
If not, than what is the best practice to handle memory shortage, if I have limited memory (16GB) and wants to run 35-40 pod?
Thanks for the answer in advance!

Is it able to give extra memory but only for the first X minutes for each pod start?
You do get this behavior when you set the limit to a higher value than the request. This allows pods to burst, unless they all need the memory at the same time.
If not, than what is the best practice to handle memory shortage, if I have limited memory (16GB) and wants to run 35-40 pod?
It is common to use some form of cluster autoscaler to add more nodes to your cluster if it needs more capacity. This is easy if you run in the cloud.
In general, Java and JVM is memory hungry, consider some other technology if you want to use less memory. How much memory an application needs/uses totally depends on your application, e.g what data structures are used.

Related

How to determine resource limit for Openshift Pods required for my tomcat application?

I've a web application (soap service) running in Tomcat 8 server in Openshift. The payload size is relatively small with 5-10 elements and the traffic is also small (300 calls per day, 5-10 max threads at a time). I'm little confused on the Pod resource restriction. How do I come up with min and max cpu and memory limits for each pod if I'm going to use min 1 and max 3 pods for my application?
It's tricky to configure accurate limitation value without performance test.
Because we don't expect your application is required how much resources process per requests. A good rule of thumb is to limit the resource based on heaviest workload on your environment. Memory limitation can trigger OOM-killer, so you should set up afforded value which is based on your tomcat heap and static memory size.
As opposed to CPU limitation will not kill your pod if reached the limitation value, but slow down the process speed.
My suggestion of each limitation value's starting point is as follows.
Memory: Tomcat(Java) memory size + 30% buffer
CPU: personally I think CPU limitation is useless to maximize the
process performance and efficiency. Even though CPU usage is afforded and the pod
can use full cpu resources to process the requests as soon as
possible at that time, the limitation setting can disturb it. But if
you should spread the resource usage evenly for suppressing some
aggressive resource eater, you can consider the CPU limitation.
This answer might not be what you want to, but I hope it help you to consider your capacity planning.

How to set the right cpu millicores for a container?

I want to optimally configure the CPU cores without over or under allocation. How can I measure the required CPU millicore for a given container? It also brings the question of how much traffic a proxy will send it to any given pod based on CPU consumption so we can optimally use the compute.
Currently I send requests and monitor with,
kubectl top pod
Is there any tool that can measure, Requests, CPU and Memory over the time and suggest the optimal CPU recommendation for the pods.
Monitoring over time and per Pod yes, there's suggestions at https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/ One of the more popular is the Prometheus-Grafana combination - https://grafana.com/dashboards/315
As for automatic suggestion of the request and limits, I don't think there is anything. Keep in mind Kubernetes already tries to balance giving each Pod what it needs without it taking too much. The limits and requests that you set are to help it do this more safely. There are limitations on automatically inference as an under-resourced Pod can still work but respond a bit slower - it is up to you to decide what level of slowness you would tolerate. It is also up to you to decide what level of resource consumption could be acceptable in peak load, as opposed to excessive consumption that might indicate a bug in your app or even an attack. There's a further limitation as the metric units are themselves an attempt to approximate resource power that can actually vary with types of hardware (memory and CPUs can differ in mode of operation as well as quantity) and so can vary across clusters or even nodes on a cluster if the hardware isn't all equal.
What you are doing with top seems to me a good way to get started. You'll want to monitor resource usage for the cluster anyway so keeping track of this and adjusting limits as you go is a good idea. If you can run the same app outside of kubernetes and read around to see what other apps using the same language do then that can help to indicate if there's anything you can do to improve utilisation (memory consumption on the JVM in containers for example famously requires some tweaking to get right).

Multiple node pools vs single pool with many machines vs big machines

We're moving all of our infrastructure to Google Kubernetes Engine (GKE) - we currently have 50+ AWS machines with lots of APIs, Services, Webapps, Database servers and more.
As we have already dockerized everything, it's time to start moving everything to GKE.
I have a question that may sound too basic, but I've been searching the Internet for a week and did not found any reasonable post about this
Straight to the point, which of the following approaches is better and why:
Having multiple node pools with multiple machine types and always specify in which pool each deployment should be done; or
Having a single pool with lots of machines and let Kubernetes scheduler do the job without worrying about where my deployments will be done; or
Having BIG machines (in multiple zones to improve clusters' availability and resilience) and let Kubernetes deploy everything there.
List of consideration to be taken merely as hints, I do not pretend to describe best practice.
Each pod you add brings with it some overhead, but you increase in terms of flexibility and availability making failure and maintenance of nodes to be less impacting the production.
Nodes too small would cause a big waste of resources since sometimes will be not possible to schedule a pod even if the total amount of free RAM or CPU across the nodes would be enough, you can see this issue similar to memory fragmentation.
I guess that the sizes of PODs and their memory and CPU request are not similar, but I do not see this as a big issue in principle and a reason to go for 1). I do not see why a big POD should run merely on big machines and a small one should be scheduled on small nodes. I would rather use 1) if you need a different memoryGB/CPUcores ratio to support different workloads.
I would advise you to run some test in the initial phase to understand which is the size of the biggest POD and the average size of the workload in order to properly chose the machine types. Consider that having 1 POD that exactly fit in one node and assign to it is not the right to proceed(virtual machine exist for this kind of scenario). Since fragmentation of resources would easily cause to impossibility to schedule a large node.
Consider that their size will likely increase in the future and to scale vertically is not always this immediate and you need to switch off machine and terminate pods, I would oversize a bit taking this issue into account and since scaling horizontally is way easier.
Talking about the machine type you can decide to go for a machine 5xsize the biggest POD you have (or 3x? or 10x?). Oversize a bit as well the numebr of nodes of the cluster to take into account overheads, fragmentation and in order to still have free resources.
Remember that you have an hard limit of 100 pods each node and 5000 nodes.
Remember that in GCP the network egress throughput cap is dependent on the number of vCPUs that a virtual machine instance has. Each vCPU has a 2 Gbps egress cap for peak performance. However each additional vCPU increases the network cap, up to a theoretical maximum of 16 Gbps for each virtual machine.
Regarding the prices of the virtual machines notice that there is no difference in price buying two machines with size x or one with size 2x. Avoid to customise the size of machines because rarely is convenient, if you feel like your workload needs more cpu or mem go for HighMem or HighCpu machine type.
P.S. Since you are going to build a pretty big Cluster, check the size of the DNS
I will add any consideration that it comes to my mind, consider in the future to update your question with the description of the path you chose and the issue you faced.
1) makes a lot of sense as if you want, you can still allow kube deployments treat it as one large pool (by not adding nodeSelector/NodeAffinity) but you can have different machines of different sizes, you can think about having a pool of spot instances, etc. And, after all, you can have pools that are tainted and so forth excluded from normal scheduling and available to only a particular set of workloads. It is in my opinion preferred to have some proficiency with this approach from the very beginning, yet in case of many provisioners it should be very easy to migrate from 2) to 1) anyway.
2) As explained above, it's effectively a subset of 1) so better to build up exp with 1) approach from day 1, but if you ensure your provisioning solution supports easy extension to 1) model then you can get away with starting with this simplified approach.
3) Big is nice, but "big" is relative. It depends on the requirements and amount of your workloads. Remember that while you need to plan for loss of a whole AZ anyway, it will be much more frequent to loose single nodes (reboots, decommissions of underlying hardware, updates etc.) so if you have more hosts, impact of loosing one will be smaller. Bottom line is that you need to find your own balance, that makes sense for your particular scale. Maybe 50 nodes is too much, would 15 cut it? Who knows but you :)

Apache Spark Auto Scaling properties - Add Worker on the Fly

During the execution of a Spark Program, let's say,
reading 10GB of data into memory, and just doing a filtering, a map, and then saving in another storage.
Can I auto-scale the cluster based on the load, and for instance add more Worker Nodes to the Program, if this program eventually needs to hangle 1TB instead of 10GB ?
If this is possible, how can it be done?
It is possible to some extent, using dynamic allocation, but behavior is dependent on the job latency, not direct usage of particular resource.
You have to remember that in general, Spark can handle data larger than memory just fine, and memory problems are usually caused by user mistakes, or vicious garbage collecting cycles. None of these could be easily solved, by "adding more resources".
If you are using any of the cloud platforms for creating the cluster you can use auto-scaling functionality. that will scale cluster horizontally(number of nodes with change)
Agree with #user8889543 - You can read much more data then your memory.
And as for adding more resources on the fly. It is depended on your cluster type.
I use standalone mode, and I have a code that add on the fly machines that attached to the master automatically, then my cluster has more cores and memory.
If you only have on job/program in the cluster then it is pretty simple. Just set
spark.cores.max
to a very high number and the job will take all the cores of the cluster always. see
If you have several jobs in the cluster it becomes complicate. as mentioned in #user8889543 answer.

mongodb limit server memory consumption

I'm testing running monbodb on the kubernetes platform where I can limit the resources used by the running container.
Say I set the memory limit to 256Mb. The problem is that for example while making backup memory consumption increases to the limit and container gets restarted by kubernetes.
So the question is is there a way to limit mongodb memory consumption for my case so that it would not cause the crush by exeeding memory limit set by platform.
I could of course increase the limit but I'm interested in a principal solution and would like to understand this process better because I don't really now how memory consumed by mongodb and container os. Is it possible to tune mongodb/underlying linux os to work inside existing limits.
The limits that you have set are good enough for a monogodb pod, these are the limits used by the community as well.
The only way I think you can get around this for backups is to increase the memory limits, but still it might fail, because in other places on stackoverflow people have experienced OOM killing on VMs with memory of giga bytes. MongoDB basically tries to eat any and every memory that is made available to it.
Also there are other ways to backup mongodb: https://dba.stackexchange.com/questions/76130/how-to-backup-large-mongodb-database
I am not sure how this aligns in the k8s world.