Specifying memory in Kubernetes pods for deployment of a Docker image

I am exploring how to set up a Kubernetes cluster and deploy to it from Jenkins through a CI/CD pipeline. While exploring, I found that we don't need to specify the worker node on which our pods are deployed. The Kubernetes master decides which worker node has room for the pod. We only need to define how much memory the pod needs in its definition.
My confusion is this: we have already provisioned and configured the Kubernetes cluster for deployment, and every node has its own memory, fixed when the AWS EC2 instance was created (I am planning to use AWS EC2 with Ubuntu 16.04 LTS).
So why do we need to define memory again in the pod? Is that the proper way to deploy a pod?
I have only just started in the CI/CD pipeline world.

Specifying memory and CPU in the pod specification is completely optional. Still, there are several reasons to specify memory and CPU at the pod level:
As explained here, if you don't specify CPU/memory, the pod/container can consume all resources on that node and potentially affect other pods/containers running on that node.
Each application should specify the memory and CPU it needs to run. Kubernetes uses this information when scheduling the pod onto one of the cluster nodes where enough resources are available, so it leads to better scheduling decisions.
It enables the Horizontal Pod Autoscaler (HPA) to scale the pods when resource consumption goes beyond a certain threshold. The details are explained in this doc. The HPA measures utilization as a percentage of the container's resource request, so unless a memory/CPU request is specified, it cannot calculate that the pod is running at 80% of that metric and should be scaled to two replicas.
You can also set defaults at the namespace level and override them only for specific applications; details here.
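As a minimal sketch of what declaring memory and CPU looks like, the snippet below builds a Pod manifest with `resources.requests` and `resources.limits` and prints it as JSON. The pod name, image, and the specific quantities are hypothetical placeholders, not values from the question.

```python
import json


def pod_manifest(name, image, mem_request, mem_limit, cpu_request, cpu_limit):
    """Build a Pod manifest whose single container declares requests and limits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        # requests: what the scheduler reserves on a node
                        "requests": {"memory": mem_request, "cpu": cpu_request},
                        # limits: the ceiling the container may not exceed
                        "limits": {"memory": mem_limit, "cpu": cpu_limit},
                    },
                }
            ]
        },
    }


# Hypothetical name, image, and sizes chosen only for illustration.
manifest = pod_manifest(
    "demo-app", "example/demo-app:1.0", "256Mi", "512Mi", "250m", "500m"
)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the output could be piped into `kubectl apply -f -`.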

Related

GKE node pool doesn't scale up

I have a GKE cluster which doesn't scale up when a particular deployment needs more resources.
I've checked the cluster autoscaler logs and it has entries with this error:
no.scale.up.nap.pod.zonal.resources.exceeded. The documentation for this error says:
Node auto-provisioning did not provision any node group for the Pod in
this zone because doing so would violate resource limits.
I don't quite understand which resource limits the documentation refers to, or why they prevent the node pool from scaling up.
If I scale the cluster up manually, the deployment pods are scaled up and everything works as expected, so it doesn't seem to be a problem with project quotas.
The limits that you define for the cluster are enforced based on the total CPU and memory resources used across your cluster, not just in auto-provisioned pools.
If you are not using node auto-provisioning (NAP), disable the node auto-provisioning feature for the cluster.
If you are using NAP, update the cluster-wide resource limits defined in NAP for the cluster.
As a workaround, try specifying the machine type explicitly in the workload spec. Make sure to use a machine family that GKE node auto-provisioning supports.
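To illustrate why manual scaling can succeed while auto-provisioning refuses, here is a deliberately simplified model of the cluster-wide limit check. The function name and the numbers are hypothetical, and the real autoscaler logic is more involved. The point is only that the totals are counted across all node pools.

```python
def would_exceed_limits(used_cpu, used_mem_gb, node_cpu, node_mem_gb,
                        max_cpu, max_mem_gb):
    """Return True if adding one node would push cluster totals past the limits.

    used_* are totals across ALL node pools, not only auto-provisioned ones.
    """
    return (used_cpu + node_cpu > max_cpu
            or used_mem_gb + node_mem_gb > max_mem_gb)


# Hypothetical cluster: 30 vCPU / 100 GB in use, candidate node of 4 vCPU / 16 GB.
print(would_exceed_limits(30, 100, 4, 16, max_cpu=32, max_mem_gb=256))  # True
print(would_exceed_limits(30, 100, 4, 16, max_cpu=64, max_mem_gb=256))  # False
```

In this model, raising the cluster-wide maximum is what unblocks the scale-up, which matches the advice to update the NAP resource limits.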

GKE Cluster autoscaler profile for older cluster

GKE now has a new tab when creating a new K8s cluster:
Automation - Set cluster-level criteria for automatic maintenance, autoscaling, and auto-provisioning. Edit the node pool for automation like auto-scaling, auto-upgrades, and repair.
It has two options: Balanced (default) and Optimize utilization (beta).
Can we set this for an older cluster? Is there any workaround?
We are running an old GKE version, 1.14, and we want to auto-scale the cluster when resource utilization of the existing nodes reaches 70%.
Currently we have 2 different pools, and only one has node auto-provisioning enabled. During peak hours, when the HPA scales pods, a new node takes some time to join the cluster, and sometimes existing nodes start crashing due to resource pressure.
You can set the autoscaling profile by going into:
GCP Cloud Console (Web UI) -> Kubernetes Engine -> CLUSTER-NAME -> Edit -> Autoscaling profile
This screenshot was made on GKE version 1.14.10-gke.50
You can also run:
gcloud beta container clusters update CLUSTER-NAME --autoscaling-profile optimize-utilization
The official documentation states:
You can specify which autoscaling profile to use when making such decisions. The currently available profiles are:
balanced: The default profile.
optimize-utilization: Prioritize optimizing utilization over keeping spare resources in the cluster. When selected, the cluster autoscaler scales down the cluster more aggressively: it can remove more nodes, and remove nodes faster. This profile has been optimized for use with batch workloads that are not sensitive to start-up latency. We do not currently recommend using this profile with serving workloads.
-- Cloud.google.com: Kubernetes Engine: Cluster autoscaler: Autoscaling profiles
This setting (optimize-utilization) might not be the best option for serving workloads. It tries to scale down (remove nodes) more aggressively, which reduces the spare resources available in your cluster and can leave it more vulnerable to workload spikes.
Answering the part of the question:
We are running an old GKE version, 1.14, and we want to auto-scale the cluster when resource utilization of the existing nodes reaches 70%.
As stated in the documentation:
Cluster autoscaler increases or decreases the size of the node pool automatically, based on the resource requests (rather than actual resource utilization) of Pods running on that node pool's nodes. It periodically checks the status of Pods and nodes, and takes action:
If Pods are unschedulable because there are not enough nodes in the node pool, cluster autoscaler adds nodes, up to the maximum size of the node pool.
-- Cloud.google.com: Kubernetes Engine: Cluster autoscaler: How cluster autoscaler works
You can't directly scale the cluster based on a percentage of resource utilization (70%).
The cluster autoscaler acts on the cluster's inability to schedule pods on the existing nodes.
You can scale the number of replicas of your Deployment by CPU usage with the Horizontal Pod Autoscaler. These Pods could keep a buffer to handle increased traffic; past a specific threshold, the HPA would spawn new Pods, and the CA (cluster autoscaler) would request a new node if the new Pods are unschedulable. This buffer would be the mechanism that absorbs sudden spikes the application couldn't otherwise handle.
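The replica count the HPA aims for follows the formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below applies it to CPU utilization, which is measured relative to the pods' CPU requests. It is a simplification that ignores the tolerance band and the min/max replica bounds of a real HPA, and the function name is illustrative.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization):
    """Replica count the HPA formula yields for a utilization metric.

    Utilization values are percentages of the pods' resource requests.
    """
    return math.ceil(current_replicas * current_utilization / target_utilization)


# 2 replicas averaging 90% CPU against a 60% target: scale out.
print(desired_replicas(2, 90, 60))
# 4 replicas averaging 35% CPU against a 70% target: scale in.
print(desired_replicas(4, 35, 70))
```

A lower target leaves more headroom per pod, which is the buffer described above: new pods are requested earlier, giving the cluster autoscaler time to add a node before the existing pods are saturated.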
The buffer and over-provisioning are explained in detail in:
Cloud.google.com: Solutions: Best practices for running cost effective kubernetes applications on gke: Autoscaler and over-provisioning
There is extensive documentation about running cost-effective apps on GKE:
Cloud.google.com: Solutions: Best practices for running cost effective kubernetes applications on gke
I encourage you to check the link above, as it has many tips and insights on scaling, over-provisioning, workload spikes, HPA, VPA, etc.
Additional resources:
Cloud.google.com: Kubernetes Engine: Node auto provisioning

Is it possible to schedule a pod to run for, say, 24 hours and then remove the deployment/statefulset, or do we need to use jobs?

We have a bunch of pods running in a dev environment. The pods are auto-provisioned by an application on every business action. The problem is that they accumulate across various namespaces and eat the available resources in EKS.
Is there a way, without Jenkins/k8s jobs, to simply put some parameter in the pod manifest that tells it to self-destruct in, say, 24 hours?
Add to your pod.spec:
activeDeadlineSeconds: 86400
After the deadline, your Pod will be stopped for good with the status DeadlineExceeded.
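For context, this is roughly where the field sits in a complete manifest. The sketch below generates a bare Pod with `activeDeadlineSeconds` set to 24 hours; the pod name and image are hypothetical placeholders.

```python
import json

ONE_DAY_SECONDS = 24 * 60 * 60  # 86400


def pod_with_deadline(name, image, deadline_seconds=ONE_DAY_SECONDS):
    """Build a Pod manifest that is terminated after deadline_seconds."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Set at the pod spec level, next to `containers`, not inside a container.
            "activeDeadlineSeconds": deadline_seconds,
            "containers": [{"name": name, "image": image}],
        },
    }


# Hypothetical name and image, for illustration only.
print(json.dumps(pod_with_deadline("short-lived", "example/worker:1.0"), indent=2))
```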
If I understood your situation properly, you would like to scale your cluster down in order to save resources.
Kubernetes can autoscale your application in a cluster. In practice, this means that Kubernetes can start additional pods when the load increases and terminate surplus pods when the load decreases.
It is possible to downscale the application to zero pods, but in that case there will be a delay in serving the first request while the pod starts.
This functionality relies on performance metrics. In practice, it means that autoscaling doesn't happen instantly, because it takes some time for the performance metrics to reach the configured threshold.
This Kubernetes feature, called HPA (Horizontal Pod Autoscaler), is described in this document.
If you are running your cluster on GCP or GKE, you can go further and automatically start additional nodes for your cluster when you need more computing capacity, and shut down nodes when they are no longer running application pods.
More information about this functionality can be found by following the link.
Last but not least, you can use a tool like Ansible to manage all your Kubernetes assets (it can create and manage deployments via playbooks).
If you decide to give it a try, you might find this information useful:
Creating a Container cluster in GKE
70% cheaper Kubernetes cluster on AWS
How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics

Kubernetes Cluster with different CPU configuration

I have created a K8S cluster of 10 machines with different amounts of memory and numbers of cores (4 cores with 32 GB, 4 cores with 8 GB). Now when I deploy any application on the cluster, it creates pods in a seemingly random manner. It does not place pods on the basis of memory or load.
How does the Kubernetes master distribute pods in the cluster? I have not found any satisfactory answers. How can I configure the cluster to make the best use of resources?
Kubernetes uses a scheduler to decide which pod is started on which node. One improvement is to tell the scheduler the minimum and maximum resources your pods need.
Resources are memory (measured in bytes), CPU (measured in CPU units), and ephemeral storage for things like emptyDir (as of 1.11). When you provide this information for your deployments, Kubernetes can make better decisions about where to run them.
Without this information, an nginx pod will be scheduled the same way as any heavy Java application.
The limits and requests config is described here. Setting both requests and limits is a good idea, to make scheduling easier and to keep pods from running amok and using all node resources.
If this is not enough, you can also add a custom scheduler, which is explained in this documentation.
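To make the effect of requests concrete, here is a simplified model of the scheduler's filtering step: a node is a candidate only if its unreserved CPU and memory cover the pod's requests. The `Node` class and `feasible_nodes` function are hypothetical illustrations. The real scheduler also scores the candidates and takes many other constraints into account.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_millicores: int  # CPU not yet reserved by other pods' requests
    free_memory_mib: int      # memory not yet reserved by other pods' requests


def feasible_nodes(nodes, cpu_request_millicores, memory_request_mib):
    """Return names of nodes whose free resources cover the pod's requests."""
    return [
        n.name
        for n in nodes
        if n.free_cpu_millicores >= cpu_request_millicores
        and n.free_memory_mib >= memory_request_mib
    ]


# Two empty nodes shaped like the ones in the question: 4 cores with 32 GB and 8 GB.
nodes = [Node("big", 4000, 32 * 1024), Node("small", 4000, 8 * 1024)]

# A heavy Java app requesting 16 GiB only fits on the 32 GB node.
print(feasible_nodes(nodes, 1000, 16 * 1024))  # ['big']
# A small nginx pod fits on either node.
print(feasible_nodes(nodes, 100, 128))         # ['big', 'small']
```

Without requests, both pods would look identical to this filter, which is why placement appears random.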

Kubernetes automatic shutdown after some idle time

Does Kubernetes or Helm support shutting down pods that have been idle for longer than a given threshold?
This would be very useful in a development environment, to free up room for other processes and save cost.
Kubernetes can autoscale your application in a cluster. In practice, this means that Kubernetes can start additional pods when the load increases and terminate surplus pods when the load decreases.
It is possible to downscale the application to zero pods, but in that case there will be a delay in serving the first request while the pod starts.
This functionality relies on performance metrics provided by the Heapster application, which must be running in the cluster (newer Kubernetes versions use Metrics Server instead, since Heapster has been deprecated). In practice, it means that autoscaling doesn't happen instantly, because it takes some time for the performance metrics to reach the configured threshold.
This Kubernetes feature, called HPA (Horizontal Pod Autoscaler), is described in this document.
If you are running your cluster on GCP or GKE, you can go further and automatically start additional nodes for your cluster when you need more computing capacity, and shut down nodes when they are no longer running application pods.
More information about this functionality can be found by following the link.
If you decide to give it a try, you might find this information useful:
Creating a Container cluster in GKE
70% cheaper Kubernetes cluster on AWS
How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics