I would like to ask about a strange memory behavior that we encountered in some of our clusters.
After a spike in the API server's memory consumption, RAM usage stays at the peak level of the spike, which suggests that the kube-apiserver does not free any memory.
Is this behavior normal? Can you guide us to a document that describes the kube api server memory cleanup mechanism?
Cluster information:
Kubernetes version: OpenShift 4.6.35 / Kubernetes 1.19
Cloud being used: OpenStack 13
Installation method: OpenShift IPI installation
Host OS: CoreOS
UPDATE:
We upgraded the cluster to OpenShift 4.8, and now the API server frees up memory.
I am seeing a continuous 8 to 15% CPU usage from Rancher-related processes even though Rancher is not managing a single cluster and no user is interacting with it. What explains this high CPU usage when idle? Also, several "rancher-agent" containers are perpetually running and restarting, which does not look right. There is no Kubernetes cluster running on this machine (unless Rancher is creating its own single-node cluster for whatever reason).
I am using Rancher 2.3
Screenshots attached: docker stats, docker ps, htop.
I'm not sure I would call 15% "high", but Kubernetes has a lot of ongoing stuff even if it looks like the cluster is entirely quiet. Stuff like processing node heartbeats, etcd election traffic, controllers with time-based conditions which have to be processed. K3s probably streamlines that a bit, but 0% CPU usage is not a design goal even in the fork.
Rancher (2.3.x) does not do anything involving k3s. These pictures are not "just Rancher".
k3s is separately installed and running.
The agents further suggest that this node is added to a cluster (maybe the same Rancher running on it, maybe not).
The constant restarting is not helping CPU usage either, especially if the agent is registered to that local Rancher instance.
Also you're running a completely random commit from head instead of an actual release.
FWIW, in my case I built the Raspberry Pi based Rancher/k3s lab as designed by Network Chuck on YouTube. The VM on my Linux host that runs Rancher starts off fairly quiet, then over the course of a couple of days the rancherd process consistently hits near 100% CPU usage (I gave it 3 vCPUs) and stays there, even though I have no pods running on either the Pi cluster or the local Rancher VM cluster. A reboot starts the process over, but within a few days it's back to 100% CPU usage.
While writing this I noticed that, due to a DHCP issue, the original external IP of the local Rancher cluster node had changed from 163 to 151 (I reserved it in Pi-hole as 151 but never updated the Rancher config). I just fixed it in the Rancher GUI; we'll see if that clears up some of the errors I saw in the logs and keeps the CPU usage normal when idle.
I am new to Kubernetes and am trying to set up a one-master, two-node architecture using Oracle VirtualBox.
OS: Ubuntu 16.04.6 LTS
Docker: 17.03.2-ce
Kubernetes
Client Version: v1.17.4
Server Version: v1.17.4
When I run the join command on the worker node, "kube-controller-manager" and "api-server manager" disappear and the worker nodes do not join the cluster (even though the join command executes successfully).
I have set Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs", but I still get the same error.
Please see the snapshot below.
Thanks.
The link you have provided is no longer available. While learning and trying out Kubernetes for the first time I highly recommend using the official docs.
There you will find a detailed guide regarding Creating a single control-plane cluster with kubeadm. Note that:
To follow this guide, you need:
One or more machines running a deb/rpm-compatible Linux OS; for example: Ubuntu or CentOS.
2 GiB or more of RAM per machine–any less leaves little room for your apps.
At least 2 CPUs on the machine that you use as a control-plane node.
Full network connectivity among all machines in the cluster. You can use either a public or a private network.
You also need to use a version of kubeadm that can deploy the version
of Kubernetes that you want to use in your new cluster.
Kubernetes’ version and version skew support policy applies to kubeadm
as well as to Kubernetes overall. Check that policy to learn about
what versions of Kubernetes and kubeadm are supported. This page is
written for Kubernetes v1.18.
The kubeadm tool’s overall feature state is General Availability (GA).
Some sub-features are still under active development. The
implementation of creating the cluster may change slightly as the tool
evolves, but the overall implementation should be pretty stable.
If you encounter any issues, first try the troubleshooting steps.
Please let me know if that helped.
I'm using Istio at the moment combined with the cert-manager. Because I need to have multiple certificates I'm using SDS instead of the volume mount approach.
But the hardware requirements for this stuff are really high. For GKE it is recommended to use a node pool of 4x n1-standard-2 machines. This sums up to $200 per month just for Istio. The recommendation for EKS is 2x m5.large machines, so it is a little bit cheaper but still around $150. What confuses me is that Minikube "just" needs 4 vCPUs and 16 GB of memory in total, which is roughly half of the requirements for GKE and EKS.
You'll see the resource hungry components by looking at the istio-system namespace, I mean especially the limits. For me it is:
istio-telemetry > 1100m / 6800m (requested / limits)
istio-policy (I have 5 of them) > 110m / 2000m
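To put the figures above in perspective, here is a rough sketch that totals the CPU requests and limits and compares them with the recommended GKE node pool. It assumes a single istio-telemetry replica, that the istio-policy numbers are per pod, and that 4x n1-standard-2 provides 8 vCPUs with system reservations ignored; none of this is confirmed by the post.

```python
# Figures quoted in the post above. The replica count for istio-telemetry,
# the per-pod reading of the istio-policy numbers, and the node-pool size
# are assumptions made for this sketch.
COMPONENTS = [
    # (name, replicas, cpu_request_millicores, cpu_limit_millicores)
    ("istio-telemetry", 1, 1100, 6800),
    ("istio-policy", 5, 110, 2000),
]

# 4x n1-standard-2 at 2 vCPUs each, ignoring system reservations.
NODE_POOL_MILLICORES = 4 * 2 * 1000


def total_millicores(components):
    """Return (total_requests, total_limits) in millicores across all replicas."""
    requests = sum(replicas * req for _, replicas, req, _ in components)
    limits = sum(replicas * lim for _, replicas, _, lim in components)
    return requests, limits


if __name__ == "__main__":
    requests, limits = total_millicores(COMPONENTS)
    print(f"requests: {requests}m of {NODE_POOL_MILLICORES}m")
    print(f"limits:   {limits}m of {NODE_POOL_MILLICORES}m")
```

Under these assumptions the limits add up to more than the pool offers, while the requests reserve only a fraction of it. The scheduler places pods based on requests, so the requests are the portion that is actually set aside.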
My questions are:
Did you manage to reduce the limits without facing issues in production?
What node-pool size / machine type are you running your Istio plane on?
Has anyone tried auto-scaling for this node pool? Did it reduce the costs?
Kind regards from Berlin.
Managed Istio for GKE is offered by Google as a pre-configured bundle. 4x n1-standard-2 is recommended to provide enough resources for all Istio components being installed.
Downsizing a cluster below the recommended size does not make sense.
Installation of managed Istio onto a standard GKE cluster (3x n1-standard-1)
will fail due to lack of resources. Besides that you wouldn't have
free computing capacity for your workloads. Recommended cluster size
seems reasonable.
Apart from the recommended hardware configuration (4x n1-standard-2),
managed Istio can also be installed and run on a cluster configured with
8x n1-standard-1.
Taking into account what was mentioned in point #1, autoscaling could be beneficial
mostly for volatile workloads, but it won't help much in saving the resources
allocated for Istio.
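As a quick sanity check on the node-pool shapes mentioned above, the sketch below totals the nominal vCPUs and memory of each pool. The `MachineType` class and `pool_capacity` helper are illustrative names, and the figures are nominal machine sizes, not the allocatable capacity that remains after per-node system reservations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineType:
    name: str
    vcpus: int
    memory_gb: float


# Nominal sizes of the two GCE machine types discussed above.
N1_STANDARD_1 = MachineType("n1-standard-1", 1, 3.75)
N1_STANDARD_2 = MachineType("n1-standard-2", 2, 7.5)


def pool_capacity(machine, node_count):
    """Return (total_vcpus, total_memory_gb) for a pool of identical nodes."""
    return machine.vcpus * node_count, machine.memory_gb * node_count


if __name__ == "__main__":
    for machine, count in [(N1_STANDARD_2, 4), (N1_STANDARD_1, 8), (N1_STANDARD_1, 3)]:
        vcpus, memory = pool_capacity(machine, count)
        print(f"{count}x {machine.name}: {vcpus} vCPUs, {memory} GB")
```

The 4x n1-standard-2 and 8x n1-standard-1 pools come out the same in nominal terms, while the standard 3x n1-standard-1 cluster has less than half of that. Since every node carries its own system overhead, the pool with more, smaller nodes would likely leave somewhat less capacity for workloads.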
If managed Istio for GKE seems too resource-hungry, you could install the original version of Istio and select an installation profile with only the components you actually need, as described here:
Customizable Install with Helm
My Kubernetes Engine cluster keeps rebooting one of my nodes, even though all pods on the node are "well-behaved". I've tried to look at the cluster's Stackdriver logs, but was not able to find a reason. After a while, the continuous reboots usually stop, only to occur again a few hours or days later.
Usually only one single node is affected, while the other nodes are fine, but deleting that node and creating a new one in its place only helps temporarily.
I have already disabled node auto-repair to see if that makes a difference (it was turned on before), and if I recall correctly this started after upgrading my cluster to Kubernetes 1.13 (specifically version 1.13.5-gke). The issue has persisted after upgrading to 1.13.6-gke.0. Even creating a new node pool and migrating to it had no effect.
The cluster consists of four nodes with 1 CPU and 3 GB RAM each. I know that's small for a k8s cluster, but this has worked fine in the past.
I am using the new Stackdriver Kubernetes Monitoring as well as Istio on GKE.
Any pointers as to what the reason could be, or where I could look for possible causes, would be appreciated.
Screenshots of the Node event list (happy to provide other logs; couldn't find anything meaningful in Stackdriver Logging yet):
Posting this answer as a community wiki to give some troubleshooting tips/steps as the underlying issue wasn't found.
Feel free to expand it.
After the steps below, the node reboot issue no longer occurred:
Updating the Kubernetes version (GKE)
Uninstalling Istio
Using e2-medium instances as nodes.
As pointed out by user @aurelius:
I would start from posting the kubectl describe node maybe there is something going on before your Node gets rebooted and unhealthy. Also do you use resources and limits? Can this restarts be a result of some burstable workload? Also have you tried checking system logs after the restart on the Node itself? Can you post the results? – aurelius Jun 7 '19 at 15:38
The above comment could be a good starting point for troubleshooting issues with the cluster.
Troubleshooting options pointed out in the comment:
$ kubectl describe node focusing on output in:
Conditions - KubeletReady, KubeletHasSufficientMemory, KubeletHasNoDiskPressure, etc.
Allocated resources - Requests and Limits of scheduled workloads
Checking system logs after the restart on the node itself:
GCP Cloud Console (Web UI) -> Logging -> Legacy Logs Viewer/Logs Explorer -> VM Instance/GCE Instance
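The node-condition check described above can also be scripted. The following is a minimal sketch, assuming `kubectl` is installed and configured for the cluster; the function names are illustrative. It reads the same conditions that `kubectl describe node` prints and reports those that are not in their healthy state (Ready should be "True", the pressure conditions should be "False").

```python
import json
import subprocess


def unhealthy_conditions(node):
    """Return (type, status, reason) for each condition not in its healthy state.

    `node` is the parsed JSON of a Node object. "Ready" is healthy when its
    status is "True"; every other condition type is treated as healthy when
    its status is "False".
    """
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        expected = "True" if cond["type"] == "Ready" else "False"
        if cond["status"] != expected:
            problems.append((cond["type"], cond["status"], cond.get("reason", "")))
    return problems


def fetch_node(name):
    """Fetch a Node object as a dict via `kubectl get node <name> -o json`."""
    result = subprocess.run(
        ["kubectl", "get", "node", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    import sys

    # Usage: python check_node.py <node-name>
    for ctype, status, reason in unhealthy_conditions(fetch_node(sys.argv[1])):
        print(f"{ctype}={status} ({reason})")
```

Running this periodically against the affected node might show which condition changes just before a reboot.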
It could also be beneficial to check the CPU/RAM usage in:
GCP Cloud Console (Web UI) -> Monitoring -> Metrics Explorer
You can also check if there are any operations on the cluster:
gcloud container operations list
Adding to the above points:
Creating a cluster with Istio on GKE
We suggest creating at least a 4 node cluster with the 2 vCPU machine type when using this add-on. You can deploy Istio itself with the default GKE new cluster setup but this may not provide enough resources to explore sample applications.
-- Cloud.google.com: Istio: Docs: Istio on GKE: Installing
Also, the official Istio docs state:
CPU and memory
Since the sidecar proxy performs additional work on the data path, it consumes CPU and memory. As of Istio 1.7, a proxy consumes about 0.5 vCPU per 1000 requests per second.
-- Istio.io: Docs: Performance and scalability: CPU and memory
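The figure quoted above can be turned into a back-of-the-envelope estimate. This sketch assumes that proxy CPU scales linearly with request rate, which is a simplification; the function name is illustrative.

```python
# About 0.5 vCPU per 1000 requests per second, as quoted from the Istio 1.7 docs.
VCPU_PER_1000_RPS = 0.5


def estimate_proxy_vcpu(requests_per_second):
    """Estimate sidecar proxy CPU in vCPUs, assuming linear scaling with load."""
    return requests_per_second / 1000 * VCPU_PER_1000_RPS


if __name__ == "__main__":
    for rps in (100, 1000, 2000):
        print(f"{rps} req/s -> about {estimate_proxy_vcpu(rps):.2f} vCPU per proxy")
```

On nodes with a single vCPU, even a moderately busy sidecar would take a noticeable share of the available CPU.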
Additional resources:
Cloud.google.com: Kubernetes Engine: Docs: Troubleshooting
Kubernetes.io: Docs: Debug cluster
Hi, I'm trying to set up Stackdriver to monitor my containers, but the CPU metrics don't seem to work. I'm working with the following versions:
Master Version 1.2.5
Node Version 1.2.4
heapster-v1.0.2-594732231-sil32
This is a group I created for the databases (it also happens for the WildFly pod and modcluster). I have a couple of other questions:
Is it possible to monitor Postgres, or do I have to install the agent on
the Docker image?
Can I monitor the images on Kubernetes, or the disks on Google Cloud?
Do your containers have CPU limits specified on them? The CPU Usage graph on that page is supposed to show utilization, which is defined as cores used / cores reserved. If a container hasn't specified a maximum number of cores, then it won't have a utilization either, as mentioned in the description of the CPU utilization metric.
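The definition above can be illustrated with a small sketch. The function name is hypothetical and this is not Stackdriver's actual implementation; it only mirrors the stated formula, including the case where no limit is set.

```python
from typing import Optional


def cpu_utilization(cores_used: float, cores_limit: Optional[float]) -> Optional[float]:
    """Return cores used / cores reserved, or None when no limit is set.

    Without a CPU limit there is no denominator, so utilization is undefined.
    """
    if cores_limit is None or cores_limit <= 0:
        return None
    return cores_used / cores_limit


if __name__ == "__main__":
    print(cpu_utilization(0.25, 0.5))   # container with a 500m limit
    print(cpu_utilization(0.25, None))  # container without a limit
```

A container without a CPU limit falls into the second case, which would explain an empty utilization graph even though the container is using CPU.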