We are running out of disk space for containers running on our nodes. We are running Kubernetes 1.0.1 on AWS, and we are trying to do all our configuration in software instead of configuring things manually.
How do we increase the disk size of the nodes? Right now they have 8 GB each, as created by https://get.k8s.io | bash. It's fine if we have to create a new cluster and move our services/pods to it.
You should be able to do so by setting the MINION_ROOT_DISK_SIZE environment variable before creating the cluster. However, this option was only merged yesterday, so it may not be available yet unless you use the cluster/kube-up.sh script from HEAD of the repository.
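A minimal sketch of what that could look like, assuming you are running cluster/kube-up.sh from HEAD so the variable is recognized (the 32 GB size is an example value):

```shell
# Increase the root disk size of each minion (node) before bringing the cluster up.
# MINION_ROOT_DISK_SIZE was only just merged, so this assumes kube-up.sh from HEAD.
export KUBERNETES_PROVIDER=aws
export MINION_ROOT_DISK_SIZE=32   # in GB, instead of the 8 GB default

cluster/kube-up.sh
```

Since you're fine with recreating the cluster, you can bring the old one down, bring a new one up with this variable set, and move your services/pods over.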
I tried installing Ceph Octopus using cephadm. A few issues as I see it (non-production):
Defining the public and cluster networks
Defining OSD properties, e.g. multiple OSDs per device
Monitoring does not come up in the dashboard by default.
I am using CentOS 7 on a single bare-metal machine with a single free disk. Not the ideal situation, but for test purposes it should be fine.
What exactly are your questions?
After setting up the bootstrap node you can define the cluster network (before any OSDs are created). Multiple OSDs per device won't be set up by cephadm because it's not a production configuration and hence not documented; you'll have to do that manually on each node with the ceph-volume lvm command.
There are some basic monitoring values in the dashboard but you can enhance it with grafana (https://docs.ceph.com/en/latest/mgr/dashboard/#dashboard-grafana).
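Putting the two answers above together, a rough sketch for a single-host test setup might look like this (the 10.0.0.0/24 network and /dev/sdb device are placeholders for your environment):

```shell
# Define the cluster network right after bootstrap, before any OSDs exist:
ceph config set global cluster_network 10.0.0.0/24

# cephadm will not split one device into several OSDs for you, so do it
# manually on the node with ceph-volume (here: two OSDs on one device):
ceph-volume lvm batch --osds-per-device 2 /dev/sdb
```

For a single free disk this is purely for experimentation; in production you would let cephadm manage one OSD per device.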
Is it possible to run Kubernetes on a single node without using minikube? Today I use kubeadm with 2 hosts, but I would like to know if it is possible to run it using only one host.
You can run the kubeadm init command to initialize a single-node cluster, and add/remove nodes to the cluster later.
Then remove the taint from the master so that it can run containers, using the command below:
kubectl taint nodes --all node-role.kubernetes.io/master-
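Putting the steps together, a minimal single-node bootstrap might look like this (run as root; the kubeconfig paths are the kubeadm defaults):

```shell
# Initialize the control plane on this single host:
kubeadm init

# Make kubectl work for your user by copying the admin kubeconfig:
mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Allow ordinary pods to schedule on the (sole) master node:
kubectl taint nodes --all node-role.kubernetes.io/master-
```

You will still need to install a pod network add-on before pods can communicate, as kubeadm's output will remind you.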
You need to look into the hardware requirements for running a single-node cluster. You would need to run
etcd, which is the backing store for all cluster data,
control plane software (scheduler, controller manager, API server, kubeadm), and
worker node software (kubelet, kube-proxy)
all on one node.
When installing kubeadm, the hardware requirements (https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/) are:
2 GB or more of RAM per machine (any less will leave little room for your apps)
2 CPUs or more
Example configurations for etcd (https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md#example-hardware-configurations).
For the CKA exam training material, the recommended setting for a single machine is 2 vCPUs and 7.5 GB of memory, with a note of caution that you may experience slowness.
I am basing my recommendations on Ubuntu 18.04 Linux. Another thing you need to do is disable swap (https://serverfault.com/questions/881517/why-disable-swap-on-kubernetes). This is necessary since Kubernetes makes maximum use of the disk and CPU resources provided.
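Disabling swap on Ubuntu takes two steps: turning it off for the running system, and keeping it off across reboots. A common way to do both (the sed pattern comments out any fstab line containing a swap entry):

```shell
# Turn swap off immediately:
sudo swapoff -a

# Keep it off after reboot by commenting out swap entries in /etc/fstab:
sudo sed -i '/ swap / s/^/#/' /etc/fstab
```

You can verify with `free -h`, which should show 0B of swap afterwards.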
So if it is for your learning, go ahead and start with 2 vCPUs and 7.5 GB of memory.
You could check
k3s
KinD
MicroK8s
for single-node Kubernetes installations.
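For reference, each of those ships a one-command install; the commands below follow each project's documented quick start, but verify against the upstream docs before running:

```shell
# k3s: lightweight single-binary Kubernetes, installed as a service:
curl -sfL https://get.k3s.io | sh -

# KinD: runs a Kubernetes cluster inside a Docker container (needs Docker):
kind create cluster

# MicroK8s: snap-packaged Kubernetes (Ubuntu and other snap-enabled distros):
sudo snap install microk8s --classic
```

All three give you a working single-node cluster with far less overhead than a full kubeadm install.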
I have a simple 3-node cluster created using AKS. Everything had been going fine for 3 months. However, I'm starting to have some disk space usage issues that seem related to the OS disks attached to each node.
I have no error in kubectl describe node and all disk-related checks are fine. However, when I try to run kubectl logs on some pods, I sometimes get "no space left on device".
How can one manage the storage used on those disks? I can't seem to find a way to SSH into those nodes, as they seem to be manageable only via the Azure CLI / web interface. Is there also a way to clean up what takes up this space? (I assume unused Docker images take up space, but I was under the impression that those would get cleaned up automatically...)
Generally, the AKS nodes just run the pods and other resources for you; data is stored in separate storage, like a remote storage server. In Azure, that means managed disks and Azure File shares. You can also store growing data on the nodes, but then you need to configure large storage for each node, and I don't think that's a good approach.
There are ways to SSH into the AKS nodes. One is to manually set a NAT rule in the load balancer for the node you want to SSH into. Another is to create a pod as a jump box.
Finally, AKS deletes unused images regularly and automatically; it's not recommended to delete unused images manually.
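If you do get a shell on a node (via a NAT rule or a jump-box pod, as described above), a quick way to see where the space went and reclaim some of it; treat this as a stop-gap sketch, since AKS prunes images itself eventually:

```shell
# See how full the OS disk is and how much Docker is using:
df -h /
sudo du -sh /var/lib/docker

# Reclaim space from stopped containers, unused networks, and unused images:
sudo docker system prune -a
```

Be aware that pruning removes images the node will have to re-pull on the next pod start.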
Things you can do to fix this:
Create the AKS cluster with a bigger OS disk (I usually use 128 GB)
Upgrade AKS to a newer version (this replaces all the existing VMs with new ones, so they won't have stale Docker images on them)
Manually clean up space on the nodes
Manually extend the OS disk on the nodes (this will only work until you scale/upgrade the cluster)
I'd probably go with option 1, else this problem would haunt you forever :(
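For option 1, the OS disk size is set at cluster creation time with the Azure CLI; a sketch, where the resource group, cluster name, and sizes are placeholders for your own values:

```shell
# Recreate the cluster with a 128 GB OS disk on each node:
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --node-count 3 \
  --node-osdisk-size 128
```

You would then redeploy your workloads onto the new cluster and tear down the old one.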
This describes how one would install the agent on a regular gce instance:
https://cloud.google.com/monitoring/agent/install-agent
Previously the cluster ran on Debian OS nodes and we had the agent running to monitor CPU, disk space, etc. Now it's upgraded to Kubernetes 1.4 and running on Container-Optimized OS (https://cloud.google.com/container-optimized-os/docs/), and the agent can't be installed manually.
I realise pods are monitored automatically, but that's only part of the picture.
I feel like I'm missing something here, as it would be a big step backward for this not to be possible.
I've run into the same thing several times. You have to switch back to the container-vm image format in order to install the Stackdriver agent.
gcloud container clusters upgrade --image-type=container_vm [CLUSTER_NAME]
That should flip it back. You can install the agent once the images flip. We're running 1.4.7 on the container-vm image and haven't seen any issues. It seems like overhead, but not an actual step backward, if that helps.
How can I specify a specific VM type for the cluster master? (I don't want to use a high-memory instance for a relatively inactive node.)
Also, is there any way to add nodes to a cluster and choose the VM type? (This could solve the first problem.)
Update November 2015:
Now that Google Container Engine is no longer in alpha, you don't need to worry about the size of your cluster master, as it is part of the managed service.
You can now easily add/remove nodes from your cluster through the Cloud Console UI, but they will all be the same machine type that you originally chose for your cluster.
If you are running OSS Kubernetes on GCE, then you can set the MASTER_SIZE environment variable in cluster/gce/config-default.sh before creating your cluster.
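A sketch of overriding the master size this way, assuming the GCE kube-up scripts (cluster/gce/config-default.sh only uses MASTER_SIZE as a default, so exporting it before kube-up works; n1-standard-1 is an example machine type):

```shell
# Use a small machine type for the master before bringing the cluster up:
export KUBERNETES_PROVIDER=gce
export MASTER_SIZE=n1-standard-1

cluster/kube-up.sh
```

The node size is controlled separately, so the master no longer has to match your (larger) worker machine type.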
If you are running on GKE, we unfortunately don't yet offer the option to customize the size of your master differently than the size of your nodes. We hope to offer more flexibility in cluster provisioning soon.
There is currently no way to resize your cluster after you create it. I'm actually working on this for OSS Kubernetes in Issue #3168.