Ephemeral Storage usage in AKS - kubernetes

I have a simple 3-node cluster created with AKS. Everything had been going fine for 3 months. However, I'm starting to run into disk space usage issues that seem related to the OS disks attached to each node.
I see no errors in kubectl describe node and all disk-related checks are fine. However, when I try to run kubectl logs on some pods, I sometimes get "no space left on device".
How can one manage the storage used on those disks? I can't seem to find a way to SSH into those nodes, as they appear to be manageable only via the Azure CLI / web interface. Is there also a way to clean up whatever is taking up this space? (I assume unused Docker images would take up space, but I was under the impression that those get cleaned up automatically...)

Generally, the AKS nodes just run the pods and other resources for you; the data should live somewhere else, such as a remote storage service. In Azure, that means managed disks and Azure Files shares. You can also store growing data on the nodes themselves, but then you need to provision large storage for each node, and I don't think that's a good approach.
There are ways to SSH into the AKS nodes. One is to manually set a NAT rule in the load balancer for the node you want to SSH into. Another is to create a pod as a jump box and follow the steps here.
The last point is that AKS deletes unused images regularly and automatically. It's not recommended to delete unused images manually.
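As a rough illustration of getting onto a node without SSH, a more recent alternative to the jump-box pod is kubectl debug (available in newer kubectl versions). This is only a sketch; the node name is a placeholder you'd replace with one of your own nodes:

    # List the nodes to find the one that is filling up
    kubectl get nodes -o wide

    # Start a debug pod on that node; the node's filesystem is mounted at /host
    kubectl debug node/aks-nodepool1-12345678-0 -it --image=busybox

    # Inside the debug pod, check what is using the OS disk
    df -h /host
    du -sh /host/var/lib/docker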

Things you can do to fix this:
Create AKS with a bigger OS disk (I usually use 128 GB); see the sketch after this list
Upgrade AKS to a newer version (this replaces all the existing VMs with new ones, so they won't have stale Docker images on them)
Manually clean up space on the nodes
Manually extend the OS disk on the nodes (this will only work until you scale/upgrade the cluster)
I'd probably go with option 1, or this problem will haunt you forever :(
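For option 1, a minimal sketch using the Azure CLI (the resource group, cluster/pool names, node count, and disk size are placeholders; check az aks create --help for the flags available in your CLI version):

    # Create a new cluster whose nodes get a 128 GB OS disk
    az aks create \
      --resource-group myResourceGroup \
      --name myAKSCluster \
      --node-count 3 \
      --node-osdisk-size 128

    # Or add a node pool with bigger OS disks to an existing cluster,
    # then drain and remove the old pool
    az aks nodepool add \
      --resource-group myResourceGroup \
      --cluster-name myAKSCluster \
      --name bigdisk \
      --node-count 3 \
      --node-osdisk-size 128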

Related

Shared file system among pods

We are running a cluster of x nodes.
Every node in the cluster pulls some files from remote storage. Unfortunately, the remote server is getting overloaded, so we are exploring a solution in which only a subset of the nodes pulls the files and serves them to the remaining nodes (read-only; the other nodes do not need to write). Some of the nodes undergo maintenance often and can be taken offline.
I was experimenting with running NFS as a pod in a replica set, with a service (fixed IP) for each of the NFS pods. If one node with the NFS pod goes down, k8s will take care of bringing up an NFS pod on another node behind the same sticky IP.
But this new NFS export would still need to be remounted on the other nodes.
Any better solution for this storage problem?
Note that we would ideally not like to use remote storage since this adds extra latency.
You could try Expanding Persistent Volume Claims, but that is overhead for you to maintain; I'd recommend going with something managed locally instead. After that it's your choice.
There are two other options worth considering: a hostPath volume and a GlusterFS volume. Please refer to this SO answer for more information.
What @scenox suggested is also a good option.
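If you do stay with the in-cluster NFS approach from the question, here is a minimal sketch of how the other pods could consume it through a PersistentVolume. The server IP, export path, and size are placeholders for illustration (use the fixed ClusterIP of your NFS Service rather than a cluster DNS name, since the mount happens on the node):

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: shared-files
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadOnlyMany
      nfs:
        server: 10.0.0.50   # fixed ClusterIP of the NFS Service from the question
        path: /exports
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: shared-files
    spec:
      accessModes:
        - ReadOnlyMany
      storageClassName: ""
      resources:
        requests:
          storage: 10Gi

If the NFS pod moves to another node, the Service IP stays the same, so consuming pods only need to be restarted (or remount) rather than reconfigured.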

Where is KOPS located/running from?

I am new to Docker and Kubernetes, though I have mostly figured out how it all works at this point.
I inherited an app that uses both, as well as KOPS.
One of the last things I am having trouble with is the KOPS setup. I know for absolute certain that Kubernetes is set up via KOPS. There are two KOPS state stores on an S3 bucket (corresponding to a dev and a prod cluster respectively).
However while I can find the server that kubectl/kubernetes is running on, absolutely none of the servers I have access to seem to have a kops command.
Am I misunderstanding how KOPS works? Does it not do any sort of ongoing monitoring (would that just be handled by a ReplicaSet by itself?), but rather just set a cluster running and then it's done?
I can include my cluster.spec or config files, if they're helpful to anyone, but I can't really see how they're super relevant to this question.
I guess I'm just confused - as far as I can tell from my perspective, it looks like KOPS is run once, sets up a cluster, and is done. But then whenever one of my node or master servers goes down, it is self-healing. I would expect that of the node servers, but not the master servers.
This is all on AWS.
Sorry if this is a dumb question, I am just having trouble conceptually understanding what is going on here.
kops is a command line tool; you run it from your own machine (or a jumpbox) and it creates clusters for you. It's not a long-running server itself. It's like Terraform, if you're familiar with that, but tailored specifically to spinning up Kubernetes clusters.
kops creates nodes on AWS via autoscaling groups. It’s this construct (which is an AWS thing) that ensures your nodes come back to the desired number.
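As a rough sketch of what running kops from your own machine against the existing state store might look like (the bucket and cluster names are placeholders; use the ones from your S3 buckets):

    # Point kops at the state store your predecessor used
    export KOPS_STATE_STORE=s3://my-kops-state-bucket

    # List the clusters recorded in that state store
    kops get clusters

    # Inspect or change a cluster's definition, then apply the change
    kops edit cluster dev.example.com
    kops update cluster dev.example.com --yes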
kops is used for managing Kubernetes clusters themselves: creating them, scaling, updating, deleting. kubectl is used for managing the container workloads that run on Kubernetes; you can create, scale, update, and delete your replica sets with that. How you run workloads on Kubernetes should have nothing to do with how (or with what tool) you or some cluster admin manage the Kubernetes cluster itself. That is, unless you're trying to change the "system components" of Kubernetes, like the Kubernetes API or kube-dns, which are cluster-admin-level concerns but happen to run on top of Kubernetes as container workloads.
As for how pods get spun up when nodes go down, that’s what Kubernetes as a container orchestrator strives to do. You declare the desired state you want, and the Kubernetes system makes it so. If things crash or fail or disappear, Kubernetes aims to reconcile this difference between actual state and desired state, and schedules desired container workloads to run on available nodes to bring the actual state of the world back in line with your desired state. At a lower level, AWS does similar things — it creates VMs and keeps them running. If Amazon needs to take down a host for maintenance it will figure out how to run your VM (and attach volumes, etc.) elsewhere automatically.

Kubernetes Citus setup with individual hostname/ip

I am in the process of learning Kubernetes with a view to setting up a simple cluster with Citus DB and I'm having a little trouble with getting things going, so would be grateful for any help.
I have a Docker image containing my base Debian image configured for Citus for the project. At this point I want to set it up with one master that mounts a GCP master disk holding a Postgres DB, which I'll then distribute among the other containers, each mounted with an individual separate disk holding empty tables (configured with the Citus extension) to receive what gets distributed to it. I'd like to automate this further at some point, but for now I'm aiming for just a master container and eight worker nodes. My plan is to create a deployment that opens ports 5432 and 80 on each node, and I thought I could create two pods, one to hold the master and one to hold the eight workers. Ideally I'd want to mount all the disks and then run a post-mount script on the master that will find all the worker containers (by IP or hostname??), add them as Citus nodes, then run create_distributed_table to distribute the data.
My confusion at present is about how to label all the individual nodes so they keep their internal address or hostname, so that if one goes down it will be replaced and resume with the data on the PD. I've read about ConfigMaps and setting hostname aliases, but I'm still unclear about how to proceed. Is this possible, or is this the wrong way to approach this kind of setup?
You are looking for a StatefulSet. That lets you have a known number of pod replicas, with attached storage (PersistentVolumes) and consistent DNS names. In the pod spec I would launch only a single copy of the server and use the StatefulSet's replica count to control the number of "nodes" (also a Kubernetes term); if the replica is #0 then it's the master.
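A minimal sketch of what that could look like (the image name, mount path, volume size, and labels are placeholders, not a tested Citus configuration):

    apiVersion: v1
    kind: Service
    metadata:
      name: citus
    spec:
      clusterIP: None          # headless service gives each pod a stable DNS name
      selector:
        app: citus
      ports:
        - port: 5432
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: citus
    spec:
      serviceName: citus
      replicas: 9              # citus-0 as master, citus-1..8 as workers
      selector:
        matchLabels:
          app: citus
      template:
        metadata:
          labels:
            app: citus
        spec:
          containers:
            - name: citus
              image: my-citus-image:latest
              ports:
                - containerPort: 5432
              volumeMounts:
                - name: data
                  mountPath: /var/lib/postgresql/data
      volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 50Gi

Each pod then gets a stable name like citus-0.citus.default.svc.cluster.local and keeps the same PersistentVolume across rescheduling, which covers the "resume with the data on the PD" requirement; the master's post-mount script can address workers by those predictable names instead of IPs.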

Couchbase on Google Container Engine resets itself

I have deployed a 4-node Couchbase cluster using Docker images on Google Container Engine with Kubernetes. I was able to access the Couchbase Console, look at the buckets, query, etc. Now, after a couple of days, I go to the Console URL and the Couchbase initial setup screen comes up, as though this were a fresh install. I can see that the nodes and pods are all still up and running.
I had a similar problem on my Windows box with a Docker cluster (no Kubernetes); I redeployed there as well.
Anyone else experienced this?
When you destroy and recreate container instances, all the underlying state is lost.
If you want to preserve the state of your Couchbase installation, you'll need to use a Docker data volume. Just create one and mount it at your Couchbase data file directory.
On GCP, you'll additionally want to map a directory on the data volume to a persistent disk.
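On GKE, a minimal sketch of that idea using a PersistentVolumeClaim mounted at Couchbase's data directory (the claim size and the /opt/couchbase/var path are assumptions to verify against the Couchbase image you use; the default StorageClass backs the claim with a GCE persistent disk):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: couchbase-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: couchbase
    spec:
      containers:
        - name: couchbase
          image: couchbase:community
          volumeMounts:
            - name: data
              mountPath: /opt/couchbase/var   # Couchbase keeps its data and config here
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: couchbase-data

With this in place, recreating the pod reattaches the same disk, so the cluster configuration and bucket data survive instead of coming back as a fresh install.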

How to increase disk size of kubernetes nodes programmatically

We are running out of disk space for containers running on our nodes. We are running k8s 1.0.1 in AWS. We are also trying to do all our configuration in software instead of configuring things manually.
How do we increase the disk size of the nodes? Right now they have 8gb each as created by https://get.k8s.io | bash. It's fine if we have to create a new cluster and move our services/pods to it.
You should be able to do so by setting the MINION_ROOT_DISK_SIZE environment variable before creating the cluster. However, this option was only merged yesterday, so it may not be available yet unless you use the cluster/kube-up.sh script from the HEAD of the repository.
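A minimal sketch of what that could look like, assuming the variable name above and a script version that already includes the change (the 40 GB size is just an example value):

    # Tell the kube-up scripts which provider to use and how big the node root disks should be
    export KUBERNETES_PROVIDER=aws
    export MINION_ROOT_DISK_SIZE=40

    # Bring up a fresh cluster with the larger disks
    # (or run cluster/kube-up.sh from a checkout at HEAD if the hosted script lags behind)
    curl -sS https://get.k8s.io | bash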