Prometheus + Longhorn = wrong volume size - kubernetes

I am not really sure if this is a Prometheus issue, just Longhorn, or maybe a combination of the two.
Setup:
Kubernetes K3s v1.21.9+k3s1
Rancher Longhorn Storage Provider 1.2.2
Prometheus Helm Chart 32.2.1 and image: quay.io/prometheus/prometheus:v2.33.1
Problem:
Infinitely growing PV in Longhorn, even beyond the defined maximum size. It is currently using 75 GB on a 50 GB volume.
Description:
I have a really small 3-node cluster with not too many deployments running. Currently there is only one "real" application; the rest is just Kubernetes system stuff so far.
Apart from etcd, I am using all the default scraping rules.
The PV is filling up by a bit more than 1 GB per day, which seems fine to me.
The problem is that, for whatever reason, the data used inside Longhorn keeps growing indefinitely. I have configured retention rules in the Helm chart with retention: 7d and retentionSize: 25GB, so the retentionSize should never be reached anyway.
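For reference, the retention settings described above look roughly like this in the chart values (a sketch only; the exact keys depend on which chart is used, here assuming kube-prometheus-stack, where they sit under prometheus.prometheusSpec):
prometheus:
  prometheusSpec:
    retention: 7d            # time-based retention
    retentionSize: 25GB      # size-based retention; whichever limit is hit first applies
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn   # assumed StorageClass name
          resources:
            requests:
              storage: 50Gi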
When I log into the container's shell and run du -sh in /prometheus, it shows ~8.7 GB being used, which looks good to me as well.
The problem is that when I look at the Longhorn UI, the used space keeps growing. The PV has existed for ~20 days now and is currently using almost 75 GB of a defined maximum of 50 GB. When I look at the Kubernetes node itself and inspect the folder Longhorn uses to store its PV data, I see the same usage as in the Longhorn UI, while inside the Prometheus container everything looks fine to me.
I hope someone has an idea what the problem could be. I have not experienced this issue with any other deployment so far; all the others behave correctly and actually decrease in used size when something inside the container gets deleted.

Related

How to set http2-max-streams-per-connection with Kops

Is it possible to set the --http2-max-streams-per-connection value for a cluster created by Kops?
I have an interesting situation where one of my nodes falls into a NotReady status whenever I deploy a Helm chart to it, and I feel like it might be connected to this setting.
The nodes are generally fine and run without any issues, but once I deploy my Helm chart, the status of whatever node it gets deployed to changes to NotReady after a few minutes, which I find weird.
I've done a bit of reading and seen a number of similar issues pointing to the setting --http2-max-streams-per-connection, but I'm not sure how to go about setting this.
Any ideas anyone?
You should be able to set this by adding the following to the cluster spec:
spec:
  kubeAPIServer:
    http2MaxStreamsPerConnection: <value>
See https://pkg.go.dev/k8s.io/kops/pkg/apis/kops#KubeAPIServerConfig and https://kops.sigs.k8s.io/cluster_spec/
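For context, in the full cluster manifest (what kops edit cluster shows) the field sits under the top-level spec, roughly like this sketch; the cluster name and value here are placeholders:
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example.k8s.local                # placeholder cluster name
spec:
  kubeAPIServer:
    http2MaxStreamsPerConnection: 1000   # placeholder value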
That being said, I do not believe your NotReady nodes are caused by that setting. You may want to join #kops-users on the Kubernetes Slack workspace and ask for help triaging that problem.

Migrate PV and change CPU limits on Kubernetes

I have a small Kubernetes cluster with AWX running.
I would like to make some changes; the PV is currently a filesystem on one of the nodes.
Is it possible to migrate it to a different PV, like NFS?
Also, I would like to change the CPU and memory limits. But I guess I will have to redeploy it.
Should I try to migrate the PV or delete everything and recreate it?
Thanks
Assuming that you have dynamic provisioning enabled, I advise you to use pv-migrate.
This is a CLI tool/kubectl plugin to easily migrate the contents of one Kubernetes PersistentVolume to another.
Common use cases:
You have a database with a bound 30 GB PersistentVolumeClaim. It turned out 30 GB was not enough and you filled all the disk space rather quickly. And sadly your StorageClass/provisioner doesn't support volume expansion. Now you need to create a new PVC of 100 GB and somehow copy all the data to the new volume, as-is, with its permissions and so on.
You need to move a PersistentVolumeClaim from one namespace to another.
To migrate contents of PersistentVolumeClaim pvc-a in namespace name-space-a to the PersistentVolumeClaim pvc-b in namespace name-space-b, use the following command:
$ kubectl pv-migrate \
--source-namespace name-space-a \
--source pvc-a \
--dest-namespace name-space-b \
--dest pvc-b
Also take a look at: change-pv-reclaim-policy and resizing-persistent-volumes-using-kubernetes.
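If your StorageClass does support expansion, the resizing approach linked above is simpler than copying data: you only bump the requested size on the existing claim. A minimal sketch with placeholder names, assuming the class has allowVolumeExpansion: true:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: awx-data                       # placeholder claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: expandable-class   # placeholder; must have allowVolumeExpansion: true
  resources:
    requests:
      storage: 100Gi                   # raised from the original request; the volume is resized in place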

Attach new azure disk volume per pod in Kubernetes deployment

I have a Kubernetes Deployment with 3 replicas, where each replica needs 7 GB of storage. I want to be able to attach a new, empty azureDisk volume to each pod/replica created in this deployment.
Basically I have the following restrictions:
I must use a Deployment, not a StatefulSet.
Each time a pod dies and a new pod comes up, it shouldn't have any state, and it should get a new, empty azureDisk attached to it.
The pods do not share their storage; each pod has its own 7 GB volume.
The pods need to use azureDisk because I need 7 GB of storage on demand, which means dynamically creating Azure storage when I scale my deployment's replicas.
When using azureDisk, I need to use it with access mode ReadWriteOnce (as the docs say), and only 1 pod will be attached to the disk, which is fine, but that only works if I have 1 pod. If I have more than 1 pod, I can't use the same claim... is there any way to dynamically ask for more volumes like the one in the first claim?
NOTE 1: I know there are volumeClaimTemplates, but those only apply to a StatefulSet.
NOTE 2: I don't care if a pod restarts 100 times and this in turn creates 100 PVs of which only 1 is used; that is fine.
I'm not sure why you can't use a StatefulSet, but the only way I see to do this is to create your own operator for your application. The operator would have a controller that manages your pods, similar to what a ReplicaSet does, except that for every new pod that is instantiated, a new PVC is created.
It might just be better to figure out how to run your application in a StatefulSet and use volumeClaimTemplates.
✌️
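If the Deployment requirement can be relaxed, the StatefulSet + volumeClaimTemplates route mentioned above gives you exactly one dynamically provisioned Azure Disk per replica. A rough sketch, assuming AKS's built-in managed-premium StorageClass and placeholder names:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app                        # placeholder name
spec:
  replicas: 3
  serviceName: my-app
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:latest        # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: managed-premium   # AKS default premium disk class; adjust if yours differs
        resources:
          requests:
            storage: 7Gi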
The main question is: why? "If I have an application which doesn't have state, I still need a large volume for each pod."
Looking at this explanation, you should focus on stateful applications. From my point of view, it looks like you are forcing a Deployment instead of a StatefulSet for a stateful application.
In your example, you probably need a PV which supports different access modes.
The main problem you have experienced is that a PV which only supports ReadWriteOnce can be bound by only a single node at a time, so your pods on different nodes will not start because the volume mount fails. You can use the same claim for several pods only in a ReadOnlyMany/ReadWriteMany scenario.
Please refer to providers which have different capabilities for access modes, such as Filestore (GCP), AzureFile (Azure), GlusterFS, or NFS (a sketch follows below).
Deployments vs. StatefulSets
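To illustrate the AzureFile option mentioned above, a ReadWriteMany claim that several pods on different nodes can mount at the same time might look roughly like this (a sketch; "azurefile" is the class AKS ships with, adjust names and sizes as needed):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data            # placeholder name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile  # assumed built-in AKS class backed by Azure Files
  resources:
    requests:
      storage: 7Gi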

How Can I Monitor Persistent Volume Metrics in Kubernetes 1.13?

I have a Kubernetes 1.13 cluster running on Azure and I'm using multiple persistent volumes for multiple applications.
I have setup monitoring with Prometheus, Alertmanager, Grafana.
But I'm unable to get any metrics related to the PVs.
It seems that the kubelet started exposing some of these metrics in Kubernetes 1.8, but stopped again as of 1.12.
I have already spoken to the Azure team about a workaround to collect the metrics directly from the actual filesystem (Azure Disk in my case), but even that is not possible.
I have also heard of some people using sidecars in the pods to gather PV metrics, but I'm not getting any help on that either.
It would be great even if I could just get basic details like consumed/available free space.
I was having the same issue and solved it by joining two metrics:
avg(label_replace(
1 - node_filesystem_free_bytes{mountpoint=~".*pvc.*"} / node_filesystem_size_bytes,
"volumename", "$1", "mountpoint", ".*(pvc-[^/]*).*")) by (volumename)
+ on(volumename) group_left(namespace, persistentvolumeclaim)
(0 * kube_persistentvolumeclaim_info)
As an explanation: I'm adding a volumename label to every time series of node_filesystem_*, extracted from the existing mountpoint label, and then joining with the other metric, which carries the additional labels. Multiplying by 0 ensures the join is otherwise a no-op.
Also, a quick warning: I (or you) may be using relabeling configs that keep this from working immediately without adaptation.

Persistent Volumes & Claims & Replicas in Helm recommended approach

I'm trying to get my head around Persistent Volumes & Persistent Volume Claims and how it should be done in Helm...
The TLDR version of the question is: How do I create a PVC in helm that I can attach future releases (whether upgrades or brand new installs) to?
My current understanding:
PV is an interface to a piece of physical storage.
A PVC is how a pod claims the existence of a PV for its own use. When the pod is deleted, the PVC is also deleted, but the PV is maintained, and is therefore persisted. But then how do I use it again?
I know it is possible to dynamically provision PVs. With Google Cloud as an example, if you create ONLY a PVC, it will automatically create a PV for you.
Now this is the part I'm stuck on...
I've created a Helm chart that explicitly creates the PVC and thus has a dynamically created PV as part of a release. I later delete the release, which also removes the PVC. The cloud provider will maintain the PV. On a subsequent install of the same chart with a new release... how do I reuse the old PV? Is there a way to actually do that?
I did find this question which kind of answers it... However, it implies that you need to pre-create PVs for each PVC you're going to need, and the whole point of the replicas & auto-scaling is that all of those should be generated on demand.
The use case is - as always - for test/dev environments where I want my data to be persisted, but I don't always want the servers running.
Thank you in advance! My brain's hurting a bit cause I just can't figure it out... >.<
It will be a headache indeed.
Let's start with how you should do it to achieve scalable deployments with RWO storage that is attached to each individual pod when it is brought up. This is where volumeClaimTemplates come into play: you can have PVCs created dynamically as your workload scales. This, however, suits the situation where your pod needs storage attached to it but the data is not really needed any longer once the pod goes away (the volume can be reused following the reclaim policy).
If you need data like this reattached when a pod fails, you should think of StatefulSets, which solve that part at least.
Now, if you pre-create the PVC explicitly, you have more control over what happens, but dynamic scalability will have problems with this for RWO. This, together with manual PV management as in the response you linked, can actually achieve volume reuse, and it's the only mechanism I can think of that would allow it (see the sketch below).
After you hit a wall like this, it's time to think about alternatives. For example, why not use a StatefulSet that gives you storage retention in the running cluster, and instead of deleting the chart, set all its replicas to 0, retaining the non-compute resources in place but scaling the workload down to nothing. Then, when you scale up again, the still-bound PVCs should get reattached to the rescaled pods.
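To make the manual reuse path concrete: if the old PV was retained (reclaim policy Retain), the next release can pin a new claim to it by name. A minimal sketch with placeholder names:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data                    # placeholder; in practice templated by the chart
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard           # must match the class on the retained PV (placeholder)
  volumeName: pvc-1234-existing-pv     # placeholder: the name of the retained PV
  resources:
    requests:
      storage: 30Gi
Note that a PV left in the Released phase keeps its old claimRef; it usually has to be cleared or edited to reference the new claim before the PV can bind again.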