I am running an AKS cluster and I have deployed Prometheus and Alertmanager via Deployment resources in Kubernetes, so they are also controlled by a ReplicaSet. The issue is that sometimes the restart of Alertmanager gets stuck. It is related to the accessMode of the PVC. During a restart, Kubernetes may start the new pod on a different node from the node where the currently running pod is assigned, depending on resource utilization on the nodes. In simple words, the same PVC is then accessed from two different pods assigned to different nodes. This is not allowed because in the config of the PVC I am using accessMode ReadWriteOnce. Looking at this comment on GitHub for the Prometheus Operator, it seems to be by design that the option accessMode ReadWriteMany is not allowed.
So my questions are: why such a design, and what could happen if I change the accessMode to ReadWriteMany? Any practical experience?
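For reference, a minimal sketch of the kind of PVC described above; the name, storage class and size here are assumptions, not taken from the actual setup:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alertmanager-data        # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce                # volume can be mounted read-write by a single node only
  storageClassName: default      # assumed AKS Azure Disk storage class
  resources:
    requests:
      storage: 2Gi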
Related
I want to apply VPA (Vertical Pod Autoscaling) to database pods. Can we use VPA for vertical database autoscaling, given that VPA requires at least 2 replicas (ref: https://github.com/kubernetes/autoscaler/issues/1665#issuecomment-464679271) and that it deletes pods when the set criteria are reached? So pods are deleted, and hence also the data.
What is good practice for using VPA with database pods?
As I understand it, the real question is how to run a stateful workload with multiple replicas.
Use a StatefulSet to configure n replicas for a database. StatefulSet pods have stable names which are preserved across pod restarts (and reincarnations). Combined with PersistentVolumeClaim templates (accepted as part of the StatefulSet spec) and headless Services, it is capable of retaining the same volumes and network FQDNs across reincarnations.
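As an illustration, a minimal sketch of a StatefulSet with a headless Service and a volumeClaimTemplate; the names, image and sizes are placeholders, and a real database would also need credentials and replication configuration:
apiVersion: v1
kind: Service
metadata:
  name: mydb
spec:
  clusterIP: None              # headless service: gives pods stable DNS names like mydb-0.mydb
  selector:
    app: mydb
  ports:
  - port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mydb
spec:
  serviceName: mydb            # must match the headless service
  replicas: 2
  selector:
    matchLabels:
      app: mydb
  template:
    metadata:
      labels:
        app: mydb
    spec:
      containers:
      - name: mysql
        image: mysql:8.0       # placeholder image; a real setup needs env/secrets for credentials
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:        # one PVC per replica: data-mydb-0, data-mydb-1, ...
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi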
Take a look at Helm charts for various databases, e.g. MySQL chart, for useful insights.
On a side note, it might be worthwhile to consider using an operator for the database application you're using. Operators for most applications can be found on https://operatorhub.io.
VPA (Vertical Pod Autoscaler) can work in 2 ways:
Recommendation mode - it will recommend the requests and limits for pods based on the resources used
Auto mode - it will automatically analyze the usage and set the requests and limits on pods. This results in pod termination so the pod can be recreated with the new specification, as stated here:
Due to Kubernetes limitations, the only way to modify the resource requests of a running Pod is to recreate the Pod. If you create a VerticalPodAutoscaler with an updateMode of "Auto", the VerticalPodAutoscaler evicts a Pod if it needs to change the Pod's resource requests.
Cloud.google.com: Kubernetes Engine: Docs: Concepts: Vertical pod autoscaler
Please refer to above link for more information regarding the concepts of VPA.
The fact that it needs at least 2 replicas is most probably connected with high availability. As the pods are evicted to apply the new limits, they are unable to process requests. If it came to a situation where there is only 1 replica at a time, that replica wouldn't be able to respond to requests while in the terminating/recreating state.
There is an official guide to run VPA on GKE:
Cloud.google.com: Kubernetes Engine: How to: Vertical pod autoscaling
VPA supports Deployments as well as StatefulSets.
StatefulSet
Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of their Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
If you want to use storage volumes to provide persistence for your workload, you can use a StatefulSet as part of the solution.
Kubernetes.io: StatefulSet
Configuring StatefulSet with PersistentVolumes will ensure that the data stored on PV will not be deleted in case of pod termination.
To be able to use your database with replicas > 1 you will need to have replication implemented within your database environment.
There are guides/resources/solutions on running databases within Kubernetes environment. Please choose the solution most appropriate to your use case. Some of them are:
Kubernetes.io: Run replicated stateful application
Github.com: Zalando: Postgres operator
Github.com: Oracle: Mysql operator
After deploying your database you will be able to run below command to extract the name of the StatefulSet:
$ kubectl get sts
You can then apply the name of the StatefulSet to the VPA like below:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: db-vpa        # names must be lowercase RFC 1123 labels
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: StatefulSet
    name: <INSERT_DB_STS_HERE>
  updatePolicy:
    updateMode: "Auto"
I encourage you also to read this article:
Cloud.google.com: Blog: To run or not to run a database on Kubernetes, what to consider
I have a Kubernetes v1.17.0 cluster with multiple nodes. I've created a PVC with access mode set to RWO. From the Kubernetes docs:
ReadWriteOnce -- the volume can be mounted as read-write by a single node
I'm using a Cinder volume plugin which doesn't support ReadWriteMany.
When I create two different deployments that mount the same PVC, Kubernetes sometimes schedules them on two different nodes, which causes the pods to fail.
Is this desired behaviour or is there a problem in my configuration?
As I gathered from your answers to the comments, you do not want to use affinity rules but want the scheduler to perform this work for you.
It seems that this issue has been known since at least 2016 but has not yet been resolved, as the scheduling is considered to be working as expected: https://github.com/kubernetes/kubernetes/issues/26567
You can read the details in the issue, but the core problem seems to be that in the definition of Kubernetes, a ReadWriteOnce volume can never be accessed by two Pods at the same time. By definition. What would need to be implemented is a flag saying "it is OK for this RWO volume to be accessed by two Pods at the same time, even though it is RWO". But this functionality has not been implemented yet.
In practice, you can typically work around this issue by using a Recreate Deployment Strategy: .spec.strategy.type: Recreate. Alternatively, use the affinity rules as described by the other answers.
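As a sketch (names and image are placeholders), the relevant part of such a Deployment looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  strategy:
    type: Recreate             # old pod is terminated, releasing the RWO volume, before the new one starts
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest    # placeholder
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: my-rwo-claim   # the RWO PVC from the question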
Provisioning of the PV/PVC and deployment of new pods on the same node can only be achieved via node affinity. However, if you want Kubernetes to decide this for you, you will have to use inter-pod affinity.
However, just to verify that you are doing everything the right way, please refer to this.
Persistent volumes in Kubernetes can be tied to a node or an availability zone because of the underlying hardware: a storage drive within a server or a SAN within a single datacenter cannot be moved around by the storage provisioner.
Now how does the storage provisioner know on which node or in which availability zone it needs to create the persistent volume? That's why persistent volume claims have a volume binding mode, which is set to WaitForFirstConsumer in that case. This means the provisioning happens after the first pod that mounts the persistent volume has been scheduled. For more details, read here.
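For illustration, a StorageClass sketch with that binding mode; the provisioner is assumed to be the in-tree Cinder one from the question, and the parameters depend on your environment:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cinder-wait
provisioner: kubernetes.io/cinder        # assumed in-tree Cinder provisioner
volumeBindingMode: WaitForFirstConsumer  # delay PV creation until a pod using the PVC is scheduled
reclaimPolicy: Delete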
When a second pod is scheduled, it might run on another node or another availability zone unless you tell the scheduler to run the pod on the same node or in the same availability zone as the first pod by using inter-pod affinity:
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      # adjust the labels so that they identify your pod
      matchExpressions:
      - key: app.kubernetes.io/name
        operator: In
        values:
        - myapp
    # make the pod run on the same node
    topologyKey: kubernetes.io/hostname
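This snippet belongs under spec.affinity in the pod template of the second Deployment; the labels in matchExpressions must match the labels carried by the pod that already mounts the volume.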
I am new to Kubernetes and I want to have a few clarifications on the questions below; please let me know your thoughts.
Are PersistentVolumeClaims confined to a single namespace?
A PersistentVolumeClaim (kubectl get pvc) is confined to a namespace. A PersistentVolume (kubectl get pv) is defined at the cluster level. Each namespace can access the PVs which are not "Bound".
You have to install a CNI (Container Network Interface) plugin like Calico or Flannel. There you will specify a pod network CIDR, e.g. 10.20.0.0/16. Then the IP address management of, e.g., Calico will split that network into smaller networks. Each Kubernetes node gets its own network from the 10.20.0.0/16 range.
If you mean the Kubernetes "infrastructure", it is mostly deployed to kube-system. To deploy your own stuff like monitoring, logging, or storage, you can create your own namespaces.
No, not all objects are bound to a namespace. With kubectl api-resources you will get an overview.
There are a lot of storage types (https://kubernetes.io/docs/concepts/storage/volumes/#types-of-volumes). But if you do not specify any volumes (PVs) which are persistent, the files written in a container are gone if the container restarts.
A Pod is the smallest unit which can be addressed. A Pod can contain multiple containers.
A Deployment describes the desired state of the Pods. It is recommended to use a Deployment. You can start a Pod without a Deployment, but if you delete the Pod it will not be restarted by the kubelet. (The following command creates a Pod without a Deployment: kubectl run nginx --image=nginx --port=80 --restart=Never.) For storage, you would specify the PVC in the Deployment, but you have to create that PVC beforehand. (https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/)
Exactly. For example, for MySQL you would use Recreate; for httpd you would use RollingUpdate.
What do you mean by a local proxy? For local development you can have a look at minikube.
No, a Pod has only 1 IP.
Are PersistentVolumeClaims confined to a single namespace?
PersistentVolumeClaims (PVCs) are bound to a namespace. A PVC must exist in the same namespace as the Pod using the claim.
How many pod networks can we have per cluster?
There is a default maximum of 110 Pods per node, and Kubernetes assigns a /24 CIDR block (256 addresses) to each of the nodes.
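Assuming your CNI uses the per-node podCIDR assigned by Kubernetes, you can inspect those blocks with something like:
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,POD_CIDR:.spec.podCIDR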
Which namespace contains the infrastructure pods?
Generally kube-system
Are all objects restricted to a single namespace?
No, not all objects are restricted to a single namespace. You can create objects in different namespaces.
Does a container offer persistent storage that outlives the container?
If you use a PV/PVC, then your storage is persistent and outlives the container.
What is the smallest object or unit (pod, container, ReplicaSet, or Deployment) we can work with in k8s?
A Kubernetes pod is a group of containers, and is the smallest unit that Kubernetes administers.
Does a Deployment use a PersistentVolume or a PersistentVolumeClaim?
You need to use a PVC in the Deployment, in the volumes section, like the following:
volumes:
- name: data
  persistentVolumeClaim:
    claimName: <pvc name>
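For completeness, the container in the same pod template also needs a matching volumeMounts entry (the mount path is just an example):
containers:
- name: app
  image: <your image>
  volumeMounts:
  - name: data                 # must match the volume name above
    mountPath: /var/lib/data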
With a Deployment config spec, which strategy (Recreate or RollingUpdate) allows us to control the updates to pods?
Recreate will terminate all the running instances and then recreate them with the newer version. RollingUpdate follows a defined strategy of how many instances can be down and recreated at a time.
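As a sketch, the strategy is set in the Deployment spec; the maxSurge/maxUnavailable values below are just example choices:
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1              # at most 1 extra pod above the desired count during the update
      maxUnavailable: 0        # never take a pod down before its replacement is ready
# or, for workloads that cannot run old and new versions side by side:
#   strategy:
#     type: Recreate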
How can we start a local proxy, which is useful for development and testing?
You can use port-forwarding
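For example (the pod name and ports are placeholders):
$ kubectl port-forward pod/my-pod 8080:80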
Can a Pod have multiple IP addresses?
A single Pod has a single IP address. Details here.
We are having problems with several deployments in our cluster that do not seem to be working. But I am a bit apprehensive about touching these, since they are part of the kube-system namespace. I am also unsure as to what the correct approach to getting them into an OK state is.
I currently have two daemonsets that have warnings with the message
DaemonSet has no nodes selected
See images below. Does anyone have any idea what the correct approach is?
A DaemonSet creates a pod on each node of your Kubernetes cluster.
If the Kubernetes scheduler cannot schedule any pod, there are several possibilities:
The pod spec may have a memory request that is too high for the node's memory capacity; look at the value of spec.containers[].resources.requests.memory
The nodes may have a taint, so the DaemonSet declaration must have a matching toleration (kubernetes documentation about taint and toleration); see the sketch after this list
The pod spec may have a nodeSelector field (kubernetes documentation about node selector)
The pod spec may have an enforced node affinity or anti-affinity (kubernetes documentation about node affinity)
If Pod Security Policies are enabled on the cluster, a security policy may be blocking access to a resource that the pod needs to run
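Regarding the taint/toleration item above, a sketch of what a toleration in the DaemonSet pod template can look like; the key and effect must match the actual taint on your nodes:
spec:
  template:
    spec:
      tolerations:
      - key: "node-role.kubernetes.io/master"   # example key, matching a control-plane taint
        operator: "Exists"
        effect: "NoSchedule"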
These are not the only possible causes. More generally, a good start is to look at the events associated with the DaemonSet:
> kubectl describe daemonsets NAME_OF_YOUR_DAEMON_SET
I create a Deployment with a volumeMount that references a PersistentVolumeClaim, along with a memory request, on a cluster with nodes in 3 different AZs: us-west-2a, us-west-2b, and us-west-2c.
The Deployment takes a while to start while the PersistentVolume is being dynamically created but they both eventually start up.
The problem I am running into is that the PersistentVolume is made in us-west-2c and the only node the pod can run on is already over allocated.
Is there a way for me to create the Deployment and claim such that the claim is not made in an availability zone where no pod can start up?
I believe you're looking for Topology Awareness feature.
Topology Awareness
In multi-zone clusters, Pods can be spread across zones in a region. Single-zone storage backends should be provisioned in the zones where Pods are scheduled. This can be accomplished by setting the Volume Binding Mode.
Kubernetes released the topology-aware dynamic provisioning feature with version 1.12, and I believe this will solve your issue.
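As a sketch, assuming the in-tree AWS EBS provisioner, a topology-aware StorageClass could look like the following; the allowedTopologies section is optional and the zone values are only examples:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-topology-aware
provisioner: kubernetes.io/aws-ebs         # assumed in-tree EBS provisioner
volumeBindingMode: WaitForFirstConsumer    # PV is created in the zone where the pod is scheduled
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone   # older zone label used around k8s 1.12-1.17
    values:
    - us-west-2a
    - us-west-2b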