Airflow on Kubernetes - NFS volume won't mount onto worker

I'm fairly new to Kubernetes so apologies for any mixups in terminology.
I'm using the official Airflow helm chart to create a development environment, and have my DAGs (and other) folders in an NFS volume on my local machine. I have configured the values.yaml like so (same for both the scheduler and worker):
# Mount additional volumes into scheduler.
extraVolumes:
  - name: dags
    nfs:
      server: '10.106.0.113'
      path: '/home/dev/projects/airflow-jobs/dags'
  - name: plugins
    nfs:
      server: '10.106.0.113'
      path: '/home/dev/projects/airflow-jobs/plugins'
  - name: scripts
    nfs:
      server: '10.106.0.113'
      path: '/home/dev/projects/airflow-jobs/scripts'
extraVolumeMounts:
  - mountPath: '/opt/airflow/dags'
    name: 'dags'
  - mountPath: '/opt/airflow/plugins'
    name: 'plugins'
  - mountPath: '/opt/airflow/scripts'
    name: 'scripts'
When I then spin this up, only one of the scheduler and worker pods mounts the volumes successfully; the other fails with the following message:
> kubectl describe pod airflow-worker-0
Warning FailedMount 2s kubelet Unable to attach or mount volumes: unmounted volumes=[dags plugins scripts], unattached volumes=[dags plugins scripts logs config kube-api-access-dnsjx]: timed out waiting for the condition
Why am I receiving this error - is it not possible to have two pods using the same NFS store? I had this working before using the same values.yaml file so I don't quite know what has changed!

Figured it out - it was due to the NFS mount being configured as ReadWriteOnce. As per the documentation here, this does allow multiple pods to access the volume, but only if they are located on the same node. So what was happening is that my Scheduler pod would spin up first and mount the volume, and when the Worker pod followed it would be unable to do so because the Scheduler had reserved the volume. By coincidence, the first time I deployed, these two pods must have been assigned to the same node.
The simplest solution here would be to mount this as ReadWriteMany, but as I have limited permissions to my cluster and development environment, I simply made some changes to my deployment to ensure that the pods that needed access to this volume were on the same node. Plus, learning experience!
First, get the node that each pod is assigned to using kubectl get pods -o wide.
Get all the nodes in the cluster with kubectl get nodes --show-labels.
Pick a node to assign the two pods that need to share the NFS mount to. This was arbitrary, so let's call it "node123".
Update the labels of the node with kubectl label nodes node123 airflow=nfs.
Finally, in the values.yaml file, specify the nodeSelector property for the Scheduler and Worker pods!
# Select certain nodes for airflow worker pods.
nodeSelector:
  airflow: nfs
Then re-deploy the chart, and everything works as intended!
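For comparison, the ReadWriteMany alternative mentioned above might look roughly like the sketch below. It assumes you are allowed to create PersistentVolume objects in the cluster. The resource name airflow-dags-nfs and the 1Gi size are hypothetical, while the server and path are taken from the question. The chart's extraVolumes entry would then reference the claim through persistentVolumeClaim.claimName instead of an inline nfs block.

```yaml
# Sketch only: the names and the capacity are assumptions, not from the original setup.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-dags-nfs
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany        # lets pods on different nodes mount the volume
  storageClassName: ""     # static binding, no dynamic provisioner involved
  nfs:
    server: 10.106.0.113
    path: /home/dev/projects/airflow-jobs/dags
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-dags-nfs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: airflow-dags-nfs   # bind to the PersistentVolume defined above
  resources:
    requests:
      storage: 1Gi
```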

Related

Where can I locate the actual files of a Kubernetes PV hostPath

I just created the following PersistentVolume.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sql-pv
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/var/lib/sqldata"
Then I SSHed into the node and went to /var/lib, but I cannot see the sqldata directory anywhere in it.
Where is the real directory created?
I created a Pod that mounts this volume to a path inside the container. When I SSH into the container, I can see the file in the mount path. Where are these files stored?
You have set up your cluster on Google Kubernetes Engine, which means the nodes are virtual machine instances on GCP. You've probably been connecting to the cluster using the Kubernetes Engine dashboard and the Connect to the cluster option. It does not SSH you into any of the nodes; it just starts a GCP Cloud Shell terminal instance with a command like:
gcloud container clusters get-credentials {your-cluster} --zone {your-zone} --project {your-project-name}
That command is configuring kubectl agent on GCP Cloud Shell by setting proper cluster name, certificates etc. in ~/.kube/config file so you have access to the cluster (by communicating with the cluster endpoint), but you are not SSHed to any node. That's why you can't access the path defined in the hostPath.
To find a hostPath directory, you need to:
find which node the pod is running on
SSH into this node
Finding a node:
Run kubectl get pod {pod-name} with the -o wide flag, replacing {pod-name} with your pod name:
user@cloudshell:~ (project)$ kubectl get pod task-pv-pod -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
task-pv-pod   1/1     Running   0          53m   xx.xx.x.xxx   gke-test-v-1-21-default-pool-82dbc10b-8mvx   <none>           <none>
SSH to the node:
Run gcloud compute ssh {node-name}, replacing {node-name} with the node name from the previous command:
user@cloudshell:~ (project)$ gcloud compute ssh gke-test-v-1-21-default-pool-82dbc10b-8mvx
Welcome to Kubernetes v1.21.3-gke.2001!
You can find documentation for Kubernetes at:
http://docs.kubernetes.io/
The source for this release can be found at:
/home/kubernetes/kubernetes-src.tar.gz
Or you can download it at:
https://storage.googleapis.com/kubernetes-release-gke/release/v1.21.3-gke.2001/kubernetes-src.tar.gz
It is based on the Kubernetes source at:
https://github.com/kubernetes/kubernetes/tree/v1.21.3-gke.2001
For Kubernetes copyright and licensing information, see:
/home/kubernetes/LICENSES
user@gke-test-v-1-21-default-pool-82dbc10b-8mvx ~ $
The hostPath directory (in your case /var/lib/sqldata) is on this node, along with any files the pod has created.
Avoid hostPath if possible
Using hostPath is not recommended. As mentioned in the comments, it causes issues when a pod is scheduled onto a different node (not a concern while you have a single-node cluster), and it also presents many security risks:
Warning:
HostPath volumes present many security risks, and it is a best practice to avoid the use of HostPaths when possible. When a HostPath volume must be used, it should be scoped to only the required file or directory, and mounted as ReadOnly.
If restricting HostPath access to specific directories through AdmissionPolicy, volumeMounts MUST be required to use readOnly mounts for the policy to be effective.
In your case it's much better to use the gcePersistentDisk volume type - check this article.
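A minimal sketch of what that could look like, assuming a Compute Engine persistent disk has already been created in the same zone as the node. The disk name sql-data-disk, the container name, the image, and the mount path are hypothetical placeholders. The in-tree gcePersistentDisk type has since been deprecated in favour of the Compute Engine persistent disk CSI driver, so treat this as an illustration of the volume type named above rather than a recommendation for every cluster version.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
spec:
  containers:
    - name: app                      # hypothetical container name
      image: nginx                   # placeholder image
      volumeMounts:
        - name: sql-data
          mountPath: /var/lib/sqldata
  volumes:
    - name: sql-data
      gcePersistentDisk:
        pdName: sql-data-disk        # hypothetical; the disk must already exist
        fsType: ext4
```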

Where to find the Kubernetes Scheduler Configuration file in local system

I am currently working in Minikube cluster and looking to change some flags of kubernetes scheduler configuration, but I can't find it. The file looks something like-
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
algorithmSource:
  provider: DefaultProvider
...
disablePreemption: true
What is its name and where can I find it?
Posting this answer as a community wiki to set a baseline and to provide additional resources/references rather than giving a definitive solution.
Feel free to edit and expand.
I haven't found the file that you are referencing (KubeSchedulerConfiguration) in minikube.
The minikube provisioning process neither creates it nor references it in the configuration files (/etc/kubernetes/manifests/kube-scheduler.yaml and the --config=PATH parameter).
I'd reckon you could take a look at other Kubernetes solutions where you can configure how your cluster is created (how kube-scheduler is configured). Some of the options are:
Kubernetes.io: Docs: Setup: Production environment: Tools: Kubeadm: Create cluster and also:
Kubernetes.io: Docs: Setup: Production environment: Tools: Kubeadm: Control plane flags
Github.com: Kubernetes sigs: Kubespray
A side note!
Both kubespray and minikube use kubeadm as a bootstrapper!
I would also consider creating an additional scheduler that would be responsible for scheduling your workload (by referencing it in the YAML manifests):
Kubernetes.io: Docs: Tasks: Extend Kubernetes: Configure multiple schedulers
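As a rough sketch, a workload opts into a second scheduler through the schedulerName field of its Pod spec. The scheduler name my-scheduler and the pod, container, and image names are assumptions; the value must match whatever name the additional scheduler is deployed under.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: uses-second-scheduler     # hypothetical pod name
spec:
  schedulerName: my-scheduler     # hypothetical; must match the extra scheduler's name
  containers:
    - name: app
      image: nginx                # placeholder image
```

Pods that omit schedulerName keep using the default scheduler.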
I haven't tested it extensively and in the long term but I've managed to include the YAML manifest that you are referencing for the kube-scheduler.
Disclaimers!
Please consider below example as a workaround!
The method described below is not persistent.
Steps:
Start your minikube instance with the --extra-config
Connect to your minikube instance and edit/add files:
/etc/kubernetes/manifests/kube-scheduler.yaml
newly created KubeSchedulerConfiguration
Delete the failing kube-scheduler Pod and wait for it to be recreated.
Start your minikube instance with the --extra-config
As previously said you can add some additional parameters for your $ minikube start to be passed down to the provisioning process.
In this setup you can either pass it with $ minikube start ... or do it manually later on.
$ minikube start --extra-config=scheduler.config="/etc/kubernetes/sched.yaml"
Above parameter will add the - --config=/etc/kubernetes/sched.yaml to the command of your kube-scheduler. It will look for the file in the mentioned location.
Connect to your minikube instance ($ minikube ssh) and edit/add files:
Your kube-scheduler will fail because you've passed a config argument that points to a file that does not exist yet. To work around this you will need to:
add: /etc/kubernetes/sched.yaml with your desired configuration
modify: /etc/kubernetes/manifests/kube-scheduler.yaml:
add to volumeMounts:
    - mountPath: /etc/kubernetes/sched.yaml
      name: scheduler
      readOnly: true
add to volumes:
  - hostPath:
      path: /etc/kubernetes/sched.yaml
      type: FileOrCreate
    name: scheduler
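For illustration, the /etc/kubernetes/sched.yaml file added in the steps above might look like the sketch below. It rests on several assumptions: the cluster is recent enough to serve the kubescheduler.config.k8s.io/v1 API (older releases use alpha or beta versions with different fields, such as the disablePreemption flag in the question), the scheduler's kubeconfig lives at /etc/kubernetes/scheduler.conf as in kubeadm-based setups, and preemption is switched off by disabling the DefaultPreemption plugin. Check the KubeSchedulerConfiguration reference for your Kubernetes version before relying on it.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1    # assumption: depends on the cluster version
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf  # assumption: kubeadm default location
profiles:
  - schedulerName: default-scheduler
    plugins:
      postFilter:
        disabled:
          - name: DefaultPreemption           # turns off preemption for this profile
```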
Delete the failing kube-scheduler Pod and wait for it to be recreated.
You will need to redeploy the modified scheduler for the new config to take effect:
$ kubectl delete pod -n kube-system kube-scheduler-minikube
After some time you should see your kube-scheduler in Ready state.
Additional resources:
Kubernetes.io: Docs: Concepts: Scheduling eviction: Kube-scheduler
Kubernetes.io: Docs: Reference: Command line tools reference: Kube-scheduler

running pods and containers in Kubernetes

I am fairly new to Kubernetes and what I am able to understand so far,
cluster is collection of node(s)
each node can have a set of running container(s)
set of tightly coupled container(s) itself can be grouped together to form a pod (regardless of the node on which the container is running).
First of all, am I correct so far?
Secondly, the docs about kube-scheduler say,
Control Plane component that watches for newly created pods with no assigned node, and selects a node for them to run on.
and the docs also say pods are,
The smallest and simplest Kubernetes object. A Pod represents a set of running containers on your cluster.
My question, or rather confusion, is: since we already have containers running on different nodes, why do we need an additional node to run a pod on?
cluster is collection of node(s)
each node can have a set of running container(s)
You are correct.
set of tightly coupled container(s) itself can be grouped together to form a pod (regardless of the node on which the container is running).
All containers belonging to a pod run on the same node.
My question, or rather confusion, is: since we already have containers running on different nodes, why do we need an additional node to run a pod on?
It's not the pod that actually runs. The only things that actually run on your nodes are containers. A pod is just a logical grouping of containers and is the basic unit Kubernetes uses to create containers. (The Docker logo is a whale, and a group of whales is called a pod, if you want a parallel to remember this.) So if the containers that belong to the pod are running, the pod is said to be running.
In the following pod specification, the nginx-container and debian-container containers belong to the pod named two-containers. When you create this pod object, kube-scheduler will select a node to run this pod (i.e., to run the two containers) and assign that node to the pod. The kubelet running on that node then gets notified and starts the two containers on the node. Since the two containers belong to the same pod, they run in the same network namespace.
apiVersion: v1
kind: Pod
metadata:
  name: two-containers
spec:
  restartPolicy: Never
  volumes:
    - name: shared-data
      emptyDir: {}
  containers:
    - name: nginx-container
      image: nginx
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
    - name: debian-container
      image: debian
      volumeMounts:
        - name: shared-data
          mountPath: /pod-data
      command: ["/bin/sh"]
      args: ["-c", "echo Hello from the debian container > /pod-data/index.html"]
Number 1 & 3 are correct.
For number 2 I would say 'Each node can have a set of pods, and each pod can have one or more containers'.
As for your last question, let's say you create a deployment with 3 pods. Two of them are deployed to node A and consume all of its resources (no memory or CPU left), so the 3rd pod will stay in the Pending state as long as there is no new node to run it on.
There are the concepts of horizontal pod auto-scaling and cluster auto-scaling:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ & https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
These will further clear your confusion
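To make the Pending scenario above concrete, here is a hedged sketch of such a deployment. The name, image, and request sizes are hypothetical. The point is only that the scheduler places a pod on a node whose unreserved CPU and memory cover the pod's requests, and leaves the pod Pending when no node qualifies.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo                 # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: app
          image: nginx       # placeholder image
          resources:
            requests:        # hypothetical sizes; the scheduler reserves these per pod
              cpu: "1"
              memory: 2Gi
```

If node A only has room for two such pods and no other node has spare capacity, kubectl get pods shows the third replica as Pending until capacity is added, for example by the cluster autoscaler.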

Kubernetes local persistent volume cannot mount an existing path [duplicate]

I try, I try, but Rancher 2.1 fails to deploy the "mongo-replicaset" Catalog App, with Local Persistent Volumes configured.
How to correctly deploy a mongo-replicaset with Local Storage Volume? Any debugging techniques appreciated since I am new to rancher 2.
I follow the 4 ABCD steps below, but the first pod deployment never finishes. What's wrong with it? Logs and result screens are at the end. Detailed configuration can be found here.
Note: Deployment without Local Persistent Volumes succeeds.
Note: Deployment with Local Persistent Volume and with the "mongo" image succeeds (without replicaset version).
Note: Deployment with both mongo-replicaset and with Local Persistent Volume fails.
Step A - Cluster
Create a rancher instance, and:
Add three nodes: a worker, a worker etcd, a worker control plane
Add a label on each node: name one, name two and name three for node Affinity
Step B - Storage class
Create a storage class with these parameters:
volumeBindingMode : WaitForFirstConsumer (as seen here)
name : local-storage
Step C - Persistent Volumes
Add 3 persistent volumes like this:
type : local node path
Access Mode: Single Node RW, 12Gi
storage class: local-storage
Node Affinity: name one (two for second volume, three for third volume)
Step D - Mongo-replicaset Deployment
From catalog, select Mongo-replicaset and configure it like that:
replicaSetName: rs0
persistentVolume.enabled: true
persistentVolume.size: 12Gi
persistentVolume.storageClass: local-storage
Result
After doing the ABCD steps, the newly created mongo-replicaset app stays in the "Initializing" state indefinitely.
The associated mongo workload contains only one pod, instead of three. And this pod has two 'crashed' containers, bootstrap and mongo-replicaset.
Logs
This is the output from the 4 containers of the only running pod. There is no error, no problem.
I can't figure out what's wrong with this configuration, and I don't have any tools or techniques to analyze the problem. Detailed configuration can be found here. Please ask me for more commands results.
Thank you
All this configuration is correct.
It's missing one detail, since Rancher is a containerized deployment of Kubernetes.
Kubelets are deployed on each node in Docker containers. They don't have access to the OS's local folders.
You need to add a volume binding for the kubelets, so that K8s will be able to create the mongo pod with this same binding.
In rancher:
Edit the cluster yaml (Cluster > Edit > Edit as Yaml)
Add the following entry under the "services" node:
kubelet:
  extra_binds:
    - "/mongo:/mongo:rshared"
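For reference, steps B and C from the question correspond roughly to the manifests below. This is a sketch under assumptions: the volume name mongo-pv-one is hypothetical, the local path /mongo is inferred from the bind above, and the node label is taken to be key name with value one.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # local volumes are created statically
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-pv-one                        # hypothetical name
spec:
  capacity:
    storage: 12Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /mongo                            # assumption: matches the kubelet bind
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: name                     # assumption: label key from step A
              operator: In
              values:
                - one
```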

How to attach OpenStack volume to a Kubernetes static pod?

Suppose I bootstrap a single master node with kubelet v1.10.3 in OpenStack cloud and I would like to have a "self-hosted" single etcd node for k8s necessities as a pod.
Before starting kube-apiserver component you need a working etcd instance, but of course you can't just perform kubectl apply -f or put a manifest to addon-manager folder because cluster is not ready at all.
There is a way to start pods by kubelet without having a ready apiserver. It is called static pods (yaml Pod definitions usually located at /etc/kubernetes/manifests/). And it is the way I start "system" pods like apiserver, scheduler, controller-manager and etcd itself. Previously I just mounted a directory from node to persist etcd data, but now I would like to use OpenStack blockstorage resource. And here is the question: how can I attach, mount and use OpenStack cinder volume to persist etcd data from static pod?
As I learned today there are at least 3 ways to attach OpenStack volumes:
CSI OpenStack cinder driver, which is a fairly new way of managing volumes. It won't fit my requirements, because in static pod manifests I can only declare Pods and not other resources like PVC/PV, while the CSI docs say:
The csi volume type does not support direct reference from Pod and may only be referenced in a Pod via a PersistentVolumeClaim object.
The pre-CSI way to attach volumes is FlexVolume.
FlexVolume driver binaries must be installed in a pre-defined volume plugin path on each node (and in some cases master).
Ok, I added those binaries to my node (using this DS as a reference), added volume to pod manifest like this:
volumes:
  - name: test
    flexVolume:
      driver: "cinder.io/cinder-flex-volume-driver"
      fsType: "ext4"
      options:
        volumeID: "$VOLUME_ID"
        cinderConfig: "/etc/kubernetes/cloud-config"
and got the following error from kubelet logs:
driver-call.go:258] mount command failed, status: Failure, reason: Volume 2c21311b-7329-4cf4-8230-f3ce2f23cf1a is not available
which is weird because I am sure this Cinder volume is already attached to my CoreOS compute instance.
and the last way to mount volumes I know is cinder in-tree support which should work since at least k8s 1.5 and does not have any special requirements besides --cloud-provider=openstack and --cloud-config kubelet options.
The yaml manifest part for declaring volume for static pod looks like this:
volumes:
  - name: html-volume
    cinder:
      # Enter the volume ID below
      volumeID: "$VOLUME_ID"
      fsType: ext4
Unfortunately when I try this method I get the following error from kubelet:
Volume has not been added to the list of VolumesInUse in the node's volume status for volume.
Do not know what it means but sounds like the node status could not be updated (of course, there is no etcd and apiserver yet). Sad, it was the most promising option for me.
Are there any other ways to attach an OpenStack cinder volume to a static pod relying on kubelet only (when the cluster is actually not ready)? Any ideas on what I could be missing that causes the errors above?
The message "Volume has not been added to the list of VolumesInUse in the node's volume status for volume." means that attach/detach operations for that node are delegated to the controller-manager only. Kubelet waits for the attachment to be made by the controller, but the volume never reaches the appropriate state because the controller isn't up yet.
The solution is to set the kubelet flag --enable-controller-attach-detach=false to let kubelet attach, mount and so on. This flag is set to true by default for the following reasons:
If a node is lost, volumes that were attached to it can be detached by the controller and reattached elsewhere.
Credentials for attaching and detaching do not need to be made present on every node, improving security.
In your case setting this flag to false is reasonable, as this is the only way to achieve what you want.
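If the kubelet is driven by a configuration file rather than command-line flags, the same setting can be sketched as below. This assumes the kubelet is started with --config pointing at a KubeletConfiguration file, which the original setup does not mention; the field is the config-file counterpart of the flag.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Let the kubelet attach and detach volumes itself instead of waiting
# for the attach/detach controller in kube-controller-manager.
enableControllerAttachDetach: false
```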