Block ingress from different Kubernetes Namespaces to Pod running NFS

I currently have multiple NFS server Pods running in different namespaces (1 replica per namespace). I have a Service per namespace to wrap this Pod just to have a fixed endpoint. A Persistent Volume connects to this server with the fixed endpoint, so other Pods in the namespace can mount it as a volume using a PVC. Since I create a PV per NFS server, how can I prevent a PV whose PVC belongs to a different namespace from reading from it? I tried using a Network Policy, but it looks like the PV (not tied to a namespace) can go around it. Unfortunately, the application deployed in K8s currently has a field where a user can provide any nfs:// endpoint to instruct the PV where it needs to access the files.
Using GKE 1.17.
I'm trying this NP:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: team1-ns
spec:
  podSelector:
    matchLabels:
      role: nfs-server
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          nfs-server: team1-ns
    - podSelector: {}
Am I missing something in the NP, or can PVs actually go around NPs?
Help is really appreciated...

I would not say that PVs "can go around" NPs, but rather that they are not applicable.
PVs, Volumes and StorageClasses provide a layer of abstraction between your pod(s) and the underlying storage implementation. The storage itself is attached/mounted to the node, not directly to the container(s) in the pods.
In your case with NFS, the storage driver/plugin attaches the actual NFS share to the node running your pod(s), so a NetworkPolicy cannot possibly apply.
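For illustration, here is a minimal sketch of the same idea with the share mounted directly as an nfs volume in a Pod spec (the pod name, namespace and server IP are made up). Whether the share is referenced through a PV or declared like this, the mount call is made by the kubelet on the node, so from the NFS server's point of view the client is the node, not a pod, and pod-level ingress rules never match that traffic:
apiVersion: v1
kind: Pod
metadata:
  name: nfs-client-example   # hypothetical name
  namespace: team2-ns        # hypothetical "other" namespace
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    nfs:
      server: 10.2.1.7       # hypothetical ClusterIP of the NFS Service
      path: /exports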

Related

Limit access of services deployed in Kubernetes namespace

Let us assume we are the owners of a Kubernetes cluster and we give other users in our organization access to individual namespaces, so they are not supposed to know what is going on in other namespaces.
If user A deploys a certain resource, like a Grafana-Prometheus monitoring stack, to namespace A, how do we ensure that he cannot use the monitoring stack to see anything from namespace B, to which he should not have any access?
Of course, we will have to limit the rights of user A anyhow, but how do we automatically limit the rights of its deployed resources in namespace A? In case you have any suggestions, perhaps with some Kubernetes configuration examples, that would be great.
The most important aspects here are controlling the access permissions of the service accounts used in the Pods and adding a network policy that limits traffic within the namespace.
Hence we arrive at this approach:
Prerequisite:
Creating the user and namespace
sudo useradd user-a
kubectl create ns ns-user-a
Limiting the access permissions of user-a to the namespace ns-user-a:
kubectl create clusterrole permission-users --verb=* --resource=*
kubectl create rolebinding permission-users-a --clusterrole=permission-users --user=user-a --namespace=ns-user-a
Limiting the access permissions of all service accounts in namespace ns-user-a:
kubectl create clusterrole permission-serviceaccounts --verb=* --resource=*
kubectl create rolebinding permission-serviceaccounts --clusterrole=permission-serviceaccounts --namespace=ns-user-a --group=system:serviceaccounts:ns-user-a
kubectl auth can-i create pods --namespace=ns-user-a --as-group=system:serviceaccounts:ns-user-a --as sa
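For reference, a rough declarative equivalent of the two service-account commands above (this YAML is an approximation of what kubectl generates, not copied from its output):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: permission-serviceaccounts
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: permission-serviceaccounts
  namespace: ns-user-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: permission-serviceaccounts
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts:ns-user-a
Because the ClusterRole is granted through a namespaced RoleBinding, the service accounts only get these permissions inside ns-user-a.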
A network policy in namespace ns-user-a to limit incoming traffic from other namespaces.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-from-other-namespaces
  namespace: ns-user-a
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}
Edit: Allowing traffic from selected namespaces
Assign a custom label to the monitoring namespace.
kubectl label ns monitoring nsname=monitoring
Or use the following reserved label from Kubernetes to make sure nobody can edit or update it. By convention this label has "monitoring" as its value for the "monitoring" namespace.
https://kubernetes.io/docs/reference/labels-annotations-taints/#kubernetes-io-metadata-name
kubernetes.io/metadata.name
Applying a network policy to allow traffic from the internal and monitoring namespaces.
Note: Network policies are additive, so you can keep both policies or only the new one. I am keeping both here for example purposes.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-only-monitoring-and-internal
  namespace: ns-user-a
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {} # allows traffic from the ns-user-a namespace (same as earlier)
    - namespaceSelector: # allows traffic from the monitoring namespace
        matchLabels:
          kubernetes.io/metadata.name: monitoring
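One way to sanity-check the policy (pod names and the target IP are placeholders) is to run a throwaway pod from the monitoring namespace and from some other namespace and compare the results:
# replace 10.x.y.z with the IP of a pod in ns-user-a
kubectl run np-test --rm -it --restart=Never --image=busybox -n monitoring -- wget -qO- -T 2 http://10.x.y.z
kubectl run np-test --rm -it --restart=Never --image=busybox -n some-other-ns -- wget -qO- -T 2 http://10.x.y.z   # should time out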

Kubernetes trouble with StatefulSet and 3 PersistentVolumes

I'm in the process of creating a StatefulSet based on this YAML, which will have 3 replicas. I want each of the 3 pods to connect to a different PersistentVolume.
For the persistent volume I'm using 3 objects that look like this, with only the name changed (pvvolume, pvvolume2, pvvolume3):
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pvvolume
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/nfs"
  claimRef:
    kind: PersistentVolumeClaim
    namespace: default
    name: mongo-persistent-storage-mongo-0
The first of the 3 pods in the StatefulSet seems to be created without issue.
The second fails with the error pod has unbound PersistentVolumeClaims
Back-off restarting failed container.
Yet if I go to the tab showing PersistentVolumeClaims the second one that was created seems to have been successful.
If it was successful why does the pod think it failed?
I want each of the 3 pods to connect to a different PersistentVolume.
For that to work properly you will either need:
a provisioner (the link you posted has examples of how to set up a provisioner on AWS, Azure, Google Cloud and minikube), or
a volume capable of being mounted multiple times (such as an NFS volume). Note, however, that in such a case all your pods read/write to the same folder, which can lead to issues when they are not meant to lock/write to the same data concurrently. The usual use case for this is an upload folder that pods save to and that is later only read from. SQL databases (such as MySQL), on the other hand, are not meant to write to such a shared folder.
Instead of either of the mentioned requirements, your claim manifest uses hostPath (pointing to /nfs) and sets it to ReadWriteOnce (only one can use it). You are also using 'standard' as the storage class, while the URL you gave has fast and slow ones, so you probably created that storage class as well.
The second fails with the error pod has unbound PersistentVolumeClaims
Back-off restarting failed container
That is because the first pod already took its claim (ReadWriteOnce, hostPath), and the second pod can't reuse the same one unless a proper provisioner or access mode is set up.
If it was successful why does the pod think it failed?
The PVC was successfully bound to its accompanying PV. But you are never binding the second and third PVCs to the second and third pods. You are retrying with the first claim on the second pod, and the first claim is already bound (to the first pod) in ReadWriteOnce mode, so it can't be bound to the second pod as well, hence the error...
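If you want to keep pre-created PVs, one possible fix (a sketch based on your manifests; it assumes the PVC names follow the <template-name>-<statefulset-name>-<ordinal> convention, which matches mongo-persistent-storage-mongo-0) is to point each PV's claimRef at a different ordinal, e.g. for pvvolume2:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pvvolume2
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/nfs2"   # a separate path per volume is an assumption
  claimRef:
    kind: PersistentVolumeClaim
    namespace: default
    name: mongo-persistent-storage-mongo-1   # ordinal 1 for pod mongo-1
and analogously mongo-persistent-storage-mongo-2 for pvvolume3.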
Suggested approach
Since you reference /nfs as your host path, it may be safe to assume that you are using some kind of NFS-backed file system, so here is an alternative setup that lets you mount dynamically provisioned persistent volumes over NFS to as many pods in the StatefulSet as you want.
Notes:
This only answers the original question of mounting persistent volumes across StatefulSet-replicated pods, under the assumption of NFS sharing.
NFS is not really advisable for dynamic data such as databases. The usual use case is an upload folder or a moderate logging/backup folder. A database (SQL or NoSQL) is usually a no-no for NFS.
For mission/time-critical applications you might want to stress-test carefully before taking this approach to production, since both k8s and the external PV add some layers/latency in between. Although for some applications this might suffice, be warned about it.
You have limited control over the names of dynamically created PVs (k8s adds a suffix to newly created ones, and reuses available old ones if told to do so), but k8s will keep them after a pod is terminated and assign the first available one to a new pod, so you won't lose state/data. This is something you can control with policies, though.
Steps:
For this to work you will first need to install the NFS provisioner from here:
https://github.com/kubernetes-incubator/external-storage/tree/master/nfs. Mind you, the installation is not complicated, but it has some steps where you have to take a careful approach (permissions, setting up NFS shares, etc.), so it is not just a fire-and-forget deployment. Take your time installing the NFS provisioner correctly. Once it is properly set up you can continue with the suggested manifests below:
Storage class manifest:
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: sc-nfs-persistent-volume
# if you changed this during provisioner installation, update also here
provisioner: example.com/nfs
Stateful Set (important excerpt only):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ss-my-app
spec:
  replicas: 3
  ...
  selector:
    matchLabels:
      app: my-app
      tier: my-mongo-db
  ...
  template:
    metadata:
      labels:
        app: my-app
        tier: my-mongo-db
    spec:
      ...
      containers:
      - image: ...
        ...
        volumeMounts:
        - name: persistent-storage-mount
          mountPath: /wherever/on/container/you/want/it/mounted
        ...
      ...
  volumeClaimTemplates:
  - metadata:
      name: persistent-storage-mount
    spec:
      storageClassName: sc-nfs-persistent-volume
      accessModes: [ ReadWriteOnce ]
      resources:
        requests:
          storage: 10Gi
...
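Once the StatefulSet is up, each replica should end up with its own dynamically provisioned claim; the claim names follow the <template-name>-<statefulset-name>-<ordinal> pattern, so a quick check would look roughly like this (the listing is illustrative):
kubectl get pvc
# persistent-storage-mount-ss-my-app-0   Bound   ...
# persistent-storage-mount-ss-my-app-1   Bound   ...
# persistent-storage-mount-ss-my-app-2   Bound   ...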

Why should I specify service before deployment in a single Kubernetes configuration file?

I'm trying to understand why kubernetes docs recommend to specify service before deployment in one configuration file:
The resources will be created in the order they appear in the file. Therefore, it’s best to specify the service first, since that will ensure the scheduler can spread the pods associated with the service as they are created by the controller(s), such as Deployment.
Does it mean spreading pods between Kubernetes cluster nodes?
I tested with the following configuration, where a deployment is located before a service, and pods are distributed between nodes without any issues.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: incorrect-order
  namespace: test
spec:
  selector:
    matchLabels:
      app: incorrect-order
  replicas: 2
  template:
    metadata:
      labels:
        app: incorrect-order
    spec:
      containers:
      - name: incorrect-order
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: incorrect-order
  namespace: test
  labels:
    app: incorrect-order
spec:
  type: NodePort
  ports:
  - port: 80
  selector:
    app: incorrect-order
Another explanation is that some environment variables with the service URL will not be set for pods in this case. However, it also works fine when the configuration is inside one file, like the example above.
Could you please explain why it is better to specify the service before the deployment in the case of one configuration file? Or maybe it is an outdated recommendation.
If you use DNS for service discovery, the order of creation doesn't matter.
In the case of environment variables (the second way K8s offers service discovery) the order matters, because once those vars are passed to the starting pod, they cannot be modified later if the service definition changes.
So if your service is deployed before you start your pod, the service env vars are injected into the linked pod.
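As a concrete illustration, assume a Service named redis-master on port 6379 with ClusterIP 10.0.0.11 (the name and IP are made up). Pods started after that Service exists will see environment variables along these lines, while pods started before it will not:
REDIS_MASTER_SERVICE_HOST=10.0.0.11
REDIS_MASTER_SERVICE_PORT=6379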
If you create a Pod/Deployment resource with labels, this resource will be exposed through a service once the latter is created (with the proper selector to indicate which resource to expose).
You are correct in that it affects the spread among the worker nodes.
Deployments without a Service will simply be scheduled onto the nodes with the least cpu/memory allocation. For instance, a brand new and empty node will get all new pods from a new deployment.
With a Deployment that also has a service the Scheduler tries to spread the pods between nodes, disregarding the cpu/memory load (within limits), to help the Service survive better.
It puzzles me that a Deployment on its own doesn't cause an optimal spread, but it doesn't, not yet at least.
This is the answer from the official documentation:
The resources will be created in the order they appear in the file.
Therefore, it's best to specify the service first, since that will
ensure the scheduler can spread the pods associated with the service
as they are created by the controller(s), such as Deployment.
Kubernetes Documentation/Concepts/Cluster/Administration/Managing Resources

How to configure a Kubernetes Multi-Pod Deployment

I would like to deploy an application cluster by managing my deployment via k8s Deployment object. The documentation has me extremely confused. My basic layout has the following components that scale independently:
API server
UI server
Redis cache
Timer/Scheduled task server
Technically, all 4 above belong in separate pods that are scaled independently.
My questions are:
Do I need to create pod.yml files and then somehow reference them in the deployment.yml file, or can a deployment file also embed pod definitions?
K8s documentation seems to imply that the spec portion of a Deployment is equivalent to defining one pod. Is that correct? What if I want to declaratively describe multi-pod deployments? Do I need multiple deployment.yml files?
Pagid's answer has most of the basics. You should create 4 Deployments for your scenario. Each Deployment will create a ReplicaSet that schedules and supervises the collection of PODs for the Deployment.
Each Deployment will most likely also require a Service in front of it for access. I usually create a single yaml file that has a Deployment and the corresponding Service in it. Here is an example for an nginx.yaml that I use:
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
  name: nginx
  labels:
    app: nginx
spec:
  type: NodePort
  ports:
  - port: 80
    name: nginx
    targetPort: 80
    nodePort: 32756
  selector:
    app: nginx
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginxdeployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginxcontainer
        image: nginx:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 80
Here some additional information for clarification:
A POD is not a scalable unit. A Deployment that schedules PODs is.
A Deployment is meant to represent a single group of PODs fulfilling a single purpose together.
You can have many Deployments work together in the virtual network of the cluster.
For accessing a Deployment that may consist of many PODs running on different nodes you have to create a Service.
Deployments are meant to contain stateless services. If you need to store state, you need to create a StatefulSet instead (e.g. for a database service).
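To illustrate the Service point above: a sketch (image, names and port are placeholders) of how the API server Deployment could reach the Redis cache through the Redis Service's DNS name, assuming both run in the same namespace:
# excerpt from a hypothetical api-server pod template
spec:
  containers:
  - name: api-server
    image: my-api:latest            # placeholder image
    env:
    - name: REDIS_URL
      value: redis://redis:6379     # "redis" is the assumed name of the Redis Service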
You can use the Kubernetes API reference for the Deployment, and you'll find that the spec->template field is of type PodTemplateSpec; along with the related comment (Template describes the pods that will be created.) it answers your questions. A longer description can of course be found in the Deployment user guide.
To answer your questions...
1) The Pods are managed by the Deployment, and defining them separately doesn't make sense as they are created on demand by the Deployment. Keep in mind that there might be more replicas of the same pod type.
2) For each of the applications in your list, you'd have to define one Deployment - which also makes sense when it comes to different replica counts and application rollouts.
3) You haven't asked that but it's related - along with separate Deployments, each of your applications will also need a dedicated Service so the others can access it.
Additional information:
API server: use a Deployment
UI server: use a Deployment
Redis cache: use a StatefulSet
Timer/Scheduled task server: maybe use a StatefulSet (if your service holds some state)
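If you go the StatefulSet route for Redis, a minimal sketch (names, image and sizes are placeholders, and it assumes a headless Service called redis) could look like:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis        # assumed headless Service name
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:6          # placeholder version
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi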

Configure NFS server for PersistentVolume either via DNS or static ClusterIP

I have a Kubernetes cluster running on Google Container Engine that defines a Pod running an NFS server, which I want to access in other Pods via various PersistentVolumes.
What is the best way to configure the NFS Service, if it is in the same cluster?
According to various documentation I've found, it's not possible to rely on kube-dns for this, because the node starting the Kubernetes pod is not configured to use it as its DNS.
So this is out of the question (and really does not work - I've tested it with various different hostnames/FQDNs...)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: xxx-persistent-storage
  labels:
    app: xxx
spec:
  capacity:
    storage: 10Gi
  nfs:
    path: "/exports/xxx"
    server: nfs-service.default.svc.cluster.local # <-- does not work
I can start the NFS Server and check its ClusterIP via kubectl describe svc nfs-service and then hardcode its Endpoint-IP for the PV (this works):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: xxx-persistent-storage
  labels:
    app: xxx
spec:
  capacity:
    storage: 10Gi
  nfs:
    path: "/exports/xxx"
    server: 10.2.1.7 # <-- does work
But this feels wrong - as soon as I need to recreate the NFS Service I'll get a new IP and I have to reconfigure all the PVs based on it.
What is the best practice here? I'm surprised I did not find any example for it, because I suppose that's quite a normal thing to do - isn't it?
Is it possible to set a kind of static IP for a Service, so that I can rely on always having the same IP for the NFS service?
You are on the right track. To make sure that your Service is using a static IP just add clusterIP: 1.2.3.3 under the spec: section of the Service.
From the canonical example:
In the future, we'll be able to tie these together using the service names, but for now, you have to hardcode the IP.
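A minimal sketch of such a Service (the clusterIP must be a free address inside your cluster's service CIDR, the selector label is assumed to match the NFS server pod, and the ports are the usual NFS/mountd/rpcbind ones):
apiVersion: v1
kind: Service
metadata:
  name: nfs-service
spec:
  clusterIP: 10.2.1.7        # pick a free IP from your service CIDR
  selector:
    role: nfs-server         # assumed label on the NFS server pod
  ports:
  - name: nfs
    port: 2049
  - name: mountd
    port: 20048
  - name: rpcbind
    port: 111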