Is it possible to run gcsfuse without privileged mode inside GCP kubernetes? - kubernetes

Following this guide, I am trying to run gcsfuse inside a pod in GKE. Below is the deployment manifest I am using:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: gcsfuse-test
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: gcsfuse-test
    spec:
      containers:
      - name: gcsfuse-test
        image: gcr.io/project123/gcs-test-fuse:latest
        securityContext:
          privileged: true
          capabilities:
            add:
            - SYS_ADMIN
        lifecycle:
          postStart:
            exec:
              # create the mount point and mount the bucket in a single hook
              command: ["/bin/sh", "-c", "mkdir -p /mnt && gcsfuse -o nonempty cloudsql-p8p /mnt"]
          preStop:
            exec:
              command: ["fusermount", "-u", "/mnt"]
However, I would like to run gcsfuse without privileged mode inside my GKE cluster.
I think (because of questions like these on SO) it is possible to run the Docker image with certain flags so that there is no need for privileged mode.
Is there any way in GKE to run gcsfuse without running the container in privileged mode?

Edit Apr 26, 2022: for a further developed repo derived from this answer, see https://github.com/samos123/gke-gcs-fuse-unprivileged
Now it finally is possible to mount devices without privileged: true or CAP_SYS_ADMIN!
What you need is:
A kubelet device manager, which allows containers to have direct access to host devices in a secure way. The device manager exposes the explicitly listed devices via the Kubelet Device API. I used this hidden gem: https://gitlab.com/arm-research/smarter/smarter-device-manager.
Define the list of devices provided by the device manager - add /dev/YOUR_DEVICE_NAME to this list, see the example below.
Request a device via the device manager in the pod spec: resources.requests.smarter-devices/YOUR_DEVICE_NAME: 1
I spent quite some time figuring this out, so I hope sharing the information here will save someone else the exploration.
I wrote up my detailed findings in the Kubernetes GitHub issue about /dev/fuse. See an example setup in this comment and more technical details above that one.
Examples from the comment linked above:
Allow FUSE devices via Device Manager:
apiVersion: v1
kind: ConfigMap
metadata:
  name: smarter-device-manager
  namespace: device-manager
data:
  conf.yaml: |
    - devicematch: ^fuse$
      nummaxdevices: 20
Request /dev/fuse via Device Manager:
# Pod spec:
resources:
  limits:
    smarter-devices/fuse: 1
    memory: 512Mi
  requests:
    smarter-devices/fuse: 1
    cpu: 10m
    memory: 50Mi
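Putting the pieces together, a workload that mounts a bucket with gcsfuse via the device plugin might look roughly like the sketch below. The pod name, bucket name (my-bucket) and mount path (/data) are placeholders, the image is reused from the question, and it assumes the image ships gcsfuse the way the linked repo does:
# Sketch only - not from the linked comment; names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gcsfuse-unprivileged-example
spec:
  containers:
  - name: gcsfuse
    image: gcr.io/project123/gcs-test-fuse:latest   # assumed to contain gcsfuse
    command: ["/bin/sh", "-c", "mkdir -p /data && gcsfuse my-bucket /data && sleep infinity"]
    # No privileged: true and no added capabilities; access to the FUSE device
    # comes from the smarter-devices/fuse resource request below.
    resources:
      limits:
        smarter-devices/fuse: 1
        memory: 512Mi
      requests:
        smarter-devices/fuse: 1
        cpu: 10m
        memory: 50Mi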
Device Manager as a DaemonSet:
# https://gitlab.com/arm-research/smarter/smarter-device-manager/-/blob/master/smarter-device-manager-ds.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: smarter-device-manager
  namespace: device-manager
  labels:
    name: smarter-device-manager
    role: agent
spec:
  selector:
    matchLabels:
      name: smarter-device-manager
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: smarter-device-manager
      annotations:
        node.kubernetes.io/bootstrap-checkpoint: "true"
    spec:
      ## kubectl label node pike5 smarter-device-manager=enabled
      # nodeSelector:
      #   smarter-device-manager: enabled
      priorityClassName: "system-node-critical"
      hostname: smarter-device-management
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: smarter-device-manager
        image: registry.gitlab.com/arm-research/smarter/smarter-device-manager:v1.1.2
        imagePullPolicy: IfNotPresent
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        resources:
          limits:
            cpu: 100m
            memory: 15Mi
          requests:
            cpu: 10m
            memory: 15Mi
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev-dir
          mountPath: /dev
        - name: sys-dir
          mountPath: /sys
        - name: config
          mountPath: /root/config
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev-dir
        hostPath:
          path: /dev
      - name: sys-dir
        hostPath:
          path: /sys
      - name: config
        configMap:
          name: smarter-device-manager
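Both manifests above live in the device-manager namespace, which has to exist before they are applied; a minimal manifest for it:
apiVersion: v1
kind: Namespace
metadata:
  name: device-manager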

Privileged mode means all capabilities are enabled (see https://stackoverflow.com/a/36441605), so adding CAP_SYS_ADMIN is redundant in your example.
You can either grant all privileges or do something more fine-grained by mounting /dev/fuse and adding only the SYS_ADMIN capability (which is still a powerful permission); see the sketch at the end of this answer.
I think we can rephrase the question as: can we run gcsfuse without the SYS_ADMIN capability?
That does not look feasible; see the related Docker issue: https://github.com/docker/for-linux/issues/321.
For most projects this won't be a hard blocker. Weigh it against your threat model and decide whether it is an acceptable risk for your production environment.
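For illustration, a rough sketch of the more fine-grained variant mentioned above (no privileged: true, only the SYS_ADMIN capability plus the host's /dev/fuse). The container name and image come from the question; whether plain hostPath access to the device is sufficient depends on the runtime's device cgroup configuration:
# Sketch only: SYS_ADMIN is still required, but full privileged mode is not.
spec:
  containers:
  - name: gcsfuse-test
    image: gcr.io/project123/gcs-test-fuse:latest
    securityContext:
      capabilities:
        add:
        - SYS_ADMIN
    volumeMounts:
    - name: fuse-device
      mountPath: /dev/fuse
  volumes:
  - name: fuse-device
    hostPath:
      path: /dev/fuse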

Related

How to run DNS Server (dnsmasq) in Kubernetes?

I'm trying to run a DNS server (dnsmasq) in a Kubernetes cluster. The cluster has only one node. Everything works fine until I need to restart the dnsmasq container (kubectl rollout restart daemonsets dnsmasq-daemonset) to apply changes made to the hosts ConfigMap. As I found out, this is needed because an already-running dnsmasq will not otherwise pick up changes made to the hosts ConfigMap.
As soon as the container is restarted, it cannot pull the dnsmasq image and fails. This is expected behavior, since it cannot resolve the image name while no other DNS servers are running, but I wonder what the best way around it is, or what the best practices are for running a DNS server in Kubernetes in general. Is this something CoreDNS is used for, or what other alternatives are there? Maybe some high-availability solution?
hosts ConfigMap:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: dnsmasq-hosts
  namespace: core
data:
  hosts: |
    127.0.0.1 localhost
    10.x.x.x example.com
    ...
Dnsmasq DaemonSet:
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dnsmasq-daemonset
  namespace: core
spec:
  selector:
    matchLabels:
      app: dnsmasq-app
  template:
    metadata:
      labels:
        app: dnsmasq-app
      namespace: core
    spec:
      containers:
      - name: dnsmasq
        image: registry.gitlab.com/path/to/dnsmasqImage:tag
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: "1"
            memory: "32Mi"
          requests:
            cpu: "150m"
            memory: "16Mi"
        ports:
        - name: dns
          containerPort: 53
          hostPort: 53
          protocol: UDP
        volumeMounts:
        - name: conf-dnsmasq
          mountPath: /etc/dnsmasq.conf
          subPath: dnsmasq.conf
          readOnly: true
        - name: dnsconf-dnsmasq
          mountPath: /etc/dnsmasq.d/dns.conf
          subPath: dns.conf
          readOnly: true
        - name: hosts-dnsmasq
          mountPath: /etc/dnsmasq.d/hosts
          subPath: hosts
          readOnly: true
      volumes:
      - name: conf-dnsmasq
        configMap:
          name: dnsmasq-conf
      - name: dnsconf-dnsmasq
        configMap:
          name: dnsmasq-dnsconf
      - name: hosts-dnsmasq
        configMap:
          name: dnsmasq-hosts
      imagePullSecrets:
      - name: gitlab-registry-credentials
      nodeSelector:
        kubernetes.io/hostname: master
      restartPolicy: Always
I tried to use imagePullPolicy: Never, but it seems to fail anyway.

GCP Firestore: Server request fails with Missing or insufficient permissions from GKE

I am trying to connect to Firestore from code running in a GKE container. A simple REST GET API works fine, but when I try to read from or write to Firestore, I get Missing or insufficient permissions.
An unhandled exception was thrown by the application.
Info
2021-06-06 21:21:20.283 EDT
Grpc.Core.RpcException: Status(StatusCode="PermissionDenied", Detail="Missing or insufficient permissions.", DebugException="Grpc.Core.Internal.CoreErrorDetailException: {"created":"#1623028880.278990566","description":"Error received from peer ipv4:172.217.193.95:443","file":"/var/local/git/grpc/src/core/lib/surface/call.cc","file_line":1068,"grpc_message":"Missing or insufficient permissions.","grpc_status":7}")
at Google.Api.Gax.Grpc.ApiCallRetryExtensions.<>c__DisplayClass0_0`2.<<WithRetry>b__0>d.MoveNext()
Update: I am trying to provide the pod with a secret containing service account credentials.
Here is the k8s file. It deploys a pod to the cluster with no issues when no secrets are provided, and GET operations that don't hit Firestore work fine.
kind: Deployment
apiVersion: apps/v1
metadata:
  name: foo-worldmanagement-production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: foo
      role: worldmanagement
      env: production
  template:
    metadata:
      name: worldmanagement
      labels:
        app: foo
        role: worldmanagement
        env: production
    spec:
      containers:
      - name: worldmanagement
        image: gcr.io/foodev/foo/master/worldmanagement.21
        resources:
          limits:
            memory: "500Mi"
            cpu: "300m"
        imagePullPolicy: Always
        readinessProbe:
          httpGet:
            path: /api/worldManagement/policies
            port: 80
        ports:
        - name: worldmgmt
          containerPort: 80
Now, if I try to mount the secret, the pod never gets fully created and eventually fails:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: foo-worldmanagement-production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: foo
      role: worldmanagement
      env: production
  template:
    metadata:
      name: worldmanagement
      labels:
        app: foo
        role: worldmanagement
        env: production
    spec:
      volumes:
      - name: google-cloud-key
        secret:
          secretName: firestore-key
      containers:
      - name: worldmanagement
        image: gcr.io/foodev/foo/master/worldmanagement.21
        volumeMounts:
        - name: google-cloud-key
          mountPath: /var/
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /var/key.json
        resources:
          limits:
            memory: "500Mi"
            cpu: "300m"
        imagePullPolicy: Always
        readinessProbe:
          httpGet:
            path: /api/worldManagement/earth
            port: 80
        ports:
        - name: worldmgmt
          containerPort: 80
I tried to deploy the sample application and it works fine.
If I keep only the following in the YAML file, the container gets deployed properly:
- name: google-cloud-key
  secret:
    secretName: firestore-key
But once I add the following to the YAML, it fails:
volumeMounts:
- name: google-cloud-key
  mountPath: /var/
env:
- name: GOOGLE_APPLICATION_CREDENTIALS
  value: /var/key.json
And I can see in the GCP events that the container is not able to find the google-cloud-key. Any idea how to troubleshoot this, i.e. why I am not able to mount the secret? I can bash into the pod if needed.
I am using a multi-stage Dockerfile made of:
FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build
FROM mcr.microsoft.com/dotnet/aspnet:5.0 AS runtime
Thanks
Looks like the key itself might not be correctly visible to the pod. I would start by getting into the pod with kubectl exec --stdin --tty <podname> -- /bin/bash and ensuring that /var/key.json (per your config) is accessible and has the correct credentials.
The following would be a good way to mount the secret:
volumeMounts:
- name: google-cloud-key
  mountPath: /var/run/secret/cloud.google.com
env:
- name: GOOGLE_APPLICATION_CREDENTIALS
  value: /var/run/secret/cloud.google.com/key.json
The above assumes your secret was created with a command like:
kubectl --namespace <namespace> create secret generic firestore-key --from-file key.json
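For completeness, the matching volumes entry in the pod spec (reusing the firestore-key secret name from your manifest) would be along these lines:
volumes:
- name: google-cloud-key
  secret:
    secretName: firestore-key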
It is also important to check your Workload Identity setup; the Workload Identity | Kubernetes Engine documentation has a good section on this.
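If you go the Workload Identity route instead of mounting a key file, the usual pattern is to annotate a Kubernetes service account with the Google service account it should impersonate and reference it from the pod spec via serviceAccountName. A rough sketch, with placeholder account and project names:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: worldmanagement-ksa   # placeholder name
  annotations:
    iam.gke.io/gcp-service-account: firestore-access@my-project.iam.gserviceaccount.com
With that in place, the GOOGLE_APPLICATION_CREDENTIALS variable and the mounted key are no longer needed.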

Separate filebeat daemon set based on namespace?

I am trying to run a filebeat daemon set to get the logs for a particular app. There are two node groups: eai and eai-staging. The eai node group has only a single namespace, while eai-staging has multiple namespaces.
I have the following filebeat config:
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  labels:
    app: filebeat
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: log
      fields:
        app_type: "${NAMESPACE}"  # <<<---- I want this app_type to be different based on namespace
        log_type: secure
      fields_under_root: true
    output.logstash:
      hosts: ["${LOGSTASH_HOST}:${LOGSTASH_PORT}"]
      ttl: 1s
      pipelining: 0
    processors:
      - drop_fields:
          fields: ["beat", "host", "input", "offset", "source"]
Filebeat DaemonSet:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: filebeat
  labels:
    app: filebeat
spec:
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      nodeSelector:
        nodegroup: eai
      priorityClassName: critical
      terminationGracePeriodSeconds: 30
      containers:
      - name: filebeat
        imagePullPolicy: Always
        image: docker.elastic.co/beats/filebeat:6.5.4
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        env:
        - name: LOGSTASH_HOST
          value: "logstash-headless.etl.svc.cluster.local"
        - name: LOGSTASH_PORT
          value: "5046"
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        securityContext:
          runAsUser: 0
          # If using Red Hat OpenShift uncomment this:
          #privileged: true
        resources:
          limits:
            cpu: 100m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: app-log
          mountPath: /var/log/app/
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: filebeat-config
      - name: data
        hostPath:
          path: /var/lib/filebeat-data/eai/app-filebeat
          type: DirectoryOrCreate
      - name: app-log
        hostPath:
          path: /var/log/app/
          type: DirectoryOrCreate
Now, how can I get the particular namespace from which filebeat obtains the app log?
I tried deploying one daemon set in the eai namespace in the eai node group, so there I can get the namespace using metadata.namespace.
But if I deploy the daemon set in the eai-staging node group in one particular namespace, I will always get that same namespace value.
Is there any way around this, or should I deploy the daemon set in each namespace?
P.S. I cannot run filebeat in the same container, because if filebeat goes down for some reason, the pod can no longer receive requests for the app.
Deploy filebeat as a DaemonSet on each node; filebeat will then collect logs from all containers on that node, and you can add the namespace, pod name, and labels as metadata to each event. This way you will know which namespace each event originated from.
The add_kubernetes_metadata processor annotates each event with relevant metadata based on which Kubernetes pod the event originated from. Each event is annotated with:
Pod Name
Namespace
Labels
https://www.elastic.co/guide/en/beats/filebeat/6.1/add-kubernetes-metadata.html
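As a rough sketch, enabling it in the filebeat.yml of the ConfigMap above could look like the following (note that the default matchers key off the container log path, so with logs read from a custom hostPath such as /var/log/app you may need to adjust the indexers/matchers):
processors:
  - add_kubernetes_metadata:
      in_cluster: true
  - drop_fields:
      fields: ["beat", "host", "input", "offset", "source"]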

How can I give grafana user appropriate permission so that it can start successfully?

env:
kubernetes provider: gke
kubernetes version: v1.13.12-gke.25
grafana version: 6.6.2 (official image)
grafana deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      name: grafana
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:6.6.2
        ports:
        - name: grafana
          containerPort: 3000
        # securityContext:
        #   runAsUser: 104
        #   allowPrivilegeEscalation: true
        resources:
          limits:
            memory: "1Gi"
            cpu: "500m"
          requests:
            memory: "500Mi"
            cpu: "100m"
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-storage
      volumes:
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-pvc
Problem
when I deployed this Grafana dashboard the first time, it worked fine. After some time I restarted the pod to check whether the volume mount was working or not. After restarting, I get the error below.
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
From this error I understand that the user cannot create these files. How can I give this user the appropriate permissions so that Grafana starts successfully?
I recreated your deployment with an appropriate PVC and noticed that the Grafana pod was failing.
Output of command: $ kubectl get pods -n monitoring
NAME                       READY   STATUS   RESTARTS   AGE
grafana-6466cd95b5-4g95f   0/1     Error    2          65s
Further investigation pointed to the same errors as yours:
mkdir: can't create directory '/var/lib/grafana/plugins': Permission denied
GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
This error showed up on the first creation of the pod and the deployment; there was no need to recreate any pods.
What I did to make it work was edit your deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      name: grafana
      labels:
        app: grafana
    spec:
      securityContext:
        runAsUser: 472
        fsGroup: 472
      containers:
      - name: grafana
        image: grafana/grafana:6.6.2
        ports:
        - name: grafana
          containerPort: 3000
        resources:
          limits:
            memory: "1Gi"
            cpu: "500m"
          requests:
            memory: "500Mi"
            cpu: "100m"
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-storage
      volumes:
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-pvc
Please take a specific look at this part:
securityContext:
  runAsUser: 472
  fsGroup: 472
It is a setting described in official documentation: Kubernetes.io: set the security context for a pod
Please take a look at this GitHub issue, which is similar to yours and pointed me to the solution that allowed the pod to start correctly:
https://github.com/grafana/grafana-docker/issues/167
Grafana had some major updates starting from version 5.1. Please take a look: Grafana.com: Docs: Migrate to v5.1 or later
Please let me know if this helps.
On v8.0, I got it working by setting runAsUser: 0:
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  ports:
  - name: grafana-tcp
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    project: grafana
  type: LoadBalancer
status:
  loadBalancer: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    project: grafana
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      project: grafana
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        project: grafana
      name: grafana
    spec:
      securityContext:
        runAsUser: 0
      containers:
      - image: grafana/grafana
        name: grafana
        ports:
        - containerPort: 3000
          protocol: TCP
        resources: {}
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-volume
      volumes:
      - name: grafana-volume
        hostPath:
          # directory location on host
          path: /opt/grafana
          # this field is optional
          type: DirectoryOrCreate
      restartPolicy: Always
status: {}

Failed to attach volume ... already being used by

I am running Kubernetes in a GKE cluster on version 1.6.6, and another cluster on 1.6.4. Both are experiencing issues with failing over GCE compute disks.
I have been simulating failures using kill 1 inside the container or by killing the GCE node directly. Sometimes I get lucky and the pod gets created on the same node again, but most of the time it doesn't.
Looking at the event log, it shows the error trying to mount three times and then fails to do anything more. Without human intervention it never corrects itself; I am forced to kill the pod repeatedly until it works. During maintenance this is a giant pain.
How do I get Kubernetes to fail over with volumes properly? Is there a way to tell the deployment to try a new node on failure? Is there a way to remove the 3-retry limit?
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: dev-postgres
  namespace: jolene
spec:
  revisionHistoryLimit: 0
  template:
    metadata:
      labels:
        app: dev-postgres
      namespace: jolene
    spec:
      containers:
      - image: postgres:9.6-alpine
        imagePullPolicy: IfNotPresent
        name: dev-postgres
        volumeMounts:
        - mountPath: /var/lib/postgresql/data
          name: postgres-data
        env:
          # [ ** Removed, irrelevant environment variables ** ]
        ports:
        - containerPort: 5432
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - exec pg_isready
          initialDelaySeconds: 30
          timeoutSeconds: 5
          failureThreshold: 6
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - exec pg_isready --host $POD_IP
          initialDelaySeconds: 5
          timeoutSeconds: 3
          periodSeconds: 5
      volumes:
      - name: postgres-data
        persistentVolumeClaim:
          claimName: dev-jolene-postgres
I have tried this with and without PersistentVolume / PersistentVolumeClaim.
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
name: dev-jolene-postgres
spec:
capacity:
storage: "1Gi"
accessModes:
- "ReadWriteOnce"
claimRef:
namespace: jolene
name: dev-jolene-postgres
gcePersistentDisk:
fsType: "ext4"
pdName: "dev-jolene-postgres"
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: dev-jolene-postgres
namespace: jolene
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
By default, every node is schedulable, so there is no need to mention it explicitly in the deployment. A feature for configuring retry limits is still in progress; it can be tracked here: https://github.com/kubernetes/kubernetes/issues/16652