Using sysctls in Google Kubernetes Engine (GKE) - kubernetes

I'm running a k8s cluster - 1.9.4-gke.1 - on Google Kubernetes Engine (GKE).
I need to set sysctl net.core.somaxconn to a higher value inside some containers.
I've found this official k8s page: Using Sysctls in a Kubernetes Cluster - that seemed to solve my problem. The solution was to make an annotation on my pod spec like the following:
annotations:
security.alpha.kubernetes.io/sysctls: net.core.somaxconn=1024
But when I tried to create my pod:
Status: Failed
Reason: SysctlForbidden
Message: Pod forbidden sysctl: "net.core.somaxconn" not whitelisted
So I've tried to create a PodSecurityPolicy like the following:
---
apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
name: sites-psp
annotations:
security.alpha.kubernetes.io/sysctls: 'net.core.somaxconn'
spec:
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
runAsUser:
rule: RunAsAny
fsGroup:
rule: RunAsAny
volumes:
- '*'
... but it didn't work either.
I've also found that I can use a kubelet argument on every node to whitelist the specific sysctl: --experimental-allowed-unsafe-sysctls=net.core.somaxconn
I've added this argument to the KUBELET_TEST_ARGS setting on my GCE machine and restarted it. From what I can see from the output of ps command, it seems that the option was successfully added to the kubelet process on the startup:
/home/kubernetes/bin/kubelet --v=2 --kube-reserved=cpu=60m,memory=960Mi --experimental-allowed-unsafe-sysctls=net.core.somaxconn --allow-privileged=true --cgroup-root=/ --cloud-provider=gce --cluster-dns=10.51.240.10 --cluster-domain=cluster.local --pod-manifest-path=/etc/kubernetes/manifests --experimental-mounter-path=/home/kubernetes/containerized_mounter/mounter --experimental-check-node-capabilities-before-mount=true --cert-dir=/var/lib/kubelet/pki/ --enable-debugging-handlers=true --bootstrap-kubeconfig=/var/lib/kubelet/bootstrap-kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --anonymous-auth=false --authorization-mode=Webhook --client-ca-file=/etc/srv/kubernetes/pki/ca-certificates.crt --cni-bin-dir=/home/kubernetes/bin --network-plugin=kubenet --volume-plugin-dir=/home/kubernetes/flexvolume --node-labels=beta.kubernetes.io/fluentd-ds-ready=true,cloud.google.com/gke-nodepool=temp-pool --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5% --feature-gates=ExperimentalCriticalPodAnnotation=true
The problem is that I keep receiving a message telling me that my pod cannot be started because sysctl net.core.somaxconn is not whitelisted.
Is there some limitation on GKE so that I cannot whitelist a sysctl? Am I doing something wrong?

Until sysctl support becomes better integrated you can put this in your pod spec
spec:
initContainers:
- name: sysctl-buddy
image: busybox:1.29
securityContext:
privileged: true
command: ["/bin/sh"]
args:
- -c
- sysctl -w net.core.somaxconn=4096 vm.overcommit_memory=1
resources:
requests:
cpu: 1m
memory: 1Mi

This is an intentional Kubernetes limitation. There is an open PR to add net.core.somaxconn to the whitelist here: https://github.com/kubernetes/kubernetes/pull/54896
As far as I know, there isn't a way to override this behavior on GKE.

Related

How to enforce MustRunAsNonRoot policy in K8S cluster in AKS

I have a K8S cluster running in Azure AKS service.
I want to enforce MustRunAsNonRoot policy. How to do it?
The following policy is created:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: restrict-root
spec:
privileged: false
allowPrivilegeEscalation: false
runAsUser:
rule: MustRunAsNonRoot
seLinux:
rule: RunAsAny
fsGroup:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- '*'
It is deployed in the cluster:
$ kubectl get psp
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
restrict-root false RunAsAny MustRunAsNonRoot RunAsAny RunAsAny false *
Admission controller is running in the cluster:
$ kubectl get pods -n gatekeeper-system
NAME READY STATUS RESTARTS AGE
gatekeeper-audit-7b4bc6f977-lvvfl 1/1 Running 0 32d
gatekeeper-controller-5948ddcd54-5mgsm 1/1 Running 0 32d
gatekeeper-controller-5948ddcd54-b59wg 1/1 Running 0 32d
Anyway it is possible to run a simple pod running under root:
apiVersion: v1
kind: Pod
metadata:
name: mypod
spec:
containers:
- name: mypod
image: busybox
args: ["sleep", "10000"]
securityContext:
runAsUser: 0
Pod is running:
$ kubectl describe po mypod
Name: mypod
Namespace: default
Priority: 0
Node: aks-default-31327534-vmss000001/10.240.0.5
Start Time: Mon, 08 Feb 2021 23:10:46 +0100
Labels: <none>
Annotations: <none>
Status: Running
Why MustRunAsNonRoot is not applied? How to enforce it?
EDIT: It looks like AKS engine does not support PodSecurityPolicy (list of supported policies). Then the question is still the same: how to enforce MustRunAsNonRoot rule on workloads?
You shouldn't use PodSecurityPolicy on Azure AKS cluster as it has been set for deprecation as of May 31st, 2021 in favor of Azure Policy for AKS. Check the official docs for further details:
Warning
The feature described in this document, pod security policy (preview), is set for deprecation and will no longer be available
after May 31st, 2021 in favor of Azure Policy for
AKS.
The deprecation date has been extended from the previous date of
October 15th, 2020.
So currently you should rather use Azure Policy for AKS, where among other built-in policies grouped into initiatives (an initiative in Azure Policy is a collection of policy definitions that are tailored towards achieving a singular overarching goal), you can find a policy which goal is to disallow running of privileged containers on your AKS cluster.
As to PodSecurityPolicy, for the time being it should still work. Please check here if you didn't forget about anything e.g. make sure you set up the corresponding ClusterRole and ClusterRoleBinding to allow the policy to be used.

Whitelisting sysctl parameters for helm chart

I have a helm chart that deploys an app but also needs to reconfigure some sysctl parameters in order to run properly. When I install the helm chart and run kubectl describe pod/pod_name on the pod that was deployed, I get forbidden sysctl: "kernel.sem" not whitelisted. I have added a podsecuritypolicy like so but with no such luck.
apiVersion:policy/v1beta1
kind:PodSecurityPolicy
metadata:
name: policy
spec:
allowedUnsafeSysctls:
- kernel.sem
- kernel.shmmax
- kernel.shmall
- fs.mqueue.msg_max
seLinux:
rule: 'RunAsAny'
runAsUser:
rule: 'RunAsAny'
supplementalGroups:
rule: 'RunAsAny'
fsGroup:
rule:'RunAsAny'
---UPDATE---
I also try to set the kubelet parameters via a config file in order to allow-unsafe-ctls but I get an error no kind "KubeletConfiguration" is registered for version "kubelet.config.k8s.io/v1beta1".
Here's the configuration file:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
allowedUnsafeSysctls:
- "kernel.sem"
- "kernel.shmmax"
- "kernel.shmall"
- "fs.mqueue.msg_max"
The kernel.sem sysctl is considered as unsafe sysctl, therefore is disabled by default (only safe sysctls are enabled by default). You can allow one or more unsafe sysctls on a node-by-node basics, to do so you need to add --allowed-unsafe-sysctls flag to the kubelet.
Look at "Enabling Unsafe Sysctls"
I've created simple example to illustrate you how it works.
First I added --allowed-unsafe-sysctls flag to the kubelet.
In my case I use kubeadm, so I need to add this flag to
/etc/systemd/system/kubelet.service.d/10-kubeadm.conf file:
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --allowed-unsafe-sysctls=kernel.sem"
...
NOTE: You have to add this flag on every node you want to run Pod with kernel.sem enabled.
Then I reloaded systemd manager configuration and restarted kubelet using below command:
# systemctl daemon-reload && systemctl restart kubelet
Next I created a simple Pod using this manifest file:
apiVersion: v1
kind: Pod
metadata:
labels:
run: web
name: web
spec:
securityContext:
sysctls:
- name: kernel.sem
value: "250 32000 100 128"
containers:
- image: nginx
name: web
Finally we can check if it works correctly:
# sysctl -a | grep "kernel.sem"
kernel.sem = 32000 1024000000 500 32000 // on the worker node
# kubectl get pod
NAME READY STATUS RESTARTS AGE
web 1/1 Running 0 110s
# kubectl exec -it web -- bash
root#web:/# cat /proc/sys/kernel/sem
250 32000 100 128 // inside the Pod
Your PodSecurityPolicy doesn't work as expected, because of as you can see in the documentation:
Warning: If you allow unsafe sysctls via the allowedUnsafeSysctls field in a PodSecurityPolicy, any pod using such a sysctl will fail to start if the sysctl is not allowed via the --allowed-unsafe-sysctls kubelet flag as well on that node.

Creating a link to an NFS share in K3s Kubernetes

I'm very new to Kubernetes, and trying to get node-red running on a small cluster of raspberry pi's
I happily managed that, but noticed that once the cluster is powered down, next time I bring it up, the flows in node-red have vanished.
So, I've create a NFS share on a freenas box on my local network and can mount it from another RPI, so I know the permissions work.
However I cannot get my mount to work in a kubernetes deployment.
Any help as to where I have gone wrong please?
apiVersion: apps/v1
kind: Deployment
metadata:
name: node-red
labels:
app: node-red
spec:
replicas: 1
selector:
matchLabels:
app: node-red
template:
metadata:
labels:
app: node-red
spec:
containers:
- name: node-red
image: nodered/node-red:latest
ports:
- containerPort: 1880
name: node-red-ui
securityContext:
privileged: true
volumeMounts:
- name: node-red-data
mountPath: /data
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: TZ
value: Europe/London
volumes:
- name: node-red-data
nfs:
server: 192.168.1.96
path: /mnt/Pool1/ClusterStore/nodered
The error I am getting is
error: error validating "node-red-deploy.yml": error validating data:
ValidationError(Deployment.spec.template.spec): unknown field "nfs" in io.k8s.api.core.v1.PodSpec; if
you choose to ignore these errors, turn validation off with --validate=false
New Information
I now have the following
apiVersion: v1
kind: PersistentVolume
metadata:
name: clusterstore-nodered
labels:
type: nfs
spec:
capacity:
storage: 1Gi
accessModes:
- ReadWriteMany
nfs:
path: /mnt/Pool1/ClusterStore/nodered
server: 192.168.1.96
persistentVolumeReclaimPolicy: Recycle
claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: clusterstore-nodered-claim
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
Now when I start the deployment it waits at pending forever and I see the following the the events for the PVC
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 5m47s (x7 over 7m3s) persistentvolume-controller waiting for first consumer to be created before binding
Normal Provisioning 119s (x5 over 5m44s) rancher.io/local-path_local-path-provisioner-58fb86bdfd-rtcls_506528ac-afd0-11ea-930d-52d0b85bb2c2 External provisioner is provisioning volume for claim "default/clusterstore-nodered-claim"
Warning ProvisioningFailed 119s (x5 over 5m44s) rancher.io/local-path_local-path-provisioner-58fb86bdfd-rtcls_506528ac-afd0-11ea-930d-52d0b85bb2c2 failed to provision volume with StorageClass "local-path": Only support ReadWriteOnce access mode
Normal ExternalProvisioning 92s (x19 over 5m44s) persistentvolume-controller
waiting for a volume to be created, either by external provisioner "rancher.io/local-path" or manually created by system administrator
I assume that this is becuase I don't have a nfs provider, in fact if I do kubectl get storageclass I only see local-path
New question, how do I a add a storageclass for NFS? A little googleing around has left me without a clue.
Ok, solved the issue. Kubernetes tutorials are really esoteric and missing lots of assumed steps.
My problem was down to k3s on the pi only shipping with a local-path storage provider.
I finally found a tutorial that installed an nfs client storage provider, and now my cluster works!
This was the tutorial I found the information in.
In the stated Tutorial there are basically these steps to fulfill:
1.
showmount -e 192.168.1.XY
to check if the share is reachable from outside the NAS
2.
helm install nfs-provisioner stable/nfs-client-provisioner --set nfs.server=192.168.1.**XY** --set nfs.path=/samplevolume/k3s --set image.repository=quay.io/external_storage/nfs-client-provisioner-arm
Whereas you replace the IP with your NFS Server and the NFS path with your specific Path on your synology (both should be visible from your showmount -e IP command
Update 23.02.2021
It seems that you have to use another Chart and Image too:
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner --set nfs.server=192.168.1.**XY** --set nfs.path=/samplevolume/k3s --set image.repository=gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner
kubectl get storageclass
To check if the storageclass now exists
4.
kubectl patch storageclass nfs-client -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}' && kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
To configure the new Storage class as "default". Replace nfs-client and local-path with what kubectl get storageclass tells
5.
kubectl get storageclass
Final check, if it's marked as "default"
This is a validation error pointing at the very last part of your Deployment yaml, therefore making it an invalid object. It looks like you've made a mistake with indentations. It should look more like this:
volumes:
- name: node-red-data
nfs:
server: 192.168.1.96
path: /mnt/Pool1/ClusterStore/nodered
Also, as you are new to Kubernetes, I strongly recommend getting familiar with the concepts of PersistentVolumes and its claims. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV.
Please let me know if that helped.

How kube-proxy runs iptables commands on the host node, while running inside a process isolated container?

I'm working in kube-proxy development and I'm in the stage of understanding the purpose and execution of kube-proxy.
I know that kube-proxy will add iptables rules to enable user to access the exposed pods (which is kubernetes service in iptables mode).
what makes me wonder, is the fact that those rules are added in the host node where a pod of kube-proxy is running, and it's not clear how this pod is capable of accessing those privileges on the host node.
I have took a look on the code of kubernetes with no success to find this specific part, so if you have any idea, resource, or documentation that would help me to figure this out it would be appreciated.
kube-proxy.yaml
apiVersion: v1
kind: Pod
metadata:
name: kube-proxy
namespace: kube-system
spec:
hostNetwork: true
containers:
- name: kube-proxy
image: gcr.io/google_containers/hyperkube:v1.0.6
command:
- /hyperkube
- proxy
- --master=http://127.0.0.1:8080
securityContext:
privileged: true
volumeMounts:
- mountPath: /etc/ssl/certs
name: ssl-certs-host
readOnly: true
volumes:
- hostPath:
path: /usr/share/ca-certificates
name: ssl-certs-host
According to Pod Security Policies document:
Privileged - determines if any container in a pod can enable privileged mode. By default a container is not allowed to access any devices on the host, but a “privileged” container is given access to all devices on the host. This allows the container nearly all the same access as processes running on the host. This is useful for containers that want to use linux capabilities like manipulating the network stack and accessing devices.
In other words, it gives the container or the pod (depending on a context) most of the root privileges.
There are many more options to control pods capabilities in the securityContext section:
Privilege escalation
Linux Capabilities
SELinux
Volumes
Users and groups
Networking
Consider reading the full article for details and code snippets.
In my kube-proxy.yaml, there is configuration about privilege, like this:
securityContext:
privileged: true
I think this will give kube-proxy enough privilege.

Kubernetes: Failed to get GCE GCECloudProvider with error <nil>

I have set up a custom kubernetes cluster on GCE using kubeadm. I am trying to use StatefulSets with persistent storage.
I have the following configuration:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: gce-slow
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-standard
zones: europe-west3-b
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: myname
labels:
app: myapp
spec:
serviceName: myservice
replicas: 1
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: mycontainer
image: ubuntu:16.04
env:
volumeMounts:
- name: myapp-data
mountPath: /srv/data
imagePullSecrets:
- name: sitesearch-secret
volumeClaimTemplates:
- metadata:
name: myapp-data
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: gce-slow
resources:
requests:
storage: 1Gi
And I get the following error:
Nopx#vm0:~$ kubectl describe pvc
Name: myapp-data-myname-0
Namespace: default
StorageClass: gce-slow
Status: Pending
Volume:
Labels: app=myapp
Annotations: volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/gce-pd
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 5s persistentvolume-controller Failed to provision volume
with StorageClass "gce-slow": Failed to get GCE GCECloudProvider with error <nil>
I am treading in the dark and do not know what is missing. It seems logical that it doesn't work, since the provisioner never authenticates to GCE. Any hints and pointers are very much appreciated.
EDIT
I Tried the solution here, by editing the config file in kubeadm with kubeadm config upload from-file, however the error persists. The kubadm config looks like this right now:
api:
advertiseAddress: 10.156.0.2
bindPort: 6443
controlPlaneEndpoint: ""
auditPolicy:
logDir: /var/log/kubernetes/audit
logMaxAge: 2
path: ""
authorizationModes:
- Node
- RBAC
certificatesDir: /etc/kubernetes/pki
cloudProvider: gce
criSocket: /var/run/dockershim.sock
etcd:
caFile: ""
certFile: ""
dataDir: /var/lib/etcd
endpoints: null
image: ""
keyFile: ""
imageRepository: k8s.gcr.io
kubeProxy:
config:
bindAddress: 0.0.0.0
clientConnection:
acceptContentTypes: ""
burst: 10
contentType: application/vnd.kubernetes.protobuf
kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
qps: 5
clusterCIDR: 192.168.0.0/16
configSyncPeriod: 15m0s
conntrack:
max: null
maxPerCore: 32768
min: 131072
tcpCloseWaitTimeout: 1h0m0s
tcpEstablishedTimeout: 24h0m0s
enableProfiling: false
healthzBindAddress: 0.0.0.0:10256
hostnameOverride: ""
iptables:
masqueradeAll: false
masqueradeBit: 14
minSyncPeriod: 0s
syncPeriod: 30s
ipvs:
minSyncPeriod: 0s
scheduler: ""
syncPeriod: 30s
metricsBindAddress: 127.0.0.1:10249
mode: ""
nodePortAddresses: null
oomScoreAdj: -999
portRange: ""
resourceContainer: /kube-proxy
udpIdleTimeout: 250ms
kubeletConfiguration: {}
kubernetesVersion: v1.10.2
networking:
dnsDomain: cluster.local
podSubnet: 192.168.0.0/16
serviceSubnet: 10.96.0.0/12
nodeName: mynode
privilegedPods: false
token: ""
tokenGroups:
- system:bootstrappers:kubeadm:default-node-token
tokenTTL: 24h0m0s
tokenUsages:
- signing
- authentication
unifiedControlPlaneImage: ""
Edit
The issue was resolved in the comments thanks to Anton Kostenko. The last edit coupled with kubeadm upgrade solves the problem.
The answer took me a while but here it is:
Using the GCECloudProvider in Kubernetes outside of the Google Kubernetes Engine has the following prerequisites (the last point is Kubeadm specific):
The VM needs to be run with a service account that has the right to provision disks. Info on how to run a VM with a service account can be found here
The Kubelet needs to run with the argument --cloud-provider=gce. For this the KUBELET_KUBECONFIG_ARGS in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf have to be edited. The Kubelet can then be restarted with
sudo systemctl restart kubelet
The Kubernetes cloud-config file needs to be configured. The file can be found at /etc/kubernetes/cloud-config and the following content is enough to get the cloud provider to work:
[Global]
project-id = "<google-project-id>"
Kubeadm needs to have GCE configured as its cloud provider. The config posted in the question works fine for this. However, the nodeName has to be changed.
Create dynamic persistent volumes in Kubernetes nodes in the Google cloud virtual machine.
GCP role:
google cloud console go to IAM & Admin.
Add a new service account e.g gce-user.
Add role "compute instance admin".
Add the role to GCP VM:
stop the instance and click edit.
click service account and select new account e.g gce-user.
start the virtual machine.
Add GCE parameter in kubelet in all nodes.
add "--cloud-provider=gce"
sudo vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
add the value:
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cloud-provider=gce"
create new file /etc/kubernetes/cloud-config in all nodes
add this param.
[Global]
project-id = "xxxxxxxxxxxx"
restart kubelet
Add gce in controller-master
vi /etc/kubernetes/manifests
add this params under commands:
--cloud-provider=gce
then restart the control plane.
run the ps -ef |grep controller then you must see "gce" in controller output.
Note: Above method is not recommended on the production system, use kubeadm config to update the controller-manager settings.