Cannot enable Pod Security Admission controller on Minikube

I'm trying to enable the new Pod Security Admission (PSA) controller with Minikube, but no luck (nor with Kind).
Here is the command I'm using to start minikube: `minikube start --kubernetes-version=v1.25.3 --feature-gates=PodSecurity=true --extra-config=apiserver.enable-admission-plugins=PodSecurity`
This isn't documented very well, but I found that there is both a feature gate for PSA and an admission controller plugin. Even enabling both seems to have no effect.
To make sure I'm not missing something, here is how I'm trying to test it:
Namespace configuration:
```
apiVersion: v1
kind: Namespace
metadata:
  labels:
    pod-security.kubernetes.io/enforce: restricted
  name: psa
```
Super insecure Deployment:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-unsecure
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      securityContext:
        runAsUser: 0
        runAsGroup: 0
        fsGroup: 0
      volumes:
      - name: etcvol
        hostPath:
          path: "/etc"
          type: Directory
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        securityContext:
          allowPrivilegeEscalation: true
          privileged: true
          capabilities:
            add: ["NET_ADMIN", "SYS_TIME"]
```
When I try to create this deployment in the `psa` namespace it goes through without a hitch.

OK, I realized this was my misunderstanding of how PSA works. The PSA controller doesn't check higher-level resources like Deployments. The Deployment gets created just fine, but it then can't create Pods, because those would violate the policies.
No configuration is needed for minikube (or Kind) at all (no feature gate or admission plugin configuration) if you are running Kubernetes 1.25.
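For an existing namespace, the same enforcement can also be applied imperatively (a minimal sketch, assuming the `psa` namespace from the question already exists):

```
kubectl label --overwrite namespace psa pod-security.kubernetes.io/enforce=restricted
```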
kubectl run --image=nginx nginx
Error from server (Forbidden): pods "nginx" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "nginx" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "nginx" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "nginx" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "nginx" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
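For reference, a pod spec that the `restricted` profile accepts has to address each item in that error message. A minimal sketch (the field values come straight from the message above; the non-root UID is an arbitrary example, and whether the image is happy running as that UID is a separate question):

```
apiVersion: v1
kind: Pod
metadata:
  name: nginx-restricted
  namespace: psa
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 101                 # arbitrary non-root UID (assumption)
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: nginx
    image: nginx:1.14.2
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
```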

Related

Unable to fetch metrics from custom metrics API: the server is currently unable to handle the request

I'm using an HPA based on a custom metric on GKE.
The HPA is not working and it's showing me this error log:
unable to fetch metrics from custom metrics API: the server is currently unable to handle the request
When I run `kubectl get apiservices | grep custom` I get:
v1beta1.custom.metrics.k8s.io   services/prometheus-adapter   False (FailedDiscoveryCheck)   135d
This is the HPA spec config:
spec:
  scaleTargetRef:
    kind: Deployment
    name: api-name
    apiVersion: apps/v1
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Object
    object:
      target:
        kind: Service
        name: api-name
        apiVersion: v1
      metricName: messages_ready_per_consumer
      targetValue: '1'
and this is the service's spec config:
spec:
  ports:
  - name: worker-metrics
    protocol: TCP
    port: 8080
    targetPort: worker-metrics
  selector:
    app.kubernetes.io/instance: api
    app.kubernetes.io/name: api-name
  clusterIP: 10.8.7.9
  clusterIPs:
  - 10.8.7.9
  type: ClusterIP
  sessionAffinity: None
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
What should I do to make it work?
First of all, confirm that the metrics-server Pod is running in your kube-system namespace. You can also use the following manifest:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.1
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        imagePullPolicy: Always
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
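A quick way to confirm the metrics pipeline is healthy before digging further (a sketch; the names match the manifest above):

```
# Is the metrics-server pod running?
kubectl -n kube-system get pods -l k8s-app=metrics-server

# Does the resource metrics API answer?
kubectl top nodes

# Is the custom metrics APIService available? (it shows False/FailedDiscoveryCheck in the question)
kubectl get apiservice v1beta1.custom.metrics.k8s.io
```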
If so, take a look at its logs and check for any stackdriver adapter lines. This issue is commonly caused by a problem with the custom-metrics-stackdriver-adapter, which usually crashes in the metrics-server namespace. To solve that, use the resource from this URL, and for the deployment use this image:
gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.1
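To inspect the adapter itself, something along these lines can help (a sketch; the `custom-metrics` namespace and deployment name are the usual defaults for the Stackdriver adapter and may differ in your cluster):

```
kubectl -n custom-metrics get pods
kubectl -n custom-metrics logs deploy/custom-metrics-stackdriver-adapter
kubectl describe apiservice v1beta1.custom.metrics.k8s.io
```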
Another common root cause of this is an OOM issue. In this case, adding more memory solves the problem. To assign more memory, you can specify the new memory amount in the configuration file, as the following example shows:
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
  namespace: mem-example
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "1"]
In the above example, the container has a memory request of 100 MiB and a memory limit of 200 MiB. In the manifest, the "--vm-bytes", "150M" argument tells the container to attempt to allocate 150 MiB of memory. You can visit the Kubernetes official documentation for more details on memory settings.
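To confirm whether a container was actually OOM-killed, the last termination reason can be checked (a sketch, using the memory-demo pod above as the example):

```
kubectl -n mem-example get pod memory-demo \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# prints "OOMKilled" if the previous container instance hit its memory limit
```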
You can use the following threads for more reference: GKE - HPA using custom metrics - unable to fetch metrics, Stackdriver-metadata-agent-cluster-level gets OOMKilled, and Custom-metrics-stackdriver-adapter pod keeps crashing.
What do you get for kubectl get pod -l "app.kubernetes.io/instance=api,app.kubernetes.io/name=api-name"?
There should be a pod to which the service refers.
If there is a pod, check its logs with kubectl logs <pod-name>. You can add -f to the kubectl logs command to follow the logs.
Adding this block to my EKS node security group rules solved the issue for me:
node_security_group_additional_rules = {
  ...
  ingress_cluster_metricserver = {
    description                   = "Cluster to node 4443 (Metrics Server)"
    protocol                      = "tcp"
    from_port                     = 4443
    to_port                       = 4443
    type                          = "ingress"
    source_cluster_security_group = true
  }
  ...
}

StatefulSet: longer rolling update leads to version mismatch

The application is deployed on K8s using a StatefulSet because it is stateful in nature. Around 250+ pods are running, and HPA is implemented as well and can scale up to 400 pods.
When a new deployment occurs, it takes a long time (~10-15 min) to update all pods in a rolling-update fashion.
Problem: end users get responses from 2 versions of pods until all pods are replaced with the new revision.
I have been searching for an architecture that reduces the overall deployment time. The best solutions I found suggest a BLUE/GREEN strategy, but that has a bunch of impact on integrated services like monitoring, logging, and telemetry because of the 2 naming conventions.
Ideally I am looking for something like maxSurge for Deployments, where new pods are created first and traffic is then shifted to them. But a StatefulSet doesn't support maxSurge with the RollingUpdate strategy; the controller deletes and recreates each Pod in the StatefulSet in order of ordinal index, from largest to smallest.
The solution is to do a partitioned rolling update along with a canary deployment.
Let's suppose we have the StatefulSet workload defined by the following yaml file:
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
    version: "1.20"
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
    version: "1.20"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # Label selector that determines which Pods belong to the StatefulSet
                 # Must match spec: template: metadata: labels
  serviceName: "nginx"
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx # Pod template's label selector
        version: "1.20"
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: nginx:1.20
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
You could patch the StatefulSet to create a partition, and then change the image and the version label for the remaining pods (in this case, since there are only 3 pods, the last one will be the one whose image changes):
$ kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'
$ kubectl patch statefulset web --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"nginx:1.21"}]'
$ kubectl patch statefulset web --type='json' -p='[{"op": "replace", "path": "/spec/template/metadata/labels/version", "value":"1.21"}]'
At this point, you have a pod with the new image and version label ready to use, but since the version label is different, the traffic is still going to the other two pods. If you change the version in the yaml file and apply the new configuration, the rollout will be transparent, since there is already a pod ready to receive the traffic:
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
    version: "1.21"
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
    version: "1.21"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # Label selector that determines which Pods belong to the StatefulSet
                 # Must match spec: template: metadata: labels
  serviceName: "nginx"
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx # Pod template's label selector
        version: "1.21"
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
$ kubectl apply -f file-name.yaml
Once traffic is migrated to the pod containing the new image and version label, you should patch the StatefulSet again and remove the partition with the command kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'
Note: You will need to be very careful with the size of the partition, since the remaining pods will handle all of the traffic for some time.
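To watch how the partitioned rollout progresses, the image and version of each pod can be checked (a sketch; the labels and StatefulSet name match the manifests above):

```
# which image and version label is each pod running right now?
kubectl get pods -l app=nginx \
  -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image,VERSION:.metadata.labels.version

# follow the controller as it rolls the pods above the partition boundary
kubectl rollout status statefulset/web
```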

Get the Kubernetes uid of the Deployment that created the pod, from within the pod

I want to be able to know the Kubernetes uid of the Deployment that created the pod, from within the pod.
The reason for this is so that the Pod can spawn another Deployment and set the OwnerReference of that Deployment to the original Deployment (so it gets Garbage Collected when the original Deployment is deleted).
Taking inspiration from here, I've tried*:
Using field refs as env vars:
containers:
- name: test-operator
  env:
  - name: DEPLOYMENT_UID
    valueFrom:
      fieldRef: {fieldPath: metadata.uid}
Using downwardAPI and exposing through files on a volume:
containers:
- volumeMounts:
  - mountPath: /etc/deployment-info
    name: deployment-info
volumes:
- name: deployment-info
  downwardAPI:
    items:
    - path: "uid"
      fieldRef: {fieldPath: metadata.uid}
*Both of these are under spec.template.spec of a resource of kind: Deployment.
However, for both of these, the uid is that of the Pod, not the Deployment. Is what I'm trying to do possible?
The behavior is correct; the Downward API works on the Pod, not the Deployment/ReplicaSet.
So I guess the solution is to set the name of the Deployment manually in spec.template.metadata.labels, and then use the Downward API to inject the labels as environment variables.
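A minimal sketch of that idea (the `deployment-name` label key, the `my-deployment` name, and the busybox container are arbitrary examples; the pod can then look the Deployment up by name through the API to get its UID):

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
        deployment-name: my-deployment   # set manually to match metadata.name
    spec:
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "echo created by $(DEPLOYMENT_NAME); sleep 3600"]
        env:
        - name: DEPLOYMENT_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['deployment-name']
```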
I think it's impossible to get the UID of the Deployment itself; you can set any runAsUser range while creating the deployment.
Try this command to get the UIDs of the existing pods:
kubectl get pod -o jsonpath='{range .items[*]}{.metadata.name}{" runAsUser: "}{.spec.containers[*].securityContext.runAsUser}{" fsGroup: "}{.spec.securityContext.fsGroup}{" seLinuxOptions: "}{.spec.securityContext.seLinuxOptions.level}{"\n"}{end}'
It's not exactly what you wanted to get, but it can be a hint for you.
To set the UID while creating the Deployment, see the example below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: toolbox2
  labels:
    app: toolbox2
spec:
  replicas: 3
  selector:
    matchLabels:
      app: toolbox2
  template:
    metadata:
      labels:
        app: toolbox2
    spec:
      securityContext:
        supplementalGroups: [1000620001]
        seLinuxOptions:
          level: s0:c25,c10
      containers:
      - name: net-toolbox
        image: quay.io/wcaban/net-toolbox
        ports:
        - containerPort: 2000
        securityContext:
          runAsUser: 1000620001

Load is not balanced with Kubernetes Services

I created two replicas of nginx with the following yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.20-alpine
        ports:
        - containerPort: 80
And I created the service with:
apiVersion: v1
kind: Service
metadata:
  name: nginx-test-service
spec:
  selector:
    app: nginx
  ports:
  - port: 8082
    targetPort: 80
Everything looks good. But when I do
minikube service nginx-test-service
I am able to access nginx. But when I look at the logs of the two pods, the requests always go to a single pod; the other pod is not getting any requests.
But a Kubernetes Service should do the load balancing, right?
Am I missing anything?
One way to get load balancing running on-premises is with IP Virtual Server (ipvs). It's a service which hands out the IP of the next pod to schedule/call.
It's likely installed already:
lsmod | grep ip_vs
ip_vs_sh 16384 0
ip_vs_wrr 16384 0
ip_vs_rr 16384 19
Have your CNI properly set up, then run
kubectl edit cm -n kube-system kube-proxy
and edit the ipvs section.
Set mode to ipvs:
mode: "ipvs"
and adjust the ipvs section:
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: "rr"
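After saving the ConfigMap, kube-proxy has to be restarted to pick up the new mode; a sketch of the follow-up steps (assuming kube-proxy runs as the usual DaemonSet in kube-system and ipvsadm is installed on the node):

```
# restart kube-proxy so it re-reads the ConfigMap
kubectl -n kube-system rollout restart daemonset kube-proxy

# on a node: list the IPVS virtual services and their round-robin backends
sudo ipvsadm -Ln
```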
As always with k8s, there are lots of variables interacting with each other, but it is possible with ipvs.
https://kubernetes.io/blog/2018/07/09/ipvs-based-in-cluster-load-balancing-deep-dive/

Non uniform distribution of requests between internal services in minikube

I have 2 different microservices deployed as backends in minikube, call them deployment A and deployment B. Both of these deployments have a different number of pod replicas running.
Deployment B is exposed as service B. Pods of deployment A call pods of deployment B via service B, which is of ClusterIP type.
The pods of deployment B run a scrapyd application with scraping spiders deployed in them. Each celery worker (a pod of deployment A) takes a task from a redis queue and calls the scrapyd service to schedule spiders on it.
Everything works fine, but after I scale the application (deployments A and B separately), I observe, using kubectl top pods, that the resource consumption is not uniform.
Some of the pods of deployment B are not used at all. The pattern I observe is that the pods of deployment B that come up after all the pods of deployment A are already up are the ones that are never utilized.
Is this normal behavior? I suspect the connection between pods of deployment A and B is persistent, and I am confused as to why request handling by the pods of deployment B is not uniformly distributed. Sorry for the naive question. I am new to this field.
The manifest for deployment A is:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-worker
  labels:
    deployment: celery-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      pod: celery-worker
  template:
    metadata:
      labels:
        pod: celery-worker
    spec:
      containers:
      - name: celery-worker
        image: celery:latest
        imagePullPolicy: Never
        command: ['celery', '-A', 'mysite', 'worker', '-E', '-l', 'info',]
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
      terminationGracePeriodSeconds: 200
and that of deployment B is:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scrapyd
  labels:
    app: scrapyd
spec:
  replicas: 1
  selector:
    matchLabels:
      pod: scrapyd
  template:
    metadata:
      labels:
        pod: scrapyd
    spec:
      containers:
      - name: scrapyd
        image: scrapyd:latest
        imagePullPolicy: Never
        ports:
        - containerPort: 6800
        resources:
          limits:
            cpu: 800m
          requests:
            cpu: 800m
      terminationGracePeriodSeconds: 100
---
kind: Service
apiVersion: v1
metadata:
  name: scrapyd
spec:
  selector:
    pod: scrapyd
  ports:
  - protocol: TCP
    port: 6800
    targetPort: 6800
Output of kubectl top pods:
So the solution to the above problem that I figured out is as follows:
In the current setup, install linkerd using this link: Linkerd Installation in Kubernetes.
After that, inject the linkerd proxy into the celery deployment as follows:
cat celery/deployment.yaml | linkerd inject - | kubectl apply -f -
This ensures that requests from celery are first passed to this proxy and then load balanced directly to scrapyd at the L7 layer. In this case, kube-proxy is bypassed and the default load balancing over L4 is no longer in effect.
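To confirm the proxy was injected and that requests are now spread across the scrapyd pods, something like this can be used (a sketch; the labels match the manifests above):

```
# the celery pods should now show a linkerd-proxy sidecar next to the worker container
kubectl get pods -l pod=celery-worker \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].name}{"\n"}{end}'

# resource consumption should now be roughly even across the scrapyd pods
kubectl top pods -l pod=scrapyd
```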