Kubernetes: security context and IPC_LOCK capability - kubernetes

I'm trying to install a helm package that needs IPC_LOCK capability. So, I'm getting this message:
Error creating: pods "pod..." is forbidden: unable to validate against any security context constraint: [capabilities.add: Invalid value: "IPC_LOCK": capability may not be added capabilities.add: Invalid value: "IPC_LOCK": capability may not be added]
You can see the DeploymentConfig here.
I'm installing Vault using a Helm chart, so I'm not able to change DeploymentConfig.
I guess the only way to get it would be using a service account with an scc associated allowing it to perform the container.
How could I solve that?

I haven't worked on vault yet, so my answers might not be accurate.
But I think you can remove that capability and disable m_lock in vault config.
https://www.vaultproject.io/docs/configuration/index.html#disable_mlock
Having said that, I don't think kubernetes supports memory swapping anyway (someone needs to verify this) therefore a syscall to mlock might not be needed.

Related

How to install keycloak operator on IBM Cloud Kubernetes Service?

The operator is https://operatorhub.io/operator/keycloak-operator version 11.0.0.
The cluster is Kubernetes version 1.18.12.
I was able to follow the steps from OperatorHub.io to install the Operator Lifecycle Manager and the Keycloak "OperatorGroup" and "Subscription".
It took much longer than I was expecting (maybe 20 minutes?), but eventually the corresponding "ClusterServiceVersion" was created.
However, now when I try to use it by creating the following resource, it doesn't seem to be doing anything at all:
apiVersion: keycloak.org/v1alpha1
kind: Keycloak
metadata:
name: example-keycloak
namespace: keycloak
labels:
app: sso
spec:
instances: 1
externalAccess:
enabled: true
extensions:
- https://github.com/aerogear/keycloak-metrics-spi/releases/download/1.0.4/keycloak-metrics-spi-1.0.4.jar
It accepts the new resource, so I know the CRD is in place. The documentation states that it should create a stateful set, an ingress, and more, but it just doesn't seem to create anything.
I checked the cluster logs and this is the error that is jumping out to me:
olm-operator ERROR controllers.operator Could not update Operator status {"request": "/keycloak-operator.my-keycloak-operator", "error": "Operation cannot be fulfilled on operators.operators.coreos.com \"keycloak-operator.my-keycloak-operator\": the object has been modified; please apply your changes to the latest version and try again"}
I have quite a bit of experience with plain kubernetes, but I'm brand new to "operators" and so I'm really not sure where to look next wrt what might be going wrong.
Any hints/suggestions/explanations?
UPDATE: I was creating the keycloak resource in a namespace OTHER than the one I installed the operator into. Since it allowed me to create the custom resource (Kind: Keycloak) into this namespace, I thought this was supported. However, when I created the keycloak resource to the same namespace where the operator was installed (my-keycloak-operator), then it actually tried to do something. Its still failing to bring up the pod, mind you, but at least its trying to do something.
Will leave this question open for a bit to see if the "Could not update Operator status" is something I should be concerned about or not...
It looks like the operator or/and the components that it wants to bring up cannot do a write (POST/PUT) to the kube-apiserver.
From what you describe, it appears that the first time when you installed the operator on a different namespace it just didn't have permissions to bring up anything at all. The second time when you installed it on the right namespace it looks like the operator was able to talk to the kube-apiserver but the components that it's bring up (Keycloak, etc) are not able to.
I would check the logs on the kube-apiserver (control plane) to see if you have some unauthorized requests, also check the log files of the components (pods, deployments, etc) that the operator is trying to bring up.
If you have unauthorized requests you may have to manually update the RBAC rules. Finally, I would check with IBM cloud to see what specific permission its K8s control plane could have that is preventing applications to talk to it (the kube-apiserver).
✌️

How to set MaxRevisionTimeoutSeconds in Knative?

I have deployed a service using Cloud run on gke which uses Knative as an abstraction over k8s. The default MaxRevisionTimeoutSeconds is set to 600s in the knative default config but according to this PR this is customizable.
I couldn't find anything in the official Knative documentation, can anybody help me out here?
UPDATE:
After digging a bit more in knative source code and documentation. It looks like that the MaxRevisionTimeoutSeconds is defined in resource=ConfigMap/config-defaults. So have to update it with custom value.
From this it looks like we can use something called as operator to modify the ConfigMap resource but it did not work probably because gcp's does not use operator to install Knative components. Anyways I went on to install the operator and then used resource=knativeserving to overwrite the config-defaults. But this also did not work when I tried re-deploying service.
The next solution is to directly edit the config-defaults using kubectl edit. I even tried doing this but encountered weird behavior. After editing the YAML file when I used kubectl describe to check the changed value, it sometimes shows the modified value, sometimes shows the old value, and sometimes doesn't even show that particular key-value pair in the YAML. Also, it doesn't work when trying to re-deploy the service after doing this edit.
If anyone can help me with this, it would be really great.
MaxRevisionTimeoutSeconds is a cluster-global setting which enforces the max value for TimeoutSeconds on each Revision. This value exists so that cluster administrators can set upper bounds on the amount of time a single HTTP request can be in the system. Knowing an upper bound can be useful when configuring graceful shutdown settings on the HTTP routing components to prevent dropped requests during upgrades.
It's possible that Cloud Run on GKE has overridden these configurations so that they can upgrade the underlying Istio and Knative components on a predictable schedule. (If you have a 10% upgrade budget and it takes 10m to drain a component, your minimum upgrade time is probably around 110m, taking into account additional scheduling / image fetch / startup time.)

Kubernetes Operator CSV stuck pending

I'm trying to install a Kubernetes operator into an OpenShift cluster using OLM 0.12.0. I ran oc create -f my-csv.yaml to install it. It is created successfully, but I do not get any results.
In the olm operator logs I find this message:
level=info msg="couldn't ensure RBAC in target namespaces" csv=my-operator.v0.0.5 error="no owned roles found" id=d1h5n namespace=playground phase=Pending
I also note that there is no InstallPlan created to make the accounts that I thought it was making.
What's wrong?
This message probably means that the RBAC assigned to your service account does not match the requirements specified by CSV (cluster service version).
In other words, while creating an operator you define CSV which defines the requirements for creating your custom resource. Then, when operator creates the resource it checks if the used service account fulfills these requirements.
You can check the Hazelcast Operator we created. It has some requirements regarding RBAC. So, before installing it, you need to apply the following RBAC file.

Google Cloud Kubernetes Persistent Volume Claim error in deployment Yaml

I have a persistent volume claim file, which previously was being read by buildkite in the deployment stage. Only recently it has been erroring in the build process with this error:
error: error validating "kube/common/01-redis-volume-claim.yml": error validating data: field
spec.dataSource for v1.PersistentVolumeClaimSpec is required; if you choose to ignore these
errors, turn validation off with --validate=false
I've seen this issue crop up twice recently, and the immediate fix is to add the missing field (spec.dataSource) and setting it to null.
My question is, if it was absent in the first instance, then will setting it to null be any different than what it was previously?
Based on documentation
spec.dataSource should have:
name: existing-src-pvc-name
kind: PersistentVolumeClaim
In my opinion everything you should do is add name and kind to your yaml file and there should not be any error anymore.
My question is, if it was absent in the first instance, then will setting it to null be any different than what it was previously?
Answering to this question, as far as I am concerned it is happening because you are not creating a new pvc, but you are likely to cloning it.
Volume clone feature was added to support CSI Volume Plugins only. For details, see volume cloning.
The CSI Volume Cloning feature adds support for specifying existing PVCs in the dataSource field to indicate a user would like to clone a Volume.

kubernetes petset on google cloud

I am running a kubernetes cluster on google cloud(version 1.3.5) .
I found a redis.yaml
that uses petset to create a redis cluster but when i run kubectl create -f redis.yaml i get the following error :
error validating "redis.yaml": error validating data: the server could not find the requested resource (get .apps); if you choose to ignore these errors, turn validation off with --validate=false
i cant find why i get this error or how to solve this.
PetSet is currently an alpha feature (which you can tell because the apiVersion in the linked yaml file is apps/v1alpha1). It may not be obvious, but alpha features are not supported in Google Container Engine.
As described in api_changes.md, alpha level API objects are disabled by default, have no guarantees that they will exist in future versions, can break compatibility with older versions at any time, and may destabilize the cluster.
I'm using PetSet with some success, for example https://github.com/Yolean/kubernetes-mysql-cluster, in zone europe-west1-d but when I tried europe-west1-c I got the aforementioned error.
Google just enabled Alpha Clusters for GKE as announced here: https://cloud.google.com/container-engine/docs/alpha-clusters
Now you are able (but not SLA covered) to use all alpha features within an alpha cluster, what was disable previously.