How to install keycloak operator on IBM Cloud Kubernetes Service? - kubernetes

The operator is https://operatorhub.io/operator/keycloak-operator version 11.0.0.
The cluster is Kubernetes version 1.18.12.
I was able to follow the steps from OperatorHub.io to install the Operator Lifecycle Manager and the Keycloak "OperatorGroup" and "Subscription".
It took much longer than I was expecting (maybe 20 minutes?), but eventually the corresponding "ClusterServiceVersion" was created.
However, now when I try to use it by creating the following resource, it doesn't seem to be doing anything at all:
apiVersion: keycloak.org/v1alpha1
kind: Keycloak
metadata:
name: example-keycloak
namespace: keycloak
labels:
app: sso
spec:
instances: 1
externalAccess:
enabled: true
extensions:
- https://github.com/aerogear/keycloak-metrics-spi/releases/download/1.0.4/keycloak-metrics-spi-1.0.4.jar
It accepts the new resource, so I know the CRD is in place. The documentation states that it should create a stateful set, an ingress, and more, but it just doesn't seem to create anything.
I checked the cluster logs and this is the error that is jumping out to me:
olm-operator ERROR controllers.operator Could not update Operator status {"request": "/keycloak-operator.my-keycloak-operator", "error": "Operation cannot be fulfilled on operators.operators.coreos.com \"keycloak-operator.my-keycloak-operator\": the object has been modified; please apply your changes to the latest version and try again"}
I have quite a bit of experience with plain kubernetes, but I'm brand new to "operators" and so I'm really not sure where to look next wrt what might be going wrong.
Any hints/suggestions/explanations?
UPDATE: I was creating the keycloak resource in a namespace OTHER than the one I installed the operator into. Since it allowed me to create the custom resource (Kind: Keycloak) into this namespace, I thought this was supported. However, when I created the keycloak resource to the same namespace where the operator was installed (my-keycloak-operator), then it actually tried to do something. Its still failing to bring up the pod, mind you, but at least its trying to do something.
Will leave this question open for a bit to see if the "Could not update Operator status" is something I should be concerned about or not...

It looks like the operator or/and the components that it wants to bring up cannot do a write (POST/PUT) to the kube-apiserver.
From what you describe, it appears that the first time when you installed the operator on a different namespace it just didn't have permissions to bring up anything at all. The second time when you installed it on the right namespace it looks like the operator was able to talk to the kube-apiserver but the components that it's bring up (Keycloak, etc) are not able to.
I would check the logs on the kube-apiserver (control plane) to see if you have some unauthorized requests, also check the log files of the components (pods, deployments, etc) that the operator is trying to bring up.
If you have unauthorized requests you may have to manually update the RBAC rules. Finally, I would check with IBM cloud to see what specific permission its K8s control plane could have that is preventing applications to talk to it (the kube-apiserver).
✌️

Related

Create kubernetes events using kubernetes event api

I am pretty new in the kubernetes field, and have specific interest in kubernetes eventing. What I did was creating a K8 cluster, assign PODs to that cluster, and type kubectl get events command to see the corresponding events. Now for my work, I need to explore how can I create an K8 event resource using the eveting API provided here, so that using the kubectl get events command I can see the event stored in etcd. But as I mentioned I don't know K8 that deep, I'm struggling to make this api work.
N.B. I had a look into knative eventing, but seems to be the eventing feature Knative provides, is different than the K8 eveting, as I can't see the Knative events in the kubectl get events command. (please correct me if I am wrong).
Thanks in advance.
Usually events are created in Kubernetes as a way of informing of relevant status changes of an object.
Creating an event is as simple as:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Event
metadata:
name: myevent
namespace: default
type: Warning
message: 'not sure if this what you want to do'
involvedObject:
kind: someObject
EOF
But usually this is done programmatically, using the Kubernetes client library of your choice and the core (v1) API group.
Knative Eventing is platform that enables Event Driven architectures, which does not seems related to your question, but if it were, you can find all getting started docs here:
Installation: https://knative.dev/docs/install/yaml-install/eventing/install-eventing-with-yaml/
Getting started: https://knative.dev/docs/eventing/getting-started/
Maybe this could be helpfull to. Knative have this concepts of "Event Sources" that already exists and serves as links for events for certain producers and sinks.
In the case of the K8's events API there is one called the APIServer Source: https://knative.dev/docs/eventing/sources/apiserversource/
A bit more about Event Sources here: https://knative.dev/docs/eventing/sources/

How to set MaxRevisionTimeoutSeconds in Knative?

I have deployed a service using Cloud run on gke which uses Knative as an abstraction over k8s. The default MaxRevisionTimeoutSeconds is set to 600s in the knative default config but according to this PR this is customizable.
I couldn't find anything in the official Knative documentation, can anybody help me out here?
UPDATE:
After digging a bit more in knative source code and documentation. It looks like that the MaxRevisionTimeoutSeconds is defined in resource=ConfigMap/config-defaults. So have to update it with custom value.
From this it looks like we can use something called as operator to modify the ConfigMap resource but it did not work probably because gcp's does not use operator to install Knative components. Anyways I went on to install the operator and then used resource=knativeserving to overwrite the config-defaults. But this also did not work when I tried re-deploying service.
The next solution is to directly edit the config-defaults using kubectl edit. I even tried doing this but encountered weird behavior. After editing the YAML file when I used kubectl describe to check the changed value, it sometimes shows the modified value, sometimes shows the old value, and sometimes doesn't even show that particular key-value pair in the YAML. Also, it doesn't work when trying to re-deploy the service after doing this edit.
If anyone can help me with this, it would be really great.
MaxRevisionTimeoutSeconds is a cluster-global setting which enforces the max value for TimeoutSeconds on each Revision. This value exists so that cluster administrators can set upper bounds on the amount of time a single HTTP request can be in the system. Knowing an upper bound can be useful when configuring graceful shutdown settings on the HTTP routing components to prevent dropped requests during upgrades.
It's possible that Cloud Run on GKE has overridden these configurations so that they can upgrade the underlying Istio and Knative components on a predictable schedule. (If you have a 10% upgrade budget and it takes 10m to drain a component, your minimum upgrade time is probably around 110m, taking into account additional scheduling / image fetch / startup time.)

One Traefik Pod in Kubernetes fails with error: 'command traefik error: field not found, node: redirect'

I'm running Traefik on a Kubernetes cluster to manage Ingress, which has been running ok for a long time.
I recently implemented Cluster-Autoscaling, which works fine except that on one Node (newly created by the Autoscaler) Traefik won't start. It sits in CrashLoopBackoff, and when I log the Pod I get: [date] [time] command traefik error: field not found, node: redirect.
Google found no relevant results, and the error itself is not very descriptive, so I'm not sure where to look.
My best guess is that it has something to do with the RedirectRegex Middleware configured in Traefik's config file:
[entryPoints.http.redirect]
regex = "^http://(.+)(:80)?/(.*)"
replacement = "https://$1/$3"
Traefik actually works still - I can still access all of my apps from their urls in my browser, even those which are on the node with the dead Traefik Pod.
The other Traefik Pods on other Nodes still run happily, and the Nodes are (at least in theory) identical.
After further googling, I found this on Reddit. Turns out Traefik updated a few days ago to v2.0, which is not backwards compatible.
Only this pod had the issue, because it was the only one for which a new (v2.0) image was pulled (being the only recently created Node).
I reverted to v1.7 until I have time to fix it properly. Had update the Daemonset to use v1.7, then kill the Pod so it could be recreated from the old image.
The devs have a Migration Guide that looks like it may help.
"redirect" is gone but now there is "RedirectScheme" and "RedirectRegex" as a new concept of "Middlewares".
It looks like they are moving to a pipeline approach, so you can define a chain of "middlewares" to apply to an "entrypoint" to decide how to direct it and what to add/remove/modify on packets in that chain. "backends" are now "providers", and they have a clearer, modular concept of configuration. It looks like it will offer better organization than earlier versions.

What is the difference between the core os projects kube-prometheus and prometheus operator?

The github repo of Prometheus Operator https://github.com/coreos/prometheus-operator/ project says that
The Prometheus Operator makes the Prometheus configuration Kubernetes native and manages and operates Prometheus and Alertmanager clusters. It is a piece of the puzzle regarding full end-to-end monitoring.
kube-prometheus combines the Prometheus Operator with a collection of manifests to help getting started with monitoring Kubernetes itself and applications running on top of it.
Can someone elaborate this?
I've always had this exact same question/repeatedly bumped into both, but tbh reading the above answer didn't clarify it for me/I needed a short explanation. I found this github issue that just made it crystal clear to me.
https://github.com/coreos/prometheus-operator/issues/2619
Quoting nicgirault of GitHub:
At last I realized that prometheus-operator chart was packaging
kube-prometheus stack but it took me around 10 hours playing around to
realize this.
**Here's my summarized explanation:
"kube-prometheus" and "Prometheus Operator Helm Chart" both do the same thing:
Basically the Ingress/Ingress Controller Concept, applied to Metrics/Prometheus Operator.
Both are a means of easily configuring, installing, and managing a huge distributed application (Kubernetes Prometheus Stack) on Kubernetes:**
What is the Entire Kube Prometheus Stack you ask? Prometheus, Grafana, AlertManager, CRDs (Custom Resource Definitions), Prometheus Operator(software bot app), IaC Alert Rules, IaC Grafana Dashboards, IaC ServiceMonitor CRDs (which auto-generate Prometheus Metric Collection Configuration and auto hot import it into Prometheus Server)
(Also when I say easily configuring I mean 1,000-10,000++ lines of easy for humans to understand config that generates and auto manage 10,000-100,000 lines of machine config + stuff with sensible defaults + monitoring configuration self-service, distributed configuration sharding with an operator/controller to combine config + generate verbose boilerplate machine-readable config from nice human-readable config.
If they achieve the same end goal, you might ask what's the difference between them?
https://github.com/coreos/kube-prometheus
https://github.com/helm/charts/tree/master/stable/prometheus-operator
Basically, CoreOS's kube-prometheus deploys the Prometheus Stack using Ksonnet.
Prometheus Operator Helm Chart wraps kube-prometheus / achieves the same end result but with Helm.
So which one to use?
Doesn't matter + they achieve the same end result + shouldn't be crazy difficult to start with 1 and switch to the other.
Helm tends to be faster to learn/develop basic mastery of.
Ksonnet is harder to learn/develop basic mastery of, but:
it's more idempotent (better for CICD automation) (but it's only a difference of 99% idempotent vs 99.99% idempotent.)
has built-in templating which means that if you have multiple clusters you need to manage / that you want to always keep consistent with each other. Then you can leverage ksonnet's templating to manage multiple instances of the Kube Prometheus Stack (for multiple envs) using a DRY code base with lots of code reuse. (If you only have a few envs and Prometheus doesn't need to change often it's not completely unreasonable to keep 4 helm values files in sync by hand. I've also seen Jinja2 templating used to template out helm values files, but if you're going to bother with that you may as well just consider ksonnet.)
Kubernetes operator are kubernetes specific application(pods) that configure, manage and optimize other Kubernetes deployments automatically. They are implemented as a custom controller.
According to official coreOS website:
Operators were introduced by CoreOS as a class of software that operates other software, putting operational knowledge collected by humans into software.
The prometheus operator provides the easy way to deploy configure and monitor your prometheus instances on kubernetes cluster. To do so, prometheus operator introduces three types of custom resource definition(CRD) in kubernetes.
Prometheus
Alertmanager
ServiceMonitor
Now, with the help of above CRD's, you can directly create a prometheus instance by providing kind: Prometheus and the prometheus instance is ready to serve, likewise you can do for AlertManager. Without this you would have to setup the deployment for prometheus with its image, configuration and many more things.
The Prometheus Operator serves to make running Prometheus on top of Kubernetes as easy as possible, while preserving Kubernetes-native configuration options.
Now, kube-prometheus implemented the prometheus operator and provides you minimum yaml files to create your basic setup of prometheus, alertmanager and grafana by running a single command.
git clone https://github.com/coreos/prometheus-operator.git
kubectl apply -f prometheus-operator/contrib/kube-prometheus/manifests/
By running above command in kube-prometheus directory, you will get a monitoring namespace which will have an instance of alertmanager, prometheus and grafana for UI. This is enough setup for most of the basic implementation and if you need any more specifics according to your application, you can add more yamls of exporter you need.
Kube-prometheus is more of a contribution to prometheus-operator project, which implements the prometheus operator functionality very well and provide you a complete monitoring setup for your kubernetes cluster. You can start with kube-prometheus and extend the functionality of your monitoring setup according to your application from there.
You can learn more about prometheus-operator here
As of today, 28-09-2020, this is the way to install Prometheus in a Kubernetes cluster
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#kube-prometheus-stack
According to official documentation, kube-prometheus-stack is a rename of prometheus-operator.
As I understood, kube-prometheus-stack also has preinstalled grafana dashboards and prometheus rules.
Note: This chart was formerly named prometheus-operator chart, now
renamed to more clearly reflect that it installs the kube-prometheus
project stack, within which Prometheus Operator is only one component.
Taken from https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
Architecturally the container runs docker
The default container logs are managed by Docker, and the default log driver uses JSON-file
log-driver": "json-file
https://docs.docker.com/config/containers/logging/configure/
If the default jSON-file is used to manage container logs, log rotation is not performed by default. Therefore, the default JSON-file log driver the log files stored by the log driver can result in a large amount of disk space for containers that generate a large amount of output, which can cause disk space to run out.
In this case, save the log to ES, store it separately, and periodically delete the index using curator kubernetes
And run a scheduled task in K8S to delete the index periodically
Another solution for disk space is to periodically delete old logs from jSON-files
Typically we set the size and number of logs
This will set up a maximum of 10 log files, each with a maximum size of 20 Mb. Therefore, the container has a maximum of 200 Mb of logs
"log-driver": "json-file", "log-opts": { "max-size": "20m", "max-file": "10" },
Note: In general, the default Docker log is placed
/var/lib/docker/containers/
But in the same case kubernetes also saves logs and creates a directory structure to help you find pods-based logs, so you can find container logs for each Pod running on a node
/var/log/pods/<namespace>_<pod_name>_<pod_id>/<container_name>/
When removing pod, / var/lib/container under the docker/containers/log and k8s created under/var/log/pods/pod log will be deleted
For example, if the POD is restarted during production, the pod log will be deleted whether it is on the original node or jumped to another node
Therefore, this log needs to be saved in ES for centralized management. Many R&D projects will check the log for troubleshooting in most cases

Kubernetes: security context and IPC_LOCK capability

I'm trying to install a helm package that needs IPC_LOCK capability. So, I'm getting this message:
Error creating: pods "pod..." is forbidden: unable to validate against any security context constraint: [capabilities.add: Invalid value: "IPC_LOCK": capability may not be added capabilities.add: Invalid value: "IPC_LOCK": capability may not be added]
You can see the DeploymentConfig here.
I'm installing Vault using a Helm chart, so I'm not able to change DeploymentConfig.
I guess the only way to get it would be using a service account with an scc associated allowing it to perform the container.
How could I solve that?
I haven't worked on vault yet, so my answers might not be accurate.
But I think you can remove that capability and disable m_lock in vault config.
https://www.vaultproject.io/docs/configuration/index.html#disable_mlock
Having said that, I don't think kubernetes supports memory swapping anyway (someone needs to verify this) therefore a syscall to mlock might not be needed.