One Traefik Pod in Kubernetes fails with error: 'command traefik error: field not found, node: redirect'

I'm running Traefik on a Kubernetes cluster to manage Ingress, and it has been running fine for a long time.
I recently implemented Cluster Autoscaling, which works fine, except that on one Node (newly created by the Autoscaler) Traefik won't start. The Pod sits in CrashLoopBackOff, and when I check its logs I get: [date] [time] command traefik error: field not found, node: redirect.
Googling turned up no relevant results, and the error itself is not very descriptive, so I'm not sure where to look.
My best guess is that it has something to do with the regex redirect configured on the http entry point in Traefik's config file:
[entryPoints.http.redirect]
regex = "^http://(.+)(:80)?/(.*)"
replacement = "https://$1/$3"
Traefik actually still works: I can access all of my apps from their URLs in my browser, even those on the node with the dead Traefik Pod.
The other Traefik Pods on other Nodes still run happily, and the Nodes are (at least in theory) identical.
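To make the v1.7 entry-point redirect above concrete, here is a minimal Python sketch that applies the same regex and replacement with Python's re module (Python spells the $1/$3 back-references as \1/\3). The function name redirect is hypothetical; this only models the rewrite rule, not Traefik itself.

```python
import re
from typing import Optional

# Same pattern and replacement as the [entryPoints.http.redirect] block;
# Python writes the back-references $1/$3 as \1/\3.
PATTERN = re.compile(r"^http://(.+)(:80)?/(.*)")
REPLACEMENT = r"https://\1/\3"


def redirect(url: str) -> Optional[str]:
    """Return the HTTPS target for a matching URL, or None if the rule does not apply."""
    match = PATTERN.match(url)
    if match is None:
        return None
    return match.expand(REPLACEMENT)


if __name__ == "__main__":
    print(redirect("http://example.com/app/page"))   # https://example.com/app/page
    print(redirect("https://example.com/app/page"))  # None: already HTTPS
    # (.+) is greedy, so the optional (:80) group never captures the port.
    print(redirect("http://example.com:80/"))        # https://example.com:80/
```

As the last call shows, the greedy (.+) swallows ":80", so the port is carried into the HTTPS URL rather than stripped; worth keeping in mind when porting the rule to a v2 middleware.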

After further googling, I found this on Reddit. It turns out Traefik was updated to v2.0 a few days ago, and v2.0 is not backwards compatible.
Only this Pod had the issue because it was the only one that pulled the new (v2.0) image, its Node being the only recently created one.
I reverted to v1.7 until I have time to fix it properly: I updated the DaemonSet to use v1.7, then killed the Pod so it would be recreated from the old image.

The devs have a Migration Guide that looks like it may help.
The entry-point "redirect" option is gone; instead there are "RedirectScheme" and "RedirectRegex", part of the new concept of "Middlewares".
It looks like they are moving to a pipeline approach: you define a chain of "middlewares" and attach it to a "router" to decide how requests are handled and what is added, removed, or modified along the way. "frontends" are now "routers" and "backends" are now "services", and configuration has a clearer, more modular structure. It looks like it will offer better organization than earlier versions.
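The pipeline idea can be illustrated with a small Python sketch: each middleware wraps the next handler, and a request passes through the chain in order before it reaches the service. This is a conceptual model only, not Traefik's code or configuration; redirect_scheme, strip_prefix, chain and the Request/Response types are hypothetical names, and the 301/302 status choice is this sketch's assumption rather than Traefik's documented behavior.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    scheme: str
    host: str
    path: str


@dataclass
class Response:
    status: int
    location: str = ""
    body: str = ""


Handler = Callable[[Request], Response]
Middleware = Callable[[Handler], Handler]


def redirect_scheme(scheme: str, permanent: bool = True) -> Middleware:
    """Hypothetical stand-in for a RedirectScheme middleware."""
    def wrap(next_handler: Handler) -> Handler:
        def handle(req: Request) -> Response:
            if req.scheme != scheme:
                # Short-circuit: the rest of the chain never runs.
                status = 301 if permanent else 302
                return Response(status, location=f"{scheme}://{req.host}{req.path}")
            return next_handler(req)
        return handle
    return wrap


def strip_prefix(prefix: str) -> Middleware:
    """Hypothetical StripPrefix-style middleware: rewrites the request, then passes it on."""
    def wrap(next_handler: Handler) -> Handler:
        def handle(req: Request) -> Response:
            if req.path.startswith(prefix):
                req = Request(req.scheme, req.host, req.path[len(prefix):] or "/")
            return next_handler(req)
        return handle
    return wrap


def chain(middlewares: List[Middleware], service: Handler) -> Handler:
    """Wrap the service so the middlewares run in list order."""
    handler = service
    for middleware in reversed(middlewares):
        handler = middleware(handler)
    return handler


def echo_service(req: Request) -> Response:
    return Response(200, body=f"served {req.path}")


app = chain([redirect_scheme("https"), strip_prefix("/app")], echo_service)

if __name__ == "__main__":
    print(app(Request("http", "example.com", "/app/page")))   # redirected to https
    print(app(Request("https", "example.com", "/app/page")))  # served /page
```

Ordering matters: because the redirect sits first, plain-HTTP requests never reach the prefix-stripping step or the service.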

Related

How to set http2-max-streams-per-connection with Kops

Is it possible to set the --http2-max-streams-per-connection value for a cluster created by Kops?
I have an odd situation where one of my nodes falls into NotReady status whenever I deploy a Helm chart to it, and I suspect it might be connected to this setting.
The nodes are generally fine and run without any issues, but once I deploy my Helm chart, whichever node it gets deployed to changes to NotReady after a few minutes, which I find weird.
I've done a bit of reading and seen a number of similar issues pointing to the --http2-max-streams-per-connection setting, but I'm not sure how to go about setting it.
Any ideas anyone?
You should be able to set this by adding the following to the cluster spec:
spec:
  kubeAPIServer:
    http2MaxStreamsPerConnection: <value>
See https://pkg.go.dev/k8s.io/kops/pkg/apis/kops#KubeAPIServerConfig and https://kops.sigs.k8s.io/cluster_spec/
That being said, I do not believe that setting is the reason your nodes go NotReady. You may want to join #kops-users in the Kubernetes Slack workspace and ask for help triaging that problem.

How to install keycloak operator on IBM Cloud Kubernetes Service?

The operator is https://operatorhub.io/operator/keycloak-operator version 11.0.0.
The cluster is Kubernetes version 1.18.12.
I was able to follow the steps from OperatorHub.io to install the Operator Lifecycle Manager and the Keycloak "OperatorGroup" and "Subscription".
It took much longer than I was expecting (maybe 20 minutes?), but eventually the corresponding "ClusterServiceVersion" was created.
However, now when I try to use it by creating the following resource, it doesn't seem to be doing anything at all:
apiVersion: keycloak.org/v1alpha1
kind: Keycloak
metadata:
  name: example-keycloak
  namespace: keycloak
  labels:
    app: sso
spec:
  instances: 1
  externalAccess:
    enabled: true
  extensions:
    - https://github.com/aerogear/keycloak-metrics-spi/releases/download/1.0.4/keycloak-metrics-spi-1.0.4.jar
It accepts the new resource, so I know the CRD is in place. The documentation states that it should create a stateful set, an ingress, and more, but it just doesn't seem to create anything.
I checked the cluster logs and this is the error that is jumping out to me:
olm-operator ERROR controllers.operator Could not update Operator status {"request": "/keycloak-operator.my-keycloak-operator", "error": "Operation cannot be fulfilled on operators.operators.coreos.com \"keycloak-operator.my-keycloak-operator\": the object has been modified; please apply your changes to the latest version and try again"}
I have quite a bit of experience with plain Kubernetes, but I'm brand new to "operators", so I'm really not sure where to look next to find out what's going wrong.
Any hints/suggestions/explanations?
UPDATE: I was creating the Keycloak resource in a namespace OTHER than the one I installed the operator into. Since it let me create the custom resource (kind: Keycloak) in that namespace, I assumed this was supported. However, when I created the Keycloak resource in the same namespace where the operator was installed (my-keycloak-operator), it actually tried to do something. It's still failing to bring up the pod, mind you, but at least it's trying.
Will leave this question open for a bit to see if the "Could not update Operator status" is something I should be concerned about or not...
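One plausible reading of the UPDATE: the Keycloak CRD is registered cluster-wide, so the API server accepts a Keycloak object in any namespace, but an operator only reconciles objects in the namespaces it watches (with OLM, the namespaces targeted by its OperatorGroup's spec.targetNamespaces). The Python sketch below models just that decision; the class and function names are hypothetical, no Kubernetes API is called, and the target list used in the example is an assumption. Treating an empty target list as "all namespaces" follows OLM's OperatorGroup convention, but check which install modes your operator supports.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OperatorGroup:
    """Hypothetical model of an OLM OperatorGroup; an empty target list means all namespaces."""
    namespace: str
    target_namespaces: List[str] = field(default_factory=list)

    def watches(self, namespace: str) -> bool:
        return not self.target_namespaces or namespace in self.target_namespaces


@dataclass
class CustomResource:
    kind: str
    name: str
    namespace: str


def reconciled(group: OperatorGroup, resources: List[CustomResource]) -> List[str]:
    """Resources the operator would act on; the others are stored by the API server but ignored."""
    return [f"{r.namespace}/{r.name}" for r in resources if group.watches(r.namespace)]


if __name__ == "__main__":
    group = OperatorGroup("my-keycloak-operator", target_namespaces=["my-keycloak-operator"])
    resources = [
        CustomResource("Keycloak", "example-keycloak", "keycloak"),
        CustomResource("Keycloak", "example-keycloak", "my-keycloak-operator"),
    ]
    print(reconciled(group, resources))  # ['my-keycloak-operator/example-keycloak']
```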
It looks like the operator and/or the components it wants to bring up cannot write (POST/PUT) to the kube-apiserver.
From what you describe, the first time, when you created the Keycloak resource in a different namespace from the operator, it didn't have permissions to bring up anything at all. The second time, when you created it in the operator's namespace, the operator was able to talk to the kube-apiserver, but the components it brings up (Keycloak, etc.) apparently are not.
I would check the logs on the kube-apiserver (control plane) to see if you have some unauthorized requests, also check the log files of the components (pods, deployments, etc) that the operator is trying to bring up.
If you have unauthorized requests, you may have to update the RBAC rules manually. Finally, I would check with IBM Cloud whether its K8s control plane has a specific permission setting that prevents applications from talking to it (the kube-apiserver).
✌️
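On the "Could not update Operator status" message itself: "the object has been modified; please apply your changes to the latest version and try again" is the text the API server returns when an update carries a stale metadata.resourceVersion, i.e. Kubernetes' optimistic concurrency check (an HTTP 409 Conflict). Controllers normally handle it by re-reading the object and retrying, so an occasional occurrence in OLM's logs is not necessarily a problem by itself. The Python sketch below models that read-modify-write-retry loop with an in-memory store; ObjectStore, ConflictError and update_with_retry are hypothetical names, not a client library API (in Go controllers, client-go's retry.RetryOnConflict plays this role).

```python
import copy
from typing import Callable, Dict, Tuple


class ConflictError(Exception):
    """Stands in for the API server's 409 Conflict on a stale resourceVersion."""


class ObjectStore:
    """Hypothetical in-memory store that enforces optimistic concurrency like the API server."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[int, dict]] = {}

    def create(self, name: str, obj: dict) -> None:
        self._objects[name] = (1, copy.deepcopy(obj))

    def get(self, name: str) -> Tuple[int, dict]:
        version, obj = self._objects[name]
        return version, copy.deepcopy(obj)

    def update(self, name: str, expected_version: int, obj: dict) -> int:
        current_version, _ = self._objects[name]
        if expected_version != current_version:
            raise ConflictError("the object has been modified; please apply your "
                                "changes to the latest version and try again")
        self._objects[name] = (current_version + 1, copy.deepcopy(obj))
        return current_version + 1


def update_with_retry(store: ObjectStore, name: str,
                      mutate: Callable[[dict], None], attempts: int = 5) -> int:
    """Read-modify-write loop: on a conflict, re-read the latest version and retry."""
    for _ in range(attempts):
        version, obj = store.get(name)
        mutate(obj)
        try:
            return store.update(name, version, obj)
        except ConflictError:
            continue  # someone else wrote first; start again from fresh data
    raise ConflictError(f"gave up updating {name} after {attempts} attempts")


if __name__ == "__main__":
    store = ObjectStore()
    store.create("keycloak-operator", {"status": "Pending"})

    stale_version, _ = store.get("keycloak-operator")
    store.update("keycloak-operator", stale_version, {"status": "Installing"})  # another writer
    try:
        store.update("keycloak-operator", stale_version, {"status": "Succeeded"})
    except ConflictError as err:
        print("conflict:", err)

    version = update_with_retry(store, "keycloak-operator",
                                lambda obj: obj.update(status="Succeeded"))
    print(version, store.get("keycloak-operator")[1])  # 3 {'status': 'Succeeded'}
```

The error only becomes a real concern if it repeats indefinitely for the same object, which would suggest two writers fighting over it rather than an ordinary race.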

Kubernetes 1.18 on EKS StartupProbe with old ServiceAccountName

I've got a deployment that worked just fine on K8S 1.17 on EKS. After upgrading K8S to 1.18, I tried the startupProbe feature with a simple deployment, and everything worked as expected. But when I added the startupProbe to my production deployment, it didn't work: the cluster simply drops the startupProbe entry when creating pods (although the entry does exist in the Deployment object on the cluster). Interestingly, when I change the serviceAccountName entry to default (instead of my application's service account) in the deployment manifest, everything works as expected.
So the question now is: why can't pods that use an existing service account have startup probes?
Thanks.
Posting this as a community member answer. Feel free to expand.
Issue
startupProbe is not applied to Pod if serviceAccountName is set
When adding serviceAccountName and startupProbe to the pod template in my deployment, the resulting pods will not have a startup probe.
There is a GitHub issue about that.
Solution
This issue is being addressed here; it is currently still open, and there is no definitive answer yet.
As mentioned by @mcristina422:
I think this is due to the old version of k8s.io/api being used in the webhook. The API for the startup probe was added more recently. Updating the k8s packages should fix this
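The explanation quoted above can be illustrated without any Kubernetes code. One plausible mechanism: a mutating webhook decodes the incoming container into types that predate startupProbe, applies its change, re-encodes it, and derives its JSON patch by diffing against the original, so the field its types do not know about comes back as a "remove" operation. The Python sketch below mimics that round trip; OldContainer, webhook_patch and the injected EXAMPLE_VAR are hypothetical, and this is not the webhook's actual code. If such a webhook only mutates Pods using particular service accounts (an assumption here), that would also fit the observation that serviceAccountName: default avoids the problem.

```python
import copy
import json
from dataclasses import asdict, dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class OldContainer:
    """Hypothetical container type generated from an older API: no startupProbe field."""
    name: str
    image: str
    env: List[Dict[str, str]] = field(default_factory=list)
    livenessProbe: Optional[dict] = None
    readinessProbe: Optional[dict] = None


KNOWN_FIELDS = {f.name for f in fields(OldContainer)}


def decode_old(raw: dict) -> OldContainer:
    # Unknown keys (such as startupProbe) are silently discarded, as a typed decoder would.
    return OldContainer(**{k: copy.deepcopy(v) for k, v in raw.items() if k in KNOWN_FIELDS})


def encode(container: OldContainer) -> dict:
    # Omit empty fields, similar to "omitempty" in Go JSON tags.
    return {k: v for k, v in asdict(container).items() if v not in (None, [])}


def webhook_patch(raw: dict, index: int = 0) -> List[dict]:
    """Decode, mutate and re-encode one container, then diff it against the original."""
    container = decode_old(raw)
    # Stand-in for the webhook's real mutation (e.g. injecting an environment variable).
    container.env.append({"name": "EXAMPLE_VAR", "value": "injected"})
    updated = encode(container)

    base = f"/spec/containers/{index}"
    ops = [{"op": "remove", "path": f"{base}/{key}"} for key in raw if key not in updated]
    for key, value in updated.items():
        if raw.get(key) != value:
            ops.append({"op": "replace" if key in raw else "add",
                        "path": f"{base}/{key}", "value": value})
    return ops


if __name__ == "__main__":
    container = {
        "name": "app",
        "image": "example/app:1.0",
        "startupProbe": {"httpGet": {"path": "/healthz", "port": 8080}, "failureThreshold": 30},
    }
    # The patch removes startupProbe even though the webhook never meant to touch it.
    print(json.dumps(webhook_patch(container), indent=2))
```

Rebuilding the webhook against a k8s.io/api version that knows startupProbe removes the problem in this model, because the field survives the round trip.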

how to take a problematic pod offline to troubleshoot

Hi, I know there's a way to pull a problematic node out of the load balancer to troubleshoot it. But how can I pull a pod out of service to troubleshoot it? What tools or commands can do that?
Change its labels so they no longer match the selector: in the Service; we used to do that all the time. You can even put it back into rotation if you want to test a hypothesis. I don't recall exactly how quickly it takes effect, but I would guess "real quick" is a good approximation. :-)
## for example, remove the label (note the trailing dash):
$ kubectl label pod $the_pod app.kubernetes.io/name-
## or, change it to a non-matching value (--overwrite is required to change an existing label)
$ kubectl label pod $the_pod app.kubernetes.io/name=i-am-debugging-this-pod --overwrite
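Why relabeling works: a Service routes to the ready Pods whose labels contain every key/value pair of its selector. The small Python sketch below models only that equality-based matching rule (Pod names and labels are made up, and readiness is ignored) to show a Pod dropping out of, and coming back into, the endpoint list.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selector_matches(selector: Labels, labels: Labels) -> bool:
    """Equality-based selector: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector: Labels, pods: Dict[str, Labels]) -> List[str]:
    """Names of the Pods a Service with this selector would route to (readiness ignored)."""
    return sorted(name for name, labels in pods.items() if selector_matches(selector, labels))


if __name__ == "__main__":
    selector = {"app.kubernetes.io/name": "web"}
    pods = {
        "web-1": {"app.kubernetes.io/name": "web"},
        "web-2": {"app.kubernetes.io/name": "web"},
    }
    print(endpoints(selector, pods))  # ['web-1', 'web-2']

    # Like: kubectl label pod web-2 app.kubernetes.io/name=i-am-debugging-this-pod --overwrite
    pods["web-2"]["app.kubernetes.io/name"] = "i-am-debugging-this-pod"
    print(endpoints(selector, pods))  # ['web-1']

    # Like: kubectl label pod web-2 app.kubernetes.io/name-   (removes the label)
    del pods["web-2"]["app.kubernetes.io/name"]
    print(endpoints(selector, pods))  # ['web-1']

    # Putting the label back returns the Pod to rotation.
    pods["web-2"]["app.kubernetes.io/name"] = "web"
    print(endpoints(selector, pods))  # ['web-1', 'web-2']
```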
As mentioned in O'Reilly's "Kubernetes recipes: Maintenance and troubleshooting" page here:
Removing a Pod from a Service
Problem
You have a well-defined service backed by several pods. But one of the pods is misbehaving, and you would like to take it out of the list of endpoints to examine it at a later time.
Solution
Relabel the pod using the --overwrite option; this will allow you to change the value of the run label on the pod. By overwriting this label, you can ensure that it will not be selected by the service selector and will be removed from the list of endpoints. At the same time, the replica set watching over your pods will see that a pod has disappeared and will start a new replica.
To see this in action, start with a straightforward deployment generated with kubectl run.
For the commands, check the recipes page mentioned above. It also has a section on "Debugging Pods" that should help.
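The last sentence of that quote is worth spelling out: if the label you change is also part of the ReplicaSet's selector, the ReplicaSet stops counting that Pod and creates a replacement, while the relabeled Pod keeps running for you to inspect (it is no longer managed, so you delete it yourself when done). Below is a minimal Python model of that counting step; reconcile and the Pod names are hypothetical, and a real controller also handles ownerReferences, readiness and deletion.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches(selector: Labels, labels: Labels) -> bool:
    """Equality-based selector match: every selector pair must appear in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def reconcile(replicas: int, selector: Labels, template: Labels,
              pods: Dict[str, Labels], prefix: str) -> List[str]:
    """Hypothetical ReplicaSet loop: add Pods from the template until enough match."""
    if not matches(selector, template):
        raise ValueError("template labels must match the selector")
    created: List[str] = []
    counter = 0
    while sum(matches(selector, labels) for labels in pods.values()) < replicas:
        counter += 1
        name = f"{prefix}-{counter}"
        if name in pods:
            continue  # name taken; try the next number
        pods[name] = dict(template)
        created.append(name)
    return created


if __name__ == "__main__":
    selector = {"run": "web"}
    pods = {"web-1": {"run": "web"}, "web-2": {"run": "web"}}
    print(reconcile(2, selector, selector, pods, "web"))  # [] (already at 2 replicas)

    # Relabel web-2 (kubectl label pod web-2 run=debug --overwrite): it no longer counts.
    pods["web-2"]["run"] = "debug"
    print(reconcile(2, selector, selector, pods, "web"))  # ['web-3']
    print(sorted(pods))  # ['web-1', 'web-2', 'web-3'] (web-2 is still there to inspect)
```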

pods keep creating themselves even I deleted all deployments

I am running k8s on AWS. I updated the nginx deployment, which normally works fine, but this time the nginx deployment no longer shows up in "kubectl get deployments".
I want to kill all the pods related to nginx, but they keep recreating themselves. I deleted all deployments with "kubectl delete --all deployments"; the other pods were terminated, but not the nginx ones.
I have no idea how to stop the pods from being recreated.
Any idea where to start?
Check the Deployments, ReplicationControllers and ReplicaSets, and delete whichever one owns the nginx pods:
kubectl get deploy,rc,rs
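Older Kubernetes versions also set an annotation kubernetes.io/created-by on the Pod showing its "owner", as seen here, but I can't lay my hands on the documentation link right now. However, I found a pastebin containing a concrete example of the contents of the annotation. Current versions no longer add that annotation; the owner is recorded in the Pod's metadata.ownerReferences field instead (visible with kubectl get pod <pod> -o yaml).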
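To find what keeps recreating the pods, follow the owner chain upwards from a Pod: the entry in its metadata.ownerReferences marked controller: true names its controller (for example a ReplicaSet), whose own ownerReferences may name a Deployment, and so on. The Python sketch below walks that chain over plain dictionaries shaped like the metadata kubectl get -o json returns; the objects and names are made up, everything is assumed to be in one namespace, and against a real cluster you would fetch each owner by kind and name instead of looking it up in a dict.

```python
from typing import Dict, List, Optional, Tuple

# (kind, name) of an object; everything is assumed to live in one namespace.
Key = Tuple[str, str]


def controller_ref(obj: dict) -> Optional[Key]:
    """Return (kind, name) of the ownerReference marked controller: true, if any."""
    for ref in obj.get("metadata", {}).get("ownerReferences", []):
        if ref.get("controller"):
            return ref["kind"], ref["name"]
    return None


def owner_chain(start: Key, objects: Dict[Key, dict]) -> List[Key]:
    """Follow controller ownerReferences from start up to the top-level owner."""
    chain = [start]
    current = objects[start]
    while True:
        ref = controller_ref(current)
        if ref is None or ref in chain:
            return chain  # no further owner (or a cycle): the last entry is the top
        chain.append(ref)
        if ref not in objects:
            return chain  # owner is referenced but was not found
        current = objects[ref]


def owned_by(kind: str, name: str) -> dict:
    """Build hypothetical metadata with a single controller ownerReference."""
    return {"metadata": {"ownerReferences": [
        {"apiVersion": "apps/v1", "kind": kind, "name": name, "controller": True}]}}


if __name__ == "__main__":
    objects = {
        ("Pod", "nginx-abc12-xyz"): owned_by("ReplicaSet", "nginx-abc12"),
        ("ReplicaSet", "nginx-abc12"): owned_by("Deployment", "nginx"),
        ("Deployment", "nginx"): {"metadata": {}},
        # A ReplicaSet left behind without a Deployment still recreates its Pods.
        ("Pod", "nginx-old-qwe"): owned_by("ReplicaSet", "nginx-old"),
        ("ReplicaSet", "nginx-old"): {"metadata": {}},
    }
    print(owner_chain(("Pod", "nginx-abc12-xyz"), objects))
    print(owner_chain(("Pod", "nginx-old-qwe"), objects))
```

The last entry of the chain is the object to delete; deleting only the Pods just triggers the controller to create new ones.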