Cert-manager fails on Kubernetes with webhooks

I'm following the Kubernetes install instructions for Helm: https://docs.cert-manager.io/en/latest/getting-started/install/kubernetes.html
With cert-manager v0.8.1 on Kubernetes v1.15, Ubuntu 18.04, on-premise.
When I get to testing the installation, I get these errors:
error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "issuers.admission.certmanager.k8s.io": the server is currently unable to handle the request
Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "certificates.admission.certmanager.k8s.io": the server is currently unable to handle the request
If I apply the test-resources.yaml before installing with Helm, I'm not getting the errors but it is still not working.
These errors are new to me, as Cert-manager used to work for me on my previous install about a month ago, following the same installation instructions.
I've also tried cert-manager 0.7.2 (CRD 0.7), as I think that was the last version I managed to get installed, but it's not working either.
What do these errors mean?
Update: It turned out to be an internal CoreDNS issue on my cluster; it was somehow not configured correctly, possibly related to a wrong POD_CIDR configuration.

If you experience this problem, check the logs of CoreDNS (or kube-dns); you may see lots of errors related to contacting services. Unfortunately, I no longer have those errors to paste here, but this is how I figured out that my network setup was invalid.
I'm using Calico (this will apply to other network plugins as well), and its network was not set to the same CIDR as the POD_CIDR network that I initialized my Kubernetes cluster with.
Example
1. Set up Kubernetes:
kubeadm init --pod-network-cidr=10.244.0.0/16
2. Configure calico.yaml so that Calico's pool matches the same CIDR:
- name: CALICO_IPV4POOL_CIDR
  value: "10.244.0.0/16"

I also tried cert-manager v0.8.0 with a very similar setup on Ubuntu 18.04 and k8s v1.14.1, and I began to get the same error when I tore down cert-manager using kubectl delete and reinstalled it, after experiencing some network issues on the cluster.
I stumbled on a solution that worked. On the master node, simply restart the apiserver container:
$ sudo docker ps -a | grep apiserver
af99f816c7ec gcr.io/google_containers/kube-apiserver@sha256:53b987e5a2932bdaff88497081b488e3b56af5b6a14891895b08703129477d85 "/bin/sh -c '/usr/loc" 15 months ago Up 19 hours k8s_kube-apiserver_kube-apiserver-ip-xxxxxc_0
40f3a18050c3 gcr.io/google_containers/pause-amd64:3.0 "/pause" 15 months ago Up 15 months k8s_POD_kube-apiserver-ip-xxxc_0
$ sudo docker restart af99f816c7ec
af99f816c7ec
$
Then try applying the test-resources.yaml again:
$ kubectl apply -f test-resources.yaml
namespace/cert-manager-test unchanged
issuer.certmanager.k8s.io/test-selfsigned created
certificate.certmanager.k8s.io/selfsigned-cert created
If that does not work, this GitHub issue mentions that the master node might need firewall rules to be able to reach the cert-manager-webhook pod. The exact steps will depend on which cloud platform you are on.
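For example, on a GKE private cluster the control plane can only reach nodes on ports 443 and 10250 by default, so a rule along these lines may be needed (the source range, target tag, and port are placeholders here; check the target port of the webhook Service first):
kubectl -n cert-manager get svc cert-manager-webhook
gcloud compute firewall-rules create apiserver-to-cert-manager-webhook \
    --source-ranges=<master-cidr> --target-tags=<node-tag> --allow=tcp:<webhook-port>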

Related

Failing to run Mattermost locally on a Kubernetes cluster using Minikube

Summary in one sentence
I want to deploy Mattermost locally on a Kubernetes cluster using Minikube.
Steps to reproduce
I used this tutorial and the Github documentation:
https://mattermost.com/blog/how-to-get-started-with-mattermost-on-kubernetes-in-just-a-few-minutes/
https://github.com/mattermost/mattermost-operator/tree/v1.15.0
To start minikube: minikube start --kubernetes-version=v1.21.5
To start ingress: minikube addons enable ingress
I cloned the Github repo with tag v1.15.0 (second link)
In the Github documentation (second link) they state that you need to install Custom Resources by running: kubectl apply -f ./config/crd/bases
Afterwards I installed MinIO and MySQL operators by running: make mysql-minio-operators
Started the Mattermost-operator locally by running: go run .
In the end I deployed Mattermost (I followed steps 2, 7, and 9 from the first link)
Observed behavior
Unfortunately I keep getting the following error in the mattermost-operator:
INFO[1419] [opr.controllers.Mattermost] Reconciling Mattermost Request.Name=mm-demo Request.Namespace=mattermost
INFO[1419] [opr.controllers.Mattermost] Updating resource Reconcile=fileStore Request.Name=mm-demo Request.Namespace=mattermost kind="&TypeMeta{Kind:,APIVersion:,}" name=mm-demo-minio namespace=mattermost patch="{\"status\":{\"availableReplicas\":0}}"
INFO[1419] [opr.controllers.Mattermost.health-check] mattermost pod not ready: pod mm-demo-ccbd46b9c-9nq8k is in state 'Pending' Request.Name=mm-demo Request.Namespace=mattermost
INFO[1419] [opr.controllers.Mattermost.health-check] mattermost pod not ready: pod mm-demo-ccbd46b9c-tp567 is in state 'Pending' Request.Name=mm-demo Request.Namespace=mattermost
ERRO[1419] [opr.controllers.Mattermost] Error checking Mattermost health Request.Name=mm-demo Request.Namespace=mattermost error="found 0 updated replicas, but wanted 2"
By using k9s I can see that mm-demo won't start.
Another variation of deployment
I also tried another variation by following all the steps from the first link (without the licence secret step). At this point the mattermost-operator is visible using k9s and isn't getting any errors. But unfortunately the mm-demo pod keeps crashing (the logs are empty, so I'm not seeing any errors there).
Anybody have an idea?
As @Ashish faced the same issue, he fixed it by increasing the resources given to Minikube.
Minikube will be able to run all the pods by running minikube start --kubernetes-version=v1.21.5 --memory 4000 --cpus 4
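If you want to confirm it really is a resource problem before restarting Minikube with more memory and CPUs, describing one of the Pending pods should show a FailedScheduling event with an "Insufficient cpu" or "Insufficient memory" message (pod name taken from the logs above; yours will differ):
kubectl -n mattermost get pods
kubectl -n mattermost describe pod mm-demo-ccbd46b9c-9nq8k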

EKS kubectl logs <podname> suddenly stopped working

I have pods running on EKS, and pulling the container logs worked fine a couple of days ago. But today when I tried to run kubectl logs <podname> I got a TLS error.
Error from server: Get "https://host:10250/containerLogs/dev/pod-748b649458-bczdq/server": remote error: tls: internal error
Does anyone know how to fix this? The other answers on Stack Overflow seem to suggest deleting the Kubernetes cluster and rebuilding it... is there no better solution?
This could probably be due to some firewall rules or security settings that were recently introduced. I would encourage you to check that, along with the following troubleshooting steps:
Ensure all EKS nodes are in the running state.
Restart nodes as required.
Check the networking configuration and see if other kubectl commands work.
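A rough sketch of those checks (the node address is a placeholder, and SSH access only works if your node group allows it):
kubectl get nodes -o wide        # every node should be Ready
kubectl get pods -A -o wide      # confirms other kubectl calls still work
ssh ec2-user@<node-ip> 'sudo systemctl restart kubelet'    # restart the kubelet on a misbehaving node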

Kubectl connection refused existing cluster

Hope someone can help me.
To describe the situation in short, I have a self managed k8s cluster, running on 3 machines (1 master, 2 worker nodes). In order to make it HA, I attempted to add a second master to the cluster.
After some failed attempts, I found out that I needed to add a controlPlaneEndpoint configuration to the kubeadm-config ConfigMap. So I did, with masternodeHostname:6443.
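Roughly, the change was editing the ConfigMap (kubectl -n kube-system edit configmap kubeadm-config) and adding one line under the ClusterConfiguration document, with the other fields left as they were:
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta2
controlPlaneEndpoint: "masternodeHostname:6443"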
I generated the certificate and join command for the second master, and after running it on the second master machine, it failed with
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
Checking the first master now, I get connection refused for the IP on port 6443. So I cannot run any kubectl commands.
I tried recreating the .kube folder, with all the config copied there; no luck.
I also restarted kubelet and docker.
The containers running on the cluster seem ok, but I am locked out of any cluster configuration (dashboard is down, kubectl commands not working).
Is there any way I can make it work again, without losing any of the configuration or the deployments already present?
Thanks! Sorry if it’s a noob question.
Cluster information:
Kubernetes version: 1.15.3
Cloud being used: (put bare-metal if not on a public cloud) bare-metal
Installation method: kubeadm
Host OS: RHEL 7
CNI and version: weave 0.3.0
CRI and version: containerd 1.2.6
This is an old, known problem with Kubernetes 1.15 [1,2].
It is caused by a short etcd timeout period. As far as I'm aware it is a hard-coded value in the source and cannot be changed (a feature request to make it configurable is open for version 1.22).
Your best bet would be to upgrade to a newer version, and recreate your cluster.

Cannot get fabric8 to start on local development machine (OSX or Linux)

I'm trying to give fabric8 a shot, but I'm having issues getting it to start on a local machine running minikube and VirtualBox (I've attempted this on both Linux and OSX). I'm able to get all but one of the pods to start (after manually increasing minikube's VM RAM to 8GB). The exposecontroller won't start and is giving me the following error in the logs:
I0415 14:29:43.431944 1 exposecontroller.go:47] Using build: '2.3.2'
F0415 14:29:43.492059 1 exposecontroller.go:66] failed to create new strategy: failed to create node port expose strategy: failed to list nodes: nodes is forbidden: User "system:serviceaccount:fabric8:exposecontroller" cannot list nodes at the cluster scope
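The forbidden message reads like an RBAC problem to me: the fabric8:exposecontroller service account isn't allowed to list nodes at cluster scope. I haven't verified this, but I would expect a ClusterRole and ClusterRoleBinding along these lines (the resource names are my own placeholders) to grant that permission:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: exposecontroller-node-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: exposecontroller-node-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: exposecontroller-node-reader
subjects:
- kind: ServiceAccount
  name: exposecontroller
  namespace: fabric8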
Here are the commands I'm running:
minikube start --cpus=5 --disk-size=50g --memory=8000
curl -sS http://get.fabric8.io/download.txt | bash
gofabric8 start
I also tried creating an OAuth secret via GitHub (using bogus IP address info for the redirect URL), but this doesn't make sense to me because I don't have a domain... Then I ran these:
minikube start --vm-driver=xhyve --cpus=5 --disk-size=50g --memory=8000
minikube addons enable ingress
gofabric8 deploy --package system -n fabric8
That resulted in the exposecontroller working, but then additional pods (keycloak, for example) were created and failed to start.
I've spent hours trying to get this to work and am about to give up. The documentation on GitHub differs from fabric8's site documentation and I just can't get it to work. If someone is able to help, I would greatly appreciate it.
Note:
I've attempted to follow the instructions here:
http://fabric8.io/guide/getStarted/gofabric8.html
Additionally, I attempted to follow this:
https://github.com/fabric8io/fabric8-platform/blob/master/INSTALL.md

Failed to create pod sandbox kubernetes cluster

I have a Weave network plugin.
Inside my folder /etc/cni/net.d there is a 10-weave.conf:
{
  "name": "weave",
  "type": "weave-net",
  "hairpinMode": true
}
My Weave pods are running and the DNS pod is also running.
But when I want to run a pod, like a simple nginx which will pull an nginx image, the pod gets stuck at ContainerCreating, and describe pod gives me the error: failed create pod sandbox.
When I run journalctl -u kubelet I get this error:
cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Is my network plugin not configured correctly?
I used this command to configure my Weave network:
kubectl apply -f https://git.io/weave-kube-1.6
After this didn't work I also tried this command:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
I even tried Flannel and that gives me the same error.
The system I am setting Kubernetes up on is a Raspberry Pi.
I am trying to build a Raspberry Pi cluster with 3 nodes and 1 master with Kubernetes.
Does anyone have ideas on this?
Thank you all for responding to my question. I have solved my problem now. For anyone who comes to my question in the future, the solution was as follows.
I cloned my Raspberry Pi images because I wanted a basicConfig.img for when I needed to add a new node to my cluster or when one goes down.
Weave Net (the plugin I used) got confused because on every node and the master the OS had the same machine-id. When I deleted the machine-id and created a new one (and rebooted the nodes) my error got fixed. The commands to do this were:
sudo rm /etc/machine-id
sudo rm /var/lib/dbus/machine-id
sudo dbus-uuidgen --ensure=/etc/machine-id
Once again my patience was being tested, because my Kubernetes setup was normal and my Raspberry Pi OS was normal. I found this with the help of someone in the Kubernetes community, which again shows how important and great our IT community is. To the people of the future who come to this question: I hope this solution fixes your error and reduces the amount of time you spend searching for such a stupidly small thing.
Looking at the pertinent code in Kubernetes and in CNI, the specific error you see seems to indicate that it cannot find any files ending in .json, .conf or .conflist in the directory given.
This makes me think it could be something like the conf file not being present on all the hosts, so I would verify that as a first step.
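A quick way to verify that (the hostnames and user here are just examples for a Raspberry Pi cluster) is to list that directory on every machine:
for host in master node1 node2 node3; do ssh pi@$host 'hostname; ls /etc/cni/net.d/'; done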