Cert-manager order is in invalid state - kubernetes

I’m migrating from a GitLab managed Kubernetes cluster to a self-managed cluster. In this self-managed cluster I need to install nginx-ingress and cert-manager. I have already managed to do the same for a cluster used for review environments. I use the latest Helm 3 RC to manage this, so I won’t need Tiller.
So far, I ran these commands:
# Add Helm repos locally
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm repo add jetstack https://charts.jetstack.io
# Create namespaces
kubectl create namespace managed
kubectl create namespace production
# Create cert-manager crds
kubectl apply --validate=false -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.11/deploy/manifests/00-crds.yaml
# Install Ingress
helm install ingress stable/nginx-ingress --namespace managed --version 0.26.1
# Install cert-manager with a cluster issuer
kubectl apply -f config/production/cluster-issuer.yaml
helm install cert-manager jetstack/cert-manager --namespace managed --version v0.11.0
This is my cluster-issuer.yaml:
# Based on https://docs.cert-manager.io/en/latest/reference/issuers.html#issuers
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: XXX # This is an actual email address in the real resource
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - selector: {}
      http01:
        ingress:
          class: nginx
I installed my own Helm chart named docs. All resources from the Helm chart are installed as expected. Using cURL, I can fetch the page over HTTP. Google Chrome, however, redirects me to an HTTPS page with an invalid certificate.
The following additional resources have been created:
$ kubectl get secrets
NAME       TYPE                DATA   AGE
docs-tls   kubernetes.io/tls   3      18m
$ kubectl get certificaterequests.cert-manager.io
NAME                 READY   AGE
docs-tls-867256354   False   17m
$ kubectl get certificates.cert-manager.io
NAME       READY   SECRET     AGE
docs-tls   False   docs-tls   18m
$ kubectl get orders.acme.cert-manager.io
NAME                            STATE     AGE
docs-tls-867256354-3424941167   invalid   18m
It appears everything is blocked by the cert-manager order in an invalid state. Why could it be invalid? And how do I fix this?
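One way to see why an Order is invalid is to read the state and reason fields that cert-manager records on the Order and its Challenges. The sketch below is only an illustration: failed_acme_resources is a hypothetical helper name, and it assumes you feed it the JSON printed by kubectl for those resources.

```python
import json


def failed_acme_resources(kubectl_json):
    """Return (kind/name, state, reason) for Orders and Challenges that are not valid.

    kubectl_json is the text printed by, for example:
      kubectl get orders.acme.cert-manager.io,challenges.acme.cert-manager.io -o json
    """
    report = []
    for item in json.loads(kubectl_json).get("items", []):
        status = item.get("status", {})
        state = status.get("state", "")
        if state != "valid":
            name = f'{item["kind"]}/{item["metadata"]["name"]}'
            # A missing state is reported as "unknown"; a missing reason as "".
            report.append((name, state or "unknown", status.get("reason", "")))
    return report
```

The reason string usually names the failing check (for HTTP-01, typically that the validation server could not fetch the challenge URL), which narrows down where to look next.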

It turns out that in addition to a correct DNS A record for @, there were some AAAA records that pointed to an IPv6 address I don’t know. Removing those records and redeploying resolved the issue for me.
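To spot a stray AAAA record like the one described above, you can compare what a hostname resolves to over IPv4 and IPv6. This is a minimal sketch using only the Python standard library; group_by_family and resolve are illustrative names, and the result reflects your local resolver, which may differ from what the ACME validation server sees.

```python
import socket
from collections import defaultdict


def group_by_family(addrinfos):
    """Group getaddrinfo() results into IPv4 (A) and IPv6 (AAAA) address lists."""
    names = {socket.AF_INET: "A", socket.AF_INET6: "AAAA"}
    grouped = defaultdict(set)
    for family, _type, _proto, _canon, sockaddr in addrinfos:
        if family in names:
            grouped[names[family]].add(sockaddr[0])
    return {record: sorted(addrs) for record, addrs in grouped.items()}


def resolve(host):
    """Resolve host through the local resolver and group the answers by record type."""
    return group_by_family(socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP))


# Usage (needs network access; replace the placeholder with your Ingress host):
#   print(resolve("docs.example.com"))
```

If the AAAA list contains an address that does not belong to your ingress, HTTP-01 validation may be sent to that address instead of your cluster.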

Related

ApiVersions missing in updated Cert-manager. Cert-manager Conversion webhook for cert-manager.io/v1alpha2, Kind=ClusterIssuer failed

When I am trying to get cluster issuers in my k8s cluster, I am receiving this error message. Can someone help me troubleshoot this?
kubectl get clusterissuers
Output: Error from server: conversion webhook for cert-manager.io/v1alpha2, Kind=ClusterIssuer failed: an error on the server ("") has prevented the request from succeeding
Here's my clusterissuer.yml file:
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-dev-certificate
  namespace: cert-manager
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: my.email@org.co
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-dev-certificate
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx
Here's the kubernetes version:
kubectl version --short
Output: Client Version: v1.23.1
Server Version: v1.22.12-gke.2300
Initially, I thought the problem was in fetching ClusterIssuers or Certificates from cert-manager. After more troubleshooting, I found that you cannot upgrade directly from cert-manager 1.5.5 to 1.9.1: the older API versions of the CRDs go missing along the way, which breaks fetching resources such as Certificates and ClusterIssuers that are defined as CRDs in the cert-manager package.
Solution: I downgraded cert-manager back to 1.5.5, followed the steps from cert-manager's blog on getting the CRDs ready for the upgrade, and then upgraded cert-manager to 1.9.1 again. After that everything worked fine. Here are the steps I followed:
1. Downgrade back to cert-manager 1.5.5:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.5.5/cert-manager.yaml
2. Install cmctl for cert-manager:
curl -sSL -o cmctl.tar.gz https://github.com/cert-manager/cert-manager/releases/download/v1.9.1/cmctl-linux-amd64.tar.gz
tar xzf cmctl.tar.gz
mv cmctl /usr/local/bin
cmctl upgrade --help
3. Migrate the API versions before upgrading cert-manager:
cmctl upgrade migrate-api-version
Now it is safe to upgrade cert-manager.
4. Upgrade to cert-manager 1.7.2 or 1.9.1:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.7.2/cert-manager.yaml
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.9.1/cert-manager.yaml
5. Verify by fetching the CRD resources:
kubectl get clusterissuer
kubectl get certificates
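Besides migrating the objects already stored in the cluster, it can help to check that your own manifest files no longer use the old API versions. The sketch below assumes, to the best of my knowledge, that v1alpha2, v1alpha3 and v1beta1 are the versions that newer cert-manager releases stopped serving; find_deprecated is a hypothetical helper name, and it only inspects manifest text, not the cluster.

```python
import re

# Assumption: these are the cert-manager API versions no longer served after the upgrade.
DEPRECATED_VERSIONS = {"v1alpha2", "v1alpha3", "v1beta1"}

# Matches top-level "apiVersion: cert-manager.io/<version>" lines,
# including the acme.cert-manager.io group.
API_VERSION_RE = re.compile(
    r"^[ \t]*apiVersion:[ \t]*((?:acme\.)?cert-manager\.io)/(\w+)[ \t]*$",
    re.MULTILINE,
)


def find_deprecated(manifest_text):
    """Return the deprecated cert-manager apiVersion strings used in a manifest."""
    return [
        f"{group}/{version}"
        for group, version in API_VERSION_RE.findall(manifest_text)
        if version in DEPRECATED_VERSIONS
    ]
```

A manifest flagged here, such as the ClusterIssuer in the question, would need its apiVersion changed to cert-manager.io/v1; some fields were renamed between versions, so review the resource against the v1 schema rather than only editing that one line.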

How can I fix 'failed calling webhook "webhook.cert-manager.io"'?

I'm trying to set up a K3s cluster. When I had a single master and agent setup cert-manager had no issues. Now I'm trying a 2 master setup with embedded etcd. I opened TCP ports 6443 and 2379-2380 for both VMs and did the following:
VM1: curl -sfL https://get.k3s.io | sh -s server --token TOKEN --cluster-init
VM2: curl -sfL https://get.k3s.io | sh -s server --token TOKEN --server https://MASTER_IP:6443
# k3s kubectl get nodes
NAME   STATUS   ROLES                       AGE    VERSION
VM1    Ready    control-plane,etcd,master   130m   v1.22.7+k3s1
VM2    Ready    control-plane,etcd,master   128m   v1.22.7+k3s1
Installing cert-manager works fine:
# k3s kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.yaml
# k3s kubectl get pods --namespace cert-manager
NAME                                       READY   STATUS
cert-manager-b4d6fd99b-c6fpc               1/1     Running
cert-manager-cainjector-74bfccdfdf-gtmrd   1/1     Running
cert-manager-webhook-65b766b5f8-brb76      1/1     Running
My manifest has the following definition:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: info@example.org
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
    - selector: {}
      http01:
        ingress: {}
Which results in the following error:
# k3s kubectl apply -f manifest.yaml
Error from server (InternalError): error when creating "manifest.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
I tried disabling both firewalls, waiting a day, and resetting and redoing the whole setup, but the error persists. Google hasn't been much help either. The little info I can find goes over my head for the most part, and no tutorial seems to take any extra steps.
Try to specify the proper ingress class name in your Cluster Issuer, like this:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: info@example.org
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
Also, make sure that you have the cert manager annotation and the tls secret name specified in your Ingress like this:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
  ...
spec:
  tls:
  - hosts:
    - domain.com
    secretName: letsencrypt-account-key
A good starting point for troubleshooting webhook issues is the cert-manager docs; for example, there is a section on problems with GKE private clusters.
In my case, however, that didn't solve the problem. I had installed and uninstalled cert-manager several times while experimenting, and it turned out that just removing the namespace (kubectl delete namespace cert-manager) didn't remove the webhooks and other less obvious resources that live outside that namespace.
Following the official guide for uninstalling cert-manager and then applying the manifests again solved the issue.
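To check whether an earlier install left webhook configurations behind, you can filter the cluster's webhook configurations by name. This is a rough sketch: leftover_webhooks is a hypothetical helper, and it assumes the configurations carry "cert-manager" in their names, as they do in the default manifests as far as I know.

```python
import json


def leftover_webhooks(kubectl_json, needle="cert-manager"):
    """List webhook configurations whose name contains the given substring.

    kubectl_json is the text printed by, for example:
      kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o json
    """
    items = json.loads(kubectl_json).get("items", [])
    return sorted(
        f'{item["kind"]}/{item["metadata"]["name"]}'
        for item in items
        if needle in item["metadata"]["name"]
    )
```

If this returns entries while the cert-manager namespace no longer exists, those configurations still route API requests to a webhook service that is gone, which matches the timeout in the question.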
I did the following, and it worked for me:
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.8.0 \
  --set webhook.securePort=10260
Source: https://hackmd.io/@maelvls/debug-cert-manager-webhook

Kubernetes - NginxIngressController resources not creating

We are using Nginx ingress operator version 0.2.0 and controller version 1.11.1. We completed the following steps to deploy the CRD and operator:
https://github.com/nginxinc/nginx-ingress-operator/blob/release-0.2.0/docs/manual-installation.md
After that, we are deploying the controller using the following yaml:
apiVersion: k8s.nginx.org/v1alpha1
kind: NginxIngressController
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
spec:
  type: deployment
  image:
    repository: nginx/nginx-ingress
    tag: 1.11.1
    pullPolicy: Always
  serviceType: NodePort
  nginxPlus: False
The manifest gets applied successfully but none of the required resources are getting created (deployment and service). Hence, the ingress is not getting the address.
kubectl get all -n ingress-nginx
No resources found in ingress-nginx namespace.
kubectl get ing
NAME         CLASS    HOSTS   ADDRESS   PORTS   AGE
my-ingress   <none>   *                 80      6h23m
kubeadm, kubelet & kubectl version 1.21.2.
Earlier we had deployed it on minikube and it was working fine.
I reproduced the use case with Nginx ingress operator version 0.4.0 and controller version 2.0.x by following the documentation, and successfully created the Nginx Ingress Operator and the NginxIngressController. At first I had not created the ingress-nginx namespace, and running kubectl get all -n ingress-nginx returned No resources found in ingress-nginx namespace.
After creating the required namespace with kubectl create namespace ingress-nginx, I was able to get the resources (pod, service, deployment, replica set) successfully.
Can you try again with the latest versions of the nginx controller and operator, and also double-check your configuration?

how to move prometheus adapter to another namespace?

For now I have prometheus and prometheus adapter in different namespaces.
I tried to configure the adapter YAML, but without success:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: "2020-01-30T08:49:05Z"
  generation: 2
  labels:
    app: prometheus-adapter
    chart: prometheus-adapter-2.0.1
    heritage: Tiller
    release: prometheus-adapter
  name: prometheus-adapter
  namespace: my-custom-namespace
  resourceVersion: "18513075"
  selfLink: /apis/apps/v1/namespaces/my-custom-namespace/deployments/prometheus-adapter
...
But I see error:
the namespace of the object (my-custom-namespace) does not match the namespace on the request (default)
How can I fix it?
You cannot change the namespace of an existing resource by editing it. You need to delete the existing deployment first and then recreate it in the other namespace.
Edit:
With Helm 2, delete the release first with helm delete --purge release-name, then deploy it to a different namespace with helm install stable/prometheus-adapter --namespace namespace-name
With Helm 3, you likewise delete the existing release and redeploy it into the target namespace. In the early Helm 3 build shown below, passing --namespace still failed, so I switched the kubectl context to the target namespace before installing; released Helm 3 versions accept --namespace directly. The example deploys metrics-server:
$ helm install metricserver stable/metrics-server
Error: the namespace from the provided object "kube-system" does not match the namespace "default". You must pass '--namespace=kube-system' to perform this operation.
$ helm install metricserver stable/metrics-server --namespace=kube-system
Error: the namespace from the provided object "kube-system" does not match the namespace "default". You must pass '--namespace=kube-system' to perform this operation.
$ kubectl config set-context kube-system --cluster=kubernetes --user=kubernetes-admin --namespace=kube-system
Context "kube-system" created.
$ kubectl config use-context kube-system
Switched to context "kube-system".
$ kubectl config get-contexts
CURRENT   NAME                          CLUSTER      AUTHINFO           NAMESPACE
*         kube-system                   kubernetes   kubernetes-admin   kube-system
          kubernetes-admin@kubernetes   kubernetes   kubernetes-admin
          metallb                       kubernetes   kubernetes-admin   metallb
          nfstorage                     kubernetes   kubernetes-admin   nfstorage
$ helm install metricserver stable/metrics-server
NAME: metricserver
LAST DEPLOYED: 2019-05-26 14:37:45.582245559 -0700 PDT m=+2.942929639
NAMESPACE: kube-system
STATUS: deployed
For Helm 2, you can install the chart in any namespace you want by using:
helm install stable/prometheus-adapter --name my-release --namespace foo
Keep in mind that you need to remove the previous release first.
This can be done with helm delete --purge my-release
Also, there is a really nice article on the changes in Helm 3: Breaking Changes in Helm 3 (and How to Fix Them).
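For a resource managed with plain kubectl rather than Helm, the delete-and-recreate approach means exporting the object, dropping the fields the API server filled in, and creating it again in the new namespace. The following is a minimal sketch of that clean-up step; retarget is a hypothetical helper, and the list of server-set fields is my assumption of the usual ones, not an exhaustive list.

```python
import copy

# Fields set by the API server; they should not be sent when re-creating the object.
SERVER_FIELDS = (
    "creationTimestamp",
    "generation",
    "resourceVersion",
    "selfLink",
    "uid",
    "managedFields",
)


def retarget(manifest, namespace):
    """Return a cleaned copy of an exported manifest that targets another namespace."""
    clean = copy.deepcopy(manifest)  # leave the caller's dict untouched
    clean.pop("status", None)
    meta = clean.setdefault("metadata", {})
    for field in SERVER_FIELDS:
        meta.pop(field, None)
    # The revision annotation belongs to the old Deployment's rollout history.
    meta.get("annotations", {}).pop("deployment.kubernetes.io/revision", None)
    meta["namespace"] = namespace
    return clean
```

The manifest argument is the parsed output of kubectl get deployment prometheus-adapter -o json; after deleting the old Deployment, the returned dict can be serialised back to JSON and passed to kubectl create -f. For a chart installed with Helm, reinstalling the release as described above remains the cleaner route.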

How to automate Let's Encrypt certificate renewal in Kubernetes with cert-manager on a bare-metal cluster?

I would like to access my Kubernetes bare-metal cluster with an exposed Nginx Ingress Controller for TLS termination. To be able to automate certificate renewal, I would like to use the Kubernetes addon cert-manager, which is kube-lego's successor.
What I have done so far:
Set up a Kubernetes (v1.9.3) cluster on bare-metal (1 master, 1 minion, both running Ubuntu 16.04.4 LTS) with kubeadm and flannel as pod network following this guide.
Installed nginx-ingress (chart version 0.9.5) with Kubernetes package manager helm
helm install --name nginx-ingress --namespace kube-system stable/nginx-ingress --set controller.hostNetwork=true,rbac.create=true,controller.service.type=ClusterIP
Installed cert-manager (chart version 0.2.2) with helm
helm install --name cert-manager --namespace kube-system stable/cert-manager --set rbac.create=true
The Ingress Controller is exposed successfully and works as expected when I test with an Ingress resource. For proper Let's Encrypt certificate management and automatic renewal with cert-manager, I first need an Issuer resource. I created it from this acme-staging-issuer.yaml:
apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt-staging
  namespace: default
spec:
  acme:
    server: https://acme-staging.api.letsencrypt.org/directory
    email: email@example.com
    privateKeySecretRef:
      name: letsencrypt-staging
    http01: {}
kubectl create -f acme-staging-issuer.yaml runs successfully but kubectl describe issuer/letsencrypt-staging gives me:
Status:
  Acme:
    Uri:
  Conditions:
    Last Transition Time:  2018-03-05T21:29:41Z
    Message:               Failed to verify ACME account: Get https://acme-staging.api.letsencrypt.org/directory: tls: oversized record received with length 20291
    Reason:                ErrRegisterACMEAccount
    Status:                False
    Type:                  Ready
Events:
  Type     Reason                Age               From                     Message
  ----     ------                ----              ----                     -------
  Warning  ErrVerifyACMEAccount  1s (x11 over 7s)  cert-manager-controller  Failed to verify ACME account: Get https://acme-staging.api.letsencrypt.org/directory: tls: oversized record received with length 20291
  Warning  ErrInitIssuer         1s (x11 over 7s)  cert-manager-controller  Error initializing issuer: Get https://acme-staging.api.letsencrypt.org/directory: tls: oversized record received with length 20291
Without a ready Issuer, I cannot proceed to generate cert-manager Certificates or utilise the ingress-shim (for automatic renewal).
What am I missing in my setup? Is it sufficient to expose the ingress controller using hostNetwork=true, or is there a better way to expose its ports 80 and 443 on a bare-metal cluster? How can I resolve the tls: oversized record received error when creating a cert-manager Issuer resource?
The tls: oversized record received error was caused by a misconfigured /etc/resolv.conf of the Kubernetes minion. It could be resolved by editing it like this:
$ sudo vi /etc/resolvconf/resolv.conf.d/base
Add nameserver list:
nameserver 8.8.8.8
nameserver 8.8.4.4
Update resolvconf:
$ sudo resolvconf -u
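After running resolvconf -u, it is worth confirming which nameservers the node actually ended up with. A small sketch that reads them out of resolv.conf text follows; parse_nameservers is an illustrative name, and the file path in the usage comment is the conventional location.

```python
def parse_nameservers(text):
    """Return the nameserver addresses listed in resolv.conf text, in file order."""
    nameservers = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines, which start with '#' or ';'.
        if not line or line[0] in "#;":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            nameservers.append(values[0])
    return nameservers


# Usage on the node:
#   with open("/etc/resolv.conf") as f:
#       print(parse_nameservers(f.read()))
```

With the fix above applied, the result should include 8.8.8.8 and 8.8.4.4.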