How to automate Let's Encrypt certificate renewal in Kubernetes with cert-manager on a bare-metal cluster?

I would like to expose services in my bare-metal Kubernetes cluster through an Nginx Ingress Controller that handles TLS termination. To automate certificate renewal, I would like to use the Kubernetes add-on cert-manager, which is kube-lego's successor.
What I have done so far:
Set up a Kubernetes (v1.9.3) cluster on bare metal (1 master, 1 minion, both running Ubuntu 16.04.4 LTS) with kubeadm and flannel as the pod network, following this guide.
Installed nginx-ingress (chart version 0.9.5) with the Kubernetes package manager Helm:
helm install --name nginx-ingress --namespace kube-system stable/nginx-ingress --set controller.hostNetwork=true,rbac.create=true,controller.service.type=ClusterIP
Installed cert-manager (chart version 0.2.2) with Helm:
helm install --name cert-manager --namespace kube-system stable/cert-manager --set rbac.create=true
The Ingress Controller is exposed successfully and works as expected when I test with an Ingress resource. For proper Let's Encrypt certificate management and automatic renewal with cert-manager, I first of all need an Issuer resource. I created it from this acme-staging-issuer.yaml:
apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt-staging
  namespace: default
spec:
  acme:
    server: https://acme-staging.api.letsencrypt.org/directory
    email: email@example.com
    privateKeySecretRef:
      name: letsencrypt-staging
    http01: {}
kubectl create -f acme-staging-issuer.yaml runs successfully but kubectl describe issuer/letsencrypt-staging gives me:
Status:
  Acme:
    Uri:
  Conditions:
    Last Transition Time:  2018-03-05T21:29:41Z
    Message:               Failed to verify ACME account: Get https://acme-staging.api.letsencrypt.org/directory: tls: oversized record received with length 20291
    Reason:                ErrRegisterACMEAccount
    Status:                False
    Type:                  Ready
Events:
  Type     Reason                Age               From                     Message
  ----     ------                ----              ----                     -------
  Warning  ErrVerifyACMEAccount  1s (x11 over 7s)  cert-manager-controller  Failed to verify ACME account: Get https://acme-staging.api.letsencrypt.org/directory: tls: oversized record received with length 20291
  Warning  ErrInitIssuer         1s (x11 over 7s)  cert-manager-controller  Error initializing issuer: Get https://acme-staging.api.letsencrypt.org/directory: tls: oversized record received with length 20291
Without a ready Issuer, I cannot proceed to generate cert-manager Certificates or utilise the ingress-shim (for automatic renewal).
What am I missing in my setup? Is it sufficient to expose the ingress controller using hostNetwork=true, or is there a better way to expose its ports 80 and 443 on a bare-metal cluster? How can I resolve the tls: oversized record received error when creating a cert-manager Issuer resource?

The tls: oversized record received error was caused by a misconfigured /etc/resolv.conf on the Kubernetes minion. It was resolved by editing the resolvconf base file:
$ sudo vi /etc/resolvconf/resolv.conf.d/base
Add nameserver list:
nameserver 8.8.8.8
nameserver 8.8.4.4
Update resolvconf:
$ sudo resolvconf -u
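To confirm that cluster DNS works after the change, a one-off pod can try resolving the ACME endpoint (a sketch; the pod name and busybox image tag are arbitrary):

# Run a throwaway pod that resolves the ACME staging host, then cleans itself up
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup acme-staging.api.letsencrypt.org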

Related

cert-manager fails to create certificate in K8s cluster with Istio and LetsEncrypt

I have a Kubernetes cluster with Istio installed, and I want to secure the gateway with TLS using cert-manager.
So I deployed cert-manager, an issuer, and a certificate as per this tutorial: https://github.com/tetratelabs/istio-weekly/blob/main/istio-weekly/003/demo.md
(to a cluster reachable via my domain)
But the TLS secret does not get created - only what seems to be a temporary one with a random string appended: my-domain-com-5p8rd
The cert-manager Pod has these 2 lines spammed in the logs:
W0208 19:30:20.548725 1 reflector.go:424] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: failed to list *v1.Challenge: the server could not find the requested resource (get challenges.acme.cert-manager.io)
E0208 19:30:20.548785 1 reflector.go:140] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: Failed to watch *v1.Challenge: failed to list *v1.Challenge: the server could not find the requested resource (get challenges.acme.cert-manager.io)
Now, I don't understand why it's trying to reach "challenges.acme.cert-manager.io", because my Issuer resource has spec.acme.server: https://acme-v02.api.letsencrypt.org/directory
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-prod
  namespace: istio-system
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: removed@my.domain.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - selector: {}
      http01:
        ingress:
          class: istio
Then
kubectl get certificate -A
shows the certificate with READY = False, and
kubectl describe certificaterequest -A
returns:
Status:
  Conditions:
    Last Transition Time:  2023-02-08T18:09:55Z
    Message:               Certificate request has been approved by cert-manager.io
    Reason:                cert-manager.io
    Status:                True
    Type:                  Approved
    Last Transition Time:  2023-02-08T18:09:55Z
    Message:               Waiting on certificate issuance from order istio-system/my--domain-com-jl6gm-3167624428: "pending"
    Reason:                Pending
    Status:                False
    Type:                  Ready
Events:  <none>
Notes:
- The cluster does not have a load balancer, so I expose the ingress-gateway with NodePorts.
- accessing the https://my.domain.com/.well-known/acme-challenge/
- the cluster is installed with kubeadm
- cluster networking is done via Calico
- http01 challenge
Thanks.
Figured this out.
It turns out the 'get challenges.acme.cert-manager.io' is not an HTTP GET, but rather a Kubernetes API GET for a resource within the cluster.
There is a 'challenges.acme.cert-manager.io' CustomResourceDefinition in cert-manager.yml.
Running this command
kubectl get crd -A
returns a list of all CustomResourceDefinitions, but this one was missing.
I copied it out of cert-manager.yml into a separate file and applied it manually - suddenly the challenge got created, and so did the secret.
Why it didn't get applied with everything else in cert-manager.yml is beyond me.
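To check for, and restore, the missing CRD without re-applying the whole manifest, something like this should work (a sketch; the v1.11.0 release URL is an assumption, use the version matching your install):

# Check whether the Challenge CRD exists at all
kubectl get crd challenges.acme.cert-manager.io
# If it is NotFound, apply the CRDs from the matching cert-manager release
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.11.0/cert-manager.crds.yaml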

ApiVersions missing in updated Cert-manager. Cert-manager Conversion webhook for cert-manager.io/v1alpha2, Kind=ClusterIssuer failed

When I am trying to get cluster issuers in my k8s cluster, I am receiving this error message. Can someone help me troubleshoot this?
kubectl get clusterissuers
Output: Error from server: conversion webhook for cert-manager.io/v1alpha2, Kind=ClusterIssuer failed: an error on the server ("") has prevented the request from succeeding
Here's my clusterissuer.yml file:
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-dev-certificate
  namespace: cert-manager
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: my.email@org.co
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-dev-certificate
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx
Here's the Kubernetes version:
kubectl version --short
Output: Client Version: v1.23.1
Server Version: v1.22.12-gke.2300
Initially, I thought the problem occurred when trying to get ClusterIssuer or Certificate resources from Cert-Manager. After more troubleshooting, I found that you cannot update directly from Cert-Manager 1.5.5 to Cert-Manager 1.9.1: some CRD resources get missed out in between, which breaks retrieval of resources defined as CRDs in the Cert-Manager package, such as Certificates and ClusterIssuers.
Solution: I downgraded my cert-manager back to 1.5.5, then followed the steps from Cert-Manager's blog on making CRDs ready for the Cert-Manager update, then updated Cert-Manager back to 1.9.1, and everything was working fine. Here are the steps that I followed (kaf is an alias for kubectl apply -f):
1. Downgrade back to Cert-Manager 1.5.5:
kaf https://github.com/cert-manager/cert-manager/releases/download/v1.5.5/cert-manager.yaml
2. Install cmctl for cert-manager:
curl -sSL -o cmctl.tar.gz https://github.com/cert-manager/cert-manager/releases/download/v1.9.1/cmctl-linux-amd64.tar.gz
tar xzf cmctl.tar.gz
mv cmctl /usr/local/bin
cmctl upgrade --help
3. Upgrade api-versions before updating cert-manager:
cmctl upgrade migrate-api-version
Now it's safe to upgrade cert-manager.
4. Upgrade to Cert-Manager 1.7.2, then 1.9.1:
kaf https://github.com/cert-manager/cert-manager/releases/download/v1.7.2/cert-manager.yaml
kaf https://github.com/cert-manager/cert-manager/releases/download/v1.9.1/cert-manager.yaml
5. Verify by fetching the CRD resources:
kubectl get clusterissuer
kubectl get certificates
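To verify that the migration actually rewrote the stored API versions, the CRD status can be inspected (a sketch; after cmctl upgrade migrate-api-version only the current version should remain):

# Show which API versions are stored for the ClusterIssuer CRD
kubectl get crd clusterissuers.cert-manager.io -o jsonpath='{.status.storedVersions}'
# Expected output after migration: ["v1"]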

How can I fix 'failed calling webhook "webhook.cert-manager.io"'?

I'm trying to set up a K3s cluster. When I had a single-master-and-agent setup, cert-manager had no issues. Now I'm trying a 2-master setup with embedded etcd. I opened TCP ports 6443 and 2379-2380 for both VMs and did the following:
VM1: curl -sfL https://get.k3s.io | sh -s server --token TOKEN --cluster-init
VM2: curl -sfL https://get.k3s.io | sh -s server --token TOKEN --server https://MASTER_IP:6443
# k3s kubectl get nodes
NAME STATUS ROLES AGE VERSION
VM1 Ready control-plane,etcd,master 130m v1.22.7+k3s1
VM2 Ready control-plane,etcd,master 128m v1.22.7+k3s1
Installing cert-manager works fine:
# k3s kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.yaml
# k3s kubectl get pods --namespace cert-manager
NAME READY STATUS
cert-manager-b4d6fd99b-c6fpc 1/1 Running
cert-manager-cainjector-74bfccdfdf-gtmrd 1/1 Running
cert-manager-webhook-65b766b5f8-brb76 1/1 Running
My manifest has the following definition:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: info@example.org
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
    - selector: {}
      http01:
        ingress: {}
Which results in the following error:
# k3s kubectl apply -f manifest.yaml
Error from server (InternalError): error when creating "manifest.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
I tried disabling both firewalls, waiting a day, and resetting and re-setting up, but the error persists. Google hasn't been much help either: the little info I can find goes over my head for the most part, and no tutorial seems to do any extra steps.
Try to specify the proper ingress class name in your Cluster Issuer, like this:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: info@example.org
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
Also, make sure that you have the cert-manager annotation and the TLS secret name specified in your Ingress, like this:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
  ...
spec:
  tls:
  - hosts:
    - domain.com
    secretName: letsencrypt-account-key
A good starting point for troubleshooting issues with the webhook can be found in the docs, e.g. there is a section for problems on GKE private clusters.
In my case, however, this didn't really solve the problem. For me the issue was that while playing around with cert-manager I happened to install and uninstall it multiple times. It turned out that just removing the namespace, e.g. kubectl delete namespace cert-manager, didn't remove the webhooks and other non-obvious resources.
Following the official guide for uninstalling cert-manager and applying the manifests again solved the issue.
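For reference, the leftover webhook configurations can also be found and removed by hand (a sketch; cert-manager-webhook is the name the official manifests use, confirm it with the get command first):

# List cluster-scoped webhook configurations that survive a namespace delete
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations
# Remove the cert-manager ones before reinstalling
kubectl delete validatingwebhookconfiguration cert-manager-webhook
kubectl delete mutatingwebhookconfiguration cert-manager-webhook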
I did this, and it worked for me:
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.8.0 \
  --set webhook.securePort=10260
Source: https://hackmd.io/@maelvls/debug-cert-manager-webhook

Nginx Ingress: service "ingress-nginx-controller-admission" not found

We created a Kubernetes cluster for a customer about one year ago with two environments, staging and production, separated by namespaces. We are currently developing the next version of the application and need an environment for this development work, so we've created a beta environment in its own namespace.
This is a bare-metal Kubernetes cluster with MetalLB and nginx-ingress. The nginx ingress controller is installed with helm, and the ingresses are created with the following manifest (namespaces are enforced by our deployment pipeline and are not visible in the manifest):
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    #ingress.kubernetes.io/ssl-redirect: "true"
    #kubernetes.io/tls-acme: "true"
    #certmanager.k8s.io/issuer: "letsencrypt-staging"
    #certmanager.k8s.io/acme-challenge-type: http01
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "X-Robots-Tag: noindex, nofollow";
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-methods: "GET, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-origin: "*"
    nginx.ingress.kubernetes.io/cors-allow-credentials: "true"
spec:
  tls:
  - hosts:
    - ${API_DOMAIN}
    secretName: api-cert
  rules:
  - host: ${API_DOMAIN}
    http:
      paths:
      - backend:
          serviceName: api
          servicePort: 80
When applying the manifest, Kubernetes responds with the following error:
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: service "ingress-nginx-controller-admission" not found
I've attempted to update the apiVersion of the ingress manifest to networking.k8s.io/v1beta1 (the apiVersion the new nginx-ingress controllers are installed with via helm), but I'm getting the same error.
My initial suspicion is that this is related to a change in nginx-ingress between the current installation and the one from a year ago, even though the ingress controllers are separated by namespaces. But I can't find any service called ingress-nginx-controller-admission in any of my namespaces, so I'm clueless about how to proceed.
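For reference, the same Ingress expressed in the current networking.k8s.io/v1 schema would look roughly like this (a sketch derived from the manifest above, annotations omitted; v1 requires pathType and the structured service backend):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
spec:
  tls:
  - hosts:
    - ${API_DOMAIN}
    secretName: api-cert
  rules:
  - host: ${API_DOMAIN}
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api
            port:
              number: 80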
I had the same problem and found a solution in another SO thread.
I had previously installed nginx-ingress using the manifests. I deleted the namespace it created, plus the clusterrole and clusterrolebinding as noted in the documentation, but that does not remove the ValidatingWebhookConfiguration, which is installed by the manifests but NOT by helm by default. As Arghya noted above, it can be enabled using a helm parameter.
Once I deleted the ValidatingWebhookConfiguration, my helm installation went flawlessly.
kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
You can check whether the validation webhook and its service exist. If they don't, double-check the deployment and add them.
kubectl get -A ValidatingWebhookConfiguration
NAME CREATED AT
ingress-nginx-admission 2020-04-22T15:01:33Z
kubectl get svc -n ingress-nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx-controller NodePort 10.96.212.217 <none> 80:32268/TCP,443:32683/TCP 2m34s
ingress-nginx-controller-admission ClusterIP 10.96.151.42 <none> 443/TCP 2m34s
Deployment yamls here have the webhook and service.
Since you have used helm to install it you can enable/disable the webhook via a helm parameter as defined here
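For illustration, in the official ingress-nginx Helm chart that toggle is controller.admissionWebhooks.enabled (a sketch; the release and namespace names here are assumptions):

# Reinstall with the admission webhook disabled
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.admissionWebhooks.enabled=false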
There seems to be some issue with the SSL cert in the webhook. Changing failurePolicy: Fail to Ignore in
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-0.32.0/deploy/static/provider/baremetal/deploy.yaml
worked for me.
For more info check:
https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/
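Instead of editing the deploy manifest, the failurePolicy can also be patched on a live cluster (a sketch; ingress-nginx-admission is the webhook configuration name these manifests create):

# Flip the first webhook's failurePolicy from Fail to Ignore in place
kubectl patch validatingwebhookconfiguration ingress-nginx-admission \
  --type=json -p='[{"op": "replace", "path": "/webhooks/0/failurePolicy", "value": "Ignore"}]'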
My problem proved to be an SSL cert issue as well. After I deleted the ValidatingWebhookConfiguration, the issue was resolved.
For me the issue was with Kubernetes version 1.18; I upgraded to 1.19.1 and it worked just fine.
Pod status
k get pods -n ingress-nginx
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-cgpj7 0/1 Completed 0 3m44s
ingress-nginx-admission-patch-mksxs 0/1 Completed 0 3m44s
ingress-nginx-controller-5fb6f67b9c-ps67k 0/1 CrashLoopBackOff 5 3m45s
Error logs from pod
I0916 07:15:34.317477 8 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
F0916 07:15:34.318721 8 main.go:107] ingress-nginx requires Kubernetes v1.19.0 or higher
After the upgrade, the controller runs fine:
k get po -n ingress-nginx
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-2tk8p 0/1 Completed 0 104s
ingress-nginx-admission-patch-nlv5w 0/1 Completed 0 104s
ingress-nginx-controller-79c4d49bb9-7bgcj 1/1 Running 0 105s
I faced this issue when working on a Kubernetes cluster.
The issue arose when I was migrating resources from one nodepool to another in a test Kubernetes cluster.
I forgot that I had not migrated the Nginx ingress and the Cert Manager out of the nodepool that I wanted to decommission. So after migrating the other applications out, I deleted the nodepool, which consequently deleted the Nginx ingress and the Cert Manager from the Kubernetes cluster.
All I had to do was redeploy the Nginx ingress and the Cert Manager to the new nodepool.

Cert-manager order is in invalid state

I'm migrating from a GitLab-managed Kubernetes cluster to a self-managed cluster. In this self-managed cluster I need to install nginx-ingress and cert-manager. I have already managed to do the same for a cluster used for review environments. I use the latest Helm 3 RC to manage this, so I won't need Tiller.
So far, I ran these commands:
# Add Helm repos locally
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm repo add jetstack https://charts.jetstack.io
# Create namespaces
kubectl create namespace managed
kubectl create namespace production
# Create cert-manager crds
kubectl apply --validate=false -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.11/deploy/manifests/00-crds.yaml
# Install Ingress
helm install ingress stable/nginx-ingress --namespace managed --version 0.26.1
# Install cert-manager with a cluster issuer
kubectl apply -f config/production/cluster-issuer.yaml
helm install cert-manager jetstack/cert-manager --namespace managed --version v0.11.0
This is my cluster-issuer.yaml:
# Based on https://docs.cert-manager.io/en/latest/reference/issuers.html#issuers
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: XXX # This is an actual email address in the real resource
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - selector: {}
      http01:
        ingress:
          class: nginx
I installed my own Helm chart named docs. All resources from the Helm chart are installed as expected. Using cURL, I can fetch the page over HTTP. Google Chrome redirects me to an HTTPS page with an invalid certificate though.
The additional following resources have been created:
$ kubectl get secrets
NAME TYPE DATA AGE
docs-tls kubernetes.io/tls 3 18m
$ kubectl get certificaterequests.cert-manager.io
NAME READY AGE
docs-tls-867256354 False 17m
$ kubectl get certificates.cert-manager.io
NAME READY SECRET AGE
docs-tls False docs-tls 18m
$ kubectl get orders.acme.cert-manager.io
NAME STATE AGE
docs-tls-867256354-3424941167 invalid 18m
It appears everything is blocked by the cert-manager order in an invalid state. Why could it be invalid? And how do I fix this?
It turns out that in addition to a correct DNS A record for @ (the apex domain), there were some AAAA records pointing to an IPv6 address I didn't recognize. Removing those records and redeploying resolved the issue for me.
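A quick way to spot such stray records is to query both record types before deploying (a sketch; example.com stands in for the real domain):

# Compare the IPv4 and IPv6 answers for the domain
dig +short A example.com
dig +short AAAA example.com
# Any AAAA answer must point at a host that serves the HTTP-01 challenge, or be removed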