cert-manager: Let's Encrypt refuses ACME account - Kubernetes

I followed the cert-manager tutorial to enable TLS in my k3s cluster, so I modified the letsencrypt-staging issuer file to look like this:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: mail@example.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-staging
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: traefik
but when I deploy it, I get the error Failed to verify ACME account: Get "https://acme-staging-v02.api.letsencrypt.org/directory": read tcp 10.42.0.96:45732->172.65.46.172:443: read: connection reset by peer. But that's only with the staging ClusterIssuer; the production example from the tutorial works flawlessly. I researched this error and it seems to be something to do with the Kubernetes DNS, but I don't know how to test the DNS or any other way to figure this error out.
Tested the Kubernetes DNS and it is up and running, so it must be an error with cert-manager, especially because the prod certificate's status says Ready=True.
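For anyone needing to run the same check: one way to test DNS resolution and outbound connectivity from inside the cluster is a throwaway pod (the busybox and curlimages/curl images are just convenient choices, not something from the tutorial):

# resolve the ACME host from inside the cluster
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.36 -- nslookup acme-staging-v02.api.letsencrypt.org
# fetch the ACME directory from inside the cluster (the image's entrypoint is curl)
kubectl run acmetest --rm -it --restart=Never --image=curlimages/curl -- -sv https://acme-staging-v02.api.letsencrypt.org/directory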

So it seems like I ran into a Let's Encrypt rate limit. After waiting for a day, the certificate now works.
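If you want to confirm a rate limit rather than wait, the Order and Challenge resources usually surface the ACME error message; a minimal check (resource names will differ in your cluster):

kubectl get orders,challenges -A
kubectl describe order <order-name> -n <namespace>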

Related

cert-manager fails to create certificate in K8s cluster with Istio and LetsEncrypt

I have a Kubernetes cluster with Istio installed and I want to secure the gateway with TLS using cert-manager.
So, I deployed a cert-manager, issuer and certificate as per this tutorial: https://github.com/tetratelabs/istio-weekly/blob/main/istio-weekly/003/demo.md
(to a cluster reachable via my domain)
But, the TLS secret does not get created - only what seems to be a temporary one with a random string appended: my-domain-com-5p8rd
The cert-manager Pod has these 2 lines spammed in the logs:
W0208 19:30:20.548725 1 reflector.go:424] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: failed to list *v1.Challenge: the server could not find the requested resource (get challenges.acme.cert-manager.io)
E0208 19:30:20.548785 1 reflector.go:140] k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169: Failed to watch *v1.Challenge: failed to list *v1.Challenge: the server could not find the requested resource (get challenges.acme.cert-manager.io)
Now, I don't understand why it's trying to reach "challenges.acme.cert-manager.io", because my Issuer resource has spec.acme.server: https://acme-v02.api.letsencrypt.org/directory
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-prod
  namespace: istio-system
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: removed@my.domain.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - selector: {}
      http01:
        ingress:
          class: istio
then
kubectl get certificate -A
shows the certificate READY = False
kubectl describe certificaterequest -A
returns
Status:
  Conditions:
    Last Transition Time:  2023-02-08T18:09:55Z
    Message:               Certificate request has been approved by cert-manager.io
    Reason:                cert-manager.io
    Status:                True
    Type:                  Approved
    Last Transition Time:  2023-02-08T18:09:55Z
    Message:               Waiting on certificate issuance from order istio-system/my--domain-com-jl6gm-3167624428: "pending"
    Reason:                Pending
    Status:                False
    Type:                  Ready
Events:  <none>
Notes:
- The cluster does not have a Load Balancer, so I expose the ingress-gateway with NodePort(s).
- accessing the https://my.domain.com/.well-known/acme-challenge/
- the cluster is installed with kubeadm
- cluster networking is done via Calico
- http01 challenge
Thanks.
Figured this out.
Turns out, the 'get challenges.acme.cert-manager.io' is not an HTTP GET, but rather a resource GET within the K8s cluster.
There is a 'challenges.acme.cert-manager.io' CustomResourceDefinition in cert-manager.yml.
Running this command
kubectl get crd -A
returns a list of all CustomResourceDefinitions, and this one was missing.
I copied it out from cert-manager.yml to a separate file and applied it manually - suddenly the challenge got created, and so did the secret.
Why it didn't get applied with everything else in cert-manager.yml is beyond me.
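For reference, a minimal sketch of checking for the missing CRD and re-applying the manifest (cert-manager.yml is the file from the question):

# check whether the Challenge CRD is registered
kubectl get crd challenges.acme.cert-manager.io
# if it is missing, re-apply the manifest; server-side apply can help when
# a CRD is too large for client-side apply to handle
kubectl apply --server-side -f cert-manager.yml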

Waiting for HTTP-01 challenge propagation: wrong status code '404', expected '200'

I just set up cert-manager on Kubernetes on GCP, but when I check my logs I get this error:
cert-manager/challenges "msg"="propagation check failed" "error"="wrong status code '404', expected '200'" "dnsName"="api.lumiwealth.com" "resource_kind"="Challenge" "resource_name"="test-certificate-h4m8c-1804713970-576085961" "resource_namespace"="backend" "resource_version"="v1" "type"="HTTP-01"
From what I can tell, the issue is that the ingress that gets created does not have access to the external internet. I confirmed this by running this in a terminal:
curl http://api.lumiwealth.com/.well-known/acme-challenge/vhoLg-lNAgXAwEJlknfBbRlYuKuHBakgeG_d40c09Zk
Which returns:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Error</title>
</head>
<body>
<pre>Cannot GET /.well-known/acme-challenge/vhoLg-lNAgXAwEJlknfBbRlYuKuHBakgeG_d40c09Zk</pre>
</body>
</html>
Here are my YAML files:
Issuer:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: "rob@lumiwealth.com"
    privateKeySecretRef:
      name: letsencrypt-prod
    server: "https://acme-v02.api.letsencrypt.org/directory"
    solvers:
    - http01:
        ingress:
          class: ingress-gce
Test certificate:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: test-certificate
  namespace: backend
spec:
  secretName: certificate-test
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - api.lumiwealth.com
When I kubectl apply the certificate it creates an ingress in GCP that looks like this (but doesn't seem to have network access? not sure how it could have possibly gotten the IP address from my DNS)
Any ideas what I'm missing?
I believe the issue is a routing issue rather than a network issue.
When you query
curl http://api.lumiwealth.com/.well-known/acme-challenge/vhoLg-lNAgXAwEJlknfBbRlYuKuHBakgeG_d40c09Zk
it does indeed leave your machine and reach the cluster over the broader internet; the request then tries to access the challenge file within the cluster.
Would you kindly check the output of
kubectl get challenges -A
to make sure that there is only one set of challenges? If there are more, you may want to delete everything and start over.
So all you have to do is modify your ingress routes to capture the route
/.well-known/acme-challenge/*
and route it to the ACME solver pod/service within your cluster.
The basic troubleshooting steps for HTTP-01, from the docs (a command sketch follows the list):
- You can access the URL from the public internet.
- The ACME solver pod is up and running.
- Use kubectl describe ingress to check the status of the HTTP01 solver ingress (unless you use acme.cert-manager.io/http01-edit-in-place, in which case check the same ingress as your domain).
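A short sketch of those checks (the backend namespace is from the question; the solver label is cert-manager's usual convention, so treat it as an assumption):

# is the solver pod running?
kubectl get pods -n backend -l acme.cert-manager.io/http01-solver=true
# did cert-manager create a solver ingress, and does it have an address?
kubectl describe ingress -n backend
# what do the Challenge resources report?
kubectl get challenges -A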

Can get TLS certificates from cert-manager/letsencrypt for either testing or production environments in Kubernetes, but not both

I wrote a bash script to automate the deployment of an application in a kubernetes cluster using helm and kubectl. I use cert-manager to automate issuing and renewing of TLS certificates, obtained by letsencrypt, needed by the application itself.
The script can deploy the application in either one of many environments such as testing (test) and production (prod) using different settings and manifests as needed. For each environment I create a separate namespace and deploy the needed resources in it. In production I use the letsencrypt production server (spec.acme.server: https://acme-v02.api.letsencrypt.org/directory) whereas, in any other env such as testing, I use the staging server (spec.acme.server: https://acme-staging-v02.api.letsencrypt.org/directory). The hostnames I request the certificates for are a different set depending on the environment: xyz.test.mysite.tld in testing vs xyz.mysite.tld in production. I provide the same contact e-mail address for all environments.
Here is the full manifest of the letsencrypt issuer for testing:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: operations@mysite.tld
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging-issuer-private-key
    solvers:
    - http01:
        ingress:
          class: public-test-it-it
And here is the full manifest of the letsencrypt issuer for production:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    email: operations@mysite.tld
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-production-issuer-private-key
    solvers:
    - http01:
        ingress:
          class: public-prod-it-it
When I deploy the application the first time, either in the test or prod environment, everything works as expected: cert-manager gets the TLS certificates signed by letsencrypt (the staging or production server respectively) and stores them in secrets.
But when I deploy the application in another environment (so that I have both test and prod running in parallel), cert-manager can't get the certificates signed anymore, and the chain certificaterequest->order->challenge stops at the challenge step with the following output:
kubectl describe challenge xyz-tls-certificate
...
Status:
  Presented:   true
  Processing:  true
  Reason:      Waiting for HTTP-01 challenge propagation: wrong status code '404', expected '200'
  State:       pending
Events:        <none>
and I can verify that indeed I get a 404 when trying to curl any of the challenges' URLs:
curl -v http://xyz.test.mysite.tld/.well-known/acme-challenge/IECcFDmQF_fzGKcA9hJvFGEWRjDCAE_fs8dnBXlr_wY
* Trying vvv.xxx.yyy.zzz:80...
* Connected to xyz.test.mysite.tld (vvv.xxx.yyy.zzz) port 80 (#0)
> GET /.well-known/acme-challenge/IECcFDmQF_fzGKcA9hJvFGEWRjDCAE_fs8dnBXlr_wY HTTP/1.1
> Host: xyz.test.mysite.tld
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< date: Thu, 21 Jul 2022 09:48:08 GMT
< content-length: 21
< content-type: text/plain; charset=utf-8
<
* Connection #0 to host xyz.test.mysite.tld left intact
default backend - 404
So letsencrypt can't access the challenges' URLs and won't sign the TLS certs.
I tried to debug the 404 error and found that I can successfully curl the pods and services backing the challenges from another pod running in the cluster/namespace, but I get 404s from the outside world. This seems like an issue with the ingress controller (haproxytech/kubernetes-ingress in my case), but I can't explain why the mechanism worked upon first deployment and then not anymore.
I inspected the cert-manager logs and found lines such:
kubectl logs -n cert-manager cert-manager-...
I0721 13:27:45.517637 1 ingress.go:99] cert-manager/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="xyz.test.mysite.tld" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-8668s" "related_resource_namespace"="app-test-it-it" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="xyz-tls-certificate-hwvjf-2516368856-1193545890" "resource_namespace"="app-test-it-it" "resource_version"="v1" "type"="HTTP-01"
E0721 13:27:45.527238 1 sync.go:186] cert-manager/challenges "msg"="propagation check failed" "error"="wrong status code '404', expected '200'" "dnsName"="xyz.test.mysite.tld" "resource_kind"="Challenge" "resource_name"="xyz-tls-certificate-hwvjf-2516368856-1193545890" "resource_namespace"="app-test-it-it" "resource_version"="v1" "type"="HTTP-01"
which seems to confirm that cert-manager could self-check, from within the cluster, that the challenges' URLs are in place, but those are not reachable from the outside world (propagation check failed).
It seems like cert-manager set up the challenges' pods/services/ingresses all right, but requests sent to the challenges' URLs are not routed to the backing pods/services. And this happens only the second time I deploy the app.
I also verified that, after issuing the certificates upon the first deployment, cert-manager (correctly) removed all related pods/services/ingresses from the related namespace, so there should not be any conflict from duplicated challenge resources.
I restate here that the certificates are issued flawlessly the first time I deploy the application, either in test or prod environment, but they won't be issued anymore if I deploy the app again in a different environment.
Any idea why this is the case?
I finally found out what the issue was.
Basically, I was installing a separate HAProxy ingress controller (haproxytech/kubernetes-ingress) per environment (test/prod), so each namespace had its own ingress controller, which I referenced in my manifests.
This should have worked in principle, but it turned out cert-manager could not reference the right ingress controller when setting up the letsencrypt challenges.
The solution consisted in creating a single HAProxy ingress controller (in its own separate namespace) to serve the whole cluster and be referenced by all other environments/namespaces. This way the challenges for both the testing and production environments were correctly set up by cert-manager and verified by letsencrypt, which signed the required certificates.
In the end, I highly recommend using a single HAProxy ingress controller per cluster, installed in its own namespace.
This configuration is less redundant and eliminates potential issues such as the one I faced.
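A minimal sketch of that single-controller setup with Helm (the haproxy-controller namespace name is an assumption; the repo and chart names are the haproxytech defaults):

helm repo add haproxytech https://haproxytech.github.io/helm-charts
helm repo update
# one controller for the whole cluster, in its own namespace
helm install haproxy-ingress haproxytech/kubernetes-ingress \
  --namespace haproxy-controller --create-namespace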

Letsencrypt/Cert Manager workflow for apps served through Istio VirtualService/Gateway

Is there a common (or any) workflow to issue and renew LE certificates for apps configured in an Istio VirtualService & Gateway? The Istio docs only cover an Ingress use case, and I don't think they cover handling renewals.
My real world use case is about making this work with a wildcard cert and custom applications, but for the sake of simplicity, I want to figure this out using the Prometheus service installed with the Istio demo. The VirtualService and Gateway are necessary for my real world use case.
Here is how I am currently serving Prometheus over HTTPS with a self-signed cert. I am running Istio version 1.5.2 on GKE, K8s version 1.15.11. Cert Manager is installed as well.
So how would I adapt this to use Cert Manager for issuing and renewing an LE cert for prom.example.com?
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: prometheus-gateway
  #namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: http-prom
      protocol: HTTPS
    hosts:
    - "prom.example.com"
    tls:
      mode: SIMPLE # enables HTTPS on this port
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
      privateKey: /etc/istio/ingressgateway-certs/tls.key
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: prometheus-vs
spec:
  hosts:
  - "prom.example.com"
  gateways:
  - prometheus-gateway
  http:
  - match:
    - port: 443
    route:
    - destination:
        host: prometheus
        port:
          number: 9090
TL;DR
Configure cert-manager with DNS domain verification to issue the certificate; renewal is handled automatically.
A few notes on the example in the Istio docs that will hopefully clarify the workflow:
- cert-manager knows nothing about Istio; its key role is to issue and renew certificates, then save them to a Secret object in Kubernetes.
- LE ACME verification is typically done with DNS, e.g. AWS Route53.
- The issued certificate Secret will live in a specific k8s namespace and is not visible outside it.
- Istio knows nothing about cert-manager; all it needs is the issued certificate secret, which is configured in the gateway with SDS. This means two things:
  - The name of the SDS secret must match the one cert-manager produces (this is the only link between them).
  - The secret must be in the same namespace as the Istio gateway.
Finally, your VirtualServices just need a gateway that is configured properly as above. The good news is that a VirtualService can link to a gateway in any namespace if you use the fully qualified name.
So you can have your gateway(s) in the same namespace where you issue the Certificate object, to avoid copying secrets around; your VirtualServices can then be in any namespace, as long as they use the full gateway name.
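A minimal sketch of that wiring, using the current cert-manager.io/v1 API and assuming a ClusterIssuer named letsencrypt-prod already exists (the Certificate and secret names here are hypothetical; the key point is that the Gateway's credentialName matches the Certificate's secretName and both live in the gateway's namespace):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: prom-example-com
  namespace: istio-system   # same namespace as the ingress gateway
spec:
  secretName: prom-example-com-tls   # the SDS secret the Gateway will reference
  issuerRef:
    name: letsencrypt-prod   # hypothetical ClusterIssuer
    kind: ClusterIssuer
  dnsNames:
  - prom.example.com
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: prometheus-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https-prom
      protocol: HTTPS
    hosts:
    - "prom.example.com"
    tls:
      mode: SIMPLE
      credentialName: prom-example-com-tls   # must match secretName above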
There is an example of this in the Istio documentation:
This example demonstrates the use of Istio as a secure Kubernetes Ingress controller with TLS certificates issued by Let’s Encrypt.
You will start with a clean Istio installation, create an example service, expose it using the Kubernetes Ingress resource and get it secured by instructing cert-manager (bundled with Istio) to manage issuance and renewal of TLS certificates that will be further delivered to the Istio ingress gateway and hot-swapped as necessary via the means of Secrets Discovery Service (SDS).
Hope it helps.

No matches for kind ClusterIssuer on a Digital Ocean Kubernetes Cluster

I have been following this guide to create an nginx-ingress, which works fine.
Next I want to create a ClusterIssuer object called letsencrypt-staging and use the Let's Encrypt staging server, but I get this error.
kubectl create -f staging_issuer.yaml
error: unable to recognize "staging_issuer.yaml": no matches for kind
"ClusterIssuer" in version "certmanager.k8s.io/v1alpha1"
I have searched for solutions but can't find anything that works for me or that I can understand. What I found was mostly bug reports.
Here is the YAML file I used to create the ClusterIssuer.
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: your_email_address_here
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-staging
    # Enable the HTTP-01 challenge provider
    http01: {}
Try following this link; LetsEncrypt has announced that it will block all traffic from cert-manager versions < 0.8.0. Hence you can use Jetstack's installation steps, and then follow this link for TLS certificate creation. It has worked for me.
Let me know if you face issues.
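A sketch of that installation path with a current release (the version tag below is only an example, pick the latest; note that newer versions use apiVersion cert-manager.io/v1 instead of certmanager.k8s.io/v1alpha1):

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
# confirm the CRDs, including ClusterIssuer, are now registered
kubectl get crd | grep cert-manager.io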
I fixed the problem by running helm del --purge cert-manager
and then
helm install --name cert-manager --namespace kube-system stable/cert-manager --set createCustomResource=true
Sometimes it is the whitespace used in the .yaml file. Ensure that you are not using tabs instead of spaces. You can delete the line that is showing the error (the kind line, or whichever it is) and retype it, separating with the space bar instead of tabs.
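A quick way to spot stray tabs (GNU grep syntax; BSD/macOS grep lacks -P):

grep -nP '\t' staging_issuer.yaml   # prints any line containing a literal tab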