cert-manager HTTP01 certificate challenge is inaccessible when rewrite-target is enabled - kubernetes

We have a dozen services exposed through an ingress-nginx controller in GKE.
To route traffic correctly on the same domain name, we need to use a rewrite-target rule.
The services had worked without any maintenance since their launch in 2019, until recently, when cert-manager suddenly stopped renewing the Let's Encrypt certificates. We "resolved" this by temporarily removing the "tls" section from the ingress definition, forcing our clients to use the http version.
After that we removed all traces of cert-manager and attempted to set it up from scratch.
Now cert-manager creates the certificate signing request, spawns an acme http solver pod and adds it to the ingress; however, when I access its url it returns an empty response instead of the expected token.
This has to do with the rewrite-target annotation, which breaks the routing of the acme challenge.
What puzzles me the most is that this used to work before. (It was set up by a former employee.)
Disabling rewrite-target is unfortunately not an option, because the routing would stop working correctly.
Using dns01 won't work either, because our ISP does not support programmatic changes to DNS records.
Is there a way to make this work without disabling rewrite-target?
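For context, the relevant ingress looks roughly like this (host, service names and paths are placeholders, not our real config):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: services-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/use-regex: "true"
    # a catch-all rewrite like this is what interferes with
    # /.well-known/acme-challenge/<token> requests
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  rules:
  - host: example.mydomain.com
    http:
      paths:
      - path: /service-a(/|$)(.*)
        pathType: ImplementationSpecific
        backend:
          service:
            name: service-a
            port:
              number: 80
```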
P.S.
Here are a number of similar cases reported on GitHub:
https://github.com/cert-manager/cert-manager/issues/2826
https://github.com/cert-manager/cert-manager/issues/286
https://github.com/cert-manager/cert-manager/issues/487
None of them help.
Here's the definition of my ClusterIssuer
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: mail#domain.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx

Please share the ClusterIssuer or Issuer you are using.
ingressClass
If the ingressClass field is specified, cert-manager will create
new Ingress resources in order to route traffic to the
acmesolver pods, which are responsible for responding to ACME
challenge validation requests.
Ref : https://cert-manager.io/v0.12-docs/configuration/acme/http01/#ingressclass
Usually you won't see the HTTP solver challenge resources for long: they are created and then removed once DNS or HTTP validation succeeds.
Also, make sure your ingress doesn't have an SSL-redirect annotation; that could also be one reason the certs are not getting generated.
Did you try checking the other cert-manager objects, like the Order and CertificateRequest status? With kubectl describe challenge, are you getting a 404 there?
If you are retrying continuously, there is a chance you have hit Let's Encrypt's rate limit for certificate requests.
Troubleshooting: https://cert-manager.io/docs/faq/troubleshooting/#troubleshooting-a-failed-certificate-request

When you configure an Issuer with http01, the default serviceType is NodePort. This means it won't even go through the ingress controller. From the docs:
By default, type NodePort will be used when you don't set HTTP01 or when you set serviceType to an empty string. Normally there's no need to change this.
I'm not sure what the rest of your setup looks like, but http01 causes the acme server to make HTTP requests (not https). You need to make sure your nginx has a listener for http (port 80). The acme server does follow redirects, so you can listen on http and redirect all traffic to https; this is legitimate and works.
cert-manager creates an Ingress resource for validation, which directs traffic to the temporary solver pod. This ingress has its own set of rules, and you can control them. Try disabling or modifying the rewrite-target on this resource.
Another thing I would try is to access this URL from inside the cluster (bypassing ingress-nginx). If it works directly, then it's an ingress/networking problem; otherwise it's something else.
Please share the relevant nginx and cert-manager logs; they might be useful for debugging or understanding where your problem exists.
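If modifying the generated solver ingress by hand doesn't stick (cert-manager tends to revert manual edits), recent cert-manager versions let you attach annotations to the solver ingress it generates, via the solver's ingressTemplate field. A minimal sketch; the annotation shown is only an example, and the exact annotations you need depend on your controller configuration:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
          ingressTemplate:
            metadata:
              annotations:
                # example only: annotations listed here are merged onto
                # the cm-acme-http-solver ingress cert-manager creates
                nginx.ingress.kubernetes.io/whitelist-source-range: 0.0.0.0/0
```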

Related

Can get TLS certificates from cert-manager/letsencrypt for either testing or production environments in kubernetes, but not both

I wrote a bash script to automate the deployment of an application in a kubernetes cluster using helm and kubectl. I use cert-manager to automate issuing and renewing of TLS certificates, obtained by letsencrypt, needed by the application itself.
The script can deploy the application in either one of many environments such as testing (test) and production (prod) using different settings and manifests as needed. For each environment I create a separate namespace and deploy the needed resources in it. In production I use the letsencrypt production server (spec.acme.server: https://acme-v02.api.letsencrypt.org/directory) whereas, in any other env such as testing, I use the staging server (spec.acme.server: https://acme-staging-v02.api.letsencrypt.org/directory). The hostnames I request the certificates for are a different set depending on the environment: xyz.test.mysite.tld in testing vs xyz.mysite.tld in production. I provide the same contact e-mail address for all environments.
Here is the full manifest of the letsencrypt issuer for testing:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: operations#mysite.tld
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging-issuer-private-key
    solvers:
    - http01:
        ingress:
          class: public-test-it-it
And here is the full manifest of the letsencrypt issuer for production:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    email: operations#mysite.tld
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-production-issuer-private-key
    solvers:
    - http01:
        ingress:
          class: public-prod-it-it
When I deploy the application the first time, in either the test or prod environment, everything works as expected: cert-manager gets the TLS certificates signed by letsencrypt (staging or production server respectively) and stores them in secrets.
But when I deploy the application in another environment (so that I have both test and prod running in parallel), cert-manager can't get the certificates signed anymore, and the chain certificaterequest->order->challenge stops at the challenge step with the following output:
kubectl describe challenge xyz-tls-certificate
...
Status:
  Presented:   true
  Processing:  true
  Reason:      Waiting for HTTP-01 challenge propagation: wrong status code '404', expected '200'
  State:       pending
Events:        <none>
and I can verify that indeed I get a 404 when trying to curl any of the challenges' URLs:
curl -v http://xyz.test.mysite.tld/.well-known/acme-challenge/IECcFDmQF_fzGKcA9hJvFGEWRjDCAE_fs8dnBXlr_wY
* Trying vvv.xxx.yyy.zzz:80...
* Connected to xyz.test.mysite.tld (vvv.xxx.yyy.zzz) port 80 (#0)
> GET /.well-known/acme-challenge/IECcFDmQF_fzGKcA9hJvFGEWRjDCAE_fs8dnBXlr_wY HTTP/1.1
> Host: xyz.test.mysite.tld
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< date: Thu, 21 Jul 2022 09:48:08 GMT
< content-length: 21
< content-type: text/plain; charset=utf-8
<
* Connection #0 to host xyz.test.mysite.tld left intact
default backend - 404
So letsencrypt can't access the challenges' URLs and won't sign the TLS certs.
I tried to debug the 404 error and found that I can successfully curl the pods and services backing the challenges from another pod running in the cluster/namespace, but I get 404s from the outside world. This seems like an issue with the ingress controller (haproxytech/kubernetes-ingress in my case), but I can't explain why the mechanism worked upon first deployment and then stopped working.
I inspected the cert-manager logs and found lines such:
kubectl logs -n cert-manager cert-manager-...
I0721 13:27:45.517637 1 ingress.go:99] cert-manager/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="xyz.test.mysite.tld" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-8668s" "related_resource_namespace"="app-test-it-it" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="xyz-tls-certificate-hwvjf-2516368856-1193545890" "resource_namespace"="app-test-it-it" "resource_version"="v1" "type"="HTTP-01"
E0721 13:27:45.527238 1 sync.go:186] cert-manager/challenges "msg"="propagation check failed" "error"="wrong status code '404', expected '200'" "dnsName"="xyz.test.mysite.tld" "resource_kind"="Challenge" "resource_name"="xyz-tls-certificate-hwvjf-2516368856-1193545890" "resource_namespace"="app-test-it-it" "resource_version"="v1" "type"="HTTP-01"
which seems to confirm that cert-manager could self-check, from within the cluster, that the challenges' URLs are in place, but those are not reachable by the outside world (propagation check failed).
It seems like cert-manager set up the challenges' pods/services/ingresses all right, but then requests sent to the challenges' URLs are not routed to the backing pods/services. And this happens only the second time I deploy the app.
I also verified that, after issuing the certificates upon the first deployment, cert-manager (correctly) removed all related pods/services/ingresses from the related namespace, so there should not be any conflict from duplicated challenges' resources.
I restate here that the certificates are issued flawlessly the first time I deploy the application, either in test or prod environment, but they won't be issued anymore if I deploy the app again in a different environment.
Any idea why this is the case?
I finally found out what the issue was.
Basically, I was installing a separate HAProxy ingress controller (haproxytech/kubernetes-ingress) per environment (test/prod), so each namespace had its own ingress controller, which I referenced in my manifests.
This should have worked in principle, but it turned out cert-manager could not reference the right ingress controller when setting up the letsencrypt challenges.
The solution consisted in creating a single HAProxy ingress controller (in its own separate namespace) to serve the whole cluster and be referenced by all other environments/namespaces. This way the challenges for both the testing and production environments were correctly set up by cert-manager and verified by letsencrypt, which signed the required certificates.
In the end, I highly recommend using a single HAProxy ingress controller per cluster, installed in its own namespace.
This configuration is less redundant and eliminates potential issues such as the one I faced.
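As an illustration of that layout (the namespace names and the ingress class are hypothetical), both environments' Issuers then point at the one class served by the shared controller:

```yaml
# Issuer in the test namespace
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
  namespace: app-test-it-it
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging-issuer-private-key
    solvers:
    - http01:
        ingress:
          class: haproxy   # class of the single, cluster-wide controller
---
# Issuer in the prod namespace references the same class
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-production
  namespace: app-prod-it-it
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-production-issuer-private-key
    solvers:
    - http01:
        ingress:
          class: haproxy
```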

How to pass DNS validation for internal cluster domain for a kubernetes cert-manager ACME certificate

I run a kubernetes cluster with cert-manager installed for managing ACME (Let's Encrypt) certificates. I'm using DNS domain validation with Route 53 and it works all fine.
The problem comes when I try to issue a certificate for a cluster internal domain. In this case domain validation does not pass since the validation challenge is presented on external Route53 zone only, while cert-manager is trying to look for domain name via cluster internal DNS.
Any hints on how this can be solved are welcome.
Assuming that you don't control public DNS for your cluster-internal domain, you will not be able to obtain Let's Encrypt certificates for it.
You may however set up another issuer that will grant you certificates for this domain, e.g. the SelfSigned issuer: https://cert-manager.io/docs/configuration/selfsigned/
Then set the issuerRef of your certificate object to point to your SelfSigned issuer:
(...)
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
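For completeness, a minimal SelfSigned ClusterIssuer that such an issuerRef could point at (a sketch following the linked docs; the name is a placeholder):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  # no configuration needed: certificates are signed
  # with the private key of the certificate itself
  selfSigned: {}
```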

Kubernetes: Does auth-tls-verify-client work independent of TLS?

In Kubernetes, to enable client-certificate authN, the annotation nginx.ingress.kubernetes.io/auth-tls-verify-client can be used in an ingress. Will client-cert authN work even if I don't do TLS termination in that ingress? For instance, in this ingress, will client-cert authN still work if I remove the tls block from the ingress?
tls:
- hosts:
  - mydomain.com
  secretName: tls-secret
(More info: I have two ingresses for the same host: one has a TLS section, and the other has a rule for a specific api-path and a client-cert section but no TLS section.)
Also, if the request is sent to the http endpoint (not https), I observed that the client cert is ignored even if the annotation value is set to on. Is this documented behavior?
If you define two ingresses as described, then a certificate will be required unless you set auth-tls-verify-client to optional. See the documentation mentioned in the comments.
Also, TLS is required if you want to do client certificate authentication. The client certificate is used during the TLS handshake, which is why specifying client certificates for one ingress applies to all ingresses with the same host (e.g. www.example.com).
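Putting both points together, a sketch of a single ingress doing TLS termination plus client-cert verification (host, service and secret names are placeholders; the auth-tls-secret must hold the CA used to verify client certs):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mtls-example
  annotations:
    # require a client certificate during the TLS handshake
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
    # namespace/name of a secret containing the CA cert (key: ca.crt)
    nginx.ingress.kubernetes.io/auth-tls-secret: default/ca-secret
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - mydomain.com
    secretName: tls-secret
  rules:
  - host: mydomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80
```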

force http to https on GKE ingress cloud loadbalancer [duplicate]

Is there a way to force an SSL upgrade for incoming connections on the ingress load balancer? Or, if that is not possible, can I disable port :80? I haven't found a documentation page that outlines such an option in the YAML file. Thanks a lot in advance!
https://github.com/kubernetes/ingress-gce#frontend-https
You can block HTTP through the annotation kubernetes.io/ingress.allow-http: "false" or redirect HTTP to HTTPS by specifying a custom backend. Unfortunately GCE doesn't handle redirection or rewriting at the L7 layer directly for you, yet. (see https://github.com/kubernetes/ingress-gce#ingress-cannot-redirect-http-to-https)
Update: GCP now handles redirection rules for load balancers, including HTTP to HTTPS. There doesn't appear to be a method to create these through Kubernetes YAML yet.
This was already correctly answered by a comment on the accepted answer, but since the comment is buried, I missed it several times.
As of GKE version 1.18.10-gke.600 you can add a k8s frontend config to redirect from http to https.
https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#https_redirect
apiVersion: networking.gke.io/v1beta1
kind: FrontendConfig
metadata:
  name: ssl-redirect
spec:
  redirectToHttps:
    enabled: true
# add below to ingress
# metadata:
#   annotations:
#     networking.gke.io/v1beta1.FrontendConfig: ssl-redirect
The annotation has changed:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test
  annotations:
    kubernetes.io/ingress.allow-http: "false"
spec:
  ...
Here is the annotation change PR:
https://github.com/kubernetes/contrib/pull/1462/files
If you are not bound to the GCLB Ingress Controller, you could have a look at the Nginx Ingress Controller. This controller is different from the built-in one in multiple ways. First and foremost, you need to deploy and manage it yourself. But if you are willing to do so, you get the benefit of not depending on the GCE LB ($20/month) and gain support for IPv6 and websockets.
The documentation states:
By default the controller redirects (301) to HTTPS if TLS is enabled for that ingress. If you want to disable that behaviour globally, you can use ssl-redirect: "false" in the NGINX config map.
The recently released 0.9.0-beta.3 comes with an additional annotation for explicitly enforcing this redirect:
Force redirect to SSL using the annotation ingress.kubernetes.io/force-ssl-redirect
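Per that release note, usage is just an annotation on the ingress (a sketch; on newer controller versions the annotation prefix is nginx.ingress.kubernetes.io/ rather than ingress.kubernetes.io/):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    # force a redirect to HTTPS even when TLS is terminated
    # upstream of the controller
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ...
```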
Google has responded to our requests and is testing HTTP->HTTPS SSL redirection on their load balancers. Their latest answer said it should be in Alpha sometime before the end of January 2020.
Their comment:
Thank you for your patience on this issue. The feature is currently in testing and we expect to enter Alpha phase before the end of January. Our PM team will have an announcement with more details as we get closer to the Alpha launch.
My fingers are crossed that we'll have a straightforward solution to this very common feature in the near future.
UPDATE (April 2020):
HTTP(S) rewrites is now a Generally Available feature. It's still a bit rough around the edges and does not work out-of-the-box with the GCE Ingress Controller unfortunately. But time will tell and hopefully a native solution will appear.
A quick update: a FrontendConfig can now be created to configure the ingress. Hope it helps.
Example:
apiVersion: networking.gke.io/v1beta1
kind: FrontendConfig
metadata:
  name: my-frontend-config
spec:
  redirectToHttps:
    enabled: true
    responseCodeName: MOVED_PERMANENTLY_DEFAULT
You'll need to make sure that your load balancer supports HTTP and HTTPS
Worked on this for a long time. In case anyone isn't clear on the post above: rebuild your ingress with the annotation kubernetes.io/ingress.allow-http: "false".
Then delete your ingress and redeploy. The annotation will have the ingress create a load balancer only for 443, instead of both 443 and 80.
Then you build a Compute Engine HTTP LB yourself, not one managed by GKE.
GUI directions:
1. Create a load balancer and choose HTTP(S) Load Balancing -- Start configuration.
2. Choose "From Internet to my VMs" and continue.
3. Choose a name for the LB.
4. Leave the backend configuration blank.
5. Under Host and path rules, select Advanced host and path rules with the action set to "Redirect the client to different host/path".
6. Leave the Host redirect field blank.
7. Select Prefix Redirect and leave the Path value blank.
8. Choose 308 as the redirect response code.
9. Tick the Enable box for HTTPS redirect.
10. For the Frontend configuration, leave HTTP and port 80; for the IP address, select the static IP address being used for your GKE ingress.
11. Create this LB.
You will now have all http traffic go to this and 308 redirect to your https ingress for GKE. Super simple config setup and works well.
Note: If you just try to delete the port-80 LB that GKE makes (without doing the annotation change and rebuilding the ingress) and then add the new redirect compute LB, it does work, but you will start to see error messages on your Ingress saying error 400: invalid value for field 'resource.ipAddress', "" is in use and would result in a conflict, invalid. It is trying to spin up the port-80 LB and can't, because you already have an LB on port 80 using the same IP. It works, but the error is annoying and GKE keeps trying to build it (I think).
Thanks to the comment of #Andrej Palicka and the page he provided (https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#https_redirect), I now have an updated and working solution.
First we need to define a FrontendConfig resource and then we need to tell the Ingress resource to use this FrontendConfig.
Example:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-app-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: myapp-prd
    networking.gke.io/managed-certificates: managed-cert
    kubernetes.io/ingress.class: "gce"
    networking.gke.io/v1beta1.FrontendConfig: myapp-frontend-config
spec:
  defaultBackend:
    service:
      name: myapp-app-service
      port:
        number: 80
---
apiVersion: networking.gke.io/v1beta1
kind: FrontendConfig
metadata:
  name: myapp-frontend-config
spec:
  redirectToHttps:
    enabled: true
    responseCodeName: MOVED_PERMANENTLY_DEFAULT
You can disable HTTP on your cluster (note that you'll need to recreate your cluster for this change to be applied on the load balancer) and then set HTTP-to-HTTPS redirect by creating an additional load balancer on the same IP address.
I spent a couple of hours on the same question and ended up doing what I've just described. It works perfectly.
Redirecting to HTTPS in Kubernetes is somewhat complicated. In my experience, you'll probably want to use an ingress controller such as Ambassador or ingress-nginx to control routing to your services, as opposed to having your load balancer route directly to your services.
Assuming you're using an ingress controller, then:
If you're terminating TLS at the external load balancer and the LB is running in L7 mode (i.e., HTTP/HTTPS), then your ingress controller needs to use X-Forwarded-Proto, and issue a redirect accordingly.
If you're terminating TLS at the external load balancer and the LB is running in TCP/L4 mode, then your ingress controller needs to use the PROXY protocol to do the redirect.
You can also terminate TLS directly in your ingress controller, in which case it has all the necessary information to do the redirect.
Here's a tutorial on how to do this in Ambassador.

Is it possible to use multiple authentication types with nginx ingress?

I have some internal services (Logging, Monitoring, etc) exposed via nginx-ingress and protected via oauth2-proxy and some identity manager (Okta) behind. We use 2fa for additional security for our users.
This works great for user accounts. It does not work for other systems, like external monitoring, since we cannot make a request with a token or basic-auth credentials.
Is there any known solution to enable multiple authentication types in an ingress resource?
Everything I found so far is specific for one authentication process and trying to add basic auth as well did not work.
Current ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    certmanager.k8s.io/cluster-issuer: cert-manager-extra-issuer
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/auth-signin: https://sso-proxy/oauth2/start?rd=https://$host$request_uri
    nginx.ingress.kubernetes.io/auth-url: https://sso-proxy/oauth2/auth
This is simply not possible: you cannot use multiple authentication types in a single Ingress resource.
The better way to deal with it is to create separate Ingresses for the different authentication types.
I hope this helps.
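As a sketch of that approach (hostnames, paths, service and secret names are placeholders): one ingress keeps the oauth2-proxy annotations for human users, while a second ingress for the same host covers the machine-facing path with basic auth.

```yaml
# Ingress 1: human-facing paths behind oauth2-proxy
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: monitoring-sso
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/auth-signin: https://sso-proxy/oauth2/start?rd=https://$host$request_uri
    nginx.ingress.kubernetes.io/auth-url: https://sso-proxy/oauth2/auth
spec:
  rules:
  - host: monitoring.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
---
# Ingress 2: machine-facing path behind basic auth
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: monitoring-basic-auth
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/auth-type: basic
    # secret with an htpasswd-style "auth" key in the same namespace
    nginx.ingress.kubernetes.io/auth-secret: monitoring-htpasswd
spec:
  rules:
  - host: monitoring.example.com
    http:
      paths:
      - path: /api/probe
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
```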