Kubernetes - ALB Ingress controller - CertificateNotFound - kubernetes

The Ingress controller cannot find the certificate. I get the error "failed to reconcile listeners due to failed to create listener due to CertificateNotFound".
It's an imported Let's Encrypt certificate, and describing the certificate with the AWS CLI works.
Here is the error in the controller logs:
1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to reconcile listeners due to failed to create listener due to CertificateNotFound: Certificate 'arn:aws:acm:ap-south-1:1234:certificate/f9a3a88e-481c-4c91-9d55-1234' not found\n\tstatus code: 400, request id: 811dddad-532c-4846-ae10-c69c2f4b4c3b" "controller"="alb-ingress-controller" "request"={"Namespace":"test-web-dev","Name":"test-ingress-dev"}
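One thing worth ruling out, as an assumption rather than a confirmed cause: ACM certificates are regional, so a load balancer can only use a certificate that lives in its own region. The sketch below splits the ARN from the log into its fields so the region and account can be compared with the ones the controller runs with. `parse_arn`, `region_mismatch` and `controller_region` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN of the form arn:partition:service:region:account:resource."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


def region_mismatch(arn: str, controller_region: str) -> bool:
    """True when the certificate is in a different region than the controller uses."""
    return parse_arn(arn).region != controller_region


if __name__ == "__main__":
    cert = "arn:aws:acm:ap-south-1:1234:certificate/f9a3a88e-481c-4c91-9d55-1234"
    print(parse_arn(cert))
    # controller_region is a placeholder: substitute the region the controller is configured with.
    print(region_mismatch(cert, controller_region="ap-south-1"))
```

If the regions match, the same comparison on the account field shows whether the controller's credentials belong to the account that owns the certificate.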

Related

Nginx ingress with grpc, missing :te header

I am trying to set up an nginx gRPC application in GKE following this tutorial (https://kubernetes.github.io/ingress-nginx/examples/grpc), but I get the following error when I add the nginx.ingress.kubernetes.io/backend-protocol: "GRPC" annotation:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Missing :te header"
debug_error_string = "UNKNOWN:Error received from peer ipv4:34.149.11.89:443 {created_time:"2023-01-28T12:32:39.285586-08:00", grpc_status:2, grpc_message:"Missing :te header"}"

How to debug "Internal Server Error" in AWS Cloudformation (MWAA)

I am trying to deploy AWS MWAA via CloudFormation but come across:
Resource handler returned message: "Invalid request provided: Internal server error (Service: Mwaa, Status Code: 500, Request ID: 21fea850-9cf7-4977-a947-d10dd3cc1a13)" (RequestToken: f2384bd0-68a6-44d3-d57d-fed3a52f817b, HandlerErrorCode: InvalidRequest)"
on the MWAA resource itself. The other resources (LoadBalancer, Listener, etc.) are deployed correctly.
What are my options for debugging this, given that the error message is not explicit and the resource is never created (so it has no logs)?

Open edx with keycloak OIDC [SSL: CERTIFICATE_VERIFY_FAILED]

I am trying, without success, to use Keycloak to authenticate my edX users.
I followed the documentation to add a third-party auth provider and everything works fine, but when I try to authenticate a user through the Keycloak form I get the following error:
9112 [social] [user None] [ip 41.221.187.207] middleware.py:40 - Authentication failed: HTTPSConnectionPool(host='mali-id.ml', port=443): Max retries exceeded with url: /auth/realms/Mali-Id/protocol/openid-connect/token (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1125)')))
My stack is configured as follows:
Keycloak deployed on Docker with Nginx as a proxy that listens on ports 80 and 443 on a separate vm
Edx deployed by Bitnami on a GCP vm
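The "unable to get local issuer certificate" message means the Python client could not build a chain from the server certificate to a CA it trusts. A minimal sketch for reproducing that check outside Open edX, assuming the host from the log above; `build_context`, `probe` and the `cafile` path are hypothetical names, and the bundle is whatever file holds the issuing CA chain:

```python
import socket
import ssl
from typing import Optional


def build_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Verifying client context; cafile optionally adds a CA bundle to the defaults."""
    context = ssl.create_default_context()
    if cafile is not None:
        context.load_verify_locations(cafile=cafile)
    return context


def probe(host: str, port: int = 443, cafile: Optional[str] = None) -> Optional[str]:
    """Return None if the TLS handshake verifies, else the verification error text."""
    context = build_context(cafile)
    try:
        with socket.create_connection((host, port), timeout=10) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return None
    except ssl.SSLCertVerificationError as exc:
        return exc.verify_message
```

Running `probe("mali-id.ml")` from the Open edX VM should show the same verification error as the log. If passing a `cafile` that contains the issuer chain makes it return None, the missing piece is the chain, either on the Nginx side or in the client's trust store.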

Unable to connect to www.googleapis.com from GKE

I have an application running in my GKE cluster that needs access to www.googleapis.com. I also make use of Network Policy to enhance security.
With a default deny-all egress policy in place, I naturally cannot connect to www.googleapis.com. I get the error:
INFO 0827 14:33:53.313241 retry_util.py] Retrying request, attempt #3...
DEBUG 0827 14:33:53.313862 http_wrapper.py] Caught socket error, retrying: timed out
DEBUG 0827 14:33:53.314035 http_wrapper.py] Retrying request to url https://www.googleapis.com/storage/v1/b?project=development&projection=noAcl&key=AIzaSyDnac<key>bmJM&fields=nextPageToken%2Citems%2Fid&alt=json&maxResults=1000 after exception timed out
I found out that the hostname www.googleapis.com resolves to the IP 216.58.207.36,
so I went ahead and created an egress entry in my Network Policy:
spec:
  egress:
  - ports:
    - port: 443
      protocol: TCP
    to:
    - ipBlock:
        cidr: 216.58.207.36/32
And now from within the Pod, I can telnet this endpoint
$ telnet googleapis.com 443
Trying 216.58.207.36...
Connected to googleapis.com.
Escape character is '^]'.
But for some reason I'm still encountering the same error:
INFO 0827 14:36:15.767508 retry_util.py] Retrying request, attempt #5...
DEBUG 0827 14:36:15.768018 http_wrapper.py] Caught socket error, retrying: timed out
DEBUG 0827 14:36:15.768128 http_wrapper.py] Retrying request to url https://www.googleapis.com/storage/v1/b?project=development&projection=noAcl&key=AIzaSyDnac<key>bmJM&fields=nextPageToken%2Citems%2Fid&alt=json&maxResults=1000 after exception timed out
However, if I delete the network policy, I can connect:
INFO 0827 14:40:24.177456 base_api.py] Body: (none)
INFO 0827 14:40:24.177595 transport.py] Attempting refresh to obtain initial access_token
WARNING 0827 14:40:24.177864 multiprocess_file_storage.py] Credentials file could not be loaded, will ignore and overwrite.
DEBUG 0827 14:40:24.177957 multiprocess_file_storage.py] Read credential file
WARNING 0827 14:40:24.178036 multiprocess_file_storage.py] Credentials file could not be loaded, will ignore and overwrite.
DEBUG 0827 14:40:24.178090 multiprocess_file_storage.py] Read credential file
WARNING 0827 14:40:24.356631 multiprocess_file_storage.py] Credentials file could not be loaded, will ignore and overwrite.
DEBUG 0827 14:40:24.356972 multiprocess_file_storage.py] Read credential file
DEBUG 0827 14:40:24.357510 multiprocess_file_storage.py] Wrote credential file /var/lib/jenkins/.gsutil/credstore2.
connect: (www.googleapis.com, 443)
send: 'GET /storage/v1/b?project=development&fields=nextPageToken%2Citems%2Fid&alt=json&projection=noAcl&maxResults=1000 HTTP/1.1\r\nHost: www.googleapis.com\r\ncontent-length: 0\r\nauthorization: REDACTED
My Network Policy allows ALL ingress traffic by default
ingress:
- {}
podSelector: {}
Any idea what I might be missing here? Is there some other IP address that I need to whitelist in this case?
EDIT
When the network Policy is in place, I did a test using curl and I get
* Trying 2a00:1450:4001:80b::200a...
* TCP_NODELAY set
* Immediate connect fail for 2a00:1450:4001:80b::200a: Cannot assign requested address
* Trying 2a00:1450:4001:80b::200a...
* TCP_NODELAY set
* Immediate connect fail for 2a00:1450:4001:80b::200a: Cannot assign requested address
* Trying 2a00:1450:4001:80b::200a...
* TCP_NODELAY set
* Immediate connect fail for 2a00:1450:4001:80b::200a: Cannot assign requested address
* Trying 2a00:1450:4001:80b::200a...
* TCP_NODELAY set
* Immediate connect fail for 2a00:1450:4001:80b::200a: Cannot assign requested address
* Trying 2a00:1450:4001:80b::200a...
* TCP_NODELAY set
* Immediate connect fail for 2a00:1450:4001:80b::200a: Cannot assign requested address
* Trying 2a00:1450:4001:80b::200a...
* TCP_NODELAY set
* Immediate connect fail for 2a00:1450:4001:80b::200a: Cannot assign requested address
This does not happen when the Network Policy is deleted.
The comment from @mensi is correct: there are multiple IPs behind www.googleapis.com. You can see that, for example, by pinging the hostname several times; you will most likely get a different IP each time.
The easiest solution would be to allow all egress by default with:
spec:
  podSelector: {}
  egress:
  - {}
  policyTypes:
  - Egress
You could also try allowing all of the Google APIs' public IP ranges, but Google does not seem to publish a list of those (only the ranges for restricted.googleapis.com and private.googleapis.com), so that might be a bit tougher.
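To see why a single /32 entry is not enough, the addresses a client may be handed can be checked against the CIDR blocks in the policy. A minimal sketch using only the standard library; `uncovered` is a hypothetical helper, and the address list is taken from the question's own telnet and curl output rather than from a live lookup:

```python
import ipaddress
from typing import Iterable, List


def uncovered(addresses: Iterable[str], allowed_cidrs: Iterable[str]) -> List[str]:
    """Return the addresses that no allowed CIDR block contains."""
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    missing = []
    for text in addresses:
        address = ipaddress.ip_address(text)
        # An IPv6 address is never inside an IPv4 network, so it is reported as missing.
        if not any(address in network for network in networks):
            missing.append(text)
    return missing


if __name__ == "__main__":
    seen = ["216.58.207.36", "2a00:1450:4001:80b::200a"]
    print(uncovered(seen, ["216.58.207.36/32"]))
```

For a live check, the addresses returned by `socket.getaddrinfo("www.googleapis.com", 443)` can be passed to `uncovered`. Repeating the lookup over time shows whether the set stays within the allowed blocks.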

spring cloud gateway throws Load balancer does not have available server for client

The setup is Eureka + Gateway + BackendServerA + BackendServerB. After BackendServerB goes down, the gateway throws an exception:
com.netflix.zuul.exception.ZuulException: Forwarding error
Caused by: com.netflix.client.ClientException: Load balancer does not have available server for client: service-B
Even after I manually reboot BackendServerB, the gateway still throws the same error and returns an HTTP 500. However, if I start the other 3 servers before the Gateway, it works properly. I used an application.properties file to configure the routes.