Unable to connect to www.googleapis.com from GKE - kubernetes

I have an application running in my GKE cluster that needs access to www.googleapis.com. I also use Network Policy to enhance security.
With a default deny-all rule for egress traffic in place, I naturally cannot connect to www.googleapis.com. I get this error:
INFO 0827 14:33:53.313241 retry_util.py] Retrying request, attempt #3...
DEBUG 0827 14:33:53.313862 http_wrapper.py] Caught socket error, retrying: timed out
DEBUG 0827 14:33:53.314035 http_wrapper.py] Retrying request to url https://www.googleapis.com/storage/v1/b?project=development&projection=noAcl&key=AIzaSyDnac<key>bmJM&fields=nextPageToken%2Citems%2Fid&alt=json&maxResults=1000 after exception timed out
I found out that the hostname www.googleapis.com resolves to the IP 216.58.207.36,
so I went ahead and created an egress entry in my Network Policy:
spec:
  egress:
  - ports:
    - port: 443
      protocol: TCP
    to:
    - ipBlock:
        cidr: 216.58.207.36/32
Now, from within the Pod, I can telnet to this endpoint:
$ telnet googleapis.com 443
Trying 216.58.207.36...
Connected to googleapis.com.
Escape character is '^]'.
But for some reason I'm still encountering the same error:
INFO 0827 14:36:15.767508 retry_util.py] Retrying request, attempt #5...
DEBUG 0827 14:36:15.768018 http_wrapper.py] Caught socket error, retrying: timed out
DEBUG 0827 14:36:15.768128 http_wrapper.py] Retrying request to url https://www.googleapis.com/storage/v1/b?project=development&projection=noAcl&key=AIzaSyDnac<key>bmJM&fields=nextPageToken%2Citems%2Fid&alt=json&maxResults=1000 after exception timed out
However, if I delete the network policy, I can connect:
INFO 0827 14:40:24.177456 base_api.py] Body: (none)
INFO 0827 14:40:24.177595 transport.py] Attempting refresh to obtain initial access_token
WARNING 0827 14:40:24.177864 multiprocess_file_storage.py] Credentials file could not be loaded, will ignore and overwrite.
DEBUG 0827 14:40:24.177957 multiprocess_file_storage.py] Read credential file
WARNING 0827 14:40:24.178036 multiprocess_file_storage.py] Credentials file could not be loaded, will ignore and overwrite.
DEBUG 0827 14:40:24.178090 multiprocess_file_storage.py] Read credential file
WARNING 0827 14:40:24.356631 multiprocess_file_storage.py] Credentials file could not be loaded, will ignore and overwrite.
DEBUG 0827 14:40:24.356972 multiprocess_file_storage.py] Read credential file
DEBUG 0827 14:40:24.357510 multiprocess_file_storage.py] Wrote credential file /var/lib/jenkins/.gsutil/credstore2.
connect: (www.googleapis.com, 443)
send: 'GET /storage/v1/b?project=development&fields=nextPageToken%2Citems%2Fid&alt=json&projection=noAcl&maxResults=1000 HTTP/1.1\r\nHost: www.googleapis.com\r\ncontent-length: 0\r\nauthorization: REDACTED
My Network Policy allows ALL ingress traffic by default
ingress:
- {}
podSelector: {}
Any idea what I might be missing here? Is there some other IP address that I need to whitelist in this case?
EDIT
When the network Policy is in place, I did a test using curl and I get
* Trying 2a00:1450:4001:80b::200a...
* TCP_NODELAY set
* Immediate connect fail for 2a00:1450:4001:80b::200a: Cannot assign requested address
* Trying 2a00:1450:4001:80b::200a...
* TCP_NODELAY set
* Immediate connect fail for 2a00:1450:4001:80b::200a: Cannot assign requested address
* Trying 2a00:1450:4001:80b::200a...
* TCP_NODELAY set
* Immediate connect fail for 2a00:1450:4001:80b::200a: Cannot assign requested address
* Trying 2a00:1450:4001:80b::200a...
* TCP_NODELAY set
* Immediate connect fail for 2a00:1450:4001:80b::200a: Cannot assign requested address
* Trying 2a00:1450:4001:80b::200a...
* TCP_NODELAY set
* Immediate connect fail for 2a00:1450:4001:80b::200a: Cannot assign requested address
* Trying 2a00:1450:4001:80b::200a...
* TCP_NODELAY set
* Immediate connect fail for 2a00:1450:4001:80b::200a: Cannot assign requested address
This does not happen when the Network Policy is deleted.

The comment from @mensi is correct: there are multiple IPs behind www.googleapis.com. You can see that, for example, by pinging the hostname multiple times; you'll most likely get a different IP every time.
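To see how many addresses sit behind the name without pinging repeatedly, the resolver can be queried directly. The following is a minimal sketch using Python's standard socket module; the helper names unique_addresses and resolve_all are made up for illustration, and the addresses returned will vary by location and over time.

```python
import socket


def unique_addresses(addrinfo):
    """Return the distinct IPs in a getaddrinfo()-style result, in first-seen order."""
    seen = []
    for _family, _type, _proto, _canonname, sockaddr in addrinfo:
        ip = sockaddr[0]  # sockaddr is (ip, port) for IPv4 and (ip, port, flow, scope) for IPv6
        if ip not in seen:
            seen.append(ip)
    return seen


def resolve_all(hostname, port=443):
    """Resolve hostname once and return every distinct IPv4/IPv6 address in the answer."""
    return unique_addresses(
        socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    )

# Usage (needs working DNS from inside the Pod):
#   resolve_all("www.googleapis.com")
```

Calling resolve_all several times, or from different nodes, will typically show that a single /32 ipBlock cannot cover the name. It can also return IPv6 addresses such as the one in the curl output above.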
The easiest solution would be to allow all egress by default with:
spec:
  podSelector: {}
  egress:
  - {}
  policyTypes:
  - Egress
You could also try allowing all of the Google APIs' public IP ranges, but Google doesn't seem to publish a list of those (only the ranges for restricted.googleapis.com and private.googleapis.com), so that might be a bit tougher.
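If you do collect a set of ranges, the policy can be generated rather than written by hand. The sketch below is only an illustration: egress_policy is a hypothetical helper, the policy name is invented, and the CIDR list is a placeholder (the /32 from the question plus a documentation-only range), not a real list of Google ranges.

```python
import ipaddress
import json


def egress_policy(name, cidrs, port=443):
    """Build a NetworkPolicy manifest (as a dict) allowing TCP egress to the given CIDRs."""
    # ip_network() validates each CIDR and raises ValueError if host bits are set.
    blocks = [{"ipBlock": {"cidr": str(ipaddress.ip_network(c))}} for c in cidrs]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            "podSelector": {},  # applies to every Pod in the namespace
            "policyTypes": ["Egress"],
            "egress": [
                {"to": blocks, "ports": [{"port": port, "protocol": "TCP"}]}
            ],
        },
    }


# Placeholder ranges: replace with the ranges you actually need to allow.
EXAMPLE_CIDRS = ["216.58.207.36/32", "203.0.113.0/24"]
manifest_json = json.dumps(egress_policy("allow-google-apis", EXAMPLE_CIDRS), indent=2)
```

kubectl accepts JSON manifests as well as YAML, so manifest_json can be written to a file and applied. Keep in mind that any such list goes stale as soon as the addresses behind the hostname change.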

Related

gitlab-ce kas in docker container - dial tcp i/o timeout

We have a self-managed gitlab instance running in docker container and an external url set as: https://subdomain.domain.com:50080
I've put gitlab_kas['enable'] = true in the docker-compose file under "GITLAB_OMNIBUS_CONFIG: |" and tried to add the agent with helm via "Connect a Kubernetes cluster", but the kasAddress does not contain the 50080 port:
--set config.kasAddress=wss://subdomain.domain.com:/-/kubernetes-agent/
and the agent pod gives this error:
{"level":"error","time":"2022-09-07T07:32:50.899Z","msg":"Error handling a connection","mod_name":"reverse_tunnel","error":"Connect(): rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing failed to WebSocket dial: failed to send handshake request: Get \\\"https://subdomain.domain.com/-/kubernetes-agent/\\\": context deadline exceeded\""}
If I add the port manually to the kasAddress in the helm command the gitlab-kas/current log gives this error:
2022-09-07_07:44:01.55475 {"level":"error","time":"2022-09-07T07:44:01.553Z","msg":"AgentInfo()","correlation_id":"01GCBE78S0SX2BA5B48M3813W4","grpc_service":"gitlab.agent.reverse_tunnel.rpc.ReverseTunnel","grpc_method":"Connect","error":"Get \"https://subdomain.domain.com:50080/api/v4/internal/kubernetes/agent_info\": dial tcp PUBLIC_IP:50080: i/o timeout"}
I've changed the external_url to use the default port 443 for HTTPS, but the same i/o timeout error appears in the kas log.
The problem was upstream on the MikroTik router: I needed a masquerade NAT rule in the srcnat chain. In other words, GitLab was not able to reach itself on its public IP.
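A quick way to confirm this kind of problem is to test, from the GitLab host itself, whether a TCP connection to its own public address succeeds. This is a minimal sketch using Python's standard socket module; can_connect is a hypothetical helper, and PUBLIC_IP stands in for the redacted address from the log above.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and unreachable hosts
        return False

# Usage, run on the GitLab host (or inside the container):
#   can_connect("PUBLIC_IP", 50080)
```

If this returns False from the host but the same address and port are reachable from outside the network, the router is likely not handling connections from the LAN back to its own public IP.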

curl request to cluster node port hangs on initializing NSS with certpath

I am attempting to make a local request to the Kubernetes cluster hosted on my server. The cluster's NodePort is listening at 172.20.120.1:30280. External clients in production are required to make requests to 172.20.0.1:8000 (this cannot change), so I am attempting to add a DNAT rule to NAT the traffic from:
172.20.0.1:8000 -> 172.20.120.1:30280 (k8s NodePort)
I am able to make a curl request to 172.20.120.1:30280 directly and get a successful response back. However, when I make a curl request to 172.20.0.1:8000, it just hangs with the following message:
# curl -vvvk https://172.20.0.1:8000/v1/my-api
* About to connect() to 172.20.0.1 port 8000 (#0)
* Trying 172.20.0.1...
* Connected to 172.20.0.1 (172.20.0.1) port 8000 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
And then it eventually times out with the following error:
...
* NSS error -5961 (PR_CONNECT_RESET_ERROR)
* TCP connection reset by peer
* Closing connection 0
curl: (35) TCP connection reset by peer
When I make a request directly to 172.20.120.1:30280 I don't get that cert error and it works. I get a successful response back.
Does anyone know why I am getting that cert error?

Open edx with keycloak OIDC [SSL: CERTIFICATE_VERIFY_FAILED]

I am trying, so far without success, to use Keycloak to authenticate my edX users.
I followed the doc instructions to add a third-party auth provider and everything works fine, but when I try to authenticate a user through the Keycloak form I get the following error:
9112 [social] [user None] [ip 41.221.187.207] middleware.py:40 -
Authentication failed: HTTPSConnectionPool (host = 'mali-id.ml', port
= 443): Max retries exceeded with url: / auth / realms / Mali-Id / protocol / openid-connect / token (Caused by SSLError
(SSLCertVerificationError (1, '[SSL: CERTIFICATE_VERIFY_FAILED]
certificate verify failed: unable to get local issuer certificate
(_ssl.c: 1125)')))
My stack is configured as follows:
Keycloak deployed on Docker with Nginx as a proxy that listens on ports 80 and 443 on a separate vm
Edx deployed by Bitnami on a GCP vm

k8s-visualizer can't read from apiserver

I've tried multiple forks of github.com/brendandburns/gcp-live-k8s-visualizer/issues/6. The fork I'm currently trying to get working is https://github.com/0ortmann/k8s-visualizer (as mentioned by flx in another thread). I can get the interface to start up, but when script.js calls getJSON("/api..."....) it tries to pull the /api URI from the current port (i.e. 8001), for which it gets an unauthorized response. My apiserver is running on port 8080. Any ideas?
Update: the "problem" appears to be related to (a) the fact that I'm making the browser HTTP request from a remote host (i.e. I'm not going to http://localhost) and (b) the request filtering that kubectl proxy is doing. Adding --disable-filter to the kubectl proxy command and running curl <remotehostIP>:8001/api at least gets me a Moved Permanently response instead of unauthorized. However, any curl <remotehostIP>:8001/api/v1/pods or similar request gets an HTTP 500 error. The kubectl proxy command also logs:
W1003 15:22:23.805574 8666 proxy.go:116] Request filter disabled, your proxy is vulnerable to XSRF attacks, please be cautious
Starting to serve on [::]:8001
I1003 15:22:23.961109 8666 logs.go:41] http: proxy error: unsupported protocol scheme ""
I1003 15:22:23.961311 8666 logs.go:41] http: proxy error: unsupported protocol scheme ""
I1003 15:22:23.961451 8666 logs.go:41] http: proxy error: unsupported protocol scheme ""
I1003 15:22:23.962003 8666 logs.go:41] http: proxy error: unsupported protocol scheme ""
(unsupported protocol scheme messages repeat forever)...

Fiddler 2 error: SecureClientPipeDirect failed: System.IO.IOException Unable to read data from the transport connection

I am trying to decrypt HTTPS traffic with Fiddler2, which has just been upgraded.
What causes this error?
17:27:45:6821 !SecureClientPipeDirect failed: System.IO.IOException Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. < A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond on pipe to (CN=192.168.0.100, O=DO_NOT_TRUST, OU=Created by http://www.fiddler2.com)
Thanks
The error message indicates that the client failed to complete the HTTPS handshake. What was the client? This message typically indicates that the client isn't configured to trust Fiddler's Root Certificate.
What, if any, other messages are shown on the Log tab?