Kubernetes Cloud SQL sidecar connection timed out. How to check if credentials work?

I'm trying to set up the Cloud SQL Proxy Docker image for PostgreSQL as mentioned here.
I can get my app to connect to the proxy container, but the proxy itself times out. I suspect it's my credentials or the port, so how do I debug this and find out whether they work?
This is what I have in my project:
kubectl create secret generic cloudsql-instance-credentials --from-file=credentials.json=my-account-credentials.json
My deploy spec snippet:
spec:
  containers:
  - name: mara ...
  - name: cloudsql-proxy
    image: gcr.io/cloudsql-docker/gce-proxy:1.11
    command: ["/cloud_sql_proxy",
              "-instances=<MY INSTANCE NAME>=tcp:5432",
              "-credential_file=/secrets/cloudsql/credentials.json"]
    volumeMounts:
    - name: cloudsql-instance-credentials
      mountPath: /secrets/cloudsql
      readOnly: true
  volumes:
  - name: cloudsql-instance-credentials
    secret:
      secretName: cloudsql-instance-credentials
The logs of my cloudsql-proxy show a timeout:
2019/05/13 15:08:25 using credential file for authentication; email=646092572393-compute@developer.gserviceaccount.com
2019/05/13 15:08:25 Listening on 127.0.0.1:5432 for <MY INSTANCE NAME>
2019/05/13 15:08:25 Ready for new connections
2019/05/13 15:10:48 New connection for "<MY INSTANCE NAME>"
2019/05/13 15:10:58 couldn't connect to <MY INSTANCE NAME>: dial tcp <MY PRIVATE IP>:3307: getsockopt: connection timed out
Questions:
I specify 5432 as my port, but as you can see in the logs above, it's hitting 3307. Is that normal, and if not, how do I specify 5432?
How do I check if it is a problem with my credentials? My credentials file is from my service account 123-compute@developer.gserviceaccount.com,
and the service account shown when I go to my Cloud SQL console is p123-<somenumber>@gcp-sa-cloud-sql.iam.gserviceaccount.com. They don't seem to be the same? Does that make a difference?
If I make the Cloud SQL instance available on a public IP, it works.

I specify 5432 as my port, but as you can see in the logs above, it's hitting 3307
The proxy listens locally on the port you specified (in this case 5432), and connects to your Cloud SQL instance via port 3307. This is expected and normal.
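A quick sanity check is to exec into the application container and connect to the proxy on 127.0.0.1:5432. This is only a sketch; it assumes psql is available in the image and uses placeholder pod, container, and database names:
# exec into the app container and try the local proxy listener
kubectl exec -it my-app-pod -c mara -- \
    psql "host=127.0.0.1 port=5432 user=postgres dbname=postgres"
If this hangs the same way, the problem is between the proxy and the instance, not between your app and the proxy.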
How do I check if it is a problem with my credentials?
The proxy returns an authorization error if the Cloud SQL instance doesn't exist or if the service account doesn't have access. The connection timeout error means the proxy failed to reach the Cloud SQL instance over the network, so it never got far enough for credentials to matter.
My credentials file is from my service account 123-compute@developer.gserviceaccount.com and the service account shown when I go to my Cloud SQL console is p123-<somenumber>@gcp-sa-cloud-sql.iam.gserviceaccount.com. They don't seem to be the same?
Those are two different accounts: the first is the service account whose key you exported into the credentials file, while the p...@gcp-sa-cloud-sql.iam.gserviceaccount.com account shown on the Cloud SQL page is the service agent that Google manages for the instance itself. They don't have to match; what matters is that the service account in your credentials file has the right IAM roles (typically Cloud SQL Client). You can check the name and IAM roles of a service account on the Service Accounts page.
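If you want to verify the roles from the command line, one option is the following sketch, assuming the gcloud CLI and placeholder project and account names:
# list the roles bound to the service account used in the credentials file
gcloud projects get-iam-policy my-project \
    --flatten="bindings[].members" \
    --filter="bindings.members:646092572393-compute@developer.gserviceaccount.com" \
    --format="table(bindings.role)"
For the proxy you would expect to see at least roles/cloudsql.client in that list.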
2019/05/13 15:10:58 couldn't connect to : dial tcp :3307: getsockopt: connection timed out
This error means that the proxy failed to establish a network connection to the instance (usually because a path from the current location doesn't exist). There are two common causes for this:
First, make sure there isn't a firewall or other rule blocking outbound connections on port 3307.
Second, since you are using Private IP, you need to make sure the resource you are running the proxy on meets the networking requirements.
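To see how the instance is configured for private connectivity, you can describe it from the command line; this is only a sketch with a placeholder instance name, and the field paths assume the current Cloud SQL Admin API output:
gcloud sql instances describe my-instance \
    --format="value(settings.ipConfiguration.privateNetwork, ipAddresses)"
The proxy host (here, the GKE nodes) must be able to reach that private network, e.g. the cluster must be VPC-native and in (or peered with) the same VPC.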

The proxy connects out on port 3307. This is mentioned in the documentation:
Port 3307 is used by the Cloud SQL Auth proxy to connect to the Cloud SQL Auth proxy server. -- https://cloud.google.com/sql/docs/postgres/connect-admin-proxy#troubleshooting
You may need to create a firewall rule like the following:
Direction: Egress
Action on match: Allow
Destination filters : IP ranges 0.0.0.0/0
Protocols and ports : tcp:3307 & tcp:5432
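As a sketch of what that rule could look like with the gcloud CLI (the rule name and network are placeholders; adjust to your VPC):
gcloud compute firewall-rules create allow-cloudsql-egress \
    --network=default \
    --direction=EGRESS \
    --action=ALLOW \
    --rules=tcp:3307,tcp:5432 \
    --destination-ranges=0.0.0.0/0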

Related

Kubernetes Can't connect to database running on host machine

I have a problem. In my cluster I have a Ruby on Rails application which I want to connect to a Postgres database on the host machine (not containerized). The database listens on the following port when I run:
sudo netstat -tulpn | grep LISTEN
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 1109/postgres
Then I created a service which maps the DB_HOST to the local machine like this:
apiVersion: v1
kind: Service
metadata:
  name: external-postgres-svc
  namespace: myapp-nm
spec:
  ports:
  - port: 5432
    targetPort: 5432
    protocol: TCP
I also added an endpoint:
apiVersion: v1
kind: Endpoints
metadata:
  name: external-postgres-svc
  namespace: myapp-nm
subsets:
- addresses:
  - ip: "10.0.2.2"
  ports:
  - port: 5432
And in my configmap I have the following config:
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: myapp-nm
data:
  db_host: "external-postgres-svc.myapp-nm.svc"
  db_port: "5432"
  db_username: "myuser"
  db_password: "mypass"
But then, when all resources are created and the migration job runs, it never completes. After 2-3 minutes it crashes and gives the error:
connection to server at port 5432 failed: Operation timed out
I have added:
listen_addresses = '*'
to the /etc/postgresql/14/main/postgresql.conf, I added:
host all all 0.0.0.0/0 md5
to the /etc/postgresql/14/main/pg_hba.conf,
so I think it should listen to incoming traffic. I also ran
sudo ufw allow 5432/TCP
to open the port in the firewall on my machine, and I checked that the user is correct, so what can the problem be?
I can connect to the database if I am not in the cluster using the
ip
port
username
password
What am I doing wrong?
Error: Operation timed out indicates that the server failed to issue a complete response within the allowed time period.
Check the solutions below:
1) Check whether the /etc directory is owned by another user; it needs to be owned by root, and port 5432 needs to be open. (Add the hostname and IP details of the host to /etc/hosts. In the case of a multi-node cluster, /etc/hosts on each machine has to be updated with the details of all cluster nodes.)
2) Check again that the route to the database server isn't being blocked by a firewall. Make sure you set a rule in the firewall that allows the Ruby on Rails application to connect.
3) If you are not connecting locally only, check this setting in the postgresql.conf file:
Connection Settings: #listen_addresses = 'localhost' should be uncommented and changed to listen_addresses = '*' instead of localhost.
Save the file and restart the service.
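A quick way to verify that the change took effect (a sketch; the paths assume the Postgres 14 layout mentioned above):
# /etc/postgresql/14/main/postgresql.conf
listen_addresses = '*'

# restart and confirm Postgres now binds to all interfaces, not just 127.0.0.1
sudo systemctl restart postgresql
sudo netstat -tulpn | grep 5432    # should now show 0.0.0.0:5432 LISTEN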
4) Check for a tunnel issue:
Verify that the attributes of the ssh process match what was provided. Look at the output of ps aux | grep ssh; the relevant part is:
-L number_1:string_or_number_1:number_2 ... KnownHostsFile=/dev/null string_or_number_2
number_1: Looker-side port number
string_or_number_1: database host name or IP address
number_2: database-side port number
string_or_number_2: tunnel server host or IP address
How to manually set the port:
Note: this is not recommended, since Looker may set the port back again; it is better to open the needed ports on the database.
Create a tunnel through the Looker database connections UI.
Update the tunnel via the API (PATCH /api/4.0/connections/:connection_name) after it has been created.
Set the desired local_host_port.
Make sure db_connection and the following fields are set, using the API (GET /api/4.0/ssh_tunnel/:ssh_tunnel_id) or by checking from go-ssh-sidecar:
tunnel_id: id of the tunnel
port: new local_host_port
host: localhost
Also check for an improper tunnel migration:
The following types of issues may be related to SSH tunnels on a newly migrated Kubernetes cluster:
1. The tunnel is not migrated, or tunnels are partially migrated.
2. The tunnel is migrated with incorrect information.
3. IPs have changed and are not documented or announced to customers.
4. Public keys have changed (specific to migrated tunnels).

Accessing an SMTP server when istio is enabled

I'm getting the error curl: (56) response reading failed while trying to send email via SMTP using curl. I checked the istio-proxy logs of the sidecar but don't see any error logs related to this host. I tried the solution mentioned in How to access external SMTP server from within Kubernetes cluster with Istio Service Mesh as well, but it didn't work.
service entry
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: smtp
spec:
  addresses:
  - 192.168.8.45/32
  hosts:
  - smtp.example.com
  location: MESH_EXTERNAL
  ports:
  - name: tcp-smtp
    number: 2255
    protocol: TCP
Most probably the port number is causing the error; if not, try deleting the mesh policies.
Also please validate the points below:
1. If you recently updated Istio, try downgrading it.
2. Look again in the sidecar logs for any conflicts, or try disabling the sidecar.
3. When it comes to the curl 56 error, a packet-transmission limit could be the problem.
The curl requests from the primary container are routed via the sidecar when Istio is enabled; the response from the SMTP server was being masqueraded by the sidecar and returned to the primary container, which was quite misleading.
Upon disabling Istio and trying curl on the SMTP port, the request failed with Failed to connect to smtp.example.com port 2255: Operation timed out, which was because the firewall from the cluster to the SMTP server port was not open.
While Istio was enabled, the curl response didn't give a timeout error but instead gave curl: (56) response reading failed, which misled me into thinking that the response was coming from the SMTP server.
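One way to rule out the network path independently of the mesh is to test the raw TCP connection from a pod without sidecar injection. This is only a sketch; the throwaway pod name is arbitrary and it reuses the example host and port from the question:
# run a one-off pod (in a namespace without istio-injection) and probe the SMTP port
kubectl run smtp-check --rm -it --restart=Never --image=curlimages/curl -- \
    curl -v --max-time 5 telnet://smtp.example.com:2255
A timeout here points at a firewall between the cluster and the SMTP server rather than at Istio.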

Kubernetes(Istio) Mongodb enterprise cluster: HostUnreachable: Connection reset by peer

I have Istio 1.6 running in my k8s cluster. In the cluster I have also deployed a sharded MongoDB cluster with istio-injection disabled.
And I have a different namespace for my app with istio-injection enabled. From the pod, if I try to connect to Mongo, I get this connection reset by peer error:
root@mongo:/# mongo "mongodb://mongo-sharded-cluster-mongos-0.mongo-service.mongodb.svc.cluster.local:27017,mongo-sharded-cluster-mongos-1.mongo-service.mongodb.svc.cluster.local:27017/?ssl=false"
MongoDB shell version v4.2.8
connecting to: mongodb://mongo-sharded-cluster-mongos-0.mongo-service.mongodb.svc.cluster.local:27017,mongo-sharded-cluster-mongos-1.mongo-service.mongodb.svc.cluster.local:27017/?compressors=disabled&gssapiServiceName=mongodb&ssl=false
2020-06-18T19:59:14.342+0000 I NETWORK [js] DBClientConnection failed to receive message from mongo-sharded-cluster-mongos-0.mongo-service.mongodb.svc.cluster.local:27017 - HostUnreachable: Connection reset by peer
2020-06-18T19:59:14.358+0000 I NETWORK [js] DBClientConnection failed to receive message from mongo-sharded-cluster-mongos-1.mongo-service.mongodb.svc.cluster.local:27017 - HostUnreachable: Connection reset by peer
2020-06-18T19:59:14.358+0000 E QUERY [js] Error: network error while attempting to run command 'isMaster' on host 'mongo-sharded-cluster-mongos-1.mongo-service.mongodb.svc.cluster.local:27017' :
connect@src/mongo/shell/mongo.js:341:17
@(connect):2:6
2020-06-18T19:59:14.362+0000 F - [main] exception: connect failed
2020-06-18T19:59:14.362+0000 E - [main] exiting with code 1
But if I disable the istio-injection on my app (pod), then I can successfully connect and use Mongo as expected.
Is there a work around for this, I would like to have istio-proxy injected to my app/pod and use mongodb?
Injecting databases with Istio is complicated.
I would start by checking your mTLS: if it's STRICT, I would change it to PERMISSIVE and check whether that works. It's well described here.
You see requests still succeed, except for those from the client that doesn’t have proxy, sleep.legacy, to the server with a proxy, httpbin.foo or httpbin.bar. This is expected because mutual TLS is now strictly required, but the workload without sidecar cannot comply.
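For reference, a minimal sketch of what switching the mesh-wide policy to PERMISSIVE could look like on Istio 1.6 (apply it to a single namespace instead of istio-system if you only want to relax mTLS for MongoDB traffic):
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # mesh-wide; use the mongodb namespace to limit the scope
spec:
  mtls:
    mode: PERMISSIVE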
Is there a work around for this, I would like to have istio-proxy injected to my app/pod and use mongodb?
If changing mTLS doesn't work, then in Istio you can set up the database without injection and then add it to the Istio registry using a ServiceEntry object, so it can communicate with the rest of the Istio services.
To add your MongoDB database to Istio you can use a ServiceEntry.
ServiceEntry enables adding additional entries into Istio’s internal service registry, so that auto-discovered services in the mesh can access/route to these manually specified services. A service entry describes the properties of a service (DNS name, VIPs, ports, protocols, endpoints). These services could be external to the mesh (e.g., web APIs) or mesh-internal services that are not part of the platform’s service registry (e.g., a set of VMs talking to services in Kubernetes). In addition, the endpoints of a service entry can also be dynamically selected by using the workloadSelector field. These endpoints can be VM workloads declared using the WorkloadEntry object or Kubernetes pods. The ability to select both pods and VMs under a single service allows for migration of services from VMs to Kubernetes without having to change the existing DNS names associated with the services.
Example of ServiceEntry
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-svc-mongocluster
spec:
  hosts:
  - mymongodb.somedomain # not used
  addresses:
  - 192.192.192.192/24 # VIPs
  ports:
  - number: 27018
    name: mongodb
    protocol: MONGO
  location: MESH_INTERNAL
  resolution: STATIC
  endpoints:
  - address: 2.2.2.2
  - address: 3.3.3.3
If you have mTLS enabled you will also need a DestinationRule that defines how to communicate with the external service.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: mtls-mongocluster
spec:
  host: mymongodb.somedomain
  trafficPolicy:
    tls:
      mode: MUTUAL
      clientCertificate: /etc/certs/myclientcert.pem
      privateKey: /etc/certs/client_private_key.pem
      caCertificates: /etc/certs/rootcacerts.pem
Additionally take a look at this documentation
https://istiobyexample.dev/databases/
https://istio.io/latest/blog/2018/egress-mongo/

Istio enabled GKE cluster not reliably communicating with Google Service Infrastructure APIs

I have been unable to reliably get my Istio-enabled Google Kubernetes Engine cluster to connect to Google Cloud Endpoints (the Service Management API) via the Extensible Service Proxy. When I deploy my Pods, the proxy always fails to start up, causing the Pod to be restarted, and outputs the following error:
INFO:Fetching an access token from the metadata service
WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fea4abece90>: Failed to establish a new connection: [Errno 111] Connection refused',)': /computeMetadata/v1/instance/service-accounts/default/token
ERROR:Failed fetching metadata attribute: http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
However, after restarting, the proxy reports that everything is fine: it was able to grab an access token, and I am able to make requests to the Pod successfully:
INFO:Fetching an access token from the metadata service
INFO:Fetching the service config ID from the rollouts service
INFO:Fetching the service configuration from the service management service
INFO:Attribute zone: europe-west2-a
INFO:Attribute project_id: my-project
INFO:Attribute kube_env: KUBE_ENV
nginx: [warn] Using trusted CA certificates file: /etc/nginx/trusted-ca-certificates.crt
10.154.0.5 - - [23/May/2020:21:19:36 +0000] "GET /domains HTTP/1.1" 200 221 "-" "curl/7.58.0"
After about an hour, presumably because the access token has expired, the proxy logs indicate that it was again unable to fetch an access token and I can no longer make requests to my Pod.
2020/05/23 22:14:04 [error] 9#9: upstream timed out (110: Connection timed out)
2020/05/23 22:14:04 [error] 9#9: Failed to fetch service account token
2020/05/23 22:14:04 [error] 9#9: Fetch access token unexpected status: INTERNAL: Failed to fetch service account token
I have in place a ServiceEntry resource that should allow the proxy to make requests to the metadata server on the GKE node:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: google-metadata-server
spec:
  hosts:
  - metadata.google.internal # GCE metadata server
  addresses:
  - 169.254.169.254 # GCE metadata server
  location: MESH_EXTERNAL
  ports:
  - name: http
    number: 80
    protocol: HTTP
  - name: https
    number: 443
    protocol: HTTPS
I have confirmed this is working by exec'ing into one of the containers and running:
curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
How can I prevent this behaviour and reliably have the proxy communicate with the Google Service Infrastructure APIs?
Although I am not entirely convinced this is the solution, it appears that using a dedicated service account to generate access tokens within the Extensible Service Proxy container prevents the behaviour reported above, and I am able to reliably make requests to the proxy and the upstream service, even after an hour.
The service account I am using has the following roles:
roles/cloudtrace.agent
roles/servicemanagement.serviceController
Assuming this is a stable solution to the problem, I am much happier with this outcome because I am not 100% comfortable using the metadata server, since it relies on the service account associated with the GKE node. This service account is often more powerful than it needs to be for ESP to do its job.
I will however be continuing to monitor this just in case the proxy upstream becomes unreachable again.
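For anyone who wants to try the same approach, here is a rough sketch of what pointing ESP at a dedicated key can look like; the secret name, mount path, and service name are placeholders, and --service_account_key is the documented ESP flag for using a key file instead of the metadata server:
- name: esp
  image: gcr.io/endpoints-release/endpoints-runtime:1
  args: [
    "--http_port=8081",
    "--backend=127.0.0.1:8080",
    "--service=my-service.endpoints.my-project.cloud.goog",
    "--rollout_strategy=managed",
    "--service_account_key=/etc/esp/creds/service-account.json"
  ]
  volumeMounts:
  - name: esp-service-account
    mountPath: /etc/esp/creds
    readOnly: true
The key itself would come from a Kubernetes secret mounted at that path, much like the Cloud SQL credentials are mounted in the first question above.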

OpenShift route - Unable to connect to remote host: No route to host

I have deployed a gRPC service running on OpenShift Origin, backed by an OpenShift service and exposed with an OpenShift route. I am trying to make this pod available via a service and route that maps the container port (50051) to the outside world on port 8080.
The image that the service is trying to expose has, in its Dockerfile:
EXPOSE 50051
The route has the following:
Service Port: 8080/TCP
Target Port: 50051
In the DeploymentConfig I specify the port with:
ports:
- containerPort: 50051
protocol: TCP
However, when I try to access the application via the route and port, I get (from Java)
java.net.NoRouteToHostException: No route to host
And when I try to telnet the service IP:
telnet 172.30.197.247 8080
I am able to connect.
However, when I try to connect via the route, it doesn't work:
telnet my.route.com 8080
Trying ...
telnet: connect to address : Connection refused
When I use:
curl -kv my-svc.myproject.svc.cluster.local:8080
I can connect.
So it seems the service is working but the route is not.
I have been going through the troubleshooting guide on https://docs.openshift.org/3.6/admin_guide/sdn_troubleshooting.html#debugging-the-router
The router setups in OpenShift focus on HTTP/HTTPS (SNI)/TLS (SNI). However, it appears that you can use an externalIP to expose non-web application ports from the cluster. Because gRPC is an over-the-wire protocol, you might need to go down this path, as sketched below.
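As a rough sketch of that approach (the service name, label selector, and IP are placeholders; the externalIP must be routable to a cluster node):
apiVersion: v1
kind: Service
metadata:
  name: grpc-external
spec:
  selector:
    app: my-grpc-app        # placeholder label for the gRPC pods
  ports:
  - port: 50051
    targetPort: 50051
    protocol: TCP
  externalIPs:
  - 192.0.2.10              # placeholder routable IP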
There are multiple things to check:
Does your route point to your service? Here is an example:
apiVersion: v1
kind: Route
spec:
  host: my.route.com
  to:
    kind: Service
    name: yourservice
    weight: 100
If that's not the case, the route and the service are not connected.
You can check the router configuration. Connect to your router with oc rsh and check whether you find your route name in /var/lib/haproxy/conf/haproxy.config (the backend name format should be backend be_http_NAMESPACE_ROUTENAME). The server part below the backend part should contain the IP of your pod (you can obtain your pod IP with the oc get pods -o wide command).
If it's not there, the route is not registered in the router config. You can try to restart the router and recheck the haproxy.config file; the commands below show one way to do these checks.
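A sketch of those checks from the command line (the router pod name, namespace, and route name are placeholders):
# find the router pod and open a shell in it (routers usually run in the default namespace on Origin 3.x)
oc get pods -n default | grep router
oc rsh router-1-abcde

# inside the router, look for the backend generated for your route
grep -A 10 "be_http_myproject_myroute" /var/lib/haproxy/conf/haproxy.config

# back outside, confirm the pod IP that should appear in the server lines
oc get pods -o wide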
Can you connect to the pod IP from the router container?