Istio-enabled GKE cluster not reliably communicating with Google Service Infrastructure APIs - kubernetes

I have been unable to reliably allow my Istio-enabled Google Kubernetes Engine cluster to connect to Google Cloud Endpoints (service management API) via the Extensible Service Proxy (ESP). When I deploy my Pods, the proxy always fails to start up, causing the Pod to be restarted, and outputs the following error:
INFO:Fetching an access token from the metadata service
WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fea4abece90>: Failed to establish a new connection: [Errno 111] Connection refused',)': /computeMetadata/v1/instance/service-accounts/default/token
ERROR:Failed fetching metadata attribute: http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
However, after restarting, the proxy reports that everything is fine: it was able to grab an access token, and I am able to make requests to the Pod successfully:
INFO:Fetching an access token from the metadata service
INFO:Fetching the service config ID from the rollouts service
INFO:Fetching the service configuration from the service management service
INFO:Attribute zone: europe-west2-a
INFO:Attribute project_id: my-project
INFO:Attribute kube_env: KUBE_ENV
nginx: [warn] Using trusted CA certificates file: /etc/nginx/trusted-ca-certificates.crt
10.154.0.5 - - [23/May/2020:21:19:36 +0000] "GET /domains HTTP/1.1" 200 221 "-" "curl/7.58.0"
After about an hour, presumably because the access token has expired, the proxy logs indicate that it was again unable to fetch an access token and I can no longer make requests to my Pod.
2020/05/23 22:14:04 [error] 9#9: upstream timed out (110: Connection timed out)
2020/05/23 22:14:04 [error] 9#9: Failed to fetch service account token
2020/05/23 22:14:04 [error] 9#9: Fetch access token unexpected status: INTERNAL: Failed to fetch service account token
I have in place a ServiceEntry resource that should be allowing the proxy to make requests to the metadata server on the GKE node:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: google-metadata-server
spec:
  hosts:
  - metadata.google.internal # GCE metadata server
  addresses:
  - 169.254.169.254 # GCE metadata server
  location: MESH_EXTERNAL
  ports:
  - name: http
    number: 80
    protocol: HTTP
  - name: https
    number: 443
    protocol: HTTPS
I have confirmed this is working by execing into one of the containers and running:
curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
How can I prevent this behaviour and reliably have the proxy communicate with the Google Service Infrastructure APIs?

Although I am not entirely convinced this is the solution, it appears that using a dedicated service account to generate access tokens within the Extensible Service Proxy container prevents the behaviour reported above, and I am able to reliably make requests to the proxy and upstream service, even after an hour.
The service account I am using has the following roles:
roles/cloudtrace.agent
roles/servicemanagement.serviceController
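For reference, here is roughly how the key is wired into the ESP container after creating a Kubernetes Secret from the dedicated service account's key (kubectl create secret generic esp-service-account --from-file=service-account.json=<key-file>). This is a sketch; the secret name, mount path, service name and ports are placeholders rather than my actual values:
containers:
- name: esp
  image: gcr.io/endpoints-release/endpoints-runtime:1
  args:
  - --http_port=8081
  - --backend=127.0.0.1:8080
  - --service=my-api.endpoints.my-project.cloud.goog
  - --rollout_strategy=managed
  # use the mounted key instead of the node's metadata server
  - --service_account_key=/etc/esp/service-account.json
  volumeMounts:
  - name: esp-service-account
    mountPath: /etc/esp
    readOnly: true
volumes:
- name: esp-service-account
  secret:
    secretName: esp-service-account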
Assuming this is a stable solution to the problem, I am much happier with this as an outcome, because I am not 100% comfortable using the metadata server since it relies on the service account associated with the GKE node. That service account is often more powerful than it needs to be for ESP to do its job.
I will, however, continue to monitor this just in case the proxy's upstream becomes unreachable again.

Related

viz extension crashloop with Request failed error unauthorized connection on server proxy-admin

I just tried to install the Linkerd viz extension following the official documentation, but all the pods are in a crash loop.
linkerd viz install | kubectl apply -f -
[ 29.797889s] INFO ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}: linkerd_app_inbound::policy::authorize::http: Request denied server=proxy-admin tls=None(NoClientHello) client=50.50.55.177:47068
[ 29.797910s] INFO ThreadId(02) daemon:admin{listen.addr=0.0.0.0:4191}:rescue{client.addr=50.50.55.177:47068}: linkerd_app_core::errors::respond: Request failed error=unauthorized connection on server proxy-admin
[ 29.817790s] INFO ThreadId(01) linkerd_proxy::signal: received SIGTERM, starting shutdown
The error appeared on a Kubernetes cluster with Server Version: v1.21.5-eks-bc4871b.
The issue was the policy that comes with the default installation.
This authorizes unauthenticated requests from IPs in the clusterNetworks configuration. If the source IP (<public-ip-address-of-hel-k1>) is not in that list, these connections will be denied. To fix this, the authorization policy could be updated with the following:
spec:
  client:
    unauthenticated: true
    networks:
    - cidr: 0.0.0.0/0
The default policy was missing the networks section under client:
  networks:
  - cidr: 0.0.0.0/0
To update the policy, first get the ServerAuthorization resources:
k get ServerAuthorization -n linkerd-viz
NAME          SERVER
admin         admin
grafana       grafana
metrics-api   metrics-api
proxy-admin   proxy-admin
Now edit admin, grafana, metrics-api and proxy-admin, and add the networks part:
k edit ServerAuthorization metrics-api
After fixing this I was also getting errors for grafana, which I fixed the same way by adding the networks part:
[ 32.278014s] INFO ThreadId(01) inbound:server{port=3000}:rescue{client.addr=50.50.53.140:44718}: linkerd_app_core::errors::respond: Request failed error=unauthorized connection on server grafana
[ 38.176927s] INFO ThreadId(01) inbound:server{port=3000}: linkerd_app_inbound::policy::authorize::http: Request denied server=grafana tls=None(NoClientHello) client=50.50.55.177:33170
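After the edit, each ServerAuthorization ends up looking roughly like the following (proxy-admin shown; this is a sketch assuming the default linkerd-viz namespace, with labels omitted):
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  name: proxy-admin
  namespace: linkerd-viz
spec:
  server:
    name: proxy-admin
  client:
    # allow unauthenticated requests from any source network
    unauthenticated: true
    networks:
    - cidr: 0.0.0.0/0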

Accessing an SMTP server when istio is enabled

I am getting the error curl: (56) response reading failed while trying to send email via SMTP using curl. I checked the istio-proxy logs of the sidecar but don't see any error logs related to this host. I tried the solution mentioned in How to access external SMTP server from within Kubernetes cluster with Istio Service Mesh as well, but it didn't work.
service entry
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: smtp
spec:
  addresses:
  - 192.168.8.45/32
  hosts:
  - smtp.example.com
  location: MESH_EXTERNAL
  ports:
  - name: tcp-smtp
    number: 2255
    protocol: TCP
Most probably the port number is causing the error; if not, try deleting the mesh policies.
Also please validate the following points:
1. If you recently updated Istio, try downgrading it.
2. Look again in the sidecar logs for any conflicts, or try disabling the sidecar.
3. When it comes to the curl (56) error, a packet transmission limit could be the problem.
The curl requests from the primary container are routed via the sidecar when Istio is enabled; the response from the SMTP server was being masked by the sidecar and returned to the primary container, which was quite misleading.
Upon disabling Istio and trying curl on the SMTP port, the request failed with Failed to connect to smtp.example.com port 2255: Operation timed out, because the firewall from the cluster to the SMTP server port was not open.
While Istio was enabled, the curl response didn't give a timeout error but instead gave curl: (56) response reading failed, which misled me into thinking that the response was coming from the SMTP server.
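One way to see the underlying error without uninstalling Istio is to run a throwaway pod with sidecar injection disabled and curl the SMTP port from there. This is a sketch; the pod name and image are arbitrary:
apiVersion: v1
kind: Pod
metadata:
  name: smtp-debug
  annotations:
    # skip the Envoy sidecar for this pod only
    sidecar.istio.io/inject: "false"
spec:
  containers:
  - name: curl
    image: curlimages/curl
    command: ["sleep", "3600"]
Then kubectl exec smtp-debug -- curl -v telnet://smtp.example.com:2255 surfaces the real connectivity error (in my case, the firewall timeout) instead of the misleading curl (56).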

Kubernetes(Istio) Mongodb enterprise cluster: HostUnreachable: Connection reset by peer

I have Istio 1.6 running in my k8s cluster. In the cluster I have also deployed a sharded MongoDB cluster with istio-injection disabled.
I have a different namespace for my app with istio-injection enabled. From that pod, if I try to connect to Mongo, I get this connection reset by peer error:
root@mongo:/# mongo "mongodb://mongo-sharded-cluster-mongos-0.mongo-service.mongodb.svc.cluster.local:27017,mongo-sharded-cluster-mongos-1.mongo-service.mongodb.svc.cluster.local:27017/?ssl=false"
MongoDB shell version v4.2.8
connecting to: mongodb://mongo-sharded-cluster-mongos-0.mongo-service.mongodb.svc.cluster.local:27017,mongo-sharded-cluster-mongos-1.mongo-service.mongodb.svc.cluster.local:27017/?compressors=disabled&gssapiServiceName=mongodb&ssl=false
2020-06-18T19:59:14.342+0000 I NETWORK [js] DBClientConnection failed to receive message from mongo-sharded-cluster-mongos-0.mongo-service.mongodb.svc.cluster.local:27017 - HostUnreachable: Connection reset by peer
2020-06-18T19:59:14.358+0000 I NETWORK [js] DBClientConnection failed to receive message from mongo-sharded-cluster-mongos-1.mongo-service.mongodb.svc.cluster.local:27017 - HostUnreachable: Connection reset by peer
2020-06-18T19:59:14.358+0000 E QUERY [js] Error: network error while attempting to run command 'isMaster' on host 'mongo-sharded-cluster-mongos-1.mongo-service.mongodb.svc.cluster.local:27017' :
connect@src/mongo/shell/mongo.js:341:17
@(connect):2:6
2020-06-18T19:59:14.362+0000 F - [main] exception: connect failed
2020-06-18T19:59:14.362+0000 E - [main] exiting with code 1
But if I disable istio-injection for my app (pod), then I can successfully connect and use Mongo as expected.
Is there a workaround for this? I would like to have istio-proxy injected into my app/pod and use MongoDB.
Injecting databases with Istio is complicated.
I would start by checking your mTLS; if it's STRICT, I would change it to PERMISSIVE and check if it works. It's well described here.
You see requests still succeed, except for those from the client that doesn’t have proxy, sleep.legacy, to the server with a proxy, httpbin.foo or httpbin.bar. This is expected because mutual TLS is now strictly required, but the workload without sidecar cannot comply.
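For example, a mesh-wide PeerAuthentication like the following switches mTLS to permissive mode. This is a minimal sketch; applying it in istio-system affects the whole mesh, so you may prefer to scope it to the namespace your app or MongoDB runs in:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    # accept both mTLS and plain-text traffic
    mode: PERMISSIVE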
Is there a workaround for this? I would like to have istio-proxy injected into my app/pod and use MongoDB.
If changing mTLS doesn't work, then in Istio you can set up the database without injection and add it to the Istio registry using a ServiceEntry object, so it can communicate with the rest of the Istio services.
To add your MongoDB database to Istio you can use a ServiceEntry.
ServiceEntry enables adding additional entries into Istio’s internal service registry, so that auto-discovered services in the mesh can access/route to these manually specified services. A service entry describes the properties of a service (DNS name, VIPs, ports, protocols, endpoints). These services could be external to the mesh (e.g., web APIs) or mesh-internal services that are not part of the platform’s service registry (e.g., a set of VMs talking to services in Kubernetes). In addition, the endpoints of a service entry can also be dynamically selected by using the workloadSelector field. These endpoints can be VM workloads declared using the WorkloadEntry object or Kubernetes pods. The ability to select both pods and VMs under a single service allows for migration of services from VMs to Kubernetes without having to change the existing DNS names associated with the services.
Example of ServiceEntry
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-svc-mongocluster
spec:
  hosts:
  - mymongodb.somedomain # not used
  addresses:
  - 192.192.192.192/24 # VIPs
  ports:
  - number: 27018
    name: mongodb
    protocol: MONGO
  location: MESH_INTERNAL
  resolution: STATIC
  endpoints:
  - address: 2.2.2.2
  - address: 3.3.3.3
If you have mTLS enabled, you will also need a DestinationRule that defines how to communicate with the external service.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: mtls-mongocluster
spec:
  host: mymongodb.somedomain
  trafficPolicy:
    tls:
      mode: MUTUAL
      clientCertificate: /etc/certs/myclientcert.pem
      privateKey: /etc/certs/client_private_key.pem
      caCertificates: /etc/certs/rootcacerts.pem
Additionally take a look at this documentation
https://istiobyexample.dev/databases/
https://istio.io/latest/blog/2018/egress-mongo/

Kubernetes pods can not make https request after deploying istio service mesh

I am exploring the Istio service mesh on my k8s cluster hosted on EKS (Amazon).
I tried deploying istio-1.2.2 on a new k8s cluster with the demo.yml file used for the bookapp demonstration, and I understand most of the use cases properly.
Then I deployed Istio using the Helm default profile (recommended for production) on my existing dev cluster with hundreds of microservices running, and what I noticed is that my services can call HTTP endpoints but are not able to call external secure endpoints (https://www.google.com, etc.).
I am getting:
curl: (35) error:1400410B:SSL routines:CONNECT_CR_SRVR_HELLO:wrong version number
Though I am able to call external https endpoints from my testing cluster.
To verify, I checked the egress policy and it is mode: ALLOW_ANY in both clusters.
Now I have removed Istio completely from my dev cluster and installed the demo.yml to test, but this is also not working.
I tried to relate my issue to this, but without success:
https://discuss.istio.io/t/serviceentry-for-https-on-httpbin-org-resulting-in-connect-cr-srvr-hello-using-curl/2044
I don't understand what I am missing or what I am doing wrong.
Note: I am referring to this setup: https://istio.io/docs/setup/kubernetes/install/helm/
This is most likely a bug in Istio (see for example istio/istio#14520): if you have any Kubernetes Service object, anywhere in your cluster, that listens on port 443 but whose port name starts with http (not https), it will break all outbound HTTPS connections.
The instance of this I've hit involves configuring an AWS load balancer to do TLS termination. The Kubernetes Service needs to expose port 443 to configure the load balancer, but it receives plain unencrypted HTTP.
apiVersion: v1
kind: Service
metadata:
  name: breaks-istio
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:...
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
spec:
  selector: ...
  ports:
  - name: http-ssl # <<<< THIS NAME MATTERS
    port: 443
    targetPort: http
When I've experimented with this, changing that name: to either https or tcp-https seems to work. Those name prefixes are significant to Istio, but I haven't immediately found any functional difference between telling Istio the port is HTTPS (even though it doesn't actually serve TLS) vs. plain uninterpreted TCP.
You do need to search your cluster and find every Service that listens to port 443, and make sure the port name doesn't start with http-....
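One way to list candidate Services, assuming jq is available (the filter flags every port 443 entry whose name starts with http):
kubectl get svc --all-namespaces -o json \
  | jq -r '.items[] | . as $s | .spec.ports[]?
           | select(.port == 443 and ((.name // "") | startswith("http")))
           | "\($s.metadata.namespace)/\($s.metadata.name): port name \(.name)"'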

Kubernetes Cloud SQL sidecar connection timed out. How to check if credentials work?

I'm trying to set up a Cloud SQL Proxy Docker image for PostgreSQL as mentioned here.
I can get my app to connect to the proxy Docker image, but the proxy times out. I suspect it's my credentials or the port, so how do I debug and find out if it works?
This is what I have in my project:
kubectl create secret generic cloudsql-instance-credentials --from-file=credentials.json=my-account-credentials.json
My deploy spec snippet:
spec:
  containers:
  - name: mara ...
  - name: cloudsql-proxy
    image: gcr.io/cloudsql-docker/gce-proxy:1.11
    command: ["/cloud_sql_proxy",
              "-instances=<MY INSTANCE NAME>=tcp:5432",
              "-credential_file=/secrets/cloudsql/credentials.json"]
    volumeMounts:
    - name: cloudsql-instance-credentials
      mountPath: /secrets/cloudsql
      readOnly: true
  volumes:
  - name: cloudsql-instance-credentials
    secret:
      secretName: cloudsql-instance-credentials
The logs of my cloudsql-proxy show a timeout:
2019/05/13 15:08:25 using credential file for authentication; email=646092572393-compute@developer.gserviceaccount.com
2019/05/13 15:08:25 Listening on 127.0.0.1:5432 for <MY INSTANCE NAME>
2019/05/13 15:08:25 Ready for new connections
2019/05/13 15:10:48 New connection for "<MY INSTANCE NAME>"
2019/05/13 15:10:58 couldn't connect to <MY INSTANCE NAME>: dial tcp <MY PRIVATE IP>:3307: getsockopt: connection timed out
Questions:
I specify 5432 as my port, but as you can see in the logs above, it's hitting 3307. Is that normal, and if not, how do I specify 5432?
How do I check if it is a problem with my credentials? My credentials file is from my service account 123-compute@developer.gserviceaccount.com,
and the service account shown when I go to my Cloud SQL console is p123-<somenumber>@gcp-sa-cloud-sql.iam.gserviceaccount.com. They don't seem to be the same? Does that make a difference?
If I make the Cloud SQL instance available on a public IP, it works.
I specify 5432 as my port, but as you can see in the logs above, it's hitting 3307
The proxy listens locally on the port you specified (in this case 5432), and connects to your Cloud SQL instance via port 3307. This is expected and normal.
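So your application should connect to the proxy on localhost, for example (a sketch; the database name and credentials are placeholders):
psql "host=127.0.0.1 port=5432 dbname=mydb user=myuser password=mypassword sslmode=disable"
The proxy then forwards that local connection to the Cloud SQL instance over port 3307.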
How do I check if it is a problem with my credentials?
The proxy returns an authorization error if the Cloud SQL instance doesn't exist, or if the service account doesn't have access. The connection timeout error means it failed to reach the Cloud SQL instance.
My credentials file is from my service account 123-compute@developer.gserviceaccount.com and the service account shown when I go to my Cloud SQL console is p123-<somenumber>@gcp-sa-cloud-sql.iam.gserviceaccount.com. They don't seem the same?
One is just the name of the file, the other is the name of the service account itself. The name of the file doesn't have to match the name of the service account. You can check the name and IAM roles of a service account on the Service Account page.
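You can also check the roles from the command line, for example (a sketch using the compute default service account that appears in your proxy logs; replace the project ID):
gcloud projects get-iam-policy my-project \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:646092572393-compute@developer.gserviceaccount.com" \
  --format="table(bindings.role)"
For the proxy to connect, that service account needs the Cloud SQL Client role (roles/cloudsql.client).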
2019/05/13 15:10:58 couldn't connect to : dial tcp :3307: getsockopt: connection timed out
This error means that the proxy failed to establish a network connection to the instance (usually because a path from the current location doesn't exist). There are two common causes for this:
First, make sure there isn't a firewall or something blocking outbound connections on port 3307.
Second, since you are using Private IP, you need to make sure the resource you are running the proxy on meets the networking requirements.
The proxy connects via port 3307. This is mentioned in the documentation:
Port 3307 is used by the Cloud SQL Auth proxy to connect to the Cloud SQL Auth proxy server. -- https://cloud.google.com/sql/docs/postgres/connect-admin-proxy#troubleshooting
You may need to create a firewall rule like the following:
Direction: Egress
Action on match: Allow
Destination filters : IP ranges 0.0.0.0/0
Protocols and ports : tcp:3307 & tcp:5432
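If you manage firewall rules with gcloud, a rule along these lines would implement the above (a sketch; the rule name and network are assumptions):
gcloud compute firewall-rules create allow-cloudsql-proxy-egress \
  --network=default \
  --direction=EGRESS \
  --action=ALLOW \
  --rules=tcp:3307,tcp:5432 \
  --destination-ranges=0.0.0.0/0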