Istio mTLS misconfiguration causes inconsistent behavior - Kubernetes

I have deployed two Istio-enabled services on a GKE cluster.
The Istio version is 1.1.5 and GKE is on v1.15.9-gke.24.
Istio has been installed with global.mtls.enabled=true.
serviceA communicates properly.
serviceB apparently has TLS-related issues.
I spin up a non-Istio-enabled deployment just for testing and exec into this test pod to curl these two service endpoints.
/ # curl -v serviceA
* Rebuilt URL to: serviceA/
* Trying 10.8.61.75...
* TCP_NODELAY set
* Connected to serviceA (10.8.61.75) port 80 (#0)
> GET / HTTP/1.1
> Host: serviceA
> User-Agent: curl/7.57.0
> Accept: */*
>
< HTTP/1.1 200 OK
< content-type: application/json
< content-length: 130
< server: istio-envoy
< date: Sat, 25 Apr 2020 09:45:32 GMT
< x-envoy-upstream-service-time: 2
< x-envoy-decorator-operation: serviceA.mynamespace.svc.cluster.local:80/*
<
{"application":"Flask-Docker Container"}
* Connection #0 to host serviceA left intact
/ # curl -v serviceB
* Rebuilt URL to: serviceB/
* Trying 10.8.58.228...
* TCP_NODELAY set
* Connected to serviceB (10.8.58.228) port 80 (#0)
> GET / HTTP/1.1
> Host: serviceB
> User-Agent: curl/7.57.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
Exec-ing into the Envoy proxy of the problematic service and turning on trace-level logging, I see this error:
serviceB-758bc87dcf-jzjgj istio-proxy [2020-04-24 13:15:21.180][29][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:168] [C1484] handshake error: 1
serviceB-758bc87dcf-jzjgj istio-proxy [2020-04-24 13:15:21.180][29][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:201] [C1484] TLS error: 268435612:SSL routines:OPENSSL_internal:HTTP_REQUEST
The Envoy sidecars of both containers display similar information when debugging their certificates.
I verify this by exec-ing into both istio-proxy containers, cd-ing to /etc/certs/..data and running
openssl x509 -in root-cert.pem -noout -text
The two root-cert.pem files are identical!
Since those two Istio proxies have exactly the same TLS configuration in terms of certs, why this cryptic SSL error on serviceB?
FWIW, serviceB communicates with a non-Istio-enabled Postgres service.
Could that be causing the issue?
Curling the container of serviceB from within itself, however, returns a healthy response.
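For what it's worth, the HTTP_REQUEST part of the TLS error is BoringSSL/OpenSSL's way of saying a plaintext HTTP request arrived on a socket that expected a TLS handshake, which would fit my non-Istio test pod sending plaintext to a sidecar enforcing mTLS. If plaintext from non-mesh clients should be allowed, my understanding is that a permissive-mode authentication Policy along these lines is the Istio 1.1 mechanism for it (a sketch only; the resource name is assumed and untested):
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: serviceb-permissive   # name assumed for this sketch
  namespace: mynamespace
spec:
  targets:
  - name: serviceB
  peers:
  - mtls:
      mode: PERMISSIVE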

Related

LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:443

I was following this doc (https://istio.io/latest/docs/tasks/traffic-management/ingress/secure-ingress/) to set up Kubernetes on Docker Desktop and used the Istio ingress gateway. I deployed an echo test app and added a virtual service that points to the test app endpoint at port 8081. Then I set the Istio gateway to open port 443 with the following:
servers:
- hosts:
  - some.random.host
  port:
    name: https
    number: 443
    protocol: HTTPS
  tls:
    mode: SIMPLE
    credentialName: test-app-tls
where I also created a TLS-type secret named test-app-tls using the cert and private key I generated.
(Just in case I forgot to mention something here: I tried with port 80 and plain HTTP, and everything works. Here is an example.)
curl -i -H 'Host: some.random.host' 'localhost:80/host'
HTTP/1.1 200 OK
content-type: application/json
date: Tue, 02 Aug 2022 21:10:31 GMT
content-length: 148
x-envoy-upstream-service-time: 1
server: istio-envoy
{"name":"some-expected-response","address":"some-other-expected-response"}
Then I tried to curl my localhost to hit the test app in the cluster with the following command
curl -k -i -v -H 'Host: some.random.host' 'https://localhost:443/host'
It gave me this error:
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/cert.pem
CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:443
* Closing connection 0
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:443
I also tried with https://127.0.0.1:443/host and it still doesn't work.
I'm fairly new to setting up TLS for Kubernetes. Could anyone please help me with this?
Thank you very much!!
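One thing worth checking: with SDS-based credentialName, the TLS secret has to live in the same namespace as the ingress gateway itself (istio-system by default), not in the application's namespace. If the gateway can't find the secret, it never completes the TLS setup for port 443 and resets incoming connections, which could explain the SSL_ERROR_SYSCALL above. A sketch of the secret in the right namespace (the cert/key data are placeholders):
apiVersion: v1
kind: Secret
metadata:
  name: test-app-tls
  namespace: istio-system   # the gateway's namespace, not the app's
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>   # placeholder
  tls.key: <base64-encoded private key>   # placeholder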

Waiting for HTTP-01 challenge propagation: failed to perform self check GET request - ISTIO

I get this error after waiting for a while (~1 min):
Waiting for HTTP-01 challenge propagation: failed to perform self check GET request 'http://jenkins.xyz.in/.well-known/acme-challenge/AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_Vfe4I': Get "http://jenkins.xyz.in/.well-known/acme-challenge/AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_Vfe4I": dial tcp 103.66.96.201:80: connect: connection timed out
I am able to access this URL in a browser from anywhere on the internet:
curl -v http://jenkins.xyz.in/.well-known/acme-challenge/AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_Vfe4I
* Trying 103.66.96.201:80...
* Connected to jenkins.xyz.in (103.66.96.201) port 80 (#0)
> GET /.well-known/acme-challenge/AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_Vfe4I HTTP/1.1
> Host: jenkins.xyz.in
> User-Agent: curl/7.71.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< cache-control: no-cache, no-store, must-revalidate
< date: Wed, 13 Jan 2021 08:54:23 GMT
< content-length: 87
< content-type: text/plain; charset=utf-8
< x-envoy-upstream-service-time: 1
< server: istio-envoy
<
* Connection #0 to host jenkins.xyz.in left intact
AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_VfT4I.EZvkP5Fpi6EYc_-tWTQgvaQxrrbSr2MEJkuXJaywatk
My setup is:
1. Istio ingress load balancer running on a node (192.168.14.118)
2. I am pointing my external IP and domain jenkins.xyz.in to 192.168.14.118 through another load balancer
request -> public IP -> load balancer -> 192.168.14.118
From outside, it works fine, but when I try this from the node itself / from a pod inside the cluster, I get:
$ curl -v http://jenkins.xyz.in/
* About to connect() to jenkins.xyz.in port 80 (#0)
* Trying 103.66.96.201...
I have read somewhere about hairpinning: since my Kubernetes node IP and the Istio ingress load balancer external IP are the same, the request might be looping.
EXTRA: I am running K8s on bare metal.
Is there any solution to get around this?
I found a workaround.
As my node was not able to access the URL (the loop described above), I added another node to the cluster and set the cert-manager pods' affinity to the new node.
cert-manager was able to access the URL from the new node. Not a good solution, but it worked for me.
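Roughly, the affinity on the cert-manager deployment looked like this (a sketch of the idea rather than the exact manifest; the node hostname is assumed):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-2   # the newly added node; hostname assumed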

Getting 404 on all outbound HTTP calls from pods inside istio mesh

I have Istio v1.1.6 installed on Kubernetes v1.11 using the Helm chart provided in the Istio repository, with a bunch of overrides including:
global:
  outboundTrafficPolicy:
    mode: ALLOW_ANY
pilot:
  env:
    PILOT_ENABLE_FALLTHROUGH_ROUTE: "1"
mixer:
  enabled: true
galley:
  enabled: true
security:
  enabled: false
The problem is that I can't make any simple outbound HTTP request to a service running on port 80 (inside or outside of the mesh) from a pod which is inside the Istio mesh and has istio-proxy injected as a sidecar. The response is always 404:
user#pod-12345-12345$ curl -v http://httpbin.org/headers
* Hostname was NOT found in DNS cache
* Trying 52.200.83.146...
* Connected to httpbin.org (52.200.83.146) port 80 (#0)
> GET /headers HTTP/1.1
> User-Agent: curl/7.38.0
> Host: httpbin.org
> Accept: */*
>
< HTTP/1.1 404 Not Found
< date: Wed, 15 May 2019 05:43:24 GMT
* Server envoy is not blacklisted
< server: envoy
< content-length: 0
<
* Connection #0 to host httpbin.org left intact
The response flag in the istio-proxy logs from Envoy states that it can't find a proper route:
"GET / HTTP/1.1" 404 NR "-" 0 0 0 - "-" "curl/7.38.0" "238d0799-f83d-4e5e-94e7-79de4d14fa53" "httpbin.org" "-" - - 172.217.27.14:80 100.99.149.201:52892 -
NR: No route configured for a given request, in addition to the 404 response code.
It's probably worth adding that:
Other outbound calls to any port other than 80 work perfectly fine.
Checking proxy-status shows nothing suspicious: all pods are SYNCED.
mTLS is disabled.
The example above is a call to an external service, but calls to internal services (e.g. curl another-service.svc.cluster.local/health) have the same issue.
I expected calls to internal mesh services to work out of the box; I even tried to define a DestinationRule and a ServiceEntry, but that did not help either (the ServiceEntry is sketched below).
I don't really want to add the traffic.sidecar.istio.io/excludeOutboundIPRanges: "0.0.0.0/0" annotation to the deployment since, according to the docs:
this approach completely bypasses the sidecar, essentially disabling all of Istio’s features for the specified IPs
Any idea where else I can look or what is missing?
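For reference, the ServiceEntry I tried was along these lines, mirroring the Istio egress examples (resource name assumed):
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: httpbin-ext   # name assumed
spec:
  hosts:
  - httpbin.org
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
  location: MESH_EXTERNAL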
To me it looks like you have short connection timeouts defined in your istio-proxy sidecar; please check here for a similar GitHub issue in the Envoy project.
BTW, as @Robert Panzer mentioned, sharing the whole dump of the istio-proxy config would help a lot in investigating your particular case.
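If it does turn out to be the connect timeout, I believe it can be raised per destination with a DestinationRule along these lines (a sketch; host and name assumed, untested):
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin-timeout   # name assumed
spec:
  host: httpbin.org
  trafficPolicy:
    connectionPool:
      tcp:
        connectTimeout: 10s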

Kubernetes API Access from Pod

I'm trying to access the Kubernetes API in order to discover pods from within a deployed container. Although I'll do this programmatically, right now I'm just using cURL to check for issues.
I run this from a pod terminal:
curl -vvv -H "Authorization: Bearer $(</var/run/secrets/kubernetes.io/serviceaccount/token)" "https://kubernetes.default/api/v1/namespaces/$(</var/run/secrets/kubernetes.io/serviceaccount/namespace)/endpoints" --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
And I get a 403 result:
* About to connect() to kubernetes.default port 443 (#0)
* Trying 172.30.0.1...
* Connected to kubernetes.default (172.30.0.1) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
CApath: none
* NSS: client certificate not found (nickname not specified)
* SSL connection using TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
* Server certificate:
* subject: CN=10.0.75.2
* start date: Nov 23 16:55:27 2017 GMT
* expire date: Nov 23 16:55:28 2019 GMT
* common name: 10.0.75.2
* issuer: CN=openshift-signer#1511456125
> GET /api/v1/namespaces/myproject/endpoints HTTP/1.1
> User-Agent: curl/7.29.0
> Host: kubernetes.default
> Accept: */*
> Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJteXByb2plY3QiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoiZGVmYXVsdC10b2tlbi00cXZidCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiMjg3NzAzYjEtZDA4OC0xMWU3LTkzZjQtNmEyNGZhYWZjYzQxIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om15cHJvamVjdDpkZWZhdWx0In0.yl2HUhmxjrb4UqkAioq1TixWl_YqUPoxSvQPPSgl9Hzr97Hjm7icdL_mdptwEnOSErfzqSUBiMKJcIRdIa3Z7mfkgEk-f2H-M7TUU8GpXmD2Zex6Bcn_dq-Hsoed6W2PYpeFDoy98p5rSNTUL5MPMATOodeAulB0NG_zF01-8qTbLO_I6FRa3BCVXVMaZWBoZgwZ1acQbd4fJqDRsYmQMSi5P8a3nYgjBdifkQeTTb3S8Kmnszct41LoUlh9Xv29YVEyr1uQc5DSLAgQKj_NdSxkVq-MJP8z1PWV3OmHULNChocXr7RGKaNwlVpwpgNqsDAOqIyE1ozxlntIrotLBw
>
< HTTP/1.1 403 Forbidden
< Cache-Control: no-store
< Content-Type: application/json
< Date: Thu, 23 Nov 2017 22:18:01 GMT
< Content-Length: 282
<
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "User \"system:serviceaccount:myproject:default\" cannot list endpoints in project \"myproject\"",
  "reason": "Forbidden",
  "details": {
    "kind": "endpoints"
  },
  "code": 403
}
* Connection #0 to host kubernetes.default left intact
I've tried to access a number of resources, like endpoints, pods, etc. I've also omitted the namespace (so as to access cluster-wide resources), to no avail.
I'm currently using OpenShift Origin, clean (just ran oc cluster up and deployed a test image to access the terminal in the web console).
It looks like you're on a fully RBAC-enabled cluster, and your default service account, system:serviceaccount:myproject:default, is, as expected, unauthorized. You should create and use a dedicated service account for this pod and explicitly grant it access to what it needs to read, as sketched below.
https://kubernetes.io/docs/admin/authorization/rbac/
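A minimal sketch of that grant (the service account and role names are assumed):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-discovery   # name assumed
  namespace: myproject
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: endpoint-reader   # name assumed
  namespace: myproject
rules:
- apiGroups: [""]
  resources: ["endpoints", "pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: endpoint-reader
  namespace: myproject
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: endpoint-reader
subjects:
- kind: ServiceAccount
  name: pod-discovery
  namespace: myproject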
Pass an authorization bearer token within the curl command; without it, it's expected to be unauthorized.
More at: Kubernetes documentation

Istio - Connection timeout when calling service-two from service-one (examples)

I'm following the tutorials to evaluate Istio as the service mesh for my K8s cluster, but for some reason I cannot make the simple example that uses a couple of services work properly:
https://istio.io/docs/tasks/integrating-services-into-istio.html
If I try to call service-two from service-one, I get this error:
# kubectl exec -ti ${CLIENT} -- curl -v service-two:80
Defaulting container name to app.
Use 'kubectl describe pod/service-one-714088666-73fkp' to see all of the containers in this pod.
* Rebuilt URL to: service-two:80/
* Trying 10.102.51.89...
* connect to 10.102.51.89 port 80 failed: Connection refused
* Failed to connect to service-two port 80: Connection refused
* Closing connection 0
curl: (7) Failed to connect to service-two port 80: Connection refused
However, if I try to connect to service-two from another service in my cluster, even in a different namespace, then it works:
# kubectl exec -ti redis-4054078334-mj287 -n redis -- curl -v service-two.default:80
* Rebuilt URL to: service-two.default:80/
* Hostname was NOT found in DNS cache
* Trying 10.102.51.89...
* Connected to service-two.default (10.102.51.89) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: service-two.default
> Accept: */*
>
< HTTP/1.1 200 OK
* Server envoy is not blacklisted
< server: envoy
< date: Sat, 19 Aug 2017 14:43:01 GMT
< content-type: text/plain
< x-envoy-upstream-service-time: 2
< transfer-encoding: chunked
<
CLIENT VALUES:
client_address=127.0.0.1
command=GET
real path=/
query=nil
request_version=1.1
request_uri=http://service-two.default:8080/
SERVER VALUES:
server_version=nginx: 1.10.0 - lua: 10001
HEADERS RECEIVED:
accept=*/*
content-length=0
host=service-two.default
user-agent=curl/7.38.0
x-b3-sampled=1
x-b3-spanid=00000caf6e052e86
x-b3-traceid=00000caf6e052e86
x-envoy-expected-rq-timeout-ms=15000
x-forwarded-proto=http
x-ot-span-context=00000caf6e052e86;00000caf6e052e86;0000000000000000;cs
x-request-id=1290973c-7bca-95d2-8fa8-80917bb404ad
BODY:
* Connection #0 to host service-two.default left intact
-no body in request-
Any reason or explanation why I get this unexpected behaviour?
Thanks.
I figured out what happened: on service-one the init containers had not completed properly, so it was not resolving correctly.