Getting 404 on all outbound HTTP calls from pods inside istio mesh - kubernetes

I have istio v1.1.6 installed on Kubernetes v1.11 using the Helm chart provided in the istio repository, with a bunch of overrides including:
global:
  outboundTrafficPolicy:
    mode: ALLOW_ANY
pilot:
  env:
    PILOT_ENABLE_FALLTHROUGH_ROUTE: "1"
mixer:
  enabled: true
galley:
  enabled: true
security:
  enabled: false
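For completeness, these values are applied with the usual Helm workflow, roughly like this (the chart path is the one from the istio release tree; overrides.yaml is the file above):

# overrides.yaml holds the values shown above
helm upgrade --install istio install/kubernetes/helm/istio \
  --namespace istio-system -f overrides.yaml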
The problem is that I can't make any simple outbound HTTP request to a service running on port 80 (inside or outside of the mesh) from a pod that is inside the istio mesh and has istio-proxy injected as a sidecar. The response is always 404:
user@pod-12345-12345$ curl -v http://httpbin.org/headers
* Hostname was NOT found in DNS cache
* Trying 52.200.83.146...
* Connected to httpbin.org (52.200.83.146) port 80 (#0)
> GET /headers HTTP/1.1
> User-Agent: curl/7.38.0
> Host: httpbin.org
> Accept: */*
>
< HTTP/1.1 404 Not Found
< date: Wed, 15 May 2019 05:43:24 GMT
* Server envoy is not blacklisted
< server: envoy
< content-length: 0
<
* Connection #0 to host httpbin.org left intact
The response flag in the istio-proxy logs from envoy states that it can't find the proper route:
"GET / HTTP/1.1" 404 NR "-" 0 0 0 - "-" "curl/7.38.0" "238d0799-f83d-4e5e-94e7-79de4d14fa53" "httpbin.org" "-" - - 172.217.27.14:80 100.99.149.201:52892 -
NR: No route configured for a given request, in addition to the 404 response code.
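To see which routes the sidecar actually has for port 80, the route table can be dumped with istioctl (the pod name is the placeholder from the prompt above):

# show the routes Envoy has for port 80 on the affected pod
istioctl proxy-config routes pod-12345-12345 --name 80 -o json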
It's probably worth adding that:
Other outbound calls to any ports other than 80 work perfectly fine.
Checking proxy-status also shows nothing suspicious: all pods are SYNCED.
mTLS is disabled.
The example above is a call to an external service, but calls to internal services (e.g. curl another-service.svc.cluster.local/health) have the same issue.
I expected calls to internal mesh services to work out of the box; I even tried to define a DestinationRule and a ServiceEntry (a sketch follows below), but that did not help either.
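For reference, the ServiceEntry I tried looked roughly like this (a minimal sketch for the httpbin.org case; the resource name is mine):

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: httpbin-ext
spec:
  hosts:
  - httpbin.org
  location: MESH_EXTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS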
I don't really want to add the traffic.sidecar.istio.io/excludeOutboundIPRanges: "0.0.0.0/0" annotation to the deployment since, according to the docs:
this approach completely bypasses the sidecar, essentially disabling all of Istio’s features for the specified IPs
Any idea where else I can look or what is missing?

To me it looks like you have a short connection timeout defined in your istio-proxy sidecar; please check this similar GitHub issue in the Envoy project.
Btw, as @Robert Panzer mentioned, sharing the whole dump of the istio-proxy config would help a lot in investigating your particular case.
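If it helps, the whole config can be pulled from the sidecar's Envoy admin endpoint, something like this (pod name is the placeholder from the question):

# dump the full Envoy config from the istio-proxy admin port (15000 by default)
kubectl exec pod-12345-12345 -c istio-proxy -- \
  curl -s localhost:15000/config_dump > istio-proxy-config.json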

Related

Can't connect to proxy from within the pod [ Connection reset by peer ]

I get "Connection reset by peer" every time I try to use proxy from the Kubernetes pod.
Here is the log when from the curl:
>>>> curl -x http://5.188.62.223:15624 -L http://google.com -vvv
* Trying 5.188.62.223:15624...
* Connected to 5.188.62.223 (5.188.62.223) port 15624 (#0)
> GET http://google.com/ HTTP/1.1
> Host: google.com
> User-Agent: curl/7.79.1
> Accept: */*
> Proxy-Connection: Keep-Alive
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
Interestingly, I have no issues when I use the same proxy on my local computer, in Docker, or on a remote host. Apparently something within the cluster doesn't let me communicate with it.
Currently I use Azure-hosted Kubernetes, but the same error happens on DigitalOcean as well.
I would be grateful for any leading clue as to how I can bypass this restriction, because I'm out of ideas.
Server Info:
{
  Major:"1",
  Minor:"20",
  GitVersion:"v1.20.7",
  GitCommit:"ca90e422dfe1e209df2a7b07f3d75b92910432b5",
  GitTreeState:"clean",
  BuildDate:"2021-10-09T04:59:48Z",
  GoVersion:"go1.15.12",
  Compiler:"gc",
  Platform:"linux/amd64"
}
The YAML file I use to start the pod is just super basic. Originally I use Airflow with the Kubernetes executor, which actually spawns pretty similar basic pods:
apiVersion: v1
kind: Pod
metadata:
  name: scrapeevent.test
spec:
  affinity: {}
  containers:
  - command:
    - /bin/sh
    - -ec
    - while :; do echo '.'; sleep 5 ; done
    image: jaklimoff/mooncrops-opensea:latest
    imagePullPolicy: Always
    name: base
  restartPolicy: Never

Waiting for HTTP-01 challenge propagation: failed to perform self check GET request - ISTIO

I get this error after waiting for a while (~1 min):
Waiting for HTTP-01 challenge propagation: failed to perform self check GET request 'http://jenkins.xyz.in/.well-known/acme-challenge/AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_Vfe4I': Get "http://jenkins.xyz.in/.well-known/acme-challenge/AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_Vfe4I": dial tcp 103.66.96.201:80: connect: connection timed out
I am able to access this URL in the browser from anywhere on the internet:
curl -v http://jenkins.xyz.in/.well-known/acme-challenge/AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_Vfe4I
* Trying 103.66.96.201:80...
* Connected to jenkins.xyz.in (103.66.96.201) port 80 (#0)
> GET /.well-known/acme-challenge/AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_Vfe4I HTTP/1.1
> Host: jenkins.xyz.in
> User-Agent: curl/7.71.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< cache-control: no-cache, no-store, must-revalidate
< date: Wed, 13 Jan 2021 08:54:23 GMT
< content-length: 87
< content-type: text/plain; charset=utf-8
< x-envoy-upstream-service-time: 1
< server: istio-envoy
<
* Connection #0 to host jenkins.xyz.in left intact
AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_VfT4I.EZvkP5Fpi6EYc_-tWTQgvaQxrrbSr2MEJkuXJaywatk
My setup is:
1. Istio ingress load balancer running on a node (192.168.14.118)
2. I am pointing my external IP and the domain jenkins.xyz.in to 192.168.14.118 through another load balancer
request -> public IP -> load balancer -> 192.168.14.118
From outside it works fine, but when I try this from the node itself / from a pod inside the cluster I get:
$ curl -v http://jenkins.xyz.in/
* About to connect() to jenkins.xyz.in port 80 (#0)
* Trying 103.66.96.201...
I have read somewhere about hairpinning.
Since my Kubernetes node IP and the istio ingress load balancer external IP are the same, the request might be looping.
EXTRA: I am running k8s on bare metal
Is there any solution to get around this?
I found a workaround.
As my node was not able to access the URL (the loop), I added another node to the cluster and pinned the cert-manager pods to the new node via affinity.
cert-manager was able to access the URL from the new node. It's not a good solution, but it worked for me.
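The pinning itself is just a node affinity on the cert-manager Deployment's pod template, something like this (the hostname value is illustrative):

spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - new-node-1   # illustrative node name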

Istio mtls misconfiguration causes inconsistent behavior

I have deployed 2 istio enabled services on a GKE cluster.
istio version is 1.1.5 and GKE is on v1.15.9-gke.24
istio has been installed with global.mtls.enabled=true
serviceA communicates properly
serviceB apparently has TLS-related issues.
I spun up a non-istio-enabled deployment just for testing and exec'd into this test pod to curl these 2 service endpoints.
/ # curl -v serviceA
* Rebuilt URL to: serviceA/
* Trying 10.8.61.75...
* TCP_NODELAY set
* Connected to serviceA (10.8.61.75) port 80 (#0)
> GET / HTTP/1.1
> Host: serviceA
> User-Agent: curl/7.57.0
> Accept: */*
>
< HTTP/1.1 200 OK
< content-type: application/json
< content-length: 130
< server: istio-envoy
< date: Sat, 25 Apr 2020 09:45:32 GMT
< x-envoy-upstream-service-time: 2
< x-envoy-decorator-operation: serviceA.mynamespace.svc.cluster.local:80/*
<
{"application":"Flask-Docker Container"}
* Connection #0 to host serviceA left intact
/ # curl -v serviceB
* Rebuilt URL to: serviceB/
* Trying 10.8.58.228...
* TCP_NODELAY set
* Connected to serviceB (10.8.58.228) port 80 (#0)
> GET / HTTP/1.1
> Host: serviceB
> User-Agent: curl/7.57.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
Exec'ing into the Envoy proxy of the problematic service and turning trace-level logging on, I see this error:
serviceB-758bc87dcf-jzjgj istio-proxy [2020-04-24 13:15:21.180][29][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:168] [C1484] handshake error: 1
serviceB-758bc87dcf-jzjgj istio-proxy [2020-04-24 13:15:21.180][29][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:201] [C1484] TLS error: 268435612:SSL routines:OPENSSL_internal:HTTP_REQUEST
The Envoy sidecars of both containers display similar information when debugging their certificates.
I verify this by exec'ing into both istio-proxy containers, cd-ing to /etc/certs/..data and running
openssl x509 -in root-cert.pem -noout -text
The two root-cert.pem files are identical!
Since those 2 istio proxies have exactly the same TLS configuration in terms of certs, why this cryptic SSL error on serviceB?
FWIW, serviceB communicates with a non-istio-enabled postgres service.
Could that be causing the issue?
Curling the container of serviceB from within itself, however, returns a healthy response.
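One more thing that might be worth checking is whether the client and server sides actually agree on mTLS for serviceB; istioctl can compare them (the pod name is the one from the logs above):

# compare client- and server-side TLS settings for serviceB
istioctl authn tls-check serviceB-758bc87dcf-jzjgj serviceB.mynamespace.svc.cluster.local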

Exposing virtual service with istio and mTLS globally enabled

I have this configuration on my service mesh:
mTLS globally enabled and the default MeshPolicy
simple-web deployment exposed as a ClusterIP on port 8080
HTTP gateway for port 80 and a VirtualService routing to my service
Here are the gateway and virtual service YAML:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: http-gateway
spec:
  selector:
    istio: ingressgateway # Specify the ingressgateway created for us
  servers:
  - port:
      number: 80 # Service port to watch
      name: http-gateway
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: simple-web
spec:
  gateways:
  - http-gateway
  hosts:
  - '*'
  http:
  - match:
    - uri:
        prefix: /simple-web
    rewrite:
      uri: /
    route:
    - destination:
        host: simple-web
        port:
          number: 8080
Both vs and gw are in the same namespace.
The deployment was created and exposed with these commands:
k create deployment --image=yeasy/simple-web:latest simple-web
k expose deployment simple-web --port=8080 --target-port=80 --name=simple-web
and with k get pods I receive this:
pod/simple-web-9ffc59b4b-n9f85 2/2 Running
What happens is that from outside, pointing to the ingress-gateway load balancer, I receive an HTTP 503 error.
If I try to curl from the ingressgateway pod, I can reach the simple-web service.
Why can't I reach the website with mTLS enabled? What's the correct configuration?
As @suren mentioned in his answer, this issue is not present in istio version 1.3.2, so one solution is to use a newer version.
If you choose to upgrade istio to a newer version, please review the 1.3 Upgrade Notice and Upgrade Steps documentation, as istio is still in development and changes drastically with each version.
Also, as mentioned in the comments by @Manuel Castro, this is most likely the issue addressed in Avoid 503 errors while reconfiguring service routes, and newer versions simply handle it better:
Creating both the VirtualServices and DestinationRules that define the corresponding subsets using a single kubectl call (e.g., kubectl apply -f myVirtualServiceAndDestinationRule.yaml) is not sufficient because the resources propagate (from the configuration server, i.e., Kubernetes API server) to the Pilot instances in an eventually consistent manner. If the VirtualService using the subsets arrives before the DestinationRule where the subsets are defined, the Envoy configuration generated by Pilot would refer to non-existent upstream pools. This results in HTTP 503 errors until all configuration objects are available to Pilot.
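A simple, if crude, way to sidestep that race is to apply the DestinationRule first and the VirtualService afterwards (the file names here are hypothetical):

kubectl apply -f destination-rule.yaml   # subsets must exist first
sleep 5                                  # crude: give the config a moment to reach Pilot
kubectl apply -f virtual-service.yaml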
It should be possible to avoid this issue by temporarily disabling mTLS or by using permissive mode during the deployment.
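For example, switching the mesh to permissive mode during the rollout would look roughly like this (the pre-1.5 authentication API, matching the versions discussed here):

apiVersion: authentication.istio.io/v1alpha1
kind: MeshPolicy
metadata:
  name: default
spec:
  peers:
  - mtls:
      mode: PERMISSIVE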
I just installed istio 1.3.2 and k8s 1.15.1 to reproduce your issue, and it worked without any modifications. This is what I did:
0.- Create a namespace called istio and enable automatic sidecar injection on it.
1.- $ kubectl run nginx --image nginx -n istio
2.- $ kubectl expose deploy nginx --port 8080 --target-port 80 --name simple-web -n istio
3.- $ kubectl create -f gw.yaml -f vs.yaml
Note: these are your files.
The test:
$ curl a.b.c.d:31380/simple-web -I
HTTP/1.1 200 OK
server: istio-envoy
date: Fri, 11 Oct 2019 10:04:26 GMT
content-type: text/html
content-length: 612
last-modified: Tue, 24 Sep 2019 14:49:10 GMT
etag: "5d8a2ce6-264"
accept-ranges: bytes
x-envoy-upstream-service-time: 4
[2019-10-11T10:04:26.101Z] "HEAD /simple-web HTTP/1.1" 200 - "-" "-" 0 0 6 4 "10.132.0.36" "curl/7.52.1" "4bbc2609-a928-9f79-9ae8-d6a3e32217d7" "a.b.c.d:31380" "192.168.171.73:80" outbound|8080||simple-web.istio.svc.cluster.local - 192.168.171.86:80 10.132.0.36:37078 - -
And to be sure mTLS was enabled, this is from the ingress-gateway describe command:
--controlPlaneAuthPolicy
MUTUAL_TLS
So, I don't know what is wrong, but you might want to go through these steps and rule things out.
Note: the reason I am hitting the istio gateway on port 31380 is that my k8s is on VMs right now, and I didn't want to spin up a GKE cluster for a test.
EDIT
I just deployed another deployment with your image, exposed it as simple-web-2, and it worked again. Maybe I'm lucky with istio:
$ curl a.b.c.d:31380/simple-web -I
HTTP/1.1 200 OK
server: istio-envoy
date: Fri, 11 Oct 2019 10:28:45 GMT
content-type: text/html
content-length: 354
last-modified: Fri, 11 Oct 2019 10:28:46 GMT
x-envoy-upstream-service-time: 4
[2019-10-11T10:28:46.400Z] "HEAD /simple-web HTTP/1.1" 200 - "-" "-" 0 0 5 4 "10.132.0.36" "curl/7.52.1" "df0dd00a-875a-9ae6-bd48-acd8be1cc784" "a.b.c.d:31380" "192.168.171.65:80" outbound|8080||simple-web-2.istio.svc.cluster.local - 192.168.171.86:80 10.132.0.36:42980 - -
What's your k8s environment?
EDIT2
# istioctl authn tls-check curler-6885d9fd97-vzszs simple-web.istio.svc.cluster.local -n istio
HOST:PORT STATUS SERVER CLIENT AUTHN POLICY DESTINATION RULE
simple-web.istio.svc.cluster.local:8080 OK mTLS mTLS default/ default/istio-system

Istio - Connection timeout when calling service-two from service-one (examples)

I'm following the tutorials to evaluate istio as the service mesh for my k8s cluster, but for some reason I cannot make the simple example that uses a couple of services work properly:
https://istio.io/docs/tasks/integrating-services-into-istio.html
If I try to call service-two from service-one, I get this error:
# kubectl exec -ti ${CLIENT} -- curl -v service-two:80
Defaulting container name to app.
Use 'kubectl describe pod/service-one-714088666-73fkp' to see all of the containers in this pod.
* Rebuilt URL to: service-two:80/
* Trying 10.102.51.89...
* connect to 10.102.51.89 port 80 failed: Connection refused
* Failed to connect to service-two port 80: Connection refused
* Closing connection 0
curl: (7) Failed to connect to service-two port 80: Connection refused
However, if I try to connect to service-two from another service in my cluster, even in a different namespace, then it works:
# kubectl exec -ti redis-4054078334-mj287 -n redis -- curl -v service-two.default:80
* Rebuilt URL to: service-two.default:80/
* Hostname was NOT found in DNS cache
* Trying 10.102.51.89...
* Connected to service-two.default (10.102.51.89) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: service-two.default
> Accept: */*
>
< HTTP/1.1 200 OK
* Server envoy is not blacklisted
< server: envoy
< date: Sat, 19 Aug 2017 14:43:01 GMT
< content-type: text/plain
< x-envoy-upstream-service-time: 2
< transfer-encoding: chunked
<
CLIENT VALUES:
client_address=127.0.0.1
command=GET
real path=/
query=nil
request_version=1.1
request_uri=http://service-two.default:8080/
SERVER VALUES:
server_version=nginx: 1.10.0 - lua: 10001
HEADERS RECEIVED:
accept=*/*
content-length=0
host=service-two.default
user-agent=curl/7.38.0
x-b3-sampled=1
x-b3-spanid=00000caf6e052e86
x-b3-traceid=00000caf6e052e86
x-envoy-expected-rq-timeout-ms=15000
x-forwarded-proto=http
x-ot-span-context=00000caf6e052e86;00000caf6e052e86;0000000000000000;cs
x-request-id=1290973c-7bca-95d2-8fa8-80917bb404ad
BODY:
* Connection #0 to host service-two.default left intact
-no body in request-
Any reason or explanation why I get this unexpected behaviour?
Thanks.
I figured out what happened: on service-one the init containers had not completed properly, so it was not resolving correctly.
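A quick way to spot this is to look at the init container statuses on the client pod (name taken from the output above); each init container should show a terminated state with exit code 0:

# list init container states on the pod
kubectl get pod service-one-714088666-73fkp \
  -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.state}{"\n"}{end}'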