How to access pod by IP when using Istio - kubernetes

I need to use Spring Boot Admin to manage my pods, and I found that Spring Boot Admin fails to call the pod actuator endpoints.
I think it may be caused by Istio, which might block direct access by IP address. The following are my tests:
If I call a page by pod IP address, I get the following error:
# curl -I http://10.40.133.41
HTTP/1.1 503 Service Unavailable
content-length: 95
content-type: text/plain
date: Thu, 03 Nov 2022 03:10:54 GMT
server: envoy
If I call the same page by service name, it's fine:
# curl -I http://httpbin
HTTP/1.1 200 OK
server: envoy
date: Thu, 03 Nov 2022 03:09:02 GMT
content-type: text/html; charset=utf-8
content-length: 9593
access-control-allow-origin: *
access-control-allow-credentials: true
x-envoy-upstream-service-time: 3
Does anyone know how to enable a pod to access another pod directly by IP address when using Istio? Or is there any other solution for using Spring Boot Admin with Istio?
I tried adding the service name as the Host request header when calling a web page, but it seems that Istio then routes to a pod by the service name instead of using the IP address.
The following is an example where I used a wrong IP, yet the request still returns a success status:
# curl -I -H "Host: httpbin" http://100.40.133.41
HTTP/1.1 200 OK
server: envoy
date: Thu, 03 Nov 2022 03:39:00 GMT
content-type: text/html; charset=utf-8
content-length: 9593
access-control-allow-origin: *
access-control-allow-credentials: true
x-envoy-upstream-service-time: 2
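A workaround that is commonly suggested for Spring Boot Admin behind Istio (an addition here, not confirmed by the original poster) is to expose the application pods through an extra headless Service; for headless services the sidecar generally passes traffic through to the individual endpoint (pod) IPs on the declared port, so the actuator endpoints become reachable by pod IP. A minimal sketch, assuming the pods carry the label app: myapp and serve the actuator on port 8080:
# Hedged sketch (name, label and port are assumptions): a headless Service
# whose endpoints resolve to the pod IPs themselves.
apiVersion: v1
kind: Service
metadata:
  name: myapp-headless
spec:
  clusterIP: None          # headless: no virtual IP, endpoints are the pod IPs
  selector:
    app: myapp
  ports:
    - name: http-actuator
      port: 8080
      targetPort: 8080
Alternatively, configuring the Spring Boot Admin clients to register with a service URL instead of the pod IP sidesteps pod-IP routing altogether, at the cost of per-instance addressing.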

Related

AWS App Mesh request timeout for file stream which takes longer than 30 seconds

My network setup on AWS looks like the following:
ECS Fargate services with App Mesh, Envoy Proxy and ELB.
Everything is working fine, except when a (download) request takes longer than 30 seconds. One of our services creates a zip file on request and sends a download link to the user. If the zip is small, everything works fine and the user can download it successfully. If the zip is bigger and the download takes more than 30 seconds, it fails.
The bug has been tracked down to App Mesh - Virtual Node Listener timeouts.
Timeouts were left at the default settings (empty/unset) and the 30-second bug happened.
When the request timeout was set to a big enough number, the download was successful, but a fixed timeout like 600s could still produce the same bug for really big files.
When the request timeout was set to 0s (expecting this to work as "unlimited"), bigger downloads were successful too, but I'm not sure whether that is the right thing to do.
My question is:
Is an App Mesh listener with a 0s request timeout good practice, or will it produce other problems that I'm not aware of?
If it is bad practice, how can I force App Mesh not to kill my file stream after 30 seconds?
Example response header for the file download:
HTTP/2 200 OK
date: Wed, 05 Oct 2022 09:06:45 GMT
content-type: application/octet-stream
content-length: 17325639
content-disposition: attachment; filename="a08c94a3-068e-486f-92c7-371d00984ddc.zip"
expires: Wed, 05 Oct 2022 09:07:45 GMT
cache-control: private, max-age=60
last-modified: Wed, 05 Oct 2022 07:11:28 GMT
access-control-allow-headers: Cache-Control, X-CSRF-Token, X-Requested-With
access-control-allow-origin: *
server: envoy
x-envoy-upstream-service-time: 55
X-Firefox-Spdy: h2
The following header is set by the server but is possibly removed by Envoy:
connection: keep-alive
30 seconds is the default timeout; it is possible to define higher timeout values for the listeners.
An example of a VirtualNode using 120 seconds as the listener timeout:
apiVersion: appmesh.k8s.aws/v1beta2
kind: VirtualNode
metadata:
  name: myapp
  namespace: myns
spec:
  podSelector:
    matchLabels:
      app: myapp
  listeners:
    - portMapping:
        port: 80
        protocol: http
      timeout:
        http:
          perRequest:
            value: 120
            unit: s
  backends:
    - virtualService:
        virtualServiceRef:
          name: myapp
  serviceDiscovery:
    dns:
      hostname: myapp.myns.svc.cluster.local
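As a follow-up to the 0s question: the question's own test suggests a 0s per-request timeout behaves as "unlimited" (a hedged note, not confirmed by this answer), which matches how Envoy generally treats a zero route timeout. In that case the timeout block of the same VirtualNode would simply read:
      timeout:
        http:
          perRequest:
            value: 0     # observed in the question to behave as "unlimited"
            unit: s
An explicit large value such as 120s keeps some protection against stuck requests, whereas 0 removes the per-request limit entirely, which is why a fixed 600s could still fail for very large files while 0s did not.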

Waiting for HTTP-01 challenge propagation: failed to perform self check GET request - ISTIO

I get this error after waiting for a while (~1 min):
Waiting for HTTP-01 challenge propagation: failed to perform self check GET request 'http://jenkins.xyz.in/.well-known/acme-challenge/AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_Vfe4I': Get "http://jenkins.xyz.in/.well-known/acme-challenge/AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_Vfe4I": dial tcp 103.66.96.201:80: connect: connection timed out
I am able to access this URL in a browser from anywhere on the internet:
curl -v http://jenkins.xyz.in/.well-known/acme-challenge/AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_Vfe4I
* Trying 103.66.96.201:80...
* Connected to jenkins.xyz.in (103.66.96.201) port 80 (#0)
> GET /.well-known/acme-challenge/AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_Vfe4I HTTP/1.1
> Host: jenkins.xyz.in
> User-Agent: curl/7.71.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< cache-control: no-cache, no-store, must-revalidate
< date: Wed, 13 Jan 2021 08:54:23 GMT
< content-length: 87
< content-type: text/plain; charset=utf-8
< x-envoy-upstream-service-time: 1
< server: istio-envoy
<
* Connection #0 to host jenkins.xyz.in left intact
AoV9UtBq1rwPLDXWjrq85G5Peg_Z6rLKSZyYL_VfT4I.EZvkP5Fpi6EYc_-tWTQgvaQxrrbSr2MEJkuXJaywatk
My setup is:
1. Istio ingress load balancer running on a node (192.168.14.118)
2. I am pointing my external IP and the domain jenkins.xyz.in to 192.168.14.118 through another load balancer
request -> public IP -> load balancer -> 192.168.14.118
From outside it works fine, but when I try this from the node itself or from a pod inside the cluster I get:
$ curl -v http://jenkins.xyz.in/
* About to connect() to jenkins.xyz.in port 80 (#0)
* Trying 103.66.96.201...
I have read somewhere about hairpinning: since my Kubernetes node IP and the Istio ingress load balancer external IP are the same, the request might be looping.
EXTRA: I am running k8s on bare metal.
Is there any solution to get around this?
I found a workaround.
As my node was not able to access the URL (loop), I added another node to the cluster and set the cert-manager pods' affinity to the new node.
cert-manager was able to access the URL from the new node. Not a good solution, but it worked for me.
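For reference, pinning the cert-manager pods to the new node could look like the snippet below, added to the cert-manager Deployment's pod spec. The node label cert-manager-node=true is an assumption for illustration, not something from the original setup:
# Hedged sketch: schedule cert-manager only on nodes labelled cert-manager-node=true,
# e.g. after running: kubectl label node <new-node-name> cert-manager-node=true
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: cert-manager-node
              operator: In
              values:
                - "true"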

Istio mtls misconfiguration causes inconsistent behavior

I have deployed 2 istio enabled services on a GKE cluster.
Istio version is 1.1.5 and GKE is on v1.15.9-gke.24.
Istio has been installed with global.mtls.enabled=true.
serviceA communicates properly.
serviceB apparently has TLS-related issues.
I spin up a non-Istio-enabled deployment just for testing and exec into this test pod to curl these two service endpoints.
/ # curl -v serviceA
* Rebuilt URL to: serviceA/
* Trying 10.8.61.75...
* TCP_NODELAY set
* Connected to serviceA (10.8.61.75) port 80 (#0)
> GET / HTTP/1.1
> Host: serviceA
> User-Agent: curl/7.57.0
> Accept: */*
>
< HTTP/1.1 200 OK
< content-type: application/json
< content-length: 130
< server: istio-envoy
< date: Sat, 25 Apr 2020 09:45:32 GMT
< x-envoy-upstream-service-time: 2
< x-envoy-decorator-operation: serviceA.mynamespace.svc.cluster.local:80/*
<
{"application":"Flask-Docker Container"}
* Connection #0 to host serviceA left intact
/ # curl -v serviceB
* Rebuilt URL to: serviceB/
* Trying 10.8.58.228...
* TCP_NODELAY set
* Connected to serviceB (10.8.58.228) port 80 (#0)
> GET / HTTP/1.1
> Host: serviceB
> User-Agent: curl/7.57.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
Execing into the Envoy proxy of the problematic service and turning trace-level logging on, I see this error:
serviceB-758bc87dcf-jzjgj istio-proxy [2020-04-24 13:15:21.180][29][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:168] [C1484] handshake error: 1
serviceB-758bc87dcf-jzjgj istio-proxy [2020-04-24 13:15:21.180][29][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:201] [C1484] TLS error: 268435612:SSL routines:OPENSSL_internal:HTTP_REQUEST
The Envoy sidecars of both containers display similar information when debugging their certificates.
I verify this by execing into both istio-proxy containers, cd-ing to /etc/certs/..data and running
openssl x509 -in root-cert.pem -noout -text
The two root-cert.pem are identical!
Since those two Istio proxies have exactly the same TLS configuration in terms of certs, why this cryptic SSL error on serviceB?
FWIW, serviceB communicates with a non-Istio-enabled Postgres service.
Could that be causing the issue?
Curling the serviceB container from within itself, however, returns a healthy response.
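One direction worth noting (my reading of the error, not something from the original post): OPENSSL_internal:HTTP_REQUEST typically means a plaintext HTTP request arrived at a TLS listener, which is exactly what happens when a non-Istio test pod curls a service whose sidecar enforces strict mTLS. If plaintext clients must be supported, a PERMISSIVE authentication policy scoped to serviceB is one possible sketch for Istio 1.1.x (the namespace mynamespace is taken from the x-envoy-decorator-operation header above; the policy name is an assumption):
# Hedged sketch for Istio 1.1.x: accept both mTLS and plaintext for serviceB so
# that non-mesh pods can reach it.
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: serviceb-permissive
  namespace: mynamespace
spec:
  targets:
    - name: serviceB
  peers:
    - mtls:
        mode: PERMISSIVE
Otherwise, comparing the authentication policies and DestinationRules that apply to serviceA and serviceB (for example with istioctl authn tls-check in that Istio line) would be the next step, since serviceA works from the same non-Istio pod while serviceB does not.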

Kubernetes API Access from Pod

I'm trying to access the Kubernetes API in order to discover pods from within a deployed container. Although I'll do this programmatically, right now I'm just using cURL to check for issues.
I run this from a pod terminal:
curl -vvv -H "Authorization: Bearer $(</var/run/secrets/kubernetes.io/serviceaccount/token)" "https://kubernetes.default/api/v1/namespaces/$(</var/run/secrets/kubernetes.io/serviceaccount/namespace)/endpoints" --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
And I get a 403 result:
* About to connect() to kubernetes.default port 443 (#0)
* Trying 172.30.0.1...
* Connected to kubernetes.default (172.30.0.1) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
CApath: none
* NSS: client certificate not found (nickname not specified)
* SSL connection using TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
* Server certificate:
* subject: CN=10.0.75.2
* start date: Nov 23 16:55:27 2017 GMT
* expire date: Nov 23 16:55:28 2019 GMT
* common name: 10.0.75.2
* issuer: CN=openshift-signer#1511456125
> GET /api/v1/namespaces/myproject/endpoints HTTP/1.1
> User-Agent: curl/7.29.0
> Host: kubernetes.default
> Accept: */*
> Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJteXByb2plY3QiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoiZGVmYXVsdC10b2tlbi00cXZidCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiMjg3NzAzYjEtZDA4OC0xMWU3LTkzZjQtNmEyNGZhYWZjYzQxIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om15cHJvamVjdDpkZWZhdWx0In0.yl2HUhmxjrb4UqkAioq1TixWl_YqUPoxSvQPPSgl9Hzr97Hjm7icdL_mdptwEnOSErfzqSUBiMKJcIRdIa3Z7mfkgEk-f2H-M7TUU8GpXmD2Zex6Bcn_dq-Hsoed6W2PYpeFDoy98p5rSNTUL5MPMATOodeAulB0NG_zF01-8qTbLO_I6FRa3BCVXVMaZWBoZgwZ1acQbd4fJqDRsYmQMSi5P8a3nYgjBdifkQeTTb3S8Kmnszct41LoUlh9Xv29YVEyr1uQc5DSLAgQKj_NdSxkVq-MJP8z1PWV3OmHULNChocXr7RGKaNwlVpwpgNqsDAOqIyE1ozxlntIrotLBw
>
< HTTP/1.1 403 Forbidden
< Cache-Control: no-store
< Content-Type: application/json
< Date: Thu, 23 Nov 2017 22:18:01 GMT
< Content-Length: 282
<
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "User \"system:serviceaccount:myproject:default\" cannot list endpoints in project \"myproject\"",
"reason": "Forbidden",
"details": {
"kind": "endpoints"
},
"code": 403
}
* Connection #0 to host kubernetes.default left intact
I've tried to access a number of resources, like endpoints, pods, etc. I've also omitted the namespace (so as to access cluster-wide resources), to no avail.
I'm currently using OpenShift Origin, clean (just ran oc cluster up and deployed a test image to access the terminal in the web console).
It looks like you're on a fully RBAC-enabled cluster, and your default service account system:serviceaccount:myproject:default is, as expected, unauthorised. You should create and use a dedicated service account for this pod and explicitly grant it access to what it needs to read.
https://kubernetes.io/docs/admin/authorization/rbac/
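A minimal sketch of that advice (the service account and role names below are assumptions for illustration, not from the original answer), granting read access to endpoints and pods in the myproject namespace from the error message:
# Hedged sketch: dedicated ServiceAccount plus namespaced Role/RoleBinding.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-discovery
  namespace: myproject
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-discovery
  namespace: myproject
rules:
  - apiGroups: [""]
    resources: ["endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-discovery
  namespace: myproject
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-discovery
subjects:
  - kind: ServiceAccount
    name: pod-discovery
    namespace: myproject
The Deployment would then reference it with serviceAccountName: pod-discovery in its pod spec, and the same curl command would use that account's mounted token.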
Pass an Authorization bearer token within the curl command; without it, the request is expected to be unauthorized.
More at: the Kubernetes documentation

Istio - Connection timeout when calling service-two from service-one (examples)

I'm following the tutorials to evaluate Istio as the service mesh for my K8s cluster, but for some reason I cannot get the simple example that uses a couple of services to work properly:
https://istio.io/docs/tasks/integrating-services-into-istio.html
If I try to call service-two from service-one, I get this error:
# kubectl exec -ti ${CLIENT} -- curl -v service-two:80
Defaulting container name to app.
Use 'kubectl describe pod/service-one-714088666-73fkp' to see all of the containers in this pod.
* Rebuilt URL to: service-two:80/
* Trying 10.102.51.89...
* connect to 10.102.51.89 port 80 failed: Connection refused
* Failed to connect to service-two port 80: Connection refused
* Closing connection 0
curl: (7) Failed to connect to service-two port 80: Connection refused
However, if I try to connect to service-two from another service in my cluster, even in a different namespace, then it works:
# kubectl exec -ti redis-4054078334-mj287 -n redis -- curl -v service-two.default:80
* Rebuilt URL to: service-two.default:80/
* Hostname was NOT found in DNS cache
* Trying 10.102.51.89...
* Connected to service-two.default (10.102.51.89) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: service-two.default
> Accept: */*
>
< HTTP/1.1 200 OK
* Server envoy is not blacklisted
< server: envoy
< date: Sat, 19 Aug 2017 14:43:01 GMT
< content-type: text/plain
< x-envoy-upstream-service-time: 2
< transfer-encoding: chunked
<
CLIENT VALUES:
client_address=127.0.0.1
command=GET
real path=/
query=nil
request_version=1.1
request_uri=http://service-two.default:8080/
SERVER VALUES:
server_version=nginx: 1.10.0 - lua: 10001
HEADERS RECEIVED:
accept=*/*
content-length=0
host=service-two.default
user-agent=curl/7.38.0
x-b3-sampled=1
x-b3-spanid=00000caf6e052e86
x-b3-traceid=00000caf6e052e86
x-envoy-expected-rq-timeout-ms=15000
x-forwarded-proto=http
x-ot-span-context=00000caf6e052e86;00000caf6e052e86;0000000000000000;cs
x-request-id=1290973c-7bca-95d2-8fa8-80917bb404ad
BODY:
* Connection #0 to host service-two.default left intact
-no body in request-
Any reason or explanation why I get this unexpected behaviour?
Thanks.
I figured out what happened: on service-one the init containers had not completed properly, so requests from that pod were not being resolved correctly.
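A quick way to check for that situation (a sketch, reusing the pod name from the output above) is to look at the init container statuses on the client pod; the Istio init container must have terminated successfully before the sidecar can route traffic:
# Hedged sketch: inspect init container states for the client pod.
kubectl get pod service-one-714088666-73fkp \
  -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{" -> "}{.state}{"\n"}{end}'

# Or simply check the Init status column and events:
kubectl describe pod service-one-714088666-73fkp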