istio-proxy closing long running TCP connection after 1 hour - kubernetes

TL;DR: How can we configure istio sidecar injection/istio-proxy/envoy-proxy/istio egressgateway to allow long living (>3 hours), possibly idle, TCP connections?
Some details:
We're trying to perform a database migration to PostgreSQL which is being triggered by one application which has Spring Boot + Flyway configured, this migration is expected to last ~3 hours.
Our application is deployed inside our kubernetes cluster, which has configured istio sidecar injection. After exactly one hour of running the migration, the connection is always getting closed.
We're sure it's istio-proxy closing the connection as we attempted the migration from a pod without istio sidecar injection and it was running for longer than one hour, however this is not an option going forward as this may imply some downtime in production which we can't consider.
We suspect this should be configurable in istio proxy setting the parameter idle_timeout - which was implemented here. However this isn't working, or we are not configuring it properly, we're trying to configure this during istio installation by adding --set gateways.istio-ingressgateway.env.ISTIO_META_IDLE_TIMEOUT=5s to our helm template.

If you use istio version higher than 1.7 you might try use envoy filter to make it work. There is answer and example on github provided by #ryant1986.
We ran into the same problem on 1.7, but we noticed that the ISTIO_META_IDLE_TIMEOUT setting was only getting picked up on the OUTBOUND side of things, not the INBOUND. By adding an additional filter that applied to the INBOUND side of the request, we were able to successfully increase the timeout (we used 24 hours)
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
name: listener-timeout-tcp
namespace: istio-system
spec:
configPatches:
- applyTo: NETWORK_FILTER
match:
context: SIDECAR_INBOUND
listener:
filterChain:
filter:
name: envoy.filters.network.tcp_proxy
patch:
operation: MERGE
value:
name: envoy.filters.network.tcp_proxy
typed_config:
'#type': type.googleapis.com/envoy.config.filter.network.tcp_proxy.v2.TcpProxy
idle_timeout: 24h
We also created a similar filter to apply to the passthrough cluster (so that timeouts still apply to external traffic that we don't have service entries for), since the config wasn't being picked up there either.

for ingress gateway, we use env.ISTIO_META_IDLE_TIMEOUT to set the idle-timeout for TCP or HTTP protocol.
for sidecar, you can use the similar envoyfilter (listener-timeout-tcp) to configure INBOUND direction or OUTBOUND direction.

Related

Is there a way to isolate deployments in Istio service mesh?

I am trying to learn about the microservice architecture and different microservices interacting with each other. I have written a simple microservice based web app and had a doubt regarding it. If a service has multiple versions running, load balancing is easily managed by the envoy siedcar in Istio. My question is that in case there is some vulnerability detected in one of the versions, is there a way to isolate the pod from receiving any more traffic.
We can manually do this with the help of a virtual service and the appropriate routing rule. But can it be dynamically performed based on some trigger event?
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: VirtualServiceName
spec:
hosts:
- SomeHost
http:
- route:
- destination:
host: SomeHost
subset: v1
weight: 0
- destination:
host: SomeHost
subset: v2
weight: 100
Any help is appreciated
According to istio documentation you can configure failover with LocalityLoadBalancerSetting.
If the goal of the operator is not to distribute load across zones and regions but rather to restrict the regionality of failover to meet other operational requirements an operator can set a ‘failover’ policy instead of a ‘distribute’ policy.
The following example sets up a locality failover policy for regions. Assume a service resides in zones within us-east, us-west & eu-west this example specifies that when endpoints within us-east become unhealthy traffic should failover to endpoints in any zone or sub-zone within eu-west and similarly us-west should failover to us-east.
failover:
- from: us-east
to: eu-west
- from: us-west
to: us-east
Failover requires outlier detection to be in place for it to work.
But it's rather for regions/zones not pods.
If it's about pods you could take a look at this istio documentation
While Istio failure recovery features improve the reliability and availability of services in the mesh, applications must handle the failure or errors and take appropriate fallback actions. For example, when all instances in a load balancing pool have failed, Envoy returns an HTTP 503 code. The application must implement any fallback logic needed to handle the HTTP 503 error code..
And take a look at this and this github issues.
During HTTP health checking Envoy will send an HTTP request to the upstream host. By default, it expects a 200 response if the host is healthy. Expected response codes are configurable. The upstream host can return 503 if it wants to immediately notify downstream hosts to no longer forward traffic to it.
I hope you find this useful.

GKE No load balancing between backends pod with session affinity(sticky sessions)

I have 2 backend application running on the same cluster on gke. Applications A and B. A has 1 pod and B has 2 pods. A is exposed to the outside world and receives IP address that he then sends to B via http requests in the header.
B has a Kubernetes service object that is configured like that.
apiVersion: v1
kind: Service
metadata:
name: svc-{{ .Values.component_name }}
namespace: {{ include "namespace" .}}
spec:
ports:
- port: 80
targetPort: {{.Values.app_port}}
protocol: TCP
selector:
app: pod-{{ .Values.component_name }}
type: ClusterIP
In that configuration, The http requests from A are equally balanced between the 2 pods of application B, but when I add sessionAffinity: ClientIP to the configuration, every http requests are sent to the same B pod even though I thought it should be a round robin type of interaction.
To be clear, I have the IP adress stored in the header X-Forwarded-For so the service should look at it to be sure to which B pod to send the request as the documentation says https://kubernetes.io/docs/concepts/services-networking/service/#ssl-support-on-aws
In my test I tried to create has much load has possible to one of the B pod to try to contact the second pod without any success. I made sure that I had different IPs in my headers and that it wasn't because some sort of proxy in my environment. The IPs were not previously used for test so it is not because of already existing stickiness.
I am stuck now because I don't know how to test it further and have been reading the doc and probably missing something. My guess was that sessionAffinity disable load balancing for ClusterIp type but this seems highly unlikely...
My questions are :
Is the comportment I am observing normal? What am I doing wrong?
This might help to understand if it is still unclear what I'm trying to say : https://stackoverflow.com/a/59109265/12298812
EDIT : I did test on the client upstream and I saw at least a little bit of the requests get to the second pod of B, but this load test was performed from the same IP for every request. So this time I should have seen only a pod get the traffic...
The behaviour suggests that x-forward-for header is not respected by cluster-ip service.
To be sure I would suggest to load test from upstream client service which consumes the above service and see what kind of behaviour you get. Chances are you will see the same incorrect behaviour there which will affect scaling your service.
That said, using session affinity for internal service is highly unusual as client IP addresses do not vary as much. Session affinity limits scaling ability of your application. Typically you use memcached or redis as session store which is likely to be more scalable than session affinity based solutions.

Does Kubernetes support green-blue deployment?

I would like to ask on the mechanism for stopping the pods in kubernetes.
I read https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods before ask the question.
Supposably we have a application with gracefully shutdown support
(for example we use simple http server on Go https://play.golang.org/p/5tmkPPMiSSt).
Server has two endpoints:
/fast, always send 200 http status code.
/slow, wait 10 seconds and send 200 http status code.
There is deployment/service resource with that configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
name: test
spec:
replicas: 1
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app/name: test
template:
metadata:
labels:
app/name: test
spec:
terminationGracePeriodSeconds: 120
containers:
- name: service
image: host.org/images/grace:v0.1
livenessProbe:
httpGet:
path: /health
port: 10002
failureThreshold: 1
initialDelaySeconds: 1
readinessProbe:
httpGet:
path: /health
port: 10002
failureThreshold: 1
initialDelaySeconds: 1
---
apiVersion: v1
kind: Service
metadata:
name: test
spec:
type: NodePort
ports:
- name: http
port: 10002
targetPort: 10002
selector:
app/name: test
To make sure the pods deleted gracefully I conducted two test options.
First option (slow endpoint) flow:
Create deployment with replicas value equal 1.
Wait for pod readness.
Send request on /slow endpoint (curl http://ip-of-some-node:nodePort/slow) and delete pod (simultaneously, with 1 second out of sync).
Expected:
Pod must not end before http server completed my request.
Got:
Yes, http server process in 10 seconds and return response for me.
(if we pass --grace-period=1 option to kubectl, then curl will write - curl: (52) Empty reply from server)
Everything works as expected.
Second option (fast endpoint) flow:
Create deployment with replicas value equal 10.
Wait for pods readness.
Start wrk with "Connection: close" header.
Randomly delete one or two pods (kubectl delete pod/xxx).
Expected:
No socket errors.
Got:
$ wrk -d 2m --header "Connection: Close" http://ip-of-some-node:nodePort/fast
Running 2m test # http://ip-of-some-node:nodePort/fast
Thread Stats Avg Stdev Max +/- Stdev
Latency 122.35ms 177.30ms 1.98s 91.33%
Req/Sec 66.98 33.93 160.00 65.83%
15890 requests in 2.00m, 1.83MB read
Socket errors: connect 0, read 15, write 0, timeout 0
Requests/sec: 132.34
Transfer/sec: 15.64KB
15 socket errors on read, that is, some pods were disconnected from the service before all requests were processed (maybe).
The problem appears when a new deployment version is applied, scale down and rollout undo.
Questions:
What's reason of that behavior?
How to fix it?
Kubernetes version: v1.16.2
Edit 1.
The number of errors changes each time, but remains in the range of 10-20, when removing 2-5 pods in two minutes.
P.S. If we will not delete a pod, we don't got errors.
Does Kubernetes support green-blue deployment?
Yes, it does. You can read about it on Zero-downtime Deployment in Kubernetes with Jenkins,
A blue/green deployment is a change management strategy for releasing software code. Blue/green deployments, which may also be referred to as A/B deployments require two identical hardware environments that are configured exactly the same way. While one environment is active and serving end users, the other environment remains idle.
Container technology offers a stand-alone environment to run the desired service, which makes it super easy to create identical environments as required in the blue/green deployment. The loosely coupled Services - ReplicaSets, and the label/selector-based service routing in Kubernetes make it easy to switch between different backend environments.
I would also recommend reading Kubernetes Infrastructure Blue/Green deployments.
Here is a repository with examples from codefresh.io about blue green deployment.
This repository holds a bash script that allows you to perform blue/green deployments on a Kubernetes cluster. See also the respective blog post
Prerequisites
As a convention the script expects
The name of your deployment to be $APP_NAME-$VERSION
Your deployment should have a label that shows it version
Your service should point to the deployment by using a version selector, pointing to the corresponding label in the deployment
Notice that the new color deployment created by the script will follow the same conventions. This way each subsequent pipeline you run will work in the same manner.
You can see examples of the tags with the sample application:
service
deployment
You might be also interested in Canary deployment:
Another deployment strategy is using Canaries (a.k.a. incremental rollouts). With canaries, the new version of the application is gradually deployed to the Kubernetes cluster while getting a very small amount of live traffic (i.e. a subset of live users are connecting to the new version while the rest are still using the previous version).
...
The small subset of live traffic to the new version acts as an early warning for potential problems that might be present in the new code. As our confidence increases, more canaries are created and more users are now connecting to the updated version. In the end, all live traffic goes to canaries, and thus the canary version becomes the new “production version”.
EDIT
Questions:
What's reason of that behavior?
When new deployment is being applied old pods are being removed and new ones are being scheduled.
This is being done by Control Plan
For example, when you use the Kubernetes API to create a Deployment, you provide a new desired state for the system. The Kubernetes Control Plane records that object creation, and carries out your instructions by starting the required applications and scheduling them to cluster nodes–thus making the cluster’s actual state match the desired state.
You have only setup a readinessProbe, which tells your service if it should send traffic to the pod or not. This is not a good solution as like you can see in your example if you have 10 pods and remove one or two there is a gap and you receive socket error.
How to fix it?
You have to understand this is not broken so it doesn't need a fix.
This might be mitigated by implementing a check in your application to make sure it's sending request to working address or utilize other features like load balancing like ingress.
Also when you are updating deployment you can do checks before deleting the pod to check if it does have any traffic incoming/outgoing and roll the update to only not used pods.

Kubernetes pods can not make https request after deploying istio service mesh

I am exploring the istio service mesh on my k8s cluster hosted on EKS(Amazon).
I tried deploying istio-1.2.2 on a new k8s cluster with the demo.yml file used for bookapp demonstration and most of the use cases I understand properly.
Then, I deployed istio using helm default profile(recommended for production) on my existing dev cluster with 100s of microservices running and what I noticed is my services can can call http endpoints but not able to call external secure endpoints(https://www.google.com, etc.)
I am getting :
curl: (35) error:1400410B:SSL routines:CONNECT_CR_SRVR_HELLO:wrong
version number
Though I am able to call external https endpoints from my testing cluster.
To verify, I check the egress policy and it is mode: ALLOW_ANY in both the clusters.
Now, I removed the the istio completely from my dev cluster and install the demo.yml to test but now this is also not working.
I try to relate my issue with this but didn't get any success.
https://discuss.istio.io/t/serviceentry-for-https-on-httpbin-org-resulting-in-connect-cr-srvr-hello-using-curl/2044
I don't understand what I am missing or what I am doing wrong.
Note: I am referring to this setup: https://istio.io/docs/setup/kubernetes/install/helm/
This is most likely a bug in Istio (see for example istio/istio#14520): if you have any Kubernetes Service object, anywhere in your cluster, that listens on port 443 but whose name starts with http (not https), it will break all outbound HTTPS connections.
The instance of this I've hit involves configuring an AWS load balancer to do TLS termination. The Kubernetes Service needs to expose port 443 to configure the load balancer, but it receives plain unencrypted HTTP.
apiVersion: v1
kind: Service
metadata:
name: breaks-istio
annotations:
service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:...
service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
spec:
selector: ...
ports:
- name: http-ssl # <<<< THIS NAME MATTERS
port: 443
targetPort: http
When I've experimented with this, changing that name: to either https or tcp-https seems to work. Those name prefixes are significant to Istio, but I haven't immediately found any functional difference between telling Istio the port is HTTPS (even though it doesn't actually serve TLS) vs. plain uninterpreted TCP.
You do need to search your cluster and find every Service that listens to port 443, and make sure the port name doesn't start with http-....

HAProxy with Kubernetes in a DR setup

We have Kubernetes setup hosted on premises and are trying to allow clients outside of K8s to connect to services hosted in the K8s cluster.
In order to make this work using HA Proxy (which runs outside K8s), we have the HAProxy backend configuration as follows -
backend vault-backend
...
...
server k8s-worker-1 worker1:32200 check
server k8s-worker-2 worker2:32200 check
server k8s-worker-3 worker3:32200 check
Now, this solution works, but the worker names and the corresponding nodePorts are hard-coded in this config, which obviously is inconvenient as and when more workers are added (or removed/changed).
We came across the HAProxy Ingress Controller (https://www.haproxy.com/blog/haproxy_ingress_controller_for_kubernetes/) which sounds promising, but (we feel) effectively adds another HAProxy layer to the mix..and thus, adds another failure point.
Is there a better solution to implement this requirement?
Now, this solution works, but the worker names and the corresponding nodePorts are hard-coded in this config, which obviously is inconvenient as and when more workers are added (or removed/changed).
You can explicitly configure the NodePort for your Kubernetes Service so it doesn't pick a random port and you always use the same port on your external HAProxy:
apiVersion: v1
kind: Service
metadata:
name: <my-nodeport-service>
labels:
<my-label-key>: <my-label-value>
spec:
selector:
<my-selector-key>: <my-selector-value>
type: NodePort
ports:
- port: <service-port>
nodePort: 32200
We came across the HAProxy Ingress Controller (https://www.haproxy.com/blog/haproxy_ingress_controller_for_kubernetes/) which sounds promising, but (we feel) effectively adds another HAProxy layer to the mix..and thus, adds another failure point.
You could run the HAProxy ingress inside the cluster and remove the HAproxy outside the cluster, but this really depends on what type of service you are running. The Kubernetes Ingress is Layer 7 resource, for example. The DR here would be handled by having multiple replicas of your HAProxy ingress controller.