Exposed Service and Replica Set Relation in Kubernetes

I have a question about how Kubernetes decides which pod serves a request when there are several replicas of the pod.
For instance, let's assume I have a web application running on a k8s cluster as multiple pod replicas, and they are exposed by a service.
When a client sends a request, it goes to the service and kube-proxy. But where and when does Kubernetes decide which pod should serve the request?
I want to know the internals of Kubernetes for this matter. Can we control this? Can we decide which pod should serve based on client requests and custom conditions?

Can we decide which pod should serve based on client requests and custom conditions?
kube-proxy does L4 load balancing, so it does not read the headers of the client request; what you can control is session affinity based on the client IP.
You configure this with the service.spec.sessionAffinity and service.spec.sessionAffinityConfig fields of the Service object.
The following command prints an explanation of the field:
kubectl explain service.spec.sessionAffinityConfig
The following paragraph from the documentation on service proxies gives the detailed answer:
Client-IP based session affinity can be selected by setting service.spec.sessionAffinity to "ClientIP" (the default is "None"), and you can set the max session sticky time by setting the field service.spec.sessionAffinityConfig.clientIP.timeoutSeconds if you have already set service.spec.sessionAffinity to "ClientIP" (the default is "10800").
The Service object would look like this:
kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10000
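To make the effect of sessionAffinity: ClientIP and timeoutSeconds more concrete, here is a toy model in Python. It is only a sketch of the observable behaviour, not kube-proxy's implementation: the class name, the pod names, and the assumption that the sticky timer restarts on every new connection are all illustrative.

```python
import random


class ClientIPAffinity:
    """Toy model of sessionAffinity: ClientIP (not kube-proxy's real code)."""

    def __init__(self, pods, timeout_seconds=10800, rng=None):
        self.pods = list(pods)
        self.timeout = timeout_seconds
        self.rng = rng or random.Random()
        self.table = {}  # client IP -> (pod, time of last connection)

    def route(self, client_ip, now):
        """Pick the pod for a new connection arriving at time `now` (seconds)."""
        entry = self.table.get(client_ip)
        if entry is not None and now - entry[1] <= self.timeout:
            pod = entry[0]                    # still sticky: reuse the same pod
        else:
            pod = self.rng.choice(self.pods)  # new or expired client: pick afresh
        # Assumption: every new connection restarts the sticky timer.
        self.table[client_ip] = (pod, now)
        return pod


lb = ClientIPAffinity(["pod-a", "pod-b", "pod-c"], timeout_seconds=10000)
first = lb.route("10.0.0.7", now=0)
print(lb.route("10.0.0.7", now=5000) == first)  # True: within the timeout
```

Once a client stays silent for longer than the timeout, the model picks a pod again from scratch, so the client may or may not land on the same pod as before.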

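A Kubernetes Service gets a virtual IP and a set of endpoints (one per ready pod), and kube-proxy distributes connections among those pods. By default there is no stickiness: in iptables mode a backend is chosen at random for each new connection, and in IPVS mode the default scheduler is round robin.
You can alter this behaviour.
As Suresh said, you can also use sessionAffinity to ensure that requests from a particular client IP always go to the same pod.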
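As a worked example of how the default distribution comes out even: in iptables mode, kube-proxy is commonly described as writing one rule per endpoint, where each rule matches with a random probability of 1/n, 1/(n-1), ..., 1. The sketch below (function names are illustrative, and this is arithmetic about the rule chain, not kube-proxy code) checks that such a chain gives every endpoint the same overall share.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability for n endpoints: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_probabilities(rule_probs):
    """Chance that each rule is the one that finally matches."""
    reach = Fraction(1)  # probability that evaluation gets as far as this rule
    result = []
    for p in rule_probs:
        result.append(reach * p)
        reach *= 1 - p
    return result


rules = rule_probabilities(3)
print(rules)                           # rule probabilities 1/3, 1/2, 1
print(effective_probabilities(rules))  # each endpoint ends up with 1/3
```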

Related

Avoiding Prometheus call all instances of k8s service (only one, app-wide metrics collection)

I need to expose application-wide metrics for Prometheus collection from a Kubernetes application that is deployed with multiple instances, e.g. scaled by Horizontal Pod Autoscaler.
The scrape point is exposed by every instance of the pod for fail-over purposes; however, I do not want Prometheus to actually call the scrape endpoint on every pod instance, only on one instance at a time, failing over to another instance only if necessary.
The statistics are application-wide, not per pod instance: all instance endpoints report the same data, and calling them in parallel would serve no useful purpose and would only increase the workload on the backend system that has to be queried for statistics. I do not want 30 calls to the backend (assuming the app is scaled up to 30 pods) where just one call would suffice.
I hoped that exposing the scrape endpoint as a k8s service (and annotating the service for scraping) would do the trick. However, instead of going through the service proxy and letting it route the request to one of the pods, Prometheus seems to go directly to the instances behind the service, and to all of them rather than only one at a time.
Is there a way to avoid Prometheus calling all the instances, and have it call only one?
The service is defined as:
apiVersion: v1
kind: Service
metadata:
  name: k8worker-msvc
  labels:
    app: k8worker-msvc
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/path: '/metrics'
    prometheus.io/port: '3110'
spec:
  selector:
    app: k8worker
  type: LoadBalancer
  ports:
  - protocol: TCP
    port: 3110
    targetPort: 3110
In case this is not possible, what are my options other than running leader election inside the app and reporting empty metrics data from non-leader instances?
Thanks for advice.
This implies the metrics are coming from some kind of backend database rather than a usual in-process exporter. Move the metrics endpoint to a new service connected to the same DB and only run one copy of it.

GKE: no load balancing between backend pods with session affinity (sticky sessions)

I have 2 backend applications, A and B, running on the same cluster on GKE. A has 1 pod and B has 2 pods. A is exposed to the outside world and receives an IP address, which it then sends to B in a header of its HTTP requests.
B has a Kubernetes Service object that is configured like this:
apiVersion: v1
kind: Service
metadata:
  name: svc-{{ .Values.component_name }}
  namespace: {{ include "namespace" .}}
spec:
  ports:
  - port: 80
    targetPort: {{.Values.app_port}}
    protocol: TCP
  selector:
    app: pod-{{ .Values.component_name }}
  type: ClusterIP
In that configuration, the HTTP requests from A are balanced equally between the 2 pods of application B. But when I add sessionAffinity: ClientIP to the configuration, every HTTP request is sent to the same B pod, even though I thought it should be a round-robin type of interaction.
To be clear, I have the IP address stored in the X-Forwarded-For header, so the service should look at it to decide which B pod to send the request to, as the documentation says: https://kubernetes.io/docs/concepts/services-networking/service/#ssl-support-on-aws
In my test I tried to create as much load as possible on one of the B pods to try to reach the second pod, without any success. I made sure that I had different IPs in my headers and that it wasn't caused by some sort of proxy in my environment. The IPs had not previously been used for tests, so it is not because of already existing stickiness.
I am stuck now because I don't know how to test it further; I have been reading the docs and am probably missing something. My guess was that sessionAffinity disables load balancing for the ClusterIP type, but this seems highly unlikely...
My questions are:
Is the behaviour I am observing normal? What am I doing wrong?
This might help if it is still unclear what I'm trying to say: https://stackoverflow.com/a/59109265/12298812
EDIT: I tested from the upstream client and saw at least a few of the requests reach the second pod of B, but this load test was performed from the same IP for every request. So this time I should have seen only one pod get the traffic...
The behaviour suggests that the X-Forwarded-For header is not respected by a ClusterIP service. A Service works at L4, so ClientIP affinity is keyed on the source IP of the connection (which here would be the single pod of A), not on any HTTP header.
To be sure, I would suggest load testing from the upstream client service that consumes the above service and seeing what kind of behaviour you get. Chances are you will see the same incorrect behaviour there, which will affect how your service scales.
That said, using session affinity for an internal service is highly unusual, as client IP addresses do not vary much. Session affinity limits the scalability of your application. Typically you would use memcached or Redis as a session store, which is likely to be more scalable than solutions based on session affinity.
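The difference between keying affinity on the connection's source IP and keying it on a header can be sketched as follows. This is an illustrative model only: the request dictionaries, the IP addresses, and the hash-based pick_backend helper are assumptions made for the example (kube-proxy remembers a choice rather than hashing), but the outcome matches what the question describes.

```python
import hashlib


def pick_backend(key, backends):
    """Deterministically map an affinity key to one backend (illustrative hash)."""
    digest = hashlib.sha256(key.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]


def route_l4(request, backends):
    # A Service only sees the connection, so the key is the peer (source) IP.
    return pick_backend(request["source_ip"], backends)


def route_l7(request, backends):
    # An HTTP-aware proxy could instead key on a header such as X-Forwarded-For.
    return pick_backend(request["headers"]["X-Forwarded-For"], backends)


backends = ["b-pod-1", "b-pod-2"]
# Every request comes from the single pod of A, with a different forwarded IP.
requests = [
    {"source_ip": "10.8.0.12", "headers": {"X-Forwarded-For": f"203.0.113.{i}"}}
    for i in range(1, 65)
]

print({route_l4(r, backends) for r in requests})  # a single pod gets everything
print({route_l7(r, backends) for r in requests})  # header-based keys can spread out
```

In this model, spreading the load over B's pods requires either removing the affinity or moving the decision to something that reads HTTP headers.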

Session Affinity Settings for multiple Pods exposed by a single service

I have set up MetalLB as a load balancer, with NGINX Ingress installed on a K8s cluster.
I have read about session affinity and its significance, but so far I do not have a clear picture.
How can I create a single service exposing multiple pods of the same application?
After creating the single service entry point, how do I map a specific client IP to a Pod behind the service?
Is there any blog explaining this concept in terms of how the mapping between client IP and Pod is done in Kubernetes?
But I do not see the client's IP in the YAML. How, then, is this service going to map traffic from each client to its endpoints? This is the question I have.
kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10000
The main concept of session affinity is to always direct traffic from one client to the same pod. Please keep in mind that session affinity is a best-effort method, and there are scenarios where it will fail due to pod restarts or network errors.
There are two main types of session affinity:
1) Based on client IP
This option works well for scenarios where there is only one client per IP. With this method you don't need an Ingress/proxy between the K8s services and the client.
The client IP should be static, because each time the client changes IP it may be redirected to another pod.
To enable session affinity in Kubernetes, we can add the following to the service definition:
service.spec.sessionAffinity: ClientIP
Because the community has already provided a proper manifest for this method, I will not duplicate it.
2) Based on cookies
This works when there are multiple clients behind the same IP, because the cookie is stored at the web browser level. This method requires an Ingress object. Steps to apply this method, with more detailed information, can be found here under the "Session affinity based on Cookie" section:
Create the NGINX controller deployment
Create the NGINX service
Create the Ingress
Redirect your public DNS name to the NGINX service's public/external IP.
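The reason cookie-based affinity copes with several clients behind one IP can be shown with a small model. This is a sketch under stated assumptions, not how the NGINX Ingress controller is implemented: the class, the round-robin choice for new visitors, and storing the pod name directly in the cookie are all simplifications (a real controller would typically store an opaque value).

```python
import itertools


class CookieAffinityProxy:
    """Toy L7 proxy that pins a browser to a pod through a cookie."""

    COOKIE = "INGRESSCOOKIE"  # cookie name chosen for illustration

    def __init__(self, pods):
        self.pods = list(pods)
        self._next = itertools.cycle(self.pods)  # round robin for new visitors

    def handle(self, cookies):
        """Return (pod, cookies_to_set) for one request."""
        pod = cookies.get(self.COOKIE)
        if pod in self.pods:
            return pod, {}          # known visitor: honour the cookie
        pod = next(self._next)      # new visitor, or the pinned pod is gone
        return pod, {self.COOKIE: pod}


proxy = CookieAffinityProxy(["pod-a", "pod-b"])
# Two browsers behind the same NAT address; the proxy never looks at the IP.
pod1, jar1 = proxy.handle({})
pod2, jar2 = proxy.handle({})
print(pod1, pod2)                 # pod-a pod-b
print(proxy.handle(jar1)[0])      # pod-a: the first browser stays where it was
```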
As for how a client IP is mapped to a Pod: according to the documentation, kube-proxy is responsible for session affinity. One of kube-proxy's jobs is writing iptables rules (more details here), and that is how the mapping is done.
Articles which might help with understanding Session Affinity:
https://sookocheff.com/post/kubernetes/building-stateful-services/
https://medium.com/#diegomrtnzg/redirect-your-users-to-the-same-pod-by-using-session-affinity-on-kubernetes-baebf6a1733b
Follow the Service reference for session affinity:
kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10000

Exclusive client affinity

I am aware that client affinity is possible for a LoadBalancer type service in Kubernetes. The thing is that this affinity does not prevent two different clients from accessing the same pod.
Is it possible to always associate a pod exclusively with the same client?
Thanks in advance and have a really nice day!
To allow only specific external clients to access a specific Pod/Deployment, you can use whitelisting via source ranges. For a LoadBalancer Service, restrictions are applied with loadBalancerSourceRanges. You add a section to the Service spec like:
loadBalancerSourceRanges:
- 130.211.204.1/32
- 130.211.204.2/32
However, not all cloud providers currently support it.
Alternatively, you could expose the Pod through an Ingress and apply whitelisting on the Ingress. For whitelisting with an NGINX Ingress, you can add an annotation to the Ingress such as nginx.ingress.kubernetes.io/whitelist-source-range: 49.36.X.X/32
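What a source-range whitelist does can be sketched with Python's standard ipaddress module. This only illustrates the CIDR matching; the function name is made up for the example, and the actual filtering is done by the cloud load balancer or the Ingress controller, not by code like this.

```python
import ipaddress

# The two /32 ranges from the Service fragment above.
ALLOWED_RANGES = ["130.211.204.1/32", "130.211.204.2/32"]


def is_allowed(source_ip, ranges=ALLOWED_RANGES):
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)


print(is_allowed("130.211.204.1"))  # True
print(is_allowed("130.211.204.3"))  # False
```

A /32 range matches exactly one address, so this restricts who may connect, but it does not by itself dedicate a pod to a client.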
No, this would imply that you're running one copy of the service for every client, which is a very non-standard way to do things, so you'll have to build it yourself.
Not exactly to a Pod.
You can use session affinity based on client IP; that of course only works if the client IP is static and there is only one client per IP.
apiVersion: v1
kind: Service
metadata:
  name: wlp-service
  labels:
    app: wlp-service
spec:
  type: LoadBalancer
  sessionAffinity: ClientIP
  ports:
  - port: 443
    targetPort: 7443
    name: https
  - port: 80
    targetPort: 7080
    name: http
  selector:
    app: POD_NAME
The second option is session affinity based on cookies. This will work if there are several clients behind the same IP, as cookies are stored locally on the client's computer.
You will need to use an Ingress object and generate cookies. Your Ingress will need to have the following annotations (for the NGINX Ingress controller, each key carries the nginx.ingress.kubernetes.io/ prefix):
Annotations:
  affinity: cookie
  session-cookie-hash: sha1/md5/index #choose one
  session-cookie-name: INGRESSCOOKIE #name used in cookie value
You can read more about these two approaches in "Redirect your users to the same pod by using session affinity on Kubernetes" on medium.com.
If I'm not mistaken, session affinity will work only if the IPVS kernel modules are installed on the node before running kube-proxy.
Run kube-proxy in IPVS Mode
Currently, local-up scripts, GCE scripts, and kubeadm support switching IPVS proxy mode via exporting environment variables (KUBE_PROXY_MODE=ipvs) or specifying flag (--proxy-mode=ipvs). Before running IPVS proxier, please ensure IPVS required kernel modules are already installed.
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
Finally, for Kubernetes v1.10, feature gate SupportIPVSProxyMode is set to true by default. For Kubernetes v1.11, the feature gate is entirely removed. However, you need to enable --feature-gates=SupportIPVSProxyMode=true explicitly for Kubernetes before v1.10.
Please check the Stack Overflow question "Is it possible to route traffic to a specific Pod?". You can also read more about IPVS in "IPVS-Based In-Cluster Load Balancing Deep Dive".

Is it possible to route traffic to a specific Pod?

Say I am running my app in GKE, and this is a multi-tenant application.
I create multiple Pods that host my application.
Now I want:
Customers 1-1000 to use Pod1
Customers 1001-2000 to use Pod2
etc.
If I have a gcloud global IP that points to my cluster, is it possible to route a request, based on the incoming IP address/domain, to the correct Pod that contains the customer's data?
You can guarantee session affinity with Services, but not in the way you describe. Customers 1-1000 won't all use pod-1; they will be spread across all the pods (a Service does simple load balancing), but each customer who comes back to your service will be directed to the same pod as before.
Note: this holds only within the time specified (default 10800 seconds) in:
service.spec.sessionAffinityConfig.clientIP.timeoutSeconds
This would be the YAML file of the service:
kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  sessionAffinity: ClientIP
If you want to specify the time as well, this is what needs to be added under spec:
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10
Note that the example above works when hitting a ClusterIP-type service directly (which is quite uncommon) or with a LoadBalancer-type service, but not with an Ingress behind a NodePort-type service. This is because, with an Ingress, the requests come from many, randomly chosen source IP addresses.
Not with Pods by themselves, but you should be able to with Services.
Pods are intended to be stateless and indistinguishable from one another.
You should, however, be able to create a Deployment per customer group and a Service per Deployment. The NGINX Ingress can then be told to map incoming requests, by whatever attributes are relevant, to the Service of the specific customer group.
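The mapping from a customer to a per-group Service could be as simple as the sketch below. The group size follows the ranges in the question (1-1000, 1001-2000, and so on), but the function and the Service naming scheme are hypothetical; in practice the routing rule would live in the Ingress configuration or in whatever sits in front of it.

```python
GROUP_SIZE = 1000  # customers per group, as in the question


def service_for_customer(customer_id, group_size=GROUP_SIZE):
    """Return the (hypothetical) Service name for the customer's group."""
    if customer_id < 1:
        raise ValueError("customer IDs start at 1")
    group = (customer_id - 1) // group_size  # 0 for 1-1000, 1 for 1001-2000
    first = group * group_size + 1
    last = first + group_size - 1
    return f"customers-{first}-{last}"


print(service_for_customer(1))     # customers-1-1000
print(service_for_customer(1500))  # customers-1001-2000
```

Each such Service would select the pods of one Deployment, so a customer group maps to a fixed set of pods rather than to a single Pod by name.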