Disable Kubernetes replica set load balancing - kubernetes

I have a Kubernetes cluster of 3 nodes.
A sample deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
I do not have an Ingress, but I do have an external load balancer that round-robins traffic across 80.11.12.10, 80.11.12.11, and 80.11.12.12. So I set up my service like this.
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
  externalIPs:
    - 80.11.12.10
    - 80.11.12.11
    - 80.11.12.12
The problem is that, because of the built-in Kubernetes service load balancing, the traffic gets load-balanced twice. Aside from being unnecessary, it spoils connection persistence. Is there a way to force Kubernetes to forward traffic only to the pod on the local node?

If you set service.spec.externalTrafficPolicy to the value Local, kube-proxy only proxies requests to local endpoints and does not forward traffic to other nodes.
kubectl patch svc servicename -p '{"spec":{"externalTrafficPolicy":"Local"}}'
If there are no local endpoints, packets sent to the node are dropped.
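For reference, a minimal declarative sketch of the Service from the question with the policy set (note that, depending on your Kubernetes version, the API server may only accept externalTrafficPolicy on NodePort or LoadBalancer Services):
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  externalTrafficPolicy: Local
  ports:
    - protocol: TCP
      port: 80
  externalIPs:
    - 80.11.12.10
    - 80.11.12.11
    - 80.11.12.12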
For a ClusterIP-type Service you need to use Service Topology:
Service Topology enables a service to route traffic based upon the
Node topology of the cluster. For example, a service can specify that
traffic be preferentially routed to endpoints that are on the same
Node as the client, or in the same availability zone.
It's an alpha feature, available from Kubernetes 1.17, which needs to be turned on by enabling the ServiceTopology feature gate.
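A sketch of how that might look on the relevant components (per the Kubernetes 1.17 Service Topology docs, both the ServiceTopology and EndpointSlice gates are needed; how the flags are wired up depends on how your cluster is deployed):
# on the control plane
kube-apiserver --feature-gates="ServiceTopology=true,EndpointSlice=true" ...
# on every node
kube-proxy --feature-gates="ServiceTopology=true,EndpointSlice=true" ...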

In recent Kubernetes versions, kube-proxy can use IPVS to manage load balancing. ipvsadm can be used at the node level to choose which load-balancing algorithm IPVS uses. According to https://linux.die.net/man/8/ipvsadm, you can choose between 8 load-balancing algorithms: round robin, weighted round robin, least-connection, weighted least-connection, locality-based least-connection, locality-based least-connection with replication, destination hashing, and source hashing. For additional information, check the documentation for the --scheduler option of ipvsadm.
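If kube-proxy runs in IPVS mode, the scheduler can also be set cluster-wide through the kube-proxy configuration instead of per node with ipvsadm. A minimal sketch of the relevant part of a KubeProxyConfiguration (how it is delivered, e.g. via the kube-proxy ConfigMap on kubeadm clusters, depends on your setup; "sh" source hashing is just one choice that helps with connection persistence):
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  # "sh" = source hashing; other values correspond to the ipvsadm schedulers
  # listed above (rr, wrr, lc, wlc, lblc, lblcr, dh, sh)
  scheduler: "sh"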

Related

Kubernetes Service for a Subset of StatefulSet Pods

I have a StatefulSet with 3 pods. The first is assigned to the master role, the rest have a read replica role.
redis-0 (master)
redis-1 (replica)
redis-2 (replica)
How can I create a Kubernetes Service that matches only the pods redis-1 and redis-2? Basically, I want a service that points only to the pods acting as replicas.
Logically what I want is to select every pod in the STS except the first. In pseudocode:
selector: app=redis-sts && statefulset.kubernetes.io/pod-name!=redis-0
Alternatively, selecting all the relevant pods could be viable. Again in pseudocode:
selector: statefulset.kubernetes.io/pod-name=redis-1 || statefulset.kubernetes.io/pod-name=redis-2
Here is the relevant YAML with the selectors & service defined. Full YAML.
apiVersion: v1
kind: Service
metadata:
  name: redis-service
spec:
  ports:
  - port: 6379
  clusterIP: None
  selector:
    app: redis-sts
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  selector:
    matchLabels:
      app: redis-sts
  serviceName: redis-service
  replicas: 3
  template:
    metadata:
      labels:
        app: redis-sts
    spec:
      # ...
You may use the pod-name labels of your Redis StatefulSet to create a service that targets a particular read-replica pod.
apiVersion: v1
kind: Service
metadata:
  name: redis-1
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    statefulset.kubernetes.io/pod-name: redis-1
  ports:
  - protocol: TCP
    port: 6379
    targetPort: 6379
Then use that per-pod service name (redis-1 here) to reach the specific pod.
externalTrafficPolicy: Local ensures traffic is only handled by the node that runs the instance of your pod.
The Service v1 API (as of 1.21) doesn't support the newer set-based LabelSelector (matchLabels or matchExpressions); Service selectors are plain equality-based label maps.
You could write a controller that applies labels to the StatefulSet's pods so they can be matched by the simple equality logic of Service selectors. There may be Redis operators that do this type of thing already.
An idea from this StatefulSet label question is to use an initContainer, given write access to its own Pod, to add the label.
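A minimal sketch of that idea, to be merged into the StatefulSet pod template. It assumes a ServiceAccount bound to a Role that allows patching pods; the image and the role=replica label are illustrative choices, not from the question:
initContainers:
  - name: self-label
    image: bitnami/kubectl:latest   # any image containing kubectl would do
    env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
    command:
      - sh
      - -c
      - |
        # label every pod except the first ordinal so a Service can select the replicas
        if [ "$POD_NAME" != "redis-0" ]; then
          kubectl label pod "$POD_NAME" role=replica --overwrite
        fi
A replica-only Service could then use the equality selector app: redis-sts together with role: replica.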
I would suggest not depending on the Kubernetes Service for this, because if your master pod gets killed or restarted, the read replicas can change at any time in a Redis cluster.
https://github.com/harsh4870/Redis-Rejson-HA-Helm-Chart
Here is a Helm chart that deploys Redis the same way, with one master and two read replicas, but with Sentinel.
Your Node.js or Python code has to hit the Redis service, and in return Redis Sentinel will give you the IP addresses of the master and the replicas.
Using those IPs you can always connect to a read replica or the master as needed.
If your cluster or pods get restarted, the master and read replicas may change over time.
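A minimal sketch of querying Sentinel for the current topology (the Sentinel service name redis-sentinel, port 26379, and the master name mymaster are assumptions that depend on the chart's values):
# where is the current master?
redis-cli -h redis-sentinel -p 26379 SENTINEL get-master-addr-by-name mymaster
# list the replicas (older Redis releases use "SENTINEL slaves mymaster")
redis-cli -h redis-sentinel -p 26379 SENTINEL replicas mymaster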

Do services send traffic to local pods first?

I have a DaemonSet with a service pointing to it.
When a pod accesses the ClusterIP of my service, will it reach the local pod running on the same node, or any pod backing the service?
Is there any way to achieve this? My understanding is that it would be the same thing as externalTrafficPolicy: Local, but for internal traffic.
By default, traffic sent to a ClusterIP or NodePort Service may be routed to any backend address for the Service. Since Kubernetes 1.7 it has been possible to route "external" traffic to the Pods running on the Node that received the traffic, but this is not supported for ClusterIP Services, and more complex topologies — such as routing zonally — have not been possible. The Service Topology feature resolves this by allowing the Service creator to define a policy for routing traffic based upon the Node labels for the originating and destination Nodes.
You need to use: Service topology
An example service which prefers local pods:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
  topologyKeys:
    - "kubernetes.io/hostname"
    - "*"
UPD 1:
There is another option to make sure that requests sent to a port on a particular node are handled on that same node: hostPort.
An example:
kind: Pod
apiVersion: v1
metadata:
  name: test-api
  labels:
    app: test-api
spec:
  containers:
  - name: testapicontainer
    image: myprivaterepo/testapi:latest
    ports:
    - name: web
      hostPort: 55555
      containerPort: 80
      protocol: TCP
The above pod exposes container port 80 on hostPort 55555. If you run those pods as a DaemonSet, you can be sure that they run on every node and that each request is handled by the node that received it.
But, please be careful using it and read this: Configuration Best Practices
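As a quick sanity check (the node IP is a placeholder), a request to any node's port 55555 should then be answered by the pod running on that node:
curl http://<node-ip>:55555/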

Active/Passive routing a headless service with multiple backends

I'm setting up active/passive routing for an application in Kubernetes, which is outside the typical K8s use case. I'm trying to find configuration related to routing or load balancing in headless services with multiple backends. So far I have managed to route traffic to my backends, but I need to ensure that the traffic is routed correctly. The application requires TCP connections, and the primary/secondary instances have differing configurations (requiring different Deployment objects). If fail-over occurs, routing is expected to return to the primary once it is restored.
The routing consistently behaves as-desired, but no documentation or configuration would indicate as much. I've found documentation stating that it should be round-robin or random because of the order of the dns entries. The crux of the question is: can I depend on this behavior? Since this is undocumented and not explicitly configured I'm concerned that it will change in future versions, or deployments.
I'm using Rancher with the canal networking layer.
I've read through both the calico and flannel docs.
Neither the Endpoints/EndpointSlices nor the DNS entries indicate any ordering for routing.
Currently the setup has two deployments that are selected by a headless service. The deployed pods have a hostname of input-primary in deployment 1 and input-secondary in deployment 2. I can access either of them over DNS as input-primary.myservice or input-secondary.myservice.
The ingress controller's tcp-services ConfigMap has an entry for my service:
25252: default/myservice:9999
and an abridged version of the k8s config:
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  clusterIP: None
  ports:
    - name: input
      port: 9999
  selector:
    app: myapp
  type: ClusterIP
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  labels:
    app: myapp
  name: input-primary
spec:
  hostname: input-primary
  containers:
    - ports:
        - containerPort: 9999
          name: input
          protocol: TCP
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  labels:
    app: myapp
  name: input-secondary
spec:
  hostname: input-secondary
  containers:
    - ports:
        - containerPort: 9999
          name: input
          protocol: TCP

Connect pods via service name in GCP K8's

I have a number of Services running against Pods hosted within a cluster on Google Cloud K8's.
Service 1 is an Ingress - basic-ingress
Service 2 is a NodeJS API Gateway w/ 2 Pods - security-gateway-svc
Service 3 is a NodeJS API w/ 2 Pods - some-random-api-svc
and so on service 4 / 5 / 6 etc....
My Ingress allows me to access exposed services via a subdomain; however, I would like to move my external APIs behind my gateway so I can handle auth etc. in the gateway.
What I'd like to do is allow security-gateway-svc to connect to some-random-api-svc without having to go via DNS or outside of my cluster.
I figured I could update my Ingress so all subdomains use the same service entry and let the gateway figure out where the traffic should go.
I can configure this just fine locally, as everything runs on localhost and I specify a port, so it's fairly straightforward.
Is it possible, however, to expose pods to other pods within a cluster via the service name instead of an actual domain/DNS lookup?
Your service should be accessible within your cluster via the service name.
Point your gateway entry for each api to the service name.
Something like http://some-random-api-svc should work.
The easiest way to make pods reachable within your Kubernetes cluster is to use Services (see the Services documentation). For this you need to create a YAML block that creates an internal hostname, bound by an Endpoints object to your pod. In addition, a selector allows you to bind one or more pods to that internal hostname. Here is an example:
---
apiVersion: v1
kind: Service
metadata:
  name: $YOUR_SERVICE_NAME
  namespace: $YOUR_NAMESPACE
  labels:
    app: $YOUR_SERVICE_NAME
spec:
  ports:
    - name: "8000"
      port: 8000
      targetPort: 8000
  selector:
    app: $YOUR_SERVICE_NAME
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $YOUR_SERVICE_NAME
  namespace: $YOUR_NAMESPACE
  labels:
    app: $YOUR_SERVICE_NAME
spec:
  replicas: 1
  selector:
    matchLabels:
      app: $YOUR_SERVICE_NAME
  template:
    metadata:
      labels:
        app: $YOUR_SERVICE_NAME
    spec:
      containers:
        - name: $YOUR_SERVICE_NAME
          image: alpine:latest
      restartPolicy: Always
Finally, use the service name in your ingress controller route to redirect traffic to your api-gateway.
Kubernetes uses CoreDNS to perform in-cluster DNS resolution. By default, all Services are assigned DNS names in the (FQDN) form of <service-name>.<namespace>.svc.cluster.local. So your security-gateway-svc will be able to forward requests to some-random-api-svc via some-random-api-svc.<namespace>, without routing the traffic outside of Kubernetes. Keep in mind that you shouldn't be interacting with pods directly, because pods are ephemeral; always go through Services.
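For example, a quick check from inside one of the gateway pods (the pod name and namespace are placeholders, and this assumes curl is available in the image):
kubectl exec -it <security-gateway-pod> -- \
  curl http://some-random-api-svc.<namespace>.svc.cluster.local/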

Kubernetes to find Pod IP from another Pod

I have the following pods hello-abc and hello-def.
And I want to send data from hello-abc to hello-def.
How would pod hello-abc know the IP address of hello-def?
And I want to do this programmatically.
What's the easiest way for hello-abc to find where hello-def is?
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: hello-abc-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: hello-abc
    spec:
      containers:
      - name: hello-abc
        image: hello-abc:v0.0.1
        imagePullPolicy: Always
        args: ["/hello-abc"]
        ports:
        - containerPort: 5000
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: hello-def-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: hello-def
    spec:
      containers:
      - name: hello-def
        image: hello-def:v0.0.1
        imagePullPolicy: Always
        args: ["/hello-def"]
        ports:
        - containerPort: 5001
---
apiVersion: v1
kind: Service
metadata:
  name: hello-abc-service
spec:
  ports:
  - port: 80
    targetPort: 5000
    protocol: TCP
  selector:
    app: hello-abc
  type: NodePort
---
apiVersion: v1
kind: Service
metadata:
  name: hello-def-service
spec:
  ports:
  - port: 80
    targetPort: 5001
    protocol: TCP
  selector:
    app: hello-def
  type: NodePort
Preface
Since you have defined a service that routes to each deployment, if you have deployed both services and deployments into the same namespace, you can in many modern kubernetes clusters take advantage of kube-dns and simply refer to the service by name.
Unfortunately, if kube-dns is not configured in your cluster (although that is unlikely), you cannot refer to the service by name.
You can read more about DNS records for services here
In addition, Kubernetes features "Service Discovery", which exposes the ports and IPs of your services as environment variables in any container deployed into the same namespace.
Solution
This means that, to reach hello-def, you can do the following:
curl http://hello-def-service:${HELLO_DEF_SERVICE_PORT}
based on Service Discovery https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables
Caveat: it's very possible that if the Service port changes, only pods created after the change in the same namespace will receive the new environment variables.
External Access
In addition, you can also reach your service externally since you are using the NodePort feature, as long as your NodePort range is accessible from outside.
This would require you to access your service by node-ip:nodePort
You can find out the NodePort which was randomly assigned to your service with kubectl describe svc/hello-def-service
Ingress
To reach your service from outside you should implement an ingress service such as nginx-ingress
https://github.com/helm/charts/tree/master/stable/nginx-ingress
https://github.com/kubernetes/ingress-nginx
Sidecar
If your 2 services are tightly coupled, you can include both in the same pod using the Kubernetes sidecar pattern. In this case, both containers in the pod share the same virtual network adapter and are reachable from each other via localhost:$port.
https://kubernetes.io/docs/concepts/workloads/pods/pod/#uses-of-pods
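A minimal sketch of such a pod, reusing the images from the question (the pod name hello-combined is illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: hello-combined
spec:
  containers:
    - name: hello-abc
      image: hello-abc:v0.0.1
      args: ["/hello-abc"]
      ports:
        - containerPort: 5000
    - name: hello-def
      image: hello-def:v0.0.1
      args: ["/hello-def"]
      ports:
        - containerPort: 5001
With this layout, hello-abc can reach hello-def at http://localhost:5001 and vice versa.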
Service Discovery
When a Pod is run on a Node, the kubelet adds a set of environment
variables for each active Service. It supports both Docker links
compatible variables (see makeLinkVariables) and simpler
{SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT variables, where the
Service name is upper-cased and dashes are converted to underscores.
Read more about service discovery here:
https://kubernetes.io/docs/concepts/services-networking/service/#environment-variables
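Following that naming convention, a hello-abc pod should see variables for hello-def-service. A quick way to check (the pod name is a placeholder):
kubectl exec <hello-abc-pod> -- env | grep HELLO_DEF_SERVICE
# expected, per the convention quoted above:
#   HELLO_DEF_SERVICE_SERVICE_HOST=<cluster IP of hello-def-service>
#   HELLO_DEF_SERVICE_SERVICE_PORT=80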
You should be able to reach hello-def-service from pods in hello-abc via DNS as specified here: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#services
However, kube-dns or CoreDNS has to be installed and configured in your k8s cluster before those DNS records can be used.
Specifically, you should be able to reach hello-def-service via http://hello-def-service when it runs in the same namespace as hello-abc-service.
And you should be able to reach a hello-def-service running in another namespace, other_namespace, via the DNS record hello-def-service.other_namespace.svc.cluster.local.
If, for some reason, you do not have a DNS add-on installed in your cluster, you can still find the virtual IP of hello-def-service via environment variables in the hello-abc pods, as documented here.