Non uniform distribution of requests between internal services in minikube - kubernetes

I have 2 different microservices deployed as backend in minikube, call it deployment A and deployment B. Both these deployments have a different replica of pods running.
Deployment B is exposed as service B. Pods of deployment A call deployment B pods via service B which is of ClusterIP type.
The pods of deployment B have a scrapyd application running inside them with scraping spiders deployed in them. Each celery worker( pods of deployment A) takes a task from a redis queue and calls scrapyd service to schedule spiders on them.
Everything works fine but after I scale the application (deployment A and B seperately), I observe that the resource consumption is not uniform, using kubectl top pods
I observe that some of the pods of deployment B are not used at all. The pattern I observe is that only those pods of deployment B, that are up and running after all the pods of deployment A are up, are never utilized.
Is it normal behavior? I suspect the connection between pods of deployment A and B is persistent I am confused as to why request handling by pods of deployment B is not uniformly distributed? Sorry for the naive question. I am new to this field.
The manifest for deployment A is :
apiVersion: apps/v1
kind: Deployment
metadata:
name: celery-worker
labels:
deployment: celery-worker
spec:
replicas: 1
selector:
matchLabels:
pod: celery-worker
template:
metadata:
labels:
pod: celery-worker
spec:
containers:
- name: celery-worker
image: celery:latest
imagePullPolicy: Never
command: ['celery', '-A', 'mysite', 'worker', '-E', '-l', 'info',]
resources:
limits:
cpu: 500m
requests:
cpu: 200m
terminationGracePeriodSeconds: 200
and that of deployment B is
apiVersion: apps/v1
kind: Deployment
metadata:
name: scrapyd
labels:
app: scrapyd
spec:
replicas: 1
selector:
matchLabels:
pod: scrapyd
template:
metadata:
labels:
pod: scrapyd
spec:
containers:
- name: scrapyd
image: scrapyd:latest
imagePullPolicy: Never
ports:
- containerPort: 6800
resources:
limits:
cpu: 800m
requests:
cpu: 800m
terminationGracePeriodSeconds: 100
---
kind: Service
apiVersion: v1
metadata:
name: scrapyd
spec:
selector:
pod: scrapyd
ports:
- protocol: TCP
port: 6800
targetPort: 6800
Output of kubectl top pods :

So the solution to the above problem that I figured out is as follows :
In the current setup, install linkerd using this link. Linkerd Installation in Kubernetes
After that inject the linkerd proxy into celery deployment as follows :
cat celery/deployment.yaml | linkerd inject - | kubectl apply -f -
This ensures that the requests from celery are first passed on to this proxy and then Load balanced directly to scrapy at L7 Layer. In this case, kube-proxy is by passed and the default load balancing over L4 is no longer functional.

Related

StatefulSet: Longer rolling update lead Version mismatching

Application is deployed on K8s using StatefulSet because of stateful in nature. There is around 250+ pods are running and HPA has been implemented on it too that can scale upto 400 pods.
When new deployment occurs, it takes longer time (~ 10-15m) to update all pods in Rolling Update fashion.
Problem: End user get response from 2 version of pods until all pods are replaced with new revision.
I am googling for an architecture where overall deployment time can be reduced and getting the best possible solutions to use BLUE/GREEN strategy but it has bunch of impact with integrated services like monitoring, logging, telemetry etc because of 2 naming conventions.
Ideally I am looking for a solutions like maxSurge for Deployment in which firstly new pods are created and then traffic are shifted to it at a time but in case of StatefulSet, it won't support maxSurge with RollingUpdate strategy & controller will delete and recreate each Pod in the StatefulSet based on ordinal index from bigger to smaller.
The solution is to do a partitioning rolling update along with a canary deployment.
Let’s suppose we have the statefulset workload defined by the following yaml file:
apiVersion: v1
kind: Service
metadata:
name: nginx
labels:
app: nginx
version: "1.20"
spec:
ports:
- port: 80
name: web
clusterIP: None
selector:
app: nginx
version: "1.20"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
selector:
matchLabels:
app: nginx # Label selector that determines which Pods belong to the StatefulSet
# Must match spec: template: metadata: labels
serviceName: "nginx"
replicas: 3
template:
metadata:
labels:
app: nginx # Pod template's label selector
version: "1.20"
spec:
terminationGracePeriodSeconds: 10
containers:
- name: nginx
image: nginx:1.20
ports:
- containerPort: 80
name: web
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
volumeClaimTemplates:
- metadata:
name: www
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
You could patch the statefulset to create a partition, and change the image and version label for the remaining pods: (In this case, since there are only 3 pods, the last one will be the one that will change its image.)
$ kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'
$ kubectl patch statefulset web --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"nginx:1.21"}]'
$ kubectl patch statefulset web --type='json' -p='[{"op": "replace", "path": "/spec/template/metadata/labels/version", "value":"1.21"}]'
At this point, you have a pod with the new image and version label ready to use, but since the version label is different, the traffic is still going to the other two pods. If you change the version in the yaml file and apply the new configuration, the rollout will be transparent, since there is already a pod ready to migrate the traffic:
apiVersion: v1
kind: Service
metadata:
name: nginx
labels:
app: nginx
version: "1.21"
spec:
ports:
- port: 80
name: web
clusterIP: None
selector:
app: nginx
version: "1.21"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
selector:
matchLabels:
app: nginx # Label selector that determines which Pods belong to the StatefulSet
# Must match spec: template: metadata: labels
serviceName: "nginx"
replicas: 3
template:
metadata:
labels:
app: nginx # Pod template's label selector
version: "1.21"
spec:
terminationGracePeriodSeconds: 10
containers:
- name: nginx
image: nginx:1.21
ports:
- containerPort: 80
name: web
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
volumeClaimTemplates:
- metadata:
name: www
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
$ kubectl apply -f file-name.yaml
Once traffic is migrated to the pod containing the new image and version label, you should patch again the statefulset and remove the partition with the command kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'
Note: You will need to be very careful with the size of the partition, since the remaining pods will handle the whole traffic for some time.

Graceful scaledown of stateful apps in Kubernetes

I have a stateful application deployed in Kubernetes cluster. Now the challenge is how do I scale down the cluster in a graceful way so that each pod while terminating (during scale down) completes it’s pending tasks and then gracefully shuts-down. The scenario is similar to what is explained below but in my case the pods terminating will have few inflight tasks to be processed.
https://medium.com/#marko.luksa/graceful-scaledown-of-stateful-apps-in-kubernetes-2205fc556ba9 1
Do we have an official feature support for this from kubernetes api.
Kubernetes version: v1.11.0
Host OS: linux/amd64
CRI version: Docker 1.13.1
UPDATE :
Possible Solution - While performing a statefulset scale-down the preStop hook for the terminating pod(s) will send a message notification to a queue with the meta-data details of the resp. task(s) to be completed. Afterwards use a K8 Job to complete the tasks. Please do comment if the same is a recommended approach from K8 perspective.
Thanks In Advance!
Regards,
Balu
Your pod will be scaled down only after the in-progress job is completed. You may additionally configure the lifecycle in the deployment manifest with prestop attribute which will gracefully stop your application. This is one of the best practices to follow. Please refer this for detailed explanation and syntax.
Updated Answer
This is the yaml I tried to deploy on my local and tried generating the load to raise the cpu utilization and trigger the hpa.
Deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
namespace: default
name: whoami
labels:
app: whoami
spec:
replicas: 1
selector:
matchLabels:
app: whoami
template:
metadata:
labels:
app: whoami
spec:
containers:
- name: whoami
image: containous/whoami
resources:
requests:
cpu: 30m
limits:
cpu: 40m
ports:
- name: web
containerPort: 80
lifecycle:
preStop:
exec:
command:
- /bin/sh
- echo "Starting Sleep"; date; sleep 600; echo "Pod will be terminated now"
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: whoami
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: whoami
minReplicas: 1
maxReplicas: 3
metrics:
- type: Resource
resource:
name: cpu
targetAverageUtilization: 40
# - type: Resource
# resource:
# name: memory
# targetAverageUtilization: 10
---
apiVersion: v1
kind: Service
metadata:
name: whoami-service
spec:
ports:
- port: 80
targetPort: 80
protocol: TCP
name: http
selector:
app: whoami
Once the pod is deployed, execute the below command which will generate the load.
kubectl run -i --tty load-generator --image=busybox /bin/sh
while true; do wget -q -O- http://whoami-service.default.svc.cluster.local; done
Once the replicas are created, I stopped the load and the pods are terminated after 600 seconds. This scenario worked for me. I believe this would be the similar case for statefulset as well. Hope this helps.

Can we create service to link two PODs from different Deployments >

My application has to deployments with a POD.
Can I create a Service to distribute load across these 2 PODs, part of different deployments ?
If so, How ?
Yes it is possible to achieve. Good explanation how to do it can be found on Kubernete documentation. However, keep in mind that both deployments should provide the same functionality, as the output should have the same format.
A Kubernetes Service is an abstraction which defines a logical set of Pods running somewhere in your cluster, that all provide the same functionality. When created, each Service is assigned a unique IP address (also called clusterIP). This address is tied to the lifespan of the Service, and will not change while the Service is alive. Pods can be configured to talk to the Service, and know that communication to the Service will be automatically load-balanced out to some pod that is a member of the Service.
Based on example from Documentation.
1. nginx Deployment. Keep in mind that Deployment can have more than 1 label.
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx
spec:
selector:
matchLabels:
run: nginx
env: dev
replicas: 2
template:
metadata:
labels:
run: nginx
env: dev
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
2. nginx-second Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-second
spec:
selector:
matchLabels:
run: nginx
env: prod
replicas: 2
template:
metadata:
labels:
run: nginx
env: prod
spec:
containers:
- name: nginx-second
image: nginx
ports:
- containerPort: 80
Now to pair Deployments with Services you have to use Selector based on Deployments labels. Below you can find 2 service YAMLs. nginx-service which pointing to both deployments and nginx-service-1 which points only to nginx-second deployment.
## Both Deployments
apiVersion: v1
kind: Service
metadata:
name: nginx-service
spec:
ports:
- port: 80
protocol: TCP
selector:
run: nginx
---
### To nginx-second deployment
apiVersion: v1
kind: Service
metadata:
name: nginx-service-1
spec:
ports:
- port: 80
protocol: TCP
selector:
env: prod
You can verify that service binds to deployment by checking the endpoints.
$ kubectl get pods -l run=nginx -o yaml | grep podIP
podIP: 10.32.0.9
podIP: 10.32.2.10
podIP: 10.32.0.10
podIP: 10.32.2.11
$ kk get ep nginx-service
NAME ENDPOINTS AGE
nginx-service 10.32.0.10:80,10.32.0.9:80,10.32.2.10:80 + 1 more... 3m33s
$ kk get ep nginx-service-1
NAME ENDPOINTS AGE
nginx-service-1 10.32.0.10:80,10.32.2.11:80 3m36s
Yes, you can do that.
Add a common label key pair to both the deployment pod spec and use that common label as selector in service definition
With the above defined service the requests would be load balanced across all the matching pods.

Kubenetes Pod showing status "Completed" without any jobs

I'm hosting an Angular website that connects to a C#-backend inside a Kubernetes Cluster. When I use a certain function on the website that I can't describe in more detail, the pod shows status "Completed", then goes into "CrashLoopBackOff" and then restarts. The problem is, there are no jobs set up for this Pod (in fact, I didn't even know Jobs are a thing until one hour ago). So my main question would be: How can a Pod go into the "Completed" status without running any jobs?
My .yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-demo
namespace: my-namespace
labels:
app: my-demo
spec:
replicas: 1
template:
metadata:
name: my-demo-pod
labels:
app: my-demo
spec:
nodeSelector:
"beta.kubernetes.io/os": windows
containers:
- name: my-demo-container
image: myregistry.azurecr.io/my.demo:latest
imagePullPolicy: Always
resources:
limits:
cpu: 1
memory: 800M
requests:
cpu: .1
memory: 300M
ports:
- containerPort: 80
imagePullSecrets:
- name: my-registry-secret
selector:
matchLabels:
app: my-demo
---
apiVersion: v1
kind: Service
metadata:
name: my-demo-service
namespace: my-namespace
spec:
ports:
- protocol: TCP
port: 80
name: my-demo-port
selector:
app: my-demo
Completed status indicates that the application called by the cmd or ENTRYPOINT exited with a non-error (i.e. 0) status code. This Completed->CrashLoopBackoff->Running cycle usually indicates that the process called on the container start doesn't daemonize itself and exits, which Kubernetes sees as the process 'completing', hence the status.
Check that your ENTRYPOINT in your Dockerfile or your cmd in your pod template are calling the right process (with the appropriate flags) for the process to be daemonized. You can also check the logs for the previous pod (i.e. using kubectl logs --previous) to see what output the application gave

kubernetes "n" independent pods with identity 1 to n

Requirement 1 (routes):
I need to be able to route to "n" independent Kubernetes deployments. Such as:
http://building-1-building
http://building-2-building
http://building-3-building
... n ...
Each building is receiving traffic that is independent of the others.
Requirement 2 (independence):
If pod for building-2-deployment dies, I would like only building-2-deployment to be restarted and the other n-1 deployments to be not affected.
Requirement 3 (kill then replace):
If pod for building-2-deployment is unhealthy, I would like it to be killed then a new one created. Not a replacement created, then the sick is killed.
When I update the image and issue a "kubectl apply -f building.yaml", I would like each deployment to be shut down then a new one started with the new SW. In other words, not create a second then kill the first.
Requirement 4 (yaml): This application is created and updated with a yaml file so that it's repeatable and archivable.
kubectl create -f building.yaml
kubectl apply -f building.yaml
Partial Solution:
The following yaml creates routes (requirement 1), operates each deployment independently (requirement 2), but fails on to kill before starting a replacement (requirement 3).
This partial solution is a little verbose as each deployment is replicated "n" time where only the "n" is changed.
I would appreciate a suggested to solve all 3 requirements.
apiVersion: v1
kind: Service
metadata:
name: building-1-service # http://building-1-service
spec:
ports:
- port: 80
targetPort: 80
type: NodePort
selector:
app: building-1-pod #matches name of pod being created by deployment
---
apiVersion: v1
kind: Service
metadata:
name: building-2-service # http://building-2-service
spec:
ports:
- port: 80
targetPort: 80
type: NodePort
selector:
app: building-2-pod #matches name of pod being created by deployment
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
name: building-1-deployment # name of the deployment
spec:
replicas: 1
selector:
matchLabels:
app: building-1-pod # matches name of pod being created
template:
metadata:
labels:
app: building-1-pod # name of pod, matches name in deployment and route "location /building_1/" in nginx.conf
spec:
containers:
- name: building-container # name of docker container
image: us.gcr.io//proj-12345/building:2018_03_19_19_45
resources:
limits:
cpu: "1"
requests:
cpu: "10m"
ports:
- containerPort: 80
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
name: building-2-deployment # name of the deployment
spec:
replicas: 1
selector:
matchLabels:
app: building-2-pod # matches name of pod being created
template:
metadata:
labels:
app: building-2-pod # name of pod, matches name in deployment and route "location /building_2/" in nginx.conf
spec:
containers:
- name: building-container # name of docker container
image: us.gcr.io//proj-12345/building:2018_03_19_19_45
resources:
limits:
cpu: "1"
requests:
cpu: "10m"
ports:
- containerPort: 80
You are in the luck. This is exactly what StatefulSets are for.