How does the scheduler talk to the API server? - kubernetes

I would like to know in which parts of the Kubernetes code (https://github.com/kubernetes/kubernetes) the scheduler talks to the API server, and how the API server then sends the scheduling information to the kubelet.

The scheduler registers informers for specific resources (e.g. Pods, PVs, ...) and registers callback functions for events (e.g. add, delete, update, ...); this code is at https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/eventhandlers.go#L319.
The event callbacks put the pod spec into a queue; the scheduler picks pods off that queue and uses a scheduling algorithm to assign each pod to a node. Finally, the scheduler updates the pod information in the API server.
The kubelet watches the API server to find which pods need to be updated on its node, then creates the containers, binds the volumes, and so on.
P.S. The whole lifecycle of how Kubernetes works is complex to understand, so please describe exactly what you want to know.
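To make the "updates the pod information in the API server" step concrete: the scheduler records its decision by POSTing a Binding object to the pod's binding subresource. Conceptually (pod and node names below are placeholders) the request body looks like this:

```yaml
# POSTed by the scheduler to
# /api/v1/namespaces/default/pods/my-pod/binding
apiVersion: v1
kind: Binding
metadata:
  name: my-pod                 # the pod being scheduled (placeholder name)
target:
  apiVersion: v1
  kind: Node
  name: worker-node-1          # the node chosen by the scheduling algorithm (placeholder)
```

The API server then sets spec.nodeName on the pod, and the kubelet on that node picks the pod up through its own watch on the API server.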

Related

Kubernetes. Can I execute a task in a Pod from another Pod?

I'm wondering if there is an option to execute a task from a Pod. Say I have a Pod which is, for example, listening for requests and on each request delegates the task to another Pod (a worker Pod). This worker Pod is not alive when there are no jobs to do, and if there is more than one job to do, more Pods are created. After the job is done the worker Pods are stopped. So a worker Pod lives for the duration of one task, is then killed, and when a new task arrives a new worker Pod is started. I hope I described it properly. Do you know if this is possible in Kubernetes? The worker Pod start may be triggered by, for example, a REST call from the main Pod.
There are a few ways to achieve this behavior.
Pure Kubernetes Way
This solution requires ServiceAccount configuration. Please take a look at the Kubernetes documentation: Configure Service Accounts for Pods.
Your application service/pod can handle different custom tasks. Applying a specific ServiceAccount to your pod in order to perform a specific task is the best practice. In Kubernetes, using a service account with predefined RBAC rules lets you handle this almost out of the box.
The main concept is to configure specific RBAC authorization rules for a specific service account by granting different permissions (get, list, watch, create) on different Kubernetes resources (pods, jobs).
In this scenario the working pod waits for incoming requests; after it receives a specific request it can perform the corresponding task against the Kubernetes API (see the sketch below).
This can be extended, e.g. by using a sidecar container inside your working pod. More details about the sidecar concept can be found in this article.
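As a rough sketch of that setup (all names are placeholders), a ServiceAccount plus a namespaced Role that only allows creating and watching Jobs, bound together, could look like this:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: task-runner              # placeholder name
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: job-creator
  namespace: default
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "watch", "create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: task-runner-job-creator
  namespace: default
subjects:
- kind: ServiceAccount
  name: task-runner
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: job-creator
```

The listening pod would then set serviceAccountName: task-runner in its spec and use the mounted token to create a Job for each incoming task.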
Create custom controller
Another way to achieve your goal is to use a custom controller.
The example presented in the Extending the Kubernetes Controller article shows how a custom controller watches the Kubernetes API in order to instrument the underlying worker pods (it watches the Kubernetes configuration for changes and then deletes the corresponding pods). In your setup, such a controller could watch your API for waiting/unprocessed requests and perform an additional task such as creating a Kubernetes Job inside the cluster.
Use an existing solution like Job Processing Using a Work Queue (a minimal per-task Job is sketched after the list below):
RabbitMQ on Kubernetes
Coarse Parallel Processing Using a Work Queue
Kubernetes Message Queue
Keda
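Whichever route you pick, the per-task worker itself can be as simple as a Job like the following (image and command are placeholders); the listening pod or a controller would create one of these via the API for each incoming task:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: worker-          # the API server appends a random suffix per task
spec:
  backoffLimit: 2
  ttlSecondsAfterFinished: 60    # clean the Job up shortly after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: my-worker:latest        # placeholder image
        command: ["process-task"]      # placeholder command for a single task
```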

With Kubernetes, is there a way to wait for a pod to finish its ongoing tasks before updating it?

I'm managing an application inside Kubernetes.
I have a front end (nginx, Flask) and a backend (Celery).
Long-running tasks are sent to the backend through a middleware (RabbitMQ).
My issue here is that I can receive long-running tasks at any time, and I don't want that to disturb my plan of upgrading the version of my application.
I'm using the command kubectl apply -f $MY_FILE to deploy/update my application. But if I do it while a Celery pod is busy, the pod will be terminated and I'll lose the task.
I tried using the readiness probe, but the pods are still being terminated.
My question is: is there a way for Kubernetes to target only 'free' pods and wait for the busy ones to finish?
Thank you
You can use a preStop hook to complete the ongoing task before the pod is terminated.
Kubernetes sends the preStop event immediately before the Container is terminated. Kubernetes' management of the Container blocks until the preStop handler completes, unless the Pod's grace period expires. For more details, see Termination of Pods.
https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/#define-poststart-and-prestop-handlers
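A minimal sketch of what that could look like, assuming a hypothetical script in the worker image that waits for the current Celery task to finish, and a grace period long enough for your tasks:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: celery-worker                   # placeholder name
spec:
  terminationGracePeriodSeconds: 600    # give long-running tasks time to complete
  containers:
  - name: worker
    image: my-celery-image:latest       # placeholder image
    lifecycle:
      preStop:
        exec:
          # hypothetical script: blocks until the in-flight task has finished
          command: ["/bin/sh", "-c", "/app/wait-for-current-task.sh"]
```

Keep in mind the pod is still killed once terminationGracePeriodSeconds elapses, so set it higher than your longest expected task.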
One way is to create another deployment with the new image and expose it as a service. Pass on any new requests ONLY to this new deployment/service.
Meanwhile, the old deployment/service can still continue processing the existing requests and not take any new ones. Once all the requests are processed the old deployment/service can be deleted.
The only problem with this approach is that roughly double the resources are required for some duration, as the old and new deployment/service run in parallel.
It's something like A/B testing. FYI, Istio makes this easy with traffic management.
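A rough sketch of the switch-over, assuming hypothetical version labels on the two Deployments; flipping the Service selector sends new traffic to the new Deployment while the old pods finish their in-flight work:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                 # placeholder service name
spec:
  selector:
    app: my-app
    version: v2                # was "v1"; change this once the new Deployment is ready
  ports:
  - port: 80
    targetPort: 8080           # placeholder ports
```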

Need to create a pod for each new request from frontend service in Kubernetes

I have a use case in which a front-end application sends a file to a back-end service for processing, and a back-end service pod can process only one request at a time. If multiple requests come in, the service should autoscale and send each request to a new pod.
So I am looking for a way to spawn a new pod for each request; after the back-end service pod finishes processing, it should return the result to the front-end service and destroy itself.
That way each pod only processes a single request at a time.
I explored HPA autoscaling but did not find a suitable approach.
I'm open to using a custom metrics server for that, and can even use Jobs if they are able to fulfill the above scenario.
So if someone has knowledge of, or has tackled, the same use case, please help me so that I can also try that solution.
Thanks in advance.
There's not really anything built-in for this that I can think of. You could create a service account for your app that has permissions to create pods, and then build the spawning behavior into your app code directly. If you can get metrics about which pods are available, you could use HPA with Prometheus to ensure there is always at least one unoccupied backend, but that depends on what kind of metrics your stuff exposes.
As already said, there is no built-in way of doing this; you need to find a custom way to achieve it.
One solution can be the use of a service account and an HTTP request to the API server to create the back-end pod as soon as the request is received by the front-end pod, check the status of the back-end pod, and once it is up, forward the request to the back end (a minimal manifest for such a pod is sketched below).
A second way I can think of is using some temporary storage (a DB or a hostPath volume) and writing a cron job on your master to poll that storage and, depending on the status, spawn a pod running the job container.
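As a sketch of the first approach, the front-end pod (running with a ServiceAccount that is allowed to create pods) could POST something like this hypothetical worker Pod to /api/v1/namespaces/default/pods for each request:

```yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: file-worker-    # one pod per request; the API server adds a random suffix
  labels:
    app: file-worker
spec:
  restartPolicy: Never          # the pod exits after processing its single file
  containers:
  - name: worker
    image: my-backend:latest                # placeholder image
    args: ["--process-one-request"]         # placeholder flag: handle one file, then exit
```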

Specify scheduling order of a Kubernetes DaemonSet

I have Consul running in my cluster and each node runs a consul-agent as a DaemonSet. I also have other DaemonSets that interact with Consul and therefore require a consul-agent to be running in order to communicate with the Consul servers.
My problem is that if my DaemonSet is started before the consul-agent, the application will error because it cannot connect to Consul, and will subsequently be restarted.
I also notice the same problem with other DaemonSets, e.g. Weave, as it requires kube-proxy and kube-dns. If Weave is started first, it will constantly restart until the kube services are ready.
I know I could add retry logic to my application, but I was wondering if it was possible to specify the order in which DaemonSets are scheduled?
Kubernetes itself does not provide a way to specify dependencies between pods / deployments / services (e.g. "start pod A only if service B is available" or "start pod A after pod B").
The current approach (based on what I found while researching this) seems to be retry logic or an init container. To quote the docs:
They run to completion before any app Containers start, whereas app Containers run in parallel, so Init Containers provide an easy way to block or delay the startup of app Containers until some set of preconditions are met.
This means you can either add retry logic to your application (which I would recommend, as it might also help you in other situations such as a short service outage) or you can use an init container that polls a health endpoint via the Kubernetes service name until it gets a satisfying response.
Retry logic is preferred over startup dependency ordering, since it handles both the initial bring-up case and recovery from post-start outages.
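For the Consul case, a rough sketch of the init-container approach, assuming the consul-agent DaemonSet runs on the host network and exposes its HTTP API on port 8500 (adjust the check for your setup; image and names are placeholders):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-consul-client
spec:
  selector:
    matchLabels:
      app: my-consul-client
  template:
    metadata:
      labels:
        app: my-consul-client
    spec:
      hostNetwork: true               # assumption: lets 127.0.0.1:8500 reach the node's consul-agent
      initContainers:
      - name: wait-for-consul
        image: curlimages/curl:8.5.0
        # Block until the local consul-agent answers; the app container starts only after this succeeds.
        command: ["sh", "-c", "until curl -sf http://127.0.0.1:8500/v1/status/leader; do echo waiting for consul-agent; sleep 2; done"]
      containers:
      - name: app
        image: my-app:latest          # placeholder application image
```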

Kubernetes Lifecycle Hooks

I would like to take particular actions when a K8s Pod, or the node it's running on, crashes/restarts/etc. -- basically notify another part of the application that this has happened. I also need this to be guaranteed to execute. Can a Kubernetes PreStop hook accomplish this? From my understanding, these are generally used to gracefully shut down containers when a pod is deleted, and the hook handler is guaranteed to run. It seems like most people use them in scenarios where they are shutting things down themselves.
Will the hooks also run when a node unexpectedly crashes? If not, is there a kubernetes solution for what I'm trying to accomplish?
The preStop hook doesn't work for nodes. A preStop hook is a task that runs during termination of containers, executing a specific command or an HTTP request against a specific endpoint on the Container.
If you are interested in health monitoring of nodes, you may read about node-problem-detector, which is already installed by default on Kubernetes in GCE.