How a ReplicaSet includes pods with specific labels - Kubernetes

If I give a specific label to a pod and define a ReplicaSet whose selector matches that label, the ReplicaSet includes that pod. That is all fine and good.
(I know pods are not meant to be created separately, but through Deployments or ReplicaSets. Still, how do Deployments/ReplicaSets include pods whose labels match the definition, if those pods already exist for some reason?)
BUT how does this work behind the scenes? How does a ReplicaSet know that a pod is to be included because it has the same label? Say I already have a pod with those labels; how does a newly created ReplicaSet know that this pod should be included if it has fewer pods than the desired number?
Does it get that information from etcd? Do pods expose their labels somehow? How does this really work behind the scenes?

As stated in the Kubernetes documentation on ReplicaSets:
A ReplicaSet is defined with fields, including a selector that specifies how to identify Pods it can acquire, a number of replicas indicating how many Pods it should be maintaining, and a pod template specifying the data of new Pods it should create to meet the number of replicas criteria. A ReplicaSet then fulfills its purpose by creating and deleting Pods as needed to reach the desired number. When a ReplicaSet needs to create new Pods, it uses its Pod template.
It's recommended to use Deployments instead of ReplicaSets.
Deployment is an object which can own ReplicaSets and update them and their Pods via declarative, server-side rolling updates. While ReplicaSets can be used independently, today they’re mainly used by Deployments as a mechanism to orchestrate Pod creation, deletion and updates. When you use Deployments you don’t have to worry about managing the ReplicaSets that they create. Deployments own and manage their ReplicaSets. As such, it is recommended to use Deployments when you want ReplicaSets.
As you mentioned, if you have a Pod whose labels match the ReplicaSet's selector, the ReplicaSet will take control of that Pod. If you deploy a ReplicaSet with 3 replicas and such a Pod was deployed before that, the ReplicaSet will spawn only 2 additional Pods with the matching labels. This is explained in detail, with examples, under Non-Template Pod acquisitions.
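To make the acquisition concrete, here is a minimal sketch, assuming made-up names (existing-pod, frontend) and a made-up label (tier: frontend):

    # existing-pod.yaml - a bare Pod that already carries the label
    apiVersion: v1
    kind: Pod
    metadata:
      name: existing-pod
      labels:
        tier: frontend        # the label the ReplicaSet selector will match
    spec:
      containers:
      - name: app
        image: nginx
    ---
    # frontend-rs.yaml - a ReplicaSet that acquires any Pod matching its selector
    apiVersion: apps/v1
    kind: ReplicaSet
    metadata:
      name: frontend
    spec:
      replicas: 3
      selector:
        matchLabels:
          tier: frontend      # must also match the template labels below
      template:
        metadata:
          labels:
            tier: frontend
        spec:
          containers:
          - name: app
            image: nginx

If existing-pod is already running when the ReplicaSet is created, the controller counts it toward the desired 3 replicas and creates only 2 new Pods from the template. You can confirm the acquisition by checking metadata.ownerReferences on the pre-existing Pod, e.g. with kubectl get pod existing-pod -o yaml.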
As to how it works behind the scenes, you can have a look at slides #47-56 of Kubernetes Architecture - beyond a black box - Part 1

Related

In Kubernetes, what is the real purpose of replicasets?

I am aware of the hierarchical order of k8s resources. In brief:
service: a service is what exposes the application to the outer world (or within the cluster). (The service types like ClusterIP, NodePort, Ingress are not so relevant to this question.)
deployment: a deployment is responsible for keeping a set of pods running.
replicaset: a replicaset is what a deployment in turn relies on to keep the set of pods running.
pod: a pod consists of a container or a group of containers.
container: the actual application runs inside the container.
The thing I want to emphasize in this question is: why do we have replicasets? Why doesn't the deployment directly take responsibility for keeping the required number of pods running, instead of relying on a replicaset for this?
If k8s is designed this way, there must be some benefit to having replicasets, and that is what I want to explore/understand in depth.
Both essentially serve the same purpose. Deployments are a higher abstraction and, as the name suggests, deal with creating, maintaining and upgrading the deployment (the collection of pods) as a whole.
Whereas the primary responsibility of ReplicationControllers or ReplicaSets is to maintain a set of identical replicas (which you can achieve declaratively using deployments too, but internally they create a replicaset to enable this).
More specifically, when you perform a "rolling" update of your deployment, such as updating the image version, the deployment internally creates a new replicaset and performs the rollout. During the rollout you can see two replicasets for the same deployment.
In other words, a Deployment needs the lower-level "encapsulation" of ReplicaSets to achieve this.
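One quick way to observe the two ReplicaSets, assuming a Deployment named nginx whose container is also named nginx (the names and image tag are only an example):

    # trigger a rolling update by changing the container image
    kubectl set image deployment/nginx nginx=nginx:1.25

    # while the rollout is in progress, the old and the new ReplicaSet coexist
    kubectl get replicasets
    kubectl rollout status deployment/nginx

Once the rollout finishes, the old ReplicaSet is scaled down to 0 but kept around, which is what makes kubectl rollout undo possible.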

Can a pod run on multiple nodes?

I have one Kubernetes master and three Kubernetes nodes. I created one pod, which is running on a specific node. I want to run that pod on 2 nodes. How can I achieve this? Does the replica concept help me? If yes, how?
Yes, you can assign pods to one or more nodes of your cluster, and here are some options to achieve this:
nodeSelector
nodeSelector is the simplest recommended form of node selection constraint. nodeSelector is a field of PodSpec. It specifies a map of key-value pairs. For the pod to be eligible to run on a node, the node must have each of the indicated key-value pairs as labels (it can have additional labels as well). The most common usage is one key-value pair.
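As an illustrative sketch (the disktype: ssd label is an assumption; you would first label a node with kubectl label nodes <node-name> disktype=ssd), a Pod constrained with nodeSelector looks like this:

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
      nodeSelector:
        disktype: ssd   # the Pod is only scheduled onto nodes carrying this label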
affinity and anti-affinity
Node affinity is conceptually similar to nodeSelector -- it allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels on the node.
nodeSelector provides a very simple way to constrain pods to nodes with particular labels. The affinity/anti-affinity feature greatly expands the types of constraints you can express (a short example follows this list). The key enhancements are:
The affinity/anti-affinity language is more expressive. The language offers more matching rules besides exact matches created with a logical AND operation;
you can indicate that the rule is "soft"/"preference" rather than a hard requirement, so if the scheduler can't satisfy it, the pod will still be scheduled;
you can constrain against labels on other pods running on the node (or other topological domain), rather than against labels on the node itself, which allows rules about which pods can and cannot be co-located
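A minimal sketch of the "soft" variant, preferredDuringSchedulingIgnoredDuringExecution (the disktype: ssd label is again an assumption):

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx-affinity
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1                 # higher weight = stronger preference
            preference:
              matchExpressions:
              - key: disktype
                operator: In
                values:
                - ssd
      containers:
      - name: nginx
        image: nginx

Because the rule is only a preference, the Pod is still scheduled even if no node carries the disktype=ssd label.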
DaemonSet
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
Some typical uses of a DaemonSet are (a bare-bones manifest sketch follows this list):
running a cluster storage daemon on every node
running a logs collection daemon on every node
running a node monitoring daemon on every node
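For completeness, a bare-bones DaemonSet sketch; the name log-collector and the fluentd image are illustrative assumptions, not a recommendation:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: log-collector
    spec:
      selector:
        matchLabels:
          name: log-collector
      template:
        metadata:
          labels:
            name: log-collector
        spec:
          containers:
          - name: fluentd
            image: fluentd:latest   # hypothetical logging-agent image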
Please check this link to read more about how to assign pods to nodes.
It's not good practice to run pods directly on the nodes, as the nodes/pods can crash at any time. It's better to use the K8s controllers, as mentioned in the K8s documentation here.
K8s supports multiple controllers, and the appropriate one can be used depending on the requirement. Looking at the OP, it's difficult to say which controller to use.
You can use a DaemonSet if you want to run a pod on each node.
What I see is that you are trying to deploy a pod on each node; it's better to let the scheduler decide where the pods should be deployed, based on available resources. That would serve you best in worst-case scenarios, i.e. in case of node failures.

Kubernetes node affinity - Assigning a pod to a specific node?

The k8s node affinity documentation explains how to deploy a pod to a specific node by first tagging the node with a label and using nodeSelector to pick the node.
However, I have a use case where I have 40-50 deployments in a cluster, and I want to add a new node to the cluster and dedicate it to one of those deployments/pods without altering all the other deployments, which don't have any nodeSelector specified.
For example, let's say I have 3 deployments with no nodeSelector defined and 3 worker nodes. This means k8s decides where the pods will be deployed, and they could land on any of those 3 nodes. Now I have to create a 4th deployment and add a 4th server, and I want to dedicate the 4th deployment to the 4th server while making sure that k8s won't schedule the first 3 deployments onto this 4th node. How can I do this without going through all 3 deployments and applying a nodeSelector filter to keep them off the 4th node? (It would be OK to perform this change on 3 deployments, but in the real-life scenario I am talking about 50-odd deployments.)
The only thing I can think of is to taint the node, but if I do that, none of the pods will be scheduled there.
Is there a better approach here to achieve this goal that I am not aware of?
If you taint the 4th node, then the 4th deployment, given a matching toleration, can be scheduled on that node, while the other three won't be.
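A minimal sketch of that approach (the node name node-4 and the dedicated=team-a key/value are hypothetical):

    # taint the dedicated node so that pods without a matching toleration are repelled
    kubectl taint nodes node-4 dedicated=team-a:NoSchedule

Then, in the 4th deployment's pod template:

    spec:
      template:
        spec:
          tolerations:
          - key: "dedicated"
            operator: "Equal"
            value: "team-a"
            effect: "NoSchedule"

Note that the toleration only allows those pods onto the tainted node; if they must also run exclusively there, combine it with a nodeSelector or node affinity on a label of that node.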
You could use an admission controller to dynamically add a nodeSelector to a pod spec at admission time rather than modifying existing deployment YAMLs. You could write any custom logic in the mutating webhook to cater to your use case. For example, you could put a label/annotation in the new deployment spec and, based on the existence of that label/annotation, dynamically add a nodeSelector.
The PodNodeSelector admission plugin is a good example to refer to.
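For illustration, the PodNodeSelector admission plugin (it has to be enabled on the API server) reads a namespace annotation and merges that selector into every pod created in the namespace; the namespace name and label below are assumptions:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: team-a
      annotations:
        scheduler.alpha.kubernetes.io/node-selector: "dedicated=team-a"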

What happens when we scale a Kubernetes deployment and change one of the pod or container configurations?

I scale the application by creating a deployment. Let's say I am running an nginx service on a 3-node cluster.
Nginx is running in containers in multiple pods.
If I change the nginx configuration in one of the pods, does it propagate to all the nodes and pods, because it is running in a cluster and scaled?
does it propagate to all the nodes and pods because it is running in cluster and scaled.
No. Changes only propagate when you change the deployment YAML; then it re-creates the pods one by one with the new configuration.
I would like to add a few things to what was already said. First of all, you are not even supposed to make changes to Pods that are managed by, say, a ReplicaSet, ReplicationController or Deployment. These are objects that provide an additional abstraction layer, and it is their responsibility to ensure that a given number of Pods of a certain kind are running in your Kubernetes cluster.
It doesn't matter how many nodes your cluster consists of, as the mentioned controllers span all nodes in the cluster.
Changes made in a single Pod not only will not propagate to other Pods, but may easily be lost if that Pod with the changed configuration crashes.
Remember that one of the tasks of the Deployment is to make sure that a certain number of Pods of a given type (specified in the Pod template section of the Deployment) are always up and running. When your manually reconfigured Pod goes down, your Deployment (actually the ReplicaSet created by the Deployment) acts behind the scenes and recreates such a Pod. But how does it recreate it? Does it take into consideration the changes you introduced to that Pod? Of course not; it will recreate it based on the template given in the Deployment.
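You can see this for yourself with a quick experiment; <pod-name> stands for any pod owned by such a Deployment:

    # delete one of the Deployment-managed pods
    kubectl delete pod <pod-name>

    # watch the ReplicaSet immediately create a replacement from the Pod template,
    # without any of the manual changes made inside the old pod
    kubectl get pods -w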
If you want to make changes to your Pods one by one, Kubernetes allows you to do so through the so-called rolling update mechanism.
Here you can read about the old-fashioned approach using a ReplicationController, which is no longer used as it has been replaced by Deployments and ReplicaSets, but I think it's still worth reading just to grasp the concept.
Currently, Deployment is the way to go. You can read about updating a Deployment here. Note that the default update strategy is RollingUpdate, which ensures that changes are not applied to all Pods at once but one by one.
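A sketch of what that strategy section looks like in a Deployment manifest (the name, image and numbers are illustrative, not a recommendation):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 1   # at most one Pod may be unavailable during the update
          maxSurge: 1         # at most one extra Pod may be created above the desired count
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.25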

How to permanently delete pods inside the "default" namespace, when deleted pods keep coming back because of a "replication controller"

"How to permanently delete pods inside "default" namespace? As when I delete pods, they are coming back because of "replication controller".
As this is in a Default namespace, I am sure that we can delete it permanently. Any idea how to do it ?
I'd like to add an update to what was already said in the previous answer.
Basically, in Kubernetes you have several abstraction layers. As you can read in the documentation:
A Pod is the basic execution unit of a Kubernetes application–the smallest and simplest unit in the Kubernetes object model that you create or deploy. A Pod represents processes running on your Cluster.
It is rarely deployed as a separate entity. In most cases it is part of a higher-level object such as a Deployment or ReplicationController. I would advise you to familiarize yourself with the general concept of controllers, especially Deployments, as they are currently the recommended way of setting up replication [source]:
Note: A Deployment that configures a ReplicaSet is now the recommended way to set up replication.
As you can read further:
A ReplicationController ensures that a specified number of pod replicas are running at any one time. In other words, a ReplicationController makes sure that a pod or a homogeneous set of pods is always up and available.
This also applies to the situation when certain pods are deleted by the user. The ReplicationController doesn't care why the pods were deleted; its role is simply to make sure they are always up and running. It's a very simple concept. When you don't want certain pods to exist any more, you must get rid of the higher-level object that manages them and ensures they are always available.
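A sketch of how to find and then remove the owning object (the resource names are placeholders):

    # find out which object owns the pod
    kubectl get pod <pod-name> -n default -o jsonpath='{.metadata.ownerReferences[0].kind}{"/"}{.metadata.ownerReferences[0].name}{"\n"}'

    # delete the owner; its pods are removed with it
    kubectl delete replicationcontroller <name> -n default
    # or, if a Deployment owns a ReplicaSet that owns the pods:
    kubectl delete deployment <name> -n default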
Read about Replication Controllers, then delete the ReplicationController.
It can't "ensure that a specified number of pod replicas are running" when it's dead.
kubectl delete replicationcontroller <name>