Running pods and containers in Kubernetes

I am fairly new to Kubernetes, and this is what I have been able to understand so far:
a cluster is a collection of node(s)
each node can have a set of running container(s)
a set of tightly coupled container(s) can be grouped together to form a pod (regardless of the node on which each container is running).
First of all, am I correct so far?
Secondly, the docs about kube-scheduler say:
Control Plane component that watches for newly created pods with no assigned node, and selects a node for them to run on.
and docs also says pods are,
The smallest and simplest Kubernetes object. A Pod represents a set of running containers on your cluster.
My question, or rather confusion, is: since we already have containers running on different nodes, why do we need an additional node to run a pod on?

a cluster is a collection of node(s)
each node can have a set of running container(s)
You are correct.
a set of tightly coupled container(s) can be grouped together to form a pod (regardless of the node on which each container is running).
All containers belonging to a pod run on the same node.
My question, or rather confusion, is: since we already have containers
running on different nodes, why do we need an additional node to run a
pod on?
It's not the pod itself that actually runs. The only things that actually run on your nodes are containers. A pod is just a logical grouping of containers and is the basic unit Kubernetes uses to create and manage them. (The Docker logo is a whale, and a group of whales is called a pod, if you want a mnemonic.) So if the containers that belong to a pod are running, the pod is said to be running.
In the following pod specification, the nginx-container and debian-container containers belong to the pod named two-containers. When you create this pod object, kube-scheduler selects a node to run this pod (i.e., to run the two containers) and assigns that node to the pod. The kubelet running on that node is then notified and starts the two containers there. Since the two containers belong to the same pod, they run in the same network namespace.
apiVersion: v1
kind: Pod
metadata:
  name: two-containers
spec:
  restartPolicy: Never
  volumes:
  - name: shared-data
    emptyDir: {}
  containers:
  - name: nginx-container
    image: nginx
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  - name: debian-container
    image: debian
    volumeMounts:
    - name: shared-data
      mountPath: /pod-data
    command: ["/bin/sh"]
    args: ["-c", "echo Hello from the debian container > /pod-data/index.html"]
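A quick way to see this in practice (assuming the spec above is saved as two-containers.yaml): after creating the Pod, the NODE column shows the single node on which both containers were started, and the node name the scheduler assigned is recorded in the Pod spec.
kubectl apply -f two-containers.yaml
kubectl get pod two-containers -o wide                        # NODE column: the one node running both containers
kubectl get pod two-containers -o jsonpath='{.spec.nodeName}' # the node kube-scheduler assigned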

Number 1 & 3 are correct.
For number 2 I would say 'Each node can have a set of pods, and each pod can have one or more containers'.
As for your last question: let's say you create a deployment with 3 pods, and 2 of them are deployed to node A, consuming all of its resources (no memory or CPU left). The 3rd pod will stay in the Pending state as long as there is no node available to run it.
There is a concept of horizontal pod auto-scaling and cluster auto-scaling:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ & https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
These should further clear up your confusion.
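As a rough sketch of the first of those, a HorizontalPodAutoscaler that scales a hypothetical Deployment named my-app between 2 and 5 replicas based on CPU usage would look something like this (names and thresholds are placeholders, not anything from the question):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # placeholder: the Deployment to scale
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # add pods when average CPU utilisation exceeds 80%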

Related

Why is the container restart policy defined in the Pod specification?

Does anyone know why the restartPolicy field is defined at the Pod level instead of the container level?
It would seem that this setting is more closely related to the container, not the Pod.
Then how do you control the restart policy of a single container in a multi-container Pod?
I think the restart policy is part of the Pod spec.
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: 1st
    image: image-1
    command: ["./bash", "-test1"]
  - name: 2nd
    image: image-2
    command: ["./bash", "-test2"]
  restartPolicy: Never
The restart policy is set at the Pod spec level and applies to all containers in the Pod, including init containers.
If there are multiple containers inside the Pod, we have to consider them as tightly coupled.
The official documentation says something like this: link
Pods that run multiple containers that need to work together. A Pod can encapsulate an application composed of multiple co-located
containers that are tightly coupled and need to share resources. These
co-located containers form a single cohesive unit of service—for
example, one container serving data stored in a shared volume to the
public, while a separate sidecar container refreshes or updates those
files. The Pod wraps these containers, storage resources, and an
ephemeral network identity together as a single unit.
Note: Grouping multiple co-located and co-managed containers in a
single Pod is a relatively advanced use case. You should use this
pattern only in specific instances in which your containers are
tightly coupled.
If you want to restart a single container in a Pod, you won't be able to do it; by Pod design, you would have to keep that container out of the Pod.
Even where you see a container restart policy mentioned, it refers to the Pod spec restart policy only.
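One way to see that the policy is Pod-wide while restarts are still tracked per container is to check the per-container restart counts (using the Pod named test from the spec above as the example):
kubectl get pod test -o jsonpath='{.status.containerStatuses[*].restartCount}'   # one count per container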

Airflow on Kubernetes - NFS volume won't mount onto worker

I'm fairly new to Kubernetes so apologies for any mixups in terminology.
I'm using the official Airflow helm chart to create a development environment, and have my DAGs (and other) folders in an NFS volume on my local machine. I have configured the values.yaml like so (same for both the scheduler and worker):
# Mount additional volumes into scheduler.
extraVolumes:
  - name: dags
    nfs:
      server: '10.106.0.113'
      path: '/home/dev/projects/airflow-jobs/dags'
  - name: plugins
    nfs:
      server: '10.106.0.113'
      path: '/home/dev/projects/airflow-jobs/plugins'
  - name: scripts
    nfs:
      server: '10.106.0.113'
      path: '/home/dev/projects/airflow-jobs/scripts'
extraVolumeMounts:
  - mountPath: '/opt/airflow/dags'
    name: 'dags'
  - mountPath: '/opt/airflow/plugins'
    name: 'plugins'
  - mountPath: '/opt/airflow/scripts'
    name: 'scripts'
When I then spin this up, only one of the scheduler or worker pods mounts the volume successfully - the other fails with the following message:
> kubectl describe pod airflow-worker-0
Warning FailedMount 2s kubelet Unable to attach or mount volumes: unmounted volumes=[dags plugins scripts], unattached volumes=[dags plugins scripts logs config kube-api-access-dnsjx]: timed out waiting for the condition
Why am I receiving this error - is it not possible to have two pods using the same NFS store? I had this working before using the same values.yaml file so I don't quite know what has changed!
Figured it out - it was due to the NFS mount being configured as ReadWriteOnce. As per the documentation here, this does allow multiple pods to access the volume, but only if they are located on the same node. So what was happening is that my Scheduler pod would spin up first and mount the volume, and then when the Worker pod followed, it would be unable to do so because the Scheduler had reserved the volume. By coincidence, the first time I deployed, these two pods must have been assigned to the same node.
The simplest solution here would be to mount this as ReadWriteMany, but as I have limited permissions to my cluster and development environment, I simply made some changes to my deployment to ensure that the pods that needed access to this volume were on the same node. Plus, learning experience!
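For reference, if you do have the permissions, an NFS share exposed as ReadWriteMany is usually modelled as a PersistentVolume plus a PersistentVolumeClaim, roughly like this sketch (the names and storage size are placeholders; the server and path are the ones from the question):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-dags
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany            # lets pods on different nodes mount the volume
  nfs:
    server: '10.106.0.113'
    path: '/home/dev/projects/airflow-jobs/dags'
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-dags
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""         # bind to the pre-created PV instead of dynamic provisioning
  volumeName: airflow-dags
  resources:
    requests:
      storage: 1Gi
The node-pinning workaround went as follows: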
First - get the nodes that each pod is assigned to using kubectl get pods -o wide.
Get all the nodes in the cluster kubectl get nodes --show-labels
Pick a node to assign the two pods that need to share the NFS mount to. This was arbitrary, so let's call it "node123".
Update the labels of the node kubectl label nodes node123 airflow=nfs
Finally, in the values.yaml file, specify the nodeSelector property for the Scheduler and Worker sections!
# Select certain nodes for airflow worker pods.
nodeSelector:
  airflow: nfs
Then re-deploy the chart, and everything works as intended!
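For completeness, re-deploying with the updated values is just a normal chart upgrade, something along these lines (assuming the chart repo was added as apache-airflow and the release is named airflow in the airflow namespace - adjust to whatever you used when installing):
helm upgrade airflow apache-airflow/airflow -n airflow -f values.yaml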

How to get pods actually scheduled on master node

I'm trying to get pods scheduled on the master node. I successfully untainted the node:
kubectl taint node mymasternode
node-role.kubernetes.io/master:NoSchedule-
node/mymasternode untainted
But then, after changing replicas to 4 in the deploy.yaml and applying it, all the pods are scheduled on the nodes that were already workers.
Is there an extra step needed to get pods scheduled on the master node as well?
To get pods scheduled on control plane nodes which have a taint applied (which most Kubernetes distributions will do), you need to add a toleration to your manifests, as described in the documentation, rather than untaint the control plane node. Untainting the control plane node can be dangerous: if you run out of resources on that node, your cluster's operation is likely to suffer.
Something like the following should work
tolerations:
- key: node-role.kubernetes.io/master
  effect: NoSchedule
If you're looking to get a pod scheduled to every node, usually the approach is to create a daemonset with that toleration applied.
If you need to have a pod scheduled to a control plane node, without using a daemonset, it's possible to combine a toleration with scheduling information to get it assigned to a specific node. The simplest approach to this is to specify the target node name in the manifest.
This isn't a very flexible approach, so for example if you wanted to assign pods to any control plane node, you could apply a label to those nodes and use a node selector combined with the toleration to get the workloads assigned there.
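A minimal sketch of the nodeName variant mentioned above (the node name is a placeholder; with the label-based approach you would swap nodeName for a nodeSelector, as the next answer shows):
apiVersion: v1
kind: Pod
metadata:
  name: on-control-plane
spec:
  nodeName: mymasternode       # placeholder: your control plane node's name
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  containers:
  - name: nginx
    image: nginx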
By default the master is tainted so that no pods are scheduled on it. By adding a toleration we can allow pods to be scheduled on the master, but that alone doesn't guarantee they land there; to make sure the pod is scheduled on the master only, we also add a nodeSelector.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
    effect: "NoSchedule"
  nodeSelector:
    node-role.kubernetes.io/master: ""
Proof Of Concept :
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8s default-scheduler Successfully assigned default/nginx to controlplane
Normal Pulled 7s kubelet Container image "nginx" already present on machine
Normal Created 7s kubelet Created container nginx
Normal Started 6s kubelet Started container nginx
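You can confirm the placement afterwards by checking the NODE column (the pod name matches the spec above):
kubectl get pod nginx -o wide   # NODE should show the control plane node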

What is the difference between a pod and a deployment?

I have been creating pods with kind: Deployment, but I see that some documentation uses kind: Pod, more specifically the documentation for multi-container pods:
apiVersion: v1
kind: Pod
metadata:
  name: ""
  labels:
    name: ""
  namespace: ""
  annotations: []
  generateName: ""
spec:
  ? "// See 'The spec schema' for details."
  : ~
But to create pods I can just use a deployment type:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ""
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: ""
    spec:
      containers:
        etc
I noticed the pod documentation says:
The create command can be used to create a pod directly, or it can
create a pod or pods through a Deployment. It is highly recommended
that you use a Deployment to create your pods. It watches for failed
pods and will start up new pods as required to maintain the specified
number. If you don’t want a Deployment to monitor your pod (e.g. your
pod is writing non-persistent data which won’t survive a restart, or
your pod is intended to be very short-lived), you can create a pod
directly with the create command.
Note: We recommend using a Deployment to create pods. You should use
the instructions below only if you don’t want to create a Deployment.
But this raises the question of what kind: Pod is good for. Can you somehow reference pods in a deployment? I didn't see a way. It looks like what you get with pods is some extra metadata but none of the deployment options such as replicas or a restart policy. What good is a pod that doesn't persist data or survive a restart? I think I'd be able to create a multi-container pod with a deployment as well.
Radek's answer is very good, but I would like to pitch in from my experience: you will almost never use an object of kind Pod on its own, because that rarely makes sense in practice.
That's because you need a Deployment object - or another Kubernetes API object like a ReplicationController or ReplicaSet - to keep the replicas (pods) alive (that's kind of the point of using Kubernetes).
What you will use in practice for a typical application are:
Deployment object (where you will specify your app's container/containers) that will host your app's container with some other specifications.
Service object (which is like a grouping object that provides a so-called virtual IP (cluster IP) for the pods that have a certain label - those pods are basically the app containers that you deployed with the former Deployment object).
You need to have the service object because the pods from the deployment object can be killed, scaled up and down, and you can't rely on their IP addresses because they will not be persistent.
So you need an object like a service, that gives those pods a stable IP.
Just wanted to give you some context around pods, so you know how things work together.
Hope that clears a few things for you, not long ago I was in your shoes :)
Both Pod and Deployment are full-fledged objects in the Kubernetes API. A Deployment manages creating Pods by means of ReplicaSets. What it boils down to is that a Deployment will create Pods with a spec taken from its template. It is rather unlikely that you will ever need to create Pods directly for a production use-case.
Kubernetes has three Object Types you should know about:
Pods - runs one or more closely related containers
Services - sets up networking in a Kubernetes cluster
Deployment - Maintains a set of identical pods, ensuring that they have the correct config and that the right number of them exist.
Pods:
Runs a single set of containers
Good for one-off dev purposes
Rarely used directly in production
Deployment:
Runs a set of identical pods
Monitors the state of each pod, updating as necessary
Good for dev
Good for production
And I would agree with the other answers: forget about Pods and just use a Deployment. Why? Look at the second bullet point: it monitors the state of each pod, updating as necessary.
Instead of struggling with error messages such as this one:
Forbidden: pod updates may not change fields other than spec.containers[*].image
just refactor or completely recreate your Pod as a Deployment that creates a pod to do what you need done. With a Deployment you can change any piece of configuration you want to, and you need not worry about seeing that error message.
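For instance, a change that would be rejected on a bare Pod is just a rolling update on a Deployment (the Deployment and variable names here are placeholders):
kubectl set env deployment/my-app LOG_LEVEL=debug   # rejected on a running Pod, fine on a Deployment
kubectl rollout status deployment/my-app            # watch the ReplicaSet roll the change out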
A Pod is a container instance.
Pods are what you get as the output of replicas: 3.
Think of it this way: one Deployment can have many running instances (replicas).
# deployment.yaml
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: tomcat-deployment222
spec:
  selector:
    matchLabels:
      app: tomcat
  replicas: 3
  template:
    metadata:
      labels:
        app: tomcat
    spec:
      containers:
      - name: tomcat
        image: tomcat:9.0
        ports:
        - containerPort: 8080
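After applying that manifest you can list the three Pod instances the Deployment created (the pod names get a generated hash suffix):
kubectl apply -f deployment.yaml
kubectl get pods -l app=tomcat   # shows 3 replicas named tomcat-deployment222-<hash>-<id>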
I want to add some information from the Kubernetes in Action book, so you can see the whole picture and how Kubernetes resources like Pod, Deployment and ReplicationController (ReplicaSet) relate to each other.
Pods
are the basic deployable unit in Kubernetes. But in real-world use cases you want your deployments to stay up and running automatically and remain healthy without any manual intervention. For this, the recommended approach is to use a Deployment, which under the hood creates a ReplicaSet.
A ReplicaSet, as the name implies, is a set of replicas (Pods) maintained with their Revision history.
(ReplicaSet extends an older object called ReplicationController -- which is exactly the same but without the Revision history.)
A ReplicaSet constantly monitors the list of running pods and makes sure the running number of pods matching a certain specification always matches the desired number.
Removing a pod from the scope of the ReplicationController comes in handy
when you want to perform actions on a specific pod. For example, you might
have a bug that causes your pod to start behaving badly after a specific amount
of time or a specific event.
A Deployment
is a higher-level resource meant for deploying applications and updating them declaratively.
When you create a Deployment, a ReplicaSet resource is created underneath (eventually more of them). ReplicaSets replicate and manage pods as well. When using a Deployment, the actual pods are created and managed by the Deployment's ReplicaSets, not by the Deployment directly.
Let’s think about what has happened. By changing the pod template in your Deployment resource, you’ve updated your app to a newer version—by changing a single field!
Finally, you can roll back a Deployment either to the previous revision or to any earlier revision; this is easy to do with the Deployment resource.
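A sketch of that rollback workflow with kubectl (the Deployment name is a placeholder):
kubectl rollout history deployment/my-app                # list recorded revisions
kubectl rollout undo deployment/my-app                   # go back to the previous revision
kubectl rollout undo deployment/my-app --to-revision=2   # or jump to a specific revision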
A Pod is a collection of containers and the basic object of Kubernetes. All containers of a pod run on the same node.
Not suitable for production
No rolling updates
Deployment is a kind of controller in Kubernetes.
Controllers use a Pod template that you provide to create the Pods for which they are responsible.
A Deployment creates a ReplicaSet, which in turn makes sure that currentReplicas is always the same as desiredReplicas.
Advantages:
You can roll out and roll back your changes using a Deployment
Monitors the state of each pod
Best suitable for production
Supports rolling updates
In Kubernetes we can deploy our workloads using different types of API objects like Pods, Deployments, ReplicaSets, ReplicationControllers and StatefulSets.
Out of those, Pods are the smallest deployable unit in Kubernetes. Any workload/application that runs in Kubernetes has to run inside a container that is part of a Pod. A Pod can run multiple containers (meaning multiple applications) within it. A Pod is a wrapper on top of one or many running containers. Using a Pod, Kubernetes can control, monitor and operate the containers.
Now, using standalone Pods we can't do a lot of things. We can't change configurations or volumes inside Pods, and a Pod is not recreated if it goes down.
So another API object called Deployment comes into the picture, which maintains the desired state of the application (how many instances, how much compute resource the application uses). The Deployment maintains multiple instances of the same application by running multiple Pods. Deployments, unlike Pods, are mutable. Deployments use another API object called ReplicaSet to maintain the desired state. Through the ReplicaSet, a Deployment spawns another Pod if one goes down.
So a Pod runs applications in containers. Deployments run Pods and maintain the desired state of the application.
Try to avoid Pods and implement Deployments instead for managing containers, as objects of kind Pod will not be rescheduled (or self-healed) in the event of a node failure or pod termination.
A Deployment is generally preferable because it defines a ReplicaSet to ensure that the desired number of Pods is always available and specifies a strategy to replace Pods, such as RollingUpdate.
Maybe this example will be helpful for beginners!
1) Listing PODs
controlplane $ kubectl -n my-namespace get pods
NAME READY STATUS RESTARTS AGE
mysql 1/1 Running 0 92s
webapp-mysql-75dfdf859f-9c54j 1/1 Running 0 92s
2) Deleting the web-app pod, which was created using a deployment
controlplane $ kubectl -n my-namespace delete pod webapp-mysql-75dfdf859f-9c54j
pod "webapp-mysql-75dfdf859f-9c54j" deleted
3) Listing PODs (you can see it is recreated automatically)
controlplane $ kubectl -n my-namespace get pods
NAME READY STATUS RESTARTS AGE
mysql 1/1 Running 0 2m42s
webapp-mysql-75dfdf859f-mqrcx 1/1 Running 0 45s
4) Deleting the mysql POD, which was created directly (without a deployment)
controlplane $ kubectl -n my-namespace delete pod mysql
pod "mysql" deleted
5) Listing PODs (you can see the mysql POD is lost forever)
controlplane $ kubectl -n my-namespace get pods
NAME READY STATUS RESTARTS AGE
webapp-mysql-75dfdf859f-mqrcx 1/1 Running 0 76s
In Kubernetes, Pods are the smallest deployable units. Every time we create a Kubernetes object like a Deployment, ReplicaSet, StatefulSet or DaemonSet, it creates pods.
As mentioned above, deployments create pods based on the desired state declared in your deployment object. So, for example, if you want 5 replicas of an application, you mention replicas: 5 in your deployment manifest. The deployment controller is then responsible for creating 5 identical replicas (no fewer, no more) of the given application, with all metadata like RBAC policies, network policies, labels, annotations, health checks, resource quotas, taints/tolerations and so on, associated with each pod it creates.
There are some cases where you want to create a bare pod, for example if you are running a test sidecar where you don't need the application to run forever, don't need multiple replicas, and only run it when you want to execute something; in that case a pod is suitable. For example helm test, which is a pod definition that specifies a container with a given command to run.
I am also a beginner in k8s so correct me if I am wrong.
We know that a pod is created when we create a deployment. What I observed is that if you look at the YAML file of the deployment, you can see it is kind: Deployment, but if you look at the YAML file of the pod, you see it is kind: Pod.

Will (can) Kubernetes run Docker containers on the master node(s)?

Kubernetes has master and minion nodes.
Will (can) Kubernetes run specified Docker containers on the master node(s)?
I guess another way of saying it is: can a master also be a minion?
Thanks for any assistance.
Update 2015-08-06: As of PR #12349 (available in 1.0.3 and will be available in 1.1 when it ships), the master node is now one of the available nodes in the cluster and you can schedule pods onto it just like any other node in the cluster.
A docker container can only be scheduled onto a kubernetes node running a kubelet (what you refer to as a minion). There is nothing preventing you from creating a cluster where the same machine (physical or virtual) runs both the kubernetes master software and a kubelet, but the current cluster provisioning scripts separate the master onto a distinct machine.
This is going to change significantly when Issue #6087 is implemented.
You need to remove the taint from your master node to run containers on it, although this is not recommended.
Run this on your master node:
kubectl taint nodes --all node-role.kubernetes.io/master-
Courtesy of Alex Ellis' blog post here.
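Note that on newer Kubernetes versions the control plane taint is named node-role.kubernetes.io/control-plane rather than master, so the equivalent command there is:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-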
You can try this code:
kubectl label node [name_of_node] node-short-name=node-1
Create yaml file (first.yaml)
apiVersion: v1
kind: Pod
metadata:
  name: nginxtest
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    node-short-name: node-1
Create a pod
kubectl create -f first.yaml
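To verify that the Pod actually landed on the labelled node, check the NODE column:
kubectl get pod nginxtest -o wide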