I'm just confused between the replication controller and livenessProbs in K8S. Could anyone explain this?
ReplicationController and livenessProbe have nothing common so it is really hard to be confused, moreover Kubernetes documentation (check links) has a great explanation for both these objects.
Replication controller is older version of replicasets.
replication controller basically manage the state of replicas running inside kubernetes cluster.
Replication controller use at cluster level.
Liveness probe use at pod level. Liveness probe constantly on a frequent base ping one endpoint and check for the service liveness.if service is not live it will restart the pod.
ReplicationController and livenessProbe have nothing common.
Replication Controller in K8s makes sure that a specified number of pod replicas are running at any one time. Those pods supposed to be always up and available.
If there are too many pods, the ReplicationController terminates the extra pods. If there are too few, the ReplicationController starts more pods. Unlike manually created pods, the pods maintained by a ReplicationController are automatically replaced if they fail, are deleted, or are terminated.
Example Replication Controller config file:
apiVersion: v1
kind: ReplicationController
metadata:
name: nginx
spec:
replicas: 3
selector:
app: nginx
template:
metadata:
name: nginx
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
Workflow of Replication Controllers:
More information you can find here: Replication Controller.
Useful articel: replication controller actions.
Liveness probe in K8s.
A Probe is a diagnostic performed periodically by the kubelet on a Container. To perform a diagnostic, the kubelet calls a Handler implemented by the Container.
The kubelet can optionally perform and react to two kinds of probes on running Containers:
livenessProbe: Indicates whether the Container is running. If the liveness probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
More information you can find here: pod lifecycle.
Useful article: Kubernetes probes.
Related
Below is kubernetes POD definition
apiVersion: v1
kind: Pod
metadata:
name: static-web
labels:
role: myrole
spec:
containers:
- name: web
image: nginx
ports:
- name: web
containerPort: 80
protocol: TCP
as I have not specified the resources, how much Memory & CPU will be allocated? Is there a kubectl to find what is allocated for the POD?
If resources are not specified for the Pod, the Pod will be scheduled to any node and resources are not considered when choosing a node.
The Pod might be "terminated" if it uses more memory than available or get little CPU time as Pods with specified resources will be prioritized. It is a good practice to set resources for your Pods.
See Configure Quality of Service for Pods - your Pod will be classified as "Best Effort":
For a Pod to be given a QoS class of BestEffort, the Containers in the Pod must not have any memory or CPU limits or requests.
In your case, kubernetes will assign QoS called BestEffort to your pod.
That's means kube-scheduler has no idea how to schedule your pod and just do its best.
That's also means your pod can consume any amount of resource(cpu/mem)it want (but the kubelet will evict it if anything goes wrong).
To see the resource cost of your pod, you can use kubectl top pod xxxx
I've currently set up a PVC with the name minio-pvc and created a deployment based on the stable/minio chart with the values
mode: standalone
replicas: 1
persistence:
enabled: true
existingClaim: minio-pvc
What happens if I increase the number of replicas? Do i run the risk of corrupting data if more than one pod tries to write to the PVC at the same time?
Don't use deployment for stateful containers. Instead use StatefulSets.
StatefulSets are specifically designed for running stateful containers like databases. They are used to persist the state of the container.
Note that each pod is going to bind a separate persistent volume via pvc. There is no possibility of multiple instances of pods writing to same pv. Hope I answered your question.
In case you are sticking to Deployments instead of StatefulSets it won't be feasible for multiple replicas to write to the same PVC, since there is no guarantee that the different replicas are scheduled on the same node, and so you might have a pending pod waiting to establish a connection to the volume and fail. The solution is to choose a specific node and have all your replicas run on the same node.
Run the following and assign a label to one of your nodes:
kubectl label nodes <node-name> <label-key>=<label-value>
Say we choose label-key to be labelKey and label-value to be node1. Then you can go ahead and add the following to your YAML file and have the pods scheduled on the same node:
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
labels:
app: my-app
spec:
replicas: 3
template:
spec:
nodeSelector:
labelKey: node1
containers:
...
I have a deployment with a defined number of replicas. I use readiness probe to communicate if my Pod is ready/ not ready to handle new connections – my Pods toggle between ready/ not ready state during their lifetime.
I want Kubernetes to scale the deployment up/ down to ensure that there is always the desired number of pods in a ready state.
Example:
If replicas is 4 and there are 4 Pods in ready state, then Kubernetes should keep the current replica count.
If replicas is 4 and there are 2 ready pods and 2 not ready pods, then Kubernetes should add 2 more pods.
How do I make Kubernetes scale my deployment based on the "ready"/ "not ready" status of my Pods?
I don't think this is possible. If pod is not ready, k8 will not make it ready as It is something which releated to your application.Even if it create new pod, how readiness will be guaranted. So you have to resolve the reasons behind non ready status and then k8. Only thing k8 does it keep them away from taking world load to avoid request failure
Ensuring you always have 4 pods running can be done by specifying the replicas property in your deployment definition:
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 4 #here we define a requirement for 4 replicas
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.7.9
ports:
- containerPort: 80
Kubernetes will ensure that if any pods crash, replacement pods will be created so that a total of 4 are always available.
You cannot schedule deployments on unhealthy nodes in the cluster. The master api will only create pods on nodes which are healthy and meet the quota criteria to create any additional pods on the nodes which are schedulable.
Moreover, what you define is called an auto-heal concept of k8s which in basic terms will be taken care of.
According to Termination of Pods, step 7 occurs simultaneously with 3. Is there any way I can prevent this from happening and have 7 occur only after the Pod's graceful termination (or expiration of the grace period)?
The reason why I need this is that my Pod's termination routine requires my-service-X.my-namespace.svc.cluster.local to resolve to the Pod's IP during the whole process, but the corresponding Endpoint gets removed as soon as I run kubectl delete on the Pod / Deployment.
Note: In case it helps making this clear, I'm running a bunch of clustered VerneMQ (Erlang) nodes which, on termination, dump their contents to other nodes on the cluster — hence the need for the nodenames to resolve correctly during the whole termination process. Only then should the corresponding Endpoints be removed.
Unfortunately kubernetes was designed to remove the Pod from the endpoints at the same time as the prestop hook is started (see link in question to kubernetes docs):
At the same time as the kubelet is starting graceful shutdown, the
control plane removes that shutting-down Pod from Endpoints
This google kubernetes docs says it even more clearly:
Pod is set to the “Terminating” State and removed from the
endpoints list of all Services
There also was also a feature request for that. which was not recognized.
Solution for helm users
But if you are using helm, you can use hooks (e.g. pre-delete,pre-upgrade,pre-rollback). Unfortunately this helm hook is an extra pod which can not access all pod resources.
This is an example for a hook:
apiVersion: batch/v1
kind: Job
metadata:
name: graceful-shutdown-hook
annotations:
"helm.sh/hook": pre-delete,pre-upgrade,pre-rollback
labels:
app.kubernetes.io/name: graceful-shutdown-hook
spec:
template:
spec:
containers:
- name: graceful-shutdown
image: busybox:1.28.2
command: ['sh', '-cx', '/bin/sleep 15']
restartPolicy: Never
backoffLimit: 0
Maybe you should consider using headless service instead of using ClusterIP one. That way your apps will discover using the actual endpoint IPs and the removal from endpoint list will not break the availability during shutdown, but will remove from discovery (or from ie. ingress controller backends in nginx contrib)
We are running a workload against a cluster hosting 2 instances of a small (3 container) pod. Accessing the pod using a service w/nodeport. If we stop a pod and rc starts a new one, our constant (low volume) workload has numerous failures (Rational Perf Tester, http test hitting the service on the master ... but likely same if it were hitting either minion ... master also has a minion). Anyway, if we just add a pod with kubectl scale, we also get errors. If we then take down this a pod (rc doesn't start a new one because we had one more than needed due to scale) ... no errors. Seems that service starts sending work to new pod because kubelet has done his thing, even though containers are not up. Thus, any time a pod is started ... it starts receiving work a little too soon (after kubelet did his work, but before all containers are ready). Is there a way to guarantee that the service will not route to this pod until all containers are up? Barring that is there some way to say wait 'n' seconds before sending to this pod? I may be wrong, but behavior seems to suggest this scenario.
This is precisely what the readinessProbe option is :)
It's documented more here and here, and is part of the container definition in a pod specification.
For example, you might use a pod specification like the one below to ensure that your nginx pod won't be marked as ready (and thus won't have traffic sent to it) until it responds to an HTTP request for /index.html:
apiVersion: v1
kind: ReplicationController
metadata:
name: my-nginx
spec:
replicas: 2
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
lifecycle:
httpGet:
path: /index.html
port: 80
initialDelaySeconds: 10
timeoutSeconds: 5