How to make deployment restart if another pod restarts - kubernetes

I have a web Deployment and a MongoDB StatefulSet. The web deployment connects to MongoDB, but once in a while an error occurs in MongoDB and it reboots and comes back up. The connection from the web deployment to MongoDB never gets re-established. Is there a way to restart the web pod whenever the MongoDB pod restarts?

Yes, you can use a liveness probe on your application container that probes your Mongo Pod/StatefulSet. You can configure it so that the probe fails when a TCP connection to Mongo can no longer be established after Mongo crashes (checking every few seconds, for example).
Keep in mind that with this approach you will always have to start your Mongo Pod/StatefulSet first.
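A minimal sketch of what that could look like on the web container, assuming the StatefulSet is reachable through a Service named mongo on port 27017 (names and image are placeholders):

containers:
- name: web
  image: my-web-app:latest          # placeholder image
  livenessProbe:
    tcpSocket:
      host: mongo                   # DNS name of the Mongo Service (assumed)
      port: 27017
    periodSeconds: 5                # probe every few seconds
    failureThreshold: 3             # restart the web container after 3 consecutive failures

If the host field is left out, the probe checks the web container's own pod IP, so it has to point at Mongo explicitly for this to work.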
The sidecar approach described in the other answer should work too; it just takes a bit more configuration.

Unfortunately, there's no easy way to do this within Kubernetes directly, as Kubernetes has no concept of dependencies between resources.
The best place to handle this is within the web server pod itself.
The ideal solution is to update the application to retry the connection on a failure.
A less ideal solution would be a sidecar container that polls the database and fails when the database goes down, which should cause Kubernetes to restart the pod.
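A minimal sketch of such a sidecar, assuming MongoDB is reachable through a Service named mongo on port 27017 (names and images are placeholders):

containers:
- name: web
  image: my-web-app:latest            # placeholder image
- name: mongo-watchdog                # hypothetical sidecar
  image: busybox:1.36
  command:
  - sh
  - -c
  - |
    # Poll the assumed "mongo" Service every 5 seconds and exit with an
    # error as soon as a TCP connection can no longer be opened.
    while nc -z -w 2 mongo 27017; do
      sleep 5
    done
    exit 1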

Related

k8s container initialization and load balancing

I have a deployment with one pod running my custom image. After executing kubectl create -f deployment.yaml, this pod starts running. I see that everything is fine and it has the "running" state in kubectl's output. But I have an initialization script that starts Apache Tomcat, and it takes around 40-45 seconds to execute and bring the server up inside the container.
I also have a load balancer deployment with nginx. Nginx redirects incoming requests to Apache Tomcat via proxy_pass. When I scale my deployment to 2 replicas and shut one of them down, the application sometimes gets stuck and freezes.
I feel that load balancing by k8s is not working correctly; k8s is trying to use a pod that is still being initialized by the script.
How can I tell k8s that a pod in the deployment hasn't finished initializing, so it isn't used until it is fully up?
If I understand correctly, your problem is mostly that the application is not ready to accept requests because your initialization script hasn't finished.
For that situation, you can easily set up different types of probes, such as liveness and readiness probes. With a readiness probe, your pod would not be considered ready to accept requests until the application inside it has started up and signals that it is healthy.
Here you can read more about it: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
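A minimal sketch of a readiness probe for the Tomcat container, assuming Tomcat listens on port 8080 and that / only answers once the server is up (path, port, and image are assumptions):

containers:
- name: tomcat
  image: my-tomcat-app:latest       # placeholder image
  readinessProbe:
    httpGet:
      path: /                       # any endpoint that responds only when Tomcat is ready (assumed)
      port: 8080                    # Tomcat's HTTP port (assumed)
    initialDelaySeconds: 30         # the init script takes ~40-45 s
    periodSeconds: 5
    failureThreshold: 6

Until the probe succeeds, the pod is excluded from the Service endpoints, so nginx never proxies requests to it.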

Kubernetes - waiting on Oracle DB container

Apologies for a basic question. I have a simple Kubernetes deployment where I have 3 containers (each in their own pod) deployed to a Kubernetes cluster.
The RESTapi container is dependent upon the OracleDB container starting. However, the OracleDB container takes a while to start up, and by that time the RESTapi container has restarted a number of times because it could not connect, and ends up in a Backoff state.
Is there a more elegant solution for this?
I’ve also noticed that when the RESTapi container goes into the Backoff state it stops retrying?
This is a community wiki answer posted for better visibility. Feel free to expand it.
The best approach in this case is to improve your "RESTapi" application so that it provides a more reliable and fault-tolerant service that can reconnect to the database on its own.
From the Kubernetes production best practices:
"When the app starts, it shouldn't crash because a dependency such as a database isn't ready. Instead, the app should keep retrying to connect to the database until it succeeds. Kubernetes expects that application components can be started in any order."
Alternatively, you can use Init Containers.
You can look at this question on Stack Overflow, which is just one of many about the practical use of Init Containers for the case described.
An elegant way to achieve this is to pair Kubernetes Init Containers with the k8s-wait-for script.
Essentially, what you would do is configure an Init Container for your RESTapi which uses k8s-wait-for. You configure k8s-wait-for to wait for a specific pod to be in a given state, in this case, you can provide the OracleDB pod and wait for it to be in a Ready state.
The resulting effect will be that the deployment of the RESTapi will be paused until the OracleDB is ready to go. That should alleviate the constant restarts.
https://github.com/groundnuty/k8s-wait-for
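A minimal sketch of what that could look like in the RESTapi pod template, assuming the OracleDB pods carry the label app=oracledb; the image tag, label, and names are assumptions, so check the k8s-wait-for README for the exact invocation:

spec:
  initContainers:
  - name: wait-for-oracledb                 # hypothetical name
    image: groundnuty/k8s-wait-for:v2.0     # pin to whatever tag the project publishes
    args:
    - "pod"
    - "-lapp=oracledb"                      # block until pods matching this label are Ready (label assumed)
  containers:
  - name: restapi
    image: my-restapi:latest                # placeholder image

Note that the init container typically needs RBAC permission to read pod status in its namespace.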

Specify scheduling order of a Kubernetes DaemonSet

I have Consul running in my cluster and each node runs a consul-agent as a DaemonSet. I also have other DaemonSets that interact with Consul and therefore require a consul-agent to be running in order to communicate with the Consul servers.
My problem is, if my DaemonSet is started before the consul-agent, the application will error as it cannot connect to Consul and subsequently get restarted.
I also notice the same problem with other DaemonSets, e.g. Weave, as it requires kube-proxy and kube-dns. If Weave is started first, it will constantly restart until the kube services are ready.
I know I could add retry logic to my application, but I was wondering if it was possible to specify the order in which DaemonSets are scheduled?
Kubernetes itself does not provide a way to specify dependencies between pods / deployments / services (e.g. "start pod A only if service B is available" or "start pod A after pod B").
The correct approach (based on what I found while researching this) seems to be retry logic or an init container. To quote the docs:
They run to completion before any app Containers start, whereas app Containers run in parallel, so Init Containers provide an easy way to block or delay the startup of app Containers until some set of preconditions are met.
This means you can either add retry logic to your application (which I would recommend, as it might also help you in other situations such as a short service outage) or you can use an init container that polls a health endpoint via the Kubernetes service name until it gets a satisfying response.
Retry logic is preferred over startup dependency ordering, since it handles both the initial bring-up case and recovery from post-start outages.
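A minimal sketch of the init-container variant, assuming the Consul agent's HTTP API is reachable through a Service named consul-agent on port 8500 (names, port, and endpoint are assumptions):

initContainers:
- name: wait-for-consul
  image: busybox:1.36
  command:
  - sh
  - -c
  - |
    # Keep polling the assumed Consul health endpoint until it responds.
    until wget -q -O /dev/null http://consul-agent:8500/v1/status/leader; do
      echo "waiting for consul-agent..."
      sleep 2
    done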

First request to a new ReplicaSet times out

I have a Kubernetes cluster on AWS, set up with kops.
I set up a Deployment that runs an Apache container and a Service for the Deployment (type: LoadBalancer).
When I update the deployment by running kubectl set image ..., as soon as the first pod of the new ReplicaSet becomes ready, the first couple of requests to the service time out.
Things I have tried:
I set up a readinessProbe on the pod, works.
I ran curl localhost on a pod, works.
I performed a DNS lookup for the service, works.
If I curl the IP returned by that DNS lookup inside a pod, the first request will time out. This tells me it's not an ELB issue.
It's really frustrating since otherwise our Kubernetes stack is working great, but every time we deploy our application we run the risk of a user timing out on a request.
After a lot of debugging, I think I've solved this issue.
TL;DR: Apache has to exit gracefully.
I found a couple of related issues:
https://github.com/kubernetes/kubernetes/issues/47725
https://github.com/kubernetes/ingress-nginx/issues/69
504 Gateway Timeout - Two EC2 instances with load balancer
Some more things I tried:
Increased the KeepAliveTimeout on Apache, didn't help.
Ran curl on the pod IP and node IPs, worked normally.
Set up an externalName selector-less service for a couple of external dependencies, thinking it might have something to do with DNS lookups, didn't help.
The solution:
I set up a preStop lifecycle hook on the pod that runs apachectl -k graceful-stop, so Apache terminates gracefully.
The issue (at least from what I can tell) is that when pods are taken down during a deployment, they receive a TERM signal, which causes Apache to immediately kill all of its children. This can cause a race condition where kube-proxy still sends some traffic to pods that have received a TERM signal but have not yet terminated completely.
Also got some help from this blog post on how to set up the hook.
I also recommend increasing the terminationGracePeriodSeconds in the PodSpec so apache has enough time to exit gracefully.
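A minimal sketch of the pod spec changes described above; the 30-second grace period and the image are assumptions, so tune them to how long Apache needs to drain:

spec:
  terminationGracePeriodSeconds: 30           # time allowed for a graceful drain (assumed value)
  containers:
  - name: apache
    image: httpd:2.4                          # placeholder Apache image
    lifecycle:
      preStop:
        exec:
          command: ["apachectl", "-k", "graceful-stop"]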

Complete Kubernetes Jobs when one container complete

Is it possible to have a Job that completes when one of its containers completes?
For example, I want to run a Job with one pod that has 2 containers:
Elasticsearch container
Some Java app container connecting to Elasticsearch
The Java app container runs and completes, but obviously the Elasticsearch container continues to run indefinitely.
As a result the Job never completes. What is the solution?
Cheers
This is probably not the easiest way to do it, but you could use the Kubernetes API to delete the job:
https://kubernetes.io/docs/api-reference/v1.7/#delete-41.
I'm not sure how you're starting the job, or how realistic this solution is in your scenario.
Not sure about your use case. My understanding is that Elasticsearch should be running all the time so the data can be queried.
You can run two different workloads: one for Elasticsearch and another for your Java application, and have the Job run only the Java application.
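A minimal sketch of that split, assuming Elasticsearch runs separately behind a Service named elasticsearch on port 9200 (all names and images are placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  name: java-indexer                      # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: app
        image: my-java-app:latest         # placeholder image
        env:
        - name: ELASTICSEARCH_URL         # hypothetical way the app finds Elasticsearch
          value: http://elasticsearch:9200

The Job then completes as soon as the Java container exits, while Elasticsearch keeps running as its own Deployment or StatefulSet.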
You should look at the livenessProbe capability. A liveness probe is defined on a container in the pod spec and runs every x seconds while the container is running, to make sure it is running correctly. When a liveness probe fails, Kubernetes will terminate the container. Here is the official Kubernetes documentation on liveness and readiness probes.
The strategy here would be to use a liveness probe on the Elasticsearch container to check that the Java app still has a connection to it. As soon as the Java app completes, the connection will no longer be there, causing the liveness probe to fail, and Kubernetes will terminate the Elasticsearch container.
Look out though: Kubernetes tries to restart a container that is terminated by a liveness probe failure, depending on the pod's restartPolicy. You might want to look into how that interacts with your Job.
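A heavily hedged sketch of that probe on the Elasticsearch container; it assumes the image ships a shell and netstat, that the Java app connects on port 9200, and that an established connection is a good enough proxy for the app still running:

containers:
- name: elasticsearch
  image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0   # placeholder version
  livenessProbe:
    exec:
      command:
      - sh
      - -c
      # Fail once there is no longer an established client connection on 9200.
      - "netstat -tn | grep ':9200 ' | grep -q ESTABLISHED"
    initialDelaySeconds: 60       # give the Java app time to connect first
    periodSeconds: 10
    failureThreshold: 3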