I have two containers inside one pod: one is my application container and the second is a CloudSQL proxy container. Basically, my application container depends on the CloudSQL proxy container.
The problem is that when the pod is terminated, the CloudSQL proxy container is terminated first, and only some seconds later is my application container terminated.
So, until my application container is terminated, it keeps sending requests to the CloudSQL proxy container, resulting in errors:
could not connect to server: Connection refused
    Is the server running on host "127.0.0.1" and accepting
    TCP/IP connections on port 5432?
That's why I thought it would be a good idea to specify the order of termination, so that my application container is terminated first and only then the CloudSQL proxy container.
I was unable to find anything about this in the documentation, but maybe there is a way.
This is not directly possible with the Kubernetes pod API at present. Containers may be terminated in any order. The Cloud SQL proxy container may die more quickly than your application, for example if it has less cleanup to perform or fewer in-flight requests to drain.
From Termination of Pods:
When a user requests deletion of a pod, the system records the intended grace period before the pod is allowed to be forcefully killed, and a TERM signal is sent to the main process in each container.
You can get around this to an extent by wrapping the Cloud SQL and main containers in different entrypoints, which communicate their exit status between each other using a shared pod-level file system.
This solution will not work with the 1.16 release of the Cloud SQL proxy (see comments), as that release ceased to bundle a shell with the container. The 1.17 release is now available in Alpine and Debian Buster variants, so this version is a viable upgrade target that is once again compatible with this solution.
A wrapper like the following may help with this:
containers:
  - # Your application container; add its name and image here.
    command: ["/bin/bash", "-c"]
    args:
      - |
        # Leave a marker for the sidecar when the main process exits,
        # for any reason.
        trap "touch /lifecycle/main-terminated" EXIT
        <your entry point goes here>
    volumeMounts:
      - name: lifecycle
        mountPath: /lifecycle
  - name: cloudsql-proxy   # note: underscores are not valid in container names
    image: gcr.io/cloudsql-docker/gce-proxy
    command: ["/bin/bash", "-c"]
    args:
      - |
        /cloud_sql_proxy <your flags> &
        PID=$!

        # Poll for the marker file left by the main container and stop
        # the proxy once it appears.
        function stop {
          while true; do
            if [[ -f "/lifecycle/main-terminated" ]]; then
              kill $PID
            fi
            sleep 1
          done
        }
        trap stop EXIT
        # We explicitly call stop to ensure the sidecar will terminate
        # if the main container exits outside a request from Kubernetes
        # to kill the Pod.
        stop &

        wait $PID
    volumeMounts:
      - name: lifecycle
        mountPath: /lifecycle
You'll also need a local scratch space to use for communicating lifecycle events:
volumes:
  - name: lifecycle
    emptyDir: {}
How does this solution work? It intercepts, in the Cloud SQL proxy container, the SIGTERM signal passed by the Kubernetes supervisor to each of your pod's containers on shutdown. The "main process" running in that container is a shell, which has spawned a child process running the Cloud SQL proxy, so the proxy is not immediately terminated. Rather, the shell code blocks waiting for a signal from the main container (by the simple means of a file appearing in the shared file system) that it has successfully exited. Only at that point is the Cloud SQL proxy process terminated and the sidecar container returns.
Of course, this has no effect on forced termination in the event your containers take too long to shut down and exceed the configured grace period.
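If your application legitimately needs longer to drain than the default 30 seconds, you can raise the grace period in the pod spec. A minimal sketch:

spec:
  # Default is 30 seconds; raise it if your containers need longer to drain.
  terminationGracePeriodSeconds: 60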
The solution depends on the containers you are running having a shell available to them; this is true of the Cloud SQL proxy (except for 1.16, and for 1.17 onwards when using the Alpine or Debian variants), but you may need to make changes to your local container builds to ensure this is true of your own application containers.
Background: I use glog to register a signal handler, but it cannot kill the init process (PID 1) with the kill syscall. As a result, even when a deadly signal like SIGABRT is raised, the Kubernetes controller manager won't be able to tell that the pod is no longer functioning, and so won't kill the pod and restart a new one.
My idea is to add logic to my readiness/liveness probe: check the log content of the current container to determine whether it's in a healthy state.
I've tried looking at the logs on the container's local filesystem under /var/log, but haven't found anything useful.
I'm wondering if it's possible to issue an HTTP request to somewhere to get the complete log? I assume it's stored somewhere.
You can find the Kubernetes pod logs on the node running the pod at:
/var/log/pods
and, if using Docker containers, at:
/var/lib/docker/containers
Containers are Ephemeral
Docker containers emit logs to the stdout and stderr output streams. Because containers are stateless, the logs are stored on the Docker host in JSON files by default.
The default logging driver is json-file. The logs are then annotated with the log origin, either stdout or stderr, and a timestamp. Each log file contains information about only one container.
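Each line in those files is a single JSON object; an illustrative (made-up) entry looks like this:

{"log":"hello from my app\n","stream":"stdout","time":"2018-02-24T02:31:14.382590382Z"}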
As @Uri Loya said, you can find these JSON log files in the /var/lib/docker/containers/ directory on a Linux Docker host. Here's how you can access them:
/var/lib/docker/containers/<container id>/<container id>-json.log
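If you don't want to hunt for the container id by hand, Docker can report the exact path; this sketch assumes you are on the host running the container:

# Ask Docker where a container's JSON log file lives
docker inspect --format='{{.LogPath}}' <container id>

# Or follow it directly
tail -f "$(docker inspect --format='{{.LogPath}}' <container id>)"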
You can collect the logs with a log aggregator and store them in a place where they'll be available forever. It's dangerous to keep logs on the Docker host because they can build up over time and eat into your disk space. That's why you should use a central location for your logs and enable log rotation for your Docker containers.
I have cilium installed in my test cluster (AWS, with the AWS CNI deleted because we use the cilium CNI plugin), and whenever I delete the cilium namespace (or run helm delete), the hubble-ui pod gets stuck in Terminating state. The pod has a couple of containers, but I notice that one container named backend exits with code 137 when the namespace is deleted, leaving the hubble-ui pod, and the namespace the pod is in, stuck in Terminating state.
From what I've read online, containers exit with 137 when they attempt to use more memory than they have been allocated. In my test cluster, no resource limits have been defined (spec.containers[*].resources = {}) on the pod or namespace, and there is no error message displayed as a reason for the failure. I am using the cilium helm package v1.12.3, but this issue has been going on since before we updated the helm package version.
I would like to know what is causing this issue as it is breaking my CI pipeline.
How can I ensure a graceful exit of the backend container? (as opposed to clearing finalizers).
So it appears that there is a bug in the backend application/container for the hubble-ui service: Kubernetes sends a SIGTERM signal to the container, and it fails to respond. I verified this by getting a shell into the container and sending SIGTERM and SIGINT (which is what the application seems to listen for in order to exit); it doesn't respond to either signal. That also explains the exit code: 137 is 128 + 9, i.e. the SIGKILL that Kubernetes falls back to when a container ignores SIGTERM past the grace period, and does not necessarily indicate an out-of-memory kill.
Next, I added a preStop hook that looks like the one below, and the pod behaved itself:
...
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "kill -SIGILL 1; true"]
I am new to OpenShift; I have gone through the OpenShift website for details, but I wanted to know if anyone has deployed an init container.
I want to use an init container to take a dump from the database and restore it into a new version of the database.
We are using a postgres database.
Any help would be appreciated. Thanks!
I want to use an init container to take a dump from the database and restore it into a new version of the database.
I would say you should rather use an operator instead of an initContainer. Take a look at the init container design considerations below first.
There are some considerations that you should take into account when you create init containers:

- They always get executed before other containers in the Pod, so they shouldn't contain complex logic that takes a long time to complete. Startup scripts are typically small and concise. If you find that you're adding too much logic to init containers, you should consider moving part of it to the application container itself.
- Init containers are started and executed in sequence. An init container is not invoked unless its predecessor has completed successfully. Hence, if the startup task is very long, you may consider breaking it into a number of steps, each handled by an init container, so that you know which steps fail.
- If any of the init containers fail, the whole Pod is restarted (unless you set restartPolicy to Never). Restarting the Pod means re-executing all the containers again, including any init containers. So, you may need to ensure that the startup logic tolerates being executed multiple times without causing duplication. For example, if a DB migration is already done, executing the migration command again should just be ignored.
- An init container is a good candidate for delaying application initialization until one or more dependencies are available. For example, if your application depends on an API that imposes an API request-rate limit, you may need to wait for a certain time period to be able to receive responses from that API. Implementing this logic in the application container may be complex, as it needs to be combined with health and readiness probes. A much simpler way would be creating an init container that waits until the API is ready before it exits successfully (see the sketch after this list). The application container would start only after the init container has done its job successfully.
- Init containers cannot use health and readiness probes as application containers do. The reason is that they are meant to start and exit successfully, much like how Jobs and CronJobs behave.
- All containers on the same Pod share the same Volumes and network. You can make use of this feature to share data between the application and its init containers.
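As a minimal sketch of that wait-until-ready pattern (the image, host, and port here are illustrative, not part of the quoted text):

initContainers:
  - name: wait-for-api
    image: busybox
    # Hypothetical dependency address; substitute your API's host and port.
    command: ["sh", "-c", "until nc -z api.example.local 8080; do echo waiting for api; sleep 2; done"]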
The only thing I found about using an init container for dumping data is this example, which does it with MySQL; maybe it can guide you in doing the same with PostgreSQL.
In this scenario, we are serving a MySQL database. This database is used for testing an application. It doesn't have to contain real data, but it must be seeded with enough data that we can test the application's query speed. We use an init container to handle downloading the SQL dump file and restoring it to the database, which is hosted in another container.
The definition file may look like this:
apiVersion: v1
kind: Pod
metadata:
  name: mydb
  labels:
    app: db
spec:
  initContainers:
    - name: fetch
      image: mwendler/wget
      command: ["wget", "--no-check-certificate", "https://sample-videos.com/sql/Sample-SQL-File-1000rows.sql", "-O", "/docker-entrypoint-initdb.d/dump.sql"]
      volumeMounts:
        - mountPath: /docker-entrypoint-initdb.d
          name: dump
  containers:
    - name: mysql
      image: mysql
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: "example"
      volumeMounts:
        - mountPath: /docker-entrypoint-initdb.d
          name: dump
  volumes:
    - emptyDir: {}
      name: dump
The above definition creates a Pod that hosts two containers: the init container and the application one. Let’s have a look at the interesting aspects of this definition:
The init container is responsible for downloading the SQL file that contains the database dump. We use the mwendler/wget image because we only need the wget command.
The destination directory for the downloaded SQL is the directory used by the MySQL image to execute SQL files (/docker-entrypoint-initdb.d). This behavior is built into the MySQL image that we use in the application container.
The init container mounts /docker-entrypoint-initdb.d to an emptyDir volume. Because both containers are hosted on the same Pod, they share the same volume. So, the database container has access to the SQL file placed on the emptyDir volume.
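Since the question is about PostgreSQL: the official postgres image honors the same /docker-entrypoint-initdb.d convention on first initialization, so a rough translation of the above might look like this (the dump URL is a placeholder):

spec:
  initContainers:
    - name: fetch
      image: mwendler/wget
      # Placeholder URL; point this at your actual pg_dump output.
      command: ["wget", "--no-check-certificate", "https://example.com/dump.sql", "-O", "/docker-entrypoint-initdb.d/dump.sql"]
      volumeMounts:
        - mountPath: /docker-entrypoint-initdb.d
          name: dump
  containers:
    - name: postgres
      image: postgres
      env:
        - name: POSTGRES_PASSWORD
          value: "example"
      volumeMounts:
        - mountPath: /docker-entrypoint-initdb.d
          name: dump
  volumes:
    - emptyDir: {}
      name: dump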
Additionally, for best practices, I would suggest taking a look at Kubernetes operators; as far as I know, that's the best-practice way to manage databases in Kubernetes.
If you're not familiar with operators, I would suggest starting with the Kubernetes documentation and this short video on YouTube.
Operators are a method of packaging Kubernetes applications that enables you to more easily manage and monitor stateful applications. There are many operators already available, such as the
Crunchy Data PostgreSQL Operator
Postgres Operator
which automate and simplify deploying and managing open source PostgreSQL clusters on Kubernetes by providing the essential features you need to keep your PostgreSQL clusters up and running.
Is there any approach where one container can call a command in another container? The containers are in the same pod.
I need many command-line tools which are shipped as images as well as in packages. But I don't want to install all of them into one container because of some concerns.
This is very possible as long as you have k8s v1.17+. You must enable shareProcessNamespace: true and then all the container processes are available to other containers in the same pod.
Here are the docs, have a look.
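A minimal sketch of what that looks like in a pod spec (the names and images here are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: shared-pid-demo
spec:
  shareProcessNamespace: true   # every container sees every other container's processes
  containers:
    - name: tools
      image: busybox
      command: ["sleep", "3600"]
    - name: app
      image: busybox
      # From here you can find the tools container's processes with ps/pgrep
      # and reach their filesystems via /proc/<pid>/root.
      command: ["sleep", "3600"]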
In general, no, you can't do this in Kubernetes (or in plain Docker). You should either move the two interconnected things into the same container, or wrap some sort of network service around the thing you're trying to call (and then probably put it in a separate pod with a separate service in front of it).
There might be something you could do if you set up a service account, installed a Kubernetes API sidecar container, and used the Kubernetes API to do the equivalent of kubectl exec, but I'd consider this a solution of last resort.
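For completeness, a rough sketch of that last-resort shape: from a sidecar that has kubectl on its PATH and a service account bound to pods/exec permissions (both assumptions), the call would be the in-cluster equivalent of:

# POD_NAME is assumed to be injected via the downward API.
kubectl exec "$POD_NAME" -c tools-container -- some-command --some-flag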
Containers in a pod are isolated from each other, except that they share volumes and the network namespace. So you would not be able to execute a command from one container in another directly; however, you could expose the commands in a container through an API.
We are currently running EKS v1.20, and we were able to achieve this using the shareProcessNamespace: true that mr haven mentioned. In our particular case, we needed a Debian 10 php container to execute a SAS binary command with arguments. SAS is installed and running in a CentOS 7 container in the same pod.
Using helm, we enabled shareProcessNamespace, and in the container's command and args fields we built symlinks to that binary using bash -c once the pod came online. We grab the pid of the shared container using pgrep; since we know the centos container's entry point is tail -f /dev/null, we just look for that process with $(pgrep tail) initially.
- image: some_php_container
  command: ["bash", "-c"]
  args: [ "SAS_PROC_PID=$(pgrep tail) && \
           ln -sf /proc/$SAS_PROC_PID/root/usr/local/SAS/SAS_9.4/SASFoundation/9.4/bin/sas_u8 /usr/bin/sas && \
           ln -sf /proc/$SAS_PROC_PID/root/usr/local/SAS /usr/local/SAS && \
           . /opt/script_runner.sh" ]
Now the php container is able to execute the sas command with arguments and process data files using the SAS software running on the centos container.
One issue we quickly found out: if the SAS container happened to die in the pod, the pid would change, and thus the symlinks on the php container would be broken. So we put in a liveness probe that frequently checks whether the path to the binary under the current pid exists; if the probe fails, it restarts the php container, thus rebuilding the symlinks with the right pid.
livenessProbe:
  exec:
    command:
      - bash
      - -c
      # With "bash -c", the whole script must be a single argument;
      # otherwise the extra list items become $0, $1, ... and never run.
      - SAS_PROC_PID=$(pgrep tail) && test -f /proc/$SAS_PROC_PID/root/usr/local/SAS/SAS_9.4/SASFoundation/9.4/bin/sas_u8
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 1
Hopefully above info can help someone else.
You can do this without shareProcessNamespace by using a shared volume and some named pipes. It manages all the I/O for you and is trivially simple and extremely fast.
For a complete description and code, see this solution I created. Contains examples.
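The gist of the idea, as a minimal sketch (the /ipc mount path and pipe names are made up here, and this naive loop handles one command at a time):

# Both containers mount the same emptyDir volume at /ipc.

# "Server" container: create the pipes once, then execute each line
# that arrives on the command pipe and write the output back.
mkfifo /ipc/cmd /ipc/out
while true; do
  sh -c "$(cat /ipc/cmd)" > /ipc/out 2>&1
done

# "Client" container: send a command, then read the result.
echo "ls /tmp" > /ipc/cmd
cat /ipc/out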
I have deployed PagerBot https://github.com/stripe-contrib/pagerbot to our internal k8s cluster as a learning opportunity. I had fun writing a helm chart for it!
The bot appears to disconnect from Slack at an unknown time and never reconnect. When I kill the pod, the deployment recreates it and it connects again (we are using the Slack RTM option).
The pod logs the following entry when it disconnects:
2018-02-24 02:31:14.382590 I [9:34765020] PagerBot::SlackRTMAdapter -- Closed connection to chat. --
I want to learn a method of monitoring for this log entry and taking action. Initially I thought a liveness probe would be the way to go, using a command that returns non-zero when this entry is logged, but the logs aren't stored inside the container (as far as I can see).
How do you monitor and take action based on logs that can be seen using kubectl logs pod-name?
Can I achieve this in our Prometheus test deployment? Should I be using a known k8s feature?
I would argue the best course of action is to extend pagerbot to surface more than just the string literal pong in its /ping endpoint, and then use that as its livenessProbe; a close second would be to teach the thing to just reconnect, as that's almost certainly cheaper than tearing down the Pod.
Having said that, one approach you may consider is a sidecar container that uses the Pod's service account credentials to monitor the sibling container (akin to an if kubectl logs -f -c pagerbot $my_pod_name | grep "Closed connection to chat"; then kill -9 $pagerbot_pid; fi type deal). That is a little awkward, but I can't immediately think of why it wouldn't work.
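A rough sketch of that sidecar (the image is illustrative, and it assumes a service account allowed to read pod logs plus POD_NAME injected via the downward API):

- name: log-watcher
  image: bitnami/kubectl
  command: ["sh", "-c"]
  args:
    - |
      # Block until the disconnect line shows up in the sibling's logs, then
      # exit so the kubelet restarts this container; killing the bot process
      # itself would additionally need shareProcessNamespace.
      kubectl logs -f -c pagerbot "$POD_NAME" | grep -q "Closed connection to chat"
  env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name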
I ended up landing on a liveness probe to solve my problem. I've added the following to the pageyBot deployment:
livenessProbe:
  exec:
    command:
      - bash
      - -c
      - "ss -an | grep -q 'EST.*:443 *$'"
  initialDelaySeconds: 120
  periodSeconds: 60
Basically, it tests whether a connection on port 443 is established, which we noticed goes away when the bot disconnects.