How to pass args to pods based on Ordinal Index in StatefulSets? - kubernetes

Is it possible to pass different args to pods based on their ordinal index in StatefulSets? I didn't find the answer in the StatefulSets documentation. Thanks!

Recommended way, see https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/#statefulset
# Generate server-id from pod ordinal index.
[[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
ordinal=${BASH_REMATCH[1]}
# ${ordinal} now holds the replica number
server_id=$((100 + ordinal))
# Copy appropriate conf.d files from config-map to emptyDir.
if [[ $ordinal -eq 0 ]]; then
  # do something
  :
else
  # do something else
  :
fi

I don't know of a non-hacky way to do it, but I know a hack that works. First, each pod in a StatefulSet gets a unique, predictable name. It can discover that name via the Downward API or just by calling hostname. So I use a shell script as the entrypoint to my container, and that script gets its pod name/hostname. From there it calls the "real" executable using command-line args appropriate for the specific host.
For example, one of my scripts expects the pod name to be mapped into the environment as POD_NAME via the Downward API. It then does something like:
#!/bin/bash
pet_number=${POD_NAME##*-}
if [ "$pet_number" = "0" ]
then
  # stuff here
  :
fi
# etc.
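A slightly fuller version of such a wrapper might look like the sketch below. It assumes the pod name ends in -<ordinal>, as StatefulSet pod names do, and that POD_NAME is injected via the Downward API (falling back to the hostname). The function names and the --role/--server-id flags are hypothetical placeholders for whatever the real executable expects.

```shell
#!/bin/sh
# Print the ordinal suffix of a StatefulSet pod name (e.g. "web-2" -> "2").
# Returns non-zero if the name does not end in "-<digits>".
ordinal_from_name() {
    name=$1
    case "$name" in
        *-*) ;;
        *) return 1 ;;
    esac
    ordinal=${name##*-}
    case "$ordinal" in
        ''|*[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$ordinal"
}

# Choose per-replica arguments; the flags here are hypothetical.
args_for_ordinal() {
    if [ "$1" -eq 0 ]; then
        printf '%s\n' "--role=primary"
    else
        printf '%s\n' "--role=replica --server-id=$((100 + $1))"
    fi
}

# In a real entrypoint this would be wired up roughly as:
# pod_name=${POD_NAME:-$(hostname)}
# ordinal=$(ordinal_from_name "$pod_name") || exit 1
# exec /path/to/real-executable $(args_for_ordinal "$ordinal")
```

Validating the suffix first means a pod that is not part of a StatefulSet fails at startup instead of silently running with the wrong arguments.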

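I found a less hacky way to pass the ordinal index into a container, using lifecycle hooks. (Note that a variable exported in a postStart hook is set only in the hook's own shell, not in the container's main process, so check that this actually works for your image.)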
containers:
- name: cp-kafka
  imagePullPolicy: Always
  image: confluentinc/cp-kafka:4.0.0
  resources:
    requests:
      memory: "2Gi"
      cpu: 0.5
  ports:
  - containerPort: 9093
    name: server
    protocol: TCP
  lifecycle:
    postStart:
      exec:
        command: ["/bin/sh", "-c", "export KAFKA_BROKER_ID=${HOSTNAME##*-}"]
  env:
  - name: KAFKA_ZOOKEEPER_CONNECT
    value: zk-cs.default.svc.cluster.local:2181
  - name: KAFKA_ADVERTISED_LISTENERS
    value: PLAINTEXT://localhost:9093

In case you want to track the progress of this, the ticket in question for this feature is here: https://github.com/kubernetes/kubernetes/issues/30427
The proposal involves putting the ordinal as a label on the pod and then using the Downward API to expose it, for example as an environment variable.

You can avoid using the Downward API by using HOSTNAME:
command:
- bash
- -c
- |
  ordinal=${HOSTNAME##*-}
  if [[ "$ordinal" = "0" ]]; then
    ...
  else
    ...
  fi

Related

Ensure pod deletion when one of container terminates [duplicate]

I have a Kubernetes JOB that does database migrations on a CloudSQL database.
One way to access the CloudSQL database from GKE is to use the CloudSQL-proxy container and then connect via localhost. Great - that's working so far. But because I'm doing this inside a K8s JOB the job is not marked as successfully finished because the proxy keeps on running.
$ kubectl get po
NAME                   READY   STATUS      RESTARTS   AGE
db-migrations-c1a547   1/2     Completed   0          1m
Even though the output says 'completed' one of the initially two containers is still running - the proxy.
How can I make the proxy exit on completing the migrations inside container 1?
The best way I have found is to share the process namespace between containers and use the SYS_PTRACE securityContext capability to allow you to kill the sidecar.
apiVersion: batch/v1
kind: Job
metadata:
  name: my-db-job
spec:
  template:
    spec:
      restartPolicy: OnFailure
      shareProcessNamespace: true
      containers:
      - name: my-db-job-migrations
        command: ["/bin/sh", "-c"]
        args:
        - |
          <your migration commands>;
          sql_proxy_pid=$(pgrep cloud_sql_proxy) && kill -INT $sql_proxy_pid;
        securityContext:
          capabilities:
            add:
            - SYS_PTRACE
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.17
        command:
        - "/cloud_sql_proxy"
        args:
        - "-instances=$(DB_CONNECTION_NAME)=tcp:5432"
One possible solution would be a separate cloudsql-proxy deployment with a matching service. You would then only need your migration container inside the job that connects to your proxy service.
This comes with some downsides:
- higher network latency, since there is no pod-local MySQL communication
- a possible security issue if you expose the SQL port to your whole Kubernetes cluster
If you want to open cloudsql-proxy to the whole cluster, you have to replace tcp:3306 with tcp:0.0.0.0:3306 in the -instances parameter of the cloudsql-proxy.
There are 3 ways of doing this.
1- Use a private IP to connect your K8s job to Cloud SQL, as described by @newoxo in one of the answers. To do that, your cluster needs to be a VPC-native cluster. Mine wasn't, and I was not willing to move all my stuff to a new cluster, so I wasn't able to do this.
2- Put the Cloud SQL Proxy container in a separate deployment with a service, as described by @Christian Kohler. This looks like a good approach, but it is not recommended by Google Cloud Support.
I was about to head in this direction (solution #2) but I decided to try something else.
And here is the solution that worked for me:
3- You can communicate between different containers in the same Pod/Job using the file system. The idea is to tell the Cloud SQL Proxy container when the main job is done, and then kill the cloud sql proxy. Here is how to do it:
In the yaml file (my-job.yaml)
apiVersion: v1
kind: Pod
metadata:
  name: my-job-pod
  labels:
    app: my-job-app
spec:
  restartPolicy: OnFailure
  containers:
  - name: my-job-app-container
    image: my-job-image:0.1
    command: ["/bin/bash", "-c"]
    args:
    - |
      trap "touch /lifecycle/main-terminated" EXIT
      { your job commands here }
    volumeMounts:
    - name: lifecycle
      mountPath: /lifecycle
  - name: cloudsql-proxy-container
    image: gcr.io/cloudsql-docker/gce-proxy:1.11
    command: ["/bin/sh", "-c"]
    args:
    - |
      /cloud_sql_proxy -instances={ your instance name }=tcp:3306 -credential_file=/secrets/cloudsql/credentials.json &
      PID=$!
      while true
      do
        if [ -f "/lifecycle/main-terminated" ]
        then
          kill $PID
          exit 0
        fi
        sleep 1
      done
    securityContext:
      runAsUser: 2 # non-root user
      allowPrivilegeEscalation: false
    volumeMounts:
    - name: cloudsql-instance-credentials
      mountPath: /secrets/cloudsql
      readOnly: true
    - name: lifecycle
      mountPath: /lifecycle
  volumes:
  - name: cloudsql-instance-credentials
    secret:
      secretName: cloudsql-instance-credentials
  - name: lifecycle
    emptyDir: {}
Basically, when your main job is done, it will create a file in /lifecycle that will be identified by the watcher added to the cloud-sql-proxy container, which will kill the proxy and terminate the container.
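The handshake can be tried locally before putting it in a pod. The sketch below is only a simulation under stated assumptions: a temporary directory stands in for the shared /lifecycle emptyDir, "sleep 300" stands in for the proxy process, and a child sh process stands in for the main container.

```shell
#!/bin/sh
# A temp dir plays the role of the shared /lifecycle volume.
lifecycle_dir=$(mktemp -d)

# "Sidecar": start the stand-in proxy, poll for the marker file, then stop it.
(
    sleep 300 &
    proxy_pid=$!
    until [ -f "$lifecycle_dir/main-terminated" ]; do
        sleep 1
    done
    kill "$proxy_pid"
) &
sidecar_pid=$!

# "Main container": the EXIT trap creates the marker even if the job fails.
sh -c 'trap "touch \"$1/main-terminated\"" EXIT; echo "job commands ran"' main "$lifecycle_dir"

# The sidecar notices the marker within about a second and exits.
wait "$sidecar_pid"
sidecar_status=$?
echo "sidecar exited with status $sidecar_status"
rm -rf "$lifecycle_dir"
```

Because the marker is written from an EXIT trap, the sidecar is released whether the job commands succeed or fail.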
I hope it helps! Let me know if you have any questions.
Based on: https://stackoverflow.com/a/52156131/7747292
It doesn't look like Kubernetes can do this alone; you would need to manually kill the proxy once the migration exits. A similar question was asked here: Sidecar containers in Kubernetes Jobs?
Google Cloud SQL has recently launched private IP address connectivity. If the Cloud SQL instance and the Kubernetes cluster are in the same region, you can connect to Cloud SQL without using the Cloud SQL proxy.
https://cloud.google.com/sql/docs/mysql/connect-kubernetes-engine#private-ip
A possible solution would be to set concurrencyPolicy: Replace in the CronJob spec. This replaces the currently running pod with a new instance whenever the job needs to run again. However, you have to make sure that subsequent cron runs are spaced far enough apart.
Unfortunately the other answers weren't working for me because of CloudSQLProxy running in a distroless environment where there is no shell.
I managed to get around this by bundling a CloudSQLProxy binary with my deployment and running a bash script to start up CloudSQLProxy followed by my app.
Dockerfile:
FROM golang:1.19.4
RUN apt update
COPY . /etc/mycode/
WORKDIR /etc/mycode
RUN chmod u+x ./scripts/run_migrations.sh
RUN chmod u+x ./bin/cloud_sql_proxy.linux-amd64
RUN go install
ENTRYPOINT ["./scripts/run_migrations.sh"]
Shell Script (run_migrations.sh):
#!/bin/sh
# This script is run from the parent directory
dbConnectionString=$1
cloudSQLProxyPort=$2
echo "Starting Cloud SQL Proxy"
./bin/cloud_sql_proxy.linux-amd64 -instances=${dbConnectionString}=tcp:5432 -enable_iam_login -structured_logs &
CHILD_PID=$!
echo "CloudSQLProxy PID: $CHILD_PID"
echo "Migrating DB..."
go run ./db/migrations/main.go
MAIN_EXIT_CODE=$?
kill $CHILD_PID;
echo "Migrations complete.";
exit $MAIN_EXIT_CODE
K8s (via Pulumi):
import * as k8s from '@pulumi/kubernetes'
const jobDBMigrations = new k8s.batch.v1.Job("job-db-migrations", {
    metadata: {
        namespace: namespaceName,
        labels: appLabels,
    },
    spec: {
        backoffLimit: 4,
        template: {
            spec: {
                containers: [
                    {
                        image: pulumi.interpolate`gcr.io/${gcpProject}/${migrationsId}:${migrationsVersion}`,
                        name: "server-db-migration",
                        args: [
                            dbConnectionString,
                        ],
                    },
                ],
                restartPolicy: "Never",
                serviceAccount: k8sSAMigration.metadata.name,
            },
        },
    },
},
{
    provider: clusterProvider,
});

What is "/usr/bin/nsenter -m/proc/1/ns/mnt" in Kubernetes Daemonset?

I have read some tutorials of how to mount a volume in container and run the script on host/node directly. These are the examples given.
DaemonSet pod spec
hostPID: true
nodeSelector:
  cloud.google.com/gke-local-ssd: "true"
volumes:
- name: setup-script
  configMap:
    name: local-ssds-setup
- name: host-mount
  hostPath:
    path: /tmp/setup
initContainers:
- name: local-ssds-init
  image: marketplace.gcr.io/google/ubuntu1804
  securityContext:
    privileged: true
  volumeMounts:
  - name: setup-script
    mountPath: /tmp
  - name: host-mount
    mountPath: /host
  command:
  - /bin/bash
  - -c
  - |
    set -e
    set -x
    # Copy setup script to the host
    cp /tmp/setup.sh /host
    # Copy wait script to the host
    cp /tmp/wait.sh /host
    # Wait for updates to complete
    /usr/bin/nsenter -m/proc/1/ns/mnt -- chmod u+x /tmp/setup/wait.sh
    # Give execute priv to script
    /usr/bin/nsenter -m/proc/1/ns/mnt -- chmod u+x /tmp/setup/setup.sh
    # Wait for Node updates to complete
    /usr/bin/nsenter -m/proc/1/ns/mnt /tmp/setup/wait.sh
    # If the /tmp folder is mounted on the host then it can run the script
    /usr/bin/nsenter -m/proc/1/ns/mnt /tmp/setup/setup.sh
containers:
- image: "gcr.io/google-containers/pause:2.0"
  name: pause
(There is a configmap for composing the .sh files. I just skip that)
What does "/usr/bin/nsenter -m/proc/1/ns/mnt" mean? Is this a command to run something on the host? What is "/proc/1/ns/mnt"?
Let's start from namespaces to understand this in detail:
Namespaces isolate resources among processes. The kernel uses them to control which resources each process can see, which provides strong isolation among the different containers running on a system.
That isolation also means a container cannot normally reach resources outside its own namespaces. This is where the nsenter command comes in: it runs a program inside the namespaces of another process, provided the caller has sufficient privileges (here, privileged: true).
The command can enter mount, UTS, IPC, network, PID, user, cgroup, and time namespaces.
The -m in your example is --mount, which enters the mount namespace specified by that file. /proc/1/ns/mnt is the mount namespace of PID 1, and because the pod sets hostPID: true, PID 1 is the host's init process. So yes, the command after nsenter runs against the host's filesystem.
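To see what these namespace files look like, the following sketch prints the mount-namespace identifier of a process. It assumes a Linux system with /proc mounted; the helper name mnt_ns_of is made up for illustration.

```shell
#!/bin/sh
# Print the mount-namespace identifier of a process (Linux only).
# Two processes share a mount namespace exactly when these values match.
mnt_ns_of() {
    readlink "/proc/$1/ns/mnt"
}

# Namespace of the current process; the number varies per system.
mnt_ns_of self

# Reading PID 1's entry usually needs root. With hostPID: true in the pod,
# PID 1 is the host's init process:
# mnt_ns_of 1

# With sufficient privileges, run a command in PID 1's mount namespace:
# nsenter --mount=/proc/1/ns/mnt -- ls /tmp/setup
```

If the value printed for PID 1 differs from the one for self, the container has its own mount namespace, which is why the DaemonSet needs nsenter to touch files as the host sees them.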

How to set environment variable in container from Kubernetes?

I want to set an environment variable (I'll just name it ENV_VAR_VALUE) to a container during deployment through Kubernetes.
I have the following in the pod yaml configuration:
...
...
spec:
  containers:
  - name: appname-service
    image: path/to/registry/image-name
    ports:
    - containerPort: 1234
    env:
    - name: "ENV_VAR_VALUE"
      value: "some.important.value"
...
...
The container needs to use the ENV_VAR_VALUE's value.
But in the container's application logs, its value always comes out empty.
So, I tried checking its value from inside the container:
$ kubectl exec -it appname-service bash
root@appname-service:/# echo $ENV_VAR_VALUE
some.important.value
root@appname-service:/#
So, the value was successfully set.
I imagine it's because the environment variables defined from Kubernetes are set after the container is already initialized.
So, I tried overriding the container's CMD from the pod yaml configuration:
...
...
spec:
  containers:
  - name: appname-service
    image: path/to/registry/image-name
    ports:
    - containerPort: 1234
    env:
    - name: "ENV_VAR_VALUE"
      value: "some.important.value"
    command: ["/bin/bash"]
    args: ["-c", "application-command"]
...
...
Even so, the value of ENV_VAR_VALUE is still empty during the execution of the command.
Thankfully, the application has a restart function
-- because when I restart the app, ENV_VAR_VALUE gets used successfully.
-- I can at least do some other tests in the meantime.
So, the question is...
How should I configure this in Kubernetes so it isn't a tad too late in setting the environment variables?
As requested, here is the Dockerfile.
I apologize for the abstraction...
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y some-dependencies
COPY application-script.sh application-script.sh
RUN ./application-script.sh
# ENV_VAR_VALUE is set in this file which is populated when application-command is executed
COPY app-config.conf /etc/app/app-config.conf
CMD ["/bin/bash", "-c", "application-command"]
You can also try running two commands in the Kubernetes pod spec:
(read in env vars): "source /env/required_envs.env" (would come via a secret mounted as a volume)
(main command): "application-command"
Like this:
containers:
- name: appname-service
  image: path/to/registry/image-name
  ports:
  - containerPort: 1234
  command: ["/bin/sh", "-c"]
  args:
  - source /env/db_cred.env;
    application-command;
Why don't you move the
RUN ./application-script.sh
below
COPY app-config.conf /etc/app/app-config.conf
Looks like the app is running before the env conf is available for it.
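One way to make the ordering explicit is to build the configuration at container start, when the Kubernetes-provided environment is guaranteed to be present, rather than at image build time. This is only a sketch: the render_config helper and the important_setting key are hypothetical, and the real config format of the application is not known from the question.

```shell
#!/bin/sh
# Render a config line from an environment variable at start-up time.
# Fails loudly if the variable is missing instead of writing an empty value.
render_config() {
    if [ -z "${ENV_VAR_VALUE:-}" ]; then
        echo "ENV_VAR_VALUE is not set" >&2
        return 1
    fi
    printf 'important_setting=%s\n' "$ENV_VAR_VALUE"
}

# In a real entrypoint this would be followed by something like:
# render_config > /etc/app/app-config.conf && exec application-command
```

A build-time RUN step can never see values from the pod spec, because those only exist once the container is started.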

How to roll kubernetes updates in intervals

We have a case where we need to make sure that pods in k8s have the latest version possible. What is the best way to accomplish this?
First idea was to kill the pod after some point, knowing that the new ones will come up pulling the latest image. Here is what we found so far. Still don't know how to do it.
Another idea is having rolling-update executed in intervals, like every 5 hours. Is there a way to do this?
As mentioned by @svenwltr, using activeDeadlineSeconds is an easy option but comes with the risk of losing all pods at once. To mitigate that risk I'd use a deployment to manage the pods and their rollout, and configure a small second container alongside the actual application. The small helper could be configured like this (following the official docs):
apiVersion: v1
kind: Pod
metadata:
  name: app-liveness
spec:
  containers:
  - name: liveness
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep $(( RANDOM % (3600) + 1800 )); rm -rf /tmp/healthy; sleep 600
    image: gcr.io/google_containers/busybox
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
  - name: yourapplication
    imagePullPolicy: Always
    image: nginx:alpine
With this configuration every pod would break randomly within the configured timeframe (here between 30 and 90mins) and that would trigger the start of a new pod. The imagePullPolicy: Always would then make sure that the image is updated during that cycle.
This of course assumes that your application versions are always available under the same name/tag.
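The sleep expression above picks a lifetime of 1800 to 5399 seconds. The same calculation can be sketched with awk for shells where $RANDOM is not available; random_lifetime is a made-up helper name, and the two arguments mirror the 1800 and 3600 used in the manifest.

```shell
#!/bin/sh
# Pick a random lifetime in seconds in the range [min, min + span).
# awk is used because $RANDOM does not exist in every /bin/sh.
random_lifetime() {
    awk -v min="$1" -v span="$2" 'BEGIN { srand(); print min + int(rand() * span) }'
}

# 30 minutes up to just under 90 minutes.
lifetime=$(random_lifetime 1800 3600)
echo "container would stay healthy for ${lifetime}s"
```

Note that awk's srand() seeds from the current time, so pods started within the same second may pick the same value; the spread is only as good as the spread of start times.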
Another alternative is to use a deployment and let the controller handle rollouts. To be more specific: if you update the image field in the deployment YAML, it automatically updates every pod. IMO that's the cleanest way, but it has some requirements:
- You cannot use the latest tag. The assumption is that a container only needs an update when the image tag changes.
- If an update happens, you have to update the image tag manually, somehow. This might be done by a custom controller which checks for new tags and updates the deployment accordingly. Or this could be triggered by a Continuous Delivery system.
To use your linked feature you just have to specify activeDeadlineSeconds in your pods.
Not tested example:
apiVersion: v1
kind: Pod
metadata:
  name: "nginx"
spec:
  activeDeadlineSeconds: 3600
  containers:
  - name: nginx
    image: nginx:alpine
    imagePullPolicy: Always
The downside of this is that you cannot control when the deadline kicks in. This means that all your pods might get killed at the same time and the whole service might go offline (that depends on your application).
I tried using Pagid's solution, but unfortunately my observation and subsequent research indicate that his assertion that a failing container will restart the whole pod is incorrect. It turns out that only the failing container will be restarted, which obviously does not help much when the point is to restart the other containers in the pod at random intervals.
The good news is that I have a solution that seems to work, which is based on his answer. Basically, instead of writing to /tmp/healthy, you write to a shared volume which each of the containers within the pod has mounted. You also need to add the liveness probe to each of those containers. Here's an example based on the one I am using:
volumes:
- name: healthcheck
  emptyDir:
    medium: Memory
containers:
- image: alpine:latest
  volumeMounts:
  - mountPath: /healthcheck
    name: healthcheck
  name: alpine
  livenessProbe:
    exec:
      command:
      - cat
      - /healthcheck/healthy
    initialDelaySeconds: 5
    periodSeconds: 5
- name: liveness
  args:
  - /bin/sh
  - -c
  - touch /healthcheck/healthy; sleep $(( RANDOM % (3600) + 1800 )); rm -rf /healthcheck/healthy; sleep 600
  image: gcr.io/google_containers/busybox
  volumeMounts:
  - mountPath: /healthcheck
    name: healthcheck
  livenessProbe:
    exec:
      command:
      - cat
      - /healthcheck/healthy
    initialDelaySeconds: 5
    periodSeconds: 5

Start kubernetes container with specific command

Using fleet I can specify a command to be run inside the container when it is started. It seems like this should be easily possible with Kubernetes as well, but I can't seem to find anything that says how. It seems like you have to create the container specifically to launch with a certain command.
Having a general purpose container and launching it with different arguments is far simpler than creating many different containers for specific cases, or setting and getting environment variables.
Is it possible to specify the command a kubernetes pod runs within the Docker image at startup?
I spent 45 minutes looking for this. Then I posted a question about it and found the solution 9 minutes later.
There is a hint at what I wanted inside the Cassandra example. See the command line below the image:
id: cassandra
kind: Pod
apiVersion: v1beta1
desiredState:
  manifest:
    version: v1beta1
    id: cassandra
    containers:
    - name: cassandra
      image: kubernetes/cassandra
      command:
      - /run.sh
      cpu: 1000
      ports:
      - name: cql
        containerPort: 9042
      - name: thrift
        containerPort: 9160
      env:
      - key: MAX_HEAP_SIZE
        value: 512M
      - key: HEAP_NEWSIZE
        value: 100M
labels:
  name: cassandra
Despite finding the solution, it would be nice if there was somewhere obvious in the Kubernetes project where I could see all of the possible options for the various configuration files (pod, service, replication controller).
For those looking to use a command with parameters, you need to provide an array.
For example:
command: [ "/bin/bash", "-c", "mycommand" ]
or also
command:
- "/bin/bash"
- "-c"
- "mycommand"
To answer Derek Mahar's question in the comments above:
What is the purpose of args if one could specify all arguments using command?
Dockerfiles can have an Entrypoint only or a CMD only or both of them together.
If used together then whatever is in CMD is passed to the command in ENTRYPOINT as arguments i.e.
ENTRYPOINT ["print"]
CMD ["hello", "world"]
So in Kubernetes when you specify a command i.e.
command: ["print"]
It will override the value of Entrypoint in the container's Dockerfile.
If you only specify arguments then those arguments will be passed to whatever command is in the container's Entrypoint.
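The override rules can be summarised as a small simulation. This is an illustrative sketch, not Kubernetes code: effective_command is a made-up helper that takes the image's ENTRYPOINT and CMD plus the pod's command and args as plain strings (pass "" for a field that is not set) and prints what the container would run.

```shell
#!/bin/sh
# effective_command ENTRYPOINT CMD K8S_COMMAND K8S_ARGS
effective_command() {
    entrypoint=$1 cmd=$2 k8s_command=$3 k8s_args=$4
    if [ -n "$k8s_command" ]; then
        # command replaces ENTRYPOINT, and the image's CMD is ignored
        set -- $k8s_command $k8s_args
    elif [ -n "$k8s_args" ]; then
        # args alone replaces CMD and keeps the image's ENTRYPOINT
        set -- $entrypoint $k8s_args
    else
        # nothing set in the pod spec: the image defaults apply
        set -- $entrypoint $cmd
    fi
    echo "$*"
}

effective_command "print" "hello world" "" ""          # image defaults
effective_command "print" "hello world" "" "goodbye"   # args only
effective_command "print" "hello world" "echo" ""      # command only
effective_command "print" "hello world" "echo" "hi"    # command and args
```

The third case is the one that tends to surprise people: setting command alone drops the image's CMD as well, so any default arguments have to be repeated in args.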
To specify the command a Kubernetes pod runs within the Docker image at startup, include the command and args fields in the YAML file. For example,
apiVersion: v1
kind: Pod
metadata:
  name: command-demo
  labels:
    purpose: demo-command
spec:
  containers:
  - name: command-demo-container
    image: ubuntu
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello; sleep 10;done"]
In addition to the accepted answer, you can use variables with values from secrets in the commands as follows:
command: ["/some_command","-instances=$(<VARIABLE_NAME>)"]
env:
- name: <VARIABLE_NAME>
  valueFrom:
    secretKeyRef:
      name: <secret_name>
      key: <secret_key>