Recommended way to persistently change kube-env variables - kubernetes

We are using elasticsearch/kibana instead of gcp for logging (based on what is described here).
To have the fluentd-elasticsearch pods launched, we've set LOGGING_DESTINATION=elasticsearch and ENABLE_NODE_LOGGING="true" in the "Compute Instance Template" -> "Custom metadata" -> "kube-env".
While this works fine when done manually, it gets overwritten with every gcloud container clusters upgrade, as a new Instance Template with defaults (LOGGING_DESTINATION=gcp ...) is created.
My question is: How do I persist this kind of configuration for GKE/GCE?
I thought about adding a k8s-user-startup-script but that's also defined in the Instance Template and therefore is overwritten by gcloud container clusters upgrade.
I've also tried to add a k8s-user-startup-script to the project metadata but that is not taken into account.
//EDIT
Current workaround (without recreating Instance Template and Instances) for manually switching back to elasticsearch is:
for node in $(kubectl get nodes -o name | cut -f2 -d/); do
  gcloud compute ssh "$node" \
    --command="sudo cp -a /srv/salt/fluentd-es/fluentd-es.yaml /etc/kubernetes/manifests/; sudo rm /etc/kubernetes/manifests/fluentd-gcp.yaml";
done
kubelet will pick that up, kill fluentd-gcp and start fluentd-es.
//EDIT #2
Now running a "startup-script" DaemonSet for this:
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: startup-script
  namespace: kube-system
  labels:
    app: startup-script
spec:
  template:
    metadata:
      labels:
        app: startup-script
    spec:
      hostPID: true
      containers:
      - name: startup-script
        image: gcr.io/google-containers/startup-script:v1
        securityContext:
          privileged: true
        env:
        - name: STARTUP_SCRIPT
          value: |
            #! /bin/bash
            set -o errexit
            set -o pipefail
            set -o nounset
            # Replace Google-Cloud-Logging with EFK
            if [[ ! -f /etc/kubernetes/manifests/fluentd-es.yaml ]]; then
              if [[ -f /home/kubernetes/kube-manifests/kubernetes/fluentd-es.yaml ]]; then
                # GCI images
                cp -a /home/kubernetes/kube-manifests/kubernetes/fluentd-es.yaml /etc/kubernetes/manifests/
              elif [[ -f /srv/salt/fluentd-es/fluentd-es.yaml ]]; then
                # Debian based GKE images
                cp -a /srv/salt/fluentd-es/fluentd-es.yaml /etc/kubernetes/manifests/
              fi
              test -f /etc/kubernetes/manifests/fluentd-es.yaml && rm /etc/kubernetes/manifests/fluentd-gcp.yaml
            fi

There isn't a fully supported way to reconfigure the kube-env in GKE. As you've found, you can hack the instance template, but this isn't guaranteed to work across upgrades.
An alternative is to create your cluster without gcp logging enabled and then create a DaemonSet that places a fluentd-elasticsearch pod on each of your nodes. Using this technique you don't need to write a (brittle) startup script or rely on the fact that the built-in startup script happens to work when setting LOGGING_DESTINATION=elasticsearch (which may break across upgrades even if it wasn't getting overwritten).
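To make the DaemonSet suggestion a bit more concrete, here is a minimal sketch of a shell helper that prints such a manifest. The image name, the Elasticsearch host and the ELASTICSEARCH_HOST variable are all assumptions to be replaced with whatever your fluentd image actually expects; the two hostPath mounts are the usual places where node and container logs live on Docker-based nodes, not something GKE guarantees.

```shell
#!/bin/sh
# render_fluentd_es_daemonset IMAGE ES_HOST
# Prints a minimal DaemonSet manifest that runs one log-shipping pod per node.
# IMAGE and ES_HOST are supplied by the caller (hypothetical values, not GKE defaults).
render_fluentd_es_daemonset() {
  image=$1
  es_host=$2
  cat <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es
  namespace: kube-system
  labels:
    app: fluentd-es
spec:
  selector:
    matchLabels:
      app: fluentd-es
  template:
    metadata:
      labels:
        app: fluentd-es
    spec:
      containers:
      - name: fluentd-es
        image: ${image}
        env:
        - name: ELASTICSEARCH_HOST   # hypothetical variable; depends on the image
          value: "${es_host}"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
EOF
}
```

Piping the output into kubectl apply -f - would create the DaemonSet. Because it lives in the cluster's API objects rather than in the Instance Template, it should survive node upgrades.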

Terminate istio-proxy after cronjob completion

I have a k8s cronjob run my docker image transaction-service.
It starts and gets its job done successfully. When it's over, I expect the pod to terminate but... istio-proxy still lingers there:
And that results in:
Nothing too crazy, but I'd like to fix it.
I know I should call curl -X POST http://localhost:15000/quitquitquit
But I don't know where and how. I need to call that quitquitquit URL only when transaction-service is in a completed state. I read about preStop lifecycle hook, but I think I need more of a postStop one. Any suggestions?
You have a few options here:
In your Job/CronJob spec, add the following lines, with your job's own command immediately after them:
command: ["/bin/bash", "-c"]
args:
- |
  trap "curl --max-time 2 -s -f -XPOST http://127.0.0.1:15020/quitquitquit" EXIT
  while ! curl -s -f http://127.0.0.1:15020/healthz/ready; do sleep 1; done
  echo "Ready!"
  < your job >
Disable Istio injection at the Pod level in your Job/Cronjob definition:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  ...
spec:
  ...
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            # disable istio on the pod due to this issue:
            # https://github.com/istio/istio/issues/11659
            sidecar.istio.io/inject: "false"
Note: The annotation should be on the Pod's template, not on the Job's template.
You can use the TTL mechanism for finished Jobs mentioned in the Kubernetes docs, which helps by removing the whole pod.
In my Dockerfile I put
ADD ./entrypoint.sh /entrypoint.sh
RUN ["chmod", "+x", "/entrypoint.sh"]
RUN apk --no-cache add curl
ENTRYPOINT ["/entrypoint.sh"]
My entrypoint.sh looks like this:
#!/bin/sh
/app/myapp && curl -X POST http://localhost:15000/quitquitquit
It works.
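One thing to watch with the entrypoint above: because of the &&, the quit call is skipped when the app fails, and when the app succeeds the script's exit status is curl's rather than the app's. A minimal sketch of a variant that always calls the quit endpoint and still reports the app's own status follows. The function name run_then_quit is made up for illustration, and the URL is passed in since it depends on which port your sidecar exposes.

```shell
#!/bin/sh
# run_then_quit QUIT_URL CMD [ARGS...]
# Runs CMD, then asks the sidecar to exit whether CMD succeeded or not,
# and finally returns CMD's own exit status.
run_then_quit() {
  quit_url=$1
  shift
  status=0
  "$@" || status=$?
  # Best effort: a failed or missing curl must not mask the job's status.
  curl --max-time 2 -s -f -X POST "$quit_url" >/dev/null 2>&1 || true
  return "$status"
}
```

In entrypoint.sh this could be used as run_then_quit http://localhost:15000/quitquitquit /app/myapp, so a failing job still releases the pod and is still recorded as failed.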

Is there a way to enable shareProcessNamespace for helm post-install hook?

I'm running a pod with 3 containers (telegraf, fluentd and an in-house agent) that makes use of shareProcessNamespace: true.
I've written a python script to fetch the initial config for telegraf and fluentd from a central controller API endpoint. Since this is a one time operation, I plan to use helm post-install hook.
apiVersion: batch/v1
kind: Job
metadata:
  name: agent-postinstall
  annotations:
    "helm.sh/hook-weight": "3"
    "helm.sh/hook": "post-install"
spec:
  template:
    spec:
      containers:
      - name: agent-postinstall
        image: "{{ .Values.image.agent.repository }}:{{ .Values.image.agent.tag | default .Chart.AppVersion }}"
        imagePullPolicy: IfNotPresent
        command: ['python3', 'getBaseCfg.py']
        volumeMounts:
        - name: config-agent-volume
          mountPath: /etc/config
      volumes:
      - name: config-agent-volume
        configMap:
          name: agent-cm
      restartPolicy: Never
  backoffLimit: 1
It is required for the python script to check if telegraf/fluentd/agent processes are up, before getting the config. I intend to wait (with a timeout) until pgrep <telegraf/fluentd/agent> returns true and then fire APIs. Is there a way to enable shareProcessNamespace for the post-install hook as well? Thanks.
PS: Currently, the agent calls the python script along with its own startup script. It works, but it is kludgy. I'd like to move it out of agent container.
shareProcessNamespace
The most important point about this flag is that it works only within a single pod: all containers in that pod share one process namespace.
The described approach uses a Job. A Job creates a separate pod, so it won't work this way. The container has to be part of the "main" pod, alongside all the other containers, to have access to that pod's running processes.
More details about process sharing.
A possible solution
It's possible to query the processes in the containers directly using the kubectl command.
Below is an example of how to check the state of the processes using the pgrep command. The pgrepContainer container needs to have pgrep already installed.
job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-postinstall-hook"
  annotations:
    "helm.sh/hook": post-install
spec:
  template:
    spec:
      serviceAccountName: config-user # service account with appropriate permissions is required using this approach
      volumes:
      - name: check-script
        configMap:
          name: check-script
      restartPolicy: Never
      containers:
      - name: post-install-job
        image: "bitnami/kubectl" # using this image with kubectl so we can connect to the cluster
        command: ["bash", "/mnt/script/checkScript.sh"]
        volumeMounts:
        - name: check-script
          mountPath: /mnt/script
And configmap.yaml, which contains the script. It checks the three processes in a loop of up to 60 iterations, 10 seconds apart:
apiVersion: v1
kind: ConfigMap
metadata:
  name: check-script
data:
  checkScript.sh: |
    #!/bin/bash
    podName=test
    pgrepContainer=app-1
    process1=sleep
    process2=pause
    process3=postgres
    attempts=0
    until [ $attempts -eq 60 ]; do
      kubectl exec ${podName} -c ${pgrepContainer} -- pgrep ${process1} 1>/dev/null 2>&1 \
      && kubectl exec ${podName} -c ${pgrepContainer} -- pgrep ${process2} 1>/dev/null 2>&1 \
      && kubectl exec ${podName} -c ${pgrepContainer} -- pgrep ${process3} 1>/dev/null 2>&1
      if [ $? -eq 0 ]; then
        break
      fi
      attempts=$((attempts + 1))
      sleep 10
      echo "Waiting for all containers to be ready...$((attempts * 10)) s"
    done
    if [ $attempts -eq 60 ]; then
      echo "ERROR: Timeout"
      exit 1
    fi
    echo "All containers are ready !"
    echo "Configuring telegraf and fluentd services"
Final result will look like:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test 2/2 Running 0 20m
test-postinstall-hook-dgrc9 0/1 Completed 0 20m
$ kubectl logs test-postinstall-hook-dgrc9
Waiting for all containers to be ready...10 s
All containers are ready !
Configuring telegraf and fluentd services
The above is another approach; you can use its logic as a base to achieve your end goal.
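If you want to reuse the waiting logic outside this particular script, the retry loop can be factored into a small helper. This is only a sketch: wait_until is a made-up name, and the command to poll is passed in by the caller, so it works with kubectl exec ... pgrep as well as with a plain local pgrep.

```shell
#!/bin/sh
# wait_until ATTEMPTS INTERVAL CMD [ARGS...]
# Re-runs CMD until it succeeds, sleeping INTERVAL seconds between tries.
# Returns 0 as soon as CMD succeeds, 1 after ATTEMPTS failed tries.
wait_until() {
  attempts_left=$1
  interval=$2
  shift 2
  while [ "$attempts_left" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts_left=$((attempts_left - 1))
    # Skip the final sleep once the budget is exhausted.
    if [ "$attempts_left" -gt 0 ]; then
      sleep "$interval"
    fi
  done
  return 1
}
```

For example, wait_until 60 10 kubectl exec test -c app-1 -- pgrep postgres gives the same 60 x 10 s budget as the script above for a single process; chain three calls with && to cover all of them.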
postStart
A postStart hook can also be considered as the place for this logic. It runs right after the container is created. Since the main application takes time to start and the logic already waits for it, the following caveat is not an issue:
there is no guarantee that the hook will execute before the container ENTRYPOINT

Kubernetes: How to update a live busybox container's 'command'

I have the following manifest that created the running pod named 'test'
apiVersion: v1
kind: Pod
metadata:
  name: hello-world
  labels:
    app: blue
spec:
  containers:
  - name: funskies
    image: busybox
    command: ["/bin/sh", "-c", "echo 'Hello World'"]
I want to update the pod to include the additional command
apiVersion: v1
kind: Pod
metadata:
  name: hello-world
  labels:
    app: blue
spec:
  restartPolicy: Never
  containers:
  - name: funskies
    image: busybox
    command: ["/bin/sh", "-c", "echo 'Hello World' > /home/my_user/logging.txt"]
What I tried
kubectl edit pod test
What resulted
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
# pods "test" was not valid:
# * spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`...
Other things I tried:
Updated the manifest and then ran apply - same issue
kubectl apply -f test.yaml
Question: What is the proper way to update a running pod?
You can't modify most properties of a Pod. Typically you don't want to directly create Pods; use a higher-level controller like a Deployment.
The Kubernetes documentation for a PodSpec notes (emphasis mine):
containers: List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.
In all cases, no matter what, a container runs a single command, and if you want to change what that command is, you need to delete and recreate the container. In Kubernetes this always means deleting and recreating the containing Pod. Usually you shouldn't use bare Pods, but if you do, you can create a new Pod with the new command and delete the old one. Deleting Pods is extremely routine and all kinds of ordinary things cause it to happen (updating Deployments, a HorizontalPodAutoscaler scaling down, ...).
If you have a Deployment instead of a bare Pod, you can freely change the template: for the Pods it creates. This includes changing their command:. This will result in the Deployment creating a new Pod with the new command, and once it's running, deleting the old Pod.
The sorts of very-short-lived single-command containers you show in the question aren't necessarily well-suited to running in Kubernetes. If the Pod isn't going to stay running and serve requests, a Job could be a better match; but a Job believes it will only be run once, and if you change the pod spec for a completed Job I don't think it will launch a new Pod. You'd need to create a new Job for this case.
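Since a finished Job won't rerun with a changed spec, one way to handle "same task, different command" is to render a fresh Job under a new name each time. A minimal sketch, assuming a single-line shell command; render_job is a hypothetical helper, and the container name and image are simply carried over from the question:

```shell
#!/bin/sh
# render_job NAME SHELL_COMMAND
# Prints a one-shot Job manifest that runs SHELL_COMMAND in a busybox container.
# SHELL_COMMAND must fit on one line; it is placed in a YAML block scalar,
# so quotes inside it need no extra escaping.
render_job() {
  name=$1
  shell_command=$2
  cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: ${name}
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: funskies
        image: busybox
        command:
        - /bin/sh
        - -c
        - |
          ${shell_command}
EOF
}
```

Something like render_job "hello-world-$(date +%s)" "echo 'Hello World'" | kubectl create -f - would then create a new Job per run instead of trying to edit an existing Pod.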
I am not sure what the whole requirement is, but you can exec into the pod and update the details:
$ kubectl exec <pod-name> -it -n <namespace> -- <command to execute>
For example:
$ kubectl exec pod/hello-world-xxxx-xx -it -- /bin/bash
If the container does not provide bash, use "/bin/sh" instead to update the content or run the command.
Changes made inside the running pod are not retained in the manifest file, so in that case you have to run a new pod with the changes.

What are production uses for Kubernetes pods without an associated deployment?

I have seen the one-pod <-> one-container rule, which seems to apply to business logic pods, but has exceptions when it comes to shared network/volume related resources.
What are encountered production uses of deploying pods without a deployment configuration?
I use pods directly to start a CentOS (or other operating system) container in which to verify connections or test command line options.
As a specific example, below is a shell script that starts an ubuntu container. You can easily modify the manifest to test secret access or change the service account to test access control.
#!/bin/bash
RANDOMIZER=$(uuid | cut -b-5)
POD_NAME="bash-shell-$RANDOMIZER"
IMAGE=ubuntu
NAMESPACE=$(uuid)
kubectl create namespace $NAMESPACE
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $POD_NAME
  namespace: $NAMESPACE
spec:
  containers:
  - name: $POD_NAME
    image: $IMAGE
    command: ["/bin/bash"]
    args: ["-c", "while true; do date; sleep 5; done"]
  hostNetwork: true
  dnsPolicy: Default
  restartPolicy: Never
EOF
echo "---------------------------------"
echo "| Press ^C when pod is running. |"
echo "---------------------------------"
kubectl -n $NAMESPACE get pod $POD_NAME -w
echo
kubectl -n $NAMESPACE exec -it $POD_NAME -- /bin/bash
kubectl -n $NAMESPACE delete pod $POD_NAME
kubectl delete namespace $NAMESPACE
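The script above assumes a uuid command is installed, which is not the case on every machine. As a fallback, here is a small sketch that builds a random lowercase hex suffix from /dev/urandom using only od, tr and cut; random_suffix is a made-up name.

```shell
#!/bin/sh
# random_suffix LENGTH
# Prints LENGTH lowercase hex characters read from /dev/urandom.
random_suffix() {
  length=$1
  # Each byte yields two hex characters, so LENGTH bytes is always enough.
  od -An -N"$length" -tx1 /dev/urandom | tr -d ' \n' | cut -c1-"$length"
}
```

With it, RANDOMIZER=$(random_suffix 5) and NAMESPACE="debug-$(random_suffix 8)" could stand in for the two uuid calls; the output contains only the characters 0-9 and a-f, so it is safe in pod and namespace names.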
In our case, we use standalone pods for debugging purposes only.
Otherwise you want your configuration to be stateless and written in YAML files.
For instance, debugging the dns resolution: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
kubectl exec -i -t dnsutils -- nslookup kubernetes.default

Use other deployment IP in YAML deployment configuration

I'm doing a prototype where one service depends on the availability of another. Scenario:
Service A is assumed to be already available in a local network. It was either deployed by K8S or manually (or even a managed one provided by AWS etc.).
Service B depends on environment variable SERVICE_A_IP and won't start without it. It's treated as a black box and can't be modified.
I want to pass Service A's IP to Service B through the K8S YAML configuration file. The ideal syntax for this would be something like:
...
env:
- name: SERVICE_A_IP
  valueFrom:
    k8sDeployment:
      name: service_a
      key: deploymentIP
...
During the prototyping stage Service A is another K8S deployment, but it might not be so in a production environment. Thus I need to decouple from the SERVICE_A_SERVICE_IP variable that will be available to Service B (given it's deployed after Service A). I'd rather avoid DNS discovery as well, since it would require container modification, which is far from a perfect solution.
If I were to do it manually with kubectl (or with a shell script), it would look like the following:
$ kubectl run service_a --image=service_a:latest --port=8080
$ kubectl expose deployment service_a
$ SERVICE_A_IP="$(kubectl describe service service_a | \
    grep IP: | \
    cut -f2 -d ':' | \
    xargs)"
$ kubectl run service_b --image=service_b:latest --port=8080 \
    --env="SERVICE_A_IP=${SERVICE_A_IP}"
It works. Though I want to do the same using YAML configuration without injecting SERVICE_A_IP into configuration file with shell (basically modifying the file).
Is there any way to do so? Please take the above setting as set in stone.
UPDATE
Not the best way though still:
$ kubectl create -f service_a.yml
deployment "service_a" created
service "service_a" created
$ SERVICE_A_IP="$(kubectl describe service service_a | \
    grep IP: | \
    cut -f2 -d ':' | \
    xargs)"
$ kubectl create configmap service_a_meta \
    --from-literal="SERVICE_A_IP=${SERVICE_A_IP}"
And then in service_b.yml:
...
env:
- name: SERVICE_A_IP
  valueFrom:
    configMapKeyRef:
      name: service_a_meta
      key: SERVICE_A_IP
...
That will work, but it still involves some shell and generally feels way too hacky.
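If the shell step stays, the grep/cut/xargs chain can at least be made a little stricter. A sketch of a filter that reads kubectl describe service output and prints only the value of the line whose first field is exactly "IP:"; the function name is made up for illustration.

```shell
#!/bin/sh
# cluster_ip_from_describe
# Reads `kubectl describe service` output on stdin and prints the value of
# the first line whose first field is exactly "IP:".
cluster_ip_from_describe() {
  awk '$1 == "IP:" { print $2; exit }'
}
```

Usage would be SERVICE_A_IP=$(kubectl describe service service_a | cluster_ip_from_describe). Alternatively, kubectl get service service_a -o jsonpath='{.spec.clusterIP}' prints the field directly and avoids parsing human-oriented output altogether.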
You can attach handlers to container lifecycle events to update your environment variables on start.
Here is an example:
apiVersion: v1
kind: Pod
metadata:
  name: appB
spec:
  containers:
  - name: appB
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "export SERVICE_B_IP=$(host <SERVICE_B>.<SERVICE_B_NAMESPACE>.svc.cluster.local)"]
Kubernetes will run the postStart script inside the appB container each time a pod with that container starts.
But note this part of the hook's description:
PostStart
This hook executes immediately after a container is created. However, there is no guarantee that the hook will execute before the container ENTRYPOINT. No parameters are passed to the handler.
You need to add a sleep to your main app before the real start, to be sure that the hook has finished before the application starts.
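One caveat worth keeping in mind: a variable exported inside the hook's shell lives only in that shell, and the main process does not inherit it. An alternative that needs no image change is to override the container's command with a small wrapper that resolves the address first and then starts the application with the variable set. A sketch under those assumptions; both function names are hypothetical, and it relies on the host command printing lines of the form "NAME has address IP":

```shell
#!/bin/sh
# address_from_host_output
# Reads the output of `host NAME` on stdin and prints the first IPv4 address.
address_from_host_output() {
  awk '/ has address / { print $NF; exit }'
}

# run_with_env VAR VALUE CMD [ARGS...]
# Starts CMD with VAR=VALUE added to its environment.
run_with_env() {
  var=$1
  value=$2
  shift 2
  env "$var=$value" "$@"
}
```

A wrapper could then do ip=$(host <SERVICE_A>.<NAMESPACE>.svc.cluster.local | address_from_host_output) followed by run_with_env SERVICE_A_IP "$ip" and the original application command, so the variable is in place before the application starts and no sleep is needed.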