Continuous deployment of airflow dags - kubernetes

I have Airflow deployed in Kubernetes, and it uses the persistent volume method for DAG deployment. I am trying to write a script (using GitHub Actions for CI/CD) to deploy my Airflow DAGs, which looks something like this:
DAGS=(./*.py)
# "${DAGS[@]}" expands to every array element; quoting keeps odd file names intact
for dag in "${DAGS[@]}";
do
  kubectl cp "${dag}" --namespace="${NAMESPACE}" "${WEB_SERVER_POD_NAME}:/path/to/dags/folder"
done
I can successfully deploy new DAGs and even update them.
The problem is that I am unable to remove old DAGs (which I used for testing purposes) that are still present in Airflow's dags folder.
Is there a way I can do it?
P.S. I cannot use the command below, as it would delete any running DAGs:
kubectl exec --namespace=${NAMESPACE} ${WEB_SERVER_POD_NAME} -- bash -c "rm -rf /path/to/dags/folder/*"

I don't think this was an option when you originally posted, but for others:
GitHub Actions lets you create workflows that are manually triggered and accept input values: https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/
The command would be something like:
kubectl exec --namespace=${NAMESPACE} ${WEB_SERVER_POD_NAME} -- bash -c "rm /path/to/dags/folder/${USER_INPUT_DAG_FILE_NAME}"
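If the goal is to remove whatever is no longer in the repository, rather than naming files by hand, one possible sketch is to compare the remote folder listing against the local checkout and delete only the difference. The function names stale_dags and prune_remote_dags are hypothetical, and the sketch assumes the DAGs are flat *.py files in the current directory, as in the question's script.

```shell
# Hypothetical helper: print remote DAG files that have no local counterpart.
# $1 = local directory holding the *.py DAGs; stdin = remote file names, one per line.
stale_dags() {
  local local_dir=$1 name
  while IFS= read -r name; do
    case $name in
      # Only consider Python files; skip __pycache__ and anything else.
      *.py) [ -e "${local_dir}/${name}" ] || printf '%s\n' "$name" ;;
    esac
  done
}

# Assumed wiring: list the remote folder, then remove only the stale files.
# NAMESPACE and WEB_SERVER_POD_NAME come from the environment, as in the question.
prune_remote_dags() {
  local name
  kubectl exec --namespace="${NAMESPACE}" "${WEB_SERVER_POD_NAME}" -- \
    ls -1 /path/to/dags/folder | stale_dags . |
  while IFS= read -r name; do
    # </dev/null keeps kubectl from consuming the list being read by the loop.
    kubectl exec --namespace="${NAMESPACE}" "${WEB_SERVER_POD_NAME}" -- \
      rm -- "/path/to/dags/folder/${name}" </dev/null
  done
}
```

Calling prune_remote_dags after the copy loop would leave files that still exist in the repository untouched, so DAGs that are still deployed are not deleted.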

Related

kubectl cp from a completed pod to local computer

I would like to use kubectl cp to copy a file from a completed pod to my local computer. I used kubectl cp /:/ , however, it gave me this error: cannot exec into a container in a completed pod; current phase is Succeeded. Is there a way I can copy a file from a completed pod? It does not need to be kubectl cp. Any help appreciated!
Nope. If the pod is gone, it's gone for good. Only possibility would be if the data is stored in a PV or some other external resource. Pods are cattle, not pets.
You can find the files, because the containers of a pod in the state Completed are not deleted, they are just not running.
I am not aware of any way to do it via Kubernetes itself, but here is how to do it if your container runtime is Docker:
$ ssh <node where the pod is>
$ docker ps -a | grep <pod name>
$ docker cp <container id from the previous command>:/your/files ./
The files in containers are just overlayfs mounts; if the container still exists, the files still exist.
So if you are using the containerd runtime or something else, look under /var/lib/containerd (containerd) or /var/lib/containers (CRI-O). I don't know exactly where each runtime puts its overlayfs mounts, but they have to be on the node; running $ mount there should show you where.
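For the Docker case, docker cp expects a container ID or name rather than a pod name, so the grep step has to be narrowed down to one container. A minimal sketch follows, assuming the kubelet's Docker naming scheme k8s_<container>_<pod>_<namespace>_<uid>_<attempt>; the function name pod_container_ids is hypothetical.

```shell
# stdin: lines of "<id> <name>", as printed by
#   docker ps -a --format '{{.ID}} {{.Names}}'
# $1: pod name. Prints the IDs of that pod's containers, skipping the
# pause (sandbox) container, which is named k8s_POD_...
pod_container_ids() {
  awk -v pod="$1" '{
    n = split($2, part, "_")
    if (n >= 4 && part[1] == "k8s" && part[2] != "POD" && part[3] == pod) print $1
  }'
}
```

On the node, this could be used as docker ps -a --format '{{.ID}} {{.Names}}' | pod_container_ids <pod name>, and the printed ID passed to docker cp.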

Pipe file from container in k8s to local machine, edit and copy back to container in k8s?

I need to update a file in a container running in k8s using my local editor and save back the updates to the original file in the container without restarting/redeploying the container.
Right now I do:
$ kubectl exec tmp-shell -- cat /root/motd > motd && vi motd && kubectl cp motd tmp-shell:/root/motd
Is there some better way to do this?
I have looked at:
https://github.com/ksync/ksync
but seems heavyweight for something this "simple".
Notice:
I don't want to use the editor that might or might not be available inside the container - since an editor is not guaranteed to be available.
One option that might be available is ephemeral debug containers; however, they are an alpha feature, so they are probably not enabled for you at the time of writing. Barring that, yes, what you described is an option. It probably goes without saying, but this is a very bad idea: it might not work at all if the target file isn't writable (which it shouldn't be in most cases), either because of file permissions or because the container is running in immutable mode. Also, the change only matters if whatever uses the file detects it without reloading.
A better medium-term plan would be to store the content in a ConfigMap and mount it into place. That would let you update it whenever you want.
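If you do go with the cat/edit/cp round trip from the question, a slightly more careful sketch could use a temporary file and copy back only when the content changed. The function name edit_in_pod is hypothetical; the sketch assumes the file is readable and writable in the container and that the image ships tar, which kubectl cp relies on.

```shell
# Hypothetical helper: edit_in_pod <pod> <remote path>
edit_in_pod() {
  local pod=$1 remote=$2 tmp rc=0
  tmp=$(mktemp) || return 1
  # Fetch the current content; give up if the file cannot be read.
  kubectl exec "$pod" -- cat "$remote" > "$tmp" || { rm -f "$tmp"; return 1; }
  cp "$tmp" "$tmp.orig"
  # Open the local editor on the temporary copy.
  "${EDITOR:-vi}" "$tmp"
  # Copy back only when the content actually changed.
  if ! cmp -s "$tmp" "$tmp.orig"; then
    kubectl cp "$tmp" "${pod}:${remote}" || rc=$?
  fi
  rm -f "$tmp" "$tmp.orig"
  return "$rc"
}
```

For the example in the question this would be called as edit_in_pod tmp-shell /root/motd.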
You can use https://github.com/waverage/kubelink utility. It is like ksync but simpler and based on kubectl cp and kubectl exec commands.
Install with command: pip install kubelink and then use like:
kubelink create --name mypreset --source /Users/bob/myproject --destination /code --namespace default --selector "app=backend"
kubelink watch mypreset
Kubelink watches your local files and uploads them to Kubernetes pods when they change.

How can I run a Kubernetes pod with the sole purpose of running exec against it?

Please before you comment or answer, this question is about a CLI program, not a service. Apparently 90% of Kubernetes has to do with running services, so there is sparse documentation for CLI programs meant to be part of a pipeline workflow.
I have a command line program that uses stdout for JSON results.
I have a docker image for the command line program.
If I create the container as a Kubernetes Job, then stdout and stderr are mixed and require heuristic scrubbing to get pure JSON out.
The stderr messages are from native libraries outside of my direct control.
Supposedly, if I run kubectl exec against a running pod, I will get the normal stdout/stderr pipes.
Is there a way to just have the pod running without an entrypoint (or some dummy service entrypoint) with the sole purpose of running kubectl exec against it?
Is there a way to just have the pod running without an entrypoint [...]?
A pod consists of one or more containers, each of which has an individual entrypoint. It is certainly possible to run a container with a dummy command, for example, you can build an image with:
CMD sleep inf
This will run a container that will persist until you kill it, and you could happily docker exec into it.
You can apply the same solution to k8s. You could build an image as described above and deploy that in a pod, or you could use an existing image and simply set the command, as in:
spec:
  containers:
  - name: mycontainer
    image: myexistingimage
    command: ["sleep", "inf"]
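With a pod kept alive by a dummy command, the CLI can then be run through kubectl exec and the two streams redirected separately on the local side. The following is only a sketch: the wrapper name run_cli and the program name mycli are hypothetical, and it relies on not passing -t, since allocating a TTY would merge stderr into stdout.

```shell
# Hypothetical wrapper: run_cli <pod> <stderr log file> <command> [args...]
# Prints the command's stdout and stores its stderr in the given log file.
run_cli() {
  local pod=$1 err_log=$2
  shift 2
  # No -t here: a TTY would merge stderr into stdout again.
  kubectl exec "$pod" -- "$@" 2> "$err_log"
}
```

For example, run_cli mypod errors.log mycli --some-flag > result.json would be expected to leave only the JSON in result.json.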
You can use kubectl much like the docker CLI: https://kubernetes.io/docs/reference/kubectl/docker-cli-to-kubectl/
kubectl run just does the job. There is no need for a workaround.
Additionally, you can attach I/O and disable automatic restart:
kubectl run -i -t busybox --image=busybox --restart=Never

Command Line Option to Activate Airflow DAGs

We have a continuous integration pipeline which automatically deploys our Airflow DAGs to the Airflow server. When a new version of a DAG is deployed, its status is OFF by default. We would like to turn it ON as part of the tasks executed by the deployment process.
Is there a command line option in Airflow that allows turning a DAG ON?
Thank you
Ok it seems I did not look carefully enough. The answer is just here in Airflow Documentation
You can turn OFF a DAG with the following command:
$ airflow pause <dag_id>
You can turn ON a DAG with the following command:
$ airflow unpause <dag_id>
Update:
The command airflow unpause has been removed.
You should now use
$ airflow dags unpause <dag_id>
instead of
$ airflow unpause <dag_id>
When you say new version, I assume you change the DAG_ID. Have you considered setting the following in airflow.cfg?
dags_are_paused_at_creation = False
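As a deployment step, the unpause command can be wrapped in a small loop. This is a minimal sketch that assumes the newer airflow dags unpause syntax and that the deployment script already knows the DAG ids; the function name unpause_dags is hypothetical.

```shell
# Hypothetical CI helper: unpause every DAG id passed as an argument.
# Returns non-zero if any of the unpause calls failed.
unpause_dags() {
  local dag_id failed=0
  for dag_id in "$@"; do
    airflow dags unpause "$dag_id" || {
      echo "could not unpause ${dag_id}" >&2
      failed=1
    }
  done
  return "$failed"
}
```

It would be called as unpause_dags my_first_dag my_second_dag, where both ids are placeholders.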

Create a deployment from a pod in kubernetes

For a use case I need to create deployments from a pod when a script is being executed from inside the pod.
I am using Google Container Engine for my cluster.
How do I configure the container inside the pod to be able to run commands like kubectl create -f deployment.yaml?
P.S. I am a bit clueless about it at the moment.
Your container is going to need to have kubectl available. There are some container images available, but I can't vouch for any of them.
I'd probably build my own and download the latest kubectl. A Dockerfile like this is probably a good starting point:
FROM alpine:latest
RUN apk --no-cache add curl
RUN curl https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl -o /usr/local/bin/kubectl
RUN chmod +x /usr/local/bin/kubectl
This will build you a container image with kubectl, so you can then run all the kubectl commands you want.
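Inside the pod, kubectl picks up the mounted service account credentials on its own, so the script mostly needs the right permissions. A rough sketch of what the script could run is below; the function name create_deployment is hypothetical, and it assumes the pod's service account has been granted permission to create deployments.

```shell
# Hypothetical helper: create_deployment <manifest file>
create_deployment() {
  local manifest=$1
  # Ask the API server whether the current identity may create deployments.
  if ! kubectl auth can-i create deployments --quiet; then
    echo "this service account is not allowed to create deployments" >&2
    return 1
  fi
  kubectl create -f "$manifest"
}
```

Checking with kubectl auth can-i first gives a clearer error than a failed create when the service account lacks the needed role.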