We have a continuous integration pipeline which automatically deploys our Airflow DAGs to the Airflow server. When a new version of a DAG is deployed, its status is OFF by default. We would like to turn it ON as part of the tasks executed by the deployment process.
Is there a command line option in Airflow that allows you to turn ON a DAG?
Thank you
OK, it seems I did not look carefully enough. The answer is right here in the Airflow documentation.
You can turn OFF a DAG with the following command:
$ airflow pause <dag_id>
You can turn ON a DAG with the following command:
$ airflow unpause <dag_id>
Update:
The command airflow unpause has been removed.
You should now use
$ airflow dags unpause <dag_id>
instead of
$ airflow unpause <dag_id>
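In a CI deployment step, that might look like this (a minimal sketch; the loop and the DAG IDs are illustrative, not part of the original answer):

# Unpause each DAG the pipeline just deployed (IDs are hypothetical).
for dag_id in sales_etl reporting_daily; do
    airflow dags unpause "${dag_id}"
done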
When you say new version, I am assuming you change the DAG_ID. Have you considered updating airflow.cfg to set
dags_are_paused_at_creation = False?
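For reference, that setting lives in the [core] section of airflow.cfg (a minimal sketch):

[core]
# New DAGs start in the ON state instead of OFF.
dags_are_paused_at_creation = False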
We are using the Docker Compose TeamCity build runner and would like the containers to continue running after the build.
I understand that the docker-compose build step itself follows a successful 'up' with a 'down', so I have attempted to bring them back up in a subsequent command line step with simply:
docker-compose up -d
I can see from the log that this is initially successful, but when the build process exits, so do the containers. I have also tried:
nohup docker-compose up -d &
The outcome is the same.
How do we keep the containers running when the build has finished?
For info, the environment is both TeamCity and its BuildAgent running on the same Ubuntu box.
I have achieved this by NOT using the Docker Compose build runner. I now just have a single command line build step doing:
docker-compose down
docker-compose up -d
This works, and I feel rather silly ;-)
I have Airflow deployed in Kubernetes, using the persistent volume method for DAG deployment. I am trying to write a script (using GitHub Actions for CI/CD) to deploy my Airflow DAGs, which looks somewhat like this:
# Copy every DAG file in the repo into the pod's dags folder.
DAGS=(./*.py)
for dag in "${DAGS[@]}";
do
    kubectl cp "${dag}" --namespace="${NAMESPACE}" "${WEB_SERVER_POD_NAME}":/path/to/dags/folder
done
I can successfully deploy new dags and even update them.
But the problem is that I am unable to remove old DAGs (which I used for testing purposes) from the Airflow dags folder.
Is there a way I can do it?
P.S. I cannot use the command below, as it would delete the running DAGs too:
kubectl exec --namespace=${NAMESPACE} ${WEB_SERVER_POD_NAME} -- bash -c "rm -rf /path/to/dags/folder/*"
I don't think this was an option when you originally posted, but for others:
GitHub Actions lets you create workflows that are manually triggered and accept input values: https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/
The command would be something like:
kubectl exec --namespace=${NAMESPACE} ${WEB_SERVER_POD_NAME} -- bash -c "rm /path/to/dags/folder/${USER_INPUT_DAG_FILE_NAME}"
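A minimal workflow sketch (a hypothetical illustration: the secret names are assumptions, and kubectl is assumed to be already configured on the runner):

name: remove-dag
on:
  workflow_dispatch:
    inputs:
      dag_file_name:
        description: "DAG file to delete from the pod"
        required: true
jobs:
  remove:
    runs-on: ubuntu-latest
    steps:
      - name: Delete the DAG file from the webserver pod
        env:
          NAMESPACE: ${{ secrets.NAMESPACE }}
          WEB_SERVER_POD_NAME: ${{ secrets.WEB_SERVER_POD_NAME }}
        run: |
          kubectl exec --namespace="${NAMESPACE}" "${WEB_SERVER_POD_NAME}" -- \
            bash -c "rm /path/to/dags/folder/${{ github.event.inputs.dag_file_name }}"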
I am having difficulty running Airflow commands on an Airflow installation on Kubernetes that I installed from the Helm stable/airflow repo. For instance, I try to exec into the scheduler pod and run airflow list, and I get the following error:
airflow.exceptions.AirflowConfigException: error: cannot use sqlite with the KubernetesExecutor
OK, so I switch to the Celery executor. Same thing:
airflow.exceptions.AirflowConfigException: error: cannot use sqlite with the CeleryExecutor
So what is the correct way to run airflow CLI commands when running on K8s?
Make sure you are using bash. /home/airflow/.bashrc imports the environment variables from /home/airflow/airflow_env.sh to set up the connection. Here are some examples:
kubectl exec -ti airflow-scheduler-nnn-nnn -- /bin/bash
$ airflow list_dags
Or, with sh, you can source the env vars yourself:
kubectl exec -ti airflow-scheduler-nnn-nnn -- sh -c ". /home/airflow/airflow_env.sh && airflow list_dags"
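Note that on Airflow 2.x the subcommand changed, so the equivalent call would be (assuming the same image layout):

kubectl exec -ti airflow-scheduler-nnn-nnn -- sh -c ". /home/airflow/airflow_env.sh && airflow dags list"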
The Airflow webserver still runs in the foreground with the command airflow webserver -D. Am I missing something here? airflow scheduler -D works properly, though.
Remove airflow-webserver.err and airflow-webserver-monitor.pid from the AIRFLOW_HOME directory.
That helped me.
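In shell terms (a minimal sketch, assuming the default AIRFLOW_HOME of ~/airflow):

# Stale daemon artifacts keep the webserver from daemonizing; remove them.
rm "${AIRFLOW_HOME:-$HOME/airflow}/airflow-webserver.err" \
   "${AIRFLOW_HOME:-$HOME/airflow}/airflow-webserver-monitor.pid"
airflow webserver -D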
Please, before you comment or answer: this question is about a CLI program, not a service. Apparently 90% of Kubernetes is about running services, so documentation is sparse for CLI programs meant to be part of a pipeline workflow.
I have a command line program that uses stdout for JSON results.
I have a docker image for the command line program.
If I create the container as a Kubernetes Job, then stdout and stderr are mixed and require heuristic scrubbing to get pure JSON out.
The stderr messages are from native libraries outside of my direct control.
Supposedly, if I run kubectl exec against a running pod, I will get the normal stdout/stderr pipes.
Is there a way to just have the pod running without an entrypoint (or some dummy service entrypoint) with the sole purpose of running kubectl exec against it?
Is there a way to just have the pod running without an entrypoint [...]?
A pod consists of one or more containers, each of which has an individual entrypoint. It is certainly possible to run a container with a dummy command; for example, you can build an image with:
CMD sleep inf
This will run a container that will persist until you kill it, and you could happily docker exec into it.
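A complete minimal Dockerfile along those lines might be (a sketch; the base image is an assumption, and it is GNU sleep that accepts inf):

FROM ubuntu:22.04
# GNU sleep accepts "inf", so the container idles until it is killed.
CMD sleep inf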
You can apply the same solution to k8s. You could build an image as described above and deploy that in a pod, or you could use an existing image and simply set the command, as in:
spec:
  containers:
  - name: mycontainer
    image: myexistingimage
    command: ["sleep", "inf"]
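With a pod like that running, the CLI program can then be invoked through kubectl exec, which keeps the remote stdout and stderr as separate local streams (pod and program names here are placeholders):

# stdout (pure JSON) goes to the file; stderr stays on the terminal.
kubectl exec mypod -- mycli --input data.json > results.json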
You can use kubectl like the Docker CLI: https://kubernetes.io/docs/reference/kubectl/docker-cli-to-kubectl/
kubectl run just does the job. There is no need for a workaround.
Additionally, you can attach I/O and disable automatic restart:
kubectl run -i -t busybox --image=busybox --restart=Never