I am new to Airflow. I have created a DAG that triggers a shell script. I can run it and see the output from the CLI, but when I run it from the UI it fails, and I am not able to see any logs.
I'm executing some experiments on a Kubeflow cluster and I was wondering if there is a faster way than using the Kubeflow UI to set up the run input parameters.
I would like to connect to the Kubeflow cluster from the command line and start runs from there, but I cannot find any documentation.
Thanks
Kubeflow Pipelines has a command-line tool called kfp; for example, you can use kfp run submit to start a run.
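If you would rather script this in Python than use the CLI, the kfp SDK can submit runs as well. This is only a sketch: the host URL, pipeline file name, and parameter names below are placeholders you would replace with your own.

```python
# Sketch using the Kubeflow Pipelines Python SDK (kfp).
# The host URL, pipeline package, run name, and parameter names
# are illustrative placeholders, not values from this cluster.
import kfp

client = kfp.Client(host="http://<your-kfp-endpoint>")  # endpoint of your KFP API server
client.create_run_from_pipeline_package(
    "pipeline.yaml",                      # compiled pipeline package
    arguments={"learning_rate": "0.01"},  # run input parameters
    run_name="cli-submitted-run",
)
```

Because the arguments are a plain dict, you can generate them programmatically, which is usually the point of moving off the UI for repeated experiments.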
I have installed Kubeflow via MiniKF on my laptop.
I am trying to run a CNN on some MNIST data that uses TensorFlow 1.4.0. Currently, I am following this tutorial: https://codelabs.arrikto.com/codelabs/minikf-kale-katib-kfserving/index.html#0
The code is in a Jupyter Notebook server, and it runs completely fine. When I built a pipeline, it completed successfully. But at the "Serving" step, when I run the kfserver command on my model, I get strange behavior: it gets stuck at "Waiting for InferenceService."
An example screenshot from a successful run is shown below, where the process must end with the service being created:
Being stuck at "Waiting for InferenceService" means that the pipeline doesn't get the hostname of the servable. You can validate this by running kubectl get inferenceservices; your new inference service should have its READY state set to True. If it doesn't, the deployment has failed.
There can be many different reasons why the inference service is not in a ready state; there is a good troubleshooting guide in KFServing's repo.
I've been working with Airflow for a while now, on a setup created by a colleague. Lately I have run into several errors, which require me to understand in more depth how to fix certain things within Airflow.
I do understand what the 3 processes are, I just don't understand the underlying things that happen when I run them. What exactly happens when I run one of the commands? Can I somewhere see afterwards that they are running? And if I run one of these commands, does this overwrite older webservers/schedulers/workers or add a new one?
Moreover, if I for example run airflow webserver, the screen shows some of the things that are happening. Can I simply get out of this by pressing CTRL + C? Because when I do this, it says things like Worker exiting and Shutting down: Master. Does this mean I'm shutting everything down? How else should I get out of the webserver screen then?
Each process does what it is built to do while it is running (the webserver provides a UI, the scheduler determines when things need to be run, and the workers actually run the tasks).
I think your confusion is that you may be seeing them as commands that tell some sort of "Airflow service" to do something, but each is a standalone command that starts a process. That is, starting from nothing, you run airflow scheduler: now you have a scheduler running. Run airflow webserver: now you have a webserver running. airflow webserver starts a Python Flask app; while that process is running, the webserver is running, and if you kill the command, it goes down.
All three have to be running for Airflow as a whole to work (assuming you are using an executor that needs workers). You should only ever have one scheduler running, but if you were to run two airflow webserver processes (ignoring port conflicts), you would have two separate HTTP servers running against the same metadata database. Workers are a little different in that you may want multiple worker processes running so you can execute more tasks concurrently. So if you start multiple airflow worker processes, you'll end up with multiple processes taking jobs from the queue, executing them, and updating the task instance with the status of the task.
When you run any of these commands you'll see the stdout and stderr output in console. If you are running them as a daemon or background process, you can check what processes are running on the server.
If you press CTRL + C you are sending a signal to kill the process. Ideally, for a production Airflow cluster, you should have a supervisor monitoring the processes and ensuring that they are always running. Locally, you can either run the commands in the foreground of separate shells and keep them running while you need them, or run them as background daemons with the -D argument, e.g. airflow webserver -D.
I'm using Airflow 1.9 and it was working fine for over 2 months but somehow now I am not able to start airflow webserver on Gunicorn.
nohup airflow webserver $* > webserver_new.logs &
just starts the web server process but log does not contain any mention of Gunicorn. The UI is not accessible. I have checked that the environment variable $AIRFLOW_HOME points to the correct path.
Also when the web server is being started it doesn't create a webserver-pid file in $AIRFLOW_HOME.
When I uninstall Gunicorn and start the Airflow web server I do not get any error but without Gunicorn the UI is not accessible. Basically it behaves the same whether gunicorn is present or not.
Environment
I use a Python 2.7 virtualenv on a CentOS box. A few other developers updated some Python packages such as pyhive, thrift, and six. I have uninstalled all of those, and uninstalled and reinstalled Airflow using pip.
Log contents
The webserver logs contain no mention of Gunicorn, and they contain no other errors when started from the command line. The DAGs are running, but the UI is still down.
[2018-02-21 14:13:36,082] {default_celery.py:41} WARNING - Celery Executor will run without SSL
Additional observation
After starting Gunicorn manually, I found that the workers time out as soon as they are created.
I found out that the problem was a DAG which had a for loop to generate dynamic tasks (all tasks were dynamic), but the task ids were the same for each iteration. I removed that DAG and the webserver came back like a charm.
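The failure mode can be sketched without Airflow installed: a generation loop whose id does not depend on the loop variable produces colliding task ids, while deriving the id from the loop variable keeps them unique. The table names and id prefix below are purely illustrative.

```python
# Illustrative sketch: duplicate vs. unique task ids in a generation loop.
tables = ["orders", "users", "events"]

# Buggy pattern: the id does not depend on the loop variable,
# so every iteration produces the same task id.
bad_ids = ["load_table" for _ in tables]
print(len(set(bad_ids)))   # 1 -> duplicate task ids; Airflow rejects such a DAG

# Fix: derive the id from the loop variable so each task is unique.
good_ids = [f"load_{t}" for t in tables]
print(len(set(good_ids)))  # 3 -> one distinct id per task
```

In a real DAG, the same rule applies to the task_id argument of each operator created inside the loop.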
I am scheduling the DAG and it shows in the running state, but the tasks are not getting triggered. The Airflow scheduler and webserver are up and running, and I toggled the DAG to ON in the UI. I still can't fix the issue. I am using the CeleryExecutor and tried changing to the SequentialExecutor, but no luck.
If you are using the CeleryExecutor you have to start the airflow workers too.
cmd: airflow worker
You need the following commands:
airflow worker
airflow scheduler
airflow webserver
If it still doesn't work, you have probably set start_date: datetime.today(). A start_date that moves forward every time the DAG file is parsed means the first schedule interval never completes, so the scheduler never triggers anything; use a static start_date in the past instead.
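The moving-start_date problem can be sketched without Airflow installed. Here schedule_interval stands in for the DAG's schedule, and the dates are illustrative: the scheduler only triggers a run once start_date plus one interval is fully in the past.

```python
from datetime import datetime, timedelta

# Illustrative sketch (no Airflow needed): a run is triggered only after
# start_date + schedule_interval has fully passed.
schedule_interval = timedelta(days=1)

# Moving start_date: re-evaluated to "now" every time the DAG file is parsed,
# so the end of the first interval is always still in the future.
moving_start = datetime.today()
assert moving_start + schedule_interval > datetime.today()  # never triggered

# Static start_date in the past: the first interval has already closed,
# so the scheduler can trigger it.
static_start = datetime(2018, 1, 1)
assert static_start + schedule_interval < datetime.today()  # triggered
```

This is why the common advice is to pin start_date to a fixed date rather than compute it dynamically.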