Airflow 1.9 - Tasks stuck in queue - postgresql

Latest Apache-Airflow install from PyPI (1.9.0)
Set up includes:
Apache-Airflow
Apache-Airflow[celery]
RabbitMQ 3.7.5
Celery 4.1.1
Postgres
I have the installation across 3 hosts.
Host #1
Airflow Webserver
Airflow Scheduler
RabbitMQ Server
Postgres Server
Host #2
Airflow Worker
Host #3
Airflow Worker
I have a simple DAG that executes a BashOperator task every 1 minute. I can see the scheduler "queue" the job; however, it never gets added to a Celery/RabbitMQ queue or picked up by the workers. I have a custom RabbitMQ user, and authentication seems fine. Flower, however, doesn't show any of the queues populating with data. It does see the two worker machines listening on their respective queues.
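To double-check the broker side, the queues and connections can be listed directly on the RabbitMQ host (the vhost name below is a placeholder for whatever vhost the custom user was granted):

rabbitmqctl list_queues -p my_vhost name messages consumers
rabbitmqctl list_connections user peer_host state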
Things I've checked:
Airflow Pool configuration
Airflow environment variables
Upgrade/Downgrade Celery and RabbitMQ
Postgres permissions
RabbitMQ Permissions
DEBUG level airflow logs
I read the documentation section about jobs not running. My "start_date" is a static date in the past.
OS: CentOS 7

I was able to figure it out, but I'm not sure why this is the answer.
Changing the "broker_url" setting to use "pyamqp" instead of "amqp" was the fix.
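Concretely, the change is just the URL scheme on the broker_url entry in the [celery] section of airflow.cfg; the credentials, host and vhost below are placeholders, not the real values:

[celery]
# pyamqp:// pins Celery/Kombu to the pure-Python py-amqp transport,
# whereas amqp:// lets Kombu choose a transport (librabbitmq if it is installed)
broker_url = pyamqp://airflow_user:airflow_pass@host1:5672/airflow_vhost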

Related

Airflow scheduler fails to start tasks

My problem:
Airflow scheduler is not assigning tasks.
Background:
I have Airflow running successfully on my local machine with the SQLite DB. The sample DAGs as well as my custom DAGs ran without any issues.
When I try to migrate from the SQLite database to Postgres (using this guide), the scheduler no longer seems to be assigning tasks. The DAGs get stuck in the "running" state, but no task in any DAG ever gets assigned a state.
Troubleshooting steps I've taken
The web server and the scheduler are running
The DAG is set to "ON".
After running airflow initdb, the public schema is populated with all of the airflow tables.
The user in my connection string owns the database as well as every table in the public schema.
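For reference, the connection string in question is the sql_alchemy_conn entry in the [core] section of airflow.cfg; the user, password and database name below are placeholders, not my actual values:

[core]
# SQLAlchemy connection string for the Postgres metadata database
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db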
Scheduler Log
The scheduler log keeps emitting this WARNING, but I have not been able to use it to find any useful information aside from this other post with no responses.
[2020-04-08 09:39:17,907] {dag_processing.py:556} INFO - Launched DagFileProcessorManager with pid: 44144
[2020-04-08 09:39:17,916] {settings.py:54} INFO - Configured default timezone <Timezone [UTC]>
[2020-04-08 09:39:17,927] {settings.py:253} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=44144
[2020-04-08 09:39:19,914] {dag_processing.py:663} WARNING - DagFileProcessorManager (PID=44144) exited with exit code -11 - re-launching
Environment
PostgreSQL version 12.1
Airflow v1.10.9
This is all running on a MacBook Pro (Catalina) in a conda virtual environment.
Postgres was installed using postgresapp. Updating postgresapp to version 2.3.3e solved the issue; PostgreSQL itself is still version 12.1.

Airflow distributed model services

Switching from the LocalExecutor to the CeleryExecutor.
In this model, I have
Masternode1 - airflow webserver, airflow scheduler, rabbitmq
Masternode2 - airflow webserver, rabbitmq
Workernode1 - airflow worker
Workernode2 - airflow worker
Workernode3 - airflow worker
Question:
Where does the Flower service run for Celery? Is it required to run it on all nodes, or just on any one of the nodes (since it's only a UI)?
Are there any other components needed to manage a production workload?
Is using Kafka as the broker a realistic, available option?
Thank you
Celery Flower is yet another (optional) service that you may want to run independently, either on a dedicated machine or sharing one machine among a few Airflow services.
You may, for example, run the webserver and Flower on one machine, and the scheduler and a few Airflow workers each on a dedicated machine.
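In Airflow 1.x, Flower has its own CLI entry point, so starting it on whichever node you choose looks roughly like this (5555 is Flower's default port, shown only for illustration):

airflow flower -p 5555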
Kafka as a broker for Celery is something people talk about quite a lot, but as far as I know there is no concrete work in Celery for it. However, considering there is interest in having Kafka support in Kombu, I would assume that the moment Kombu gets Kafka support, Celery will soon follow, as Kombu is the core Celery dependency.

Airflow web server starts without Gunicorn and is not accessible

I'm using Airflow 1.9 and it was working fine for over 2 months, but now I am not able to start the Airflow webserver on Gunicorn.
nohup airflow webserver $* > webserver_new.logs &
just starts the webserver process, but the log does not contain any mention of Gunicorn. The UI is not accessible. I have checked that the environment variable $AIRFLOW_HOME points to the correct path.
Also, when the webserver is started, it doesn't create a webserver pid file in $AIRFLOW_HOME.
When I uninstall Gunicorn and start the Airflow webserver, I do not get any error, but without Gunicorn the UI is not accessible. Basically, it behaves the same whether Gunicorn is present or not.
Environment
I use a Python 2.7 virtualenv on a CentOS box. A few other developers updated some Python packages like pyhive, thrift and six. I have uninstalled all of those, and uninstalled and reinstalled Airflow using pip.
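As a quick sanity check after all that package churn, something like the following confirms which gunicorn the virtualenv resolves (the venv path is illustrative):

source /path/to/venv/bin/activate
which gunicorn     # should resolve to a path inside the virtualenv, not a system path
pip show gunicorn  # confirms gunicorn is installed in this environment and shows its version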
Log contents
The webserver logs do not contain any mention of Gunicorn and do not contain any other error when it is started from the command line. The DAGs are running, but the UI is still down.
[2018-02-21 14:13:36,082] {default_celery.py:41} WARNING - Celery Executor will run without SSL
Additional observation
After a manual start of Gunicorn, I found that its workers time out as soon as they are created.
I found out that the problem was a DAG which had a for loop to generate dynamic tasks (all tasks were dynamic), but the task ids were the same for each iteration. I removed that DAG and the webserver came back like a charm.
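To illustrate the pattern (a minimal sketch, not the actual DAG; names and schedule are made up): every iteration of the loop below would produce a task with the same task_id if the suffix were left off, which is the duplication that broke the webserver, and suffixing the id makes each generated task unique.

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG('dynamic_tasks_example', start_date=datetime(2018, 1, 1), schedule_interval='@daily')

for i in range(5):
    # Broken version reused the same id every iteration, e.g. task_id='dynamic_task'
    # Fixed: suffix the task_id so each generated task is unique
    BashOperator(task_id='dynamic_task_{}'.format(i),
                 bash_command='echo {}'.format(i),
                 dag=dag)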

Airflow Tasks are not getting triggered

I am scheduling the DAG and it shows in the running state, but the tasks are not getting triggered. The Airflow scheduler and webserver are up and running, and I toggled the DAG to ON in the UI. Still, I can't fix the issue. I am using the CeleryExecutor and tried changing to the SequentialExecutor, but no luck.
If you are using the CeleryExecutor, you have to start the Airflow workers too.
cmd: airflow worker
You need the following commands:
airflow worker
airflow scheduler
airflow webserver
If it still doesn't work, you have probably set start_date: datetime.today().
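The difference in a minimal sketch (the DAG name is made up): a start_date of datetime.today() moves forward on every parse, so no schedule interval ever completes, whereas a fixed date in the past lets the scheduler trigger runs.

from datetime import datetime
from airflow import DAG

# Problematic: start_date moves forward every time the file is parsed,
# so no schedule interval ever completes
# dag = DAG('my_dag', start_date=datetime.today(), schedule_interval='@daily')

# Better: a static start_date in the past
dag = DAG('my_dag', start_date=datetime(2019, 1, 1), schedule_interval='@daily')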

Mesos cluster does not recover when physical hosts restart

I'm using Mesosphere on 3 hosts running Ubuntu 14.04, as follows:
one with a Mesos master
two with Mesos slaves
Everything works fine, but after restarting all the physical hosts, all scheduled jobs were lost. Is this normal? I expected ZooKeeper to store the current jobs, so that whenever the system needed to be restarted, all jobs would be rescheduled after the master boots.
Update:
I'm using Marathon and Mesos on the same node, and I run Marathon with the --zk flag.
With Marathon's --zk and --ha enabled, Marathon should be storing its state in ZK and recovering it on restart, as long as Mesos allows it to reregister with the same framework ID.
However, you'll also need to enable the Mesos registry (even for a single master), to ensure that Mesos persists information about what frameworkIds are registered in the event of master failover. This can be accomplished by setting the --registry=replicated_log (default), --quorum=1 (since you only have 1 master), and --work_dir=/path/to/registry (where to store the state).
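Put together, the master and Marathon invocations would look roughly like this (the ZooKeeper address and paths are placeholders):

mesos-master --registry=replicated_log --quorum=1 --work_dir=/var/lib/mesos --zk=zk://zk1:2181/mesos
marathon --master zk://zk1:2181/mesos --zk zk://zk1:2181/marathon --ha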
I solved the problem by following these installation instructions: How To Configure a Production-Ready Mesosphere Cluster on Ubuntu 14.04
Although you found a solution, I'd like to explain this issue a bit more. :)
From the official docs: http://mesos.apache.org/documentation/latest/slave-recovery/
Note that if the operating system on the slave is rebooted, all
executors and tasks running on the host are killed and are not
automatically restarted when the host comes back up.
So all frameworks on Mesos will be killed after a reboot. One way to restart the frameworks is to run them all on Marathon, which will manage the other frameworks and restart them as needed.
However, you then need to auto-restart Marathon when it's killed. In the DigitalOcean link you mentioned, Marathon is installed with an init.d script, so it can be restarted after a reboot. Otherwise, if you installed Marathon from source, you can use tools like supervisord to monitor and restart Marathon.
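A minimal supervisord program entry for that could look like the following; the command path reflects a source build and is an assumption, so point it at however Marathon is actually started on your host:

[program:marathon]
; command is an assumption for a source build; adjust to your install
command=/opt/marathon/bin/start --master zk://zk1:2181/mesos --zk zk://zk1:2181/marathon
autostart=true
autorestart=true
stdout_logfile=/var/log/marathon.out.log
stderr_logfile=/var/log/marathon.err.log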