Can't make Airflow use PostgreSQL (RDS) instead of SQLite

I installed Airflow on 3 EC2 nodes: webserver, scheduler and worker. I set the same config in /airflow/airflow.cfg on all 3 nodes; the DB configuration is sql_alchemy_conn = postgresql+psycopg2://airflow:password@rdsdatabaseaddreess.com/airflow.
After that I restarted the Airflow services and executed airflow initdb:
[ec2-user@ip-10-0-0-143 airflow]$ /usr/local/bin/airflow initdb
DB: sqlite:////airflow/airflow.db
[2019-11-21 01:39:30,325] {db.py:368} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
WARNI [airflow.utils.log.logging_mixin.LoggingMixin] empty cryptography key - values will not be stored encrypted.
Done.
However, Airflow is still using SQLite: DB: sqlite:////airflow/airflow.db
Please advise.
With best regards.

Yeah, I found a solution: the setting sql_alchemy_conn = postgresql+psycopg2://airflow:██████████@rdsdatabaseaddreess.com/airflow must be in the [core] section. That is what the official docs say; however, the example config places it in the [database] section.
So, finally, my settings look like this:
[core]
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:████████@rdsdatabaseaddreess.com/airflow
[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:████████@rdsdatabaseaddreess.com/airflow
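To double-check that the new value is actually picked up, you can re-run the init command and look at the DB: line it prints (same 1.10-era CLI as in the question; host and path are the placeholders from above):
# after editing /airflow/airflow.cfg on all three nodes and restarting the services
/usr/local/bin/airflow initdb
# the first line of output should now start with
# DB: postgresql+psycopg2://airflow:...@rdsdatabaseaddreess.com/airflow
# instead of DB: sqlite:////airflow/airflow.db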

Related

Airflow parallelism failure while changing the DB to Postgres

I have installed Airflow locally and I am changing the executor to run parallel tasks.
For that, I changed:
1- the database to Postgres 13.3
2- in the config file:
sql_alchemy_conn = postgresql+psycopg2://postgres:postgres@localhost/postgres
3- executor = LocalExecutor
I have checked the DB and there are no errors, using:
airflow db check --> INFO - Connection successful.
airflow db init --> Initialization done
Errors that I receive, even though I don't use SQLite at all:
1- {dag_processing.py:515} WARNING - Because we cannot use more than 1 thread (parsing_processes = 2 ) when using SQLite. So we set parallelism to 1.
2- I receive this error from the Airflow web interface:
The scheduler does not appear to be running.
The DAGs list may not update, and new tasks will not be scheduled.
So shall I make any other change?
Did you actually restart your Airflow webserver/scheduler after you changed the config?
The following logging statement:
{dag_processing.py:515} WARNING - Because we cannot use more than 1 thread (parsing_processes = 2 ) when using SQLite. So we set parallelism to 1.
It comes from Airflow 2.0.1, from the following code fragment:
if 'sqlite' in conf.get('core', 'sql_alchemy_conn') and self._parallelism > 1:
    self.log.warning(
        "Because we cannot use more than 1 thread (parsing_processes = "
        "%d ) when using sqlite. So we set parallelism to 1.",
        self._parallelism,
    )
    self._parallelism = 1
This means that, based on your [core] sql_alchemy_conn setting, it is somehow still on 'sqlite'. If you are certain you changed airflow.cfg and restarted all Airflow services, it might be picking up a different copy of airflow.cfg than you expect. Please inspect the logs to verify it is using the correct one.
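A quick way to see which config the running environment actually resolves (a sketch, assuming Airflow 2.0+ and that you run it as the same user and environment as the scheduler):
# shows AIRFLOW_HOME, the path of the airflow.cfg that was loaded, and other paths
airflow info
# prints the single value the scheduler will use
airflow config get-value core sql_alchemy_conn
# environment variables silently override airflow.cfg, so check for those too
env | grep -E 'AIRFLOW__CORE__SQL_ALCHEMY_CONN|AIRFLOW_HOME|AIRFLOW_CONFIG'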

Upgrade from Sequential executor to Celery executor in Apache Airflow

I have Apache Airflow running on an EC2 instance (Ubuntu). Everything is running fine.
The DB is SQLite and the executor is the Sequential Executor (provided as default). But now I would like to run some DAGs which need to run at the same time, every hour and every 2 minutes.
My question is: how can I upgrade my current setup to the Celery executor and a Postgres DB to get the advantage of parallel execution?
Will it work if I install and set up Postgres, RabbitMQ and Celery, and make the necessary changes in the airflow.cfg configuration file?
Or do I need to re-install everything from scratch (including airflow)?
Please guide me on this.
Thanks
You can, indeed, install Postgres/RabbitMQ/Celery, then update your configuration file (airflow.cfg), initialise the database, and restart the Airflow services.
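As a rough sketch of the airflow.cfg changes involved (option names as in Airflow 1.10/2.x; hosts and credentials below are placeholders, not a recommendation):
[core]
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
[celery]
broker_url = amqp://guest:guest@localhost:5672//
result_backend = db+postgresql://airflow:airflow@localhost:5432/airflow
After that, initialise the metadata database (airflow initdb, or airflow db init on 2.x) and start the webserver, the scheduler and at least one Celery worker.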
However, there is a side note: if required, you'd also have to migrate data from SQLite to Postgres. Most importantly, the database contains your connections and variables. It's possible to export variables beforehand and import them again using the Airflow CLI (see this answer, and the Airflow documentation).
It's also possible to import your connections using the CLI, as described in this Airflow guide (or the documentation).
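With a recent Airflow 2.x CLI that is roughly the following (file names are just examples; older versions use slightly different commands):
# on the old (SQLite-backed) installation
airflow variables export /tmp/variables.json
airflow connections export /tmp/connections.json
# after pointing sql_alchemy_conn at Postgres and initialising it
airflow variables import /tmp/variables.json
airflow connections import /tmp/connections.json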
If you just switched to the new database set up and you see something's missing, you can still easily switch back to the SQLite setup by reverting the changes to airflow.cfg.

Errors when configuring Apache Airflow to use a postgres database

I have been introducing myself to Apache Airflow. So far everything has gone well, but I have been using the default SQLite database and I now need to change to a PostgreSQL database. I have changed the executor to LocalExecutor and I have set the sql_alchemy_conn string to postgresql+psycopg2://airflow:airflow@postgres:5432/airflow, which is the address of the airflow database I created in Postgres.
Now when I run airflow initdb I receive the error
airflow.exceptions.AirflowConfigException: error: cannot use sqlite with the LocalExecutor
I am using PostgreSQL 9.4.24.
Does anyone know why this is occurring?
I resolved the issue: I was using the wrong Postgres user for the location of the database. It should have been postgresql+psycopg2://user:user@localhost:5432/airflow.
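If you hit a similar error, it can help to verify the credentials outside Airflow first, for example (user, password and database mirror the connection string above):
# should print the connection details if user, password, host and database are correct
psql "postgresql://user:user@localhost:5432/airflow" -c '\conninfo'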

Flask-Migrate script not applying changes to Postgres database

I have made some recent changes to my models in my Flask project. I tried to apply these changes to my Postgres DB, but the script doesn't seem to have any effect. When I run the upgrade it says
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> ba60ca569e9f, empty message
but nothing changes in the DB. I dropped the database and recreated it and still nothing happened. What is going wrong?
Context impl SQLiteImpl. is a strong hint. My DB URI is determined by SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URI') or 'sqlite:///'. When I ran my project in my Docker-compose environment it worked because the DATABASE_URI was getting set correctly in a Dockerfile. When I ran it on my local environment it was not working. I could have run it on my server container and it should have worked.
I fixed this by correctly setting my DATABASE_URI to export DATABASE_URI=postgres://{USERNAME}:{PASSWORD}@127.0.0.1:5432/debateit. This let my local environment connect to the Postgres DB rather than the local SQLite.
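A minimal sketch of that fix, assuming the standard Flask-Migrate flask db upgrade entry point and the same placeholder credentials:
export DATABASE_URI=postgres://{USERNAME}:{PASSWORD}@127.0.0.1:5432/debateit
flask db upgrade
# the migration log should now report "Context impl PostgresqlImpl." instead of SQLiteImpl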

Implementing PostgreSQL in Apache Airflow

I have Apache Airflow running on an Ubuntu 18.04.3 server. When I set it up, I used the default SQLite database, which uses the Sequential Executor. I did this just to play around and get used to the system. Now I'm trying to use the LocalExecutor, and will need to transition my database from SQLite to the recommended PostgreSQL.
Does anybody know how to make this transition? All of the tutorials I've found entail setting up Airflow with PostgreSQL from the beginning. I know there are a ton of moving parts and I'm scared of messing up what I currently have running. Anybody who knows how to do this or can point me at where to look is much appreciated. Thanks!
Just to complete @lalligood's answer with some commands:
In the airflow.cfg file, look for sql_alchemy_conn and update it to point to your PostgreSQL server:
sql_alchemy_conn = postgresql+psycopg2://user:pass@hostadress:port/database
For instance:
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
As indicated in the line above, you need both a user and a database called airflow, so you need to create them. To do so, open your psql command line and run the following commands to create a user and a database called airflow and grant all privileges on the airflow database to the airflow user:
CREATE USER airflow;
CREATE DATABASE airflow;
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
Now you are ready to initialize the Airflow application using Postgres:
airflow initdb
If everything went right, open the psql command line again, connect to the airflow database with the \c airflow command, and type \dt to list all tables of that database. You should see a list of Airflow tables; currently there are 23.
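In psql, that check looks roughly like this (the table names are the standard Airflow metadata tables):
\c airflow
\dt
-- should list the Airflow metadata tables (dag, dag_run, task_instance, xcom, ...)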
Another option, instead of editing the airflow.cfg file, is to set the environment variable AIRFLOW__CORE__SQL_ALCHEMY_CONN to the PostgreSQL connection string you want.
Example: export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
Or you can set it in your Dockerfile.
See the documentation.
I was able to get it working by doing the following 4 steps:
Assuming that you are starting from scratch, initialize your airflow environment with the SQLite database. The key takeaway here is for it to generate the airflow.cfg file.
Update the sql_alchemy_conn line in airflow.cfg to point to your PostgreSQL server.
Create the airflow role + database in PostgreSQL, as sketched below. (Revoke all permissions from public on the airflow database & ensure the airflow role owns the airflow database!)
(Re)Initialize airflow (airflow initdb) & confirm that you see ~19 tables in the airflow database.
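One possible way to do step 3 from psql (a sketch; role name, password and database name are examples, adjust to your setup):
CREATE ROLE airflow WITH LOGIN PASSWORD 'airflow';
CREATE DATABASE airflow OWNER airflow;
REVOKE ALL ON DATABASE airflow FROM PUBLIC;
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;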