Implementing PostgreSQL in Apache Airflow - postgresql

I have Apache Airflow implemented on an Ubuntu 18.04.3 server. When I set it up, I used the generic SQLite database, which uses the SequentialExecutor. I did this just to play around and get used to the system. Now I'm trying to use the LocalExecutor, and will need to transition my database from SQLite to the recommended PostgreSQL.
Does anybody know how to make this transition? All of the tutorials I've found entail setting up Airflow with PostgreSQL from the beginning. I know there are a ton of moving parts and I'm scared of messing up what I currently have running. Anybody who knows how to do this or can point me to where to look is much appreciated. Thanks!

Just to complete @lalligood's answer with some commands:
In the airflow.cfg file, look for sql_alchemy_conn and update it to point to your PostgreSQL server:
sql_alchemy_conn = postgresql+psycopg2://user:pass@hostaddress:port/database
For instance:
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
As the line above indicates, you need both a user and a database called airflow, so you need to create them. To do so, open the psql command line and run the following commands to create the user and database and to grant all privileges on the airflow database to the airflow user:
CREATE USER airflow;
CREATE DATABASE airflow;
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
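If your connection string includes a password, as in the airflow:airflow example above, you will also need to set one for the user; a minimal sketch (the password value here is an assumption):
ALTER USER airflow WITH PASSWORD 'airflow';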
Now you are ready to init the airflow application using postgres:
airflow initdb
If everything went right, open the psql command line again, connect to the airflow database with the \c airflow command, and type \dt to list all tables in that database. You should see a list of Airflow tables (currently 23).
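For reference, the same check can also be run non-interactively from the shell (the user and database names follow the example above):
psql -U airflow -d airflow -c "\dt"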

Another option, other than editing the airflow.cfg file, is to set the environment variable AIRFLOW__CORE__SQL_ALCHEMY_CONN to the PostgreSQL connection string you want.
Example: export AIRFLOW__CORE__SQL_ALCHEMY_CONN_SECRET=sql_alchemy_conn
Or you can set it in your Dockerfile.
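For instance, the plain (non-_SECRET) variable can be exported in the shell or set in a Dockerfile; the connection string below simply reuses the earlier example and is an assumption about your actual setup:
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
# or, in a Dockerfile:
ENV AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost:5432/airflow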
See documentation here

I was able to get it working by doing the following 4 steps:
Assuming that you are starting from scratch, initialize your airflow environment with the SQLite database. The key takeaway here is for it to generate the airflow.cfg file.
Update the sql_alchemy_conn line in airflow.cfg to point to your PostgreSQL server.
Create the airflow role + database in PostgreSQL; see the SQL sketch after these steps. (Revoke all permissions from public to the airflow database & ensure the airflow role owns the airflow database!)
(Re)Initialize airflow (airflow initdb) & confirm that you see ~19 tables in the airflow database.
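For step 3, a sketch of the PostgreSQL side (the role name, password, and database name mirror the earlier examples and are assumptions about your setup):
CREATE ROLE airflow WITH LOGIN PASSWORD 'airflow';
CREATE DATABASE airflow OWNER airflow;
REVOKE ALL ON DATABASE airflow FROM PUBLIC;
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;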

Related

Apache Airflow Init Db

I am trying to initialize a database for my project, which is based on Apache Airflow. I am not too familiar with what happened, but I changed the value in my airflow.cfg file to sql_alchemy_conn =postgresql+psycopg2:////Users/gabeuy/airflow/airflow.db. Then, when I saved the changes and ran the command airflow db init, the error occurred and did not allow me to initialize the db.
I tried looking up different ways to change it and ensured that I had Postgres and psycopg2 installed, but it still resulted in an error when I ran the command. I was expecting it to run so that I could access the Airflow db on localhost with the DAGs.
Your sql_alchemy_conn is pointing to a local file path (indicating a SQLite DB), but the protocol is indicating a PostgreSQL DB. The error is telling you it's missing a password, which is required by PostgreSQL.
For PostgreSQL, the expected URL format is:
postgresql+psycopg2://<user>:<password>@<host>/<db>
And for a SQLite DB, the expected URL format is:
sqlite:////<path/to/airflow.db>
A SQLite DB is convenient for testing purposes: it is stored as a single file on your computer, which makes it easy to set up (airflow db init will generate the file if it doesn't exist). A PostgreSQL DB takes a bit more work to set up, but it is generally advised for a production scenario.
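Applied to the path from the question, the two corrected forms of the setting would look roughly like this (the PostgreSQL user, host, and database name are placeholders, not values from the original setup):
sql_alchemy_conn = sqlite:////Users/gabeuy/airflow/airflow.db
sql_alchemy_conn = postgresql+psycopg2://airflow:<password>@localhost:5432/airflow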
For more information about Airflow database configuration, see: https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html.
And for more information about airflow db CLI commands, see: https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#db.

How to set default database in Postgresql database dump script?

I need to initialize a Postgres instance in a Docker container from a dump SQL file. Otherwise it works fine, but the problem is that I cannot set the database to be anything other than "postgres". Creating a new database works fine, but schema clauses, e.g. CREATE TABLE, end up going nowhere.
I tried to set the default database with the --env option in the docker run command, but it returns the error --env requires a value.
Is there any way to set the default database? Hopefully in an SQL clause.
Apparently you need to use \connect "dbname=[database name]" before the schema clauses in order to point the script towards the correct database.
This wasn't (quite understandably) included in the script when the dump was generated for only a single database instead of the whole cluster.
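For illustration, the top of a single-database dump would need something like this before the schema statements (the database and schema names here are made up):
CREATE DATABASE mydb;
\connect mydb
CREATE SCHEMA myschema;
Alternatively, the official postgres Docker image can pre-create a non-default database and load a dump at first startup; the file and database names below are assumptions:
docker run -d -e POSTGRES_PASSWORD=secret -e POSTGRES_DB=mydb -v "$PWD/dump.sql:/docker-entrypoint-initdb.d/dump.sql" postgres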

Errors when configuring Apache Airflow to use a postgres database

I have been introducing myself to Apache Airflow, and so far everything is going well; however, I have been using the default SQLite database and I now need to change to a PostgreSQL database. I have changed the executor to LocalExecutor and I have set the sql_alchemy_conn string to postgresql+psycopg2://airflow:airflow@postgres:5432/airflow, which is the address of the airflow database I created in Postgres.
Now when I run airflow initdb I receive the error
airflow.exceptions.AirflowConfigException: error: cannot use sqlite with the LocalExecutor
I am using PostgreSQL 9.4.24.
Does anyone know why this is occurring?
Resolved the issue: I was using the wrong Postgres user for the location of the database. I should have been using postgresql+psycopg2://user:user@localhost:5432/airflow
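Before re-running airflow initdb, it can help to confirm that the credentials in the URI actually work with a direct psql connection (replace the placeholders with your own values):
psql -h localhost -p 5432 -U user -d airflow -c "SELECT 1;"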

How to migrate from internal postgres db to external postgres db in gitlab

Currently we have an all-in-one single Docker container for our production GitLab, where we are using the bundled Postgres and Redis, so everything is in the same container. We want to use an external Postgres DB and a separate container for Redis as well, to follow production standards.
How can I migrate from the internal Postgres DB to an external Postgres DB? If anyone can provide the process and steps, that would be really helpful. We are new to this process.
Thank you everyone for your inputs ,
PRS
You can follow the article "Migrating GitLab from internal to external PostgreSQL", which involves:
a database dump/reload, using pg_dumpall
sudo -u gitlab-psql /opt/gitlab/embedded/bin/pg_dumpall \
--username=gitlab-psql --host=/var/opt/gitlab/postgresql > /var/lib/pgsql/database.sql
sudo -u postgres psql -f /var/lib/pgsql/database.sql
Note: you can also use a backup of the database, but only if the external PostgreSQL version exactly matches the embedded one.
setting its password
sudo -u postgres psql -c "ALTER USER gitlab ENCRYPTED PASSWORD '***' VALID UNTIL 'infinity';"
and modifying the GitLab configuration:
That is:
# Disable the built-in Postgres
postgresql['enable'] = false
# Fill in the connection details
gitlab_rails['db_adapter'] = 'postgresql'
gitlab_rails['db_encoding'] = 'utf8'
gitlab_rails['db_host'] = '127.0.0.1'
gitlab_rails['db_port'] = 5432
gitlab_rails['db_database'] = "gitlabhq_production"
gitlab_rails['db_username'] = 'gitlab'
gitlab_rails['db_password'] = '***'
apply your changes:
gitlab-ctl reconfigure && gitlab-ctl restart
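After the reconfigure and restart, a couple of commonly used sanity checks on an Omnibus install are (the first reports, among other things, the database adapter and version GitLab is actually using):
sudo gitlab-rake gitlab:env:info
sudo gitlab-ctl status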
@VonC
Hi, let me know about the process I have done below.
We currently have a single all-in-one Docker GitLab container which is using the bundled Postgres and Redis. To follow production standards we are looking to maintain separate Postgres and Redis instances for our prod GitLab. We already have data in the bundled DB, so we took a backup of the current GitLab with bundled Postgres, which generated a .tar file. Next we changed gitlab.rb to point to the external Postgres DB (same version); then we were able to connect to GitLab, but we didn't see any data, because nothing was there, as it is a fresh DB. Later we did the restore using the external Postgres DB, and now we can see all the data. Can we do it with this method? Now our GitLab is attached to the external Postgres and I can see all the restored data. Will this process work? Any downfalls?
How is this process different from pg_dump and import?

ERROR: cannot execute CREATE TABLE in a read-only transaction

I'm trying to setup the pgexercises data in my local machine. When I run: psql -U <username> -f clubdata.sql -d postgres -x I get the error: psql:clubdata.sql:6: ERROR: cannot execute CREATE SCHEMA in a read-only transaction.
Why did it create a read-only database on my local machine? Can I change this?
Normally the most plausible reasons for this kind of error are:
trying CREATE statements on a read-only replica (the entire instance is read-only).
<username> has default_transaction_read_only set to ON
the database has default_transaction_read_only set to ON
The script mentioned has in its first lines:
CREATE DATABASE exercises;
\c exercises
CREATE SCHEMA cd;
and you report that the error happens with CREATE SCHEMA at line 6, not before.
That means that the CREATE DATABASE does work, when run by <username>.
And it wouldn't work if any of the reasons above was directly applicable.
One possibility that would technically explain this would be that default_transaction_read_only would be ON in the postgresql.conf file, and set to OFF for the database postgres, the one that the invocation of psql connects to, through an ALTER DATABASE statement that supersedes the configuration file.
That would be why CREATE DATABASE works, but then as soon as it connects to a different database with \c, the default_transaction_read_only setting of the session would flip to ON.
But of course that would be a pretty weird and unusual configuration.
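To narrow down which case applies, you can check the effective setting and the recovery state from psql (both are standard PostgreSQL):
SHOW default_transaction_read_only;
SELECT pg_is_in_recovery();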
Reached out to pgexercises.com and they were able to help me.
I ran these commands (separately):
psql -U <username> -d postgres
begin;
set transaction read write;
alter database exercises set default_transaction_read_only = off;
commit;
\q
Then I dropped the database from the terminal with dropdb exercises and ran the script again: psql -U <username> -f clubdata.sql -d postgres -x -q
I was getting cannot execute CREATE TABLE in a read-only transaction, cannot execute DELETE TABLE in a read-only transaction, and others.
They all followed a cannot execute INSERT in a read-only transaction. It was as if the connection had switched itself over to read-only in the middle of my batch processing.
Turns out, I was running out of storage!
Write access was disabled when the database could no longer write anything. I am using Postgres on Azure. I don't know if the same effect would happen if I was on a dedicated server.
I had the same issue for a Postgres UPDATE statement:
SQL Error: 0, SQLState: 25006 ERROR: cannot execute UPDATE in a read-only transaction
Verify database access by running the query below; it will return either true or false:
SELECT pg_is_in_recovery()
true -> Database has only Read Access
false -> Database has full Access
If it returns true, check with the DBA team for full access, and also try pinging the database host from the command prompt to ensure connectivity:
ping <database hostname or dns>
Also verify whether you have primary and standby nodes for the database.
In my case I had a master and replication nodes, and the master node became a replication node, which I believe switched it into hot_standby mode. So I was trying to write data to a node that was meant only for reading, hence the "read-only" problem.
You can query the node in question with SELECT pg_is_in_recovery(), and if it returns True then it is "read-only", and I suppose you should switch to using whatever master node you have now.
I got this information from: https://serverfault.com/questions/630753/how-to-change-postgresql-database-from-read-only-to-writable.
So full credit and my thanks goes to Craig Ringer!
DBeaver: in my case, the connection's "Read-only" option was turned on; switching it off fixed the error.
This doesn't quite answer the original question, but I received the same error and found this page, which ultimately led to a fix.
My issue was trying to run a function with temp tables being created and dropped. The function was created with SECURITY DEFINER privileges, and the user had access locally.
In a different environment, I received the cannot execute DROP TABLE in a read-only transaction error message. This environment was AWS Aurora, and by default, non-admin developers were given read-only privileges. Their server connections were thus set up to use the read-only node of Aurora (-ro- is in the connection url), which must put the connection in the read-only state. Running the same function with the same user against the write node worked.
Seems like a good use case for table variables like SQL Server has! Or, at least, AWS should modify their flow to allow temp tables to be created and dropped on read nodes.
This occurred when I was restoring a production database locally; the database was still doing online recovery from the WAL records.
A little unexpected, as I assumed pgBackRest was creating instantly recoverable restores; perhaps not.
91902 postgres 20 0 1445256 14804 13180 D 4.3 0.3 0:28.06 postgres: startup recovering 000000010000001E000000A5
If, like me, you are trying to create a DB on Heroku and are stuck because this message shows up on the dataclips tab,
I did this:
Choose Resources from the app tabs (Overview, Resources, Deploy, Metrics, Activity, Access, Settings).
Choose Settings from the database tabs (Overview, Durability, Settings, Dataclips).
Then under Administration -> Database Credentials choose View Credentials...
Then open a terminal, fill in that info below, and enter:
psql --host=***************.amazonaws.com --port=5432 --username=*********pubxl --password --dbname=*******lol
It will then ask for the password; copy-paste it from the credentials page and you can run Postgres commands.
I suddenly started facing this error with Postgres installed on my Windows machine when I was running an ALTER query from DBeaver. All I did was delete the Postgres connection in DBeaver and create a new connection.
If you are using Azure Database for PostgreSQL, your server goes into read-only mode when the storage used is near total capacity.
The error you get is exactly:
ERROR: cannot execute XXXXXXXXX in a read-only transaction
https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-compute-storage
I just had this error. In my case, the cause was not granting permission on the SEQUENCE:
GRANT ALL ON SEQUENCE word_mash_word_cube_template_description_reference_seq TO ronshome_user;
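If several sequences are affected, a broader grant over the schema can be used instead (the public schema and the role name below just follow the example above and are assumptions):
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO ronshome_user;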
If you are facing this issue with an RDS instance cluster, check your endpoint and use the writer instance endpoint; then it should work.
The issue can be due to IntelliJ config:
Go to the Database view > click Data Source Properties (Shift + Enter) > (select your data source) >
Options tab > under Connection: uncheck Read-only
For me it was Azure PostgreSQL failing over to standby during maintenance in Azure and never failing back to master when PostgreSQL was in HA mode. You can check this event in Service Health and also check which zone your current VM is running in. If it's 2 and not 1, then most likely that's the result of the events described above.