Checking consistency of SQL table schemas in Airflow - PostgreSQL

I have an Airflow pipeline in which I create some tables in a PostgreSQL database (A). This works fine, as the table schemas reside in .sql files, which are used to create the tables in the aforementioned database. These definitions were downloaded manually from another database, which is also a PostgreSQL db (B).
I'd like to run a consistency check at the start of the main Airflow DAG to verify that the table schemas (found in the .sql files) that I use to recreate the tables in db (A) are identical to the ones on db (B).
Note: I also have definitions for views in my .sql files.
How could I accomplish this? Any examples would also be helpful.
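One way to do this, as a rough sketch: since database (A) is built from those .sql files, comparing the live catalogs of (A) and (B) through information_schema.columns (which covers both tables and views) is a reasonable proxy for comparing the files themselves. The connection ids below are placeholders; the function would run from a PythonOperator as the first task of the DAG, and raising an exception fails the task and stops the run.

from airflow.providers.postgres.hooks.postgres import PostgresHook

# Column-level definition of every table and view in the schema, in a deterministic order
SCHEMA_QUERY = """
    SELECT table_name, column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_schema = 'public'
    ORDER BY table_name, ordinal_position
"""

def check_schema_consistency():
    # "postgres_a" / "postgres_b" are placeholder Airflow connection ids
    cols_a = PostgresHook(postgres_conn_id="postgres_a").get_records(SCHEMA_QUERY)
    cols_b = PostgresHook(postgres_conn_id="postgres_b").get_records(SCHEMA_QUERY)
    if cols_a != cols_b:
        diff = set(map(tuple, cols_a)) ^ set(map(tuple, cols_b))
        raise ValueError(f"Schema mismatch between (A) and (B): {sorted(diff)}")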

Related

What is the easiest way to generate a script to drop and create all objects in a database?

I'm used to working with SQL Server, and SQL Server Management Studio has the option to automatically generate a script to drop and recreate everything in a database (tables/views/procedures/etc). I find that when developing a new application and writing a bunch of junk in a local database for basic testing, it's very helpful to have the option to just nuke the whole thing and recreate it from a clean slate, so I'm looking for similar functionality within Postgres/pgAdmin.
pgAdmin has an option to generate a create script for a specific table, but right-clicking each table would be very tedious, and I'm wondering if there's another way to do it.
To recreate a clean, schema-only database you can use the pg_dump client included with a Postgres server install. The options to use are:
-c
--clean
Output commands to clean (drop) database objects prior to outputting the commands for creating them. (Unless --if-exists is also specified, restore might generate some harmless error messages, if any objects were not present in the destination database.)
This option is ignored when emitting an archive (non-text) output file. For the archive formats, you can specify the option when you call pg_restore.
and:
-s
--schema-only
Dump only the object definitions (schema), not data.
This option is the inverse of --data-only. It is similar to, but for historical reasons not identical to, specifying --section=pre-data --section=post-data.
(Do not confuse this with the --schema option, which uses the word “schema” in a different meaning.)
To exclude table data for only a subset of tables in the database, see --exclude-table-data.
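Putting the two options together, an invocation could look roughly like this (host, user, database, and output file below are placeholders); running the resulting script against a database then drops and recreates every object:

pg_dump --clean --schema-only --host=dbhost --username=postgres --file=schema.sql mydatabase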
clean in Flyway
The database migration tool Flyway offers a clean command that drops all objects in the configured schemas.
To quote the documentation:
Clean is a great help in development and test. It will effectively give you a fresh start, by wiping your configured schemas completely clean. All objects (tables, views, procedures, …) will be dropped.
Needless to say: do not use against your production DB!
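With the Flyway command-line client that could look roughly like the following (URL, credentials, and schema list are placeholders):

flyway -url=jdbc:postgresql://localhost:5432/devdb -user=dev -password=secret -schemas=public clean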

Run SQL script on multiple schemas with Flyway

I am migrating the DB using Flyway. I have a SQL script file which needs to run on multiple schemas hosted on a single database.
If I mention ${db_schema} as a parameter in my SQL file and supply it with different schema names, will that work? Are there any other approaches to handle this scenario?
SET search_path TO ${db_schema};
You should be able to use placeholders in Flyway to handle more than one schema. Here's an article that outlines how that works.
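As a sketch of how that might look: the placeholder value can be set in flyway.conf (or overridden per run on the command line) and is substituted into scripts such as the SET search_path statement above; the schema names here are placeholders.

# flyway.conf
flyway.placeholders.db_schema=tenant_a

# or override per run on the command line
flyway -placeholders.db_schema=tenant_b migrate

Running the same migration once per schema name then applies the script to each schema.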

How to copy a database schema from database A to database B

I have created a PostgreSQL database A using Liquibase changesets. Now, I'm creating an application that allows creating a new database B and copying the schema from database A in real time, including the Liquibase changesets, since the database can still be updated later. Note that at the time of the copy, the schema in database A could already have been updated, making the base changesets outdated.
My main question would be:
How do I copy PostgreSQL schema x from database A (dynamically generated at run-time) to B using Liquibase? Database B could be on another server.
If it's not possible with Liquibase, what other tools or approaches would make this possible?
--
Let me add more context:
We initialize the schema of a new database A using Liquibase changesets.
We can add a new table or field to database A at run-time, i.e. while the application is running. For example, we add a new table people to the schema of database A, which is not originally in the changeset. This is done using Liquibase classes too, so a changeset is added to the databasechangelog table.
Now, we create a new database B.
We want to import the schema of database A into B, including the people table.
I hope that is clear.
Thanks.
All schema changes must be run through your schema migration tool
The point of using a database schema migration tool such as Liquibase or Flyway is to have a “single source of truth” regarding the structure of your database tables. Your set of Liquibase changesets (or Flyway scripts) is supposed to be that single source of truth for your database.
If you are altering the structure of your database at runtime, such as adding a table named people, outside the scope of your migration tool, then you have violated the rules of the game and defeated the purpose of using a schema migration tool. The intention of using a schema migration tool is that you make all schema changes through that tool.
If you need to add a table while running in production, you should be dropping the physical file for the Liquibase changeset (or Flyway script) into the file system of your database server environment, and then invoking Liquibase (or Flyway) to run a migration.
Perhaps you have been misunderstanding the sequence of events:
If you have built a database on server "A", that means you installed Postgres, created an empty database, then installed the collection of Liquibase changesets you have carefully built, then ran a Liquibase migration operation on that server.
When you go to create a database on server "B", you should follow the same steps: install Postgres, create an empty database, install the very same collection of Liquibase changesets, and then run a Liquibase migration operation.
Alternatively, if making a copy of server "A" to create server "B", that copy should include the exact same Liquibase changesets. So at the end of your copy process, the two databases+changesets are identical.
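In other words, provisioning server "B" boils down to pointing the same changelog at a different connection, for example (URL and credentials are placeholders, and the exact flag spelling varies between Liquibase versions):

liquibase --changelog-file=db.changelog-master.xml --url=jdbc:postgresql://server-b:5432/mydb --username=app --password=secret update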
Here's how I solved this problem of mine using the Liquibase Java library:
1.) Export the changelog from the source database into a temporary file (XML).
// sourceDatabase, catalogAndSchema and changeLogWriter are initialised beforehand (see the full code linked below)
Liquibase liquibase = new Liquibase(liquibaseOutFile.getAbsolutePath(), new FileSystemResourceAccessor(), sourceDatabase);
liquibase.generateChangeLog(catalogAndSchema, changeLogWriter, new PrintStream(liquibaseOutFile.getAbsolutePath()), null);
2.) Execute the temporary file against the new data source.
Liquibase targetLiquibase = new Liquibase(liquibaseOutFile.getAbsolutePath(), new FileSystemResourceAccessor(), targetDatabase);
Contexts context = new Contexts();
targetLiquibase.update(context);
Here's the complete code: https://czetsuya-tech.blogspot.com/2019/12/generate-postgresql-schema-using-java.html

Backing up a Redshift database

I want to back up an entire Redshift cluster, such that I can use it in other databases like MySQL or Hadoop in the future.
I was looking it up, and creating a manual snapshot seems to be an option, but I guess that won't work across different database systems.
So what would be the detailed steps to back up the entire Redshift cluster?
Cluster backups can be done via the AWS console; however, these can only be restored to another Redshift cluster.
Because Redshift is not the same as Postgres in many ways, it will be impossible or tricky to use standard tools like pg_dump and pg_restore.
I think that your best option is to:
1. Extract the DDL from the Redshift tables that you wish to create elsewhere; most IDEs have a simple way to do this.
2. Modify the DDL to work with your target database (e.g. Postgres will be easy, MySQL harder).
3. Copy the contents of the Redshift database, one table at a time, to S3 using the UNLOAD command (see the example after this list).
4. Import the data that you unloaded in step 3 into your target tables.
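As an illustration of step 3, a minimal UNLOAD might look like this (bucket, prefix, and IAM role are placeholders):

UNLOAD ('SELECT * FROM my_schema.my_table')
TO 's3://my-bucket/exports/my_table_'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-unload-role'
DELIMITER '|' GZIP ALLOWOVERWRITE;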

Backup specific tables in AWS RDS Postgres Instance

I have two databases on Amazon RDS, both Postgres: Database 1 and Database 2.
I need to restore an instance from a snapshot of Database 1 for my Staging environment. (Database 2 is my current Staging DB).
However, I want the data from a few of the tables in Database 2 to overwrite the tables in the newly restored snapshot. What is the best way to do this?
When restoring RDS from a Snapshot, a new database instance is created. If you only wish to copy a portion of the snapshot:
Restore the snapshot to a new (temporary) database
Connect to the new database and dump the desired tables using pg_dump
Connect to your staging server and restore the tables using pg_restore (most probably deleting any matching existing tables first)
Delete the temporary database
pg_dump actually outputs SQL commands that are then used to recreate tables and restore data. Look at the content of a dump to understand how the restore process actually works.
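A sketch of steps 2 and 3 from the command line (host names, user, database, and table names are placeholders):

pg_dump -h temp-restored-host -U myuser -d mydb -Fc -t table1 -t table2 -f tables.dump
pg_restore -h staging-host -U myuser -d mydb --clean tables.dump

The custom format (-Fc) produces an archive that pg_restore can read, and --clean drops the matching tables before recreating them.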
I hope this still works for someone else.
My team and I faced a similar issue. We also had 2 Postgres databases and we also just needed to back up some tables from db1 to db2.
What we did was use a Python Lambda function (on AWS Lambda, of course) that connected to both databases and validated whether db1.table1 has the same data as db2.table1; if not, the Lambda function writes the missing data from db1.table1 into db2.table1. We used Lambda because we wanted to automate the process, as the main db (let's say db1) is constantly being updated. In addition, it allowed us to back up only our desired tables (say 3 tables out of 10), instead of backing up the whole database.
Note: Maybe you want to do these writes using temporary tables to avoid issues with any constraints you have in your tables.
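A very rough sketch of that Lambda approach with psycopg2 (which has to be bundled with the function, e.g. as a layer), assuming the table's first column is its primary key; every connection detail and table name below is a placeholder:

import psycopg2

def lambda_handler(event, context):
    # Placeholder credentials; in practice read them from environment variables or Secrets Manager
    src = psycopg2.connect(host="db1-host", dbname="db1", user="user", password="pass")
    dst = psycopg2.connect(host="db2-host", dbname="db2", user="user", password="pass")
    with src.cursor() as s, dst.cursor() as d:
        # Collect the keys already present in db2.table1
        d.execute("SELECT id FROM table1")
        existing = {row[0] for row in d.fetchall()}
        # Copy over any row from db1.table1 whose key is missing in db2.table1
        s.execute("SELECT * FROM table1")
        for row in s.fetchall():
            if row[0] not in existing:
                placeholders = ", ".join(["%s"] * len(row))
                d.execute(f"INSERT INTO table1 VALUES ({placeholders})", row)
    dst.commit()
    src.close()
    dst.close()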