Migrate tables from one RDS Postgres schema to another within the same DB instance - postgresql

I have a use case where I am splitting one service into multiple services and want to migrate tables (with huge amounts of data) from one RDS Postgres schema to another within the same DB instance, with ongoing replication and ~zero downtime. I am exploring AWS DMS; I can see it is possible to migrate an entire DB, but is it possible to migrate only a specific schema, and how?
Using an ALTER TABLE query is not an option because I cannot move the tables in one shot in production; it needs to happen gradually. An AWS DMS-like solution would fit the use case.
Thanks in advance.

Moving a table from one schema to another can be done by just altering the schema:
ALTER TABLE [ IF EXISTS ] name
SET SCHEMA new_schema
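If, as in the question, the move has to happen gradually with ongoing replication, AWS DMS can be scoped to a single schema: source and target endpoints are just connections, so both can point at the same instance, and the task's table mappings can select one schema and rename it on the target. A minimal sketch of such a mapping (schema names, rule IDs, and rule names are placeholders):
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-source-schema",
      "object-locator": { "schema-name": "old_schema", "table-name": "%" },
      "rule-action": "include"
    },
    {
      "rule-type": "transformation",
      "rule-id": "2",
      "rule-name": "rename-schema-on-target",
      "rule-target": "schema",
      "object-locator": { "schema-name": "old_schema" },
      "rule-action": "rename",
      "value": "new_schema"
    }
  ]
}
Running the task as full load plus CDC keeps the new schema in sync until you are ready to cut over.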

Related

Accessing Aurora Postgres Materialized Views from Glue data catalog for Glue Jobs

I have an Aurora Serverless instance which has data loaded across 3 tables (mixture of standard and jsonb data types). We currently use traditional views where some of the deeply nested elements are surfaced along with other columns for aggregations and such.
We have two materialized views that we'd like to send to Redshift. Both the Aurora Postgres and Redshift databases are in the Glue Catalog, and while I can see the Postgres views as selectable tables, the crawler does not pick up the materialized views.
Currently exploring two options to get the data to Redshift:
1. Output to Parquet and use COPY to load.
2. Point the materialized view to a JDBC sink specifying Redshift.
Wanted recommendations on the most efficient approach if anyone has done a similar use case.
Questions:
1. In option 1, would I be able to handle incremental loads?
2. Is bookmarking supported for JDBC (Aurora Postgres) to JDBC (Redshift) transactions, even if through Glue?
3. Is there a better way (other than the options I am considering) to move the data from Aurora Postgres Serverless (10.14) to Redshift?
Thanks in advance for any guidance provided.
Went with option 2. The Redshift COPY/load process writes CSV with a manifest to S3 in any case, so duplicating that is pointless.
Regarding the questions:
1. N/A
2. Job bookmarking does work. There are some gotchas, though: ensure connections to both RDS and Redshift are present in the Glue PySpark job, that IAM self-referencing rules are in place, and that you identify a row that is unique [I chose the primary key of the underlying table as an additional column in my materialized view] to use as the bookmark.
Using the primary key of the core table may buy efficiencies in pruning materialized views during maintenance cycles. Just retrieve the latest bookmark from the CLI using aws glue get-job-bookmark --job-name yourjobname and then use that in the WHERE clause of the materialized view, as in where id >= idinbookmark.
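For illustration, the view's defining query can be bounded by that bookmark value (all names and the literal below are hypothetical):
CREATE MATERIALIZED VIEW my_mv AS
    SELECT t.pk_id, t.payload
    FROM core_table t
    WHERE t.pk_id >= 12345;  -- latest value returned by get-job-bookmark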
from pyspark.context import SparkContext
from awsglue.context import GlueContext
glueContext = GlueContext(SparkContext.getOrCreate())  # standard Glue job setup
conn = glueContext.extract_jdbc_conf("yourGlueCatalogdBConnection")  # JDBC url/user/password from the Glue connection
connection_options_source = { "url": conn['url'] + "/yourdB", "dbtable": "table in dB", "user": conn['user'], "password": conn['password'], "jobBookmarkKeys": ["unique identifier from source table"], "jobBookmarkKeysSortOrder": "asc" }
datasource0 = glueContext.create_dynamic_frame.from_options(connection_type="postgresql", connection_options=connection_options_source, transformation_ctx="datasource0")
That's all, folks

Importing existing table data to a new table in a different database (postgres)

I would like to import all data from an existing table in one database to a new table in a different database in Postgres. Any suggestions would be helpful.
The easiest way would be to pg_dump the table and pg_restore it in the target database.
In case that is not an option, you should definitely take a look at postgres_fdw (Foreign Data Wrapper), which allows you to access data from different databases - even from different machines. It is slightly more complex than the traditional export/import approach, but it creates a direct connection to the foreign table.
Take a look at this example.
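A minimal postgres_fdw sketch along those lines, run in the target database; the server name, connection options, credentials, and table names are all placeholders:
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER src_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', dbname 'source_db', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER src_server
    OPTIONS (user 'postgres', password 'secret');

-- Make the source table visible locally, then copy it in one statement
IMPORT FOREIGN SCHEMA public LIMIT TO (old_table)
    FROM SERVER src_server INTO public;

CREATE TABLE new_table AS SELECT * FROM old_table;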

How to replicate rows into tables of different databases in postgresql?

I use PostgreSQL. I have many databases on one server. There is one database which I use the most, say 'main'. This 'main' database has many tables inside it, and the other databases have many tables inside them as well.
What I want to do is: whenever a new row is inserted into the 'main.users' table, I wish to insert the same data into the 'users' table of the other databases. How shall I do this in PostgreSQL? Similarly, I wish to do the same for all actions like UPDATE, DELETE, etc.
I have gone through the "logical replication" concept as suggested. In my case I know the source DB name up front, and I will come to know the target DB name as part of the query, so it is going to be dynamic.
How do I achieve this? Is there any database concept in PostgreSQL for this? I welcome all other possible approaches as well. Please share some ideas on this.
If this is all on the same Postgres instance (aka "cluster"), then I would recommend using a foreign table to access the tables from the "main" database in the other databases.
Those foreign tables look like "local" tables inside each database, but access the original data in the source database directly, so there is no need to synchronize anything.
Upgrade to a recent PostgreSQL release and use logical replication.
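A minimal sketch of that, assuming a users table and placeholder names. One gotcha when source and target databases live in the same cluster: create the replication slot separately, because CREATE SUBSCRIPTION would otherwise hang waiting on itself:
-- In the source database ('main'):
CREATE PUBLICATION users_pub FOR TABLE users;
SELECT pg_create_logical_replication_slot('users_slot', 'pgoutput');

-- In the target database:
CREATE SUBSCRIPTION users_sub
    CONNECTION 'host=localhost dbname=main'
    PUBLICATION users_pub
    WITH (create_slot = false, slot_name = 'users_slot');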
Add a trigger on the table in the master database that uses dblink to access and write to the other databases.
Be sure to consider what should be done if the row already exists remotely, or if the remote server is unreachable.
Also note that updates propagated using dblink are not rolled back if the invoking transaction is rolled back.
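A minimal sketch of such a trigger for INSERTs, assuming a users(id, name) table; the connection string and all names are placeholders:
CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION replicate_user_insert() RETURNS trigger AS $$
BEGIN
    -- Push the new row to the other database; as noted above, this
    -- remote write is NOT rolled back with the local transaction.
    PERFORM dblink_exec(
        'dbname=other_db',
        format('INSERT INTO users (id, name) VALUES (%L, %L)', NEW.id, NEW.name)
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- EXECUTE FUNCTION needs PostgreSQL 11+; use EXECUTE PROCEDURE on older versions
CREATE TRIGGER users_replicate
    AFTER INSERT ON users
    FOR EACH ROW EXECUTE FUNCTION replicate_user_insert();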

How does AWS postgres RDS read replication handle schema switching?

I want to know how AWS Postgres RDS handles replication when I rename schemas to "swap" them within the read/write instance of the database.
Does it replicate this action to the read-replicas by sending the "alter schema" rename commands I gave to my read/write instance? Or, after my renames, does it see wholly different sets of data in the schemas and do a whole new copy of each out to the read-replicas?
For example...
In my RDS instance I have a read/write instance of "my_mega_database" which I want to create read-replicas of for my applications to connect to.
Typically, in "my_mega_database" there are two schemas "my_data" and "my_data_old", whereby "my_data" contains data that was delivered last night, and "my_data_old" contains data from the previous night. Each contains many tables and huge amounts of data.
If I were to do the following...
ALTER SCHEMA my_data_old RENAME TO my_data_tmp;
ALTER SCHEMA my_data RENAME TO my_data_old;
ALTER SCHEMA my_data_tmp RENAME TO my_data;
... I have effectively swapped these around.
My expectation is that these actions are replicated via the postgres WAL (ie: it sends the rename commands out to the replicas) and AWS RDS replication won't try and waste time copying huge amounts of data all over the place.
Is this correct?
(Speaking about PostgreSQL here, but RDS is probably similar.)
Renaming a schema (or any other object) is a small update in a catalog table, and no data are moved. Internally PostgreSQL uses only the numeric object ID, which stays the same.
You might wrap the three statements in a transaction to make the whole swap atomic.
The same is true on the standby: it is a trivial (meta)data modification.
The only thing that might be a problem is concurrent sessions holding locks.
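For example, the swap from the question becomes atomic like this:
BEGIN;
ALTER SCHEMA my_data_old RENAME TO my_data_tmp;
ALTER SCHEMA my_data RENAME TO my_data_old;
ALTER SCHEMA my_data_tmp RENAME TO my_data;
COMMIT;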

CREATE SCHEMA in Redshift fails to create a schema

I've created a new database, customer_test, in Redshift on the same cluster where most of my data lives (in the dev database). I've created this database using a superuser.
My problem arises when I run create schema new_schema in the new database. The query runs fine, and when I query PG_NAMESPACE I see the schema there. I can even create tables. But if I disconnect and reconnect, the schema is gone. I don't see any specification in the RS documentation that schemas are temporary, and I've verified this is a superuser. Is there a reason my created schemas are disappearing? Thanks for the help.
I tried the same and faced the same issue. I created a table and inserted rows into it, after which the schema was persistent.