Replicate views from Postgres to Postgres on AWS DMS - postgresql

I'm using AWS DMS to migrate data from one Postgres database to another Postgres database. Everything works fine, except one thing: the views are not replicated on my target database.
I've read that this cannot be done between heterogeneous databases (e.g. from Oracle to Postgres) using DMS, but I imagine it is possible somehow when the same engine is used on both sides.
Does someone know how to replicate the views using AWS DMS from Postgres to Postgres?

DMS is a data migration service. A view is a virtual table (represented by SQL code/an object) and does not contain any data by itself the way a table does.

If you aren't looking to migrate the actual DDL but instead are happy to replicate the data from views in the source to tables in the target (akin to materialising the views in the target), then you can do this, but you need to jump into the JSON configuration and there are some caveats.
Per the Selection rules and actions portion of the DMS documentation, you can add to the object-locator object a table-type field, setting its value to one of view or all to migrate views. It defaults to table which skips views, and there's no configuration field for it in the console aside from modifying the JSON.
For example:
{
  "rule-type": "selection",
  "rule-id": "123456",
  "rule-name": "some-rule",
  "object-locator": {
    "schema-name": "my_schema",
    "table-name": "%",
    "table-type": "view"
  },
  "rule-action": "include",
  "filters": []
}
There are a couple of caveats though:
Not all sources support migrating views. Aurora PostgreSQL doesn't, for instance, and saving the task will give an error if you try, so you may need to replace the endpoint with a vanilla PostgreSQL one
If you're migrating views, your only option is a full load - CDC isn't going to work, so partial migrations aren't on the cards
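If what you need on the target is the view DDL itself, DMS won't carry that over, but you can pull the definitions out of the source catalog and replay them manually once the underlying tables exist. A minimal sketch, assuming a hypothetical schema my_schema:

-- Generate CREATE VIEW statements from the source's catalog (pg_views)
-- and run the resulting statements against the target.
SELECT format(
         'CREATE OR REPLACE VIEW %I.%I AS %s',
         schemaname, viewname, definition
       ) AS create_view_sql
FROM pg_views
WHERE schemaname = 'my_schema';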

Related

Migrate Tables from one RDS Postgres schema to another within the same DB Instance

I have a use case where I am splitting one service into multiple services and want to migrate tables (with huge data) from one RDS Postgres schema to another within the same DB instance, with ongoing replication and ~zero downtime. I am exploring the AWS DMS service; I can see it is possible to migrate the entire DB, but is it possible to migrate only a specific schema, and how?
Using an ALTER TABLE query is not an option because I cannot move the tables in one shot in production; it needs to happen gradually. An AWS DMS-like solution would fit the use case.
Thanks in advance.
Moving a table from one schema to another schema can be done by just altering the schema:
ALTER TABLE [ IF EXISTS ] name
SET SCHEMA new_schema
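For example, with hypothetical schema and table names, moving the users table into the new schema would be:

-- Moves the table (and its data) to new_schema without copying anything
ALTER TABLE old_schema.users SET SCHEMA new_schema;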

AWS - DMS migration missing sequences, views, routines, etc.

I am trying to migrate our Postgres database to Aurora Postgres.
First I created a normal task; it migrates all the tables, but without their constraints.
My attempts to clone our database:
I downloaded AWS SCT (Schema Conversion Tool), then set my configuration to generate a migration report. Here is the report:
We completed the analysis of your PostgreSQL source database and estimate that 100% of the database storage objects and 99.1% of database code objects can be converted automatically or with minimal changes if you select Amazon Aurora (PostgreSQL compatible) as your migration target. Database storage objects include schemas, tables, table constraints, indexes, types, sequences and foreign tables. Database code objects include triggers, views, materialized views, functions, domains, rules, operators, collations, fts configurations, fts dictionaries and aggregates. Based on the source code syntax analysis, we estimate 99.9% (based on # lines of code) of your code can be converted to Amazon Aurora (PostgreSQL compatible) automatically. To complete the migration, we recommend 133 conversion action(s) ranging from simple tasks to medium-complexity actions to complex conversion actions.
My questions:
1- Is there a way to automate including everything in my source database?
2- The report mentions "we recommend 133 conversion action(s)"; where can I find these conversion actions?
3- Is it safe to do ongoing migration, given that in my case we need to run the migration every day?
Sequences, indexes, and constraints are not migrated, and this is mentioned in the official AWS docs.
You can use this source.
It will help you migrate sequences, indexes, and constraints at once.
P.S.: this doesn't include views and routines.
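For the sequences specifically, one workaround after the full load is to reset each target sequence from the data that did arrive. A sketch, assuming a hypothetical table my_schema.orders with a serial id column:

-- Point the sequence behind my_schema.orders.id at the current max value,
-- so new inserts on the target don't collide with the migrated rows.
SELECT setval(
         pg_get_serial_sequence('my_schema.orders', 'id'),
         COALESCE(max(id), 1)
       )
FROM my_schema.orders;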
AFAIK there's no way in AWS to automate everything; if there were, it would have already been added to SCT. However, if similar errors occur across code/DDL/functions, like some datatype conversions, you can create a script that takes a schema dump and converts all those data types to the desired ones.
Choose the SQL Conversion Actions tab in the SCT tool.
The SQL Conversion Actions tab contains a list of SQL code items that can't be converted automatically. There are also recommendations for how to manually convert the SQL code. You can look into the errors and make changes accordingly.
If you are migrating to the same version of PG in Aurora, you can take a schema-only dump, restore it into the target Aurora, and later set up a full load/ongoing replication with DMS without taking SCT into consideration (this has worked for me most of the time). Just make sure you adhere to the Aurora limitations specific to your PG version.
We have been using ongoing migration in our project and it's working great. There are some best practices we have developed, though they will differ from project to project:
DDL changes must be made on the target first; stop replication while doing so and resume once done
Separate tables with high transaction volume into a different DMS task, as it will help you in troubleshooting while the rest of your tables keep working
Always keep in mind that DMS replicates data, not views/functions/procedures
Active monitoring of tasks and replication instances
And I would suggest that if you are performing a homogeneous migration (PG -> PG), you should consider pg_dump & pg_restore; it is easy and well suited for identical versions, and AWS Aurora supports it.

Accessing Aurora Postgres Materialized Views from Glue data catalog for Glue Jobs

I have an Aurora Serverless instance which has data loaded across 3 tables (mixture of standard and jsonb data types). We currently use traditional views where some of the deeply nested elements are surfaced along with other columns for aggregations and such.
We have two materialized views that we'd like to send to Redshift. Both the Aurora Postgres and Redshift are in Glue Catalog and while I can see Postgres views as a selectable table, the crawler does not pick up the materialized views.
Currently exploring two options to get the data to Redshift:
Output to Parquet and use COPY to load
Point the materialized view at a JDBC sink specifying Redshift
Wanted recommendations on the most efficient approach, if anyone has done a similar use case.
Questions:
In option 1, would I be able to handle incremental loads?
Is bookmarking supported for JDBC (Aurora Postgres) to JDBC (Redshift) transactions even if through Glue?
Is there a better way (other than the options I am considering) to move the data from Aurora Postgres Serverless (10.14) to Redshift?
Thanks in advance for any guidance provided.
Went with option 2. The Redshift COPY/load process writes CSV with a manifest to S3 in any case, so duplicating that is pointless.
Regarding the Questions:
N/A
Job bookmarking does work. There are some gotchas though: ensure connections to both RDS and Redshift are present in the Glue PySpark job, make sure the IAM self-referencing rules are in place, and identify a column that is unique [I chose the primary key of the underlying table as an additional column in my materialized view] to use as the bookmark.
Using the primary key of the core table may buy efficiencies in pruning materialized views during maintenance cycles. Just retrieve the latest bookmark from the CLI using aws glue get-job-bookmark --job-name yourjobname and then use that in the WHERE clause of the MV, as in where id >= idinbookmark
# Pull the JDBC settings from the Glue Catalog connection
# (glueContext comes from the standard Glue job boilerplate)
conn = glueContext.extract_jdbc_conf("yourGlueCatalogdBConnection")

# jobBookmarkKeys must point at a unique column for bookmarking to work
connection_options_source = {
    "url": conn['url'] + "/yourdB",
    "dbtable": "table in dB",
    "user": conn['user'],
    "password": conn['password'],
    "jobBookmarkKeys": ["unique identifier from source table"],
    "jobBookmarkKeysSortOrder": "asc",
}

# Read the source as a DynamicFrame; transformation_ctx ties it to the bookmark
datasource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="postgresql",
    connection_options=connection_options_source,
    transformation_ctx="datasource0",
)
That's all, folks

How do I identify modified rows using AWS DMS to Redshift?

I thought I had a simple question, but I am having a hard time finding an answer. This makes me suspicious that I'm asking the wrong question...
I am new to Redshift... I am using a DMS migration task to pull data from a SQL Server database sitting on an EC2 instance into my Redshift database. I have it set to do a Full Load with Ongoing Replication. This is working.
However, I want to know specifically which rows have changed after ongoing replication makes its updates. It is replicating to my staging tables, and I do further transformations from there depending on certain changes to columns (eg history tracking), which is why I need to know what changed. I compare the staging tables to the existing facts and dimensions, but I don't want to compare the entire table, just the modified rows.
The source database is older and I can't trust the modification timestamp columns are always updated. I thought that setting the migration task to truncate the table, then ingesting ongoing changes would leave my staging table with just changed rows. In hindsight, maybe that was a silly thought.
The other route I am considering is to turn on CDC on the source tables, load staging tables on the SQL Server side with the net changes, and point DMS at those tables instead. I was hoping that extra step would not be necessary.
Help is appreciated!
Not sure if you're still looking for an answer on this, but you could always use a transformation rule to flag rows that DMS has changed, e.g.:
{
  "rule-type": "transformation",
  "rule-id": "1",
  "rule-name": "1",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "%"
  },
  "rule-action": "add-column",
  "value": "dms_load_ts",
  "expression": "current_timestamp",
  "data-type": {
    "type": "datetime"
  }
}
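On the Redshift side you can then pick out only the rows DMS has touched since your last transformation run by filtering on that added column. A sketch with a hypothetical staging table and watermark table (both names are placeholders):

-- Rows written or updated by DMS since the last ETL run
SELECT s.*
FROM staging.customer s
WHERE s.dms_load_ts > (
    SELECT last_run_ts
    FROM etl.watermarks
    WHERE table_name = 'customer'
);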

How to replicate rows into different tables of different database in postgresql?

I use PostgreSQL. I have many databases on a server. There is one database which I use the most, say 'main'. This 'main' database has many tables inside it, and the other databases also have many tables inside them.
What I want to do is: whenever a new row is inserted into the 'main.users' table, I wish to insert the same data into the 'users' table of the other databases. How shall I do this in PostgreSQL? Similarly, I wish to do the same for all actions like UPDATE, DELETE, etc.
I had gone through the "logical replication" concept as suggested by you. In my case I know the source db name up front and I will come to know the target db name as part of the query. So it is going to be dynamic.
How do I achieve this? Is there any database concept available in PostgreSQL for it? I also welcome all other possible approaches. Please share some ideas on this.
If this is all on the same Postgres instance (aka "cluster"), then I would recommend using foreign tables to access the tables from the "main" database in the other databases.
Those foreign tables look like "local" tables inside each database, but access the original data in the source database directly, so there is no need to synchronize anything.
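A minimal sketch of that setup, run inside one of the other databases; the server, user, and table names here are placeholders:

-- postgres_fdw ships with PostgreSQL as a contrib extension
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Point a foreign server at the 'main' database on the same instance
CREATE SERVER main_srv
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'localhost', dbname 'main');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER main_srv
  OPTIONS (user 'app_user', password 'secret');

-- Expose main's users table locally as a foreign table
IMPORT FOREIGN SCHEMA public LIMIT TO (users)
  FROM SERVER main_srv INTO public;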
Upgrade to a recent PostgreSQL release and use logical replication.
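A sketch of what logical replication looks like for two databases in the same cluster; the names are placeholders, and note that when the subscriber connects to its own cluster the slot has to be created separately (per the CREATE SUBSCRIPTION docs), hence create_slot = false:

-- In the source database ('main'):
CREATE PUBLICATION users_pub FOR TABLE users;
SELECT pg_create_logical_replication_slot('users_sub', 'pgoutput');

-- In the target database (the users table must already exist with a matching definition):
CREATE SUBSCRIPTION users_sub
  CONNECTION 'host=localhost dbname=main user=replicator'
  PUBLICATION users_pub
  WITH (create_slot = false, slot_name = 'users_sub');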
Add a trigger on the table in the master database that uses dblink to access and write the other databases.
Be sure to consider what should be done if the row already exists remotely, or if the remote server is unreachable.
Also note that updates propagated using dblink are not rolled back if the invoking transaction is rolled back.
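A rough sketch of that trigger approach; the connection string, table, and column names are placeholders, and handling for the caveats above is left out:

CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION replicate_users_insert() RETURNS trigger AS $$
BEGIN
  -- Push the new row into the other database; %L quotes the values safely
  PERFORM dblink_exec(
    'dbname=other_db',
    format('INSERT INTO users (id, name) VALUES (%L, %L)', NEW.id, NEW.name)
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- On PostgreSQL versions before 11, use EXECUTE PROCEDURE instead of EXECUTE FUNCTION
CREATE TRIGGER users_replicate_insert
AFTER INSERT ON users
FOR EACH ROW EXECUTE FUNCTION replicate_users_insert();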