how to use the snapshot from a postgres replication slot? - postgresql

I am using this package, which does CDC with a logical slot in Postgres, but I also want to take a snapshot of my database.
I also read this article, but I don't know how to get the rows from the snapshot.
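For reference, the pattern being asked about can be sketched as follows. This is a minimal, hedged sketch: `mydb`, `my_table`, and the snapshot name are placeholders, and it assumes a slot was just created on a separate replication connection, which exported the snapshot.

```shell
# Sketch: read the rows as of slot creation from the exported snapshot.
# The snapshot only exists while the creating replication session stays open.
psql -d mydb <<'SQL'
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Must be the first statement in the transaction, before any query:
SET TRANSACTION SNAPSHOT '00000003-000001AA-1';
SELECT * FROM my_table;   -- rows exactly as they were when the slot was made
COMMIT;
SQL
```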

Related

Replicate postgres database to Redshift - Ongoing replication

I have several Postgres databases which need to be replicated as-is to a single AWS Redshift.
We have currently set up DMS services for this. However, we keep encountering issues such as the source database filling up, large-column issues and, most importantly, the DMS issue when new columns with defaults are added on the Postgres databases (these do not replicate with ongoing replication).
So, are there any other ways that we can set up this ongoing replication?

Create Full AWS RDS Snapshot

Can I create a full database snapshot or backup (manual/automated) for Amazon RDS Postgres databases, rather than incremental ones? I want to create a job that produces full-database Parquet files every day and shares them with the data-warehousing team.
The document here says the following:
The first snapshot of a DB instance contains the data for the full DB instance. Subsequent snapshots of the same DB instance are incremental, which means that only the data that has changed after your most recent snapshot is saved.
The RDS snapshots do not create Parquet files, as they are based on EBS snapshots.
To get Parquet files, the best way would be to set up a Glue job which can run automatically on your schedule.
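Alternatively, RDS can export a completed snapshot to S3 in Parquet format via the snapshot-export API. A hedged sketch; every ARN, bucket, and identifier below is a placeholder, not something from the question:

```shell
# Sketch: export an RDS snapshot to S3 as Parquet (placeholder identifiers).
# Requires an IAM role with S3 write access and a KMS key for encryption.
aws rds start-export-task \
    --export-task-identifier my-daily-export \
    --source-arn arn:aws:rds:us-east-1:123456789012:snapshot:my-snapshot \
    --s3-bucket-name my-export-bucket \
    --iam-role-arn arn:aws:iam::123456789012:role/my-export-role \
    --kms-key-id arn:aws:kms:us-east-1:123456789012:key/my-key-id
```

The exported Parquet files could then feed the data-warehousing team directly, or serve as the source for the scheduled Glue job.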

postgres logical replication starting from a given LSN

Postgres logical replication's initial synchronization is a very slow process, especially if the original database is quite big.
I am wondering if it is possible to start replication from a given LSN?
The desired workflow would be:
1. obtain the current LSN from the source database
2. create a logical dump of the desired objects in the source database
3. restore the dump on the target database
4. start logical replication from the LSN acquired in step 1
I did not find any docs allowing step 4; does anybody know if it is possible?
The documentation gives you a hint:
When a new replication slot is created using the streaming replication interface (see CREATE_REPLICATION_SLOT), a snapshot is exported (see Section 9.27.5), which will show exactly the state of the database after which all changes will be included in the change stream. This can be used to create a new replica by using SET TRANSACTION SNAPSHOT to read the state of the database at the moment the slot was created. This transaction can then be used to dump the database's state at that point in time, which afterwards can be updated using the slot's contents without losing any changes.
So the steps would be:
Start a replication connection to the database:
psql "dbname=yourdatabasename replication=database"
Create a replication slot and copy the snapshot name from the output. It is important to keep the connection open until the next step has started; otherwise the snapshot will cease to exist.
CREATE_REPLICATION_SLOT slot_name LOGICAL pgoutput;
Dump the database using that snapshot with the following command. You can close the replication connection once the dump has started.
pg_dump --snapshot=snapshotname [...]
Restore the dump to the target database.
Start replication using the replication slot.
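The steps above can be sketched end to end. All names here are assumptions (source host `src`, target host `tgt`, database `mydb`, slot `my_slot`, publication `my_pub`), and the snapshot name shown is a placeholder copied from the `CREATE_REPLICATION_SLOT` output:

```shell
# On the source: create a publication for the tables to replicate.
psql -h src -d mydb -c "CREATE PUBLICATION my_pub FOR ALL TABLES;"

# Open a replication connection and create the slot. Note the snapshot_name
# column in the output (e.g. 00000003-000001AA-1). Keep this session open.
psql "host=src dbname=mydb replication=database" \
     -c "CREATE_REPLICATION_SLOT my_slot LOGICAL pgoutput;"

# In a second shell: dump the database as of that snapshot.
pg_dump -h src -d mydb --snapshot=00000003-000001AA-1 -Fc -f mydb.dump

# Restore on the target, then attach a subscription to the existing slot
# without copying initial data (the dump already contains it).
pg_restore -h tgt -d mydb mydb.dump
psql -h tgt -d mydb -c "CREATE SUBSCRIPTION my_sub \
  CONNECTION 'host=src dbname=mydb' PUBLICATION my_pub \
  WITH (create_slot = false, slot_name = 'my_slot', copy_data = false);"
```

`create_slot = false` and `copy_data = false` are what make the subscription resume from the slot's position instead of doing its own slow initial synchronization.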

CDC change data capture start time - Postgres replication

I'm using AWS DMS for a Postgres-to-Postgres migration. For ongoing replication with other engines there is a parameter, CDC start time, where we can specify the start time for picking up changes for replication, but unfortunately Postgres does not support that parameter.
By default, my assumption is that when you create the CDC task it uses the current time as the CDC start. But since Postgres does not have the ability to filter the logs by start time, I assume it starts from the beginning of the WAL. Is that right? My goal is, instead of using DMS FULL LOAD, to use only the CDC feature, but after the pg_dump is restored on the target, how would I make sure no records are missed by CDC?
Thank you!
When a DMS ongoing-replication task starts, it creates a replication slot. A replication slot cannot be created while there are any open transactions. The LSN captured by the slot will be the first LSN read by DMS.
Postgres as a source now also supports a custom CDC start position: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Task.CDC.html
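If you take the CDC-only route, one hedged sketch (host and database names are placeholders, and it assumes Postgres 10+ for `pg_current_wal_lsn`) is to record the source LSN immediately before starting the dump and supply it as the task's native CDC start point. Note that writes committed between reading the LSN and the dump's snapshot may be applied twice, so the exported-snapshot technique quoted from the documentation above is safer when exactness matters:

```shell
# Sketch: capture a native CDC start position (LSN) just before the dump.
psql -h src -d mydb -At -c "SELECT pg_current_wal_lsn();"
# Prints something like 0/1A2B3C4D; use it as the DMS task's native start point.

# Then take the dump to restore on the target before starting the CDC task.
pg_dump -h src -d mydb -Fc -f mydb.dump
```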

Migration from AWS Aurora to a local Postgres 9.6 database

I am considering using AWS Aurora; however, I am concerned about being locked into AWS indefinitely. So I am wondering how difficult it would be to transfer data from Aurora to my own Postgres database.
Thanks!
This is a very valid concern. Firstly, there is no seamless migration in this direction like there is from Postgres to Aurora. The following needs to be considered:
1. How to do it: you will have to take a dump of your Aurora DB and then import it into Postgres.
2. Because of 1, you cannot have concurrent CRUD operations running on your Aurora during the migration. Hence, you need to shut down all products connecting to your Aurora until you have migrated to Postgres, so there will be downtime.
3. Because of 2, and depending on the size of your DB, it might take a few minutes (a few GB of data) or many hours if you have a huge DB.
Hence, you need to consider how much data you have and how much downtime you can live with if you want to migrate back to Postgres.
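The dump-and-import step above can be sketched with the standard client tools. The endpoint, user, and database names below are placeholders, not real values:

```shell
# Sketch: dump from the Aurora cluster endpoint and restore into a local
# Postgres 9.6. Custom format (-Fc) allows pg_restore to parallelize (-j).
pg_dump -h my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com \
        -U myuser -d mydb -Fc -f mydb.dump

pg_restore -h localhost -U postgres -d mydb \
           --no-owner --no-privileges -j 4 mydb.dump
```

`--no-owner` and `--no-privileges` help because RDS/Aurora role and grant setups (e.g. the `rdsadmin` role) usually do not exist on a self-managed Postgres.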