How does AWS postgres RDS read replication handle schema switching? - postgresql

I want to know how AWS Postgres RDS handles replication when I rename schemas to "swap" them within the read/write instance of the database.
Does it replicate this action to the read replicas by forwarding the "alter schema" rename commands I gave to my read/write instance? Or, after my renames, does it see wholly different sets of data in the schemas and copy each of them out to the read replicas from scratch?
For example...
In my RDS instance I have a read/write instance of "my_mega_database" which I want to create read-replicas of for my applications to connect to.
Typically, in "my_mega_database" there are two schemas "my_data" and "my_data_old", whereby "my_data" contains data that was delivered last night, and "my_data_old" contains data from the previous night. Each contains many tables and huge amounts of data.
If I were to do the following...
ALTER SCHEMA my_data_old RENAME TO my_data_tmp;
ALTER SCHEMA my_data RENAME TO my_data_old;
ALTER SCHEMA my_data_tmp RENAME TO my_data;
... I have effectively swapped these around.
My expectation is that these actions are replicated via the Postgres WAL (i.e. the rename commands are sent out to the replicas) and that AWS RDS replication won't waste time copying huge amounts of data all over the place.
Is this correct?

(Speaking about PostgreSQL here, but RDS is probably similar.)
Renaming a schema (or any other object) is a small update in a catalog table, and no data are moved. Internally PostgreSQL uses only the numeric object ID, which stays the same.
You might wrap the three statements in a transaction to make the whole magic atomic.
The same is true on the standby: it is a trivial (meta)data modification.
The only thing that might be a problem is concurrent sessions holding locks.
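For illustration, a minimal sketch of the swap wrapped in a single transaction, using the schema names from the question; the catalog query afterwards merely shows that the schema OIDs stay the same, which is why no data moves:

BEGIN;
ALTER SCHEMA my_data_old RENAME TO my_data_tmp;
ALTER SCHEMA my_data RENAME TO my_data_old;
ALTER SCHEMA my_data_tmp RENAME TO my_data;
COMMIT;

-- the OIDs are unchanged after the renames
SELECT oid, nspname
FROM pg_catalog.pg_namespace
WHERE nspname IN ('my_data', 'my_data_old');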

Related

How to disconnect and then reconnect a Postgres tablespace?

Is it possible to disconnect and reconnect a Postgres tablespace and all the associated objects within that tablespace?
I have a Postgres database with two tablespaces, one on a high-speed SSD drive (I've named this FASTSPACE), and the other on a slower, traditional magnetic HDD (named SLOWSPACE). The slower tablespace is reserved for large volumes of historic data which is rarely accessed.
Is it possible to temporarily disconnect SLOWSPACE, with the intention of reconnecting it at a later date? According to the DROP TABLESPACE documentation, a tablespace can only be dropped once all database objects within it have been dropped.
I'm aware that I can back up all the tables in SLOWSPACE, then delete them, and then DROP the tablespace, but this will take time (there are several terabytes of data). If I then need the archived data again I'll have to create a new version of the SLOWSPACE tablespace from scratch, then re-create all the objects from the backups. Again, this will take time.
Is there any way of temporarily disconnecting SLOWSPACE from the database - whilst still leaving the rest of the database up and running?
Update - happy to accept Frank Heikens' two-letter answer - 'no'

Amazon RDS Postgresql snapshot preserves schema but loses all data

Using the AWS RDS console I created a snapshot backup of a PostgreSQL v11 database containing multiple schemas. I then created a new instance from the backup. The process seemed to work fine without error. However, upon inspection of the data in the new instance, I noticed that the data in one of my schemas was not preserved. The schema structure, tables, indexes, constraints, etc. looked fine, but every table was empty (select count(*) from schema.table was 0 for every table in the schema). All other schemas looked fine and contained the expected data.
I looked everywhere (could not find help for this online) and tried many tests myself (changing roles, rebuilding the schema, privileges, much more) while attempting to solve this issue. What would cause my snapshots to preserve the entire schema structure, but lose all of the data itself?
I finally realized that the only difference between the problem schema and the others was that all tables in the problem schema had been created with the UNLOGGED keyword. This was done to increase write speed for the millions of rows inserted when the schema was first built. However, when a snapshot is created/restored as described above, the process depends on the WAL, which is only written for normal (logged) tables, to restore the data.
To fix my problem I simply altered all of the tables to be logged (alter table schema.table set logged). After this, snapshots worked fine.
For anyone else doing something similar in the future: if unlogged tables are needed for the initial mass population of data to get better write speed, it is a good idea to change them to logged after the initial data load (if you plan on using snapshots, replication or anything similar). Side note: pg_dump/pg_restore does still work for unlogged tables.
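As an illustration, a sketch of how one might find the unlogged tables in a schema and switch them to logged; the schema and table names (my_schema, my_table) are placeholders:

-- list unlogged ordinary tables in a given schema
SELECT c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'my_schema'
  AND c.relkind = 'r'
  AND c.relpersistence = 'u';

-- switch a table back to logged so it is written to the WAL again
ALTER TABLE my_schema.my_table SET LOGGED;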

Insert data into remote DB tables from multiple databases through trigger or replication or foreign data wrapper

I need some advice about the following scenario.
I have multiple embedded systems supporting PostgreSQL database running at different places and we have a server running on CentOS at our premises.
Each system runs at a remote location and has multiple tables inside its database. These tables have the same names as the corresponding tables on the server, but each system's table names are different from the other systems', e.g.:
system 1 has tables:
sys1_table1
sys1_table2
system 2 has tables:
sys2_table1
sys2_table2
I want to update the tables sys1_table1, sys1_table2, sys2_table1 and sys2_table2 on the server on every insert done on system 1 and system 2.
One solution is to write a trigger on each table, which will run on every insert into both systems' tables and insert the same data into the server's tables. This trigger would also delete the records from the systems after inserting the data into the server. The problem with this solution is that if the connection to the server cannot be established due to a network issue, then the trigger will not execute or the insert will be wasted. I have checked the following solution for this:
Trigger to insert rows in remote database after deletion
The second solution is to replicate the tables from system 1 and system 2 to the server's tables. The problem with replication is that if we delete data from the systems, it will also delete the records on the server. I could add an alternative trigger on the server's tables which updates a duplicate table, so that the replicated table can become empty without affecting the data, but that would make for a very long list of tables if we have more than 200 systems.
The third solution is to create foreign tables using postgres_fdw or dblink and update the data inside the server's tables, but won't this affect the data inside the server when we delete the data inside the system's tables? And what will happen if there is no connectivity with the server?
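For reference, a minimal sketch of what the postgres_fdw variant could look like on system 1; the host, credentials and column list are placeholders:

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER central_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'server.example.com', dbname 'central_db', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER central_server
    OPTIONS (user 'replicator', password 'secret');

-- foreign table on system 1 that points at sys1_table1 on the server
CREATE FOREIGN TABLE server_sys1_table1 (
    id   integer,
    data text
)
SERVER central_server
OPTIONS (schema_name 'public', table_name 'sys1_table1');

-- pushing local rows to the server is then an ordinary INSERT ... SELECT
INSERT INTO server_sys1_table1 SELECT * FROM sys1_table1;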
The fourth solution is to write an application in Python inside each system which makes a connection to the server's database and writes the data in real time; if there is no connectivity to the server, it stores the data inside sys1.table1 or sys2.table2 or whatever table the data belongs to, and after reconnecting, the code sends the table data to the server's tables.
Which option is best for this scenario? I like the trigger solution best, but is there any way to avoid data loss in case of disconnection from the server?
I'd go with the fourth solution, or perhaps with the third, as long as it is triggered from outside the database. That way you can easily survive connection loss.
The first solution with triggers has the problems you already detected. It is also a bad idea to start potentially long operations, like data replication across a network of uncertain quality, inside a database transaction. Long transactions mean long locks and inefficient autovacuum.
The second solution may actually also be an option if you have a recent PostgreSQL version that supports logical replication. You can use a publication WITH (publish = 'insert,update'), so that DELETE and TRUNCATE are not replicated. Replication can deal well with lost connectivity (for a while), but it is not an option if you want the data at the source to be deleted after they have been replicated.
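A minimal sketch of such a publication/subscription pair; the connection string and names are placeholders, and the publisher needs wal_level = logical:

-- on system 1 (the publisher): replicate only inserts and updates
CREATE PUBLICATION sys1_pub
    FOR TABLE sys1_table1, sys1_table2
    WITH (publish = 'insert, update');

-- on the server (the subscriber)
CREATE SUBSCRIPTION sys1_sub
    CONNECTION 'host=system1.example.com dbname=sysdb user=replicator password=secret'
    PUBLICATION sys1_pub;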

Replicating data between Postgres DBs

I have a Postgres DB that is used by a chat application. The chat system often truncates these tables when they grow too big, but I need this data copied to another Postgres database. I will not be truncating the tables in that DB.
How can I configure a few tables on the chat system's database to replicate data to another Postgres database? Is there a quick way to accomplish this?
Slony can replicate only select tables, but I'm not sure how it handles truncates, and it can be a pain to configure.
You might also use something like pgpool to send copies of the insert statements to a second database.
You might modify the source of your chat application to do two writes (one to each db) when a new record is created.
You could just write a script in Perl/PHP/Python to read from one and write to another, then fire it by cron so that you're sure it gets run before truncation.
If you only copy a batch of rows every other day, you may be better off with a plain INSERT into a different schema in the same database or into a different database in the same database cluster (you need something like dblink for the latter).
The safest / fastest solution in the same database would be a data-modifying CTE. Something along these lines:
WITH del AS (
    DELETE FROM tbl
    WHERE <some condition>
    RETURNING *
)
INSERT INTO backup.tbl
SELECT * FROM del;
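If the target is a different database (in the same cluster or elsewhere), a rough sketch of pulling the rows over with dblink, run from the target database; the connection string and column list are placeholders:

CREATE EXTENSION IF NOT EXISTS dblink;

-- pull rows from the chat database and insert them locally
INSERT INTO backup.tbl
SELECT *
FROM dblink('dbname=chat_db host=localhost',
            'SELECT id, msg, created_at FROM tbl')
     AS t(id integer, msg text, created_at timestamptz);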
For true replication consider these official sources:
https://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling
https://www.postgresql.org/docs/current/runtime-config-replication.html

Moving Postgres tablespaces and tables across EC2 instances

I have a Postgres database running on an Amazon EC2 instance. I have a few tablespaces created for some monthly tables, such that each table is on its own tablespace. To get the maximum performance, I have created each tablespace on an individual Amazon EBS volume.
I want to move some of these tables to a different instance and database. I will explain it with an example.
Let's say:
I have EC2 instance A with Postgres set up as explained above.
I have another Amazon instance B running and I have installed Postgres on it as well.
I want to create on B the same table structure as some of the tables present on A. I want to detach the volumes from instance A and attach them to instance B.
Also, I want to create tablespaces on instance B which point to the newly attached volumes.
And when I start up this newly created Postgres, I expect to see the tables populated with data from those volumes (database).
Finally, I will delete those tables from A.
I know my writing is rusty, but I couldn't find a better way to ask the question.
Is something along these lines possible? Are there any pointers for achieving something like this?
No.
The data in the tablespace directory is only the data. You also need the metadata that's in the tables in the pg_catalog schema, as well as the information from pg_clog and pg_xlog to access it.
If you want to move things across using volumes, you must move the entire installation at once (all the tablespaces, including pg_default). Otherwise, you need to use pg_dump/pg_restore to transfer the data over.