Is it possible to merge two Postgres databases

We have two copies of a simple application that is based on SQLite. The application has 10 tables with a variety of relations between them. We would like to merge the two databases into a single Postgres database with the same schema. We can use Talend to facilitate this; however, because the two source databases are independent, there would be duplicate keys. Is there a systematic method by which we can insert data into Postgres using the original key plus an offset resulting from loading the first database?

Step 1. Restore the first database.
Step 2. Change foreign keys of all tables by adding the option on update cascade.
For example, if the column table_b.a_id refers to the column table_a.id:
alter table table_b
drop constraint table_b_a_id_fkey,
add constraint table_b_a_id_fkey
foreign key (a_id) references table_a(id)
on update cascade;
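Step 3. Update the primary keys of the tables by adding the desired offset (choose one larger than the highest key in the second database, so the two key ranges cannot overlap), e.g.:
update table_a
set id = 10000 + id;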
Step 4. Restore the second database.
If you can edit the schema script (or do the transfer manually with your own script), you can merge steps 1 and 2 by editing the script before the restore, adding the option on update cascade to the foreign keys in the table declarations.
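The steps above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only so the example runs without a server); the tables table_a and table_b follow the answer, while the sample rows and the OFFSET constant are hypothetical.

```python
import sqlite3

OFFSET = 10000  # assumption: larger than any key in the second database

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_b (
        id   INTEGER PRIMARY KEY,
        a_id INTEGER REFERENCES table_a(id) ON UPDATE CASCADE
    );
    -- Steps 1 and 2: rows standing in for the restored first database
    INSERT INTO table_a VALUES (1, 'first db'), (2, 'first db');
    INSERT INTO table_b VALUES (1, 1), (2, 2);
""")

# Step 3: shift the primary keys; the cascade rewrites table_b.a_id for us
conn.execute("UPDATE table_a SET id = id + ?", (OFFSET,))
conn.execute("UPDATE table_b SET id = id + ?", (OFFSET,))

# Step 4: load the second database with its original keys
conn.execute("INSERT INTO table_a VALUES (1, 'second db')")
conn.execute("INSERT INTO table_b VALUES (1, 1)")
conn.commit()

rows_a = conn.execute("SELECT id FROM table_a ORDER BY id").fetchall()
rows_b = conn.execute("SELECT id, a_id FROM table_b ORDER BY id").fetchall()
print(rows_a)
print(rows_b)
```

Every table needs its own key shift, but the foreign key columns never have to be updated by hand, which is the point of adding on update cascade first.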

Related

How to migrate a PostgreSQL table to partition table referenced by foreign keys

How would you migrate a table, referenced by a foreign key, to a partition table in PostgreSQL?
If I'm reading the docs correctly, version 13 now supports partitioned tables referenced by foreign keys, whereas the version 11 docs explicitly stated that such foreign key references were not supported.
However, given previous solutions on how to migrate to a partition table, it's unclear how foreign key references would be updated.
For example, say I have two tables, Library and Book, where Book has a foreign key column called library_id pointing to a record in Library.
I haven't tested this, but what would be the caveat behind this strategy:
1. Rename table `Library` to `Library_old`.
2. Create partition table called `Library`.
3. Create child tables or use rule to automatically create child tables as needed.
4. Insert all data from `Library_old` into `Library`.
5. Rename column `Book.library_id` (currently pointing to `Library_old`) to `Book.library_id_old`.
6. Create a new column `Book.library_id` pointing to `Library`.
7. Iterate over each `library_id_old` value and update to `library_id`.
8. Delete column `Book.library_id_old`.
9. Delete `Library_old`.
Would this work, or am I missing anything? Is there a better migration plan?

How do I copy data from one table to another, perform schema change and keep them in sync until cut off in Postgres?

I have workloads that have heavy schema changes and other ETL operations that are locking.
Before doing schema changes on my primary table, I would like to first copy its existing contents to a temporary table, perform the schema change there, and then sync all new changes. Once the "time is right" (the cutoff), I would do the cut-over so that the temporary table becomes the primary table.
I know that I can use triggers in Postgres to sync data between two tables, and also use COPY to copy data from one table to another.
But I am not sure how I can copy the existing data first and then set up a trigger so that no data is lost, and then do the cut-over so that the new table becomes the primary.
What I am thinking is:
1. I issue a COPY from the primary table (TableA) to the temp table (TableB).
2. I then perform the schema change in TableB.
3. I then set up a trigger from TableA to TableB for INSERT/UPDATE/DELETE.
4. Now I am not sure how I can cut over so that TableB becomes TableA. Perhaps I can use RENAME?
It feels like I can run into some lost changes between Step 1 and Step 2?
Basically I am trying to ensure no data is lost between the three high-level operations. Is there a better way to do this?
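One possible ordering that closes the gap described above is to install the sync triggers before the bulk copy and make the copy skip rows the triggers already wrote. The sketch below shows that ordering with Python's built-in sqlite3 module standing in for Postgres (an assumption so that it runs anywhere; Postgres trigger syntax differs). The table names, the extra note column, and the trigger names are hypothetical, and the sketch assumes primary keys are never updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO table_a VALUES (1, 'one'), (2, 'two');

    -- New table that already carries the schema change (a hypothetical extra column)
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, val TEXT, note TEXT DEFAULT '');

    -- Install the sync triggers BEFORE the bulk copy so no write is missed
    CREATE TRIGGER sync_ins AFTER INSERT ON table_a BEGIN
        INSERT OR REPLACE INTO table_b (id, val) VALUES (NEW.id, NEW.val);
    END;
    CREATE TRIGGER sync_upd AFTER UPDATE ON table_a BEGIN
        INSERT OR REPLACE INTO table_b (id, val) VALUES (NEW.id, NEW.val);
    END;
    CREATE TRIGGER sync_del AFTER DELETE ON table_a BEGIN
        DELETE FROM table_b WHERE id = OLD.id;
    END;
""")

# A write that arrives after the triggers exist but before the backfill
conn.execute("INSERT INTO table_a VALUES (3, 'three')")

# Backfill: copy existing rows, skipping those the triggers already synced
conn.execute("INSERT OR IGNORE INTO table_b (id, val) SELECT id, val FROM table_a")

# Writes that arrive after the backfill keep flowing through the triggers
conn.execute("UPDATE table_a SET val = 'TWO' WHERE id = 2")
conn.execute("DELETE FROM table_a WHERE id = 1")
conn.commit()

# Cut-over: drop the triggers and swap the names in one transaction
conn.executescript("""
    BEGIN;
    DROP TRIGGER sync_ins;
    DROP TRIGGER sync_upd;
    DROP TRIGGER sync_del;
    ALTER TABLE table_a RENAME TO table_a_old;
    ALTER TABLE table_b RENAME TO table_a;
    COMMIT;
""")

rows = conn.execute("SELECT id, val FROM table_a ORDER BY id").fetchall()
print(rows)
```

Because the backfill is idempotent, it does not matter whether a given row reaches the new table through a trigger or through the copy. In Postgres the renames can likewise be wrapped in a single transaction, since DDL there is transactional.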

How to write another query in IN function when partitioning

I have 2 local docker postgresql-10.7 servers set up. On my hot instance, I have a huge table that I wanted to partition by date (I achieved that). The data from the partitioned table (Let's call it PART_TABLE) is stored on the other server, only PART_TABLE_2019 is stored on HOT instance. And here comes the problem. I don't know how to partition 2 other tables that have foreign keys from PART_TABLE, based on FK. PART_TABLE and TABLE2_PART are both stored on HOT instance.
I was thinking something like this:
create table TABLE2_PART_2019 partition of TABLE2_PART for values in (select uuid from PART_TABLE_2019);
But the query doesn't work and I don't know if this is a good idea (performance wise and logically).
Let me just mention that I can solve this with either function or script etc. but I would like to do this without scripting.

From the docs at https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE:
"While primary keys are supported on partitioned tables, foreign keys referencing partitioned tables are not supported. (Foreign key references from a partitioned table to some other table are supported.)"

With PostgreSQL v10, you cannot define a foreign key on the partitioned table itself, only on the individual partitions, so you would have to create the foreign key on each partition.
Alternatively, you could upgrade to PostgreSQL v11, which allows foreign keys to be defined on partitioned tables.
Can you explain what a HOT instance is and why it would make this difficult?

How to nuke a postgres schema?

I'm coming up on the limits of my Postgres SQL knowledge, and I'm quite unsure how to diagnose this issue. Please pardon the noob-ness in my questions; I'm open to updating the question as the (expected) follow-up questions come.
I have a fairly complex database structure, in which under a schema, a number of tables are connected to one another by foreign keys. I unfortunately cannot reveal the schema itself.
One of the tables, let's call it "A", used to store close to 100K records. It's got foreign key relationships to two other tables, one called "B" with also approx. 100K records, and the other called "C" with approx. 100 records. There are 5 more tables as well.
I wanted to drop all of the tables. However, using:
truncate table schema.A cascade
takes a very long time (over 10 minutes without finishing), even though I have already removed all rows from the table (yes, I understand truncate is designed to do that exact operation). This is the first point that I don't understand: why would it take a long time to perform this operation?
Secondly, I tried:
drop table schema.A;
(using Postico, a GUI, rather than by entering SQL commands directly)
That also runs for over 10 minutes without finishing.
Are the foreign key relations the key blocker here?
If I wanted to "just quickly nuke" the schema, and start over from scratch (all of my table schemas are defined in a SQLAlchemy file, so recreating is trivial), would I have to drop the entire schema using admin privileges, or is it possible to do it as a user without admin privileges?

If you want to drop the schema:
DROP SCHEMA schema_name CASCADE;
For the default schema:
DROP SCHEMA public CASCADE;
To quickly reset a single-schema database:
DROP SCHEMA public CASCADE;
CREATE SCHEMA public;

Db2 zos update set of values in tables

There is a set of values to update. Example: table t1 has a column c1 whose value has to be updated from 1 to x. There are around 300 such old/new pairs in a file, and around 15 such tables with over 100k records each.
What is the optimal way of doing this?
The approaches I can think of are:
an individual update statement per old/new value pair in every table
programmatically reading the file and creating dynamic update statements
using the merge into table syntax
In one of the tables the column is the primary key, and other tables reference it as a foreign key.
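The second approach (reading the file and building the updates programmatically) could look roughly like the sketch below. It stages the old/new pairs in a mapping table and issues one UPDATE per target table. Python's built-in sqlite3 module stands in for Db2 here (an assumption so that the example runs; Db2 for z/OS syntax and tooling will differ), and the file contents, the value_map table, and the TABLES dictionary are all hypothetical. The primary key / foreign key case mentioned above is not handled.

```python
import csv
import io
import sqlite3

# Stand-in for the real file: one "old,new" pair per line (hypothetical data)
mapping_file = io.StringIO("1,101\n2,102\n")
pairs = [(int(old), int(new)) for old, new in csv.reader(mapping_file)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c1 INTEGER, payload TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- Staging table holding the old -> new pairs
    CREATE TEMP TABLE value_map (old_value INTEGER PRIMARY KEY, new_value INTEGER);
""")
conn.executemany("INSERT INTO value_map VALUES (?, ?)", pairs)

# Hypothetical list of target tables and the column to rewrite in each
TABLES = {"t1": "c1"}

for table, column in TABLES.items():
    # One set-based UPDATE per table; rows without a mapping are left untouched
    conn.execute(
        f"UPDATE {table} SET {column} = "
        f"(SELECT new_value FROM value_map WHERE old_value = {table}.{column}) "
        f"WHERE {column} IN (SELECT old_value FROM value_map)"
    )
conn.commit()

result = conn.execute("SELECT c1, payload FROM t1 ORDER BY payload").fetchall()
print(result)
```

Compared with 300 separate statements per table, a single set-based UPDATE reads each row's old value once, so chained mappings (for example 1 to 2 and 2 to 3) should not be applied twice to the same row.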
