I have data in two PostgreSQL databases that needs to be merged into one. Just to be clear, both databases have "good" data in them from a certain date that needs to be combined; this isn't merely appending the data from one into the other. In other words, let's say that table foo has a serial id field. Both databases have a foo row with ID=5555, and both values are valid (but different). So the target database's foo keeps 5555, and the other record should get added with a new ID of nextval('foo_id_seq').
So, it's a big mess.
My thought is to create a tmp schema in the target db and to copy the needed data into it from the source db. Then I need to essentially "upsert" the data: new records get inserted with new ids (and foreign keys updated), and records that exist in both dbs get updated.
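For the copy step, one option (just a sketch on my part, assuming the dblink extension is available; the connection string and column list here are made up) would be something like:

CREATE SCHEMA tmp;
CREATE EXTENSION IF NOT EXISTS dblink;

-- Stage the source db's foo in the tmp schema of the target db.
CREATE TABLE tmp.foo AS
SELECT *
FROM dblink('dbname=source_db',
            'SELECT id, f1, f2 FROM foo')
     AS t(id integer, f1 text, f2 text);

A plain pg_dump/pg_restore into the tmp schema would work just as well.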
I don't believe there is a tool that will help me with this.
My question:
How best to handle generating the new id? I know I could do it via SELECTs that just leave out the id column, but that's a lot of typing and would be slow. My thinking is to create a temporary trigger for these tables that will override the id supplied when doing an insert.
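Something like this is what I have in mind (a sketch only, reusing table foo and its sequence foo_id_seq from the example above):

-- Temporary trigger that ignores any id supplied by the INSERT
-- and always draws a fresh value from the sequence.
CREATE OR REPLACE FUNCTION foo_force_new_id() RETURNS trigger AS $$
BEGIN
    NEW.id := nextval('foo_id_seq');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_new_id
    BEFORE INSERT ON foo
    FOR EACH ROW
    EXECUTE PROCEDURE foo_force_new_id();

-- When the merge is done:
-- DROP TRIGGER foo_new_id ON foo;
-- DROP FUNCTION foo_force_new_id();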
Final notes:
Both databases are offline, and I'm the only one who can get to them.
Both databases have the exact same schema.
The target database is 9.2.
Try using something like:
INSERT INTO A(id, f1, f2)
SELECT nextval('A_seq'), tmp_A.f1, tmp_A.f2
FROM tmp_A
WHERE tmp_A.id IN (select A.id FROM A);
INSERT INTO A(id, f1, f2)
SELECT tmp_A.id, tmp_A.f1, tmp_A.f2
FROM tmp_A
WHERE tmp_A.id NOT IN (select A.id FROM A);
The idea: use one INSERT .. SELECT .. to insert the rows whose ids conflict (generating new ids for them), and another INSERT .. SELECT .. to insert the rows whose ids don't conflict.
Or simply generate a new id for every inserted record:
INSERT INTO A(id, f1, f2)
SELECT nextval('A_seq'), tmp_A.f1, tmp_A.f2
FROM tmp_A;
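If you also need to repoint foreign keys at the rows that received new ids, one way (a sketch; the child table bar and its staged copy tmp_bar are hypothetical) is to generate the new ids into a mapping table first. Bumping the sequence past both databases' maximum ids beforehand guarantees that nextval() cannot collide with an id you still intend to keep:

-- Make sure freshly generated ids cannot collide with ids from either db.
SELECT setval('A_seq', GREATEST((SELECT max(id) FROM A),
                                (SELECT max(id) FROM tmp_A)));

-- Record old -> new id pairs for the conflicting rows.
CREATE TEMP TABLE id_map AS
SELECT tmp_A.id AS old_id, nextval('A_seq') AS new_id
FROM tmp_A
WHERE tmp_A.id IN (SELECT A.id FROM A);

-- Insert the conflicting rows under their new ids.
INSERT INTO A (id, f1, f2)
SELECT id_map.new_id, tmp_A.f1, tmp_A.f2
FROM tmp_A
JOIN id_map ON id_map.old_id = tmp_A.id;

-- Repoint the staged child rows before inserting them into the target.
UPDATE tmp_bar
SET a_id = id_map.new_id
FROM id_map
WHERE tmp_bar.a_id = id_map.old_id;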
I will provide a simplified example of my problem.
I have two tables: reviews and users.
reviews is updated with a bunch of reviews that users post. The process that fetches the reviews also returns information for the user that submitted it (and certain user data changes frequently).
I want to update users whenever I update reviews, in bulk, using COPY. The issue arises for users when the fetched data contains two or more reviews from the same user. If I do a simple INSERT ON CONFLICT, I might end up with errors, since an INSERT ... ON CONFLICT statement cannot update the same row twice.
A SELECT DISTINCT would solve that problem, but I also want to guarantee that I insert the latest data into the users table. This is how I am doing it. Keep in mind I am doing this in bulk:
1. Create a temporary table so that we can COPY to/from it.
CREATE TEMPORARY TABLE users_temp (
id uuid,
stat_1 integer,
stat_2 integer,
account_age_in_mins integer);
2. COPY data into temporary table
COPY users_temp (
id,
stat_1,
stat_2,
account_age_in_mins) FROM STDIN CSV ENCODING 'utf-8';
3. Lock users table and perform INSERT ON CONFLICT
LOCK TABLE users in EXCLUSIVE MODE;
INSERT INTO users SELECT DISTINCT ON (1)
users_temp.id,
users_temp.stat_1,
users_temp.stat_2,
users_temp.account_age_in_mins
FROM users_temp
ORDER BY 1, 4 DESC, 2, 3
ON CONFLICT (id) DO UPDATE
SET
stat_1 = EXCLUDED.stat_1,
stat_2 = EXCLUDED.stat_2,
account_age_in_mins = EXCLUDED.account_age_in_mins;
The reason I am doing a SELECT DISTINCT and an ORDER BY in step 3 is that I only want to return one instance of each duplicated row and, from those duplicates, make sure that I get the most up-to-date record by sorting on account_age_in_mins.
Is this the correct method to achieve my goal?
This is a very good approach.
Maybe you can avoid the table lock if you lock only the tuples you have in your temporary table.
https://dba.stackexchange.com/questions/106121/locking-in-postgres-for-update-insert-combination
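A sketch of that idea, reusing the tables from the question (this replaces the LOCK TABLE step; whether it is sufficient depends on your concurrent writers - see the link above):

BEGIN;

-- Lock only the existing rows the upsert will touch, rather than
-- taking EXCLUSIVE MODE on the whole users table.
SELECT id
FROM users
WHERE id IN (SELECT id FROM users_temp)
FOR UPDATE;

-- ...then run the same INSERT ... ON CONFLICT statement as above...

COMMIT;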
I am trying to write an insert query into a backup database. I am writing the place and entities tables into this database. The issue is that entities is linked to place via the place.id column. I added a column place.original_id in the place table to store its original id. So now that I have entered place into the new database its id column has changed, but I have the original id stored, so I can still link the entities table to it. I am trying to figure out how to write entities so that it gets the new id.
So far I am at this point:
insert into entities_backup (id, place_id)
select
nextval('public.entities_backup_id_seq'),
(select id from places where original_id = (select place_id from entities) as place_id
from
entities
I know I am missing something because this does not work. I need to grab the id column from places where entities.place_id = places.original_id. Any help would be great.
I think this is what you want:
insert into entities_backup (id, place_id)
select nextval('public.entities_backup_id_seq'), places.id
from entities
join places on places.original_id = entities.place_id;
It would be simpler to not have this problem in the first place.
Rather than trying to fix this up after the fact, the better solution is to dump and load places and entities complete with their primary and foreign keys intact. Oracle's EXPORT or a utility such as ora2pg should be able to do it.
Sorry I can't say more. I know Postgres, not Oracle.
All,
I am trying to bulk insert some data into a table using the COPY command, and I can't seem to get around the unique key error. Here's my workflow.
Create a dump of the data I want to move to another server
COPY (
SELECT *
FROM mytable
WHERE created_at >= '2012-10-01')
TO 'D:\tmp\file.txt'
Create a new "temp" table in the target DB then COPY the data like so.
COPY temp FROM 'D:\tmp\file.txt'
I now want to move the data from the "temp" table into the master table in the target DB like so.
INSERT INTO master SELECT * FROM temp
WHERE id NOT IN (SELECT id FROM master)
This runs fine, but nothing gets inserted and no fields are updated. Does anyone have a clue what might be going on here? The schemas for temp and master are identical. Any help on this matter would be great! I am using PostgreSQL 9.2.
Adam
This can happen if there's a null value in the IN list.
In SQL, a comparison against null never evaluates to true (you need the special IS NULL test to get a match). This has the unfortunate consequence that the NOT IN test matches nothing at all if any null values are returned from SELECT id FROM master.
See if there are any rows returned from this query:
SELECT id
FROM master
WHERE id is null;
If not, then this isn't your problem.
If there are values, then the fix is to exclude null ids from the list:
INSERT INTO master
SELECT *
FROM temp
WHERE id NOT IN (SELECT id FROM master where id is not null)
The other thing to consider is that there may simply be no rows in temp that aren't already in master!
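As an aside, a null-safe way to write this (my suggestion, not part of the original workflow) is NOT EXISTS, which sidesteps the null pitfall entirely:

-- NOT EXISTS is unaffected by null ids in master.
INSERT INTO master
SELECT *
FROM temp
WHERE NOT EXISTS (
    SELECT 1 FROM master WHERE master.id = temp.id
);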
I have two tables, Contacts and Contacts_Detail. I am importing records into the Contacts table and need to run a SP to create a record in the Contacts_Detail table for each new record in the Contacts. There is an ID in the Contacts table and a matching ID_D in the Contacts_Detail table.
I'm using this to insert the record into Contacts_Detail but get the 'Subquery returned more than 1 value.' error, and I can't figure out why. There are multiple records in Contacts that need to have matching records in Contacts_Detail.
Insert into Contacts_Detail (ID_D)
select c.id
from Contacts c
left join Contacts_Detail cd on c.id = cd.id_d
where cd.id_d is null
I'm open to a better way...
thanks.
It sounds like you're inserting blank child-records into your Contacts_Detail table -- so the first question I'd ask is: Why?
As for why your specific SQL isn't working...
A few things you can check:
Contacts table -- do you have any records there WHERE id is null? (Delete them -- then make the id field a primary key.)
Contacts_Detail table -- do you have any records there WHERE id_d is null? (Delete them -- then go into your designer and create a relationship / enforce referential integrity.)
Verify that c.id is the primary key, and cd.id_d is the correct foreign key used to relate the tables.
Hope that helps
Why not just have a trigger? This seems a little simpler than having to determine for all time which rows are missing - that seems more like something you would do periodically to correct for some anomalies, not something you should have to do after every insert. Something like this should work:
CREATE TRIGGER dbo.NewContacts
ON dbo.Contacts
FOR INSERT
AS
BEGIN
INSERT dbo.Contacts_Detail(ID_D) SELECT ID FROM inserted;
END
GO
But I suspect you have a trigger on the Contacts_Detail table that is not written to correctly handle multi-row inserts, and that's where your subquery error is coming from. Can you show the trigger on Contacts_Detail?
I am trying to insert values from one table into another, across two different databases.
My issue is that I have two tables with a relation, and the first table also has an identity column.
e.g. table first(id, name) and table second(id, address)
Both tables exist with values in one db, and I am trying to copy those values from this db to another db.
When I insert the values from the first db into the second db, the first table will generate values for its id column by itself, so now I have to link that new id to the second table.
How can I do that?
UPDATE: using MS SQL Server 2000
You can use SCOPE_IDENTITY() immediately after your insert in SQL Server 2000, which will give you the last id generated within the current scope, but I'm not sure how that would work with bulk inserting of data.
http://msdn.microsoft.com/en-us/library/ms190315.aspx
If this were SQL Server 2005 or later I would suggest using the output clause in your insert statement to retrieve the ids just inserted, but that was not available in SQL Server 2000.
If your data contains some column or series of columns which is unique other than the identity column, then you can query your first table based on that series of columns to get the ids and use that to populate your second table.
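A sketch of that, assuming Name is unique in first and that the copied-over source rows are reachable as first_source and second_source (those staging names are my invention):

-- Look up each row's new identity value via the unique Name column.
INSERT INTO second (id, address)
SELECT f.id, ss.address
FROM second_source ss
INNER JOIN first_source fs ON fs.id = ss.id
INNER JOIN first f ON f.Name = fs.Name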
If the target tables were empty, you could use SET IDENTITY_INSERT ON - this would allow inserting the original values into the identity columns, and you would not have to update the referenced ids. Of course, if any existing ids could overlap the inserted ids, that is not the solution.
If names in the first table are unique, you could build a mapping between new and old ids and perform an update something like this:
UPDATE S
SET S.id = F.id
FROM second S
INNER JOIN first_original FO ON FO.id = S.id
INNER JOIN first F ON F.name = FO.name
If names are not unique, then the original ids should be saved in "first" in order to provide a mapping between old and new ids. That can be a temporary new column, deleted after the ids in "second" have been updated.
Or, as Rich Andrews said, you could use SCOPE_IDENTITY(), but in this case you will have to perform the inserts one by one - declare a cursor on the source table, insert each record, get its new id, and insert it into the "second" table.
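A sketch of that cursor approach in SQL Server 2000, again using the hypothetical staging tables first_source and second_source:

DECLARE @old_id int, @name varchar(100), @new_id int

DECLARE c CURSOR FOR SELECT id, name FROM first_source
OPEN c
FETCH NEXT FROM c INTO @old_id, @name
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Insert the parent row; the identity column generates the new id.
    INSERT INTO first (name) VALUES (@name)
    SET @new_id = SCOPE_IDENTITY()

    -- Carry the new id over to the matching child rows.
    INSERT INTO second (id, address)
    SELECT @new_id, address FROM second_source WHERE id = @old_id

    FETCH NEXT FROM c INTO @old_id, @name
END
CLOSE c
DEALLOCATE c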