Insert into table on conflict do update from csv - postgresql

I have a table with two columns: id, which is varchar, and data, which is jsonb. I also have a CSV file with new IDs that I would like to insert into the table. The data I would like to assign to these IDs is identical for all of them, and if an ID already exists I would like to replace its current data value with the new data. This is what I have done so far:
INSERT INTO "table" ("id", "data")
VALUES ('[IDs from CSV file]', ' {dataObject}')
ON CONFLICT (id) do UPDATE set data='{dataObject}';
I have got it working with a single ID, but I would now like to run this for every ID in my csv-file, hence the array in the example to illustrate this. Is there a way to do this using a query? I was thinking I could create a temporary table and import the IDs there, but I am still not sure how I would utilize that table with my query.

Yes, use a staging table to upload your CSV into, and make sure to truncate it before uploading. After uploading:
insert into prod_table
select * from csv_upload
on conflict (id) do update
set data = excluded.data;
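Since the CSV in the question holds only IDs and every ID gets the same JSON value, the staging approach above could look roughly like the following. This is a sketch, not a tested script: the staging table name csv_ids, the file path, and the JSON literal are placeholders, and it assumes "table".id has a primary key or unique constraint (ON CONFLICT (id) requires one).

```sql
-- Hypothetical staging table holding only the IDs from the CSV.
CREATE TEMP TABLE csv_ids (id varchar);

-- Server-side load; the path is a placeholder. From psql, \copy reads a
-- client-side file instead. Add HEADER true if the file has a header row.
COPY csv_ids (id) FROM '/path/to/ids.csv' WITH (FORMAT csv);

-- DISTINCT avoids "ON CONFLICT DO UPDATE command cannot affect row a second
-- time" if the CSV lists the same ID more than once.
INSERT INTO "table" (id, data)
SELECT DISTINCT id, '{"key": "value"}'::jsonb   -- placeholder for {dataObject}
FROM csv_ids
ON CONFLICT (id) DO UPDATE
SET data = EXCLUDED.data;
```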

Don't complicate the process unnecessarily.
Import csv to a temporary table T2
Update T1 where rows match in T2
Insert into T1 from T2 where rows do not match
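The two steps above can be sketched as plain UPDATE and INSERT statements. The names t1 and t2 follow the answer; the columns id and data are taken from the question, and t2 is assumed to already contain the imported CSV rows with both columns.

```sql
-- Step 1: update rows in t1 that have a match in t2.
UPDATE t1
SET data = t2.data
FROM t2
WHERE t1.id = t2.id;

-- Step 2: insert rows from t2 that have no match in t1.
INSERT INTO t1 (id, data)
SELECT t2.id, t2.data
FROM t2
WHERE NOT EXISTS (SELECT 1 FROM t1 WHERE t1.id = t2.id);
```

Unlike the single ON CONFLICT statement, this version does not need a unique constraint on id, but it is not safe against concurrent writers unless both statements run in one transaction with suitable locking.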

Related

Delete duplicates from a huge table in Postgresql

I have an unusual problem: I need to delete duplicate records from a table in PostgreSQL. Because the table has duplicate records, it has no primary key or unique index. The table contains about 20 million records. The query below is taking too long:
DELETE FROM temp a USING temp b WHERE a.recordid = b.recordid AND a.ctid < b.ctid;
So what would be a better approach for handling such a huge table with no index on it?
I'd appreciate any help.
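If you have enough free disk space, you can copy the table without its duplicates, then drop the old table and rename the new one, like this:
-- new_table must already exist with the same columns as old_table;
-- key_column stands for the column(s) that define a duplicate
INSERT INTO new_table
SELECT DISTINCT ON (key_column) *
FROM old_table
ORDER BY key_column ASC;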
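The full copy-and-swap sequence described above might look like this. It is a sketch under assumptions: old_table and new_table are the answer's placeholder names, recordid is the duplicate-defining column from the question, and the final ADD PRIMARY KEY only succeeds if recordid contains no NULLs.

```sql
BEGIN;

-- Empty copy of the table structure.
CREATE TABLE new_table (LIKE old_table INCLUDING DEFAULTS);

-- Keep one arbitrary row per recordid.
INSERT INTO new_table
SELECT DISTINCT ON (recordid) *
FROM old_table
ORDER BY recordid;

-- Swap the tables.
DROP TABLE old_table;
ALTER TABLE new_table RENAME TO old_table;

-- Prevent new duplicates from appearing.
ALTER TABLE old_table ADD PRIMARY KEY (recordid);

COMMIT;
```

Views, foreign keys, and grants that referenced the original table are not carried over by this swap and would need to be recreated.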
Use COPY TO to dump the table.
Then Unix sort -u to de-duplicate it.
Drop or truncate the table in Postgres, use COPY FROM to read it back in.
Add a primary key column.

Is there a way to treat a csv like a table to match keys and import data to appropriate rows in postgres?

We have multiple data sources we're trying to merge into a DB table that are not ordered and may not even have matching records. We have a column that is common to both that we'd like to match up and merge the records. I'm looking to find a command that we can write that will do something like:
if column1.table = column1.csvfile then update table set column2.table = column2.csvfile WHERE column1.table = column1.csvfile
Scanning through each row of the CSV.
COPY assumes that your data is in order.
file_fdw is made precisely for this requirement.
Define a foreign table on the CSV file, then you can query it like a regular table.
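A minimal file_fdw setup could look like the sketch below. The server name csv_files, the foreign table name csv_import, the text column types, and the file path are all assumptions for illustration; the file must be readable by the PostgreSQL server process.

```sql
CREATE EXTENSION IF NOT EXISTS file_fdw;

CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- Column list must match the CSV layout; names and types are hypothetical.
CREATE FOREIGN TABLE csv_import (
    column1 text,
    column2 text
) SERVER csv_files
OPTIONS (filename '/path/to/file.csv', format 'csv', header 'true');

-- The CSV can now be joined like a regular table.
UPDATE table1
SET column2 = csv_import.column2
FROM csv_import
WHERE table1.column1 = csv_import.column1;
```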
An easy way to do this is to create a temporary table (let's call it table2) with a structure that matches your CSV file, and COPY the file to there. Then you can run a simple update:
UPDATE table1
SET column2 = table2.column2
FROM table2
WHERE table1.column1 = table2.column1;
And then drop table2 when you're done.
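For completeness, creating and loading the temporary table might look like this. The column types and the file path are assumptions; the HEADER option should be dropped if the file has no header row.

```sql
-- Structure must match the CSV; types here are placeholders.
CREATE TEMP TABLE table2 (
    column1 text,
    column2 text
);

-- Server-side load. From psql, the client-side equivalent is:
--   \copy table2 FROM 'file.csv' WITH (FORMAT csv, HEADER true)
COPY table2 FROM '/path/to/file.csv' WITH (FORMAT csv, HEADER true);

-- ... run the UPDATE ... FROM table2 shown above ...

DROP TABLE table2;
```

Row order in the CSV does not matter here, because the UPDATE matches rows by column1 rather than by position.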

How to clone or copy records in same table in postgres?

How can I clone or copy records within the same table in PostgreSQL by creating a temporary table?
I am trying to create clones of records in the same table with a changed name (which is basically the composite key in that table).
You can do it all in one INSERT combined with a SELECT.
For example, say you have the following table definition and data populated in it:
create table original
(
id serial,
name text,
location text
);
INSERT INTO original (name, location)
VALUES ('joe', 'London'),
('james', 'Munich');
And then you can INSERT doing the kind of switch you're talking about without using a TEMP TABLE, like this:
INSERT INTO original (name, location)
SELECT 'john', location
FROM original
WHERE name = 'joe';
This should also be faster (although for tiny data sets probably not hugely so in absolute time terms), since it's doing only one INSERT and SELECT as opposed to an extra SELECT and CREATE TABLE plus an UPDATE.
I did a bit of research and came up with this logic:
Create a temp table
Copy the records into it
Update the records in the temp table
Copy them back to the original table
CREATE TEMP TABLE temporary AS SELECT * FROM original WHERE name = 'joe';
UPDATE temporary SET name = 'john' WHERE name = 'joe';
INSERT INTO original SELECT * FROM temporary WHERE name = 'john';
I was wondering if there was any shorter way to do it.

Insert into table with Identity and foreign key columns

I am trying to insert values from one table into another across two different databases.
My issue is that I have two related tables, and the first table also has an identity column.
E.g. table first(id, Name) and table second(id, address).
Both tables exist with values in one database, and I am trying to copy those values to another database.
When I insert values from the first database into the second, the first table generates its own values for the id column, so I then have to link those new ids to the second table.
How can I do that?
UPDATE: I am using MS SQL Server 2000.
You can use SCOPE_IDENTITY() immediately after your insert in SQL Server 2000, which will give you the last identity value generated within the current scope, but I'm not sure how that would work with bulk inserting of data.
http://msdn.microsoft.com/en-us/library/ms190315.aspx
If this were SQL Server 2005 or later I would suggest using the output clause in your insert statement to retrieve the ids just inserted, but that was not available in SQL Server 2000.
If your data contains some column or series of columns which is unique other than the identity column, then you can query your first table based on that series of columns to get the ids and use that to populate your second table.
If the target tables were empty, you could use SET IDENTITY_INSERT <table> ON for the target table. This would allow you to insert the original values into the identity column, and you would not have to update the referencing ids. Of course, if any existing ids could overlap with the inserted ids, this is not a solution.
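A sketch of the IDENTITY_INSERT route, run in the target database. SourceDb is a hypothetical name for the source database (assumed to be on the same server), and the table names are bracketed defensively. An explicit column list is required while IDENTITY_INSERT is ON.

```sql
-- Allow explicit values in the identity column of the target table.
SET IDENTITY_INSERT [first] ON

INSERT INTO [first] (id, Name)
SELECT id, Name
FROM SourceDb.dbo.[first]

SET IDENTITY_INSERT [first] OFF

-- The ids are unchanged, so the second table can be copied as-is.
INSERT INTO [second] (id, address)
SELECT id, address
FROM SourceDb.dbo.[second]
```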
If the names in the first table are unique, you could build a mapping between new and old ids and perform an update, something like this:
UPDATE S
SET S.id = F.id
FROM second S
INNER JOIN first_original FO ON FO.id = S.id
INNER JOIN first F ON F.name = FO.name
If names are not unique, then the original ids should be saved in "first" to provide a mapping between old and new ids. This can be a temporary extra column that is dropped after the ids in "second" have been updated.
Or, as Rich Andrews said, you could use SCOPE_IDENTITY(), but in that case you will have to perform the inserts one by one: declare a cursor on the source table, insert each record, get its new id, and insert it into the "second" table.
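The row-by-row variant might be sketched as follows, written to stay within SQL Server 2000 syntax. SourceDb is a hypothetical source database name, and varchar(100) is an assumed type for Name.

```sql
DECLARE @old_id int, @name varchar(100), @new_id int

DECLARE src CURSOR LOCAL FAST_FORWARD FOR
    SELECT id, Name FROM SourceDb.dbo.[first]

OPEN src
FETCH NEXT FROM src INTO @old_id, @name

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Insert the parent row and capture the identity value it was given.
    INSERT INTO [first] (Name) VALUES (@name)
    SET @new_id = SCOPE_IDENTITY()

    -- Copy the child rows, pointing them at the new parent id.
    INSERT INTO [second] (id, address)
    SELECT @new_id, address
    FROM SourceDb.dbo.[second]
    WHERE id = @old_id

    FETCH NEXT FROM src INTO @old_id, @name
END

CLOSE src
DEALLOCATE src
```

This is slower than a set-based copy, but it works when names are not unique and no mapping column is available.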

Is there a way to quickly duplicate record in T-SQL?

I need to duplicate selected rows with all fields exactly the same except the ID (an int identity column), which is assigned automatically by SQL Server.
What is the best way to duplicate/clone a record or records (up to 50)?
Is there any T-SQL functionality for this in MS SQL 2008, or do I need to use INSERT ... SELECT in stored procedures?
The only way to accomplish what you want is by using Insert statements which enumerate every column except the identity column.
You can of course select multiple rows to be duplicated by using a Select statement in your Insert statements. However, I would assume that this will violate your business key (your other unique constraint on the table other than the surrogate key which you have right?) and require some other column to be altered as well.
Insert MyTable( ...
Select ...
From MyTable
Where ....
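As a concrete illustration of the pattern above, assume a hypothetical MyTable with an identity column Id plus two ordinary columns, Name and Location. Listing every column except Id lets SQL Server generate fresh identity values for the copies.

```sql
-- Duplicate three selected rows; Id is omitted so new values are generated.
INSERT INTO MyTable (Name, Location)
SELECT Name, Location
FROM MyTable
WHERE Id IN (1, 2, 3)
```

If the table has another unique constraint, one of the selected columns would need to be altered in the SELECT list for the insert to succeed.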
If it is a pure copy (minus the ID field) then the following will work (replace 'NameOfExistingTable' with the table you want to duplicate the rows from and optionally use the Where clause to limit the data that you wish to duplicate):
SELECT *
INTO #TempImportRowsTable
FROM (
SELECT *
FROM [NameOfExistingTable]
-- WHERE ID = 1
) AS createTable
-- If needed make other alterations to the temp table here
ALTER TABLE #TempImportRowsTable DROP COLUMN Id
INSERT INTO [NameOfExistingTable]
SELECT * FROM #TempImportRowsTable
DROP TABLE #TempImportRowsTable
If you're able to check the duplication condition as rows are inserted, you could put an INSERT trigger on the table. This would allow you to check the columns as they are inserted instead of having to select over the entire table.