T-SQL Replacement for Access Normalization Using Record Set?

I'm relatively new to T-SQL, so I hope someone with more experience/knowledge can help.
I have inherited an Access database that I'm moving to SQL Server. The original database imports and normalizes transaction data from Excel files in the following steps:
1. imports the Excel file to a staging table,
2. updates tables related to several of the columns if any new values are found, and
3. finally moves the data to the main table, with inner joins to the PK columns of the tables updated in step 2 replacing the raw values (see the sketch below).
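For instance, step 3 might look roughly like this (a hypothetical sketch; tblMain, tblLocation, and the Amount column are placeholder names, not the actual schema):

INSERT INTO tblMain (Location_ID, Amount)
SELECT loc.Location_ID, stg.Amount
FROM tblPayrollStaging AS stg
INNER JOIN tblLocation AS loc
    ON loc.Location = stg.Location;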
Step 2 above makes use of a "normalizing" table:
CREATE TABLE [dbo].[tblNormalize](
[Normalize_ID] [int] IDENTITY(1,1) NOT NULL,
[Table_Raw] [nvarchar](255) NULL,
[Field_Raw] [nvarchar](255) NULL,
[Table_Normal] [nvarchar](255) NULL,
[Field_Normal] [nvarchar](255) NULL,
[Data_Type] [nvarchar](255) NULL,
CONSTRAINT [tblNormalize$ID] PRIMARY KEY CLUSTERED
(
[Normalize_ID] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
[Table_Raw] is the name of the staging table.
[Field_Raw] is the name of the field in the staging table - necessary since the field name could be different from what's in the tables to be updated.
[Table_Normal] is the name of the table to be updated.
[Field_Normal] is the name of the field to be updated.
For example, if one of the values in the Location column of the staging table is "Tennessee", this step would check the corresponding Location column in the Location table to make sure that "Tennessee" exists, and if not, inserts it as a new record and creates a new primary key.
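In T-SQL terms, that per-column check is essentially this (a minimal sketch; tblLocation and its Location column are illustrative names, not the actual schema):

IF NOT EXISTS (SELECT 1 FROM tblLocation WHERE Location = 'Tennessee')
    INSERT INTO tblLocation (Location) VALUES ('Tennessee');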
So my question: How do I accomplish this step in T-SQL, without using a record set in Access? I've figured out how to use MERGE in a stored procedure to do it for individual columns with the relevant tables, but I'm still using a record set in VBA to move through each row of the normalizing table while calling the stored procedure. (All the tables now reside on SQL Server, and I've linked to them in Access using ODBC.) Here's what I have so far:
VBA:
Public Function funTestNormalize(strTableRaw As String)
'---Normalizes the data in the tblPayrollStaging table after it's been imported, using the dbo_tblNormalize table---
    Dim db As Database, rst As Recordset, qdef As DAO.QueryDef

    Set db = CurrentDb
    Set rst = db.OpenRecordset("Select * From dbo_tblNormalize WHERE Table_Raw = '" & strTableRaw & "';", dbOpenDynaset, dbSeeChanges)

    'Cycle through each row of dbo_tblNormalize (corresponds to the fields in tblPayrollStaging)
    If Not rst.EOF Then
        rst.MoveFirst
        DoCmd.SetWarnings False
        Set qdef = CurrentDb.QueryDefs("qryPassThru") 'Sets the QueryDef
        qdef.Connect = CurrentDb.TableDefs("dbo_tblSheet").Connect 'Assigns a connection to the QueryDef
        qdef.ReturnsRecords = False 'Avoids the "3065 error"
        Do Until rst.EOF
            With qdef
                .SQL = "EXEC uspUpdateNormalizingTables " & rst![Table_Raw] & ", " & rst![Field_Raw] & ", " & rst![Table_Normal] & ", " & rst![Field_Normal] & ";" 'Sets the .SQL value to the needed T-SQL
                .Execute dbFailOnError 'Executes the QueryDef
            End With
            rst.MoveNext
        Loop
    End If
    rst.Close
End Function
SQL Server (using SSMS):
CREATE PROCEDURE [dbo].[uspUpdateNormalizingTables]
    -- Add the parameters for the stored procedure here
    @tableRaw nvarchar(50),
    @fieldRaw nvarchar(50),
    @tableNormal nvarchar(50),
    @fieldNormal nvarchar(50)
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from interfering with SELECT statements.
    SET NOCOUNT ON

    EXEC('INSERT INTO ' + @tableNormal + ' (' + @fieldNormal + ')' +
         ' SELECT DISTINCT ' + @tableRaw + '.' + @fieldRaw +
         ' FROM ' + @tableRaw +
         ' WHERE (NOT EXISTS (SELECT ' + @fieldNormal + ' FROM ' + @tableNormal + ' WHERE ' + @tableNormal + '.' + @fieldNormal + ' = ' + @tableRaw + '.' + @fieldRaw + ')) AND (' + @tableRaw + '.' + @fieldRaw + ' IS NOT NULL);')
END
GO
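As an aside, because the procedure splices object names straight into dynamic SQL, wrapping each name in QUOTENAME() protects against odd identifiers and injection. A sketch of the same INSERT hardened that way, using the same parameters as above:

EXEC('INSERT INTO ' + QUOTENAME(@tableNormal) + ' (' + QUOTENAME(@fieldNormal) + ')' +
     ' SELECT DISTINCT src.' + QUOTENAME(@fieldRaw) +
     ' FROM ' + QUOTENAME(@tableRaw) + ' AS src' +
     ' WHERE NOT EXISTS (SELECT 1 FROM ' + QUOTENAME(@tableNormal) + ' AS t' +
     ' WHERE t.' + QUOTENAME(@fieldNormal) + ' = src.' + QUOTENAME(@fieldRaw) + ')' +
     ' AND src.' + QUOTENAME(@fieldRaw) + ' IS NOT NULL;')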
Would I need to use cursors (which I haven't used yet, and would have to figure out), or is there maybe a more elegant solution which I haven't considered? Any help you can give is appreciated!
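For what it's worth, the VBA loop can be replaced wholesale by a T-SQL cursor over tblNormalize that calls the procedure once per row. A minimal sketch, assuming the uspUpdateNormalizingTables procedure above and the tblPayrollStaging staging table:

DECLARE @tableRaw nvarchar(50), @fieldRaw nvarchar(50),
        @tableNormal nvarchar(50), @fieldNormal nvarchar(50);

DECLARE norm_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT Table_Raw, Field_Raw, Table_Normal, Field_Normal
    FROM dbo.tblNormalize
    WHERE Table_Raw = 'tblPayrollStaging';

OPEN norm_cur;
FETCH NEXT FROM norm_cur INTO @tableRaw, @fieldRaw, @tableNormal, @fieldNormal;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.uspUpdateNormalizingTables @tableRaw, @fieldRaw, @tableNormal, @fieldNormal;
    FETCH NEXT FROM norm_cur INTO @tableRaw, @fieldRaw, @tableNormal, @fieldNormal;
END
CLOSE norm_cur;
DEALLOCATE norm_cur;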

Related

How to create PKs in thousands of databases where they don't exist

I'm having trouble working out how to create a PK on thousands of databases. I have tried using
sp_ineachdb by Aaron Bertrand, but it only works on the first database. I need the script that finds the PKs to be created to run against the current database, which doesn't seem to be happening.
DECLARE @PKScript2 VARCHAR(max) = '';

SELECT @PKScript2 += ' ALTER TABLE ' + QUOTENAME(SCHEMA_NAME(obj.SCHEMA_ID)) + '.' +
       QUOTENAME(obj.name) + ' ADD CONSTRAINT PK_' + obj.name +
       ' PRIMARY KEY CLUSTERED (' + QUOTENAME(icol.name) + ')' + CHAR(13)
FROM sys.identity_columns icol
INNER JOIN sys.objects obj ON icol.object_id = obj.object_id
WHERE NOT EXISTS (SELECT * FROM sys.key_constraints k
                  WHERE k.parent_object_id = obj.object_id AND k.type = 'PK')
  AND obj.type = 'U'
ORDER BY obj.name

PRINT (@PKScript2);
EXEC [master].[dbo].[sp_ineachdb] @command = @PKScript2, @database_list = '[vosk][vpb][vpbk][vsb][vsh][vst]'
For the sake of the example, I have only used 6 databases.
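One likely culprit here: @PKScript2 is built once, from the catalog views of whatever database is current, before sp_ineachdb ever runs, so every database executes the same script. A sketch of moving the generation inside the command so each database builds its own script (note the doubled quotes; also, sp_ineachdb normally expects a comma-separated @database_list):

DECLARE @cmd nvarchar(max) = N'
DECLARE @PKScript2 nvarchar(max) = N'''';
SELECT @PKScript2 += N'' ALTER TABLE '' + QUOTENAME(SCHEMA_NAME(obj.schema_id)) + ''.'' +
       QUOTENAME(obj.name) + '' ADD CONSTRAINT PK_'' + obj.name +
       '' PRIMARY KEY CLUSTERED ('' + QUOTENAME(icol.name) + '')'' + CHAR(13)
FROM sys.identity_columns icol
INNER JOIN sys.objects obj ON icol.object_id = obj.object_id
WHERE NOT EXISTS (SELECT * FROM sys.key_constraints k
                  WHERE k.parent_object_id = obj.object_id AND k.type = ''PK'')
  AND obj.type = ''U'';
EXEC (@PKScript2);';

EXEC [master].[dbo].[sp_ineachdb]
     @command = @cmd,
     @database_list = N'vosk,vpb,vpbk,vsb,vsh,vst';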

Codefluent SQL Server producer

We are using the SQL Server producer. We want an index for each foreign key column. SQL Server does not put indexes onto foreign key columns automatically. How can we create an index for each foreign key column automatically? Should we code an aspect for this?
CodeFluent Entities does not generate indices by default. However, you can set index="true" on a property:
<cf:property name="Customer" index="true" />
And use the SQL Server Template Engine and the template provided by CodeFluent Entities "C:\Program Files (x86)\SoftFluent\CodeFluent\Modeler\Templates\SqlServer\[Template]CreateIndexes.sql" to create indices.
If you don't want to add index=true on each property, you can change the template to automatically include all properties, or you can write an aspect to add the attribute (this is more complex).
Another solution is to use a SQL script:
DECLARE @SQL NVARCHAR(max)
SET @SQL = ''

SELECT @SQL = @SQL +
    'IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N''[dbo].[' + tab.name + ']'') AND name = N''IX_' + cols.name + ''')' + CHAR(13)+CHAR(10) +
    'CREATE NONCLUSTERED INDEX [IX_' + cols.name + '] ON [dbo].[' + tab.name + ']( [' + cols.name + '] ASC ) ON [PRIMARY];' + CHAR(13)+CHAR(10)
FROM sys.foreign_keys keys
INNER JOIN sys.foreign_key_columns keyCols ON keys.object_id = keyCols.constraint_object_id
INNER JOIN sys.columns cols ON keyCols.parent_object_id = cols.object_id AND keyCols.parent_column_id = cols.column_id
INNER JOIN sys.tables tab ON keyCols.parent_object_id = tab.object_id
ORDER BY tab.name, cols.name

EXEC(@SQL)

SQL 2008R2: What's the fastest way to do a 'INSERT INTO target <all columns except two> SELECT <all columns> FROM source'

I'm doing a SQL-to-SQL conversion, and have 50+ tables to convert from old (source) to new (target) database. I think the answer is 'there's no really fast way to do this', but I'll ask the question anyways.
Each 'group' has:
Two Source tables: Anywhere from 10 to 700 columns.
These two tables have the same schema, although some columns have different collations.
Target tables: Number of columns = Columns in source tables + 2, as I added start_dt and end_dt.
I can't do a 'INSERT INTO Target SELECT * FROM Source' because of the two extra columns.
Question: What's the fastest way to do a 'INSERT INTO target SELECT FROM source'
Using a view in the designer I don't see a way to select all and have it show all columns, and then just remove the two I don't need. * displays as * instead of all column names.
I'll entertain third party apps on this one.
Thanks.
Jim
In SQL Server Management Studio, expand your table. You will see a couple of nodes appear below the table name: Columns, Keys, Constraints, etc. Drag the "Columns" node into a query window and all of the columns will be added to the query window. Tack on your 2 extra columns and execute it.
This is still somewhat manual, but it will save you a ton of typing.
Always try to stay away from SELECT *. In SSMS right click on the table/view -> script as -> select. (The wording may not be exact. I am working from memory.) Then you don't have to type out all the fields.
If you mean the speed of getting the data over, then, if you can, on the destination:
turn off all triggers,
drop indexes,
change the recovery model of the DB to bulk logged, and
do the inserts in batches (a sketch follows).
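A rough sketch of batching, assuming the source table has an increasing integer key (all names here are hypothetical):

DECLARE @batch int = 50000, @rows int = 1;
WHILE @rows > 0
BEGIN
    INSERT INTO target_table (id, col1, col2)
    SELECT TOP (@batch) s.id, s.col1, s.col2
    FROM source_table AS s
    WHERE NOT EXISTS (SELECT 1 FROM target_table AS t WHERE t.id = s.id)
    ORDER BY s.id;
    SET @rows = @@ROWCOUNT;
END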
What about using SSIS?
How many records are you talking about?
The question was answered with the following:
DECLARE @source_table sysname
DECLARE @target_table sysname
DECLARE @col_list varchar(max)
DECLARE @sql varchar(max)

--naturally replaced by cursor table loop in final code
SET @source_table = 'dbs'
SET @target_table = 'dbs'

SELECT @col_list = STUFF((
    SELECT ', ' + src.name
    FROM sys.columns src
    INNER JOIN sys.columns trg ON trg.name = src.name
    WHERE src.object_id = OBJECT_ID(@source_table)
      AND trg.object_id = OBJECT_ID(@target_table)
    ORDER BY src.column_id
    FOR XML PATH('')
), 1, 1, '')

SET @sql = 'INSERT INTO ' + @target_table + ' ( ' + @col_list + ' ) SELECT ' + @col_list + ' FROM ' + @source_table + ' '
EXEC (@sql)

SET @sql = 'INSERT INTO ' + @target_table + ' ( ' + @col_list + ' ) SELECT ' + @col_list + ' FROM ' + @source_table + '_History '
EXEC (@sql)

How to add a column in TSQL after a specific column?

I have a table:
MyTable
ID
FieldA
FieldB
I want to alter the table and add a column so it looks like:
MyTable
ID
NewField
FieldA
FieldB
In MySQL I would do a:
ALTER TABLE MyTable ADD COLUMN NewField int NULL AFTER ID;
One line, nice, simple, works great. How do I do this in Microsoft's world?
Unfortunately you can't.
If you really want them in that order you'll have to create a new table with the columns in that order and copy data. Or rename columns etc. There is no easy way.
Solution:
This will work for tables where there are no dependencies on the changing table which would trigger cascading events. First make sure you can drop the table you want to restructure without any disastrous repercussions. Take a note of all the dependencies and column constraints associated with your table (i.e. triggers, indexes, etc.). You may need to put them back in when you are done.
STEP 1: Create the temp table to hold all the records from the table you want to restructure (do not forget to include the new column), then copy the existing data into it:
CREATE TABLE #tmp_myTable
( [new_column] [int] NOT NULL, -- new column has been inserted here!
  [idx] [bigint] NOT NULL,
  [name] [nvarchar](30) NOT NULL,
  [active] [bit] NOT NULL
)

INSERT INTO #tmp_myTable ([new_column], [idx], [name], [active])
SELECT 0, [idx], [name], [active] -- supply a value for the new NOT NULL column
FROM myTable
STEP 2: Make sure all records have been copied over and that the column structure looks the way you want.
SELECT TOP 10 * FROM #tmp_myTable ORDER BY 1 DESC
-- you can do COUNT(*) or anything to make sure you copied all the records
STEP 3: DROP the original table:
DROP TABLE myTable
If you are worried that something bad could happen, just rename the original table instead of dropping it. That way it can always be brought back.
EXEC sp_rename 'myTable', 'myTable_Copy'
STEP 4: Recreate the table myTable the way you want (it should match the #tmp_myTable table structure):
CREATE TABLE myTable
( [new_column] [int] NOT NULL,
[idx] [bigint] NOT NULL,
[name] [nvarchar](30) NOT NULL,
[active] [bit] NOT NULL
)
-- do not forget any constraints you may need
STEP 5: Copy all the records from the temp #tmp_myTable table into the new (improved) table myTable.
INSERT INTO myTable ([new_column],[idx],[name],[active])
SELECT [new_column],[idx],[name],[active]
FROM #tmp_myTable
STEP 6: Check if all the data is back in your new, improved table myTable. If yes, clean up after yourself and DROP the temp table #tmp_myTable and the myTable_Copy table if you chose to rename it instead of dropping it.
You should be able to do this if you create the column using the GUI in Management Studio. I believe Management Studio actually completely recreates the table, which is why this appears to work.
As others have mentioned, the order of columns in a table doesn't matter, and if it does there is something wrong with your code.
In SQL Server Management Studio, open up your table, add the column where you want it, and then -- instead of saving the change -- generate the change script. You can see how it's done in SQL.
In short, what others have said is right. SQL Management studio pulls all your data into a temp table, drops the table, recreates it with columns in the right order, and puts the temp table data back in there. There is no simple syntax for adding a column in a specific position.
/*
Script to change the column order of a table
Note this will create a new table to replace the original table.
WARNING : Original Table could be dropped.
HOWEVER it doesn't copy the triggers or other table properties - just the data
*/
Generate a new table with the columns in the order that you require
Select Column2, Column1, Column3 Into NewTable from OldTable
Delete the original table
Drop Table OldTable;
Rename the new table
EXEC sp_rename 'NewTable', 'OldTable';
In Microsoft SQL Server Management Studio (the admin tool for MSSQL) just go into "design" on a table and drag the column to the new position. Not command line but you can do it.
This is absolutely possible. Although you shouldn't do it unless you know what you are dealing with.
Took me about 2 days to figure it out.
Here is a stored procedure where I enter:
---database name (schema name is "_" for readability)
---table name
---column name
---column data type (the added column is always NULL, otherwise you won't be able to insert)
---the position of the new column
Since I'm working with tables from the SAM toolkit (and some of them have > 80 columns), a typical variable won't be able to contain the query. That forces the use of an external file. Now be careful where you store that file and who has access to it on the NTFS and network level.
Cheers!
USE [master]
GO
/****** Object: StoredProcedure [SP_Set].[TrasferDataAtColumnLevel] Script Date: 8/27/2014 2:59:30 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [SP_Set].[TrasferDataAtColumnLevel]
(
    @database varchar(100),
    @table varchar(100),
    @column varchar(100),
    @position int,
    @datatype varchar(20)
)
AS
BEGIN
    set nocount on
    exec ('
    declare @oldC varchar(200), @oldCDataType varchar(200), @oldCLen int, @oldCPos int
    create table Test ( dummy int)
    declare @columns varchar(max) = ''''
    declare @columnVars varchar(max) = ''''
    declare @columnsDecl varchar(max) = ''''
    declare @printVars varchar(max) = ''''

    DECLARE MY_CURSOR CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY FOR
    select column_name, data_type, character_maximum_length, ORDINAL_POSITION from ' + @database + '.INFORMATION_SCHEMA.COLUMNS where table_name = ''' + @table + '''
    OPEN MY_CURSOR FETCH NEXT FROM MY_CURSOR INTO @oldC, @oldCDataType, @oldCLen, @oldCPos WHILE @@FETCH_STATUS = 0 BEGIN
        if(@oldCPos = ' + @position + ')
        begin
            exec(''alter table Test add [' + @column + '] ' + @datatype + ' null'')
        end
        if(@oldCDataType != ''timestamp'')
        begin
            set @columns += @oldC + '' , ''
            set @columnVars += ''@'' + @oldC + '' , ''
            if(@oldCLen is null)
            begin
                if(@oldCDataType != ''uniqueidentifier'')
                begin
                    set @printVars += '' print convert('' + @oldCDataType + '',@'' + @oldC + '')''
                    set @columnsDecl += ''@'' + @oldC + '' '' + @oldCDataType + '', ''
                    exec(''alter table Test add ['' + @oldC + ''] '' + @oldCDataType + '' null'')
                end
                else
                begin
                    set @printVars += '' print convert(varchar(50),@'' + @oldC + '')''
                    set @columnsDecl += ''@'' + @oldC + '' '' + @oldCDataType + '', ''
                    exec(''alter table Test add ['' + @oldC + ''] '' + @oldCDataType + '' null'')
                end
            end
            else
            begin
                if(@oldCLen < 0)
                begin
                    set @oldCLen = 4000
                end
                set @printVars += '' print @'' + @oldC
                set @columnsDecl += ''@'' + @oldC + '' '' + @oldCDataType + ''('' + convert(character,@oldCLen) + '') , ''
                exec(''alter table Test add ['' + @oldC + ''] '' + @oldCDataType + ''('' + @oldCLen + '') null'')
            end
        end
        if exists (select column_name from INFORMATION_SCHEMA.COLUMNS where table_name = ''Test'' and column_name = ''dummy'')
        begin
            alter table Test drop column dummy
        end
    FETCH NEXT FROM MY_CURSOR INTO @oldC, @oldCDataType, @oldCLen, @oldCPos END CLOSE MY_CURSOR DEALLOCATE MY_CURSOR

    set @columns = reverse(substring(reverse(@columns), charindex('','',reverse(@columns)) +1, len(@columns)))
    set @columnVars = reverse(substring(reverse(@columnVars), charindex('','',reverse(@columnVars)) +1, len(@columnVars)))
    set @columnsDecl = reverse(substring(reverse(@columnsDecl), charindex('','',reverse(@columnsDecl)) +1, len(@columnsDecl)))
    set @columns = replace(replace(REPLACE(@columns, '' '', ''''), char(9) + char(9),'' ''), char(9), '''')
    set @columnVars = replace(replace(REPLACE(@columnVars, '' '', ''''), char(9) + char(9),'' ''), char(9), '''')
    set @columnsDecl = replace(replace(REPLACE(@columnsDecl, '' '', ''''), char(9) + char(9),'' ''), char(9), '''')
    set @printVars = REVERSE(substring(reverse(@printVars), charindex(''+'',reverse(@printVars))+1, len(@printVars)))

    create table query (id int identity(1,1), string varchar(max))
    insert into query values (''declare '' + @columnsDecl + ''
    DECLARE MY_CURSOR CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY FOR '')
    insert into query values (''select '' + @columns + '' from ' + @database + '._.' + @table + ''')
    insert into query values (''OPEN MY_CURSOR FETCH NEXT FROM MY_CURSOR INTO '' + @columnVars + '' WHILE @@FETCH_STATUS = 0 BEGIN '')
    insert into query values (@printVars)
    insert into query values ('' insert into Test ('')
    insert into query values (@columns)
    insert into query values ('') values ( '' + @columnVars + '')'')
    insert into query values (''FETCH NEXT FROM MY_CURSOR INTO '' + @columnVars + '' END CLOSE MY_CURSOR DEALLOCATE MY_CURSOR'')

    declare @path varchar(100) = ''C:\query.sql''
    declare @query varchar(500) = ''bcp "select string from query order by id" queryout '' + @path + '' -t, -c -S '' + @@servername + '' -T''
    exec master..xp_cmdshell @query
    set @query = ''sqlcmd -S '' + @@servername + '' -i '' + @path
    EXEC xp_cmdshell @query
    set @query = ''del '' + @path
    exec xp_cmdshell @query

    drop table ' + @database + '._.' + @table + '
    select * into ' + @database + '._.' + @table + ' from Test
    drop table query
    drop table Test ')
END
Even though the question is old, a more accurate answer about Management Studio is warranted.
You can create the column manually or with Management Studio. But adding a column in the middle through Management Studio requires recreating the table, and that will time out if the table already holds too much data; avoid it unless the table is light.
To change the order of the columns you simply need to move them around in Management Studio's designer. This should not require Management Studio to recreate the table (exceptions most likely exist), since it most likely just changes the column order in the table definition.
I've done it this way on numerous occasions with tables whose data prevented me from adding columns through the GUI. I then moved the columns around with the GUI of Management Studio and simply saved them.
You will go from an assured time out to a few seconds of waiting.
If you are using the GUI to do this, you must deselect the option "Prevent saving changes that require table re-creation" (under Tools > Options > Designers) so the table can be dropped and re-created. The manual equivalent:
1. Create a new-table script with the new column added, e.g. [DBName].[dbo].[TableName]_NEW
2. Copy the old table's data to the new table: INSERT INTO newTable (col1, col2, ...) SELECT col1, col2, ... FROM oldTable
3. Check that the old and new record counts are the same
4. DROP the old table
5. Rename newTable to oldTable
6. Rerun your SP to add the new column value
-- 1. Create New Add new Column Table Script
CREATE TABLE newTable
( [new_column] [int] NOT NULL, -- new column has been inserted here!
  [idx] [bigint] NOT NULL,
  [name] [nvarchar](30) NOT NULL,
  [active] [bit] NOT NULL
)
-- 2. COPY old table data to new table:
INSERT INTO newTable ([new_column],[idx],[name],[active])
SELECT 0, [idx], [name], [active] -- oldTable has no [new_column]; supply a value for the NOT NULL column
FROM oldTable
-- 3. Check records old and new are the same:
select sum(cnt) FROM (
SELECT 'table_1' AS table_name, COUNT(*) cnt FROM newTable
UNION
SELECT 'table_2' AS table_name, -COUNT(*) cnt FROM oldTable
) AS cnt_sum
-- 4. DROP old table
DROP TABLE oldTable
-- 5. rename newtable to oldtable
USE [DB_NAME]
EXEC sp_rename 'newTable', 'oldTable'
You have to rebuild the table. Luckily, the order of the columns doesn't matter at all!
Watch as I magically reorder your columns:
SELECT ID, Newfield, FieldA, FieldB FROM MyTable
Also this has been asked about a bazillion times before.

Postgres: INSERT if does not exist already

I'm using Python to write to a postgres database:
sql_string = "INSERT INTO hundred (name,name_slug,status) VALUES ("
sql_string += hundred + ", '" + hundred_slug + "', " + status + ");"
cursor.execute(sql_string)
But because some of my rows are identical, I get the following error:
psycopg2.IntegrityError: duplicate key value
violates unique constraint "hundred_pkey"
How can I write an 'INSERT unless this row already exists' SQL statement?
I've seen complex statements like this recommended:
IF EXISTS (SELECT * FROM invoices WHERE invoiceid = '12345')
UPDATE invoices SET billed = 'TRUE' WHERE invoiceid = '12345'
ELSE
INSERT INTO invoices (invoiceid, billed) VALUES ('12345', 'TRUE')
END IF
But firstly, is this overkill for what I need, and secondly, how can I execute one of those as a simple string?
Postgres 9.5 (released 2016-01-07) offers an "upsert" command, also known as the ON CONFLICT clause of INSERT:
INSERT ... ON CONFLICT DO NOTHING/UPDATE
It solves many of the subtle concurrency problems that the approaches proposed in other answers can run into.
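Applied to the table from the question, it could look like this (a sketch; the column names and the hundred_pkey constraint come from the question, the values are assumed):

INSERT INTO hundred (name, name_slug, status)
VALUES ('Tennessee', 'tennessee', 1)
ON CONFLICT ON CONSTRAINT hundred_pkey DO NOTHING;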
How can I write an 'INSERT unless this row already exists' SQL statement?
There is a nice way of doing conditional INSERT in PostgreSQL:
INSERT INTO example_table
(id, name)
SELECT 1, 'John'
WHERE
NOT EXISTS (
SELECT id FROM example_table WHERE id = 1
);
CAVEAT This approach is not 100% reliable for concurrent write operations, though. There is a very tiny race condition between the SELECT in the NOT EXISTS anti-semi-join and the INSERT itself. It can fail under such conditions.
One approach would be to create a non-constrained (no unique indexes) table, insert all your data into it, and then do a SELECT DISTINCT from it for the insert into your hundred table.
At a high level: I assume all three columns are distinct in my example, so for step 3 change the NOT EXISTS join to only join on the unique columns in the hundred table.
Create a temporary table (see the Postgres docs on CREATE TEMPORARY TABLE).
CREATE TEMPORARY TABLE temp_data(name text, name_slug text, status text); -- match the column types of hundred
INSERT the data into the temp table.
INSERT INTO temp_data(name, name_slug, status) VALUES (...); -- load your raw rows here
Add any indexes to the temp table.
Do the main table insert.
INSERT INTO hundred(name, name_slug, status)
SELECT DISTINCT name, name_slug, status
FROM temp_data
WHERE NOT EXISTS (
    SELECT 'X'
    FROM hundred
    WHERE hundred.name = temp_data.name
      AND hundred.name_slug = temp_data.name_slug
      AND hundred.status = temp_data.status
);
Unfortunately, PostgreSQL before 9.5 supports neither MERGE nor ON DUPLICATE KEY UPDATE, so you'll have to do it in two statements:
UPDATE invoices
SET billed = 'TRUE'
WHERE invoiceid = '12345'
INSERT
INTO invoices (invoiceid, billed)
SELECT '12345', 'TRUE'
WHERE '12345' NOT IN
(
SELECT invoiceid
FROM invoices
)
You can wrap it into a function:
CREATE OR REPLACE FUNCTION fn_upd_invoices(id VARCHAR(32), billed VARCHAR(32))
RETURNS VOID
AS
$$
UPDATE invoices
SET billed = $2
WHERE invoiceid = $1;
INSERT
INTO invoices (invoiceid, billed)
SELECT $1, $2
WHERE $1 NOT IN
(
SELECT invoiceid
FROM invoices
);
$$
LANGUAGE 'sql';
and just call it:
SELECT fn_upd_invoices('12345', 'TRUE')
This is exactly the problem I faced, and my version is 9.5.
I solved it with the SQL query below.
INSERT INTO example_table (id, name)
SELECT 1 AS id, 'John' AS name FROM example_table
WHERE NOT EXISTS(
SELECT id FROM example_table WHERE id = 1
)
LIMIT 1;
Hope that will help someone who has the same issue with version >= 9.5. Note that this form selects FROM example_table itself, so it inserts nothing when the table is empty.
Thanks for reading.
You can make use of VALUES - available in Postgres:
INSERT INTO person (name)
SELECT name FROM person
UNION
VALUES ('Bob')
EXCEPT
SELECT name FROM person;
I know this question is from a while ago, but thought this might help someone. I think the easiest way to do this is via a trigger. E.g.:
Create Function ignore_dups() Returns Trigger
As $$
Begin
If Exists (
Select
*
From
hundred h
Where
-- Assuming all three fields are primary key
h.name = NEW.name
And h.name_slug = NEW.name_slug
And h.status = NEW.status
) Then
Return NULL;
End If;
Return NEW;
End;
$$ Language plpgsql;
Create Trigger ignore_dups
Before Insert On hundred
For Each Row
Execute Procedure ignore_dups();
Execute this code from a psql prompt (or however you like to execute queries directly on the database). Then you can insert as normal from Python. E.g.:
sql = "Insert Into hundreds (name, name_slug, status) Values (%s, %s, %s)"
cursor.execute(sql, (hundred, hundred_slug, status))
Note that, as @Thomas_Wouters already mentioned, the code above takes advantage of parameters rather than concatenating the string.
There is a nice way of doing a conditional INSERT in PostgreSQL using a WITH query, like:
WITH a AS (
    SELECT id
    FROM schema.table_name
    WHERE column_name = your_identical_column_value
)
INSERT INTO schema.table_name (col_name1, col_name2)
SELECT value1, value2
WHERE NOT EXISTS (SELECT id FROM a)
RETURNING id
We can simplify the query by using upsert (Postgres 9.5+):
insert into invoices (invoiceid, billed)
values ('12345', 'TRUE')
on conflict (invoiceid) do
update set billed=EXCLUDED.billed;
INSERT .. WHERE NOT EXISTS is a good approach. And race conditions can be avoided by a transaction "envelope":
BEGIN;
LOCK TABLE hundred IN SHARE ROW EXCLUSIVE MODE;
INSERT ... ;
COMMIT;
It's easy with rules:
CREATE RULE file_insert_defer AS ON INSERT TO file
WHERE (EXISTS ( SELECT * FROM file WHERE file.id = new.id)) DO INSTEAD NOTHING
But it fails with concurrent writes ...
The approach with the most upvotes (from John Doe) does somehow work for me, but in my case I got only 180 of the expected 422 rows.
I couldn't find anything wrong and there were no errors at all, so I looked for a different simple approach.
Using IF NOT FOUND THEN after a SELECT works perfectly for me.
(described in PostgreSQL Documentation)
Example from documentation:
SELECT * INTO myrec FROM emp WHERE empname = myname;
IF NOT FOUND THEN
RAISE EXCEPTION 'employee % not found', myname;
END IF;
psycopg's cursor class has the attribute rowcount.
This read-only attribute specifies the number of rows that the last
execute*() produced (for DQL statements like SELECT) or affected (for
DML statements like UPDATE or INSERT).
So you could try UPDATE first and INSERT only if rowcount is 0.
But depending on activity levels in your database you may hit a race condition between UPDATE and INSERT where another process may create that record in the interim.
Your column "hundred" seems to be defined as primary key and therefore must be unique which is not the case. The problem isn't with, it is with your data.
I suggest you insert an id as serial type to handly the primary key
If you say that many of your rows are identical, you will end up checking many times. You can just send them and let the database determine whether to insert, using the ON CONFLICT clause, as follows:
sql_string = ("INSERT INTO hundred (name,name_slug,status) VALUES (" +
              hundred + ", '" + hundred_slug + "', " + status + ") " +
              "ON CONFLICT ON CONSTRAINT hundred_pkey DO NOTHING;")
cursor.execute(sql_string)
INSERT INTO invoices (invoiceid, billed) (
SELECT '12345','TRUE' WHERE NOT EXISTS (
SELECT 1 FROM invoices WHERE invoiceid='12345' AND billed='TRUE'
)
)
I was looking for a similar solution, trying to find SQL that would work in PostgreSQL as well as HSQLDB. (HSQLDB was what made this difficult.) Using your example as a basis, this is the format that I found elsewhere.
sql = "INSERT INTO hundred (name,name_slug,status)"
sql += " ( SELECT " + hundred + ", '" + hundred_slug + "', " + status
sql += " FROM hundred"
sql += " WHERE name = " + hundred + " AND name_slug = '" + hundred_slug + "' AND status = " + status
sql += " HAVING COUNT(*) = 0 );"
Here is a generic Python function that, given a table name, columns and values, generates the upsert equivalent for PostgreSQL.
import json

def upsert(table_name, id_column, other_columns, values_hash):
    template = """
    WITH new_values ($$ALL_COLUMNS$$) as (
        values
        ($$VALUES_LIST$$)
    ),
    upsert as
    (
        update $$TABLE_NAME$$ m
        set
        $$SET_MAPPINGS$$
        FROM new_values nv
        WHERE m.$$ID_COLUMN$$ = nv.$$ID_COLUMN$$
        RETURNING m.*
    )
    INSERT INTO $$TABLE_NAME$$ ($$ALL_COLUMNS$$)
    SELECT $$ALL_COLUMNS$$
    FROM new_values
    WHERE NOT EXISTS (SELECT 1
                      FROM upsert up
                      WHERE up.$$ID_COLUMN$$ = new_values.$$ID_COLUMN$$)
    """
    all_columns = [id_column] + other_columns
    all_columns_csv = ",".join(all_columns)
    all_values_csv = ','.join([query_value(values_hash[column_name]) for column_name in all_columns])
    set_mappings = ",".join([c + " = nv." + c for c in other_columns])

    q = template
    q = q.replace("$$TABLE_NAME$$", table_name)
    q = q.replace("$$ID_COLUMN$$", id_column)
    q = q.replace("$$ALL_COLUMNS$$", all_columns_csv)
    q = q.replace("$$VALUES_LIST$$", all_values_csv)
    q = q.replace("$$SET_MAPPINGS$$", set_mappings)
    return q

def query_value(value):
    if value is None:
        return "NULL"
    if type(value) in [str, unicode]:
        return "'%s'" % value.replace("'", "''")
    if type(value) == dict:
        return "'%s'" % json.dumps(value).replace("'", "''")
    if type(value) == bool:
        return "%s" % value
    if type(value) == int:
        return "%s" % value
    return value

if __name__ == "__main__":
    my_table_name = 'mytable'
    my_id_column = 'id'
    my_other_columns = ['field1', 'field2']
    my_values_hash = {
        'id': 123,
        'field1': "john",
        'field2': "doe"
    }
    print upsert(my_table_name, my_id_column, my_other_columns, my_values_hash)
The solution is simple, but not immediately obvious.
If you want to use this instruction, you must make one change to the database:
ALTER USER user SET search_path to 'name_of_schema';
After this change, INSERT will work correctly.