Explain the effect of a parent column in a nested select - tsql

I have a scenario where I need to delete rows from a table using the outcome of a nested select. Like this:
DECLARE @tbl_big TABLE (bigID int);
INSERT INTO @tbl_big (bigID)
VALUES (1),(2),(3),(4),(5);
DECLARE @tbl_small TABLE (smallID int);
INSERT INTO @tbl_small (smallID)
VALUES (1),(2),(3);
DELETE FROM @tbl_big
WHERE (bigID IN (SELECT smallID FROM @tbl_small));
SELECT *
FROM @tbl_big; -- shows 4,5 as expected
However, during development I accidentally made a typo:
DELETE FROM @tbl_big WHERE (bigID IN (SELECT bigID FROM @tbl_small)); -- bigID used instead of smallID
SELECT *
FROM @tbl_big; -- no rows
The result was that all rows within the parent table were deleted.
While this may be completely acceptable T-SQL, I've never seen it applied like this, and I wouldn't expect the statement to even compile, given that @tbl_small does not contain a bigID column.
Can anybody please clarify why/how this works, and is it valid T-SQL? Also, can you provide a real-world example where this is more useful than risky(!)?

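bigID in the DELETE statement you mentioned refers to @tbl_big, because it is legal to reference columns of the outer table in subqueries written in the WHERE clause (a correlated subquery). For example, you can write:
DELETE FROM @tbl_big WHERE (bigID IN (SELECT smallID FROM @tbl_small WHERE smallID = bigID));
So in your case, for each row of @tbl_big the subquery returned that row's own bigID once for every row of @tbl_small. The condition reduced to bigID IN (bigID, bigID, bigID), which is true for every row with a non-NULL bigID, so every row was deleted. Qualifying columns with a table alias (for example SELECT s.smallID FROM @tbl_small AS s) turns this kind of typo into a compile error.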
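The same name-resolution rule can be reproduced outside SQL Server. The sketch below uses Python's built-in sqlite3 module, so it is SQLite rather than T-SQL, and the table names drop the @ prefix. It shows the typo deleting every row, and it shows the aliased version failing instead.

```python
import sqlite3

# SQLite stands in for SQL Server: both resolve an unqualified column that the
# subquery's own table lacks to the table of the outer query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_big (bigID INTEGER);
    CREATE TABLE tbl_small (smallID INTEGER);
    INSERT INTO tbl_big VALUES (1), (2), (3), (4), (5);
    INSERT INTO tbl_small VALUES (1), (2), (3);
""")

# The typo: tbl_small has no bigID, so bigID means the outer tbl_big row.
# The subquery yields that row's own bigID once per tbl_small row, so the
# IN test is true for every row and everything is deleted.
conn.execute("DELETE FROM tbl_big WHERE bigID IN (SELECT bigID FROM tbl_small)")
remaining = conn.execute("SELECT COUNT(*) FROM tbl_big").fetchone()[0]

# Qualifying the inner column with an alias turns the typo into an error.
try:
    conn.execute(
        "DELETE FROM tbl_big WHERE bigID IN (SELECT s.bigID FROM tbl_small AS s)"
    )
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(remaining, qualified_error)
```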

Related

Is it possible to specify the column list for an INSERT statement in the SELECT statement?

I have to produce a dynamically generated T-SQL script that inserts records into various tables. I've done a bunch of searching and testing but can't seem to find the path I'm looking for.
I know that the following is valid SQL:
INSERT INTO [MyTable] ( [Col1], [Col2], [Col3] )
SELECT N'Val1', N'Val2', N'Val3';
But, is it at all possible to write something akin to this:
INSERT INTO [MyTable]
SELECT [Col1] = N'Val1', [Col2] = N'Val2', [Col3] = N'Val3';
With the column names in the SELECT statement, I could do it all in one statement instead of writing two separate lines. Obviously my idea doesn't work. I'm trying to figure out whether something similar is possible or whether I need to stick with the first form.
Much appreciated.
Best practice for INSERT statements is to specify the column list in the INSERT clause, for very good reasons:
It's far more readable. You know exactly which value goes into which column.
You don't have to provide values for nullable or default-valued columns.
You're not bound to the order of the columns in the table.
If a column is added to the table, your INSERT statement might not break (it will if the new column is not nullable and has no default value).
In some cases SQL Server requires an explicit column list, for example when IDENTITY_INSERT is set to ON.
In any case, the column names or aliases in the SELECT clause of an INSERT ... SELECT statement have no effect on which target column each value goes into. Values are mapped to target columns purely by their position in the statement.
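The positional mapping is easy to check. The following minimal sketch uses SQLite through Python's sqlite3 module as a stand-in for SQL Server. It deliberately lists the aliases in the wrong order and then repeats the insert with an explicit column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Col1 TEXT, Col2 TEXT, Col3 TEXT)")

# The aliases deliberately name the columns in the wrong order. They are
# ignored: the three values land in Col1, Col2, Col3 by position.
conn.execute(
    "INSERT INTO MyTable SELECT 'Val3' AS Col3, 'Val1' AS Col1, 'Val2' AS Col2"
)

# With an explicit column list, the list (not the aliases) decides the mapping.
conn.execute(
    "INSERT INTO MyTable (Col3, Col1, Col2) SELECT 'Val3', 'Val1', 'Val2'"
)

rows = conn.execute("SELECT Col1, Col2, Col3 FROM MyTable ORDER BY rowid").fetchall()
print(rows)
```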

How does SELECT INTO work with SAS

I'm new to SAS and I'm trying to port my code from Access VBA to SAS.
In Access I often use the SELECT INTO statement, but it seems this statement does not exist in SAS.
I have two tables, and each day I get new data that I want to add to my table. I need to check whether any new rows have appeared and, if so, insert those rows into the old table.
I tried some code from Stack Overflow and other sources found through Google, but nothing I found works.
INSERT INTO OLD_TABLE T
VALUES (GRVID = VTGONR)
FROM NEW_TABLE V
WHERE not exists (SELECT V.VTGONR FROM NEW_TABLE V WHERE T.GRVID = V.VTGONR);
Not sure what the purpose of using the VALUES keyword is in your example. PROC SQL uses VALUES() to list static values. Like:
VALUES (100)
SAS just uses normal SQL syntax instead. See for example: https://www.techonthenet.com/sql/insert.php
To specify the observations to insert, just use SELECT. You can add a WHERE clause to the SELECT to limit the rows that get inserted. To tell INSERT which columns to insert into, list them inside () after the table name. Otherwise, it expects the columns listed in the SELECT statement to be in the same order as the columns in the target table.
insert into old_table(GRVID)
select VTGONR from new_table
where VTGONR not in (select GRVID from old_table)
;
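The same INSERT ... SELECT with a column list and a NOT IN filter also runs as standard SQL outside SAS. This sketch uses SQLite through Python's sqlite3 module, not PROC SQL, and the sample keys are made up. Only keys missing from old_table are added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_table (GRVID INTEGER);
    CREATE TABLE new_table (VTGONR INTEGER);
    INSERT INTO old_table VALUES (1), (2);
    INSERT INTO new_table VALUES (2), (3), (4);
""")

# Same shape as the PROC SQL answer: the column list maps VTGONR onto GRVID,
# and the WHERE clause keeps only keys that old_table does not have yet.
conn.execute("""
    INSERT INTO old_table (GRVID)
    SELECT VTGONR FROM new_table
    WHERE VTGONR NOT IN (SELECT GRVID FROM old_table)
""")

ids = [r[0] for r in conn.execute("SELECT GRVID FROM old_table ORDER BY GRVID")]
print(ids)
```

One caveat: if GRVID can contain NULL, NOT IN (...) matches no rows at all, and a NOT EXISTS subquery avoids that.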

IF... ELSE... two mutually exclusive inserts INTO #temptable

I need to insert either set A or set B of records into a #temptable, depending on a certain condition.
My pseudo-code:
IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;
IF {some-condition}
SELECT {columns}
INTO #t1
FROM {some-big-table}
WHERE {some-filter}
ELSE
SELECT {columns}
INTO #t1
FROM {some-other-big-table}
WHERE {some-other-filter}
The two SELECTs above are mutually exclusive (guaranteed by the ELSE). However, the SQL compiler tries to outsmart me and throws the following message:
There is already an object named '#t1' in the database.
My idea of "fixing" this is to create #t1 upfront and then executing a simple INSERT INTO (instead of SELECT... INTO). But I like minimalism and am wondering whether this can be achieved in an easier way i.e. without explicit CREATE TABLE #t1 upfront.
Btw why is it NOT giving me an error on a conditional DROP TABLE in the first line? Just wondering.
You can't create two temp tables with the same name in a single SQL batch. An MSDN article states: "If more than one temporary table is created inside a single stored procedure or batch, they must have different names". You can implement this logic with two differently named temp tables, or with a table variable or temp table declared before the IF...ELSE block.
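The "create the table once, then insert" approach can even be done in a single INSERT. This is a minimal sketch in SQLite through Python's sqlite3 module, not T-SQL. The source tables big_a and big_b and the flag column are hypothetical. The condition is bound as a parameter, and only one branch of the UNION ALL can return rows. In T-SQL you would more likely keep the IF...ELSE and put a plain INSERT INTO #t1 in each branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_a (id INTEGER, flag INTEGER);   -- hypothetical source A
    CREATE TABLE big_b (id INTEGER, flag INTEGER);   -- hypothetical source B
    INSERT INTO big_a VALUES (1, 1), (2, 0);
    INSERT INTO big_b VALUES (10, 1), (20, 1);
""")

def load_t1(use_a):
    # Create the temp table once, up front, instead of two SELECT ... INTO.
    conn.execute("DROP TABLE IF EXISTS temp.t1")
    conn.execute("CREATE TEMP TABLE t1 (id INTEGER)")
    # :use_a is bound once; exactly one branch can return rows, so the
    # two sources stay mutually exclusive.
    conn.execute("""
        INSERT INTO t1 (id)
        SELECT id FROM big_a WHERE :use_a = 1 AND flag = 1
        UNION ALL
        SELECT id FROM big_b WHERE :use_a = 0 AND flag = 1
    """, {"use_a": 1 if use_a else 0})
    return sorted(r[0] for r in conn.execute("SELECT id FROM t1"))

print(load_t1(True), load_t1(False))
```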
You can handle this situation with dynamic SQL, although as a developer I don't consider it good practice. It's better to use a table variable or a temp table.
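-- Each EXEC runs as its own batch, so each SELECT ... INTO #TEMP1 is compiled separately.
-- Note: a #TEMP1 created inside EXEC is dropped when that EXEC ends, so it is not visible afterwards.
IF 1=2
BEGIN
    EXEC ('SELECT 1 ID INTO #TEMP1
    SELECT * FROM #TEMP1
    ')
END
ELSE
BEGIN
    EXEC ('SELECT 2 ID INTO #TEMP1
    SELECT * FROM #TEMP1
    ')
END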

use trigger to insert into table if data is not already present

I have two tables with the same structure. Table 1 has multiple rows that can have the same values. I want to insert those rows into table 2, excluding duplicate rows. I can do this normally using MINUS, but I want to write a trigger so that when a new row is inserted into table 1 and is not present in table 2, it is inserted into table 2, and otherwise it is not. I am new to triggers. The trigger I have written gives me a "table is mutating" error when I insert into table 1.
INSERT INTO t3(name1,name2,num1,num2) select name1,name2,num1,num2 from t1 group by name1,name2,num1,num2 minus select * from t3
When I run the code above on its own it works fine, but when I include it in my trigger it gives an error. How do I do this with a trigger?
Please help,
Thanks
Pranay
You don't need to requery the table from a row-level trigger. That's what the :NEW. syntax is for, e.g.:
INSERT INTO t3(name1,name2,num1,num2)
select :NEW.name1,:NEW.name2,:NEW.num1,:NEW.num2 from DUAL
minus select name1,name2,num1,num2 from t3;
That said, the code above looks a bit silly to me. I'd prefer to put a unique constraint on t3 and then add a handler in the trigger for any DUP_VAL_ON_INDEX exceptions.
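The unique-constraint idea can be sketched in SQLite through Python's sqlite3 module. This is not Oracle, and SQLite spells the row reference NEW. instead of :NEW. Instead of catching DUP_VAL_ON_INDEX, the sketch swaps in SQLite's INSERT OR IGNORE, which silently skips rows that would violate the constraint. The trigger only reads NEW.*, never t1 itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (name1 TEXT, name2 TEXT, num1 INTEGER, num2 INTEGER);
    -- The unique constraint is what guarantees t3 never holds duplicates.
    CREATE TABLE t3 (name1 TEXT, name2 TEXT, num1 INTEGER, num2 INTEGER,
                     UNIQUE (name1, name2, num1, num2));

    -- Row-level trigger that uses only NEW.* and does not re-query t1.
    -- OR IGNORE plays the role of the DUP_VAL_ON_INDEX handler.
    CREATE TRIGGER t1_copy_to_t3 AFTER INSERT ON t1
    FOR EACH ROW
    BEGIN
        INSERT OR IGNORE INTO t3 (name1, name2, num1, num2)
        VALUES (NEW.name1, NEW.name2, NEW.num1, NEW.num2);
    END;
""")

rows = [("a", "b", 1, 2), ("a", "b", 1, 2), ("c", "d", 3, 4)]
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", rows)

t1_count = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
t3_count = conn.execute("SELECT COUNT(*) FROM t3").fetchone()[0]
print(t1_count, t3_count)
```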

how to emulate "insert ignore" and "on duplicate key update" (sql merge) with postgresql?

Some SQL servers have a feature where INSERT is skipped if it would violate a primary/unique key constraint. For instance, MySQL has INSERT IGNORE.
What's the best way to emulate INSERT IGNORE and ON DUPLICATE KEY UPDATE with PostgreSQL?
With PostgreSQL 9.5, this is now native functionality (like MySQL has had for several years):
INSERT ... ON CONFLICT DO NOTHING/UPDATE ("UPSERT")
9.5 brings support for "UPSERT" operations.
INSERT is extended to accept an ON CONFLICT DO UPDATE/IGNORE clause. This clause specifies an alternative action to take in the event of a would-be duplicate violation.
...
Further example of new syntax:
INSERT INTO user_logins (username, logins)
VALUES ('Naomi',1),('James',1)
ON CONFLICT (username)
DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins;
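SQLite (3.24 and later) accepts essentially the same upsert statement, so the behaviour can be tried without a PostgreSQL server. This sketch runs it through Python's sqlite3 module. The starting value of 3 logins for Naomi is made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ON CONFLICT (username) needs a unique index or primary key on username.
conn.execute("CREATE TABLE user_logins (username TEXT PRIMARY KEY, logins INTEGER)")
conn.execute("INSERT INTO user_logins VALUES ('Naomi', 3)")

# Same statement as the PostgreSQL example. EXCLUDED.logins is the value the
# conflicting insert tried to write; user_logins.logins is the existing value.
conn.execute("""
    INSERT INTO user_logins (username, logins)
    VALUES ('Naomi', 1), ('James', 1)
    ON CONFLICT (username)
    DO UPDATE SET logins = user_logins.logins + EXCLUDED.logins
""")

logins = dict(conn.execute("SELECT username, logins FROM user_logins"))
print(logins)
```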
Edit: in case you missed warren's answer, PG9.5 now has this natively; time to upgrade!
Building on Bill Karwin's answer, to spell out what a rule-based approach would look like (transferring from another schema in the same DB, and with a multi-column primary key):
CREATE RULE "my_table_on_duplicate_ignore" AS ON INSERT TO "my_table"
WHERE EXISTS(SELECT 1 FROM my_table
WHERE (pk_col_1, pk_col_2)=(NEW.pk_col_1, NEW.pk_col_2))
DO INSTEAD NOTHING;
INSERT INTO my_table SELECT * FROM another_schema.my_table WHERE some_cond;
DROP RULE "my_table_on_duplicate_ignore" ON "my_table";
Note: The rule applies to all INSERT operations until the rule is dropped, so not quite ad hoc.
For those of you that have Postgres 9.5 or higher, the new ON CONFLICT DO NOTHING syntax should work:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT field_one, field_two, field_three
FROM source_table
ON CONFLICT (field_one) DO NOTHING;
For those of us on an earlier version, this LEFT JOIN (an anti-join) will work instead:
INSERT INTO target_table (field_one, field_two, field_three )
SELECT source_table.field_one, source_table.field_two, source_table.field_three
FROM source_table
LEFT JOIN target_table ON source_table.field_one = target_table.field_one
WHERE target_table.field_one IS NULL;
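Both forms can be compared side by side in SQLite through Python's sqlite3 module, which is not PostgreSQL. The sample rows are made up. One SQLite-specific detail: an INSERT ... SELECT followed by ON CONFLICT needs a WHERE clause in the SELECT to avoid a parsing ambiguity. That is why the sketch has WHERE true, which PostgreSQL does not require.

```python
import sqlite3

def setup():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_table (field_one INTEGER, field_two TEXT, field_three TEXT);
        CREATE TABLE target_table (field_one INTEGER PRIMARY KEY,
                                   field_two TEXT, field_three TEXT);
        INSERT INTO source_table VALUES (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z');
        INSERT INTO target_table VALUES (1, 'old', 'old');
    """)
    return conn

def rows(conn):
    return conn.execute("SELECT * FROM target_table ORDER BY field_one").fetchall()

# 9.5+ style: conflicting keys are skipped.
c1 = setup()
c1.execute("""
    INSERT INTO target_table (field_one, field_two, field_three)
    SELECT field_one, field_two, field_three FROM source_table WHERE true
    ON CONFLICT (field_one) DO NOTHING
""")

# Pre-9.5 style: the LEFT JOIN keeps only source rows with no match.
c2 = setup()
c2.execute("""
    INSERT INTO target_table (field_one, field_two, field_three)
    SELECT s.field_one, s.field_two, s.field_three
    FROM source_table AS s
    LEFT JOIN target_table AS t ON s.field_one = t.field_one
    WHERE t.field_one IS NULL
""")

print(rows(c1), rows(c2))
```

Note that the anti-join only filters keys that already exist in the target. Duplicate keys inside source_table itself would still collide.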
Try an UPDATE first. If it doesn't modify any row, the row didn't exist, so do an INSERT. Obviously, you do this inside a transaction.
You can of course wrap this in a function if you don't want the extra code on the client side. You also need a loop to handle the rare race condition where another session inserts the same key between your UPDATE and your INSERT.
There's an example of this in the documentation: http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html, example 40-2 right at the bottom.
That's usually the easiest way. You can do some magic with rules, but it's likely going to be a lot messier. I'd recommend the wrap-in-function approach over that any day.
This works for a single row or a few rows. If you're dealing with large numbers of rows, for example from a subquery, you're best off splitting it into two queries, one for the INSERT and one for the UPDATE (as an appropriate join/subselect, of course; there's no need to write your main filter twice).
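Here is a client-side sketch of the update-then-insert idea. It uses SQLite through Python's sqlite3 module rather than a PL/pgSQL function, and the table db and function merge_db are hypothetical names. The cursor's rowcount reports how many rows the UPDATE touched. With a single connection there is no concurrent writer, so the race-condition retry loop mentioned above is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db (a INTEGER PRIMARY KEY, b TEXT)")

def merge_db(key, data):
    """UPDATE first; INSERT only if the UPDATE touched no rows."""
    with conn:  # both statements run in one transaction, committed on exit
        cur = conn.execute("UPDATE db SET b = ? WHERE a = ?", (data, key))
        if cur.rowcount == 0:
            conn.execute("INSERT INTO db (a, b) VALUES (?, ?)", (key, data))

merge_db(1, "first")
merge_db(1, "second")  # row exists, so this is an UPDATE
merge_db(2, "other")   # no row yet, so this is an INSERT
print(conn.execute("SELECT a, b FROM db ORDER BY a").fetchall())
```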
To get the insert ignore logic you can do something like below. I found simply inserting from a select statement of literal values worked best, then you can mask out the duplicate keys with a NOT EXISTS clause. To get the update on duplicate logic I suspect a pl/pgsql loop would be necessary.
INSERT INTO manager.vin_manufacturer
(SELECT * FROM( VALUES
('935',' Citroën Brazil','Citroën'),
('ABC', 'Toyota', 'Toyota'),
('ZOM',' OM','OM')
) as tmp (vin_manufacturer_id, manufacturer_desc, make_desc)
WHERE NOT EXISTS (
--ignore anything that has already been inserted
SELECT 1 FROM manager.vin_manufacturer m where m.vin_manufacturer_id = tmp.vin_manufacturer_id)
)
INSERT INTO mytable(col1,col2)
SELECT 'val1','val2'
WHERE NOT EXISTS (SELECT 1 FROM mytable WHERE col1='val1')
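The literal-values-plus-NOT-EXISTS pattern above can be tried in SQLite through Python's sqlite3 module. This is a sketch, not PostgreSQL. In place of the derived-table alias AS tmp (...), it uses a CTE with a column list, and the table lives in the main schema rather than manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE vin_manufacturer (
    vin_manufacturer_id TEXT PRIMARY KEY, manufacturer_desc TEXT, make_desc TEXT)""")
conn.execute("INSERT INTO vin_manufacturer VALUES ('ABC', 'Toyota', 'Toyota')")

# The CTE names the VALUES columns, like "as tmp (...)" in the PostgreSQL version.
conn.execute("""
    WITH tmp (vin_manufacturer_id, manufacturer_desc, make_desc) AS (
        VALUES ('935', 'Citroën Brazil', 'Citroën'),
               ('ABC', 'Toyota', 'Toyota'),
               ('ZOM', 'OM', 'OM')
    )
    INSERT INTO vin_manufacturer
    SELECT * FROM tmp
    WHERE NOT EXISTS (  -- ignore anything that has already been inserted
        SELECT 1 FROM vin_manufacturer AS m
        WHERE m.vin_manufacturer_id = tmp.vin_manufacturer_id)
""")

ids = [r[0] for r in conn.execute(
    "SELECT vin_manufacturer_id FROM vin_manufacturer ORDER BY vin_manufacturer_id")]
print(ids)
```

As with the anti-join, NOT EXISTS only checks rows already in the table. Two identical keys inside the VALUES list itself would still collide.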
As @hanmari mentioned in his comment, when inserting into a Postgres table, ON CONFLICT (..) DO NOTHING is the best way to avoid inserting duplicate data:
query = """INSERT INTO db_table_name(column_name)
VALUES(%s) ON CONFLICT (column_name) DO NOTHING;"""
With the ON CONFLICT clause, the INSERT statement still inserts the non-conflicting rows and silently skips the duplicates. The query above is an example of inserting data from an Excel file into a Postgres table.
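Here is how the same query behaves when run for a batch of values. This sketch uses SQLite through Python's sqlite3 module instead of a PostgreSQL driver, so the placeholder is ? rather than %s. The list of values stands in for hypothetical spreadsheet rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ON CONFLICT (column_name) requires a unique constraint on column_name.
conn.execute("CREATE TABLE db_table_name (column_name TEXT UNIQUE)")

# sqlite3 uses ? placeholders where a PostgreSQL driver would use %s.
query = """INSERT INTO db_table_name (column_name)
           VALUES (?) ON CONFLICT (column_name) DO NOTHING"""

# Hypothetical spreadsheet values, including one duplicate.
values = [("alpha",), ("beta",), ("alpha",)]
conn.executemany(query, values)

stored = [r[0] for r in conn.execute(
    "SELECT column_name FROM db_table_name ORDER BY column_name")]
print(stored)
```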
I have constraints on a Postgres table to make sure the ID field is unique. Instead of deleting rows that are the same, I add a line of SQL that restarts the sequence behind the ID column at 1.
Example:
q = 'ALTER SEQUENCE db_table_name_id_column_seq RESTART WITH 1'  # assumed sequence name: default <table>_<column>_seq of a serial column
If my data already has an ID field, I don't use it as the primary/serial ID. Instead I create a separate ID column and set it to serial.
I hope this information is helpful to everyone.
*I have no college degree in software development/coding. Everything I know in coding, I study on my own.
Looks like PostgreSQL supports a schema object called a rule.
http://www.postgresql.org/docs/current/static/rules-update.html
You could create a rule ON INSERT for a given table, making it do NOTHING if a row exists with the given primary key value, or else making it do an UPDATE instead of the INSERT if a row exists with the given primary key value.
I haven't tried this myself, so I can't speak from experience or offer an example.
This solution avoids using rules:
DO $$
BEGIN
    INSERT INTO tableA (unique_column,c2,c3) VALUES (1,2,3);
EXCEPTION
    WHEN unique_violation THEN
        UPDATE tableA SET c2 = 2, c3 = 3 WHERE unique_column = 1;
END
$$;
but it has a performance drawback (see PostgreSQL.org):
A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one. Therefore, don't use EXCEPTION without need.
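The same "try the INSERT, update on a unique violation" logic can live in client code instead of a PL/pgSQL block. This is a sketch in SQLite through Python's sqlite3 module, where a unique violation surfaces as sqlite3.IntegrityError. The helper name insert_or_update is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (unique_column INTEGER UNIQUE, c2 INTEGER, c3 INTEGER)"
)

def insert_or_update(key, c2, c3):
    with conn:  # commits on success, rolls back if an exception escapes
        try:
            conn.execute("INSERT INTO tableA VALUES (?, ?, ?)", (key, c2, c3))
        except sqlite3.IntegrityError:
            # Counterpart of WHEN unique_violation: the row exists, update it.
            conn.execute(
                "UPDATE tableA SET c2 = ?, c3 = ? WHERE unique_column = ?",
                (c2, c3, key),
            )

insert_or_update(1, 2, 3)  # inserts
insert_or_update(1, 5, 6)  # conflicts, so updates
print(conn.execute("SELECT * FROM tableA").fetchall())
```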
In bulk, you can always delete the rows before the insert. Deleting a row that doesn't exist doesn't cause an error, so it's safely skipped.
For data import scripts, to replace "IF NOT EXISTS", in a way, there's a slightly awkward formulation that nevertheless works:
DO
$do$
BEGIN
PERFORM id
FROM whatever_table; -- FOUND becomes true if at least one row exists
IF NOT FOUND THEN
-- INSERT stuff
END IF;
END
$do$;
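When the goal is just "insert only if the table is empty", a single INSERT ... SELECT ... WHERE NOT EXISTS can replace the PERFORM / IF NOT FOUND block. This sketch is SQLite run through Python's sqlite3 module, not PostgreSQL. The seed values are made up, and column1 is the name SQLite gives to the first VALUES column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE whatever_table (id INTEGER)")

# Rows are inserted only if whatever_table is empty when the statement runs.
SEED_IF_EMPTY = """
    INSERT INTO whatever_table (id)
    SELECT column1 FROM (VALUES (1), (2), (3))
    WHERE NOT EXISTS (SELECT 1 FROM whatever_table)
"""

with conn:
    conn.execute(SEED_IF_EMPTY)
with conn:
    conn.execute(SEED_IF_EMPTY)  # table is no longer empty: inserts nothing

ids = [r[0] for r in conn.execute("SELECT id FROM whatever_table ORDER BY id")]
print(ids)
```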