'select top' returns too many rows - tsql

insert into #resultSet
SELECT TOP (@topN)
    field1,
    field2
FROM dbo.table1 DataLog
WHERE DataLog.SelectedForProcessing is null
I'm passing 300 into @topN in the SQL above, a value I've got configured in my app.config file, but this query, running on two different servers, has returned 304 rows in one instance and 307 rows in another.
I can't find anything that could be interfering with the 300 and turning it into 304 or 307, so I'm beginning to wonder whether SQL Server will sometimes just return a few extra rows. (The same code on another server IS returning the expected 300 rows.)
Is this expected behaviour?

Test this
declare @topN int = 100;
select @topN;

-- DELETE takes no *, just the table name
delete from #resultSet;

insert into #resultSet
SELECT TOP (@topN)
    field1,
    field2
FROM dbo.table1 DataLog
WHERE DataLog.SelectedForProcessing is null;

select count(*)
FROM dbo.table1 DataLog
WHERE DataLog.SelectedForProcessing is null;

select count(*) from #resultSet;

SQL Server will consistently return TOP (N) rows when N is a constant value - no wiggle room there.
I see two possibilities:
@topN is getting a different value on occasion
#resultSet is somehow not empty before the new values are inserted
If #resultSet is declared elsewhere in your scripts, check that no other INSERT INTO statement is leaving leftover rows in it.
One easy run-time safeguard is simply to add another command before this INSERT INTO statement:
DELETE #resultSet;

INSERT INTO #resultSet
SELECT TOP (@topN)
    field1,
    field2
FROM dbo.table1 DataLog
WHERE DataLog.SelectedForProcessing IS NULL;

Related

Batch return statements in PostgreSQL?

I do
INSERT INTO table DEFAULT VALUES RETURNING id
which returns the ID, which is later used in the calling code. But if I do
INSERT INTO table DEFAULT VALUES RETURNING id;
INSERT INTO table DEFAULT VALUES RETURNING id;
INSERT INTO table DEFAULT VALUES RETURNING id;
I only get the latest returned value.
What's a proper way to do it, as either a "do this n times" construct or a union of the above (which should work for any query with RETURNING)?
You can exploit the lesser-known fact that you can run a SELECT statement without putting any columns into the select list.
insert into the_table --<< no columns here!
select --<< no columns here either!
from generate_series(1,3);
Online example
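If you also need the generated ids back, as in the question, the same trick should combine with RETURNING; a minimal sketch, assuming the table has a serial id column as in the question:

insert into the_table          -- still no column list: every column gets its default
select                         -- empty select list: one default-valued row per generated number
from generate_series(1, 3)
returning id;                  -- one generated id per inserted row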

IF... ELSE... two mutually exclusive inserts INTO #temptable

I need to insert either set A or set B of records into a #temptable, depending on a certain condition.
My pseudo-code:
IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;
IF {some-condition}
SELECT {columns}
INTO #t1
FROM {some-big-table}
WHERE {some-filter}
ELSE
SELECT {columns}
INTO #t1
FROM {some-other-big-table}
WHERE {some-other-filter}
The two SELECTs above are exclusive (guaranteed by the ELSE branch). However, the SQL compiler tries to outsmart me and throws the following message:
There is already an object named '#t1' in the database.
My idea of "fixing" this is to create #t1 upfront and then execute a simple INSERT INTO (instead of SELECT... INTO). But I like minimalism and am wondering whether this can be achieved in an easier way, i.e. without an explicit CREATE TABLE #t1 upfront.
Btw why is it NOT giving me an error on a conditional DROP TABLE in the first line? Just wondering.
You can't have two temp tables with the same name in a single SQL batch. One of the MSDN articles says: "If more than one temporary table is created inside a single stored procedure or batch, they must have different names." You can implement this logic with two differently named temp tables, or with a table variable or temp table created before the IF...ELSE block.
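A minimal sketch of that last option, keeping the placeholders from the question (the column definitions are whatever your real query needs):

IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1;

-- create the table once, before the branch
CREATE TABLE #t1 ({columns with their types});

IF {some-condition}
    INSERT INTO #t1 ({columns})
    SELECT {columns} FROM {some-big-table} WHERE {some-filter};
ELSE
    INSERT INTO #t1 ({columns})
    SELECT {columns} FROM {some-other-big-table} WHERE {some-other-filter};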
You can handle this situation using dynamic SQL, although as a developer I'd say it's not good practice; it's better to use a table variable or a temp table.
IF 1=2
BEGIN
    EXEC ('SELECT 1 ID INTO #TEMP1
           SELECT * FROM #TEMP1')
END
ELSE
    EXEC ('SELECT 2 ID INTO #TEMP1
           SELECT * FROM #TEMP1')

INSERT INTO, return number of rows inserted [duplicate]

My database driver for PostgreSQL 8/9 does not return a count of records affected when executing INSERT or UPDATE.
PostgreSQL offers the non-standard syntax "RETURNING" which seems like a good workaround. But what might be the syntax? The example returns the ID of a record, but I need a count.
INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets')
RETURNING did;
I know this question is oooolllllld and my solution is arguably overly complex, but that's my favorite kind of solution!
Anyway, I had to do the same thing and got it working like this:
-- Get count from INSERT
WITH rows AS (
INSERT INTO distributors
(did, dname)
VALUES
(DEFAULT, 'XYZ Widgets'),
(DEFAULT, 'ABC Widgets')
RETURNING 1
)
SELECT count(*) FROM rows;
-- Get count from UPDATE
WITH rows AS (
UPDATE distributors
SET dname = 'JKL Widgets'
WHERE did <= 10
RETURNING 1
)
SELECT count(*) FROM rows;
One of these days I really have to get around to writing a love sonnet to PostgreSQL's WITH clause ...
I agree w/ Milen, your driver should do this for you. What driver are you using and for what language? But if you are using plpgsql, you can use GET DIAGNOSTICS my_var = ROW_COUNT;
http://www.postgresql.org/docs/current/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-DIAGNOSTICS
You can get ROW_COUNT after an update or insert with this code:
insert into distributors (did, dname) values (DEFAULT, 'XYZ Widgets');
get diagnostics v_cnt = row_count;
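Note that GET DIAGNOSTICS is PL/pgSQL, not plain SQL, so the two lines above need to run inside a function or DO block; a minimal sketch (v_cnt is just an illustrative variable name):

DO $$
DECLARE
    v_cnt bigint;
BEGIN
    INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets');
    GET DIAGNOSTICS v_cnt = ROW_COUNT;   -- rows affected by the previous statement
    RAISE NOTICE 'inserted % row(s)', v_cnt;
END $$;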
It's not clear from your question how you're calling the statement. Assuming you're using something like JDBC you may be calling it as a query rather than an update. From JDBC's executeQuery:
Executes the given SQL statement, which returns a single ResultSet
object.
This is therefore appropriate when you execute a statement that returns some query results, such as SELECT or INSERT ... RETURNING. If you are making an update to the database and then want to know how many tuples were affected, you need to use executeUpdate which returns:
either (1) the row count for SQL Data Manipulation Language (DML)
statements or (2) 0 for SQL statements that return nothing
You could wrap your query in a transaction and it should show you the count before you ROLLBACK or COMMIT. Example:
BEGIN TRANSACTION;
INSERT .... ;
ROLLBACK TRANSACTION;
If you run the first 2 lines of the above, it should give you the count. You can then ROLLBACK (undo) the insert if you find that the number of affected lines isn't what you expected. If you're satisfied that the INSERT is correct, then you can run the same thing, but replace line 3 with COMMIT TRANSACTION;.
Important note: after you run any BEGIN TRANSACTION; you must either ROLLBACK; or COMMIT; the transaction, otherwise it will hold locks that can slow down or even cripple an entire system if you're running in a production environment.
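Concretely, using the distributors example from the question (run in psql, which prints the command tag containing the row count):

BEGIN;
INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets');
-- psql reports e.g. "INSERT 0 1", where the trailing number is the row count
ROLLBACK;  -- or COMMIT; once you're happy with the count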

Returning multiple SERIAL values from Postgres batch insert

I'm working with Postgres, using SERIAL as my primary key. After I insert a row I can get the generated key either by using RETURNING or currval().
Now my problem is that I want to do a batch insert inside a transaction and get ALL the generated keys.
All I get with RETURNING and currval() is the last generated id; the rest of the results get discarded.
How can I get it to return all of them?
Thanks
You can use RETURNING with multiple values:
psql=> create table t (id serial not null, x varchar not null);
psql=> insert into t (x) values ('a'),('b'),('c') returning id;
id
----
1
2
3
(3 rows)
So you want something more like this:
INSERT INTO AutoKeyEntity (Name,Description,EntityKey) VALUES
('AutoKey 254e3c64-485e-42a4-b1cf-d2e1e629df6a','Testing 5/4/2011 8:59:43 AM',DEFAULT)
returning EntityKey;
INSERT INTO AutoKeyEntityListed (EntityKey,Listed,ItemIndex) VALUES
(CURRVAL('autokeyentity_entityKey_seq'),'Test 1 AutoKey 254e3c64-485e-42a4-b1cf-d2e1e629df6a', 0),
(CURRVAL('autokeyentity_entityKey_seq'),'Test 2 AutoKey 254e3c64-485e-42a4-b1cf-d2e1e629df6a', 1),
(CURRVAL('autokeyentity_entityKey_seq'),'Test 3 AutoKey 254e3c64-485e-42a4-b1cf-d2e1e629df6a', 2)
returning EntityKey;
-- etc.
And then you'll have to gather the returned EntityKey values from each statement in your transaction.
You could try to grab the sequence's current value at the beginning and end of the transaction and use those to figure out which sequence values were used but that is not reliable:
Furthermore, although multiple sessions are guaranteed to allocate
distinct sequence values, the values might be generated out of
sequence when all the sessions are considered. For example, with a
cache setting of 10, session A might reserve values 1..10 and return
nextval=1, then session B might reserve values 11..20 and return
nextval=11 before session A has generated nextval=2. Thus, with a
cache setting of one it is safe to assume that nextval values are
generated sequentially; with a cache setting greater than one you
should only assume that the nextval values are all distinct, not
that they are generated purely sequentially. Also, last_value will
reflect the latest value reserved by any session, whether or not
it has yet been returned by nextval.
So even if your sequences have a cache value of one, you can still have non-contiguous sequence values in your transaction. However, you might be safe if the sequence's cache value matches the number of INSERTs in your transaction, but I'd guess that's going to be too large to make sense.
UPDATE: I just noticed (thanks to the questioner's comments) that there are two tables involved; I got a bit lost in the wall of text.
In that case, you should be able to keep the current INSERTs:
INSERT INTO AutoKeyEntity (Name,Description,EntityKey) VALUES
('AutoKey 254e3c64-485e-42a4-b1cf-d2e1e629df6a','Testing 5/4/2011 8:59:43 AM',DEFAULT)
returning EntityKey;
INSERT INTO AutoKeyEntityListed (EntityKey,Listed,ItemIndex) VALUES
(CURRVAL('autokeyentity_entityKey_seq'),'Test 1 AutoKey 254e3c64-485e-42a4-b1cf-d2e1e629df6a', 0),
(CURRVAL('autokeyentity_entityKey_seq'),'Test 2 AutoKey 254e3c64-485e-42a4-b1cf-d2e1e629df6a', 1),
(CURRVAL('autokeyentity_entityKey_seq'),'Test 3 AutoKey 254e3c64-485e-42a4-b1cf-d2e1e629df6a', 2);
-- etc.
And grab the EntityKey values one at a time from the INSERTs on AutoKeyEntity. Some sort of script might be needed to handle the RETURNING values. You could also wrap the AutoKeyEntity and related AutoKeyEntityListed INSERTs in a function, then use INTO to grab the EntityKey value and return it from the function:
INSERT INTO AutoKeyEntity /*...*/ RETURNING EntityKey INTO ek;
/* AutoKeyEntityListed INSERTs ... */
RETURN ek;
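A fuller sketch of that function idea, using the table and column names from the question (the function name, parameter names, and the integer key type are assumptions for illustration):

CREATE OR REPLACE FUNCTION insert_autokey(p_name varchar, p_description varchar)
RETURNS integer AS $$
DECLARE
    ek integer;
BEGIN
    INSERT INTO AutoKeyEntity (Name, Description, EntityKey)
    VALUES (p_name, p_description, DEFAULT)
    RETURNING EntityKey INTO ek;

    -- dependent rows can now reference the captured key directly
    INSERT INTO AutoKeyEntityListed (EntityKey, Listed, ItemIndex)
    VALUES (ek, p_name || ' item 0', 0);

    RETURN ek;
END;
$$ LANGUAGE plpgsql;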
You can pre-assign consecutive ids using this:
SELECT setval(seq, nextval(seq) + num_rows - 1, true) as stop
It should be a faster alternative to calling nextval() gazillions of times.
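For example (sequence and table names here are made up), to reserve three consecutive ids in one round trip and then insert with them explicitly:

-- 'stop' is the last reserved value, so the reserved block is stop-2 .. stop
SELECT setval('my_table_id_seq', nextval('my_table_id_seq') + 3 - 1, true) AS stop;

-- if the call above returned stop = 42, the application now owns ids 40, 41 and 42
INSERT INTO my_table (id, payload)
VALUES (40, 'a'), (41, 'b'), (42, 'c');

Note that the nextval() and setval() calls are not a single atomic step, so a concurrent session hitting the sequence at exactly that moment could be handed one of the values you are about to reserve.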
You could also store the ids in a temporary table:
create temporary table blah (
    id int
) on commit drop;
insert into table1 (...) values (...)
returning id into blah;
In Postgres 9.1 and later, you can use CTEs:
with ids as (
    insert into table1 (...) values (...)
    returning id
)
insert into table2 (...)
select ...
from ids;
In your application, gather values from the sequence:
SELECT nextval( ... ) FROM generate_series( 1, number_of_values ) n
Create your rows using those values, and simply insert them (using one multi-row insert). It's safe (SERIAL works as you'd expect: no reuse of values, concurrency-proof, etc.) and fast (you insert all the rows at once without many client-server round trips).
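A small sketch of that flow, with made-up sequence and table names:

-- reserve one id per row to be inserted
SELECT nextval('my_table_id_seq') FROM generate_series(1, 3) n;

-- suppose that returned 17, 18 and 19; the application then builds one multi-row insert
INSERT INTO my_table (id, payload)
VALUES (17, 'a'), (18, 'b'), (19, 'c');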
Replying to Scott Marlowe's comment in more detail :
Say you have a tree table with the usual parent_id reference to itself, and you want to import a large tree of records. The problem is that you need the parent's PK value to be known before you can insert the children, so potentially this needs lots of individual INSERT statements.
So a solution could be :
build the tree in the application
grab as many sequence values as nodes to insert, using "SELECT nextval( ... ) FROM generate_series( 1, number_of_values ) n" (the order of the values does not matter)
assign those primary key values to the nodes
do a bulk insert (or COPY) traversing the tree structure, since the PKs used for relations are known
There are three ways to do this: use currval(), use RETURNING, or write a stored procedure that wraps either of those methods in a nice little blanket so you don't have to do it half in the client and half in Postgres.
Currval method:
begin;
insert into a (col1, col2) values ('val1','val2');
select currval('a_id_seq');
 123 -- returned value
-- client code creates the next statement with the value from currval
insert into b (a_fk, col3, col4) values (123, 'val3','val4');
-- repeat the above as many times as needed, then...
commit;
Returning method:
begin;
insert into a (col1, col2) values ('val1','val2'), ('val1','val2'), ('val1','val2') returning a_id; -- note we inserted three rows
 123 -- returned values
 124
 126
insert into b (a_fk, col3, col4) values (123, 'val3','val4'), (124, 'val3','val4'), (126, 'val3','val4');
commit;
Perform a FOR loop and process the records one by one. It might be less performant, but it is concurrency-safe.
Example code:
DO $$
DECLARE r record;
BEGIN
    FOR r IN SELECT id FROM {table} WHERE {condition} LOOP
        WITH idlist AS (
            INSERT INTO {anotherTable} ({columns}) VALUES ({values})
            RETURNING id
        )
        -- tie the update back to the row the loop is currently on
        UPDATE {table} c SET {column} = (SELECT id FROM idlist) WHERE c.id = r.id;
    END LOOP;
END $$;

UPDATE-no-op in SQL MERGE statement

I have a table with some persistent data in it. When I query it, I also have a fairly complex CTE which computes the values required for the result, and I need to insert missing rows into the persistent table. In the end I want to select the result consisting of all the rows identified by the CTE, but with the data from the table where they were already in the table, and I need to know whether a row has just been inserted or not.
Simplified, this works like this (the following code runs as a normal query if you'd like to try it):
-- Set-up of test data, this would be the persisted table
DECLARE #target TABLE (id int NOT NULL PRIMARY KEY) ;
INSERT INTO #target (id) SELECT v.id FROM (VALUES (1), (2)) v(id);
-- START OF THE CODE IN QUESTION
-- The result table variable (will be several columns in the end)
DECLARE #result TABLE (id int NOT NULL, new bit NOT NULL) ;
WITH Source AS (
-- Imagine a fairly expensive, recursive CTE here
SELECT * FROM (VALUES (1), (3)) AS Source (id)
)
MERGE INTO #target AS Target
USING Source
ON Target.id = Source.id
-- Perform a no-op on the match to get the output record
WHEN MATCHED THEN
UPDATE SET Target.id=Target.id
WHEN NOT MATCHED BY TARGET THEN
INSERT (id) VALUES (SOURCE.id)
-- select the data to be returned - will be more columns
OUTPUT source.id, CASE WHEN $action='INSERT' THEN CONVERT(bit, 1) ELSE CONVERT(bit, 0) END
INTO #result ;
-- Select the result
SELECT * FROM #result;
I don't like the WHEN MATCHED THEN UPDATE part; I'd rather leave the redundant update out, but then I don't get the result row in the OUTPUT clause.
Is this the most efficient way to do this kind of completing and returning data?
Or would there be a more efficient solution without MERGE, for instance by pre-computing the result with a SELECT and then performing an INSERT of the rows which are new=0? I have difficulty interpreting the query plan, since it basically boils down to a "Clustered Index Merge", which is pretty vague to me performance-wise compared to the separate SELECT-followed-by-INSERT variant. And I wonder whether SQL Server (2008 R2 with CU1) is actually smart enough to see that the UPDATE is a no-op (i.e. no write required).
You could declare a dummy variable and set its value in the WHEN MATCHED clause.
DECLARE @dummy int;
...
MERGE
...
WHEN MATCHED THEN
    UPDATE SET @dummy = 0
...
I believe it should be less expensive than the actual table update.