Using @@identity in a stored procedure good/bad? - tsql

I have a stored procedure that inserts a record into a table with an identity column.
Immediately after inserting, I use @@IDENTITY to insert records into a child table.
Are there any implications to doing that?

It's usually not as good as SCOPE_IDENTITY(), if your version offers it, because @@IDENTITY isn't limited to the current scope.
It will retrieve the most recently generated identity value even if it came from a different stored procedure inserting into a different table.
Pinal Dave has a straightforward explanation of the IDENTITY offerings here:
http://blog.sqlauthority.com/2007/03/25/sql-server-identity-vs-scope_identity-vs-ident_current-retrieve-last-inserted-identity-of-record/

SCOPE_IDENTITY should be used. If the INSERT fires a trigger that also inserts into a table with an identity column, @@identity will give you the wrong value (i.e., the value generated by the trigger's insert).

@@IDENTITY is the last identity value generated on your connection for ANY table, in any scope. If a trigger or nested procedure performs another identity insert, you're going to end up with the wrong value, i.e., you'll get the value that later insert generated.
For the last identity value inserted in the current scope, use SCOPE_IDENTITY().
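A minimal sketch of the pattern from the question, using hypothetical Parent/Child table and column names:
-- Hypothetical table and column names; the point is capturing the key in the same scope as the insert.
DECLARE @NewParentId int;
INSERT INTO dbo.Parent (Name) VALUES ('example');
SET @NewParentId = SCOPE_IDENTITY();  -- identity generated by the INSERT above, in this scope
INSERT INTO dbo.Child (ParentId, Detail) VALUES (@NewParentId, 'child row');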

Just to add my favorite article on this, if for nothing else than for its title: "Identity Crisis"

Related

PostgreSQL: Return auto-generated ids from COPY FROM insertion

I have a non-empty PostgreSQL table with a GENERATED ALWAYS AS IDENTITY column id. I do a bulk insert with the C++ binding pqxx::stream_to, which I'm assuming uses COPY FROM. My problem is that I want to know the ids of the newly created rows, but COPY FROM has no RETURNING clause. I see several possible solutions, but I'm not sure if any of them is good, or which one is the least bad:
Provide the ids manually through COPY FROM, taking care to give the values which the identity sequence would have provided, then afterwards synchronize the sequence with setval(...).
First stream the data to a temp table with a custom index column for ordering. Then do something like
INSERT INTO foo (col1, col2)
SELECT ttFoo.col1, ttFoo.col2 FROM ttFoo
ORDER BY ttFoo.idx
RETURNING foo.id
and depend on the fact that the identity sequence produces ascending numbers to correlate them with ttFoo.idx (I cannot also do RETURNING ttFoo.idx, because RETURNING only sees the inserted row, which doesn't contain idx).
Query the current value of the identity sequence prior to insertion, then check afterwards which rows are new.
I would assume that this is a common situation, yet I don't see an obviously correct solution. What do you recommend?
You can find out which rows have been affected by your current transaction using the system columns. The xmin column contains the ID of the inserting transaction, so to return the id values you just copied, you could:
BEGIN;
COPY foo(col1,col2) FROM STDIN;
SELECT id FROM foo
WHERE xmin::text = (txid_current() % (2^32)::bigint)::text
ORDER BY id;
COMMIT;
The WHERE clause comes from this answer, which explains the reasoning behind it.
I don't think there's any way to optimise this with an index, so it might be too slow on a large table. If so, I think your second option would be the way to go, i.e. stream into a temp table and INSERT ... RETURNING.
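A sketch of that second option, assuming text columns and the foo/ttFoo names from the question (pqxx::stream_to would feed the COPY):
BEGIN;
CREATE TEMP TABLE ttFoo (idx bigint, col1 text, col2 text) ON COMMIT DROP;
COPY ttFoo (idx, col1, col2) FROM STDIN;
INSERT INTO foo (col1, col2)
SELECT ttFoo.col1, ttFoo.col2 FROM ttFoo
ORDER BY ttFoo.idx
RETURNING foo.id;  -- relies on the question's assumption that ids ascend in insert order
COMMIT;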
Alternatively, you could give the id column the type uuid.
Generate random UUIDs on the client first and bulk insert them along with the data; that way you don't need the database to return the ids at all.
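If you go that route, a minimal sketch of the column definition (gen_random_uuid() needs PostgreSQL 13+ or the pgcrypto extension; the bulk COPY would simply supply the client-generated values):
CREATE TABLE foo (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    col1 text,
    col2 text
);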

Avoiding double SELECT/INSERT by INSERT'ing placeholder

Is it possible to perform a query that will SELECT for some values and if those values do not exist, perform an INSERT and return the very same values - in a single query?
Background:
I am writing an application with a great deal of concurrency. At one point a function will check a database to see if a certain key value exists using SELECT. If the key exists, the function can safely exit. If the value does not exist, the function will perform a REST API call to capture the necessary data, then INSERT the values into the database.
This works fine until it is run concurrently. Two threads (I am using Go, so goroutines) will each independently run the SELECT. Since both queries report that the key does not exist, both will independently perform the REST API call and both will attempt to INSERT the values.
Currently, I avoid double-insertion by using a duplicate constraint. However, I would like to avoid even the double API call by having the first query SELECT for the key value and if it does not exist, INSERT a placeholder - then return those values. This way, subsequent SELECT queries report that the key value already exists and will not perform the API calls or INSERT.
In Pseudo-code, something like this:
SELECT values FROM my_table WHERE key=KEY_HERE;
if found:
RETURN SELECTED VALUES;
if not found:
INSERT values, key VALUES(random_placeholder, KEY_HERE) INTO table;
SELECT values from my_table WHERE key=KEY_HERE;
The application code will insert a random value so that a routine/thread can determine if it was the one that generated the new INSERT and will subsequently go ahead and perform the Rest API call.
This is for a Go application using the pgx library.
Thanks!
You could write a stored procedure, and it would be a single query for the client to execute. PostgreSQL, of course, would still execute multiple statements. A PostgreSQL INSERT statement can return values with the RETURNING keyword, so you may not need the second SELECT.
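A hedged sketch of such a function, with placeholder names (my_table, key, value) standing in for the question's pseudocode, and using the unique constraint the question already has via ON CONFLICT DO NOTHING (PostgreSQL 9.5+) to cover the race:
CREATE FUNCTION get_or_claim(p_key text, p_placeholder text)
RETURNS TABLE (value text, inserted boolean)
LANGUAGE plpgsql AS $$
BEGIN
    RETURN QUERY
        SELECT t.value, false FROM my_table t WHERE t.key = p_key;
    IF NOT FOUND THEN
        RETURN QUERY
            INSERT INTO my_table (key, value)
            VALUES (p_key, p_placeholder)
            ON CONFLICT (key) DO NOTHING     -- another routine may have won the race
            RETURNING my_table.value, true;
        IF NOT FOUND THEN                    -- we lost the race: read the winner's row
            RETURN QUERY
                SELECT t.value, false FROM my_table t WHERE t.key = p_key;
        END IF;
    END IF;
END;
$$;
The caller runs SELECT * FROM get_or_claim('KEY_HERE', 'random_placeholder'); in a single round trip, and the inserted flag tells the routine whether it is the one that should go on to perform the REST API call.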
Lock the table in an appropriate lock mode.
For example in the strictest possible mode ACCESS EXCLUSIVE:
BEGIN TRANSACTION;
LOCK elbat
IN ACCESS EXCLUSIVE MODE;
SELECT *
FROM elbat
WHERE id = 1;
-- if there wasn't any row returned make the API call and
INSERT INTO elbat
(id,
<other columns>)
VALUES (1,
<API call return values>);
COMMIT;
-- return the values to the application
Once one transaction has acquired the ACCESS EXCLUSIVE lock, no other transaction can even read from the table until the acquiring transaction ends. And ACCESS EXCLUSIVE won't be granted while any other lock (even a weaker one) is held. That way the instance of your component that gets the lock first will do the check and the INSERT if necessary. The other one is blocked in the meantime, and by the time it finally gets access, the INSERT has already been done by the first transaction, so it doesn't need to make the API call anymore (unless the first transaction failed for some reason and was rolled back).
If this is too strict for your use case, you may need to find out which lock level might be appropriate for you. Maybe, if you can make any component accessing the database (or at least the table) cooperative (and it sounds like you can do this), even advisory locks are enough.
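A sketch of the advisory-lock variant, assuming every cooperating component takes the same application-chosen lock key before touching the table:
BEGIN;
SELECT pg_advisory_xact_lock(42);  -- 42 is an arbitrary agreed-on key; blocks until any other holder finishes
SELECT *
FROM elbat
WHERE id = 1;
-- if there wasn't any row returned make the API call and
INSERT INTO elbat
(id,
<other columns>)
VALUES (1,
<API call return values>);
COMMIT;  -- the advisory lock is released automatically at transaction end
You could also hash the business key, e.g. pg_advisory_xact_lock(hashtext('KEY_HERE')), so that different keys don't block each other.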

Way to migrate a create table with sequence from postgres to DB2

I need to migrate a DDL from Postgres to DB2, but I need it to work the same way it does in Postgres. There is a table that generates its key values from a sequence, but the values can also be given explicitly.
Postgres
create sequence hist_id_seq;
create table benchmarksql.history (
hist_id integer not null default nextval('hist_id_seq') primary key,
h_c_id integer,
h_c_d_id integer,
h_c_w_id integer,
h_d_id integer,
h_w_id integer,
h_date timestamp,
h_amount decimal(6,2),
h_data varchar(24)
);
(Note the sequence call in the hist_id column's default, which defines the value of the primary key.)
The business logic inserts into the table by explicitly providing an ID, and in other cases, it leaves the database to choose the number.
If I change this in DB2 to GENERATED ALWAYS, it will throw errors because some values are provided explicitly. On the other hand, if I create the table with GENERATED BY DEFAULT, DB2 will throw an error (SQL0803N) when its generated value collides with a value that was already inserted explicitly, because the "internal sequence" does not take the already-inserted values into account and does not retry with the next value.
And I do not want to restart the sequence each time a provided ID is inserted.
This is the problem in BenchmarkSQL when trying to port it to DB2: https://sourceforge.net/projects/benchmarksql/ (File sqlTableCreates)
How can I implement the same database logic in DB2 as it does in Postgres (and apparently in Oracle)?
You're operating under a misconception: that sources external to the db get to dictate its internal keys. Ideally/conceptually, autogenerated ids will never need to be seen outside of the db, as conceptually there should be unique natural keys for export or reporting. Still, there are times when applications will need to manage some ids, often when setting up related entities (eg, JPA seems to want to work this way).
However, if you supply an id value that you generated from some other source, the db can't manage it for you. How could it? It wouldn't be efficient - attempting to do so would have to do one of the following:
Be unsafe in the face of multiple clients (attempt to add duplicate keys)
Serialize access to the table (for a potentially slow query, too)
(This usually shows up when people attempt something like SELECT MAX(id) + 1, which would require locking the entire table for thread safety, likely blocking statements that don't even touch that column. Trying to find the first unused id - filling gaps - gets even more complicated and problematic.)
Neither is ideal, so it's best not to have the problem in the first place. This is usually done by having id columns be autogenerated, but (as pointed out earlier) there are situations where we need to know what the id will be before we insert the row into the table. Fortunately, there's a standard SQL object for this: SEQUENCE. It provides a db-managed, thread-safe, fast way to get ids. It appears that in PostgreSQL you can use sequences in a column's DEFAULT clause, but DB2 doesn't allow that. If you don't want to specify an id every time (it should be autogenerated some of the time), you'll need another way; this is the perfect time to use a BEFORE INSERT trigger:
CREATE TRIGGER Add_Generated_Id
NO CASCADE BEFORE INSERT ON benchmarksql.history
REFERENCING NEW AS Incoming_Entity
FOR EACH ROW
WHEN (Incoming_Entity.hist_id IS NULL)
SET Incoming_Entity.hist_id = NEXT VALUE FOR hist_id_seq
(something like this - not tested. You didn't specify where in the project this would belong)
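For that trigger to compile, the sequence it references has to exist in DB2 as well - a minimal sketch mirroring the Postgres DDL above:
CREATE SEQUENCE hist_id_seq AS INTEGER START WITH 1 INCREMENT BY 1;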
So, if you then add a row with something like:
INSERT INTO benchmarksql.history (hist_id, h_data) VALUES(null, 'a')
or
INSERT INTO benchmarksql.history (h_data) VALUES('a')
an id will be generated and attached automatically. Note that ALL ids added to the table must come from the given sequence (as @mustaccio pointed out, this appears to be true even in PostgreSQL), or any UNIQUE CONSTRAINT on the column will start throwing duplicate-key errors. So any time your application needs an id before inserting a row into the table, you'll need some form of
SELECT NEXT VALUE FOR hist_id_seq
FROM sysibm.sysdummy1
... and that's it, pretty much. This is completely thread and concurrency safe, will not maintain/require long-term locks, nor require serialized access to the table.

Insert data from staging table into multiple, related tables?

I'm working on an application that imports data from Access to SQL Server 2008. Currently, I'm using a stored procedure to import the data individually by record. I can't go with a bulk insert or anything like that because the data is inserted into two related tables...I have a bunch of fields that go into the Account table (first name, last name, etc.) and three fields that will each have a record in an Insurance table, linked back to the Account table by the auto-incrementing AccountID that's selected with SCOPE_IDENTITY in the stored procedure.
Performance isn't very good due to the number of round trips to the database from the application. For this and some other reasons I'm planning to instead use a staging table and import the data from there. Reading up on my options for approaching this, a cursor that executes the same insert stored procedure on the data in the staging table would make sense. However it appears that cursors are evil incarnate and should be avoided.
Is there any way to insert data into one table, retrieve the auto-generated IDs, then insert data for the same records into another table using the corresponding ID, in a set-based operation? Or is a cursor my only option here?
Look at the OUTPUT clause. You should be able to add it to your INSERT statement to do what you want.
BTW, if you need to output columns into the second table that weren't inserted into the first one, then use MERGE instead of INSERT (as suggested in the comment to the original question), as its OUTPUT clause supports referencing other columns from the source table(s). Otherwise, sticking with INSERT is more straightforward, and it still gives you access to the inserted identity column.
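A hedged sketch of the MERGE variant, with made-up Staging/Account/Insurance column names (the ON 1 = 0 predicate forces every staging row down the not-matched branch, so it is effectively an INSERT whose OUTPUT clause can also see the source columns):
-- Staging, Account, Insurance and their columns are placeholders for the real schema.
DECLARE @new TABLE (AccountID int, Ins1 varchar(50), Ins2 varchar(50), Ins3 varchar(50));
MERGE INTO dbo.Account AS tgt
USING dbo.Staging AS src
    ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (FirstName, LastName)
    VALUES (src.FirstName, src.LastName)
OUTPUT inserted.AccountID, src.Ins1, src.Ins2, src.Ins3
    INTO @new (AccountID, Ins1, Ins2, Ins3);
-- Unpivot the three insurance columns into the child table, one row each.
INSERT INTO dbo.Insurance (AccountID, InsuranceValue)
SELECT n.AccountID, v.InsValue
FROM @new AS n
CROSS APPLY (VALUES (n.Ins1), (n.Ins2), (n.Ins3)) AS v(InsValue)
WHERE v.InsValue IS NOT NULL;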
I experimented with inserting multiple records into related tables using data binding, so you could try that as well.
Hopefully this is helpful. See the link "How to insert record into related tables" for more information.

How to get the key fields for a table in plpgsql function?

I need to make a function that would be triggered after every UPDATE and INSERT operation and would check the key fields of the table the operation is performed on against some conditions.
The function (and the trigger) needs to be a universal one; it shouldn't have the table name or field names hardcoded.
I got stuck on the part where I need to access the table's name and schema and check which fields are part of the PRIMARY KEY.
After getting the primary key info as already posted in the first answer, you can check the code at http://github.com/fgp/pg_record_inspect to get record field values dynamically in PL/pgSQL.
Have a look at How do I get the primary key(s) of a table from Postgres via plpgsql? The answer in that one should be able to help you.
Note that you can't dynamically access the fields of a record like NEW in PL/pgSQL; it's too strongly typed a language for that. You'll have more luck with PL/Perl, in which you can access a hash of the columns and use regular Perl accessors to check them. (PL/Python would also work, but sadly it's an untrusted language only. PL/Tcl works too.)
In 8.4 you can use EXECUTE 'something' USING NEW, which in some cases is able to do the job.
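Building on the linked answer, a hedged sketch of a generic trigger function that looks up the primary-key columns of whichever table fired it (the actual condition checks still have to be filled in):
CREATE OR REPLACE FUNCTION check_key_fields() RETURNS trigger
LANGUAGE plpgsql AS $$
DECLARE
    pk_cols text[];
BEGIN
    -- primary-key column names of the table the trigger fired on
    SELECT array_agg(a.attname::text)
      INTO pk_cols
      FROM pg_index i
      JOIN pg_attribute a ON a.attrelid = i.indrelid
                         AND a.attnum = ANY (i.indkey)
     WHERE i.indrelid = TG_RELID
       AND i.indisprimary;

    RAISE NOTICE 'PK of %.%: %', TG_TABLE_SCHEMA, TG_TABLE_NAME, pk_cols;
    -- the key values themselves can then be pulled out of NEW, e.g. with
    -- EXECUTE 'SELECT ($1).' || quote_ident(pk_cols[1]) INTO some_variable USING NEW;  (8.4+)
    RETURN NEW;
END;
$$;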