Within a transaction: use the inserted id as a reference for the next insertion - postgresql

I want to create a wallet for a user when the user itself is created. Ideally I want this to happen within a transaction, as one may not exist without the other.
I want something like this, in the Ecto paradigm:
BEGIN;
INSERT INTO albums [...];
INSERT INTO album_images VALUES (lastval(), image_id) [...];
COMMIT;
Taken from https://github.com/elixir-ecto/ecto/issues/2154.
How would I achieve this?

Consider using Ecto.Multi: build the operations up in a Multi and at the end pass the whole thing to Repo.transaction().
Ecto.Multi helps you organize this flow because the function given to Multi.run receives a map with the results of the previous steps, so you can use those results safely: if an earlier operation fails, the later ones are never run and the whole transaction is rolled back.
The cleanest way to write it is to put the business operations in separate functions, where the second one accepts the map of results keyed by the name of the previous operation:
Multi.new
|> Multi.insert(:albums, insert_albums(arguments))
|> Multi.run(:album_images, AlbumImage, :insert, [])
|> Repo.transaction()
where AlbumImage.insert might look like:
defmodule AlbumImage do
  def insert(%{albums: albums}) do
    # code
  end
end
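Applied to the original user/wallet case, a minimal sketch might look like this (User and Wallet are hypothetical schemas with the usual changeset functions; note that in Ecto 3 the run callback also receives the repo as its first argument):

Multi.new
|> Multi.insert(:user, User.changeset(%User{}, params))
|> Multi.run(:wallet, fn %{user: user} ->
  # the user inserted in the previous step, including its id, is available here
  Repo.insert(Wallet.changeset(%Wallet{}, %{user_id: user.id}))
end)
|> Repo.transaction()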

Related

JDBC batch for multiple prepared statements

Is it possible to batch together commits from multiple JDBC prepared statements?
In my app the user will insert one or more records along with records in related tables. For example, we'll need to update a record in the "contacts" table, delete related records in the "tags" table, and then insert a fresh set of tags.
UPDATE contacts SET name=? WHERE contact_id=?;
DELETE FROM tags WHERE contact_id=?;
INSERT INTO tags (contact_id,tag) values (?,?);
// insert more tags as needed here...
These statements need to be part of a single transaction, and I want to do them in a single round trip to the server.
To send them in a single round trip, there are two choices: create one Statement and call .addBatch(sql) with each command's full SQL string, or create a PreparedStatement for each command, set the parameter values with .setString(), .setInt(), etc., and then call .addBatch().
The problem with the first choice is that sending full SQL strings in the .addBatch() calls is inefficient, and you don't get the benefit of sanitized parameter inputs.
The problem with the second choice is that it may not preserve the order of the SQL statements. For example,
Connection con = ...;
PreparedStatement updateState = con.prepareStatement("UPDATE contacts SET name=? WHERE contact_id=?;");
PreparedStatement deleteState = con.prepareStatement("DELETE FROM tags WHERE contact_id=?;");
PreparedStatement insertState = con.prepareStatement("INSERT INTO tags (contact_id,tag) values (?,?);");
updateState.setString(1, "Bob");
updateState.setInt(2, 123);
updateState.addBatch();
deleteState.setInt(1, 123);
deleteState.addBatch();
... etc ...
... now add more parameters to updateState, and addBatch()...
... repeat ...
con.commit();
In the code above, are there any guarantees that all of the statements will execute in the order we called .addBatch(), even across different prepared statements? Ordering is obviously important; we need to delete tags before we insert new ones.
I haven't seen any documentation that says that ordering of statements will be preserved for a given connection.
I'm using Postgres and the default Postgres JDBC driver, if that matters.
The batch is per statement object: a batch is executed per executeBatch() call on a Statement or PreparedStatement object, and that call only executes the statements (or value sets) associated with that statement object's batch. It is not possible to order execution across multiple statement objects. Within an individual batch, the order is preserved.
If you need statements executed in a specific order, then you need to explicitly execute them in that order. This means either individual calls to execute() per value set, or using a single Statement object and generating the statements on the fly. Because of the potential for SQL injection, this last approach is not recommended.
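For illustration, here is a minimal sketch of the explicit-order approach (dataSource and the literal values are placeholders; the tables are the ones from the question):

// Sketch: execute each step in order on one connection, in one transaction.
try (Connection con = dataSource.getConnection()) {
    con.setAutoCommit(false);
    try (PreparedStatement update = con.prepareStatement(
             "UPDATE contacts SET name=? WHERE contact_id=?");
         PreparedStatement delete = con.prepareStatement(
             "DELETE FROM tags WHERE contact_id=?");
         PreparedStatement insert = con.prepareStatement(
             "INSERT INTO tags (contact_id,tag) VALUES (?,?)")) {

        update.setString(1, "Bob");
        update.setInt(2, 123);
        update.executeUpdate();            // 1: update the contact

        delete.setInt(1, 123);
        delete.executeUpdate();            // 2: delete the old tags

        insert.setInt(1, 123);             // 3: batch only the inserts;
        insert.setString(2, "friend");     //    order within one batch
        insert.addBatch();                 //    is preserved
        insert.setInt(1, 123);
        insert.setString(2, "coworker");
        insert.addBatch();
        insert.executeBatch();

        con.commit();
    } catch (SQLException e) {
        con.rollback();
        throw e;
    }
}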

Postgres 'if not exists' fails because the sequence exists

I have several counters in an application I am building, and I am trying to get them to be created dynamically by the application as required.
For a simplistic example, if someone types a word into a script, it should return the number of times that word has been entered previously. Here is an example of the SQL that might be executed if they typed the word example.
CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
SELECT nextval('example')
This would return 1 the first time it ran, 2 the second time, etc.
The problem is when 2 people click the button at the same time.
First, please note that a lot more is happening in my application than just these statements, so the chances of them overlapping are much more significant than they would be if this were all that was happening.
1> BEGIN;
2> BEGIN;
1> CREATE SEQUENCE IF NOT EXISTS example START WITH 1;
2> CREATE SEQUENCE IF NOT EXISTS example START WITH 1; -- is blocked by previous statement
1> SELECT nextval('example') -- returns 1 to user.
1> COMMIT; -- unblocks second connection
2> ERROR: duplicate key value violates unique constraint
"pg_type_typname_nsp_index"
DETAIL: Key (typname, typnamespace)=(example, 109649) already exists.
I was under the impression that with "IF NOT EXISTS" the statement would just be a no-op if the sequence already exists, but it seems to have this race condition where that is not the case. I say race condition because if these two transactions are not executed at the same time, it works as one would expect.
I have noticed that IF NOT EXISTS is fairly new to postgres, so maybe they haven't worked out all of the kinks yet?
EDIT:
The main reason we were considering doing things this way was to avoid excess locking. The thought being that if two people were to increment at the same time, using a sequence would mean that neither user should have to wait for the other (except, as in this example, for the initial creation of that sequence)
Sequences are part of the database schema. If you find yourself modifying the schema dynamically based on the data stored in the database, you are probably doing something wrong. This is especially true for sequences, which have special properties, e.g. regarding their behavior with respect to transactions. Specifically, if you increment a sequence (with nextval) in the middle of a transaction and then roll back that transaction, the value of the sequence will not be rolled back. Most likely, this is behavior you don't want for your data.

In your example, imagine that a user tries to add a word. This results in the corresponding sequence being incremented. Now imagine that the transaction does not complete for some reason (e.g. the computer crashes) and it gets rolled back. You would end up with the word not being added to the database, but with the sequence incremented anyway.
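A quick illustration of that behavior (assuming the sequence example from the question already exists):

BEGIN;
SELECT nextval('example');  -- returns, say, 1
ROLLBACK;
SELECT nextval('example');  -- returns 2: the increment survived the rollback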
For the particular example that you mentioned, there is an easy solution: create an ordinary table to store all the "sequences". Something like this would do it:
CREATE TABLE word_frequency (
    word      text    NOT NULL UNIQUE,
    frequency integer NOT NULL
);
Now I understand that this is just an example, but if this approach doesn't work for your actual use case, let us know and we can adjust it to your needs.
Edit: Here's how the above solution works. When a new word is added, run the following query (the "UPSERT" syntax requires Postgres 9.5+):
INSERT INTO word_frequency(word,frequency)
VALUES ('foo',1)
ON CONFLICT (word)
DO UPDATE
SET frequency = word_frequency.frequency + excluded.frequency
RETURNING frequency;
This query will insert a new word in word_frequency with frequency 1, or, if the word already exists, it will increment the existing frequency by 1. Now what happens if two transactions try to do that at the same time? Consider the following scenario:
client 1                 client 2
--------                 --------
BEGIN
                         BEGIN
UPSERT ('foo',1)
                         UPSERT ('foo',1)   <====
COMMIT
                         COMMIT
What will happen is that as soon as client 2 tries to increment the frequency for foo (marked with the arrow above), that operation will block, because the row was modified by a different transaction. When client 1 commits, client 2 will be unblocked and continue without any errors. This is exactly how we wanted it to work. Also note that PostgreSQL uses row-level locking to implement this behavior, so other insertions will not be blocked.
EDIT: The main reason we were considering doing things this way was to avoid excess locking. The thought being that if two people were to increment at the same time, using a sequence would mean that neither user should have to wait for the other (except, as in this example, for the initial creation of that sequence)
It sounds like you're optimizing for a problem that likely does not exist. Sure, if you have 100,000 simultaneous users that are only inserting rows (since that is normally the only time a sequence is used), there is the possibility of some contention on the sequence, but realistically there will be other bottlenecks long before the sequence gets in the way.
I'd advise you to first prove that the sequence is an issue. With a proper database design (which dynamic DDL is not), the sequence will not be the bottleneck.
As a reference, DDL is not transaction safe in most databases.

How to find out if a sequence was initialized in this session?

I need to read the current value of a sequence in a function. However, the first time in each session that I try to use currval(), I get the following error:
currval of sequence "foo_seq" is not yet defined in this session
Hint for those who might find this question by googling for this error: you need to initialize the sequence for each session, either by nextval() or setval().
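For example:

SELECT nextval('foo_seq');  -- advances the sequence and defines currval() for this session
SELECT currval('foo_seq');  -- now works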
I could use something like lastval() or even setval('your_table_id_seq', (SELECT MAX(id) FROM your_table)); instead, but this seems either prone to gaps or slower than a simple currval(). My aim is to avoid gaps and inconsistencies (I know some of the values will be added manually), so using nextval() before the logic that handles them is not ideal for my purpose. I would need it to initialize the sequence for the session anyway, but I would prefer to do something like this:
--start of the function here
IF is_not_initialized THEN
    SELECT setval('foo_seq', (SELECT MAX(id) FROM bar_table)) INTO _current;
ELSE
    SELECT currval('foo_seq') INTO _current;
END IF;
--some magic with the _current variable and nextval() at the right position
The point is that I have no idea what "is_not_initialized" might look like, or whether it is possible at all. Is there any function or other trick to do it?
EDIT: Actually, my plan is to let each group of customers choose between a proper sequence, no sequence at all, and the strange "something like a sequence" I'm asking about now. Even if a customer wanted such a strange sequence, it would only be used for the columns where it is needed - usually because there is some analog data and we need to store its keys (usually an almost gapless sequence) in the DB for backward compatibility.
Anyway, you are right that this is hardly a proper solution and that no sequence at all might be better than such a messy workaround in those situations, so I'll think (and discuss with the customers) again about whether it is really needed.
Craig, a_horse and pozs have provided information which can help you understand the principles of using sequences. Apart from the question of how you are going to use it, here is a function which returns the current value of a sequence if it has been initialized in this session, or null otherwise.
If a sequence seq has not yet been initialized, currval(seq) raises an exception with SQLSTATE 55000.
create or replace function current_seq_value(seq regclass)
returns integer language plpgsql
as $$
begin
    begin
        return (select currval(seq));
    exception
        when sqlstate '55000' then return null;
    end;
end $$;

select current_seq_value('my_table_id_seq');
My aim is to avoid gaps and inconsistencies
You cannot use sequences if you want to avoid gaps. Nor can you reasonably use sequences if you want to assign some values manually.
The approach you are taking is unsound. It will not work. Forget about it, it isn't going to do what you think it's going to do.
I just wrote a sample implementation of a trivial gapless sequence generator for someone a few days ago, and there's a more complete one in this question.
You need to understand that unlike true sequences, gapless sequences are transactional. A consequence is that only one running transaction can have an uncommitted ID. If 100 concurrent transactions try to get IDs, only one of them will actually get an ID. The others will have to wait until that one commits or rolls back. So they're terrible for concurrency, especially if combined with long-running transactions. They can also cause deadlocks if you use multiple different gapless sequences and different transactions access them in different orders.
So think carefully whether you really need this.
Read: PostgreSQL gapless sequences
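For illustration, a minimal counter-table version of such a gapless generator might look like this (the table and row names are made up):

CREATE TABLE gapless_seq (
    name  text   PRIMARY KEY,
    value bigint NOT NULL
);
INSERT INTO gapless_seq VALUES ('invoice', 0);

-- Claim the next value inside your transaction. Concurrent callers block
-- on the row lock until this transaction commits or rolls back, which is
-- exactly the serialization described above.
UPDATE gapless_seq
   SET value = value + 1
 WHERE name = 'invoice'
RETURNING value;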

ExecuteSprocAccessor does not function for CUD operations?

I have several stored procedures in my database. For example, a delete stored procedure like:
alter procedure [dbo].[DeleteFactor]
    @Id uniqueidentifier
as
begin
    delete from Factors where Id = @Id
end
When I call this from code like this:
dc.ExecuteSprocAccessor("DeleteFactor", id);
then the row does not get deleted. However, this code works:
dc.ExecuteNonQuery("DeleteFactor", id);
id is a passed-in parameter of type Guid.
Can anyone explain why the second approach works and the first does not? I find it quite strange, as the first method is clearly meant to be used with stored procedures.
According to Retrieving Data as Objects, the ExecuteSprocAccessor method uses deferred execution (like LINQ). So, in the first approach, since you never access the results of the DeleteFactor stored procedure, the SQL call is never made.
I would use the second method anyway, since you really are executing a non-query. Also, the first approach may lead to some confusion, since ExecuteSprocAccessor is designed to retrieve data - e.g. "Is data supposed to be returned here? Maybe something was missed?"
Just call ToArray or ToList on the result of your ExecuteSprocAccessor call to make it execute.
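That is, mirroring the question's own call (this requires using System.Linq; ExecuteNonQuery is still the better fit for a delete):

// Enumerating the result forces the deferred accessor to actually run the proc.
dc.ExecuteSprocAccessor("DeleteFactor", id).ToList();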

Execute statements for every record in a table

I have a temporary table (or, say, a function which returns a table of values).
I want to execute some statements for each record in the table.
Can this be done without using cursors?
I'm not opposed to cursors, but would like a more elegant syntax/way of doing it.
Something like this randomly made-up syntax:
for (select A,B from #temp) exec DoSomething A,B
I'm using SQL Server 2005.
I don't think what you want to do is that easy.
What I have found is that you can create a scalar function taking the arguments A and B and then, from within the function, execute an extended stored procedure. This might achieve what you want, but it seems it would make the code even more complex.
I think for readability and maintainability, you should stick to the CURSOR implementation.
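For reference, a straightforward cursor version might look like this (the column types are assumed to be int):

DECLARE @A int, @B int;

DECLARE row_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT A, B FROM #temp;

OPEN row_cursor;
FETCH NEXT FROM row_cursor INTO @A, @B;

WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC DoSomething @A, @B;
    FETCH NEXT FROM row_cursor INTO @A, @B;
END

CLOSE row_cursor;
DEALLOCATE row_cursor;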
I would look into changing the stored proc so that it can work against a set of data rather than a single row input.
Would CROSS/OUTER APPLY do what you want, if you need RBAR processing?
It's elegant, but it depends on what processing you need to do.
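As a sketch of that idea: if DoSomething can be rewritten as a table-valued function (here a hypothetical dbo.DoSomethingFn), it runs once per row without a cursor:

SELECT t.A, t.B, r.Result
FROM #temp AS t
CROSS APPLY dbo.DoSomethingFn(t.A, t.B) AS r;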