When using pg-promise (based on node-postgres), a multi-query seems to be atomic.
For example, the following PostgreSQL query does not insert any rows at all, even though only the second INSERT fails due to a duplicate id. No explicit transactions are used.
insert into mytable (id) values (1); insert into mytable (id) values (1)
This behavior seems counter-intuitive and differs from that of psql. Is this a bug?
My tests indicate that yes, surprisingly, it is atomic, i.e. if one query fails, they all fail, same as inside a transaction.
I will investigate why that is and post an update if I find anything. See the open issue.
UPDATE
The investigation has confirmed that it is indeed how PostgreSQL works when multiple queries are sent in a single string.
The documentation for the methods multi and multiResult has been amended accordingly:
The operation is atomic, i.e. all queries are executed in a single transaction, unless there are explicit BEGIN/COMMIT commands included in the query string to divide it into multiple transactions.
Is it possible to perform a query that will SELECT for some values and if those values do not exist, perform an INSERT and return the very same values - in a single query?
Background:
I am writing an application with a great deal of concurrency. At one point, a function uses SELECT to check whether a certain key value exists in the database. If the key exists, the function can safely exit. If it does not exist, the function performs a REST API call to fetch the necessary data, then INSERTs the values into the database.
This works fine until it is run concurrently. Two threads (I am using Go, so goroutines) will each independently run the SELECT. Since both queries report that the key does not exist, both will independently perform the REST API call and both will attempt to INSERT the values.
Currently, I avoid double insertion with a unique constraint. However, I would like to avoid even the double API call by having the first query SELECT the key value and, if it does not exist, INSERT a placeholder, then return those values. This way, subsequent SELECT queries report that the key value already exists, and their callers will not perform the API call or the INSERT.
In pseudo-code, something like this:
SELECT values FROM my_table WHERE key = KEY_HERE;
if found:
    RETURN SELECTED VALUES;
if not found:
    INSERT INTO my_table (values, key) VALUES (random_placeholder, KEY_HERE);
    SELECT values FROM my_table WHERE key = KEY_HERE;
The application code will insert a random value so that a goroutine/thread can determine whether it was the one that performed the new INSERT and should therefore go ahead with the REST API call.
This is for a Go application using the pgx library.
Thanks!
You could write a stored procedure, and it would be a single query for the client to execute. PostgreSQL, of course, would still execute multiple statements. A PostgreSQL INSERT statement can return values with the RETURNING clause, so you may not need the second SELECT.
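A minimal sketch of that suggestion (not the answerer's own code), assuming PostgreSQL 9.5 or later for ON CONFLICT. The table my_table, its columns key and val, and the function get_or_claim are hypothetical names. The function tries to claim the key by inserting the caller's placeholder with ON CONFLICT DO NOTHING ... RETURNING; if another caller already holds the key, it falls back to a SELECT and reports that this call did not claim it.

```sql
-- Hypothetical table: "key" is the lookup key, "val" holds either a
-- placeholder or the data later fetched from the REST API.
CREATE TABLE IF NOT EXISTS my_table (
    key text PRIMARY KEY,
    val text NOT NULL
);

-- Returns the stored value and whether *this* call inserted it.
CREATE OR REPLACE FUNCTION get_or_claim(p_key text, p_placeholder text,
                                        OUT out_val text, OUT out_claimed boolean)
LANGUAGE plpgsql AS $$
BEGIN
    LOOP
        -- Try to claim the key; the primary key lets only one caller win.
        INSERT INTO my_table (key, val)
        VALUES (p_key, p_placeholder)
        ON CONFLICT (key) DO NOTHING
        RETURNING val INTO out_val;

        IF FOUND THEN
            out_claimed := true;   -- this caller inserted the placeholder
            RETURN;
        END IF;

        -- Another caller already holds the key: return what it stored.
        SELECT val INTO out_val FROM my_table WHERE key = p_key;
        IF FOUND THEN
            out_claimed := false;
            RETURN;
        END IF;
        -- The row was deleted between the two statements; try again.
    END LOOP;
END;
$$;
```

From pgx, each goroutine would run `SELECT out_val, out_claimed FROM get_or_claim($1, $2)` with the key and its own random placeholder; only the caller that gets out_claimed = true makes the REST API call and then UPDATEs the row with the real data. If that API call fails, the placeholder row has to be deleted or retried, otherwise later callers keep seeing the placeholder. The explicit out_claimed flag is a design choice here; comparing out_val with one's own random placeholder, as the question proposes, works as well.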
Lock the table in an appropriate lock mode.
For example in the strictest possible mode ACCESS EXCLUSIVE:
BEGIN TRANSACTION;
LOCK elbat
IN ACCESS EXCLUSIVE MODE;
SELECT *
FROM elbat
WHERE id = 1;
-- if there wasn't any row returned make the API call and
INSERT INTO elbat
(id,
<other columns>)
VALUES (1,
<API call return values>);
COMMIT;
-- return the values to the application
Once one transaction has acquired the ACCESS EXCLUSIVE lock, no other transaction can even read from the table until the acquiring transaction ends. And ACCESS EXCLUSIVE won't be granted while any other transaction holds a lock on the table, even a weaker one. That way, the instance of your component that gets the lock first will do the check and the INSERT if necessary. The other one is blocked in the meantime, and by the time it finally gets access, the INSERT has already been committed by the first transaction, so it need not make the API call anymore (unless the first one failed for some reason and rolled back).
If this is too strict for your use case, you may need to find out which lock mode is appropriate for you. If you can make every component that accesses the database (or at least the table) cooperative, and it sounds like you can, even advisory locks may be enough.
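A minimal sketch of the advisory-lock variant, assuming every client that writes to elbat follows the same convention: PostgreSQL does not enforce advisory locks, so a client that skips pg_advisory_xact_lock is not blocked. The lock key is simply the row id here; the data column and the literal value standing in for the API result are hypothetical. If other code in the same database also uses advisory locks, the two-argument form pg_advisory_xact_lock(int, int) can be used to namespace the key.

```sql
-- Hypothetical table shape for the example above.
CREATE TABLE IF NOT EXISTS elbat (
    id   integer PRIMARY KEY,
    data text
);

BEGIN;
-- Only transactions asking for the same key wait on each other;
-- work on other ids proceeds in parallel.
SELECT pg_advisory_xact_lock(1);

SELECT * FROM elbat WHERE id = 1;
-- if no row was returned, make the API call, then:
INSERT INTO elbat (id, data)
SELECT 1, 'value from the API'
WHERE NOT EXISTS (SELECT 1 FROM elbat WHERE id = 1);  -- keeps the script re-runnable

COMMIT;  -- the transaction-level advisory lock is released here
```

Unlike LOCK ... IN ACCESS EXCLUSIVE MODE, this does not stop readers or writers working on other ids.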
I have a script that selects rows from InfluxDB and bulk inserts them into TimescaleDB.
I insert the data in batches of 2000 rows to make it faster.
The problem is that when one row causes an error, all 2000 rows are rejected. Is it possible to insert the other 1999 rows and ignore the failing one?
Since PostgreSQL implements ACID transactions, the entire transaction is rolled back on an error. The smallest unit of a transaction is one statement, e.g. an INSERT INTO statement with a batch of values, and this is the default. So if a failure happens, it is not possible to ignore it and commit the rest.
I assume you use an INSERT INTO statement. It provides an ON CONFLICT clause, which can be used if the observed error is caused by a conflict, i.e. a unique or exclusion constraint violation.
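A minimal sketch of the ON CONFLICT approach, using a hypothetical readings table as a plain-table stand-in for the hypertable (on a TimescaleDB hypertable, a unique index must include the time column). The batch contains one row that duplicates an earlier one; with DO NOTHING that row is skipped and the others are stored.

```sql
-- Hypothetical stand-in for the hypertable; the unique constraint is
-- what ON CONFLICT checks against.
CREATE TABLE IF NOT EXISTS readings (
    time      timestamptz      NOT NULL,
    device_id integer          NOT NULL,
    value     double precision,
    UNIQUE (time, device_id)
);

-- One batch with a duplicate (time, device_id): without ON CONFLICT the
-- whole INSERT would fail; with DO NOTHING only that row is skipped.
INSERT INTO readings (time, device_id, value) VALUES
    ('2020-01-01 00:00:00+00', 1, 10.0),
    ('2020-01-01 00:01:00+00', 1, 11.0),
    ('2020-01-01 00:01:00+00', 1, 11.5),  -- duplicate key: skipped
    ('2020-01-01 00:02:00+00', 1, 12.0)
ON CONFLICT (time, device_id) DO NOTHING;
```

ON CONFLICT only covers unique and exclusion constraint violations; a row that fails for another reason (a NOT NULL violation, a value that cannot be cast to the column type) still aborts the whole batch.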
Another workaround is to load the data into a temporary (staging) table first and then insert it into the hypertable after cleaning.
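A sketch of the staging-table workaround. The table names and the cleaning rule (drop rows containing NULLs) are hypothetical; the real cleaning step depends on which errors the batches actually hit. The staging table deliberately has no constraints, so every incoming row can be loaded, and the target is a plain-table stand-in for the hypertable.

```sql
-- Hypothetical target table (stand-in for the hypertable).
CREATE TABLE IF NOT EXISTS readings_target (
    time      timestamptz      NOT NULL,
    device_id integer          NOT NULL,
    value     double precision NOT NULL,
    UNIQUE (time, device_id)
);

-- Staging table without constraints: nothing is rejected at load time.
CREATE TEMP TABLE IF NOT EXISTS readings_staging (
    time      timestamptz,
    device_id integer,
    value     double precision
);

INSERT INTO readings_staging (time, device_id, value) VALUES
    ('2020-01-01 00:00:00+00', 7, 1.0),
    ('2020-01-01 00:01:00+00', 7, NULL),  -- would violate NOT NULL in the target
    ('2020-01-01 00:02:00+00', 7, 3.0);

-- Clean: remove rows the target table would reject.
DELETE FROM readings_staging
WHERE time IS NULL OR device_id IS NULL OR value IS NULL;

-- Move the remaining rows; ON CONFLICT also skips rows already in the target.
INSERT INTO readings_target (time, device_id, value)
SELECT time, device_id, value FROM readings_staging
ON CONFLICT (time, device_id) DO NOTHING;

TRUNCATE readings_staging;
```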
BTW, have you looked at Timescale's Outflux tool to see whether it can help?
I assume this question has been asked before, but unfortunately I cannot find the answer to my question.
I have a table, and I am using an UPDATE statement to update a column. At the same time, I am running a CREATE TABLE query with a SELECT statement that retrieves data from the same table and column that is being updated.
My questions are: Can this lead to wrong results in the output of the CREATE TABLE statement? Does the UPDATE query finish first and then the CREATE TABLE with the SELECT execute? All I know is that the CREATE TABLE statement takes much longer to execute.
In PostgreSQL, readers never block writers, and writers never block readers. This is guaranteed by PostgreSQL's MVCC implementation, which keeps old row versions around.
If the updating transaction isn't finished yet, the reading transaction will see the old value, and the result is consistent.
There is nothing inside PostgreSQL that should slow down the SELECT statement noticeably, but of course I/O contention is a possible explanation.
I have a PostgreSQL (>9.5) table with a primary key id and a unique key col. When I use
INSERT INTO table_a (col) VALUES ('xxx') ON CONFLICT (col) DO NOTHING;
to perform an upsert, let's say a row with id 1 is generated.
If I run the SQL again, nothing happens, but id 2 is actually generated and abandoned.
Then if I insert a new record, for example,
INSERT INTO table_a (col) VALUES ('yyy') ON CONFLICT (col) DO NOTHING;
Another row with id 3 will be generated and id 2 is wasted!
Is there any way to avoid this waste?
Presumably id is a serial. Under the hood, this causes a nextval() call on a sequence. A number, once returned by nextval(), will never be returned again. And the call of nextval() happens before checking for possible conflicts.
From "9.16. Sequence Manipulation Functions":
nextval
(...)
Important: To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used and will not be returned again. This is true even if the surrounding transaction later aborts, or if the calling query ends up not using the value. For example an INSERT with an ON CONFLICT clause will compute the to-be-inserted tuple, including doing any required nextval calls, before detecting any conflict that would cause it to follow the ON CONFLICT rule instead. Such cases will leave unused "holes" in the sequence of assigned values. Thus, PostgreSQL sequence objects cannot be used to obtain "gapless" sequences.
In conclusion, the answer to your question is no: there is no way to avoid this unless you generate the values yourself somehow.
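A sketch that reproduces the behaviour from the question, assuming the table definition it implies (id serial, col text UNIQUE). The id comments hold for a freshly created table. The last two statements show a partial mitigation that is not from the answer: guarding the INSERT with WHERE NOT EXISTS means nextval() is only evaluated for rows the SELECT actually produces, so re-inserting an existing value no longer consumes an id. Concurrent inserts of the same new value can still both pass the check, so this narrows the gaps but does not make the sequence gapless.

```sql
CREATE TABLE IF NOT EXISTS table_a (
    id  serial PRIMARY KEY,
    col text UNIQUE
);

INSERT INTO table_a (col) VALUES ('xxx') ON CONFLICT (col) DO NOTHING;  -- id 1
INSERT INTO table_a (col) VALUES ('xxx') ON CONFLICT (col) DO NOTHING;  -- consumes 2, inserts nothing
INSERT INTO table_a (col) VALUES ('yyy') ON CONFLICT (col) DO NOTHING;  -- id 3

-- Partial mitigation: no row is produced for an existing value, so no nextval() call.
INSERT INTO table_a (col)
SELECT 'xxx'
WHERE NOT EXISTS (SELECT 1 FROM table_a WHERE col = 'xxx')
ON CONFLICT (col) DO NOTHING;                                           -- nothing consumed
INSERT INTO table_a (col)
SELECT 'zzz'
WHERE NOT EXISTS (SELECT 1 FROM table_a WHERE col = 'zzz')
ON CONFLICT (col) DO NOTHING;                                           -- id 4

SELECT id, col FROM table_a ORDER BY id;
```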
I am creating a table in PostgreSQL 9.5 where id is the primary key. While inserting rows into the table, if anyone tries to insert a duplicate id, I want it to be ignored instead of raising an exception. Is there any way to set this at table creation so that duplicate entries are ignored?
There are many techniques to resolve the duplicate insertion issue when writing the insert query, e.g. using ON CONFLICT DO NOTHING or a WHERE EXISTS clause. But I want to handle this at table creation so that the person writing insert queries doesn't need to bother.
Creating a RULE is one possible solution. Are there other possible solutions? Maybe something like this:
`CREATE TABLE dbo.foo (bar int PRIMARY KEY WITH (FILLFACTOR=90, IGNORE_DUP_KEY = ON))`
However, this exact statement doesn't work on PostgreSQL 9.5 on my machine.
Add a BEFORE INSERT trigger or an ON INSERT DO INSTEAD rule; otherwise it has to be handled by the inserting query. Both solutions will require more resources on each insert.
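A minimal sketch of the trigger variant, assuming the table from the question without the SQL Server-specific dbo schema and WITH options; the function and trigger names are hypothetical. A BEFORE ROW trigger that returns NULL makes PostgreSQL skip that row silently, so a duplicate INSERT reports `INSERT 0 0` instead of raising an error. EXECUTE PROCEDURE is the spelling accepted by 9.5.

```sql
CREATE TABLE IF NOT EXISTS foo (
    bar integer PRIMARY KEY
);

CREATE OR REPLACE FUNCTION foo_skip_duplicate() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM foo WHERE bar = NEW.bar) THEN
        RETURN NULL;  -- BEFORE ROW trigger returning NULL: this row is skipped
    END IF;
    RETURN NEW;       -- no duplicate: insert the row unchanged
END;
$$;

DROP TRIGGER IF EXISTS foo_skip_duplicate ON foo;
CREATE TRIGGER foo_skip_duplicate
    BEFORE INSERT ON foo
    FOR EACH ROW EXECUTE PROCEDURE foo_skip_duplicate();
```

The EXISTS check inside the trigger is not atomic: two concurrent transactions inserting the same id can both pass it, and one of them will still get a unique violation from the primary key.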
An alternative is to use a function that takes the values to insert as arguments and checks for duplicates, so end users call the function instead of writing an INSERT statement.
A WHERE EXISTS subquery is not atomic, by the way, so you can still get an exception after the check.
In 9.5, ON CONFLICT DO NOTHING is still the best solution.