On the http://www.bennadel.com/blog/70-SQL-Query-Order-of-Operations.htm site I see that the order of operations is:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
However, I can't seem to find where INSERT, UPDATE, and DELETE fall into the order. Where do INSERT, UPDATE, and DELETE fall in the Order of Operations?
The linked page describes the order of operations for SELECT.
INSERT, UPDATE, and DELETE are not part of SELECT.
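As an illustration, in an INSERT ... SELECT statement (a made-up example below; the table and column names are hypothetical), the SELECT part is still evaluated in the order listed above, and the resulting rows are inserted only after that query has been evaluated:
-- The query below is processed FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY,
-- and only then are the resulting rows inserted into order_totals.
INSERT INTO order_totals (customer_id, total)
SELECT customer_id, SUM(amount)
FROM orders
WHERE status = 'closed'
GROUP BY customer_id
HAVING SUM(amount) > 0
ORDER BY customer_id;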
I have a table in PostgreSQL which I'd like to treat as a queue. I have some selection criteria which I'm using to lock and then delete rows from the table like this:
DELETE FROM queue
WHERE itemid IN (
SELECT itemid
FROM queue
WHERE some_column='some value'
ORDER BY itemid
FOR UPDATE SKIP LOCKED
)
RETURNING *;
How does row locking work in PostgreSQL? When the SELECT query is executed will it lock all matching rows atomically? I'm asking this because grouping is important for me and I want to process all rows where some_column='some value' in the same worker.
Clarification: What I really want to know is whether it can happen that two workers are executing the same query (the one above) for the same parameters (some value) and one of them locks a few rows for update and the other worker picks up the rest. This is what I'd like to avoid. What I expect to happen is that one of the workers will get all the rows (if row locking is atomic) and the other one gets nothing. Is this the case?
If two of your queries are running concurrently, each of them can return and delete some of the rows in the table. In that sense, your query is not atomic.
You should serialize your processes, either outside the database or using PostgreSQL advisory locks.
Since you're working on a queuing table, be sure to check out SKIP LOCKED:
https://www.2ndquadrant.com/en/blog/what-is-select-skip-locked-for-in-postgresql-9-5/
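A minimal sketch of the advisory-lock approach, assuming each worker runs the delete in its own transaction; the lock key 42 is an arbitrary constant that all workers for this queue would share:
BEGIN;
-- Every worker takes the same transaction-level advisory lock first, so the
-- deletes run one at a time; the lock is released automatically at COMMIT.
SELECT pg_advisory_xact_lock(42);
DELETE FROM queue
WHERE itemid IN (
SELECT itemid
FROM queue
WHERE some_column='some value'
ORDER BY itemid
)
RETURNING *;
COMMIT;
Because the workers are fully serialized by the advisory lock, the FOR UPDATE SKIP LOCKED clause is no longer needed inside the subquery.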
Please help with my understanding of how triggers and locks can interact
I bulk load records to a table with statements something like this…..
BEGIN;
INSERT INTO table_a VALUES (record1) , (record2), (record3)………;
INSERT INTO table_a VALUES (record91) , (record92), (record93)………;
…..
….
COMMIT;
There can be several hundred records in a single INSERT, and several dozen INSERT statements between COMMITs.
Table_a has a trigger on it defined as….
AFTER INSERT ON table_a FOR EACH ROW EXECUTE PROCEDURE foo();
The procedure foo() parses each new row as it's added, and will (amongst other stuff) update a record in a summary table_b (uniquely identified by its primary key). So, for every record inserted into table_a, a corresponding record is updated in table_b.
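For reference, a minimal sketch of what such a trigger function might look like, assuming table_b is keyed on the same id as table_a and keeps a per-key counter (the column names here are made up):
CREATE OR REPLACE FUNCTION foo() RETURNS trigger AS $$
BEGIN
-- This UPDATE takes a row lock on the matching table_b row; that lock is
-- held until the surrounding transaction commits or rolls back.
UPDATE table_b
SET insert_count = insert_count + 1
WHERE id = NEW.id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;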
I have a 2nd process that also attempts to (occasionally) update records in table_b. On very rare occasions it may attempt to update the same row in table_b that the bulk process is updating.
Questions – should anything in the bulk insert statements affect my 2nd process being able to update records in table_b? I understand that the bulk insert process will obtain a row lock each time it updates a row in table_b, but when will that row lock be released? – when the individual record (record1, record2, record3 etc etc) has been inserted? Or when the entire INSERT statement has completed? Or when the COMMIT is reached?
Some more info - my overall purpose for this question is to try to understand why my 2nd process occasionally pauses for a minute or more when trying to update a row in table_b that is also being updated by the bulk-load process. What appears to be happening is that the lock on the target record in table_b isn't actually being released until the COMMIT has been reached - which is contrary to what I think ought to be happening. (I think a row-lock should be released as soon as the UPDATE on that row is done)
UPDATE after answer(s) - yes of course you're both right. In my mind I had somehow convinced myself that the individual updates performed within the trigger were somehow separate from the overall BEGIN and COMMIT of the whole transaction. Silly me.
The practice of adding multiple records with one INSERT, and multiple INSERTs between COMMITs, was introduced to improve the bulk load speed (which it does). I had forgotten about the side effect of increasing the time before locks are released.
What should happen when the transaction is rolled back? It is rather obvious that all inserts on table_a, as well as all updates on table_b, should be rolled back. This is why all rows of table_b updated by the trigger will be locked until the transaction completes.
Committing after each insert (reducing the number of rows inserted in a single transaction) will reduce the chance of conflicts with concurrent processes.
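For example, splitting the same load into smaller transactions (a sketch using the question's record placeholders) releases the table_b row locks taken by the trigger at each COMMIT:
BEGIN;
INSERT INTO table_a VALUES (record1), (record2), (record3);
COMMIT;  -- table_b row locks taken by foo() so far are released here
BEGIN;
INSERT INTO table_a VALUES (record4), (record5), (record6);
COMMIT;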
The problem is the following: remove all records from one table and insert them into another.
I have a table that is partitioned by date criteria. To avoid partitioning each record one by one, I'm collecting the data in one table and periodically moving it to another table. Copied records have to be removed from the first table. I'm using a DELETE query with RETURNING, but the side effect is that autovacuum has a lot of work to do cleaning up the dead tuples left in the original table.
I'm trying to achieve the same effect (copy and remove records) without creating additional work for the vacuum mechanism.
Since I'm removing all rows (a DELETE without any WHERE conditions), I was thinking about TRUNCATE, but it does not support a RETURNING clause. Another idea was to somehow configure the table to remove tuples from the page immediately on delete, without waiting for vacuum, but I could not find out whether that is possible.
Can you suggest something I could use to solve my problem?
You need to use something like:
--Open your transaction
BEGIN;
--Prevent concurrent writes, but allow concurrent data access
LOCK TABLE table_a IN SHARE MODE;
--Copy the data from table_a to table_b; you could also use CREATE TABLE AS to do this
INSERT INTO table_b SELECT * FROM table_a;
--Empty table_a
TRUNCATE TABLE table_a;
--Commit and release the lock
COMMIT;
In PostgreSQL: multiple sessions want to get one record from the table, but we need to make sure they don't interfere with each other. I could do it using a message queue: put the data in a queue, and then let each session get data from the queue. But is it doable in PostgreSQL itself, since it would be easier for the SQL guys to call a stored procedure? Is there any way to configure a stored procedure so that no concurrent calls happen, or to use some special lock?
I would recommend making sure the stored procedure uses SELECT FOR UPDATE, which should prevent the same row in the table from being accessed by multiple transactions.
Per the Postgres doc:
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends. The FOR UPDATE lock mode is also acquired by any DELETE on a row, and also by an UPDATE that modifies the values of certain columns. Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future.
More SELECT info.
So that you don't end up locking all of the rows in the table at once (i.e., by SELECTing all of the records), I would recommend using ORDER BY to sort the table in a consistent manner and then LIMIT 1, so that each call only gets the next row in the queue. Also add a WHERE clause that checks a status column (e.g., processed), and once a row has been processed, set that column to a value that prevents the WHERE clause from picking it up again.
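Putting those pieces together, a minimal sketch (the jobs table and processed column are hypothetical; SKIP LOCKED is the same option mentioned in the earlier queue question and is optional here):
-- Claim the next unprocessed row: the subquery locks it, and the UPDATE
-- flags it so later calls skip it.
UPDATE jobs
SET processed = true
WHERE id = (
SELECT id
FROM jobs
WHERE processed = false
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED
)
RETURNING *;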
I am in the unfortunate situation of needing to add triggers to a table to track changes to a legacy system. I have insert, update, and delete triggers on TABLE_A; each of them writes the values of two columns to TABLE_B, along with a bit flag that is set to 1 when the row was written by the delete trigger.
Every entry in TABLE_B shows up twice. An insert creates two rows, an update creates two rows (we believe), and a delete creates an insert and then a delete.
Is the legacy application doing this, or is SQL doing it?
EDIT (adding more detail):
body of triggers:
.. after delete
INSERT INTO TableB(col1, isdelete) SELECT col1, 1 from DELETED
.. after insert
INSERT INTO TableB(col1, isdelete) SELECT col1, 0 from INSERTED
.. after update
INSERT INTO TableB(col1, isdelete) SELECT col1, 0 from DELETED
I have tried profiler, and do not see any duplicate statements being executed.
It may be that the application is changing the data again when it sees the operations on its data.
It's also possible that triggers exist elsewhere - is there any possibility that there is a trigger on TableB that is creating extra rows?
More detail would be needed to address the question more fully.
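If it helps, one way to check for triggers defined on TableB is a query against SQL Server's catalog views (a sketch; adjust the table name as needed):
-- List any triggers whose parent object is TableB
SELECT t.name AS trigger_name, o.name AS table_name, t.is_disabled
FROM sys.triggers AS t
JOIN sys.objects AS o ON t.parent_id = o.object_id
WHERE o.name = 'TableB';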