Question about PostgreSQL UPDATE with subquery in concurrent scenarios

It seems PostgreSQL has an issue with transactions when an UPDATE uses a subquery.
There is one record in the table whose column lock_id is null.
TEST CASE 1
1. Execute update table set lock_id = 1 where id in (select id from table where lock_id is null order by creation_date limit 100) but do not commit.
2. Execute update table set lock_id = 2 where id in (select id from table where lock_id is null order by creation_date limit 100) but do not commit.
3. Commit step 1.
4. Commit step 2.
5. Query the column lock_id; the result is 2.
TEST CASE 2
1. Execute update table set lock_id = 1 where lock_id is null but do not commit.
2. Execute update table set lock_id = 2 where lock_id is null but do not commit.
3. Commit step 1.
4. Commit step 2.
5. Query the column lock_id; the result is 1.
It seems that when the UPDATE's condition contains a subquery, the second update overwrites the first one.
In my case I have a concurrent scenario: two machines each execute an update like test case 1 to lock 100 records and then query those records by lock_id for subsequent processing, but sometimes both machines handle the same record.
How can I avoid this behaviour?
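What happens in test case 1: under the default READ COMMITTED level, the subquery computes its id list from the old snapshot; once the first transaction commits, the blocked second UPDATE re-checks only id in (...), which still matches, so it overwrites the first claim. A common fix (a sketch, not from this thread; it assumes PostgreSQL 9.5+ for SKIP LOCKED and uses jobs in place of the reserved word table) is to lock the candidate rows inside the subquery so a concurrent transaction skips them instead of re-claiming them:
update jobs t
set    lock_id = 1
from (
    select id
    from   jobs
    where  lock_id is null
    order  by creation_date
    limit  100
    for update skip locked  -- rows already claimed by another transaction are skipped
) sub
where t.id = sub.id;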

Related

Postgres - fill in missing data in new table

Given two tables, A and B:
A          B
-----      -----
id         id
high       high
low        low
bId
I want to find rows in table A where bId is null, create an entry in B based off the data in A, and update the row in A to reference the newly created row. I can create the rows but I'm having trouble updating table A with the reference to the new row:
begin transaction;
with rows as (
    insert into B (high, low)
    select high, low
    from A a
    where a.bId is null
    returning id as bId, a.id as aId
)
update A
set bId = (select bId from rows where id = rows.aId)
where id = rows.aId;
--commit;
rollback;
However, this fails with a cryptic error: ERROR: missing FROM-clause entry for table a.
Using a Postgres query, how can I achieve this?
Either
update "A"
set "bId" = (select "bId" from rows where id = rows."aId")
without the where clause, or
update "A"
set "bId" = rows."bId"
from rows
where "A".id = rows."aId";
I don't know if your tables really have those names; as mentioned in the comments, try to avoid uppercase table and field names, and try to avoid reserved keywords.
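Note that the CTE as posted cannot work as written in any case: an INSERT ... RETURNING clause can only reference columns of the inserted row, not a.id. A set-based workaround (a sketch, assuming the (high, low) pairs uniquely identify the rows being copied) is to join the RETURNING output back on the data columns:
with rows as (
    insert into B (high, low)
    select high, low
    from A
    where bId is null
    returning id, high, low  -- only B's columns are visible here
)
update A
set    bId = rows.id
from   rows
where  A.bId is null
and    A.high = rows.high
and    A.low  = rows.low;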
I found a way to get it to work, but I feel like it's not the most efficient.
begin transaction;
do $body$
declare
    newId int4;
    tempB record;
begin
    create temp table TempAB (
        High float8,
        Low float8,
        AId int4
    );
    insert into TempAB (High, Low, AId)
    select high, low, id
    from A
    where bId is null;
    for tempB in (select * from TempAB)
    loop
        insert into B (high, low)
        values (tempB.high, tempB.low)
        returning id into newId;
        update A
        set bId = newId
        where id = tempB.AId;
    end loop;
end $body$;
rollback;
--commit;

Function taking forever to run for large number of records

I have created the following function in Postgres 9.3.5:
CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
  RETURNS text AS
$BODY$
DECLARE
   result text;
BEGIN
   SELECT min(id) INTO result FROM table
   WHERE  id_used IS NULL AND id_type = val2;

   UPDATE table SET
          id_used = 'Y',
          col1 = val1,
          id_used_date = now()
   WHERE  id_type = val2
   AND    id = result;

   RETURN result;
END;
$BODY$
LANGUAGE plpgsql VOLATILE COST 100;
When I run this function in a loop over 1000 or more records, it just freezes and says "query is running". When I check my table, nothing is being updated. When I run it for one or two records, it runs fine.
Example of the function when being run:
select get_result('123','idtype');
table columns:
id character varying(200),
col1 character varying(200),
id_used character varying(1),
id_used_date timestamp without time zone,
id_type character(200)
id is the table index.
Can someone help?
Most probably you are running into race conditions. When you run your function a thousand times in quick succession in separate transactions, something like this happens:
T1                      T2                      T3  ...
SELECT min(id)  -- id 1
                        SELECT min(id)  -- id 1
                                                SELECT min(id)  -- id 1
UPDATE id 1
                        row id 1 locked, wait ...
                                                row id 1 locked, wait ...
COMMIT
                        wake up, UPDATE id 1 again!
                        COMMIT
                                                wake up, UPDATE id 1 again!
                                                COMMIT
Largely rewritten and simplified as an SQL function:
CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
  RETURNS text AS
$func$
   UPDATE table t
   SET    id_used = 'Y'
        , col1 = val1
        , id_used_date = now()
   FROM  (
      SELECT id
      FROM   table
      WHERE  id_used IS NULL
      AND    id_type = val2
      ORDER  BY id
      LIMIT  1
      FOR    UPDATE   -- lock to avoid race condition! see below ...
      ) t1
   WHERE  t.id_type = val2
   -- AND  t.id_used IS NULL  -- repeat condition (not needed if the row is locked)
   AND    t.id = t1.id
   RETURNING id;
$func$ LANGUAGE sql;
Related question with a lot more explanation:
Atomic UPDATE .. SELECT in Postgres
Explanation
Don't run two separate SQL statements. That is more expensive and widens the time frame for race conditions. One UPDATE with a subquery is much better.
You don't need PL/pgSQL for this simple task, but you can still use it; the UPDATE stays the same, as in the sketch below.
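For illustration, a PL/pgSQL variant might look like this (a sketch only; tbl stands in for your real table name, since table is a reserved word):
CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
  RETURNS text AS
$func$
DECLARE
   result text;
BEGIN
   UPDATE tbl t
   SET    id_used = 'Y'
        , col1 = val1
        , id_used_date = now()
   FROM  (
      SELECT id
      FROM   tbl
      WHERE  id_used IS NULL
      AND    id_type = val2
      ORDER  BY id
      LIMIT  1
      FOR    UPDATE          -- same row lock as in the SQL version
      ) t1
   WHERE  t.id_type = val2
   AND    t.id = t1.id
   RETURNING t.id
   INTO   result;            -- RETURNING ... INTO captures the claimed id

   RETURN result;
END
$func$ LANGUAGE plpgsql;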
You need to lock the selected row to defend against race conditions. But you cannot do that with the aggregate function you had, because, per the documentation:
The locking clauses cannot be used in contexts where returned rows cannot be clearly identified with individual table rows; for example they cannot be used with aggregation.
Bold emphasis mine. Luckily, you can easily replace min(id) with the equivalent ORDER BY / LIMIT 1 provided above, which can use an index just as well.
If the table is big, you need an index on id at least. Assuming that id is indexed already as PRIMARY KEY, that would help. But this additional partial multicolumn index would probably help a lot more:
CREATE INDEX foo_idx ON table (id_type, id)
WHERE id_used IS NULL;
Alternative solutions
Advisory locks may be the superior approach here:
Postgres UPDATE ... LIMIT 1
Or you may want to lock many rows at once:
How to mark certain nr of rows in table on concurrent access

Postgres: Update a query counter once a result is returned

I have a situation where a particular row can only be queried a fixed number of times, say 1000. After that, it is made unavailable to that particular party permanently. Each query returns only 1 result, i.e. LIMIT 1.
I intend to implement this with a counter that starts at 1000 and decrements each time the row gets queried.
Is there any way that, upon returning a result, its counter is immediately updated?
This is as opposed to waiting for the result to be received by the application layer and then sending an UPDATE statement to adjust the counter, because between the time the result is returned and the time the UPDATE is received, there can be another query.
You can use a CTE that does the SELECT and the UPDATE in one query:
CREATE TABLE foo(
    id int primary key,
    content text,
    counter int default 0);
INSERT INTO foo(id, content, counter) VALUES(1, 'foo', default);
INSERT INTO foo(id, content, counter) VALUES(2, 'bar', default);
INSERT INTO foo(id, content, counter) VALUES(3, 'baz', default);
-- select the data and update the counter:
WITH step_1 AS (
    SELECT * FROM foo WHERE counter < 5 ORDER BY id LIMIT 1 -- now you can use LIMIT as well
), step_2 AS (
    UPDATE foo SET counter = foo.counter + 1 FROM step_1 WHERE foo.id = step_1.id RETURNING foo.*
)
SELECT * FROM step_2;
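One caveat: under concurrent access, two sessions could both run step_1 before either UPDATE commits and return the same row. If that matters, a row lock in the first CTE (a sketch using PostgreSQL's standard FOR UPDATE) makes the claim exclusive:
WITH step_1 AS (
    SELECT * FROM foo WHERE counter < 5 ORDER BY id LIMIT 1
    FOR UPDATE  -- a second session blocks here until the first commits
), step_2 AS (
    UPDATE foo SET counter = foo.counter + 1 FROM step_1 WHERE foo.id = step_1.id RETURNING foo.*
)
SELECT * FROM step_2;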
Unfortunately you cannot create SELECT triggers in PostgreSQL, but you can achieve this with a transaction:
testdb=# BEGIN;
SELECT something FROM Some_table WHERE <where_criteria> FOR UPDATE;  -- lock the row so no other query slips in between
UPDATE Some_table SET value = value - 1 WHERE <where_criteria>;
COMMIT;
The <where_criteria> should be the same in both statements.

Postgres group by update - slow query

I am using Postgres 9.x.
I have two tables:
Table A
(
    id integer
);
Table B
(
    id integer,
    value integer
);
Both tables are indexed on id.
Table A can have duplicate ids.
Example:
Table A
ID
1
1
1
2
1
I intend to add the number of occurrences of each id in Table A to its row in Table B (Table B has all the ids that are in Table A, with value 0 initially):
Table B
ID  Value
1   4
2   1
3   0
4   0
Here is my SQL statement:
update tableB set value = value + sq.total
from
    ( select id, count(*) as total from tableA group by id ) as sq
where sq.id = tableB.id;
With 3-10 million entries in tableA, it takes an awful amount of time. Is there a way I can optimize this query?
Do you need tableB to be initially populated? An INSERT...SELECT from tableA into an empty tableB (with no indexes on tableB) should be faster:
insert into tableb (id, value)
select id, count(*)
from tablea
group by id
and then add your indexes to tableB once the data is there.
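For example (a sketch; the index name is illustrative, matching the "indexed on id" note in the question):
create index tableb_id_idx on tableb (id);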

In SQL Server 2000, how to delete the specified rows in a table that does not have a primary key?

Let's say we have a table with some data in it.
IF OBJECT_ID('dbo.table1') IS NOT NULL
BEGIN
DROP TABLE dbo.table1;
END
CREATE TABLE table1 ( DATA INT );
---------------------------------------------------------------------
-- Generating testing data
---------------------------------------------------------------------
INSERT INTO dbo.table1(data)
SELECT 100
UNION ALL
SELECT 200
UNION ALL
SELECT NULL
UNION ALL
SELECT 400
UNION ALL
SELECT 400
UNION ALL
SELECT 500
UNION ALL
SELECT NULL;
How to delete the 2nd, 5th, 6th records in the table? The order is defined by the following query.
SELECT data
FROM dbo.table1
ORDER BY data DESC;
Note, this is in SQL Server 2000 environment.
Thanks.
In short, you need something in the table to indicate sequence. The "2nd row" is a non-sequitur when there is nothing that enforces sequence. However, a possible solution might be (toy example => toy solution):
If object_id('tempdb..#NumberedData') Is Not Null
Drop Table #NumberedData
Create Table #NumberedData
(
Id int not null identity(1,1) primary key clustered
, data int null
)
Insert #NumberedData( data )
SELECT 100
UNION ALL SELECT 200
UNION ALL SELECT NULL
UNION ALL SELECT 400
UNION ALL SELECT 400
UNION ALL SELECT 500
UNION ALL SELECT NULL
Begin Tran
Delete table1
Insert table1( data )
Select data
From #NumberedData
Where Id Not In(2,5,6)
If @@Error <> 0
    Rollback Tran
Else
    Commit Tran
Obviously, this type of solution is not guaranteed to work exactly as you want but the concept is the best you will get. In essence, you stuff your rows into a table with an identity column and use that to identify the rows to remove. Removing the rows entails emptying the original table and re-populating with only the rows you want. Without a unique key of some kind, there just is no clean way of handling this problem.
As you are probably aware, you can do this in later versions using ROW_NUMBER() very straightforwardly:
delete t from
    (select ROW_NUMBER() over (order by data) r from table1) t
where r in (2,5,6)
Even without that, it is possible to use the undocumented %%LOCKRES%% function to differentiate between two identical rows:
SELECT data, %%LOCKRES%%
FROM dbo.table1
I don't think that's available in SQL Server 2000 though.
In SQL, sets don't have order, but cursors do, so you could use something like the below. NB: I was expecting to be able to use DELETE ... WHERE CURRENT OF, but that relies on a PK, so the code to delete a row is not as simple as I was hoping for.
In the event that the data to be deleted is a duplicate, there is no guarantee that this will delete the same row as CURRENT OF would have. However, in that eventuality the ordering of the tied rows is arbitrary anyway, so whichever row is deleted could equally well have been given that row number in the cursor ordering.
DECLARE @RowsToDelete TABLE
(
    rowidx INT PRIMARY KEY
)
INSERT INTO @RowsToDelete SELECT 2 UNION SELECT 5 UNION SELECT 6
DECLARE @PrevRowIdx int
DECLARE @CurrentRowIdx int
DECLARE @Offset int
SET @CurrentRowIdx = 1
DECLARE @data int
DECLARE ordered_cursor SCROLL CURSOR FOR
    SELECT data
    FROM dbo.table1
    ORDER BY data
OPEN ordered_cursor
FETCH NEXT FROM ordered_cursor INTO @data
WHILE EXISTS(SELECT * FROM @RowsToDelete)
BEGIN
    SET @PrevRowIdx = @CurrentRowIdx
    SET @CurrentRowIdx = (SELECT TOP 1 rowidx FROM @RowsToDelete ORDER BY rowidx)
    SET @Offset = @CurrentRowIdx - @PrevRowIdx
    DELETE FROM @RowsToDelete WHERE rowidx = @CurrentRowIdx
    FETCH RELATIVE @Offset FROM ordered_cursor INTO @data
    /*Can't use DELETE ... WHERE CURRENT OF as that requires a PK*/
    SET ROWCOUNT 1
    DELETE FROM dbo.table1 WHERE (data = @data OR (data IS NULL AND @data IS NULL))
    SET ROWCOUNT 0
END
CLOSE ordered_cursor
DEALLOCATE ordered_cursor
To perform any action on a set of rows (such as deleting them), you need to know what identifies those rows.
So you have to come up with criteria that identify the rows you want to delete.
Providing a toy example, like the one above, is not particularly useful.
You plan ahead: if you anticipate that this will be needed, you add a surrogate key column or some such.
In general, you make sure you don't create tables without PKs.
It's like asking, "Say I don't look both directions before crossing the road and I step in front of a bus..."
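For instance, the plan-ahead fix might look like this (a sketch; the column name is illustrative, and note that identity values for pre-existing rows are assigned in no guaranteed order):
ALTER TABLE dbo.table1 ADD id INT IDENTITY(1,1) NOT NULL
-- Rows are now individually addressable; map the desired ordering to ids first, then:
DELETE FROM dbo.table1 WHERE id IN (2, 5, 6)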