I'm seeing a strange issue since upgrading from Postgres 9.4 to 10. This parameterised update query used to work reliably, but now it regularly (though not always) fails to enforce the LIMIT clause:
UPDATE tablename SET someValue = ? WHERE myKey IN (
    SELECT myKey
    FROM tablename
    WHERE status = 'good'
    ORDER BY timestamp ASC
    LIMIT ?
    FOR UPDATE
)
Try this:
SELECT ... FOR UPDATE SKIP LOCKED
FOR UPDATE SKIP LOCKED is a feature added in PostgreSQL 9.5.
With it, every transaction sees only the records that are not already locked FOR UPDATE by other transactions, so no race occurs.
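Applied to the query above, a minimal sketch of the fix (PostgreSQL 9.5+):

UPDATE tablename SET someValue = ? WHERE myKey IN (
    SELECT myKey
    FROM tablename
    WHERE status = 'good'
    ORDER BY timestamp ASC
    LIMIT ?
    FOR UPDATE SKIP LOCKED  -- concurrent workers skip rows already locked
)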
GORM doesn't seem to support using Limit with Update, and doesn't throw an error either.
resUpdate := tx.Model(&daos.Voucher{}).
Where("status = ?", models.VoucherStatusAvailable).
Limit(quantity).
Scan(&vouchers).
Update("status", models.VoucherStatusBooked)
This query updates every matching row in my DB from VoucherStatusAvailable to VoucherStatusBooked, without regard for the Limit.
Someone quickly addressed the subject here: https://penkovski.com/post/gorm-update-returning/
But their solution of putting the Limit in the Where clause:
resUpdate := tx.Model(&daos.Voucher{}).
Where("status = ? LIMIT ?", models.VoucherStatusAvailable, quantity).
Update("status", models.VoucherStatusBooked)
doesn't work when using GORM soft delete, since GORM puts parentheses around the WHERE clause and appends the AND deleted_at IS NULL filter after it.
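To illustrate the failure mode, the generated statement ends up shaped roughly like this (a sketch, with literals standing in for the bound values, not GORM's exact output), and the LIMIT trapped inside the parentheses makes it invalid SQL:

UPDATE vouchers SET status = 'booked' WHERE (status = 'available' LIMIT 2) AND deleted_at IS NULL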
Initially I was doing it with two separate queries (a SELECT then an UPDATE), but in that case there is a chance of a conflict between concurrent callers.
Does anyone have an idea of how I should select X rows of the table and then update their status, without risking a conflict over the selected items?
Edit: The DBMS I'm using is PostgreSQL.
Different points were answered:
PostgreSQL doesn't support LIMIT or ORDER BY in UPDATE, so I needed an alternative.
The right alternative seemed to be CTEs, as pointed out in a comment.
GORM supports custom plugins, and there is a plugin that implements CTEs: https://github.com/WinterYukky/gorm-extra-clause-plugin. Unfortunately, it isn't compatible with the UPDATE clause.
So ultimately, the only solution I found is to write the CTE in raw SQL:
var updated []Voucher
resUpdate := tx.Raw(`WITH v AS (
    SELECT * FROM vouchers WHERE status = ?
    LIMIT ?
)
UPDATE vouchers SET status = ?
WHERE EXISTS (SELECT * FROM v WHERE vouchers.id = v.id)
RETURNING *`,
    models.VoucherStatusAvailable, quantity, models.VoucherStatusBooked).
    Scan(&updated)
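One caveat: the SELECT inside this CTE does not lock the chosen rows, so two concurrent transactions can still pick the same vouchers. A sketch of the same raw query with FOR UPDATE SKIP LOCKED (the 9.5+ feature from the earlier answer) added inside the CTE; the literals stand in for the bound parameters:

WITH v AS (
    SELECT id FROM vouchers
    WHERE status = 'available'  -- placeholder for models.VoucherStatusAvailable
    LIMIT 10                    -- placeholder for quantity
    FOR UPDATE SKIP LOCKED      -- selected rows stay locked until commit; already-locked rows are skipped
)
UPDATE vouchers SET status = 'booked'  -- placeholder for models.VoucherStatusBooked
WHERE EXISTS (SELECT * FROM v WHERE vouchers.id = v.id)
RETURNING *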
I recently upgraded my Postgres DB from 9.5.4 to 10.7 and noticed some odd behavior with an existing query.
The trimmed-down version looks like this:
UPDATE mytable
SET job_id = 6
WHERE id IN (
    SELECT * FROM (
        SELECT id
        FROM mytable
        WHERE job_id IS NULL
        LIMIT 2
    ) x
    FOR UPDATE
)
AND job_id IS NULL
I would expect the number of rows updated to equal 2, but instead it updates all the records that match the subquery without the limit. If I remove the FOR UPDATE clause, or the outer job_id IS NULL condition, the number of records updated equals 2 as expected. Before we upgraded, this query updated the correct number of rows.
Did some behavior in 10.x change?
I am using MemSQL as my DB and I need SELECT ... FOR UPDATE functionality. However, it is not supported in version 6.5, which I am using. Is there any workaround for this problem?
My problem is as follows: multiple processes pick a single record (one that has not been processed yet) from the same table, do some work outside of SQL, then issue an UPDATE to mark the record as processed. If I could do SELECT ... FOR UPDATE, I could lock the record to ensure that only one process picks it.
A workaround I can think of is using a LockToken column and doing something like
UPDATE Tbl SET LockToken = 'a_unique_token' WHERE LockToken IS NULL LIMIT 1;
SELECT * FROM Tbl WHERE LockToken = 'a_unique_token';
but in this case I get
Error Code: 1749. Feature 'UPDATE...LIMIT must be constrained to a single partition' is not supported by MemSQL Distributed.
I could also do the job with LOCK TABLES, but according to this they are not supported either.
Is there any workaround to this type of problem?
Yes, your workaround is a good idea. One way to work around that error is to pick a specific row to lock instead of using LIMIT, like
UPDATE Tbl SET LockToken = 'a_unique_token' WHERE LockToken IS NULL AND id = (SELECT id FROM Tbl WHERE LockToken IS NULL LIMIT 1)
(Or you could use (SELECT min(id) FROM Tbl WHERE LockToken IS NULL) or something similar, depending on which row you want to pick.) This should work well if you have an index on id.
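A minimal sketch of the full claim-then-fetch flow under that scheme (the token literal is an assumption; in practice generate a unique value per worker, e.g. a UUID):

-- Claim one unprocessed row by pinning down a specific id first
-- (avoids UPDATE ... LIMIT, which MemSQL rejects on distributed tables)
UPDATE Tbl
SET LockToken = 'a_unique_token'
WHERE LockToken IS NULL
  AND id = (SELECT min(id) FROM Tbl WHERE LockToken IS NULL);

-- Fetch the row this worker just claimed (zero rows means another worker won the race)
SELECT * FROM Tbl WHERE LockToken = 'a_unique_token';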
Also, you could check out version 6.7 where select for update is now supported: https://docs.memsql.com/sql-reference/v6.7/select/.
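With 6.7's SELECT ... FOR UPDATE, the claim step could collapse into a single locking read inside a transaction. A sketch, assuming 6.7 accepts LIMIT together with FOR UPDATE (worth verifying against the docs above):

BEGIN;
SELECT id FROM Tbl WHERE LockToken IS NULL LIMIT 1 FOR UPDATE;  -- lock one unprocessed row
-- ... do the out-of-SQL work for that id ...
UPDATE Tbl SET LockToken = 'processed' WHERE id = 42;  -- 42 stands for the id read above
COMMIT;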
I have something like this. With this piece of code I detect whether a vehicle stopped for at least 5 minutes.
It works, but with a large amount of data it starts to get slow.
I did a lot of tests and I'm sure that my problem is in the NOT EXISTS block.
My table:
CREATE TABLE public.messages
(
id bigint PRIMARY KEY DEFAULT nextval('messages_id_seq'::regclass),
messagedate timestamp with time zone NOT NULL,
vehicleid integer NOT NULL,
driverid integer NOT NULL,
speedeffective double precision NOT NULL,
-- ... few nonsense properties
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.messages OWNER TO postgres;
CREATE INDEX idx_messages_1 ON public.messages
USING btree (vehicleid, messagedate);
And my query:
SELECT
*
FROM
messages m
WHERE
m.speedeffective > 0
and m.next_speedeffective = 0
and not exists( -- my problem
select id
from messages
where
vehicleid = m.vehicleid
and speedeffective > 5 -- I forgot this condition
and messagedate > m.messagedate
and messagedate <= m.messagedate + interval '5 minutes'
)
I can't figure out how to build the condition in a more performant way.
Edit, day 2:
I added a preliminary table (a CTE) like this, for the main query to use:
WITH messagesx as (
SELECT
vehicleid,
messagedate
FROM
messages
WHERE
speedeffective > 5
)
and now it works better. I think I'm just missing a little detail.
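For reference, a sketch of how the CTE version might be completed, assuming the intent is to have the NOT EXISTS probe only the pre-filtered rows:

WITH messagesx AS (
    SELECT vehicleid, messagedate
    FROM messages
    WHERE speedeffective > 5
)
SELECT m.*
FROM messages m
WHERE m.speedeffective > 0
  AND m.next_speedeffective = 0
  AND NOT EXISTS (
      SELECT 1
      FROM messagesx x
      WHERE x.vehicleid = m.vehicleid
        AND x.messagedate > m.messagedate
        AND x.messagedate <= m.messagedate + interval '5 minutes'
  )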
Typically, a NOT EXISTS can slow down your query when it is evaluated as a scan of the inner table for each of the outer rows. Try to express the same functionality as a join (I'm rewriting the query here without knowing the table, so I might make a mistake):
SELECT
    m1.*
FROM
    messages m1
LEFT JOIN
    messages m2
    ON  m1.vehicleid = m2.vehicleid
    AND m2.messagedate >  m1.messagedate
    AND m2.messagedate <= m1.messagedate + interval '5 minutes'
    AND m2.speedeffective > 5  -- mirror the condition from the NOT EXISTS
WHERE
    m1.speedeffective > 0
    AND m1.next_speedeffective = 0
    AND m2.vehicleid IS NULL
Note that the NOT EXISTS is rewritten as a non-match of the join condition: a row survives only when no m2 row was found (m2.vehicleid IS NULL).
Based on this answer: https://stackoverflow.com/a/36445233/5000827
and reading about NOT IN, NOT EXISTS, and LEFT JOIN (where the joined side IS NULL):
For PostgreSQL, NOT EXISTS and this LEFT JOIN are both anti-joins and work the same way. (This is the reason why @CountZukula's answer performs almost the same as mine.)
The problem was the kind of join operation the planner chose: nested loop or hash.
So, based on this: https://www.postgresql.org/docs/9.6/static/routine-vacuuming.html
PostgreSQL's VACUUM command has to process each table on a regular basis for several reasons:
To recover or reuse disk space occupied by updated or deleted rows.
To update data statistics used by the PostgreSQL query planner.
To update the visibility map, which speeds up index-only scans.
To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.
I ran VACUUM ANALYZE on the messages table and the same query now runs much faster.
So, with fresh statistics from VACUUM ANALYZE, PostgreSQL can decide on a better plan.
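For anyone repeating this, a minimal way to refresh the statistics and then confirm the plan change (the exact EXPLAIN output will vary):

VACUUM ANALYZE messages;
EXPLAIN (ANALYZE, BUFFERS)
SELECT m.*
FROM messages m
WHERE m.speedeffective > 0
  AND m.next_speedeffective = 0
  AND NOT EXISTS (
      SELECT 1
      FROM messages
      WHERE vehicleid = m.vehicleid
        AND speedeffective > 5
        AND messagedate > m.messagedate
        AND messagedate <= m.messagedate + interval '5 minutes'
  );  -- look for a hash anti join instead of a per-row subplan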
Hello, I have a simple table like this:
+------------+------------+----------------------+----------------+
|id (serial) | date(date) | customer_fk(integer) | value(integer) |
+------------+------------+----------------------+----------------+
I want to use each row as a daily accumulator: when a value arrives for a customer, if no record exists for that customer and date, create a new row for them; if one exists, just increment its value.
I don't know how to implement something like that; I only know how to increment a value using SET, but more logic is required here. Thanks in advance.
I'm using version 9.4
It sounds like what you want to do is an UPSERT.
http://www.postgresql.org/docs/devel/static/sql-insert.html
In this type of query, you update the record if it exists or you create a new one if it does not. The key in your table would consist of customer_fk and date.
This would be a normal INSERT, but with ON CONFLICT (customer_fk, date) DO UPDATE SET value = mytable.value + 1.
NOTE: This only works as of Postgres 9.5. It is not possible in previous versions. For versions prior to 9.1, the only solution is two steps. For 9.1 or later, a CTE may be used as well.
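For 9.5 or later, a minimal sketch of that form, assuming a unique constraint exists on (customer_fk, date) to act as the conflict arbiter:

-- One-time setup (assumption: no such unique constraint exists yet):
-- ALTER TABLE mytable ADD CONSTRAINT mytable_customer_date_key UNIQUE (customer_fk, date);

INSERT INTO mytable (date, customer_fk, value)
VALUES (current_date, 24, 1)
ON CONFLICT (customer_fk, date)
DO UPDATE SET value = mytable.value + 1;  -- the existing row is referenced by table name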
For earlier versions of Postgres, you will need to perform an UPDATE first, with customer_fk and date in the WHERE clause. From there, check whether the number of affected rows is 0; if it is, do the INSERT. The problem with this is the chance of a race condition if the operation happens twice at nearly the same time (common in a web environment): the INSERT can fail for one of them, and your count always has a chance of being slightly off.
If you are using Postgres 9.1 or above, you can use an updatable CTE as cleverly pointed out here: Insert, on duplicate update in PostgreSQL?
This solution is less likely to result in a race condition since it's executed in one step.
WITH new_values (date, customer_fk, value) AS (
  VALUES
     (current_date, 24, 1)
),
upsert AS (
    UPDATE mytable m
    SET value = m.value + 1
    FROM new_values nv
    WHERE m.date = nv.date AND m.customer_fk = nv.customer_fk
    RETURNING m.*
)
INSERT INTO mytable (date, customer_fk, value)
SELECT date, customer_fk, value
FROM new_values
WHERE NOT EXISTS (SELECT 1
                  FROM upsert up
                  WHERE up.date = new_values.date
                    AND up.customer_fk = new_values.customer_fk)
This contains two CTEs. One holds the data you are inserting (new_values) and the other holds the result of an UPDATE using those values (upsert). The final part uses these two to check whether each record in new_values is absent from upsert, which would mean the UPDATE matched nothing, and performs an INSERT to create the record instead.
As a side note, if you were doing this in another SQL engine that conforms to the standard, you would use a MERGE query instead. [ https://en.wikipedia.org/wiki/Merge_(SQL) ]
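For illustration, a sketch of the standard MERGE form of the same upsert (PostgreSQL itself only gained MERGE in version 15, long after the versions discussed here):

MERGE INTO mytable m
USING (VALUES (current_date, 24, 1)) AS nv(date, customer_fk, value)
ON m.date = nv.date AND m.customer_fk = nv.customer_fk
WHEN MATCHED THEN
    UPDATE SET value = m.value + 1
WHEN NOT MATCHED THEN
    INSERT (date, customer_fk, value) VALUES (nv.date, nv.customer_fk, nv.value);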