MemSQL: workaround for SELECT ... FOR UPDATE

I am using MemSQL as my database and I need SELECT ... FOR UPDATE functionality. However, it is not supported in version 6.5, which is the version I am using. Is there any workaround for this problem?
My problem is as follows: multiple processes each pick a single record (one that has not been processed yet) from the same table, do some work outside of SQL, and then run an UPDATE to mark the record as processed. If I could use SELECT ... FOR UPDATE, I could lock the record to ensure that only one process can pick it.
One workaround I can think of is to use a LockToken column and do something like:
UPDATE Tbl SET LockToken = 'a_unique_token' WHERE LockToken IS NULL LIMIT 1;
SELECT * FROM Tbl WHERE LockToken = 'a_unique_token';
but in this case I get
Error Code: 1749. Feature 'UPDATE...LIMIT must be constrained to a single partition' is not supported by MemSQL Distributed.
I could also do this with LOCK TABLES, but according to this, that is not supported either.
Is there any workaround to this type of problem?

Yes, your workaround is a good idea. One way to work around that error is to pick a specific row to lock instead of using LIMIT 1, like:
UPDATE Tbl SET LockToken = 'a_unique_token' WHERE LockToken IS NULL AND id = (SELECT id FROM Tbl WHERE LockToken IS NULL LIMIT 1);
(Or you could use (SELECT MIN(id) FROM Tbl WHERE LockToken IS NULL) or something similar to pick an id, depending on what you want.) This should work well if you have an index on id. Because the outer WHERE still checks LockToken IS NULL, if two processes pick the same id only one UPDATE will match; the other should check the affected-row count and try again.
Also, you could check out version 6.7, where SELECT ... FOR UPDATE is now supported: https://docs.memsql.com/sql-reference/v6.7/select/.
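The claim-then-read pattern from this answer can be sketched in Python. This is a minimal sketch that uses SQLite (through the standard sqlite3 module) as a stand-in for MemSQL, so it says nothing about MemSQL's partitioning rules; the payload column and the claim_one() helper are hypothetical. A fresh UUID serves as the unique token, and an affected-row count of 0 means either that no unprocessed rows remain or that another worker stamped the same id first, in which case it retries.

```python
import sqlite3
import uuid


def claim_one(conn):
    """Claim one unprocessed row by stamping it with a unique token; None when none are left."""
    while True:
        token = str(uuid.uuid4())  # unique per claim attempt
        cur = conn.execute(
            """
            UPDATE Tbl SET LockToken = ?
            WHERE LockToken IS NULL
              AND id = (SELECT MIN(id) FROM Tbl WHERE LockToken IS NULL)
            """,
            (token,),
        )
        conn.commit()
        if cur.rowcount == 1:
            # We own this row now; read it back by token, as in the question.
            return conn.execute(
                "SELECT id, payload, LockToken FROM Tbl WHERE LockToken = ?", (token,)
            ).fetchone()
        # 0 rows: either nothing is left, or another worker stamped the same id first.
        (remaining,) = conn.execute(
            "SELECT COUNT(*) FROM Tbl WHERE LockToken IS NULL"
        ).fetchone()
        if remaining == 0:
            return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tbl (id INTEGER PRIMARY KEY, payload TEXT, LockToken TEXT)")
conn.executemany("INSERT INTO Tbl (payload) VALUES (?)", [("a",), ("b",), ("c",)])
conn.commit()

claimed = [claim_one(conn) for _ in range(4)]  # three rows, then None
```

Each call claims the lowest unclaimed id, which corresponds to the MIN(id) variant; in MemSQL each worker process would send the same UPDATE followed by the SELECT on its own token.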

Related

How can I use a UNION statement or an OR statement inside PostgreSQL's UPDATE command?

I have an app that vends a 'code' to users through an API. Each code belongs to a pool of codes, and when a user hits an endpoint, they get a code from this pool. At the moment there is only one pool from which codes can be vended. The following SQL expresses this:
<<-SQL
UPDATE codes SET vended_at = NOW()
WHERE id = (
SELECT "codes"."id"
FROM "codes"
INNER JOIN "code_batches" ON "code_batches"."id" = "codes"."code_batch_id"
WHERE "codes"."vended_at" IS NULL
AND "code_batches"."active" = true
ORDER BY "code_batches"."end_at" ASC
FOR UPDATE OF "codes" SKIP LOCKED
LIMIT 1
)
RETURNING *;
SQL
So when the endpoint is hit, I return a code whose vended_at field is NULL and whose batch is active.
Now I need to extend this SQL so that a user can get a code from this pool or from a second pool. For example, if the user can't get a code from this pool (call it A, represented by the SQL above), I need to vend a code from another pool (call it B).
I looked through the PostgreSQL documentation, and I think I want to either (1) use a UNION to combine pools A and B into one "megapool" to vend a code from, or (2) if I can't vend a code from pool A, use an OR condition to select from pool B.
The problem is that I can't get either syntax to work. I've tried something along these lines, with different variations:
<<-SQL
UPDATE codes SET vended_at = NOW()
WHERE id = (
SELECT "codes"."id"
FROM "codes"
INNER JOIN "code_batches" ON "code_batches"."id" = "codes"."code_batch_id"
WHERE "codes"."vended_at" IS NULL
AND "code_batches"."active" = true
ORDER BY "code_batches"."end_at" ASC
FOR UPDATE OF "codes" SKIP LOCKED
LIMIT 1
) UNION (
######## SELECT SOME OTHER STUFF #########
)
RETURNING *;
SQL
or
<<-SQL
UPDATE codes SET vended_at = NOW()
WHERE id = (
SELECT "codes"."id"
FROM "codes"
INNER JOIN "code_batches" ON "code_batches"."id" = "codes"."code_batch_id"
WHERE "codes"."vended_at" IS NULL
AND "code_batches"."active" = true
ORDER BY "code_batches"."end_at" ASC
FOR UPDATE OF "codes" SKIP LOCKED
LIMIT 1
) OR (
######## SELECT SOME OTHER STUFF USING OR #########
)
RETURNING *;
SQL
So far the syntax is off, and I'm starting to wonder whether I can use this approach at all. I can't tell whether my approach is wrong or whether I'm using UNION, OR, and subselects incorrectly. Does anyone have advice on how to accomplish this? Thank you.
####### EDIT ########
To illustrate the idea more simply, this is essentially what I want to do:
<<-SQL
UPDATE codes SET vended_at = NOW()
WHERE id = (
CRITERIA 1
)
OR/UNION
(
CRITERIA 2
)
RETURNING *;
SQL
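Use one table to store both pools.
Add a pool_number column to the codes table to indicate which pool each code is in, then make it the first sort key of the ORDER BY in your existing query:
ORDER BY "codes"."pool_number", "code_batches"."end_at" ASC
Codes from the pool with the lower pool_number are then vended first, and the other pool is used only when that one has none left.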
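The single-table idea can be checked with a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, so FOR UPDATE OF "codes" SKIP LOCKED and RETURNING * are left out and vend() does a SELECT followed by a guarded UPDATE instead; the pool_number values (1 for pool A, 2 for pool B), the sample data and the vend() helper are hypothetical. In PostgreSQL you would keep the original single UPDATE ... WHERE id = (SELECT ... FOR UPDATE OF "codes" SKIP LOCKED LIMIT 1) RETURNING * and only change its ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE code_batches (id INTEGER PRIMARY KEY, active INTEGER NOT NULL, end_at TEXT NOT NULL);
CREATE TABLE codes (
    id INTEGER PRIMARY KEY,
    code_batch_id INTEGER NOT NULL REFERENCES code_batches(id),
    pool_number INTEGER NOT NULL,  -- hypothetical: 1 = pool A, 2 = pool B
    vended_at TEXT
);
INSERT INTO code_batches VALUES (1, 1, '2024-01-31'), (2, 1, '2024-01-15');
INSERT INTO codes (id, code_batch_id, pool_number) VALUES (10, 1, 1), (20, 2, 2), (21, 2, 2);
""")

# pool_number is the leading sort key, so pool B is reached only once pool A is empty.
PICK_SQL = """
SELECT codes.id
FROM codes
JOIN code_batches ON code_batches.id = codes.code_batch_id
WHERE codes.vended_at IS NULL
  AND code_batches.active = 1
ORDER BY codes.pool_number ASC, code_batches.end_at ASC, codes.id ASC
LIMIT 1
"""


def vend(conn):
    """Vend one code, preferring pool 1; return its id, or None if nothing could be vended."""
    row = conn.execute(PICK_SQL).fetchone()
    if row is None:
        return None
    # SQLite has no FOR UPDATE SKIP LOCKED; the vended_at guard stops a double vend.
    cur = conn.execute(
        "UPDATE codes SET vended_at = CURRENT_TIMESTAMP WHERE id = ? AND vended_at IS NULL",
        (row[0],),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None


vended = [vend(conn) for _ in range(4)]
```

Code 10 is vended first even though its batch ends later, because pool_number is compared before end_at; pool B's codes follow, and the fourth call finds nothing left.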

How can you use 'For update skip locked' in postgres without locking rows in all tables used in the query?

I want to use PostgreSQL's SELECT ... FOR UPDATE SKIP LOCKED so that two different users reading from a table and claiming tasks neither block each other nor receive tasks already claimed by the other user.
The query that retrieves tasks uses a join. We do not want row-level locks on any table except the one that contains the main information. In the sample query below, only rows in the task table should be locked:
SELECT v.someid , v.info, v.parentinfo_id, v.stage FROM task v, parentinfo pi WHERE v.stage = 'READY_TASK'
AND v.parentinfo_id = pi.id
AND pi.important_info_number = (
SELECT MAX(important_info_number) FROM parentinfo )
ORDER BY v.id limit 200 for update skip locked;
Now if user A is retrieving some 200 rows of this table, user B should be able to retrieve another set of 200 rows.
EDIT: Per the comment below, the query will be changed to:
SELECT v.someid , v.info, v.parentinfo_id, v.stage FROM task v, parentinfo pi WHERE v.stage = 'READY_TASK'
AND v.parentinfo_id = pi.id
AND pi.important_info_number = (
SELECT MAX(important_info_number) FROM parentinfo) ORDER BY v.id limit 200 for update of v skip locked;
Where should ORDER BY go so that the rows are ordered? The order will be affected when multiple users run this query, but the rows each user gets back should still be returned in order.
Also, does this ensure that multiple threads running the same SELECT retrieve different sets of rows, or is the locking only done for UPDATE commands?
I experimented with this a little: concurrent SELECT queries do end up retrieving different sets of rows (while each transaction is still open, since the row locks are held until COMMIT or ROLLBACK), and ORDER BY determines the order of each final result.
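Yes,
FOR UPDATE OF "TABLE_NAME" SKIP LOCKED
will lock rows only in TABLE_NAME. If the table has an alias in the FROM clause (task v in the query above), you must name the alias instead: FOR UPDATE OF v.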
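To make the locking behaviour described above concrete, here is a toy in-memory model in Python. It only illustrates SKIP LOCKED semantics under simplifying assumptions (one lock per task row, locks released at commit, and each transaction setting stage on its rows before committing); it is not how PostgreSQL implements row locks, and the class and method names are hypothetical.

```python
class ToyTaskTable:
    """In-memory toy model of row locks on the task table (not PostgreSQL internals)."""

    def __init__(self, ready_ids):
        self.ready_ids = set(ready_ids)  # rows whose stage is 'READY_TASK'
        self.row_locks = {}              # row id -> transaction holding its lock

    def select_for_update_skip_locked(self, txn, limit):
        """Models: ... ORDER BY v.id LIMIT <limit> FOR UPDATE OF v SKIP LOCKED."""
        claimed = []
        for row_id in sorted(self.ready_ids):  # ORDER BY v.id
            if len(claimed) == limit:          # LIMIT reached
                break
            if row_id in self.row_locks:       # SKIP LOCKED: pass over rows another txn holds
                continue
            self.row_locks[row_id] = txn       # FOR UPDATE OF v: lock only task rows
            claimed.append(row_id)
        return claimed

    def commit(self, txn):
        """Assumes txn changed stage on its rows; COMMIT then releases its row locks."""
        for row_id in [r for r, owner in self.row_locks.items() if owner == txn]:
            self.ready_ids.discard(row_id)     # no longer 'READY_TASK'
            del self.row_locks[row_id]         # locks are held until the transaction ends


table = ToyTaskTable(range(1, 11))
a = table.select_for_update_skip_locked("A", 4)  # user A's batch
b = table.select_for_update_skip_locked("B", 4)  # user B skips A's locked rows
table.commit("A")
c = table.select_for_update_skip_locked("C", 4)  # skips B's still-locked rows
```

A gets rows 1 to 4, B skips them and gets 5 to 8, and after A commits, C skips B's rows and gets 9 and 10. Each batch is in id order, and no row is handed to two open transactions at once.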

Postgresql Update statement

I am working with two tables: a live table and a staging table. The code below updates values in the live table using the staging table as the source. It only updates the firstname column if the row in the staging table already exists in the live table, plus some other simple criteria.
Update LiveTable
SET LiveTable.firstname = TestTable.firstname
FROM TestTable
WHERE EXISTS (SELECT 1 FROM LiveTable WHERE LiveTable.userid = TestTable.userid)
AND TestTable.firstname IS NOT NULL
AND LEN(TestTable.firstname) > len(LiveTable.firstname);
The code above gets the job done but takes quite some time. Is there a faster way to do it?
I have tried to create a FUNCTION to do the same thing, but was not able to get it to work.
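Use a join between the two tables:
UPDATE LiveTable
SET firstname = TestTable.firstname
FROM TestTable
WHERE LiveTable.userid = TestTable.userid
AND TestTable.firstname IS NOT NULL
AND length(TestTable.firstname) > length(LiveTable.firstname);
In your version, the EXISTS subquery has its own FROM LiveTable, so it is not correlated with the row being updated; every LiveTable row is effectively combined with every matching TestTable row. Joining on userid lets each live row match only its own staging row. The condition TestTable.firstname IS NOT NULL is not really needed, because length(TestTable.firstname) > length(LiveTable.firstname) will filter out rows where firstname is NULL anyway. Also, in PostgreSQL the function is length(), not len(), and the column on the left of SET must not be qualified with the table name (SET firstname = ..., not SET LiveTable.firstname = ...).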
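The effect of matching on userid can be checked with a small sketch. It uses SQLite via Python's sqlite3 module with hypothetical sample data, and because UPDATE ... FROM needs SQLite 3.33 or later, it expresses the join as correlated subqueries instead: a different but equivalent formulation, assuming userid is unique in TestTable. SQLite, like PostgreSQL, spells the function length().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LiveTable (userid INTEGER PRIMARY KEY, firstname TEXT);
CREATE TABLE TestTable (userid INTEGER PRIMARY KEY, firstname TEXT);  -- one staging row per user (assumed)
INSERT INTO LiveTable VALUES (1, 'Al'), (2, 'Robert'), (3, 'Jo'), (4, 'Sue');
INSERT INTO TestTable VALUES (1, 'Alice'), (2, 'Bob'), (3, NULL), (5, 'Zed');
""")

# Each LiveTable row is matched only to the staging row with the same userid.
cur = conn.execute("""
    UPDATE LiveTable
    SET firstname = (SELECT t.firstname FROM TestTable t WHERE t.userid = LiveTable.userid)
    WHERE EXISTS (
        SELECT 1 FROM TestTable t
        WHERE t.userid = LiveTable.userid
          AND length(t.firstname) > length(LiveTable.firstname)  -- NULL names never pass
    )
""")
conn.commit()
updated = cur.rowcount
rows = conn.execute("SELECT userid, firstname FROM LiveTable ORDER BY userid").fetchall()
```

Only user 1 changes: Bob is shorter than Robert, user 3's staging name is NULL (so the comparison is NULL and the row is skipped), user 4 has no staging row, and user 5 does not exist in LiveTable.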

select from final table(update table) concurrent select i.e select from two threads

I have a concurrency issue in my project: when two threads run the same SELECT at the same time, both receive the same value, which should not happen.
After selecting a value, a thread should perform an UPDATE, and the second thread should then select the updated value.
I am using DB2.
I thought of using this approach:
select number from final table (update tablename set columnname = ... where ...)
My question is: since there is an UPDATE inside the SELECT, would this approach lock the database when the other thread comes in to select the value, and would it solve my concurrency issue?
OR
While browsing, I found another approach:
update table (.....) select col from table where wait for outcome
Would this select wait until the first thread finishes the select?
One thing you can certainly do to avoid multiple reads of the same value before it gets updated by one of the readers:
LOCK TABLE tablename IN EXCLUSIVE MODE;
SELECT id, ... FROM tablename WHERE ...;
UPDATE tablename SET id=newval WHERE ...;
COMMIT;
This will of course lock the full table, which may not be what you want!
Alternative approach (relatively standard, but somewhat more involved programming logic):
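1) SELECT id, ... FROM tablename WHERE ...
2) SELECT count(1) FROM FINAL TABLE (UPDATE tablename SET id=newval
WHERE ... AND id=oldval);
(oldval is the id value read in step 1, so the UPDATE only matches if nobody has changed the row since.)
3) While this count(1) is zero (meaning someone else updated it in the meantime), repeat from 1).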
--Peter Vanroose,
ABIS Training & Consulting,
Leuven, Belgium.
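The optimistic read/update/retry loop in the answer above can be sketched in Python. This is a minimal sketch using SQLite through the sqlite3 module rather than DB2, so step 2 checks cursor.rowcount instead of SELECT count(1) FROM FINAL TABLE (...); the counter table, take_next() and the other_thread() hook that simulates a competing writer are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counter (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO counter VALUES (1, 100)")
conn.commit()


def take_next(conn, interfere=None):
    """Optimistic loop: read, update only if unchanged, retry otherwise. Returns (taken, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        # Step 1: read the current value without locking anything.
        (old,) = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()
        if interfere is not None:
            interfere(conn)   # hypothetical: another thread writes between our read and update
            interfere = None  # only once, for the demonstration
        # Step 2: the "AND value = ?" guard turns the UPDATE into a compare-and-set.
        cur = conn.execute(
            "UPDATE counter SET value = ? WHERE id = 1 AND value = ?", (old + 1, old)
        )
        conn.commit()
        # Step 3: zero rows updated means someone else changed it first, so retry.
        if cur.rowcount == 1:
            return old, attempts


def other_thread(conn):
    conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")
    conn.commit()


first = take_next(conn)                           # no interference
second = take_next(conn, interfere=other_thread)  # first attempt loses the race
```

The second call reads 101, loses it to other_thread(), sees 0 rows updated, and succeeds on its second attempt with 102, so no value is handed out twice.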

PostgreSQL: increment if it exists or create a new row

Hello, I have a simple table like this:
+------------+------------+----------------------+----------------+
|id (serial) | date(date) | customer_fk(integer) | value(integer) |
+------------+------------+----------------------+----------------+
I want to use each row as a daily accumulator: when a value arrives for a customer and no record exists for that customer and date, create a new row for that customer and date; if one exists, just increment the value.
I don't know how to implement something like that. I only know how to increment a value using SET, but more logic is required here. Thanks in advance.
I'm using version 9.4.
It sounds like what you are wanting to do is an UPSERT.
http://www.postgresql.org/docs/devel/static/sql-insert.html
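In this type of query, you update the record if it exists or create a new one if it does not. The key in your table would consist of customer_fk and date, so there must be a unique constraint (or unique index) on those two columns.
This would be a normal INSERT, but with ON CONFLICT (customer_fk, date) DO UPDATE SET value = mytable.value + 1. DO UPDATE requires the conflict target, and the existing row's value is qualified with the table name (mytable here) because an unqualified value is ambiguous with EXCLUDED.value.
NOTE: ON CONFLICT only works as of PostgreSQL 9.5, so it is not available in 9.4 or earlier. For versions prior to 9.1, the only solution is two steps. For 9.1 or later, a writable CTE may be used as well.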
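For PostgreSQL 9.5 or later, the single-statement upsert can be tried locally with SQLite (3.24 or later), which accepts the same INSERT ... ON CONFLICT (...) DO UPDATE form. This is a sketch under that assumption, with hypothetical sample data and the mytable name used later in this answer. One dialect difference: SQLite resolves an unqualified value in DO UPDATE SET to the existing row, whereas PostgreSQL treats it as ambiguous with EXCLUDED.value, so in PostgreSQL write mytable.value + 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mytable (
        id INTEGER PRIMARY KEY,
        date TEXT NOT NULL,
        customer_fk INTEGER NOT NULL,
        value INTEGER NOT NULL,
        UNIQUE (customer_fk, date)  -- the conflict target needs a unique constraint
    )
""")

UPSERT = """
    INSERT INTO mytable (date, customer_fk, value) VALUES (?, ?, 1)
    ON CONFLICT (customer_fk, date) DO UPDATE SET value = value + 1
"""

arrivals = [(24, "2024-01-01"), (24, "2024-01-01"), (24, "2024-01-02"), (7, "2024-01-01")]
for customer_fk, day in arrivals:
    conn.execute(UPSERT, (day, customer_fk))
conn.commit()

rows = conn.execute(
    "SELECT customer_fk, date, value FROM mytable ORDER BY customer_fk, date"
).fetchall()
```

Customer 24 ends up with value 2 for 2024-01-01 (one insert, one increment), while the other two (customer, date) pairs get a fresh row each.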
For earlier versions of PostgreSQL, you will need to perform an UPDATE first with customer_fk and date in the WHERE clause, then check whether the number of affected rows is 0. If it is, do the INSERT. The only problem with this is the chance of a race condition if the operation happens twice at nearly the same time (common in a web environment): the INSERT may fail for one of them, so your count can end up slightly off.
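The two-step approach for older versions can be sketched with SQLite through Python's sqlite3 module (a stand-in for PostgreSQL 9.4; the sample data and the accumulate() helper are hypothetical). The UNIQUE (customer_fk, date) constraint is an addition that turns the race described above into an error the code can catch and answer with a second UPDATE. In PostgreSQL, a failed INSERT aborts the surrounding transaction, so that retry would need a SAVEPOINT or a new transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mytable (
        id INTEGER PRIMARY KEY,
        date TEXT NOT NULL,
        customer_fk INTEGER NOT NULL,
        value INTEGER NOT NULL,
        UNIQUE (customer_fk, date)
    )"""
)

UPDATE_SQL = "UPDATE mytable SET value = value + 1 WHERE customer_fk = ? AND date = ?"


def accumulate(conn, customer_fk, day):
    """Two-step upsert: UPDATE first, INSERT only if no row was affected."""
    if conn.execute(UPDATE_SQL, (customer_fk, day)).rowcount == 0:
        try:
            conn.execute(
                "INSERT INTO mytable (date, customer_fk, value) VALUES (?, ?, 1)",
                (day, customer_fk),
            )
        except sqlite3.IntegrityError:
            # A concurrent writer inserted the row first; increment it instead.
            conn.execute(UPDATE_SQL, (customer_fk, day))
    conn.commit()


for customer_fk, day in [(24, "2024-01-01")] * 3 + [(24, "2024-01-02"), (7, "2024-01-01")]:
    accumulate(conn, customer_fk, day)

rows = conn.execute(
    "SELECT customer_fk, date, value FROM mytable ORDER BY customer_fk, date"
).fetchall()
```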
If you are using Postgres 9.1 or above, you can use an updatable CTE as cleverly pointed out here: Insert, on duplicate update in PostgreSQL?
This solution is less likely to result in a race condition since it's executed in a single statement, although it is not completely race-free: two concurrent runs for a new (customer_fk, date) pair can still both attempt the INSERT.
WITH new_values (date, customer_fk, value) AS (
VALUES
(CURRENT_DATE, 24, 1)
),
upsert AS (
UPDATE mytable m
SET value = m.value + 1
FROM new_values nv
WHERE m.date = nv.date AND m.customer_fk = nv.customer_fk
RETURNING m.*
)
INSERT INTO mytable (date, customer_fk, value)
SELECT date, customer_fk, value
FROM new_values
WHERE NOT EXISTS (SELECT 1
FROM upsert up
WHERE up.date = new_values.date
AND up.customer_fk = new_values.customer_fk);
This contains two CTEs. One contains the data you are inserting (new_values) and the other contains the results of an UPDATE query using those values (upsert). The last part uses these two to check whether the records in new_values are missing from upsert, which means the UPDATE found no existing row, and performs an INSERT to create the record instead.
As a side note, if you were doing this in another SQL engine that conforms to the standard, you would use a MERGE query instead. [ https://en.wikipedia.org/wiki/Merge_(SQL) ]