ActiveRecord find_or_initialize_by race conditions

I have a scenario where 2 db connections might both run Model.find_or_initialize_by(params) and raise an error: PG::UniqueViolation: ERROR: duplicate key value violates unique constraint
I'd like to update my code so it could gracefully recover from it. Something like:
record = nil
begin
record = Model.find_or_initialize_by(params)
rescue ActiveRecord::RecordNotUnique
record = Model.where(params).first
end
return record
The trouble is that there's not a nice/easy way to reproduce this on my local machine, so I'm not confident that my fix actually works.
So I thought I'd get a bit creative and try calling create twice in a row (locally). That should raise a PG::UniqueViolation error, which I could then rescue to make sure everything is handled gracefully.
But I get this error: PG::InFailedSqlTransaction: ERROR: current transaction is aborted, commands ignored until end of transaction block
I get this error even when I wrap everything in individual transaction blocks:
record = nil
Model.transaction do
record = Model.create(params)
end
begin
Model.transaction do
record = Model.create(params)
end
rescue ActiveRecord::RecordNotUnique
end
Model.transaction do
record = Model.where(params).first
end
return record
My questions:
What's the right way to gracefully handle the race condition I mentioned at the very beginning of this post?
How do I test this locally?
I imagine there's probably something simple that I'm missing here, but it's late and perhaps I'm not thinking too clearly.
I'm running postgres 9.3 and rails 4.
EDIT: It turns out that find_or_initialize_by should have been find_or_create_by, and the errors I was getting were from the actual save call that happened later on in execution. #VeryTiredWhenIWroteThis

Has this actually happened?
Model.find_or_initialize_by(params)
should never raise an ActiveRecord::RecordNotUnique error, because it does not save anything to the database. It only instantiates a new, unsaved ActiveRecord object.
However, in the second snippet you are creating records.
create (without the bang) does not raise exceptions for failed validations, but ActiveRecord::RecordNotUnique is always raised on a duplicate by both create and create!.
If you're creating records, you don't need explicit transactions at all. Postgres is ACID compliant, so the unique constraint guarantees that only one of the two inserts succeeds, and once Postgres reports success, the change is durable (a single-statement query against Postgres runs in its own transaction anyway). So your code above is almost fine if you replace find_or_initialize_by with find_or_create_by:
begin
record = Model.find_or_create_by(params)
rescue ActiveRecord::RecordNotUnique
record = Model.where(params).first
end
You can test whether the code behaves correctly by simply trying to create the same record twice in a row. However, this will not test whether ActiveRecord::RecordNotUnique is actually raised correctly under a real race condition.
It's also not really your app's responsibility to test this, and testing it is not easy. You would have to run Rails in multithreaded mode on your machine, or test against a multi-process staging Rails instance. WEBrick, for example, handles only one request at a time. You can use the Puma application server, but on MRI there is no true parallelism because of the GIL: threads only release the GIL while blocked on I/O. Because talking to Postgres is I/O, I'd expect some concurrent requests, but to be 100% sure, the best testing scenario would be to deploy on Passenger with multiple workers and then use JMeter to run concurrent requests against the server.
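One way to exercise the rescue path locally without real concurrency is to make the race deterministic: let a second connection insert the same row in the window between the lookup and the insert. The sketch below shows that idea in Python, with SQLite (standard `sqlite3` module) standing in for Postgres and ActiveRecord; the `users` table, `find_or_create_user`, its `before_insert` test hook, and `simulate_race` are hypothetical names for illustration only.

```python
import os
import sqlite3
import tempfile

SELECT_SQL = "SELECT id, email FROM users WHERE email = ?"


def find_or_create_user(conn, email, before_insert=None):
    """Find-or-create that recovers from losing a race on the unique index."""
    row = conn.execute(SELECT_SQL, (email,)).fetchone()
    if row is not None:
        return row
    if before_insert is not None:
        before_insert()  # hypothetical test hook: the race window
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    except sqlite3.IntegrityError:
        pass  # another writer inserted it first; re-read that row below
    return conn.execute(SELECT_SQL, (email,)).fetchone()


def simulate_race(email="x@example.com"):
    """Two connections to one file: connection b wins the race, a must recover."""
    path = os.path.join(tempfile.mkdtemp(), "race.db")
    a, b = sqlite3.connect(path), sqlite3.connect(path)
    a.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")
    a.commit()

    def competitor():
        with b:  # commit b's insert before a tries its own
            b.execute("INSERT INTO users (email) VALUES (?)", (email,))

    row = find_or_create_user(a, email, before_insert=competitor)
    count = a.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    a.close()
    b.close()
    return row, count
```

Because the competing insert always lands inside the window, the unique-violation branch runs on every test run instead of only under unlucky timing.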

Related

Should I code Postgres to run into an exception?

I want to optimize the number of calls my API makes to the database. But is it okay to let Postgres run into a unique constraint error? For example, when registering users I have two options:
from app.models import Users
from tortoise.exceptions import DoesNotExist
try:
await Users.get(email=email)
raise HTTPException(
status_code=HTTP_400_BAD_REQUEST, detail="User already exists"
)
except DoesNotExist:
user = await Users.create(email, hashed_pw)
This would make two calls to the database, but the exception would occur in Python. Note that no error or exception is raised on the Postgres end: Postgres simply returns no rows, which on the Python end gets interpreted as DoesNotExist. Another solution would be this:
from app.models import Users
from asyncpg.exceptions import UniqueViolationError
try:
user = await Users.create(email, hashed_pw)
except UniqueViolationError:
raise HTTPException(
status_code=HTTP_400_BAD_REQUEST, detail="User already exists"
)
This would only make a single database call, but an error would occur in the Postgres database. It seems to me that the second implementation would be more efficient, but is it okay to just let Postgres raise an exception?

Your first snippet would only work reliably if transactions or locking are involved. Otherwise, if two requests to create a user with the same email hit your API at the same time (I know this is unlikely in this case), both could get DoesNotExist and both would execute await Users.create(email, hashed_pw). That kind of check-then-act pattern is often discouraged.
Also, using exceptions for control flow is often seen as bad practice.
Your second solution is fine: there is no problem with trying to create an entry and using the exception emitted by Postgres to tell the requester that the user already exists.
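To make the race concrete, here is a deterministic replay of two concurrent requests that each run "check, then insert", with both checks happening before either insert. It uses Python's `sqlite3` as a stand-in for Postgres/Tortoise, and the `users` table and `two_requests_check_then_insert` helper are hypothetical. Even with the existence check, the second request still hits the unique constraint, so that error has to be handled anyway.

```python
import sqlite3


def two_requests_check_then_insert(conn, email):
    """Replay two requests whose existence checks both run before either insert."""

    def exists():
        query = "SELECT 1 FROM users WHERE email = ?"
        return conn.execute(query, (email,)).fetchone() is not None

    checks = [exists(), exists()]  # request 1 checks, then request 2 checks
    outcomes = []
    for already_there in checks:
        if already_there:
            outcomes.append("rejected by check")
            continue
        try:
            with conn:  # one transaction per request
                conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
            outcomes.append("created")
        except sqlite3.IntegrityError:
            outcomes.append("unique violation")  # the check did not protect us
    return outcomes
```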
The first one would be fine if you have transactions and if you have a function like await Users.exists(email=email) that returns true or false.
Personally, I would prefer the second one, because the unique constraint already does the check you want, but you need to make sure that the error you get is actually about the unique key on email and not about some other error.
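A minimal sketch of the insert-and-catch approach that only translates violations of the email uniqueness rule and re-raises everything else. It uses Python's `sqlite3` as a stand-in, where the constraint is identified by SQLite's error text (`UNIQUE constraint failed: users.email`); with asyncpg you would instead inspect the exception's fields, for example its `constraint_name`. `UserAlreadyExists`, `register_user`, and the `users` columns are hypothetical.

```python
import sqlite3


class UserAlreadyExists(Exception):
    """Hypothetical app-level error, e.g. mapped to HTTP 400 by the web layer."""


def register_user(conn, email, hashed_pw):
    """Insert a user in one round trip; let the unique index do the check."""
    try:
        with conn:  # commit on success, roll back on error
            cur = conn.execute(
                "INSERT INTO users (email, hashed_pw) VALUES (?, ?)",
                (email, hashed_pw),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError as exc:
        # Only the email uniqueness rule means "user already exists";
        # NOT NULL, foreign-key or other violations are re-raised unchanged.
        if "UNIQUE constraint failed: users.email" in str(exc):
            raise UserAlreadyExists(email) from exc
        raise
```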

Making POST requests idempotent

I have been looking for a way to design my API so that it is idempotent. Part of that is making my POST request routes idempotent, and I stumbled upon this article.
(If I have understood something not the way it is, please correct me!)
In it, there is a good explanation of the general idea, but what is lacking are examples of how the author implemented it himself.
Someone asked the writer of the article how he would guarantee atomicity, so the writer added a code example.
Essentially, in his code example there are two cases,
the flow if everything goes well:
Open a transaction on the DB that holds the data the POST request needs to change
Inside this transaction, execute the needed change
Set the Idempotency-Key and its value (the response to the client) in the Redis store
Set an expiry time on that key
Commit the transaction
the flow if something inside the code goes wrong:
An exception occurs inside the flow of the function
The transaction is rolled back
Notice that the transaction that is opened is for a certain DB; let's call it A.
However, it does not cover the Redis store that he also uses, meaning that rolling back the transaction only affects DB A.
So it covers the case where something happens inside the code that makes it impossible to complete the transaction.
But what will happen if the machine the code runs on crashes after it has already executed "Set an expiry time on that key" and is just about to commit the transaction?
In that case, the key will be available in the redis store, but the transaction has not been committed.
This will result in a situation where the service is sure that the needed changes have already happened, but they haven't: the machine failed before it could finish.
I need to design the API so that if either the data change or the setting of the key and value in Redis fails, both are rolled back.
What is the solution to this problem?
How can I guarantee the atomicity of changing the needed data in one database while setting the key and the needed response in Redis, and roll both back if either fails? (Including the case where the machine crashes in the middle of the actions.)
Please add a code example when answering! I'm using the same technologies as in the article (nodejs, redis, mongo - for the data itself)
Thanks :)

Per the code example you shared in your question, the behavior you want is to make sure there was no crash on the server between the moment where the idempotency key was set into the Redis saying this transaction already happened and the moment when the transaction is, in fact, persisted in your database.
However, when using Redis and another database together, you have two independent points of failure and two actions executed sequentially at different moments (and even if they are executed asynchronously at the same time, there is no guarantee the server won't crash before both of them complete).
What you can do instead is include in your transaction an insert into a table holding the relevant information about this request, including the idempotency key. Because the ACID properties ensure atomicity, either all statements in the transaction succeed or none do, which means your idempotency key will be in your database if and only if the transaction succeeded.
You can still use Redis in front of it, since it will answer lookups faster than your database.
A code example is provided below, but it might be good to first think about how much a failure between the Redis write and the database commit actually matters to your business (could it be handled with another strategy?) to avoid over-engineering.
async function execute(idempotentKey) {
  // dbConnection(), db.* and redisClient.setAsync are the helpers assumed by
  // this example, not a specific driver's API.
  const db = await dbConnection();
  // Append an insert into the executions table to the statements;
  // it is persisted (or rolled back) together with the rest of the transaction.
  const query = `
    UPDATE firsttable SET ...;
    UPDATE secondtable SET ...;
    INSERT INTO executions (idempotent_key, success) VALUES (:idempotent_key, true);
  `;
  try {
    await db.beginTransaction();
    await db.execute(query, { idempotent_key: idempotentKey });
    // Set the key on Redis with the value "false" (not yet confirmed).
    await redisClient.setAsync(idempotentKey, 'false', 'EX', process.env.KEY_EXPIRE_TIME);
    /*
    If the server crashes exactly here, the idempotent key will be on Redis with "false" as its value.
    In that case there are two possibilities: the commit to the database succeeded or it did not.
    If Redis returns "false" on the next request, query the executions table to verify whether the transaction was executed.
    */
    await db.commit();
  } catch (err) {
    await db.rollback();
    throw err;
  }
  // The commit succeeded: set the value to "true" so later requests don't need to
  // query the database. Pass EX again, because a plain SET discards the existing TTL.
  await redisClient.setAsync(idempotentKey, 'true', 'EX', process.env.KEY_EXPIRE_TIME);
}
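The same idea as a small runnable sketch, using Python's built-in `sqlite3` in place of the question's Mongo/Redis stack: the business change and the idempotency record are written in one transaction, the database is the source of truth, and a plain dict stands in for Redis as a cache that is only filled after the commit. `PaymentService`, the `accounts`/`executions` tables, and the `crash_before_commit` flag (which simulates dying before the commit) are hypothetical.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE executions (idempotent_key TEXT PRIMARY KEY, response TEXT NOT NULL);
"""


class PaymentService:
    def __init__(self, conn):
        self.conn = conn
        self.cache = {}  # stand-in for Redis: a fast path, never the source of truth

    def apply(self, key, account, amount, crash_before_commit=False):
        cached = self.cache.get(key)
        if cached is not None:
            return json.loads(cached)
        row = self.conn.execute(
            "SELECT response FROM executions WHERE idempotent_key = ?", (key,)
        ).fetchone()
        if row is not None:  # already committed earlier: replay stored response
            self.cache[key] = row[0]
            return json.loads(row[0])
        with self.conn:  # one transaction: both writes commit or neither does
            self.conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, account)
            )
            balance = self.conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (account,)
            ).fetchone()[0]
            response = json.dumps({"balance": balance})
            # PRIMARY KEY on idempotent_key also stops a concurrent duplicate.
            self.conn.execute(
                "INSERT INTO executions (idempotent_key, response) VALUES (?, ?)",
                (key, response),
            )
            if crash_before_commit:
                raise RuntimeError("simulated crash before commit")
        self.cache[key] = response  # only cache what is known to be committed
        return json.loads(response)
```

If the process dies before the commit, neither the balance change nor the key exists, so a retry simply runs the operation again.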

Check if MongoDB mutation will succeed without actually executing it

I wonder how/if a MongoDB mutation can be simulated. By "simulated" I mean performing an insert, update, or delete without actually executing it. For example, I'd like to test whether the unique index will throw when I try to insert a duplicate value. I'm looking for functionality similar to Ethereum's estimate-gas action, which throws on an invalid transaction before the transaction is actually sent to the network.

If you're using MongoDB 4.0 or newer, you can use transactions to simulate a dry run. Something like:
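import pymongo
from pymongo.errors import DuplicateKeyError
DRY_RUN = True  # True: validate the writes, then discard them
conn = pymongo.MongoClient()  # transactions need a replica set
with conn.start_session() as s:
    s.start_transaction()
    try:
        conn.test.test.insert_one({'_id': 1}, session=s)
        conn.test.test.delete_one({'_id': 2}, session=s)
    except DuplicateKeyError:
        s.abort_transaction()
        print('insert would violate a unique index')
    else:
        if DRY_RUN:
            s.abort_transaction()  # discard: nothing is persisted
        else:
            s.commit_transaction()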
You can abort_transaction() for your dry run, or commit otherwise, like in a typical SQL style transaction. Similarly, a transaction will auto abort if it encounters any error.
Note that transactions require a replica set and MongoDB >= 4.0 to function. See the manual page on transactions for more details.
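For comparison, the same dry-run trick in a typical SQL database, sketched with Python's built-in `sqlite3` (not MongoDB): perform the write inside a transaction, capture any constraint error, and always roll back. The `dry_run_write` helper is a hypothetical name.

```python
import sqlite3


def dry_run_write(conn, sql, params):
    """Return None if the write would succeed, else the database's error text.

    The write is always rolled back, so nothing is persisted either way.
    """
    try:
        conn.execute(sql, params)  # sqlite3 opens an implicit transaction here
        return None
    except sqlite3.DatabaseError as exc:
        return str(exc)  # e.g. a unique-constraint violation
    finally:
        conn.rollback()  # discard the write, successful or not
```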

Showing warning to end user through postgres trigger without aborting transaction

I am trying to validate one field through a Postgres trigger.
If the target field has a decimal (fractional) value, I need to throw a warning but still allow the user to save the record.
I tried RAISE EXCEPTION and RAISE ... USING, but that throws an error in the UI and the transaction is aborted.
I tried RAISE NOTICE and RAISE WARNING, but then no warning is shown and the record is simply saved.
It would be great if anyone could help with this.
Thanks in advance.

You need to set client_min_messages to a level that'll show NOTICEs and WARNINGs. You can do this:
At the transaction level with SET LOCAL
At the session level with SET
At the user level with ALTER USER
At the database level with ALTER DATABASE
Globally in postgresql.conf
You must then check for messages from the server after running queries and display them to the user or otherwise handle them. How to do that depends on the database driver you're using, which you haven't specified. PgJDBC? libpq? other?
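As one illustration, assuming the client uses psycopg2 (which appends the server's notice and warning messages to `connection.notices`), the trigger can emit a WARNING and the application can collect it after each statement. PostgreSQL's default `client_min_messages` is `notice`, so WARNINGs reach the client unless that setting was raised. The `items` table, `amount` column, and both function names are hypothetical; the test below uses a stub connection rather than a real server.

```python
# Run once against the database (hypothetical table items(amount numeric)).
TRIGGER_DDL = """
CREATE OR REPLACE FUNCTION warn_on_fraction() RETURNS trigger AS $$
BEGIN
    IF NEW.amount <> trunc(NEW.amount) THEN
        RAISE WARNING 'amount % has a fractional part', NEW.amount;
    END IF;
    RETURN NEW;  -- the row is still saved
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_warn_on_fraction
    BEFORE INSERT OR UPDATE ON items
    FOR EACH ROW EXECUTE PROCEDURE warn_on_fraction();
"""


def execute_collecting_warnings(conn, sql, params=None):
    """Run one statement on a psycopg2 connection and return the WARNING
    messages the server sent while it ran, ready to show to the user."""
    del conn.notices[:]  # drop messages left over from earlier statements
    with conn.cursor() as cur:
        cur.execute(sql, params)
    return [n.strip() for n in conn.notices if n.startswith("WARNING")]
```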
Note that raising a notice or warning will not cause the transaction to pause and wait for user input. You really don't want to do that. Instead RAISE an EXCEPTION that aborts the transaction. Tell the user about the problem, and re-run the transaction if they approve it, possibly with a flag set to indicate that an exception should not be raised again.
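A sketch of that abort-then-confirm flow, using SQLite through Python's `sqlite3` as a stand-in for Postgres (in PL/pgSQL the trigger would use RAISE EXCEPTION instead of SQLite's `RAISE(ABORT, ...)`). The `items` table, its `confirmed` flag column, and `save_item` are hypothetical: the first attempt is rejected with a message the UI can show, and the re-run with the flag set goes through.

```python
import sqlite3

SCHEMA = """
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    amount REAL NOT NULL,
    confirmed INTEGER NOT NULL DEFAULT 0
);
CREATE TRIGGER items_reject_fraction BEFORE INSERT ON items
WHEN NEW.amount <> CAST(NEW.amount AS INTEGER) AND NEW.confirmed = 0
BEGIN
    SELECT RAISE(ABORT, 'amount has a fractional part; confirm to save');
END;
"""


def save_item(conn, amount, confirmed=False):
    """Return 'saved', or the trigger's message if the user must confirm first."""
    try:
        with conn:  # the aborted insert is rolled back
            conn.execute(
                "INSERT INTO items (amount, confirmed) VALUES (?, ?)",
                (amount, int(confirmed)),
            )
        return "saved"
    except sqlite3.DatabaseError as exc:
        return str(exc)  # show this to the user and ask for confirmation
```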
It would be technically possible to have a PL/Perlu, PL/Pythonu, or PL/Java trigger pause execution while it asked the client via a side-channel (like a TCP socket) to approve an action. It'd be a really bad idea, though.

continue insert when exception is raised in postgres

Hi,
I'm trying to insert a batch of records at a time. When any of the records fails to insert, I need to trap that record, log it to my failed-record maintenance table, and then continue with the insert. Kindly help me with how to do this.

If you are using a Spring or EJB container, there is a simple trick that works very well: provide a LogService with a logWarning(String message) method. The method must be annotated/configured with the REQUIRES_NEW transaction setting.
If not then you'll have to simulate it using API calls. Open a different connection for the logging, when you enter the method begin the transaction, before leaving commit the transaction.
When you are not using transactions for the insert, there is actually nothing special you need to do, as by default most databases run in autocommit mode and commit after every statement.
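If the batch should stay in one transaction instead, a different technique from the separate-connection trick is a savepoint per row: in Postgres a failed statement aborts the whole transaction, and rolling back to the row's savepoint keeps it usable. Sketched below with Python's `sqlite3` as a stand-in (opened with `isolation_level=None` so the code controls BEGIN/COMMIT); the `records` and `failed_records` tables and `insert_batch` are hypothetical.

```python
import sqlite3


def insert_batch(conn, rows):
    """Insert rows in one transaction; log failing rows and keep going.

    conn must be opened with isolation_level=None (manual transaction control).
    """
    conn.execute("BEGIN")
    try:
        for row in rows:
            conn.execute("SAVEPOINT row_sp")
            try:
                conn.execute("INSERT INTO records (id, name) VALUES (?, ?)", row)
            except sqlite3.DatabaseError as exc:
                # Undo only this row, then record why it failed.
                conn.execute("ROLLBACK TO SAVEPOINT row_sp")
                conn.execute(
                    "INSERT INTO failed_records (id, name, error) VALUES (?, ?, ?)",
                    (*row, str(exc)),
                )
            conn.execute("RELEASE SAVEPOINT row_sp")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # unexpected error: drop the whole batch
        raise
```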
