I want to optimize the amount of calls my API makes to the database. But is it okay to let Postgres run in to an Unique Constraint error. For example when registering users I have two options:
from app.models import Users
from tortoise.exceptions import DoesNotExist
try:
await Users.get(email=email)
raise HTTPException(
status_code=HTTP_400_BAD_REQUEST, detail="User already exists"
)
except DoesNotExist:
user = await Users.create(email, hashed_pw)
This would make two calls to the database, but the exceptions would occur in Python. Note that no Error or Exception is thrown at the postgres end. Postgres simply returns nill, which at the python end gets interpreted as DoesNotExist. Another solution would be this:
from app.model import Users
from asyncpg.exceptions import UniqueViolationError
try:
user = await Users.create(email, hashed_pw)
except UniqueViolationError:
raise HTTPException(
status_code=HTTP_400_BAD_REQUEST, detail="User already exists"
)
This would only make a single database call, however an error would occur at the postgres database. Obviously it seems to me the second implementation would be more efficient, but is it okay to just create an exception at the postgres end?
Your first code would only work reliably if you have transactions or locking involved. Otherwise, if two requests to create a user with the same email (I know that this is unlikely in this case) hit your API, then both could get DoesNotExist as a result and would executed await Users.create(email, hashed_pw). That kind of pattern is often discouraged.
Also using exceptions for control flow is also something that is often seen as bad practice.
Your second solution is fine, there is no problem with trying to create an entry and use the expectation that is emitted by postgres to tell the request that the user already exists.
The first one would be fine if you have transactions and if you have a function like await Users.exists(email=email) that returns true or false.
Personally, I would prefer the second one, because the unique constraint already does the check you want to do, but you need to ensure that you actually check if the error message you get is about the unique key for email and not about some other error.
Related
I am developing an authentication system using express, So I have a unique email field in the database
should I check the email first and if it exists throw a new custom error Or let the database throw the error?
I want to know what is better
Consumers of your API don't and shouldn't know what kind of database you use.
The error that makes it back to them should encapsulate all of it and specifically tell them what is wrong in some standard format with a good HTTP status code.
Database-specific errors leaking to the user should usually be considered a bug.
Both.
You should write code to check that the email exists before you attempt the insert.
But if that check finds no email, you might still get an error, because of a race condition. For example, in the brief moment between checking for the email and then proceeding to insert the row, some other concurrent session may insert its own row using that email. So your insert will get a duplicate key error in that case, even though you had checked and found the email not present.
Then why bother checking? Because if you use a table with an auto_increment primary key, a failed insert generates and then discards an auto-increment value.
This might seem like a rare and insignificant amount of waste. Also, we don't care that auto-increment id's are consecutive.
But I did help fix an application for a customer where they had a problem that new users were trying 1500 times to create unique accounts before succeeding. So they were "losing" thousands of auto-increment id's for every account. After a couple of months, they exhausted the range of the signed integer.
The fix I recommended was to first check that the email doesn't exist, to avoid attempting the insert if the email is found. But you still have to handle the race condition just in case.
I am trying to wrap my head around the best way to approach this problem.
I am importing a file that contains bunch of users so I created a handler called
ImportUsersCommandHandler and my command is ImportUsersCommand that has List<User> as one of the parameters.
In the handler, for each user that I need to import I have to make sure that the UserType is valid, this is where the confusion comes in. I need to do a query against the database, to get list of all possible user types and than for each user I am importing, I want to verify that the user type id in the import matches one that is in the db.
I have 3 options.
Create a query GetUserTypesQuery and get the rest of this and then pass it on to the ImportUsersCommand as a list and verify inside the command handler
Call the GetUserTypesQuery from the command itself and not pass it (command calling another query)
Do not create a GetUsersTypeQuery and just do the query results within the command (still a query but no query/handler involved)
I feel like all these are dirty solutions and not the correct way to apply CQRS.
I agree option 1 sounds the best but would maybe suggest adding a pre handler to validate your input?
So ImportUsersCommandHandler deals with importing you data (and only that) and add a handler that runs before that validates (in your example, checks the user types and maybe other stuff) and bails out of it does not pass. So it queries the db, checks the usertypes and does whatever it needs to if it fails. Otherwise it just passes down to your business handler (ImportUsersCommandHandler).
I am used to using Mediatr in NET Core and this pattern works well (this is what we do) so sorry if this does not fit with your environment/setup!
I have been looking for a way to design my API so it will be idempotent, meaning that some of that is to make my POST request routes idempotent, and I stumbled upon this article.
(If I have understood something not the way it is, please correct me!)
In it, there is a good explanation of the general idea. but what is lacking are some examples of the way that he implemented it by himself.
Someone asked the writer of the article, how would he guarantee atomicity? so the writer added a code example.
Essentially, in his code example there are two cases,
the flow if everything goes well:
Open a transaction on the db that holds the data that needs to change by the POST request
Inside this transaction, execute the needed change
Set the Idempotency-key key and the value, which is the response to the client, inside the Redis store
Set expire time to that key
Commit the transaction
the flow if something inside the code goes wrong:
and exception inside the flow of the function occurs.
a rollback to the transaction is performed
Notice that the transaction that is opened is for a certain DB, lets call him A.
However, it is not relevant for the redis store that he also uses, meaning that the rollback of the transaction will only affect DB A.
So it covers the case when something happends inside the code that make it impossible to complete the transaction.
But what will happend if the machine, which the code runs on, will crash, while it is in a state when it has already executed the Set expire time to that key and it is now about to run the committing of the transaction?
In that case, the key will be available in the redis store, but the transaction has not been committed.
This will result in a situation where the service is sure that the needed changes have already happen, but they didn't, the machine failed before it could finish it.
I need to design the API in such a way that if the change to the data or setting of the key and value in redis fail, that they will both roll back.
What is the solution to this problem?
How can I guarantee the atomicity of a changing the needed data in one database, and in the same time setting the key and the needed response in redis, and if any of them fails, rollback them both? (Including in a case that a machine crashes in the middle of the actions)
Please add a code example when answering! I'm using the same technologies as in the article (nodejs, redis, mongo - for the data itself)
Thanks :)
Per the code example you shared in your question, the behavior you want is to make sure there was no crash on the server between the moment where the idempotency key was set into the Redis saying this transaction already happened and the moment when the transaction is, in fact, persisted in your database.
However, when using Redis and another database together you have two independent points of failure, and two actions being executed sequentially in different moments (and even if they are executed asynchronously at the same time there is no guarantee the server won’t crash before any of them completed).
What you can do instead is include in your transaction an insert statement to a table holding relevant information on this request, including the idempotent key. As the ACID properties ensure atomicity, it guarantees either all the statements on the transaction to be executed successfully or none of them, which means your idempotency key will be available in your database if the transaction succeeded.
You can still use Redis as it’s gonna provide faster results than your database.
A code example is provided below, but it might be good to think about how relevant is the failure between insert to Redis and database to your business (could it be treated with another strategy?) to avoid over-engineering.
async function execute(idempotentKey) {
try {
// append to the query statement an insert into executions table.
// this will be persisted with the transaction
query = ```
UPDATE firsttable SET ...;
UPDATE secondtable SET ...;
INSERT INTO executions (idempotent_key, success) VALUES (:idempotent_key, true);
```;
const db = await dbConnection();
await db.beginTransaction();
await db.execute(query);
// we're setting a key on redis with a value: "false".
await redisClient.setAsync(idempotentKey, false, 'EX', process.env.KEY_EXPIRE_TIME);
/*
if server crashes exactly here, idempotent key will be on redis with false as value.
in this case, there are two possibilities: commit to database suceeded or not.
if on next request redis provides a false value, query database to verify if transaction was executed.
*/
await db.commit();
// you can now set key value to true, meaning commit suceeded and you won't need to query database to verify that.
await redis.setAsync(idempotentKey, true);
} catch (err) {
await db.rollback();
throw err;
}
}
I have a scenario where 2 db connections might both run Model.find_or_initialize_by(params) and raise an error: PG::UniqueViolation: ERROR: duplicate key value violates unique constraint
I'd like to update my code so it could gracefully recover from it. Something like:
record = nil
begin
record = Model.find_or_initialize_by(params)
rescue ActiveRecord::RecordNotUnique
record = Model.where(params).first
end
return record
The trouble is that there's not a nice/easy way to reproduce this on my local machine, so I'm not confident that my fix actually works.
So I thought I'd get a bit creative and try calling create 2 times (locally) in a row which should raise then PG::UniqueViolation: ERROR, then I could rescue from it and make sure everything is handled gracefully.
But I get this error: PG::InFailedSqlTransaction: ERROR: current transaction is aborted, commands ignored until end of transaction block
I get this error even when I wrap everything in individual transaction blocks
record = nil
Model.transaction do
record = Model.create(params)
end
begin
Model.transaction do
record = Model.create(params)
end
rescue ActiveRecord::RecordNotUnique
end
Model.transaction do
record = Model.where(params).first
end
return record
My questions:
What's the right way to gracefully handle the race condition I mentioned at the very beginning of this post?
How do I test this locally?
I imagine there's probably something simple that I'm missing here, but it's late and perhaps I'm not thinking too clearly.
I'm running postgres 9.3 and rails 4.
EDIT Turns out that find_or_initialize_by should have been find_or_create_by and the errors I was getting was from the actual save call that happened later on in execution. #VeryTiredWhenIWroteThis
Has this actually happenend?
Model.find_or_initialize_by(params)
should never raise an ´ActiveRecord::RecordNotUnique´ error as it is not saving anything to db. It just creates a new ActiveRecord.
However in the second snippet you are creating records.
create (without bang) does not throw exceptions caused by validations, but
ActiveRecord::RecordNotUnique is always thrown in case of a duplicate by both create and create!
If you're creating records you don't need transactions at all. As Postgres being ACID compliant guarantees that only one of the both operations succeeds and if it responds so it's changes will be durable. (a single statement query against postgres is also a transaction). So your above code is almost fine if you replace through find_or_create_by
begin
record = Model.find_or_create_by(params)
rescue ActiveRecord::RecordNotUnique
record = Model.where(params).first
end
You can test if the code behaves correctly by simply trying to create the same record twice in row. However this will not test ActiveRecord::RecordNotUnique is actually thrown correctly on race conditions.
It's also no the responsibility of your app to test and testing it is not easy. You would have to start rails in multithread mode on your machine, or test against a multi process staging rails instance. Webrick for example handles only one request at a time. You can use puma application server, however on MRI there is no true concurrency (GIL). Threads only share the GIL only on IO blocking. Because talking to Postgres is IO, i'd expect some concurrent requests, but to be 100% sure, the best testing scenario would be to deploy on passenger with multiple workers and then use jmeter to run concurrent request agains the server.
When user is registering on website, e-mail needs to be provided which is unique. I've made unique index on schema's email attribute, so if I try to save the document in database, error with code 11000 will be returned. My question is, regarding to business layer and data layer, should I just pass the document to database and catch/check error codes which it returns or should I check if the user with that e-mail exists before? I've being told that data integrity should be checked before passing it to the database by the business layer, but I don't see the reason why should I do that since I believe that mongo would be much faster raising the exception itself since it has that index provided. The only disadvantage I see in error code checking is that error codes might change (but I could abstract them) and the syntax might be changed.
There is the practical matter of speed and the fragility of "check-then-set" systems. If you try and check if an email exists before you write the document keyed on email, there is a chance that between the time you check and the time you right the conditions of the unique index are met and your write fails anyhow. This is a classic race condition. Further, it takes 2 queries to do check-then-set but only 1 query to do the insert and handle the failure. In my application I am having success with just letting the failure occur and reacting to the result.
As #JamesWahlin says, it is the difference between dong this all in one or causing mixed results (along with the index check) from potential race conditions by adding the extra client read.
Definitely rely on the response of only insert from MongoDB here.