Where to handle database errors in MongoDB?

When a user registers on the website, they must provide an e-mail address, which has to be unique. I've created a unique index on the schema's email attribute, so if I try to save a document with a duplicate e-mail, an error with code 11000 is returned. My question, regarding the business layer and the data layer: should I just pass the document to the database and check the error codes it returns, or should I check beforehand whether a user with that e-mail exists? I've been told that the business layer should check data integrity before passing data to the database, but I don't see why I should do that, since I believe Mongo would be much faster at raising the exception itself, given that it has that index. The only disadvantages I see in checking error codes are that the codes might change (but I could abstract them) and that the syntax might change.

There is the practical matter of speed and the fragility of "check-then-set" systems. If you check whether an email exists before you write the document keyed on email, there is a chance that, between the time you check and the time you write, another client inserts the same email, so the unique index rejects your write anyhow. This is a classic race condition. Further, check-then-set takes two queries, whereas inserting and handling the failure takes only one. In my application I am having success with just letting the failure occur and reacting to the result.

As @JamesWahlin says, it is the difference between doing this all in one operation and risking mixed results from race conditions by adding an extra client read on top of the index check.
Definitely rely only on the response of the insert from MongoDB here.
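For what it's worth, here is a minimal sketch of the insert-and-handle-failure approach described above. It uses Python's built-in sqlite3 module as a stand-in for MongoDB so that it runs without a server; the table, index, function, and exception names are made up for illustration. With the pymongo driver, the analogous exception to catch would be pymongo.errors.DuplicateKeyError, which is what error code 11000 surfaces as.

```python
import sqlite3


class EmailAlreadyRegistered(Exception):
    """Domain-level error raised when the unique index rejects an email."""


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one table with a unique index on email.
    conn.execute("CREATE TABLE users (email TEXT NOT NULL)")
    conn.execute("CREATE UNIQUE INDEX users_email ON users (email)")


def register_user(conn: sqlite3.Connection, email: str) -> None:
    """Insert directly and let the unique index decide; no prior read."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    except sqlite3.IntegrityError as exc:
        # One round trip, no race window: the database is the arbiter.
        raise EmailAlreadyRegistered(email) from exc
```

Catching the driver error in one place and re-raising a domain error also addresses the worry about error codes changing: only this function needs to know what the database reports.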

Related

Which is better: letting the database throw an exception or throwing a custom exception?

I am developing an authentication system using Express, and I have a unique email field in the database.
Should I check the email first and throw a custom error if it exists, or let the database throw the error?
I want to know which is better.
Consumers of your API don't and shouldn't know what kind of database you use.
The error that makes it back to them should encapsulate all of it and specifically tell them what is wrong in some standard format with a good HTTP status code.
Database-specific errors leaking to the user should usually be considered a bug.
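As a rough illustration of that encapsulation, the sketch below maps a storage-layer failure to a status code and a neutral JSON body. Everything here is hypothetical: the DuplicateEmailError class, the handle_signup function, the error identifiers, and the choice of 409 for a taken email are assumptions, not something prescribed above.

```python
import json
from typing import Callable, Tuple


class DuplicateEmailError(Exception):
    """Hypothetical storage-layer error wrapping whatever the driver raised."""


def handle_signup(email: str, save_user: Callable[[str], None]) -> Tuple[int, str]:
    """Return (HTTP status, JSON body) without exposing database details."""
    try:
        save_user(email)
    except DuplicateEmailError:
        body = {
            "error": "email_taken",
            "message": "That email address is already registered.",
        }
        return 409, json.dumps(body)
    except Exception:
        # Unexpected failures become a generic 500; the details belong in
        # server logs, not in the response.
        return 500, json.dumps({"error": "internal_error"})
    return 201, json.dumps({"email": email})
```

The caller never sees driver messages or error codes, so swapping the database later does not change the API contract.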
Both.
You should write code to check that the email exists before you attempt the insert.
But if that check finds no email, you might still get an error, because of a race condition. For example, in the brief moment between checking for the email and then proceeding to insert the row, some other concurrent session may insert its own row using that email. So your insert will get a duplicate key error in that case, even though you had checked and found the email not present.
Then why bother checking? Because if you use a table with an auto_increment primary key, a failed insert generates and then discards an auto-increment value.
This might seem like a rare and insignificant amount of waste. Also, we don't care that auto-increment id's are consecutive.
But I did help fix an application for a customer where they had a problem that new users were trying 1500 times to create unique accounts before succeeding. So they were "losing" thousands of auto-increment id's for every account. After a couple of months, they exhausted the range of the signed integer.
The fix I recommended was to first check that the email doesn't exist, to avoid attempting the insert if the email is found. But you still have to handle the race condition just in case.
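A minimal sketch of that "both" approach, assuming a hypothetical users table and using Python's built-in sqlite3 module so it runs as-is. Whether a failed insert burns an auto-increment value depends on the database engine; the point of the sketch is only the control flow: check first to skip the insert in the common case, and still treat a duplicate-key failure as "email taken".

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema with an auto-increment key and a unique email.
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "email TEXT NOT NULL UNIQUE)"
    )


def email_exists(conn: sqlite3.Connection, email: str) -> bool:
    row = conn.execute("SELECT 1 FROM users WHERE email = ?", (email,)).fetchone()
    return row is not None


def try_insert(conn: sqlite3.Connection, email: str) -> bool:
    """Return False if the unique constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    except sqlite3.IntegrityError:
        return False  # another session inserted the email after our check
    return True


def register(conn: sqlite3.Connection, email: str) -> bool:
    """Return True if the account was created, False if the email is taken."""
    if email_exists(conn, email):
        return False  # common case: no insert attempt at all
    return try_insert(conn, email)  # still guarded against the race
```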

How to avoid Mongo DB NoSQL blind (sleep) injection

While scanning my application for vulnerabilities, I got one high-risk finding:
Blind MongoDB NoSQL Injection
I checked exactly what request the scanning tool sent to the database and found that it had added the line below to a GET request.
{"$where":"sleep(181000);return 1;"}
The scan received a "Time Out" response, which indicates that the injected "sleep" command succeeded.
I need help fixing this vulnerability. Can anyone help me out here? I want to understand what I need to add to my code to perform this check before connecting to the database.
Thanks,
Anshu
Similar to SQL injection, or any other type of Code Injection, don't copy untrusted content into a string that will be executed as a MongoDB query.
You apparently have some code in your app that naively accepts user input or some other content and runs it as a MongoDB query.
Sorry, it's hard to give a more specific answer, because you haven't shown that code, or described what you intended it to do.
But generally, in every place where you use external content, you have to imagine how it could be misused if the content doesn't contain the format you assume it does.
You must instead validate the content, so it can only be in the format you intend, or else reject the content if it's not in a valid format.
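Since the original code wasn't shown, the following is only a sketch of what such validation could look like, assuming the request parameter has already been parsed into a Python value and is supposed to be a plain email string. The function names are hypothetical. The idea is to build the query filter only from a value of the expected type, and to treat any dictionary key starting with "$" (such as "$where") as a query operator that should never come from user input.

```python
from typing import Any, Dict


def contains_operator(value: Any) -> bool:
    """Return True if any dict key, at any nesting depth, starts with '$'."""
    if isinstance(value, dict):
        return any(
            str(key).startswith("$") or contains_operator(inner)
            for key, inner in value.items()
        )
    if isinstance(value, list):
        return any(contains_operator(item) for item in value)
    return False


def build_email_filter(email: Any) -> Dict[str, str]:
    """Build a query filter only from a plain string; reject everything else."""
    if not isinstance(email, str):
        # An object such as {"$where": "..."} ends up here and is refused
        # before it can reach the database.
        raise ValueError("email must be a string")
    return {"email": email}
```

With this in place, the scanner's payload {"$where":"sleep(181000);return 1;"} is rejected by the type check, and contains_operator can be used as a second line of defense on any structure that is legitimately allowed to be an object.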

Creation Concurrency with CQRS and EventStore

Baseline info:
I'm using an external OAuth provider for login. If the user logs into the external OAuth, they are OK to enter my system. However this user may not yet exist in my system. It's not really a technology issue, but I'm using JOliver EventStore for what it's worth.
Logic:
I'm not given a guid for new users. I just have an email address.
I check my read model before sending a command: if the user's email exists, I issue a Login command with the ID; if not, I issue a CreateUser command with a generated ID. My issue is in the case of a new user.
A save occurs in the event store with the new ID.
Issue:
Assume two create commands are somehow issued before the read model is updated due to browser refresh or some other anomaly that occurs before consistency with the read model is achieved. That's OK that's not my problem.
What Happens:
Because the new ID is a Guid comb, there's no chance the event store will know that these two CreateUser commands represent the same user. By the time they get to the read model, the read model will know (because they have the same email) and can merge the two records or take some other compensating action. But now my read model is out of sync with the event store which still thinks these are two separate entities.
Perhaps it doesn't matter because:
Replaying the events will have the same effect on the read model, so that should be OK.
Because both commands are duplicate "Create" commands, they should contain identical information, so it's not like I'm losing anything in the event store.
Can anybody illuminate how they handled similar issues? If some compensating action needs to occur does the read model service issue some kind of compensation command when it realizes it's got a duplicate entry? Is there a simpler methodology I'm not considering?
You're very close to what I'd consider a proper possible solution. The scenario, if I may summarize, is somewhat like this:
Perform the OAuth-entication.
Using the read model decide between a recurring visitor and a new visitor, based on the email address.
In case of a new visitor, send a RegisterNewVisitor command message that gets handled and stored in the eventstore.
Assume there is some concurrency going on that, for the same email address, causes two RegisterNewVisitor messages, each containing what the system thinks is the key associated with the email address. These keys (guids) are different.
Detect this duplicate key issue in the read model and merge both read model records into one record.
Now, instead of merging the records in the read model, why not send a ResolveDuplicateVisitorEmailAddress { Key1, Key2 } command to your domain model, leaving it up to the domain model (the codified form of the business decision to be taken) to resolve this issue? You could even have a dedicated read model to deal with these kinds of issues; the other read model will just get a DuplicateVisitorEmailAddressResolved event and project it into the proper records.
Word of warning: You've asked a technical question and I gave you a technical, possible solution. In general, I would not apply this technique unless I had some business indicator that this is worth investing in (what's the frequency of a user logging in concurrently for the first time - maybe solving it this way is just a way of ignoring the root cause (flakey OAuth, no register new visitor process in place, etc)). There are other technical solutions to this problem but I wanted to give you the one closest to what you already have in place. They range from registering new visitors sequentially to keeping an in-memory projection of the visitors not yet in the read model.
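To make the ResolveDuplicateVisitorEmailAddress idea a bit more concrete, here is a small sketch in Python rather than in the asker's stack. The message names come from the answer above; the handler, the in-memory event log, and especially the rule for choosing the surviving key (the smaller of the two GUIDs) are assumptions standing in for whatever the real business decision would be.

```python
from dataclasses import dataclass
from typing import List
from uuid import UUID


@dataclass(frozen=True)
class ResolveDuplicateVisitorEmailAddress:
    key1: UUID
    key2: UUID


@dataclass(frozen=True)
class DuplicateVisitorEmailAddressResolved:
    surviving_key: UUID
    retired_key: UUID


def handle_resolve(
    command: ResolveDuplicateVisitorEmailAddress,
    event_log: List[DuplicateVisitorEmailAddressResolved],
) -> DuplicateVisitorEmailAddressResolved:
    """Decide which visitor survives and record the decision as an event."""
    if command.key1 == command.key2:
        raise ValueError("both keys refer to the same visitor")
    # Assumed rule: a deterministic ordering, so the same pair always
    # resolves the same way regardless of argument order.
    surviving, retired = sorted((command.key1, command.key2))
    event = DuplicateVisitorEmailAddressResolved(surviving, retired)
    if event not in event_log:  # re-sending the command adds nothing new
        event_log.append(event)
    return event
```

Because the decision is recorded as an event, replaying the store reproduces the merge in every read model instead of leaving it as a private fix inside one projection.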

How to get a list of aggregates using JOliver's CommonDomain and EventStore?

The repository in the CommonDomain only exposes the "GetById()". So what to do if my Handler needs a list of Customers for example?
Taking your question at face value: if you needed to perform operations on multiple aggregates, you would just provide the IDs of each aggregate in your command (which the client would obtain from the query side), and then get each aggregate from the repository.
However, looking at one of your comments in response to another answer I see what you are actually referring to is set based validation.
This very question has raised quite a lot of debate about how to do this, and Greg Young has written a blog post on it.
The classic question is "how do I check that the username hasn't already been used when processing my CreateUserCommand?" I believe the suggested approach is to assume that the client has already done this check by asking the query side before issuing the command. When the user aggregate is created, the UserCreatedEvent will be raised and handled by the query side. Here, the insert query will fail (either because of a check or a unique constraint in the DB), and a compensating command would be issued, which would delete the newly created aggregate and perhaps email the user telling them the username is already taken.
The main point is that you assume the client has done the check. I know this approach is difficult to grasp at first, but it's the nature of eventual consistency.
Also you might want to read this other question which is similar, and contains some wise words from Udi Dahan.
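The compensating-command flow described in this answer could look roughly like the sketch below. It is written in Python with an in-memory dictionary standing in for the query-side table and its unique constraint; UserCreatedEvent is the event named above, while DeleteDuplicateUserCommand and UsernameProjection are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class UserCreatedEvent:
    user_id: str
    username: str


@dataclass(frozen=True)
class DeleteDuplicateUserCommand:  # hypothetical compensating command
    user_id: str
    reason: str


class UsernameProjection:
    """Query-side handler; the dict plays the role of the unique constraint."""

    def __init__(self) -> None:
        self._owner_by_username: Dict[str, str] = {}

    def handle(self, event: UserCreatedEvent) -> Optional[DeleteDuplicateUserCommand]:
        # setdefault stores the first claimant and returns the current owner.
        owner = self._owner_by_username.setdefault(event.username, event.user_id)
        if owner == event.user_id:
            return None  # first claim, or a replay of the same event
        # The "insert" failed: ask the write side to undo the creation.
        return DeleteDuplicateUserCommand(
            user_id=event.user_id,
            reason=f"username {event.username!r} is already taken",
        )
```

The returned command would be dispatched back to the write side, which deletes the late aggregate and can notify the user.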
In the classic event sourcing model, queries like get all customers would be carried out by a separate query handler which listens to all events in the domain and builds a query model to satisfy the relevant questions.
If you need to query customers by last name, for instance, you could listen to all customer created and customer name change events and just update one table of last-name to customer-id pairs. You could hold other information relevant to the UI that is showing the data, or you could simply hold IDs and go to the repository for the relevant customers in order to work further with them.
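A small sketch of such a last-name view, in Python for brevity. The event and class names are hypothetical and the "table" is an in-memory dictionary; a real projection would write to whatever store backs the query side.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, Set, Union


@dataclass(frozen=True)
class CustomerCreated:
    customer_id: str
    last_name: str


@dataclass(frozen=True)
class CustomerNameChanged:
    customer_id: str
    new_last_name: str


class CustomersByLastName:
    """Read model holding last-name to customer-id pairs, built from events."""

    def __init__(self) -> None:
        self._ids_by_last_name: DefaultDict[str, Set[str]] = defaultdict(set)
        self._last_name_by_id: Dict[str, str] = {}

    def apply(self, event: Union[CustomerCreated, CustomerNameChanged]) -> None:
        if isinstance(event, CustomerCreated):
            self._index(event.customer_id, event.last_name)
        elif isinstance(event, CustomerNameChanged):
            old = self._last_name_by_id.get(event.customer_id)
            if old is not None:
                self._ids_by_last_name[old].discard(event.customer_id)
            self._index(event.customer_id, event.new_last_name)

    def _index(self, customer_id: str, last_name: str) -> None:
        self._ids_by_last_name[last_name].add(customer_id)
        self._last_name_by_id[customer_id] = last_name

    def find(self, last_name: str) -> Set[str]:
        # Return a copy so callers cannot mutate the view.
        return set(self._ids_by_last_name.get(last_name, set()))
```

The IDs returned by find are what a client would put into commands, or use to load the relevant aggregates from the repository one at a time.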
You don't need a list of customers in your handler. Each aggregate MUST be processed in its own transaction. If you want to show this list to the user, just build an appropriate view.
Your command needs to contain the id of the aggregate root it should operate on.
This id will be looked up by the client sending the command using a view in your read model. This view will be populated with data from the events that your AR emits.

How do I prevent duplicate values in my read database with CQRS?

Say that I have a User table in my read database (SQL Server). In a regular read/write database I can put an index on the table to make sure that two users aren't added with the same email address.
So if I try to add a user with an email address that already exists in the table for a different user, SQL Server will throw an exception back.
In CQRS I can't do that: if I decouple the write to my read database from the domain model by putting it on an asynchronous queue, I won't get the exception thrown back to me. I will return "OK" to the UI and the user will think that he has been added to the database, when in fact he will never be added to the read database.
I can search the read database to check whether there is already a user with that email address and, if there is one, throw an exception back to the UI. But if two users press the save button at the same time, I will do two checks against the database, see that there isn't any user with that email address, and report that it's okay. Both go on my queue, and later one will fail (by hitting the unique constraint).
Am I supposed to load all users from my event source (it's a SQL Server) and then do the check on that collection, to see whether a user already has this email address? That sounds a bit crazy to me...
How have you people solved it?
The only way I can see is to not use an asynchronous queue but a synchronous one, but that will hurt performance really badly, especially when you have many "read storages" to write to...
Need some help here...
Searching for CQRS Set Based Validation will give you solutions to this issue.
Greg Young posted about the business impact of embracing eventual consistency http://codebetter.com/gregyoung/2010/08/12/eventual-consistency-and-set-validation/
Jérémie Chassaing posted about discovering missing aggregate roots in the domain http://thinkbeforecoding.com/post/2009/10/28/Uniqueness-validation-in-CQRS-Architecture
Related stack overflow questions:
How to handle set based consistency validation in CQRS?
CQRS Validation & uniqueness