Handling simultaneous user registrations in mongoDB? - mongodb

I have a MongoDB collection of registered users with an index on the userID field. Every time a user tries to register, a lookup is done on the existing user IDs to check whether the user ID chosen by the registering user is available. What happens when two users enter the same userID at the same time and both lookups run at the same time? Would both of them end up with the same userID? Does MongoDB handle such a scenario on its own? One purpose of the unique userID is to give each user a URL based on it.
I'll be using the PyMongo module.

Preventing duplicate usernames is an example of Concurrency Control, a broad area which has many issues and many ways in which databases and apps can be designed to avoid problems.
A separate lookup followed by an insert is not safe on its own: two requests can both see the userID as free and then both insert it. For a collection of users where you want to avoid duplicate userIDs, I would suggest the following design pattern:
1) Create a unique index on the userID field.
2) In your app, check the response when creating a user; if it gets a Duplicate Key Error, the userID has already been taken, and the app must ask the user to choose a different one.
Other approaches are also possible; for example you could have the database assign the userIDs, which would be a different way to guarantee uniqueness.
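A minimal PyMongo sketch of this pattern, under some assumptions: `users` is a PyMongo collection, and the helper names `ensure_unique_user_ids` and `register_user` are made up for illustration. `create_index(..., unique=True)`, `insert_one` and `pymongo.errors.DuplicateKeyError` are the real PyMongo pieces; the fallback class is only there so the snippet can be read and run without PyMongo installed.

```python
try:
    from pymongo.errors import DuplicateKeyError
except ImportError:
    # Stand-in (assumption) so the sketch runs without PyMongo installed.
    class DuplicateKeyError(Exception):
        pass


def ensure_unique_user_ids(users):
    """Create the unique index once, e.g. at application start-up."""
    users.create_index("userID", unique=True)


def register_user(users, user_id, profile=None):
    """Try to claim user_id; return True on success, False if it is taken.

    No prior lookup is needed: the insert itself is the check, so two
    simultaneous registrations for the same userID cannot both succeed.
    """
    doc = {"userID": user_id, **(profile or {})}
    try:
        users.insert_one(doc)
    except DuplicateKeyError:
        return False
    return True
```

When `register_user` returns False, the app would ask the user to pick a different userID.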

Whichever insert reaches the database first will succeed. The userID should be a unique key for this entity (in MongoDB, either the _id field or a field with a unique index), and this will prevent the same userID from being reused.

Related

Should I create a separate model (collection) for this?

I am building a small web app with MERN. I have a collection that holds "name, email, password, avatar url, and date", and I am going to add some info to the users, such as a "bio", "hobbies" (array), "visited countries" (array), and another array.
The question is: should I create a different model for the users' info and add an owner field that refers to the other model, or should I put everything in the existing collection?
I might also add a following/followers option in the future.
The user's info should be in the user collection; I see no reason to have a separate collection for it. If you want to reduce the size of responses when listing users, you could use select (a projection) to leave out unnecessary fields.
Regarding following and followers, I think there are 2 approaches:
1) Add a new field to the existing collection that stores the id and necessary metadata (name, avatar) of the related users.
2) Create a new collection that pairs users with the users they follow or are followed by. You could then use a Mongoose virtual to get this information from the User collection.
Personally, I prefer the first approach, although it requires more effort to keep the list accurate, e.g. removing an item from the list when a follower stops following you.
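To make the first approach concrete, here is a rough Python sketch of the update documents it implies. The field names `following` and `followers` and the helper names are assumptions for illustration; `$addToSet` and `$pull` are standard MongoDB update operators.

```python
def summary(user):
    """Keep only the metadata stored inside the other user's list."""
    return {"_id": user["_id"], "name": user["name"], "avatar": user["avatar"]}


def follow_updates(follower, followee):
    """Return (filter, update) pairs recording that follower follows followee."""
    return [
        ({"_id": follower["_id"]},
         {"$addToSet": {"following": summary(followee)}}),
        ({"_id": followee["_id"]},
         {"$addToSet": {"followers": summary(follower)}}),
    ]


def unfollow_updates(follower_id, followee_id):
    """Return (filter, update) pairs that undo a follow on both sides."""
    return [
        ({"_id": follower_id}, {"$pull": {"following": {"_id": followee_id}}}),
        ({"_id": followee_id}, {"$pull": {"followers": {"_id": follower_id}}}),
    ]
```

Each pair would be passed to something like `users.update_one(filter, update)`. The two writes touch two documents, so they are not atomic together unless wrapped in a transaction; that, plus refreshing the copied name and avatar when they change, is the maintenance effort this approach costs.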

Redis hash usage as table

I want to use Redis like a NoSQL database, and I have an idea like the one below.
Assume that I have 3 tables:
1 - user
2 - post
3 - comment
I create a hash for each table like below:
hset user _usr_100 '{"id":"_usr_100","name":"john","username":"jhn","age":25}'
hset user _usr_101 '{"id":"_usr_101","name":"adam","username":"adm","age":26}'
hset user _usr_102 '{"id":"_usr_102","name":"eric","username":"erc","age":27}'
hset post _post_100 '{"id":"_post_100","title":"title","content":"testpost","userid":"_usr_100"}'
hset post _post_101 '{"id":"_post_101","title":"title","content":"testpost","userid":"_usr_101"}'
hset post _post_102 '{"id":"_post_102","title":"title","content":"testpost","userid":"_usr_102"}'
hset comment _comment_100 '{"id":"_comment_100","content":"testpost","userid":"_usr_100","postid":"_post_100"}'
hset comment _comment_101 '{"id":"_comment_101","content":"testpost","userid":"_usr_101","postid":"_post_101"}'
hset comment _comment_102 '{"id":"_comment_102","content":"testpost","userid":"_usr_102","postid":"_post_102"}'
When I want to get a user (_usr_100) from Redis:
hget user _usr_100
{"id":"_usr_100","name":"john","username":"jhn","age":25}
When I want to get all users:
hgetall user
{"id":"_usr_100","name":"john","username":"jhn","age":25}
{"id":"_usr_101","name":"adam","username":"adm","age":26}
{"id":"_usr_102","name":"eric","username":"erc","age":27}
After deserializing the JSON strings one by one and filling them into a list, I have a List, so I can do some operations on it (search, group by, order, pagination, ...), and I can do the same thing for the other hashes (post, comment).
I can delete and update users with:
hdel user _usr_101 // deleted _usr_101
hset user _usr_100 '{"id":"_usr_100","name":"john","username":"jhn","age":26}' //updated age
hset user _usr_103 '{"id":"_usr_103","name":"max","username":"max","age":15}' //new user
hgetall user
{"id":"_usr_100","name":"john","username":"jhn","age":26}
{"id":"_usr_102","name":"eric","username":"erc","age":27}
{"id":"_usr_103","name":"max","username":"max","age":15}
What are the disadvantages of this usage? Can you suggest another way to use hashes so that Redis works like NoSQL tables?
Depending on your business rules and model, this option may work, but it is unlikely to be the best solution for your domain. Using a key/value store for a mostly relational domain forces you to make tradeoffs that may work against you.
When your user class gains new fields and those fields need to be queried, you need to spend more "space" to save "time": you keep denormalizing your data just to answer each query with a single lookup. In effect you end up reimplementing a relational database on top of the key/value store. Consider updating user 101, which in SQL is a single simple statement:
UPDATE users SET username = 'mynewusername' where id = 101;
In your case you will need to find all related keys/fields across all hashes/sets/lists and update them to preserve data integrity. Keeping age as a field may also be a bad idea; you will want to store the birthday instead. If your business then needs to fetch the list of users whose birthday is today, you need to create new keys, duplicate most of your data, and migrate all your existing users into them just to get today's birthdays. Keep in mind that you need to query by day and month to get birthdays, which means you have to keep users in separate sets such as users:birthday:01:01, users:birthday:02:05, users:birthday:11:08. If users want to update their birthday (depending on the business), you need to manually move them between those sets while updating the other sets too.
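The bookkeeping described above can be sketched in plain Python, with a dict standing in for the `user` hash and a dict of sets standing in for the `users:birthday:MM:DD` Redis sets. The class and method names are hypothetical and nothing here talks to a real Redis server; it only shows how much manual index maintenance a single birthday update involves.

```python
from collections import defaultdict


class BirthdayIndex:
    def __init__(self):
        self.users = {}                      # plays the role of the `user` hash
        self.by_birthday = defaultdict(set)  # plays the role of users:birthday:MM:DD

    @staticmethod
    def _key(birthday):
        """Build the set name from a 'YYYY-MM-DD' birthday string."""
        _, month, day = birthday.split("-")
        return f"users:birthday:{month}:{day}"

    def save(self, user_id, name, birthday):
        """Insert or update a user and keep the birthday sets in sync."""
        old = self.users.get(user_id)
        if old is not None:
            # The application must remember to remove the stale index entry.
            self.by_birthday[self._key(old["birthday"])].discard(user_id)
        self.users[user_id] = {"id": user_id, "name": name, "birthday": birthday}
        self.by_birthday[self._key(birthday)].add(user_id)

    def born_on(self, month, day):
        """Return the ids of users whose birthday falls on month/day."""
        key = f"users:birthday:{month:02d}:{day:02d}"
        return sorted(self.by_birthday.get(key, set()))
```

Every additional query pattern (active users, pagination, tags) needs another structure maintained the same way, which a relational database would handle with an index and a WHERE clause.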
Adding an active/passive status to users will be another pain. I am not sure whether you really need to get all users at once; you will probably need to paginate them, and with a hash that is hard. You will need another sorted set or list to do it.
The same goes for comments on users' posts, the last 25 comments of a user, the most recent comments of the users who have the most posts, searching through users' posts, and so on. Your product manager will come up with an idea such as "let's add tags to each post", and you will need to fit it into your data model with new data structures.
This is relational data, and it is better to keep it relational. When you start modeling your data in a non-relational database, all the flexibility an RDBMS gives you will be gone, replaced with complexity in both the data and application layers.
A single PostgreSQL instance may serve you far better than Redis for this problem. Redis has excellent features for solving many problems, but user/post/comment is not one of them.
This post may provide some insights too

To relate one record to another in MongoDB, is it ok to use a slug?

Let's say we have two models like this:
User:
- _id
- name
- email
Company:
- _id
- name
- slug
Now let's say I need to connect a user to the company. A user can have one company assigned. To do this, I can add a new field called companyID in the user model. But I'm not sending the _id field to the front end. All the requests that come to the API will have the slug only. There are two ways I can do this:
1) Add slug to relate the company: If I do this, I can take the slug sent from a request and directly query for the company.
2) Add the _id of the company: If I do this, I need to first use the slug to query for the company and then use the _id returned to query for the required data.
May I please know which way is the best? Is there any extra benefit when using the _id of a record for the relationship?
Agree with the 2nd approach. There are several issues to consider when deciding on which field to use as a join key (this is true of all DBs, not just Mongo):
The field must be unique. I'm not sure exactly what the 'slug' field in your schema represents, but if there is any chance this could be duplicated, then don't use it.
The field must not change. Strictly speaking, you can change a key field but the only way to safely do so is to simultaneously change it in all the child tables atomically. This is a difficult thing to do reliably because a) you have to know which tables are using the field (maybe some other developer added another table that you're not aware of) b) If you do it one at a time, you'll introduce race conditions c) If any of the updates fail, you'll have inconsistent data and corrupted parent-child links. Some SQL DBs have a cascading-update feature to solve this problem, but Mongo does not. It's a hard enough problem that you really, really don't want to change a key field if you don't have to.
The field must be indexed. Strictly speaking this isn't true, but if you're going to join on it, then you will be running a lot of queries on it, so you'll need to index it.
For these reasons, it's almost always recommended to use a key field that serves solely as a key field, with no actual information stored in it. Plenty of people have been burned using things like Social Security Numbers, drivers licenses, etc. as key fields, either because there can be duplicates (e.g. SSNs can be duplicated if people are using fake numbers, or if they don't have one), or the numbers can change (e.g. drivers licenses).
Plus, by doing so, you can format the key field to optimize for speed of unique generation and indexing. For example, if you use SSNs, you need to check the SSN against the rest of the DB to ensure it's unique. That takes time if you have millions of records. Similarly for slugs, which are text fields that need to be hashed and checked against an index. OTOH, MongoDB's default _id is an ObjectId, a UUID-like value built from a timestamp, a random value and a counter, which means it can be generated without first looking up existing records (the algorithm guarantees a high statistical likelihood of uniqueness).
The bottom line is that there are very good reasons not to use a "real" field as your key if you can help it. Fortunately for you, MongoDB already gives you a great key field which satisfies all the above criteria: the _id field. Therefore, you should use it. Even if slug is not a "real" field and you generate it the exact same way as an _id field, why bother? Why does a record have to have 2 unique identifiers?
The second issue in your situation is that you don't expose the company's _id field to the user. Intuitively, it seems like that should be a valuable piece of information that shouldn't be given out willy-nilly. But the truth is, it has no informational value by itself, because, as stated above, a key should have no actual information. The place to implement security is in the query, ensuring that the user doing the query has permission to access the record / specific fields that she's asking for. Hiding the key is a classic security-by-obscurity that doesn't actually improve security.
The only time to hide your primary key is if you're using a poorly thought-out key that does contain useful information. For example, an invoice Id that increments by 1 for each invoice can be used by someone to figure out how many orders you get in a day. Auto-increment Ids can also be easily guessed (if my invoice is #5, can I snoop on invoice #6?). Fortunately, Mongo's ObjectIds are not simple sequential counters, so very little information leaks out (they do embed a creation timestamp, and if you're worried about that, you need far more in-depth security considerations than this post :-).
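One caveat worth checking for yourself: MongoDB's default _id is an ObjectId, and its first four bytes encode the creation time in seconds since the Unix epoch. The sketch below uses only the standard library; the function name `objectid_created_at` and the sample id are illustrative, not part of any driver API.

```python
from datetime import datetime, timezone


def objectid_created_at(oid_hex):
    """Decode the creation time embedded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int(oid_hex[:8], 16)  # first 4 bytes: big-endian Unix timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_created_at("507f1f77bcf86cd799439011")
print(created.year)  # the sample id above decodes to a date in 2012
```

So anyone holding an _id can learn roughly when the record was created, which is usually harmless but is still information.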
Look at it another way: if a slug reliably points to a specific company and user, then how is it more secure than just using the _id?
That said, there are some instances where exposing a secondary key (like slugs) is helpful, none of which have to do with security. For example, if in the future you need to migrate DB platforms and need to re-generate keys because the new platform can't use your old ones; or if users will be manually typing in identifiers, then it's helpful to give them something easier to remember like slugs. But even in those situations, you can use the slug as a handy identifier for users to use, but in your DB, you should still use the company ID to do the actual join (like in your option #2). Check out this discussion about the pros/cons of exposing _ids to users:
https://softwareengineering.stackexchange.com/questions/218306/why-not-expose-a-primary-key
So my recommendation would be to go ahead and give the user the company Id (along with the slug if you want a human-readable format e.g. for URLs, although mongo _ids can be used in a URL). They can send it back to you to get the user, and you can (after appropriate permission checks) do the join and send back the user data. If you don't want to expose the company Id, then I'd recommend your option #2, which is essentially the same thing except you're adding an additional query to first get the company Id. IMHO, that's a waste of cycles for no real improvement in security, but if there are other considerations, then it's still acceptable. And both of those options are better than using the slug as a primary key.
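Option #2 from the question might look roughly like this in PyMongo terms. It is a sketch under assumptions: `companies` and `users` are collections, `companyID` is the reference field proposed in the question, and the function name is made up. `find_one` with a projection and `find` are the standard collection methods.

```python
def users_for_company_slug(companies, users, slug):
    """Resolve the public slug to the internal _id, then join on the _id."""
    # Extra query: only the _id is needed, so project everything else away.
    company = companies.find_one({"slug": slug}, {"_id": 1})
    if company is None:
        return []
    # The relationship itself is stored and queried by _id, never by slug.
    return list(users.find({"companyID": company["_id"]}))
```

The slug stays a user-facing identifier and can change without touching any user documents, since only the company record stores it.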
The second approach is best: add the _id of the company.
Using _id is the best practice for querying any kind of information; even complex queries can be built on _id, as it is a unique ObjectId created by MongoDB. If you use Mongoose, population is the process of automatically replacing the specified paths in a document with document(s) from other collection(s). You may populate a single document, multiple documents, a plain object, multiple plain objects, or all objects returned from a query.

Is storing documents ID in data- attributes a good practice?

In most of my apps, I need to store IDs in data attributes to perform CRUD operations on specific elements of the DOM.
My elements don't necessarily match specific criteria, or they share multiple criteria, so the only way I have to delete one (for example when a user clicks on it) is to store its ID in a data-id attribute and then send it to my server.
I use socket.io a lot.
Is that a good practice?
This is good practice. I don't think there is a better attribute to store this identifying data than data-id. You need some unique identifier for the document so the server knows which document the user wants to interact with when performing update/delete operations.
As long as your document is properly validated on the server side, i.e. before deleting/updating you check to make sure that the user in the session has authority to perform valid actions, there is no security risk of exposing the document _ids.
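The server-side check can be folded into the delete itself. A minimal sketch, assuming a PyMongo-style collection and an `ownerId` field on each document (both the field name and the function name are assumptions); `delete_one` and the `deleted_count` attribute of its result are standard PyMongo.

```python
def delete_if_owner(documents, doc_id, session_user_id):
    """Delete doc_id only if it belongs to the user in the current session.

    The id comes from the client (e.g. a data-id attribute) and is untrusted;
    the owner comes from the server-side session and is trusted.
    """
    result = documents.delete_one({"_id": doc_id, "ownerId": session_user_id})
    return result.deleted_count == 1
```

A False return means the document either does not exist or is not owned by the caller, so a guessed or tampered id achieves nothing.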

Retrieve records in mongoDB using bidirectional query

I have two collections, Tickets and Users, where a user can have one or many tickets. The ticket collection is defined as follows:
Ticket = {_id, ownerId, profile: {name}}
The ownerId is used to find all tickets that belong to a specific person. I need to write a query that gets me all users with no tickets.
How can I write this query without having to loop through all users, checking whether the userID shows up in any Tickets?
Would bidirectional storage cause me any performance problems? For example, if I were to change my users collection and add an array of tickets: [ticketID, ticketID2, ...]?
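I'd go with the array of tickets being stored in users. As far as I know, Mongo doesn't really have a way to query one collection based on the (lack of) elements in another collection. With the array, though, you can simply do db.users.find({tickets:[]}). Note that this only matches users whose tickets field exists and is an empty array, so every user document needs to carry the field.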
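On MongoDB 3.2 or later, the `$lookup` aggregation stage offers another option that avoids bidirectional storage. A sketch, assuming the ticket collection is named `tickets` and `users` is a PyMongo collection; the constant and function names are made up for illustration.

```python
# Join each user with their tickets, then keep users whose join came back empty.
USERS_WITHOUT_TICKETS = [
    {"$lookup": {
        "from": "tickets",          # assumed collection name
        "localField": "_id",
        "foreignField": "ownerId",
        "as": "tickets",
    }},
    {"$match": {"tickets": {"$size": 0}}},
]


def users_without_tickets(users):
    """Run the pipeline and return the matching user documents as a list."""
    return list(users.aggregate(USERS_WITHOUT_TICKETS))
```

An index on `ownerId` in the ticket collection would likely matter here, since the lookup runs once per user.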