Looking for pseudocode for the best/cleanest way to create and check unique room "names" for every chat between two users using socket.io/react.js/mongodb

My flow:
User A selects user B in the user list:
The system needs to check whether a room for these two users exists. If not, it creates a unique room name and joins both users to the room.
If it exists, it just joins the users to the room they were already in and populates the chat with previous messages.
What I am stuck on is how exactly to do it. A few options I am playing with in my head:
a) First, how do I create a unique name that ties both users together? Sure, I can combine strings for both users, for example user A clicks user B --> "A&B", but this won't work when user B clicks user A, because that will be "B&A". I am struggling to create dynamic unique names that apply to both.
b) Do I keep an array with the two users' info in the specific room saved in the DB, and then check whether the user already exists in that array? If so, do I just use that room ID as the room name? What is the best flow for saving created rooms? Do I save by room name, which I guess would act as a unique ID as well?
c) Should I be checking the DB EVERY TIME a user clicks another user to start a chat, just to check whether a room exists?
I know how to create rooms and all that jazz, but what I am really struggling with is how to dynamically create room names so that the name is the same whether A clicks B or B clicks A, and how, at a pseudocode level, to store created rooms in the DB and check them for many users.

Here's an idea: Store the room in your database as a document that contains fields user1 and user2, which will contain the IDs of these users. Specifically, ensure that user1 < user2. When you need to query for this document later, you can do db.rooms.findOne({user1: smallerId, user2: largerId}). Then you can either store the room name and not use it in your queries, or you can even generate the displayed room name dynamically at runtime.
This has the benefit of not only guaranteeing the structure of a room document, but also making your queries more efficient (you're comparing IDs, e.g. binary ObjectIds, rather than name strings). There's also the benefit of not breaking the query when a user's name changes.
In general it's recommended that a document A that's associated with a different document B should refer to document B by an immutable ID, rather than by a mutable name. In this case since a room is associated with two users, have room refer to each user's ID.
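The ordering trick can be sketched in a few lines. This is only an illustration under assumptions: the `rooms` dict is a hypothetical in-memory stand-in for the rooms collection, and `room_key` and `find_or_create_room` are names made up for this example, not part of socket.io or MongoDB.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for the rooms collection,
# keyed by the ordered pair (user1, user2).
rooms: Dict[Tuple[str, str], dict] = {}


def room_key(user_a: str, user_b: str) -> Tuple[str, str]:
    """Return the two IDs in a canonical order so that user1 < user2."""
    if user_a == user_b:
        raise ValueError("a room needs two distinct users")
    return (user_a, user_b) if user_a < user_b else (user_b, user_a)


def find_or_create_room(user_a: str, user_b: str) -> dict:
    """Look the room up by its ordered pair; create it on first contact."""
    key = room_key(user_a, user_b)
    if key not in rooms:
        rooms[key] = {"user1": key[0], "user2": key[1], "messages": []}
    return rooms[key]
```

Because the pair is ordered before the lookup, `find_or_create_room("A", "B")` and `find_or_create_room("B", "A")` land on the same room. In a real MongoDB collection, a unique compound index on `user1` and `user2` would be one way to guard against two rooms being created for the same pair.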

Related

Best way to structure data in datastore with relationships

I'm trying to get to grips with structuring data in Datastore. Sorry in advance for the n00bness but I really can't get my head around it...
Consider a typical dating app which would use google cloud's datastore as a "database".
Suppose we have:
users
photos
swipes
matches
In a typical SQL style database I might choose to have (amongst other things):
A user table with a primary key of id
A photo table with a foreign key of user_id to link to the user whose photo it is
A swipe table with one row for each individual swipe by any user against any other user; it would have 2 foreign keys, swiper and swiped
A match table where we add a new entry if 2 people both swipe right on each other
Would you structure it similarly in datastore by having those 4 entity types? If so how do you deal with the "foreign keys"?
Or would you nest some of those within a document e.g each user has a list of photos nested within it, or a list of all their swipes/matches, ensuring that both users in a match have that reflected?
I typically prefer interacting with Datastore via the ndb library (if you're not using Python, you might still get an idea of how to apply the following suggestion in your own runtime).
Using the ndb library, I could potentially see myself going with what you have where the 'foreign keys' you refer to would be implemented as a ndb.KeyProperty.
A very rough profile of what you might have would be
# Legacy App Engine import (with Cloud NDB: from google.cloud import ndb)
from google.appengine.ext import ndb

# This is your user KIND
class User(ndb.Model):
    pass  # ....

# This is your photo KIND
class Photo(ndb.Model):
    # This ties each record in Photo to a person in your User table
    user = ndb.KeyProperty(kind="User", required=True)
    # ....

# This is your Swipes KIND
class Swipes(ndb.Model):
    # This ties each swiper to a person in your User table
    swiper = ndb.KeyProperty(kind="User", required=True)
    # This ties each swiped to a person in your User table
    swiped = ndb.KeyProperty(kind="User", required=True)
    # ....
From the UI, you can click on any column defined as ndb.KeyProperty and it will open up the underlying record. In your code, you can directly run a query (do a GET) to return the details of the underlying record.
Let's say you used the above model. Then, to run queries that list photos with the name of the user who posted them, or that list swipes and who swiped whom, I would use tasklets so the queries run concurrently and asynchronously. See the tasklets section of the old Python 2 NDB documentation for a worked example.

How does MongoDB keep data in sync

Let's say I have a social media app. There is a Group model that has a field called invitedUsers, which is simply an array of the IDs of the users that are part of that group.
On my backend I have a route that a user hits to join that Group. In that route I do something like the following:
group.invitedUsers = lodash.concat(group.invitedUsers || [], userId)
group.save()
where group is the group that the user wants to join and userId is the id of the user that wants to join the group. Upon save everything is updated properly and the user is now a part of the group.
But what happens if two users hit the route at exactly the same time? How does MongoDB ensure that the group will always have both users' IDs added via the above method? Is there not a chance that group.invitedUsers could be referencing a stale value if both of these group.save() calls are triggered at around the same time?
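The interleaving being asked about can be made concrete with a small simulation. This is a sketch under stated assumptions, not a description of what any particular MongoDB driver or ODM does: `db`, `load`, `save` and `add_to_set` are hypothetical in-memory stand-ins, and `save` is assumed to write the caller's whole array back.

```python
import copy

# Hypothetical in-memory "database" holding one group document.
db = {"group1": {"invitedUsers": []}}


def load(group_id):
    """Return a private copy, like a document fetched by a request handler."""
    return copy.deepcopy(db[group_id])


def save(group_id, doc):
    """Overwrite the stored document wholesale with the caller's copy."""
    db[group_id] = copy.deepcopy(doc)


def add_to_set(group_id, user_id):
    """Apply the change to the stored document itself, not to a copy."""
    users = db[group_id]["invitedUsers"]
    if user_id not in users:
        users.append(user_id)


# Two requests read the same state before either one saves.
copy_a = load("group1")
copy_b = load("group1")
copy_a["invitedUsers"] = copy_a["invitedUsers"] + ["alice"]
copy_b["invitedUsers"] = copy_b["invitedUsers"] + ["bob"]
save("group1", copy_a)
save("group1", copy_b)  # overwrites the array that contained "alice"
lost_update_result = list(db["group1"]["invitedUsers"])

# The same two joins expressed as changes applied where the data lives.
db["group1"] = {"invitedUsers": []}
add_to_set("group1", "alice")
add_to_set("group1", "bob")
in_place_result = list(db["group1"]["invitedUsers"])
```

Under these assumptions the read-modify-write version ends up with only the second user, while the in-place version keeps both. On the server side, MongoDB's `$addToSet` (or `$push`) update operator plays the role of `add_to_set` here, since an update to a single document is applied atomically.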

Is it possible to handle this case with optimistic concurrency?

I have a table with a long column called GroupCode. Products are grouped by it, so to get all the products of a group I just select all the products whose GroupCode is the same.
I can move a product from one group to another, and when I move one product of a group, I want all the products of that group to move to the new group.
With optimistic concurrency, this could happen:
One user wants to move a product out of a group, so he fetches all the products with the same GroupCode and sets the new GroupCode on all of them.
A second user adds a new product to the group. But the first user doesn't have this product, because he fetched the products before the second user added it.
So in the end the new product has the wrong GroupCode: all the other products of the group were moved to the new group, and I am left with an old group containing only one product, which is not correct.
With pessimistic concurrency, the first user gets all the products of the group and locks them.
The second user tries to add a new product to the group. To do that, he first tries to get one of the products of the group as a reference product, but since it is locked by the first user, the second user has to wait.
The first user moves all the products to the new group and unlocks them.
The second user gets the reference product, which now has the new GroupCode, so the new product is added to the correct group.
In summary, when I move a product from one group to another, I want to move all the products of the group and prevent a new product from ending up in the old group.
Is it possible to solve this case with optimistic concurrency, or do I have to use pessimistic concurrency?
I honestly don't see the issue here. If you want to implement it as OCC, you should just follow the OCC phases.
User A gets all records which belong to group ABC
User B gets a reference to Record1, which belongs to ABC at the moment
User A moves Record1 to group XYZ
User B wants to add a new record to the group to which Record1 belongs. So just before inserting the record, get the group of Record1, which is now XYZ
This is assuming that you go with the 'referential record' approach. If your screen (or whatever) just lists the currently available groups, and meanwhile one of those groups becomes empty (because you have moved all records to another group), there is no way of telling whether that's a concurrency issue or everything is working as expected. In that case, you should normalize your database and split the groups into a separate table, so that at least the user gets an error that the group no longer exists.
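The four steps above can be sketched as follows. The `products` dict and the `move_group` and `add_like` helpers are hypothetical names for this illustration; a real database would also need the re-read and the insert to happen in one transaction or statement, which this in-memory version does not model.

```python
# Hypothetical in-memory product table: product id -> group code.
products = {"Record1": "ABC", "Record2": "ABC"}


def move_group(old_code, new_code):
    """User A: move every product currently in old_code to new_code."""
    for product_id, code in products.items():
        if code == old_code:
            products[product_id] = new_code


def add_like(reference_id, new_id):
    """User B: read the reference's group at insert time, not earlier."""
    products[new_id] = products[reference_id]
    return products[new_id]


stale_code = products["Record1"]           # User B looks at Record1: "ABC"
move_group("ABC", "XYZ")                   # User A moves the whole group
new_code = add_like("Record1", "Record3")  # re-read just before inserting
```

The value User B saw earlier (`stale_code`) is never written; the new record takes whatever group Record1 has at the moment of the insert.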

Modeling hierarchical data with authentication using DynamoDB

I'm looking for some best practices when it comes to modeling confidential hierarchical data in general and specifically with DynamoDB.
The scenario is best explained with an example:
Let's say we have a number of users. Each user has a number of products. Each product consists of a number of parts.
Typical use cases:
List all products for a given user
List all parts for a given product
So far I have modeled this in DynamoDB like this:
Users
----------------
HashKey: UserId
Products
-------------------
HashKey: UserId
RangeKey: ProductId
Parts
-------------------
HashKey: ProductId
RangeKey: PartId
The data is confidential and accessed through authenticated REST endpoints where an authentication token can be mapped to a UserId. Each user may be allowed to view other users' data through some group concept.
Listing all products for a given user is simple since UserId is a key in the products table:
GET /users/111/products becomes a simple Query(Table=Products, UserId=111)
But consider the case of listing all parts for a given product:
GET /users/111/products/222/parts
If I simply do a Query(Table=Parts, ProductId=222) then I will get the desired data fast, but I am not protecting against other users querying for data belonging to user 111, provided they somehow know about ProductId 222 (in reality, ID:s will of course be UUID:s or similar so not so easily guessable):
GET /users/119/products/222/parts
... would result in malicious user 119 retrieving data that doesn't belong to him, provided nothing is done to address this.
So here I imagine I need to do something like one of these:
1. First make another query to make sure product 222 in fact belongs to the given user
2. Duplicate the UserId in the Parts table and include it in the query condition (which basically means it will match either all rows or no rows when scanning through the set identified by ProductId): Query(Table=Parts, ProductId=222, UserId=111)
3. Use UserId as the hash key also in the Parts table and instead keep ProductId as a secondary index
4. Use a composite HashKey such as UserId_ProductId ("111_222") on the Parts table
If I need to return a 401 as opposed to just empty data, option 1 seems like the only approach. But if we imagine a deeper hierarchy of data, e.g. "users having inboxes having messages having parts having attachments" it seems this approach could eventually be expensive (listing all attachments for part P might result in a query to check that part P belongs to message M, that message M belongs to inbox I and that inbox I belongs to user U, and so on).
Does anyone have any good arguments for which approach is most favorable? Or am I doing something stupid and should be modeling my data in some other way completely?
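To make the trade-off between the first and the last option concrete, here is a rough sketch using plain dicts as hypothetical stand-ins for the DynamoDB tables. None of these names come from the DynamoDB API, and the group-based sharing mentioned above is left out.

```python
# Hypothetical in-memory stand-ins for the Products and Parts tables.
products = {("111", "222"): {"name": "widget"}}     # (UserId, ProductId)
parts_by_product = {"222": ["p1", "p2"]}            # ProductId -> parts
parts_by_owner_product = {"111_222": ["p1", "p2"]}  # "UserId_ProductId"


class Forbidden(Exception):
    """Raised when the product does not belong to the user in the path."""


def list_parts_checked(user_id, product_id):
    """Extra-query option: confirm ownership first, then query by ProductId."""
    if (user_id, product_id) not in products:
        raise Forbidden(f"product {product_id} not owned by user {user_id}")
    return parts_by_product.get(product_id, [])


def list_parts_composite(user_id, product_id):
    """Composite-key option: the key only matches the owner's rows."""
    return parts_by_owner_product.get(f"{user_id}_{product_id}", [])
```

With the extra lookup, a request for `/users/119/products/222/parts` can be rejected explicitly; with the composite key it costs a single lookup but can only come back empty, with no way to distinguish "not yours" from "no parts".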

How do you store and display if a user has voted or not on something?

I'm working on a voting site and I'm wondering how I should handle votes.
For example, on SO when you vote for a question (or answer) your vote is stored, and each time I go back to the page I can see that I already voted for this question because the up/down buttons are colored.
How do you do that? I have several ideas, but I'm wondering whether they would put a heavy load on the database.
Here are my ideas:
1. Write a helper which checks, for every question, whether a vote has been cast.
   That means the number of queries will depend on the number of items displayed on the page (usually ~20).
2. Loop over my items to get their IDs and, for each page, write one query which returns whether a vote has been cast, or NULL.
   Looks OK because it's only one query no matter how many items are on the page, but it may break some MVC/Domain Model design, dunno.
3. When a user logs in (or a guest for whom an anonymous user is created), retrieve all votes and store them in the session; if a new vote is cast, just add it to the session.
   Looks nice because no queries are needed at all except the first one. However, depending on the number of votes cast (maybe a bunch for each user), this can increase the size of each user's session and potentially make authentication slow.
How do you do it? Any other ideas?
For example, let's assume you have a table that stores each vote and the user who cast it.
Say you keep the votes in user_votes, adding a row whenever a vote is cast, with a table structure something like this:
id of type int, autoincrement
user_id of type int, foreign key referencing the users table
question_id of type int, foreign key referencing the questions table
Now, as the user will be logged in, when you fetch the questions do a left join against user_votes restricted to that user's user_id.
Something like:
SELECT q.id, q.question, uv.id
FROM questions AS q
LEFT JOIN user_votes AS uv ON
uv.question_id = q.id AND
uv.user_id = <logged_in_user_id>
WHERE <Your criteria>
In the view you can check whether uv.id is present. If so, mark the question as voted; otherwise not.
You may need to adapt this to the fields of your questions table. I am assuming you store questions in a questions table, users in a users table, and so on, all with a primary key called id.
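As a runnable illustration of that join, here is a small sketch using Python's built-in sqlite3 module. The sample rows and the `questions_with_vote_flag` helper are made up for this example; the table and column names follow the ones assumed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY, question TEXT);
    CREATE TABLE user_votes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER NOT NULL,
        question_id INTEGER NOT NULL REFERENCES questions(id)
    );
    INSERT INTO questions (id, question) VALUES (1, 'Q1'), (2, 'Q2'), (3, 'Q3');
    INSERT INTO user_votes (user_id, question_id) VALUES (7, 2), (8, 1), (8, 2);
""")


def questions_with_vote_flag(user_id):
    """One query: each question plus whether this user has voted on it."""
    rows = conn.execute(
        """
        SELECT q.id, q.question, uv.id
        FROM questions AS q
        LEFT JOIN user_votes AS uv
          ON uv.question_id = q.id AND uv.user_id = ?
        ORDER BY q.id
        """,
        (user_id,),
    ).fetchall()
    # uv.id is NULL (None) when the user has no vote on that question.
    return [(qid, text, vote_id is not None) for qid, text, vote_id in rows]
```

Keeping the `user_id` condition inside the `ON` clause rather than in `WHERE` is what preserves the questions the user has not voted on.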
You could use a combination of your suggested strategies.
Retrieve all the votes made by the logged in user for recent/active questions only and store them in the session.
You then have the ones that are more likely to be needed while still reducing the amount you need to store in the session.
In the less likely event that you need other results, query for just those as and when you need to.
This strategy will reduce the amount you need to store in the session and also reduce the number of calls you make to your database.
Just based on the information that you've given so far, I would take the second approach: get the IDs of all the items on the page, and then do a single query to get all the user's votes for that list of item IDs. Then pass the collection of the user's item votes to your view, so it can render items differently when the user has voted for that item.
The other two approaches seem like they would tend to be less efficient, if I understood you correctly. Using a view helper to initiate an individual query for each item to check if the user has voted on it could lead to a lot of unnecessary queries. And preloading all the user's voting history at login seems to add unnecessary overhead, getting data that isn't always needed and adding the burden of keeping it up to date for the duration of the session.
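A minimal sketch of that second approach, again with sqlite3 and made-up sample data. The `user_votes` layout and the `voted_item_ids` helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_votes (user_id INTEGER, item_id INTEGER);
    INSERT INTO user_votes VALUES (7, 2), (7, 5), (8, 2), (7, 40);
""")


def voted_item_ids(user_id, item_ids):
    """Return the subset of item_ids this user has voted on, in one query."""
    if not item_ids:
        return set()
    # One "?" per item ID keeps the values parameterized.
    placeholders = ", ".join("?" for _ in item_ids)
    rows = conn.execute(
        f"SELECT item_id FROM user_votes "
        f"WHERE user_id = ? AND item_id IN ({placeholders})",
        [user_id, *item_ids],
    )
    return {row[0] for row in rows}


page_item_ids = [1, 2, 3, 4, 5]
voted = voted_item_ids(7, page_item_ids)
```

The view can then test `item.id in voted` while rendering each item, so the page costs one extra query regardless of how many items it shows.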