Many-to-many in document DBs - mongodb

I am just starting out with MongoDB (late to the party, I know...)
I am still trying to get 10+ years of relational DBing out of my head when thinking of a document design.
Let's say I have many users using many apps. Any user can use several apps, and any app can be used by any number of users.
In the login procedure I would like to access all the apps a user uses. In another procedure I would like to get all the users of a specific app.
Should I just have duplicate data? Maybe have an array of users in the App document and an array of apps in the user document? Does this make sense? Is this a conventional approach in document DBs?

Good question!
You have a many-to-many scenario.
In Mongo you can solve this problem in several ways:
using a lookup table (collection) as in SQL, or using an array.
What you should consider are indexes, just as in SQL, but this time you have more options.
Since it's a many-to-many scenario, I would probably go with the lookup table.
This is the most effective way to get both the users of an app and the apps of a user.
Arrays are not a good fit for frequently changing values, especially when you need two array fields (app.users and user.apps) and the app.users array is going to change often.
The downside is that you can't "join": you will have to "select" data from two collections and do the "join" yourself. This shouldn't be an issue, especially since you can always cache the result (local caching in your application), and Mongo will return the result super fast if you add an index on the user field:
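{
  _id: "<appID>_<userID>",
  user: "<userID>"
}
_id is indexed by default. Create another index on the "user" field; MongoDB will then keep that B-tree in memory (as long as it fits in RAM) and you are all good. The users of an app can be found with an anchored prefix match on _id (for example /^<appID>_/), which can also use the _id index.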
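A minimal in-memory sketch of the lookup-collection idea, assuming a hypothetical collection called `userApps` whose documents look like the one above. JavaScript `Map`s stand in for MongoDB and for the two indexes, and the comments give the roughly equivalent shell calls. It also assumes IDs never contain `_` (true for ObjectId hex strings), so the part before the separator is always the app ID. The class and method names are illustrative only.

```javascript
// In-memory stand-in for a hypothetical "userApps" lookup collection.
// Each entry mirrors a document { _id: "<appID>_<userID>", user: "<userID>" }.
class UserAppsLookup {
  constructor() {
    this.docs = new Map();   // plays the role of the default _id index
    this.byUser = new Map(); // plays the role of an index on "user"
  }

  // Roughly: db.userApps.insertOne({ _id: `${appId}_${userId}`, user: userId })
  link(appId, userId) {
    const _id = `${appId}_${userId}`;
    if (this.docs.has(_id)) return; // _id is unique, so a real duplicate insert would fail
    this.docs.set(_id, { _id, user: userId });
    if (!this.byUser.has(userId)) this.byUser.set(userId, new Set());
    this.byUser.get(userId).add(_id);
  }

  // Roughly: db.userApps.find({ user: userId }), backed by db.userApps.createIndex({ user: 1 })
  appsOfUser(userId) {
    const ids = this.byUser.get(userId) ?? new Set();
    // The app ID is the part of _id before the separator.
    return [...ids].map((id) => id.slice(0, id.indexOf("_")));
  }

  // Roughly: db.userApps.find({ _id: /^<appID>_/ }); an anchored prefix regex can use the _id index
  usersOfApp(appId) {
    const prefix = `${appId}_`;
    return [...this.docs.values()]
      .filter((doc) => doc._id.startsWith(prefix))
      .map((doc) => doc.user);
  }
}
```

The separator matters: matching on the prefix `app1_` rather than `app1` keeps `app10_...` entries out of the result.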

As per your scenario, you don't need duplicated data. Since it's a many-to-many relationship and the data is going to keep changing, you should use document references instead of embedded documents.
So you will have two collections:
apps collection:
{
  _id: appId,
  app_name: "appname",
  // other properties of the app
  users: [userid1, userid2]
}
users collection:
{
  _id: userId,
  // other details of the user
  apps: [appid1, appid2, ...]
}
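With arrays on both sides, every link has to be written twice. A minimal sketch of that bookkeeping, using in-memory `Map`s in place of the apps and users collections; the helper names are hypothetical, and the comments show roughly equivalent `$addToSet`/`$pull` updates. It assumes both documents already exist.

```javascript
// In-memory stand-ins for the "apps" and "users" collections, keyed by _id.
const apps = new Map();
const users = new Map();

// Mimics $addToSet: append only if the value is not already present.
function addToSet(array, value) {
  if (!array.includes(value)) array.push(value);
}

// Roughly two updates, one per collection:
//   db.users.updateOne({ _id: userId }, { $addToSet: { apps: appId } })
//   db.apps.updateOne({ _id: appId }, { $addToSet: { users: userId } })
// In MongoDB these are separate writes; use a multi-document transaction
// (if your deployment supports them) when the two arrays must never disagree.
function linkUserToApp(userId, appId) {
  addToSet(users.get(userId).apps, appId);
  addToSet(apps.get(appId).users, userId);
}

// Roughly the same pair of updates using $pull instead of $addToSet.
function unlinkUserFromApp(userId, appId) {
  const user = users.get(userId);
  const app = apps.get(appId);
  user.apps = user.apps.filter((id) => id !== appId);
  app.users = app.users.filter((id) => id !== userId);
}
```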
As you suggested, you keep an array of user IDs in each app document and an array of app IDs in each user document.
When a user logs in, you first fetch the user document and read its array of app IDs.
Then you query the apps collection with those IDs to get the app details.
This extra round trip is unavoidable because we are using references, but you can improve performance by caching the details and by having proper indexes.
This is the conventional approach to a many-to-many relationship in MongoDB.
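The two-step login read described above, sketched with hypothetical in-memory sample data (the app names are made up). Each function corresponds to one query against one collection.

```javascript
// Hypothetical sample data shaped like the documents above.
const usersColl = [{ _id: "u1", apps: ["a1", "a3"] }];
const appsColl = [
  { _id: "a1", app_name: "Mail", users: ["u1"] },
  { _id: "a2", app_name: "Chat", users: [] },
  { _id: "a3", app_name: "Docs", users: ["u1"] },
];

// Step 1, roughly: db.users.findOne({ _id: userId }, { apps: 1 })
function findUser(userId) {
  return usersColl.find((u) => u._id === userId) ?? null;
}

// Step 2, roughly: db.apps.find({ _id: { $in: appIds } })
// Querying by _id uses the default _id index.
function findAppsByIds(appIds) {
  const wanted = new Set(appIds);
  return appsColl.filter((a) => wanted.has(a._id));
}

// Login: two round trips, one per collection.
function appsForLogin(userId) {
  const user = findUser(userId);
  if (!user) return [];
  return findAppsByIds(user.apps);
}
```

The returned app details are a natural thing to cache per user, since they change far less often than they are read at login.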

Related

Get all usernames from an item's user_id? (MongoDB query)

I am having some difficulty with my leaderboard for my application.
I have a database with two collections.
Users
Fish
Each document in my Fish collection also has a user_id. When I fetch all fish, I get everything, including the user_id.
However, the user_id doesn't really help me; I want to display the username belonging to that user_id.
This is what my query looks like:
Fish.find().sort({weight: -1}).limit(10).exec(function(err, leaderboard) {
  // Top 10 fish by weight; each document carries only user_id, not the username
  if (err) return res.status(500).json({errorMsg: 'Could not get leaderboard'});
  res.json(leaderboard);
});
I feel like I need to make another query to get all the usernames belonging to the user_ids I get from the first query. Perhaps use a loop somehow?
MongoDB is pretty new to me and I don't really know what to look for.
Any advice, tips or links are much appreciated.
You may find useful information on MongoDB's Database References documentation.
The first thing to consider on using fields from different collections in MongoDB is:
MongoDB does not support joins. In MongoDB some data is denormalized, or stored with related data in documents to remove the need for joins. However, in some cases it makes sense to store related information in separate documents, typically in different collections or databases.
In your case, you might want to consider storing the information from the Fish collection as embedded documents within the user documents in the Users collection.
If this is not an option, then you might want to use Manual References, or loop over the user_ids returned by your query on the Fish collection.
With the second option, you may use a query such as the following to obtain the corresponding username from the Users collection (assuming user_id holds the user's _id):
Users.find({_id: <USER_ID>}, {username: 1})
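Instead of looping and issuing one query per user_id, you can collect the distinct IDs and fetch all the matching users with a single `$in` query. A minimal in-memory sketch, assuming user_id holds the user's `_id` and users have a `username` field; the helper names are hypothetical.

```javascript
// Roughly: Users.find({ _id: { $in: ids } }, { username: 1 })
// (simulated here over an in-memory array of user documents)
function findUsersByIds(usersColl, ids) {
  const wanted = new Set(ids.map(String));
  return usersColl
    .filter((u) => wanted.has(String(u._id)))
    .map((u) => ({ _id: u._id, username: u.username }));
}

// Adds a `username` field to each leaderboard entry using ONE extra query,
// instead of one query per fish inside a loop.
function attachUsernames(leaderboard, usersColl) {
  // De-duplicate: the same user may own several of the top fish.
  // String() is used because real ObjectId instances compare by reference.
  const ids = [...new Set(leaderboard.map((fish) => String(fish.user_id)))];
  const nameById = new Map(
    findUsersByIds(usersColl, ids).map((u) => [String(u._id), u.username])
  );
  // Keep the leaderboard order; use null if the user no longer exists.
  return leaderboard.map((fish) => ({
    ...fish,
    username: nameById.get(String(fish.user_id)) ?? null,
  }));
}
```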

MongoDB schema database design

I have a users collection like:
{ _id: "kshjfhsf098767", email: "email#something", name: "John joshua" }
{ _id: "dleoireofd9888", email: "email#hhh", name: "Terry Holdman" }
And I have another collection, "game":
{ _id: "gsgrfsdgf8898", home_user_id: "kshjfhsf098767", guest_user_id: "dleoireofd9888", result: "0:1" }
What I want is to join game with users twice (as I would in MySQL), since I know home_user_id and guest_user_id, and get the name, email, etc.
I could put all of that in the game collection, but that would duplicate content, and if users change their name or email I would need to update the whole game collection...
Any help with the design, and with a query that fetches a game together with the two users playing it, would be great. Thanks!
There are two ways to manage this: manually or using a DBRef. From the MongoDB Database References documentation:
MongoDB does not support joins. In MongoDB some data is “denormalized,” or stored with related data in documents to remove the need for joins. However, in some cases it makes sense to store related information in separate documents, typically in different collections or databases.
So it is a case of managing the link yourself or using the built-in DBRef. For the DBRef case, see How to query mongodb with DBRef.
Alternatively, it may be easier to manage with a different schema design. For example, the game collection could just store the result and game_id, and you would instead add the game_id reference to each of the relevant users. Of course, you will still need to query both collections, and the linked SO question has an example of how to do this.
MongoDB has no JOINs (NoSQL).
Just do a lazy join here, whereby you query your user document and then query all games that user is part of. It will be ultra fast with the right indexes, and since they are just two small queries, MongoDB would barely notice them.
I would not recommend embedding here, for the reason you state: it would make the data a pain to update across the hundreds of users that could be in a single game "room". In this case it is better to do a single atomic update, even if it means putting a little overhead on querying another collection.
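A minimal in-memory sketch of the lazy join, using the sample documents from the question. Plain arrays stand in for the collections, the comments give roughly equivalent MongoDB queries, and the helper names are hypothetical.

```javascript
// Sample documents copied from the question.
const usersColl = [
  { _id: "kshjfhsf098767", email: "email#something", name: "John joshua" },
  { _id: "dleoireofd9888", email: "email#hhh", name: "Terry Holdman" },
];
const gamesColl = [
  { _id: "gsgrfsdgf8898", home_user_id: "kshjfhsf098767", guest_user_id: "dleoireofd9888", result: "0:1" },
];

// All games a user played in, home or away.
// Roughly: db.game.find({ $or: [{ home_user_id: userId }, { guest_user_id: userId }] })
// With an index on each of the two fields, each $or clause can use an index scan.
function gamesForUser(games, userId) {
  return games.filter((g) => g.home_user_id === userId || g.guest_user_id === userId);
}

// One game plus both players, using two queries.
// Roughly: db.game.findOne({ _id: gameId }), then
//          db.users.find({ _id: { $in: [game.home_user_id, game.guest_user_id] } })
function gameWithPlayers(games, users, gameId) {
  const game = games.find((g) => g._id === gameId);
  if (!game) return null;
  const wanted = new Set([game.home_user_id, game.guest_user_id]);
  const byId = new Map(users.filter((u) => wanted.has(u._id)).map((u) => [u._id, u]));
  return {
    ...game,
    home_user: byId.get(game.home_user_id) ?? null,
    guest_user: byId.get(game.guest_user_id) ?? null,
  };
}
```

Because names and emails live only in the user documents, a change of email is a single update and every game picks it up on the next read.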

Sharing a document with users

I have to choose a database for implementing a sharing system.
My system will have users and documents. I have to share a document with a few users.
Example:
There are 2 users, and there is one document.
So if I have to share that one document with both users, these are the possible solutions:
The current method I'm using is with MySQL (I don't want to use this):
Relational Databases (MySQL)
Users table = user1, user2
Docs table = doc1
Docs-Users relation table = (doc1, user1), (doc1, user2)
And I would like to use something like this:
NoSQL Document Stores (MongoDB)
Users documents:
{
  _id: user1,
  docs_i_have_access_to: [doc1]
}
{
  _id: user2,
  docs_i_have_access_to: [doc1]
}
Document's document:
{
  _id: doc1,
  members_of_this_doc: [user1, user2]
}
And I don't yet know how I would implement this in a key-value store like Redis.
So I'd like to know: is the MongoDB approach I have given above the best solution?
And is there any other way I could implement this? Maybe with another database solution?
Should I try to implement it with Redis or not?
Which database and which method should I choose to share the data, and why?
Note: I want something highly scalable and persistent. :D
Thanks. :D
Actually, you need to represent a many-to-many relationship. One user can have several documents. One document can be shared among several users.
See my previous answer to this question: how to have relations many to many in redis
With Redis, representing relationships with the set data type is a pretty common pattern. You can expect better performance than with MongoDB for this kind of data model. And as a bonus, you can easily and efficiently find which users have a given list of documents in common, or which documents are shared by a given set of users.
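A minimal sketch of the set-based model, assuming hypothetical key names `user:<id>:docs` and `doc:<id>:users`. JavaScript `Set`s stand in for Redis here; the helpers mimic the Redis commands SADD, SMEMBERS and SINTER, but this is not a Redis client. In real Redis you would typically wrap the two SADD calls in MULTI/EXEC so both directions are written together.

```javascript
// In-memory stand-in for Redis sets: each key maps to a Set of member IDs.
const store = new Map();

// Like SADD key member
function sadd(key, member) {
  if (!store.has(key)) store.set(key, new Set());
  store.get(key).add(member);
}

// Like SMEMBERS key (a missing key behaves as an empty set)
function smembers(key) {
  return [...(store.get(key) ?? [])];
}

// Like SINTER key1 key2 ...
function sinter(...keys) {
  if (keys.length === 0) return [];
  const [first, ...rest] = keys.map((k) => store.get(k) ?? new Set());
  return [...first].filter((m) => rest.every((s) => s.has(m)));
}

// Share a document: record both directions of the many-to-many relationship.
function share(docId, userId) {
  sadd(`doc:${docId}:users`, userId);
  sadd(`user:${userId}:docs`, docId);
}
```

Intersecting two `user:*:docs` sets answers "which documents do these users share", and intersecting `doc:*:users` sets answers "which users have all of these documents".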
Considering only this simple example (you just need to keep track of who owns what), SQL seems to be the most appropriate, as it gives you additional options for free, such as reporting who has how many docs, the most popular documents, the most active user, etc., at almost zero cost, and the data will be more consistent (no duplication, possibly foreign keys). This holds unless you have millions of documents, of course.
If I had to choose between a document-oriented and a relational DB, I'd decide based mostly on the structure of the documents themselves: whether they're all uniform or may have different fields for different types, and whether you need nested sub-documents or arrays that you can search by their contents.

Storing secondary documents in MongoDB doc

Say User2 has these data objects, stored in User2.objects as an array of objects:
User2:{objects:[{object1},{object2},{object3}]}
If anyone other than User2 needs to query this data (for example, User1 needs any of those objects that pertain to them), then it should be broken out into its own collection in MongoDB, right?
Say User1 wants to find every object they are part of. They would need a reference to every User2 they created data with, then look through every object in each of those users and return the ones they need.
Should I break those objects out into their own collection? Then I could just index the user IDs, and each user could query once for their own ID.
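Breaking the objects out, as proposed, might look like this: one document per object with an array of the users it pertains to. A minimal in-memory sketch; the `objects` collection and its `createdBy`/`users` fields are hypothetical.

```javascript
// Hypothetical "objects" collection: one document per shared object.
// In MongoDB, an index on an array field is a multikey index, so
// db.objects.createIndex({ users: 1 }) lets db.objects.find({ users: userId })
// match any element of the array.
const objectsColl = [
  { _id: "object1", createdBy: "User2", users: ["User2", "User1"] },
  { _id: "object2", createdBy: "User2", users: ["User2"] },
  { _id: "object3", createdBy: "User2", users: ["User2", "User1", "User3"] },
];

// Roughly: db.objects.find({ users: userId }, { _id: 1 })
function objectsFor(userId) {
  return objectsColl.filter((o) => o.users.includes(userId)).map((o) => o._id);
}
```

Each user then runs one indexed query for their own ID, without knowing who created the objects.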
Sorry if this Q is confusing I'm a little lost.
It appears as though there may be some confusion between Mongo Document structure and using authentication with MongoDB.
The documentation on how to set up user authentication for a Mongo Database is here:
http://www.mongodb.org/display/DOCS/Security+and+Authentication
If User2 needs to run a query on a collection that was created by User1, then User2 must have an account with the Database where that collection resides, and must be properly authenticated.
The example document provided is also a little confusing. It is a better idea to use key names that will be the same across all documents. For example:
{userName:"user1", name:"Marc"},
{userName:"user2", name:"Jeff"},
{userName:"user3", name:"Steve"}
is preferable to
{user1:"Marc"},
{user2:"Jeff"},
{user3:"Steve"}
In the second example, the username (user1, user2, etc.) has to be known in order to find out the name of the user, because MongoDB queries do not support wildcards in field names.
The following document structure would be preferable:
{
user: "User2",
objects:[object1,object2,object3]
},
{
user: "User1",
objects:[object1,object2,object3]
}
All of the objects created by User1 could be retrieved with the following query:
> db.<your collection name>.find({user: "User1"}, {objects:1})
For more information on the MongoDB document structure, I recommend reading the following:
http://www.mongodb.org/display/DOCS/Schema+Design - A great introduction to the way data is stored in MongoDB, including example documents, best practices, and an introduction to indexing.
Hopefully the above will put you on the right track in terms of deciding on a schema for your collection and creating users and setting permissions. Authentication is one of MongoDB's more advanced features, so I would begin by focusing on building an efficient schema and organizing your data correctly before worrying about authentication.
If you have any additional questions about these topics, or anything else MongoDB-related, the Community is here to help! Good Luck!

MongoDB - simulate join or subquery

I'm trying to figure out the best way to structure my data in Mongo to simulate what would be a simple join or subquery in SQL.
Say I have the classic Users and Posts example, with Users in one collection and Posts in another. I want to find all posts by users whose city is "london".
I've simplified things in this question; in my real-world scenario, storing Posts as an array in the User document won't work, as I have thousands of "posts" per user being inserted constantly.
Can Mongo's $in operator help here? Can $in handle an array of 10,000,000 entries?
Honestly, if you can't fit "Posts" into "Users", then you have two options.
1. Denormalize some User data inside of Posts. Then you can search through just the one collection.
2. Do two queries (one to find users, the other to find posts).
Based on your question, you're trying to do #2.
Theoretically, you could build a list of User IDs (or refs) and then find all Posts belonging to a User $in that array. But obviously that approach is limited.
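Option 2 (two queries) as a minimal in-memory sketch, assuming posts carry a hypothetical `user_id` field. Note that the `$in` list grows with the number of matching users, which is exactly the limitation discussed here.

```javascript
// Step 1, roughly: db.users.find({ city: city }, { _id: 1 })
// Step 2, roughly: db.posts.find({ user_id: { $in: userIds } })
function postsByUsersInCity(usersColl, postsColl, city) {
  const userIds = usersColl.filter((u) => u.city === city).map((u) => u._id);
  // The Set plays the role of the $in list; its size is the number of matching users.
  const wanted = new Set(userIds);
  return postsColl.filter((p) => wanted.has(p.user_id));
}
```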
Can $in handle an array of 10,000,000 entries?
Look, if you're planning to "query" your posts for all users in a set of 10,000,000 Users, you are well past the stage of "query". You say yourself that each User has 1,000s of posts, so you're talking about a query for "Users with Posts who live in London" returning billions of records (10,000,000 users × 1,000s of posts each).
Billions of records isn't a query, that's a dataset!
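If you're worried about breaking the $in command, then I highly suggest that you use map/reduce. MongoDB's map/reduce will create a new collection for you, which you can then trim down or summarize as you see fit. (Map-reduce is deprecated as of MongoDB 5.0; an aggregation pipeline ending in $out or $merge also writes its results to a collection.)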
$in can handle 100,000 entries. I've never tried 10,000,000 entries, but the query (a query is also a document) has to be smaller than the maximum BSON document size, like every document (4 MB when this was written, 16 MB in current MongoDB versions), so 10,000,000 entries isn't possible.
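The size argument can be checked with a little arithmetic. A minimal sketch, assuming the $in list holds ObjectId values and counting only the BSON encoding of the array itself (the enclosing query document and wire-protocol overhead are ignored). In BSON an array is a document whose keys are the decimal indexes "0", "1", ..., so each element costs 1 type byte, the key plus its terminating null, and 12 bytes for the ObjectId. The function names are hypothetical.

```javascript
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16 MB in current MongoDB; older versions used 4 MB

// BSON size of an array of n ObjectIds, e.g. the value of { $in: [...] }.
// An array is encoded as a document with keys "0", "1", "2", ...
function bsonObjectIdArraySize(n) {
  let size = 4 + 1; // int32 length prefix + trailing 0x00 terminator
  let start = 0;
  for (let digits = 1; start < n; digits++) {
    // Indexes in [start, end) all have `digits` decimal digits.
    const end = Math.min(n, 10 ** digits);
    // Per element: type byte + key digits + key null terminator + 12-byte ObjectId
    size += (end - start) * (1 + digits + 1 + 12);
    start = end;
  }
  return size;
}

function fitsInOneDocument(n, limit = MAX_BSON_BYTES) {
  return bsonObjectIdArraySize(n) <= limit;
}
```

For 10,000,000 IDs this comes to 208,888,895 bytes (roughly 209 MB), far over either limit, while 100,000 IDs need 1,888,895 bytes, under 2 MB, which fits even the old 4 MB limit.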
Why don't you include the user and their town in the Posts collection? You can index this town, because you can index fields of embedded documents. You then no longer have to simulate a join, because you can query the Posts on the towns of their embedded users.
This means that you have to update the Posts when a user's town changes, but that doesn't happen very often. This update will be fast if you index the UserId in the Posts collection.
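A minimal in-memory sketch of the denormalized design, assuming each post embeds a hypothetical `user: { id, town }` sub-document. Queries on town touch only the posts collection, and a town change becomes one multi-document update (the user's own document would still be updated separately).

```javascript
// Suggested indexes, roughly:
//   db.posts.createIndex({ "user.town": 1 })
//   db.posts.createIndex({ "user.id": 1 })

// Roughly: db.posts.find({ "user.town": town })
function postsInTown(posts, town) {
  return posts.filter((p) => p.user.town === town);
}

// Roughly: db.posts.updateMany({ "user.id": userId }, { $set: { "user.town": newTown } })
// Returns how many posts were changed, like updateMany's modifiedCount.
function moveUser(posts, userId, newTown) {
  let modified = 0;
  for (const p of posts) {
    if (p.user.id === userId && p.user.town !== newTown) {
      p.user.town = newTown;
      modified++;
    }
  }
  return modified;
}
```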
I have something similar, but my setup is geared towards "users" and "messages." What I did was add a reference to the user, sort of like a foreign key. I used the generated "_id" from the users collection and stored it as a key inside of "messages." For every message a user sends, I save it to the "messages" collection. You should read up on DBRefs; I think they're what you're looking for.
You'll have to run multiple queries, but you should definitely do that on the app side.