I am a newbie to the NoSQL world and I am stuck designing my database. I am developing an app with two collections:
User
Leave
When a user applies for leave, the leave details are added to the Leave collection, and the leave ID (generated by MongoDB) is added to that user's document in the User collection.
Now my question is: to add the _id to the user collection, should I write one more query, or is there a way to fill in the user collection automatically when a document is added to the leave collection? That is, should I write two queries to insert into the leave and user collections, or can the task be completed with a single query?
I am using the Java driver to interact with the database.
In MongoDB, with that collection structure, you'll have to use two requests, yes: one to insert the leave, and another to add the leave reference to the user document.
You could do it with one request if your leaves were embedded in the user document, but that might not make sense, depending on your other requirements.
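The two requests described above can be sketched as follows. This is a minimal illustration, not the asker's code: it uses Python rather than the Java driver for brevity, assumes `users` and `leaves` are PyMongo-style collection objects, and the field names `user_id` and `leave_ids` are hypothetical.

```python
def apply_for_leave(users, leaves, user_id, leave_details):
    """Insert a leave document, then reference it from the user document."""
    # Request 1: insert the leave; MongoDB generates the _id.
    # `user_id` is a hypothetical back-reference field.
    result = leaves.insert_one(dict(leave_details, user_id=user_id))

    # Request 2: append the generated _id to the user's (hypothetical)
    # `leave_ids` array without rewriting the rest of the document.
    users.update_one(
        {"_id": user_id},
        {"$push": {"leave_ids": result.inserted_id}},
    )
    return result.inserted_id
```

The two writes are not atomic as written, so the application has to decide what to do if the second request fails.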
I am trying to come up with a rough design for an application we're working on. What I'd like to know is, if there is a way to directly map a one to many relation in mongo.
My schema is like this:
There are a bunch of Devices.
Each device is uniquely identified by its name/ID.
Each device, can have multiple interfaces.
These interfaces can be added by a user in the front end at any given
time.
An interface is uniquely identified by its ID, and can be associated with
only one Device.
A device can contain on the order of 100 interfaces or more.
I was going through the MongoDB documentation, which discusses embedded documents versus multiple collections. I don't yet have a clear understanding of this, as I've just started with Mongo and Meteor.
The question is, which would be the better approach: having multiple small collections, or having one big collection with embedded documents? I know this question is somewhat subjective; I just need some clarity from folks who have more expertise in this field.
Another question: suppose I go with the embedded model, is there a way to update only part of the document (the part specific to one interface) so that whenever an interface is added, it can be inserted into the same device document?
It depends on the purpose of the application.
Big document
A good example of where you'd want a big embedded document is when you will not normally modify the data but will query it a lot. In my application I use this for storing pre-processed trips with all their information, so when someone wants to consult a trip, everything is located in a single document. However, if your query is based on a value embedded in a list inside a trip, it can be very slow. If that's the case, I'd recommend creating another collection and relating the two collections. Updating part of a large document can also be slow, particularly if your application fetches the whole document, changes it, and writes it back instead of using update operators such as $set or $push.
Small documents with relations
If you plan on modifying the data a lot, I'd recommend sticking to a reference to another collection. With small documents, you can update any collection more quickly. If you want to model a unique relation, consider using a unique index in Mongo. This can be done using: db.members.createIndex( { "user_id": 1 }, { unique: true } ).
Therefore:
Big object: Great for querying data but slow for complex queries.
Small related collections: Great for updating but requires several queries on distinct collections.
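On the question of updating only part of an embedded document: MongoDB's update operators can target a single array or field on the server. A minimal sketch, assuming `devices` is a PyMongo-style collection and that each device document holds a hypothetical `interfaces` array whose elements carry an `id` field:

```python
def add_interface(devices, device_id, interface):
    """Append one interface sub-document to a device document."""
    return devices.update_one(
        {"_id": device_id},
        {"$push": {"interfaces": interface}},
    )


def set_interface_field(devices, device_id, interface_id, field, value):
    """Change one field of one embedded interface."""
    return devices.update_one(
        # Matching on the array element tells MongoDB which element
        # the positional `$` operator in the update refers to.
        {"_id": device_id, "interfaces.id": interface_id},
        {"$set": {"interfaces.$." + field: value}},
    )
```

With roughly 100 small interfaces per device, the embedded array should stay well below the document size limit, but that depends on how large each interface document really is.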
I'm new to mongoDB and trying to figure out what would be the best way to store user logs. I identified two main solutions, but can't figure out which might be the best. If others come to mind, please feel free to share them ;)
1) The first is to store a log in every collection that I have. For instance, if I have the 'post', 'friends', 'sports' and 'music' collections, then I could create a log field in each document in each collection with all the logging info that I want to store.
2) The second way is to create a separate 'log' collection, each document having a type ('post', 'friends' ...) to identify the kind of log I'm storing, along with the id of the document it refers to.
What I really need is to be able to store and retrieve data (that is, everything but logs) as fast as possible. (so if I go with (1), I would have to always remove the logs from my selection queries since they would be useless most of the time)
Logs will only be accessed periodically (for reporting and stats mostly), yet will need to be mapped to their original document (in case of (2)).
I will be creating logs for almost all the non log data to store (so storing logs inside each collection might be faster : one insert vs two).
Logging could also be done asynchronously to ease the load on the server.
With all that in mind, I can't really manage to find which is the best for my needs. Would anyone have any idea / comments to share ?
Thanks a lot !
How you want to access your logs will play a big part in your design decision. By what criteria will you access your log documents? Will you have to query by type (e.g. post, friends) AND id (object id of the document)? Is there some other identifying feature? This can be extra overhead as you would have to read your 'type' collection first, get the id you're after, then query your logs collection. This creates a lot more read overhead.
What I would recommend is a separate logs collection as this keeps all related data in the one place. Then, for each log document, have a 1:1 mapping between document ids for your type collections, and your log collection. e.g. If you have a friend document, use the friend document's _id field as the _id field for your document in your logs collection. That way you can directly look up your log document without a second read. If you will have multiple log records for each type document, use an array in the log document and append each log record to it using mongo's $push. This would be a very efficient log architecture in terms of storage, write ($push requires no read - 'set and forget') and lookup time (smart 1:1 mapping - no more than one query needed if you have the _id).
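The 1:1 mapping with $push described above could look roughly like this. It is a sketch under assumptions: `logs` is a PyMongo-style collection, the array field name `entries` is hypothetical, and `upsert=True` is used so the log document is created on the first write.

```python
def append_log(logs, source_id, entry):
    """Append one log record to the log document that shares the
    source document's _id, creating it if it does not exist yet."""
    return logs.update_one(
        {"_id": source_id},
        {"$push": {"entries": entry}},
        upsert=True,
    )


def read_log(logs, source_id):
    """Fetch all log records for one source document in a single query."""
    doc = logs.find_one({"_id": source_id})
    return doc["entries"] if doc else []
```

If a source document accumulates a very large number of log records, the growing array becomes a concern, and one document per log record may be the safer layout.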
I am just starting out with MongoDB (Late to the party, I know...)
I am still trying to get 10+ years of relational DBing out of my head when thinking of a document design.
Let's say I have many users using many apps. Any user can use several apps, and any app can be used by any number of users.
In the login procedure I would like to access all the apps a user uses. In another procedure I would like to get all the users of a specific app.
Should I just have duplicate data? Maybe have an array of users in the App document and an array of apps in the user document? Does this make sense? Is this a conventional approach in document DBs?
Good question!
You have a many-to-many scenario.
In Mongo you can solve this problem in many ways:
Using a lookup table like in SQL or having an array.
What you should consider are indexes, same as in SQL, but this time you have more options.
Since it's a many-to-many scenario I would probably go with the lookup table.
This is the most effective way to get users of an app and apps of a user.
An array is not well suited to frequently changing values, especially if you need two array fields (app.users and user.apps) and the app.users array is going to change often.
The downside is that you can't "join": you will have to "select" data from two tables and do the "join" yourself. This shouldn't be an issue, especially since you can always cache the result (local caching in your application), and Mongo will return the result very quickly if you add an index on the user field.
{
_id: "<appID>_<userID>" ,
user: "<userID>"
}
_id is indexed by default. Another index should be created on the "user" field; Mongo will then load the B-tree into memory and you are all good.
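A minimal sketch of the lookup-table approach, assuming `links` is a PyMongo-style collection. The compound `_id` and the `user` field follow the document above; the separate `app` field and its index are additions made here so that the users of an app can be found with a plain equality match, and all function names are hypothetical.

```python
def make_link(app_id, user_id):
    """Build one lookup document tying a user to an app."""
    return {"_id": f"{app_id}_{user_id}", "app": app_id, "user": user_id}


def ensure_indexes(links):
    """Index both sides of the relation (_id is indexed automatically)."""
    links.create_index("user")
    links.create_index("app")  # assumption: not part of the original answer


def apps_of_user(links, user_id):
    """App IDs used by one user, e.g. during login."""
    return [link["app"] for link in links.find({"user": user_id})]


def users_of_app(links, app_id):
    """User IDs of everyone using one app."""
    return [link["user"] for link in links.find({"app": app_id})]
```

Because the `_id` combines both IDs, inserting the same user and app pair twice fails with a duplicate key error, which keeps the relation free of duplicates.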
As per your scenario, you need not have duplicate data. Since it's a many to many relationship and the data is going to keep changing, you need to use document reference instead of document embedding.
So you will have two collections:
app collection :
{
_id : appId,
app_name : "appname",
// other property of app
users : [userid1, userid2]
}
users collection:
{
_id : userId,
// other details of user
apps: [appid1, appid2, ..]
}
As you mentioned you need to have array of users in app collection & array of apps in user collection.
When you are fetching data in the client, at first when the user logs in, you will get the array of app IDs from the user document.
Then again, with the app IDs you need to query for Apps details in the app collection.
This roundtrip will be there for sure as we are using references. But you can improve performance by caching the details & by having proper indexes.
This is conventional in MongoDB for a many-to-many relationship.
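The round trip described above (read the user's app IDs, then fetch the app details) can be sketched like this, assuming `users` and `apps` are PyMongo-style collections shaped like the two documents shown; the function name is hypothetical.

```python
def apps_for_user(users, apps, user_id):
    """Return the full app documents for every app a user uses."""
    # Query 1: fetch only the `apps` array from the user document.
    user = users.find_one({"_id": user_id}, {"apps": 1})
    if user is None:
        return []

    # Query 2: fetch all referenced apps in one request with $in.
    return list(apps.find({"_id": {"$in": user.get("apps", [])}}))
```

The reverse direction (all users of an app) works the same way, starting from the `users` array in the app document.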
I am new to MongoDB. I have one master collection, user_group. A sample document is shown below.
{group_name:"xyz","previlege":["Add","Delete"],...}
And second collection user_detail
{"user_name":"pt123","group_name":"xyz",...}
How can I maintain the relation between these two collections? Should I use a reference from user_group in user_detail, or is there another alternative?
Often, in MongoDB, the "has many" relationship is managed on the opposite side from a relational database. A MongoDB document will often have an array of ObjectIds or group names (or whatever you're using to identify the foreign document). This is in contrast to a relational database, where the other side usually has a "belongs to" column.
To be clear, this is not required. In your example, you could store an array of user detail IDs in your group document if that supported the most common query you were going to make. Basically, the question you should ask is "what query am I likely to need?" and design your documents to support it.
Simple answer: You don't.
The entire design philosophy changes when you start looking at MongoDB. If I were you, I would maintain the previlege field inside the user_detail documents itself.
{"user_name":"abc","group_name":"xyz","previlege" : ["add","delete"]}
This may not be ideal if you keep changing group privileges, though. But the idea is that you design your data storage so that all the information for one "record" can be stored in one object.
MongoDB, being NoSQL, does not have explicit joins. Workarounds are possible but not recommended (read: MapReduce).
Your best bet is to retrieve both documents from the Mongo collections on the client side and apply the user-specific privileges there. Make sure you have an index on group_name in the user_group collection.
Or, better still, store the permissions (read, del, etc.) for the user in the same document after applying the join on the client side. But then you cannot update the collection externally, since this might break invariants. Every time the user group is updated, you will need to apply those permissions (privileges) yourself on the client side and save them in the same document. Writes might suffer, but reads will be fast (assuming a few fields are indexed, like username).
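Keeping the copied privileges in sync could be sketched as below. This assumes `user_group` and `user_detail` are PyMongo-style collections with the field names from the question (including the `previlege` spelling); the function name is hypothetical.

```python
def set_group_privileges(user_group, user_detail, group_name, privileges):
    """Change a group's privileges and copy them to all of its members."""
    change = {"$set": {"previlege": privileges}}

    # Update the master record for the group.
    user_group.update_one({"group_name": group_name}, change)

    # Propagate the same list to every user document in that group,
    # so reads never need a second lookup.
    user_detail.update_many({"group_name": group_name}, change)
```

Routing every privilege change through one function like this is what protects the invariant mentioned above; an update made anywhere else would leave the copies stale.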
I am new to MongoDB so I apologize if these questions are simple.
I am developing an application that will track specific user interactions and put information about the user and the interactions into a MongoDB. There are several types of interactions that will all collect different information from the user.
My First question is: Should all of these interaction be in the same collection or should I separate them out by types (as you would do in a RDBMS)?
Additionally I would like to be able to look up:
All the interactions a specific user has made
All the users that have made a specific interaction
I was thinking of putting a Manual reference to an interaction document for each interaction a user performs in his document and a manual reference to the user that performed the interaction in each interaction document.
My second question is: Does this "doubling up" of Manual references make sense or is there a better way to do this?
Any thoughts would be greatly appreciated.
Thank you!
My First question is: Should all of these interaction be in the same collection or should I separate them out by types (as you would do in a RDBMS)?
Without knowing too much about your data size, write volume, read volume, querying needs, etc., I would say: yes, all in one collection.
I am not sure if separating them out is how I would design this in a RDBMS either.
"Does this "doubling up" of Manual references make sense or is there a better way to do this?"
No, it doesn't sound like good database design to me.
Putting a user_id on the interaction collection document sounds good enough.
So when you want to get all user interactions you just query by the interactions collection user_id.
When you want to do it the other way around you query for all interactions that fit your query area, pull out those user_ids and then do a $in clause on the user collection.
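The two-step lookup with $in could be sketched as follows, assuming `interactions` and `users` are PyMongo-style collections and that each interaction document carries a `user_id` field as suggested above. The `type` field in the usage note is hypothetical.

```python
def users_who_did(interactions, users, interaction_query):
    """Return the user documents behind all matching interactions."""
    # Step 1: collect the distinct user IDs of the matching interactions.
    user_ids = interactions.distinct("user_id", interaction_query)

    # Step 2: fetch those users in a single query with $in.
    return list(users.find({"_id": {"$in": user_ids}}))
```

For example, `users_who_did(interactions, users, {"type": "click"})` would return everyone with at least one interaction whose `type` is `"click"`.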
My First question is: Should all of these interaction be in the same collection or should I separate them out by types (as you would do in a RDBMS)?
The greatest advantage of a document store over a relational database is precisely that you can do that. Put all different interactions into one collection and don't be afraid to give them different sets of fields.
Additionally I would like to be able to look up:
All the interactions a specific user has made
I was thinking of putting a Manual reference to an interaction document for each interaction a user performs in his document and a manual reference to the user that performed the interaction in each interaction document.
Note that it's usually not a good idea to have documents which grow indefinitely. MongoDB has an upper limit on document size (16 MB). MongoDB isn't good at handling large documents, because documents are loaded completely into the RAM cache. When you have many large objects, not much will fit into the cache. Also, when documents grow, they sometimes need to be moved to another location on disk, which slows down updates (that also disturbs natural ordering, but you shouldn't rely on that anyway).
All the users that have made a specific interaction
Are you referring to a specific interaction instance (assuming that multiple users can be part of one interaction) or all users which already performed a specific interaction type?
In the latter case I would add an array of performed interaction types to the user document, because otherwise you would have to perform a join-like operation, which would either require a MapReduce or some application-sided logic.
In the first case I would, contrary to what Sammaye suggests, recommend referencing the user not by the _id field of the user collection, but by the username. When you use an index with the unique flag on user.username, it's just as fast as searching by user._id, and uniqueness is guaranteed.
The reason is that when you search for the interactions of a specific user, it's more likely that you know the username and not the id. When you only have the username and you are referencing the user by id, you first have to search the users collection to get the _id for the username, which is an additional database query.
This of course assumes that you don't always have the user._id at hand. When you do, you can of course use _id as reference.
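Referencing users by username might look like this. It is a sketch under assumptions: `users` and `interactions` are PyMongo-style collections, each interaction stores the performer in a hypothetical `username` field, and the function names are made up for illustration.

```python
def ensure_unique_usernames(users):
    """Create a unique index so a username identifies exactly one user."""
    users.create_index("username", unique=True)


def interactions_of(interactions, username):
    """All interactions by one user, without first resolving an _id."""
    return list(interactions.find({"username": username}))
```

The trade-off is that renaming a user then means updating every interaction that references the old username.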