MongoDB and "joins" [duplicate] - mongodb

This question already has answers here:
How do I perform the SQL Join equivalent in MongoDB?
(19 answers)
Closed 6 years ago.
I'm sure MongoDB doesn't officially support "joins". What does this mean?
Does this mean "We cannot connect two collections(tables) together."?
I think if we put the value for _id in collection A to the other_id in collection B, can we simply connect two collections?
If my understanding is correct, MongoDB can connect two tables together, say, when we run a query. This is done by "Reference" written in http://www.mongodb.org/display/DOCS/Schema+Design.
Then what does "joins" really mean?
I'd love to know the answer because this is essential to learn MongoDB schema design. http://www.mongodb.org/display/DOCS/Schema+Design

It's no join since the relationship will only be evaluated when needed. A join (in a SQL database) on the other hand will resolve relationships and return them as if they were a single table (you "join two tables into one").
You can read more about DBRef here:
http://docs.mongodb.org/manual/applications/database-references/
There are two possible solutions for resolving references. One is to do it manually, as you have almost described. Just save a document's _id in another document's other_id, then write your own function to resolve the relationship. The other solution is to use DBRefs as described on the manual page above, which will make MongoDB resolve the relationship client-side on demand. Which solution you choose does not matter so much because both methods will resolve the relationship client-side (note that a SQL database resolves joins on the server-side).

The database does not do joins -- or automatic "linking" between documents. However you can do it yourself client side. If you need to do 2, that is ok, but if you had to do 2000, the number of client/server turnarounds would make the operation slow.
In MongoDB a common pattern is embedding. In relational when normalizing things get broken into parts. Often in mongo these pieces end up being a single document, so no join is needed anyway. But when one is needed, one does it client-side.
Consider the classic ORDER, ORDER-LINEITEM example. One order and 8 line items are 9 rows in relational; in MongoDB we would typically just model this as a single BSON document which is an order with an array of embedded line items. So in that case, the join issue does not arise. However the order would have a CUSTOMER which probably is a separate collection - the client could read the cust_id from the order document, and then go fetch it as needed separately.
There are some videos and slides for schema design talks on the mongodb.org web site I belive.

one kind of join a query in mongoDB, is ask at one collection for id that match , put ids in a list (idlist) , and do find using on other (or same) collection with $in : idlist
u = db.friends.find({"friends": something }).toArray()
idlist= []
u.forEach(function(myDoc) { idlist.push(myDoc.id ); } )
db.family.find({"id": {$in : idlist} } )

The first example you link to shows how MongoDB references behave much like lazy loading not like a join. There isn't a query there that's happening on both collections, rather you query one and then you lookup items from another collection by reference.

The fact that mongoDB is not relational have led some people to consider it useless.
I think that you should know what you are doing before designing a DB. If you choose to use noSQL DB such as MongoDB, you better implement a schema. This will make your collections - more or less - resemble tables in SQL databases. Also, avoid denormalization (embedding), unless necessary for efficiency reasons.
If you want to design your own noSQL database, I suggest to have a look on Firebase documentation. If you understand how they organize the data for their service, you can easily design a similar pattern for yours.
As others pointed out, you will have to do the joins client-side, except with Meteor (a Javascript framework), you can do your joins server-side with this package (I don't know of other framework which enables you to do so). However, I suggest you read this article before deciding to go with this choice.
Edit 28.04.17:
Recently Firebase published this excellent series on designing noSql Databases. They also highlighted in one of the episodes the reasons to avoid joins and how to get around such scenarios by denormalizing your database.

If you use mongoose, you can just use(assuming you're using subdocuments and population):
Profile.findById profileId
.select 'friends'
.exec (err, profile) ->
if err or not profile
handleError err, profile, res
else
Status.find { profile: { $in: profile.friends } }, (err, statuses) ->
if err
handleErr err, statuses, res
else
res.json createJSON statuses
It retrieves Statuses which belong to one of Profile (profileId) friends. Friends is array of references to other Profiles. Profile schema with friends defined:
schema = new mongoose.Schema
# ...
friends: [
type: mongoose.Schema.Types.ObjectId
ref: 'Profile'
unique: true
index: true
]

I came across lot of posts searching for the same - "Mongodb Joins" and alternatives or equivalents. So my answer would help many other who are like me. This is the answer I would be looking for.
I am using Mongoose with Express framework. There is a functionality called Population in place of joins.
As mentioned in Mongoose docs.
There are no joins in MongoDB but sometimes we still want references to documents in other collections. This is where population comes in.
This StackOverflow answer shows a simple example on how to use it.

Related

MongoDB and one-to-many relation

I am trying to come up with a rough design for an application we're working on. What I'd like to know is, if there is a way to directly map a one to many relation in mongo.
My schema is like this:
There are a bunch of Devices.
Each device is known by it's name/ID uniquely.
Each device, can have multiple interfaces.
These interfaces can be added by a user in the front end at any given
time.
An interface is known uniquely by it's ID, and can be associated with
only one Device.
A device can contain at least an order of 100 interfaces.
I was going through MongoDB documentation wherein they mention things relating to Embedded document vs. multiple collections. By no means am I having a detailed clarity over this as I've just started with Mongo and meteor.
Question is, what could seemingly be a better approach? Having multiple small collections or having one big embedded collection. I know this question is somewhat subjective, I just need some clarity from folks who have more expertise in this field.
Another question is, suppose I go with the embedded model, is there a way to update only a part of the document (specific to the interface alone) so that as and when itf is added, it can be inserted into the same device document?
It depends on the purpose of the application.
Big document
A good example on where you'd want a big embedded collection would be if you are not going to modify (normally) the data but you're going to query them a lot. In my application I use this for storing pre-processed trips with all the information. Therefore when someone wants to consult this trip, all the information is located in a single document. However if your query is based on a value that is embedded in a trip, inside a list this would be very slow. If that's the case I'd recommend creating another collection with a relation between both collections. Also for updating part of a document it would be slow since it would require you to fetch the whole document and then update it.
Small documents with relations
If you plan on modify the data a lot, I'd recommend you to stick to a reference to another collection. With small documents, this will allow you to update any collection quicker. If you want to model a unique relation you may consider using a unique index in mongo. This can be done using: db.members.createIndex( { "user_id": 1 }, { unique: true } ).
Therefore:
Big object: Great for querying data but slow for complex queries.
Small related collections: Great for updating but requires several queries on distinct collections.

Decide create multiple database and relational in cloudant

I'm currently learning NOSQL cloudant and trying to design the database, as I'm learning I'm cloudant treat all records as a documents and using denormalise for table. so I'm currently a bit confuse to how to decide which one need to be in one document and which one is need to be separated.
Below are my test cases :
let's say i'm designing store book tables structure, for simplicity I'll be having this tables BOOK, STORE, STORE_BRANCH
BOOK field : _id, book_name, author
STORE field : _id, store_name
STORE_BRANCH field : _id, store_branch_name, address, store_id_fk
with above case, I not able to decide where should i put the "price" field to ? as for normal RDBMS i will just create another table and having fields : ( store book_id, store_branch_id and prices), this with the assumption the price of the book is different for each branch. so i wondering how I put this in cloudant ?
any suggestion is appreciated
Your doubts are pretty common for RDMBS user.
In NoSQL generally you use the everything-in-one-document approach. In fact, in some cases, approximating JOINs in a document-oriented database like Cloudant is outright trivial. For example if you want to model a one-to-n relationship, you can put all n-related documents into the document they belong to. In your case you should put all the store_branch in the related store. This strategy is OK if:
The document does not get so big that it impairs performance. This can be mitigated somewhat by using database views or show functions.
The information in the inner document only appears there and does not need to be duplicated into other documents, or such duplication is acceptable for your application.
The document does not get updated concurrently. If it does, there will likely be unnecessary conflicts that will need to be resolved by the application.
If the above strategy is not applicable you can use an approach that more closely mimics how you solve this problem in a relational database: you can create a document for each "relational table". In your case you should create a document having fields : ( store book_id, store_branch_id and prices), too.
This Cloudant article explains very deeply these possibilities: Cloudant - Join the fun, have fun with JOINs.

mongodb scheme database design

I have users table like:
{ _id: kshjfhsf098767, email: email#something name: John joshua }
{ _id: dleoireofd9888, email: email#hhh name: Terry Holdman }
And I have other collection "game"
{_id: gsgrfsdgf8898, home_user_id: kshjfhsf098767, guest_user_id: dleoireofd9888, result: "0:1"}
Then what I want is to join (like it was in mysql), game two times with users with because I know home_user_id and guest_user_id and take name email etc.
I could place all of that in table game but that will be duplicated content. and if they change name or email I need to update whole game table....
Any help on design and query to call that game with two users that are playing game would be great...Tnx
There are two ways to manage this, manually or using a DBRef. From the preceeding documentation link:
MongoDB does not support joins. In MongoDB some data is “denormalized,” or stored with related data in documents to remove the need for joins. However, in some cases it makes sense to store related information in separate documents, typically in different collections or databases.
So it is a case of mange the link yourself or use the built-in DBRef. For the DBRef case see How to query mongodb with DBRef
Alternatively, it may be easier to manage with a different schema design. For example the game collection could just store the result and game_id and instead add the game_id reference to each of the relevant users. Of course you will still need to query both collections and the linked SO question has an example of how to do this.
MongoDB has no JOINs (NoSQL).
Just do a lazy join here where by you query your user row and then query all games that user is a part of. It will be ultra fast with the right indexes and since the commands would be two small ones MongoDB would barely notice them.
I would not recommend embedding here. Taking the reason you state, for example, that will make the data a pain to update across the 100's of users that could be in a single game "room". In this case it is better to do a single atomic update even if it means you have to put a little overhead on querying another collection.

MongoDB - How to Handle Relationship

I just start learning about nosql database, specially MongoDB (no specific reason for mongodb). I browse few tutorial sites, but still cant figure out, how it handle relationship between two documents/entity
Lets say for example:
1. One Employee works in one department
2. One Employee works in many department
I dont know the term 'relationship' make sense for mongodb or not.
Can somebody please give something about joins, relationship.
The short answer: with "nosql" you wouldn't do it that way.
What you'd do instead of a join or a relationship is add the departments the user is in to the user object.
You could also add the user to a field in the "department" object, if you needed to see users from that direction.
Denormalized data like this is typical in a "nosql" database.
See this very closely related question: How do I perform the SQL Join equivalent in MongoDB?
in general, you want to denormalize your data in your collections (=tables). Your collections should be optimized so that you don't need to do joins (joins are not possible in NoSQL).
In MongoDB you can either reference other collections (=tables), or you can embed them into each other -- whatever makes more sense in your domain. There are size limits to entries in a collection, so you can't just embed the encyclopedia britannica ;-)
It's probably best if you look for API documentation and examples for the programming language of your choice.
For Ruby, I'd recommend the Mondoid library: http://mongoid.org/docs/relations.html
Generally, if you decided to learn about NoSql databases you should follow the "NoSql way", i.e. learn the principles beyond the movement and the approach to design and not simply try to map RDBMS to your first NoSql project.
Simply put - you should learn how to embed and denormalize data (like Will above suggested), and not simply copy the id to simulate foreign keys.
If you do this the "foreign _id way", next step is to search for transactions to ensure that two "rows" are consistently inserted/updated. Few steps after Oracle/MySql is waiting. :)
There are some instances in which you want/need to keep the documents separate in which case you would take the _id from the one object and add it as a value in your other object.
For Example:
db.authors
{
_id:ObjectId(21EC2020-3AEA-1069-A2DD-08002B30309D)
name:'George R.R. Martin'
}
db.books
{
name:'A Dance with Dragons'
authorId:ObjectId(21EC2020-3AEA-1069-A2DD-08002B30309D)
}
There is no official relationship between books and authors its just a copy of the _id from authors into the authorId value in books.
Hope that helps.

Relations in Document-oriented database?

I'm interested in document-oriented databases, and I'd like to play with MongoDB. So I started a fairly simple project (an issue tracker), but am having hard times thinking in a non-relational way.
My problems:
I have two objects that relate to each other (e.g. issue = {code:"asdf-11", title:"asdf", reporter:{username:"qwer", role:"manager"}} - here I have a user related to the issue). Should I create another document 'user' and reference it in 'issue' document by its id (like in relational databases), or should I leave all the user's data in the subdocument?
If I have objects (subdocuments) in a document, can I update them all in a single query?
I'm totally new to document-oriented databases, and right now I'm trying to develop sort of a CMS using node.js and mongodb so I'm facing the same problems as you.
By trial and error I found this rule of thumb: I make a collection for every entity that may be a "subject" for my queries, while embedding the rest inside other objects.
For example, comments in a blog entry can be embedded, because usually they're bound to the entry itself and I can't think about a useful query made globally on all comments. On the other side, tags attached to a post might deserve their own collection, because even if they're bound to the post, you might want to reason globally about all the tags (for example making a list of trending topics).
In my mind this is actually pretty simple. Embedded documents can only be accessed via their master document. If you can envision a need to query an object outside the context of the master document, then don't embed it. Use a ref.
For your example
issue = {code:"asdf-11", title:"asdf", reporter:{username:"qwer", role:"manager"}}
I would make issue and reporter each their own document, and reference the reporter in the issue. You could also reference a list of issues in reporter. This way you won't duplicate reporters in issues, you can query them each separately, you can query reporter by issue, and you can query issues by reporter. If you embed reporter in issue, you can only query the one way, reporter by issue.
If you embed documents, you can update them all in a single query, but you have to repeat the update in each master document. This is another good reason to use reference documents.
The beauty of mongodb and other "NoSQL" product is that there isn't any schema to design. I use MongoDB and I love it, not having to write SQL queries and awful JOIN queries! So to answer your two questions.
1 - If you create multiple documents, you'll need make two calls to the DB. Not saying it's a bad thing but if you can throw everything into one document, why not? I recall when I used to use MySQL, I would create a "blog" table and a "comments" table. Now, I append the comments to the record in the same collection (aka table) and keep building on it.
2 - Yes ...
The schema design in Document-oriented DBs can seems difficult at first, but building my startup with Symfony2 and MongoDB I've found that the 80% of the time is just like with a relational DB.
At first, think it like a normal db:
To start, just create your schema as you would with a relational Db:
Each Entity should have his own Collection, especially if you'll need to paginate the documents in it.
(in Mongo you can somewhat paginate nested document arrays, but the capabilities are limited)
Then just remove overly complicated normalization:
do I need a separate category table? (simply write the category in a column/property as a string or embedded doc)
Can I store comments count directly as an Int in the Author collection? (then update the count with an event, for example in Doctrine ODM)
Embedded documents:
Use embedded documents only for:
clearness (nested documents like: addressInfo, billingInfo in the User collection)
to store tags/categories ( eg: [ name: "Sport", parent: "Hobby", page: "/sport"
] )
to store simple multiple values (for eg. in User collection: list of specialties, list of personal websites)
Don't use them when:
the parent Document will grow too large
when you need to paginate them
when you feel the entity is important enough to deserve his own collection
Duplicate values across collection and precompute counts:
Duplicate some columns/attributes values from a Collection to another if you need to do a query with each values in the where conditions. (remember there aren't joins)
eg: In the Ticket collection put also the author name (not only the ID)
Also if you need a counter (number of tickets opened by user, by category, ecc), precompute them.
Embed references:
When you have a One-to-Many or Many-to-Many reference, use an embedded array with the list of the referenced document ids (see MongoDB DB Ref).
You'll need to use an Event again to remove an id if the referenced document get deleted.
(There is an extension for Doctrine ODM if you use it: Reference Integrity)
This kind of references are directly managed by Doctrine ODM: Reference Many
Its easy to fix errors:
If you find late that you have made a mistake in the schema design, its quite simply to fix it with few lines of Javascript to run directly in the Mongo console.
(stored procedures made easy: no need of complex migration scripts)
Waring: don't use Doctrine ODM Migrations, you'll regret that later.
Redid this answer since the original answer took the relation the wrong way round due to reading incorrectly.
issue = {code:"asdf-11", title:"asdf", reporter:{username:"qwer", role:"manager"}}
As to whether embedding some important information about the user (creator) of the ticket is a wise decision or not depends upon the system specifics.
Are you giving these users the ability to login and report issues they find? If so then it is likely you might want to factor that relation off to a user collection.
On the other hand, if that is not the case then you could easily get away with this schema. The one problem I see here is if you wish to contact the reporter and their job role has changed, that's somewhat awkward; however, that is a real world dilemma, not one for the database.
Since the subdocument represents a single one-to-one relation to a reporter you also should not suffer fragmentation problems mentioned in my original answer.
There is one glaring problem with this schema and that is duplication of changing repeating data (Normalised Form stuff).
Let's take an example. Imagine you hit the real world dilemma I spoke about earlier and a user called Nigel wants his role to reflect his new job position from now on. This means you have to update all rows where Nigel is the reporter and change his role to that new position. This can be a lengthy and resource consuming query for MongoDB.
To contradict myself again, if you were to only have maybe 100 tickets (aka something manageable) per user then the update operation would likely not be too bad and would, in fact, by manageable for the database quite easily; plus due to the lack of movement (hopefully) of the documents this would be a completely in place update.
So whether this should be embedded or not depends heavily upn your querying and documents etc, however, I would say this schema isn't a good idea; specifically due to the duplication of changing data across many root documents. Technically, yes, you could get away with it but I would not try.
I would instead split the two out.
If I have objects (subdocuments) in a document, can I update them all in a single query?
Just like the relation style in my original answer, yes and easily.
For example, let's update the role of Nigel to MD (as hinted earlier) and change the ticket status to completed:
db.tickets.update({'reporter.username':'Nigel'},{$set:{'reporter.role':'MD', status: 'completed'}})
So a single document schema does make CRUD easier in this case.
One thing to note, stemming from your English, you cannot use the positional operator to update all subdocuments under a root document. Instead it will update only the first found.
Again hopefully that makes sense and I haven't left anything out. HTH
Original Answer
here I have a user related to the issue). Should I create another document 'user' and reference it in 'issue' document by its id (like in relational databases), or should I leave all the user's data in the subdocument?
This is a considerable question and requires some background knowledge before continuing.
First thing to consider is the size of a issue:
issue = {code:"asdf-11", title:"asdf", reporter:{username:"qwer", role:"manager"}}
Is not very big, and since you no longer need the reporter information (that would be on the root document) it could be smaller, however, issues are never that simple. If you take a look at the MongoDB JIRA for example: https://jira.mongodb.org/browse/SERVER-9548 (as a random page that proves my point) the contents of a "ticket" can actually be quite considerable.
The only way you would gain a true benefit from embedding the tickets would be if you could store ALL user information in a single 16 MB block of contigious sotrage which is the maximum size of a BSON document (as imposed by the mongod currently).
I don't think you would be able to store all tickets under a single user.
Even if you was to shrink the ticket to, maybe, a code, title and a description you could still suffer from the "swiss cheese" problem caused by regular updates and changes to documents in MongoDB, as ever this: http://www.10gen.com/presentations/storage-engine-internals is a good reference for what I mean.
You would typically witness this problem as users add multiple tickets to their root user document. The tickets themselves will change as well but maybe not in a drastic or frequent manner.
You can, of course, remedy this problem a bit by using power of 2 sizes allocation: http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes which will do exactly what it says on the tin.
Ok, hypothetically, if you were to only have code and title then yes, you could store the tickets as subdocuments in the root user without too many problems, however, this is something that comes down to specifics that the bounty assignee has not mentioned.
If I have objects (subdocuments) in a document, can I update them all in a single query?
Yes, quite easily. This is one thing that becomes easier with embedding. You could use a query like:
db.users.update({user_id:uid,'tickets.code':'asdf-1'}, {$set:{'tickets.$.title':'Oh NOES'}})
However, to note, you can only update ONE subdocument at a time using the positional operator. As such this means you cannot, in a single atomic operation, update all ticket dates on a single user to 5 days in the future.
As for adding a new ticket, that is quite simple:
db.users.update({user_id:uid},{$push:{tickets:{code:asdf-1,title:"Whoop"}}})
So yes, you can quite simply, depending on your queries, update the entire users data in a single call.
That was quite a long answer so hopefully I haven't missed anything out, hope it helps.
I like MongoDB, but I have to say that I will use it a lot more soberly in my next project.
Specifically, I have not had as much luck with the Embedded Document facility as people promise.
Embedded Document seems to be useful for Composition (see UML Composition), but not for aggregation. Leaf nodes are great, anything in the middle of your object graph should not be an embedded document. It will make searching and validating your data more of a struggle than you'd want.
One thing that is absolutely better in MongoDB is your many-to-X relationships. You can do a many-to-many with only two tables, and it's possible to represent a many-to-one relationship on either table. That is, you can either put 1 key in N rows, or N keys in 1 row, or both. Notably, queries to accomplish set operations (intersection, union, disjoint set, etc) are actually comprehensible by your coworkers. I have never been satisfied with these queries in SQL. I often have to settle for "two other people will understand this".
If you've ever had your data get really big, you know that inserts and updates can be constrained by how much the indexes cost. You need fewer indexes in MongoDB; an index on A-B-C can be used to query for A, A & B, or A & B & C (but not B, C, B & C or A & C). Plus the ability to invert a relationship lets you move some indexes to secondary tables. My data hasn't gotten big enough to try, but I'm hoping that will help.