How to merge two collections in mongodb

I have two collections called Company_Details and Company_Ranks. Comp_ID is common to both collections. How do I merge these two collections to get the complete details of a company?
Please help me
Thanks
Satyam

To make a long story short, you either do that on the client side or consider the benefits of embedding those documents.
MongoDB does not support joins, as opposed to relational databases. This is both a pro and a con. It has helped MongoDB's developers focus on scalability, which is much harder to implement when you have joins and transactions.
You can follow the DBRef specification. Lots of drivers support DBRef and do the composition seamlessly for you. You can also do that manually. But most importantly, you can take advantage of embedding documents.
Embedding documents is an ability MongoDB has over relational databases: you can create one collection consisting of compound documents. You'll enjoy atomicity, as there is no "partial success", and data locality, since spinning disks are better at accessing data in sequence.
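For the client-side option, a minimal sketch in plain JavaScript might look like the following. Only Comp_ID comes from the question; the sample documents, the other field names and the mergeByCompId helper are hypothetical. In a real application the two arrays would be the results of querying Company_Details and Company_Ranks.

```javascript
// Hypothetical sample data; only Comp_ID is taken from the question.
const companyDetails = [
  { Comp_ID: 1, name: "Acme", city: "Pune" },
  { Comp_ID: 2, name: "Globex", city: "Delhi" },
];
const companyRanks = [
  { Comp_ID: 1, rank: 5 },
];

// Client-side merge: index the ranks by Comp_ID, then attach the rank
// to each details document.
function mergeByCompId(details, ranks) {
  const ranksById = new Map(ranks.map((r) => [r.Comp_ID, r]));
  return details.map((d) => {
    const match = ranksById.get(d.Comp_ID);
    // Keep the company even when it has no rank (left-outer-join behaviour).
    return { ...d, rank: match ? match.rank : null };
  });
}

const mergedCompanies = mergeByCompId(companyDetails, companyRanks);
console.log(mergedCompanies);
```

Building the Map first keeps the merge linear in the size of both arrays instead of scanning the ranks once per company.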

If querying is your motive and you don't want to change your schema, try Apache Drill, which allows you to query with SQL. You can then perform a full join, inner join, or whatever you need. Look into using Drill with MongoDB.

With MongoDB version 3.2 and higher we now have the $lookup stage, which is the "same" as a join in an RDBMS.
With that you can easily query across your two collections and get the information you want.
For further details, check out the documentation:
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
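Applied to the two collections from the question, a $lookup pipeline could be sketched roughly as below. The collection names and Comp_ID come from the question; the output field name "ranks" is an assumption chosen for illustration. The pipeline is built as a plain value so the snippet runs on its own; the shell call that would use it is shown as a comment.

```javascript
// Left outer join from Company_Details to Company_Ranks on Comp_ID.
// "ranks" is a hypothetical name for the array of matched documents.
const companyLookupPipeline = [
  {
    $lookup: {
      from: "Company_Ranks",     // collection to join with
      localField: "Comp_ID",     // field in Company_Details
      foreignField: "Comp_ID",   // field in Company_Ranks
      as: "ranks",               // output array field
    },
  },
];

// In the mongo shell this would be run as:
//   db.Company_Details.aggregate(companyLookupPipeline)
console.log(JSON.stringify(companyLookupPipeline, null, 2));
```

Each resulting document keeps its Company_Details fields and gains a "ranks" array, which is empty when no rank document matches.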

Related

Are there technical downsides to using a single collection over multiple collections in MongoDB?

Since MongoDB is schemaless, I could just drop all my documents into a single collection, with a key collection and an index on that key.
For example this:
db.getCollection('dogs').find()
db.getCollection('cars').find()
Would become this:
db.getCollection('all').find({'collection': 'dogs'})
db.getCollection('all').find({'collection': 'cars'})
Is there any technical downside to doing this?

There are multiple reasons to have different collections; the two most important are probably:
Performance: even though MongoDB has been designed to be flexible, you still need indexes on the fields that will be used in searches. Response times can degrade dramatically if the collection is too heterogeneous.
Maintainability/evolvability: the design should be driven by the use cases (usually you'll store the data as it's received by the application), and it should be explicit to anyone looking at the database collections.
MongoDB University is a great e-learning platform. It is free, and this course in particular is relevant:
M320: Data Modeling

Schema questions are often better understood by working backwards from the queries you'll rely on and how the data will get written. If you are going to query Field1 AND Field2 together in one query statement, you do want them in the same collection. Dogs and cars don't sound very related, while dogs and cats do, so really look at how you're going to want to query. Joining collections is doable via $lookup, but it is not ideal.

MongoDB $lookup: Limitations & Usage

With the new aggregation pipeline stage $lookup we are now able to perform 'left outer joins'.
At first glance, I want to immediately replace one of our denormalised collections with two separate collections and use the $lookup to join them upon querying. This will solve the problem of having, when necessary, to update a huge number of documents. Now we can update just one document.
But surely this is too good to be true? This is a NoSQL, document database after all!
MongoDB's CTO also highlights his concerns:
We’re still concerned that $lookup can be misused to treat MongoDB
like a relational database. But instead of limiting its availability,
we’re going to help developers know when its use is appropriate, and
when it’s an anti-pattern. In the coming months, we will go beyond the
existing documentation to provide clear, strong guidance in this area.
What are the limitations of $lookup? Can I use them in real-time, operational querying of our data or should they be left for reporting, offline situations?

I share your same enthusiasm for $lookup.
I think there are trade-offs. One of the major concerns of SQL databases (and which is one of the reasons for the genesis of NoSQL) is that at large scale, joins can take a lot of time (well, relatively speaking).
It definitely helps in giving you a declarative model for your data, but if you start to model your entire NoSQL database as though it's a database of rows and tables (just using refs, for example), then you are, to a degree, treating it as though it's simply a SQL database. Even MongoDB mentioned it (as you quoted in your question):
We’re still concerned that $lookup can be misused to treat MongoDB like a relational database.
You mentioned:
This will solve the problem of having, when necessary, to update a huge number of documents. Now we can update just one document.
I'm not sure what your collections look like exactly, but that definitely sounds like it could be a good use for $lookup.
Can I use them in real-time, operational querying
I would say, again, it depends on your use-case. You'll have to compare:
Desired semantics of your queries (declarative vs imperative)
Whether modeling your data as more relational (and thus using $lookup) in certain circumstances is worth the potential trade-off in computational time (that's assuming that querying across collections is even something to be concerned about, computationally speaking)
etc...
I'm sure in the coming months we'll see perf tests of the "left outer joins" and perhaps MongoDB will start writing some posts about when $lookup is an antipattern.
Hope this answer helps add to the discussion.

First of all, MongoDB is a document-based database and always will be. The $lookup aggregation pipeline stage, new in version 3.2, didn't turn MongoDB into a relational database (RDBMS), as MongoDB's CTO mentioned:
We’re still concerned that $lookup can be misused to treat MongoDB like a relational database.
The first limitation of $lookup as mentioned in the documentation is that it:
Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing.
Which means that you can't use it with a sharded collection.
Also, the $lookup operator doesn't work directly with an array, as mentioned in another post; you will therefore need a preliminary $unwind stage to denormalize the localField if it is an array.
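The $unwind-then-$lookup pattern described above can be sketched as follows. The collections "orders" and "items" and the fields "item_ids" and "item_docs" are hypothetical names used only for illustration. The pipeline is a plain value so the snippet runs without a server.

```javascript
// Assumed shape of an "orders" document: { _id: 1, item_ids: [10, 11] }
const unwindThenLookupPipeline = [
  // Emit one document per element of the item_ids array.
  { $unwind: "$item_ids" },
  // After $unwind, item_ids holds a single value that can be matched
  // against _id in the hypothetical "items" collection.
  {
    $lookup: {
      from: "items",
      localField: "item_ids",
      foreignField: "_id",
      as: "item_docs",
    },
  },
];

// In the mongo shell: db.orders.aggregate(unwindThenLookupPipeline)
console.log(JSON.stringify(unwindThenLookupPipeline, null, 2));
```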
Now you said:
This will solve the problem of having, when necessary, to update a huge number of documents.
This is a good idea if your data are updated more often than they are read, as mentioned in 6 Rules of Thumb for MongoDB Schema Design: Part 3, especially if you have large hierarchical data sets.
Denormalizing one or more fields makes sense if those fields are read much more often than they are updated.
I believe that with careful schema design you probably will not need the $lookup operator.

MongoDB cascade update/delete

I'm trying to use mongoDB with Morphia but still I have a problem with deleting documents. Is there any additional plugin or wrapper which works with Mongo and provides something like transactions in DBMS?

No, there are no (multi document) transactions. There are two possible solutions:
You can restructure your data into a single document instead of spreading it over multiple tables. Thus MongoDB's single document transactions (if you call them that) are enough for you. You can solve many problems with embedded entities or arrays. You might want to start a question related to "schema" design, if you're unsure how to approach this.
Your problem absolutely needs transactions across multiple documents / tables. Then MongoDB is simply not the right tool and you should use a relational database.
Don't fight the tool, pick the right one...

SQL view in mongodb

I am currently evaluating mongodb for a project I have started but I can't find any information on what the equivalent of an SQL view in mongodb would be. What I need, that an SQL view provides, is to lump together data from different tables (collections) into a single collection.
I want nothing more than to clump some documents together and label them as a single document. Here's an example:
I have the following documents:
cc_address
us_address
billing_address
shipping_address
But in my application, I'd like to see all of my addresses and be able to manage them in a single document.
In other cases, I may just want a couple of fields from collections:
I have the following documents:
fb_contact
twitter_contact
google_contact
reddit_contact
each of these documents have fields that align, like firstname lastname and email, but they also have fields that don't align. I'd like to be able to compile them into a single document that only contains the fields that align.
This can be accomplished by Views in SQL correct? Can I accomplish this kind of functionality in MongoDb?

The question is quite old already. However, since MongoDB v3.2 you can use $lookup to join data from different collections, as long as the collections are unsharded.
Since MongoDB v3.4 you can also create read-only views.
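For the contacts example in the question, a read-only view over the aligned fields might be sketched like this. It assumes MongoDB 4.4 or later, because it relies on the $unionWith stage; the view name "all_contacts" and the createAllContactsView helper are hypothetical. The fakeDb object is only a stand-in for the shell's db so the snippet runs without a server.

```javascript
// Keep only the fields that align across the contact collections.
const alignedFields = {
  $project: { _id: 0, firstname: 1, lastname: 1, email: 1 },
};

// Defines a view on fb_contact and unions in the other collections.
function createAllContactsView(db) {
  return db.createView("all_contacts", "fb_contact", [
    alignedFields,
    { $unionWith: { coll: "twitter_contact", pipeline: [alignedFields] } },
    { $unionWith: { coll: "google_contact", pipeline: [alignedFields] } },
    { $unionWith: { coll: "reddit_contact", pipeline: [alignedFields] } },
  ]);
}

// Stand-in for the shell's `db` object: it just records the call.
const recordedViewCalls = [];
const fakeDb = {
  createView: (...args) => {
    recordedViewCalls.push(args);
    return { ok: 1 };
  },
};

createAllContactsView(fakeDb);
console.log(JSON.stringify(recordedViewCalls[0], null, 2));
```

In the mongo shell you would pass the real db object instead of fakeDb and then query the view with db.all_contacts.find().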

There are no "joins" in MongoDB. As said by JonnyHK, you can either denormalize your data, use embedded documents, or perform multiple queries.
However, you could also use Map-Reduce.
Or, if you're prepared to use the development branch, you could test the new aggregation framework, though maybe it's too much? This new framework will be in the soon-to-be-released 2.2, which is production-ready, unlike 2.1.x.
Here's the SQL-Mongo chart also, which may be of some help in your learning.
Update: Based on your re-edit, you don't need Map-Reduce or the Aggregation Framework because you're just querying.
You're essentially doing joins, querying multiple documents and merging the results. The place to do this is within your application on the client-side.

MongoDB queries never span more than a single collection as there is no support for joins. So if you have related data you need available in the results of a query you must either add that related data to the collection you're querying (i.e. denormalize your data), or make a separate query for it from another collection.

I am currently evaluating mongodb for a project I have started but I
can't find any information on what the equivalent of an SQL view in
mongodb would be
In addition to this answer, MongoDB now has on-demand materialized views. In a nutshell, this feature allows you to use aggregate and $merge (in 4.2) to create or update a quick-view collection that you can query faster. The strategy is to update the quick-view collection whenever a record in the main collection changes. Unlike an SQL view, this has the side effect of increasing your data storage size, but the benefits can be huge depending on your querying needs.
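An on-demand materialized view of this kind could be sketched as the pipeline below. The collections "orders" and "customer_totals" and the fields "customer_id" and "amount" are hypothetical; only the aggregate-plus-$merge approach comes from the answer above.

```javascript
// Summarise a hypothetical "orders" collection per customer and write the
// result into a separate "customer_totals" collection.
const materializeTotalsPipeline = [
  { $group: { _id: "$customer_id", total: { $sum: "$amount" } } },
  {
    $merge: {
      into: "customer_totals",   // the quick-view collection
      on: "_id",                 // match existing rows by _id
      whenMatched: "replace",    // overwrite totals that already exist
      whenNotMatched: "insert",  // add totals for new customers
    },
  },
];

// In the mongo shell (MongoDB 4.2+), re-run whenever the view needs refreshing:
//   db.orders.aggregate(materializeTotalsPipeline)
console.log(JSON.stringify(materializeTotalsPipeline, null, 2));
```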

mongodb dbrefs examples using Java

I tried to find working examples of java/SpringData mongodb DBRefs but couldn't find any. I'm new to Mongodb and looking for ways to use SQL join-like functionality to aggregate/merge data from two mongo collections based on a common id.
Could someone point me in the right direction? Is application-level aggregating/merging the only solution, or the best one, with the Mongo/Java/Spring combination?

There is a significant difference between DBRefs and Joins.
If you have two collections that you are trying to join, it might be worth looking at your data model. It could be that you are using a relational modelling approach, which does not fit MongoDB well.
It is usually better to denormalize the dependent collection into the documents of the master collection.
Then you do not need to join at all and make the most of the document model.
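To illustrate denormalizing a dependent collection into its master documents, here is a minimal sketch in plain JavaScript (chosen to match the shell snippets elsewhere on this page rather than Java). The "authors" and "books" data, the author_id field and the embedBooks helper are all hypothetical.

```javascript
// Hypothetical master and dependent documents.
const authors = [
  { _id: 1, name: "Ann" },
  { _id: 2, name: "Bob" },
];
const books = [
  { _id: 10, author_id: 1, title: "A" },
  { _id: 11, author_id: 1, title: "B" },
];

// Group the dependent documents by their parent id and embed them as an
// array, dropping the now-redundant author_id reference.
function embedBooks(authorDocs, bookDocs) {
  const byAuthor = new Map();
  for (const { author_id, ...book } of bookDocs) {
    if (!byAuthor.has(author_id)) byAuthor.set(author_id, []);
    byAuthor.get(author_id).push(book);
  }
  return authorDocs.map((a) => ({ ...a, books: byAuthor.get(a._id) || [] }));
}

const embeddedAuthors = embedBooks(authors, books);
console.log(JSON.stringify(embeddedAuthors, null, 2));
```

Once stored in this shape, a single query on the master collection returns an author together with all of its books, so no join is needed.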