How to write union queries in mongoDB - mongodb

Is it possible to write union queries in MongoDB using 2 or more collections, similar to SQL queries?
I'm using spring mongo template and in my use case, I need to fetch the data from 3-4 collections based on some conditions. Can we achieve this in a single operation?
For example, I have a field named "circuitId" which is present in all 4 collections. And I need to fetch all records from all 4 collections for which that field matches with a given value.

Doing unions in MongoDB in a 'SQL UNION' fashion is possible using aggregations along with lookups, in a single query.
Something like this:
db.getCollection("AnyCollectionThatContainsAtLeastOneDocument").aggregate(
[
{ $limit: 1 }, // Reduce the result set to a single document.
{ $project: { _id: 1 } }, // Strip all fields except the Id.
{ $project: { _id: 0 } }, // Strip the id. The document is now empty.
// Lookup all collections to union together.
{ $lookup: { from: 'collectionToUnion1', pipeline: [...], as: 'Collection1' } },
{ $lookup: { from: 'collectionToUnion2', pipeline: [...], as: 'Collection2' } },
{ $lookup: { from: 'collectionToUnion3', pipeline: [...], as: 'Collection3' } },
// Merge the collections together.
{
$project:
{
Union: { $concatArrays: ["$Collection1", "$Collection2", "$Collection3"] }
}
},
{ $unwind: "$Union" }, // Unwind the union collection into a result set.
{ $replaceRoot: { newRoot: "$Union" } } // Replace the root to cleanup the resulting documents.
]);
Here is the explanation of how it works:
Start an aggregation on any collection of your database that has at least one document in it. If you can't guarantee that such a collection exists, you can work around this by creating a 'dummy' collection containing a single empty document, kept specifically for union queries.
Make the first stage of your pipeline { $limit: 1 }. This drops every document of the collection except the first one.
Strip all the fields of the remaining document by using $project stages:
{ $project: { _id: 1 } },
{ $project: { _id: 0 } }
Your aggregate now contains a single, empty document. It's time to add lookups for each collection you want to union together. You may use the pipeline field to do some specific filtering, or leave localField and foreignField as null to match the whole collection.
{ $lookup: { from: 'collectionToUnion1', pipeline: [...], as: 'Collection1' } },
{ $lookup: { from: 'collectionToUnion2', pipeline: [...], as: 'Collection2' } },
{ $lookup: { from: 'collectionToUnion3', pipeline: [...], as: 'Collection3' } }
You now have an aggregate containing a single document that contains 3 arrays like this:
{
Collection1: [...],
Collection2: [...],
Collection3: [...]
}
You can then merge them together into a single array using a $project stage along with the $concatArrays aggregation operator:
{
"$project" :
{
"Union" : { $concatArrays: ["$Collection1", "$Collection2", "$Collection3"] }
}
}
You now have an aggregate containing a single document whose Union array holds the union of your collections. What remains is to add an $unwind and a $replaceRoot stage to split that array into separate documents:
{ $unwind: "$Union" },
{ $replaceRoot: { newRoot: "$Union" } }
Voilà. You now have a result set containing the collections you wanted to union together. You can then add more stages to filter it further, sort it, or apply $skip and $limit. Pretty much anything you want.
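For the circuitId case from the question, the stages above could be assembled programmatically. This is only a minimal sketch: buildLookupUnion and the collection names are hypothetical, and it assumes MongoDB 3.6 or later, where $lookup accepts a pipeline without localField and foreignField.

```javascript
// Hypothetical helper: builds the $lookup-based union pipeline described above.
// `collections` are the names to union; `filter` is applied inside each lookup.
function buildLookupUnion(collections, filter) {
  const lookups = collections.map((name, i) => ({
    $lookup: { from: name, pipeline: [{ $match: filter }], as: `Collection${i + 1}` }
  }));
  const arrays = collections.map((_, i) => `$Collection${i + 1}`);
  return [
    { $limit: 1 },            // keep a single seed document
    { $project: { _id: 1 } }, // strip every field except _id
    { $project: { _id: 0 } }, // strip _id: the document is now empty
    ...lookups,               // one filtered array per collection
    { $project: { Union: { $concatArrays: arrays } } },
    { $unwind: "$Union" },
    { $replaceRoot: { newRoot: "$Union" } }
  ];
}

const pipeline = buildLookupUnion(
  ["collection1", "collection2", "collection3"],
  { circuitId: 12 }
);
// In the shell, this would be passed to:
// db.getCollection("AnyCollectionThatContainsAtLeastOneDocument").aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Filtering inside each $lookup keeps the intermediate single document small, which matters because that document is still subject to the 16 MB document size limit.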

Starting in MongoDB 4.4, the aggregation framework provides a $unionWith stage, which performs the union of two collections (it combines the pipeline results from two collections into a single result set).
Thus, in order to combine documents from 3 collections:
// > db.collection1.find()
// { "circuitId" : 12, "a" : "1" }
// { "circuitId" : 17, "a" : "2" }
// { "circuitId" : 12, "a" : "5" }
// > db.collection2.find()
// { "circuitId" : 12, "b" : "x" }
// { "circuitId" : 12, "b" : "y" }
// > db.collection3.find()
// { "circuitId" : 12, "c" : "i" }
// { "circuitId" : 32, "c" : "j" }
db.collection1.aggregate([
{ $match: { circuitId: 12 } },
{ $unionWith: { coll: "collection2", pipeline: [{ $match: { circuitId: 12 } }] } },
{ $unionWith: { coll: "collection3", pipeline: [{ $match: { circuitId: 12 } }] } }
])
// { "circuitId" : 12, "a" : "1" }
// { "circuitId" : 12, "a" : "5" }
// { "circuitId" : 12, "b" : "x" }
// { "circuitId" : 12, "b" : "y" }
// { "circuitId" : 12, "c" : "i" }
This:
First filters documents from collection1
Then includes documents from collection2 into the pipeline with the new $unionWith stage. The pipeline parameter is an optional aggregation pipeline applied on documents from the collection being merged before the merge happens.
And also includes documents from collection3 into the pipeline with the same $unionWith stage.
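To make the semantics concrete, here is a plain-JavaScript model of what that pipeline computes on the sample data. It is an illustration only and does not talk to MongoDB; the helper names matchCircuit and unionAll are made up for this sketch. It shows that the result behaves like SQL's UNION ALL: documents are appended, not de-duplicated.

```javascript
// In-memory stand-ins for the three collections from the example above.
const collection1 = [{ circuitId: 12, a: "1" }, { circuitId: 17, a: "2" }, { circuitId: 12, a: "5" }];
const collection2 = [{ circuitId: 12, b: "x" }, { circuitId: 12, b: "y" }];
const collection3 = [{ circuitId: 12, c: "i" }, { circuitId: 32, c: "j" }];

// Models { $match: { circuitId: id } } on one collection.
const matchCircuit = (docs, id) => docs.filter((d) => d.circuitId === id);

// Models $match followed by the $unionWith stages:
// filter each input, then append the results to one another.
function unionAll(id, ...collections) {
  return collections.flatMap((docs) => matchCircuit(docs, id));
}

const result = unionAll(12, collection1, collection2, collection3);
console.log(result.length); // 5
console.log(result);
```

The model appends results in argument order, but the real stage leaves the output order unspecified, so add a $sort stage if the order matters.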

Unfortunately, document-based MongoDB doesn't support joins/unions the way relational DB engines do.
One of the key design principles of MongoDB is to avoid joins by using embedded documents that follow your application's data fetch patterns.
Having said that, you will need to manage the logic on the application side if you really need to use the 4 collections, or you may redesign your database as per MongoDB best practices.
For more info : https://docs.mongodb.com/master/core/data-model-design/

Related

MongoDB sort by value in embedded document array

I have a MongoDB collection of documents formatted as shown below:
{
"_id" : ...,
"username" : "foo",
"challengeDetails" : [
{
"ID" : ...,
"pb" : 30081,
},
{
"ID" : ...,
"pb" : 23995,
},
...
]
}
How can I write a find query for records that have a challengeDetails document with a matching ID, and sort them by the corresponding pb?
I have tried the following (this is using the NodeJS driver, which is why the projection syntax is weird):
const result = await collection
.find(
{ "challengeDetails.ID": challengeObjectID},
{
projection: {"challengeDetails.$": 1},
sort: {"challengeDetails.0.pb": 1}
}
)
This returns the correct records (documents with challengeDetails for only the matching ID) but they're not sorted.
I think this doesn't work because as the docs say:
When the find() method includes a sort(), the find() method applies the sort() to order the matching documents before it applies the positional $ projection operator.
But they don't explain how to sort after projecting. How would I write a query to do this? (I have a feeling aggregation may be required but am not familiar enough with MongoDB to write that myself)
You need to use aggregation to sort on an array element:
$unwind to deconstruct the array
$match to match the value
$sort for sorting
$group to reconstruct the array
Here is the code
db.collection.aggregate([
{ "$unwind": "$challengeDetails" },
{ "$match": { "challengeDetails.ID": 2 } },
{ "$sort": { "challengeDetails.pb": 1 } },
{
"$group": {
"_id": "$_id",
"username": { "$first": "$username" },
"challengeDetails": { $push: "$challengeDetails" }
}
}
])
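One caveat with ending on $group is that $group does not order its output documents, so the order produced by the earlier $sort is not guaranteed to survive. A possible alternative, sketched below, filters the array in place and sorts last. The helper name buildSortByPbPipeline and the temporary field sortPb are hypothetical, and the sketch assumes MongoDB 3.4 or later for $addFields.

```javascript
// Hypothetical helper: returns a pipeline that keeps only the matching
// challengeDetails entries and sorts whole documents by the matching pb.
function buildSortByPbPipeline(challengeId) {
  return [
    // Select documents that contain the challenge at all.
    { $match: { "challengeDetails.ID": challengeId } },
    // Reduce the array to the entries whose ID matches.
    {
      $addFields: {
        challengeDetails: {
          $filter: {
            input: "$challengeDetails",
            as: "c",
            cond: { $eq: ["$$c.ID", challengeId] }
          }
        }
      }
    },
    // Copy the pb of the first matching entry to a top-level sort key.
    { $addFields: { sortPb: { $arrayElemAt: ["$challengeDetails.pb", 0] } } },
    { $sort: { sortPb: 1 } },
    // Drop the temporary sort key from the output.
    { $project: { sortPb: 0 } }
  ];
}

const sortPipeline = buildSortByPbPipeline(2);
// With the NodeJS driver: await collection.aggregate(sortPipeline).toArray()
console.log(JSON.stringify(sortPipeline, null, 2));
```

Because $sort is the last ordering stage, the documents come back sorted by the pb of the matching challenge, each with challengeDetails reduced to the matching entries.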

Using "$count" Within an "addField" Operation in MongoDB Aggregation

I am trying to find the correct combination of aggregation operators to add a field titled "totalCount" to my mongoDB view.
This will get me the count at this particular stage of the aggregation pipeline and output this as the result of a count on each of the documents:
{
$count: "count"
}
But I then end up with one document with this result, rather than what I'm trying to accomplish, which is to make this value print out as an addedField that is a field/value on all of the documents, or even better, a value that prints in addition to the returned documents.
I've tried this, but it gives me the error "Unrecognized expression '$count'":
{
$addFields: {
"totalCount" : { $count: "totalCount" }
}
}
What would the correct syntactical construction be for this? Is it possible to do it this way, or do I need to use $sum, or some other operator to make this work? I also tried this:
{
$addFields: {
"totalCount" : { $sum: { _id: 1 } }
}
},
... but while it doesn't give me any errors, it just prints 0 as the value for that field on every document rather than the total count of all documents.
The total count will always be a one-document result, so you need $facet to run multiple aggregation pipelines and then merge the results. Let's say your regular pipeline contains a simple $project and you want to merge its results with $count. You can run the aggregation below:
db.col.aggregate([
{
$facet: {
totalCount: [
{ $count: "value" }
],
pipelineResults: [
{
$project: { _id: 1 } // your regular aggregation pipeline here
}
]
}
},
{
$unwind: "$pipelineResults"
},
{
$unwind: "$totalCount"
},
{
$replaceRoot: {
newRoot: {
$mergeObjects: [ "$pipelineResults", { totalCount: "$totalCount.value" } ]
}
}
}
])
After $facet stage you'll get single document like this
{
"totalCount" : [
{
"value" : 3
}
],
"pipelineResults" : [
{
"_id" : ObjectId("5b313241120e4bc08ce87e46")
},
//....
]
}
Then you have to use $unwind to transform arrays into multiple documents and $replaceRoot with $mergeObjects to promote regular pipeline results into root level.
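The shape of the data after each of those stages can be traced with a plain-JavaScript model. This is only an illustration with made-up sample documents; it does not run against MongoDB.

```javascript
// In-memory model of the $facet approach (plain JavaScript, no MongoDB needed).
const docs = [{ _id: 1 }, { _id: 2 }, { _id: 3 }];

// $facet: both sub-pipelines see the same input and each yields an array.
const faceted = {
  totalCount: [{ value: docs.length }],
  pipelineResults: docs.map((d) => ({ _id: d._id }))
};

// The two $unwind stages: one output document per pipelineResults element,
// each paired with the single totalCount element.
const unwound = faceted.pipelineResults.flatMap((r) =>
  faceted.totalCount.map((t) => ({ pipelineResults: r, totalCount: t }))
);

// $replaceRoot + $mergeObjects: promote the result and attach the count.
const merged = unwound.map((u) => ({ ...u.pipelineResults, totalCount: u.totalCount.value }));

console.log(merged);
```

Every output document ends up carrying the same totalCount value next to its own fields.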
Since MongoDB version 5.0 there is another option that avoids the main disadvantage of $facet, which is the grouping of all returned documents into one big document. The concern is that a document has a size limit of 16 MB. Using $setWindowFields avoids this concern.
This can simply replace @micki's 4 steps:
db.col.aggregate([
{$setWindowFields: {output: {totalCount: {$count: {}}}}}
])

Remove duplicate in MongoDB

I have a collection with the field called "contact_id".
In my collection I have duplicate registers with this key.
How can I remove duplicates, resulting in just one register?
I already tried:
db.PersonDuplicate.ensureIndex({"contact_id": 1}, {unique: true, dropDups: true})
But it did not work, because the dropDups option is no longer available in MongoDB 3.x.
I'm using 3.2.
Yes, dropDups is gone for good. But you can definitely achieve your goal with a little bit of effort.
You first need to find all duplicate documents and then remove all except the first one.
db.dups.aggregate([{$group:{_id:"$contact_id", dups:{$push:"$_id"}, count: {$sum: 1}}},
{$match:{count: {$gt: 1}}}
]).forEach(function(doc){
doc.dups.shift();
db.dups.remove({_id : {$in: doc.dups}});
});
As you can see, doc.dups.shift() removes the first _id from the array, and the remove() call then deletes all documents with the remaining _ids in the dups array.
The script above will remove all duplicate documents.
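The keep-the-first, drop-the-rest logic of that script can be checked in plain JavaScript before running it against real data. The sample documents and the helper name idsToRemove below are made up for illustration.

```javascript
// In-memory model of the $group + shift() approach (sample data is made up).
const people = [
  { _id: 1, contact_id: "a" },
  { _id: 2, contact_id: "b" },
  { _id: 3, contact_id: "a" },
  { _id: 4, contact_id: "a" }
];

function idsToRemove(docs) {
  // Models $group: collect the _id values for each contact_id.
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.contact_id)) groups.set(doc.contact_id, []);
    groups.get(doc.contact_id).push(doc._id);
  }
  // Models $match { count: { $gt: 1 } } and dups.shift():
  // keep the first id of each duplicated group, return the rest.
  return [...groups.values()]
    .filter((ids) => ids.length > 1)
    .flatMap((ids) => ids.slice(1));
}

const doomed = idsToRemove(people);
const remaining = people.filter((p) => !doomed.includes(p._id));
console.log(doomed); // [ 3, 4 ]
console.log(remaining.map((p) => p._id)); // [ 1, 2 ]
```

Once the duplicates are gone, a unique index such as db.PersonDuplicate.createIndex({ contact_id: 1 }, { unique: true }) can be created to prevent new ones.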
This is a good pattern for mongod 3+ that also ensures that you will not run out of memory, which can happen with really big collections. You can save this to a dedup.js file, customize it, and run it against your desired database with: mongo localhost:27017/YOURDB dedup.js
var duplicates = [];
db.runCommand(
{aggregate: "YOURCOLLECTION",
pipeline: [
{ $group: { _id: { DUPEFIELD: "$DUPEFIELD"}, dups: { "$addToSet": "$_id" }, count: { "$sum": 1 } }},
{ $match: { count: { "$gt": 1 }}}
],
allowDiskUse: true }
)
.result
.forEach(function(doc) {
doc.dups.shift();
doc.dups.forEach(function(dupId){ duplicates.push(dupId); })
})
printjson(duplicates); //optional print the list of duplicates to be removed
db.YOURCOLLECTION.remove({_id:{$in:duplicates}});
We can also use an $out stage to remove duplicates from a collection by replacing the content of the collection with only one occurrence per duplicate.
For instance, to only keep one element per value of x:
// > db.collection.find()
// { "x" : "a", "y" : 27 }
// { "x" : "a", "y" : 4 }
// { "x" : "b", "y" : 12 }
db.collection.aggregate([
{ $group: { _id: "$x", onlyOne: { $first: "$$ROOT" } } },
{ $replaceWith: "$onlyOne" }, // prior to 4.2: { $replaceRoot: { newRoot: "$onlyOne" } }
{ $out: "collection" }
])
// > db.collection.find()
// { "x" : "a", "y" : 27 }
// { "x" : "b", "y" : 12 }
This:
$groups documents by the field defining what a duplicate is (here x) and accumulates grouped documents by only keeping one (the $first found) and giving it the value $$ROOT, which is the document itself. At the end of this stage, we have something like:
{ "_id" : "a", "onlyOne" : { "x" : "a", "y" : 27 } }
{ "_id" : "b", "onlyOne" : { "x" : "b", "y" : 12 } }
$replaceWith replaces all existing fields in the input document with the content of the onlyOne field we've created in the $group stage, in order to get the original format back. At the end of this stage, we have something like:
{ "x" : "a", "y" : 27 }
{ "x" : "b", "y" : 12 }
$replaceWith is only available starting in Mongo 4.2. With prior versions, we can use $replaceRoot instead:
{ $replaceRoot: { newRoot: "$onlyOne" } }
$out inserts the result of the aggregation pipeline in the same collection. Note that $out conveniently replaces the content of the specified collection, making this solution possible.
Maybe it would be worth trying to create a temporary collection, create a unique index on it, then copy the data from the source, and as a last step swap the names?
Another idea I had is to get the duplicated values into an array (using aggregation) and then loop through it, calling the remove() method with the justOne parameter set to true or 1.
var itemsToDelete = db.PersonDuplicate.aggregate([
{$group: { _id:"$contact_id", count:{$sum:1}}},
{$match: {count: {$gt:1}}},
{$group: { _id:1, ids:{$addToSet:"$_id"}}}
])
and then loop through the ids array, repeating until the aggregation returns nothing, since each pass removes only one copy per value.
Does this make sense for you?
I have used this approach:
Take the mongo dump of the particular collection.
Clear that collection
Add a unique key index
Restore the dump using mongorestore.

Aggregation in MongoDB, using unwind

I need to aggregate all tags from records like this:
https://gist.github.com/sbassi/5642925
(there are 2 sample records in this snippet) and sort them by size (the most frequent tag first). But I don't want to take into account data that have a specific "user_id" (let's say 2, 3, 6 and 12).
Here is my try (just the aggregation, without filtering and sorting):
db.user_library.aggregate( { $unwind : "$annotations.data.tags" }, {
$group : { _id : "$annotations.data.tags" ,totalTag : { $sum : 1 } } }
)
And I got:
{ "result" : [ ], "ok" : 1 }
Right now you can't unwind an array that is nested inside another array. See SERVER-6436
Consider structuring the data differently, having an array field with all tags for that document or possibly unwinding annotations and then unwinding annotations.data.tags in a stacked unwind like this:
db.user_library.aggregate([
{ $project: { 'annotations.data.tags': 1 } },
{ $unwind: '$annotations' },
{ $unwind: '$annotations.data.tags' },
{ $group: { _id: '$annotations.data.tags', totalTag: { $sum: 1 } } }
])
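The question also asked for excluding some users and sorting by frequency, which the pipeline above leaves out. A possible extension is sketched below. The helper name buildTagCountPipeline is hypothetical, and the sketch assumes that user_id is a top-level field of each document, which the original post does not confirm; adjust the path if it is nested.

```javascript
// Hypothetical helper: counts tags, skipping documents from excluded users.
// Assumes `user_id` is a top-level field; adjust the path if it is nested.
function buildTagCountPipeline(excludedUserIds) {
  return [
    // Drop the unwanted users before unwinding anything.
    { $match: { user_id: { $nin: excludedUserIds } } },
    // Stacked unwind: first the outer array, then the nested tags array.
    { $unwind: "$annotations" },
    { $unwind: "$annotations.data.tags" },
    // Count occurrences of each tag.
    { $group: { _id: "$annotations.data.tags", totalTag: { $sum: 1 } } },
    // Most frequent tag first.
    { $sort: { totalTag: -1 } }
  ];
}

const tagPipeline = buildTagCountPipeline([2, 3, 6, 12]);
// In the shell: db.user_library.aggregate(tagPipeline)
console.log(JSON.stringify(tagPipeline, null, 2));
```

Placing $match first means the excluded documents are never unwound, which keeps the intermediate result smaller.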

MongoDB Aggregation Framework - Dynamic Field Rename

I am finding the MongoDB aggregation framework to be extremely powerful - it seems like a good option to flatten out an object. My schema uses an array of sub-objects called materials. The number of materials is variable, but a specific field, category, will be unique across objects in the array. I would like to use the aggregation framework to flatten the structure and dynamically rename the fields based on the value of the category field. I could not find an easy way to accomplish this using a $project along with $cond. Is there a way?
The reason for the array of material objects is to allow simple searching:
e.g. { 'materials.name' : 'XYZ' } pulls back any document where "XYZ" is found.
E.g. of before and after document
{
"_id" : ObjectId("123456"),
"materials" : [
{
"name" : "XYZ",
"type" : "Red",
...
"category" : "A"
},
{
"name" : "ZYX",
"type" : "Blue",
...
"category" : "B"
}]
}
to
{
"material_A_name" : "XYZ",
"material_A_type" : "Red",
...
"material_B_name" : "ZYX",
"material_B_type" : "Blue",
...
}
There is a request for something like this in jira https://jira.mongodb.org/browse/SERVER-5947 - vote it up if you would like to have this feature.
Meanwhile, there is a work-around if you know up front what the possible values of the keys will be (i.e. all unique values of "category") and I have some sample code on it on my blog.
This approach should work from MongoDB v4.4:
The outer $map iterates over the materials array.
The inner $map iterates over the name and type fields after $objectToArray converts them to an array of key/value pairs, and builds each new key with $concat.
Back in the outer $map, $arrayToObject converts the result of the inner $map from an array back to an object.
$unwind deconstructs the materials array.
$group by null merges the materials objects into one object using $mergeObjects. Note that grouping by null merges the materials of every document in the collection; group by "$_id" instead to get one flattened object per document.
$replaceRoot promotes the merged object to the root.
db.collection.aggregate([
{
$project: {
materials: {
$map: {
input: "$materials",
as: "m",
in: {
$arrayToObject: [
{
$map: {
input: {
$objectToArray: {
name: "$$m.name",
type: "$$m.type"
}
},
in: {
k: { $concat: ["material", "_", "$$m.category", "_", "$$this.k"] },
v: "$$this.v"
}
}
}
]
}
}
}
}
},
{ $unwind: "$materials" },
{
$group: {
_id: null,
materials: { $mergeObjects: "$materials" }
}
},
{ $replaceRoot: { newRoot: "$materials" } }
])
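For comparison, the same flattening done on the application side is short. This is a plain-JavaScript sketch for a single document, with the hypothetical helper name flattenMaterials; unlike the pipeline above, it copies every field except category, not just name and type.

```javascript
// Plain-JavaScript equivalent of the flattening, applied to one document.
function flattenMaterials(doc) {
  const flat = {};
  for (const material of doc.materials) {
    // Separate the category from the fields that should be renamed.
    const { category, ...fields } = material;
    for (const [key, value] of Object.entries(fields)) {
      flat[`material_${category}_${key}`] = value;
    }
  }
  return flat;
}

const sample = {
  _id: "123456",
  materials: [
    { name: "XYZ", type: "Red", category: "A" },
    { name: "ZYX", type: "Blue", category: "B" }
  ]
};

console.log(flattenMaterials(sample));
// { material_A_name: 'XYZ', material_A_type: 'Red',
//   material_B_name: 'ZYX', material_B_type: 'Blue' }
```

Doing this in the application keeps the stored documents searchable by 'materials.name' while still producing the flat shape where it is needed.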