How to compare 2 collections in MongoDB and find the missing ids

I have two collections in MongoDB and want to compare them and get the documents that differ.
For example, collection A has the 5 documents below:
{
"Number" : "0000A95B"
}
{
"Number" : "0001385B"
}
{
"Number" : "0002195B"
}
{
"Number" : "0002E85B"
}
{
"Number" : "0002FC5B"
}
Collection B has the 3 documents below:
{
"Number" : "0000A95B"
}
{
"Number" : "0001385B"
}
{
"Number" : "0002195B"
}
I need a query to get the documents that are present in A but not in B.

You could use an aggregation query with a $lookup stage:
db.getCollection("collection_a").aggregate([{
$lookup: {
from: "collection_b",
localField: "Number",
foreignField: "Number",
as: "b_docs"
}
},{
$match: {
b_docs: {
$size: 0
}
}
}])
The $lookup stage performs a "join" of sorts between collection_a and collection_b: for each document in collection_a, the documents from collection_b with a matching Number value are added to the b_docs property as an array. If no matching document is found in collection_b, b_docs is an empty array, so a following $match stage that keeps only the results where the size of b_docs is 0 returns the documents that are present in A but missing from B.
I have not tested the above query, so you might want to try it out.
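To see what the pipeline is expected to return for the sample data in the question, here is a minimal plain-JavaScript sketch that emulates the two stages in memory. It does not talk to MongoDB, and the helper name lookupByNumber is made up for illustration.

```javascript
// Sample documents copied from the question.
const collectionA = [
  { Number: "0000A95B" },
  { Number: "0001385B" },
  { Number: "0002195B" },
  { Number: "0002E85B" },
  { Number: "0002FC5B" },
];
const collectionB = [
  { Number: "0000A95B" },
  { Number: "0001385B" },
  { Number: "0002195B" },
];

// Emulates the $lookup stage: each A document gets a b_docs array holding
// every B document with the same Number (hypothetical helper, not a driver API).
function lookupByNumber(aDocs, bDocs) {
  return aDocs.map((a) => ({
    ...a,
    b_docs: bDocs.filter((b) => b.Number === a.Number),
  }));
}

// Emulates { $match: { b_docs: { $size: 0 } } }: keep only unmatched A documents.
const missingInB = lookupByNumber(collectionA, collectionB)
  .filter((doc) => doc.b_docs.length === 0)
  .map((doc) => doc.Number);

console.log(missingInB); // [ '0002E85B', '0002FC5B' ]
```

If the real pipeline returns anything other than these two documents for the same data, the field names or collection names are the first thing to check.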

Related

How do I match an array of sub-documents in MongoDB?

Match documents if a value in an array of sub-documents is greater than some value only if the same document contains a field that is equal to some value
I have a collection that contains documents with an array of sub-documents. This array of sub-documents contains a field that dictates whether or not I can filter the documents in the collection based on another field in the sub-document. This'll make more sense when you see an example of the document.
{
"_id":"ObjectId('XXX')",
"Data":{
"A":"",
"B":"-25.78562 ; 28.35629",
"C":"165"
},
"SubDocuments":[
{
"_id":"ObjectId('XXX')",
"Data":{
"Value":"XXX",
"DataFieldId":"B"
}
},
{
"_id":"ObjectId('XXX')",
"Data":{
"Value":"",
"DataFieldId":"A"
}
},
{
"_id":"ObjectId('XXX')",
"Data":{
"Value":"105",
"DataFieldId":"Z"
}
}
]
}
I only want to match documents that contain sub-documents with a DataFieldId that is equal to Z but also filter for Values that are greater than 105 only if Data Field Id is equal to Z.
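Try the following (the numeric comparison assumes Data.Value is stored as a number, as in the response shown after it):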
db.collection.aggregate([
{
$project: {
_id:1,
Data:1,
filteredSubDocuments: {
$filter: {
input: "$SubDocuments",
as: "subDoc",
cond: {
$and: [
{ $eq: ["$$subDoc.Data.DataFieldId", "Z"] },
{ $gte: ["$$subDoc.Data.Value", 105] }
]
}
}
}
}
}
])
The resulting response will be:
{
"_id" : ObjectId("5cb09659952e3a179190d998"),
"Data" : {
"A" : "",
"B" : "-25.78562 ; 28.35629",
"C" : "165"
},
"filteredSubDocuments" : [
{
"_id" : "ObjectId('XXX')",
"Data" : {
"Value" : 105,
"DataFieldId" : "Z"
}
}
]
}
This can be done with the $elemMatch operator, which matches documents where a single element of the SubDocuments array satisfies all the given conditions. For your problem, you can try the query below, which is much simpler than an aggregation:
db.collectionName.find({
"SubDocuments": {
$elemMatch: {
"Data.DataFieldId": "Z" ,
"Data.Value" : {$gte: 105}
}
} })
It works fine; I have verified it locally. The one change required is that SubDocuments.Data.Value must be stored as a Number or Long, as per your requirements.
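The note about the value type matters because query comparison operators such as $gte only match values of a comparable BSON type, so a string "105" is not matched by { $gte: 105 }. The following plain-JavaScript sketch emulates that behavior in memory. matchesElemMatch is a hypothetical helper, not part of any MongoDB driver, and the sample documents are made up.

```javascript
// Emulates:
//   { SubDocuments: { $elemMatch: { "Data.DataFieldId": id, "Data.Value": { $gte: min } } } }
// A single array element must satisfy both conditions, and the numeric
// comparison only applies to values that are actually numbers.
function matchesElemMatch(doc, id, min) {
  return doc.SubDocuments.some(
    (sub) =>
      sub.Data.DataFieldId === id &&
      typeof sub.Data.Value === "number" &&
      sub.Data.Value >= min
  );
}

// Made-up documents for illustration.
const numericDoc = { SubDocuments: [{ Data: { Value: 105, DataFieldId: "Z" } }] };
const stringDoc = { SubDocuments: [{ Data: { Value: "105", DataFieldId: "Z" } }] };
const splitDoc = {
  SubDocuments: [
    { Data: { Value: 50, DataFieldId: "Z" } }, // right id, value too small
    { Data: { Value: 200, DataFieldId: "B" } }, // large value, wrong id
  ],
};

console.log(matchesElemMatch(numericDoc, "Z", 105)); // true
console.log(matchesElemMatch(stringDoc, "Z", 105)); // false
console.log(matchesElemMatch(splitDoc, "Z", 105)); // false
```

The last case shows why $elemMatch is used here: the two conditions are met by different array elements, so the document should not match.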

Doctrine ODM: create $lookup on aggregated field with aggregation builder

In a simplified data model, I have three types of documents: items, users and assignments. Users and items are stored in their own collections, while assignments are embedded in items. A sample item might look like this:
{
"_id" : ObjectId("xxx"),
"name" : "yyy",
"assignments" : [
{
"assignmentDate" : ISODate("2018-01-11T10:05:20.125Z"),
"user" : ObjectId("zzz"),
},
{
"assignmentDate" : ISODate("2018-01-12T10:05:20.125Z"),
"user" : ObjectId("iii"),
}
]
}
I would like to query all items that are currently assigned to a given user. This aggregation pipeline does the job:
db.Item.aggregate([
{
$addFields: {
currentAssignment: { $arrayElemAt: ['$assignments', -1] }
}
},
{
$lookup: {
from: 'User',
localField: 'currentAssignment.user',
foreignField: '_id',
as: 'currentUser'
}
},
{
$match: {
'currentUser.name': { $eq: 'admin' }
}
}
]);
How can I build this with the Doctrine ODM Aggregation Builder? The Stage::lookup method accepts only a from parameter. If I use it on a computed field from the aggregation pipeline (currentAssignment in this case), it results in:
arguments to $lookup must be strings, localField: null is type null
Other solutions (if possible even without aggregation?) for retrieving the described dataset are also welcome.
The Lookup stage has more methods, one of which is localField which sets the localField in the aggregation stage.

How to write union queries in mongoDB

Is it possible to write union queries in Mongo DB using 2 or more collections similar to SQL queries?
I'm using spring mongo template and in my use case, I need to fetch the data from 3-4 collections based on some conditions. Can we achieve this in a single operation?
For example, I have a field named "circuitId" which is present in all 4 collections. And I need to fetch all records from all 4 collections for which that field matches with a given value.
Doing unions in MongoDB in a 'SQL UNION' fashion is possible using aggregations along with lookups, in a single query.
Something like this:
db.getCollection("AnyCollectionThatContainsAtLeastOneDocument").aggregate(
[
{ $limit: 1 }, // Reduce the result set to a single document.
{ $project: { _id: 1 } }, // Strip all fields except the Id.
{ $project: { _id: 0 } }, // Strip the id. The document is now empty.
// Lookup all collections to union together.
{ $lookup: { from: 'collectionToUnion1', pipeline: [...], as: 'Collection1' } },
{ $lookup: { from: 'collectionToUnion2', pipeline: [...], as: 'Collection2' } },
{ $lookup: { from: 'collectionToUnion3', pipeline: [...], as: 'Collection3' } },
// Merge the collections together.
{
$project:
{
Union: { $concatArrays: ["$Collection1", "$Collection2", "$Collection3"] }
}
},
{ $unwind: "$Union" }, // Unwind the union collection into a result set.
{ $replaceRoot: { newRoot: "$Union" } } // Replace the root to cleanup the resulting documents.
]);
Here is the explanation of how it works:
Start an aggregation on any collection in your database that has at least one document in it. If you can't guarantee that such a collection exists, you can work around the issue by creating a 'dummy' collection that contains a single empty document and exists specifically for union queries.
Make { $limit: 1 } the first stage of your pipeline. This discards all the documents of the collection except the first one.
Strip all the fields of the remaining document by using $project stages:
{ $project: { _id: 1 } },
{ $project: { _id: 0 } }
Your aggregate now contains a single, empty document. It's time to add lookups for each collection you want to union together. You may use the pipeline field to do some specific filtering, or leave localField and foreignField as null to match the whole collection.
{ $lookup: { from: 'collectionToUnion1', pipeline: [...], as: 'Collection1' } },
{ $lookup: { from: 'collectionToUnion2', pipeline: [...], as: 'Collection2' } },
{ $lookup: { from: 'collectionToUnion3', pipeline: [...], as: 'Collection3' } }
You now have an aggregate containing a single document that contains 3 arrays like this:
{
Collection1: [...],
Collection2: [...],
Collection3: [...]
}
You can then merge them together into a single array using a $project stage along with the $concatArrays aggregation operator:
{
"$project" :
{
"Union" : { $concatArrays: ["$Collection1", "$Collection2", "$Collection3"] }
}
}
You now have an aggregate containing a single document that holds an array with your union of collections. What remains is to add an $unwind and a $replaceRoot stage to split that array into separate documents:
{ $unwind: "$Union" },
{ $replaceRoot: { newRoot: "$Union" } }
Voilà. You now have a result set containing the collections you wanted to union together. You can then add more stages to filter it further, sort it, or apply skip() and limit(); pretty much anything you want.
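The steps above can also be assembled programmatically. The sketch below builds the same stage list for an arbitrary set of collections and applies one filter inside each $lookup. buildUnionPipeline is a hypothetical helper, and the collection names and the circuitId value are placeholders borrowed from the examples in this thread.

```javascript
// Builds the $lookup-based union pipeline described above.
// `collections` lists the collections to union; `filter` is a $match
// expression applied inside each $lookup (hypothetical helper).
function buildUnionPipeline(collections, filter) {
  const lookups = collections.map((name, i) => ({
    $lookup: {
      from: name,
      pipeline: [{ $match: filter }],
      as: `Collection${i + 1}`,
    },
  }));
  const arrays = collections.map((_, i) => `$Collection${i + 1}`);
  return [
    { $limit: 1 }, // keep a single seed document
    { $project: { _id: 1 } }, // strip all fields except _id
    { $project: { _id: 0 } }, // strip _id: the document is now empty
    ...lookups, // one array field per collection
    { $project: { Union: { $concatArrays: arrays } } },
    { $unwind: "$Union" }, // one document per union member
    { $replaceRoot: { newRoot: "$Union" } },
  ];
}

const pipeline = buildUnionPipeline(
  ["collectionToUnion1", "collectionToUnion2", "collectionToUnion3"],
  { circuitId: 12 }
);
console.log(JSON.stringify(pipeline, null, 2));
```

In the shell, the resulting array would be passed to aggregate() on the seed collection, for example db.getCollection("AnyCollectionThatContainsAtLeastOneDocument").aggregate(pipeline).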
Starting in MongoDB 4.4, the aggregation framework provides a new $unionWith stage, which performs a union of two collections by combining the pipeline results from both into a single result set.
Thus, in order to combine documents from 3 collections:
// > db.collection1.find()
// { "circuitId" : 12, "a" : "1" }
// { "circuitId" : 17, "a" : "2" }
// { "circuitId" : 12, "a" : "5" }
// > db.collection2.find()
// { "circuitId" : 12, "b" : "x" }
// { "circuitId" : 12, "b" : "y" }
// > db.collection3.find()
// { "circuitId" : 12, "c" : "i" }
// { "circuitId" : 32, "c" : "j" }
db.collection1.aggregate([
{ $match: { circuitId: 12 } },
{ $unionWith: { coll: "collection2", pipeline: [{ $match: { circuitId: 12 } }] } },
{ $unionWith: { coll: "collection3", pipeline: [{ $match: { circuitId: 12 } }] } }
])
// { "circuitId" : 12, "a" : "1" }
// { "circuitId" : 12, "a" : "5" }
// { "circuitId" : 12, "b" : "x" }
// { "circuitId" : 12, "b" : "y" }
// { "circuitId" : 12, "c" : "i" }
This:
First filters documents from collection1
Then includes documents from collection2 into the pipeline with the new $unionWith stage. The pipeline parameter is an optional aggregation pipeline applied on documents from the collection being merged before the merge happens.
And also includes documents from collection3 into the pipeline with the same $unionWith stage.
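As a cross-check of the output listed above, this plain-JavaScript sketch emulates the same three steps in memory on the sample data. It does not call MongoDB, and matchCircuit is a made-up helper name.

```javascript
// Sample data copied from the comments above.
const collection1 = [
  { circuitId: 12, a: "1" },
  { circuitId: 17, a: "2" },
  { circuitId: 12, a: "5" },
];
const collection2 = [
  { circuitId: 12, b: "x" },
  { circuitId: 12, b: "y" },
];
const collection3 = [
  { circuitId: 12, c: "i" },
  { circuitId: 32, c: "j" },
];

// Emulates a { $match: { circuitId: id } } stage (hypothetical helper).
const matchCircuit = (docs, id) => docs.filter((d) => d.circuitId === id);

// Emulates the pipeline: filter collection1, then append the filtered
// documents of collection2 and collection3, as $unionWith does.
const unionResult = [
  ...matchCircuit(collection1, 12),
  ...matchCircuit(collection2, 12),
  ...matchCircuit(collection3, 12),
];

console.log(unionResult.length); // 5
```

Like this emulation, $unionWith keeps duplicates, so it behaves like SQL's UNION ALL rather than UNION.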
Unfortunately, document-based MongoDB doesn't support JOINs/UNIONs the way relational DB engines do.
One of the key design principles of MongoDB is to avoid joins by using embedded documents that follow your application's data fetch patterns.
Having said that, you will need to manage the logic on the application side if you really need to use the 4 collections, or you may redesign your database as per MongoDB best practices.
For more info: https://docs.mongodb.com/master/core/data-model-design/

How to efficiently count filtered documents in MongoDB $group operator

I have a fairly small dataset of 63k documents (2.5GB total). Example of document:
{
_id : "[uniqueId]",
FormId : 10,
Name : "Name of form",
IsComplete : true,
Sections : [ many sections and can be large ]
}
I want to get the total count of documents by FormId. I get fast result (.15sec) on this query:
db.getCollection('collection').aggregate([
{ $sort : { FormId : 1 } }, //Index exists on FormId
{ $group : { _id : "$FormId", count : { $sum : 1 } } },
{ $sort : { "count" : -1 } }
])
My problem is I need to get a count of the documents where { "IsComplete":true }. I have 2 indexes built on both properties but I realize that using the $match operator scans all docs. So how does one efficiently filter the $group count?
An efficient way would be:
Filter the documents with $match so that only matching documents pass to the next stage. Placing $match at the very beginning of the pipeline lets the query take advantage of indexes.
Use $project to pass only the required fields to the next stage in the pipeline, which further reduces the data handed to it.
db.getCollection('collection').aggregate([
{ $match: {"IsComplete":true} },
{ $project: {"IsComplete":1, "FormId":1}},
{ $group : { _id : "$FormId", count : { $sum : 1 } } },
{ $sort : { "count" : -1 } }
])
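Whether the leading $match can use an index depends on which indexes exist. As a hedged sketch, and assuming a compound index on IsComplete and FormId is acceptable for this collection (the question only mentions two separate indexes), the index and a pipeline could be defined as below and then checked with explain. The index definition is an assumption for illustration, not something confirmed in this thread, and the $project stage is left out here only to keep the sketch short.

```javascript
// Hypothetical compound index: the equality field first, then the grouping key.
const indexSpec = { IsComplete: 1, FormId: 1 };

const pipeline = [
  { $match: { IsComplete: true } }, // can use the index prefix on IsComplete
  { $group: { _id: "$FormId", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

// In mongosh (not executed here):
//   db.getCollection("collection").createIndex(indexSpec);
//   db.getCollection("collection").explain("executionStats").aggregate(pipeline);
console.log(JSON.stringify(pipeline));
```

The explain output is the place to confirm which index, if any, the planner actually picked and how many documents were examined.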

Aggregation in MongoDB, using unwind

I need to aggregate all tags from records like this:
https://gist.github.com/sbassi/5642925
(there are 2 sample records in this snippet) and sort them by frequency (the most frequent tag first). But I don't want to take into account records that have specific "user_id" values (let's say 2, 3, 6 and 12).
Here is my try (just the aggregation, without filtering and sorting):
db.user_library.aggregate( { $unwind : "$annotations.data.tags" }, {
$group : { _id : "$annotations.data.tags" ,totalTag : { $sum : 1 } } }
)
And I got:
{ "result" : [ ], "ok" : 1 }
Right now you can't unwind an array that is nested inside another array. See SERVER-6436.
Consider structuring the data differently, for example with a single array field holding all tags for the document. Alternatively, unwind annotations first and then unwind annotations.data.tags, in a stacked unwind like this:
db.user_library.aggregate([
{ $project: { 'annotations.data.tags': 1 } },
{ $unwind: '$annotations' },
{ $unwind: '$annotations.data.tags' },
{ $group: { _id: '$annotations.data.tags', totalTag: { $sum: 1 } } }
])
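The question also asked to skip certain users and to sort the tags by frequency. A possible extension of the pipeline above is sketched here. It assumes user_id is a top-level field of each record, which cannot be confirmed from this thread because the linked gist is not reproduced. The $match has to come before the $project, since the projection drops user_id.

```javascript
// User ids to ignore, taken from the question.
const excludedUserIds = [2, 3, 6, 12];

const pipeline = [
  // Assumption: user_id is a top-level field of each record.
  { $match: { user_id: { $nin: excludedUserIds } } },
  { $project: { "annotations.data.tags": 1 } },
  { $unwind: "$annotations" }, // one document per annotation
  { $unwind: "$annotations.data.tags" }, // one document per tag
  { $group: { _id: "$annotations.data.tags", totalTag: { $sum: 1 } } },
  { $sort: { totalTag: -1 } }, // most frequent tag first
];

// In mongosh (not executed here): db.user_library.aggregate(pipeline)
console.log(pipeline.length); // 6
```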