Mongodb combine aggregate queries - mongodb

I have the following collections in MongoDB:
Profile Collection
> db.Profile.find()
{ "_id" : ObjectId("5ec62ccb8897af3841a46d46"), "u" : "Test User", "is_del": false }
Store Collection
> db.Store.find()
{ "_id" : ObjectId("5eaa939aa709c30ff4703ffd"), "id" : "5ec62ccb8897af3841a46d46", "a" : { "ci": "Test City", "st": "Test State" }, "ip" : false, "op" : [ ], "b" : [ "normal" ], "is_del": false }
Item Collection
> db.Item.find()
{ "_id" : ObjectId("5ea98a25f1246b53a46b9e10"), "sid" : "5eaa939aa709c30ff4703ffd", "n" : "sample", "is_del": false}
The relations among these collections are defined as follows:
Profile -> Store: a 1:n relation. The id field in Store refers to the _id field in Profile.
Store -> Item: also a 1:n relation. The sid field in Item refers to the _id field in Store.
Now I need to write a query that finds all the stores of each profile along with the count of items for each store. Documents with is_del set to true must be excluded.
I am trying it the following way:
Query 1 finds the count of items for each store.
Query 2 finds the stores for each profile.
Then the application logic uses both results to produce the combined output.
I have query 1 as follows:
db.Item.aggregate({$group: {_id: "$sid", count:{$sum:1}}})
Query 2 is as follows:
db.Profile.aggregate([{ "$addFields": { "pid": { "$toString": "$_id" }}}, { "$lookup": {"from": "Store","localField": "pid","foreignField": "id", "as": "stores"}}])
The is_del filter is also missing from these queries. Is there a simpler way to do all of this in a single query? If so, what will the scalability impact be?

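You can use $lookup with a sub-pipeline (the let / pipeline form), available from MongoDB v3.6. Because the sub-pipelines reference the outer document through $$pid and $$sid, these are correlated sub-queries. Note that the $toString conversion used here requires MongoDB v4.0.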
db.Profile.aggregate([
  {
    $match: { is_del: false }
  },
  {
    $lookup: {
      from: "Store",
      as: "stores",
      let: {
        pid: { $toString: "$_id" }
      },
      pipeline: [
        {
          $match: {
            is_del: false,
            $expr: { $eq: ["$$pid", "$id"] }
          }
        },
        {
          $lookup: {
            from: "Item",
            as: "items",
            let: {
              sid: { $toString: "$_id" }
            },
            pipeline: [
              {
                $match: {
                  is_del: false,
                  $expr: { $eq: ["$$sid", "$sid"] }
                }
              },
              {
                $count: "count"
              }
            ]
          }
        },
        {
          $unwind: "$items"
        }
      ]
    }
  }
])
To improve performance, I suggest you store the reference ids as ObjectId so you don't have to convert them in each step.
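As a rough sketch of that suggestion, the pipeline below assumes that Store.id and Item.sid are stored as ObjectId values, so both $toString conversions disappear. It also swaps the final $unwind for an $ifNull default, because $count emits no document for a store without items and $unwind would then drop that store. The names storesWithItemCounts and itemCount are made up for this illustration.

```javascript
// Assumption: Store.id and Item.sid hold ObjectId values, not strings.
const storesWithItemCounts = [
  { $match: { is_del: false } },
  {
    $lookup: {
      from: "Store",
      let: { pid: "$_id" },
      pipeline: [
        { $match: { is_del: false, $expr: { $eq: ["$id", "$$pid"] } } },
        {
          $lookup: {
            from: "Item",
            let: { sid: "$_id" },
            pipeline: [
              { $match: { is_del: false, $expr: { $eq: ["$sid", "$$sid"] } } },
              { $count: "count" }
            ],
            as: "items"
          }
        },
        // items is [] for a store without items, so default the count to 0
        {
          $addFields: {
            itemCount: { $ifNull: [{ $arrayElemAt: ["$items.count", 0] }, 0] }
          }
        },
        // Drop the helper array once the count has been extracted
        { $project: { items: 0 } }
      ],
      as: "stores"
    }
  }
];

// Runs only inside mongosh / the mongo shell, where db is defined.
if (typeof db !== "undefined") {
  printjson(db.Profile.aggregate(storesWithItemCounts).toArray());
}
```

Indexes on Store.id and Item.sid would likely matter more for scalability than the shape of the pipeline, since each profile and each store triggers its own sub-query.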

Related

MongoDB join 2 tables and get ids on condition

We are really new to writing MongoDB queries. We have 2 MongoDB collections, Supplier1 and Supplier2. Documents in both share the same _id, but their version numbers can sometimes differ.
We need to find the _id values for which the versions in the 2 collections differ (i.e. Supplier1.version != Supplier2.version).
Supplier1
{
"_id" : ObjectId("60cd86b914dfed073d77300f"),
"companyName" : "Main Supplier",
"version" : NumberLong(246),
}
Supplier2
{
"_id" : ObjectId("60cd86b914dfed073d77300f"),
"companyName" : "Main Supplier",
"version" : NumberLong(247),
}
This is what we have written so far, and we have no idea how to move forward. Any help is highly appreciated.
db.getCollection("Supplier1").aggregate([
  {
    $lookup: {
      from: "Supplier2",
      localField: "_id",
      foreignField: "_id",
      as: "selected-supplier"
    }
  }
])
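You can use a sub-pipeline in $lookup that matches on _id and keeps only the Supplier2 documents whose version differs. Then $unwind the result array: $unwind drops documents whose array is empty, which filters out the suppliers whose versions match.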
db.Supplier1.aggregate([
  {
    "$lookup": {
      "from": "Supplier2",
      "let": {
        id1: "$_id",
        version1: "$version"
      },
      "pipeline": [
        {
          "$match": {
            $expr: {
              $and: [
                {
                  $eq: [
                    "$$id1",
                    "$_id"
                  ]
                },
                {
                  $ne: [
                    "$$version1",
                    "$version"
                  ]
                }
              ]
            }
          }
        }
      ],
      "as": "selected-supplier"
    }
  },
  {
    "$unwind": "$selected-supplier"
  }
])
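Since the question only asks for the _id values, a possible variation is to keep the same join condition but return nothing except _id. This is a sketch, not the answerer's code: the array name mismatch and the variable mismatchedIds are hypothetical, and the $match on a non-empty array plays the role that $unwind plays above.

```javascript
const mismatchedIds = [
  {
    $lookup: {
      from: "Supplier2",
      let: { id1: "$_id", version1: "$version" },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ["$_id", "$$id1"] },
                { $ne: ["$version", "$$version1"] }
              ]
            }
          }
        },
        // Only the existence of a mismatch matters, so keep the joined doc tiny
        { $project: { _id: 1 } }
      ],
      as: "mismatch"
    }
  },
  // Keep suppliers that have at least one Supplier2 document with another version
  { $match: { mismatch: { $ne: [] } } },
  // Return only the _id of each mismatching supplier
  { $project: { _id: 1 } }
];

// Runs only inside mongosh / the mongo shell, where db is defined.
if (typeof db !== "undefined") {
  printjson(db.Supplier1.aggregate(mismatchedIds).toArray());
}
```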

Mongo DB filter on other collection without projecting any fields from foreign collection

User Collection
{
  "_id" : "123",
  "name" : "John Doe",
  "age" : 40
}
Audit collection
{
  "_id" : "456",
  "region" : "IND",
  "userId" : 123
}
I need to run an aggregation on the User collection that keeps only the users whose audit record has "region" equal to "IND", but I don't want fields from the foreign collection to be projected. What I have tried so far is a lookup like the one below:
db.User.aggregate([
  {
    $lookup: {
      from: "Audit",
      localField: "_id",
      foreignField: "userId",
      as: "auditTrail"
    }
  },
  {
    $unwind: "$auditTrail"
  },
  {
    $match: {
      "auditTrail.region": "IND"
    }
  },
  {
    $project: { "auditTrail.region": 0 }
  }
])
The other way is to use $lookup with a pipeline and not project the foreign fields:
db.User.aggregate([
  {
    $lookup: {
      from: "Audit",
      let: { user_id: "$_id" },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: [ "$userId", "$$user_id" ] }
              ]
            }
          }
        },
        { $project: { _id: 1 } }
      ],
      as: "stockdata"
    }
  }
])
Both collections shown here are simplified. In production the documents can be large, with thousands of records in each collection, and there can be a $match stage that filters fields on the source collection in addition to the filtering on the foreign collection. Is there a better way to accomplish this?
You can modify your second query by adding the region condition inside the $match of the $lookup pipeline. Users without a matching audit record are still returned, with an empty stockdata array, so they need a follow-up stage if they have to be dropped.
db.User.aggregate([
  {
    $lookup: {
      from: "Audit",
      let: { user_id: "$_id" },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: [ "$userId", "$$user_id" ] },
                { $eq: [ "$region", "IND" ] } // region condition added
              ]
            }
          }
        },
        { $project: { _id: 1 } }
      ],
      as: "stockdata"
    }
  }
])
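On its own, a $lookup never removes documents from the source collection: a user with no matching audit record simply gets an empty array. A minimal sketch of a complete filter, assuming userId and _id have the same type in the real data, could add two stages after the lookup. The variable name usersInRegion and the $limit stage are additions for this illustration.

```javascript
const usersInRegion = [
  {
    $lookup: {
      from: "Audit",
      let: { user_id: "$_id" },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ["$userId", "$$user_id"] },
                { $eq: ["$region", "IND"] }
              ]
            }
          }
        },
        // One matching audit record is enough to show that the user qualifies
        { $limit: 1 },
        { $project: { _id: 1 } }
      ],
      as: "stockdata"
    }
  },
  // Drop users that have no audit record in the region
  { $match: { stockdata: { $ne: [] } } },
  // Remove the helper array so no foreign fields are returned
  { $project: { stockdata: 0 } }
];

// Runs only inside mongosh / the mongo shell, where db is defined.
if (typeof db !== "undefined") {
  printjson(db.User.aggregate(usersInRegion).toArray());
}
```

Any $match on the User fields themselves would normally go before the $lookup, so that the sub-query runs only for users that can still qualify.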

Count _id occurrences in other collection

We have a DB structure similar to the following:
Pet owners:
/* 1 */
{
"_id" : ObjectId("5baa8b8ce70dcbe59d7f1a32"),
"name" : "bob"
}
/* 2 */
{
"_id" : ObjectId("5baa8b8ee70dcbe59d7f1a33"),
"name" : "mary"
}
Pets:
/* 1 */
{
"_id" : ObjectId("5baa8b4fe70dcbe59d7f1a2a"),
"name" : "max",
"owner" : ObjectId("5baa8b8ce70dcbe59d7f1a32")
}
/* 2 */
{
"_id" : ObjectId("5baa8b52e70dcbe59d7f1a2b"),
"name" : "charlie",
"owner" : ObjectId("5baa8b8ce70dcbe59d7f1a32")
}
/* 3 */
{
"_id" : ObjectId("5baa8b53e70dcbe59d7f1a2c"),
"name" : "buddy",
"owner" : ObjectId("5baa8b8ee70dcbe59d7f1a33")
}
I need a list of all pet owners and additionally the number of pets they own. Our current query looks similar to the following:
db.getCollection('owners').aggregate([
{ $lookup: { from: 'pets', localField: '_id', foreignField: 'owner', as: 'pets' } },
{ $project: { '_id': 1, name: 1, numPets: { $size: '$pets' } } }
]);
This works, however it's quite slow and I'm asking myself if there's a more efficient way to perform the query?
[update and feedback] Thanks for the answers. The solutions work, however I can unfortunately see no performance improvement compared to the query given above. Obviously, MongoDB still needs to scan the entire pet collection. My hope was, that the owner index (which is present) on the pets collection could somehow be exploited for getting just the counts (not needing to touch the pet documents), but this does not seem to be the case.
Are there any other ideas or solutions for a very fast retrieval of the 'pet count' beside explicitly storing the count within the owner documents?
In MongoDB 3.6 you can define a custom $lookup pipeline and return only a count instead of the entire pet documents. Try:
db.owners.aggregate([
  {
    $lookup: {
      from: "pets",
      let: { ownerId: "$_id" },
      pipeline: [
        { $match: { $expr: { $eq: [ "$$ownerId", "$owner" ] } } },
        { $count: "count" }
      ],
      as: "numPets"
    }
  },
  {
    $unwind: "$numPets"
  }
])
You can try the aggregation below:
db.owners.aggregate([
{ "$lookup": {
"from": "pets",
"let": { "ownerId": "$_id" },
"pipeline": [
{ "$match": { "$expr": { "$eq": [ "$$ownerId", "$owner" ] }}},
{ "$count": "count" }
],
"as": "numPets"
}},
{ "$project": {
"_id": 1,
"name": 1,
"numPets": { "$ifNull": [{ "$arrayElemAt": ["$numPets.count", 0] }, 0]}
}}
])
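The two answers differ for owners who have no pets: $count emits nothing for an empty input, so numPets is an empty array, which $unwind drops and $arrayElemAt with $ifNull turns into 0. The plain JavaScript model below only mimics that behaviour in memory (it is not MongoDB code); the ids are shortened to numbers and the owner "sue" is a hypothetical addition without pets.

```javascript
// Sample data from the question, with ids shortened to numbers.
// "sue" is a hypothetical owner without pets, added for illustration.
const owners = [
  { _id: 1, name: "bob" },
  { _id: 2, name: "mary" },
  { _id: 3, name: "sue" }
];
const pets = [
  { name: "max", owner: 1 },
  { name: "charlie", owner: 1 },
  { name: "buddy", owner: 2 }
];

// Models $lookup + $count: one { count } document, or an empty array
// when the owner has no pets ($count emits nothing for empty input).
function lookupCount(ownerId) {
  const n = pets.filter((p) => p.owner === ownerId).length;
  return n > 0 ? [{ count: n }] : [];
}

// First answer: $unwind drops owners whose numPets array is empty.
const withUnwind = owners.flatMap((o) =>
  lookupCount(o._id).map((c) => ({ _id: o._id, name: o.name, numPets: c }))
);

// Second answer: $arrayElemAt + $ifNull keeps every owner, defaulting to 0.
const withIfNull = owners.map((o) => {
  const first = lookupCount(o._id)[0];
  return { _id: o._id, name: o.name, numPets: first ? first.count : 0 };
});

console.log(withUnwind.map((o) => o.name));
console.log(withIfNull.map((o) => o.name + ": " + o.numPets));
```

In this model the first variant lists only bob and mary, while the second also lists sue with a count of 0, which is what "a list of all pet owners" asks for.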

How to make lookup between two collections when an item in an array exists in the other collection?

In Lookup with a pipeline, I would like to get the linked records from an array in the parent document.
// Orders
[{
"_id" : ObjectId("5b5b91a25c68de2538620689"),
"Name" : "Test",
"Products" : [
ObjectId("5b5b919a5c68de2538620688"),
ObjectId("5b5b925a5c68de2538621a15")
]
}]
// Products
[
{
"_id": ObjectId("5b5b919a5c68de2538620688"),
"ProductName": "P1"
},
{
"_id": ObjectId("5b5b925a5c68de2538621a15"),
"ProductName": "P2"
}
,
{
"_id": ObjectId("5b5b925a5c68de2538621a55"),
"ProductName": "P3"
}
]
How do I make a lookup between Orders and Products when the Products field is an array?
I tried this query
db.getCollection("Orders").
aggregate(
[
{
$lookup:
{
from: "Products",
let: { localId: "$_id" , prods: "$Products" },
pipeline: [
{
"$match":
{
"_id" : { $in: "$$prods" }
}
},
{
$project:
{
"_id": "$_id",
"name": "$prods" ,
}
}
],
as: "linkedData"
}
},
{
"$skip": 0
},
{
"$limit": 1
},
]
)
This is not working because $in is expecting an array, and even though $$prods is an array, it is not accepting it.
Is my whole approach correct? How to make this magic join ?
You were going in the right direction. The only thing you missed is $expr with the $in aggregation operator, which lets $match compare the document's own fields against the variables defined in let:
db.getCollection("Orders").aggregate([
{ "$lookup": {
"from": "Products",
"let": { "localId": "$_id" , "prods": "$Products" },
"pipeline": [
{ "$match": { "$expr": { "$in": [ "$_id", "$$prods" ] } } },
{ "$project": { "_id": 1, "name": "$ProductName" } }
],
"as": "linkedData"
}},
{ "$skip": 0 },
{ "$limit": 1 }
])
You just need a regular $lookup. The documentation states that:
If your localField is an array, you may want to add an $unwind stage to your pipeline. Otherwise, the equality condition between the localField and foreignField is foreignField: { $in: [ localField.elem1, localField.elem2, ... ] }.
So for the aggregation below:
db.Orders.aggregate([
{
$lookup: {
from :"Products",
localField: "Products",
foreignField: "_id",
as: "Products"
}
}
])
you'll get following result for your sample data:
{
"_id" : ObjectId("5b5b91a25c68de2538620689"),
"Name" : "Test",
"Products" : [
{
"_id" : ObjectId("5b5b919a5c68de2538620688"),
"ProductName" : "P1"
},
{
"_id" : ObjectId("5b5b925a5c68de2538621a15"),
"ProductName" : "P2"
}
]
}
Have you tried $unwind before the $lookup? Use $unwind to break up the array and then do the lookup.
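A minimal sketch of that approach for the Orders and Products sample: unwind the id array, join each id, then group the order back together. The field name product and the variable unwindThenLookup are chosen for this illustration only.

```javascript
const unwindThenLookup = [
  // One document per product id in the order
  { $unwind: "$Products" },
  // Join the single id with its product document
  {
    $lookup: {
      from: "Products",
      localField: "Products",
      foreignField: "_id",
      as: "product"
    }
  },
  // $lookup always yields an array; flatten it to a single document
  { $unwind: "$product" },
  // Reassemble one document per order with the joined products
  {
    $group: {
      _id: "$_id",
      Name: { $first: "$Name" },
      Products: { $push: "$product" }
    }
  }
];

// Runs only inside mongosh / the mongo shell, where db is defined.
if (typeof db !== "undefined") {
  printjson(db.Orders.aggregate(unwindThenLookup).toArray());
}
```

Compared with the plain $lookup on the array field, this needs an extra $group to restore the original shape, so it mostly pays off when each joined product has to be processed individually.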

MONGODB redact array

I just want to understand $redact in MongoDB.
Suppose I have a collection like this:
db.tab12.find()
{ "_id" : "1", "name" : "jan", "passport" : [ "usa" ] }
{ "_id" : "2", "name" : "jaan", "passport" : [ "usa", "canada" ] }
{ "_id" : "3", "name" : "jon", "passport" : [ "germany" ] }
and I run the following command:
db.tab12.aggregate({$match:{"name":{$regex:"a"}}},{$redact:{$cond:{if:{$in:["$country",["canada"]]},then:"$$DESCEND",else:"$$PRUNE" }} } )
I get no result. I think MongoDB lacks examples on the net.
Let's imagine that you have the collections PC and events. Each PC generates different types of events, and you want to keep only the events of type 1.
PC {
"_id": ObjectId,
"version": "SomeVersion",
"location": "Sofia",
"price": 220,
events: [1,2,3,4,5]
}
events{
"_id":1,
"type": 1,
"ts": "1999-01-01"
}
events{
"_id":2,
"security": true,
"type": 2,
"ts": "2015-01-01"
}
db.PC.aggregate([
  {
    $match: {
      "location": "Sofia"
    }
  },
  // One document per event id stored on the PC
  {
    $unwind: {
      path: '$events'
    }
  },
  // Replace each event id with the matching event document
  {
    $lookup: {
      from: 'events',
      localField: 'events',
      foreignField: '_id',
      as: 'events'
    }
  },
  // $lookup always returns an array, so flatten it to a single event document
  {
    $unwind: {
      path: '$events'
    }
  },
  // ts is an ISO date string in the sample documents, so compare against strings
  // (use Date values here if ts is stored as a date)
  {
    $match: {
      'events.ts': { $gt: "1970-01-01", $lt: "2000-02-02" }
    }
  },
  {
    $redact: {
      $cond: {
        if: { $eq: [ '$events.type', 1 ] },
        then: '$$KEEP',
        else: '$$PRUNE'
      }
    }
  },
  // The $redact stage removes every document whose event does not pass the condition.
  // After this step you can group the results; if the events carry some data
  // (for example a price), you can also sum it.
  {
    $group: {
      _id: {
        isSecurityEvent: '$events.security'
      },
      totalEvents: { $sum: 1 }
    }
  }
])
According to the MongoDB documentation, $redact works with 3 system variables:
$$DESCEND: $redact returns the fields at the current document level, excluding embedded documents. To include embedded documents and embedded documents within arrays, apply the $cond expression to the embedded documents to determine access for these embedded documents.
$$PRUNE: $redact excludes all fields at this current document/embedded document level, without further inspection of any of the excluded fields. This applies even if the excluded field contains embedded documents that may have different access levels.
$$KEEP: $redact returns or keeps all fields at this current document/embedded document level, without further inspection of the fields at this level. This applies even if the included field contains embedded documents that may have different access levels.
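Applied to the tab12 collection from the question, the likely reason for the empty result is that the sample documents have no country field, so the condition is false for every document and everything is pruned. A minimal sketch of a working variant tests the passport array instead. The variable names below are made up for the illustration, and the in-memory filter is only a plain JavaScript model of the two stages, not MongoDB itself.

```javascript
// Stage definitions; "passport" is the array field from the sample documents.
const redactByPassport = [
  { $match: { name: { $regex: "a" } } },
  {
    $redact: {
      $cond: {
        // Aggregation $in takes [ value, array ]
        if: { $in: ["canada", "$passport"] },
        then: "$$KEEP",
        else: "$$PRUNE"
      }
    }
  }
];

// Plain JavaScript model of the same two stages, to show which documents survive.
const tab12 = [
  { _id: "1", name: "jan", passport: ["usa"] },
  { _id: "2", name: "jaan", passport: ["usa", "canada"] },
  { _id: "3", name: "jon", passport: ["germany"] }
];
const kept = tab12.filter(
  (d) => /a/.test(d.name) && d.passport.includes("canada")
);
console.log(kept.map((d) => d._id)); // only _id "2" remains

// Runs only inside mongosh / the mongo shell, where db is defined.
if (typeof db !== "undefined") {
  printjson(db.tab12.aggregate(redactByPassport).toArray());
}
```

$$KEEP is enough here because the decision is made once at the top level of each document; $$DESCEND only becomes necessary when embedded documents carry their own access conditions.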