Grouping by filter of another collection - mongodb

I'm using Meteor and MongoDB. I need to publish with aggregation (I'm using jcbernack:reactive-aggregate and ReactiveAggregate).
db.getCollection('Jobs').aggregate([
  {
    $lookup: {
      from: "JobMatches",
      localField: "_id",
      foreignField: "jobId",
      as: "matches"
    }
  },
  {
    $project: {
      matches: {
        $filter: {
          input: '$matches',
          as: 'match',
          cond: { $eq: ['$$match.userId', userId] }
        }
      }
    }
  },
  { $match: { 'matches.score': { $gte: 60 } } },
  { $sort: { "matches.score": -1 } },
  { $limit: 6 }
])
On the client I get only the page of data (limit 6), so I have to count the total number of matching documents on the server side. I can't use find().count(), because a plain find() without aggregation can't filter on a field that comes from another collection (like { 'matches.score': { $gte: 60 } }). How can I count the documents filtered this way? Do I need a $group stage in the pipeline?
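One way to get both the limited page and the total count in a single round trip (a sketch, not from the original post; $facet is available from MongoDB 3.4) is to branch the tail of the pipeline with $facet instead of ending in $sort/$limit:

```javascript
// Sketch: keep the $lookup / $project ($filter) / $match stages above,
// then split into two sub-pipelines: one returns the page of 6 jobs,
// the other counts every job that survived the filter.
db.getCollection('Jobs').aggregate([
  // ... $lookup, $project, and $match stages as above ...
  {
    $facet: {
      page: [
        { $sort: { 'matches.score': -1 } },
        { $limit: 6 }
      ],
      total: [
        { $count: 'count' }   // -> [{ count: <total matching jobs> }]
      ]
    }
  }
])
```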

Related

Mongo performance is extremely slow for an aggregation query

Hope someone can help with this slow Mongo query. It runs fine against smaller collections, but once we test it against the larger production collections it fails with the message "Not enough disk space", even though we had limited the result set to 100.
I feel like there is an issue with the query structure and/or missing indexes.
Both collections are ~5 million records.
We need help to make this query fast.
// divide these by 1000 because the ts field isn't javascript milliseconds
const startDate = (ISODate("2022-07-01T00:00:00.000Z").getTime()/1000)
const endDate = (ISODate("2022-08-10T00:00:00.000Z").getTime()/1000)
const clientId = xxxx
const ordersCollection = "orders"
const itemsCollection = "items"
db[ordersCollection].aggregate([
  {
    $lookup: {
      from: itemsCollection,
      localField: "data.id",
      foreignField: "data.orders_id",
      as: "item"
    }
  },
  { $unwind: "$item" },
  { $match: { "data.client_id": clientId } },
  { $match: { "item.ts": { $gt: startDate, $lt: endDate } } },
  {
    $project: {
      order_id: "$data.id",
      parent_id: "$data.parent_id",
      owner_id: "$data.owner_id",
      client_id: "$data.client_id",
      ts: "$item.ts",
      status: {
        $cond: {
          if: { $eq: ["$item.data.status", 10] },
          then: 3,
          else: {
            $cond: {
              if: { $eq: ["$item.data.status", 4] },
              then: 2,
              else: "$item.data.status"
            }
          }
        }
      }
    }
  },
  {
    $group: {
      _id: { order_id: "$order_id", status: "$status" },
      order_id: { $first: "$order_id" },
      parent_id: { $first: "$parent_id" },
      owner_id: { $first: "$owner_id" },
      client_id: { $first: "$client_id" },
      ts: { $first: "$ts" },
      status: { $first: "$status" }
    }
  },
  { $sort: { "ts": 1 } }
]).limit(100).allowDiskUse(true)
Try pulling the $match on the main collection up to the front of the pipeline.
That way you limit the number of documents that go into the $lookup (otherwise Mongo tries to look up 5 million documents against another collection of 5 million documents).
Be sure to have an index on data.client_id.
db[ordersCollection].aggregate([
  { $match: { "data.client_id": clientId } },
  {
    $lookup: {
      from: itemsCollection,
      localField: "data.id",
      foreignField: "data.orders_id",
      as: "item"
    }
  },
  { $unwind: "$item" },
  { $match: { "item.ts": { $gt: startDate, $lt: endDate } } },
...
As a side note, limiting the result set to 100 does not help much, because the heaviest part (the aggregation with lookups and grouping) cannot be limited.
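Concretely, the index suggestion would look something like this (a sketch; field names are taken from the question, and the second index is an addition of mine to support the $lookup's foreignField):

```javascript
// Supports the initial $match stage on the orders collection:
db[ordersCollection].createIndex({ "data.client_id": 1 })
// Supports the equality lookup into the items collection:
db[itemsCollection].createIndex({ "data.orders_id": 1 })
```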

Mongo $lookup aggregation using date timestamps

I have two collections: Car Drive Histories and Car Geolocations. For the purpose of analyzing drive patterns I have to aggregate driving histories and link them to car geolocations.
I've used $match and $project aggregation stages to get drive history documents with the following structure:
travelPurpose:<String>
carID:<ObjectId>
checkOutTime:<Date>
checkInTime:<Date>
The next step is to use $lookup stage to get car location between the two timestamps (checkOutTime and checkInTime). Every car geolocation document has carID and geoLocationTimestamp fields. If I use static dates, for example as such:
{
  from: 'carGeoLocations',
  localField: 'carID',
  foreignField: 'carID',
  pipeline: [
    { $match: {
      geoLocationTimestamp: {
        $gte: ISODate('2022-01-01T00:00:00.000+0000'),
        $lte: ISODate('2023-01-01T00:00:00.000+0000')
      }
    }}
  ],
  as: 'coordinates'
}
I do get geolocations between 1. 1. 2022 and 1. 1. 2023. Mongo Playground with an example of this behaviour can be accessed here.
However, if I try to use dynamic dates based on values of checkOutTime and checkInTime, no documents are retrieved. Mongo playground with this example is available here. I've tried the following:
{
  from: 'carGeoLocations',
  localField: 'carID',
  foreignField: 'carID',
  pipeline: [
    { $match: {
      geoLocationTimestamp: {
        $gte: "$checkOutTime",
        $lte: "$checkInTime"
      }
    }}
  ],
  as: 'coordinates'
}
and
{
  from: 'carGeoLocations',
  localField: 'carID',
  foreignField: 'carID',
  let: { t1: '$checkOutTime', t2: '$checkInTime' },
  pipeline: [
    { $match: {
      geoLocationTimestamp: {
        $gte: '$$t1',
        $lte: '$$t2'
      }
    }}
  ],
  as: 'coordinates'
}
With the same results. Can anyone spot any issues with my approach?
First $lookup, and then use $match for geoLocationTimestamp.
Try the following code:
{
  $lookup: {
    from: 'carGeoLocations',
    localField: 'carID',
    foreignField: 'carID',
    as: 'coordinates'
  }
},
{
  $match: {
    geoLocationTimestamp: {
      $gte: '$coordinates.checkOutTime',
      $lte: '$coordinates.checkInTime'
    }
  }
}
Update
After further experimentation, it turns out you need to use $expr when you want to use variables declared with let in the $lookup stage of an aggregation.
My lookup stage now looks like this:
{
"$lookup": {
"from": "carGeoLocations",
"localField": "carID",
"foreignField": "carID",
"let": {
t1: "$checkOutTime",
t2: "$checkInTime"
},
"pipeline": [
{
$match: {
$and: [
{
$expr: {
$gte: [
"$geoLocationTimestamp",
"$$t1"
],
}
},
{
$expr: {
$lte: [
"$geoLocationTimestamp",
"$$t2"
],
}
},
]
}
}
],
"as": "coordinates"
}
}
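To illustrate the semantics only (this is not how the server executes it), the $expr sub-pipeline above keeps exactly the geolocations that a plain JavaScript range filter over the foreign documents would keep:

```javascript
// Plain-JS equivalent of the $gte/$lte $expr pair in the sub-pipeline:
// keep geolocations whose timestamp falls inside the checkout window.
function locationsDuringDrive(geoLocations, checkOutTime, checkInTime) {
  return geoLocations.filter(g =>
    g.geoLocationTimestamp >= checkOutTime &&
    g.geoLocationTimestamp <= checkInTime
  );
}

const locs = [
  { geoLocationTimestamp: new Date('2022-03-01') },
  { geoLocationTimestamp: new Date('2022-06-15') },
  { geoLocationTimestamp: new Date('2023-02-01') }
];
const kept = locationsDuringDrive(locs,
  new Date('2022-01-01'), new Date('2023-01-01'));
console.log(kept.length); // 2
```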

How to count number of root documents in intermediate aggregate stage in mongo?

I want to implement pagination on a website, and I'd like my MongoDB query to first perform the lookup between the 2 collections, sort the documents, calculate the total number of matched documents, and then return the relevant documents after the $skip and $limit stages in the aggregation. This is my query:
const res = await Product.aggregate([
{
$lookup: {
from: 'Brand',
localField: 'a',
foreignField: 'b',
as: 'brand'
}
},
{
$sort: {
c: 1,
'brand.d': -1
}
},
{
$skip: offset
},
{
$limit: productsPerPage
}
])
I don't want to make 2 queries that are essentially the same, where the first returns only the count of documents and the second returns the documents themselves.
So the result would be something like this:
{
documents: [...],
totalMatchedDocumentsCount: x
}
such that there will be for example 10 documents but totalMatchedDocumentsCount may be 500.
I can't figure out how to do this; I don't see that the aggregate method returns a cursor. Is it possible to achieve what I want in one query?
You need $facet: you can run your pipeline with $limit and $skip as one sub-pipeline, while $count runs simultaneously in another:
const res = await Product.aggregate([
// $match here if needed
{
$facet: {
documents: [
{
$lookup: {
from: 'Brand',
localField: 'a',
foreignField: 'b',
as: 'brand'
}
},
{
$sort: {
c: 1,
'brand.d': -1
}
},
{
$skip: offset
},
{
$limit: productsPerPage
}
],
total: [
{ $count: "count" }
]
}
},
{
$unwind: "$total"
}
])
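One caveat worth noting (my observation, not part of the original answer): when no documents match, total is an empty array, so the $unwind removes the whole result document and res comes back empty. A small helper on the Node side can normalize the shape:

```javascript
// Normalize the $facet result: `res` is either [] (no matches) or
// [{ documents: [...], total: { count: n } }] after the $unwind.
function unpackFacetResult(res) {
  if (res.length === 0) {
    return { documents: [], totalMatchedDocumentsCount: 0 };
  }
  const [{ documents, total }] = res;
  return { documents, totalMatchedDocumentsCount: total.count };
}

// Example with a mocked aggregation result:
const mocked = [{ documents: [{ _id: 1 }, { _id: 2 }], total: { count: 500 } }];
console.log(unpackFacetResult(mocked).totalMatchedDocumentsCount); // 500
console.log(unpackFacetResult([]).totalMatchedDocumentsCount);     // 0
```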

Use $match on fields from two separate collections in an aggregate query mongodb

I have an aggregate query where I join 3 collections. I'd like to filter the search based on fields from two of those collections. The problem is, I'm only able to use $match on the initial collection that mongoose initialized with.
Here's the query:
var pipeline = [
{
$lookup: {
from: 'blurts',
localField: 'followee',
foreignField: 'author.id',
as: 'followerBlurts'
}
},
{
$unwind: '$followerBlurts'
},
{
$lookup: {
from: 'users',
localField: 'followee',
foreignField: '_id',
as: 'usertbl'
}
},
{
$unwind: '$usertbl'
},
{
$match: {
'follower': { $eq: req.user._id },
//'blurtDate': { $gte: qryDateFrom, $lte: qryDateTo }
}
},
{
$sample: { 'size': 42 }
},
{
$project: {
_id: '$followerBlurts._id',
name: '$usertbl.name',
smImg: '$usertbl.smImg',
text: '$followerBlurts.text',
vote: '$followerBlurts.vote',
blurtDate: '$followerBlurts.blurtDate',
blurtImg: '$followerBlurts.blurtImg'
}
}
];
keystone.list('Follow').model.aggregate(pipeline)
.sort({blurtDate: -1})
.cursor().exec()
.toArray(function(err, data) {
if (!err) {
res.json(data);
} else {
console.log('Error getting following blurts --> ' + err);
}
});
Within the pipeline, I can only use $match on the 'Follow' model. When I use $match on the 'Blurt' model, it simply ignores the condition (you can see where I tried to include it in the commented line under $match).
What's perplexing is that I can utilize this field in the .sort method, but not in the $match conditions.
Any help much appreciated.
You can use the mongo dot notation to access elements of the collection that is being looked up via $lookup.
https://docs.mongodb.com/manual/core/document/#dot-notation
So, in this case followerBlurts.blurtDate should give you the value you are looking for.
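Applied to the pipeline in the question, that suggestion would look something like this (a sketch using the commented-out date variables from the question; it works because $unwind has already flattened followerBlurts):

```javascript
{
  $match: {
    'follower': { $eq: req.user._id },
    'followerBlurts.blurtDate': { $gte: qryDateFrom, $lte: qryDateTo }
  }
}
```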

some questions about Aggregate running mechanism

Aggregate1:
db.collection.aggregate([
{
$lookup: {
...
}
},
{
$limit: 1
}
])
Aggregate2:
db.collection.aggregate([
{
$limit: 1
},
{
$lookup: {
...
}
}
])
Are Aggregate1 and Aggregate2 different?
In Aggregate1, is the whole collection scanned first, and then the $lookup performed?
If they are different, how can I apply a query during the lookup? Something like this:
db.collection.aggregate([
{
$lookup: {
from: 'collection2',
localField: 'field',
foreignField: 'field',
as: 'newField',
// do some query when lookup
query: {'newField.xxx': 1}
}
}
])
I know I can do this:
db.collection.aggregate([
{
$lookup: {
from: 'collection2',
localField: 'field',
foreignField: 'field',
as: 'newField'
}
},
{$unwind: '$newField'},
{$match: {'newField.xxx': 1}}
])
But I'm afraid that, as in the example above, the entire collection will be scanned.
Looking forward to your reply!
Now I have found this API: $graphLookup.restrictSearchWithMatch, but:
NOTE
You cannot use any aggregation expression in this filter.
For example, a query document such as
{ lastName: { $ne: "$lastName" } }
will not work in this context to find documents in which the lastName
value is different from the lastName value of the input document,
because "$lastName" will act as a string literal, not a field path.
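For the "query during lookup" part of the question: since MongoDB 3.6, $lookup also accepts a let/pipeline form, which lets the filter run inside the lookup itself rather than after it (a sketch using the placeholder field names from the question; 'xxx' is the question's placeholder, not a real field):

```javascript
db.collection.aggregate([
  {
    $lookup: {
      from: 'collection2',
      let: { localValue: '$field' },
      pipeline: [
        { $match: {
          $expr: { $eq: ['$field', '$$localValue'] }, // the join condition
          xxx: 1                                      // the extra filter
        }}
      ],
      as: 'newField'
    }
  }
])
```

This filters the foreign documents before they are joined, so only the matching collection2 documents are materialized into newField.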