Is it possible to perform both a map-reduce and a lookup in the same query pipeline, efficiently?
Let's say I have two collections:
items: { _id, group_id, createdAt }
purchases: { _id, item_id }
I want to get the top n item groups, based on the number of purchases on the most recent x items per group.
If I had the number of purchases available in the item documents, then I could aggregate and sort, but this is not the case.
I can get the most recent x items per group like so:
let x = 3;
// helper that keeps only the x most recent items (by createdAt); it has to
// be passed through `scope` so it is visible inside reduce
let getLastXItems = function (x, items) {
  return items
    .sort(function (a, b) { return b.createdAt - a.createdAt; })
    .slice(0, x);
};
let map = function () {
  emit(this.group_id, { items: [this] });
};
let reduce = function (key, values) {
  // values may already contain partially reduced { items: [...] } objects,
  // so flatten all of their items before picking the most recent x
  let all = values.reduce((acc, v) => acc.concat(v.items), []);
  return { items: getLastXItems(x, all) };
};
let scope = { x, getLastXItems };
db.items.mapReduce(map, reduce, { out: { inline: 1 }, scope }, function (err, res) {
  if (err) {
    // handle the error
  } else {
    // res is an array of { group_id, items } where items is the last x items of the group
  }
});
But I'm missing the purchase count, so I can't use it to sort the groups and output the top n groups (which, by the way, I'm not even sure I can do).
I'm using this on a web server, and running the query with scope variables that depend on the user context, so I don't want to output the result to another collection; everything has to be done inline.
=== edit 1: add a data example ===
Sample data could be:
// items
{ _id: '1', group_id: 'a', createdAt: 0 }
{ _id: '2', group_id: 'a', createdAt: 2 }
{ _id: '3', group_id: 'a', createdAt: 4 }
{ _id: '4', group_id: 'b', createdAt: 1 }
{ _id: '5', group_id: 'b', createdAt: 3 }
{ _id: '6', group_id: 'b', createdAt: 5 }
{ _id: '7', group_id: 'b', createdAt: 7 }
{ _id: '8', group_id: 'c', createdAt: 5 }
{ _id: '9', group_id: 'd', createdAt: 5 }
// purchases
{ _id: '1', item_id: '1' }
{ _id: '2', item_id: '1' }
{ _id: '3', item_id: '3' }
{ _id: '4', item_id: '5' }
{ _id: '5', item_id: '5' }
{ _id: '6', item_id: '6' }
{ _id: '7', item_id: '7' }
{ _id: '8', item_id: '7' }
{ _id: '9', item_id: '7' }
{ _id: '10', item_id: '3' }
{ _id: '11', item_id: '9' }
and sample result with n = 3 and x = 2 would be:
[
  { group_id: 'a', numberOfPurchasesOnLastXItems: 4 },
  { group_id: 'b', numberOfPurchasesOnLastXItems: 3 },
  { group_id: 'c', numberOfPurchasesOnLastXItems: 1 },
]
I think this can be solved with the aggregation pipeline, but I have no idea how bad this is, especially performance-wise.
Concerns I have are:
will the aggregation pipeline be able to benefit from indexes on the lookup and the sort? (see the index sketch just below)
can the lookup + projection that's only used to count matching items be simplified?
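For the first concern, my understanding is that $lookup can match its foreignField against an index on the joined collection, while a $sort can only use an index when it runs at (or is pushed to) the very start of the pipeline. A minimal sketch of the indexes involved, mirroring the collections above:
// $lookup matches foreignField against this index on the joined collection
db.purchases.createIndex({ item_id: 1 });
// a sort on createdAt can use this index only as a leading stage; placed
// after $lookup/$project (as in the pipeline below) it becomes an in-memory sort
db.items.createIndex({ createdAt: 1 });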
Anyway, I think one solution could be:
x = 2;
n = 3;
db.items.aggregate([
  {
    $lookup: {
      from: 'purchases',
      localField: '_id',
      foreignField: 'item_id',
      as: 'purchases',
    },
  },
  /*
  after the join, the data is like {
    _id: <itemId>,
    group_id: <itemGroupId>,
    createdAt: <itemCreationDate>,
    purchases: <arrayOfPurchases>,
  }
  */
  {
    $project: {
      group_id: 1,
      createdAt: 1,
      purchasesCount: { $size: '$purchases' },
    },
  },
  /*
  after the projection, the data is like {
    _id: <itemId>,
    group_id: <itemGroupId>,
    createdAt: <itemCreationDate>,
    purchasesCount: <numberOfPurchases>,
  }
  */
  {
    // most recent first, so the first x counts per group are the ones we want
    $sort: { createdAt: -1 },
  },
  {
    $group: {
      _id: '$group_id',
      items: {
        $push: '$purchasesCount',
      },
    },
  },
  /*
  after the group, the data is like {
    _id: <groupId>,
    items: <array of number of purchases per item, most recent item first>,
  }
  */
  {
    $project: {
      // sum the counts of the first x (i.e. most recent) items of the group
      numberOfPurchasesOnMostRecentItems: { $sum: { $slice: ['$items', x] } },
    },
  },
  /*
  after the projection, the data is like {
    _id: <groupId>,
    numberOfPurchasesOnMostRecentItems: <number of purchases on the last x items>,
  }
  */
  {
    // descending, so the top groups come out first
    $sort: { numberOfPurchasesOnMostRecentItems: -1 },
  },
  { $limit: n },
]);
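On the second concern: if the server is MongoDB 3.6 or newer, I believe the lookup + $size projection can be folded into a single sub-pipeline $lookup that returns only a count instead of materializing the whole purchase array (a sketch; itemId is just a variable name I picked):
{
  $lookup: {
    from: 'purchases',
    let: { itemId: '$_id' },
    pipeline: [
      { $match: { $expr: { $eq: ['$item_id', '$$itemId'] } } },
      { $count: 'count' },
    ],
    as: 'purchases',
  },
},
// purchases is now [] or [{ count: <n> }], so the count can be read with
// { $ifNull: [{ $arrayElemAt: ['$purchases.count', 0] }, 0] }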
Related
I have a simplified order model that looks like:
order = {
_id: 1,
productGroups:[
{ productId: 1, qty: 3 },
{ productId: 2, qty: 5 }
],
cancels:[]
}
Now I have an API that cancels part of the order.
The request could be something like cancel {productId:1, qty:2}, {productId:2, qty:2} from order where orderId:1. The result should be:
order = {
_id: 1,
productGroups:[
{ productId: 1, qty: 2 },
{ productId: 2, qty: 2 }
],
cancels:[
{
productGroups:[
{ productId: 1, qty: 1 },
{ productId: 2, qty: 3 }
]
}
]
}
let order = await Order.findOneAndUpdate(
{
_id: id
},
{
$inc: {
'productGroups.$.qty': cancelQty //this is the part that needs fixing. how do I get different cancelQty according to productId
},
$push: {
  cancels: {
    productGroups: cancelProductGroups
  }
}
},
{ new: true }
);
Now I know I can just findOne, update the model with JavaScript, and then .save() the model. But if possible I would like to do this update in one go. Or, if that is not possible, can I fix the schema so that I can do such an update in a single request?
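In case it helps: on MongoDB 3.6+ (with a Mongoose version that forwards arrayFilters), one way to keep this a single request is to build one $inc per cancelled product and target each array element with a filtered positional operator. A sketch, assuming cancelProductGroups looks like [{ productId, qty }, ...]:
// build one $inc per product, each targeting its own filtered position
const incs = {};
const arrayFilters = [];
cancelProductGroups.forEach((p, i) => {
  incs[`productGroups.$[p${i}].qty`] = -p.qty; // subtract the cancelled qty
  arrayFilters.push({ [`p${i}.productId`]: p.productId });
});

let order = await Order.findOneAndUpdate(
  { _id: id },
  {
    $inc: incs,
    $push: { cancels: { productGroups: cancelProductGroups } },
  },
  { new: true, arrayFilters }
);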
I have a document like this
{
users: [
{
name: 'John',
id: 1
},
{
name: 'Mark',
id: 2
},
{
name: 'Mike',
id: 3
},
{
name: 'Anna',
id: 4
}
]
}
and I want to remove users from the array with ids 2 and 4. To do that I execute the following code:
const documents = [
{
id: 2
},
{
id: 4
},
]
Model.updateOne({ document_id: 1 }, { $pull: { users: { $in: documents } } });
But it doesn't remove any user.
Could you tell me what I'm doing wrong and how to achieve the needed result?
This works if you can redefine the structure of your documents array:
const documents = [2, 4]
Model.updateOne({ document_id: 1 }, { $pull: { users: { id: { $in: documents } } } })
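If you'd rather keep documents as an array of objects, extracting the ids inline should work just as well (same update, sketched):
const documents = [{ id: 2 }, { id: 4 }];
Model.updateOne(
  { document_id: 1 },
  // pull every user whose id appears in documents
  { $pull: { users: { id: { $in: documents.map((d) => d.id) } } } }
);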
I have a dataset with metrics collected from a group of sensors.
My dataset looks like this:
{type: 1, display: 'foo', value: 'A'}
{type: 2, display: 'bar', value: 'B'}
{type: 2, display: 'foo', value: 'B'}
I am trying to aggregate the results and get some meaningful insights via a REST API, producing aggregated results like:
[{
type: 1,
displays: [
{
name: 'foo',
count: 1
}
],
values: [
{
name: 'A',
count: 1
}
],
total_count: 1
},{
type: 2,
displays: [
{
name: 'foo',
count: 1
} , {
name: 'bar',
count: 1
}
],
values: [
{
name: 'B',
count: 2
}
],
total_count: 2
}]
Summarizing the aggregated results and producing shallow results is straightforward; I am struggling, though, because I can't create the nested counters for types and displays all together.
I have tried to use various aggregation operators with no luck.
Basically, I can get a single grouping by type and display, like so:
db.logs.aggregate([
{
$group: {
_id: {
type: '$type',
display: '$display'
},
count: { $sum: 1 }
}
}, {
$group: {
_id: '$_id.type',
displays: {
$push: {
name: "$_id.display",
count: "$count"
}
}
}
}
]);
Any help will be highly appreciated.
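One shape I'd sketch (assuming MongoDB 3.4+ for $facet) runs the display grouping and the value grouping as parallel facets, then merges the two arrays by type afterwards, either in application code or with an extra stage:
db.logs.aggregate([
  {
    $facet: {
      // per-type display counts
      displays: [
        { $group: { _id: { type: '$type', name: '$display' }, count: { $sum: 1 } } },
        { $group: { _id: '$_id.type', displays: { $push: { name: '$_id.name', count: '$count' } } } },
      ],
      // per-type value counts, plus the per-type total
      values: [
        { $group: { _id: { type: '$type', name: '$value' }, count: { $sum: 1 } } },
        {
          $group: {
            _id: '$_id.type',
            values: { $push: { name: '$_id.name', count: '$count' } },
            total_count: { $sum: '$count' },
          },
        },
      ],
    },
  },
]);
// the result is a single document { displays: [...], values: [...] } whose
// two arrays can be joined on _id (the type) to produce the desired output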
I'm a newbie with MongoDB.
I have created a mapReduce on my Person collection to group cities.
db.Person.find()
[{
name: 'Bob',
addresses: [
{
street: 'Vegas Street',
neighborhood: {
name: 'Center',
city: {
name: 'Springfield'
}
}
}, {
.....
}
]
}, {
....
}]
And this is my mapReduce:
db.Person.mapReduce(function() {
  // the documents store the array under `addresses`, and the loop must
  // cover every element, including the last one
  for (var i = 0; i < this.addresses.length; i++) {
    var address = this.addresses[i];
    emit(address.neighborhood.city.name, 1);
  }
}, function(k, v) {
  // v can contain partial counts from earlier reduce passes,
  // so sum the values instead of counting them
  return Array.sum(v);
}, { out: 'City' });
Then I use this to list my cities:
db.City.find().sort({ _id: 1 })
[{
_id: 'Springfield',
value: 3
}, {
_id: 'City B',
value: 2
}, {
...
}]
My question is about the City data: do I need to run the mapReduce each time I insert, update, or delete on my Person collection, or does it run automatically?
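As far as I know, a mapReduce output collection is a snapshot, not a materialized view, so City is not updated automatically; you either rerun the mapReduce or use its incremental form. A sketch of the incremental pattern, where updatedAt and lastRunTime are hypothetical and you would have to maintain them yourself:
db.Person.mapReduce(mapFunction, reduceFunction, {
  query: { updatedAt: { $gt: lastRunTime } }, // only docs changed since the last run
  out: { reduce: 'City' }, // fold the new counts into the existing output collection
});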
Given the following data:
{
_id: '123',
name: 'Foobar',
friends: [
{ name: 'a' },
{ name: 'b' },
{ name: 'c' },
{ name: 'd' },
{ name: 'e' }
]
}
Is there a way to query MongoDB to return a list of friends with an offset - e.g. skip the first two friends in the array ('a' and 'b') and return only 'c', 'd' and 'e'?
I've tried to use $slice, but it seems to require a "limit" as well, e.g.
db.users.findOne({ _id: '123' }, { friends: { $slice: [2,-1] } })
This will not work, since the "limit" (-1 in the above example) needs to be a positive integer.
It isn't terribly elegant, but just provide a limit value large enough to effectively not be a limit:
db.users.findOne({ _id: '123' }, { friends: { $slice: [2,1000000000] } })
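Alternatively, a sketch that avoids the magic number by using the aggregation form of $slice, which accepts a position plus a count expression (assuming the friends array is non-empty):
db.users.aggregate([
  { $match: { _id: '123' } },
  {
    $project: {
      // skip the first 2 friends and keep however many remain
      friends: { $slice: ['$friends', 2, { $size: '$friends' }] },
    },
  },
]);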