Group by, count and stream individual results from mongodb query

In mongodb, having a collection with sessionIds and labels, I would like to group by the sessionId where label equals 'view_item' and accomplish:
Get the count of sessionId groups.
Be able to stream each sessionId to the consumer (assuming I have limited memory resources and a large number of individual sessionIds)
Assume following documents in a collection:
{ "label" : "view_item", "sessionId" : "01e5dnnpsczgfq58rmp0cjtjm0" }
{ "label" : "view_category", "sessionId" : "01e5dnnpsczgfq58rmp0cjtjm0" }
{ "label" : "view_item", "sessionId" : "01e5dnnpsczgfq58rmp0cjtjm0" }
{ "label" : "view_item", "sessionId" : "01e5g7vzx5dh0mv8m6g1zbdrnj" }
{ "label" : "view_item", "sessionId" : "01e5g7vzx5dh0mv8m6g1zbdrnj" }
{ "label" : "view_category", "sessionId" : "01e5g7vzx5dh0mv8m6g1zbdrnj" }
{ "label" : "view_item", "sessionId" : "01e5g7vzx5dh0mv8m6g1zbdrnj" }
The expected result would be something like this:
Get results somehow and...
result.count() // 2 (or some other way of getting the count)
await result.next() // { sessionId: '01e5dnnpsczgfq58rmp0cjtjm0' }
await result.next() // { sessionId: '01e5g7vzx5dh0mv8m6g1zbdrnj' }
await result.next() // null
I've been fiddling with the aggregation framework and managed to group and count. In theory I could do two queries, first to get the count and then the groups, but in a frequent-write scenario I'm worried that two separate queries could lead to inconsistencies, especially since I haven't figured out how to include any start / end ids in the result of the count query, which could be used to confine the results of the groups query.
What I have so far is:
const result = collection.aggregate([
{ $match: { label: 'view_item' } },
{ $group : { _id: { sessionId: '$sessionId' } } },
]);
await result.next() // { _id: { sessionId: '01e5g7vzx5dh0mv8m6g1zbdrnj' } }
await result.next() // { _id: { sessionId: '01e5dnnpsczgfq58rmp0cjtjm0' } }
await result.next() // null
and
const result = collection.aggregate([
{ $match: { label: 'view_item' } },
{ $group : { _id: { sessionId: '$sessionId' } } },
{ $facet: { count: [{ $count: 'count' }] } }
]);
await result.next() // { count: [ { count: 2 } ] }
await result.next() // null
Question
How can the two queries above be combined to reliably get the count and a result with the grouped sessionId that can be streamed? (I assume any solution relying on result.toArray().length needs to load the whole result in memory, which is ruled out).
Is it possible to do in one single query or more likely to get the count and start / end ids in one query and then do a second query to get the groups confined by the start / end ids?
Thanks!

If I understand your requirements correctly, you need to gather all the sessions assigned to each label into one array and count those sessions.
If so, we can use $group to group the sessions assigned to each label and $size to calculate the length of that array.
We may do something like this:
db.collection.aggregate([
{
$match: {} // if you need the 'view_item' labels only, then add that condition here
},
{
$group: {
_id: "$label", // make the _id of the results is the label
sessionsIds: { // array of sessions
$push: "$sessionId"
}
}
},
{
$project: { // compute the array length with the $size expression in a $project stage
_id: 1,
sessionsIds: 1,
sessionsCount: {
$size: "$sessionsIds"
}
}
}
])
You can try it in Mongo Playground.
Update: if you need the number of unique session ids, with no duplicates in the sessionsIds array, use $addToSet instead of $push.
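For the unique-session variant mentioned in the update, a minimal sketch might look like the following. The helper name `uniqueSessionsPipeline` and its `label` parameter are hypothetical; only the stage contents come from the answer. Note that this collects all session ids of a label into a single result document, so it is bounded by the 16 MB document size limit and does not stream individual ids.

```javascript
// Hypothetical helper: builds the pipeline for one label, collecting each
// sessionId once ($addToSet) and counting the distinct ids with $size.
function uniqueSessionsPipeline(label) {
  return [
    // keep only the documents with the requested label
    { $match: { label: label } },
    // one output document per label, with a de-duplicated array of session ids
    { $group: { _id: "$label", sessionsIds: { $addToSet: "$sessionId" } } },
    // add the length of that array as sessionsCount
    { $project: { _id: 1, sessionsIds: 1, sessionsCount: { $size: "$sessionsIds" } } }
  ];
}

// Usage (mongo shell): db.collection.aggregate(uniqueSessionsPipeline("view_item"))
```

With the sample documents from the question, the two distinct session ids labelled view_item would give a sessionsCount of 2. The order of the ids inside the array is not specified.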
Update 2: if we need to group by sessionId and count how many documents have each sessionId, we can do something like
db.collection.aggregate([
{
$match: {} // if you need the 'view_item' labels only, then add that condition here
},
{
$group: {
_id: "$sessionId",
count: {
$sum: 1
}
}
}
])
this will return a result
[
{
"_id": "01e5dnnpsczgfq58rmp0cjtjm0",
"count": 3
},
{
"_id": "01e5g7vzx5dh0mv8m6g1zbdrnj",
"count": 4
}
]
If you need the _id of the result to be an object rather than a plain string, we could do something like
db.collection.aggregate([
{
$match: {} // if you need the 'view_item' labels only, then add that condition here
},
{
$group: {
_id: {
sessionId: "$sessionId"
},
count: {
$sum: 1
}
}
}
])
this will result in
[
{
"_id": {
"sessionId": "01e5dnnpsczgfq58rmp0cjtjm0"
},
"count": 3
},
{
"_id": {
"sessionId": "01e5g7vzx5dh0mv8m6g1zbdrnj"
},
"count": 4
}
]
you can try all of that here Mongo_Playground 2

Related

Mongoose - filter matched documents and assign the resultant length to a field

I have this collection(some irrelevant fields were omitted for brevity):
clients: {
userId: ObjectId,
clientSalesValue: Number,
currentDebt: Number,
}
Then I have this query that matches all the clients for a specific user, then calculates the sum of all debts and sales and put those results in a separate field each of them:
await clientsCollection.aggregate([
{
$match: { userId: new ObjectId(userId) }
},
{
$group: {
_id: null,
totalSalesValue: { $sum: '$clientSalesValue' },
totalDebts: { $sum: '$currentDebt' },
}
},
{
$unset: ['_id']
}
]).exec();
This works as expected: it returns an array with a single object. Now I also need to include in that object a field with the number of debtors, that is, the number of clients that have currentDebt > 0. How can I do that in the same query? Is it possible?
PS: I cannot modify the $match condition; it needs to always return all the clients for the corresponding user.

To include a count of how many matching documents have a positive currentDebt, you can use the $sum and $cond operators like so:
await clientsCollection.aggregate([
{
$match: { userId: new ObjectId(userId) }
},
{
$group: {
_id: null,
totalSalesValue: { $sum: '$clientSalesValue' },
totalDebts: { $sum: '$currentDebt' },
numDebtors: {
$sum: {
$cond: [{ $gt: ['$currentDebt', 0] }, 1, 0]
}
},
}
},
{
$unset: ['_id']
}
]).exec();

how to add a dot (.) inside a field name?

When I run this command :
db.runCommand(
{
aggregate:"myColl",
pipeline:[
{
$group:{
_id:{os_name:"$os_name",os_version:"$os_version"},
"events.login":{$sum:"$events.login"},
count:{$sum:NumberInt(1)}}
}
],
cursor:{}
}
)
I receive the error:
The field name 'events.login' cannot contain '.'
How can i do to keep the '.' in the returned field name (ie: events.login)

It's not quite clear what you're trying to do, so here are two options.
Error:
The field name 'events.login' cannot contain '.'
This happens because the output field names of a $group stage must be top-level names: an accumulator field cannot be named with dot notation, so "events.login" is rejected. To get a nested field, accumulate into a top-level field and add one more stage that reshapes it:
db.collection.aggregate([{
$group: {
_id: { os_name: "$os_name", os_version: "$os_version" },
"events": { $sum: "$events.login" },
count: { $sum: NumberInt(1) }
}
}, { $addFields: { 'events.login': '$events' } }])
Test : MongoDB-Playground
If you need to update the login field inside events while retaining all the other fields of events, try the query below. It takes the events sub-document of the last document seen in each _id group (without a preceding $sort this is typically the most recently inserted one, though the order is not guaranteed) and writes the summed login into it:
db.collection.aggregate([{
$group: {
_id: { os_name: "$os_name", os_version: "$os_version" },
"login": { $sum: "$events.login" }, 'events': { $last: '$events' },
count: { $sum: NumberInt(1) }
}
}, { $addFields: { 'events.login': '$login' } }, { $project: { login: 0 } }])
Test : MongoDB-Playground

MongoDB: Create Object in Aggregation result

I want to return Object as a field in my Aggregation result similar to the solution in this question. However in the solution mentioned above, the Aggregation results in an Array of Objects with just one item in that array, not a standalone Object. For example, a query like the following with a $push operation
$group:{
_id: "$publisherId",
'values' : { $push:{
newCount: { $sum: "$newField" },
oldCount: { $sum: "$oldField" } }
}
}
returns a result like this
{
"_id" : 2,
"values" : [
{
"newCount" : 100,
"oldCount" : 200
}
]
}
not one like this
{
"_id" : 2,
"values" : {
"newCount" : 100,
"oldCount" : 200
}
}
The latter is the result that I require. So how do I rewrite the query to get a result like that? Is it possible or is the former result the best I can get?

You don't need the $push operator. Compute the sums as top-level fields in $group, then add a final $project stage that builds the embedded document:
var pipeline = [
{
"$group": {
"_id": "$publisherId",
"newCount": { "$sum": "$newField" },
"oldCount": { "$sum": "$oldField" }
}
},
{
"$project": {
"values": {
"newCount": "$newCount",
"oldCount": "$oldCount"
}
}
}
];
db.collection.aggregate(pipeline);

Return all fields MongoDB Aggregate

I tried searching on here but couldn't really find what I need. I have documents like this:
{
appletype:Granny,
color:Green,
datePicked:2015-01-26,
dateRipe:2015-01-24,
numPicked:3
},
{
appletype:Granny,
color:Green,
datePicked:2015-01-01,
dateRipe:2014-12-28,
numPicked:6
}
I would like to return only the apples picked latest, with all fields. Essentially, I want my query to return only the first document. When I try to do:
db.collection.aggregate([
{ $match : { "appletype" : "Granny" } },
{ $sort : { "datePicked" : 1 } },
{ $group : { "_id" : { "appletype" : "$appletype" },
"datePicked" : { $max : "$datePicked" } } },
])
It does return the apples picked latest, but with only appletype:Granny and datePicked:2015-01-26. I need the remaining fields. I tried using $project and adding all the fields, but it didn't get me what I needed. Also, when I added the other fields to the group, since datePicked is unique, it returned both records.
How can I go about returning all fields, for only the latest datePicked?
Thanks!

From your description, it sounds like you want one document for each of the types of apple in your collection and showing the document with the most recent datePicked value.
Here is an aggregate query for that:
db.collection.aggregate([
{ $sort: { "datePicked": -1 } },
{ $group: { _id: "$appletype", color: { $first: "$color" }, datePicked: { $first: "$datePicked" }, dateRipe: { $first: "$dateRipe" }, numPicked: { $first: "$numPicked" } } },
{ $project: { _id: 0, color: 1, datePicked: 1, dateRipe: 1, numPicked: 1, appletype: "$_id" } }
])
But then based on the aggregate query you've written, it looks like you're trying to get this:
db.collection.find({appletype: "Granny"}).sort({datePicked: -1}).limit(1);
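When the documents have many fields, listing each one with $first gets tedious. A possible alternative, sketched here under the assumption of MongoDB 3.4 or later (needed for $replaceRoot), keeps the whole first document of each group through the $$ROOT system variable. The helper name `latestPerTypePipeline` and the intermediate field `doc` are made up for illustration.

```javascript
// Hypothetical helper: returns, for each appletype, the complete document
// with the most recent datePicked.
function latestPerTypePipeline() {
  return [
    // newest documents first, so $first picks the latest one in each group
    { $sort: { datePicked: -1 } },
    // keep the entire first document of each group under the field "doc"
    { $group: { _id: "$appletype", doc: { $first: "$$ROOT" } } },
    // promote the stored document back to the top level
    { $replaceRoot: { newRoot: "$doc" } }
  ];
}

// Usage (mongo shell): db.collection.aggregate(latestPerTypePipeline())
```

Because the whole document is carried through the $group stage, the output keeps every original field, including _id, without naming them individually.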

way to update multiple documents with different values

I have the following documents:
[{
"_id":1,
"name":"john",
"position":1
},
{"_id":2,
"name":"bob",
"position":2
},
{"_id":3,
"name":"tom",
"position":3
}]
In the UI a user can change the position of items (e.g. moving bob to the first position, so that john gets position 2 and tom keeps position 3).
Is there any way to update all positions in all documents at once?

You can not update two documents at once with a MongoDB query. You will always have to do that in two queries. You can of course set a value of a field to the same value, or increment with the same number, but you can not do two distinct updates in MongoDB with the same query.

You can use db.collection.bulkWrite() to perform multiple operations in a single call. It has been available since MongoDB 3.2.
Passing { ordered: false } lets the server execute the operations in any order, which can improve performance.
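As a rough illustration of how this could apply to the position example from the question, the sketch below turns a list of items into one updateOne operation per document. The helper name `buildPositionOps` is hypothetical, and the items are assumed to carry the `_id` and the new `position`.

```javascript
// Hypothetical helper: maps each item to an updateOne operation that sets
// only the position field of the document with the matching _id.
function buildPositionOps(items) {
  return items.map(item => ({
    updateOne: {
      filter: { _id: item._id },
      update: { $set: { position: item.position } }
    }
  }));
}

// New order after the user moved bob to the first position.
const ops = buildPositionOps([
  { _id: 1, position: 2 },
  { _id: 2, position: 1 },
  { _id: 3, position: 3 }
]);

// Usage (mongo shell): db.collection.bulkWrite(ops, { ordered: false })
```

Each operation targets a different document, so the updates do not depend on one another and unordered execution is safe here.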

From MongoDB 4.2 you can do this with an aggregation pipeline in the update, using the $set stage.
The many aggregation operators make several approaches possible; here is one of them:
exports.updateDisplayOrder = async keyValPairArr => {
  // Model.collection is the native driver collection: Mongoose casting and
  // validation are bypassed, so each id must already have the type of the stored _id.
  return ContestModel.collection.updateMany(
    { _id: { $in: keyValPairArr.map(o => o.id) } },
    [{
      $set: {
        displayOrder: {
          $let: {
            // pick the entry of keyValPairArr whose id equals this document's _id
            vars: { obj: { $arrayElemAt: [{ $filter: { input: keyValPairArr, as: "kvpa", cond: { $eq: ["$$kvpa.id", "$_id"] } } }, 0] } },
            in: "$$obj.displayOrder"
          }
        }
      }
    }]
  );
};
example key val pair is: [{"id":"5e7643d436963c21f14582ee","displayOrder":9}, {"id":"5e7643e736963c21f14582ef","displayOrder":4}]

Since MongoDB 4.2, update can accept an aggregation pipeline as its second argument, so each document can be modified based on its own data.
See https://docs.mongodb.com/manual/reference/method/db.collection.update/#modify-a-field-using-the-values-of-the-other-fields-in-the-document
Excerpt from documentation:
Modify a Field Using the Values of the Other Fields in the Document
Create a members collection with the following documents:
db.members.insertMany([
{ "_id" : 1, "member" : "abc123", "status" : "A", "points" : 2, "misc1" : "note to self: confirm status", "misc2" : "Need to activate", "lastUpdate" : ISODate("2019-01-01T00:00:00Z") },
{ "_id" : 2, "member" : "xyz123", "status" : "A", "points" : 60, "misc1" : "reminder: ping me at 100pts", "misc2" : "Some random comment", "lastUpdate" : ISODate("2019-01-01T00:00:00Z") }
])
Assume that instead of separate misc1 and misc2 fields, you want to gather these into a new comments field. The following update operation uses an aggregation pipeline to:
add the new comments field and set the lastUpdate field.
remove the misc1 and misc2 fields for all documents in the collection.
db.members.update(
{ },
[
{ $set: { status: "Modified", comments: [ "$misc1", "$misc2" ], lastUpdate: "$$NOW" } },
{ $unset: [ "misc1", "misc2" ] }
],
{ multi: true }
)

Suppose that after updating the positions your array looks like this:
const objectToUpdate = [{
"_id":1,
"name":"john",
"position":2
},
{
"_id":2,
"name":"bob",
"position":1
},
{
"_id":3,
"name":"tom",
"position":3
}].map( eachObj => {
return {
updateOne: {
filter: { _id: eachObj._id },
update: { $set: { name: eachObj.name, position: eachObj.position } }
}
}
})
YourModelName.bulkWrite(objectToUpdate,
{ ordered: false }
).then((result) => {
console.log(result);
}).catch(err=>{
console.error(err); // log the whole error; its nested shape varies between driver versions
})
It will update every position with its own value.
Note: I have used ordered: false here for better performance.