The idea is to return a kind of row number from a MongoDB aggregate command/pipeline, similar to what we have in an RDBMS.
It should be a unique number; it is not important whether it matches an exact row number.
For a query like:
[ { $match: { "author" : { $ne: 1 } } }, { $limit: 1000000 } ]
I'd like to return:
{ "rownum" : 0, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }
{ "rownum" : 1, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }
{ "rownum" : 2, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
{ "rownum" : 3, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }
{ "rownum" : 4, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
Is it possible to generate this rownum in mongodb?
I'm not sure about the performance on big queries, but this is at least an option.
You can add your results to an array by grouping/pushing and then $unwind with includeArrayIndex, like this:
[
  {$match: {author: {$ne: 1}}},
  {$limit: 10000},
  {$group: {
    _id: 1,
    book: {$push: {title: '$title', author: '$author', copies: '$copies'}}
  }},
  {$unwind: {path: '$book', includeArrayIndex: 'rownum'}},
  {$project: {
    author: '$book.author',
    title: '$book.title',
    copies: '$book.copies',
    rownum: 1
  }}
]
Now, if your database contains a large number of records and you intend to paginate, you can use a $skip stage followed by a $limit of 10 or 20 or whatever you want to display per page, then add the number from the $skip stage to your rownum. That gives you the real position without having to push all your results just to enumerate them.
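A minimal sketch of that pagination idea, assuming the collection is called books and that a hypothetical page number and page size are supplied by the application:

var page = 2;        // hypothetical 0-based page number
var pageSize = 20;   // hypothetical page size
var offset = page * pageSize;

db.books.aggregate([
  { $match: { author: { $ne: 1 } } },
  // a $sort stage would normally go here so that the page boundaries are stable
  { $skip: offset },
  { $limit: pageSize },
  { $group: {
    _id: 1,
    book: { $push: { title: '$title', author: '$author', copies: '$copies' } }
  }},
  { $unwind: { path: '$book', includeArrayIndex: 'rownum' } },
  { $project: {
    author: '$book.author',
    title: '$book.title',
    copies: '$book.copies',
    // add the skipped offset to turn the per-page index into the absolute position
    rownum: { $add: ['$rownum', offset] }
  }}
])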
Starting in Mongo 5, this is a perfect use case for the new $setWindowFields aggregation stage and its $documentNumber operator:
// { x: "a" }
// { x: "b" }
// { x: "c" }
// { x: "d" }
db.collection.aggregate([
  { $setWindowFields: {
    sortBy: { _id: 1 },
    output: { rowNumber: { $documentNumber: {} } }
  }}
])
// { x: "a", rowNumber: 1 }
// { x: "b", rowNumber: 2 }
// { x: "c", rowNumber: 3 }
// { x: "d", rowNumber: 4 }
$setWindowFields lets us work on each document with knowledge of the preceding or following documents. Here we just need the document's position within the whole collection (or intermediate aggregation result), which is exactly what $documentNumber provides.
Note that we sort by _id because the sortBy parameter is required, but really, since you don't care about the ordering of your rows, it could be anything you'd like.
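Applied to the books example from the question (just a sketch; the collection name books is an assumption, and note that $documentNumber starts counting at 1, not 0):

db.books.aggregate([
  { $match: { author: { $ne: 1 } } },
  { $limit: 1000000 },
  { $setWindowFields: {
    sortBy: { _id: 1 },  // required, but any field would do here
    output: { rownum: { $documentNumber: {} } }
  }}
])
// e.g. { "title" : "The Banquet", "author" : "Dante", "copies" : 2, "rownum" : 1 }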
Another way would be to keep track of the row number using $function:
[
  { $match: { "author": { $ne: 1 } } },
  { $limit: 1000000 },
  {
    $set: {
      "rownum": {
        "$function": {
          "body": "function() { try { row_number += 1; } catch (e) { row_number = 0; } return row_number; }",
          "args": [],
          "lang": "js"
        }
      }
    }
  }
]
I am not sure if this can mess something up, though!
I'm currently trying to split an aggregation result into two different arrays using only MongoDB.
My main goal is to create two subsets of users with the same distribution regarding the number of interactions they have made. For this I'm currently making this request:
db.getCollection('Interaction').aggregate([
  { $group : { _id : "$userId", count: { $sum: 1 }}},
  { $sort : { count : -1 }},
  { $group : { _id : {$mod : [_rand() * 2, 2]}, ids : { $push: "$_id"}}}
])
My main issue actually is that the _rand() function is called only once during the aggregation execution, so I end up with all my results in a single array.
Also, a random distribution is not that good. Is there a way to use the index of each result?
Edit 1 :
After dnickless' answer I still have an issue with the distribution in the groupBy part. Ideally I would like to do something like this:
db.getCollection('Interaction').aggregate([
  { $group : { _id : "$userId", count: { $sum: 1 }}},
  { $sort : { count : -1 }},
  { $bucket: {
    groupBy: { $mod: [ { $indexOfArray : ??? }, 2 ] },
    boundaries: [ 0, 1 ],
    default: 2,
    output: {
      "users": { $push: "$_id"}
    }
  }}
],
{ allowDiskUse: true })
That could split the even and odd indexes into two separate arrays. But I would like to apply $indexOfArray to the current aggregation result.
To give you more context here is my Interaction object model :
{ "_id" : ObjectId("5af01..."), "name" : "WATCH", "date" : ISODate("2018-05-07T09:32:53.219Z") }
Without the bucket part I have this result :
{ "_id" : "5b1e7f...", "count" : 43.0 }
{ "_id" : "5b1e75...", "count" : 41.0 }
{ "_id" : "5b1e7a...", "count" : 40.0 }
...
I would like my answer to look like this :
[
  { "_id" : 0, "users" : [ "5b1e7f...", "5b1e7a...", ... ] }, // even index results
  { "_id" : 1, "users" : [ "5b1e75...", ... ] } // odd index results
]
My end goal is to split my users into 2 groups with evenly distributed numbers of interactions.
Edit 2 :
I finally found a solution to my problem:
db.getCollection('Interaction').aggregate([
  { $group : { _id : "$userId", count: { $sum: 1 }}},
  { $sort : { count : -1 }},
  { $group : { _id : "whatever" , user : { $push : { _id : "$_id" , count : "$count"}}}},
  { $unwind : { path : "$user" , "includeArrayIndex" : "rank"}},
  { $bucket: {
    groupBy: { $mod: [ "$rank" , 2 ] },
    boundaries: [ 0, 1 ],
    default: 2,
    output: {
      "users": { $push: "$user._id"}
    }
  }}
],
{ allowDiskUse: true })
Probably not the most optimized solution, but it still does the job :)
If you have any advice on how to improve it, I'm still interested.
I don't fully understand what exactly you are trying to achieve here without seeing some sample input and output. However, have you tried using $bucketAuto? Something like this:
db.getCollection('Interaction').aggregate([
  { $group : { _id : "$userId", count: { $sum: 1 }}},
  { $bucketAuto : {
    groupBy : "$count",
    buckets : 2, // number of buckets goes here
    output : {
      ids : { $push : "$_id" }
    }
  }}
])
If you want to be more sophisticated about the distribution aspect, you could perhaps try something like this, which would throw all even counts into one pot and all odd ones into another:
$bucket: {
  groupBy: { $mod: [ "$count", 2 ] },
  boundaries: [ 0, 1 ],
  default: 2,
  output: {
    "docs": { $push: "$$ROOT" }
  }
}
Depending on the type of your userId field you could perhaps come up with a more "random" distribution.
Lastly, I am not sure what exactly you mean by
"Is there a way to use the index of each result ?"
Perhaps something like $size, $arrayElemAt and/or $indexOfArray...?
Alternatively, you could perhaps try to $slice the sorted array into two equally sized parts (using $size divided by 2), then $reverseArray one of them, and then $zip both arrays up again, which should result in something like shuffling a deck of playing cards. After that, you would need to flatten the nested array into a single one again (using $reduce and $concatArrays or so) and then slice that array into two parts once more, which should be what you are looking for, if I am not too tired by now to think through the statistical parts here.
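A rough sketch of that shuffle idea, assuming an even number of users for simplicity and MongoDB 4.0+ (for $toInt); the groupA/groupB field names are purely illustrative:

db.getCollection('Interaction').aggregate([
  { $group: { _id: "$userId", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
  // collect all user ids, sorted by interaction count, into a single array
  { $group: { _id: null, users: { $push: "$_id" } } },
  // pre-compute the midpoint once
  { $addFields: { half: { $toInt: { $divide: [ { $size: "$users" }, 2 ] } } } },
  // zip the first half with the reversed second half,
  // e.g. [u1, u2, u3, u4] -> [[u1, u4], [u2, u3]]
  { $addFields: {
    pairs: {
      $zip: {
        inputs: [
          { $slice: [ "$users", "$half" ] },
          { $reverseArray: { $slice: [ "$users", "$half", { $size: "$users" } ] } }
        ]
      }
    }
  }},
  // flatten the pairs back into a single "shuffled" array
  { $addFields: {
    shuffled: {
      $reduce: { input: "$pairs", initialValue: [], in: { $concatArrays: [ "$$value", "$$this" ] } }
    }
  }},
  // cut the shuffled array into the two final groups
  { $project: {
    _id: 0,
    groupA: { $slice: [ "$shuffled", "$half" ] },
    groupB: { $slice: [ "$shuffled", "$half", { $size: "$shuffled" } ] }
  }}
], { allowDiskUse: true })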
Suppose my collection consists of items that look like this:
{
"items" : [
{
"item_id": 1,
"item_field": 10
},
{
"item_id": 2,
"item_field": 15
},
{
"item_id": 3,
"item_field": 3
}
]
}
Can I somehow select the entry of items with the lowest value of item_field, in this case the one with item_id 3?
I'm ok with using the aggregation framework. Bonus point if you can give me the code for the C# driver.
You can use the $reduce expression in the following way.
The query below sets the initialValue to the first element of $items and then, for each element, performs a $lt comparison on item_field: if the current element's item_field is lower, it becomes the new $$value; otherwise the previous $$value is kept. $reduce thus walks all the elements to find the minimum one, and $project outputs it.
db.collection.aggregate([
  {
    $project: {
      items: {
        $reduce: {
          input: "$items",
          initialValue: { $arrayElemAt: ["$items", 0] },
          in: {
            $cond: [{ $lt: ["$$this.item_field", "$$value.item_field"] }, "$$this", "$$value"]
          }
        }
      }
    }
  }
])
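On the sample document above, this should return something along the lines of:

{ "_id" : ObjectId("..."), "items" : { "item_id" : 3, "item_field" : 3 } }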
You can use $unwind to separate the items entries, then $sort by item_field ascending, and then $group.
db.coll.find().pretty()
{
"_id" : ObjectId("58edec875748bae2cc391722"),
"items" : [
{
"item_id" : 1,
"item_field" : 10
},
{
"item_id" : 2,
"item_field" : 15
},
{
"item_id" : 3,
"item_field" : 3
}
]
}
db.coll.aggregate([
{$unwind: {path: '$items', includeArrayIndex: 'index'}},
{$sort: { 'items.item_field': 1}},
{$group: {_id: '$_id', item: {$first: '$items'}}}
])
{ "_id" : ObjectId("58edec875748bae2cc391722"), "item" : { "item_id" : 3, "item_field" : 3 } }
We can get the expected result using the following query:
db.testing.aggregate([{$unwind:"$items"}, {$sort: { 'items.item_field': 1}},{$group: {_id: "$_id", minItem: {$first: '$items'}}}])
The result is:
{ "_id" : ObjectId("58edf28c73fed29f4b741731"), "minItem" : { "item_id" : 3, "item_field" : 3 } }
{ "_id" : ObjectId("58edec3373fed29f4b741730"), "minItem" : { "item_id" : 3, "item_field" : 3 } }
I have the following items collection :
[{
"_id": 1,
"manufactureId": 1,
"itemTypeId": "Type1"
},
{
"_id": 2,
"manufactureId": 1,
"itemTypeId": "Type2"
},
{
"_id": 3,
"manufactureId": 2,
"itemTypeId": "Type1"
}]
I would like to create a query that will return the number of items of each item type that each manufacturer has, in the following structure (or something similar):
[
{
_id:1, //this would be the manufactureId
itemsCount:{
"Type1":1, //Type1 items count
"Type2":1 //...
}
},
{
_id:2,
itemsCount:{
"Type1":1
}
}
]
I have tried to use the aggregation framework but I couldn't figure out whether there is a way to create "structured" group-by queries with it.
I can easily achieve the desired result by post-processing the result of this simple aggregation query:
db.items.aggregate([{$group:{_id:{itemTypeId:"$itemTypeId",manufactureId:"$manufactureId"},count:{$sum:1}}}])
but if possible I prefer not to post-process the result.
Data stays data
I would rather use this query which, I believe, will give you the closest data structure to what you want, without post-processing.
Query
db.items.aggregate(
{
$group:
{
_id:
{
itemTypeId: "$itemTypeId",
manufactureId: "$manufactureId"
},
count:
{
$sum: 1
}
},
},
{
$group:
{
_id: "$_id.manufactureId",
itemCounts:
{
"$push":
{
itemTypeId: "$_id.itemTypeId",
count: "$count"
}
}
}
})
Output
{
"_id" : 1,
"itemCounts" : [
{
"itemTypeId" : "Type1",
"count" : 1
},
{
"itemTypeId" : "Type2",
"count" : 1
}
]
},
{
"_id" : 2,
"itemCounts" : [
{
"itemTypeId" : "Type1",
"count" : 1
}
]
}
Data transformed to object fields
This is actually an approach that I wouldn't advise in general. It is harder to manage in your application, because the field names will be inconsistent between different objects and you won't know what object fields to expect in advance. This becomes a crucial point if you use a strongly typed language: automatic data binding to your domain objects will become impossible.
Anyway, the only way to get the exact data structure you want is to apply post-processing.
Query
db.items.aggregate(
{
$group:
{
_id:
{
itemTypeId: "$itemTypeId",
manufactureId: "$manufactureId"
},
count:
{
$sum: 1
}
},
},
{
$group:
{
_id: "$_id.manufactureId",
itemCounts:
{
"$push":
{
itemTypeId: "$_id.itemTypeId",
count: "$count"
}
}
}
}).forEach(function(doc) {
var obj = {
_id: doc._id,
itemCounts: {}
};
doc.itemCounts.forEach(function(typeCount) {
obj.itemCounts[typeCount.itemTypeId] = typeCount.count;
});
printjson(obj);
})
Output
{ "_id" : 1, "itemCounts" : { "Type1" : 1, "Type2" : 1 } }
{ "_id" : 2, "itemCounts" : { "Type1" : 1 } }
I have the document shown below:
{
name: "testing",
place:"London",
documents: [
{
x:1,
y:2,
},
{
x:1,
y:3,
},
{
x:4,
y:3,
}
]
}
I want to retrieve all matching entries of the documents array, i.e. I want the output in the format below:
{
name: "testing",
place:"London",
documents: [
{
x:1,
y:2,
},
{
x:1,
y:3,
}
]
}
What I have tried is :
db.test.find({"documents.x": 1},{_id: 0, documents: {$elemMatch: {x: 1}}});
But it gives only the first matching entry.
As JohnnyHK said, the answer in MongoDB: select matched elements of subcollection explains it well.
In your case, the aggregate would look like this:
(Note: the first $match is not strictly necessary, but it helps in terms of performance (it can use an index) and memory usage ($unwind then runs on a limited set).)
> db.xx.aggregate([
... // find the relevant documents in the collection
... // uses index, if defined on documents.x
... { $match: { documents: { $elemMatch: { "x": 1 } } } },
... // flatten the documents array
... { $unwind : "$documents" },
... // match for elements, "documents" is no longer an array
... { $match: { "documents.x" : 1 } },
... // re-create documents array
... { $group : { _id : "$_id", documents : { $addToSet : "$documents" } }}
... ]);
{
"result" : [
{
"_id" : ObjectId("515e2e6657a0887a97cc8d1a"),
"documents" : [
{
"x" : 1,
"y" : 3
},
{
"x" : 1,
"y" : 2
}
]
}
],
"ok" : 1
}
For more information about aggregate(), see http://docs.mongodb.org/manual/applications/aggregation/
I've been looking for a while now and can't seem to find a way to sort an inner array and keep it in the document I'm currently working with.
{
  "service": {
    "apps": {
      "updates": [
        {
          "n": 1,
          "date": ISODate("2012-03-10T16:15:00Z")
        },
        {
          "n": 2,
          "date": ISODate("2012-01-10T16:15:00Z")
        },
        {
          "n": 5,
          "date": ISODate("2012-07-10T16:15:00Z")
        }
      ]
    }
  }
}
So I want the returned item to still be the service, but with my updates array sorted. So far in the shell I have:
db.servers.aggregate(
{$unwind:'$service'},
{$project:{'service.apps':1}},
{$unwind:'$service.apps'},
{$project: {'service.apps.updates':1}},
{$sort:{'service.apps.updates.date':1}});
Anyone think they can help on this?
You can do this by $unwinding the updates array, sorting the resulting docs by date, and then $grouping them back together on _id using the sorted order.
db.servers.aggregate(
{$unwind: '$service.apps.updates'},
{$sort: {'service.apps.updates.date': 1}},
{$group: {_id: '$_id', 'updates': {$push: '$service.apps.updates'}}},
{$project: {'service.apps.updates': '$updates'}})
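On the example document above, this pipeline should produce something shaped like the following (any fields other than the updates would not survive the $group/$project):

{
  "_id" : ObjectId("..."),
  "service" : {
    "apps" : {
      "updates" : [
        { "n" : 2, "date" : ISODate("2012-01-10T16:15:00Z") },
        { "n" : 1, "date" : ISODate("2012-03-10T16:15:00Z") },
        { "n" : 5, "date" : ISODate("2012-07-10T16:15:00Z") }
      ]
    }
  }
}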
Starting in Mongo 4.4, the $function aggregation operator allows applying a custom javascript function to implement behaviour not supported by the MongoDB Query Language.
For instance, in order to sort an array of objects by one of their fields:
// {
// "service" : { "apps" : { "updates" : [
// { "n" : 1, "date" : ISODate("2012-03-10T16:15:00Z") },
// { "n" : 2, "date" : ISODate("2012-01-10T16:15:00Z") },
// { "n" : 5, "date" : ISODate("2012-07-10T16:15:00Z") }
// ]}}
// }
db.collection.aggregate(
  { $set: {
    "service.apps.updates": {
      $function: {
        body: function(updates) {
          updates.sort((a, b) => a.date - b.date);
          return updates;
        },
        args: ["$service.apps.updates"],
        lang: "js"
      }
    }
  }}
)
// {
// "service" : { "apps" : { "updates" : [
// { "n" : 2, "date" : ISODate("2012-01-10T16:15:00Z") },
// { "n" : 1, "date" : ISODate("2012-03-10T16:15:00Z") },
// { "n" : 5, "date" : ISODate("2012-07-10T16:15:00Z") }
// ]}}
// }
This modifies the array in place, without having to apply a combination of expensive $unwind, $sort and $group stages.
$function takes 3 parameters:
body, which is the function to apply, whose parameter is the array to modify.
args, which contains the fields from the record that the body function takes as parameter. In our case "$service.apps.updates".
lang, which is the language in which the body function is written. Only js is currently available.
Starting in Mongo 5.2, it's the exact use case for the new $sortArray aggregation operator:
// {
// service: { apps: { updates: [
// { n: 1, date: ISODate("2012-03-10") },
// { n: 2, date: ISODate("2012-01-10") },
// { n: 5, date: ISODate("2012-07-10") }
// ]}}
// }
db.collection.aggregate([
{ $set: {
"service.apps.updates": {
$sortArray: {
input: "$service.apps.updates",
sortBy: { date: 1 }
}
}
}}
])
// {
// service: { apps: { updates: [
// { n: 2, date: ISODate("2012-01-10") },
// { n: 1, date: ISODate("2012-03-10") },
// { n: 5, date: ISODate("2012-07-10") }
// ]}}
// }
This:
sorts ($sortArray) the service.apps.updates array (input: "$service.apps.updates")
by applying a sort on dates (sortBy: { date: 1 })
without having to apply a combination of expensive $unwind, $sort and $group stages