$type (aggregation) in MongoDB

I am checking the data types of one field across several documents that belong to a collection. Fortunately, MongoDB has documentation on that topic (MongoLink). The problem is that I do not understand the output of the aggregation operation.
This is the collection:
{ _id: 0, a : 8 }
{ _id: 1, a : [ 41.63, 88.19 ] }
{ _id: 2, a : { a : "apple", b : "banana", c: "carrot" } }
{ _id: 3, a : "caribou" }
{ _id: 4, a : NumberLong(71) }
{ _id: 5 }
This is the aggregation operation:
db.coll.aggregate([{
$project: {
a : { $type: "$a" }
}
}])
and the result is
{ _id: 0, "a" : "double" }
{ _id: 1, "a" : "array" }
{ _id: 2, "a" : "object" }
{ _id: 3, "a" : "string" }
{ _id: 4, "a" : "long" }
{ _id: 5, "a" : "missing" }
The parts that I do not understand are: (i) the plain letter "a" (not the $a) in the aggregation operation, and (ii) the letter "a" in the result, which I guess is not related to the $a in the aggregation operation.
Best Regards
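One way to read the pipeline: the plain a on the left of $project is the name of the field in the output document, while "$a" on the right is a field path that reads the value of field a from each input document. The sketch below imitates this in plain JavaScript and does not call MongoDB. typeName and projectType are hypothetical helpers that cover only the types in the sample data, and a BigInt stands in for NumberLong.

```javascript
// Plain-JavaScript imitation of { $project: { a: { $type: "$a" } } }.
// typeName is a hypothetical helper; it only handles the sample data's types.
function typeName(value) {
  if (value === undefined) return "missing";
  if (Array.isArray(value)) return "array";
  if (typeof value === "string") return "string";
  if (typeof value === "number") return "double"; // the shell stores plain numbers as doubles
  if (typeof value === "bigint") return "long"; // assumption: BigInt stands in for NumberLong
  return "object";
}

const docs = [
  { _id: 0, a: 8 },
  { _id: 1, a: [41.63, 88.19] },
  { _id: 2, a: { a: "apple", b: "banana", c: "carrot" } },
  { _id: 3, a: "caribou" },
  { _id: 4, a: 71n },
  { _id: 5 },
];

// outputField plays the role of the plain key in $project;
// inputField plays the role of the "$a" field path.
function projectType(doc, outputField, inputField) {
  return { _id: doc._id, [outputField]: typeName(doc[inputField]) };
}

const sameName = docs.map((d) => projectType(d, "a", "a"));
const renamed = docs.map((d) => projectType(d, "typeOfA", "a"));
console.log(sameName);
console.log(renamed);
```

In the real pipeline, writing { $project: { typeOfA: { $type: "$a" } } } would likewise report the type under typeOfA instead of a, which shows that the two names are independent.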

Sum of All Messages Sent

Mongodb Version 2.6.9
I'm attempting to count the total number of times a company has been involved in a messaging interaction. I'm able to get one side of the interaction using the aggregate $group, but I've come up empty on looking at the two fields together and aggregating them for each unique company ID.
The sender_id and receiver_id refer to the same set of company IDs.
{ "_id" : a, "sender_id" : 1, "receiver_id" : 2, payload: "data" }
{ "_id" : b, "sender_id" : 2, "receiver_id" : 5, payload: "data" }
{ "_id" : c, "sender_id" : 2, "receiver_id" : 4, payload: "data" }
{ "_id" : d, "sender_id" : 3, "receiver_id" : 2, payload: "data" }
{ "_id" : e, "sender_id" : 4, "receiver_id" : 1, payload: "data" }
Using the above data structure, I am attempting to produce a result set similar to
{ "_id" : 1, count: 2}
{ "_id" : 2, count: 4}
{ "_id" : 3, count: 1}
{ "_id" : 4, count: 2}
{ "_id" : 5, count: 1}
where for example Company 2 was involved in messages a, b, c, d.
Your options are limited in the 2.6 aggregation pipeline. You can try the pipeline below:
$group by _id with $push to create a single-value array for each of sender_id and receiver_id.
$project with $setUnion to merge the two arrays of ids into a single array.
$unwind and $group to count the occurrences of each id.
db.collection.aggregate({
"$group": {
"_id": "$_id",
"sender_id": {
"$push": "$sender_id"
},
"receiver_id": {
"$push": "$receiver_id"
}
}
}, {
"$project": {
"id": {
"$setUnion": ["$sender_id", "$receiver_id"]
}
}
}, {
"$unwind": "$id"
}, {
"$group": {
"_id": "$id",
"count": {
"$sum": 1
}
}
})
You can use the pipeline below on newer versions, where [] can be used in $project to create an array directly.
db.collection.aggregate({
$project: {
id: ["$sender_id", "$receiver_id"]
}
}, {
$unwind: "$id"
}, {
$group: {
_id: "$id",
count: {
$sum: 1
}
}
})
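As a way to check the expected counts without a database, the same steps can be imitated in plain JavaScript. This is only a sketch: countInvolvement is a hypothetical helper, and the _id values are written as strings. The Set mirrors the de-duplication done by $setUnion; the [] version of the pipeline does not de-duplicate, so the two would differ only for a message whose sender_id equals its receiver_id.

```javascript
const messages = [
  { _id: "a", sender_id: 1, receiver_id: 2, payload: "data" },
  { _id: "b", sender_id: 2, receiver_id: 5, payload: "data" },
  { _id: "c", sender_id: 2, receiver_id: 4, payload: "data" },
  { _id: "d", sender_id: 3, receiver_id: 2, payload: "data" },
  { _id: "e", sender_id: 4, receiver_id: 1, payload: "data" },
];

// Hypothetical helper imitating $project -> $unwind -> $group.
function countInvolvement(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // $project: merge both ids into one collection (Set de-duplicates like $setUnion)
    const ids = new Set([doc.sender_id, doc.receiver_id]);
    // $unwind: visit each id separately
    for (const id of ids) {
      // $group with $sum: 1
      counts.set(id, (counts.get(id) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((x, y) => x[0] - y[0])
    .map(([_id, count]) => ({ _id, count }));
}

const involvement = countInvolvement(messages);
console.log(involvement);
```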

MongoDB: Project to array item with minimum value of field

Suppose my collection consists of items that looks like this:
{
"items" : [
{
"item_id": 1,
"item_field": 10
},
{
"item_id": 2,
"item_field": 15
},
{
"item_id": 3,
"item_field": 3
},
]
}
Can I somehow select the entry of items with the lowest value of item_field, in this case the one with item_id 3?
I'm ok with using the aggregation framework. Bonus point if you can give me the code for the C# driver.
You can use $reduce expression in the following way.
The query below sets initialValue to the first element of items and then compares each element's item_field against the running minimum with $lt. If the current element ($$this) is smaller, it becomes the new $$value; otherwise the previous $$value is kept. $reduce therefore folds the array down to the element with the minimum item_field, and $project outputs that element.
db.collection.aggregate([
{
$project: {
items: {
$reduce: {
input: "$items",
// start from the whole first element, so item_id is kept even when the first element is the minimum
initialValue: { $arrayElemAt: ["$items", 0] },
in: {
$cond: [{ $lt: ["$$this.item_field", "$$value.item_field"] }, "$$this", "$$value" ]
}
}
}
}
}
])
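The fold performed by $reduce can be sketched in plain JavaScript with Array.prototype.reduce. This is an illustration only: minItem is a hypothetical helper, and returning null for an empty array is an assumption made for the sketch, not something the pipeline above does.

```javascript
const doc = {
  items: [
    { item_id: 1, item_field: 10 },
    { item_id: 2, item_field: 15 },
    { item_id: 3, item_field: 3 },
  ],
};

// Hypothetical helper: keep the running minimum while walking the array.
function minItem(items) {
  if (items.length === 0) return null; // assumption: empty input yields null
  return items.reduce(
    // mirrors $cond: [{ $lt: [this, value] }, this, value]
    (min, item) => (item.item_field < min.item_field ? item : min),
    items[0] // initial value is the whole first element
  );
}

const lowest = minItem(doc.items);
console.log(lowest);
```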
You can use $unwind to separate the items entries into individual documents.
Then $sort by item_field ascending and $group by _id, taking the $first item of each group.
db.coll.find().pretty()
{
"_id" : ObjectId("58edec875748bae2cc391722"),
"items" : [
{
"item_id" : 1,
"item_field" : 10
},
{
"item_id" : 2,
"item_field" : 15
},
{
"item_id" : 3,
"item_field" : 3
}
]
}
db.coll.aggregate([
{$unwind: {path: '$items', includeArrayIndex: 'index'}},
{$sort: { 'items.item_field': 1}},
{$group: {_id: '$_id', item: {$first: '$items'}}}
])
{ "_id" : ObjectId("58edec875748bae2cc391722"), "item" : { "item_id" : 3, "item_field" : 3 } }
We can get the expected result using the following query:
db.testing.aggregate([{$unwind:"$items"}, {$sort: { 'items.item_field': 1}},{$group: {_id: "$_id", minItem: {$first: '$items'}}}])
Result is
{ "_id" : ObjectId("58edf28c73fed29f4b741731"), "minItem" : { "item_id" : 3, "item_field" : 3 } }
{ "_id" : ObjectId("58edec3373fed29f4b741730"), "minItem" : { "item_id" : 3, "item_field" : 3 } }

Finding all documents which share the same value in an array

Consider I have the following data below:
{
"id":123,
"name":"apple",
"codes":["ABC", "DEF", "EFG"]
}
{
"id":234,
"name":"pineapple",
"codes":["DEF"]
}
{
"id":345,
"name":"banana",
"codes":["HIJ","KLM"]
}
If I don't want to search by a specific code, is there a way to find all fruits in my MongoDB collection that share the same code?
db.collection.aggregate([
{ $unwind: '$codes' },
{ $group: { _id: '$codes', count: {$sum:1}, fruits: {$push: '$name'}}},
{ $match: {'count': {$gt:1}}},
{ $group:{_id:null, total:{$sum:1}, data:{$push:{fruits: '$fruits', code:'$_id'}}}}
])
result:
{ "_id" : null, "total" : 1, "data" : [ { "fruits" : [ "apple", "pineapple" ], "code" : "DEF" } ] }
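The first three stages of that pipeline ($unwind, $group, $match) can be imitated in plain JavaScript to see why only DEF is reported. This is a sketch under the sample data above; sharedCodes is a hypothetical helper, and the final $group that wraps everything into one document is left out.

```javascript
const fruitDocs = [
  { id: 123, name: "apple", codes: ["ABC", "DEF", "EFG"] },
  { id: 234, name: "pineapple", codes: ["DEF"] },
  { id: 345, name: "banana", codes: ["HIJ", "KLM"] },
];

// Hypothetical helper imitating $unwind -> $group -> $match.
function sharedCodes(docs) {
  const namesByCode = new Map();
  for (const doc of docs) {
    // $unwind: one (code, name) pair per array entry
    for (const code of doc.codes) {
      // $group by code, pushing the fruit name
      if (!namesByCode.has(code)) namesByCode.set(code, []);
      namesByCode.get(code).push(doc.name);
    }
  }
  // $match: keep only codes that appear in more than one document
  return [...namesByCode.entries()]
    .filter(([, names]) => names.length > 1)
    .map(([code, names]) => ({ code, fruits: names }));
}

const shared = sharedCodes(fruitDocs);
console.log(shared);
```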

Add some kind of row number to a mongodb aggregate command / pipeline

The idea is to return a kind of row number from a MongoDB aggregate command/pipeline, similar to what we have in an RDBMS.
It should be a unique number; it is not important whether it matches an exact row number.
For a query like:
[ { $match: { "author" : { $ne: 1 } } }, { $limit: 1000000 } ]
I'd like to return:
{ "rownum" : 0, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }
{ "rownum" : 1, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }
{ "rownum" : 2, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
{ "rownum" : 3, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }
{ "rownum" : 4, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
Is it possible to generate this rownum in mongodb?
Not sure about the performance in big queries, but this is at least an option.
You can add your results to an array by grouping/pushing and then unwind with includeArrayIndex like this:
[
{$match: {author: {$ne: 1}}},
{$limit: 10000},
{$group: {
_id: 1,
book: {$push: {title: '$title', author: '$author', copies: '$copies'}}
}},
{$unwind: {path: '$book', includeArrayIndex: 'rownum'}},
{$project: {
author: '$book.author',
title: '$book.title',
copies: '$book.copies',
rownum: 1
}}
]
If your database contains a large number of records and you intend to paginate, you can instead use a $skip stage followed by $limit (10, 20, or however many rows you display per page) and add the $skip value to each row's position within the page. That gives the real position without pushing all your results into one array to enumerate them.
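The pagination arithmetic can be sketched in plain JavaScript as follows. pageWithRowNumbers is a hypothetical helper, and the array stands in for an already ordered query result; the row number is simply the skip offset plus the index within the page.

```javascript
const books = [
  { title: "The Banquet", author: "Dante", copies: 2 },
  { title: "Divine Comedy", author: "Dante", copies: 1 },
  { title: "Eclogues", author: "Dante", copies: 2 },
  { title: "The Odyssey", author: "Homer", copies: 10 },
  { title: "Iliad", author: "Homer", copies: 10 },
];

// Hypothetical helper: slice imitates $skip + $limit,
// and rownum = skip + position inside the page.
function pageWithRowNumbers(orderedDocs, skip, limit) {
  return orderedDocs
    .slice(skip, skip + limit)
    .map((doc, i) => ({ rownum: skip + i, ...doc }));
}

const secondPage = pageWithRowNumbers(books, 3, 2);
console.log(secondPage);
```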
Starting in MongoDB 5.0, this is a good use case for the new $setWindowFields aggregation stage and its $documentNumber operator:
// { x: "a" }
// { x: "b" }
// { x: "c" }
// { x: "d" }
db.collection.aggregate([
{ $setWindowFields: {
sortBy: { _id: 1 },
output: { rowNumber: { $documentNumber: {} } }
}}
])
// { x: "a", rowNumber: 1 }
// { x: "b", rowNumber: 2 }
// { x: "c", rowNumber: 3 }
// { x: "d", rowNumber: 4 }
$setWindowFields allows us to work for each document with the knowledge of previous or following documents. Here we just need the information of the place of the document in the whole collection (or aggregation intermediate result), as provided by $documentNumber.
Note that we sort by _id because the sortBy parameter is required, but really, since you don't care about the ordering of your rows, it could be anything you'd like.
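To make the numbering rule concrete, here is a plain-JavaScript sketch of what sortBy plus $documentNumber amounts to for a single partition: sort the documents, then number them starting at 1. withDocumentNumber is a hypothetical helper, and the sketch sorts by x only because the sample documents show no _id.

```javascript
const rows = [{ x: "d" }, { x: "b" }, { x: "a" }, { x: "c" }];

// Hypothetical helper: sort a copy by sortKey, then assign 1-based positions.
function withDocumentNumber(docs, sortKey) {
  return [...docs]
    .sort((p, q) => (p[sortKey] < q[sortKey] ? -1 : p[sortKey] > q[sortKey] ? 1 : 0))
    .map((doc, i) => ({ ...doc, rowNumber: i + 1 }));
}

const numbered = withDocumentNumber(rows, "x");
console.log(numbered);
```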
Another way would be to keep track of a row_number variable using $function:
[{ $match: { "author" : { $ne: 1 } }} , { $limit: 1000000 },
{
$set: {
"rownum": {
"$function": {
"body": "function() {try {row_number+= 1;} catch (e) {row_number= 0;}return row_number;}",
"args": [],
"lang": "js"
}
}
}
}]
I am not sure if this can mess up something though!

"Structured" grouping query in MongoDB

I have the following items collection :
[{
"_id": 1,
"manufactureId": 1,
"itemTypeId": "Type1"
},
{
"_id": 2,
"manufactureId": 1,
"itemTypeId": "Type2"
},
{
"_id": 3,
"manufactureId": 2,
"itemTypeId": "Type1"
}]
I would like to create a query that returns the number of items of each item type that each manufacturer has, in the following structure (or something similar):
[
{
_id:1, //this would be the manufactureId
itemsCount:{
"Type1":1, //Type1 items count
"Type2":1 //...
}
},
{
_id:2,
itemsCount:{
"Type1":1
}
}
]
I have tried to use the aggregation framework, but I couldn't figure out whether there is a way to create "structured" group-by queries with it.
I can easily achieve the desired result by post-processing this simple aggregation query result :
db.items.aggregate([{$group:{_id:{itemTypeId:"$itemTypeId",manufactureId:"$manufactureId"},count:{$sum:1}}}])
but if possible I prefer not to post-process the result.
Data stays data
I would rather use this query which, I believe, will give you the closest data structure to what you want, without post-processing.
Query
db.items.aggregate(
{
$group:
{
_id:
{
itemTypeId: "$itemTypeId",
manufactureId: "$manufactureId"
},
count:
{
$sum: 1
}
},
},
{
$group:
{
_id: "$_id.manufactureId",
itemCounts:
{
"$push":
{
itemTypeId: "$_id.itemTypeId",
count: "$count"
}
}
}
})
Output
{
"_id" : 1,
"itemCounts" : [
{
"itemTypeId" : "Type1",
"count" : 1
},
{
"itemTypeId" : "Type2",
"count" : 1
}
]
},
{
"_id" : 2,
"itemCounts" : [
{
"itemTypeId" : "Type1",
"count" : 1
}
]
}
Data transformed to object fields
This is actually an approach that I wouldn't advise in general. It is harder to manage in your application, because the field names will be inconsistent between different objects and you won't know in advance which object fields to expect. This is a crucial point if you use a strongly typed language: automatic data binding to your domain objects will become impossible.
Anyway, the only way to get the exact data structure you want is to apply post-processing.
Query
db.items.aggregate(
{
$group:
{
_id:
{
itemTypeId: "$itemTypeId",
manufactureId: "$manufactureId"
},
count:
{
$sum: 1
}
},
},
{
$group:
{
_id: "$_id.manufactureId",
itemCounts:
{
"$push":
{
itemTypeId: "$_id.itemTypeId",
count: "$count"
}
}
}
}).forEach(function(doc) {
var obj = {
_id: doc._id,
itemCounts: {}
};
doc.itemCounts.forEach(function(typeCount) {
obj.itemCounts[typeCount.itemTypeId] = typeCount.count;
});
printjson(obj);
})
Output
{ "_id" : 1, "itemCounts" : { "Type1" : 1, "Type2" : 1 } }
{ "_id" : 2, "itemCounts" : { "Type1" : 1 } }
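The reshaping done in the forEach callback can also be tried outside the shell. The sketch below takes the array-style output of the first query as plain JavaScript objects and converts each document to the object-field form; toObjectFields is a hypothetical helper name.

```javascript
// Output of the two-stage $group query, written out as plain objects.
const grouped = [
  {
    _id: 1,
    itemCounts: [
      { itemTypeId: "Type1", count: 1 },
      { itemTypeId: "Type2", count: 1 },
    ],
  },
  { _id: 2, itemCounts: [{ itemTypeId: "Type1", count: 1 }] },
];

// Hypothetical helper: turn [{ itemTypeId, count }, ...] into { itemTypeId: count, ... }.
function toObjectFields(doc) {
  const itemCounts = {};
  for (const typeCount of doc.itemCounts) {
    itemCounts[typeCount.itemTypeId] = typeCount.count;
  }
  return { _id: doc._id, itemCounts };
}

const reshaped = grouped.map(toObjectFields);
console.log(JSON.stringify(reshaped));
```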