MongoDB aggregation: Project separate document fields into a single array field

I have a document like this:
{fax: '8135551234', cellphone: '8134441234'}
Is there a way to project (without a group stage) this document into this:
{
phones: [{
type: 'fax',
number: '8135551234'
}, {
type: 'cellphone',
number: '8134441234'
}]
}
I could probably use a group stage operator for this, but I'd rather not if there's any other way, because my query also projects several other fields, all of which would require a $first just for the group stage.
Hope that's clear. Thanks in advance!

MongoDB 2.6 introduces the $map operator, an array transformation operator that can be used to do exactly this:
db.phones.aggregate([
{ "$project": {
"phones": { "$map": {
"input": { "$literal": ["fax","cellphone"] },
"as": "el",
"in": {
"type": "$$el",
"number": { "$cond": [
{ "$eq": [ "$$el", "fax" ] },
"$fax",
"$cellphone"
]}
}
}}
}}
])
So your document now looks exactly like you want. The trick, of course, is to create a new array with the members "fax" and "cellphone", then transform that array into the new sub-documents by matching on those values.
Of course you can also do this in earlier versions using $unwind and $group in a similar fashion, just not as efficiently:
db.phones.aggregate([
{ "$project": {
"type": { "$const": ["fax","cellphone"] },
"fax": 1,
"cellphone": 1
}},
{ "$unwind": "$type" },
{ "$group": {
"_id": "$_id",
"phones": { "$push": {
"type": "$type",
"number": { "$cond": [
{ "$eq": [ "$type", "fax" ] },
"$fax",
"$cellphone"
]}
}}
}}
])
Of course it can be argued that unless you are doing some sort of aggregation, you may as well just post-process the collection results in code. But this is an alternative way to do that.
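For comparison, the post-processing route mentioned above might look like the following. This is only a minimal sketch in plain JavaScript: the helper name toPhones is made up for illustration, and unlike the pipelines above it skips a phone field that is missing from the document rather than emitting an entry without a number.

```javascript
// Hypothetical client-side helper (not part of any MongoDB API).
// The field names "fax" and "cellphone" come from the question.
function toPhones(doc) {
  const phoneFields = ["fax", "cellphone"];
  const out = {};
  // Copy every field that is not one of the phone fields.
  for (const key of Object.keys(doc)) {
    if (!phoneFields.includes(key)) out[key] = doc[key];
  }
  // Build one { type, number } entry per phone field that is present.
  out.phones = phoneFields
    .filter((type) => doc[type] !== undefined)
    .map((type) => ({ type: type, number: doc[type] }));
  return out;
}

const reshaped = toPhones({ fax: "8135551234", cellphone: "8134441234" });
console.log(JSON.stringify(reshaped, null, 2));
```

Any other fields on the document are carried over untouched, which avoids the $first bookkeeping that a $group stage would need.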

Related

How to convert an array of documents to two dimensions array

I am making a query to MongoDB
db.getCollection('user_actions').aggregate([
{$match: {
type: 'play_started',
entity_id: {$ne: null}
}},
{$group: {
_id: '$entity_id',
view_count: {$sum: 1}
}},
])
and getting a list of docs with two fields:
How can I get a list of lists with two items like
[[entity_id, view_count], [entity_id, view_count], ...]
There are actually two different ways to do this, depending on your MongoDB server version.
The optimal way, available from MongoDB 3.2, is to use square brackets [] to create new array fields directly in the $project stage. This returns an array for each group. The next stage is another $group stage, where you group the documents and use the $push accumulator operator to return a two-dimensional array.
db.getCollection('user_actions').aggregate([
{ "$match": {
"type": 'play_started',
"entity_id": { "$ne": null }
}},
{ "$group": {
"_id": "$entity_id",
"view_count": { "$sum": 1}
}},
{ "$project": {
"_id": 0,
"result": [ "$_id", "$view_count" ]
}},
{ "$group": {
"_id": null,
"result": { "$push": "$result" }
}}
])
From MongoDB 2.6 and prior to 3.2 you need a different approach. To create the array you need to use the $map operator. Because the $map "input" field must resolve to an array, you need the $literal operator to set a literal array value as the input. The $cond operator then returns the "entity_id" or the "view_count" according to the boolean expression.
db.getCollection('user_actions').aggregate([
{ "$match": {
"type": 'play_started',
"entity_id": { "$ne": null }
}},
{ "$group": {
"_id": "$entity_id",
"view_count": { "$sum": 1}
}},
{ "$project": {
"_id": 0,
"result": {
"$map": {
"input": { "$literal": [ "A", "B"] },
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
"$_id",
"$view_count"
]
}
}
}
}},
{ "$group": {
"_id": null,
"result": { "$push": "$result" }
}}
])
It is worth noting that $map and $literal were both introduced in MongoDB 2.6, so this pipeline will not run on earlier releases. The undocumented $const operator does the same thing as $literal, but it is not a substitute for $map.
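If the reshaping does not have to happen on the server, the same list of lists can be built in client code from the output of the first $group stage. A minimal sketch, assuming the grouped documents have already been fetched into an array (the sample values below are made up):

```javascript
// Made-up sample of what the $match + $group stages return.
const grouped = [
  { _id: "a1", view_count: 3 },
  { _id: "b2", view_count: 7 },
];

// Turn each { _id, view_count } document into an [entity_id, view_count] pair.
const pairs = grouped.map((doc) => [doc._id, doc.view_count]);

console.log(JSON.stringify(pairs));
```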

MongoDB - search sub documents

I am trying to search within sub documents. This is the structure of my document:
{
_id: <ObjectID>,
email: 'test#emample.com',
password: '12345',
images: [
{
title: 'Broken Hand',
description: 'Here is a full description',
comments: [
{
comment: 'Looks painful',
}
],
tags: ['hand', 'broken']
}
]
}
I want to be able to find all images from all users that have a specific tag, but the query I am using only returns the first image it finds with that tag:
db.site_users.find({'images.tags': "broken"}, {images: 1, images: {$elemMatch: { 'tags': 'broken'}}}).pretty()
Can someone please point me in the right direction on how I can get all the images?
You can use the aggregation framework for that:
db.site_users.aggregate([
{$unwind: "$images"},
{$match:{
"images.tags": "broken"
}}
])
The query condition is fine, as it matches the right documents, but the "projection" you want is beyond what .find() can do. You need .aggregate(), and some care to keep every "images" entry that matches while removing only the entries whose "tags" do not match.
Ideally you do this with MongoDB 3.2 using $filter inside $project:
db.site_users.aggregate([
{ "$match": { "images.tags": "broken" }},
{ "$project": {
"email": 1,
"password": 1,
"images": {
"$filter": {
"input": "$images",
"as": "image",
"cond": {
"$setIsSubset": [["broken"], "$$image.tags"]
}
}
}
}}
])
Or possibly using $map and $setDifference which is also compatible with MongoDB 2.6, as long as the "images" content is "unique" for each entry. This is due to the "set" operation, in which "sets" are "unique":
db.site_users.aggregate([
{ "$match": { "images.tags": "broken" }},
{ "$project": {
"email": 1,
"password": 1,
"images": {
"$setDifference": [
{ "$map": {
"input": "$images",
"as": "image",
"in": {
"$cond": {
"if": { "$setIsSubset": [["broken"], "$$image.tags" ] },
"then": "$$image",
"else": false
}
}
}},
[false]
]
}
}}
])
It can be done in earlier versions of MongoDB but is possibly best avoided due to the cost of processing $unwind on the array:
db.site_users.aggregate([
{ "$match": { "images.tags": "broken" }},
{ "$unwind": "$images" },
{ "$match": { "images.tags": "broken" }},
{ "$group": {
"_id": "$_id",
"email": { "$first": "$email" },
"password": { "$first": "$password" },
"images": { "$push": "$images" }
}}
])
There is usually a considerable cost in using $unwind for this purpose when you are not actually "aggregating" anything. So if you don't have a modern version where the other approaches are available, it's often best to accept "filtering" the array content in client code rather than on the server.
You should only resort to $unwind for this case where the array would be "significantly" reduced by removing the non-matching items. Otherwise the cost of processing is likely greater than the network cost of transferring the data, and any benefit is negated.
If you don't have a modern version, then get one. The features make all the difference to what is practical and performant.
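The client-side filtering mentioned above could be sketched roughly as follows. This assumes the matching documents have already been fetched with a plain find on "images.tags"; the helper name imagesWithTag and the sample data are made up for illustration.

```javascript
// Hypothetical helper: keep only the images that carry the given tag,
// and drop users that end up with no matching images.
function imagesWithTag(users, tag) {
  return users
    .map((user) => ({
      ...user,
      images: (user.images || []).filter((image) =>
        (image.tags || []).includes(tag)
      ),
    }))
    .filter((user) => user.images.length > 0);
}

// Made-up sample documents in the shape shown in the question.
const users = [
  {
    email: "a@example.com",
    images: [
      { title: "Broken Hand", tags: ["hand", "broken"] },
      { title: "Sunset", tags: ["sky"] },
      { title: "Broken Cup", tags: ["broken"] },
    ],
  },
  { email: "b@example.com", images: [{ title: "Tree", tags: ["green"] }] },
];

const matched = imagesWithTag(users, "broken");
console.log(JSON.stringify(matched, null, 2));
```

The trade-off is the one described above: the whole "images" array crosses the network, but the server does no $unwind work.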

get all the documents having max value using aggregation in mongodb

I want to fetch "all the documents" having the highest value for a specific field and then group by another field.
Consider below data:
_id:1, country:india, quantity:12, name:xyz
_id:2, country:USA, quantity:5, name:abc
_id:3, country:USA, quantity:6, name:xyz
_id:4, country:india, quantity:8, name:def
_id:5, country:USA, quantity:10, name:jkl
_id:6, country:india, quantity:12, name:jkl
Answer should be
country:india max-quantity:12
name xyz
name jkl
country:USA max-quantity:10
name jkl
I have tried several queries, but I can only get the max value without the name, or I can group by but then it shows all the values.
db.coll.aggregate([{
$group:{
_id:"$country",
"maxQuantity":{$max:"$quantity"}
}
}])
For example, the above gives the max quantity for every country, but how do I combine it with the other fields so that it shows all the documents with the max quantity?
If you want to keep document information, then you basically need to $push it into an array. But of course, then having your $max values, you need to filter the contents of the array for just the elements that match:
db.coll.aggregate([
{ "$group":{
"_id": "$country",
"maxQuantity": { "$max": "$quantity" },
"docs": { "$push": {
"_id": "$_id",
"name": "$name",
"quantity": "$quantity"
}}
}},
{ "$project": {
"maxQuantity": 1,
"docs": {
"$setDifference": [
{ "$map": {
"input": "$docs",
"as": "doc",
"in": {
"$cond": [
{ "$eq": [ "$maxQuantity", "$$doc.quantity" ] },
"$$doc",
false
]
}
}},
[false]
]
}
}}
])
So you store everything in an array and then test each array member to see if its value matches the one that was recorded as the maximum, discarding any that do not.
I'd keep the _id values in the array documents since that is what makes them "unique" and won't be adversely affected by $setDifference when filtering out values. But of course if "name" is always unique then it won't be required.
You can also just return whatever fields you want from $map, but I'm just returning the whole document for example.
Keep in mind that this has the limitation of not exceeding the BSON size limit of 16MB, so it is okay for small data samples, but anything producing a potentially large list (since you cannot pre-filter array content) would be better off processed with a separate query to find the "max" values, and another to fetch the matching documents.
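The two-step idea can be illustrated without a server. The sketch below runs both passes over an in-memory copy of the sample data from the question; in a real deployment each pass would be its own query, and the variable names here are made up.

```javascript
// Sample data from the question, held in memory for illustration.
const docs = [
  { _id: 1, country: "india", quantity: 12, name: "xyz" },
  { _id: 2, country: "USA", quantity: 5, name: "abc" },
  { _id: 3, country: "USA", quantity: 6, name: "xyz" },
  { _id: 4, country: "india", quantity: 8, name: "def" },
  { _id: 5, country: "USA", quantity: 10, name: "jkl" },
  { _id: 6, country: "india", quantity: 12, name: "jkl" },
];

// Pass 1: find the maximum quantity per country
// (the job of the $group / $max stage).
const maxByCountry = new Map();
for (const d of docs) {
  const current = maxByCountry.get(d.country);
  if (current === undefined || d.quantity > current) {
    maxByCountry.set(d.country, d.quantity);
  }
}

// Pass 2: keep every document whose quantity equals its country's maximum.
const winners = docs.filter(
  (d) => d.quantity === maxByCountry.get(d.country)
);

console.log(winners.map((d) => `${d.country} ${d.quantity} ${d.name}`));
```

Because nothing is pushed into a single array per group, this shape is not bound by the per-document size limit.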
I know a simpler way to do a similar task, but only if you restrict the query to a specific set of countries:
[
{"$match":{"country":{"$in":["USA","india"]}}}, // stage one
{ "$sort": { "quantity": -1 }}, // stage two
{"$limit":2 } // stage three - count equal to the length of ["USA","india"]
]
If you need all countries, try the following, though I can't guarantee it:
[
{"$project": {
"country": "$country",
"quantity": "$quantity",
"document": "$$ROOT" // save all fields for future usage
}},
{ "$sort": { "quantity": -1 }},
{"$group":{"_id":{"country":"$country"},"original_doc":{"$first":"$document"} }}
]
Another way can be like:
db.coll.aggregate(
[
{
$sort:{ country: -1, "quantity": -1 }
},
{
"$group":
{
"_id":{ "country": "$country" },
"data":{ "$first": "$$ROOT" }
}
}
])
Another possibility, close to Blakes Seven's solution, is to simplify the $setDifference + $map part by using $filter on the array of documents:
db.coll.aggregate([
{ "$group":{
"_id": "$country",
"maxQuantity": { "$max": "$quantity" },
"docs": { "$push": {
"_id": "$_id",
"name": "$name",
"quantity": "$quantity"
}}
}},
{ "$project": {
"maxQuantity": 1,
"docs": {
"$filter": {
"input": "$docs",
"as": "doc",
"cond": { $eq: ["$$doc.quantity", "$maxQuantity"] }
}
}
}}
])

How to find match in documents in Mongo and Mongo aggregation?

I have following json structure in mongo collection-
{
"students":[
{
"name":"ABC",
"fee":1233
},
{
"name":"PQR",
"fee":345
}
],
"studentDept":[
{
"name":"ABC",
"dept":"A"
},
{
"name":"XYZ",
"dept":"X"
}
]
},
{
"students":[
{
"name":"XYZ",
"fee":133
},
{
"name":"LMN",
"fee":56
}
],
"studentDept":[
{
"name":"XYZ",
"dept":"X"
},
{
"name":"LMN",
"dept":"Y"
},
{
"name":"ABC",
"dept":"P"
}
]
}
Now I want to calculate following output.
if students.name = studentDept.name
so my result should be as below
{
"name":"ABC",
"fee":1233,
"dept":"A",
},
{
"name":"XYZ",
"fee":133,
"dept":"X"
}
{
"name":"LMN",
"fee":56,
"dept":"Y"
}
Do I need to use mongo aggregation or is it possible to get above given output without using aggregation???
What you are really asking here is how to make MongoDB return something that is quite different from the form in which you store it in your collection. The standard query operations do allow a "limited" form of "projection", but, as even the title on the page shared in that link suggests, this is really only about "limiting" the fields to display in results based on what is already present in your document.
So any form of "alteration" requires some form of aggregation: both the aggregate and mapReduce operations allow you to "re-shape" the document results into a form that is different from the input. Perhaps the main thing people miss about the aggregation framework in particular is that it is not all about "aggregating"; the "re-shaping" concept is core to its implementation.
So in order to get results how you want, you can take an approach like this, which should be suitable for most cases:
db.collection.aggregate([
{ "$unwind": "$students" },
{ "$unwind": "$studentDept" },
{ "$group": {
"_id": "$students.name",
"tfee": { "$first": "$students.fee" },
"tdept": {
"$min": {
"$cond": [
{ "$eq": [
"$students.name",
"$studentDept.name"
]},
"$studentDept.dept",
false
]
}
}
}},
{ "$match": { "tdept": { "$ne": false } } },
{ "$sort": { "_id": 1 } },
{ "$project": {
"_id": 0,
"name": "$_id",
"fee": "$tfee",
"dept": "$tdept"
}}
])
Or alternately just "filter out" the cases where the two "name" fields do not match and then just project the content with the fields you want, if crossing content between documents is not important to you:
db.collection.aggregate([
{ "$unwind": "$students" },
{ "$unwind": "$studentDept" },
{ "$project": {
"_id": 0,
"name": "$students.name",
"fee": "$students.fee",
"dept": "$studentDept.dept",
"same": { "$eq": [ "$students.name", "$studentDept.name" ] }
}},
{ "$match": { "same": true } },
{ "$project": {
"name": 1,
"fee": 1,
"dept": 1
}}
])
From MongoDB 2.6 and upwards you can even do the same thing "inline" within the document, between the two arrays. You still need to reshape that array content in your final output, but it is possibly a little faster:
db.collection.aggregate([
// Compares entries in each array within the document
{ "$project": {
"students": {
"$map": {
"input": "$students",
"as": "stu",
"in": {
"$setDifference": [
{ "$map": {
"input": "$studentDept",
"as": "dept",
"in": {
"$cond": [
{ "$eq": [ "$$stu.name", "$$dept.name" ] },
{
"name": "$$stu.name",
"fee": "$$stu.fee",
"dept": "$$dept.dept"
},
false
]
}
}},
[false]
]
}
}
}
}},
// Students is now an array of arrays. So unwind it twice
{ "$unwind": "$students" },
{ "$unwind": "$students" },
// Rename the fields and exclude
{ "$project": {
"_id": 0,
"name": "$students.name",
"fee": "$students.fee",
"dept": "$students.dept"
}},
])
So where you essentially want to "alter" the structure of the output, you need to use one of the aggregation tools to do it. And you can, even if you are not really aggregating anything.
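To make the per-document matching concrete, here is a rough plain JavaScript equivalent of the "inline" comparison between the two arrays. It is a sketch for illustration only: the helper name joinStudents is made up, and if a name appears more than once in studentDept it keeps only the last "dept", whereas the pipelines above would produce one result per match.

```javascript
// Hypothetical helper: join "students" to "studentDept" by name
// within a single document.
function joinStudents(doc) {
  // Index departments by student name for quick lookup.
  const deptByName = new Map(doc.studentDept.map((d) => [d.name, d.dept]));
  return doc.students
    .filter((s) => deptByName.has(s.name))
    .map((s) => ({ name: s.name, fee: s.fee, dept: deptByName.get(s.name) }));
}

// The two documents from the question.
const collection = [
  {
    students: [{ name: "ABC", fee: 1233 }, { name: "PQR", fee: 345 }],
    studentDept: [{ name: "ABC", dept: "A" }, { name: "XYZ", dept: "X" }],
  },
  {
    students: [{ name: "XYZ", fee: 133 }, { name: "LMN", fee: 56 }],
    studentDept: [
      { name: "XYZ", dept: "X" },
      { name: "LMN", dept: "Y" },
      { name: "ABC", dept: "P" },
    ],
  },
];

// One flat list of joined entries across all documents.
const joined = collection.flatMap(joinStudents);
console.log(JSON.stringify(joined, null, 2));
```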

Mongodb array concatenation

When querying mongodb, is it possible to process ("project") the result so as to perform array concatenation?
I actually have 2 different scenarios:
(1) Arrays from different fields:, e.g:
Given:
{companyName:'microsoft', managers:['ariel', 'bella'], employees:['charlie', 'don']}
{companyName:'oracle', managers:['elena', 'frank'], employees:['george', 'hugh']}
I'd like my query to return each company with its 'managers' and 'employees' concatenated:
{companyName:'microsoft', allPersonnel:['ariel', 'bella','charlie', 'don']}
{companyName:'oracle', allPersonnel:['elena', 'frank','george', 'hugh']}
(2) Nested arrays:, e.g.:
Given the following docs, where employees are separated into nested arrays (never mind why, it's a long story):
{companyName:'microsoft', personnel:[ ['ariel', 'bella'], ['charlie', 'don'] ]}
{companyName:'oracle', personnel:[ ['elena', 'frank'], ['george', 'hugh'] ]}
I'd like my query to return each company with a flattened 'personnel' array:
{companyName:'microsoft', allPersonnel:['ariel', 'bella','charlie', 'don']}
{companyName:'oracle', allPersonnel:['elena', 'frank','george', 'hugh']}
I'd appreciate any ideas, using either 'find' or 'aggregate'
Thanks a lot :)
Of course, in modern MongoDB releases we can simply use $concatArrays here:
db.collection.aggregate([
{ "$project": {
"companyName": 1,
"allPersonnel": { "$concatArrays": [ "$managers", "$employees" ] }
}}
])
Or for the second form with nested arrays, using $reduce in combination:
db.collection.aggregate([
{ "$project": {
"companyName": 1,
"allPersonnel": {
"$reduce": {
"input": "$personnel",
"initialValue": [],
"in": { "$concatArrays": [ "$$value", "$$this" ] }
}
}
}}
])
There is also the $setUnion operator available to the aggregation framework. The constraint here is that these are "sets", so all the members must be "unique" as a "set" requires:
db.collection.aggregate([
{ "$project": {
"companyName": 1,
"allPersonnel": { "$setUnion": [ "$managers", "$employees" ] }
}}
])
So that is cool, as long as all are "unique" and you are in singular arrays.
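The difference between set semantics and plain concatenation is easy to see in ordinary JavaScript. The sketch below is an analogy only, with made-up sample arrays in which "bella" appears twice on purpose; note also that a JavaScript Set keeps insertion order, whereas MongoDB does not specify the element order of a $setUnion result.

```javascript
const managers = ["ariel", "bella"];
const employees = ["bella", "charlie"]; // "bella" repeated on purpose

// Plain concatenation keeps duplicates (comparable to $concatArrays).
const concatenated = managers.concat(employees);

// A set union drops duplicates (comparable to $setUnion).
const union = Array.from(new Set(concatenated));

// Flattening one level of nesting by repeated concatenation
// (comparable to $reduce with $concatArrays).
const nested = [["ariel", "bella"], ["charlie", "don"]];
const flattened = nested.reduce((acc, inner) => acc.concat(inner), []);

console.log(concatenated, union, flattened);
```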
In the alternate case you can always process with $unwind and $group. The nested personnel array is a simple double unwind:
db.collection.aggregate([
{ "$unwind": "$personnel" },
{ "$unwind": "$personnel" },
{ "$group": {
"_id": "$_id",
"companyName": { "$first": "$companyName" },
"allPersonnel": { "$push": "$personnel" }
}}
])
Or the same thing as the first one for versions earlier than MongoDB 2.6 where the "set operators" did not exist:
db.collection.aggregate([
{ "$project": {
"type": { "$const": [ "M", "E" ] },
"companyName": 1,
"managers": 1,
"employees": 1
}},
{ "$unwind": "$type" },
{ "$unwind": "$managers" },
{ "$unwind": "$employees" },
{ "$group": {
"_id": "$_id",
"companyName": { "$first": "$companyName" },
"allPersonnel": {
"$addToSet": {
"$cond": [
{ "$eq": [ "$type", "M" ] },
"$managers",
"$employees"
]
}
}
}}
])