MongoDB - search sub documents [duplicate] - mongodb

This question already has answers here:
Retrieve only the queried element in an object array in MongoDB collection
(18 answers)
Closed 6 years ago.
I am trying to search within sub documents. This is my structure of my document:
{
_id: <ObjectID>,
email: ‘test#emample.com’,
password: ‘12345’,
images: [
{
title: ‘Broken Hand’,
description: ‘Here is a full description’,
comments: [
{
comment: ‘Looks painful’,
}
],
tags: [‘hand’, ‘broken’]
}
]
}
And i want to be able to find all images from all users that have a specific tag, but the query i am using is only returning the first image it finds with that tag:
db.site_users.find({'images.tags': "broken"}, {images: 1, images: {$elemMatch: { 'tags': 'broken'}}}).pretty()
Can someone please point me in the right direction to how i can get all the images?

You can use the aggregation framework for that:
db.site_users.aggregate([
{$unwind: "$images"},
{$match:{
"images.tags": "broken"
}}
])

The query is good as it matches the document, but the "projection" is outside the scope of what you can do with .find(), you need .aggregate() and some care taken to not remove the "images" items from the array and only the non matching "tags".
Ideally you do this with MongoDB 3.2 using $filter inside $project:
db.site_users.aggregate([
{ "$match": { "images.tags": "broken" }},
{ "$project": {
"email": 1,
"password": 1,
"images": {
"$filter": {
"input": "$images",
"as": "image",
"cond": {
"$setIsSubSet": [["broken"], "$$image.tags"]
}
}
}
}}
])
Or possibly using $map and $setDifference which is also compatible with MongoDB 2.6, as long as the "images" content is "unique" for each entry. This is due to the "set" operation, in which "sets" are "unique":
db.site_users.aggregate([
{ "$match": { "images.tags": "broken" }},
{ "$project": {
"email": 1,
"password": 1,
"images": {
"$setDifference": [
{ "$map": {
"input": "$images",
"as": "image",
"in": {
"$cond": {
"if": { "$setIsSubSet": [["broken"], "$$image.tags" ] },
"then": "$$image",
"else": false
}
}
}},
[false]
]
}
}}
])
It can be done in earlier versions of MongoDB but is possibly best avoided due to the cost of processing $unwind on the array:
db.site_users.aggregate([
{ "$match": { "images.tags": "broken" }},
{ "$unwind": "$images" },
{ "$match": { "images.tags": "broken" }},
{ "$group": {
"_id": "$_id",
"email": { "$first": "$email" },
"password": { "$first": "$password" },
"images": { "$push": "$images" }
}}
])
Since there is usually a considerable cost in using $unwind for this purpose where you are not "aggregating" anything, then if you don't have a modern version where the other practical approaches are available, it's often best to accept "filtering" the array content itself in client code rather than on the server.
So you should only resort to $unwind for this case where the array entries would be "significantly" reduced by order of removing the non-matching items. Otherwise the cost of processing is likely greater than the network cost of transferring the data, and any benefit is negated.
If you don't have a modern version, then get one. The features make all the difference to what is practical and performant.

Related

How to convert an array of documents to two dimensions array

I am making a query to MongoDB
db.getCollection('user_actions').aggregate([
{$match: {
type: 'play_started',
entity_id: {$ne: null}
}},
{$group: {
_id: '$entity_id',
view_count: {$sum: 1}
}},
])
and getting a list of docs with two fields:
How can I get a list of lists with two items like
[[entity_id, view_count], [entity_id, view_count], ...]
Actually there are two different way to do this, depending on your MongoDB server version.
The optimal way is in MongoDB 3.2 using the square brackets [] to directly create new array fields in the $project stage. This return an array for each group. The next stage is the another $group stage where you group your document and use the $push accumulator operator to return a two dimensional array.
db.getCollection('user_actions').aggregate([
{ "$match": {
"type": 'play_started',
"entity_id": { "$ne": null }
}},
{ "$group": {
"_id": "$entity_id",
"view_count": { "$sum": 1}
}},
{ "$project": {
"_id": 0,
"result": [ "$_id", "$view_count" ]
}},
{ "$group": {
"_id": null,
"result": { "$push": "$result" }
}}
])
From MongoDB 2.6 and prior to 3.2 you need a different approach. In order to create your array you need to use the $map operator. Because the $map "input" field must resolves to and array you need to use $literal operator to set a literal array value to input. Of course the $cond operator here returns the "entity_id" or "view_count" accordingly to the "boolean-expression".
db.getCollection('user_actions').aggregate([
{ "$match": {
"type": 'play_started',
"entity_id": { "$ne": null }
}},
{ "$group": {
"_id": "$entity_id",
"view_count": { "$sum": 1}
}},
{ "$project": {
"_id": 0,
"result": {
"$map": {
"input": { "$literal": [ "A", "B"] },
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
"$_id",
"$view_count"
]
}
}
}
}},
{ "$group": {
"_id": null,
"result": { "$push": "$result" }
}}
])
It worth noting that this will also work in MongoDB 2.4. If you are running MongoDB 2.2, you can use the undocumented $const operator which does the same thing.

limit and sort each group by in mongoDB using aggregation [duplicate]

This question already has answers here:
mongodb group values by multiple fields
(4 answers)
Closed 3 years ago.
How can I sort and limit each group by in mongoDB.
Consider below data:
Country:USA,name:xyz,rating:10,id:x
Country:USA,name:xyz,rating:10,id:y
Country:USA,name:xyz,rating:10,id:z
Country:USA,name:abc,rating:5,id:x
Country:India,name:xyz,rating:5,id:x
Country:India,name:xyz,rating:5,id:y
Country:India,name:abc,rating:10,id:z
Country:India,name:abc,rating:10,id:x
Now say I will group by country and sort by rating and limit the data of each group by 2.
so answer would be:
Country:USA
name:xyz,rating:10,id:x
name:xyz,rating:10,id:y
Country:India
name:abc,rating:10,id:x
name:abc,rating:10,id:z
I want to accomplish this using aggregate framework only.
I tried including sort in aggregate for rating but simply query turns no results after processing.
Your best option here is to run seperate queries for each "Country" ( ideally in parallel ) and return the combined results. The queries are quite simple, and just return the top 2 values after applying a sort on the rating value and will execute quite quickly even if you need to perform multiple queries to obtain the complete result.
The aggregation framework is not a good fit for this, now and even in the near future. The problem is there is no such operator that "limits" the result of any grouping in any way. So in order to do this, you basically need to $push all content into an array and extract the "top n" values from that.
The current operations needed to do that are pretty horrible, and the core problem is results are likely to exceed the BSON limit of 16MB per document on most real data sources.
Also there is an n complexity to this due to how you would have to do it right now. But just to demonstrate with 2 items:
db.collection.aggregate([
// Sort content by country and rating
{ "$sort": { "Country": 1, "rating": -1 } },
// Group by country and push all items, keeping first result
{ "$group": {
"_id": "$Country",
"results": {
"$push": {
"name": "$name",
"rating": "$rating",
"id": "$id"
}
},
"first": {
"$first": {
"name": "$name",
"rating": "$rating",
"id": "$id"
}
}
}},
// Unwind the array
{ "$unwind": "results" },
// Remove the seen result from the array
{ "$redact": {
"$cond": {
"if": { "$eq": [ "$results.id", "$first.id" ] },
"then": "$$PRUNE",
"else": "$$KEEP"
}
}},
// Group to return the second result which is now first on stack
{ "$group": {
"_id": "$_id",
"first": { "$first": "$first" },
"second": {
"$first": {
"name": "$results.name",
"rating": "$results.rating",
"id": "$results.id"
}
}
}},
// Optionally put these in an array format
{ "$project": {
"results": {
"$map": {
"input": ["A","B"],
"as": "el",
"in": {
"$cond": {
"if": { "$eq": [ "$$el", "A" ] },
"then": "$first",
"else": "$second"
}
}
}
}
}}
])
That gets the result but its not a great approach and gets a lot more complex with iterations for higher limits or even where groupings have possibly less than n results to return in some cases.
The current development series ( 3.1.x ) as of writing has a $slice operator that makes this a bit more simple, but still has the same "size" pitfall:
db.collection.aggregate([
// Sort content by country and rating
{ "$sort": { "Country": 1, "rating": -1 } },
// Group by country and push all items, keeping first result
{ "$group": {
"_id": "$Country",
"results": {
"$push": {
"name": "$name",
"rating": "$rating",
"id": "$id"
}
}
}},
{ "$project": {
"results": { "$slice": [ "$results", 2 ] }
}}
])
But basically until the aggregation framework has some way to "limit" the number of items produced by $push or a similar grouping "limit" operator, then the aggregation framework is not really the optimal solution for this type of problem.
Simple queries like this:
db.collection.find({ "Country": "USA" }).sort({ "rating": -1 }).limit(1)
Run for each distinct country and ideally in parallel processing by event loop of thread with a combined result produces the most optimal approach right now. They only fetch what is needed, which is the big problem the aggregation framework cannot yet handle in such grouping.
So look for support to do this "combined query results" in the most optimal way for your chosen language instead, as it will be far less complex and much more performant than throwing this at the aggregation framework.

get all the documents having max value using aggregation in mongodb

I want to fetch "all the documents" having highest value for specific field and than group by another field.
Consider below data:
_id:1, country:india, quantity:12, name:xyz
_id:2, country:USA, quantity:5, name:abc
_id:3, country:USA, quantity:6, name:xyz
_id:4, country:india, quantity:8, name:def
_id:5, country:USA, quantity:10, name:jkl
_id:6, country:india, quantity:12, name:jkl
Answer should be
country:india max-quantity:12
name xyz
name jkl
country:USA max-quantity:10
name jkl
I have tried several queries, but I can get only the max value without the name or i can go group by but it shows all the values.
db.coll.aggregate([{
$group:{
_id:"$country",
"maxQuantity":{$max:"$quantity"}
}
}])
for example above will give max quantity on every country but how to combine with other field such that it shows all the documents of max quantity.
If you want to keep document information, then you basically need to $push it into an array. But of course, then having your $max values, you need to filter the contents of the array for just the elements that match:
db.coll.aggregate([
{ "$group":{
"_id": "$country",
"maxQuantity": { "$max": "$quantity" },
"docs": { "$push": {
"_id": "$_id",
"name": "$name",
"quantity": "$quantity"
}}
}},
{ "$project": {
"maxQuantity": 1,
"docs": {
"$setDifference": [
{ "$map": {
"input": "$docs",
"as": "doc",
"in": {
"$cond": [
{ "$eq": [ "$maxQuantity", "$$doc.quantity" ] },
"$$doc",
false
]
}
}},
[false]
]
}
}}
])
So you store everything in an array and then test each array member to see if it's value matches the one that was recorded as the maximum, discarding any that do not.
I'd keep the _id values in the array documents since that is what makes them "unique" and won't be adversely affected by $setDifference when filtering out values. But of course if "name" is always unique then it won't be required.
You can also just return whatever fields you want from $map, but I'm just returning the whole document for example.
Keep in mind that this has the limitation of not exceeding the BSON size limit of 16MB, so is okay for small data samples, but anything producing a potentially large list ( since you cannot pre-filter array content ) would be better of processed with a separate query to find the "max" values, and another to fetch the matching documents.
I know how to do similar task simpler only if you alter specific range of countries:
[
{"$match":{"name":{"$in":["USA","india"]}}}, // stage one
{ "$sort": { "quanity": -1 }}, // stage three
{"$limit":2 } // stage four - count equal ["USA","india"] length
]
If you need all countries try follow, but without guaranties from me:
[
{"$project": {
"country": "$country",
"quantity": "$quantity",
"document": "$$ROOT" // save all fields for future usage
}},
{ "$sort": { "quantity": -1 }},
{"$group":{"_id":{"country":"$country"},"original_doc":{"$first":"$document"} }}
]
Another way can be like:
db.coll.aggregate(
[
{
$sort:{ country: -1, "quantity": -1 }
},
{
"$group":
{
"_id":{ "country": "$country" },
"data":{ "$first": "$$ROOT" }
}
}
])
Another possibility close to Blakes Seven's solution to simplify a bit the setDifference + map part by a filter of the array of documents.
db.coll.aggregate([
{ "$group":{
"_id": "$country",
"maxQuantity": { "$max": "$quantity" },
"docs": { "$push": {
"_id": "$_id",
"name": "$name",
"quantity": "$quantity"
}}
}},
{ "$project": {
"maxQuantity": 1,
"docs": {
"$filter": {
"input": "$docs",
"as": "doc",
"cond": { $eq: ["$$doc.quantity", "$maxQuantity"] }
}
}
}}
])

MongoDB aggregation: Project separate document fields into a single array field

I have a document like this:
{fax: '8135551234', cellphone: '8134441234'}
Is there a way to project (without a group stage) this document into this:
{
phones: [{
type: 'fax',
number: '8135551234'
}, {
type: 'cellphone',
number: '8134441234'
}]
}
I could probably use a group stage operator for this, but I'd rather not if there's any other way, because my query also projects several other fields, all of which would require a $first just for the group stage.
Hope that's clear. Thanks in advance!
MongoDB 2.6 Introduces the the $map operator which is an array transformation operator which can be used to do exactly this:
db.phones.aggregate([
{ "$project": {
"phones": { "$map": {
"input": { "$literal": ["fax","cellphone"] },
"as": "el",
"in": {
"type": "$$el",
"number": { "$cond": [
{ "$eq": [ "$$el", "fax" ] },
"$fax",
"$cellphone"
]}
}
}}
}}
])
So your document now looks exactly like you want. The trick of course to to create a new array with members "fax" and "cellphone", then transform that array with the new document fields by matching those values.
Of course you can also do this in earlier versions using $unwind and $group in a similar fashion, but just not as efficiently:
db.phones.aggregate([
{ "$project": {
"type": { "$const": ["fax","cellphone"] },
"fax": 1,
"cellphone": 1
}},
{ "$unwind": "$type" },
{ "$group": {
"_id": "_id",
"phones": { "$push": {
"type": "$type",
"number": { "$cond": [
{ "$eq": [ "$type", "fax" ] },
"$fax",
"$cellphone"
]}
}}
}}
])
Of course it can be argued that unless you are doing some sort of aggregation then you may as well just post process the collection results in code. But this is an alternate way to do that.

MongoDb : Find common element from two arrays within a query

Let's say we have records of following structure in database.
{
"_id": 1234,
"tags" : [ "t1", "t2", "t3" ]
}
Now, I want to check if database contains a record with any of the tags specified in array tagsArray which is [ "t3", "t4", "t5" ]
I know about $in operator but I not only want to know whether any of the records in database has any of the tag specified in tagsArray, I also want to know which tag of the record in database matches with any of the tags specified in tagsArray. (i.e. t3 in for the case of record mentioned above)
That is, I want to compare two arrays (one of the record and other given by me) and find out the common element.
I need to have this expression along with many expressions in the query so projection operators like $, $elematch etc won't be of much use. (Or is there a way it can be used without having to iterate over all records?)
I think I can use $where operator but I don't think that is the best way to do this.
How can this problem be solved?
There are a few approaches to do what you want, it just depends on your version of MongoDB. Just submitting the shell responses. The content is basically JSON representation which is not hard to translate for DBObject entities in Java, or JavaScript to be executed on the server so that really does not change.
The first and the fastest approach is with MongoDB 2.6 and greater where you get the new set operations:
var test = [ "t3", "t4", "t5" ];
db.collection.aggregate([
{ "$match": { "tags": {"$in": test } }},
{ "$project": {
"tagMatch": {
"$setIntersection": [
"$tags",
test
]
},
"sizeMatch": {
"$size": {
"$setIntersection": [
"$tags",
test
]
}
}
}},
{ "$match": { "sizeMatch": { "$gte": 1 } } },
{ "$project": { "tagMatch": 1 } }
])
The new operators there are $setIntersection that is doing the main work and also the $size operator which measures the array size and helps for the latter filtering. This ends up as a basic comparison of "sets" in order to find the items that intersect.
If you have an earlier version of MongoDB then this is still possible, but you need a few more stages and this might affect performance somewhat depending if you have large arrays:
var test = [ "t3", "t4", "t5" ];
db.collection.aggregate([
{ "$match": { "tags": {"$in": test } }},
{ "$project": {
"tags": 1,
"match": { "$const": test }
}},
{ "$unwind": "$tags" },
{ "$unwind": "$match" },
{ "$project": {
"tags": 1,
"matched": { "$eq": [ "$tags", "$match" ] }
}},
{ "$match": { "matched": true }},
{ "$group": {
"_id": "$_id",
"tagMatch": { "$push": "$tags" },
"count": { "$sum": 1 }
}}
{ "$match": { "count": { "$gte": 1 } }},
{ "$project": { "tagMatch": 1 }}
])
Or if all of that seems to involved or your arrays are large enough to make a performance difference then there is always mapReduce:
var test = [ "t3", "t4", "t5" ];
db.collection.mapReduce(
function () {
var intersection = this.tags.filter(function(x){
return ( test.indexOf( x ) != -1 );
});
if ( intersection.length > 0 )
emit ( this._id, intersection );
},
function(){},
{
"query": { "tags": { "$in": test } },
"scope": { "test": test },
"output": { "inline": 1 }
}
)
Note that in all cases the $in operator still helps you to reduce the results even though it is not the full match. The other common element is checking the "size" of the intersection result to reduce the response.
All pretty easy to code up, convince the boss to switch to MongoDB 2.6 or greater if you are not already there for the best results.