MongoDB - limit response in array property?

I have a MongoDB collection indicators/
It returns statistical data such as:
/indicators/population
{
id: "population"
data : [
{
country : "A",
value : 100
},
{
country : "B",
value : 150
}
]
}
I would like to be able to limit the response to specific countries.
MongoDB doesn't seem to support this, so should I:
Restructure the MongoDB collection setup to allow this via native find()
Extend my API so that it allows filtering of the data array before returning to client
Other?

This is actually a very simple operation that just involves "projection" using the positional $ operator in order to match a given condition. In the case of a "singular" match that is:
db.collection.find(
{ "data.country": "A" },
{ "data.$": 1 }
)
And that will match the first element in the array which matches the condition as given in the query.
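For the sample document in the question, the response should look something like this (the _id shown is just a placeholder for whatever your document actually has):
{ "_id": ObjectId("..."), "data": [ { "country": "A", "value": 100 } ] }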
For more than one match, you need to invoke the aggregation framework for MongoDB:
db.collection.aggregate([
// Match documents that are possible first
{ "$match": {
"data.country": "A"
}},
// Unwind the array to "de-normalize" the documents
{ "$unwind": "$data" },
// Actually filter the now "expanded" array items
{ "$match": {
"data.country": "A"
}},
// Group back together
{ "$group": {
"_id": "$_id",
"data": { "$push": "$data" }
}}
])
Or with MongoDB 2.6 or greater, a little bit cleaner, or at least without the $unwind:
db.collection.aggregate([
// Match documents that are possible first
{ "$match": {
"data.country": "A"
}},
// Filter out the array in place
{ "$project": {
"data": {
"$setDifference": [
{
"$map": {
"input": "$data",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.country", "A" },
"$$el",
false
]
}
}
},
[false]
]
}
}}
])

If my understanding of the problem is correct, then you can use:
db.population.find({ "data.country": { $in: ["A", "C"] } });

Related

Find Documents by the number of embedded array elements that match condition

I am new to MongoDB and need help in accomplishing my task:
I am using MongoDB to query for actions that were taken by a person. The actions are embedded in the person document like this:
{
"_id" : ObjectId("56447ac0583d4871570041c3"),
"email" : "email#example.net",
"actions" : [
{
"name" : "support",
"created_at" : ISODate("2015-10-17T01:40:35.000Z"),
},
{
"name" : "hide",
"created_at" : ISODate("2015-10-16T01:40:35.000Z")
},
{
"name" : "support",
"created_at" : ISODate("2015-10-17T03:40:35.000Z"),
}
]
}
A person can have many actions with different action names (support and hide are just 2 examples).
I know that I could find all people with at least one support action like this:
db.test.find({'actions.name':'support'})
What I want to do is retrieve all people with at least X support actions. Is this possible without using JavaScript syntax? As people could have hundreds of actions, that would be slow.
So, if I want all people with at least 2 support actions, the only way I know would be using the JS syntax:
db.test.find({$where: function() {
return this.actions.filter(function(action){
return action.name === 'support';
}).length >= 2;
}});
Is there an other/better/faster possibility for this query?
Well the best way to do this is using the .aggregate() method, which provides access to the aggregation pipeline.
You can reduce the number of documents to process in the pipeline by using the $match operator to filter out all documents that don't match the given criteria.
You then use the $redact operator to return only documents where the number of elements with name "support" in your array is $gte 2. The $map operator here returns an array of the subdocuments that match your criteria and false for those that don't, which you can easily drop using the $setDifference operator. Of course the $size operator returns the size of the array.
db.test.aggregate([
{ "$match": {
"actions.name": "support",
"actions.2": { "$exists": true }
}},
{ "$redact": {
"$cond": [
{ "$gte": [
{ "$size": {
"$setDifference": [
{ "$map": {
"input": "$actions",
"as": "action",
"in": {
"$cond": [
{ "$eq": [ "$$action.name", "support" ] },
"$$action",
false
]
}
}},
[false]
]
}},
2
]},
"$$KEEP",
"$$PRUNE"
]
}}
])
From MongoDB 3.2 this can be handled using the $filter operator.
db.test.aggregate([
{ "$match": {
"actions.name": "support",
"actions.2": { "$exists": true }
}},
{ "$redact": {
"$cond": [
{ "$gte": [
{ "$size": {
"$filter": {
"input": "$actions",
"as": "action",
"cond": { "$eq": [ "$$action.name", "support" ] }
}
}},
2
]},
"$$KEEP",
"$$PRUNE"
]
}}
])
As @BlakesSeven pointed out:
$setDifference is fine as long as the data being filtered is "unique". In this case it "should" be fine, but if any two results contained the same date then it would skew results by considering the two to be one. $filter is the better option when it comes, but if data was not unique it would be necessary to $unwind at present.
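If the array data were not unique, a rough sketch of that $unwind approach (my own illustration, not from the original answer) could count the matching members per document instead:
db.test.aggregate([
{ "$match": {
"actions.name": "support",
"actions.2": { "$exists": true }
}},
{ "$unwind": "$actions" },
{ "$match": { "actions.name": "support" } },
{ "$group": {
"_id": "$_id",
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gte": 2 } } }
])
Note this returns just the matching _id values and their counts rather than the whole documents.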
I haven't benchmarked this against your attempt, but this sounds like a great use case for Mongo's aggregation framework.
db.test.aggregate([
{$unwind: "$actions"},
{$group: {
_id: { _id: "$_id", action: "$actions.name" },
count: {$sum: 1}
}},
{$match: {$and: [{count: {$gte: 2}}, {"_id.action": "support"}]}}
]);
Note that I haven't run this in mongo, so it might have some syntax issues.
The idea behind it is:
unwind the actions array so each element of the array becomes its own document
group the resulting collection by an _id - action type pair, and count how many we get of each.
match will filter for only things we are interested in.

Using $project to return an array

I have a collection with documents which look like this:
{
"campaignType" : 1,
"allowAccessControl" : true,
"userId" : "108028399"
}
I'd like to query this collection using aggregation framework and have a result which looks like this:
{
"campaignType" : ["APPLICATION"],
"allowAccessControl" : "true",
"userId" : "108028399",
}
You will notice that:
campaignType field becomes an array
the numeric value was mapped to a string
Can that be done using aggregation framework?
I tried looking at $addToSet and $push but had no luck.
Please help.
Thanks
In either case here it is the $cond operator from the aggregation framework that is your friend. It is a "ternary" operator, which means it evaluates a condition as true|false and then returns a result based on that evaluation.
So for modern versions from MongoDB 2.6 and upwards you can $project with the $map operator to construct the array:
db.campaign.aggregate([
{ "$project": {
"campaignType": {
"$map": {
"input": { "$literal": [1] },
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$campaignType", 1 ] },
"APPLICATION",
false
]
}
}
},
"allowAcessControl" : 1,
"userId": 1
}}
])
Or generally in most versions you can simply use the $push operator in a $group pipeline stage:
db.campaign.aggregate([
{ "$group": {
"_id": "$_id",
"campaignType": {
"$push": {
"$cond": [
{ "$eq": [ "$campaignType", 1 ] },
"APPLICATION",
false
]
}
},
"allowAccessControl": { "$first": "$allowAccessControl" },
"userId": { "first": "$userId" }
}}
])
But the general concept is that you use "nested" expressions with the $cond operator in order to "test" and return some value that matches your "mapping" condition, and do that within another operator that allows you to produce an array.
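Either way, for the sample document the output should look roughly like the requested result (the _id is whatever your document already has; shown here as a placeholder):
{ "_id": ObjectId("..."), "campaignType": [ "APPLICATION" ], "allowAccessControl": true, "userId": "108028399" }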

MongoDb : Find common element from two arrays within a query

Let's say we have records of following structure in database.
{
"_id": 1234,
"tags" : [ "t1", "t2", "t3" ]
}
Now, I want to check if database contains a record with any of the tags specified in array tagsArray which is [ "t3", "t4", "t5" ]
I know about the $in operator, but I not only want to know whether any of the records in the database has any of the tags specified in tagsArray, I also want to know which tag of the record in the database matches any of the tags specified in tagsArray (i.e. t3 in the case of the record mentioned above).
That is, I want to compare two arrays (one of the record and other given by me) and find out the common element.
I need to have this expression along with many other expressions in the query, so projection operators like $, $elemMatch etc. won't be of much use. (Or is there a way they can be used without having to iterate over all records?)
I think I can use $where operator but I don't think that is the best way to do this.
How can this problem be solved?
There are a few approaches to do what you want; it just depends on your version of MongoDB. I am just submitting the shell responses. The content is basically a JSON representation, which is not hard to translate into DBObject entities in Java, or into JavaScript to be executed on the server, so that really does not change.
The first and the fastest approach is with MongoDB 2.6 and greater where you get the new set operations:
var test = [ "t3", "t4", "t5" ];
db.collection.aggregate([
{ "$match": { "tags": {"$in": test } }},
{ "$project": {
"tagMatch": {
"$setIntersection": [
"$tags",
test
]
},
"sizeMatch": {
"$size": {
"$setIntersection": [
"$tags",
test
]
}
}
}},
{ "$match": { "sizeMatch": { "$gte": 1 } } },
{ "$project": { "tagMatch": 1 } }
])
The new operators there are $setIntersection that is doing the main work and also the $size operator which measures the array size and helps for the latter filtering. This ends up as a basic comparison of "sets" in order to find the items that intersect.
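For the sample record above and test = [ "t3", "t4", "t5" ], the output should be something like:
{ "_id": 1234, "tagMatch": [ "t3" ] }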
If you have an earlier version of MongoDB then this is still possible, but you need a few more stages and this might affect performance somewhat depending if you have large arrays:
var test = [ "t3", "t4", "t5" ];
db.collection.aggregate([
{ "$match": { "tags": {"$in": test } }},
{ "$project": {
"tags": 1,
"match": { "$const": test }
}},
{ "$unwind": "$tags" },
{ "$unwind": "$match" },
{ "$project": {
"tags": 1,
"matched": { "$eq": [ "$tags", "$match" ] }
}},
{ "$match": { "matched": true }},
{ "$group": {
"_id": "$_id",
"tagMatch": { "$push": "$tags" },
"count": { "$sum": 1 }
}},
{ "$match": { "count": { "$gte": 1 } }},
{ "$project": { "tagMatch": 1 }}
])
Or if all of that seems too involved, or your arrays are large enough to make a performance difference, then there is always mapReduce:
var test = [ "t3", "t4", "t5" ];
db.collection.mapReduce(
function () {
var intersection = this.tags.filter(function(x){
return ( test.indexOf( x ) != -1 );
});
if ( intersection.length > 0 )
emit ( this._id, intersection );
},
function(){},
{
"query": { "tags": { "$in": test } },
"scope": { "test": test },
"output": { "inline": 1 }
}
)
Note that in all cases the $in operator still helps you to reduce the results even though it is not the full match. The other common element is checking the "size" of the intersection result to reduce the response.
All pretty easy to code up, so convince the boss to switch to MongoDB 2.6 or greater if you are not already there, for the best results.

How to optimize mongoDB query?

I have the following sample document in MongoDB.
{
"location" : {
"language" : null,
"country" : "null",
"city" : "null",
"state" : null,
"continent" : "null",
"latitude" : "null",
"longitude" : "null"
},
"request" : [
{
"referrer" : "direct",
"url" : "http://www.google.com/"
"title" : "index page"
"currentVisit" : "1401282897"
"visitedTime" : "1401282905"
},
{
"referrer" : "direct",
"url" : "http://www.stackoverflow.com/",
"title" : "index page"
"currentVisit" : "1401282900"
"visitedTime" : "1401282905"
},
......
],
"uuid" : "109eeee0-e66a-11e3"
}
Note:
The database contains more than 10845 documents.
Each document contains nearly 100 requests (100 objects in the request array).
Technology/Language - node.js
I used profiling to check the execution times:
First Query - 13899ms
Second Query - 9024ms
Third Query - 8310ms
Fourth Query - 6858ms
There is not much difference when using indexing.
Queries:
I have the following aggregation queries to be executed to fetch the data.
var match = {"request.currentVisit":{$gte:core.getTime()[1].toString(),$lte:core.getTime()[0].toString()}};
For Example: var match = {"request.currentVisit":{$gte:"1401282905",$lte:"1401282935"}};
For the third and fourth queries, request.visitedTime is used instead of request.currentVisit.
First
[
{ "$project":{
"request.currentVisit":1,
"request.url":1
}},
{ "$match":{
"request.1": {$exists:true}
}},
{ "$unwind": "$request" },
{ "$match": match },
{ "$group": {
"_id": {
"url":"$request.url"
},
"count": { "$sum": 1 }
}},
{ "$sort":{ "count": -1 } }
]
Second
[
{ "$project": {
"request.currentVisit":1,
"request.url":1
}},
{ "$match": {
"request":{ "$size": 1 }
}},
{ "$unwind": "$request" },
{ "$match": match },
{ "$group": {
"_id":{
"url":"$request.url"
},
"count":{ "$sum": 1 }
}},
{ "$sort": { "count": -1} }
]
Third
[
{ "$project": {
"request.visitedTime":1,
"uuid":1
}},
{ "$match":{
"request.1": { "$exists": true }
}},
{ "$match": match },
{ "$group": {
"_id": "$uuid",
"count":{ "$sum": 1 }
}},
{ "$group": {
"_id": null,
"total": { "$sum":"$count" }}
}}
]
Forth
[
{ "$project": {
"request.visitedTime":1,
"uuid":1
}},
{ "$match":{
"request":{ "$size": 1 }
}},
{ "$match": match },
{ "$group": {
"_id":"$uuid",
"count":{ "$sum": 1 }
}},
{ "$group": {
"_id":null,
"total": { "$sum": "$count" }
}}
]
Problem:
It is taking more than 38091 ms to fetch the data.
Is there any way to optimize the query?
Any suggestion would be appreciated.
Well there are a few problems and you definitely need indexes, but you cannot have compound ones. It is the "timestamp" values that you are querying within the array that you want to index. It would also be advised that you either convert these to numeric values rather than the current strings, or indeed to BSON Date types. The latter form is actually internally stored as a numeric timestamp value, so there is a general storage size reduction, which also reduces the index size as well as being more efficient to match on the numeric values.
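As a rough sketch only (not part of the original queries; field names are taken from the sample document and the collection name is a placeholder), a one-off shell loop to convert the string timestamps to BSON Dates could look like this:
db.collection.find().forEach(function(doc) {
doc.request.forEach(function(req) {
// the strings are epoch seconds, so multiply by 1000 for milliseconds
req.currentVisit = new Date(parseInt(req.currentVisit, 10) * 1000);
req.visitedTime = new Date(parseInt(req.visitedTime, 10) * 1000);
});
db.collection.save(doc);
});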
The big problem with each query is that you are always later diving into the "array" contents after processing an $unwind and then "filtering" that with $match. While this is what you want to do for your result, since you have not applied the same filter at an earlier stage, you have many documents in the pipeline that do not match these conditions when you $unwind. The result is "lots" of documents you do not need being processed in this stage. And here you cannot use an index.
Where you need this match is at the start of the pipeline stages. This narrows down the documents to the "possible" matches before the actual array is filtered.
So using the first as an example:
[
{ "$match": {
"request.currentVisit": {
"$gte": "1401282905", "$lte": "1401282935"
}
}},
{ "$unwind": "$request" },
{ "$match": {
"request.currentVisit": {
"$gte": "1401282905", "$lte": "1401282935"
}
}},
{ "$group": {
"_id": {
"url": "$request.url"
},
"count": { "$sum": 1 }
}},
{ "$sort": { "count": -1 } }
]
So a few changes. There is a $match at the head of the pipeline. This narrows down documents and is able to use an index. That is the most important performance consideration. Golden rule, always "match" first.
The $project you had in there was redundant, as you cannot project "just" the fields of an array that is yet to be unwound. There is also a misconception that people believe they should $project first to reduce the pipeline. The effect is very minimal; if in fact there is a later $project or $group statement that actually limits the fields, then this will be "forward optimized" so things do get taken out of the pipeline processing for you. Still, the $match statement above does more to optimize.
The other $match stage that checks whether the array is actually there can be dropped, as you are now "implicitly" doing that at the start of the pipeline. If more conditions make you more comfortable, then add them to that initial pipeline stage.
The rest remains unchanged, as you then $unwind the array and $match to filter the items that you actually want before moving on to your remaining processing. By now, the input documents have been significantly reduced, or reduced as much as they are going to be.
The other alternative that you can do with MongoDB 2.6 and greater is to "filter" the array content before you even $unwind it. This would produce a listing like this:
[
{ "$match": {
"request.currentVisit": {
"$gte": "1401282905", "$lte": "1401282935"
}
}},
{ "$project": {
"request": {
"$setDifference": [
{
"$map": {
"input": "$request",
"as": "el",
"in": {
"$cond": [
{
"$and": [
{ "$gte": [ "$$el.currentVisit", "1401282905" ] },
{ "$lte": [ "$$el.currentVisit", "1401282935" ] }
]
},
"$$el",
false
]
}
}
},
[false]
]
}
}},
{ "$unwind": "$request" },
{ "$group": {
"_id": {
"url": "$request.url"
},
"count": { "$sum": 1 }
}},
{ "$sort": { "count": -1 } }
]
That may save you some processing by being able to "filter" the array before the $unwind, which is possibly better than doing the $match afterwards.
But this is the general rule for all of your statements. You need usable indexes and you need to $match first.
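For illustration only (the collection name is a placeholder and these indexes are not shown in the original post), the indexes on the queried array fields would be created along these lines:
db.collection.ensureIndex({ "request.currentVisit": 1 })
db.collection.ensureIndex({ "request.visitedTime": 1 })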
It is possible that the actual results you really want could be obtained in a single query, but as it stands your question is not presented that way. Try changing your processing as outlined, and you should see a notable improvement.
If you are still then trying to come to terms with how this could possibly be singular, then you can always ask another question.

In Operator in mongodb

This is my collection structure; the injury array contains injury data. I have two injury ids and I just want those injuries.
For example, given the 2 ids (538d9e7ed173e5202a000065, 538f21868a5fc5e01f000065), I should get only the first two array elements. I used the $in operator but I still get all 3 array elements. I tried the query below:
db.users.find(
{"injury._id":{$in:[ObjectId("538d9e7ed173e5202a000065"),
ObjectId("538f21868a5fc5e01f000065")]}
})
Using that I still got all 3 array elements.
What you need to understand here is that your query is meant to filter "documents" and does not filter elements of the array "within" a document. In order to actually filter the array contents for more than a single match you need to use the aggregation framework:
db.users.aggregate([
// Matches the "documents" containing those elements in the array
{ "$match": {
"injury._id":{
"$in": [
ObjectId("538d9e7ed173e5202a000065"),
ObjectId("538f21868a5fc5e01f000065")
]
}
}},
// Unwind the array to de-normalize as documents
{ "$unwind": "$injury" },
// Match the array members
{ "$match": {
"injury._id":{
"$in": [
ObjectId("538d9e7ed173e5202a000065"),
ObjectId("538f21868a5fc5e01f000065")
]
}
}},
// Group back as an array
{ "$group": {
"_id": "$_id",
"injury": { "$push": "$injury" }
}}
])
Under MongoDB 2.6 and greater you can utilize $map to filter the array:
db.users.aggregate([
// Matches the "documents" containing those elements in the array
{ "$match": {
"injury._id":{
"$in": [
ObjectId("538d9e7ed173e5202a000065"),
ObjectId("538f21868a5fc5e01f000065")
]
}
}},
// Project with $map to filter
{ "$project": {
"injury": {
"$setDifference": [
{ "$map": {
"input": "$injury",
"as": "el",
"in": {
"$cond": [
{
"$or": [
{ "$eq": [
"$$el._id"
ObjectId("538d9e7ed173e5202a000065")
]},
{ "$eq": [
"$$el._id"
ObjectId("538f21868a5fc5e01f000065")
]}
]
},
"$$el",
false
]
}
}},
[false]
]
}
}}
])