MongoDB Aggregation - match if value in array - mongodb

I have a collection that I'm performing an aggregation on and I've basically gotten it down to
{array:[1,2,3], value: 1},
{array:[1,2,3], value: 4}
How would I perform an aggregation match to check whether the value is in the array? I tried using {$match: {"array": {$in: ["$value"]}}} but it doesn't find anything.
I would want the output (if using the above as an example) to be:
{array:[1,2,3], value:1}

From version 3.6 you can use aggregation expressions in a regular query through $expr:
db.collection_name.find({"$expr": {"$in": ["$value", "$array"]}})
Using aggregation:
In 3.6 you can use $match with $expr:
db.collection_name.aggregate({"$match": {"$expr": {"$in": ["$value", "$array"]}}})
In 3.4 you can try $redact with an $in expression:
db.collection_name.aggregate({
"$redact": {
"$cond": [
{
"$in": [
"$value",
"$array"
]
},
"$$KEEP",
"$$PRUNE"
]
}
})
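To make the semantics concrete, here is a plain-JavaScript sketch (not MongoDB driver code) of what the {"$in": ["$value", "$array"]} expression decides for each document. The helper name matchValueInArray is hypothetical, and strict equality only approximates MongoDB's comparison rules for simple scalar values.

```javascript
// Sample documents taken from the question.
const docs = [
  { array: [1, 2, 3], value: 1 },
  { array: [1, 2, 3], value: 4 },
];

// Hypothetical helper: keeps a document when doc.value occurs in doc.array,
// mirroring {"$expr": {"$in": ["$value", "$array"]}} for scalar values.
// Assumption: non-array fields are simply skipped here, whereas the real
// $in expression raises an error when its second argument is not an array.
function matchValueInArray(documents) {
  return documents.filter(
    (doc) => Array.isArray(doc.array) && doc.array.includes(doc.value)
  );
}

const matched = matchValueInArray(docs);
console.log(JSON.stringify(matched));
```

Only the first sample document survives, which is the output the question asks for.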

As stated, $where is a good option when you do not need to continue the logic in the aggregation pipeline.
But if you do, use $redact, with $map to transform the "value" into an array and $setIsSubset to compare. This avoids duplicating documents with $unwind, which makes it the fastest way to do this:
db.collection.aggregate([
{ "$redact": {
"$cond": {
"if": { "$setIsSubset": [
{ "$map": {
"input": { "$literal": ["A"] },
"as": "a",
"in": "$value"
}},
"$array"
]},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
The $redact pipeline stage evaluates a logical condition within $cond and uses the system variables $$KEEP to keep the document where the condition is true, or $$PRUNE to remove it where the condition is false.
This allows it to work like $project with a subsequent $match, but in a single pipeline stage, which is more efficient.
Since these are natively coded operators and not JavaScript, this is likely the fastest way to perform your match. So provided you are using MongoDB 2.6 or above, this is the way to compare these elements in your document.
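As a rough mental model of the keep/prune decision described above, here is a plain-JavaScript sketch. The function names setIsSubset and redact are hypothetical stand-ins for the operators, not driver APIs, and the model only covers a top-level decision on flat documents.

```javascript
const docs = [
  { array: [1, 2, 3], value: 1 },
  { array: [1, 2, 3], value: 4 },
];

// Hypothetical model of $setIsSubset: true when every element of `subset`
// also appears in `superset`.
function setIsSubset(subset, superset) {
  const lookup = new Set(superset);
  return subset.every((item) => lookup.has(item));
}

// Hypothetical model of a top-level $redact: a true condition plays the role
// of "$$KEEP", a false condition the role of "$$PRUNE".
function redact(documents, condition) {
  return documents.filter((doc) => condition(doc));
}

// Wrap the single value in an array so it can be compared as a set.
const kept = redact(docs, (doc) => setIsSubset([doc.value], doc.array));
console.log(JSON.stringify(kept));
```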

A slight variation based on @chridam's answer:
db.test.aggregate([
{ "$unwind": "$array" },
{ "$group": {
_id: { "_id": "$_id", "value": "$value" },
array: { $push: "$array" },
mcount: { $sum: {$cond: [{$eq: ["$value","$array"]},1,0]}}
}
},
{ $match: {mcount: {$gt: 0}}},
{ "$project": { "value": "$_id.value", "array": 1, "_id": 0 }}
])
The idea is to $unwind and $group back the array, counting in mcount the number of items matching the value. After that, a simple $match on mcount > 0 will filter out unwanted documents.
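The same idea can be traced step by step in plain JavaScript. This is only a sketch of the stage semantics under assumed sample data (the _id values "a" and "b" are made up), and the helper names unwind and regroup are hypothetical.

```javascript
const docs = [
  { _id: "a", array: [1, 2, 3], value: 1 },
  { _id: "b", array: [1, 2, 3], value: 4 },
];

// Model of $unwind: one output document per array element.
function unwind(documents, field) {
  return documents.flatMap((doc) =>
    doc[field].map((element) => ({ ...doc, [field]: element }))
  );
}

// Model of the $group stage: rebuild the array per _id and count in mcount
// how many elements are equal to the value.
function regroup(unwound) {
  const groups = new Map();
  for (const doc of unwound) {
    if (!groups.has(doc._id)) {
      groups.set(doc._id, { value: doc.value, array: [], mcount: 0 });
    }
    const group = groups.get(doc._id);
    group.array.push(doc.array);
    group.mcount += doc.value === doc.array ? 1 : 0;
  }
  return [...groups.values()];
}

const result = regroup(unwind(docs, "array"))
  .filter((group) => group.mcount > 0) // the $match on mcount
  .map(({ value, array }) => ({ value, array })); // the final $project
console.log(JSON.stringify(result));
```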

A more efficient approach would involve a single pipeline that uses the $redact operator as follows:
db.collection.aggregate([
{
"$redact": {
"$cond": [
{
"$setIsSubset": [
["$value"],
"$array"
]
},
"$$KEEP",
"$$PRUNE"
]
}
}
])
For earlier versions of MongoDB that do not support $redact (versions < 2.6), consider this aggregation pipeline, which uses the $unwind operator:
db.collection.aggregate([
{ "$unwind": "$array" },
{
"$project": {
"isInArray": {
"$cond": [
{ "$eq": [ "$array", "$value" ] },
1,
0
]
},
"value": 1,
"array": 1
}
},
{ "$sort": { "isInArray": -1 } },
{
"$group": {
"_id": {
"_id": "$_id",
"value": "$value"
},
"array": { "$push": "$array" },
"isInArray": { "$first": "$isInArray" }
}
},
{ "$match": { "isInArray": 1 } },
{ "$project": { "value": "$_id.value", "array": 1, "_id": 0 } }
])

A little late to answer, but here is another solution.
Using $addFields and $match separately gives more flexibility than $redact. You can expose several fields and then combine other matching logic based on the results.
db.applications.aggregate([
{$addFields: {"containsValueInArray": {$cond:[{$setIsSubset: [["valueToMatch"], "$arrayToMatchIn"]},true,false]}}},
{$match: {"containsValueInArray":true}}
]);

Try the combination of $eq and $setIntersection
{$group :{
_id: "$id",
yourName : { $sum:
{ $cond :[
{$and : [
{$eq:[{$setIntersection : ["$someArrayField", ["$value"]] },["$value"]]}
]
},1,0]
}
}
}}

I prefer to do it without grouping. There's an easy approach using $filter (available since v3.2) and $addFields (available since v3.4):
...aggregate([
{
$addFields: {
arrayFilter: {
$filter: {
input: '$array',
as: 'item',
cond: { $eq: ['$$item', '$value'] }
}
}
}
},
{
$unwind: '$arrayFilter'
},
{
$project: {
arrayFilter: 0
}
}
]);
Add a temporary filter field
$unwind on the resulting array (pipeline results with empty arrays get removed)
(optional) remove filter field from result via project
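The three steps above can be sketched in plain JavaScript as follows. This is an illustrative model, not shell code: the function name filterThenUnwind and the third sample document are assumptions. It also shows one consequence worth knowing: a document whose array contains the value more than once is emitted once per occurrence.

```javascript
const docs = [
  { array: [1, 2, 3], value: 1 },
  { array: [1, 2, 3], value: 4 },
  { array: [5, 5], value: 5 }, // assumed extra sample with a repeated value
];

function filterThenUnwind(documents) {
  return documents.flatMap((doc) => {
    // Step 1: temporary field holding the elements equal to doc.value.
    const arrayFilter = doc.array.filter((item) => item === doc.value);
    // Steps 2 and 3: emit one copy per surviving element (an empty array
    // drops the document) and leave the temporary field out of the result.
    return arrayFilter.map(() => ({ ...doc }));
  });
}

const survivors = filterThenUnwind(docs);
console.log(JSON.stringify(survivors));
```

If duplicates matter for your data, a check on the size of the filtered array instead of an unwind would avoid them.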

You can do it with a simple $project and $match:
db.test.aggregate([{
$project: {
arrayValue: 1,
value: 1,
"has_same_value" : { $in: ["$value", "$arrayValue"] }
}
},
{
$match: {has_same_value: true}
},
{
$project: {has_same_value: 0}
}])

"$match": { "name": { "$in":["Rio","Raja"] }} }])

Related

document returned by mongoShell query is zero for comparing column in same document

I have a collection with a data structure similar to this:
{
id: 1,
limit: {
max: 10000,
used: 0
}
}
and I tried running the query below, but it returns 0 results:
db.getCollection('promos').aggregate(
[
{ $match: { id: 1} },
{$match: { $expr: {$gt: ["limit.max" , "limit.used"]}}}
])
I also used the below query
db.getCollection('promos').aggregate(
[
{ $match: { id: 1} },
{$match: { "$limit.max": {$gt: "limit.used"}}}
])
Neither of them gives the expected result. Any help will be appreciated.
You need to prefix field path expressions with $. This can also be done simply in a .find():
db.getCollection('promos').find({
"id": 1,
"$expr": { "$gt": [ "$limit.max" , "$limit.used" ] }
})
Or a single $match stage if you really need to use aggregate instead:
db.getCollection('promos').aggregate([
{ "$match": {
"id": 1,
"$expr": { "$gt": [ "$limit.max" , "$limit.used" ] }
}}
])
That's how $expr works and you can "mix it" with other regular query operators in the same query or pipeline stage.
Also see $gt for general usage examples
Of course, if you don't have MongoDB 3.6, use $redact instead:
db.getCollection('promos').aggregate([
{ "$match": { "id": 1 } },
{ "$redact": {
"$cond": {
"if": { "$gt": [ "$limit.max" , "$limit.used" ] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
Or use $where. Works in all versions:
db.getCollection('promos').find({
"id": 1,
"$where": "this.limit.max > this.limit.used"
})
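To see why the $ prefix matters, here is a small plain-JavaScript model of how an expression operand is interpreted. The helpers resolveOperand and gt are hypothetical and only handle simple dotted field paths. Without the prefix, the two operands are just string literals, so the comparison never looks at the document.

```javascript
const promo = { id: 1, limit: { max: 10000, used: 0 } };

// Hypothetical resolver: a leading "$" marks a field path into the document;
// anything else is treated as a literal value.
function resolveOperand(doc, operand) {
  if (typeof operand === "string" && operand.startsWith("$")) {
    return operand
      .slice(1)
      .split(".")
      .reduce((current, key) => (current == null ? undefined : current[key]), doc);
  }
  return operand;
}

// Hypothetical model of the $gt expression on two operands.
function gt(doc, [left, right]) {
  return resolveOperand(doc, left) > resolveOperand(doc, right);
}

// Compares the strings "limit.max" and "limit.used".
const withoutPrefix = gt(promo, ["limit.max", "limit.used"]);
// Compares the field values 10000 and 0.
const withPrefix = gt(promo, ["$limit.max", "$limit.used"]);
console.log(withoutPrefix, withPrefix);
```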

How to convert an array of documents to two dimensions array

I am making a query to MongoDB
db.getCollection('user_actions').aggregate([
{$match: {
type: 'play_started',
entity_id: {$ne: null}
}},
{$group: {
_id: '$entity_id',
view_count: {$sum: 1}
}},
])
and getting a list of docs with two fields:
How can I get a list of lists with two items like
[[entity_id, view_count], [entity_id, view_count], ...]
There are actually two different ways to do this, depending on your MongoDB server version.
The optimal way, from MongoDB 3.2, is to use square brackets [] to create new array fields directly in the $project stage. This returns an array for each group. The next stage is another $group stage, where you group your documents and use the $push accumulator operator to return a two-dimensional array.
db.getCollection('user_actions').aggregate([
{ "$match": {
"type": 'play_started',
"entity_id": { "$ne": null }
}},
{ "$group": {
"_id": "$entity_id",
"view_count": { "$sum": 1}
}},
{ "$project": {
"_id": 0,
"result": [ "$_id", "$view_count" ]
}},
{ "$group": {
"_id": null,
"result": { "$push": "$result" }
}}
])
From MongoDB 2.6 and prior to 3.2 you need a different approach. To create your array you need to use the $map operator. Because the $map "input" field must resolve to an array, you use the $literal operator to set a literal array value as input. The $cond operator then returns the "entity_id" or the "view_count" according to the boolean expression.
db.getCollection('user_actions').aggregate([
{ "$match": {
"type": 'play_started',
"entity_id": { "$ne": null }
}},
{ "$group": {
"_id": "$entity_id",
"view_count": { "$sum": 1}
}},
{ "$project": {
"_id": 0,
"result": {
"$map": {
"input": { "$literal": [ "A", "B"] },
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", "A" ] },
"$_id",
"$view_count"
]
}
}
}
}},
{ "$group": {
"_id": null,
"result": { "$push": "$result" }
}}
])
It is worth noting that this will also work in MongoDB 2.4. If you are running MongoDB 2.2, you can use the undocumented $const operator, which does the same thing.
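For comparison, here is the reshaping both pipelines perform, written as a plain-JavaScript sketch over assumed $group output (the entity ids and counts are made up). The second variant imitates the $map over a two-element literal with $cond picking the field.

```javascript
// Assumed output of the first $group stage.
const grouped = [
  { _id: "video-1", view_count: 3 },
  { _id: "video-2", view_count: 1 },
];

// Square-bracket $project: one [entity_id, view_count] pair per group.
const pairs = grouped.map((doc) => [doc._id, doc.view_count]);

// $map/$cond variant: iterate over the placeholder array ["A", "B"] and
// return _id for "A" and view_count for anything else.
const pairsViaMap = grouped.map((doc) =>
  ["A", "B"].map((el) => (el === "A" ? doc._id : doc.view_count))
);

// Final $group with _id: null and $push collects the pairs into one array.
const output = { _id: null, result: pairs };
console.log(JSON.stringify(output));
```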

Mongo Group and sum with two fields

I have documents like:
{
"from":"abc@sss.ddd",
"to" :"ssd@dff.dff",
"email": "Hi hello"
}
How can we count messages for "from and to" together with "to and from", i.e. the communication count between two people?
I am able to calculate the one-way count, but I want the count in both directions.
db.test.aggregate([
{ $group: {
"_id":{ "from": "$from", "to":"$to"},
"count":{$sum:1}
}
},
{
"$sort" :{"count":-1}
}
])
Since you need to calculate the number of emails exchanged between two addresses, it would be fair to project a unified between field as follows:
db.a.aggregate([
{ $match: {
to: { $exists: true },
from: { $exists: true },
email: { $exists: true }
}},
{ $project: {
between: { $cond: {
if: { $lte: [ { $strcasecmp: [ "$to", "$from" ] }, 0 ] },
then: [ { $toLower: "$to" }, { $toLower: "$from" } ],
else: [ { $toLower: "$from" }, { $toLower: "$to" } ] }
}
}},
{ $group: {
"_id": "$between",
"count": { $sum: 1 }
}},
{ $sort :{ count: -1 } }
])
Unification logic should be quite clear from the example: it is an alphabetically sorted array of both emails. The $match and $toLower parts are optional if you trust your data.
Documentation for operators used in the example:
$match
$exists
$project
$cond
$lte
$strcasecmp
$toLower
$group
$sum
$sort
You basically need to consider the _id for grouping as an "array" of the possible "to" and "from" values, and then of course "sort" them, so that in every document the combination is always in the same order.
Just as a side note, "typically" when I am dealing with messaging systems like this, the "to" and "from" sender/recipients are usually both arrays to begin with anyway, so that usually forms the base from which the different variations on this statement come.
First, the optimal MongoDB 3.2 statement, for single addresses:
db.collection.aggregate([
// Join in array
{ "$project": {
"people": [ "$to", "$from" ],
}},
// Unwind array
{ "$unwind": "$people" },
// Sort array
{ "$sort": { "_id": 1, "people": 1 } },
// Group document
{ "$group": {
"_id": "$_id",
"people": { "$push": "$people" }
}},
// Group people and count
{ "$group": {
"_id": "$people",
"count": { "$sum": 1 }
}}
]);
Those are the basics, and now the only variations are in the construction of the "people" array (stage 1 only above).
MongoDB 3.x and 2.6.x - Arrays
{ "$project": {
"people": { "$setUnion": [ "$to", "$from" ] }
}}
MongoDB 3.x and 2.6.x - Fields to array
{ "$project": {
"people": {
"$map": {
"input": ["A","B"],
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "A", "$$el" ] },
"$to",
"$from"
]
}
}
}
}}
MongoDB 2.4.x and 2.2.x - from fields
{ "$project": {
"to": 1,
"from": 1,
"type": { "$const": [ "A", "B" ] }
}},
{ "$unwind": "$type" },
{ "$group": {
"_id": "$_id",
"people": {
"$addToSet": {
"$cond": [
{ "$eq": [ "$type", "A" ] },
"$to",
"$from"
]
}
}
}}
But in all cases:
Get all recipients into a distinct array.
Order the array to a consistent order
Group on the "always in the same order" list of recipients.
Follow that and you cannot go wrong.
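Those three steps can be sketched in plain JavaScript like this. It is a model of the grouping logic only: the addresses are made-up samples, the function name countConversations is hypothetical, and the lowercasing is an optional assumption borrowed from the $toLower approach.

```javascript
// Assumed sample messages.
const messages = [
  { from: "alice@example.com", to: "bob@example.com" },
  { from: "bob@example.com", to: "alice@example.com" },
  { from: "Alice@example.com", to: "carol@example.com" },
];

function countConversations(docs) {
  const counts = new Map();
  for (const { from, to } of docs) {
    // Steps 1 and 2: distinct participants in a consistent (sorted) order.
    const people = [...new Set([from.toLowerCase(), to.toLowerCase()])].sort();
    // Step 3: group on the ordered list, serialised so it can be a Map key.
    const key = JSON.stringify(people);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([key, count]) => ({ _id: JSON.parse(key), count }))
    .sort((a, b) => b.count - a.count);
}

const conversationCounts = countConversations(messages);
console.log(JSON.stringify(conversationCounts));
```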

Mongodb: Aggregation : sum up values in an array before $group

I have a collection of documents with the following structure:
{
_id: 1,
array: [
{value: 10 },
{value: 11 },
{value: 12 }
]
}
I want to make an aggregate query on the collection:
get the proportion of each item (for example, the proportion of item 1 would be the value of item 1 divided by the sum of the values of all three items).
Note: I want to do this within a single query.
The basic idea here is to $unwind the array, $group the document and then apply the division to each array member. This works better for MongoDB 2.6 or greater due to the $map operator:
db.collection.aggregate([
{ "$unwind": "$array" },
{ "$group": {
"_id": "$_id",
"array": { "$push": "$array" },
"total": { "$sum": "$array.value" }
}},
{ "$project": {
"array": {
"$map": {
"input": "$array",
"as": "el",
"in": {
"value": "$$el.value",
"prop": {
"$divide": [ "$$el.value", "$total" ]
}
}
}
}
}}
])
Or with earlier versions:
db.collection.aggregate([
{ "$unwind": "$array" },
{ "$group": {
"_id": "$_id",
"array": { "$push": "$array" },
"total": { "$sum": "$array.value" }
}},
{ "$unwind": "$array" },
{ "$group": {
"_id": "$_id",
"array": {
"$push": {
"value": "$array.value",
"prop": {
"$divide": [ "$array.value", "$total" ]
}
}
}
}}
])
In either case, if you are not actually "aggregating" anything beyond the document, it is far more efficient to do this calculation in client code. The $unwind here can get very costly due to what it does.
Also if you just stored the "total" as another element, then the simple $project is all that you need, which comes at very little cost by itself. Keeping a total on updates is just simple usage of the $inc operator as you $push new elements to the array.
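As an illustration of the client-side alternative mentioned above, here is a minimal JavaScript sketch that computes the proportions on a document after it has been fetched. The function name withProportions is hypothetical, and returning null for a zero total is an assumption made here to avoid dividing by zero.

```javascript
// Sample document from the question.
const doc = { _id: 1, array: [{ value: 10 }, { value: 11 }, { value: 12 }] };

function withProportions(document) {
  // Sum all values once, then divide each value by that total.
  const total = document.array.reduce((sum, item) => sum + item.value, 0);
  return {
    _id: document._id,
    array: document.array.map((item) => ({
      value: item.value,
      // Assumption: a zero total yields null instead of a division by zero.
      prop: total === 0 ? null : item.value / total,
    })),
  };
}

const proportions = withProportions(doc);
console.log(JSON.stringify(proportions));
```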
Here is the aggregation pipeline you need:
[
{$unwind: '$array'},
{
$group: {
_id: '$_id',
array: {$push: '$array'},
sum: {$sum: '$array.value'}
}
},
{$unwind: '$array'},
{
$project: {
_id: 1,
'array.value': 1,
'array.proportion': {
$divide: ['$array.value', '$sum']
}
}
}
]

Mongodb array concatenation

When querying mongodb, is it possible to process ("project") the result so as to perform array concatenation?
I actually have 2 different scenarios:
(1) Arrays from different fields, e.g.:
Given:
{companyName:'microsoft', managers:['ariel', 'bella'], employees:['charlie', 'don']}
{companyName:'oracle', managers:['elena', 'frank'], employees:['george', 'hugh']}
I'd like my query to return each company with its 'managers' and 'employees' concatenated:
{companyName:'microsoft', allPersonnel:['ariel', 'bella','charlie', 'don']}
{companyName:'oracle', allPersonnel:['elena', 'frank','george', 'hugh']}
(2) Nested arrays, e.g.:
Given the following docs, where employees are separated into nested arrays (never mind why, it's a long story):
{companyName:'microsoft', personnel:[ ['ariel', 'bella'], ['charlie', 'don'] ]}
{companyName:'oracle', personnel:[ ['elena', 'frank'], ['george', 'hugh'] ]}
I'd like my query to return each company with a flattened 'personnel' array:
{companyName:'microsoft', allPersonnel:['ariel', 'bella','charlie', 'don']}
{companyName:'oracle', allPersonnel:['elena', 'frank','george', 'hugh']}
I'd appreciate any ideas, using either 'find' or 'aggregate'
Thanks a lot :)
Of course, in modern MongoDB releases we can simply use $concatArrays here:
db.collection.aggregate([
{ "$project": {
"companyName": 1,
"allPersonnel": { "$concatArrays": [ "$managers", "$employees" ] }
}}
])
Or for the second form with nested arrays, using $reduce in combination:
db.collection.aggregate([
{ "$project": {
"companyName": 1,
"allPersonnel": {
"$reduce": {
"input": "$personnel",
"initialValue": [],
"in": { "$concatArrays": [ "$$value", "$$this" ] }
}
}
}}
])
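In plain JavaScript terms, the two projections behave roughly like Array.prototype.concat and a reduce that concatenates each inner array onto an accumulator. This is only an analogy on the sample data from the question, not shell code.

```javascript
// Sample documents from the question (scenario 1 and scenario 2).
const flat = {
  companyName: "microsoft",
  managers: ["ariel", "bella"],
  employees: ["charlie", "don"],
};
const nested = {
  companyName: "microsoft",
  personnel: [["ariel", "bella"], ["charlie", "don"]],
};

// Analogue of { "$concatArrays": [ "$managers", "$employees" ] }.
const viaConcat = flat.managers.concat(flat.employees);

// Analogue of the $reduce: start from [] ("initialValue") and append each
// inner array ("$$this") to the accumulator ("$$value").
const viaReduce = nested.personnel.reduce(
  (value, current) => value.concat(current),
  []
);

console.log(JSON.stringify(viaConcat), JSON.stringify(viaReduce));
```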
There is also the $setUnion operator available to the aggregation framework. The constraint here is that the results are "sets", so all the members are "unique", as a set requires:
db.collection.aggregate([
{ "$project": {
"companyName": 1,
"allPersonnel": { "$setUnion": [ "$managers", "$employees" ] }
}}
])
So that works as long as all members are "unique" and you are working with single-level arrays.
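The uniqueness constraint is easy to see with a small plain-JavaScript model. The sample names are made up so that one person appears in both arrays, and the helper setUnion is hypothetical: it keeps first-seen order for readability, whereas the real operator makes no promise about element order.

```javascript
// Assumed sample where "bella" is both a manager and an employee.
const managers = ["ariel", "bella"];
const employees = ["bella", "charlie"];

// Hypothetical model of $setUnion: duplicates collapse into one member.
function setUnion(...arrays) {
  return [...new Set(arrays.flat())];
}

const concatenated = managers.concat(employees); // keeps the duplicate
const united = setUnion(managers, employees); // drops the duplicate
console.log(concatenated.length, united.length);
```

So a set union only equals a concatenation when the arrays share no members and contain no repeats.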
In the alternate case you can always process with $unwind and $group. The nested personnel array needs a simple double unwind:
db.collection.aggregate([
{ "$unwind": "$personnel" },
{ "$unwind": "$personnel" },
{ "$group": {
"_id": "$_id",
"companyName": { "$first": "$companyName" },
"allPersonnel": { "$push": "$personnel" }
}}
])
Or the same thing as the first one for versions earlier than MongoDB 2.6 where the "set operators" did not exist:
db.collection.aggregate([
{ "$project": {
"type": { "$const": [ "M", "E" ] },
"companyName": 1,
"managers": 1,
"employees": 1
}},
{ "$unwind": "$type" },
{ "$unwind": "$managers" },
{ "$unwind": "$employees" },
{ "$group": {
"_id": "$_id",
"companyName": { "$first": "$companyName" },
"allPersonnel": {
"$addToSet": {
"$cond": [
{ "$eq": [ "$type", "M" ] },
"$managers",
"$employees"
]
}
}
}}
])