I've got a document that contains an array of elements. I'd like to find specific entries in that document.
Example entry 1:
{
"c": [
{
"k": "1",
"v": "1",
},
{
"k": "2",
"v": "2",
},
]
}
Example entry 2:
{
"c": [
{
"k": "1",
"v": "2",
},
{
"k": "2",
"v": "1",
},
]
}
I'd like to find all entries that have a k that is 1, and a matching v that is 1 (here, the first entry matches, but the second one doesn't, because the v that has the value of 1 is not the same object as the k valued 1).
So far, I found the query:
{
"$and": [
{"c.k": "1"},
{"c.v": "1"}
]
}
However, that returns both entries, not just the first one. Is there a way to tell MongoDB that both constraints should apply to the same item in the array, and not just to any item?
As suggested by #turiviashal, use the $elemMatch operator, which matches documents in which at least one array element satisfies all of the given conditions.
db.collection.aggregate([
{
"$match": {
"c": {
"$elemMatch": {
"k": "1",
"v": "1",
}
}
}
}
])
Mongo Playground Example
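To make the difference concrete, here is a minimal sketch in plain JavaScript that models the two matching rules on the example entries from the question. It is not MongoDB API; the function names matchesAnyElement and matchesSameElement are made up for illustration.

```javascript
// Plain-JavaScript model of the two matching rules (not MongoDB API).
const entry1 = { c: [{ k: "1", v: "1" }, { k: "2", v: "2" }] };
const entry2 = { c: [{ k: "1", v: "2" }, { k: "2", v: "1" }] };

// Models {"c.k": "1", "c.v": "1"}: each condition may be met by a different element.
function matchesAnyElement(doc) {
  return doc.c.some((el) => el.k === "1") && doc.c.some((el) => el.v === "1");
}

// Models {c: {$elemMatch: {k: "1", v: "1"}}}: one element must meet both conditions.
function matchesSameElement(doc) {
  return doc.c.some((el) => el.k === "1" && el.v === "1");
}

console.log(matchesAnyElement(entry1), matchesAnyElement(entry2));   // true true
console.log(matchesSameElement(entry1), matchesSameElement(entry2)); // true false
```

The second entry passes the dot-notation model because its k of 1 and its v of 1 live in different elements, which is exactly the behaviour the question wants to avoid.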
Related
In my collection, I have documents which contains an array (called values) of objects, which has id and val fields. The data looks like this
[
{
values: [
{
"id": "123",
"val": true
},
{
"id": "456",
"val": true
},
{
"id": "789",
"val": false
},
]
},
{
values: [
{
"id": "123",
"val": true
},
{
"id": "123",
"val": true
},
{
"id": "123",
"val": false
},
]
},
{
values: [
{
"id": "234",
"val": false
},
{
"id": "567",
"val": false
}
]
}
]
I want to query this data by val to ensure it is true, and there may be cases where I want to ensure that an array has at least 2 elements whose val is true. I am able to achieve this with the following query:
db.collection.find({
values: {
"$elemMatch": {
val: true
}
},
$expr: {
$gte: [
{
$reduce: {
input: "$values",
initialValue: 0,
in: {
$sum: [
"$$value",
{
$cond: [
{
$eq: [
"$$this.val",
true
]
},
1,
0
]
}
]
}
}
},
2
]
}
})
The results of the query above give me the following result:
[
{
"values": [
{
"id": "123",
"val": true
},
{
"id": "456",
"val": true
},
{
"id": "789",
"val": false
}
]
},
{
"values": [
{
"id": "123",
"val": true
},
{
"id": "123",
"val": true
},
{
"id": "123",
"val": false
}
]
}
]
The issue now is that I also want to ensure that there are no duplicate ids among the values counted by the $reduce. I have looked through the docs and $group looks promising, but I am unsure how to integrate it into the query.
I want the result to look like the following:
[
{
"values": [
{
"id": "123",
"val": true
},
{
"id": "456",
"val": true
},
{
"id": "789",
"val": false
}
]
}
]
Here is a MongoPlayground link with all of the above in it.
Please Note
I have simplified the code examples here to get to the core of the problem. In my actual use case, there are many values of different data types, which means that using an $elemMatch is the best way to go for me.
After looking at Takis's answer, it made me realise that using $setUnion is an alternative way to find the distinct values of a field. Using this, I was able to rework my query to achieve what I want.
What I have done to achieve this is to have a $cond operator within the $expr operator. I pass in the original $reduce as the condition to see if x amount of val's appear within the document. If it succeeds, then I am ensuring that the size of the union of values.id's (i.e. all the unique id's within the current document values array) is at least x amount. If this condition is satisfied, the document will be returned. If not, then it falls back to the value false, i.e. it will not return the current document.
The query is as follows:
db.collection.find({
values: {
"$elemMatch": {
val: true
}
},
$expr: {
$cond: [
{
$gte: [
{
$reduce: {
input: "$values",
initialValue: 0,
in: {
$sum: [
"$$value",
{
$cond: [
{
$eq: [
"$$this.val",
true
]
},
1,
0
]
}
]
}
}
},
2 // x amount of instances
]
},
{
$gte: [
{
"$size": {
$setUnion: [
"$values.id"
]
}
},
2 // x amount of instances
]
},
false
]
}
})
Here is a MongoPlayground link showing it in action.
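The logic of that $expr can be sketched in plain JavaScript as follows. This only models the condition on the sample data from the question; it is not MongoDB API, and keepDocument is a hypothetical helper name.

```javascript
// Plain-JavaScript model of the $expr condition above (not MongoDB API).
const docs = [
  { values: [{ id: "123", val: true }, { id: "456", val: true }, { id: "789", val: false }] },
  { values: [{ id: "123", val: true }, { id: "123", val: true }, { id: "123", val: false }] },
  { values: [{ id: "234", val: false }, { id: "567", val: false }] },
];

function keepDocument(doc, x) {
  // Mirrors the $reduce: count the elements whose val is true.
  const trueCount = doc.values.reduce((n, el) => n + (el.val === true ? 1 : 0), 0);
  // Mirrors $size of $setUnion: the number of distinct ids in the array.
  const distinctIds = new Set(doc.values.map((el) => el.id)).size;
  // Mirrors $cond: only check the ids when the first test passes, else false.
  return trueCount >= x ? distinctIds >= x : false;
}

const kept = docs.filter((doc) => keepDocument(doc, 2));
console.log(kept.length); // 1
```

Only the first sample document survives: the second has two true values but a single distinct id, and the third has no true values at all.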
Query
$reduce is fine, but two filters may be simpler here.
The first filter keeps documents whose count of true values is >= 2.
The second filter rejects duplicate ids by comparing the length of values.id with the length of the set of values.id (keep the document only if they are the same size, otherwise there is a duplicate). This works because a path into an array yields an array: values.id is the array of ids.
*If this is slow for your use case, it may be possible to make it faster.
Playmongo
db.collection.aggregate(
[{"$match":
{"$expr":
{"$and":
[{"$gte":
[{"$size": {"$filter": {"input": "$values.val", "cond": "$$this"}}},
2]},
{"$eq":
[{"$size": "$values.id"},
{"$size": {"$setUnion": ["$values.id", []]}}]}]}}}])
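As a rough illustration of the two filters, here is a plain-JavaScript sketch (not MongoDB API; hasEnoughTrue and hasNoDuplicateIds are hypothetical names, and the sample arrays are taken from the question).

```javascript
// Plain-JavaScript model of the two filters (not MongoDB API).
const first = [{ id: "123", val: true }, { id: "456", val: true }, { id: "789", val: false }];
const second = [{ id: "123", val: true }, { id: "123", val: true }, { id: "123", val: false }];

// Models $size of $filter over "$values.val": count the truthy entries.
function hasEnoughTrue(values, min) {
  return values.map((el) => el.val).filter((v) => v).length >= min;
}

// Models comparing $size of "$values.id" with $size of its $setUnion.
function hasNoDuplicateIds(values) {
  const ids = values.map((el) => el.id); // "$values.id" resolves to an array of ids
  return ids.length === new Set(ids).size; // same size means no id repeats
}

console.log(hasEnoughTrue(first, 2) && hasNoDuplicateIds(first));   // true
console.log(hasEnoughTrue(second, 2) && hasNoDuplicateIds(second)); // false
```

Note that this check is stricter than requiring at least 2 distinct ids: an array with ids 1, 1, 2 has two distinct ids but still contains a duplicate, so it would be rejected here.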
I am trying to solve a problem. I want to write a query that finds, among my documents, the one with the greatest sum of the A and B fields across the elements of an array. I have written an example below. I am new to MongoDB and have searched a lot, but I could not find a solution. Can somebody help me solve this problem? Here are my sample documents:
document1:
{
"_id" : "1",
"array": [
{
"user": "1",
"A": 2,
"B": 0
},
{
"user": "2",
"A": 3,
"B": 1},
{
"user": "3",
"A": 0,
"B": 5
}
]
}
and document 2:
{
"_id" : "2",
"array": [
{
"user": "4",
"A": 1,
"B": 1
},
{
"user": "5",
"A": 2,
"B": 2
}
]
}
For example, the sum of all A and B values in the array of document 1 is 11, and the sum in document 2 is 6. So I want document 1 as the output, because its sum is greater than that of document 2.
You can try this query:
Create an auxiliary field called total (or whatever name you want) with $addFields and $add. This adds together the $sum of each array path, which means you are adding all values of A and B together.
Then $sort by the auxiliary field, descending, to put the greatest in first position.
$limit to only one (the greatest).
And $project to omit the auxiliary field from the output.
db.collection.aggregate([
{
"$addFields": {
"total": {
"$add": [
{
"$sum": "$array.A"
},
{
"$sum": "$array.B"
}
]
}
}
},
{
"$sort": {
"total": -1
}
},
{
"$limit": 1
},
{
"$project": {
"total": 0
}
}
])
Example here
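To check the arithmetic, the same steps can be sketched in plain JavaScript on the two sample documents. This is only a model of what the pipeline computes, not MongoDB API; the helper name total is chosen to match the auxiliary field.

```javascript
// Plain-JavaScript model of the pipeline above (not MongoDB API).
const docs = [
  { _id: "1", array: [{ user: "1", A: 2, B: 0 }, { user: "2", A: 3, B: 1 }, { user: "3", A: 0, B: 5 }] },
  { _id: "2", array: [{ user: "4", A: 1, B: 1 }, { user: "5", A: 2, B: 2 }] },
];

// Models $addFields: total = sum of all A plus sum of all B.
function total(doc) {
  const sumA = doc.array.reduce((s, el) => s + el.A, 0);
  const sumB = doc.array.reduce((s, el) => s + el.B, 0);
  return sumA + sumB;
}

// Models $sort by total descending followed by $limit 1.
const [greatest] = [...docs].sort((a, b) => total(b) - total(a));

console.log(total(docs[0]), total(docs[1]), greatest._id); // 11 6 1
```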
I have a collection with data that looks sort of like this
{
"part": [
{ "a": "1", "b": "a" },
{ "a": "23", "b": "b" },
{ "a": "4", "b": "c" },
]
}
What I would like is a way of searching for documents where the concatenation of all the a values in part equals the string I am searching for.
For example, 1234 should match the document above, but 124 should not.
Is this possible with MongoDB?
You can do it with Aggregation framework:
$match with $eq - To filter only documents where concatenated a properties of part array are equal to the input string.
$reduce with $concat - To concatenate all a properties of part array for each document.
db.collection.aggregate([
{
"$match": {
"$expr": {
"$eq": [
"1234",
{
"$reduce": {
"input": "$part",
"initialValue": "",
"in": {
"$concat": [
"$$value",
"$$this.a"
]
}
}
}
]
}
}
}
])
Working example
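For intuition, here is what the $reduce with $concat computes, sketched in plain JavaScript on the sample document (not MongoDB API; joinedA and matches are hypothetical names).

```javascript
// Plain-JavaScript model of $reduce + $concat followed by $eq (not MongoDB API).
const doc = { part: [{ a: "1", b: "a" }, { a: "23", b: "b" }, { a: "4", b: "c" }] };

// Models $reduce with initialValue "" and $concat of "$$value" and "$$this.a".
function joinedA(d) {
  return d.part.reduce((acc, el) => acc + el.a, "");
}

// Models the $eq comparison inside $expr.
function matches(d, search) {
  return joinedA(d) === search;
}

console.log(joinedA(doc));         // 1234
console.log(matches(doc, "1234")); // true
console.log(matches(doc, "124"));  // false
```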
You can use aggregate with $reduce to join string then $match to filter your string.
Here is the playground.
Assuming I have the following persons collection:
{
"_id": ObjectId("569d07a38e61973f6aded134"),
"name": "john",
"pets": [
{
"name": "spot",
"type": "dog",
"special": "spot eye"
},
{
"name": "bob",
"type": "cat",
}
]
},
{
"_id": ObjectId("569d07a38e61973f6aded135"),
"name": "susie",
"pets": [
{
"name": "fred",
"type": "cat",
}
]
}
How can I retrieve the persons whose pets have a special field? I would like the returned pets array to contain only the pets with a special field.
For example, the expected result from the collection above would be:
{
"_id": ObjectId("569d07a38e61973f6aded134"),
"name": "john",
"pets": [
{
"name": "spot",
"type": "dog",
"special": "spot eye"
}
]
}
I'm trying to implement this in hopefully one query with pymongo, although even just a working MongoDB or mongoose query would be lovely.
I've tried to start with:
db.persons.find({pets:{special:{$exists:true}}});
but that has returned 0 records, even though there should be some.
If the array holds embedded documents, you can query for specific fields in the embedded documents using dot notation.
Without dot notation you are querying array documents for a complete match.
Try the following query:
db.persons.find({'pets.special':{$exists:true}});
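A rough way to see why the original attempt returned nothing is to model both query shapes in plain JavaScript. This is a simplification, not MongoDB API: the whole-document comparison is approximated with JSON.stringify, and both function names are hypothetical.

```javascript
// Plain-JavaScript model of the two query shapes (not MongoDB API).
const john = {
  name: "john",
  pets: [{ name: "spot", type: "dog", special: "spot eye" }, { name: "bob", type: "cat" }],
};
const susie = { name: "susie", pets: [{ name: "fred", type: "cat" }] };

// Rough model of {pets: {special: {$exists: true}}}: an element must equal the
// literal document {special: {$exists: true}} as a whole.
function matchesWholeDocument(person) {
  const literal = JSON.stringify({ special: { $exists: true } });
  return person.pets.some((pet) => JSON.stringify(pet) === literal);
}

// Model of {"pets.special": {$exists: true}}: some element has a special field.
function matchesDotNotation(person) {
  return person.pets.some((pet) => "special" in pet);
}

console.log([john, susie].filter(matchesWholeDocument).length);           // 0
console.log([john, susie].filter(matchesDotNotation).map((p) => p.name)); // [ 'john' ]
```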
You can use the aggregation framework to get the desired result. Run the following aggregation pipeline:
db.persons.aggregate([
{
"$match": {
"pets.special": { "$exists": true }
}
},
{
"$project": {
"name": 1,
"pets": {
"$setDifference": [
{
"$map": {
"input": "$pets",
"as": "el",
"in": {
"$cond": [
{ "$gt": [ "$$el.special", null ] },
"$$el", false
]
}
}
},
[false]
]
}
}
}
])
Sample Output
{
"result" : [
{
"_id" : ObjectId("569d07a38e61973f6aded134"),
"name" : "john",
"pets" : [
{
"name" : "spot",
"type" : "dog",
"special" : "spot eye"
}
]
}
],
"ok" : 1
}
The operators that do the real work are $map and $setDifference. $map builds a new array by evaluating a subexpression against each element of the input array; here $cond returns the element itself when the $gt comparison finds a special value, and false otherwise. $setDifference then returns the elements that appear in the first set but not in the second (the relative complement of the second set in the first), so subtracting [false] leaves a pets array containing only the pets that have a special property.
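The same two-step idea can be sketched in plain JavaScript. This is only a model, not MongoDB API, and filterPets is a hypothetical name; note that the real $setDifference treats its inputs as sets, so it would also collapse identical elements and does not guarantee order, which this sketch does not model.

```javascript
// Plain-JavaScript model of $map + $cond followed by $setDifference (not MongoDB API).
const pets = [
  { name: "spot", type: "dog", special: "spot eye" },
  { name: "bob", type: "cat" },
];

function filterPets(list) {
  // Models $map with $cond: keep the element when special is present, else emit false.
  const mapped = list.map((pet) => (pet.special != null ? pet : false));
  // Models $setDifference with [false]: drop the false placeholders.
  return mapped.filter((el) => el !== false);
}

console.log(filterPets(pets).map((pet) => pet.name)); // [ 'spot' ]
```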
I have a database of 30mb size, and it has 300 documents which are stored in a single collection, and their size vary from 1mb to 10kb. I am using the new aggregation framework which comes with 2.6 and I do not have any indexes.
I have an aggregation pipeline as following:
1. $match > first query match
2. $project > exclude some fields for efficiency
3. $unwind > unwind one of the arrays
4. $unwind > unwind second array
5. $project > projection to find matching fields among two arrays with $eq
6. $match > same:true
7. $group > put the unwinded arrays together
8. $limit(50)
this pipeline above requires 30 seconds. If I remove $limit, it takes ages. My question is:
Database size is only 30MB, and pipeline is not complicated at all. Why is it taking so long? Any ideas on that?
EDIT
My schema is as following:
{
username: string (max 20 chars)
userid : string max 20 chars
userage : string max 20 chars
userobj1: array of objects, length: ~300-500
// example of userobj1:
[
{
innerobj1: array of objects, length: ~30-50
innerobj2: array of objects, length: ~100-200
userinfo1: string max 20 chars
userinfo2: string max 20 chars
userinfo3: string max 20 chars
userinfo4: string max 20 chars
} ...
]
userobj2: same as userobj1
userobj3: same as userobj1
userobj4: same as userobj1
}
this document above has inner objects up to 3-4 levels. Sorry that I cannot provide an example but the alias should be enough. Example query is as following:
1. $match:
$and : [
{userobj1: $elemMatch: {userinfo1:a}},
{userobj1: $elemMatch: {userinfo4:b}}
]
2. $project {username:1, userid:1, userobj1:1, userobj2:1}
3. $unwind userobj1
4. $unwind userobj2
5. $project
{
username:1,
userid:1,
userobj1:1,
userobj2:1,
userobj3:1,
userobj4:1,
"same" : {
$eq: [ userobj3.userinfo4, userobj4.userinfo4 ]
}
}
6. $match {same:true}
7. $group all arrays back
8. limit 50.
There is something here that I just don't get about what you are actually trying to do here. So please bear with me on the possible actual questions and answers that I see.
Considering this simplified data set to your case:
{
"obj1": [
{ "a": "a", "b": "b" },
{ "a": "a", "b": "c" }
],
"obj2": [
{ "a": "c", "b": "b" },
{ "a": "c", "b": "c" }
]
},
{
"obj1": [
{ "a": "a", "b": "b" }
],
"obj2": [
{ "a": "a", "b": "c" }
]
}
Q: "Are you not just trying to match the documents with { "a": "a", "b": "b" } in "obj1" and also { "b": "b" } in "obj2"?"
If that is the case then this is just a simple query with .find():
db.collection.find({
"obj1": {
"$elemMatch": { "a": "a", "b": "b" }
},
"obj2.b": "b"
})
Matches only one of those documents that meets the conditions, in this case just the one:
{
"obj1": [
{ "a": "a", "b": "b" },
{ "a": "a", "b": "c" }
],
"obj2": [
{ "a": "c", "b": "b" },
{ "a": "c", "b": "c" }
]
}
Q: "Are you possibly trying to find the positions in the array where your conditions are true?"
If so, there are some operators available in MongoDB 2.6 that help you do this without using $unwind:
db.objects.aggregate([
{ "$match": {
"obj1": {
"$elemMatch": { "a": "a", "b": "b" }
},
"obj2.b": "b"
}},
{ "$project": {
"obj1": 1,
"obj2": 1,
"match1": {
"$map": {
"input": "$obj1",
"as": "el",
"in": {
"$and": [
{ "$eq": [ "$$el.a", "a" ] },
{ "$eq": [ "$$el.b", "b" ] }
]
}
}
},
"match2": {
"$map": {
"input": "$obj2",
"as": "el",
"in": {
"$eq": [ "$$el.b", "b" ]
}
}
}
}}
])
Gives you:
{
"obj1": [
{ "a": "a", "b": "b" },
{ "a": "a", "b": "c" }
],
"obj2": [
{ "a": "c", "b": "b" },
{ "a": "c", "b": "c" }
],
"match1" : [
true,
false
],
"match2" : [
true,
false
]
}
Q: "Or are you possibly trying to "filter" only the matching array elements to those conditions?"
You can do this with more set operators in MongoDB 2.6 without using $unwind:
db.objects.aggregate([
{ "$match": {
"obj1": {
"$elemMatch": { "a": "a", "b": "b" }
},
"obj2.b": "b"
}},
{ "$project": {
"obj1": {
"$setDifference": [
{ "$map": {
"input": "$obj1",
"as": "el",
"in": {
"$cond": [
{ "$and": [
{ "$eq": [ "$$el.a", "a" ] },
{ "$eq": [ "$$el.b", "b" ] }
]},
"$$el",
false
]
}
}},
[false]
]
},
"obj2": {
"$setDifference": [
{ "$map": {
"input": "$obj2",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.b", "b" ] },
"$$el",
false
]
}
}},
[false]
]
}
}}
])
And the result:
{
"obj1": [
{ "a": "a", "b": "b" },
],
"obj2": [
{ "a": "c", "b": "b" },
]
}
The last entry there is the cutest which combines $cond, $map and $setDifference to do some complex filtering of the objects in the array in order to filter just the matches to the conditions. You previously would have to $unwind and $match to get those results.
So it is both $unwind and $group that are not required to actually get to any of these results, and those are really killing you. Also your big "pass through" on the "unwound" arrays with $eq suggests trying to get to the end result of one of the above, but in the way you have implemented it would be very costly.
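To give a feel for why the double $unwind is so expensive, here is a small plain-JavaScript model of what it does to a single document. It is not MongoDB API; the unwind helper is hypothetical, and the array length of 300 is simply the lower end of the "~300-500" stated in the question.

```javascript
// Plain-JavaScript model of $unwind (not MongoDB API): one output document
// per element of the named array, with the rest of the document copied along.
function unwind(docs, field) {
  return docs.flatMap((doc) => doc[field].map((el) => ({ ...doc, [field]: el })));
}

// One document with two arrays of 300 elements each (assumed size).
const doc = {
  userobj1: Array.from({ length: 300 }, (_, i) => ({ n: i })),
  userobj2: Array.from({ length: 300 }, (_, i) => ({ n: i })),
};

const afterFirst = unwind([doc], "userobj1");
const afterSecond = unwind(afterFirst, "userobj2");

console.log(afterFirst.length, afterSecond.length); // 300 90000
```

Under these assumptions a single source document becomes 90,000 intermediate documents before the $project and $match even run, and every matching document in the collection is multiplied the same way.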
Also try to index the field you match on inside one of those arrays, so that the initial $match reduces your working set as far as possible. That will improve things in all cases, even if you cannot have a compound "multi-key" index due to the restrictions there.
Anyhow, I hope that at least something here either matches your intent or is at least close to what you are trying to do.
Since your comments went this way, matching values of "obj1.a" to "obj2.b" without the filtering is not much different to the general cases shown.
db.objects.aggregate([
{ "$project": {
"match": {
"$size": {
"$setIntersection": [
{ "$map": {
"input": "$obj1",
"as": "el",
"in": { "$concat": ["$$el.a",""] }
}},
{ "$map": {
"input": "$obj2",
"as": "el",
"in": { "$concat": ["$$el.b",""] }
}}
]
}
}
}},
{ "$match": { "match": { "$gte": 1 } } }
])
All simply done without using $unwind.
I know this is an old question, but it looks like a simple answer was never reached and it involves using an expression that was available in 2.6 so it would have worked back then too. You don't need to do any $unwinding or complex $mapping you just need to do a $setIntersection on the two arrays that you want to find a match in.
Using the example data from the very long answer:
db.foo.aggregate([
{$match:{"obj1.a":"a"}},
{$project:{keep:{$setIntersection:["$obj1.b","$obj2.b"]},obj1:1,obj2:1}},
{$match:{keep:{$ne:[]}}}
])
{ "_id" : ObjectId("588a8206c01d80beca3a8e45"), "obj1" : [ { "a" : "a", "b" : "b" }, { "a" : "a", "b" : "c" } ], "obj2" : [ { "a" : "c", "b" : "b" }, { "a" : "c", "b" : "c" } ], "keep" : [ "b", "c" ] }
Only one of the two documents is kept, the one that had two values of "b" in both obj1 and obj2 arrays.
In your original "syntax" the $project stage would be
same: {$setIntersection: [ '$userobj3.userinfo4', '$userobj4.userinfo4' ]}
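As a sanity check on the two example documents, the intersection step can be modelled in plain JavaScript. This is not MongoDB API, and commonB is a hypothetical name.

```javascript
// Plain-JavaScript model of $setIntersection on "$obj1.b" and "$obj2.b" (not MongoDB API).
const docA = {
  obj1: [{ a: "a", b: "b" }, { a: "a", b: "c" }],
  obj2: [{ a: "c", b: "b" }, { a: "c", b: "c" }],
};
const docB = {
  obj1: [{ a: "a", b: "b" }],
  obj2: [{ a: "a", b: "c" }],
};

function commonB(doc) {
  const left = new Set(doc.obj1.map((el) => el.b));  // models "$obj1.b"
  const right = new Set(doc.obj2.map((el) => el.b)); // models "$obj2.b"
  return [...left].filter((v) => right.has(v));      // models $setIntersection
}

console.log(commonB(docA)); // [ 'b', 'c' ]
console.log(commonB(docB)); // []
```

The second document yields an empty intersection, which is why the final $match on keep not being [] drops it.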
My guess is that it takes that long because there are no indexes so it does a full collection scan every time it needs a record.
Try adding an index on userinfo1:a and I think you will see a good performance gain. I will also recommend that you remove the AND syntax from the match phase and rewrite it as a list.
I think it would be really helpful, for both you and the question, to give us the output of the aggregation's explain. In MongoDB 2.6 you can request explain output for an aggregation pipeline:
db.collection.aggregate( [ ... stages ...], { explain:true } )