MongoDB: querying against a collection's own fields

I've done some research and it seems that it's possible to query (i.e. compare) two fields in the same collection using the aggregation framework. It's also possible with the $where operator, but I want to avoid a low-performance JavaScript solution.
Here's an example document:
{
"_id" : ObjectId("541ba14d2208236d06ff1e57"),
"a" : "foo",
"d" : {
"e" : "foo"
}
}
{
"_id" : ObjectId("541ba14d2208236d06ff1e58"),
"a" : "foo",
"d" : {
"e" : "bar"
}
}
I'd like to pick the documents where 'a' != 'd.e'. I've attempted the following without success:
db.test.aggregate([{$match: {$ne: ['$a', '$d.e']}}]);

As you said the query can be done with JavaScript by issuing a $where condition in your query:
db.test.find(function() { return this.a != this.d.e } )
Which is the short form of the query.
While you can do other manipulation in the aggregation framework, it does not change the basic nature of the query: a plain query condition cannot compare the values of two fields. This is why $match alone cannot do this, because it follows the same rules. (From MongoDB 3.6 onward, the $expr operator lifts this restriction.)
What you "can" do is $project another field value that matches the same logical conditions that you want to enforce. Depending on your actual implementation this may or may not be better for performance:
db.test.aggregate([
    { "$project": {
        "a": 1,
        "d": 1,
        "notEqual": { "$ne": [ "$a", "$d.e" ] }
    }},
    { "$match": { "notEqual": true } }
])
That probably is not going to make a lot of sense on its own unless some other filtering is done in the overall process, though. But the general idea is that a comparison operator returns a true/false result that can then be filtered.
So the best thing to do, if you can, is to maintain the result of this comparison in a field stored on the document itself. Then you have a basic query condition that looks for that value rather than doing the comparison. This applies if you need to do these kinds of checks regularly.
But for "ad-hoc" purposes, you either stick with the JavaScript evaluation or use the "projection" form in aggregation queries ( where you cannot use a $where clause ) in order to do the field level comparison.
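To make the "project a flag, then match on it" idea concrete, here is a minimal pure-Python sketch of what the two stages compute. It does not talk to MongoDB at all; the integer _id values and the helper names (get_path, project_not_equal) are made up for illustration, and missing-field handling is simplified compared to the server.

```python
# Pure-Python stand-in for the $project + $match pipeline; no server involved.
docs = [
    {"_id": 1, "a": "foo", "d": {"e": "foo"}},
    {"_id": 2, "a": "foo", "d": {"e": "bar"}},
]

def get_path(doc, path):
    """Resolve a dotted path such as 'd.e'; returns None when a step is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def project_not_equal(doc):
    """Mimic the $project stage: keep a and d, add the computed notEqual flag."""
    return {
        "_id": doc["_id"],
        "a": doc.get("a"),
        "d": doc.get("d"),
        "notEqual": get_path(doc, "a") != get_path(doc, "d.e"),
    }

projected = [project_not_equal(d) for d in docs]

# Mimic { "$match": { "notEqual": true } }
matched = [d for d in projected if d["notEqual"] is True]
matched_ids = [d["_id"] for d in matched]
```

Only the second document survives, since its "a" and "d.e" differ. Note that every document still has to be visited to compute the flag, which is why a stored field is suggested for checks you run regularly.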

Related

Remove complete document or element from array based on condition

My collection documents are:
{
"_id" : 1,
"fruits" : [ {"name":"pears"},
{"name":"grapes"},
{"name":"bananas"} ],
}
{
"_id" : 2,
"fruits" : [ {"name":"bananas"} ],
}
I need to remove the whole document when the fruits contains only "bananas" or only remove the fruit "bananas" when there are more than one fruit in the fruits array.
My final collection after running the required query should be:
{
"_id" : 1,
"fruits" : [ {"name":"pears"},
{"name":"grapes"}],
}
I am currently using two queries to get this done:
db.collection.remove({'fruits':{$size:1, $elemMatch:{'name': 'bananas'} }}) [this will remove the document when only one fruit present]
and
db.collection.update({},{$pull:{'fruits':{'name':'bananas'}}},{multi: true}) [this will remove the entry 'bananas' from the array]
Is there any way to combine these into one query?
EDIT: Final take
-- I guess there is no "one query" to perform the above tasks, since the intents of the two actions are very different.
-- The best that can be done is to club the actions into a bulk_write call, which saves on network I/O (as suggested in the answer by Neil). I believe this is more beneficial when you have multiple such actions being fired. Also, bulk_write offers a form of sequencing: its "ordered" mode makes the actions sequential, halting execution in case of an error.
Hence bulk_write is more beneficial when the actions performed need to be sequential, somewhat like "chaining" in JS. There is also the option to perform unordered bulk writes.
Also, the actions specified in the bulk write operate at the collection level as individual actions.
You basically want bulk_write() here to do them both. Also use $exists to ensure there's only one element:
from pymongo import UpdateMany, DeleteMany
db.collection.bulk_write(
    [
        UpdateMany(
            { "fruits.1": { "$exists": True }, "fruits.name": "bananas" },
            { "$pull": {
                'fruits': { 'name': 'bananas' }
            }}
        ),
        DeleteMany(
            { "fruits.1": { "$exists": False }, "fruits.name": "bananas" }
        )
    ],
    ordered=False
)
You don't really need $elemMatch for "one" condition and you should be using update_many() and in this case UpdateMany() instead of { "multi": true }. And that option is different in "pymongo" anyway. Then of course there is delete_many() or DeleteMany() for the "bulk" context.
Bulk operations send one request with one response, which is better than sending multiple requests. Also "update" and "delete" are two different things, but the single request can combine just like this.
The $size operator is valid but $exists can apply to a "range" where $size cannot, so it's generally a bit more flexible.
For example, an $exists range:
# Array between 2 and 4 elements
db.collection.update_many(
    {
        "fruits.1": { "$exists": True },
        "fruits.4": { "$exists": False },
        "fruits.name": "bananas"
    },
    { "$pull": {
        'fruits': { 'name': 'bananas' }
    }}
)
And of course in the context here you actually want to know the difference between other possible things in the array and those with "only" a single "bananas".
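As a rough illustration of how the "fruits.1" $exists test splits documents between the two operations, here is a pure-Python sketch. It runs against an in-memory list, not a MongoDB collection; the third document and the helper names (index_exists, has_fruit) are assumptions added for the example.

```python
# Pure-Python stand-in for the UpdateMany / DeleteMany pair; no server involved.
collection = [
    {"_id": 1, "fruits": [{"name": "pears"}, {"name": "grapes"}, {"name": "bananas"}]},
    {"_id": 2, "fruits": [{"name": "bananas"}]},
    {"_id": 3, "fruits": [{"name": "apples"}]},  # hypothetical extra document
]

def index_exists(doc, field, index):
    """Mimic {"<field>.<index>": {"$exists": True}} for an array field."""
    return len(doc.get(field, [])) > index

def has_fruit(doc, name):
    """Mimic {"fruits.name": name}: true if any array element has that name."""
    return any(item.get("name") == name for item in doc.get("fruits", []))

result = []
for doc in collection:
    if has_fruit(doc, "bananas") and not index_exists(doc, "fruits", 1):
        continue  # DeleteMany branch: "bananas" is the only element
    if has_fruit(doc, "bananas"):
        # UpdateMany branch: $pull every element named "bananas"
        doc = {**doc, "fruits": [f for f in doc["fruits"] if f["name"] != "bananas"]}
    result.append(doc)

remaining_ids = [d["_id"] for d in result]
```

Document 2 is dropped, document 1 loses only its "bananas" entry, and document 3 is untouched because neither filter selects it.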
The ordered=False here actually refers to two different ways that "bulk write" requests can be handled:
Ordered - Where True (which is the "default"), the operations are executed in "serial order" as they appear in the array of operations sent with the "bulk write". If any error occurs, the batch stops execution at the point of the error and returns an exception.
Unordered - Where False, the operations are executed in "parallel" within reasonable constraints on the server. If any error occurs an exception is still raised, but this does not stop other operations within the "bulk write" from completing. Any errors are returned with the "array index", from the list provided to the command, of the operation that caused the error.
This option can be used to "tune" the desired behavior, in particular with regard to error reporting and continuation, and also allows a degree of "parallelism" in the execution where "serial" is not actually required of the operations. Since these two statements do not depend on each other and will in fact select different documents anyway, ordered=False is probably the better option in terms of efficiency here.
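The difference in error handling between the two modes can be sketched in plain Python. This is only a toy model of the behaviour described above: run_bulk, ok and fail are hypothetical names, the operations are ordinary callables, and nothing is actually run in parallel.

```python
def run_bulk(operations, ordered=True):
    """Run callables in list order, collecting (index, error message) pairs.

    ordered=True stops at the first failure; ordered=False keeps going.
    A real server may also reorder or parallelize unordered work; this does not.
    """
    completed, errors = [], []
    for index, operation in enumerate(operations):
        try:
            operation()
            completed.append(index)
        except Exception as exc:
            errors.append((index, str(exc)))
            if ordered:
                break
    return completed, errors

def ok():
    return None

def fail():
    raise ValueError("simulated write error")

ordered_result = run_bulk([ok, fail, ok], ordered=True)
unordered_result = run_bulk([ok, fail, ok], ordered=False)
```

In the ordered run the third operation never executes, while the unordered run completes it and still reports the failing index.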
db.users.aggregate(
[{
$project: {
data: {
$filter: {
input: "$fruits",
as: "filterData",
cond: { $ne: [ "$$filterData.name", 'bananas' ] }
}
}
}
},
{
$unwind: {
path : "$data",
preserveNullAndEmptyArrays : false
}
},
{
$group: {
_id:"$_id",
data: { $addToSet: "$data" }
}
},
])
I think the above query would give you the results you want.
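For comparison, here is a pure-Python sketch of what those three stages compute on the sample data. It is an approximation run on an in-memory list, and it also shows that the pipeline only produces a result set: the stored documents themselves are not modified or removed.

```python
docs = [
    {"_id": 1, "fruits": [{"name": "pears"}, {"name": "grapes"}, {"name": "bananas"}]},
    {"_id": 2, "fruits": [{"name": "bananas"}]},
]

# $project with $filter: keep array elements whose name is not "bananas"
projected = [
    {"_id": d["_id"], "data": [f for f in d["fruits"] if f["name"] != "bananas"]}
    for d in docs
]

# $unwind with preserveNullAndEmptyArrays false: empty arrays emit nothing
unwound = [{"_id": d["_id"], "data": item} for d in projected for item in d["data"]]

# $group by _id, collecting distinct elements ($addToSet does not guarantee order)
grouped = {}
for row in unwound:
    bucket = grouped.setdefault(row["_id"], [])
    if row["data"] not in bucket:
        bucket.append(row["data"])
```

Document 2 vanishes from the output because its filtered array is empty, but both source documents are still present in docs afterwards.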

Counting entries of subdocument in MongoDB documents

I have a document structure like so
{
"_id" : "3:/content/somepath/test.txt",
"_revisions" : {
"r152f47f1daf-0-2" : "c",
"r152f48413c1-0-2" : "c",
"r152f4851bf7-0-1" : "c"
}
}
My task is to find all documents with the following conditions:
The "_id" needs to start with "5:"
The number of revisions needs to be strictly greater than 3
The first part is easy, I have solved it with
db.nodes.find( {'_id': /^5:/} )
But I am struggling with the second part; I am supposed to use $gt.
Since I am new to MongoDB, I first looked at $size, but _revisions is not an array, it is a subdocument, right?
I also looked at $unwind and then counting the results, but that does not make sense either, since my result needs to be the documents that match the above two conditions.
Any pointers highly appreciated.
Using the $where operator.
db.nodes.find(function() {
return (/^5:/.test(this._id) && Object.keys(this._revisions).length > 3 );
})
The problem with this as mentioned in the documentation is that:
$where evaluates JavaScript and cannot take advantage of indexes. Therefore, query performance improves when you express your query using the standard MongoDB operators (e.g., $gt, $in).
You should definitely consider changing the _revisions field to an array of sub-documents like this:
{
"_id" : "3:/content/somepath/test.txt",
"_revisions" : [
{
"rev": "r152f47f1daf-0-2",
"value": "c"
},
{
"rev": "r152f48413c1-0-2",
"value": "c"
},
{
"rev": "r152f4851bf7-0-1",
"value": "c"
}
]
}
And use the $exists operator.
db.nodes.find({ "_id": /^5:/, "_revisions.3": { "$exists": true } } )
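A small pure-Python sketch may help show why "_revisions.3" with $exists is the same test as "more than 3 revisions" once the field is an array. The documents and shortened revision keys below are invented for the example, and nothing here queries a real database.

```python
import re

nodes = [
    {"_id": "3:/content/somepath/test.txt", "_revisions": {"r1": "c", "r2": "c", "r3": "c"}},
    {"_id": "5:/content/a.txt", "_revisions": {"r1": "c", "r2": "c", "r3": "c"}},
    {"_id": "5:/content/b.txt", "_revisions": {"r1": "c", "r2": "c", "r3": "c", "r4": "c"}},
]

def matches_subdocument_form(doc):
    """Same test as the $where function: prefix check plus key count > 3."""
    return re.match(r"^5:", doc["_id"]) is not None and len(doc["_revisions"]) > 3

def to_array_form(doc):
    """Restructure _revisions into the suggested array of sub-documents."""
    revisions = [{"rev": k, "value": v} for k, v in doc["_revisions"].items()]
    return {"_id": doc["_id"], "_revisions": revisions}

def matches_array_form(doc):
    """Mimic {"_revisions.3": {"$exists": true}}: a fourth element must exist."""
    return doc["_id"].startswith("5:") and len(doc["_revisions"]) > 3

by_where = [d["_id"] for d in nodes if matches_subdocument_form(d)]
by_exists = [d["_id"] for d in map(to_array_form, nodes) if matches_array_form(d)]
```

Both forms select the same document; the array form simply lets the server answer with a standard operator instead of evaluating JavaScript per document.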

How to use $elemMatch on aggregate's projection?

This is my object:
{ "_id" : ObjectId("53fdcb6796cb9b9aa86f05b9"), "list" : [ "a", "b" ], "complist" : [ { "a" : "a", "b" : "b" }, { "a" : "c", "b" : "d" } ] }
And this is what I want to accomplish: check if "list" contains a certain element and get only the field "a" from the objects on "complist" while reading the document regardless of any of these values. I'm building a forum system, this is the query that will return the details of a forum. I need to read the forum information while knowing if the user is in the forum's white list.
With a find I can use the query
db.itens.find({},{list:{$elemMatch:{$in:["a"]}}})
to get only the first element that matches a certain value. This way I can just check if the returned array is not empty and I know if "list" contains the value I'm looking for. I can't do it on the query because I want the document regardless of it containing the value I'm looking for in the "list" value. I need the document AND know if "list" has a certain value.
With an aggregate I can use the query
db.itens.aggregate({$project:{"complist.a":1}})
to read only the field "a" of the objects contained in complist. This is going to get the forum's threads basic information, I don't want all the information of the threads, just a couple of things.
But when I try to use the query
db.itens.aggregate({$project:{"complist.b":1,list:{$elemMatch:{$in:["a"]}}}})
to try and do both, it throws me an error saying the operator $elemMatch is not valid.
Am I doing something wrong here with the $elemMatch in aggregate? Is there a better way to accomplish this?
Quite an old question, but literally none of the proposed answers are good.
TLDR:
You can't use $elemMatch in a $project stage, but you can achieve the same result using other aggregation operators like $filter.
db.itens.aggregate([
    {
        $project: {
            compList: {
                $filter: {
                    input: "$complist",
                    as: "item",
                    cond: {$eq: ["$$item.a", 1]}
                }
            }
        }
    }
])
And if you want just the first item from the array that matches the condition, similarly to what $elemMatch does, you can incorporate $arrayElemAt.
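Here is a minimal pure-Python sketch of the $filter plus $arrayElemAt combination, using the document from the question. The helper names are made up, the condition compares against "a" so that it matches the sample data, and None is used as a stand-in for the "no value" result you get when the index is out of range.

```python
doc = {
    "list": ["a", "b"],
    "complist": [{"a": "a", "b": "b"}, {"a": "c", "b": "d"}],
}

def filter_array(items, predicate):
    """Mimic $filter: keep every element for which the condition holds."""
    return [item for item in items if predicate(item)]

def array_elem_at(items, index):
    """Mimic $arrayElemAt; None stands in for 'no value' when out of range."""
    return items[index] if -len(items) <= index < len(items) else None

matching = filter_array(doc["complist"], lambda item: item["a"] == "a")
first_match = array_elem_at(matching, 0)

# A condition that matches nothing yields an empty array, hence no first element
no_match = array_elem_at(filter_array(doc["complist"], lambda item: item["a"] == "z"), 0)
```

Unlike a $match, this keeps the document in the result either way, which is what the question asked for: you get the forum back and can inspect whether anything matched.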
In Depth Explanation:
First let's understand $elemMatch:
$elemMatch is a query expression. A projection version of it also exists, but that refers to a query projection (as in find), not the $project aggregation stage.
So what does this have to do with anything? Well, a $project stage only accepts certain input forms, and the one we want to use is:
<field>: <expression>
What is a valid expression?
Expressions can include field paths, literals, system variables, expression objects, and expression operators. Expressions can be nested.
So we want to use an expression operator, but as you can see from the docs, $elemMatch is not one of them. Hence it's not a valid expression to use in an aggregation $project stage.
For some reason $elemMatch doesn't work in aggregations. You need to use the new $filter operator in Mongo 3.2. See https://docs.mongodb.org/manual/reference/operator/aggregation/filter/
The answer to this question may help.
db.collection_name.aggregate({
"$match": {
"complist": {
"$elemMatch": {
"a": "a"
}
}
}
});
Actually, the simplest solution is to just $unwind your array, then $match the appropriate documents. You can wind-up the appropriate documents again using $group and $push.
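The unwind, match, regroup sequence can be sketched in plain Python as follows. This is an approximation on made-up in-memory documents (the second document is an assumption added to show the side effect), not a real pipeline run.

```python
docs = [
    {"_id": 1, "complist": [{"a": "a", "b": "b"}, {"a": "c", "b": "d"}]},
    {"_id": 2, "complist": [{"a": "x", "b": "y"}]},  # hypothetical extra document
]

# $unwind: one output row per array element
unwound = [{"_id": d["_id"], "complist": item} for d in docs for item in d["complist"]]

# $match on the unwound element
matched = [row for row in unwound if row["complist"]["a"] == "a"]

# $group by _id with $push to rebuild the array from the surviving elements
regrouped = {}
for row in matched:
    regrouped.setdefault(row["_id"], []).append(row["complist"])
```

One thing to watch for: documents with no matching element disappear entirely, so this approach does not fit the case where the document is wanted regardless of a match.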
Although the question is old, here is my contribution for November 2017.
I had similar problem and doing two consecutive match operations worked for me. The code below is a subset of my whole code and I changed elements names, so it's not tested. Anyway this should point you in the right direction.
db.collection.aggregate([
{
"$match": {
"_id": "ID1"
}
},
{
"$unwind": "$sub_collection"
},
{
"$match": {
"sub_collection.field_I_want_to_match": "value"
}
}
])
For aggregations simply use $expr:
db.items.aggregate([
{
"$match": {
"$expr": {"$in": ["a", "$list"]}
}
},
])
Well, it happens you can use "array.field" on a find's projection block.
db.itens.find({},{"complist.b":1,list:{$elemMatch:{$in:["a"]}}})
did what I needed.

Mongodb query with fields in the same documents

I have the following json:
{
"a1": {"a": "b"},
"a2": {"a": "c"}
}
How can I request all documents where a1 and a2 are not equal in the same document?
You could use $where:
db.myCollection.find( { $where: "this.a1.a != this.a2.a" } )
However, be aware that this won't be very fast, because it has to spin up the JavaScript engine, iterate over every document, and check the condition for each.
If you need to run this query on large collections, or very often, it's best to introduce a denormalized flag, like areEqual. Still, such low-selectivity fields don't yield good index performance, because the candidate set is still large.
Update
Using the $expr operator, available as of MongoDB 3.6, you can use aggregation expressions in a find query like this:
db.myCollection.find({$expr: {$ne: ["$a1.a", "$a2.a"] } });
Although this comment solves the problem, I think a better match for this use case would be to use $addFields operator available as of version 3.4 instead of $project.
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$addFields": {
"aEq": {"$eq":["$a1.a","$a2.a"]}
}
},
{"$match":{"aEq": false}}
]);
To avoid JavaScript use the aggregation framework:
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$project": {
"a1":1,
"a2":1,
"aCmp": {"$cmp":["$a1.a","$a2.a"]}
}
},
{"$match":{"aCmp":0}}
])
On our development server the equivalent JavaScript query takes 7x longer to complete.
Update (10 May 2017)
I just realized my answer didn't answer the question, which wanted values that are not equal (sometimes I'm really slow). This will work for that:
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$project": {
"a1":1,
"a2":1,
"aEq": {"$eq":["$a1.a","$a2.a"]}
}
},
{"$match":{"aEq": false}}
])
$ne could be used in place of $eq if the match condition was changed to true but I find using $eq with false to be more intuitive.
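To check that the two variants really select the same documents, here is a pure-Python sketch of the pipeline logic. The sample documents and integer _id values are made up, and the third document is there only to show the effect of the initial $exists match.

```python
docs = [
    {"_id": 1, "a1": {"a": "b"}, "a2": {"a": "c"}},
    {"_id": 2, "a1": {"a": "b"}, "a2": {"a": "b"}},
    {"_id": 3, "a1": {"a": "b"}},  # no a2: removed by the first $match
]

# First $match: both a1 and a2 must exist
candidates = [d for d in docs if "a1" in d and "a2" in d]

# Variant 1: compute $eq, then keep rows where the flag is false
eq_false = [d["_id"] for d in candidates if (d["a1"]["a"] == d["a2"]["a"]) is False]

# Variant 2: compute $ne, then keep rows where the flag is true
ne_true = [d["_id"] for d in candidates if (d["a1"]["a"] != d["a2"]["a"]) is True]
```

Both lists contain only document 1, so the choice between the two forms is purely about which reads better.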
The $where operator evaluates JavaScript, where
{"a": "b"} == {"a": "b"}
would be false, because two distinct objects are never equal under ==.
So you have to compare the inner values directly: a1.a == a2.a
To do this in MongoDB you would use the $where operator
db.myCollection.find({$where: "this.a1.a != this.a2.a"});
This assumes that each embedded document will have a property "a". If that isn't the case things get more complicated.
Starting in Mongo 4.4, for those that want to compare sub-documents and not only primitive values (since {"a": "b"} == {"a": "b"} is false), we can use the new $function aggregation operator that allows applying a custom javascript function:
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 1, "y" : 2 } }
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 3, "y" : 2 } }
db.collection.aggregate(
{ $match:
{ $expr:
{ $function: {
body: function(a1, a2) { return JSON.stringify(a1) != JSON.stringify(a2); },
args: ["$a1", "$a2"],
lang: "js"
}}
}
}
)
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 3, "y" : 2 } }
$function takes 3 parameters:
body, which is the function to apply, whose parameters are the two fields to compare.
args, which contains the fields from the record that the body function takes as parameters. In our case, both "$a1" and "$a2".
lang, which is the language in which the body function is written. Only js is currently available.
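One caveat with comparing serialized text is that it depends on key order. The sketch below uses Python's json.dumps as a stand-in for JSON.stringify (an assumption for illustration; the function names are made up) to show that two sub-documents with the same fields in a different order compare as different unless the keys are sorted first.

```python
import json

same_order = ({"x": 1, "y": 2}, {"x": 1, "y": 2})
different_order = ({"x": 1, "y": 2}, {"y": 2, "x": 1})

def differs_as_text(left, right):
    """Compare serialized text, like the stringify comparison in the body."""
    return json.dumps(left) != json.dumps(right)

def differs_sorted(left, right):
    """Compare after sorting keys, so key order no longer matters."""
    return json.dumps(left, sort_keys=True) != json.dumps(right, sort_keys=True)

plain_same = differs_as_text(*same_order)
plain_reordered = differs_as_text(*different_order)
sorted_reordered = differs_sorted(*different_order)
```

Whether reordered sub-documents should count as different is a design decision; if they should not, the function body would need some form of key normalization before comparing.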
Thanks all for solving my problem -- concerning the answers that use aggregate(), one thing that confused me at first is that $eq (or $in, or lots of other operators) has a different meaning depending on where it is used. In a find(), or the $match phase of aggregation, $eq takes a single value and selects matching documents:
db.items.aggregate([{$match: {_id: {$eq: ObjectId("5be5feb45da16064c88e23d4")}}}])
However, in the $project phase of aggregation, $eq takes an Array of 2 expressions, and makes a new field with value true or false:
db.items.aggregate([{$project: {new_field: {$eq: ["$_id", "$foreignID"]}}}])
In passing, here's the query I used in my project to find all items whose list of linked items (due to a bug) linked to themselves:
db.items.aggregate([{$project: {idIn: {$in: ["$_id","$header.links"]}, "header.links": 1}}, {$match: {idIn: true}}])
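The two meanings of $eq can be contrasted with a small pure-Python sketch. The function names query_eq and expr_eq are hypothetical, and the integer ids replace the ObjectId values for brevity.

```python
def query_eq(doc, field, value):
    """Query form {field: {"$eq": value}}: selects or rejects a document."""
    return doc.get(field) == value

def expr_eq(doc, left_field, right_field):
    """Expression form {"$eq": ["$left", "$right"]}: computes a boolean value."""
    return doc.get(left_field) == doc.get(right_field)

items = [
    {"_id": 1, "foreignID": 1},
    {"_id": 2, "foreignID": 7},
]

# Query form acts as a filter over documents
selected = [d for d in items if query_eq(d, "_id", 2)]

# Expression form adds a computed field to every document
projected = [{"_id": d["_id"], "new_field": expr_eq(d, "_id", "foreignID")} for d in items]
```

The query form changes which documents come back; the expression form keeps them all and only adds a true/false field, which is why a later $match is needed to filter on it.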

Using stored JavaScript functions in the Aggregation pipeline, MapReduce or runCommand

Is there a way to use a user-defined function saved as db.system.js.save(...) in pipeline or mapreduce?
Any function you save to system.js is available for use by "JavaScript" processing statements such as the $where operator and mapReduce, and can be referenced by the _id value it was assigned.
db.system.js.save({
"_id": "squareThis",
"value": function(a) { return a*a }
})
And some data inserted to "sample" collection:
{ "_id" : ObjectId("55aafd2bacbed38e06f9eccf"), "a" : 1 }
{ "_id" : ObjectId("55aafea6acbed38e06f9ecd0"), "a" : 2 }
{ "_id" : ObjectId("55aafeabacbed38e06f9ecd1"), "a" : 3 }
Then:
db.sample.mapReduce(
function() {
emit(null, squareThis(this.a));
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
);
Gives:
"results" : [
{
"_id" : null,
"value" : 14
}
],
Or with $where:
db.sample.find(function() { return squareThis(this.a) == 9 })
{ "_id" : ObjectId("55aafeabacbed38e06f9ecd1"), "a" : 3 }
But in "neither" case can you use globals such as the database db reference or other functions. Both $where and mapReduce documentation contain information of the limits of what you can do here. So if you thought you were going to do something like "look up data in another collection", then you can forget it because it is "Not Allowed".
Every MongoDB command action is actually a call to a "runCommand" action "under the hood" anyway. But unless what that command is actually doing is "calling a JavaScript processing engine" then the usage becomes irrelevant. There are only a few commands anyway that do this, being mapReduce, group or eval, and of course the find operations with $where.
The aggregation framework does not use JavaScript in any way at all (this predates the $function operator added in MongoDB 4.4). You might be mistaking, as others have done, a statement like this, which does not do what you think it does:
db.sample.aggregate([
{ "$match": {
"a": { "$in": db.sample.distinct("a") }
}}
])
So that is "not running inside" the aggregation pipeline; rather, the "result" of that .distinct() call is "evaluated" before the pipeline is sent to the server. The same happens with an external variable:
var items = [1,2,3];
db.sample.aggregate([
{ "$match": {
"a": { "$in": items }
}}
])
Both essentially send to the server in the same way:
db.sample.aggregate([
{ "$match": {
"a": { "$in": [1,2,3] }
}}
])
So it is "not possible" to "call" any JavaScript function in the aggregation pipeline, nor is there really any point in "passing in" results in general from something saved in system.js. The "code" needs to be "loaded to the client" and only a JavaScript engine can actually do anything with it.
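The same point holds in any client language: the pipeline is just a data structure that is fully built before anything is sent. A minimal Python sketch, where distinct_values is a made-up stand-in for a distinct() call and no server is contacted:

```python
def distinct_values(docs, field):
    """Stand-in for a distinct() call that returns before the pipeline exists."""
    return sorted({d[field] for d in docs})

sample = [{"a": 1}, {"a": 2}, {"a": 3}]

# The call is evaluated on the client while the pipeline is being built
from_call = [{"$match": {"a": {"$in": distinct_values(sample, "a")}}}]

# Same with a variable
items = [1, 2, 3]
from_variable = [{"$match": {"a": {"$in": items}}}]

# What the server would receive in both cases
literal = [{"$match": {"a": {"$in": [1, 2, 3]}}}]
```

All three pipelines are equal as plain data, so the server cannot tell which of them was produced by a function call.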
With the aggregation framework, all of the "operators" available are actually natively coded functions as opposed to the "free form" JavaScript interpretation provided for mapReduce. So instead of writing "JavaScript", you use the operators themselves:
db.sample.aggregate([
    { "$group": {
        "_id": null,
        "squared": { "$sum": {
            "$multiply": [ "$a", "$a" ]
        }}
    }}
])
{ "_id" : null, "squared" : 14 }
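As a sanity check on the arithmetic, here is a pure-Python sketch of both shapes of the computation on the same three values of "a". square_this is a Python stand-in for the stored squareThis function, and nothing here runs on a server.

```python
from functools import reduce

sample = [{"a": 1}, {"a": 2}, {"a": 3}]

def square_this(value):
    """Python stand-in for the stored squareThis function."""
    return value * value

# mapReduce shape: emit one squared value per document, then reduce by summing
emitted = [square_this(doc["a"]) for doc in sample]
map_reduce_total = reduce(lambda left, right: left + right, emitted, 0)

# $group shape: {"$sum": {"$multiply": ["$a", "$a"]}} over all documents
group_total = sum(doc["a"] * doc["a"] for doc in sample)
```

Both routes give 1 + 4 + 9 = 14, which matches the mapReduce and aggregation outputs shown in this answer.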
So there are limitations on what you can do with functions saved in system.js, and the chances are that what you want to do is either:
Not allowed, such as accessing data from another collection
Not really required as the logic is generally self contained anyway
Or probably better implemented in client logic or other different form anyway
Just about the only practical use I can really think of is that you have a number of "mapReduce" operations that cannot be done any other way and you have various "shared" functions that you would rather just store on the server than maintain within every mapReduce function call.
But then again, the 90% reason for mapReduce over the aggregation framework is usually that the "document structure" of the collections has been poorly chosen and the JavaScript functionality is "required" to traverse the document for search and analysis.
So you can use it under the allowed constraints, but in most cases you probably should not be using this at all, but fixing the other issues that caused you to believe you needed this feature in the first place.