This question already has answers here:
How to use $cond operation in Spring-MongoDb aggregation framework
(2 answers)
Closed 5 years ago.
I am looking to aggregate the following data
{
"user": "user1",
"error": true
}
{
"user": "user2",
"error": false
}
{
"user": "user1",
"error": false
}
Into
{
"_id": "user1",
"errorCount": 1,
"totalCount": 2
},
{
"_id": "user2",
"errorCount": 0,
"totalCount": 1
}
With $cond operator, this can be achieved using:
$group: {
_id: "$user",
errorCount : { "$sum" : {"$cond" : ["$error", 1, 0]}},
totalCount : { "$sum" : 1 }
}
However, since I am using Spring-data-mongodb which does not yet support $cond (as of 1.3.4-RELEASE), I couldn't do this.
Is there a way to do the same aggregation without $cond?
Thanks to Neil Lunn's suggestion, I have managed to get $cond to using aggregation support from spring-data. To do this, you have to implement the AggregationOperation interface to take in a DBObject.
public class DBObjectAggregationOperation implements AggregationOperation {
private DBObject operation;
public DBObjectAggregationOperation (DBObject operation) {
this.operation = operation;
}
#Override
public DBObject toDBObject(AggregationOperationContext context) {
return context.getMappedObject(operation);
}
}
Then you will be able to use it in TypeAggregation normally:
DBObject operation = (DBObject)JSON.parse ("your pipleline here...");
TypedAggregation<UserStats> aggregation = newAggregation(User.class,
new DBObjectAggregationOperation(operation)
);
AggregationResults<UserStats> result = mongoTemplate.aggregate(aggregation, UserStats.class);
This approach will allow you to use any aggregation operator not yet defined in the framework. However, it also bypassed the validation put in place by spring-data and should be used with caution.
Please vote if you would like to have $cond operator supported properly through the framework: https://jira.springsource.org/browse/DATAMONGO-861
You are not bound to this even if there is no "functional interface" in Spring data yet. (BTW, raise a JIRA)
Just get the native form and use BasicDBObject types in the pipeline. So in principle:
DBCollection myCollection = mongoOperation.getCollection("collection");
<result cast> = myCollection.aggregate(<pipeline here>);
Spring data gives you abstractions, but it does not prohibit the use of the native driver functions. It actually gives you accessors to use them, as I demonstrated above.
Related
I have an entry like below,
[
{
"_id":ObjectId("59ce020caa87df4da0ee2c78"),
"name": "Tom",
"owner_id": ObjectId("59ce020caa87df4da0ee2c78")
},
{
"_id":ObjectId("59ce020caa87df4da0ee2c79"),
"name": "John",
"owner_id": ObjectId("59ce020caa87df4da0ee2c78")
}
]
now, I need to find the person whose _id is equal to owner_id using find() in MongoDB.
Note, we can't not use $match (aggregation) due to some reason.
I am using this query,
db.people.find({ $where: "this._id == this.owner_id" })
but, it's not returning the expected output. Can anyone help me with this.
Thanks.
Using $expr and $eq you can get desired values avoiding the use of $where into a find stage (not aggregation necessary).
db.collection.find({
"$expr": {
"$eq": [
"$_id",
"$owner_id"
]
}
})
new to Mongo. Trying to group across different sub fields of a document based on a condition. The condition is a regex on a field value. Looks like -
db.collection.aggregate([{
{
"$group": {
"$cond": [{
"upper.leaf": {
$not: {
$regex: /flower/
}
}
},
{
"_id": {
"leaf": "$upper.leaf",
"stem": "$upper.stem"
}
},
{
"_id": {
"stem": "$upper.stem",
"petal": "$upper.petal"
}
}
]
}
}])
Using api v4.0: cond in the docs shows - { $cond: [ <boolean-expression>, <true-case>, <false-case> ] }
The error I get with the above code is - "Syntax error: dotted field name 'upper.leaf' can not used in a sub object."
Reading up on that I tried $let to re-assign the dotted field name. But started to hit various syntax errors with no obvious issue in the query.
Also tried using $project to rename the fields, but got - Field names may not start with '$'
Thoughts on the best approach here? I can always address this at the application level and split my query into two but it's attractive potentially to solve it natively in mongo.
$group syntax is wrong
{
$group:
{
_id: <expression>, // Group By Expression
<field1>: { <accumulator1> : <expression1> },
...
}
}
You tried to do
{
$group:
<expression>
}
And even if your expression resulted in the same code, its invalid syntax for $group (check from the documentation where you are allowed to use expressions)
One other problem is that you use the query operator for regex, and not the aggregate regex operators (you can't do that, if you aggregate you can use only aggregate operators, only $match is the exception that you can use both if you add $expr)
You need this i think
[{
"$group" : {
"_id" : {
"$cond" : [ {
"$not" : [ {
"$regexMatch" : {
"input" : "$upper.leaf",
"regex" : "/flower/"}}]},
{"leaf" : "$upper.leaf","stem" : "$upper.stem"},
{"stem" : "$upper.stem","petal" : "$upper.petal"}]
}
}}]
Its similar code, but expression gets as value of the "_id" and $regexMatch
is used that is aggregate operator.
I didnt tested the code.
How can I group by tagValue in Spring and MongoDb?
MongoDB Query :
db.feed.aggregate([
{ $group: { _id: "$feedTag.tagValue", number: { $sum : 1 } } },
{ $sort: { _id : 1 } }
])
How can I do the same thing in Spring MongoDB, may be using Aggregation method?
Sample document of feed collections:
{
"_id" : ObjectId("556846dd1df42d5d579362fd"),
"feedTag" : [
{
"tagName" : "sentiment",
"tagValue" : "neutral",
"modelName" : "sentiment"
}
],
"createdDate" : "2015-05-28"
}
To group by tagValue, since this is an array field, you need to apply the $unwind pipeline step before the group to split the array so that you can get the actual count:
db.feed.aggregate([
{
"$unwind": "$feedTag"
}
{
"$group": {
"_id": "$feedTag.tagValue",
"number": { "$sum" : 1 }
}
},
{ "$sort": { "_id" : 1 } }
])
The following is the equivalent example in Spring Data MongoDB:
import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;
Aggregation agg = newAggregation(
unwind("feedTag"),
group("feedTag.tagValue").count().as("number"),
sort(ASC, "_id")
);
// Convert the aggregation result into a List
AggregationResults<Feed> results = mongoTemplate.aggregate(agg, "feed", Feed.class);
List<Feed> feedCount = results.getMappedResults();
From the above, a new aggregation object is created via the newAggregation static factory method which is passed a list of aggregation operations that define the aggregation pipeline of your Aggregation.
The firt step uses the unwind operation to generate a new document for each tag within the "feedTag" array.
In the second step the group operation defines a group for each embedded "feedTag.tagValue"-value for which the occurrence count is aggregated via the count aggregation operator.
As the third step, sort the resulting list of feedTag by their tagValue in ascending order via the sort operation.
Finally call the aggregate Method on the MongoTemplate to let MongoDB perform the actual aggregation operation with the created Aggregation as an argument.
Note that the input collection is explicitly specified as the "feed" parameter to the aggregate Method. If the name of the input collection is not specified explicitly, it is derived from the input-class passed as first parameter to the newAggreation Method.
I wan't to run a query to get all Articles that have more than 6 com and then sort according length of com list,
for this i doing it:
ArticleModel.objects.filter(com__6__exists=True).order_by('-com.length')[:50]
suppose com is a ListField, but ordering not work, how can i fix it? thanks
Standard queries cannot do this as the "sort" needs to be done on a physical field present in the document. The best way to do this is to actually keep a count of your "list" as another field in the document. That also makes your query more efficient as well as that "counter" field can be indexed, so the basic query becomes ( Raw MongoDB sytax ) :
{ "comLength": { "$gt": 6 } }
If you cannot or do not want to change the document structure then the only way to otherwise sort on the length of your list is to $project it via .aggregate():
ArticleModel._get_collection().aggregate([
{ "$match": { "com.6": { "$exists": true } }},
{ "$project": {
"com": 1,
"otherField": 1,
"comLength": { "$size": "$com" }
}},
{ "$sort": { "comLength": -1 } }
])
And that considers that you have MongoDB 2.6 at least for the use of the $size aggregation operator. If you don't then you have to $unwind and $group in order to calculate the length of arrays:
ArticleModel._get_collection().aggregate([
{ "$match": { "com.6": { "$exists": true } }},
{ "$unwind": "$com" },
{ "$group": {
"_id": "$_id",
"otherField": { "$first": "$otherField" }
"com": { "$push": "$com" },
"comLength": { "$sum": 1 }
}},
{ "$sort": { "comLength": -1 } }
])
So if you are going to go down that route then take a good look at the documentation since you are possibly not used to the raw MongoDB syntax and have been using the query DSL that MongoEngine provdides.
Overall, only the aggregation providers in .aggregate() or .mapReduce() can actually "create a field" that is not present within the document. There is also not test for the "current" length that is available to standard projection or sorting of documents either.
Your best option to to add another field and keep it in sync with the actual array length. But failing that the above shows you the general approach.
If you're creating the database and you know such request will mostly be requested a lot it's recommended to add "com_length" field in A ArticleModel and make it automatically inserted on every save using save() method
add inside of your ArticleModel in models.py
def save(self, *args, **kwargs):
self.com_length = len(self.com)
return super(ArticleModel, self).save(*args, **kwargs)
then for requesting the asked question:
ArticleModel.objects.filter(com__6__exists=True).order_by('-com_length')[:50]
Is there a way to use a user-defined function saved as db.system.js.save(...) in pipeline or mapreduce?
Any function you save to system.js is available for usage by "JavaScript" processing statements such as the $where operator and mapReduce and can be referenced by the _id value is was asssigned.
db.system.js.save({
"_id": "squareThis",
"value": function(a) { return a*a }
})
And some data inserted to "sample" collection:
{ "_id" : ObjectId("55aafd2bacbed38e06f9eccf"), "a" : 1 }
{ "_id" : ObjectId("55aafea6acbed38e06f9ecd0"), "a" : 2 }
{ "_id" : ObjectId("55aafeabacbed38e06f9ecd1"), "a" : 3 }
Then:
db.sample.mapReduce(
function() {
emit(null, squareThis(this.a));
},
function(key,values) {
return Array.sum(values);
},
{ "out": { "inline": 1 } }
);
Gives:
"results" : [
{
"_id" : null,
"value" : 14
}
],
Or with $where:
db.sample.find(function() { return squareThis(this.a) == 9 })
{ "_id" : ObjectId("55aafeabacbed38e06f9ecd1"), "a" : 3 }
But in "neither" case can you use globals such as the database db reference or other functions. Both $where and mapReduce documentation contain information of the limits of what you can do here. So if you thought you were going to do something like "look up data in another collection", then you can forget it because it is "Not Allowed".
Every MongoDB command action is actually a call to a "runCommand" action "under the hood" anyway. But unless what that command is actually doing is "calling a JavaScript processing engine" then the usage becomes irrelevant. There are only a few commands anyway that do this, being mapReduce, group or eval, and of course the find operations with $where.
The aggregation framework does not use JavaScript in any way at all. You might be mistaking just as others have done a statement like this, which does not do what you think it does:
db.sample.aggregate([
{ "$match": {
"a": { "$in": db.sample.distinct("a") }
}}
])
So that is "not running inside" the aggregation pipeline, but rather the "result" of that .distinct() call is "evaluated" before the pipeline is sent to the server. Much as with an external variable is done anyway:
var items = [1,2,3];
db.sample.aggregate([
{ "$match": {
"a": { "$in": items }
}}
])
Both essentially send to the server in the same way:
db.sample.aggregate([
{ "$match": {
"a": { "$in": [1,2,3] }
}}
])
So it is "not possible" to "call" any JavaScript function in the aggregation pipeline, nor is there really any point is "passing in" results in general from something saved in system.js. The "code" needs to be "loaded to the client" and only a JavaScript engine can actually do anything with it.
With the aggregation framework, all of the "operators" available are actually natively coded functions as opposed to the "free form" JavaScript interpretation provided for mapReduce. So instead of writing "JavaScript", you use the operators themselves:
db.sample.aggregate([
{ "$group": {
"_id": null,
"sqared": { "$sum": {
"$multiply": [ "$a", "$a" ]
}}
}}
])
{ "_id" : null, "sqared" : 14 }
So there are limitations on what you can do with functions saved in system.js, and the chances are that what you want to do is either:
Not allowed, such as accessing data from another collection
Not really required as the logic is generally self contained anyway
Or probably better implemented in client logic or other different form anyway
Just about the only practical use I can really think of is that you have a number of "mapReduce" operations that cannot be done any other way and you have various "shared" functions that you would rather just store on the server than maintain within every mapReduce function call.
But then again, the 90% reason for mapReduce over the aggregation framework is usually that the "document structure" of the collections has been poorly chosen and the JavaScript functionality is "required" to traverse the document for search and analysis.
So you can use it under the allowed constraints, but in most cases you probably should not be using this at all, but fixing the other issues that caused you to believe you needed this feature in the first place.