Wrong reference in Spring MongoDB data aggregation - mongodb

I have mongodb documents of following kind:
{
"_id" : {
"refId" : ObjectId("55e44dd70a975a6fec7ae66e"),
"someString" : "foo"
},
"someValue" : 1,
"date" : NumberLong("1441025536869"),
"subdoc" : {
"someRef" : ObjectId("xf2h55e44dd70a975a6fec7a"),
"count" : 99
}
}
Want to get list of documents with unique "subdoc.someRef" with earliest "date" (if there are any documents with same "subdoc.someRef") and sorted by any field. The issue is in sorting by field "subdoc.count".
I use following spring data aggregation:
Aggregation agg = Aggregation.newAggregation(
Aggregation.match(match),
Aggregation.project("date", "someValue")
.and("subdoc.someRef").as("someRef")
.and("subdoc.count").as("count")
.and("_id.someString").as("someString")
.and("_id.refId").as("refId"),
Aggregation.sort(Sort.Direction.ASC, "date"),
Aggregation.group("someRef")
.min("date").as("date")
.first("refId").as("refId")
.first("someValue").as("someValue")
.first("count").as("count"),
Aggregation.project("someValue", "date", "refId", "count")
.and("_id").as("someRef"),
Aggregation.sort(pageable.getSort()),
Aggregation.skip(pageable.getOffset()),
Aggregation.limit(pageable.getPageSize())
);
Everething is fine except how spring data converts: .first("count").as("count")
I always got "count" : null in aggregation result.
DBObject created by spring is not what I've expected. That line in log concerns me:
"$group" : { "_id" : "$someRef" , "date" : { "$min" : "$date"} , "refId" : { "$first" : "$_id.refId"} , "someValue" : { "$first" : "$someValue"} , "count" : { "$first" : "$subdoc.count"}}}
I cannot understand why it always puts "$subdoc.count" instead of putting "$count" as result of previous step of aggreagation pipeline. So "count" in result is always null as "$subdoc.count" is always null in last grop step.
How to make Spring use the value that I want instead of putting reference?

Thanks to comment, the solution is to eliminate projections and updating group stage. Grouping work well with subdocument references, that I didn't expected:
Aggregation agg = Aggregation.newAggregation(
Aggregation.match(match),
Aggregation.sort(Sort.Direction.ASC, "date"),
Aggregation.group("subdoc.someRef")
.first("subdoc.count").as("count")
.first("_id.refId").as("refId")
.first("_id.someString").as("someString")
.first("date").as("date")
.first("someValue").as("someValue"),
Aggregation.sort(pageable.getSort()),
Aggregation.skip(pageable.getOffset()),
Aggregation.limit(pageable.getPageSize())
);

Related

MongoDB get all embedded documents where condition is met

I did this in my mongodb:
db.teams.insert({name:"Alpha team",employees:[{name:"john"},{name:"david"}]});
db.teams.insert({name:"True team",employees:[{name:"oliver"},{name:"sam"}]});
db.teams.insert({name:"Blue team",employees:[{name:"jane"},{name:"raji"}]});
db.teams.find({"employees.name":/.*o.*/});
But what I got was:
{ "_id" : ObjectId("5ddf3ca83c182cc5354a15dd"), "name" : "Alpha team", "employees" : [ { "name" : "john" }, { "name" : "david" } ] }
{ "_id" : ObjectId("5ddf3ca93c182cc5354a15de"), "name" : "True team", "employees" : [ { "name" : "oliver" }, { "name" : "sam" } ] }
But what I really want is
[{"name":"john"},{"name":"oliver"}]
I'm having a hard time finding examples of this without using some kind of programmatic iterator/loop. Or examples I find return the parent document, which means I'd have to parse out the embedded array employees and do some kind of UNION statement?
Eg.
How to get embedded document in mongodb?
Retrieve only the queried element in an object array in MongoDB collection
Can someone point me in the right direction?
Please add projections to filter out the fields you don't need. Please refer the project link mongodb projections
Your find query should be constructed with the projection parameters like below:
db.teams.find({"employees.name":/.*o.*/}, {_id:0, "employees.name": 1});
This will return you:
[{"name":"john"},{"name":"oliver"}]
Can be solved with a simple aggregation pipeline.
db.teams.aggregate([
{$unwind : "$employees"},
{$match : {"employees.name":/.*o.*/}},
])
EDIT:
OP Wants to skip the parent fields. Modified query:
db.teams.aggregate([
{$unwind : "$employees"},
{$match : {"employees.name":/.*o.*/}},
{$project : {"name":"$employees.name",_id:0}}
])
Output:
{ "name" : "john" }
{ "name" : "oliver" }

MongoDB 4.0 aggregation addFields not saving documents after using toDate

I have the following documents,
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "0"
},
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "20180330"
},
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "20180402"
},
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "20180323"
},
I tried to convert date to ISODate using $toDate in aggregation,
db.documents.aggregate( [ { "$addFields": { "received_date": { "$cond": [ {"$ne": ["$date", "0"] }, {"$toDate": "$date"}, new Date("1970-01-01") ] } } } ] )
the query executed fine, but when I
db.documents.find({})
to examine all the documents, nothing changed, I am wondering how to fix it. I am using MongoDB 4.0.6 on Linux Mint 19.1 X64.
As they mentioned in the comments, aggregate doesn't update documents in the database directly (just an output of them).
If you'd like to permanently add a new field to documents via aggregation (aka update the documents in the database), use the following .forEach/.updateOne method:
Your example:
db.documents
.aggregate([{"$addFields":{"received_date":{"$cond":[{"$ne":["$date","0"]}, {"$toDate": "$date"}, new Date("1970-01-01")]}}}])
.forEach(function (x){db.documents.updateOne({_id: x._id}, {$set: {"received_date": x.received_date}})})
Since _id's value is an ObjectID(), there may be a slight modification you need to do to {_id:x._id}. If there is, let me know and I'll update it!
Another example:
db.users.find().pretty()
{ "_id" : ObjectId("5acb81b53306361018814849"), "name" : "A", "age" : 1 }
{ "_id" : ObjectId("5acb81b5330636101881484a"), "name" : "B", "age" : 2 }
{ "_id" : ObjectId("5acb81b5330636101881484b"), "name" : "C", "age" : 3 }
db.users
.aggregate([{$addFields:{totalAge:{$sum:"$age"}}}])
.forEach(function (x){db.users.updateOne({name: x.name}, {$set: {totalAge: x.totalAge}})})
Being able to update collections via the aggregation pipeline seems to be quite valuable because of what you have the power to do with aggregation (e.g. what you did in your question, doing calculations based on other fields within the document, etc.). I'm newer to MongoDB so maybe updating collections via aggregation pipeline is "bad practice", but it works and it's been quite valuable for me. I wonder why it isn't more straight-forward to do?
Note: I came up with this method after discovering Nazo's now-deprecated .save() method. Shoutout to Nazo!

Mongo aggregation on array elements

I have a mongo document like
{ "_id" : 12, "location" : [ "Kannur","Hyderabad","Chennai","Bengaluru"] }
{ "_id" : 13, "location" : [ "Hyderabad","Chennai","Mysore","Ballary"] }
From this how can I get the location aggregation (distinct area count).
some thing like
Hyderabad 2,
Kannur 1,
Chennai 2,
Bengaluru 1,
Mysore 1,
Ballary 1
Using aggregation you cannot get the exact output that you want. One of the limitations of aggregation pipeline is its inability to transform values to keys in the output document.
For example, Kannur is one of the values of the location field, in the input document. In your desired output structure it needs to be the key("kannur":1). This is not possible using aggregation. While, this can be used achieving map-reduce, you can however get a very closely related and useful structure using aggregation.
Unwind the location array.
Group by the location fields, get the count of individual locations
using the $sum operator.
Group again all the documents once again to get a consolidated array
of results.
Code:
db.collection.aggregate([
{$unwind:"$location"},
{$group:{"_id":"$location","count":{$sum:1}}},
{$group:{"_id":null,"location_details":{$push:{"location":"$_id",
"count":"$count"}}}},
{$project:{"_id":0,"location_details":1}}
])
Sample o/p:
{
"location_details" : [
{
"location" : "Ballary",
"count" : 1
},
{
"location" : "Mysore",
"count" : 1
},
{
"location" : "Bengaluru",
"count" : 1
},
{
"location" : "Chennai",
"count" : 2
},
{
"location" : "Hyderabad",
"count" : 2
},
{
"location" : "Kannur",
"count" : 1
}
]
}

MongoDB Group querying for Embeded Document

I have a mongo document which has structure like
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-26",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-26T08:08:38.716Z"),
"value" : 98.5
},
{
"dateTime" : ISODate("2014-11-26T08:18:38.716Z"),
"value" : 95.5
},
{
"dateTime" : ISODate("2014-11-26T08:28:38.663Z"),
"value" : 90.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-26T08:08:38.716Z"),
"from" : ISODate("2014-11-26T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.776Z")
}
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-25",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-25T08:08:38.716Z"),
"value" : 198.5
},
{
"dateTime" : ISODate("2014-11-25T08:18:38.716Z"),
"value" : 195.5
},
{
"dateTime" : ISODate("2014-11-25T08:28:38.716Z"),
"value" : 190.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-25T08:08:38.716Z"),
"from" : ISODate("2014-11-25T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.893Z")
}
The query that want to fire on this document structure,
finding documents for a particular user id
unwinding the embedded array
Grouping the documents based over _id with -
summing the items.value of the embedded array
getting the minimum of the items.dateTime of the embedded array
Note. The sum and min, I want to get as a object i.e. { value : sum , dateTime : min of the items.dateTime} inside an array of items
Can this be achieved in an single aggregation call using push or some other technique.
When you group over a particular _id, and apply aggregation operators such as $min and $sum, there exists only one record per group(_id), that holds the sum and the minimum date for that group. So there is no way to obtain a different sum and a different minimum date for the same _id, which also logically makes no sense.
What you would want to do is:
db.collection.aggregate([
{$match:{"userId":"THIS_IS_A_DHP_USER_ID"}},
{$unwind:"$items"},
{$group:{"_id":"$_id",
"values":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}}
])
But in case when you do not query for a particular userId, then you would have multiple groups, each group having its own sum and min date. Then it makes sense to accumulate all these results together in an array using the $push operator.
db.collection.aggregate([
{$unwind:"$items"},
{$group:{"_id":"$_id",
"result":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}},
{$group:{"_id":null,"result":{$push:{"value":"$result",
"dateTime":"$dateTime",
"id":"$_id"}}}},
{$project:{"_id":0,"result":1}}
])
you should use following aggregation may it works
db.collectionName.aggregate(
{"$unwind":"$items"},
{"$match":{"userId":"THIS_IS_A_DHP_USER_ID"}},
{"$group":{"_id":"$_id","sum":{"$sum":"$items.value"},
"minDate":{"$min":"$items.dateTime"}}}
)

How can I select a number of records per a specific field using mongodb?

I have a collection of documents in mongodb, each of which have a "group" field that refers to a group that owns the document. The documents look like this:
{
group: <objectID>
name: <string>
contents: <string>
date: <Date>
}
I'd like to construct a query which returns the most recent N documents for each group. For example, suppose there are 5 groups, each of which have 20 documents. I want to write a query which will return the top 3 for each group, which would return 15 documents, 3 from each group. Each group gets 3, even if another group has a 4th that's more recent.
In the SQL world, I believe this type of query is done with "partition by" and a counter. Is there such a thing in mongodb, short of doing N+1 separate queries for N groups?
You cannot do this using the aggregation framework yet - you can get the $max or top date value for each group but aggregation framework does not yet have a way to accumulate top N plus there is no way to push the entire document into the result set (only individual fields).
So you have to fall back on MapReduce. Here is something that would work, but I'm sure there are many variants (all require somehow sorting an array of objects based on a specific attribute, I borrowed my solution from one of the answers in this question.
Map function - outputs group name as a key and the entire rest of the document as the value - but it outputs it as a document containing an array because we will try to accumulate an array of results per group:
map = function () {
emit(this.name, {a:[this]});
}
The reduce function will accumulate all the documents belonging to the same group into one array (via concat). Note that if you optimize reduce to keep only the top five array elements by checking date then you won't need the finalize function, and you will use less memory during running mapreduce (it will also be faster).
reduce = function (key, values) {
result={a:[]};
values.forEach( function(v) {
result.a = v.a.concat(result.a);
} );
return result;
}
Since I'm keeping all values for each key, I need a finalize function to pull out only latest five elements per key.
final = function (key, value) {
Array.prototype.sortByProp = function(p){
return this.sort(function(a,b){
return (a[p] < b[p]) ? 1 : (a[p] > b[p]) ? -1 : 0;
});
}
value.a.sortByProp('date');
return value.a.slice(0,5);
}
Using a template document similar to one you provided, you run this by calling mapReduce command:
> db.top5.mapReduce(map, reduce, {finalize:final, out:{inline:1}})
{
"results" : [
{
"_id" : "group1",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe13"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.498Z"),
"contents" : 0.23778377776034176
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0e"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.467Z"),
"contents" : 0.4434165076818317
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe09"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.436Z"),
"contents" : 0.5935856597498059
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe04"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.405Z"),
"contents" : 0.3912118375301361
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfdff"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.372Z"),
"contents" : 0.221651989268139
}
]
},
{
"_id" : "group2",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe14"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.504Z"),
"contents" : 0.019611883210018277
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0f"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.473Z"),
"contents" : 0.5670706110540777
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0a"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.442Z"),
"contents" : 0.893193120136857
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe05"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.411Z"),
"contents" : 0.9496864483226091
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe00"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.378Z"),
"contents" : 0.013748752186074853
}
]
},
{
"_id" : "group3",
...
}
]
}
],
"timeMillis" : 15,
"counts" : {
"input" : 80,
"emit" : 80,
"reduce" : 5,
"output" : 5
},
"ok" : 1,
}
Each result has _id as group name and values as array of most recent five documents from the collection for that group name.
you need aggregation framework $group stage piped in a $limit stage...
you want also to $sort the records in some ways or else the limit will have undefined behaviour, the returned documents will be pseudo-random (the order used internally by mongo)
something like that:
db.collection.aggregate([{$group:...},{$sort:...},{$limit:...}])
here there is the documentation if you want to know more