MongoDB Group querying for Embedded Document - mongodb

I have MongoDB documents structured like this:
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-26",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-26T08:08:38.716Z"),
"value" : 98.5
},
{
"dateTime" : ISODate("2014-11-26T08:18:38.716Z"),
"value" : 95.5
},
{
"dateTime" : ISODate("2014-11-26T08:28:38.663Z"),
"value" : 90.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-26T08:08:38.716Z"),
"from" : ISODate("2014-11-26T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.776Z")
}
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-25",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-25T08:08:38.716Z"),
"value" : 198.5
},
{
"dateTime" : ISODate("2014-11-25T08:18:38.716Z"),
"value" : 195.5
},
{
"dateTime" : ISODate("2014-11-25T08:28:38.716Z"),
"value" : 190.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-25T08:08:38.716Z"),
"from" : ISODate("2014-11-25T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.893Z")
}
The query that I want to run against this document structure should:
find the documents for a particular user id,
unwind the embedded array,
group the documents by _id while
summing items.value over the embedded array and
taking the minimum items.dateTime over the embedded array.
Note: I want the sum and the minimum returned as an object, i.e. { value : sum, dateTime : min of items.dateTime }, inside an items array.
Can this be achieved in a single aggregation call, using $push or some other technique?

When you group by a particular _id and apply accumulators such as $min and $sum, there is exactly one record per group (_id), and it holds the sum and the minimum date for that group. There is no way to obtain a different sum or a different minimum date for the same _id, and it would not make logical sense either.
What you want to do is:
db.collection.aggregate([
{$match:{"userId":"THIS_IS_A_DHP_USER_ID"}},
{$unwind:"$items"},
{$group:{"_id":"$_id",
"values":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}}
])
Whenever the pipeline yields more than one group (for example, when you do not filter on a particular userId), each group has its own sum and minimum date. It then makes sense to collect all of these results into a single array using the $push operator:
db.collection.aggregate([
{$unwind:"$items"},
{$group:{"_id":"$_id",
"result":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}},
{$group:{"_id":null,"result":{$push:{"value":"$result",
"dateTime":"$dateTime",
"id":"$_id"}}}},
{$project:{"_id":0,"result":1}}
])
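As a rough illustration of what these pipelines compute, here is a minimal in-memory sketch in plain JavaScript. It is not a MongoDB API: summarizeItems is a hypothetical helper, and the sample data is a trimmed copy of the two documents from the question. It reshapes each group into the { value, dateTime } object inside an items array that the question asks for.

```javascript
// Hypothetical helper (not part of MongoDB): mimics $match -> $unwind -> $group
// on an in-memory array and returns items: [{ value: <sum>, dateTime: <min> }].
function summarizeItems(docs, userId) {
  return docs
    .filter((doc) => doc.userId === userId)   // $match on userId
    .filter((doc) => doc.items.length > 0)    // $unwind drops empty arrays
    .map((doc) => {
      // Grouping the unwound items by _id yields one result per document.
      const sum = doc.items.reduce((acc, item) => acc + item.value, 0);
      const min = doc.items.reduce(
        (acc, item) => (item.dateTime < acc ? item.dateTime : acc),
        doc.items[0].dateTime
      );
      return { _id: doc._id, items: [{ value: sum, dateTime: min }] };
    });
}

// Trimmed copies of the two sample documents from the question.
const docs = [
  {
    _id: "THIS_IS_A_DHP_USER_ID+2014-11-26",
    userId: "THIS_IS_A_DHP_USER_ID",
    items: [
      { dateTime: new Date("2014-11-26T08:08:38.716Z"), value: 98.5 },
      { dateTime: new Date("2014-11-26T08:18:38.716Z"), value: 95.5 },
      { dateTime: new Date("2014-11-26T08:28:38.663Z"), value: 90.5 },
    ],
  },
  {
    _id: "THIS_IS_A_DHP_USER_ID+2014-11-25",
    userId: "THIS_IS_A_DHP_USER_ID",
    items: [
      { dateTime: new Date("2014-11-25T08:08:38.716Z"), value: 198.5 },
      { dateTime: new Date("2014-11-25T08:18:38.716Z"), value: 195.5 },
      { dateTime: new Date("2014-11-25T08:28:38.716Z"), value: 190.5 },
    ],
  },
];

const summary = summarizeItems(docs, "THIS_IS_A_DHP_USER_ID");
console.log(JSON.stringify(summary, null, 2));
```

Note that even for a single userId there is one group per document here, because the _id differs per day.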

You can use the following aggregation, which should work:
db.collectionName.aggregate([
{"$match":{"userId":"THIS_IS_A_DHP_USER_ID"}},
{"$unwind":"$items"},
{"$group":{"_id":"$_id","sum":{"$sum":"$items.value"},
"minDate":{"$min":"$items.dateTime"}}}
])

Related

Mongodb aggregate lookup return only one field of array

I have some collections in our project.
The Casts collection contains movie casts.
The Contents collection contains movie contents.
I want to run an aggregate $lookup to get information about movie casts together with their position type.
I have removed unnecessary fields from the collection details below.
Casts details:
{
"_id" : ObjectId("5a6cf47415621604942386cd"),
"fa_name" : "",
"en_name" : "Ehsan",
"fa_bio" : "",
"en_bio" : ""
}
Contents details:
{
"_id" : ObjectId("5a6b8b734f1408137f79e2cc"),
"casts" : [
{
"_id" : ObjectId("5a6cf47415621604942386cd"),
"fa_fictionName" : "",
"en_fictionName" : "Ehsan2",
"positionType" : {
"id" : 3,
"fa_name" : "",
"en_name" : "Director"
}
},
{
"_id" : ObjectId("5a6cf47415621604942386cd"),
"fa_fictionName" : "",
"en_fictionName" : "Ehsan1",
"positionType" : {
"id" : 3,
"fa_name" : "",
"en_name" : "Writers"
}
}
],
"status" : 0,
"created" : Timestamp(1516997542, 4),
"updated" : Timestamp(1516997542, 5)
}
When I run the aggregate $lookup query below, the newly generated lookup array contains only one cast document. Given the casts array above, I expected the lookup to return two cast entries, one per position type: the casts array holds two entries (a director and a writer), but only one cast document is returned. _casts should contain two objects, not one!
Aggregate lookup query:
{$lookup:{from:"casts",localField:"casts._id",foreignField:"_id",as:"_casts"}}
result:
{
"_id" : ObjectId("5a6b8b734f1408137f79e2cc"),
"casts" : [
{
"_id" : ObjectId("5a6cf47415621604942386cd"),
"fa_fictionName" : "",
"en_fictionName" : "Ehsan2",
"positionType" : {
"id" : 3,
"fa_name" : "",
"en_name" : "Director"
}
},
{
"_id" : ObjectId("5a6cf47415621604942386cd"),
"fa_fictionName" : "",
"en_fictionName" : "Ehsan1",
"positionType" : {
"id" : 3,
"fa_name" : "",
"en_name" : "Writers"
}
}
],
"_casts" : [
{
"_id" : ObjectId("5a6cf47415621604942386cd"),
"fa_name" : "",
"en_name" : "Ehsan",
"fa_bio" : "",
"en_bio" : ""
}
],
"status" : 0,
"created" : Timestamp(1516997542, 4),
"updated" : Timestamp(1516997542, 5)
}
EDIT-1
My problem is finally solved. The only remaining issue with the query below was that it does not show the root document fields; that is fixed in the final query under EDIT-2.
query:
db.contents.aggregate([
{"$unwind":"$casts"},
{"$lookup":{"from":"casts","localField":"casts._id","foreignField":"_id","as":"casts.info"}},
{"$unwind":"$casts.info"},
{"$group":{"_id":"$_id", "casts":{"$push":"$casts"}}},
])
EDIT-2
db.contents.aggregate([
{"$unwind":"$casts"},
{"$lookup":{"from":"casts","localField":"casts._id","foreignField":"_id","as":"casts.info"}},
{"$unwind":"$casts.info"},
{$group:{"_id":"$_id", "data":{"$first":"$$ROOT"}, "casts":{"$push":"$casts"}}},
{$replaceRoot:{"newRoot":{"$mergeObjects":["$data",{"casts":"$casts"}]}}}
]).pretty()
This is expected behavior.
From the docs,
If your localField is an array, you may want to add an $unwind stage
to your pipeline. Otherwise, the equality condition between the
localField and foreignField is foreignField: { $in: [
localField.elem1, localField.elem2, ... ] }.
So to join each local field array element with foreign field element you have to $unwind the local array.
db.contents.aggregate([
{"$unwind":"$casts"},
{"$lookup":{"from":"casts","localField":"casts._id","foreignField":"_id","as":"_casts"}}
])
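To make the difference concrete, here is a small in-memory sketch in plain JavaScript. It only mimics the matching behaviour described in the docs quote above; lookupArray and lookupPerElement are hypothetical helpers, and ids and position types are simplified to strings for illustration.

```javascript
// Hypothetical helpers, not MongoDB APIs.

// Mimics $lookup with an array localField: the match behaves like
// foreignField: { $in: [...] }, so each foreign document appears at most once
// no matter how many local elements point at it.
function lookupArray(localIds, foreignDocs) {
  return foreignDocs.filter((f) => localIds.includes(f._id));
}

// Mimics $unwind followed by $lookup: one join per array element.
function lookupPerElement(localElems, foreignDocs) {
  return localElems.map((elem) => ({
    ...elem,
    info: foreignDocs.filter((f) => f._id === elem._id),
  }));
}

// Simplified copies of the question's data (ObjectIds shown as strings).
const castsCollection = [{ _id: "5a6cf47415621604942386cd", en_name: "Ehsan" }];
const contentCasts = [
  { _id: "5a6cf47415621604942386cd", en_fictionName: "Ehsan2", positionType: "Director" },
  { _id: "5a6cf47415621604942386cd", en_fictionName: "Ehsan1", positionType: "Writers" },
];

const joinedOnce = lookupArray(contentCasts.map((c) => c._id), castsCollection);
const joinedPerElement = lookupPerElement(contentCasts, castsCollection);

console.log(joinedOnce.length);       // one cast document, as in the question
console.log(joinedPerElement.length); // two entries, each with its own info
```

Both local elements reference the same cast _id, which is why the array form collapses to a single joined document.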
Vendor Collection
Items Collection
db.items.aggregate([
{ $match:
{"item_id":{$eq:"I001"}}
},
{
$lookup:{
from:"vendor",
localField:"vendor_id",
foreignField:"vendor_id",
as:"vendor_details"
}
},
{
$unwind:"$vendor_details"
},
{
$project:{
"_id":0,
"vendor_id":0,
"vendor_details.vendor_company_description":0,
"vendor_details._id":0,
"vendor_details.country":0,
"vendor_details.city":0,
"vendor_details.website":0
}
}
]);
Output
Your Casts collection shows only 1 document. Your Contents collection, likewise, shows only 1 document.
This is 1 to 1 - not 1 to 2. Aggregate is working as designed.
The Contents document has 2 "casts." These 2 casts are sub-documents. Work with those as sub-documents, or re-design your collections. I don't like using sub-documents unless I know I will not need to use them as look-ups or join on them.
I would suggest you re-design your collection.
Your Contents collection (it makes me think of "Movies") could look like this:
_id
title
releaseDate
genre
etc.
You can create a MovieCasts collection like this:
_id
movieId (this is _id from Contents collection, above)
castId (this is _id from Casts collection, below)
Casts
_id
name
age
etc.

Mongo aggregation on array elements

I have a mongo document like
{ "_id" : 12, "location" : [ "Kannur","Hyderabad","Chennai","Bengaluru"] }
{ "_id" : 13, "location" : [ "Hyderabad","Chennai","Mysore","Ballary"] }
From this how can I get the location aggregation (distinct area count).
some thing like
Hyderabad 2,
Kannur 1,
Chennai 2,
Bengaluru 1,
Mysore 1,
Ballary 1
Using aggregation you cannot get the exact output that you want. One of the limitations of the aggregation pipeline is its inability to transform values into keys in the output document.
For example, Kannur is one of the values of the location field in the input documents. In your desired output structure it needs to be a key ("Kannur": 1). This is not possible using aggregation. While it can be achieved using map-reduce, you can get a very closely related and useful structure using aggregation:
Unwind the location array.
Group by the location field and get the count of each location using the $sum operator.
Group all the documents once more to get a consolidated array of results.
Code:
db.collection.aggregate([
{$unwind:"$location"},
{$group:{"_id":"$location","count":{$sum:1}}},
{$group:{"_id":null,"location_details":{$push:{"location":"$_id",
"count":"$count"}}}},
{$project:{"_id":0,"location_details":1}}
])
Sample output:
{
"location_details" : [
{
"location" : "Ballary",
"count" : 1
},
{
"location" : "Mysore",
"count" : 1
},
{
"location" : "Bengaluru",
"count" : 1
},
{
"location" : "Chennai",
"count" : 2
},
{
"location" : "Hyderabad",
"count" : 2
},
{
"location" : "Kannur",
"count" : 1
}
]
}
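If the keyed form is still needed, one option is to reshape the array on the client after the pipeline has run. The following is a minimal plain JavaScript sketch; toCountsByLocation is a hypothetical helper and aggregationResult is a hand-copied version of the sample output above. Later MongoDB releases also added an $arrayToObject operator that may cover this on the server; check the documentation for your version.

```javascript
// Hand-copied from the sample output above.
const aggregationResult = {
  location_details: [
    { location: "Ballary", count: 1 },
    { location: "Mysore", count: 1 },
    { location: "Bengaluru", count: 1 },
    { location: "Chennai", count: 2 },
    { location: "Hyderabad", count: 2 },
    { location: "Kannur", count: 1 },
  ],
};

// Hypothetical helper: turns [{ location, count }, ...] into { <location>: <count> }.
function toCountsByLocation(result) {
  const counts = {};
  for (const { location, count } of result.location_details) {
    counts[location] = count;
  }
  return counts;
}

const countsByLocation = toCountsByLocation(aggregationResult);
console.log(countsByLocation);
```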

How can I select a number of records per a specific field using mongodb?

I have a collection of documents in mongodb, each of which have a "group" field that refers to a group that owns the document. The documents look like this:
{
group: <objectID>
name: <string>
contents: <string>
date: <Date>
}
I'd like to construct a query which returns the most recent N documents for each group. For example, suppose there are 5 groups, each of which have 20 documents. I want to write a query which will return the top 3 for each group, which would return 15 documents, 3 from each group. Each group gets 3, even if another group has a 4th that's more recent.
In the SQL world, I believe this type of query is done with "partition by" and a counter. Is there such a thing in mongodb, short of doing N+1 separate queries for N groups?
You cannot do this using the aggregation framework yet: you can get the $max (top) date value for each group, but the aggregation framework does not yet have a way to accumulate the top N, and there is no way to push the entire document into the result set (only individual fields).
So you have to fall back on MapReduce. Here is something that would work, though I'm sure there are many variants (all of them require somehow sorting an array of objects based on a specific attribute; I borrowed my solution from one of the answers in this question).
The map function outputs the group name as the key and the entire document as the value, wrapped in a document containing an array, because we will accumulate an array of results per group:
map = function () {
emit(this.name, {a:[this]});
}
The reduce function accumulates all the documents belonging to the same group into one array (via concat). Note that if you optimize reduce to keep only the top five array elements by checking the date, you won't need the finalize function, and mapReduce will use less memory while running (it will also be faster).
reduce = function (key, values) {
// Start from an empty accumulator with the same shape as the emitted values.
var result = {a:[]};
values.forEach( function(v) {
result.a = v.a.concat(result.a);
} );
return result;
}
Since I'm keeping all values for each key, I need a finalize function to pull out only the latest five elements per key.
final = function (key, value) {
// Sort newest first by date, then keep the five most recent documents.
value.a.sort(function(a,b){
return (a.date < b.date) ? 1 : (a.date > b.date) ? -1 : 0;
});
return value.a.slice(0,5);
}
Using a template document similar to the one you provided, you run this by calling the mapReduce command:
> db.top5.mapReduce(map, reduce, {finalize:final, out:{inline:1}})
{
"results" : [
{
"_id" : "group1",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe13"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.498Z"),
"contents" : 0.23778377776034176
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0e"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.467Z"),
"contents" : 0.4434165076818317
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe09"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.436Z"),
"contents" : 0.5935856597498059
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe04"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.405Z"),
"contents" : 0.3912118375301361
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfdff"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.372Z"),
"contents" : 0.221651989268139
}
]
},
{
"_id" : "group2",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe14"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.504Z"),
"contents" : 0.019611883210018277
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0f"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.473Z"),
"contents" : 0.5670706110540777
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0a"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.442Z"),
"contents" : 0.893193120136857
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe05"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.411Z"),
"contents" : 0.9496864483226091
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe00"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.378Z"),
"contents" : 0.013748752186074853
}
]
},
{
"_id" : "group3",
...
}
]
}
],
"timeMillis" : 15,
"counts" : {
"input" : 80,
"emit" : 80,
"reduce" : 5,
"output" : 5
},
"ok" : 1,
}
Each result has the group name as _id and, as value, an array of the five most recent documents in the collection for that group name.
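The optimization mentioned earlier in this answer, trimming inside reduce so that no finalize step is needed, could look roughly like the sketch below. reduceTopN is a hypothetical name, and the limit of five is hard-coded inside the function because map-reduce functions should not depend on variables from the surrounding shell session. The function returns the same { a: [...] } shape that map emits, so it stays safe if it is called again on its own output.

```javascript
// Hypothetical replacement for the reduce function above: merges the arrays
// for one key, sorts newest first, and keeps only the five latest documents.
var reduceTopN = function (key, values) {
  var topN = 5; // assumed limit, inlined so the function is self-contained
  var merged = [];
  values.forEach(function (v) {
    merged = merged.concat(v.a);
  });
  // Date objects coerce to numbers, so the subtraction sorts newest first.
  merged.sort(function (x, y) {
    return y.date - x.date;
  });
  return { a: merged.slice(0, topN) };
};

// Seven made-up emitted values for one group, one second apart.
var base = Date.UTC(2013, 3, 17, 20, 0, 0);
var values = [];
for (var i = 0; i < 7; i++) {
  values.push({ a: [{ name: "group1", date: new Date(base + i * 1000) }] });
}

var top = reduceTopN("group1", values);
console.log(top.a.map(function (d) { return d.date.toISOString(); }));
```

With this variant the per-group value is the object { a: [...] } rather than a bare array, so callers would read the documents from value.a.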
You need the aggregation framework's $group stage piped into a $limit stage.
You also want to $sort the records in some way, or else the limit will have undefined behaviour: the returned documents will be pseudo-random (in the order used internally by MongoDB).
Something like this:
db.collection.aggregate([{$group:...},{$sort:...},{$limit:...}])
See the documentation if you want to know more.

mongodb get elements which were inserted after some document

I have a document and I need to query the MongoDB database to return all the documents which were inserted after the current document.
Is that possible, and how would I write that query?
If you do not override the default _id field, you can use its ObjectId (see the MongoDB docs) to make a comparison by time. For instance, the following query will find all the documents that were inserted after curDoc was inserted (assuming none of them override the _id field):
>db.test.find({ _id : {$gt : curDoc._id}})
Note that these timestamps are not very granular: the timestamp embedded in an ObjectId has one-second resolution. If you need a finer-grained view of when documents were inserted, I encourage you to add your own timestamp field to the documents you insert and use that field for such queries.
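To show where that time component lives, here is a minimal plain JavaScript sketch that works on the 24-character hex form of an ObjectId. objectIdToDate and objectIdFloorForDate are hypothetical helpers and the hex string is just an example value; in the mongo shell, an ObjectId also offers getTimestamp() for the first direction.

```javascript
// The first 4 bytes (8 hex characters) of an ObjectId are seconds since the
// Unix epoch. Hypothetical helper: decode them into a Date.
function objectIdToDate(hex) {
  return new Date(parseInt(hex.substring(0, 8), 16) * 1000);
}

// Hypothetical helper: build the smallest ObjectId hex string for a given
// second, which could be wrapped in ObjectId(...) and used as a $gt/$gte bound.
function objectIdFloorForDate(date) {
  var seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0000000000000000";
}

var exampleId = "514bf8bbbe11e483111af213"; // example value
console.log(objectIdToDate(exampleId).toISOString());
console.log(objectIdFloorForDate(new Date("2013-03-22T06:22:51.422Z")));
```

The milliseconds of the input date are dropped in the second helper, which reflects the one-second resolution.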
If you store the insert timestamp as one of the fields, you can query like below:
> db.foo.find()
{ "_id" : ObjectId("514bf8bbbe11e483111af213"), "Name" : "abc", "Insert_time" : ISODate("2013-03-22T06:22:51.422Z") }
{ "_id" : ObjectId("514bf8c5be11e483111af214"), "Name" : "xyz", "Insert_time" : ISODate("2013-03-22T06:23:01.310Z") }
{ "_id" : ObjectId("514bf8cebe11e483111af215"), "Name" : "pqr", "Insert_time" : ISODate("2013-03-22T06:23:10.006Z") }
{ "_id" : ObjectId("514bf8eabe11e483111af216"), "Name" : "ijk", "Insert_time" : ISODate("2013-03-22T06:23:38.410Z") }
>
Here Insert_time corresponds to the time the document was inserted, and the following query returns the documents inserted after a particular Insert_time:
> db.foo.find({Insert_time:{$gt:ISODate("2013-03-22T06:22:51.422Z")}})
{ "_id" : ObjectId("514bf8c5be11e483111af214"), "Name" : "xyz", "Insert_time" : ISODate("2013-03-22T06:23:01.310Z") }
{ "_id" : ObjectId("514bf8cebe11e483111af215"), "Name" : "pqr", "Insert_time" : ISODate("2013-03-22T06:23:10.006Z") }
{ "_id" : ObjectId("514bf8eabe11e483111af216"), "Name" : "ijk", "Insert_time" : ISODate("2013-03-22T06:23:38.410Z") }
>

MongoDb - How to search BSON composite key exactly?

I have a collection that stores information about devices like the following:
/* 1 */
{
"_id" : {
"startDate" : "2012-12-20",
"endDate" : "2012-12-30",
"dimensions" : ["manufacturer", "model"],
"metrics" : ["deviceCount"]
},
"data" : {
"results" : "1"
}
}
/* 2 */
{
"_id" : {
"startDate" : "2012-12-20",
"endDate" : "2012-12-30",
"dimensions" : ["manufacturer", "model"],
"metrics" : ["deviceCount", "noOfUsers"]
},
"data" : {
"results" : "2"
}
}
/* 3 */
{
"_id" : {
"dimensions" : ["manufacturer", "model"],
"metrics" : ["deviceCount", "noOfUsers"]
},
"data" : {
"results" : "3"
}
}
And I am trying to query the documents using the _id field which will be unique. The problem I am having is that when I query for all the different attributes as in:
db.collection.find({$and: [{"_id.dimensions":{ $all: ["manufacturer","model"], $size: 2}}, {"_id.metrics": { $all:["noOfUsers","deviceCount"], $size: 2}}]});
This matches documents 2 and 3 (I don't care about the order of the attribute values), but I would like to get only document 3 back. How can I say that _id must not have any attributes other than those I specify in the search query?
Please advise. Thanks.
Unfortunately, I think the closest you can get to narrowing your query results to just unordered _id.dimensions and unordered _id.metrics requires you to know the other possible fields in the _id subdocument, e.g. startDate and endDate:
db.collection.find({$and: [
{"_id.dimensions":{ $all: ["manufacturer","model"], $size: 2}},
{"_id.metrics": { $all:["noOfUsers","deviceCount"], $size: 2}},
{"_id.startDate":{$exists:false}},
{"_id.endDate":{$exists:false}}
]});
If you don't know the set of possible fields in _id, then the other possible solution would be to specify the exact _id that you want, e.g.:
db.collection.find({"_id" : {
"dimensions" : ["manufacturer", "model"],
"metrics" : ["deviceCount", "noOfUsers"]
}})
but this means that the order of the elements in _id.dimensions and _id.metrics is significant, as is the order of the fields within _id. This last query does a document match on the exact BSON representation of _id.
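If neither option fits, a possible middle ground is to run the $all/$size query and then discard, on the client, any result whose _id has extra fields. The sketch below is plain JavaScript over an in-memory copy of the three sample documents; sameMembers and matchesExactly are hypothetical helpers, and the sketch assumes every wanted value is an array of strings.

```javascript
// Hypothetical helper: true when two arrays hold the same members in any order.
function sameMembers(a, b) {
  return (
    a.length === b.length &&
    a.every((x) => b.includes(x)) &&
    b.every((x) => a.includes(x))
  );
}

// Hypothetical helper: true when id has exactly the wanted fields (no extras)
// and each field's array matches regardless of element order.
function matchesExactly(id, wanted) {
  const wantedKeys = Object.keys(wanted);
  if (!sameMembers(Object.keys(id), wantedKeys)) return false;
  return wantedKeys.every(
    (k) => Array.isArray(id[k]) && sameMembers(id[k], wanted[k])
  );
}

// In-memory copies of the three sample documents.
const devices = [
  { _id: { startDate: "2012-12-20", endDate: "2012-12-30",
           dimensions: ["manufacturer", "model"], metrics: ["deviceCount"] },
    data: { results: "1" } },
  { _id: { startDate: "2012-12-20", endDate: "2012-12-30",
           dimensions: ["manufacturer", "model"], metrics: ["deviceCount", "noOfUsers"] },
    data: { results: "2" } },
  { _id: { dimensions: ["manufacturer", "model"], metrics: ["deviceCount", "noOfUsers"] },
    data: { results: "3" } },
];

const wanted = { dimensions: ["model", "manufacturer"], metrics: ["noOfUsers", "deviceCount"] };
const matching = devices.filter((d) => matchesExactly(d._id, wanted));
console.log(matching.map((d) => d.data.results));
```

The trade-off is that the extra filtering happens after the documents have been fetched, so it suits small result sets better than large ones.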