I have a question collection; each profile can have many questions.
{"_id":"..." , "pid":"...",.....}
Using MongoDB's new aggregation framework, how can I calculate the average number of questions per profile?
I tried the following without success:
{ "aggregate" : "question" , "pipeline" : [ { "$group" : { "_id" : "$pid" , "qCount" : { "$sum" : 1}}} , { "$group" : { "qavg" : { "$avg" : "qCount"} , "_id" : null }}]}
Can it be done with only one group operator?
Thanks.
For this you just need to know the number of questions and the number of different profiles (uniquely identified by "pid", I presume). With the aggregation framework, you need to do that in two stages:
First, you calculate the number of questions per PID
Then you calculate the average number of questions per PID
You'd do that like this:
Step one:
db.profiler.aggregate( [
{ $group: { _id: '$pid', count: { '$sum': 1 } } },
] );
Which outputs (in my case, with some sample data):
{
"result" : [
{ "_id" : 2, "count" : 7 },
{ "_id" : 1, "count" : 1 },
{ "_id" : 3, "count" : 3 },
{ "_id" : 4, "count" : 5 }
],
"ok" : 1
}
I have four profiles, with 7, 1, 3 and 5 questions respectively.
With this result we run a second group. This time we don't want to group by anything, so we set the _id value to null, as in the second $group stage below:
db.profiler.aggregate( [
{ $group: { _id: '$pid', count: { '$sum': 1 } } },
{ $group: { _id: null, avg: { $avg: '$count' } } }
] );
And then this outputs:
{
"result" : [
{ "_id" : null, "avg" : 4 }
],
"ok" : 1
}
Which tells me that I have on average, 4 questions per profile.
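The two stages above can be mirrored in plain JavaScript to check the arithmetic without a database. This is only a sketch: the sample documents and the function name averageQuestionsPerProfile are made up for illustration. For what it's worth, the pipeline in the question already has the same two-group shape; the likely reason it fails is that "$avg" : "qCount" lacks the $ prefix, so it is read as a literal string rather than a field path.

```javascript
// Hypothetical sample data: one document per question, keyed by profile id (pid).
const questions = [
  ...Array(7).fill({ pid: 2 }),
  { pid: 1 },
  ...Array(3).fill({ pid: 3 }),
  ...Array(5).fill({ pid: 4 }),
];

// The pipeline from the answer, kept as data so it could be passed to aggregate().
const avgPipeline = [
  { $group: { _id: "$pid", count: { $sum: 1 } } },    // questions per profile
  { $group: { _id: null, avg: { $avg: "$count" } } }, // average of those counts
];

// Plain-JS mirror of the two $group stages.
function averageQuestionsPerProfile(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.pid, (counts.get(doc.pid) || 0) + 1);
  }
  if (counts.size === 0) return null; // no profiles, no average
  let sum = 0;
  for (const c of counts.values()) sum += c;
  return sum / counts.size;
}

console.log(averageQuestionsPerProfile(questions)); // 4
```

So a single $group is not enough here: the per-profile counts have to exist before they can be averaged.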
I want to multiply the value field of the last 4 records in MongoDB
I have the following data
/* 1 */
{
"_id" : ObjectId("5d6d38f4509717526ea8469c"),
"code" : "302",
"value" : 123,
"timestamp" : ISODate("2019-09-02T15:44:52.012Z"),
"createdDate" : ISODate("2019-09-02T15:44:52.012Z"),
"__v" : 0
}
/* 2 */
{
"_id" : ObjectId("5d6d38f4509717526ea8469e"),
"code" : "340",
"value" : 8,
"timestamp" : ISODate("2019-09-02T15:44:52.013Z"),
"createdDate" : ISODate("2019-09-02T15:44:52.013Z"),
"__v" : 0
}
/* 3 */
{
"_id" : ObjectId("5d6d38f4509717526ea8469d"),
"code" : "327",
"value" : 23,
"timestamp" : ISODate("2019-09-02T15:44:52.013Z"),
"createdDate" : ISODate("2019-09-02T15:44:52.013Z"),
"__v" : 0
}
/* 4 */
{
"_id" : ObjectId("5d6d491d509717526ea8469f"),
"code" : "301",
"value" : 3.48,
"timestamp" : ISODate("2019-09-02T16:53:49.560Z"),
"createdDate" : ISODate("2019-09-02T16:53:49.560Z"),
"__v" : 0
}
I want to take the last record of each code (4 codes here) and multiply their values together to get a single value.
The more "proper" way to do this would look like this:
db.collection.aggregate([
{ // can remove this step if same code docs are sorted in database
$sort: {
createdDate: 1 // ascending, so $last below picks the newest document per code
}
},
{
$group: {
_id: "$code",
root: {$last: "$$ROOT"}
}
},
{
$sort: {
"root.createdDate": -1
}
},
{
$limit: 4
},
{
$group: {
_id: null,
values: {$push: "$root.value"}
}
},
{
$project: {
total:
{
$reduce: {
input: "$values",
initialValue: 1,
in: {$multiply: ["$$value", "$$this"]}
}
}
}
}
])
However, at large scale the "proper" way is not very efficient, as it has to scan the entire collection every time.
If that is your case, I recommend either using a heuristic to fetch a fixed number of documents by sorting and limiting, or fetching in batches until the condition is met, and then doing the calculation in code. (Which one works better depends on how your data is distributed.)
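As a rough sketch of the "do the calculation in code" option: assuming the documents have already been fetched (for example with a sort on createdDate and a limit), the latest value per code can be collected in one pass and multiplied. The function name productOfLatestPerCode is made up for illustration, and the records array just repeats the sample data from the question.

```javascript
// Sample documents from the question (only the fields the calculation needs).
const records = [
  { code: "302", value: 123, createdDate: new Date("2019-09-02T15:44:52.012Z") },
  { code: "340", value: 8, createdDate: new Date("2019-09-02T15:44:52.013Z") },
  { code: "327", value: 23, createdDate: new Date("2019-09-02T15:44:52.013Z") },
  { code: "301", value: 3.48, createdDate: new Date("2019-09-02T16:53:49.560Z") },
];

// Walk the documents newest-first, keep the first (latest) value seen for each
// code, stop once `limit` distinct codes are collected, then multiply.
function productOfLatestPerCode(docs, limit) {
  const newestFirst = [...docs].sort((a, b) => b.createdDate - a.createdDate);
  const latest = new Map();
  for (const doc of newestFirst) {
    if (!latest.has(doc.code)) latest.set(doc.code, doc.value);
    if (latest.size === limit) break;
  }
  let total = 1;
  for (const v of latest.values()) total *= v;
  return total;
}

console.log(productOfLatestPerCode(records, 4)); // roughly 78759.36 (floating point)
```

If a fetched batch does not contain enough distinct codes, the caller would have to fetch more documents and run this again; that loop is left out here.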
Assume the following records in mongodb
{
_id: // primary key
age: // some age.
}
The system generates the primary key, which is guaranteed to increase monotonically.
The business logic provides the value for age. Age should only increase, but due to a bug it can decrease in some rare cases.
E.g. age could go 1 yr, 2 yr, 3 yr, "2 yr", 4 yr, 5 yr, etc.
How can I write a query to spot the outliers in age?
Assuming your collection is called 'junk' (sorry, no bad intentions here) I think this might work...
db.junk.aggregate([
{$lookup: {
from: "junk",
let: { age: "$age", id: "$_id" },
pipeline: [
{ $match :
{ $expr:
{ $and:
[
{$gt: ["$_id", "$$id"]},
{ $lt: ["$age", "$$age"] }
]
}
}
}
],
as: "data"
}},
{ $project: { _id: 1, "age": 1, "data": 1, "found": { $gt: [{ $size: "$data" }, 0] } } },
{ $match : { found: true }}
])
The intent is to self-join the collection: for each document, look up documents that have a greater _id but a smaller age. The size of the lookup result is then checked, and the document is output only if that size is greater than 0.
Example Collections:
So, for testing this I populated a collection called 'junk' with 7 documents...
> db.junk.find()
{ "_id" : ObjectId("5daf4700090553aca6da1535"), "age" : 0 }
{ "_id" : ObjectId("5daf4700090553aca6da1536"), "age" : 1 }
{ "_id" : ObjectId("5daf4700090553aca6da1537"), "age" : 2 }
{ "_id" : ObjectId("5daf471b090553aca6da1538"), "age" : 3 }
{ "_id" : ObjectId("5daf471e090553aca6da1539"), "age" : 4 }
{ "_id" : ObjectId("5daf4721090553aca6da153a"), "age" : 3 }
{ "_id" : ObjectId("5daf4724090553aca6da153b"), "age" : 5 }
Results:
Here is what my results look like after running this query...
{ "_id" : ObjectId("5daf471e090553aca6da1539"), "age" : 4, "data" : [ { "_id" : ObjectId("5daf4721090553aca6da153a"), "age" : 3 } ], "found" : true }
It found a record that has a later outlier (ObjectId 5daf471e090553aca6da1539 precedes the outlier, ObjectId 5daf4721090553aca6da153a is the outlier). Obviously this could be projected differently to show just the outlier, but I wanted to first verify that the query works as expected and not invest more time in an inadequate approach.
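For comparison, here is a sketch of a different technique, not the $lookup query above: a single pass in application code over documents already sorted by _id, which reports the outlier itself instead of the record before it. The numeric _id values and the function name findAgeOutliers are stand-ins made up for this example; the ages match the 'junk' test collection.

```javascript
// Ages in _id order, matching the 'junk' test collection; the numeric _id
// values are simplified stand-ins for the ObjectIds.
const junk = [0, 1, 2, 3, 4, 3, 5].map((age, i) => ({ _id: i, age }));

// Report every document whose age is lower than the largest age seen before it.
function findAgeOutliers(sortedDocs) {
  const outliers = [];
  let maxAge = -Infinity;
  for (const doc of sortedDocs) {
    if (doc.age < maxAge) {
      outliers.push(doc);
    } else {
      maxAge = doc.age;
    }
  }
  return outliers;
}

console.log(findAgeOutliers(junk)); // [ { _id: 5, age: 3 } ]
```

This reads each document once, whereas the self-join compares every document against all later ones, so it may be the cheaper option when the collection can be streamed in _id order.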
I'm trying to return the total of requests by type based on their status:
If there is no status set, the request should be added to suggested
If the status is ordered, the request should be added to ordered
If the status is arrived, the request should be added to arrived
caseRequest.aggregate([{
$group: {
_id: "$product",
suggested: {
$sum: {
$cond: [{
$ifNull: ["$status", true]
},
1, 0
]}
},
ordered: {
$sum: {
$cond: [{
$eq: ["$status", "ordered"]
},
1, 0
]
}
},
arrived: {
$sum: {
$cond: [{
$eq: ["$status", "arrived"]
},
1, 0
]
}
}
}
}])
But for some reason it doesn't find any requests with status ordered or arrived. If the database has 48 requests, 45 of them without a status, 2 with ordered and 1 with arrived, it returns:
[
{
_id: "xxx",
suggested: 48,
ordered: 0,
arrived: 0,
},
...
]
Try this approach.
Return the total number of requests by type based on their status
The simplest way to get the count for each status is an aggregation pipeline with a $group on the status field:
db.stackoverflow.aggregate([{ $group: {_id: "$status", count: {$sum:1}} }])
We will be getting a result similar to this
{ "_id" : "", "count" : 2 }
{ "_id" : "arrived", "count" : 3 }
{ "_id" : "ordered", "count" : 4 }
The schema used for these records is deliberately simple so that it is easy to follow. Each document has a status field at the top level, and its value can be "ordered", "arrived" or empty.
Schema
{ "_id" : ObjectId("5798c348d345404e7f9e0ced"), "status" : "ordered" }
The collection is populated with 9 records, with status as ordered, arrived and empty
db.stackoverflow.find()
{ "_id" : ObjectId("5798c348d345404e7f9e0ced"), "status" : "ordered" }
{ "_id" : ObjectId("5798c349d345404e7f9e0cee"), "status" : "ordered" }
{ "_id" : ObjectId("5798c34ad345404e7f9e0cef"), "status" : "ordered" }
{ "_id" : ObjectId("5798c356d345404e7f9e0cf0"), "status" : "arrived" }
{ "_id" : ObjectId("5798c357d345404e7f9e0cf1"), "status" : "arrived" }
{ "_id" : ObjectId("5798c358d345404e7f9e0cf2"), "status" : "arrived" }
{ "_id" : ObjectId("5798c35ad345404e7f9e0cf3"), "status" : "ordered" }
{ "_id" : ObjectId("5798c361d345404e7f9e0cf4"), "status" : "" }
{ "_id" : ObjectId("5798c362d345404e7f9e0cf5"), "status" : "" }
db.stackoverflow.count()
9
Hope it Helps!!
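If the per-product columns from the question are still wanted, one possible shape is sketched below. One thing that stands out in the original pipeline: $ifNull: ["$status", true] evaluates to the status string itself whenever a status is set, and $cond treats any string as true, so every request ends up counted as suggested. That would match the 48 in the output; the zero counts for ordered and arrived are not explained by it and would need a look at the stored documents. The pipeline below compares against null explicitly. The function countByStatus and the sample requests are made up to mirror the logic in plain JavaScript.

```javascript
// Possible pipeline: count a request as "suggested" only when status is missing or null.
const statusPipeline = [
  { $group: {
      _id: "$product",
      suggested: { $sum: { $cond: [{ $eq: [{ $ifNull: ["$status", null] }, null] }, 1, 0] } },
      ordered:   { $sum: { $cond: [{ $eq: ["$status", "ordered"] }, 1, 0] } },
      arrived:   { $sum: { $cond: [{ $eq: ["$status", "arrived"] }, 1, 0] } },
  } },
];

// Plain-JS mirror of the same conditional sums, for checking the logic.
function countByStatus(requests) {
  const byProduct = new Map();
  for (const r of requests) {
    if (!byProduct.has(r.product)) {
      byProduct.set(r.product, { _id: r.product, suggested: 0, ordered: 0, arrived: 0 });
    }
    const row = byProduct.get(r.product);
    if (r.status === undefined || r.status === null) row.suggested += 1;
    else if (r.status === "ordered") row.ordered += 1;
    else if (r.status === "arrived") row.arrived += 1;
  }
  return [...byProduct.values()];
}

// Hypothetical data shaped like the question: 45 without status, 2 ordered, 1 arrived.
const requests = [
  ...Array(45).fill({ product: "xxx" }),
  ...Array(2).fill({ product: "xxx", status: "ordered" }),
  { product: "xxx", status: "arrived" },
];

console.log(countByStatus(requests));
// [ { _id: 'xxx', suggested: 45, ordered: 2, arrived: 1 } ]
```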
This is my query:
db.getCollection('grades').
aggregate([{ "$match" : { "class_id" : 28, "student_id" : 0 } },
{ "$unwind" : "$scores" },
{ "$match" : { "scores.type" : "homework" } },
{ "$skip" : 3 }, { "$limit" : 3 },
{ "$group" : { "_id" : { "id" : "$_id" }, "scores" : { "$push" : "$scores" } } },
{ "$project" : { "_id" : "$_id.id", "scores" : 1 } }])
scores is a nested array of objects. Each score object looks like {type: "someType", score: someScore}. This query returns one document.
The problem: the scores array has 6 objects and 4 of them have type homework.
The result I received: http://prntscr.com/bq217r
The original document: http://prntscr.com/bq23bv
Why are skip and limit performed before the match operator? How can I fix it?
As per the attached screenshot, everything looks OK.
We have 4 homework elements (1, 2, 3, 4). We skip 3 of them, so only the 4th is left at the end, and 53 is its value :-)
BTW, your skip/limit runs after the $match.
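The same order of operations can be reproduced in plain JavaScript. In this sketch the scores array and the function name pageOfType are hypothetical (the real values are only in the screenshots); it just shows that filtering first and then skipping 3 of 4 matches leaves a single element.

```javascript
// Hypothetical scores array: six entries, four of them of type "homework".
const scores = [
  { type: "exam", score: 90 },
  { type: "quiz", score: 80 },
  { type: "homework", score: 10 },
  { type: "homework", score: 20 },
  { type: "homework", score: 30 },
  { type: "homework", score: 40 },
];

// Mirror of $match on type, followed by $skip and $limit.
function pageOfType(items, type, skip, limit) {
  return items.filter((s) => s.type === type).slice(skip, skip + limit);
}

console.log(pageOfType(scores, "homework", 3, 3));
// [ { type: 'homework', score: 40 } ]
```

To get the first three homework scores instead, the skip would have to be 0.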
I have the following result:
"result" : [
{
"_id" : "London",
"count" : 499
},
{
"_id" : "Paris",
"count" : 135
},
{
"_id" : "Lviv",
"count" : 95
}
]
And here is the query:
{"$group":{
_id: "$city",
"count" : {"$sum":1}
}
}
So I want to somehow calculate a total over all documents, not only per group. I think it is better to show the expected result:
"result" : [
{
"_id" : "London",
"count" : 499,
"total" : 729
},
{
"_id" : "Paris",
"count" : 135,
"total" : 729
},
{
"_id" : "Lviv",
"count" : 95,
"total" : 729
}
]
The expected result has a "total" field, calculated as the sum of the "count" fields (499+135+95 = 729).
EDITED: I must use only aggregation framework!
Can someone help me with this?
You have to count the total number before:
db.coll.count( ..., function( err, total ) {
and then use that result in your aggregation command:
{
"$group": {
_id: "$city",
count: { "$sum": 1 },
total: { "$first": { "$literal": total } } // $group fields must be accumulators, so wrap the constant
}
}
EDIT:
If you only want to use aggregation framework, try this instead of db.coll.count():
{
"$group": {
_id: 1,
count: { "$sum": 1 }
}
}
Sounds like db.collection.count() would give you your result actually. This is because you are actually just summing up ALL documents in the collection there.
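If everything has to stay inside one aggregation, one possible approach (a sketch, not taken from the answers above) is to group per city, then group everything into a single document that sums the counts and pushes the per-city rows into an array, and finally unwind that array again. The field name cities and the function countsWithTotal are made up for this example; the plain-JS function mirrors the pipeline so the numbers can be checked without a database.

```javascript
// Possible aggregation-only pipeline: per-city counts plus the grand total on every row.
const totalPipeline = [
  { $group: { _id: "$city", count: { $sum: 1 } } },
  { $group: {
      _id: null,
      total: { $sum: "$count" },
      cities: { $push: { city: "$_id", count: "$count" } },
  } },
  { $unwind: "$cities" },
  { $project: { _id: "$cities.city", count: "$cities.count", total: 1 } },
];

// Plain-JS mirror of the pipeline.
function countsWithTotal(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.city, (counts.get(doc.city) || 0) + 1);
  }
  let total = 0;
  for (const c of counts.values()) total += c;
  return [...counts.entries()].map(([city, count]) => ({ _id: city, count, total }));
}

// Hypothetical documents producing the counts from the question.
const cityDocs = [
  ...Array(499).fill({ city: "London" }),
  ...Array(135).fill({ city: "Paris" }),
  ...Array(95).fill({ city: "Lviv" }),
];

console.log(countsWithTotal(cityDocs));
// [ { _id: 'London', count: 499, total: 729 },
//   { _id: 'Paris', count: 135, total: 729 },
//   { _id: 'Lviv', count: 95, total: 729 } ]
```

Note that the second $group collects all cities into one document, so this only suits cases where the number of groups is modest.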