mongo db how to write a function in an query maybe aggregation? - mongodb

The question is Calculate the average age of the users who have more than 3 strengths listed.
One of the data is like this :
{
"_id" : 1.0,
"user_id" : "jshaw0",
"first_name" : "Judy",
"last_name" : "Shaw",
"email" : "jshaw0#merriam-webster.com",
"age" : 39.0,
"status" : "disabled",
"join_date" : "2016-09-05",
"last_login_date" : "2016-09-30 23:59:36 -0400",
"address" : {
"city" : "Deskle",
"province" : "PEI"
},
"strengths" : [
"star schema",
"dw planning",
"sql",
"mongo queries"
],
"courses" : [
{
"code" : "CSIS2300",
"total_questions" : 118.0,
"correct_answers" : 107.0,
"incorect_answers" : 11.0
},
{
"code" : "CSIS3300",
"total_questions" : 101.0,
"correct_answers" : 34.0,
"incorect_answers" : 67.0
}
]
}
I know I need to count how many strengths this data has, and then set it to $gt, and then calculate the average age.
However, I don't know how to write 2 function which are count and average in one query. Do I need to use aggregation, if so, how?
Thanks so much

Use $redact to match your array size & $group to calculate the average :
db.collection.aggregate([{
"$redact": {
"$cond": [
{ "$gt": [{ "$size": "$strengths" }, 3] },
"$$KEEP",
"$$PRUNE"
]
}
}, {
$group: {
_id: 1,
average: { $avg: "$age" }
}
}])
The $redact part match the size of strenghs array greater than 3, it will $$KEEP record that match this condition otherwise $$PRUNE the record that don't match. Check $redact documentation
The $group just perform an average with $avg

Related

MongoDB - find document whose array length is less than or equal to 5

Can't we pass an object to $size operator in mongoose? Is there any ways to query on array for length so we can fetch document which contains an array of a particular length.
Hers is Sample Document
"_id" : ObjectId("5e8c9becd1257f66c4b8cd63"),
"index" : 0,
"name" : "Aurelia Gonzales",
"isActive" : false,
"registered" : ISODate("2015-02-11T09:52:39.000+05:30"),
"age" : 20,
"gender" : "female",
"eyeColor" : "green",
"favoriteFruit" : "banana",
"company" : {
"title" : "YURTURE",
"email" : "aureliagonzales#yurture.com",
"phone" : "+1 (940) 501-3963",
"location" : {
"country" : "USA",
"address" : "694 Hewes Street"
}
},
"tags" : [
"enim",
"id",
"velit",
"ad",
"consequat"
]
}
Here is query
db.admin.aggregate([
{
$match : {tags : {$size : {$lte : 5}}}
}
])
Here is Output
{
"message" : "$size needs a number",
"ok" : 0,
"code" : 2,
"codeName" : "BadValue",
"name" : "MongoError"
}
You can't use $size like that & needed to use aggregation $size operator to do this.
Query :
db.collection.find({
$expr: { /** Allows the use of aggregation expressions within the query language */
$lte: [
{
$size: "$tags"
},
5
]
}
})
Test : MongoDB-Playground
Although if the size of the array is important enough, it could be stored in the documents and indexed to fetch much faster results.
Following a similar logic a solution could be, two stage aggregation using $addFields and $size, $lte.
db.collection.aggregate([
{
$addFields: {
sizeOfTags: {
$size: "$tags"
}
}
},
{
$match: {
sizeOfTags: {
$lte: 5
}
}
}
])

mongodb aggregation $max and corresponding timestamp

I'm being challenged by the $group $max in an aggregation with MongoDB on Nodes Express app. Here is the a sample of the collection;
{"_id":"5b7e78cf022be03c35776bec",
"humidity":60,
"pressure":1014.18,
"temperature":26.8,
"light":2464,
"timestampiso":"2018-08-23T09:05:19.112Z",
"timestamp":1535015119112
},
{
"_id":"5b7e7892022be03c35776bea",
"humidity":60.4,
"pressure":1014.14,
"temperature":26.7,
"light":2422,
"timestampiso":"2018-08-23T09:04:18.115Z",
"timestamp":1535015058115
},
{
"_id":"5b7e7855022be03c35776be8",
"humidity":60.6,
"pressure":1014.2,
"temperature":26.6,
"light":2409,
"timestampiso":"2018-08-23T09:03:17.113Z",
"timestamp":1535014997113
}]
What I'm trying to do is to query the collection, by first retrieving the entries of the last hour based on the timestamp and then looking for highest pressure of the sample (should be 60 entries as there is one entry per minute)
What I can de is find this value. What I'm stuggling on to have the timestamp related to that max value.
When I run
db.collection("ArduinoSensorMkr1000").
aggregate([{ "$match" : {"timestamp" : {"$gte" : (Date.now()-60*60*1000)}}},
{ "$group" : {"_id" : null, maxpressure : {"$max" : "$pressure"}
}
},
{
"$project" : { "_id" : 0 }
}
])
Fine, the output is correct and I get the maxpressure as so
[{"maxpressure":1014.87}]
but what I'm trying to output is the maxpressure field but with it, its corresponding timestamp. The output should look as so
[{"maxpressure":1014.87,"timestamp":1535015058115}]
Any hints on how I get this timestamp value to show?
Thank you for your support
You can try this first need to sort your data using $sort and you can pick max value by using $first
QUERY
db.col.aggregate([
{ "$match": { "timestamp": { "$gte": (Date.now() - 60 * 60 * 1000) } } },
{ "$sort": { "pressure": -1 } },
{
"$group": {
"_id": null, "maxpressure": { "$first": "$pressure" },
"timestamp": { "$first": "$timestamp" }
}
},
{
"$project": { "_id": 0 }
}
])
DATA
[{
"_id" : "5b7e78cf022be03c35776bec",
"humidity" : 60.0,
"pressure" : 1014.18,
"temperature" : 26.8,
"light" : 2464.0,
"timestampiso" : "2018-08-23T09:05:19.112Z",
"timestamp" : 1535015119112.0
},
{
"_id" : "5b7e7892022be03c35776bea",
"humidity" : 60.4,
"pressure" : 1014.14,
"temperature" : 26.7,
"light" : 2422.0,
"timestampiso" : "2018-08-23T09:04:18.115Z",
"timestamp" : 1535015058115.0
},
{
"_id" : "5b7e7855022be03c35776be8",
"humidity" : 60.6,
"pressure" : 1014.2,
"temperature" : 26.6,
"light" : 2409.0,
"timestampiso" : "2018-08-23T09:03:17.113Z",
"timestamp" : 1535014997113.0
}]
THE OUTPUT
{
"maxpressure" : 1014.87,
"timestamp" : 1535015058115.0
}
My suggestion is to use sort/limit instead of grouping. By this way you can get entire document before project only interesting fields :
db['ArduinoSensorMkr1000'].aggregate(
[{ "$match" : {"timestamp" : {"$gte" : (Date.now()-5*60*60*1000)}}},
{$sort:{pressure:-1}},
{$limit:1},{
"$project" : { "_id" : 0,"timestamp":1,"pressure":1 }}
])

How to filter and aggregate data from one collection into another format using MongoDB

I have one collection named 'ctrlcharts'.
e.g.
{
"_id" : ObjectId("57fc695492af567031246736"),
"deviceId" : "A001",
"sensorId" : "S003",
"time" : "2016/10/11 12:23:50",
"charts" : [
{
"sensor" : "ch_11",
"value" : 120
},
{
"sensor" : "ch_12",
"value" : 150
}
]
}
How to filter "sensor" : "ch_11" and aggregate data from one collection into another format using MongoDB
e.g.
{
"time" : "2016/10/11 12:23:50",
"sensor" : "ch_12",
"value" : 150
}
I tried below code
db.ctrlcharts.aggregate([
{ $match: {"deviceId" : "A001", "sensorId" : "S003", "time" : "2016/10/11 12:23:50"}},
{ $project: {
_id: 0,
time : 1 ,
sensor : "$charts.sensor"
value : "$charts.value"
}
}
])
But I got the result as
{
"time" : "2016/10/11 12:23:50",
"sensor" : ["ch_11","ch_12"],
"value" : [120,150]
}
Thanks
You tried best....just use $unwind
db.ctrlcharts.aggregate(
{$unwind:"$charts"},
{$match: {"deviceId" :"A001", "charts.sensor":"ch_12", "time" : "2016/10/11 12:23:50"}},
{$project:{_id:0,time:1, sensor : "$charts.sensor", value :"$charts.value"}}).pretty()
You can use $unwind (aggregation) to separate charts array.
db.ctrlcharts.aggregate( [ { $unwind : "$charts" } ] )
This will produce result like -
{ "_id" : ObjectId("57ff397a007c43ecacf10512"), "deviceId" : "A001", "sensorId" : "S003", "time" : "2016/10/11 12:23:50", "charts" : { "sensor" : "ch_11", "value" : 120 } }
{ "_id" : ObjectId("57ff397a007c43ecacf10512"), "deviceId" : "A001", "sensorId" : "S003", "time" : "2016/10/11 12:23:50", "charts" : { "sensor" : "ch_12", "value" : 150 } }
and then use your match query
Use the $arrayElemAt and $filter operators to query the array more efficiently without the need to $unwind. The reason why $unwind is not as efficient is that it produces a cartesian product of the documents i.e. a copy of each document per array entry, which uses more memory (possible memory cap on aggregation pipelines of 10% total memory) and therefore takes time to produce as well processing the documents during the flattening process.
The $filter will return a subset of the array that only contains the elements that match the filter condition. The $arrayElemAt operator then returns the element from the filtered array at the specified array index to give you the subdocument you need.
A further $project is necessary to flatten the fields to give you the desired result:
db.ctrlcharts.aggregate([
{ "$match": {
"deviceId": "A001",
"sensorId": "S003",
"time": "2016/10/11 12:23:50",
"charts.sensor": "ch_11"
} },
{
"$project": {
"time": 1,
"chart": {
"$arrayElemAt": [
"$filter": {
"input": "$charts",
"as": "item",
"cond": { "$eq": ["$$item.sensor", "ch_11"] }
}, 0
]
}
}
},
{
"$project": {
"_id": 0,
"time": 1,
"sensor": "$chart.sensor"
"value": "$chart.value"
}
}
])

mongodb check if all subdocuments in array have the same value in one field

I have a collection of documents, each has a field which is an array of subdocuments, and all subdocuments have a common field 'status'. I want to find all documents that have the same status for all subdocuments.
collection:
{
"name" : "John",
"wives" : [
{
"name" : "Mary",
"status" : "dead"
},
{
"name" : "Anne",
"status" : "alive"
}
]
},
{
"name" : "Bill",
"wives" : [
{
"name" : "Mary",
"status" : "dead"
},
{
"name" : "Anne",
"status" : "dead"
}
]
},
{
"name" : "Mohammed",
"wives" : [
{
"name" : "Jane",
"status" : "dead"
},
{
"name" : "Sarah",
"status" : "dying"
}
]
}
I want to check if all wives are dead and find only Bill.
You can use the following aggregation query to get records of person whose wives are all dead:
db.collection.aggregate(
{$project: {name:1, wives:1, size:{$size:'$wives'}}},
{$unwind:'$wives'},
{$match:{'wives.status':'dead'}},
{$group:{_id:'$_id',name:{$first:'$name'}, wives:{$push: '$wives'},size:{$first:'$size'},count:{$sum:1}}},
{$project:{_id:1, wives:1, name:1, cmp_value:{$cmp:['$size','$count']}}},
{$match:{cmp_value:0}}
)
Output:
{ "_id" : ObjectId("56d401de8b953f35aa92bfb8"), "name" : "Bill", "wives" : [ { "name" : "Mary", "status" : "dead" }, { "name" : "Anne", "status" : "dead" } ], "cmp_value" : 0 }
If you need to find records of users who has same status, then you may remove the initial match stage.
The most efficient way to handle this is always going to be to "match" on the status of "dead" as the opening query, otherwise you are processing items that cannot possibly match, and the logic really quite simply followed with $map and $allElementsTrue:
db.collection.aggregate([
{ "$match": { "wives.status": "dead" } },
{ "$redact": {
"$cond": {
"if": {
"$allElementsTrue": {
"$map": {
"input": "$wives",
"as": "wife",
"in": { "$eq": [ "$$wife.status", "dead" ] }
}
}
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
Or the same thing with $where:
db.collection.find({
"wives.status": "dead",
"$where": function() {
return this.wives.length
== this.wives.filter(function(el) {
el.status == "dead";
}).length;
}
})
Both essentially test the "status" value of all elements to make sure they match in the fastest possible way. But the aggregate pipeline with just $match and $redact should be faster. And "less" pipeline stages ( essentially each a pass through the data ) means faster as well.
Of course keeping a property on the document is always fastest, but it would involve logic to set that only where "all elements" are the same property. Which of course would typically mean inspecting the document by loading it from the server prior to each update.

Match on key from two queries in a single query

I have time series data in mongodb as follows:
{
"_id" : ObjectId("558912b845cea070a982d894"),
"code" : "ZL0KOP",
"time" : NumberLong("1420128024000"),
"direction" : "10",
"siteId" : "0000"
}
{
"_id" : ObjectId("558912b845cea070a982d895"),
"code" : "AQ0ZSQ",
"time" : NumberLong("1420128025000"),
"direction" : "10",
"siteId" : "0000"
}
{
"_id" : ObjectId("558912b845cea070a982d896"),
"code" : "AQ0ZSQ",
"time" : NumberLong("1420128003000"),
"direction" : "10",
"siteId" : "0000"
}
{
"_id" : ObjectId("558912b845cea070a982d897"),
"code" : "ZL0KOP",
"time" : NumberLong("1420041724000"),
"direction" : "10",
"siteId" : "0000"
}
{
"_id" : ObjectId("558912b845cea070a982d89e"),
"code" : "YBUHCW",
"time" : NumberLong("1420041732000"),
"direction" : "10",
"siteId" : "0002"
}
{
"_id" : ObjectId("558912b845cea070a982d8a1"),
"code" : "U48AIW",
"time" : NumberLong("1420041729000"),
"direction" : "10",
"siteId" : "0002"
}
{
"_id" : ObjectId("558912b845cea070a982d8a0"),
"code" : "OJ3A06",
"time" : NumberLong("1420300927000"),
"direction" : "10",
"siteId" : "0000"
}
{
"_id" : ObjectId("558912b845cea070a982d89d"),
"code" : "AQ0ZSQ",
"time" : NumberLong("1420300885000"),
"direction" : "10",
"siteId" : "0003"
}
{
"_id" : ObjectId("558912b845cea070a982d8a2"),
"code" : "ZLV05H",
"time" : NumberLong("1420300922000"),
"direction" : "10",
"siteId" : "0001"
}
{
"_id" : ObjectId("558912b845cea070a982d8a3"),
"code" : "AQ0ZSQ",
"time" : NumberLong("1420300928000"),
"direction" : "10",
"siteId" : "0000"
}
The codes that match two or more conditions need to be filtered out.
For example:
condition1: 1420128000000 < time < 1420128030000,siteId == 0000
condition2: 1420300880000 < time < 1420300890000,siteId == 0003
results for the first condition:
{
"_id" : ObjectId("558912b845cea070a982d894"),
"code" : "ZL0KOP",
"time" : NumberLong("1420128024000"),
"direction" : "10",
"siteId" : "0000"
}
{
"_id" : ObjectId("558912b845cea070a982d895"),
"code" : "AQ0ZSQ",
"time" : NumberLong("1420128025000"),
"direction" : "10",
"siteId" : "0000"
}
{
"_id" : ObjectId("558912b845cea070a982d896"),
"code" : "AQ0ZSQ",
"time" : NumberLong("1420128003000"),
"direction" : "10",
"siteId" : "0000"
}
results for the second condition:
{
"_id" : ObjectId("558912b845cea070a982d89d"),
"code" : "AQ0ZSQ", "time" : NumberLong("1420300885000"),
"direction" : "10",
"siteId" : "0003"
}
The only code that matchs all the conditions above should be:
{"code" : "AQ0ZSQ", "count":2}
"count" means, the code "AQ0ZSQ" appeared in both conditions
The only solution I can think of is using two querys. For example, using python
result1 = list(db.codes.objects({'time': {'$gt': 1420128000000,'$lt': 1420128030000}, 'siteId': "0000"}).only("code"))
result2 = list(db.codes.objects({'time': {'$gt': 1420300880000,'$lt': 1420300890000}},{'siteId':'0003'}).only("code"))
and then found the shared code in both results.
The Problem is that there are millions of documents in the collection, and both query can easily exceed the 16mb limitation.
So is it possible to do that in one query? or should I change the document structure?
What you are asking for here requires the usage of the aggregation framework in order to calculate that there was an intersection between results on the server.
The first part of the logic is you need an $or query for the two conditions, then there will be some additional projection and filtering on those results:
db.collection.aggregate([
// Fetch all possible documents for consideration
{ "$match": {
"$or": [
{
"time": { "$gt": 1420128000000, "$lt": 1420128030000 },
"siteId": "0000"
},
{
"time": { "$gt": 1420300880000, "$lt": 1420300890000 },
"siteId": "0003"
}
]
}},
// Locigically compare the conditions agaist results and add a score
{ "$project": {
"code": "$code",
"score": { "$add": [
{ "$cond": [
{ "$and":[
{ "$gt": [ "$time", 1420128000000 ] },
{ "$lt": [ "$time", 1420128030000 ] },
{ "$eq": [ "$siteId", "0000" ] }
]},
1,
0
]},
{ "$cond": [
{ "$and":[
{ "$gt": [ "$time", 1420300880000 ] },
{ "$lt": [ "$time", 1420300890000 ] },
{ "$eq": [ "$siteId", "0003" ] }
]},
1,
0
]}
]}
}},
// Now Group the results by "code"
{ "$group": {
"_id": "$code",
"score": { "$sum": "$score" }
}},
// Now filter to keep only results with score 2
{ "$match": { "score": 2 } }
])
So break that down and see how it works.
First you want a query with $match to get all the possible documents for "all" of your conditions of "intersection". That is what the $or expression allows here by considering that matched documents must meet either set. You need all of them to work out the "intersection" here.
In the second $project pipeline stage a boolean test of your conditions is performed with each set. Notice the usage of $and here as well as other boolean operators of the aggregation framework is slightly different to that of the query usage form.
In the aggregation framework form ( outside of $match which uses normal query operators ) these operators take an array of arguments, to typically represent "two" values for comparison rather than the operation being assigned to the "right" of the field name.
Since these conditions are logical or "boolean" we want to return the result as "numeric" rather than a true/false value. This is what $cond does here. So where the condition is true for the document inspected a score of 1 is emitted otherwise it is 0 when false.
Finally in this $project expression both of your conditions are wrapped with $add to form the "score" result. So if none of the conditions ( not possible after the $match ) were not true the score would be 0, if "one" is true then 1, or where "both" are true then 2.
Noting here that the specific conditions asked for here will never score above 1 for a single document since no document can have the overlapping range or "two" "siteId" values as is present here.
Now the important part is to $group by the "code" value and $sum the score value to get a total per "code".
This leaves the final $match filter stage of the pipeline to only keep those documents with a "score" value that is equal to the number of conditions you asked for. In this case 2.
There is a failing there however in that where there is more than one value of "code" in the matches for either condition ( as there is ) then the "score" here would be incorrect.
So after the introduction to the principles of using logical operators in aggregation, you can fix that fault by essentially "tagging" each result logically as to which condition "set" it applies to. Then you can basically consider which "code" appeared in "both" sets in this case:
db.collection.aggregate([
{ "$match": {
"$or": [
{
"time": { "$gt": 1420128000000, "$lt": 1420128030000 },
"siteId": "0000"
},
{
"time": { "$gt": 1420300880000, "$lt": 1420300890000 },
"siteId": "0003"
}
]
}},
// If it's the first logical condition it's "A" otherwise it can
// only be the other, therefore "B". Extend for more sets as needed.
{ "$group": {
"_id": {
"code": "$code",
"type": { "$cond": [
{ "$and":[
{ "$gt": [ "$time", 1420128000000 ] },
{ "$lt": [ "$time", 1420128030000 ] },
{ "$eq": [ "$siteId", "0000" ] }
]},
"A",
"B"
]}
}
}},
// Simply add up the results for each "type"
{ "$group": {
"_id": "$_id.code",
"score": { "$sum": 1 }
}}
// Now filter to keep only results with score 2
{ "$match": { "score": 2 } }
])
It might be a bit to take in if this is your first time using the aggregation framework. Please take the time to look at the operators used as defined with the links here and also look at Aggregation Pipeline Operators in general.
Beyond simple data selection, this is the tool you should be reaching to most often when using MongoDB, so you would do well to understand all the operations that are possible.