How do I calculate the average of multiple fields in a mongoDB document? - mongodb

I have a collection with the following fields:
_id:612ff22c17286411a17252cf
WELL:"199-H4-15A"
TYPE:"E"
SYSTEM:"HX"
ID:"199-H4-15A_E_HX"
Jan-14:-168.8
Feb-14:-151
Mar-14:-164.1
Apr-14:-168.7
May-14:-172.6
Jun-14:-177.3
Jul-14:-177.6
Aug-14:-171.9
Sep-14:-138.9
Oct-14:-130.3
Nov-14:-163.8
Dec-14:-161.4
Jan-15:-168.7
Feb-15:-168.9
Mar-15:-168.6
Apr-15:-164.6
May-15:-141.7
Jun-15:-153.5
Jul-15:-163.7
Aug-15:-167.7
and I am trying to take the average of all of the month fields (e.g. all fields like "Jan-14", "Feb-14" and so on). I was thinking of somehow pushing all of the month field values into an array and then averaging them, but I would like to avoid listing every individual field name. Below is what I have so far.
[{$match: {
  'WELL': '199-H4-15A'
}}, {$group: {
  _id: null,
  MonthAverageFlows: {
    $push: {
      $isNumber: ['$all']
    }
  }
}}, {$unwind: '$MonthAverageFlows'}, {$group: {
  _id: null,
  average: {
    $avg: '$MonthAverageFlows.value'
  }
}}]
All that comes out is `null`. Any and all help would be appreciated. The raw data is in CSV form:
WELL, TYPE, SYSTEM, ID, JAN-14, FEB-14, . . .
"199-H4-15A", "E", "HX", "199-H4-15A_E_HX", -168.8, -151, . . .

Using dynamic values as field names is generally considered an anti-pattern, and you should avoid it: it makes queries unnecessarily difficult to compose and maintain.
Nevertheless, you can do the following in an aggregation pipeline:
$objectToArray to convert the raw document into an array of k-v tuples
$filter the array so that it contains the monthly data only
$avg to calculate the average
Here is the Mongo playground for your reference.
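The three steps above could be combined roughly as follows. This is a minimal sketch, not the pipeline from the linked playground: the regular expression for month-style keys, the `monthAverage` helper, and the `db.wells` collection name are assumptions made for illustration, and `$regexMatch` needs MongoDB 4.2 or later. The helper mirrors the pipeline in plain JavaScript so the logic can be checked without a server.

```javascript
// Keys that look like "Jan-14": three letters, a hyphen, two digits (assumed pattern).
const MONTH_KEY = "^[A-Za-z]{3}-[0-9]{2}$";

const monthAveragePipeline = [
  { $match: { WELL: "199-H4-15A" } },
  {
    $project: {
      WELL: 1,
      average: {
        $avg: {
          $map: {
            // 1. $objectToArray turns the document into [{ k, v }, ...]
            // 2. $filter keeps only the month-style keys
            input: {
              $filter: {
                input: { $objectToArray: "$$ROOT" },
                as: "kv",
                cond: { $regexMatch: { input: "$$kv.k", regex: MONTH_KEY } }
              }
            },
            as: "kv",
            in: "$$kv.v" // 3. $avg averages the values that remain
          }
        }
      }
    }
  }
];
// In mongosh (hypothetical collection name): db.wells.aggregate(monthAveragePipeline)

// Plain-JavaScript mirror of the same logic for a single document.
function monthAverage(doc) {
  const values = Object.entries(doc)
    .filter(([key, value]) => new RegExp(MONTH_KEY).test(key) && typeof value === "number")
    .map(([, value]) => value);
  if (values.length === 0) return null;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}
```

Because the filter works on key names, new month columns are picked up without touching the query, at the cost of relying on a naming convention.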

Related

How to retrieve the total cases in my data set on a MongoDB query?

I want the total number of cases in all my documents,
This is the query I tried to use:
db.coviddatajson.aggregate([
{ $group: { _id: null, total: { $sum: "$total_cases"} } }
])
For some reason the result is 0, which does not make sense. Any non-zero result would at least be plausible, but I expect it to be at least 1000, probably a few thousand.
This is the dataset I am using:
https://covid.ourworldindata.org/data/owid-covid-data.json
What am I doing wrong here?
Any ideas on how to fix this query?
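The total_cases field is inside the data array, so "$total_cases" does not resolve to anything at the top level of the document, and $sum in the $group stage ignores missing and non-numeric values, which gives 0. First compute the sum of data.total_cases within each document, then pass that per-document total to the $group stage to get the overall sum: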
db.coviddatajson.aggregate([
  {
    $project: { total_cases: { $sum: "$data.total_cases" } }
  },
  {
    $group: {
      _id: null,
      total: { $sum: "$total_cases" }
    }
  }
])
Playground
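To see why the first query returns 0 and the second does not, the two stages can be mirrored in plain JavaScript. This is only a sketch on made-up documents: the sample values and the `sumPerDocument` and `grandTotal` names are hypothetical, and the documents are assumed to have a `data` array of objects with a numeric `total_cases`.

```javascript
const sampleDocs = [
  { location: "A", data: [{ total_cases: 1 }, { total_cases: 3 }] },
  { location: "B", data: [{ total_cases: 2 }, { date: "2020-03-01" }] }
];

// Mirrors { $sum: "$data.total_cases" } in $project: sums numeric entries, skips the rest.
function sumPerDocument(doc) {
  return (doc.data || [])
    .map((entry) => entry.total_cases)
    .filter((v) => typeof v === "number")
    .reduce((sum, v) => sum + v, 0);
}

// Mirrors the $group stage with _id: null.
function grandTotal(docs) {
  return docs.reduce((sum, doc) => sum + sumPerDocument(doc), 0);
}

// The original query read a top-level total_cases, which these documents do not have.
const topLevelTotal = sampleDocs
  .map((doc) => doc.total_cases)
  .filter((v) => typeof v === "number")
  .reduce((sum, v) => sum + v, 0);
```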
The data set has some issues.
The document is bigger than 16 MiB, and you cannot load documents larger than 16 MiB into MongoDB. This is an internal limitation. You would need to split the document into sub-documents.
The document contains data for each country but also summarized data for "World". Do you have to exclude the "World" data? Can you use it instead of summing manually?
The data is not consistent. For example, some countries do not provide the number of male/female smokers or the median age. Not all countries provide all data for each date, so you may have missing values. How do you want to deal with them?
Do you want a simple sum of all total_cases? If yes, the query would be easy; however, the result would be pointless (15'773'189'214 total cases, twice the population of the world).
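If total_cases is cumulative per date, as the oversized sum above suggests, one alternative is to take each document's largest value and leave out the aggregate "World" entry. The pipeline below is a hedged sketch: it assumes each document has a `location` field and a `data` array, which depends on how the file was split and loaded, and `latestTotal` and `totalOfLatest` are names introduced here.

```javascript
const latestTotalsPipeline = [
  // Skip the pre-aggregated "World" document (assumed to be identified by location).
  { $match: { location: { $ne: "World" } } },
  // $max over the array path picks the largest value in each document.
  { $project: { latestTotal: { $max: "$data.total_cases" } } },
  { $group: { _id: null, total: { $sum: "$latestTotal" } } }
];
// In mongosh: db.coviddatajson.aggregate(latestTotalsPipeline)

// Plain-JavaScript mirror for checking the logic without a server.
function totalOfLatest(docs) {
  return docs
    .filter((doc) => doc.location !== "World")
    .map((doc) => {
      const values = (doc.data || [])
        .map((entry) => entry.total_cases)
        .filter((v) => typeof v === "number");
      return values.length ? Math.max(...values) : 0;
    })
    .reduce((sum, v) => sum + v, 0);
}
```

Any other aggregate entries in the data set would need the same exclusion.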

MongoDB: concatenate multiple number values to string

I have a document (inside an aggregation, after the $group stage) which has an object (I could form an array instead, if needed) with number values.
A MongoPlayground example with data and my aggregate query is available here.
I want to make a new _id field during the next $project stage, consisting of these three number values, like:
item_id | unix time | pointer
_id: 453435-41464556645#1829
The problem is that when I try to use $concat, the query returns an error like:
$concat only supports strings, not int
So here is my question: is it possible to achieve such results? I have seen the relevant question MongoDB concatenate strings from two fields into a third field, but it didn't cover my case.
$concat only concatenates strings, but $_id.item_id contains an int value and $_id.last_modified a double value.
$toString converts a value to a string:
_id: {
  $concat: [
    {
      $toString: "$_id.item_id"
    },
    " - ",
    {
      $toString: "$_id.last_modified"
    }
  ]
}
Playground: https://mongoplayground.net/p/SSlXW4gIs_X
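The desired `_id` in the question has three parts, while the snippet above joins two. A possible extension is sketched below; `$_id.pointer` is a guess at the third field's path (the question does not name it), and `concatAsStrings` is a hypothetical helper that wraps each field path in `$toString` (available from MongoDB 4.0) and passes separators through unchanged.

```javascript
// Build a $concat expression from parts: { field: "path" } becomes a $toString
// conversion of that field; plain strings are used as literal separators.
function concatAsStrings(parts) {
  return {
    $concat: parts.map((part) =>
      typeof part === "string" ? part : { $toString: "$" + part.field }
    )
  };
}

const idProjection = {
  $project: {
    _id: concatAsStrings([
      { field: "_id.item_id" },
      "-",
      { field: "_id.last_modified" },
      "#",
      { field: "_id.pointer" } // assumed field name, not given in the question
    ])
  }
};
```

If the string form of the double in last_modified is not what you want, converting it with `$toLong` before `$toString` is one option, at the cost of truncating any fractional part.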

Meteor collection get last document of each selection

Currently I use the following find query to get the latest document of a certain ID
Conditions.find({
  caveId: caveId
},
{
  sort: {diveDate:-1},
  limit: 1,
  fields: {caveId: 1, "visibility.visibility":1, diveDate: 1}
});
How can I do the same for multiple IDs, using $in for example?
I tried the following query. The problem is that it limits the result to 1 document across all the found caveIds, whereas the limit should apply to each caveId separately.
Conditions.find({
  caveId: {$in: caveIds}
},
{
  sort: {diveDate:-1},
  limit: 1,
  fields: {caveId: 1, "visibility.visibility":1, diveDate: 1}
});
One solution I came up with is using the aggregate functionality.
var conditionIds = Conditions.aggregate(
  [
    {"$match": { caveId: {"$in": caveIds}}},
    {
      $group:
      {
        _id: "$caveId",
        conditionId: {$last: "$_id"},
        diveDate: { $last: "$diveDate" }
      }
    }
  ]
).map(function(child) { return child.conditionId});
var conditions = Conditions.find({
  _id: {$in: conditionIds}
},
{
  fields: {caveId: 1, "visibility.visibility":1, diveDate: 1}
});
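One caveat with the aggregation above: `$last` is only meaningful when the documents reach `$group` in a defined order, so a `$sort` on diveDate is normally placed first. A sketch of that variant follows, together with a plain-JavaScript mirror for checking the logic; `latestConditionPipeline` and `latestPerCave` are names introduced here, and whether `Conditions.aggregate` is available depends on the Meteor setup.

```javascript
function latestConditionPipeline(caveIds) {
  return [
    { $match: { caveId: { $in: caveIds } } },
    // Newest first, so $first in each group is the latest dive.
    { $sort: { diveDate: -1 } },
    {
      $group: {
        _id: "$caveId",
        conditionId: { $first: "$_id" },
        diveDate: { $first: "$diveDate" }
      }
    }
  ];
}

// Plain-JavaScript mirror: keep the newest condition document per caveId.
function latestPerCave(conditions, caveIds) {
  const latest = new Map();
  for (const doc of conditions) {
    if (!caveIds.includes(doc.caveId)) continue;
    const current = latest.get(doc.caveId);
    if (!current || doc.diveDate > current.diveDate) latest.set(doc.caveId, doc);
  }
  return [...latest.values()];
}
```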
You don't want to use $in here, as noted. You could solve this problem by looping through the caveIds and running the query on each caveId individually.
You're basically looking at a join query here: you need all caveIds, and then a lookup of the last dive for each.
In my opinion (and this is only an opinion!), this is a problem of database schema/denormalization:
You could, as mentioned here, look up all caveIds and then run the single query for each one, every single time you need the last dives.
However, I think you are much better off recording/updating the last dive inside your cave document, and then looking up all caveIds of interest, pulling only the lastDive field.
That will give you what you need immediately, rather than going through expensive search/sort queries. It comes at the expense of maintaining that field in the document, but that sounds fairly trivial, as you only need to update the one field when a new event occurs.
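The denormalized approach could look roughly like this. It is a sketch under assumptions: the `lastDive` field, its shape, and the `lastDiveUpdate` helper are all hypothetical, and the selector/modifier pair would be passed to whatever update call the caves collection uses.

```javascript
// Build a selector and modifier that store a condition as the cave's last dive,
// but only when the cave has no lastDive yet or the stored one is older.
function lastDiveUpdate(condition) {
  return {
    selector: {
      _id: condition.caveId,
      $or: [
        { lastDive: { $exists: false } },
        { "lastDive.diveDate": { $lt: condition.diveDate } }
      ]
    },
    modifier: {
      $set: {
        lastDive: {
          conditionId: condition._id,
          diveDate: condition.diveDate,
          visibility: condition.visibility
        }
      }
    }
  };
}
```

With something like this run whenever a condition is recorded, the latest dives for several caves become a single find on `_id: {$in: caveIds}` that returns only the lastDive field.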

Result from "aggregate with unwind" is different from the "find with count"?

Here are a few documents from my collection:
{"make":"Lenovo", "model":"Thinkpad T430"},
{"make":"Lenovo", "model":"Thinkpad T430", "problems":["Battery"]},
{"make":"Lenovo", "model":"Thinkpad T430", "problems":["Battery","Brakes"]}
As you can see, some documents have no problems, some have only one problem, and some have several problems in a list.
I want to calculate how many reviews have a specific problem (like "Battery") in the problems list.
I have tried the following aggregate command:
db.reviews.aggregate([
  { $match : { model : "Thinkpad T430"} },
  { $unwind : "$problems" },
  { $group: {
    _id: '$problems',
    count: { $sum: 1 }
  }}
])
For the "Battery" problem the count was 382. I also decided to double-check this result with find() and count():
db.reviews.find({model:"Thinkpad T430",problems:"Battery"}).count()
Result was 362.
Why do I have this difference? And what is the right way to calculate it?
You likely have documents in the collection where problems contains more than one "Battery" string in the array.
When using $unwind, these will each result in their own doc, so the subsequent $group operation will count them separately.
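The effect is easy to reproduce on a few made-up documents. The sketch below mirrors both counts in plain JavaScript and shows one possible way to make the pipeline count documents rather than array entries, by de-duplicating each array with `$setUnion` before `$unwind`; the sample data and function names are illustrative only.

```javascript
const reviews = [
  { model: "Thinkpad T430" },
  { model: "Thinkpad T430", problems: ["Battery"] },
  { model: "Thinkpad T430", problems: ["Battery", "Battery", "Brakes"] }
];

// Mirrors $unwind + $group: every array entry counts, duplicates included.
function countByUnwind(docs, problem) {
  return docs.flatMap((doc) => doc.problems || []).filter((p) => p === problem).length;
}

// Mirrors find({ problems: problem }).count(): each matching document counts once.
function countByFind(docs, problem) {
  return docs.filter((doc) => (doc.problems || []).includes(problem)).length;
}

// Pipeline variant that removes duplicates inside each document first.
const dedupedPipeline = [
  { $match: { model: "Thinkpad T430" } },
  { $project: { problems: { $setUnion: ["$problems", []] } } },
  { $unwind: "$problems" },
  { $group: { _id: "$problems", count: { $sum: 1 } } }
];
// In mongosh: db.reviews.aggregate(dedupedPipeline)
```

Which number is "right" depends on the question being asked: the find() count answers "how many reviews mention Battery", while the plain $unwind count answers "how many times Battery is listed".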

sorting documents in mongodb

Let's say I have four documents in my collection:
{u'a': {u'time': 3}}
{u'a': {u'time': 5}}
{u'b': {u'time': 4}}
{u'b': {u'time': 2}}
Is it possible to sort them by the field 'time' which is common in both 'a' and 'b' documents?
Thank you
No, you should put your data into a common format so you can sort it on a common field. It can still be nested if you want but it would need to have the same path.
You can use aggregation; the following code has been tested.
db.test.aggregate([{
  $project: {
    time: {
      "$cond": [{
        "$gt": ["$a.time", null]
      }, "$a.time", "$b.time"]
    }
  }
}, {
  $sort: {
    time: -1
  }
}]);
Or if you also want the original fields returned back: gist
Alternatively, you can sort once you get the result back, using a customized compare function (not tested, for illustration purposes only):
// db.eval() was deprecated in MongoDB 3.0 and removed in 4.2, so sort on the client instead.
db.mycollection.find().toArray().sort(function(doc1, doc2) {
  var time1 = doc1.a ? doc1.a.time : doc1.b.time,
      time2 = doc2.a ? doc2.a.time : doc2.b.time;
  return time1 - time2;
});
You can, using the aggregation framework.
The trick here is to $project a common field to all the documents so that the $sort stage can use the value in that field to sort the documents.
The $ifNull operator can be used to check whether a.time exists: if it does, the record is sorted by that value; otherwise, by b.time.
code:
db.t.aggregate([
  {$project:{"a":1,"b":1,
             "sortBy":{$ifNull:["$a.time","$b.time"]}}},
  {$sort:{"sortBy":-1}},
  {$project:{"a":1,"b":1}}
])
Consequences of this approach:
The aggregation pipeline cannot take advantage of any of the indexes you create.
The performance will be very poor for very large data sets.
What you could ideally do is to ask the source system that is sending you the data to standardize its format, something like:
{"a":1,"time":5}
{"b":1,"time":4}
That way your query can make use of the index if you create one on the time field.
db.t.createIndex({"time":-1});
code:
db.t.find({}).sort({"time":-1});
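The $ifNull-based sort key described above can be mirrored in plain JavaScript, which makes it easy to check the expected order on the four sample documents from the question. The `sortKey` and `sortByTimeDesc` names are introduced here for illustration.

```javascript
// Mirrors { $ifNull: ["$a.time", "$b.time"] }: use a.time when present, else b.time.
function sortKey(doc) {
  return doc.a?.time ?? doc.b?.time ?? null;
}

// Mirrors { $sort: { sortBy: -1 } }: descending by the computed key.
function sortByTimeDesc(docs) {
  return [...docs].sort((x, y) => sortKey(y) - sortKey(x));
}

const docs = [
  { a: { time: 3 } },
  { a: { time: 5 } },
  { b: { time: 4 } },
  { b: { time: 2 } }
];
const sortedTimes = sortByTimeDesc(docs).map(sortKey); // [5, 4, 3, 2]
```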