How to trim one field of a MongoDB aggregate result

This is my data structure as stored in the DB:
{
"_id" : ObjectId("58a4e451c164f95c98e96235"),
"_class" : "com.contix.log.parser.log.Log",
"vin" : "6",
"esn" : "c",
"volt" : 11.32,
"internVolt" : 4.14,
"temp" : 39.49,
"timestamp" : NumberLong("1483375743285")
}
What I want to do is get the latest 10 unique volt, internVolt, and temp values for each vin and esn pair, together with the latest timestamp. I'm trying to use the Mongo aggregation framework to get the right result.
db.log.aggregate([
  {$sort: {timestamp: -1}},
  {$group: {_id: {esn: "$esn", vin: "$vin"}, firstTimestamp: {$first: "$timestamp"}, volts: {$addToSet: "$volt"}}},
  {$limit: 5}
])
But this is what my result looks like:
{ "_id" : { "esn" : "b", "vin" : "2" }, "firstTimestamp" :
NumberLong("1485852368147"), "volts" : [ 11.95, 10.08, 10.77, 10.47,
11.41, 10.36, 10.96, 10.75, 10.39, 10.53, 10.1, 10.22, 11.16, 10.11, 11.87, 11.33, 11.82, 11.78, 10.25, 11.86, 10.5, 10.41, 11.3, 11.31, 11.97, 10.64, 11.57, 10.93, 10.02, 10.68, 10.9, 11.53, 10.46, 11.42, 11.73, 11.32, 10.19, 10.51, 11.35, 11.28, 10.65, 10.21, 11.18, 10.91, 11.43, 10.52, 11.34, 11.1, 10.99, 10.61, 10.28, 10.97, 10.3, 10.31, 11.81, 11.8, 10.42, 11.51, 10.72, 11.39, 10.69, 11.27, 11.11, 10.15, 10.78, 10.58, 11.49, 10.94, 11.64, 10.32, 11.63, 10.03, 10.81, 11.83, 10.82, 11.84, 10.79, 10.66, 11.21, 10.24, 11.75, 11.2 ] }
And 4 other similar documents.
I don't know if there is any way to trim data like $volts inside the $group pipeline. The $limit and $skip stages seem to apply to whole documents.
My dream result looks like the one below.
{ "_id" : { "esn" : "b", "vin" : "2" }, "firstTimestamp" :
NumberLong("1485852368147"), "volts" : [ 10.81, 11.83, 10.82, 11.84, 10.79, 10.66, 11.21, 10.24, 11.75, 11.2 ], "innerVolts":[...], "temp":[...] }

If you need to trim the results, you can use projection and do something like this:
db.log.aggregate([
  {$sort: {timestamp: -1}},
  {$group: {
    _id: {esn: "$esn", vin: "$vin"},
    firstTimestamp: {$first: "$timestamp"},
    volts: {$addToSet: "$volt"},
    innerVolts: {$addToSet: "$internVolt"},
    temp: {$addToSet: "$temp"}
  }},
  {$project: {
    _id: 1,
    firstTimestamp: 1,
    volts: {$slice: ["$volts", 10]},
    innerVolts: {$slice: ["$innerVolts", 10]},
    temp: {$slice: ["$temp", 10]}
  }}
])
Hope my answer was helpful.

You can also use the $slice operator to trim the array to the latest N items.
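For example, here is a rough sketch (not from the original answers): $addToSet does not preserve order, so if you specifically need the latest 10 readings (at the cost of keeping duplicates), you could $push in the sorted order and then $slice. The $slice aggregation expression needs MongoDB 3.2+.
db.log.aggregate([
  {$sort: {timestamp: -1}},
  {$group: {
    _id: {esn: "$esn", vin: "$vin"},
    firstTimestamp: {$first: "$timestamp"},
    volts: {$push: "$volt"}            // pushed newest-first because of the sort
  }},
  {$project: {
    firstTimestamp: 1,
    volts: {$slice: ["$volts", 10]}    // first 10 = the 10 latest readings
  }}
])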

Related

MongoDB 4.0 aggregation addFields not saving documents after using toDate

I have the following documents,
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "0"
},
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "20180330"
},
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "20180402"
},
{
"_id" : ObjectId("5b85312981c1634f59751604"),
"date" : "20180323"
},
I tried to convert date to ISODate using $toDate in aggregation,
db.documents.aggregate( [ { "$addFields": { "received_date": { "$cond": [ {"$ne": ["$date", "0"] }, {"$toDate": "$date"}, new Date("1970-01-01") ] } } } ] )
The query executed fine, but when I run
db.documents.find({})
to examine all the documents, nothing has changed. I am wondering how to fix this. I am using MongoDB 4.0.6 on Linux Mint 19.1 x64.
As mentioned in the comments, aggregate doesn't update documents in the database directly; it only produces output from them.
If you'd like to permanently add a new field to documents via aggregation (i.e. update the documents in the database), use the following .forEach/.updateOne method:
Your example:
db.documents
.aggregate([{"$addFields":{"received_date":{"$cond":[{"$ne":["$date","0"]}, {"$toDate": "$date"}, new Date("1970-01-01")]}}}])
.forEach(function (x){db.documents.updateOne({_id: x._id}, {$set: {"received_date": x.received_date}})})
Since _id's value is an ObjectID(), there may be a slight modification you need to do to {_id:x._id}. If there is, let me know and I'll update it!
Another example:
db.users.find().pretty()
{ "_id" : ObjectId("5acb81b53306361018814849"), "name" : "A", "age" : 1 }
{ "_id" : ObjectId("5acb81b5330636101881484a"), "name" : "B", "age" : 2 }
{ "_id" : ObjectId("5acb81b5330636101881484b"), "name" : "C", "age" : 3 }
db.users
.aggregate([{$addFields:{totalAge:{$sum:"$age"}}}])
.forEach(function (x){db.users.updateOne({name: x.name}, {$set: {totalAge: x.totalAge}})})
Being able to update collections via the aggregation pipeline seems to be quite valuable because of what you have the power to do with aggregation (e.g. what you did in your question, doing calculations based on other fields within the document, etc.). I'm newer to MongoDB so maybe updating collections via aggregation pipeline is "bad practice", but it works and it's been quite valuable for me. I wonder why it isn't more straight-forward to do?
Note: I came up with this method after discovering Nazo's now-deprecated .save() method. Shoutout to Nazo!
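As for why it isn't more straightforward: an added note, not part of the original answer. Starting with MongoDB 4.2 (so newer than the 4.0.6 in the question), update commands accept an aggregation pipeline directly, which removes the need for the forEach loop. A rough sketch reusing the question's expression:
db.documents.updateMany(
  {},
  [ { $set: { received_date: { $cond: [ { $ne: [ "$date", "0" ] },
                                        { $toDate: "$date" },
                                        new Date("1970-01-01") ] } } } ]
)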

Updating a nested list in a MongoDB query works sometimes but fails with a large data set [duplicate]

This question already has answers here:
Updating a Nested Array with MongoDB
(2 answers)
Closed 5 years ago.
Following is a MongoDB document:
{
    "_id" : 2,
    "mem_id" : "M002",
    "email" : "xyz#gmail.com",
    "event_type" : [
        {
            "name" : "MT",
            "count" : 1,
            "language" : [
                {
                    "name" : "English",
                    "count" : 1,
                    "genre" : [
                        {
                            "name" : "Action",
                            "count" : 6
                        },
                        {
                            "name" : "Sci-Fi",
                            "count" : 3
                        }
                    ],
                    "cast" : [
                        {
                            "name" : "Sam Wortington",
                            "count" : 2
                        },
                        {
                            "name" : "Bruce Willis",
                            "count" : 4
                        },
                        {
                            "name" : "Will Smith",
                            "count" : 7
                        },
                        {
                            "name" : "Irfan Khan",
                            "count" : 1
                        }
                    ]
                }
            ]
        }
    ]
}
I'm not able to update fields that are arrays, specifically event_type, language, genre, and cast, because of the nesting. Basically, I want to update all four mentioned fields along with the count field for each subdocument. The update statement should insert a value into the tree if the value is new, or else increment the count for that value.
What can be the query in mongo shell?
Thanks
You are directly hitting one of the current limitations of MongoDB.
The problem is that the engine does not support multiple positional operators in a single update.
See this: Multiple use of the positional `$` operator to update nested arrays
There is an open ticket for this: https://jira.mongodb.org/browse/SERVER-831 (mentioned also there)
You can also read this one on how to change your data model: Updating nested arrays in mongodb
If it is feasible for you, you can do:
db.collection.update({_id:2,"event_type.name":'MT' ,"event_type.language.name":'English'},{$set:{"event_type.0.language.$.count":<number>}})
db.collection.update({_id:2,"event_type.name":'MT' ,"event_type.language.name":'English'},{$set:{"event_type.$.language.0.count":<number>}})
But you cannot do:
db.collection.update({_id:2,"event_type.name":'MT' ,"event_type.language.name":'English'},{$set:{"event_type.$.language.$.count":<number>}})
Let's take it case by case:
1. To update the field name in the event_type array:
db.testnested.update({"event_type.name" : "MT"}, {$set : {"event_type.name" : "GMT"}})
This command will update the name of an object inside the event_type list from MT to GMT:
BEFORE:
db.testnested.find({}, {"event_type.name" : 1})
{ "_id" : 2, "event_type" : [ { "name" : "MT" } ] }
AFTER:
db.testnested.find({}, {"event_type.name" : 1})
{ "_id" : 2, "event_type" : [ { "name" : "GMT" } ] }
2. To update fields inside event_type, such as language and genre, that are inner lists:
There is no direct query for this. You need to read the document, update it using JavaScript (or the language of your choice), and then save() it; a rough sketch is shown below. I don't think there is any other way available up to Mongo 2.4.
For further documentation, you can refer to save().
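A rough sketch of that read/modify/save() approach (my own illustration, using "Action" as the genre being inserted or incremented; adjust the names to your case):
var doc = db.testnested.findOne({ _id : 2 });
doc.event_type.forEach(function (et) {
    if (et.name !== "MT") return;
    et.language.forEach(function (lang) {
        if (lang.name !== "English") return;
        var genre = null;
        lang.genre.forEach(function (g) { if (g.name === "Action") genre = g; });
        if (genre) {
            genre.count += 1;                                // existing value: bump the count
        } else {
            lang.genre.push({ name : "Action", count : 1 }); // new value: insert it
        }
    });
});
db.testnested.save(doc);   // writes the modified document back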
Thanks!

Listing and counting factors of unique MongoDB values over all keys

I'm preparing a descriptive "schema" (quelle horreur) for a MongoDB I've been working with.
I used the excellent variety.js to create a list of all keys and show coverage of each key. However, in cases where the values corresponding to the keys have a small set of values, I'd like to be able to list the entire set as "available values." In R, I'd be thinking of these as the "factors" for the categorical variable, ie, gender : ["M", "F"].
I know I could just use R + RMongo, query each variable, and basically do the same procedure I would to create a histogram, but I'd like to know the proper Mongo.query()/javascript/Map,Reduce way to approach this. I understand the db.collection.aggregate() functions are designed for exactly this.
Before asking this, I referenced:
http://docs.mongodb.org/manual/reference/aggregation/
http://docs.mongodb.org/manual/reference/method/db.collection.distinct/
How to query for distinct results in mongodb with python?
Get a list of all unique tags in mongodb
http://cookbook.mongodb.org/patterns/count_tags/
But can't quite get the pipeline order right. So, for example, if I have documents like these:
{_id : 1, "key1" : "value1", "key2": "value3"}
{_id : 2, "key1" : "value2", "key2": "value3"}
I'd like to return something like:
{"key1" : ["value1", "value2"]}
{"key2" : ["value3"]}
Or better, with counts:
{"key1" : ["value1" : 1, "value2" : 1]}
{"key2" : ["value3" : 2]}
I recognize that one problem with doing this will be keys whose values have a wide range, such as text fields or continuous variables. Ideally, if there were more than x different possible values, it would be nice to truncate, say to no more than 20 unique values. If I find it's actually more, I'd query that variable directly.
Is this something like:
db.collection.aggregate([
  {$group: {
    _id: "$??varname",
    count: {$sum: 1}
  }},
  {$limit: 20}
])
First, how can I reference ??varname for the name of each key?
I saw this link which had 95% of it:
Binning and tabulate (unique/count) in Mongo
with...
input data:
{ "_id" : 1, "age" : 22.34, "gender" : "f" }
{ "_id" : 2, "age" : 23.9, "gender" : "f" }
{ "_id" : 3, "age" : 27.4, "gender" : "f" }
{ "_id" : 4, "age" : 26.9, "gender" : "m" }
{ "_id" : 5, "age" : 26, "gender" : "m" }
This script:
db.collection.aggregate(
{$project: {gender:1}},
{$group: {
_id: "$gender",
count: {$sum: 1}
}})
Produces:
{"result" :
[
{"_id" : "m", "count" : 2},
{"_id" : "f", "count" : 3}
],
"ok" : 1
}
But what I don't understand is how could I do this generically for an unknown number/name of keys with a potentially large number of return values? This sample knows the key name is gender, and that the response set will be small (2 values).
If you already ran a script that outputs the names of all keys in the collection, you can generate your aggregation framework pipeline dynamically. What that means is either extending the variety.js type script or just writing your own.
Here is what it might look like in JS if passed an array called "keys" which has several non-"_id" named fields (I'm assuming top level fields and that you don't care about arrays, embedded documents, etc).
keys = ["key1", "key2"];
group = { "$group" : { "_id" : null } } ;
keys.forEach( function(f) {
    group["$group"][f + "List"] = { "$addToSet" : "$" + f };
} );
db.collection.aggregate(group);
{
"result" : [
{
"_id" : null,
"key1List" : [
"value2",
"value1"
],
"key2List" : [
"value3"
]
}
],
"ok" : 1
}
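If you also want the per-value counts (and the cap of roughly 20 values mentioned in the question), one option, my own sketch rather than part of the answer above, is to run one small $group per key instead of a single combined one. This assumes a pre-2.6 shell where aggregate() returns a document with a result array:
keys = ["key1", "key2"];
keys.forEach( function(f) {
    var res = db.collection.aggregate(
        { "$group" : { "_id" : "$" + f, "count" : { "$sum" : 1 } } },
        { "$sort"  : { "count" : -1 } },    // most common values first
        { "$limit" : 20 }                   // truncate long value lists
    );
    print(f + ":");
    printjson(res.result);
} );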

mongodb change $group output format

I have the following document structure
{
    "timestamp" : 13512493603565120,
    "value" : 1,
    "y" : 42,
    "M" : 513,
    "w" : 2234,
    "d" : 15639,
    "S" : 46918,
    "h" : 375347,
    "m" : 22520822,
    "s" : 1351249360,
    "_id" : ObjectId("508aa61100b5457c04000001"),
    "__v" : 0
}
I have a mongodb aggregate as follows to sum up values grouping by field y:
aggregate({
$group : {_id : "$y", value:{$sum:4}}
})
This will give me
[
{
"_id": 42,
"value": 16
}
]
What I want now is to format this output so that it looks like this:
[
[13512493603565100, 2],
[13512493605167900, 1]
]
ie:
[
[<timestamp>,<sum of value grouped by field y>],
[<timestamp>,<sum of value grouped by field y>]
]
I looked at $project but I still can't figure out how I can use it to get the desired output.
Not possible as of now, apparently (using either mapReduce or aggregate).
Hope this will be added soon.
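Added note, not part of the original answer: on newer servers (3.2 and later) you can get close with an array literal in $project; turning the documents into a bare array of arrays still has to happen client-side. Something like:
db.collection.aggregate([
    { $group   : { _id : "$y", value : { $sum : 4 }, ts : { $first : "$timestamp" } } },
    { $project : { _id : 0, pair : [ "$ts", "$value" ] } }   // => { "pair" : [ <timestamp>, <sum> ] }
])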

Map reduce in mongodb

I have mongo documents in this format.
{"_id" : 1,"Summary" : {...},"Examples" : [{"_id" : 353,"CategoryId" : 4},{"_id" : 239,"CategoryId" : 28}, ... ]}
{"_id" : 2,"Summary" : {...},"Examples" : [{"_id" : 312,"CategoryId" : 2},{"_id" : 121,"CategoryId" : 12}, ... ]}
How can I map/reduce them to get a hash like:
{ [ result[categoryId] : count_of_examples , .....] }
I.e. count of examples of each category.
I have 30 categories at all, all specified in Categories collection.
If you can use 2.1 (the dev version of the upcoming 2.2 release), then you can use the Aggregation Framework, and it would look something like this:
db.collection.aggregate( [
{$project:{"CatId":"$Examples.CategoryId","_id":0}},
{$unwind:"$CatId"},
{$group:{_id:"$CatId","num":{$sum:1} } },
{$project:{CategoryId:"$_id",NumberOfExamples:"$num",_id:0 }}
] );
The first step projects the subfield of Examples (CategoryId) into a top-level field of the document (not necessary, but it helps with readability). Then we unwind the array of examples, which creates a separate document for each array value of CatId. Next we do a "group by" and count them (I assume each instance of CategoryId is one example, right?). Last, we use projection again to relabel the fields and make the result look like this:
"result" : [
{
"CategoryId" : 12,
"NumberOfExamples" : 1
},
{
"CategoryId" : 2,
"NumberOfExamples" : 1
},
{
"CategoryId" : 28,
"NumberOfExamples" : 1
},
{
"CategoryId" : 4,
"NumberOfExamples" : 1
}
],
"ok" : 1