MongoDB GROUP BY and COUNT unknown keys

I am trying to GROUP BY and COUNT each key in each Mongo document but the keys may differ from document to document. I know how to group and count by explicitly calling each key like this:
db.test.aggregate([{"$group" : {_id:"$vcenter", count:{$sum:1}}}])
But how do I iterate through each key of each document without having to name the keys? I'm thinking of a mapReduce function?
Here's a sample document:
"key1" : "vmx",
"key2" : "type",
"key3" : "cpu-idle",
and I'm looking for how many records per key like:
"Key1" : 1564
"Key2" : 1565
"Key3" : 458

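Yes, map-reduce is the only approach I can think of, since $group in the aggregation framework needs the grouping key named in its mandatory _id expression. So I'd write:
map
function map(){for(var prop in this){emit(prop,1)}}
reduce
// Sum the values instead of returning values.length: MongoDB may call reduce
// again on partial results for the same key, and then the length would be wrong.
function reduce(key,values){return Array.sum(values);}
run command
db.inputCollectionName.mapReduce(map,reduce,{out:"outputCollectionName"})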
You should then find in your output collection something like the following (plus an entry for _id, since every document has that field):
{ "_id" : "key1", "value" : 1564 }
{ "_id" : "key2", "value" : 1565 }
{ "_id" : "key3", "value" : 458 }
Is that good for you?
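MongoDB may call a reduce function more than once for the same key, feeding earlier partial results back in, so a reduce that counts keys has to return the sum of its values rather than their number. Here is a rough sketch of that effect in plain JavaScript. It runs in memory only and does not talk to MongoDB; runMapReduce, mapDoc, reduceSum, reduceLength and the sample documents are made up for the illustration. The keyCountPipeline constant is an aggregation alternative that should work on MongoDB 3.4.4 or later, where $objectToArray is available.

```javascript
// In-memory stand-in for map-reduce; it does not talk to MongoDB.
const docs = [
  { _id: 1, key1: "vmx", key2: "type" },
  { _id: 2, key1: "vmx" },
  { _id: 3, key1: "vmx", key3: "cpu-idle" },
];

// Emit (fieldName, 1) for every top-level field, like the map function.
function mapDoc(doc) {
  return Object.keys(doc).map((key) => [key, 1]);
}

const reduceSum = (key, values) => values.reduce((a, b) => a + b, 0);
const reduceLength = (key, values) => values.length;

// Reduce in chunks of chunkSize, then reduce the partial results again,
// imitating the way a server may re-reduce. chunkSize must be at least 2.
function runMapReduce(input, reduce, chunkSize) {
  if (chunkSize < 2) throw new Error("chunkSize must be at least 2");
  const emitted = new Map();
  for (const doc of input) {
    for (const [key, value] of mapDoc(doc)) {
      if (!emitted.has(key)) emitted.set(key, []);
      emitted.get(key).push(value);
    }
  }
  const result = {};
  for (const [key, values] of emitted) {
    let current = values;
    while (current.length > 1) {
      const partials = [];
      for (let i = 0; i < current.length; i += chunkSize) {
        partials.push(reduce(key, current.slice(i, i + chunkSize)));
      }
      current = partials;
    }
    result[key] = current[0];
  }
  return result;
}

console.log(runMapReduce(docs, reduceSum, 2));    // key1 is counted 3 times
console.log(runMapReduce(docs, reduceLength, 2)); // key1 comes out as 2

// Aggregation alternative (assumes MongoDB 3.4.4+), to be passed to
// db.inputCollectionName.aggregate(keyCountPipeline) in the shell.
const keyCountPipeline = [
  { $project: { kv: { $objectToArray: "$$ROOT" } } },
  { $unwind: "$kv" },
  { $group: { _id: "$kv.k", count: { $sum: 1 } } },
];
```

With the summing reduce the chunk size does not matter; with the length-based one the result depends on how the values happen to be batched.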

Related

MongoDB cannot sort the _id?

I have a simple order table in mongoDB
{
"_id" : NumberInt(2),
"bar" : "Maggie Choos Bar"
},
{
"_id" : NumberInt(3),
"bar" : "Corona Bar"
}
I want to find the BIGGEST "_id" number in the table
db.getCollection("order").find({}).sort({"$_id":-1}).limit(1);
But no matter if I sort 1 or -1 I keep getting the result with _id : 2
Any ideas?
The field name prefixed with a dollar sign, like $_id, is for references in an aggregation pipeline. For a sort document, use just the field name:
db.getCollection("order").find({}).sort({"_id":-1}).limit(1);
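To make the difference concrete, here is a small in-memory sketch in plain JavaScript. It is not MongoDB's sort implementation; sortDocs is a hypothetical helper that, like a sort document, treats each key as a literal field name. Under that assumption "$_id" names a field no document has, so the order never changes, which is consistent with the behaviour described in the question. Newer server versions may reject such a key outright instead.

```javascript
const orders = [
  { _id: 2, bar: "Maggie Choos Bar" },
  { _id: 3, bar: "Corona Bar" },
];

// Sort a copy of docs by one field; spec looks like { fieldName: 1 } or
// { fieldName: -1 }. Keys are literal field names, so "$_id" matches nothing.
function sortDocs(docs, spec) {
  const [[field, direction]] = Object.entries(spec);
  return [...docs].sort((a, b) => {
    const x = a[field];
    const y = b[field];
    if (x === y) return 0; // also covers a field missing from both documents
    return (x < y ? -1 : 1) * direction;
  });
}

console.log(sortDocs(orders, { "$_id": -1 })[0]._id); // 2: order unchanged
console.log(sortDocs(orders, { "_id": -1 })[0]._id);  // 3: the biggest _id
```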

updating mongo documents based in map value and remove that value

I am currently working in Go and have a Mongo database (connected via gopkg.in/mgo.v2). Right now I have a data structure similar to:
{
"_id" : "some_id_bson",
"field1" : "value1",
"field2" : {
"key1" : "v1",
"key2" : "v2",
"key3" : "v3",
"key4" : "v4"
}
}
So, basically, what I need to do (as an example) is update all the records in the database that contain key1 and remove that key from the JSON, so the result would be something like:
{
"_id" : "some_id_bson",
"field1" : "value1",
"field2" : {
"key2" : "v2",
"key3" : "v3",
"key4" : "v4"
}
}
What can I use to achieve this? I have been searching and cannot find something oriented to maps (field2 is a map). Thanks in advance
It seems you're asking how to remove a property from a nested object in a particular document, which appears to be answered here: How to remove property of nested object from MongoDB document?.
From the main answer there:
Use $unset as below:
db.collectionName.update({},{"$unset":{"values.727920":""}})
EDIT: For updating multiple documents, use update options like:
db.collectionName.update({},{"$unset":{"values.727920":""}},{"multi":true})
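Try using $exists and $unset. Note that the $exists condition goes under the field name, not around it:
query := bson.M{"field2.key1": bson.M{"$exists": true}}
update := bson.M{"$unset": bson.M{"field2.key1": ""}}
if _, err := collection.UpdateAll(query, update); err != nil {
// handle the error
}
This should find all documents containing field2.key1 and remove that key from each of them.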
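For reference, here is roughly what the same filter and update look like as shell-style JavaScript objects, together with a small in-memory emulation of what $unset does to one document. The applyUnset helper is hypothetical and only handles plain nested objects; it is a sketch of the effect, not of how the server implements it.

```javascript
// Filter and update as they could be passed to
// db.collectionName.updateMany(filter, update) in the shell (MongoDB 3.2+).
const filter = { "field2.key1": { $exists: true } };
const update = { $unset: { "field2.key1": "" } };

// Hypothetical local emulation of $unset on a dotted path.
function applyUnset(doc, updateSpec) {
  const copy = JSON.parse(JSON.stringify(doc)); // leave the input untouched
  for (const path of Object.keys(updateSpec.$unset)) {
    const parts = path.split(".");
    let target = copy;
    for (const part of parts.slice(0, -1)) {
      if (target === null || typeof target !== "object") {
        target = undefined;
        break;
      }
      target = target[part];
    }
    if (target && typeof target === "object") {
      delete target[parts[parts.length - 1]];
    }
  }
  return copy;
}

const before = {
  _id: "some_id_bson",
  field1: "value1",
  field2: { key1: "v1", key2: "v2", key3: "v3", key4: "v4" },
};
console.log(applyUnset(before, update).field2); // key1 is gone, the rest stay
```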

MongoDB Group querying for Embeded Document

I have a mongo document which has structure like
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-26",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-26T08:08:38.716Z"),
"value" : 98.5
},
{
"dateTime" : ISODate("2014-11-26T08:18:38.716Z"),
"value" : 95.5
},
{
"dateTime" : ISODate("2014-11-26T08:28:38.663Z"),
"value" : 90.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-26T08:08:38.716Z"),
"from" : ISODate("2014-11-26T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.776Z")
}
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-25",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-25T08:08:38.716Z"),
"value" : 198.5
},
{
"dateTime" : ISODate("2014-11-25T08:18:38.716Z"),
"value" : 195.5
},
{
"dateTime" : ISODate("2014-11-25T08:28:38.716Z"),
"value" : 190.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-25T08:08:38.716Z"),
"from" : ISODate("2014-11-25T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.893Z")
}
The query that I want to run against this document structure should do the following:
find the documents for a particular user id,
unwind the embedded array,
group the documents by _id, while
summing the items.value of the embedded array and
getting the minimum of the items.dateTime of the embedded array.
Note: I want the sum and the min as an object, i.e. { value : sum, dateTime : min of items.dateTime }, inside an array of items.
Can this be achieved in a single aggregation call using $push or some other technique?
When you group over a particular _id, and apply aggregation operators such as $min and $sum, there exists only one record per group(_id), that holds the sum and the minimum date for that group. So there is no way to obtain a different sum and a different minimum date for the same _id, which also logically makes no sense.
What you would want to do is:
db.collection.aggregate([
{$match:{"userId":"THIS_IS_A_DHP_USER_ID"}},
{$unwind:"$items"},
{$group:{"_id":"$_id",
"values":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}}
])
But if you do not query for a particular userId, you get multiple groups, each with its own sum and min date. Then it makes sense to accumulate all these results together in an array using the $push operator.
db.collection.aggregate([
{$unwind:"$items"},
{$group:{"_id":"$_id",
"result":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}},
{$group:{"_id":null,"result":{$push:{"value":"$result",
"dateTime":"$dateTime",
"id":"$_id"}}}},
{$project:{"_id":0,"result":1}}
])
You could also use the following aggregation, which may work for you:
db.collectionName.aggregate([
// filter first so that $unwind only processes this user's documents
{"$match":{"userId":"THIS_IS_A_DHP_USER_ID"}},
{"$unwind":"$items"},
{"$group":{"_id":"$_id","sum":{"$sum":"$items.value"},
"minDate":{"$min":"$items.dateTime"}}}
])
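To get the { value, dateTime } object inside an items array, as the question asks, one option is a final $project stage that builds the array. The sketch below assumes MongoDB 3.2 or later, where $project accepts array literals. The summarize function is a hypothetical in-memory emulation of the same steps in plain JavaScript, included only so the expected shape can be checked without a server.

```javascript
// Pipeline sketch for db.collection.aggregate(pipeline); assumes MongoDB 3.2+.
const pipeline = [
  { $match: { userId: "THIS_IS_A_DHP_USER_ID" } },
  { $unwind: "$items" },
  { $group: {
      _id: "$_id",
      value: { $sum: "$items.value" },
      dateTime: { $min: "$items.dateTime" },
  } },
  { $project: { items: [{ value: "$value", dateTime: "$dateTime" }] } },
];

// Hypothetical in-memory emulation of the pipeline above.
function summarize(docs, userId) {
  return docs
    .filter((d) => d.userId === userId)
    .filter((d) => d.items.length > 0) // $unwind drops empty arrays
    .map((d) => ({
      _id: d._id,
      items: [{
        value: d.items.reduce((sum, item) => sum + item.value, 0),
        dateTime: new Date(Math.min(...d.items.map((i) => i.dateTime.getTime()))),
      }],
    }));
}

const sample = [{
  _id: "THIS_IS_A_DHP_USER_ID+2014-11-26",
  userId: "THIS_IS_A_DHP_USER_ID",
  items: [
    { dateTime: new Date("2014-11-26T08:08:38.716Z"), value: 98.5 },
    { dateTime: new Date("2014-11-26T08:18:38.716Z"), value: 95.5 },
    { dateTime: new Date("2014-11-26T08:28:38.663Z"), value: 90.5 },
  ],
}];
console.log(JSON.stringify(summarize(sample, "THIS_IS_A_DHP_USER_ID")));
```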

mongodb: remove subdocument with where clause that includes document & subdocument

Here is my collection (workers):
"type" : "Manager",
"employees" : [{
"name" : "bob",
"id" : 101
},{
"name" : "phil",
"id" : 102
},{
"name" : "bob",
"id" : 103
}]
First: this is NOT an array so $pullAll will not work or other array commands. All I want to do is: (1) search the collection for id 101 in ALL subdocuments with type Manager. (2) If 101 exists in a "Manager" subdocument, I want to remove item 103.
I have been poring over the interwebs for two days on this issue and cannot figure it out.
I've tried this (and many other variations):
db.workers.update( {"type":"Manager","employees.id":101},{$pull : {"employees.id" : {"id" : 103}}},false,true)
The syntax of your $pull object is off. Try this instead:
db.workers.update({"type":"Manager","employees.id":101},
{$pull : {"employees" : {"id" : 103}}},false,true)
To confirm the removal, run the following query; it should return no documents:
db.workers.find({
type: "Manager",
$and: [{'employees.id': 101}, {'employees.id': 103}]
})
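As a rough illustration of what the corrected update does, here is an in-memory sketch in plain JavaScript. The filter and update objects mirror the ones in the answer; applyPull is a hypothetical helper that emulates the match and the $pull for a single document and is not MongoDB's implementation.

```javascript
// Same filter and update as in the answer. In a newer shell they could be
// passed to db.workers.updateMany(filter, update).
const filter = { type: "Manager", "employees.id": 101 };
const update = { $pull: { employees: { id: 103 } } };

// Hypothetical emulation: if the document matches the filter, drop every
// employee whose id equals the one named in $pull.
function applyPull(doc) {
  const matches =
    doc.type === filter.type &&
    doc.employees.some((e) => e.id === filter["employees.id"]);
  if (!matches) return doc;
  const unwanted = update.$pull.employees.id;
  return { ...doc, employees: doc.employees.filter((e) => e.id !== unwanted) };
}

const manager = {
  type: "Manager",
  employees: [
    { name: "bob", id: 101 },
    { name: "phil", id: 102 },
    { name: "bob", id: 103 },
  ],
};
console.log(applyPull(manager).employees.map((e) => e.id)); // 103 is removed
```

The key point is that $pull names the array field ("employees") and gives a condition on its elements, rather than naming a field inside the elements.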

Listing, counting factors of unique Mongo DB values over all keys

I'm preparing a descriptive "schema" (quelle horreur) for a MongoDB I've been working with.
I used the excellent variety.js to create a list of all keys and show coverage of each key. However, in cases where the values corresponding to the keys have a small set of values, I'd like to be able to list the entire set as "available values." In R, I'd be thinking of these as the "factors" for the categorical variable, ie, gender : ["M", "F"].
I know I could just use R + RMongo, query each variable, and basically do the same procedure I would to create a histogram, but I'd like to know the proper Mongo.query()/javascript/Map,Reduce way to approach this. I understand the db.collection.aggregate() functions are designed for exactly this.
Before asking this, I referenced:
http://docs.mongodb.org/manual/reference/aggregation/
http://docs.mongodb.org/manual/reference/method/db.collection.distinct/
How to query for distinct results in mongodb with python?
Get a list of all unique tags in mongodb
http://cookbook.mongodb.org/patterns/count_tags/
But can't quite get the pipeline order right. So, for example, if I have documents like these:
{_id : 1, "key1" : "value1", "key2": "value3"}
{_id : 2, "key1" : "value2", "key2": "value3"}
I'd like to return something like:
{"key1" : ["value1", "value2"]}
{"key2" : ["value3"]}
Or better, with counts:
{"key1" : ["value1" : 1, "value2" : 1]}
{"key2" : ["value3" : 2]}
I recognize one problem with doing this will be any values that have a wide range of different values---so, text fields, or continuous variables. Ideally, if there were more than x different possible values, it would be nice to truncate, say to no more than 20 unique values. If I find it's actually more, I'd query that variable directly.
Is this something like:
db.collection.aggregate(
{$limit: 20,
$group: {
_id: "$??varname",
count: {$sum: 1}
}})
First, how can I reference ??varname? for the name of each key?
I saw this link which had 95% of it:
Binning and tabulate (unique/count) in Mongo
with...
input data:
{ "_id" : 1, "age" : 22.34, "gender" : "f" }
{ "_id" : 2, "age" : 23.9, "gender" : "f" }
{ "_id" : 3, "age" : 27.4, "gender" : "f" }
{ "_id" : 4, "age" : 26.9, "gender" : "m" }
{ "_id" : 5, "age" : 26, "gender" : "m" }
This script:
db.collection.aggregate(
{$project: {gender:1}},
{$group: {
_id: "$gender",
count: {$sum: 1}
}})
Produces:
{"result" :
[
{"_id" : "m", "count" : 2},
{"_id" : "f", "count" : 3}
],
"ok" : 1
}
But what I don't understand is how could I do this generically for an unknown number/name of keys with a potentially large number of return values? This sample knows the key name is gender, and that the response set will be small (2 values).
If you already ran a script that outputs the names of all keys in the collection, you can generate your aggregation framework pipeline dynamically. What that means is either extending the variety.js type script or just writing your own.
Here is what it might look like in JS if passed an array called "keys" which has several non-"_id" named fields (I'm assuming top level fields and that you don't care about arrays, embedded documents, etc).
var keys = ["key1", "key2"];
var group = { "$group" : { "_id" : null } };
keys.forEach( function(f) {
group["$group"][f+"List"] = { "$addToSet" : "$" + f }; } );
db.collection.aggregate(group);
For the two sample documents in the question, this returns:
{
"result" : [
{
"_id" : null,
"key1List" : [
"value2",
"value1"
],
"key2List" : [
"value3"
]
}
],
"ok" : 1
}
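The same idea of generating the pipeline from a list of keys can be extended to return counts per value and to cap the number of distinct values per key. The sketch below assumes MongoDB 3.4 or later, where $facet and $sortByCount are available; buildValueCountPipeline and its parameters are hypothetical names, and the function only builds the pipeline object, which would then be passed to db.collection.aggregate().

```javascript
// Build one $facet stage with a sub-pipeline per key. Each sub-pipeline counts
// the distinct values of that key and keeps at most maxValues of them.
function buildValueCountPipeline(keys, maxValues) {
  const facets = {};
  for (const key of keys) {
    facets[key] = [
      { $match: { [key]: { $exists: true } } }, // skip documents lacking the key
      { $sortByCount: "$" + key },              // yields { _id: value, count: n }
      { $limit: maxValues },
    ];
  }
  return [{ $facet: facets }];
}

const valueCountPipeline = buildValueCountPipeline(["key1", "key2"], 20);
console.log(JSON.stringify(valueCountPipeline, null, 2));
```

Since $sortByCount orders by descending count, the limit keeps the most frequent values. Keys with more than 20 distinct values come back truncated and can then be queried individually. The whole $facet result is a single document, so this is only a reasonable fit when the number of keys and kept values is small.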