MongoDB distinct on a key in a key-value pair situation

I have the document structure below in MongoDB, and I want to get the distinct keys from the field customData.
So, for the documents below, I want my result to be: key1, key2, key3, key4
Doing
db.coll.distinct("customData")
returns the values and not the keys.
{
  "_id": ObjectId("56c4da4f681ec51d32a4053d"),
  "accountUnique": 7356464,
  "customData": {
    "key1": 1,
    "key2": 2
  }
}
{
  "_id": ObjectId("56c4da4f681ec51d32a4054d"),
  "accountUnique": 7356464,
  "customData": {
    "key3": 1,
    "key4": 2
  }
}

This is possible with Map-Reduce. Because the subdocument keys are dynamic, the distinct method will not return them.
Running the following mapReduce operation populates a separate collection with all the keys as the _id values:
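var myMapReduce = db.runCommand({
  "mapreduce": "coll",
  "map": function() {
    // emit every key of the customData subdocument
    for (var key in this.customData) { emit(key, null); }
  },
  // the value is irrelevant; only the distinct keys (the output _id values) matter
  "reduce": function(key, values) { return null; },
  "out": "coll_keys"
})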
To get a list of all the dynamic keys, run distinct on the resulting collection:
db[myMapReduce.result].distinct("_id")
For the sample documents, this gives:
["key1", "key2", "key3", "key4"]
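To see why this works, here is a plain JavaScript simulation of the map step (runs in Node.js, no MongoDB server; it is a sketch of the idea, not MongoDB's implementation). A Map stands in for the coll_keys output collection: every emitted key becomes one _id, so repeated keys collapse exactly as they do in the real output.

```javascript
// Plain JavaScript simulation of the mapReduce above (no MongoDB server).
// The sample documents are the two from the question (ObjectIds as strings).
const docs = [
  { _id: "56c4da4f681ec51d32a4053d", accountUnique: 7356464, customData: { key1: 1, key2: 2 } },
  { _id: "56c4da4f681ec51d32a4054d", accountUnique: 7356464, customData: { key3: 1, key4: 2 } },
];

function collectKeys(documents) {
  const output = new Map(); // stands in for the "coll_keys" collection
  const emit = (key, value) => output.set(key, value); // same key => same _id
  for (const doc of documents) {
    // Same loop as the map function; `doc` plays the role of `this`.
    for (const key in doc.customData) {
      emit(key, null);
    }
  }
  // Listing the Map keys is the analogue of distinct("_id") on the output.
  return [...output.keys()];
}

console.log(collectKeys(docs)); // [ 'key1', 'key2', 'key3', 'key4' ]
```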

Related

MongoDB find field with dot in collection with 1M documents

I am trying to write a query that finds all the fields/keys in a collection with 1M records. The fields are nested (a field under a field under a field). But instead of output I get the error below. Any suggestions on how to overcome this?
mr = db.runCommand({
  "mapreduce": "MyCollectionName",
  "map": function() {
    // recursively emit the name of every field, including nested ones
    var f = function() {
      for (var key in this) {
        if (this.hasOwnProperty(key)) {
          emit(key, null);
          // typeof null is "object", so skip nulls before recursing
          if (this[key] !== null && typeof this[key] == 'object') {
            f.call(this[key]);
          }
        }
      }
    };
    f.call(this);
  },
  "reduce": function(key, stuff) { return null; },
  "out": "MyCollectionName" + "_keys"
});
print(db[mr.result].distinct("_id"));
I get this error:
MongoServerError: distinct too big, 16mb cap
I am fairly new to MongoDB, so forgive my ignorance.
Once you have created the new collection containing all the field names, note that you are running distinct on its "_id" field. "_id" values are unique by definition, so you can just run db.{collectionName}.find({}) instead of distinct; find returns a cursor, so the result is not limited by the maximum document size. Also, the MongoDB docs explicitly state that distinct results must not exceed the maximum BSON size (16 MB):
Results must not be larger than the maximum BSON size. If your results exceed the maximum BSON size, use the aggregation pipeline to retrieve distinct values using the $group operator, as described in Retrieve Distinct Values with the Aggregation Pipeline.
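Alternatively, you can use the aggregation pipeline, where "${fieldName}" stands for "$" followed by the field name (aggregate also returns a cursor, so the overall result is not capped at 16 MB):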
db.{collectionName}.aggregate( [ { $group : { _id : "${fieldName}" } } ] )
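The recursive key walk in the question's map function can be sketched in plain JavaScript (Node.js, no MongoDB server) to check its behavior, including the null guard. A Set stands in for the "_keys" output collection; listing it corresponds to iterating db[mr.result].find({}), where each name already appears only once. The sample documents are hypothetical.

```javascript
// Plain JavaScript sketch of the question's recursive map function, run on
// in-memory objects instead of a MongoDB collection.
function collectFieldNames(documents) {
  const names = new Set(); // plays the role of the "_keys" output collection

  const walk = function (obj) {
    for (const key in obj) {
      if (Object.prototype.hasOwnProperty.call(obj, key)) {
        names.add(key); // equivalent of emit(key, null)
        const value = obj[key];
        // typeof null === "object", so guard against null before recursing.
        // Arrays are objects too, so their indices are collected as well,
        // just like in the original map function.
        if (value !== null && typeof value === "object") {
          walk(value);
        }
      }
    }
  };

  documents.forEach((doc) => walk(doc));
  return [...names]; // each name once, like the _id values of the output
}

// Hypothetical sample documents.
const sample = [
  { _id: 1, a: { b: { c: 1 } }, note: null },
  { _id: 2, a: { d: 2 } },
];
console.log(collectFieldNames(sample)); // [ '_id', 'a', 'b', 'c', 'note', 'd' ]
```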

In MongoDB map function, emit full document without assigning (key, value) pairs to a variable

var map = function() {
var values = {d_sno : this.d_sno, type : this.type};
emit(this._id, values);
};
In the map function above, I assign (key, value) pairs to a variable and emit that variable. I want to emit the whole document without assigning (key, value) pairs to a variable.
You can emit the whole document like this:
var map = function() {
emit(this._id, this);
};
By emitting the whole document as a value, the emitted document will look like this:
{
"_id" : ObjectId("53a6bd394aaee8df24b45dc5"),
"value" : {
"_id" : ObjectId("53a6bd394aaee8df24b45dc5"),
"d_sno" : "foo",
"type" : "bar",
/* ... other fields */
}
}
The problem with this approach is that the document's _id appears both in the value and in the key (because you're emitting _id as the key).
To get rid of the _id in the emitted value you can use this approach:
var map = function() {
var key = this._id;
var value = this;
delete value._id;
emit(key, value);
};
But you should be careful when emitting whole documents because a single emit can only hold half of MongoDB’s maximum BSON document size (which is currently 16 MB).
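The delete-based map function above removes _id from `this` itself. A variant that copies every other field into a new object leaves the document untouched; the sketch below shows it with a stand-in `emit` so it runs in plain Node.js without MongoDB. The `emit` function here is a local simulation, not MongoDB's.

```javascript
// Plain JavaScript simulation (no MongoDB server): this `emit` stand-in just
// records the pairs that MongoDB's emit would pass on to the reduce phase.
const emitted = [];
function emit(key, value) {
  emitted.push({ _id: key, value: value });
}

// Variant of the map function that copies every field except _id into a new
// object instead of deleting _id from `this`.
const map = function () {
  const value = {};
  for (const key in this) {
    if (key !== "_id") {
      value[key] = this[key];
    }
  }
  emit(this._id, value);
};

// MongoDB invokes map with `this` bound to each document; call() mimics that.
const doc = { _id: "53a6bd394aaee8df24b45dc5", d_sno: "foo", type: "bar" };
map.call(doc);

console.log(emitted[0].value); // { d_sno: 'foo', type: 'bar' }
console.log("_id" in doc); // true: the original document keeps its _id
```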

Can MongoDB aggregate "top x" results in this document schema?

{
"_id" : "user1_20130822",
"metadata" : {
"date" : ISODate("2013-08-22T00:00:00.000Z"),
"username" : "user1"
},
"tags" : {
"abc" : 19,
"123" : 2,
"bca" : 64,
"xyz" : 14,
"zyx" : 12,
"321" : 7
}
}
Given the schema example above, is there a way to query this to retrieve the top "x" tags: E.g., Top 3 "tags" sorted descending?
Is this possible in a single document? e.g., top tags for a user on a given day
What if I have multiple documents that need to be combined before getting the top? E.g., top tags for a user in a given month.
I know this can be done by using a "document per user per tag per day" or by making "tags" an array, but I'd like to be able to do this as above, as it makes in place $inc's easier (many more of these happening than reads).
Or do I need to return back the whole document, and defer to the client on the sorting/limiting?
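When you use object keys as tag names, you make this kind of reporting very difficult. At the time of this answer, the aggregation framework had no $unwind equivalent for objects (MongoDB 3.4.4 later added $objectToArray, which turns an object into an array that can be unwound). But there is always MapReduce.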
Have your map function emit one document for each key/value pair in the tags subdocument. It should look something like this:
var mapFunction = function() {
for (var key in this.tags) {
emit(key, this.tags[key]);
}
}
Your reduce-function would then sum up the values emitted for the same key.
var reduceFunction = function(key, values) {
var sum = 0;
for (var i = 0; i < values.length; i++) {
sum += values[i];
}
return sum;
}
The complete MapReduce command would look something like this:
db.runCommand(
{
mapReduce: "yourcollection", // the collection where your data is stored
query: { _id : "user1_20130822" }, // or however you want to limit the results
map: mapFunction,
reduce: reduceFunction,
out: { inline: 1 } // return the output directly instead of writing it to a collection
}
)
This returns all tags in unpredictable order. MapReduce has sort and limit options, but they apply to the input documents (and the sort key must be in an existing index of the original collection), so you can't use them on a computed field. To get only the top 3, you would have to sort the results at the application level. If you insist on doing the sorting and limiting in the database, define an output collection to store the mapReduce results in (with the out option set to out: { replace: "temporaryCollectionName" }) and then query that collection with sort and limit afterwards.
Keep in mind that when you use an intermediate collection, you must make sure that no two users run MapReduce jobs with different queries into the same collection. If multiple users want to view your top-3 list, you could let them query the output collection and run the MapReduce in the background at regular intervals.
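The map, reduce, and application-level sort/limit steps described above can be checked with a plain JavaScript simulation (Node.js, no MongoDB server). It mimics MongoDB grouping emitted values by key, reuses the summing logic of reduceFunction, and then sorts descending and slices in the application. The second sample document is hypothetical, added to show values for the same tag being summed across days.

```javascript
// Plain JavaScript simulation (no MongoDB server) of mapFunction/reduceFunction
// followed by an application-level sort and limit.
const docs = [
  { _id: "user1_20130822", tags: { abc: 19, "123": 2, bca: 64, xyz: 14, zyx: 12, "321": 7 } },
  { _id: "user1_20130823", tags: { abc: 50, xyz: 1 } }, // hypothetical second day
];

function topTags(documents, x) {
  // Map phase: emit(key, value) per tag, grouped by key as MongoDB does.
  const grouped = new Map();
  for (const doc of documents) {
    for (const key in doc.tags) {
      if (!grouped.has(key)) grouped.set(key, []);
      grouped.get(key).push(doc.tags[key]);
    }
  }
  // Reduce phase: same summing logic as reduceFunction.
  const results = [];
  for (const [key, values] of grouped) {
    let sum = 0;
    for (let i = 0; i < values.length; i++) sum += values[i];
    results.push({ _id: key, value: sum });
  }
  // Application level: sort descending by total, keep the top x.
  results.sort((a, b) => b.value - a.value);
  return results.slice(0, x);
}

console.log(topTags(docs, 3).map((r) => r._id)); // [ 'abc', 'bca', 'xyz' ]
```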

Join through MapReduce

I have one collection, test1, in which student_id is the primary key:
test1: { student_id: "xxxxx" }
I have another collection, class, in which student_id is inside an array:
class: { "class": "I", students: [ { "student_id": "xxxx" } ] }
My problem is that I want to join these two collections on student_id.
I am using map-reduce with out set to "merge", but it doesn't work.
My MR query is as follows:
db.runCommand({ mapreduce: "test1",
map : function Map() {
emit(this._id,this);
},
reduce : function Reduce(key, values) {
return values;
},
out : { merge: "testmerge" }
});
db.runCommand({ mapreduce: "class",
map : function Map() {
emit(this._id,this);
},
reduce : function Reduce(key, values) {
return values;
},
out : { merge: "testmerge" }
});
But it inserts two separate documents instead of one joined document.
Can someone guide me on this? I am very new to MR.
As in the example, I want to get the details of all students from the "test1" collection who are studying in class "I".
Your requirement seems to be:
As in the example, I want to get the details of all students from the "test1" collection who are studying in class "I".
To do that, store the classes a student is in with the student:
{
  student_id: "xxxxx",
  classes: ["I"]
}
Then you can just ask for all the students information with:
db.students.find( { classes: "I" } );
That avoids slow and complex map-reduce jobs entirely. In general, you should avoid Map/Reduce: it runs JavaScript for every document, can use an index only for its initial query filter, and in older versions could not run concurrently. You also need to understand that MongoDB operations work on one collection at a time. At the time of this answer there was no such thing as a join (MongoDB 3.2 later added the $lookup aggregation stage), and trying to emulate one with Map/Reduce is a bad idea. You can, however, do it with two queries:
// find all students in class "I" (students is an array of { student_id } objects):
var ids = [];
db.classes.find( { class: "I" } ).forEach(function(e) {
  e.students.forEach(function(s) { ids.push(s.student_id); });
});
// then with the result, find all of those students information:
db.students.find( { student_id: { $in: ids } } );
But I would strongly recommend you redesign your schema and store the classes with each student. As a general hint, in MongoDB you would store the relation between documents on the other side as compared to a relational database.
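The two-query approach above can be sketched in plain JavaScript (Node.js, no MongoDB server), with arrays standing in for the classes and students collections. The data, including the names, is hypothetical, and it assumes class documents hold students as an array of { student_id } objects as in the question.

```javascript
// Plain JavaScript simulation (no MongoDB server) of the two-query "join".
// Hypothetical data; arrays stand in for the two collections.
const classes = [
  { class: "I", students: [{ student_id: "s1" }, { student_id: "s2" }] },
  { class: "II", students: [{ student_id: "s3" }] },
];
const students = [
  { student_id: "s1", name: "Ann" },
  { student_id: "s2", name: "Bob" },
  { student_id: "s3", name: "Cy" },
];

// Query 1: like db.classes.find({ class: "I" }), collecting the student ids.
const ids = [];
classes
  .filter((c) => c.class === "I")
  .forEach((c) => c.students.forEach((s) => ids.push(s.student_id)));

// Query 2: like db.students.find({ student_id: { $in: ids } }).
const inClassI = students.filter((s) => ids.includes(s.student_id));

console.log(inClassI.map((s) => s.name)); // [ 'Ann', 'Bob' ]
```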

MapReduce to insert items into a new collection, in random order

I've got some documents that look like:
{
_id: 3,
key: 3,
stuff: "Some data"
}
Some documents also have a signUpDate.
We can populate a collection for demo purposes like this:
for(i=1; i<=100000; i++){
if(i%3===0)
db.numbers.insert({_id:i, key:i, stuff:"Some data", signUpDate: new Date()});
else
db.numbers.insert({_id:i, key:i, stuff:"Some data"});
}
So a third of the documents have a signUpDate.
What I'm trying to do is create a map-reduce function that takes all the documents where signUpDate is not null and inserts them into a separate collection, in random order.
Is this possible?
Ok, here's a solution that works:
Using the mongo shell, start from the data populated as in the question, so a third of the documents have a signUpDate.
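Now, a super-simple mapReduce that emits a random number as the value for each matching _id:
m = function() { emit(this._id, Math.random()); }
r = function(key, values) { return values[0]; } // not called, since each _id is emitted once
db.numbers.mapReduce(m, r, { out: "randomlyOrdered", query: { signUpDate: { $ne: null } } });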
Next, create an index on value to speed up sorting (ensureIndex is deprecated in favor of createIndex):
db.randomlyOrdered.createIndex({ "value": 1 });
Now, find the numbers (randomly sorted):
db.randomlyOrdered.find({}, {"_id":1}).sort({"value":1});
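The random-order trick works because the output collection pairs each matching _id with a random value, and sorting on that value yields a random permutation. A plain JavaScript simulation of the same steps (Node.js, no MongoDB server; `randomlyOrderedIds` is a hypothetical helper name) makes that easy to check on a small data set.

```javascript
// Plain JavaScript simulation (no MongoDB server) of the random-order trick.
function randomlyOrderedIds(n) {
  // Same data shape as the shell loop: every third document has a signUpDate.
  const numbers = [];
  for (let i = 1; i <= n; i++) {
    const doc = { _id: i, key: i, stuff: "Some data" };
    if (i % 3 === 0) doc.signUpDate = new Date();
    numbers.push(doc);
  }

  // query: { signUpDate: { $ne: null } } keeps documents that have a date
  // (in MongoDB, $ne: null also excludes documents missing the field).
  const output = numbers
    .filter((d) => d.signUpDate != null)
    .map((d) => ({ _id: d._id, value: Math.random() })); // emit(this._id, Math.random())

  // find({}, { _id: 1 }).sort({ value: 1 })
  output.sort((a, b) => a.value - b.value);
  return output.map((d) => d._id);
}

const ids = randomlyOrderedIds(30);
console.log(ids.length); // 10
```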