Mongo - Replace references with embedded documents - mongodb

I have a Collection with a nested attribute that is an array of ObjectId References. These refer to documents in another Collection.
I'd like to replace these references with the documents themselves, i.e. embed those documents where the references are now. I've tried with and without the .snapshot() option. I suspect the problem is that I'm updating a document while looping over it, and .snapshot() isn't available at that level.
My mongo-fu is low and I'm stuck on a call stack error. How can I do this?
Example code:
db.CollWithReferences.find({}).snapshot().forEach( function(document) {
var doc_id = document._id;
document.GroupsOfStuff.forEach( function(Group) {
var docsToEmbed= db.CollOfThingsToEmbed.find({ _id: { $in: Group.ArrayOfReferenceObjectIds }});
db.CollWithReferences.update({"_id": ObjectId(doc_id) },
{$set: {"Group.ArrayOfReferenceObjectIds ":docsToEmbed}} )
});
});
Gives this error:
{
"message" : "Maximum call stack size exceeded",
"stack" : "RangeError: Maximum call stack size exceeded" +
....}

I figure this is happening for one of two reasons. Either you are running out of memory by executing two queries in a for loop, or the update operation is being executed before the find operation has finished.
Either way, it is not a good idea to execute too many queries in a for loop as it can lead to this type of error.
I can't be sure this will fix your problem as I don't know how many documents are in your collections, but it may work if you first get all documents from the CollWithReferences collection, then everything you need from the CollOfThingsToEmbed collection. Then build a map from each _id in CollOfThingsToEmbed to the document it belongs to. You can then loop through each document from CollWithReferences and mutate its GroupsOfStuff array by going through each ArrayOfReferenceObjectIds array and replacing every ObjectId with the value stored in the map, which is the whole document. Finally, update that document by setting GroupsOfStuff to its mutated value.
The following JavaScript code will do this (it could be organised better to have no logic in the global scope etc.):
// Materialise the cursor so it can be indexed and iterated more than once.
var references = db.CollWithReferences.find({}).toArray();
function getReferenceIds(references) {
    var referenceIds = [];
    for (var i = 0; i < references.length; i++) {
        // GroupsOfStuff is an array of groups, each with its own id array.
        var groups = references[i].GroupsOfStuff;
        for (var j = 0; j < groups.length; j++) {
            var ids = groups[j].ArrayOfReferenceObjectIds;
            for (var k = 0; k < ids.length; k++) {
                referenceIds.push(ids[k]);
            }
        }
    }
    return referenceIds;
}
function buildIdMap(docs) {
    var map = {};
    for (var i = 0; i < docs.length; i++) {
        map[docs[i]._id.toString()] = docs[i];
    }
    return map;
}
var referenceIds = getReferenceIds(references);
var docsToEmbed = db.CollOfThingsToEmbed.find({_id: {$in: referenceIds}}).toArray();
var idMap = buildIdMap(docsToEmbed);
for (var i = 0; i < references.length; i++) {
    var groups = references[i].GroupsOfStuff;
    for (var j = 0; j < groups.length; j++) {
        // map() builds a new array; assigning to the forEach parameter
        // would leave the original array unchanged.
        groups[j].ArrayOfReferenceObjectIds = groups[j].ArrayOfReferenceObjectIds.map(function(ref) {
            return idMap[ref.toString()];
        });
    }
    db.CollWithReferences.update({
        _id: references[i]._id
    }, {
        $set: {GroupsOfStuff: groups}
    });
}
It would be better to do this with a single update statement, but as each document needs to be updated differently, one multi-document update cannot do it; every document gets its own update.
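Even though each document needs its own update, the individual updates can still be collected and sent together. The sketch below only builds the list of operations in the `updateOne` shape that `bulkWrite` accepts; it runs on plain arrays so it can be tried without a database. `buildEmbedOps` and the sample data are illustrative names that are not part of the answer above, and the ids are plain strings rather than ObjectIds.

```javascript
// Build one updateOne operation per parent document, replacing each
// referenced id with the full document looked up in idMap.
function buildEmbedOps(references, idMap) {
  return references.map(function (parent) {
    var groups = parent.GroupsOfStuff.map(function (group) {
      // Copy the group so the input documents are left untouched.
      return Object.assign({}, group, {
        ArrayOfReferenceObjectIds: group.ArrayOfReferenceObjectIds.map(function (id) {
          return idMap[String(id)];
        })
      });
    });
    return {
      updateOne: {
        filter: { _id: parent._id },
        update: { $set: { GroupsOfStuff: groups } }
      }
    };
  });
}

// Hypothetical sample data with string ids.
var sampleIdMap = {
  a: { _id: "a", name: "Thing A" },
  b: { _id: "b", name: "Thing B" }
};
var sampleReferences = [
  { _id: "p1", GroupsOfStuff: [{ ArrayOfReferenceObjectIds: ["a", "b"] }] }
];

var ops = buildEmbedOps(sampleReferences, sampleIdMap);
console.log(JSON.stringify(ops, null, 2));
```

In a shell or driver that provides it, the resulting array could then be handed to `db.CollWithReferences.bulkWrite(ops)`; that call is not exercised here.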

Related

Meteor: Update object nested two arrays deep

I have a collection with a key called fields, which is an array of JSON objects. Those objects can have options which is another array of JSON objects. I’m trying to update one of the options by optionId. I tried this but it doesn't work.
Projects.update({
'fields.options._id': optionId
}, {
$set: {
`fields.$.options.$.title`: title
}
}
This does find the correct Project document, but doesn't update it.
You can use the $ operator for single level arrays only. Use of array1.$.array2.$.key is not supported.
However, if you are aware of the exact index of the element to be updated within the array, you can update like so:
Projects.update({
    'fields.options._id': optionId
}, {
    $set: {
        // A plain string key; a template literal is not valid here.
        'fields.0.options.1.title': title
    }
});
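When the indexes are not known in advance, one option is to read the document first and compute them. The following is a minimal sketch on a plain object; `findOptionPath` and the sample project are hypothetical names used only for illustration.

```javascript
// Return the dotted path to the title of the option with the given _id,
// or null when no field contains such an option.
function findOptionPath(project, optionId) {
  for (var i = 0; i < project.fields.length; i++) {
    var options = project.fields[i].options || [];
    for (var j = 0; j < options.length; j++) {
      if (options[j]._id === optionId) {
        return "fields." + i + ".options." + j + ".title";
      }
    }
  }
  return null;
}

// Hypothetical document shaped like the one in the question.
var sampleProject = {
  fields: [
    { options: [{ _id: "o1", title: "A" }] },
    { options: [{ _id: "o2", title: "B" }, { _id: "o3", title: "C" }] }
  ]
};

var path = findOptionPath(sampleProject, "o3");
var modifier = { $set: {} };
modifier.$set[path] = "New title"; // computed key built from the indexes found
console.log(path); // fields.1.options.1.title
```

The resulting `modifier` could then be passed as the second argument to `Projects.update`.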
This is one way to update:
Projects.find({"fields.options._id":optionId}).forEach(function(record) {
var match = false;
// iterate fields array
for(var i=0; i< record.fields.length; i++){
// iterate options array
for(var j=0; j<record.fields[i].options.length; j++){
if(record.fields[i].options[j]._id == optionId){
record.fields[i].options[j].title = title;
match = true;
// break;
}
}
}
if (match === true) Projects.update( { 'fields.options._id': optionId }, record );
});

Meteor, ChartsJS and MongoDB

I want to iterate through the MongoDB collection to get the chart labels but I get TypeError: undefined is not an object (evaluating 'teams[i].name') here is my code:
var teams = Teams.find();
var teamNames = [10];
for(i = 0; i < 10; i++)
{
teamNames.push(teams[i].name);
}
var chart = new Chart(canvas, {
type: 'bar',
data: {
labels: [teamNames]
....
Anyone any suggestions? I am running out of ideas.
Thank you in advance.
You can do this
var teamNames = Teams.find().map(
function(team){
return team.name;
}
)
teams must have a length of less than 10 items. If teams is [{name: "first"}], then teams[1] will return undefined and you will get that error. You can use:
for (let i = 0; i < teams.length; i++)
to solve this problem.
You can also map over the array to get specific properties:
labels: teams.map(team => team.name),
In Meteor, the Collection .find() function returns a cursor that you can then use to perform operations on collection items. In your case, you are treating the cursor as if it were an array which is incorrect. There are a few different ways that you can approach this.
1) Use .forEach() to iterate over the cursor.
var teamNames = [];
Teams.find().forEach(function (e) {
teamNames.push(e.name);
});
2) Use .fetch() to return all matching documents in an array, then iterate over that.
var teams = Teams.find().fetch();
var teamNames = [];
for(i = 0; i < teams.length; i++) {
teamNames.push(teams[i].name);
}
3) Use .map() to iterate over the collection calling the callback on all items and returning an array.
var teamNames = Teams.find().map(function (e) {
return e.name;
});

mongodb move documents from one collection to another collection

How can documents be moved from one collection to another in MongoDB? For example: I have a lot of documents in collection A and I want to move all documents older than 1 month to collection B (these documents should no longer be in collection A).
Using aggregation we can copy documents, but what I am trying to do is move them.
What method can be used to move documents?
The bulk operations @markus-w-mahlberg showed (and @mark-mullin refined) are efficient but unsafe as written. If the bulkInsert fails, the bulkRemove will still continue. To make sure you don't lose any records when moving, use this instead:
function insertBatch(collection, documents) {
var bulkInsert = collection.initializeUnorderedBulkOp();
var insertedIds = [];
var id;
documents.forEach(function(doc) {
id = doc._id;
// Insert without raising an error for duplicates
bulkInsert.find({_id: id}).upsert().replaceOne(doc);
insertedIds.push(id);
});
bulkInsert.execute();
return insertedIds;
}
function deleteBatch(collection, documents) {
var bulkRemove = collection.initializeUnorderedBulkOp();
documents.forEach(function(doc) {
bulkRemove.find({_id: doc._id}).removeOne();
});
bulkRemove.execute();
}
function moveDocuments(sourceCollection, targetCollection, filter, batchSize) {
print("Moving " + sourceCollection.find(filter).count() + " documents from " + sourceCollection + " to " + targetCollection);
var count;
while ((count = sourceCollection.find(filter).count()) > 0) {
print(count + " documents remaining");
sourceDocs = sourceCollection.find(filter).limit(batchSize);
idsOfCopiedDocs = insertBatch(targetCollection, sourceDocs);
targetDocs = targetCollection.find({_id: {$in: idsOfCopiedDocs}});
deleteBatch(sourceCollection, targetDocs);
}
print("Done!")
}
Update 2
Please do NOT upvote this answer any more. As written, @jasongarber's answer is better in every respect.
Update
This answer by @jasongarber is a safer approach and should be used instead of mine.
Provided I got you right and you want to move all documents older than 1 month, and you use mongoDB 2.6, there is no reason not to use bulk operations, which are the most efficient way of doing multiple operations I am aware of:
> var bulkInsert = db.target.initializeUnorderedBulkOp()
> var bulkRemove = db.source.initializeUnorderedBulkOp()
> var date = new Date()
> date.setMonth(date.getMonth() -1)
> db.source.find({"yourDateField":{$lt: date}}).forEach(
function(doc){
bulkInsert.insert(doc);
bulkRemove.find({_id:doc._id}).removeOne();
}
)
> bulkInsert.execute()
> bulkRemove.execute()
This should be pretty fast and it has the advantage that in case something goes wrong during the bulk insert, the original data still exists.
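The cutoff date used in the filter deserves a closer look, because `setMonth` rolls over when the target month is shorter: starting from 31 March 2021, a plain `setMonth(getMonth() - 1)` ends up on 3 March rather than in February. Below is a small sketch of one way around that. `oneMonthBefore` is an illustrative helper name, and `yourDateField` is the same placeholder used in the snippet above.

```javascript
// Return a new Date one calendar month before `date`, clamped to the last
// day of the previous month when that month is shorter.
function oneMonthBefore(date) {
  var result = new Date(date.getTime());
  var day = result.getDate();
  result.setDate(1); // avoid rollover while the month changes
  result.setMonth(result.getMonth() - 1);
  // Day 0 of the following month is the last day of the target month.
  var lastDay = new Date(result.getFullYear(), result.getMonth() + 1, 0).getDate();
  result.setDate(Math.min(day, lastDay));
  return result;
}

var cutoff = oneMonthBefore(new Date(2021, 2, 31)); // 31 March 2021, local time
var filter = { yourDateField: { $lt: cutoff } };
console.log(cutoff.getMonth(), cutoff.getDate()); // 1 28, i.e. 28 February
```

Whether clamping to the end of the month is the right rule depends on how "one month old" is defined for the data in question.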
Edit
To keep memory use in check, you can execute the bulk operations after every x documents processed:
> var bulkInsert = db.target.initializeUnorderedBulkOp()
> var bulkRemove = db.source.initializeUnorderedBulkOp()
> var x = 10000
> var counter = 0
> var date = new Date()
> date.setMonth(date.getMonth() -1)
> db.source.find({"yourDateField":{$lt: date}}).forEach(
function(doc){
bulkInsert.insert(doc);
bulkRemove.find({_id:doc._id}).removeOne();
counter ++
if( counter % x == 0){
bulkInsert.execute()
bulkRemove.execute()
bulkInsert = db.target.initializeUnorderedBulkOp()
bulkRemove = db.source.initializeUnorderedBulkOp()
}
}
)
> bulkInsert.execute()
> bulkRemove.execute()
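One detail of the batched version: the final `execute()` calls also run when the last full batch has just been flushed, and as far as I know the shell raises an error when a bulk operation with nothing queued is executed. The batching logic on its own can be sketched like this, using a plain array instead of a cursor; `forEachBatch` is a hypothetical helper, not a MongoDB API.

```javascript
// Call handler(batch) for every full batch and once more for a non-empty
// remainder, so an empty batch is never flushed.
function forEachBatch(items, batchSize, handler) {
  var batch = [];
  items.forEach(function (item) {
    batch.push(item);
    if (batch.length === batchSize) {
      handler(batch);
      batch = [];
    }
  });
  if (batch.length > 0) {
    handler(batch);
  }
}

var flushed = [];
forEachBatch([1, 2, 3, 4, 5], 2, function (batch) {
  flushed.push(batch);
});
console.log(JSON.stringify(flushed)); // [[1,2],[3,4],[5]]
```

In the real script the handler would queue the inserts and removes for that batch and execute both bulk operations.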
Insert and remove:
var documentsToMove = db.collectionA.find({});
documentsToMove.forEach(function(doc) {
db.collectionB.insert(doc);
db.collectionA.remove(doc);
});
note: this method might be quite slow for large collections or collections holding large documents.
$out is used to create a new collection from the result of an aggregation, so use $out:
db.oldCollection.aggregate([{$out : "newCollection"}])
then use drop
db.oldCollection.drop()
You can use a range query to get the data from sourceCollection, keep the cursor in a variable, loop over it and insert each document into the target collection:
var doc = db.sourceCollection.find({
"Timestamp":{
$gte:ISODate("2014-09-01T00:00:00Z"),
$lt:ISODate("2014-10-01T00:00:00Z")
}
});
doc.forEach(function(doc){
db.targetCollection.insert(doc);
})
Hope it helps!
First option (using mongodump)
1. Get a dump of the collection
mongodump -d db -c source_collection
2. Restore it into the target collection
mongorestore -d db -c target_collection --dir=dump/db/source_collection.bson
Second option
Running an aggregation
db.getCollection('source_collection').aggregate([ { $match: {"emailAddress" : "apitester@mailinator.com"} }, { $out: "target_collection" } ])
Third option (slowest)
Running through a forEach loop
db.getCollection('source_collection').find().forEach(function(docs){ db.getCollection('target_collection').insert(docs); });
print("Rollback Completed!");
From a performance point of view it may be better to remove many documents with one command (especially if the query part is covered by an index) rather than deleting them one by one.
For example:
// yourDateField is a placeholder for the field the range applies to
db.source.find({yourDateField: {$gte: start, $lt: end}}).forEach(function(doc){
db.target.insert(doc);
});
db.source.remove({yourDateField: {$gte: start, $lt: end}});
This is a restatement of @Markus W Mahlberg's answer
Returning the favor - as a function
function moveDocuments(sourceCollection,targetCollection,filter) {
var bulkInsert = targetCollection.initializeUnorderedBulkOp();
var bulkRemove = sourceCollection.initializeUnorderedBulkOp();
sourceCollection.find(filter)
.forEach(function(doc) {
bulkInsert.insert(doc);
bulkRemove.find({_id:doc._id}).removeOne();
}
)
bulkInsert.execute();
bulkRemove.execute();
}
An example use
var x = {dsid:{$exists: true}};
moveDocuments(db.pictures,db.artifacts,x)
to move all documents that have top level element dsid from the pictures to the artifacts collection
Here's an update to @jasongarber's answer which uses the more recent mongo 'bulkWrite' operation, and also keeps the whole process asynchronous so you can run it as part of a wider script which depends on its completion.
async function moveDocuments (sourceCollection, targetCollection, filter) {
const sourceDocs = await sourceCollection.find(filter)
console.log(`Moving ${await sourceDocs.count()} documents from ${sourceCollection.collectionName} to ${targetCollection.collectionName}`)
const idsOfCopiedDocs = await insertDocuments(targetCollection, sourceDocs)
const targetDocs = await targetCollection.find({_id: {$in: idsOfCopiedDocs}})
await deleteDocuments(sourceCollection, targetDocs)
console.log('Done!')
}
async function insertDocuments (collection, documents) {
const insertedIds = []
const bulkWrites = []
await documents.forEach(doc => {
const {_id} = doc
insertedIds.push(_id)
bulkWrites.push({
replaceOne: {
filter: {_id},
replacement: doc,
upsert: true,
},
})
})
if (bulkWrites.length) await collection.bulkWrite(bulkWrites, {ordered: false})
return insertedIds
}
async function deleteDocuments (collection, documents) {
const bulkWrites = []
await documents.forEach(({_id}) => {
bulkWrites.push({
deleteOne: {
filter: {_id},
},
})
})
if (bulkWrites.length) await collection.bulkWrite(bulkWrites, {ordered: false})
}
From MongoDB 3.0 up, you can use the copyTo command with the following syntax:
db.source_collection.copyTo("target_collection")
Then you can use the drop command to remove the old collection:
db.source_collection.drop()
I do like the response from @markus-w-mahlberg; however, at times I have seen the need to keep it a bit simpler for people. As such, I have a couple of functions below. You could naturally wrap things here with bulk operators as he did, but this code works with new and old Mongo systems equally.
function parseNS(ns){
//Expects we are forcing people to not violate the rules and not doing "foodb.foocollection.month.day.year" if they do they need to use an array.
if (ns instanceof Array){
database = ns[0];
collection = ns[1];
}
else{
tNS = ns.split(".");
if (tNS.length > 2){
print('ERROR: NS had more than 1 period in it, please pass as an [ "dbname","coll.name.with.dots"] !');
return false;
}
database = tNS[0];
collection = tNS[1];
}
return {database: database,collection: collection};
}
function insertFromCollection(sourceNS, destNS, query, batchSize, pauseMS) {
    // Parse and check namespaces
    var srcNS = parseNS(sourceNS);
    var dstNS = parseNS(destNS);
    if (srcNS == false || dstNS == false) { return false; }
    // getSiblingDB gives access to another database on the same connection
    var srcColl = db.getSiblingDB(srcNS.database).getCollection(srcNS.collection);
    var dstColl = db.getSiblingDB(dstNS.database).getCollection(dstNS.collection);
    var batchBucket = [];
    var totalToProcess = srcColl.find(query, {_id: 1}).count();
    var currentCount = 0;
    print("Processed " + currentCount + "/" + totalToProcess + "...");
    srcColl.find(query).addOption(DBQuery.Option.noTimeout).forEach(function(doc) {
        batchBucket.push(doc);
        if (batchBucket.length > batchSize) {
            dstColl.insert(batchBucket);
            currentCount += batchBucket.length;
            batchBucket = [];
            sleep(pauseMS);
            print("Processed " + currentCount + "/" + totalToProcess + "...");
        }
    });
    // Insert whatever is left in the last, partially filled batch
    if (batchBucket.length > 0) {
        dstColl.insert(batchBucket);
        currentCount += batchBucket.length;
        print("Processed " + currentCount + "/" + totalToProcess + "...");
    }
    print("Completed");
}
/** Example usage:
insertFromCollection("foo.bar","foo2.bar",{"type":"archive"},1000,20);
*/
You could obviously add a db.getSiblingDB(srcNS.database).getCollection(srcNS.collection).remove(query,true) if you also wanted to remove the records after they are copied to the new location. The code can easily be extended like that to make it restartable.
I had 2297 collections holding 15 million documents, but some of the collections were empty.
With copyTo alone the script failed, but it works with this optimization, which skips the empty collections:
db.getCollectionNames().forEach(function(collname) {
var c = db.getCollection(collname).count();
if(c!==0){
db.getCollection(collname).copyTo('master-collection');
print('Copied collection ' + collname);
}
});
With this, everything works fine for me.
NB: copyTo is deprecated because it blocks read/write operations, so I think it is only acceptable if you know the database will not be in use during this operation.
In my case for each didn't work. So I had to make some changes.
var kittySchema = new mongoose.Schema({
name: String
});
var Kitten = mongoose.model('Kitten', kittySchema);
var catSchema = new mongoose.Schema({
name: String
});
var Cat = mongoose.model('Cat', catSchema);
These are the models for the two collections.
function Recursion() {
    Kitten.findOne().lean().exec(function (error, results) {
        if (error) {
            console.log(error);
            return;
        }
        if (!results) {
            // Nothing left to move, so stop recursing.
            console.log("No object found");
            return;
        }
        var RequiredId = results._id;
        delete results._id;
        var swap = new Cat(results);
        swap.save(function (err) {
            if (err) {
                console.log(err);
                return;
            }
            console.log("SUCCESSFUL");
            Kitten.deleteOne({ _id: RequiredId }, function (err) {
                if (err) {
                    console.log(err);
                    return;
                }
                // Move the next document only after this one has been deleted,
                // otherwise findOne() could return the same document again.
                Recursion();
            });
        });
    });
}
I planned to archive 1000 records at a time using the bulk insert and bulk delete methods of pymongo.
For both source and target:
1. Create MongoDB objects to connect to the database.
2. Instantiate the bulk objects. Note: I also created a backup set of bulk objects. This lets me roll back the insertion or removal when an error occurs.
Example:
For source
# replace this with your MongoDB object creation logic
source_db_obj = db_help.create_db_obj(source_db, source_col)
source_bulk = source_db_obj.initialize_ordered_bulk_op()
source_bulk_bak = source_db_obj.initialize_ordered_bulk_op()
For target
# replace this with your MongoDB object creation logic
target_db_obj = db_help.create_db_obj(target_db, target_col)
target_bulk = target_db_obj.initialize_ordered_bulk_op()
target_bulk_bak = target_db_obj.initialize_ordered_bulk_op()
Obtain the source records that match the filter criteria:
source_find_results = source_db_obj.find(filter)
Loop through the source records and create the target and source bulk operations. The copy sent to the target collection gets an archived_at field with the current datetime:
# replace db_help.getUTCTime() with your own logic to obtain the UTC time
archived = dict(doc, archived_at=db_help.getUTCTime())
target_bulk.insert(archived)
source_bulk.find({'_id': doc['_id']}).remove_one()
For rollback in case of any errors or exceptions, create the target_bulk_bak and source_bulk_bak operations:
target_bulk_bak.find({'_id': doc['_id']}).remove_one()
# doc itself never received archived_at, so it can be re-inserted as is
source_bulk_bak.insert(doc)
When the record count reaches 1000, execute the bulk insertion into the target and the bulk removal from the source. Note: this method takes the source_bulk and target_bulk objects for execution.
execute_bulk_insert_remove(source_bulk, target_bulk)
When an exception occurs, execute the target_bulk_bak removals and the source_bulk_bak insertions. This rolls back the changes. Since MongoDB doesn't have rollback, I came up with this hack:
execute_bulk_insert_remove(source_bulk_bak, target_bulk_bak)
Finally, re-initialize the source and target bulk and bulk_bak objects. This is necessary because each of them can be used only once.
Complete code
import logging
from datetime import datetime
from pymongo.errors import BulkWriteError

logger = logging.getLogger(__name__)
# db_help is the author's own helper module (connection and UTC time helpers).

def execute_bulk_insert_remove(source_bulk, target_bulk):
    try:
        target_bulk.execute()
        source_bulk.execute()
    except BulkWriteError as bwe:
        raise Exception(
            "could not archive document, reason: {}".format(bwe.details))

def archive_bulk_immediate(filter, source_db, source_col, target_db, target_col):
    """
    filter: filter criteria for backup
    source_db: source database name
    source_col: source collection name
    target_db: target database name
    target_col: target collection name
    """
    count = 0
    bulk_count = 1000
    source_db_obj = db_help.create_db_obj(source_db, source_col)
    source_bulk = source_db_obj.initialize_ordered_bulk_op()
    source_bulk_bak = source_db_obj.initialize_ordered_bulk_op()
    target_db_obj = db_help.create_db_obj(target_db, target_col)
    target_bulk = target_db_obj.initialize_ordered_bulk_op()
    target_bulk_bak = target_db_obj.initialize_ordered_bulk_op()
    source_find_results = source_db_obj.find(filter)
    start = datetime.now()
    for doc in source_find_results:
        # Copy the document so the rollback insert keeps the original
        # without the archived_at field.
        archived = dict(doc, archived_at=db_help.getUTCTime())
        target_bulk.insert(archived)
        source_bulk.find({'_id': doc['_id']}).remove_one()
        target_bulk_bak.find({'_id': doc['_id']}).remove_one()
        source_bulk_bak.insert(doc)
        count += 1
        if count % bulk_count == 0:
            logger.info("count: {}".format(count))
            try:
                execute_bulk_insert_remove(source_bulk, target_bulk)
            except Exception as exc:
                # The helper wraps BulkWriteError in a plain Exception.
                execute_bulk_insert_remove(source_bulk_bak, target_bulk_bak)
                logger.info("Bulk Write Error: {}".format(exc))
                raise
            # Bulk objects can be executed only once, so create fresh ones.
            source_bulk = source_db_obj.initialize_ordered_bulk_op()
            source_bulk_bak = source_db_obj.initialize_ordered_bulk_op()
            target_bulk = target_db_obj.initialize_ordered_bulk_op()
            target_bulk_bak = target_db_obj.initialize_ordered_bulk_op()
    if count % bulk_count != 0:
        # Flush the final, partially filled batch.
        try:
            execute_bulk_insert_remove(source_bulk, target_bulk)
        except Exception as exc:
            execute_bulk_insert_remove(source_bulk_bak, target_bulk_bak)
            logger.info("Bulk Write Error: {}".format(exc))
            raise
    end = datetime.now()
    logger.info("archived {} documents to {} in {}.".format(
        count, target_col, (end - start)))

MongoDB MapReduce: Not working as expected for more than 1000 records

I wrote a mapreduce function where the records are emitted in the following format
{userid:<xyz>, {event:adduser, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<xyz>, {event:login, count:1}}
{userid:<abc>, {event:adduser, count:1}}
where userid is the key and the remaining are the value for that key.
After the MapReduce function, I want to get the result in following format
{userid:<xyz>,{events: [{adduser:1},{login:2}], allEventCount:3}}
I know this can be achieved with a group-by, both in the aggregation framework and in mapreduce, but we require similar functionality for a more complex scenario, so I am taking this approach.
To achieve this I wrote the following reduce function:
var reducefn = function(key,values){
var result = {allEventCount:0, events:[]};
values.forEach(function(value){
var notfound=true;
for(var n = 0; n < result.events.length; n++){
eventObj = result.events[n];
for(ev in eventObj){
if(ev==value.event){
result.events[n][ev] += value.allEventCount;
notfound=false;
break;
}
}
}
if(notfound==true){
var newEvent={}
newEvent[value.event]=1;
result.events.push(newEvent);
}
result.allEventCount += value.allEventCount;
});
return result;
}
This runs perfectly for 1000 records, but when there are 3k or 10k records, the result I get is something like this:
{ "_id" : {...}, "value" :{"allEventCount" :30, "events" :[ { "undefined" : 1},
{"adduser" : 1 }, {"remove" : 3 }, {"training" : 1 }, {"adminlogin" : 1 },
{"downgrade" : 2 } ]} }
I can't understand where this undefined came from, and the sum of the individual events is also less than allEventCount. All the docs in the collection have a non-empty event field, so there should be no chance of undefined.
Mongo DB version -- 2.2.1
Environment -- Local machine, no sharding.
In the reduce function, why should this operation fail result.events[n][ev] += value.allEventCount; when the similar operation result.allEventCount += value.allEventCount; passes?
The corrected answer as suggested by johnyHK
Reduce function:
var reducefn = function(key,values){
var result = {totEvents:0, event:[]};
values.forEach(function(value){
value.event.forEach(function(eventElem){
var notfound=true;
for(var n = 0; n < result.event.length; n++){
eventObj = result.event[n];
for(ev in eventObj){
for(evv in eventElem){
if(ev==evv){
result.event[n][ev] += eventElem[evv];
notfound=false;
break;
}
}}
}
if(notfound==true){
result.event.push(eventElem);
}
});
result.totEvents += value.totEvents;
});
return result;
}
The shape of the object you emit from your map function must be the same as the object returned from your reduce function, as the results of a reduce can get fed back into reduce when processing large numbers of docs (like in this case).
So you need to change your emit to emit docs like this:
{userid:<xyz>, {events:[{adduser: 1}], allEventCount:1}}
{userid:<xyz>, {events:[{login: 1}], allEventCount:1}}
and then update your reduce function accordingly.
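To see why the shapes must match, the following sketch runs a reduce function by hand, without MongoDB. It feeds the result of one reduce call back into a second call, which is what the server may do for large inputs. `reduceEvents` and the sample values are illustrative only; they use the `events`/`allEventCount` shape suggested above and plain JavaScript as run by Node, not the 2.2 shell.

```javascript
// Reduce values shaped like {events: [{name: count}, ...], allEventCount: n}.
// The return value has the same shape, so it can safely be reduced again.
function reduceEvents(key, values) {
  var result = { events: [], allEventCount: 0 };
  values.forEach(function (value) {
    value.events.forEach(function (eventObj) {
      Object.keys(eventObj).forEach(function (name) {
        var existing = result.events.find(function (e) {
          return Object.prototype.hasOwnProperty.call(e, name);
        });
        if (existing) {
          existing[name] += eventObj[name];
        } else {
          // Copy the counter so the input values are not mutated.
          var fresh = {};
          fresh[name] = eventObj[name];
          result.events.push(fresh);
        }
      });
    });
    result.allEventCount += value.allEventCount;
  });
  return result;
}

// Three emitted values for one user, in the corrected shape.
var emitted = [
  { events: [{ adduser: 1 }], allEventCount: 1 },
  { events: [{ login: 1 }], allEventCount: 1 },
  { events: [{ login: 1 }], allEventCount: 1 }
];

var allAtOnce = reduceEvents("xyz", emitted);
// Simulate a re-reduce: a partial result is passed in as an ordinary value.
var partial = reduceEvents("xyz", emitted.slice(0, 2));
var reReduced = reduceEvents("xyz", [partial, emitted[2]]);
console.log(JSON.stringify(allAtOnce), JSON.stringify(reReduced));
```

Both calls give the same totals. With the original emit shape, the partial result would have had no `event` field, which is where the `undefined` key came from.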

how to calculate count and unique count over two fields in mongo reduce function

I have a link tracking table that has (amongst other fields) track_redirect and track_userid. I would like to output both the total count for a given link, and also the unique count - counting duplicates by the user id. So we can differentiate if someone has clicked the same link 5 times.
I've tried emitting this.track_userid in both the key and values parts but can't get to grips with how to correctly access them in the reduce function.
So if I roll back to when it actually worked, I have the very simple code below - just like it would be in a 'my first mapreduce function' example
map
function() {
if(this.track_redirect) {
emit(this.track_redirect,1);
}
}
reduce
function(k, vals) {
var sum = 0;
for (var i in vals) {
sum += vals[i];
}
return sum;
}
I'd like to know the correct way to emit the additional userid information and access it in the mapreduce, please. Or am I thinking about it in the wrong way?
In case it's not clear, I don't want to calculate the total clicks a userid has made, but to count the unique clicks of each url + userid, not counting any duplicate clicks a userid made on each link.
Can someone point me in the right direction please? Thanks!
You can actually pass arbitrary object on the second parameter of the emit call. That means you can take advantage of this and store the userid in it. For example, your map function can look like this:
var mapFunc = function() {
if (this.track_redirect) {
var tempDoc = {};
tempDoc[this.track_userid] = 1;
emit(this.track_redirect, {
users_clicked: tempDoc,
total_clicks: 1
});
}
};
And your reduce function might look like this:
var reduceFunc = function(key, values) {
var summary = {
users_clicked: {},
total_clicks: 0
};
values.forEach(function (doc) {
summary.total_clicks += doc.total_clicks;
// Merge the properties of 2 objects together
// (and these are actually the userids)
Object.extend(summary.users_clicked, doc.users_clicked);
});
return summary;
};
The users_clicked property of the summary object stores the id of every user as a property (since an object can't have duplicate properties, it is guaranteed to hold unique users). Also note that some of the values passed to the reduce function can be the result of a previous reduce, and the sample code above takes that into account. You can find more about this behavior in the MongoDB mapReduce documentation.
In order to get the unique count, you can pass in the finalizer function that gets called when the reduce phase is completed:
var finalFunc = function(key, value) {
// Counts the keys of an object. Taken from:
// http://stackoverflow.com/questions/18912/how-to-find-keys-of-a-hash
var countKeys = function(obj) {
var count = 0;
for(var i in obj) {
if (obj.hasOwnProperty(i))
{
count++;
}
}
return count;
};
return {
redirect: key,
total_clicks: value.total_clicks,
unique_clicks: countKeys(value.users_clicked)
};
};
Finally, you can execute the map reduce job like this (modify the out attribute to fit your needs):
db.users.mapReduce(mapFunc, reduceFunc, { finalize: finalFunc, out: { inline: 1 }});
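For a quick check without a database, the same logic can be run over a small in-memory array. This is only a sketch: `simulate`, `mapClick`, `reduceClicks`, `finalizeClicks` and the sample clicks are made up for illustration, and `Object.assign` stands in for the shell's `Object.extend`.

```javascript
// Sample click records (hypothetical data).
var clicks = [
  { track_redirect: "/a", track_userid: "u1" },
  { track_redirect: "/a", track_userid: "u1" },
  { track_redirect: "/a", track_userid: "u2" },
  { track_redirect: "/b", track_userid: "u1" }
];

// Same idea as the map function above, but returning the emitted pair.
function mapClick(doc) {
  var users = {};
  users[doc.track_userid] = 1;
  return { key: doc.track_redirect, value: { users_clicked: users, total_clicks: 1 } };
}

function reduceClicks(key, values) {
  var summary = { users_clicked: {}, total_clicks: 0 };
  values.forEach(function (v) {
    summary.total_clicks += v.total_clicks;
    Object.assign(summary.users_clicked, v.users_clicked);
  });
  return summary;
}

function finalizeClicks(key, value) {
  return {
    redirect: key,
    total_clicks: value.total_clicks,
    unique_clicks: Object.keys(value.users_clicked).length
  };
}

// Group the emitted values by key, then reduce and finalize each group.
function simulate(docs) {
  var groups = {};
  docs.filter(function (d) { return d.track_redirect; })
    .map(mapClick)
    .forEach(function (pair) {
      (groups[pair.key] = groups[pair.key] || []).push(pair.value);
    });
  return Object.keys(groups).map(function (key) {
    return finalizeClicks(key, reduceClicks(key, groups[key]));
  });
}

var summaryRows = simulate(clicks);
console.log(summaryRows);
```

For the sample data, "/a" ends up with three total clicks from two unique users, and "/b" with one click from one user.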