Emit with key as zero in map reduce in mongodb - mongodb

Giving the key as 0 in the emit function and then reducing correctly gives the total of a column in the collection, as intended. My question is that I don't understand how this works.
I have my emit like this:
function(){ emit(0, this.total); }
Could somebody please explain how this works? Thank you in advance.

MapReduce is a tricky thing. You need to change your mindset to understand it. In your particular case, you told mongo that you don't care about grouping: when you emit like this, all your this.total values are sent to one batch with the identifier 0 and aggregated together at the reduce step. This also means that these cases are identical:
function(){ emit(0, this.total); }
function(){ emit(1, this.total); }
function(){ emit('asdf', this.total); }
function(){ emit(null, this.total); }
They all lead to the same result, even though the batch key is different.
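As an illustration (not mongo's actual implementation), here is a plain-JavaScript sketch of how the server groups emits by key and feeds each batch to reduce; the sample documents and the in-memory runner are invented for the example:

```javascript
// Hypothetical sample collection; only the "total" field matters here.
const docs = [
  { total: 5 }, { total: 6 }, { total: 7 }, { total: 8 }, { total: 9 }
];

// map: every document emits its total under the same key, 0
function map(doc, emit) { emit(0, doc.total); }

// reduce: sum all values that arrived under one key
function reduce(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// simplified stand-in for the server-side grouping step
function mapReduce(collection, map, reduce) {
  const groups = new Map();
  for (const doc of collection) {
    map(doc, function (key, value) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups].map(([k, vals]) => ({ _id: k, value: reduce(k, vals) }));
}

console.log(mapReduce(docs, map, reduce)); // [ { _id: 0, value: 35 } ]
```

Because every emit uses the same key, there is exactly one batch, so the choice of 0 vs. 1 vs. 'asdf' only changes the _id of the single output document.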

To complement the other answer with some internals: when you emit your single and only key, the resulting document from the emit will look something like:
{_id:0,value:[5,6,7,8,9]}
With the array representing the combination of all the emits.
Emits are grouped by the key you emit, so this means there will be only one document, and its contents will be all the total fields in the collection.
So when the reduce comes along and adds all of these numbers together, it correctly sums up the total across every total field in the collection.

Related

Mongoose prevent duplicate id numbers based on document.count()

My documents all have sequential numbers, saved as a String as an ID (it's padded with 0s). When creating a new record, I first do a request for Comment.count(). Using the number returned from that, I generate the ID string. I then create an object, and save it as a new document.
var commentNumber = (result[1] + 1).toString().padStart(4, '0');
var newComment = this({
    html: processedHtml,
    number: commentNumber
});
newComment.save(function(err, result) {
    if (err) return callback(err);
    return callback(null, result);
});
The problem is, if two comments are submitted at the same time, they will get the same ID (this happens if I make 2 requests on submission instead of 1, they will both have the same ID).
How can I prevent this?
One simple option would be to create a unique index on number so that one of the requests fails.
Another would be to store the current number count elsewhere. If you wanted to use mongo, you could have a doc with commentCount in a different collection & do a findAndUpdate with $inc and use the returned value. This still leads to a weird race condition where a user might only see comments 1 and 3 when comment 2 takes longer to create than comment 3.
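For illustration, the contract of that counter document can be sketched in plain JavaScript; the closure below stands in for a hypothetical counters document that mongo would update atomically with $inc:

```javascript
// One source of truth that hands out the next comment number.
// In mongo this would be findAndModify with {$inc: {seq: 1}} on a
// counters document; a closure suffices for this sketch.
function makeCounter() {
  let seq = 0;
  return function next() {
    return ++seq; // $inc makes this step atomic server-side
  };
}

const nextCommentNumber = makeCounter();
const id = String(nextCommentNumber()).padStart(4, '0');
console.log(id); // "0001"
```

Two concurrent requests each get their own increment result, so they can never generate the same padded ID.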
I think the approach of storing the comment number on the document is fundamentally flawed: it creates weird race conditions, strange error handling, and complex deletes. If possible, it's better to calculate the number of comments on the way out.
As far as ordering goes, mongo _ids encode date-time information at the start of the _id, so you can use the _id to sort documents.
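For example, the first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds, which is why sorting on _id approximates insertion order. A sketch of recovering that timestamp from a hex ObjectId string (the ObjectId below is made up):

```javascript
// Parse the leading 8 hex characters (4 bytes) of an ObjectId as a
// seconds-since-epoch timestamp and turn it into a Date.
function objectIdToDate(hexId) {
  const seconds = parseInt(hexId.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdToDate("507f1f77bcf86cd799439011")); // a Date in October 2012
```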

Mongo find unique results

What's the easiest way to get all the documents from a collection that are unique based on a single field?
I know I can use db.collection.distinct to get an array of all the distinct values of a field, but I want to get the first (or really any one) document for every distinct value of one field.
e.g. if the database contained:
{number:1, data:'Test 1'}
{number:1, data:'This is something else'}
{number:2, data:'I\'m bad at examples'}
{number:3, data:'I guess there\'s room for one more'}
it would return (based on number being unique):
{number:1, data:'Test 1'}
{number:2, data:'I\'m bad at examples'}
{number:3, data:'I guess there\'s room for one more'}
Edit: I should add that the server is running Mongo 2.0.8, so there's no aggregation framework, and there are more results than group will support.
Update to 2.4 and use aggregation :)
When you really need to stick to the old version of MongoDB due to too much red tape involved, you could use MapReduce.
In MapReduce, the map function transforms each document of the collection into a new document and a distinctive key. The reduce function is used to merge documents with the same distinctive key into one.
Your map function would emit your documents as-is and with the number-field as unique key. It would look like this:
var mapFunction = function(document) {
    emit(document.number, document);
};
Your reduce-function receives arrays of documents with the same key, and is supposed to somehow turn them into one document. In this case it would just discard all but the first document with the same key:
var reduceFunction = function(key, documents) {
    return documents[0];
};
Unfortunately, MapReduce has some problems. It can't use indexes, so at least two javascript functions are executed for every single document in the collection (this can be limited by pre-excluding some documents with the query argument to the mapReduce command). When you have a large collection, this can take a while. You also can't fully control how the documents created by MapReduce are formed. They always have two fields: _id with the key, and value with the document you returned for that key.
MapReduce is also hard to debug and troubleshoot.
tl;dr: Update to 2.4

MongoDB MapReduce key as a document

Is it possible in MongoDB MapReduce to emit keys that are documents themselves? Something like
emit({type: 1, date: ...}, 12);
When I do this, MapReduce completes successfully, but in my reduced results I also see emitted values, so I am wondering what's wrong.
You can use a document as the emit key. The reduce function combines documents with the same key into one document. If the map function emits a single document for a particular key, the reduce function will not be called.
Can you share your code in a snippet?
You can definitely use a document for the key and/or the value. It will work exactly the same as when they are primitive types.
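As an illustrative sketch in plain JavaScript (made-up data; JSON.stringify stands in for BSON key equality), grouping on a document key behaves just like grouping on a scalar:

```javascript
// Documents whose key objects are equal field-for-field land in the same
// reduce group, exactly as scalar keys would.
const docs = [
  { type: 1, amount: 12 },
  { type: 1, amount: 8 },
  { type: 2, amount: 3 }
];

function sumByCompoundKey(collection) {
  const groups = new Map();
  for (const doc of collection) {
    const key = JSON.stringify({ type: doc.type }); // stand-in for BSON key equality
    groups.set(key, (groups.get(key) || 0) + doc.amount);
  }
  return [...groups].map(([k, v]) => ({ _id: JSON.parse(k), value: v }));
}

console.log(sumByCompoundKey(docs));
// [{ _id: { type: 1 }, value: 20 }, { _id: { type: 2 }, value: 3 }]
```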

MongoDB: Order by computed property

In my db I have a collection where documents have a field score, which is a float (-1..1). I can query the db to return the first 20 results ordered by score.
My problem is that I want to modify the score of a doc with a time penalty based on the field time_updated: the older the doc is, the lower its score should be. And the big problem is that I have to do this at runtime. I could iterate over all documents, update the score, and then order by score, but this would cost too much time, since there is a huge number of documents in the collection.
So my question is: With MongoDB, can I order by a computed property? Is there any way to do that? Or is there a feature in planning for next versions of MongoDB?
Exactly how is the score updated?
If it's simple and can be put in $add, $multiply, etc., terms, then the aggregation pipeline will work well. Otherwise you'll need to use a simple MapReduce for the score updating.
var mapFunction = function() {
    emit(this._id, <compute score here from this.score and this.time_updated>);
};
var reduceFunction = function(key, values) {
    return values[0]; // trivial reduce function since incoming _ids are unique
};
For 10000 rows either the aggregation pipeline or a simple MapReduce will probably be sufficiently performant.
For much bigger datasets you may need to use a more complex MapReduce (that actually does a reduce) to be memory efficient. You might also want to take advantage of Incremental MapReduce.
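The question doesn't specify the penalty formula, but as one hypothetical example, a linear decay per day of age could be computed like this (the formula and the decay constant are assumptions, not part of the question):

```javascript
// Subtract decayPerDay from the stored score for every day since time_updated.
function penalizedScore(score, timeUpdated, now, decayPerDay) {
  const ageDays = (now - timeUpdated) / 86400000; // 86400000 ms per day
  return score - decayPerDay * ageDays;
}

const now = new Date('2013-01-11T00:00:00Z');
const updated = new Date('2013-01-01T00:00:00Z'); // 10 days old
console.log(penalizedScore(0.5, updated, now, 0.01)); // ≈ 0.4
```

A formula this simple could equally be expressed with $subtract and $multiply in the aggregation pipeline, avoiding MapReduce entirely.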

MongoDB, return recent document for each user_id in collection

Looking for similar functionality to Postgres' Distinct On.
Have a collection of documents {user_id, current_status, date}, where status is just text and date is a Date. Still in the early stages of wrapping my head around mongo and getting a feel for best way to do things.
Would mapreduce be the best solution here, map emits all, and reduce keeps a record of the latest one, or is there a built in solution without pulling out mr?
There is a distinct command, however I'm not sure that's what you need. Distinct is kind of a "query" command and with lots of users, you're probably going to want to roll up data not in real-time.
Map-Reduce is probably one way to go here.
Map Phase: Your key would simply be an ID. Your value would be something like the following {current_status:'blah',date:1234}.
Reduce Phase: Given an array of values, you would grab the most recent and return only it.
To make this work optimally, you'll probably want to look at a new feature in 1.8.0, the "re-reduce" feature, which will allow you to process only new data instead of re-processing the whole status collection.
The other way to do this is to build a "most-recent" collection and tie the status insert to that collection. So when you insert a new status for the user, you update their "most-recent".
Depending on the importance of this feature, you could possibly do both things.
Current solution that seems to be working well.
map = function() { emit(this.user.id, this.created_at); };
// We call new Date just in case something's not being stored as a date and is
// instead just a string, because my date gathering/inserting function is kind of stupid atm
reduce = function(key, values) {
    return new Date(Math.max.apply(Math, values.map(function(x) { return new Date(x); })));
};
res = db.statuses.mapReduce(map, reduce);
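In plain JavaScript, what that map/reduce computes can be sketched like this (sample data is invented): the latest created_at per user id.

```javascript
const statuses = [
  { user: { id: 'a' }, created_at: '2011-03-01' },
  { user: { id: 'a' }, created_at: '2011-04-01' },
  { user: { id: 'b' }, created_at: '2011-02-01' }
];

function latestPerUser(docs) {
  const latest = new Map();
  for (const d of docs) {
    const t = new Date(d.created_at);
    const prev = latest.get(d.user.id);
    if (!prev || t > prev) latest.set(d.user.id, t); // reduce: keep the max date
  }
  return latest;
}

const out = latestPerUser(statuses);
console.log(out.get('a').toISOString().slice(0, 10)); // "2011-04-01"
```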
Another way to achieve the same result would be to use the group command, which is a kind of a mr-shortcut that lets you aggregate on a specific key or set of keys.
In your case it would read like this:
db.coll.group({
    key: { user_id: true },
    reduce: function(obj, prev) {
        if (new Date(obj.date) > prev.date) {
            prev.status = obj.status;
            prev.date = new Date(obj.date);
        }
    },
    initial: { status: "", date: new Date(0) }
})
However, unless you have a rather small fixed amount of users I strongly believe that a better solution would be, as previously suggested, to keep a separate collection containing only the latest status-message for each user.