The map-reduce usage is as follows:
db.myCollection.mapReduce(function() {
    emit(this.smth, 1); // emit takes a key and a value
},
function(key, values) {
    // return something done with key and values
});
My question is, why is the map part implemented to have implicit this that references the current document being processed? IMO, it would be cleaner to have the current document passed in as an argument to the map function (I prefer to write all my JavaScript without this).
In practice this also rules out the use of arrow functions in mongo scripts, since the implicit this binding does not work with them.
why is the map part implemented to have implicit this that references the current document being processed?
MongoDB's Map/Reduce API was created in 2009, which is well before arrow functions were available in JavaScript (via ES6/ES2015). I can only speculate on the design intention, but much has changed in JavaScript (and MongoDB) since the original Map/Reduce implementation.
The this keyword in a JavaScript method refers to the owner or execution context, so setting it to the current document being processed was perhaps a reasonable convention (or convenience) for JavaScript usage at the time. The reduce function has a required prototype of function (key, values) so a map prototype of function (doc) might have been more consistent. However, once an API choice is made, any significant breaking changes become more challenging to introduce.
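The binding can be sketched in plain JavaScript. Note that runMap and its emit-as-argument signature are hypothetical stand-ins for illustration, not MongoDB's actual internals (in the real API, emit is a global inside the map function):

```javascript
// Sketch of a server binding the current document as `this` when
// invoking a map function (hypothetical, not MongoDB's real code).
function runMap(mapFn, doc) {
  const emitted = [];
  // Bind the document as `this`, as MongoDB does for map functions.
  mapFn.call(doc, (key, value) => emitted.push([key, value]));
  return emitted;
}

// A regular function sees the bound document through `this`.
const pairs = runMap(function (emit) { emit(this.category, 1); },
                     { category: "books" });
console.log(pairs); // [["books", 1]]

// An arrow function ignores the bound `this`, so the key is lost.
const broken = runMap((emit) => emit(this?.category, 1),
                      { category: "books" });
console.log(broken); // [[undefined, 1]]
```

This is exactly why arrow functions cannot be used for map functions: their this is lexically captured at definition time and cannot be rebound by Function.prototype.call.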
A more modern take on aggregation might look quite different, and this is the general path that MongoDB has taken. The Aggregation Framework introduced in MongoDB 2.2 (August, 2012) is a higher performance approach to data aggregation and should be preferred (where possible) over Map/Reduce.
Successive releases of the MongoDB server have made significant improvements to the features and performance of the Aggregation Framework, while Map/Reduce has not notably evolved. For example, the Aggregation Framework is written in C++ and able to manipulate MongoDB's native BSON data types; Map/Reduce spawns JavaScript threads and has to marshal data between BSON and JavaScript.
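As a rough illustration of the difference, a simple "sum per key" task can be expressed either way; the collection and field names below are made up for the example:

```javascript
// Map/Reduce version (runs JavaScript on the server):
//   db.sales.mapReduce(
//     function () { emit(this.category, this.amount); },
//     function (key, values) { return Array.sum(values); },
//     { out: { inline: 1 } }
//   );
// Aggregation Framework version: one native $group stage, no JavaScript.
const pipeline = [
  { $group: { _id: "$category", total: { $sum: "$amount" } } }
];
// On a live server this would run as: db.sales.aggregate(pipeline)
console.log(pipeline[0].$group._id); // "$category"
```

Note how the pipeline version has no `this` at all: the grouping key is named declaratively with a field path string.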
In practice this also rules out the use of arrow functions in mongo scripts, since the implicit this binding does not work with them.
Indeed. As at MongoDB 4.0, arrow functions are not supported in Map/Reduce. There is a feature request to support arrow functions which you can watch/upvote in the MongoDB issue tracker: SERVER-34281.
Related
I am not able to differentiate findOneAndDelete() and findOneAndRemove() in the mongoose documentation.
Query.prototype.findOneAndDelete()
This function differs slightly from Model.findOneAndRemove() in that
findOneAndRemove() becomes a MongoDB findAndModify() command, as
opposed to a findOneAndDelete() command. For most mongoose use cases,
this distinction is purely pedantic. You should use
findOneAndDelete() unless you have a good reason not to.
TLDR: You should use findOneAndDelete() unless you have a good reason not to.
Longer answer:
From the Mongoose documentation of findOneAndDelete:
This function differs slightly from Model.findOneAndRemove() in that findOneAndRemove() becomes a MongoDB findAndModify() command, as opposed to a findOneAndDelete() command. For most mongoose use cases, this distinction is purely pedantic. You should use findOneAndDelete() unless you have a good reason not to.
Mongoose Docs findOneAndDelete
In the MongoDB native client (Mongoose is different), passing { remove: true } to findAndModify makes it remove the found document; its behavior in this case appears quite similar to findOneAndDelete.
However, there are some differences when using other options:
findAndModify takes 5 parameters: query, sort, doc (the update object), options, and a callback. findOneAndDelete only takes 3 (filter or query, options, and a callback).
options for findAndModify include w, wtimeout, and j for write concerns, that is:
"the level of acknowledgment requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters", or simply the guarantee levels available for reporting the success of a write operation.
options for findOneAndDelete do not include write concerns configuration.
findAndModify lets you return the new document and remove it at the same time.
Example:
// Simple findAndModify command returning the new document and
// removing it at the same time
collection.findAndModify({b:1}, [['b', 1]], {$set:{deleted: Date()}}, {remove:true}, callback)
Both of them are almost similar, except that findOneAndRemove uses findAndModify with the remove flag, and its time complexity will be a bit higher compared to findOneAndDelete because you are going through an update path. Deletes are always faster.
In Mongoose, findOneAndDelete works the same way as findOneAndRemove. They both look up a document by the given properties, delete it, and return the deleted document. When you are using the native MongoDB driver, findOneAndRemove might be useful to you, but in Mongoose it is deprecated, so I would advise you to use findOneAndDelete for your operation based on the latest Mongoose and Node.js configuration as at this period. https://github.com/Automattic/mongoose/issues/6880
Here is the exact difference (quoted from the mongoose docs in Model.findOneAndDelete() section):
"This function differs slightly from Model.findOneAndRemove() in that
findOneAndRemove() becomes a MongoDB findAndModify() command, as
opposed to a findOneAndDelete() command. For most mongoose use cases,
this distinction is purely pedantic. You should use findOneAndDelete()
unless you have a good reason not to."
Here is the link to it:
https://mongoosejs.com/docs/api.html#model_Model.findOneAndDelete
The other answers here have a lot of wrong info. They are for all intents and purposes the same, and you should just use findOneAndDelete().
Mongoose's Model.findOneAndRemove(query) becomes MongoDB's findAndModify({query, remove: true}).
Mongoose's Model.findOneAndDelete() becomes MongoDB's findOneAndDelete(). However, the MongoDB driver converts findOneAndDelete(query, opts) to findAndModify({query, remove: true}) (src). Thus they do the exact same thing in the database.
Both take the same options.
Both return the deleted document.
findOneAndRemove returns the removed document so if you remove a document that you later decide should not be removed, you can insert it back into the db. Ensuring your logic is sound before removing the document would be preferred to checks afterward IMO.
findOneAndDelete has a sort parameter which can be used to influence which document is deleted. It also has a maxTimeMS option which controls the time within which the operation has to complete.
I would suggest you use findOneAndDelete().
Mongoose provides both features to handle data using the ORM and features to write directly to the database, with findOneAndDelete() being one of the latter. Writing to the database directly is more dangerous, as you run the risk of not calling middleware or validators, potentially submitting partial or incomplete data to the database. Note that I said it's more dangerous, not that it's flat-out dangerous; findOneAndDelete() simply bypasses the safety the ORM adds.
In MongoDB's MapReduce, I believe there used to be a "db" object (as in db.anotherCollection.find()) accessible inside the map function. But this feature has been removed (from version 1.6 or so on), which makes joins difficult. What was the reason? Why has it been removed?
As at MongoDB 2.4 there are several reasons to disallow access to the db object from within Map/Reduce functions, including:
Deadlocks: There are potential deadlock scenarios between database and/or JavaScript locks called from within the same server-side function.
Performance: The Map/Reduce pattern calls reduce() multiple times; each iteration is a different JavaScript context and would have to open new connections to the database and allocate additional memory for query results. Long-running JavaScript operations will block other operations.
Security: Cross-database queries require appropriate authentication checks.
The above issues could be further complicated for Map/Reduce jobs reading or writing to sharded clusters. The MongoDB Map/Reduce implementation is currently only designed to work with data from a single input collection, and any historical abuses of db object within Map/Reduce functions should be considered a bug rather than a feature.
If you want to merge data with Map/Reduce, you can use an Incremental Map/Reduce. Depending on what outcome you are trying to achieve, there are other approaches that may be more straightforward such as adjusting your schema or doing joins in your application code via multiple queries.
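To illustrate the "joins in your application code via multiple queries" option, here is a minimal sketch in which plain arrays stand in for the results of two find() calls; the collection and field names are made up:

```javascript
// Results of a hypothetical db.orders.find() ...
const orders = [
  { _id: 1, customerId: "a", total: 10 },
  { _id: 2, customerId: "b", total: 25 },
];
// ... and a second query for the referenced customers,
// e.g. db.customers.find({ _id: { $in: [...] } }).
const customers = [
  { _id: "a", name: "Ann" },
  { _id: "b", name: "Bob" },
];

// Index the second result set by _id, then merge in application code.
const byId = new Map(customers.map((c) => [c._id, c]));
const joined = orders.map((o) => ({
  ...o,
  customerName: byId.get(o.customerId).name,
}));
console.log(joined[0].customerName); // "Ann"
```

The merge runs client-side, so it avoids all of the locking and security issues described above, at the cost of an extra round trip.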
I have seen this asked a couple of years ago. Since then, MongoDB 2.4 has made multi-threaded Map Reduce available (after the switch to the V8 JavaScript engine) and it has become faster than it was in previous versions, so the argument of being slow is no longer an issue.
However, I am looking for a scenario where a Map Reduce approach might work better than the Aggregation Framework. In fact, possibly a scenario where the Aggregation Framework cannot work at all but Map Reduce can get the required results.
Thanks,
John
Take a look at this.
The Aggregation FW results are stored in a single document, so they are limited to 16 MB: this might not be suitable for some scenarios. With MapReduce there are several output types available, including an entire new collection, so it doesn't have space limits.
Generally, MapReduce is better when you have to work with large data sets (may be the entire collection). Furthermore, it gives much more flexibility (you write your own aggregation logic) instead of being restricted to some pipeline commands.
Currently the Aggregation Framework results can't exceed 16MB. But, I think more importantly, you'll find that the AF is better suited to "here and now" type queries that are dynamic in nature (like filters are provided at run-time by the user for example).
A MapReduce is preplanned and can be far more complex and produce very large outputs (as they just output to a new collection). It has no run-time inputs that you can control. You can add complex object manipulation that simply is not possible (or efficient) with the AF. It's simple to manipulate child arrays (or things that are array like) for example in MapReduce as you're just writing JavaScript, whereas in the AF, things can become very unwieldy and unmanageable.
The biggest issue is that MapReduce jobs aren't automatically kept up to date, and it's difficult to predict when they'll complete. You'll need to implement your own solution to keep them up to date (unlike some other NoSQL options). Usually, that's just a timestamp of some sort and an incremental MapReduce update, as shown here. You'll possibly need to accept that the data may be somewhat stale and that the jobs will take an unknown length of time to complete.
If you hunt around on StackOverflow, you'll find lots of very creative solutions to solving problems with MongoDB and many solutions use the Aggregation Framework as they're working around limitations of the general query engine in MongoDB and can produce "live/immediate" results. (Some AF pipelines are extremely complex though which may be a concern depending on the developers/team/product).
I am somewhat confused about when to use group(), aggregate with $group or mapreduce. I read the documentation at http://www.mongodb.org/display/DOCS/Aggregation for group(), http://docs.mongodb.org/manual/reference/aggregation/group/#_S_group for $group.. Is sharding the only situation where group() won't work? Also, I get this feeling that $group is more powerful than group() because it can be used in conjunction with other pipeline operators from aggregation framework.. How does $group compare with mapreduce? I read somewhere that it doesn't generate any temporary collection whereas mapreduce does. Is that so?
Can someone present an illustration or guide me to a link where these three concepts are explained together, taking the same sample data, so I can compare them easily?
EDIT: Also, it would be great if you could point out anything new in these commands since the 2.2 release came out.
It is somewhat confusing since the names are similar, but the group() command is a different feature and implementation from the $group pipeline operator in the Aggregation Framework.
The group() command, Aggregation Framework, and MapReduce are collectively aggregation features of MongoDB. There is some overlap in features, but I'll attempt to explain the differences and limitations of each as at MongoDB 2.2.0.
Note: inline result sets mentioned below refer to queries that are processed in memory with results returned at the end of the function call. Alternative output options (currently only available with MapReduce) could include saving results to a new or existing collection.
group() Command
Simple syntax and functionality for grouping, analogous to GROUP BY in SQL.
Returns result set inline (as an array of grouped items).
Implemented using the JavaScript engine; custom reduce() functions can be written in JavaScript.
Current Limitations
Will not group into a result set with more than 20,000 keys.
Results must fit within the limitations of a BSON document (currently 16MB).
Takes a read lock and does not allow any other threads to execute JavaScript while it is running.
Does not work with sharded collections.
See also: group() command examples.
MapReduce
Implements the MapReduce model for processing large data sets.
Can choose from one of several output options (inline, new collection, merge, replace, reduce).
MapReduce functions are written in JavaScript.
Supports non-sharded and sharded input collections.
Can be used for incremental aggregation over large collections.
MongoDB 2.2 implements much better support for sharded map reduce output.
Current Limitations
A single emit can only hold half of MongoDB's maximum BSON document size (16MB).
There is a JavaScript lock, so a mongod server can only execute one JavaScript function at a point in time. However, most steps of the MapReduce are very short, so locks can be yielded frequently.
MapReduce functions can be difficult to debug. You can use print() and printjson() to include diagnostic output in the mongod log.
MapReduce is generally not intuitive for programmers trying to translate relational query aggregation experience.
See also: Map/Reduce examples.
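To make the map and reduce steps concrete, here is a plain-JavaScript simulation on a tiny made-up data set (grouped sums per key). It mirrors what the server does conceptually; it is not the actual mapReduce API:

```javascript
// Sample documents (made up for illustration).
const docs = [
  { cust: "a", total: 10 },
  { cust: "b", total: 5 },
  { cust: "a", total: 7 },
];

// map: emit one (key, value) pair per document; `this` is the document.
const emitted = [];
function map() { emitted.push([this.cust, this.total]); }
docs.forEach((doc) => map.call(doc));

// Group emitted pairs by key, then reduce each group's values.
const groups = {};
for (const [key, value] of emitted) (groups[key] ||= []).push(value);
const reduce = (key, values) => values.reduce((a, b) => a + b, 0);
const result = Object.keys(groups).map((key) => ({
  _id: key,
  value: reduce(key, groups[key]),
}));
console.log(result); // [{ _id: 'a', value: 17 }, { _id: 'b', value: 5 }]
```

On a real server the reduce function may be called multiple times per key with partial value lists, which is why reduce must be associative, commutative, and idempotent.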
Aggregation Framework
New feature in the MongoDB 2.2.0 production release (August, 2012).
Designed with specific goals of improving performance and usability.
Returns result set inline.
Supports non-sharded and sharded input collections.
Uses a "pipeline" approach where objects are transformed as they pass through a series of pipeline operators such as matching, projecting, sorting, and grouping.
Pipeline operators need not produce one output document for every input document: operators may also generate new documents or filter out documents.
Using projections you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.
Pipeline operators can be repeated as needed (for example, multiple $project or $group steps).
Current Limitations
Results are returned inline, so are limited to the maximum document size supported by the server (16MB)
Doesn't support as many output options as MapReduce
Limited to operators and expressions supported by the Aggregation Framework (i.e. can't write custom functions)
Newest server feature for aggregation, so has more room to mature in terms of documentation, feature set, and usage.
See also: Aggregation Framework examples.
Can someone present an illustration or guide me to a link where these three concepts are explained together, taking the same sample data, so I can compare them easily?
You generally won't find examples where it would be useful to compare all three approaches, but here are previous StackOverflow questions which show variations:
group() versus Aggregation Framework
MapReduce versus Aggregation Framework
Does MongoDB support comparing two fields in the same collection using native operators (not $where and JavaScript)?
I already looked at similar questions and all answers used $where / JavaScript.
MongoDB documentation clearly states that:
JavaScript executes more slowly than the native operators listed on this page, but is very flexible.
My primary concern is speed and I would like to use indexes if possible. So is comparing two fields in MongoDB possible without using JavaScript?
This is not currently possible, but it will be possible through the new aggregation framework currently under development (2.1+). This aggregation framework is native and does not rely on relatively slow JavaScript execution paths.
For more details check http://www.mongodb.org/display/DOCS/Aggregation+Framework
and the progress at https://jira.mongodb.org/browse/SERVER-447
From reading the documentation you link, it doesn't look like MongoDB has the ability to compare two document properties using only native operators.
Perhaps you can modify the documents themselves (and/or the code which saves the documents) to include a boolean property whose value results from the comparison (ahead of time), and then simply query on that new property as needed. You could even index it for better performance.
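For example, with a hypothetical pair of fields a and b (the field and collection names here are made up), the ahead-of-time comparison could be computed before each save:

```javascript
// Precompute the comparison result as its own boolean field so that
// queries can use a plain, indexable equality match instead of $where.
function withComparisonFlag(doc) {
  return { ...doc, aGreaterThanB: doc.a > doc.b };
}

const doc = withComparisonFlag({ a: 5, b: 3 });
console.log(doc.aGreaterThanB); // true

// Then query with a native operator, e.g.:
//   db.items.find({ aGreaterThanB: true })
// and optionally index it:
//   db.items.createIndex({ aGreaterThanB: 1 })
```

The trade-off is that the flag must be recomputed whenever either field changes, so this belongs in a single code path that all writes go through.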