Does MongoDB support comparing two fields in the same collection using native operators (not $where and JavaScript)?
I already looked at similar questions and all answers used $where / JavaScript.
MongoDB documentation clearly states that:
JavaScript executes more slowly than the native operators listed on this page, but is very flexible.
My primary concern is speed and I would like to use indexes if possible. So is comparing two fields in MongoDB possible without using JavaScript?
This is not currently possible, but it will be possible through the new aggregation framework currently under development (2.1+). This aggregation framework is native and does not rely on relatively slow JavaScript execution paths.
For more details check http://www.mongodb.org/display/DOCS/Aggregation+Framework
and the progress at https://jira.mongodb.org/browse/SERVER-447
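As a rough sketch of what the native approach can look like once the aggregation framework is available: the pipeline below computes the comparison with $cmp in a $project stage and then filters on the result. The collection name `items` and the fields `a` and `b` are placeholders, not taken from the question. Later server versions (3.6+) also accept the shorter $expr form shown second. Neither form should be expected to use an index for the comparison itself.

```javascript
// Hypothetical names: collection `items`, fields `a` and `b`.

// Aggregation-framework form: $cmp yields -1, 0 or 1 for each document.
const compareWithCmp = [
  { $project: { a: 1, b: 1, cmp: { $cmp: ["$a", "$b"] } } },
  { $match: { cmp: { $gt: 0 } } } // keep documents where a > b
];

// Shorter form for MongoDB 3.6+: $expr allows aggregation expressions in a query.
const compareWithExpr = { $expr: { $gt: ["$a", "$b"] } };

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.items.aggregate(compareWithCmp).toArray());
  printjson(db.items.find(compareWithExpr).toArray());
}
```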
From reading the documentation you linked, it doesn't look like MongoDB has the ability to compare two document properties using only native operators.
Perhaps you can modify the documents themselves (and/or the code which saves the documents) to include a boolean property with value resulting from the comparison (ahead-of-time) and then simply query on that new property as needed. You could even index it for even better performance.
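A minimal sketch of that idea, assuming hypothetical fields `a` and `b` and a flag named `aGreaterThanB` (none of these names come from the question). The flag has to be recomputed by whatever code writes the document, on every insert and update.

```javascript
// Hypothetical fields: `a`, `b`, and the precomputed flag `aGreaterThanB`.
function withComparisonFlag(doc) {
  // Copy the document and store the comparison result alongside the data.
  return Object.assign({}, doc, { aGreaterThanB: doc.a > doc.b });
}

const toSave = withComparisonFlag({ a: 5, b: 3 });

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.items.insertOne(toSave);
  db.items.createIndex({ aGreaterThanB: 1 }); // index the flag
  printjson(db.items.find({ aGreaterThanB: true }).toArray());
}
```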
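Map-reduce is used as follows:
db.myCollection.mapReduce(
    function() {
        emit(this.smth, 1); // emit() takes a key and a value
    },
    function(key, values) {
        // return something done with key and values
        return Array.sum(values);
    },
    { out: { inline: 1 } } // an output target is required
);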
My question is, why is the map part implemented to have implicit this that references the current document being processed? IMO, it would be cleaner to have the current document passed in as an argument to the map function (I prefer to write all my JavaScript without this).
In practice this also rules out the use of arrow functions in mongo scripts, since this reference does not work with them.
why is the map part implemented to have implicit this that references the current document being processed?
MongoDB's Map/Reduce API was created in 2009, which is well before arrow functions were available in JavaScript (via ES6/ES2015). I can only speculate on the design intention, but much has changed in JavaScript (and MongoDB) since the original Map/Reduce implementation.
The this keyword in a JavaScript method refers to the owner or execution context, so setting it to the current document being processed was perhaps a reasonable convention (or convenience) for JavaScript usage at the time. The reduce function has a required prototype of function (key, values) so a map prototype of function (doc) might have been more consistent. However, once an API choice is made, any significant breaking changes become more challenging to introduce.
A more modern take on aggregation might look quite different, and this is the general path that MongoDB has taken. The Aggregation Framework introduced in MongoDB 2.2 (August, 2012) is a higher performance approach to data aggregation and should be preferred (where possible) over Map/Reduce.
Successive releases of the MongoDB server have made significant improvements to the features and performance of the Aggregation Framework, while Map/Reduce has not notably evolved. For example, the Aggregation Framework is written in C++ and able to manipulate MongoDB's native BSON data types; Map/Reduce spawns JavaScript threads and has to marshal data between BSON and JavaScript.
In practice this also rules out the use of arrow functions in mongo scripts, since this reference does not work with them.
Indeed. As at MongoDB 4.0, arrow functions are not supported in Map/Reduce. There is a feature request to support arrow functions which you can watch/upvote in the MongoDB issue tracker: SERVER-34281.
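The underlying JavaScript behaviour can be reproduced without a server. The snippet below is a toy stand-in for the map phase (the `runMap` and `makeArrowMap` helpers and the local `emit` are invented for illustration and say nothing about how the server is implemented). It binds each document as `this`, the way the Map/Reduce API does, and shows that an arrow function ignores that binding.

```javascript
// Toy stand-in for the map phase: an assumption for illustration only.
const emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

function runMap(mapFn, docs) {
  emitted.length = 0;
  for (const doc of docs) {
    mapFn.call(doc); // bind the current document as `this`
  }
  return emitted.slice();
}

const docs = [{ smth: "a" }, { smth: "b" }];

// Regular function: `this` is the document supplied by .call().
const fromRegular = runMap(function () { emit(this.smth, 1); }, docs);

// Arrow function: `this` is fixed where the arrow is created,
// so .call(doc) has no effect on it.
function makeArrowMap() {
  return () => { emit(this.smth, 1); };
}
const fromArrow = runMap(makeArrowMap.call({ smth: "outer" }), docs);

console.log(fromRegular); // [ [ 'a', 1 ], [ 'b', 1 ] ]
console.log(fromArrow);   // [ [ 'outer', 1 ], [ 'outer', 1 ] ]
```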
I want to query using part of an id to get all the matching documents. I tried "starts with" and "contains", which work fine, but is there any performance issue for a large collection?
The best ways to optimize this search:
Add a text index on the fields you want to search. This matters because internally it tokenizes your strings, so you can search for individual words within them (whole words, not arbitrary substrings).
Use a regex, which is also quick when it is anchored to the start of the string: a "starts with" pattern such as /^abc/ can use an index on the field, while an unanchored "contains" pattern cannot use it efficiently.
If you are using aggregate, read the official MongoDB documentation on aggregation pipeline optimization, which might help you implement this efficiently: https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization/
Last but not least, if you are not yet committed to MongoDB and the project is fresh, look at Elasticsearch, which is based on Lucene. It is extremely powerful at these kinds of searches.
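To illustrate the regex point, here is a small sketch that builds "starts with" and "contains" filters from user input. It assumes the id is stored as a string in a hypothetical field `orderId` of a hypothetical `orders` collection (an ObjectId cannot be matched with a regex). The helper names are made up for this example.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// "Starts with": anchored at the beginning, so an index on the field can be used.
function startsWithFilter(field, prefix) {
  return { [field]: new RegExp("^" + escapeRegex(prefix)) };
}

// "Contains": unanchored, so every indexed value (or document) has to be examined.
function containsFilter(field, part) {
  return { [field]: new RegExp(escapeRegex(part)) };
}

const byPrefix = startsWithFilter("orderId", "2024-");

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.orders.createIndex({ orderId: 1 });
  // Compare the number of keys/documents examined for each kind of filter.
  printjson(db.orders.find(byPrefix).explain("executionStats"));
  printjson(db.orders.find(containsFilter("orderId", "2024-")).explain("executionStats"));
}
```

Comparing the two explain outputs on your own data is the most reliable way to see whether the "contains" variant is a problem at your collection size.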
MongoDB 3.2 now provides a filter expression for partially indexing a collection.
Based on that feature, I wonder how MongoDB could help me with the following case.
I have many predefined queries that are very close to filter expressions. The idea would be to create many indexes, each with its own filter expression.
Each index would in effect keep the ids of the matching documents and be updated on every document change.
For performance reasons, I would prefer to use MongoDB's index engine rather than an external tool with a trigger-based solution.
How could I accomplish this by extending MongoDB? Could any other NoSQL database help?
Thanks
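For reference, a partial index of the kind described in the question is declared roughly as follows. The collection `tickets` and the fields `status` and `priority` are invented for the example. One such index would be created per predefined filter.

```javascript
// Hypothetical collection `tickets` and fields `status`, `priority`.
const keys = { priority: 1 };
const options = {
  // Only documents matching this filter get an entry in the index.
  partialFilterExpression: { status: "open" }
};

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.tickets.createIndex(keys, options);
  // The query has to include the filter condition for the partial index to be eligible.
  printjson(db.tickets.find({ status: "open", priority: { $gte: 3 } }).toArray());
}
```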
I am somewhat confused about when to use group(), aggregate with $group or mapreduce. I read the documentation at http://www.mongodb.org/display/DOCS/Aggregation for group(), http://docs.mongodb.org/manual/reference/aggregation/group/#_S_group for $group. Is sharding the only situation where group() won't work? Also, I get this feeling that $group is more powerful than group() because it can be used in conjunction with other pipeline operators from aggregation framework. How does $group compare with mapreduce? I read somewhere that it doesn't generate any temporary collection whereas mapreduce does. Is that so?
Can someone present an illustration or guide me to a link where these three concepts are explained together, taking the same sample data, so I can compare them easily?
EDIT: Also, it would be great if you could point out anything new in these commands since the 2.2 release came out.
It is somewhat confusing since the names are similar, but the group() command is a different feature and implementation from the $group pipeline operator in the Aggregation Framework.
The group() command, Aggregation Framework, and MapReduce are collectively aggregation features of MongoDB. There is some overlap in features, but I'll attempt to explain the differences and limitations of each as at MongoDB 2.2.0.
Note: inline result sets mentioned below refer to queries that are processed in memory with results returned at the end of the function call. Alternative output options (currently only available with MapReduce) could include saving results to a new or existing collection.
group() Command
Simple syntax and functionality for grouping, analogous to GROUP BY in SQL.
Returns result set inline (as an array of grouped items).
Implemented using the JavaScript engine; custom reduce() functions can be written in JavaScript.
Current Limitations
Will not group into a result set with more than 20,000 keys.
Results must fit within the limitations of a BSON document (currently 16MB).
Takes a read lock and does not allow any other threads to execute JavaScript while it is running.
Does not work with sharded collections.
See also: group() command examples.
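A short sketch of the group() form, assuming a hypothetical `orders` collection whose documents look like { cust_id: "A", amount: 10 }. It sums `amount` per customer. Note that db.collection.group() only exists on older servers (it was removed in MongoDB 4.2), hence the guard.

```javascript
// Assumed document shape: { cust_id: "A", amount: 10 } in a hypothetical `orders` collection.
const groupSpec = {
  key: { cust_id: 1 },   // group by this field, like GROUP BY cust_id
  initial: { total: 0 }, // starting value of each group's result
  reduce: function (curr, result) {
    result.total += curr.amount; // fold each document into its group's result
  }
};

// Only runs in a shell whose server still provides group().
if (typeof db !== "undefined" && typeof db.orders.group === "function") {
  printjson(db.orders.group(groupSpec));
}
```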
MapReduce
Implements the MapReduce model for processing large data sets.
Can choose from one of several output options (inline, new collection, merge, replace, reduce)
MapReduce functions are written in JavaScript.
Supports non-sharded and sharded input collections.
Can be used for incremental aggregation over large collections.
MongoDB 2.2 implements much better support for sharded map reduce output.
Current Limitations
A single emit can only hold half of MongoDB's maximum BSON document size (that is, 8MB of the 16MB limit).
There is a JavaScript lock, so a mongod server can only execute one JavaScript function at a point in time; however, most steps of the MapReduce are very short so locks can be yielded frequently.
MapReduce functions can be difficult to debug. You can use print() and printjson() to include diagnostic output in the mongod log.
MapReduce is generally not intuitive for programmers trying to translate relational query aggregation experience.
See also: Map/Reduce examples.
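A comparable MapReduce sketch, again assuming a hypothetical `orders` collection with documents like { cust_id: "A", amount: 10 }. It sums `amount` per customer and returns the result inline. Swapping the `out` option is how the other output modes are selected.

```javascript
// Assumed document shape: { cust_id: "A", amount: 10 } in a hypothetical `orders` collection.

// Map: called once per document, with the document bound to `this`.
function mapFn() {
  emit(this.cust_id, this.amount);
}

// Reduce: may be called several times per key, so it must return
// a value of the same shape as the values it receives.
function reduceFn(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

const mrOptions = { out: { inline: 1 } }; // or e.g. { out: "order_totals" } for a collection

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.orders.mapReduce(mapFn, reduceFn, mrOptions));
}
```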
Aggregation Framework
New feature in the MongoDB 2.2.0 production release (August, 2012).
Designed with specific goals of improving performance and usability.
Returns result set inline.
Supports non-sharded and sharded input collections.
Uses a "pipeline" approach where objects are transformed as they pass through a series of pipeline operators such as matching, projecting, sorting, and grouping.
Pipeline operators need not produce one output document for every input document: operators may also generate new documents or filter out documents.
Using projections you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.
Pipeline operators can be repeated as needed (for example, multiple $project or $group steps).
Current Limitations
Results are returned inline, so are limited to the maximum document size supported by the server (16MB).
Doesn't support as many output options as MapReduce.
Limited to operators and expressions supported by the Aggregation Framework (i.e. can't write custom functions).
Newest server feature for aggregation, so has more room to mature in terms of documentation, feature set, and usage.
See also: Aggregation Framework examples.
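The pipeline idea can be sketched like this, assuming a hypothetical `orders` collection with documents like { cust_id: "A", amount: 10 }. Each stage transforms the stream of documents produced by the previous one.

```javascript
// Assumed document shape: { cust_id: "A", amount: 10 } in a hypothetical `orders` collection.
const pipeline = [
  { $match: { amount: { $gt: 0 } } },                            // filter documents
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },   // one output document per customer
  { $sort: { total: -1 } },                                      // largest totals first
  { $project: { _id: 0, customer: "$_id", total: 1 } }           // reshape the result
];

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(pipeline).toArray());
}
```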
Can someone present an illustration or guide me to a link where these three concepts are explained together, taking the same sample data, so I can compare them easily?
You generally won't find examples where it would be useful to compare all three approaches, but here are previous StackOverflow questions which show variations:
group() versus Aggregation Framework
MapReduce versus Aggregation Framework
I am currently evaluating mongodb for a project I have started but I can't find any information on what the equivalent of an SQL view in mongodb would be. What I need, that an SQL view provides, is to lump together data from different tables (collections) into a single collection.
I want nothing more than to clump some documents together and label them as a single document. Here's an example:
I have the following documents:
cc_address
us_address
billing_address
shipping_address
But in my application, I'd like to see all of my addresses and be able to manage them in a single document.
In other cases, I may just want a couple of fields from collections:
I have the following documents:
fb_contact
twitter_contact
google_contact
reddit_contact
Each of these documents has fields that align, like firstname, lastname, and email, but they also have fields that don't align. I'd like to be able to compile them into a single document that only contains the fields that align.
This can be accomplished with views in SQL, correct? Can I accomplish this kind of functionality in MongoDB?
The question is quite old already. However, since MongoDB v3.2 you can use $lookup to join data from different collections, as long as the collections are unsharded.
Since MongoDB v3.4 you can also create read-only views.
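A rough sketch of both features, using the collection and field names from the question (fb_contact, twitter_contact, firstname, lastname, email). Joining on `email` and the view name `fb_contact_common` are assumptions made for the example.

```javascript
// Fields that align across the contact collections (from the question).
const alignedFields = { firstname: 1, lastname: 1, email: 1 };

// View definition: expose only the aligned fields of fb_contact.
const viewPipeline = [{ $project: alignedFields }];

// $lookup: attach matching twitter_contact documents (joined on `email`, an assumption).
const lookupPipeline = [
  {
    $lookup: {
      from: "twitter_contact",
      localField: "email",
      foreignField: "email",
      as: "twitter" // matches are added as an array field
    }
  },
  { $project: Object.assign({ twitter: 1 }, alignedFields) }
];

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.createView("fb_contact_common", "fb_contact", viewPipeline); // MongoDB 3.4+
  printjson(db.fb_contact_common.find().toArray());
  printjson(db.fb_contact.aggregate(lookupPipeline).toArray());   // MongoDB 3.2+
}
```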
There are no "joins" in MongoDB. As said by JonnyHK, you can either denormalize your data, use embedded documents, or perform multiple queries.
However, you could also use Map-Reduce.
Or, if you're prepared to use the development branch, you could test the new aggregation framework, though maybe it's too much? This new framework will be in the soon-to-be-released 2.2, which is production-ready unlike 2.1.x.
Here's the SQL-Mongo chart also, which may be of some help in your learning.
Update: Based on your re-edit, you don't need Map-Reduce or the Aggregation Framework because you're just querying.
You're essentially doing joins, querying multiple documents and merging the results. The place to do this is within your application on the client-side.
MongoDB queries never span more than a single collection as there is no support for joins. So if you have related data you need available in the results of a query you must either add that related data to the collection you're querying (i.e. denormalize your data), or make a separate query for it from another collection.
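The "separate query, then merge in the application" approach can be sketched in plain JavaScript. The two arrays below stand in for the results of two separate queries. The helper `joinByKey` and the sample fields are invented for illustration.

```javascript
// Merge two result sets in the application: for each left-hand document,
// attach the right-hand document whose key matches (or null if none does).
function joinByKey(left, right, leftKey, rightKey, as) {
  const index = new Map();
  for (const r of right) {
    index.set(r[rightKey], r);
  }
  return left.map(function (l) {
    const match = index.has(l[leftKey]) ? index.get(l[leftKey]) : null;
    return Object.assign({}, l, { [as]: match });
  });
}

// Stand-ins for the results of two separate find() calls.
const orders = [{ _id: 1, cust_id: "A" }, { _id: 2, cust_id: "B" }];
const customers = [{ _id: "A", name: "Ann" }];

const merged = joinByKey(orders, customers, "cust_id", "_id", "customer");
console.log(JSON.stringify(merged));
```

Building the lookup map first keeps the merge linear in the size of both result sets, rather than scanning the second array for every document in the first.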
I am currently evaluating mongodb for a project I have started but I can't find any information on what the equivalent of an SQL view in mongodb would be
In addition to this answer, MongoDB now has on-demand materialized views. In a nutshell, this feature lets you use aggregate with $merge (available from 4.2) to create or update a "quick view" collection that is faster to query. The strategy is to update the quick view collection whenever a record in the main collection changes. Unlike a SQL view, this has the side effect of increasing your data storage size, but the benefits can be huge depending on your querying needs.
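A minimal sketch of such a refresh, assuming a hypothetical `orders` collection with `cust_id` and `amount` fields and a target collection named `order_totals` (all invented for the example). Re-running the pipeline updates the materialized collection in place.

```javascript
// Hypothetical source collection `orders` and target collection `order_totals`.
const refreshPipeline = [
  { $group: { _id: "$cust_id", total: { $sum: "$amount" }, orders: { $sum: 1 } } },
  {
    $merge: {
      into: "order_totals",
      whenMatched: "replace",  // overwrite the existing summary for this _id
      whenNotMatched: "insert" // add a summary for a new _id
    }
  }
];

// Only runs inside the mongo shell (MongoDB 4.2+), where `db` is defined.
if (typeof db !== "undefined") {
  db.orders.aggregate(refreshPipeline).toArray(); // $merge writes; no documents are returned
  printjson(db.order_totals.find().toArray());
}
```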