Is it possible in MongoDB MapReduce to emit keys that are documents themselves? Something like
emit({type: 1, date: ...}, 12);
When I do this, MapReduce completes successfully, but my reduced results also contain the emitted values, so I am wondering what's wrong.
You can use a document as the emit key. The reduce function combines the values emitted with the same key into one value. If the map function emits only a single value for a particular key, the reduce function will not be called for that key, so the emitted value appears unchanged in the output.
Can you share your code as a snippet?
You can definitely use a document as the key and/or the value. It works exactly the same as with primitive types.
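To make the two answers above concrete, here is a minimal in-memory sketch in Python. It is not MongoDB's engine: map_reduce, map_fn, reduce_fn and the sample documents are all made up for illustration. It groups emitted values by a document-shaped key and skips reduce when a key has only one value, which is why a raw emitted value can show up in the results.

```python
def map_reduce(docs, map_fn, reduce_fn):
    """Toy in-memory stand-in for mapReduce (illustrative only)."""
    groups = {}
    for doc in docs:
        for key, value in map_fn(doc):
            # Dicts are not hashable, so freeze the key document into a tuple
            # of (field, value) pairs; field order is kept, as it is in BSON.
            groups.setdefault(tuple(key.items()), []).append(value)
    results = []
    for frozen_key, values in groups.items():
        key = dict(frozen_key)
        # A single emitted value has nothing to be combined with: skip reduce.
        value = values[0] if len(values) == 1 else reduce_fn(key, values)
        results.append({"_id": key, "value": value})
    return results


reduce_calls = []  # records which keys reduce was actually called for


def map_fn(doc):
    # Equivalent in spirit to: emit({type: ..., date: ...}, amount)
    yield {"type": doc["type"], "date": doc["date"]}, doc["amount"]


def reduce_fn(key, values):
    reduce_calls.append(key)
    return sum(values)


docs = [
    {"type": 1, "date": "2013-01-01", "amount": 12},
    {"type": 1, "date": "2013-01-01", "amount": 8},
    {"type": 2, "date": "2013-01-02", "amount": 5},
]
results = map_reduce(docs, map_fn, reduce_fn)
for row in results:
    print(row)
print("reduce called for", len(reduce_calls), "key(s)")
```

The key with two emits is reduced to 20, while the key with a single emit keeps its value of 5 without reduce ever seeing it.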
Related
I'm asking this question performance-wise, knowing that there is a unique document with this id.
MyCollection.find({_id: id}) //this should return only one document - id is unique
vs
MyCollection.findOne({_id: id}) //this is equivalent to .find({_id: id}).limit(1) from what I've read
My first thought is that, no matter the filter, .find has to go through the collection, so .findOne is faster when I just want to retrieve one doc. Am I correct? Or, since _id is always indexed, is there no difference?
I'm not asking about the output of the functions, this is an optimization/perf question.
In this particular case, there's no performance difference (because of the unique index, yes).
There may be response shape differences. I'm not familiar with Mongoose, but in the MongoDB shell, find() returns a cursor (which is enumerated right away) and findOne() returns the document directly.
I am using PyMongo to insert data (title, description, phone_number ...) into MongoDB. However, when I use the mongo client to view the data, it displays the properties in a strange order. Specifically, the phone_number property is displayed first, followed by title and then description. Is there some way I can force a particular order?
The above question and answer are quite old. Anyhow, if somebody visits this I feel like I should add:
This answer is completely wrong. In MongoDB, documents ARE ordered key-value pairs. However, pymongo uses Python dicts for documents, and these historically were not ordered (CPython 3.6 dicts retain insertion order as an implementation detail; Python 3.7 and later guarantee it). So this is a limitation of the pymongo driver, not of MongoDB.
Be aware that this limitation actually impacts usability. If you query the db for a subdocument, it will only match if the order of the key-value pairs is the same.
Just try the following code yourself:
from pymongo import MongoClient
db = MongoClient().testdb
col = db.testcol
subdoc = {
    'field1': 1,
    'field2': 2,
    'field3': 3
}
document = {
    'subdoc': subdoc
}
col.insert_one(document)
# Cursor.count() was removed in PyMongo 4; count_documents() is its replacement
print(col.count_documents({'subdoc': subdoc}))
Each time this code is executed, the 'same' document is added to the collection, so the printed value 'should' increase by one on every run. It does not, because find only matches subdocuments with the same field ordering, and on Python versions where dicts are unordered the subdoc is inserted with an arbitrary order.
See the following answer for how to use an ordered dict to overcome this: https://stackoverflow.com/a/30787769/4273834
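As a rough illustration of why the ordering matters, the sketch below models an exact subdocument match in plain Python, without a database. The matches_exactly helper is hypothetical and only approximates the behaviour described above: same fields, same values, same order. With pymongo itself, an ordered mapping such as collections.OrderedDict or bson.son.SON is the usual way to pin the field order.

```python
from collections import OrderedDict


def matches_exactly(stored, query):
    """Hypothetical helper: order-sensitive comparison of two subdocuments."""
    return list(stored.items()) == list(query.items())


stored = OrderedDict([("field1", 1), ("field2", 2), ("field3", 3)])
same_order = OrderedDict([("field1", 1), ("field2", 2), ("field3", 3)])
other_order = OrderedDict([("field3", 3), ("field1", 1), ("field2", 2)])

print(matches_exactly(stored, same_order))   # True
print(matches_exactly(stored, other_order))  # False: same pairs, different order
print(dict(stored) == dict(other_order))     # True: plain dict equality ignores order
```

The last line shows the trap: two dicts that Python considers equal can still be different documents once their field order is taken into account.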
Original answer (2013):
MongoDB documents are BSON objects, unordered dictionaries of key-value pairs. So you can't rely on or set a specific field order. The only thing you can control is which fields are displayed and which are not; see the docs on find's projection argument.
Also see related questions on SO:
MongoDB field order and document position change after update
Can MongoDB and its drivers preserve the ordering of document elements
Ordering fields from find query with projection
Hope that helps.
I have two types of documents in a MongoDB collection:
one where key sessions has a simple value:
{"sessions": NumberLong("10000000000001")}
one where key sessions has an array of values.
{"sessions": [NumberLong("10000000000001")]}
Is there any way to retrieve all documents from the second category, i.e. only documents whose value is an array and not a simple value?
You can use this kind of query for that:
db.collectionName.find( { $where : "Array.isArray(this.sessions)" } );
but you'd better convert all the records to one type to keep things consistent.
The query can be as simple as this:
db.c.find({sessions:{$gte:[]}});
Explanation:
You only want to retrieve documents whose sessions field is an array. $gte returns false when the data types of its two operands differ (Double, Integer32 and Integer64 are considered the same data type), so giving an empty array as the operand retrieves exactly the documents you need.
$gt, $lt and $lte in standard queries behave the same way (note that the operators with the same names in aggregation pipeline expressions behave differently). I verified this in practice on MongoDB V2.4.8 and V2.6.4.
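The reasoning in that explanation can be modelled with a small toy in Python. This is only a sketch of the argument, not MongoDB's actual comparison engine; bson_type_class and toy_gte are hypothetical helpers, and the type classes are deliberately coarse.

```python
def bson_type_class(value):
    # Hypothetical helper: a very coarse stand-in for BSON comparison types.
    if isinstance(value, list):
        return "array"
    if isinstance(value, (int, float)):
        return "number"  # all numeric types fall into one class
    return type(value).__name__


def toy_gte(field_value, operand):
    # Operands of different type classes never compare as true.
    if bson_type_class(field_value) != bson_type_class(operand):
        return False
    return field_value >= operand


docs = [
    {"sessions": 10000000000001},
    {"sessions": [10000000000001]},
    {"sessions": []},
]

# Toy counterpart of the {sessions: {$gte: []}} filter above
array_docs = [d for d in docs if toy_gte(d["sessions"], [])]
print(array_docs)
```

Every array compares as greater than or equal to the empty array, while the plain number is rejected because its type class differs.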
What's the easiest way to get all the documents from a collection that are unique based on a single field?
I know I can use db.collection.distinct to get an array of all the distinct values of a field, but I want to get the first (or really any one) document for every distinct value of one field.
e.g. if the database contained:
{number:1, data:'Test 1'}
{number:1, data:'This is something else'}
{number:2, data:'I\'m bad at examples'}
{number:3, data:'I guess there\'s room for one more'}
it would return (based on number being unique):
{number:1, data:'Test 1'}
{number:2, data:'I\'m bad at examples'}
{number:3, data:'I guess there\'s room for one more'}
Edit: I should add that the server is running Mongo 2.0.8, so there is no aggregation, and there are more results than group will support.
Update to 2.4 and use aggregation :)
If you really need to stick to the old version of MongoDB because too much red tape is involved, you could use MapReduce.
In MapReduce, the map function transforms each document of the collection into a new document and a distinctive key. The reduce function is used to merge documents with the same distinctive key into one.
Your map function would emit your documents as-is and with the number-field as unique key. It would look like this:
// The map function takes no arguments; the current document is available as `this`
var mapFunction = function() {
emit(this.number, this);
}
Your reduce-function receives arrays of documents with the same key, and is supposed to somehow turn them into one document. In this case it would just discard all but the first document with the same key:
var reduceFunction = function(key, documents) {
return documents[0];
}
Unfortunately, MapReduce has some problems. It can't use indexes, so at least two JavaScript functions are executed for every single document in the collection (this can be limited by pre-excluding some documents with the query argument to the mapReduce command). When you have a large collection, this can take a while. You also can't fully control how the documents created by MapReduce are formed. They always have two fields: _id with the key, and value with the document you returned for the key.
MapReduce is also hard to debug and troubleshoot.
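The map/reduce pair above, and the output shape it produces, can be sketched as a small in-memory model in Python. This does not run against MongoDB; first_per_key is a hypothetical helper and the sample data is taken from the question.

```python
def first_per_key(docs, key_field):
    """Toy model: emit(doc[key_field], doc), then keep documents[0] per key."""
    grouped = {}
    for doc in docs:
        # map step: group each document under the value of key_field
        grouped.setdefault(doc[key_field], []).append(doc)
    # reduce step: keep the first document of each group; the output mirrors
    # the fixed {_id, value} shape described above
    return [{"_id": key, "value": values[0]} for key, values in grouped.items()]


docs = [
    {"number": 1, "data": "Test 1"},
    {"number": 1, "data": "This is something else"},
    {"number": 2, "data": "I'm bad at examples"},
    {"number": 3, "data": "I guess there's room for one more"},
]

unique_docs = first_per_key(docs, "number")
for row in unique_docs:
    print(row)
```

Note that the original document ends up nested under value rather than at the top level, which is the lack of control over the output shape mentioned above.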
tl;dr: Update to 2.4
I give 0 as the key in the emit function, and after reduce it correctly gives the total of a column in the collection, as intended. My problem is that I don't understand how this works.
I have my emit like this;
function(){ emit(0, this.total); }
Could somebody please explain to me the working in this? Thank you in advance.
MapReduce is a tricky thing. You need to change your mindset to understand it. In your particular case, you told Mongo that you don't care about grouping. When you emit like this, all your this.total values are sent to one batch with identifier 0 and aggregated all together at the reduce step. This also means that these cases are identical:
function(){ emit(0, this.total); }
function(){ emit(1, this.total); }
function(){ emit('asdf', this.total); }
function(){ emit(null, this.total); }
They all lead to the same result, even though the batch name is different.
To complement the other answer with some internals: when you emit with a single constant key, the intermediate document resulting from the emits will look something like:
{_id:0,value:[5,6,7,8,9]}
With the array representing the combination of all the emits.
Emits are grouped by key, so there will be only one document, and its content will be all the total fields in the collection.
So when the reduce comes along and you add all of these numbers together it will correctly sum up the total for all total fields in the collection.
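A short Python sketch of this behaviour, using the sample values 5 to 9 from the intermediate document above. It is an in-memory model under assumed names (sum_with_constant_key is hypothetical), not the real mapReduce command.

```python
def sum_with_constant_key(docs, key):
    """Toy model of: map = emit(key, this.total); reduce = sum of the values."""
    grouped = {}
    for doc in docs:
        # Every document emits under the same key, so one batch collects them all.
        grouped.setdefault(key, []).append(doc["total"])
    # After the map step there is a single entry such as {0: [5, 6, 7, 8, 9]}
    return [{"_id": k, "value": sum(values)} for k, values in grouped.items()]


docs = [{"total": t} for t in (5, 6, 7, 8, 9)]

# The key's value does not matter, only that it is the same for every document.
outputs = {repr(key): sum_with_constant_key(docs, key) for key in (0, 1, "asdf", None)}
for name, result in outputs.items():
    print(name, result)
```

Each run produces exactly one output document whose value is 35; only the _id differs between the four keys.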