Using IF/ELSE in map reduce - mongodb

I am trying to make a simple map/reduce function on one of my MongoDB database collections.
I get data but it looks wrong. I am unsure about the Map part. Can I use IF/ELSE in this way?
UPDATE
I want to count the authors who own files. In other words, how many authors own at least one uploaded file and, consequently, how many authors have no files.
The objects in the collection look like this:
{
"_id": {
"$id": "4fa8efe33a34a40e52800083d"
},
"file": {
"author": "john",
"type": "mobile",
"status": "ready"
}
}
The map / reduce looks like this:
$map = new MongoCode ("function() {
if (this.file.type != 'mobile' && this.file.status == 'ready') {
if (!this.file.author) {
return;
}
emit (this.file.author, 1);
}
}");
$reduce = new MongoCode ("function( key , values) {
var count = 0;
for (index in values) {
count += values[index];
}
return count;
}");
$this->cimongo->command (array (
"mapreduce" => "files",
"map" => $map,
"reduce" => $reduce,
"out" => "statistics.photographer_count"
)
);

The map part looks OK to me. I would slightly change the reduce part.
values.forEach(function(v) {
count += v;
});
You should not use a for...in loop to iterate over an array; it was not meant for that. It is for enumerating an object's properties. Here is a more detailed explanation.
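A minimal sketch of the suggestion above, runnable in plain JavaScript without a server. The runMapReduce helper and the sample documents are hypothetical stand-ins for MongoDB's map/reduce machinery, not its actual implementation. The map conditions are copied from the question, and the reduce body uses forEach as suggested.

```javascript
// Hypothetical in-memory stand-in for map/reduce; not MongoDB's real implementation.
function map(doc, emit) {
  // Same conditions as the question's map function.
  if (doc.file.type != 'mobile' && doc.file.status == 'ready') {
    if (!doc.file.author) {
      return;
    }
    emit(doc.file.author, 1);
  }
}

function reduce(key, values) {
  var count = 0;
  // forEach visits array elements only, unlike for...in.
  values.forEach(function (v) {
    count += v;
  });
  return count;
}

function runMapReduce(docs) {
  var emitted = {};
  // Group every emitted value under its key.
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      (emitted[key] = emitted[key] || []).push(value);
    });
  });
  // Reduce each group to a single value.
  var result = {};
  Object.keys(emitted).forEach(function (key) {
    result[key] = reduce(key, emitted[key]);
  });
  return result;
}

// Made-up sample data in the shape shown in the question.
var sampleDocs = [
  { file: { author: 'john', type: 'web', status: 'ready' } },
  { file: { author: 'john', type: 'web', status: 'ready' } },
  { file: { author: 'anna', type: 'mobile', status: 'ready' } },
  { file: { type: 'web', status: 'ready' } }
];

var authorCounts = runMapReduce(sampleDocs);
console.log(authorCounts);
```

With this sample data only john is counted, because anna's file is of type mobile and the last document has no author. The number of keys in the result is the number of authors who own matching files.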
Why do you think your data is wrong? What's your source data? What do you get? What do you expect to get?

I just tried your map and reduce in mongo shell and got correct (reasonable looking) results.
The other way you can do what you are doing is get rid of the inner "if" condition in the map but call your mapreduce function with appropriate query clause, for example:
db.files.mapReduce(map,reduce,{out:'outcollection', query:{"file.author":{$exists:true}}})
or if you happen to have indexes to make the query efficient, just get rid of all ifs and run mapreduce with query:{"file.author":{$exists:true},"file.type":"mobile","file.status":"ready"} clause. Change the conditions to match the actual cases you want to sum up over.
In 2.2 (upcoming version available today as rc0) you can use the aggregation framework for this type of query rather than writing map/reduce functions, hopefully that will simplify things somewhat.
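As a rough sketch of what that could look like, the pipeline below mirrors the conditions in the question's map function. It assumes the files collection from the question and MongoDB 2.2 or later. Note that $exists: true also admits documents whose author is null or an empty string, which the original if (!this.file.author) check would skip, so treat the match stage as an approximation.

```javascript
// Sketch of an aggregation pipeline roughly equivalent to the map/reduce above.
var pipeline = [
  // Keep only the documents the map function would have emitted for.
  { $match: {
      "file.author": { $exists: true },
      "file.type": { $ne: "mobile" },
      "file.status": "ready"
  } },
  // Count documents per author, like the reduce function's sum of 1s.
  { $group: { _id: "$file.author", count: { $sum: 1 } } }
];

// In the mongo shell this would be run as:
//   db.files.aggregate(pipeline)
```

The result has one document per author, so the number of result documents is the number of authors who own matching files.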

Linq: Get all document with sort at mongodb

this is my document :
{
"BusinessCode": "8545",
"CreationDateTime": "/Date(1487417012464)/",
"DeviceId": "",
"Distributions": null,
"EventData": [
{
"Children": null,
"Key": "LogID",
"Value": "496a506b4301"
}
],
"EventId": "events.login",
},...
How can I get all the documents in my collection, sorted, using LINQ?
this is my earlier query:
var messagess = GetDbMongo().GetCollection<Message>(DB_COLLECTION_MESSAGES)
.Find(Builders<Message>.Filter.And()).Sort(sortt).Limit(count <= 0 ? 10 : count).ToList();
I may have misunderstood, but sorting "by LINQ" is probably not what you want. If you apply LINQ's OrderBy to documents that have already been loaded, the sort runs in memory in your application and the whole collection has to be enumerated first, which is very bad for large collections. Instead, let MongoDB do the sorting: it will be faster, and you will be using the server's resources rather than your app's. That said, you are already calling .Sort on the collection, so I am not sure what you are really after.
But, you could re-write your code to sort using Mongo by doing this:
var collection = GetDbMongo().GetCollection<Message>(DB_COLLECTION_MESSAGES);
// Empty filter to get all records
var emptyFilter = Builders<Message>.Filter.Empty;
// Choose which field of Message type you want to sort for
var sort = Builders<Message>.Sort.Ascending(p => p.Name);
var messagess = collection.Find(emptyFilter)
.Sort(sort)
.Limit(count <= 0 ? 10 : count)
.ToList();
If you are not familiar with how Linq operators work regarding being "lazy" or not, this is a good blog post from Jon Skeet: Just how lazy are you?
You can try this.
var messagess = GetDbMongo()
.GetCollection<Message>(DB_COLLECTION_MESSAGES)
.FindAll()
.Sort(sortt)
.ToList();

Elegantly return only Subdocuments satisfying elemMatch in MongoDb aggregation result [duplicate]

I've tried several ways of creating an aggregation pipeline which returns just the matching entries from a document's embedded array and not found any practical way to do this.
Is there some MongoDB feature which would avoid my very clumsy and error-prone approach?
A document in the 'workshop' collection looks like this...
{
"_id": ObjectId("57064a294a54b66c1f961aca"),
"type": "normal",
"version": "v1.4.5",
"invitations": [],
"groups": [
{
"_id": ObjectId("57064a294a54b66c1f961acb"),
"role": "facilitator"
},
{
"_id": ObjectId("57064a294a54b66c1f961acc"),
"role": "contributor"
},
{
"_id": ObjectId("57064a294a54b66c1f961acd"),
"role": "broadcaster"
},
{
"_id": ObjectId("57064a294a54b66c1f961acf"),
"role": "facilitator"
}
]
}
Each entry in the groups array provides a unique ID so that a group member is assigned the given role in the workshop when they hit a URL with that salted ID.
Given a _id matching an entry in a groups array like ObjectId("57064a294a54b66c1f961acb"), I need to return a single record like this from the aggregation pipeline - basically returning the matching entry from the embedded groups array only.
{
"_id": ObjectId("57064a294a54b66c1f961acb"),
"role": "facilitator",
"workshopId": ObjectId("57064a294a54b66c1f961aca")
},
In this example, the workshopId has been added as an extra field to identify the parent document, but the rest should be ALL the fields from the original group entry having the matching _id.
The approach I have adopted can just about achieve this but has lots of problems and is probably inefficient (with repetition of the filter clause).
return workshopCollection.aggregate([
{$match:{groups:{$elemMatch:{_id:groupId}}}},
{$unwind:"$groups"},
{$match:{"groups._id":groupId}},
{$project:{
_id:"$groups._id",
role:"$groups.role",
workshopId:"$_id",
}},
]).toArray();
Worse, since it explicitly includes named fields from the entry, it will omit any future fields which are added to the records. I also can't generalise this lookup operation to the case of 'invitations' or other embedded named arrays unless I can know what the array entries' fields are in advance.
I have wondered if using the $ or $elemMatch operators within a $project stage of the pipeline is the right approach, but so far they have either been ignored or triggered operator validity errors when running the pipeline.
QUESTION
Is there another aggregation operator or alternative approach which would help me with this fairly mainstream problem - to return only the matching entries from a document's array?
The implementation below can handle arbitrary queries, serves results as a 'top-level document' and avoids duplicate filtering in the pipeline.
function retrieveArrayEntry(collection, arrayName, itemMatch){
var match = {};
match[arrayName]={$elemMatch:itemMatch};
var project = {};
project[arrayName+".$"] = true;
return collection.findOne(
match,
project
).then(function(doc){
if(doc !== null){
var result = doc[arrayName][0];
result._docId = doc._id;
return result;
}
else{
return null;
}
});
}
It can be invoked like so...
retrieveArrayEntry(workshopCollection, "groups", {_id:ObjectId("57064a294a54b66c1f961acb")})
However, it relies on the collection findOne(...) method instead of aggregate(...) so will be limited to serving the first matching array entry from the first matching document. Projections referencing an array match clause are apparently not possible through aggregate(...) in the same way they are through findXXX() methods.
A still more general (but confusing and inefficient) implementation allows retrieval of multiple matching documents and subdocuments. It works around the difficulty MongoDb has with syntax consistency of Document and Subdocument matching through the unpackMatch method, so that an incorrect 'equality' criterion e.g. ...
{greetings:{_id:ObjectId("437908743")}}
...gets transferred into the required syntax for a 'match' criterion (as discussed at Within a mongodb $match, how to test for field MATCHING , rather than field EQUALLING )...
{"greetings._id":ObjectId("437908743")}
Leading to the following implementation...
function unpackMatch(pathPrefix, match){
var unpacked = {};
// forEach rather than map: the callback is used only for its side effect
Object.keys(match).forEach(function(key){
unpacked[pathPrefix + "." + key] = match[key];
});
return unpacked;
}
function retrieveArrayEntries(collection, arrayName, itemMatch){
var matchDocs = {},
projectItems = {},
unwindItems = {},
matchUnwoundByMap = {};
matchDocs.$match={};
matchDocs.$match[arrayName]={$elemMatch:itemMatch};
projectItems.$project = {};
projectItems.$project[arrayName]=true;
unwindItems.$unwind = "$" + arrayName;
matchUnwoundByMap.$match = unpackMatch(arrayName, itemMatch);
return collection.aggregate([matchDocs, projectItems, unwindItems, matchUnwoundByMap]).toArray().then(function(docs){
return docs.map(function(doc){
var result = doc[arrayName];
result._docId = doc._id;
return result;
});
});
}
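To make the key rewriting concrete, here is a small standalone run of the same unpackMatch logic. The itemMatch value is hypothetical, and a plain string stands in for an ObjectId so the snippet runs outside the shell.

```javascript
// Same logic as unpackMatch above: prefix every key with the array's path.
function unpackMatch(pathPrefix, match) {
  var unpacked = {};
  Object.keys(match).forEach(function (key) {
    unpacked[pathPrefix + "." + key] = match[key];
  });
  return unpacked;
}

// Hypothetical criterion; a plain string stands in for an ObjectId here.
var itemMatch = { _id: "57064a294a54b66c1f961acb", role: "facilitator" };
var unpacked = unpackMatch("groups", itemMatch);

// The keys become "groups._id" and "groups.role"; the values are untouched.
console.log(unpacked);
```

Only the top-level keys are prefixed, so a criterion that nests further sub-documents would need extra handling.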

MongoDB: Several fields to a list

I currently have a collection that follows a format like this:
{ "_id": ObjectId(...),
"name" : "Name",
"red": 0,
"blue": 0,
"yellow": 1,
"green": 0,
...}
and so on (a bunch of colors). What I would like to do is to create a new array named colors, whose elements are those colors that have a value of 1.
For example:
{ "_id": ObjectId(...),
"name" : "Name",
"colors": ["yellow"]
}
Is this something I can do on the Mongo shell? Or should I do it in a program?
I'm pretty sure I can do it using Python, however I am having difficulties trying to do it directly in the shell. If it can be done in the shell, can anyone point me in the right direction?
Thanks.
Yes it can be easily done in the shell, or basically by following the example adapted into any language.
The key here is to look at the fields that are "colors", then construct an update statement that removes those fields from the document while testing each one to see whether it belongs in the array, and then adds the resulting array to the document in the same update:
var bulk = db.collection.initializeOrderedBulkOp(),
count = 0;
db.collection.find().forEach(function(doc) {
doc.colors = doc.colors || [];
var update = { "$unset": {}};
Object.keys(doc).filter(function(key) {
return !/^(_id|name|colors)$/.test(key);
}).forEach(function(key) {
update.$unset[key] = "";
if ( doc[key] == 1)
doc.colors.push(key);
});
update["$addToSet"] = { "colors": { "$each": doc.colors } };
bulk.find({ "_id": doc._id }).updateOne(update);
count++;
if ( count % 1000 == 0 ) {
bulk.execute();
bulk = db.collection.initializeOrderedBulkOp()
}
});
if ( count % 1000 != 0 )
bulk.execute();
The Bulk Operations usage means that batches of updates are sent rather than one request and response per document, so this will process a lot faster than merely issuing singular updates back and forth.
The main operators here are $unset to remove the existing fields and $addToSet to add the new evaluated array. Both are built up by cycling the keys of the document that make up the possible colors and excluding the other keys you don't want to modify using a regex filter.
Using $addToSet together with this line:
doc.colors = doc.colors || [];
ensures that any document that was already partially converted, or was otherwise touched by a code change that had already started storing the correct array, is not adversely affected or overwritten by the update process.
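The per-document logic can be tried outside the shell with a sketch like the one below. The buildColorUpdate helper and the sample document are hypothetical, and the anchored regex assumes that _id, name and colors are the only non-color fields.

```javascript
// Hypothetical helper: builds the update document for a single input document.
function buildColorUpdate(doc) {
  // Start from any colors array that already exists on the document.
  var colors = (doc.colors || []).slice();
  var update = { "$unset": {} };
  Object.keys(doc).filter(function (key) {
    // Skip the fields that are not color flags.
    return !/^(_id|name|colors)$/.test(key);
  }).forEach(function (key) {
    // Every color flag is removed; only flags set to 1 go into the array.
    update.$unset[key] = "";
    if (doc[key] == 1) {
      colors.push(key);
    }
  });
  update["$addToSet"] = { "colors": { "$each": colors } };
  return update;
}

// Made-up document in the shape shown in the question.
var sampleDoc = { _id: 1, name: "Name", red: 0, blue: 0, yellow: 1, green: 0 };
var colorUpdate = buildColorUpdate(sampleDoc);
console.log(JSON.stringify(colorUpdate));
```

For the sample document, all four color fields end up under $unset and only yellow ends up in the $each list.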
tl;dr, spoiler
MongoDB's shell has access to some JavaScript-like methods on its objects. You can query your collection with db.yourCollectionName.find(), which will return a cursor (cursor methods). Then iterate through it to get each document, iterate through the keys, filter out keys like _id and name, check whether the value is 1, and if so store that key in an array.
Once done, you'd probably want to use db.yourCollectionName.update() or db.yourCollectionName.findAndModify() to find the record by _id and use $set to add a new field and set its value to that array of keys.

Mongo Shell Querying one collection with the results of another

I have 2 different collections and I am trying to query the first collection and take the output of that as an input to the second collection.
var mycursor = db.item.find({"itemId":NumberLong(123)},{"_id":0})
var outp = "";
while(mycursor.hasNext()){
var rec = mycursor.next()
outp = outp + rec.eventId;
}
This query works fine and returns me a list of eventIds.
I have another collection named users, which has an eventId field in it. An eventId can repeat across multiple users. So for each eventId I get in the above query, I want to get the list of users too.
My query for the second collection would be something like this:
db.users.find({"eventId":ObjectId("each eventId from above query")},{"_id":0})
My final result would be a unique list of users.
Well, this should basically work (up to a point, that is):
db.users.find({
"eventId": { "$in": db.item.find({
"itemId":NumberLong(123)
}).map(function(doc) { return doc.eventId }) }
})
Or even a bit better:
db.users.find({
"eventId": { "$in": db.item.distinct("eventId",{
"itemId":NumberLong(123) }) }
})
The reason is that the "inner query" is evaluated before the outer query is sent, so you get an array of arguments for use with $in.
That is basically the same as doing the following, which translates better outside of the shell:
var events = db.item.distinct("eventId",{ "itemId":NumberLong(123) });
db.users.find({ "eventId": { "$in": events } })
If the results are "too large", however, then your best approach is to loop over the initial results as you have already done and build up an array of arguments. Once it reaches a certain size, run the same $in query several times in "pages" to get the results.
But looking for "distinct" eventId values via .distinct() or .aggregate() will help.
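The "pages" idea can be sketched in plain JavaScript as follows. The chunk and buildInQueries helpers, the batch size and the sample ids are all hypothetical; in the shell the ids would come from the first query.

```javascript
// Hypothetical helper: split a list of ids into fixed-size batches.
function chunk(values, size) {
  var batches = [];
  for (var i = 0; i < values.length; i += size) {
    batches.push(values.slice(i, i + size));
  }
  return batches;
}

// Build one $in query document per batch.
function buildInQueries(field, values, size) {
  return chunk(values, size).map(function (batch) {
    var query = {};
    query[field] = { "$in": batch };
    return query;
  });
}

// Made-up ids; in the shell these would come from db.item.distinct(...).
var eventIds = ["e1", "e2", "e3", "e4", "e5"];
var queries = buildInQueries("eventId", eventIds, 2);
console.log(JSON.stringify(queries));
```

Each query document would then be passed to find() on the second collection, and the results from all batches combined.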

MongoDB MapReduce : use positional operator $ in map function

I have a collection with entries that look like that :
{"userid": 1, "contents": [ { "tag": "whatever", "value": 100 }, {"tag": "whatever2", "value": 110 } ] }
I'm performing a MapReduce on this collection with queries such as {"contents.tag": "whatever"}.
What I'd like to do in my map function is emit the "value" field of the entry in the "contents" array that matched the query, without having to iterate through the whole array. Under normal circumstances, I could do that using the $ positional operator with something like contents.$.value. But in the MapReduce case, it's not working.
To summarize, here is the code I have right now:
map=function(){
emit(this.userid, WHAT DO I WRITE HERE TO EMIT THE VALUE I WANT ?);
}
reduce=function(key,values){
return values[0]; //this reduce function does not make sense, just for the example
}
res=db.runCommand(
{
"mapreduce": "collection",
"query": {'contents.tag':'whatever'},
"map": map,
"reduce": reduce,
"out": "test_mr"
}
);
Any idea ?
Thanks !
This will not work without iterating over the whole array. In MongoDB a query is intended to match an entire document.
When dealing with Map / Reduce, the query is simply trimming the number of documents that are passed into the map function. However, the map function has no knowledge of the query that was run. The two are disconnected.
The source code around the M/R is here.
There is an upcoming aggregation feature that will more closely match this desire. But there's no timeline on this feature.
No way. I've had the same problem. Iterating is necessary.
You could do this:
map=function() {
// index loop rather than for...in, which is meant for object properties
for(var i = 0; i < this.contents.length; i++) {
if(this.contents[i].tag == "whatever") {
emit(this.userid, this.contents[i].value);
}
}
}
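To check what such a map function emits without a server, the loop can be run against a sample document in plain JavaScript. This is a hypothetical in-memory run: the document is passed in as a parameter instead of this, and emit is supplied explicitly, whereas MongoDB provides it as a global inside map.

```javascript
// Hypothetical standalone version of the map function.
function mapContents(doc, emit) {
  // The whole array is visible to map, so every entry has to be checked.
  for (var i = 0; i < doc.contents.length; i++) {
    if (doc.contents[i].tag == "whatever") {
      emit(doc.userid, doc.contents[i].value);
    }
  }
}

// Sample document in the shape shown in the question.
var sample = { userid: 1, contents: [
  { tag: "whatever", value: 100 },
  { tag: "whatever2", value: 110 }
] };

// Collect the emitted key/value pairs.
var emittedPairs = [];
mapContents(sample, function (key, value) {
  emittedPairs.push([key, value]);
});
console.log(JSON.stringify(emittedPairs));
```

Only the entry tagged "whatever" produces an emit, even though the document reaches the map function with its full contents array.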