I'm doing a MapReduce in MongoDB to generate a reverse index of tokens for some documents. I am having trouble accessing the document's _id in the map function.
Example document:
{
    "_id" : ObjectId("4ea42a2c6fe22bf01f000d2d"),
    "attributes" : {
        "name" : "JCDR 50W38C",
        "upi-tokens" : [
            "50w38c",
            "jcdr"
        ]
    },
    "sku" : "143669259486830515"
}
(The field attributes['upi-tokens'] is a list of text tokens I want to create the reverse index for.)
Map function (source of the problem):
m = function () {
    this.attributes['upi-tokens'].forEach(
        function (token) { emit(token, { ids: [ this._id ] }); }
    );
}
Reduce function:
r = function (key, values) {
    var results = new Array;
    for (v in values) {
        results = results.concat(v.ids);
    }
    return { ids: results };
}
MapReduce call:
db.offers.mapReduce(m, r, { out: "outcollection" } )
PROBLEM: The resulting collection has null values everywhere I'd expect an id instead of actual ObjectId values.
Possible reason:
I was expecting the following 2 functions to be equivalent, but they aren't.
m1 = function (d) { print(d['_id']); }
m2 = function () { print(this['_id']); }
Now I run:
db.offers.find().forEach(m1)
db.offers.find().forEach(m2)
The difference is that m2 prints undefined for each document while m1 prints the ids as desired. I have no clue why.
Questions:
How do I get the _id of the current object in the map function for use in MapReduce? Neither this._id nor this['_id'] works.
Why exactly aren't m1 and m2 equivalent?
Got it to work... I made two quite simple JS mistakes:
1. The inner forEach() in the map function rebinds 'this': inside the callback, this is no longer the main document (which has an _id) but something else entirely...
2. ...and in JS a for..in loop yields only the keys, not the values, i.e.
for (v in values) {
requires
values[v]
to access the actual array value. Duh...
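A minimal plain-JS illustration of mistake #2, with a made-up values array:

```javascript
// for..in yields the indices ("0", "1", ...), not the elements,
// so each element must be fetched with values[v].
var values = [{ ids: ["a"] }, { ids: ["b"] }];
var results = [];
for (var v in values) {
    results = results.concat(values[v].ids); // values[v], not v
}
console.log(results); // [ 'a', 'b' ]
```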
The way I circumvented mistake #1 is by using a for..in loop instead of the forEach() loop in the map function:
m = function () {
    for (t in this.attributes['upi-tokens']) {
        var token = this.attributes['upi-tokens'][t];
        emit(token, { ids: [ this._id ] });
    }
}
That way "this" refers to what it needs to.
Could also do:
var that = this;
this.attributes['upi-tokens'].forEach( function (d) {
    ...
    that._id...
    ...
});
probably would work just fine.
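Outside the shell, the same `this` behavior can be reproduced in plain Node; this sketch uses a made-up doc and no MongoDB:

```javascript
var doc = { _id: "4ea42a2c", tokens: ["50w38c", "jcdr"] };

// Inside a plain `function` callback passed to forEach, `this` is no
// longer the outer document.
function brokenMap() {
    var self = this;            // the document, as mapReduce binds it
    var sameThis = [];
    this.tokens.forEach(function (token) {
        sameThis.push(this === self); // false: `this` was rebound
    });
    return sameThis;
}

// Capturing the document in `that` before entering the callback fixes it.
function fixedMap() {
    var that = this;
    var pairs = [];
    this.tokens.forEach(function (token) {
        pairs.push([token, that._id]);
    });
    return pairs;
}

console.log(brokenMap.call(doc)); // [ false, false ]
console.log(fixedMap.call(doc));  // [ [ '50w38c', '4ea42a2c' ], [ 'jcdr', '4ea42a2c' ] ]
```

Note that Array.prototype.forEach also accepts a second thisArg argument, so passing `this` as the second argument to forEach would be another way to keep the document in scope.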
Hope this helps someone.
I have a map function created, which is this:
var m = function() {
    var hashtags = {};
    for (var i in this.entities.hashtags) {
        hashtags[this.entities.hashtags[i].text] = 1;
    }
    var valor = { numtweets: 1,
                  dic_hastag: hashtags };
    print(" value: " + tojson(valor));
    emit(this.place.country_code, valor);
};
I start from a collection called tweets, and the output of my map function should have a variable numtweets: 1 and a variable hashtags with the entire list of the tweet's hashtags, each with a 1.
example:
numtweets: 1, hashtags: { "hash1": 1, "hash2": 1, "hash3": 1 }
1. Can I have the result saved in a collection, to verify that it works, instead of using print?
2. If I have to do mapReduce, what should the reduce function be so that it does nothing, and the output on execution is just the map function's output?
db.runCommand({
    mapReduce: "tweets",
    map: m,
    reduce: r,
    query: { "place.country_code": "AD" },
    out: { replace: "resultat5fi" }
});
Any suggestions, help, anything will be welcome.
I suggest you see this: an easy way to debug map functions with MongoDB.
First, create your map function as you did.
Then define an emit function that will print (or insert into a collection).
Example of an emit function that inserts into a toto collection:
var emit = function(key, value) {
    db.toto.insert({ key: key, value: value });
}
Invoke a find and apply your map function to each record. Example:
var myCursor = db.tweets.find({});
while (myCursor.hasNext()) {
    var doc = myCursor.next();
    m.apply(doc);
    print();
}
Then look at your toto collection this way:
db.toto.find()
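The same trick works entirely outside the shell; here is a sketch in plain Node that stubs emit() to collect into an array instead of a collection (the sample tweet document is made up):

```javascript
var emitted = [];
var emit = function (key, value) {
    emitted.push({ key: key, value: value }); // stand-in for mapReduce's emit
};

// The map function under test, same shape as above.
var m = function () {
    var hashtags = {};
    for (var i in this.entities.hashtags) {
        hashtags[this.entities.hashtags[i].text] = 1;
    }
    emit(this.place.country_code, { numtweets: 1, dic_hastag: hashtags });
};

var tweet = {
    place: { country_code: "AD" },
    entities: { hashtags: [{ text: "hash1" }, { text: "hash2" }] }
};
m.apply(tweet); // apply binds `this` to the document, as mapReduce would
console.log(emitted[0].key);   // AD
console.log(emitted[0].value); // { numtweets: 1, dic_hastag: { hash1: 1, hash2: 1 } }
```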
I'm having a hard time finding a way to make a collection index work the way I need. The collection has an array that will contain two elements, and no other document's array may contain those same two elements (in any order):
db.collection.insert({ users: [1,2] }) // should be valid
db.collection.insert({ users: [2,3] }) // should be valid
db.collection.insert({ users: [1,3] }) // should be valid
db.collection.insert({ users: [3,2] }) // should be invalid, since there's another array with the same two values
But, if I use db.collection.createIndex({users:1}, {unique: true}), it won't allow me to have two arrays with a common element:
db.collection.insert({ users: [1,2] }) // valid
db.collection.insert({ users: [2,3] }) // invalid, since 2 is already in another document
One of the solutions I tried was to nest the array one level deeper. With the very same index but slightly different documents it behaves almost the way I need, but it still allows two arrays to hold the same values in reverse order:
db.chat.insert({ users : { people : [1,2] }}) // valid
db.chat.insert({ users : { people : [2,3] }}) // valid
db.chat.insert({ users : { people : [2,1] }}) // valid, but it should be invalid, since there's another document with [1,2] array value.
db.chat.insert({ users : { people : [1,2] }}) // invalid
Is there a way to achieve this on a index level?
MongoDB doesn't index the array as a whole; a multikey index indexes each array element separately. But...
We want one atomic insert or update operation that guarantees uniqueness of the array's content. For that, we need to compute a feature that is the same for all permutations of the array's items, and create a unique index on it.
One way is to sort the array items (which solves the permutation problem) and concatenate them (which produces a single feature). An example, in JavaScript:
function index(arr) {
    return arr.sort().join();
}
users1 = [1, 2], usersIndex1 = index(users1); // "1,2"
users2 = [2, 1], usersIndex2 = index(users2); // "1,2"
// Create the index
db.collection.ensureIndex({usersIndex: 1}, {unique: true});
//
db.collection.insert({users: users1, usersIndex: usersIndex1}); // Ok
db.collection.insert({users: users2, usersIndex: usersIndex2}); // Error
If the arrays are long, you can apply a hash function to the concatenated string, minimizing the size of the index. Though, this comes with the price of possible collisions.
You have to write a custom validation in the pre save hook:
CoffeeScript Version
Pre Save Hook
chatSchema.pre('save', (next) ->
    data = @
    @constructor.findOne {}, (err, doc) ->
        return next err if err?
        return next "Duplicate" if customValidation(data, doc) == false
        return next()
)
Custom Validation
customValidation = (oldDoc, newDoc) ->
    # whatever you need
    # e.g. return !lodash.isEqual(oldDoc, newDoc)
Js Version
var customValidation;
chatSchema.pre('save', function(next) {
    var data;
    data = this;
    return this.constructor.findOne({}, function(err, doc) {
        if (err != null) {
            return next(err);
        }
        if (customValidation(data, doc) === false) {
            return next("Duplicate");
        }
        return next();
    });
});
customValidation = function(oldDoc, newDoc) {
    return !lodash.isEqual(oldDoc, newDoc);
};
You should first look for an existing record, and insert only if none is found:
db.chat.findOne({ users: { $all: [3,2] } })
    .then(function(doc) {
        if (doc) {
            return res.json('already exists');
        } else {
            db.chat.insert({ users: [3,2] });
        }
    })
    .catch(next);
Setup:
I have a large collection whose documents contain the following fields:
Name - string
Begin - timestamp
End - timestamp
Problem:
I want to get the gaps between documents, using the map-reduce paradigm.
Approach:
I'm trying to build a new collection, mid, of document pairs; after that I can compute the differences from it using $unwind and Pair[1].Begin - Pair[0].End:
function map() {
    emit(0, this);
}
function reduce(key, values) {
    var i = 0;
    var pairs = [];
    while (i < values.length - 1) {
        pairs.push([values[i], values[i+1]]);
        i = i + 1;
    }
    return { "pairs": pairs };
}
db.collection.mapReduce(map, reduce, { sort: { Begin: 1 }, out: { replace: "mid" } })
This works with a limited number of documents because of the 16MB document cap. I'm not sure if I need to load the collection into memory and do it there instead. How else can I approach this problem?
The mapReduce function of MongoDB handles what you propose differently from the method you are using. The key factor here is "keeping" the "previous" document in order to compare it to the next.
The mechanism that supports this is the "scope" option, which provides a sort of "global" variable to the overall code. Seen that way, what you are asking for requires no "reduction" at all, since there is no "grouping", just emission of document "pair" data:
db.collection.mapReduce(
    function() {
        if ( last == null ) {
            last = this;
        } else {
            emit(
                {
                    "start_id": last._id,
                    "end_id": this._id
                },
                this.Begin - last.End
            );
            last = this;
        }
    },
    function() {}, // no reduction required
    {
        "out": { "inline": 1 },
        "scope": { "last": null }
    }
)
Output to a collection instead of inline if your result size requires it.
Either way, using a "global" to keep the last document keeps the code both simple and efficient.
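The "keep the previous document" pattern that the scope variable enables can be sketched in plain JS, with made-up documents and no MongoDB:

```javascript
function gaps(docs) {
    var last = null; // plays the role of the `scope` variable
    var out = [];
    docs.forEach(function (doc) {
        if (last !== null) {
            out.push({
                start_id: last._id,
                end_id: doc._id,
                gap: doc.Begin - last.End
            });
        }
        last = doc; // remember this document for the next iteration
    });
    return out;
}

var docs = [ // already sorted by Begin, as the sort option ensures
    { _id: 1, Begin: 0,  End: 5 },
    { _id: 2, Begin: 8,  End: 12 },
    { _id: 3, Begin: 12, End: 20 }
];
console.log(gaps(docs));
// [ { start_id: 1, end_id: 2, gap: 3 }, { start_id: 2, end_id: 3, gap: 0 } ]
```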
In Mongo how can I find all documents that have a given key and value, regardless of where that key appears in the document's key/value hierarchy?
For example the input key roID and value 5 would match both:
{
    roID: '5'
}
and
{
    other: {
        roID: '5'
    }
}
There is no built in way to do this. You might have to scan each matched document recursively to try and locate that attribute. Not recommended. You might want to think about restructuring your data or perhaps manipulating it into a more unified format so that it will be easier (and faster) to query.
If your desired key appears in a fixed number of different locations, you could use the $or operator to scan all the possibilities.
Taking your sample documents as an example, your query would look something like this:
db.data.find({ "$or": [
    { "roID": 5 },
    { "other.roID": 5 },
    { "foo.bar.roID": 5 },
    // ... any other possible locations of roID
] })
If the number of documents in the collection is not too large, then it can be done like this:
db.system.js.save({ _id: "keyValueExisted", value: function (key, value) {
    function findme(obj) {
        for (var x in obj) {
            var v = obj[x];
            if (x == key && v == value) {
                return true;
            } else if (v instanceof Object) {
                if (findme(v)) return true;
            }
        }
        return false;
    }
    return findme(this);
}});
var param = ['roID', '5'];
db.c.find({$where: "keyValueExisted.apply(this, " + tojsononeline(param) + ");"});
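The recursive walk inside the stored function can be tried out on plain objects; a standalone sketch, no MongoDB required:

```javascript
// Depth-first search for a key/value pair anywhere in an object tree.
function keyValueExists(obj, key, value) {
    for (var x in obj) {
        var v = obj[x];
        if (x == key && v == value) return true;  // match at this level
        if (v instanceof Object && keyValueExists(v, key, value)) return true; // recurse into sub-documents
    }
    return false;
}

console.log(keyValueExists({ roID: "5" }, "roID", "5"));            // true
console.log(keyValueExists({ other: { roID: "5" } }, "roID", "5")); // true
console.log(keyValueExists({ roID: "6" }, "roID", "5"));            // false
```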
I have an array of mongoose queries like so
var q = [{"_id":'5324b341a3a9d30000ee310c'},{$addToSet:{"Campaigns":'532365acfc07f60000200ae9'}}]
and I would like to apply them to a mongoose method like so
var q = [{"_id":'5324b341a3a9d30000ee310c'},{$addToSet:{"Campaigns":'532365acfc07f60000200ae9'}}]
Account.update.apply(this, q);
How can I do this? How can I convert an array of mongoose query objects into mongoose method parameters?
I tried the following but it doesn't work.
var q = [{
    "_id": '5324b341a3a9d30000ee310c'
}, {
    $addToSet: {
        "Campaigns": '532365acfc07f60000200ae9'
    }
}]
Account.update(q).exec(function (e, r) {
    console.log(r)
    console.log('ERROR')
    console.log(e)
    console.log('ERROR')
    cb(e, r);
});
All you need to do is pass in the correct object context via apply to the update method:
Account.update.apply(Account, q)
    .exec(function(e, r) {
        // your code here ...
    });
The this needs to be the Model instance, not the global scope context (or whatever else this may have been at the time it was called).
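Why the first argument to apply matters can be shown with a stand-in model object (a plain JS sketch, not real mongoose):

```javascript
var Account = {
    modelName: "Account",
    update: function (conditions, update) {
        // A real mongoose Model.update relies on `this` being the model;
        // this stand-in just reports what `this` was at call time.
        return this === Account ? "called on Account" : "lost `this`";
    }
};

var q = [
    { "_id": "5324b341a3a9d30000ee310c" },
    { $addToSet: { "Campaigns": "532365acfc07f60000200ae9" } }
];

console.log(Account.update.apply(Account, q)); // called on Account
console.log(Account.update.apply(null, q));    // lost `this`
```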
Basically it comes down to how the arguments are passed in to the function. So if you want to build your arguments dynamically, then you need to do something like this:
var q = [
{ "_id": account._id },
{ "$addToSet": { "Campaigns": campaign._id } },
];
Account.update( q[0], q[1], (q[2]) ? q[2] : {}, function(err, num) {
    // ...
});
Essentially you are then always passing in the query, update and options documents as arguments. The ternary condition puts an empty {} document in as the options parameter where it does not exist in your dynamic entry.