Let me explain with an example:
I have a collection with 1 million items with Item_ID: 123, each worth a different value ("worth").
A user x with a given amount of money can "buy" the items. I basically want to know, in the most elegant way, how many items the user can buy.
So far I have:
db.items.find({ Item_ID: 123 }, { Item_Age: 1, Item_Worth: 1 }).sort({ Item_Age: 1 })
-> gives me all items with Item_ID: 123, sorted by age.
I could now
iterate through all items until the sum of Item_Worth equals User_Money, but somehow I think this is not really efficient if the returned list matches 1 million items and the user might only have enough money for 1,000 of them,
or
do the loop and query 1,000 times,
or
limit the query to 100 items; but this is very variable and could still result in a lot of loops.
So is there a query method which returns the running sum of a value across all matching documents?
Any other efficient suggestions would be helpful.
Thanks
Better than iterating would be to do this with mapReduce to get the running total for the items, and then filter the result.
So define a mapper as follows:
var mapper = function () {
    totalWorth = totalWorth + this.Item_Worth;
    var canBuy = userMoney >= totalWorth;
    if (canBuy) {
        emit(
            {
                Item_ID: this.Item_ID,
                Item_Age: this.Item_Age
            },
            {
                worth: this.Item_Worth,
                totalWorth: totalWorth,
                canBuy: canBuy
            }
        );
    }
};
This accumulates a totalWorth variable with the current "worth" of each item. A check is then made to see whether the running totalWorth still fits within the userMoney that was input; if it does not, the document is not emitted, which gives you automatic filtering.
All the emitted keys are unique, so just run the mapReduce as below:
db.items.mapReduce(
    mapper,
    function () {},    // the reduce argument is required, though it is never called
    {
        query: { Item_ID: 123 },
        sort: { Item_ID: 1, Item_Age: 1 },
        out: { inline: 1 },
        scope: {
            totalWorth: 0,
            userMoney: 30
        }
    }
)
So looking at the other parts of that:
query: a standard query object you use to get your selection.
sort: not strictly required if your documents are already in ascending Item_Age order, but if you wanted the oldest Item_Age first you can reverse the sort.
out: gives you an inline object as the result, which you can use to get the matching items.
scope: defines the global variables that can be accessed by the functions. So we provide an initial value for totalWorth and pass in userMoney as the amount the user has available to spend.
At the end of the day the result contains the filtered list of the items that fall within the amount of money the user can afford to spend.
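For example, reading the inline output in the shell (a minimal sketch; res is the return value of the mapReduce call above, with the field names from the mapper):

var res = db.items.mapReduce(mapper, function () {}, {
    query: { Item_ID: 123 },
    sort: { Item_ID: 1, Item_Age: 1 },
    out: { inline: 1 },
    scope: { totalWorth: 0, userMoney: 30 }
});

// Each entry in res.results is { _id: { Item_ID, Item_Age }, value: { worth, totalWorth, canBuy } },
// so the number of items the user can afford is simply:
print("affordable items: " + res.results.length);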
I have a table of activities of two types (A and B). These activities are generated by users. What I want to do is grab the 20 most recent activities, but for the B activities, only include the first activity created by a user for that day. So if a user creates 4 A activities and 4 B activities in a single day, it would show all 4 A activities, but only the first B activity created. If they created the same amount of activities again the next day, the query would show all 8 A activities, but only 2 B activities.
My current approach is to get the list of B activities with a group by clause, grouping on the date and the user. I have that query working:
db.runCommand({
    group: {
        ns: 'activities',
        $keyf: function (doc) {
            var created = doc._id.getTimestamp();
            created.setHours(0, 0, 0, 0);
            return { created: created, user: doc.user.id };
        },
        $reduce: function (curr, result) {
            // We only need the first activity of the day, but we can't sort (can we?)
            var earliestSoFar = result.date || new Date();
            if (earliestSoFar > curr._id.getTimestamp()) {
                result.id = curr._id;
                result.date = curr._id.getTimestamp();
            }
        },
        cond: {
            "type": "B"
        },
        initial: {}
    }
})
I was thinking I could then get the ids from that result collection and run a final query of the form:
.find({ $or: [
    { type: 'A' },
    { _id: { $in: getListOfIdsFromGroupQuery() } }
]}).limit(20);
I believe this will give me the results I want, but what I'm afraid of is:
The group query will return me a list of every first B activity/user/day. I'm only showing 20 activities at a time, so I only care about the 20 most recent B activities (at most, since I'm showing 20 A and B combined). This seems really wasteful.
On the first query, I can trim the array that I pass to $in down to 20 ids. However, because the user can view the next page, I have to pass in 40 ids for the 2nd page, 60 for the 3rd page, etc. By the 10th page, my $in query is looking for 200 records. Not sure if this is a problem, but it worries me.
Is there a better way to approach this? Hopefully it's clear, I know it's kind of a confusing situation.
In MongoDB, records are stored like this:
{_id:100,type:"section",ancestry:nil,.....}
{_id:300,type:"section",ancestry:100,.....}
{_id:400,type:"problem",ancestry:100,.....}
{_id:500,type:"section",ancestry:100,.....}
{_id:600,type:"problem",ancestry:500,.....}
{_id:700,type:"section",ancestry:500,.....}
{_id:800,type:"problem",ancestry:100,.....}
I want to fetch records in an order like this:
first, the record whose ancestry is nil,
then all records whose parent is that first record and whose type is 'problem',
then all records whose parent is that first record and whose type is 'section', and so on recursively down the tree.
Expected output is
{_id:100,type:"section",ancestry:nil,.....}
{_id:400,type:"problem",ancestry:100,.....}
{_id:800,type:"problem",ancestry:100,.....}
{_id:300,type:"section",ancestry:100,.....}
{_id:500,type:"section",ancestry:100,.....}
{_id:600,type:"problem",ancestry:500,.....}
{_id:700,type:"section",ancestry:500,.....}
Try this MongoDB shell command:
db.collection.find().sort({ancestry:1, type: 1})
In languages where ordered dictionaries aren't available, you may need to pass a list of 2-tuples as the sort argument. Something like this (Python):
collection.find({}).sort([('ancestry', pymongo.ASCENDING), ('type', pymongo.ASCENDING)])
@vinipsmaker's answer is good. However, it doesn't work properly if the _ids are random numbers or if there are documents that aren't part of the tree structure. In that case, the following code works correctly:
function getSortedItems() {
    var sorted = [];
    var ids = [ null ];
    while (ids.length > 0) {
        var cursor = db.Items.find({ ancestry: ids.shift() }).sort({ type: 1 });
        while (cursor.hasNext()) {
            var item = cursor.next();
            ids.push(item._id);
            sorted.push(item);
        }
    }
    return sorted;
}
Note that this code is not fast, because db.Items.find() is executed n times, where n is the number of documents in the tree structure.
If the tree structure is huge, or you will do the sort many times, you can optimize this by using the $in operator in the query and sorting the result on the client side, as sketched below.
In addition, creating an index on the ancestry field will make the code quicker in either case.
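For illustration, here is a hedged sketch of that $in optimization against the same Items collection: one query per tree level instead of one per document, with the per-parent ordering done on the client:

function getSortedItemsWithIn() {
    var sorted = [];
    var parentIds = [ null ];
    while (parentIds.length > 0) {
        // Fetch the whole next level in a single query.
        var level = db.Items.find({ ancestry: { $in: parentIds } }).toArray();
        var nextIds = [];
        parentIds.forEach(function (pid) {
            // Children of this parent ('==' also matches a missing ancestry against
            // the null root), ordered problems before sections ('p' < 's'),
            // mirroring the per-parent sort in the code above.
            var children = level.filter(function (item) { return item.ancestry == pid; });
            children.sort(function (a, b) { return a.type < b.type ? -1 : (a.type > b.type ? 1 : 0); });
            children.forEach(function (child) {
                sorted.push(child);
                nextIds.push(child._id);
            });
        });
        parentIds = nextIds;
    }
    return sorted;
}

This executes one find() per level of the tree rather than one per node, at the cost of filtering and sorting each level in client code.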
I have a mongo collection with the fields
visit_id, user_id, date, action 1, action 2
Example:
visit_id  user_id  date        action 1    action 2
1         u100     2012-01-01  phone-call  -
2         u100     2012-01-02  -           computer-check
Can I get, in MongoDB, the users that have made both a phone-call and a computer-check, no matter the time? (Basically it's an AND across different rows.)
I guess it is not possible without map/reduce work.
I see it being done in the following way:
1. First you need to run a map/reduce that produces results like this:
{
    _id: "u100",
    value: {
        actions: [
            "phone-call",
            "computer-check",
            "etc..."
        ]
    }
}
2. Then you can query the above m/r result via $elemMatch, as sketched below.
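For illustration, here is a hedged sketch of that flow. The collection and output names (visits, user_actions) are made up for the example, and the final query uses $all rather than $elemMatch, since $all is the operator that matches "the array contains both values":

var map = function () {
    // Collect whichever actions are set on this row under the user's id,
    // skipping the "-" placeholder used for unset actions.
    var actions = [];
    if (this["action 1"] && this["action 1"] !== "-") { actions.push(this["action 1"]); }
    if (this["action 2"] && this["action 2"] !== "-") { actions.push(this["action 2"]); }
    emit(this.user_id, { actions: actions });
};

var reduce = function (key, values) {
    // Merge the per-row action arrays into one de-duplicated array.
    var merged = { actions: [] };
    values.forEach(function (v) {
        v.actions.forEach(function (a) {
            if (merged.actions.indexOf(a) === -1) { merged.actions.push(a); }
        });
    });
    return merged;
};

db.visits.mapReduce(map, reduce, { out: "user_actions" });

// Users that have made both a phone-call and a computer-check, on any rows:
db.user_actions.find({ "value.actions": { $all: ["phone-call", "computer-check"] } });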
You won't be able to do this with a single query. If this is something you're doing frequently in your application, I wouldn't recommend map/reduce; I'd recommend doing a query in MongoDB using the $or operator and then processing the results on the client to get a unique set of user_ids.
For example:
db.users.find({ $or: [ { "action 1": "phone-call" }, { "action 2": "computer-check" } ] })
In the future, you should save your data in a different format, like the one suggested above by Andrew.
There is the MongoDB group method that can be used for your query, comparable to an SQL GROUP BY operator.
I haven't tested this, but your query could look something like this:
var results = db.coll.group({
    key: { user_id: true },
    cond: { $or: [ { action1: "phone-call" }, { action2: "computer-check" } ] },
    initial: { actionFlags: 0 },
    reduce: function (obj, prev) {
        if (obj.action1 == "phone-call") { prev.actionFlags |= 1; }
        if (obj.action2 == "computer-check") { prev.actionFlags |= 2; }
    },
    finalize: function (doc) {
        if (doc.actionFlags == 3) { return doc; }
        return null;
    }
});
Again, I haven't tested this; it's based on my reading of the documentation. You're grouping by the user_id (the key declaration). The rows you want to let through have either action1 == "phone-call" or action2 == "computer-check" (the cond declaration). The initial state when you start checking a particular user_id is 0 (initial). For each row you check whether action1 == "phone-call" and set its flag, and whether action2 == "computer-check" and set its flag (the reduce function). Once you've marked the row types, you check that both flags are set. If so, keep the object; otherwise eliminate it (the finalize function).
That last part is the only part I'm unsure of, since the documentation doesn't explicitly state that you can knock out records in the finalize function. It would probably take me more time to set up test data to check than it would take you to see if the example above works.
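If it turns out finalize cannot drop records outright, a cheap fallback is to filter the nulls out of the returned array on the client (a hedged sketch, following on from the code above):

// group() returns an array of result documents; keep only the ones
// the finalize function did not null out.
var matched = results.filter(function (doc) { return doc !== null; });
matched.forEach(function (doc) { print(doc.user_id); });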
I've recently discovered that MongoDB has no SQL equivalent to "ORDER BY RAND()" in its command syntax (https://jira.mongodb.org/browse/SERVER-533).
I've seen the recommendation at http://cookbook.mongodb.org/patterns/random-attribute/ and frankly, adding a random attribute to a document feels like a hack. It also won't work for me because it places an implicit limit on any given query I want to randomize.
The other widely given suggestion is to choose a random index to offset from. Because of the order that my documents were inserted in, that will result in one of the string fields being alphabetized, which won't feel very random to a user of my site.
I have a couple ideas on how I could solve this via code, but I feel like I'm missing a more obvious and native solution. Does anyone have a thought or idea on how to solve this more elegantly?
I have to agree: the easiest thing to do is to install a random value into your documents. There need not be a tremendously large range of values, either; the number you choose depends on the expected result size for your queries (1,000 to 1,000,000 distinct integers ought to be enough for most cases).
When you run your query, don't worry about the random field; instead, index it and use it to sort. Since there is no correspondence between the random number and the document, you should get fairly random results. Note that collisions will likely result in documents being returned in natural order.
While this is certainly a hack, you have a very easy escape route: given MongoDB's schema-free nature, you can simply stop including the random field once there is support for random sort in the server. If size is an issue, you could run a batch job to remove the field from existing documents. There shouldn't be a significant change in your client code if you design it carefully.
An alternative option would be to think long and hard about the number of results that will be randomized and returned for a given query. It may not be overly expensive to simply do the shuffling in client code (e.g., if you only consider the most recent 10,000 posts).
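For reference, a minimal shell sketch of the random-attribute pattern described above (the items collection name and the 1,000,000 range are just this example's assumptions):

// One-time (or at insert time): give every document a random integer
// from a bounded range, then index the field.
db.items.find().forEach(function (doc) {
    db.items.update({ _id: doc._id },
                    { $set: { rand: Math.floor(Math.random() * 1000000) } });
});
db.items.ensureIndex({ rand: 1 });

// At query time, ignore the field's meaning and just sort by it.
// Colliding values come back in natural order, as noted above.
db.items.find({ /* your usual criteria */ }).sort({ rand: 1 });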
What you want cannot be done without picking one of the two solutions you mention. Picking a random offset is a horrible idea if your collection becomes larger than a few thousand documents. The reason is that the skip(n) operation takes O(n) time; in other words, the higher your random offset, the longer the query will take.
Adding a randomized field to the document is, in my opinion, the least hacky solution there is given the current feature set of MongoDB. It provides stable query times and gives you some say over how the collection is randomized (and it allows you to generate a new random value after each query, through a findAndModify for example). I also do not understand how this would impose an implicit limit on your queries that make use of randomization.
You can give this a try. It's fast, works with multiple documents, and doesn't require the rand field to be populated at the beginning; it will eventually populate itself:
add an index on the rand field of your collection
use find and refresh, something like this:
// Install packages:
//   npm install mongodb async
// Add the index in the mongo shell:
//   db.mycollection.ensureIndex({ rand: 1 })
var mongodb = require('mongodb')
var async = require('async')

// Find n random documents by using the "rand" field.
function findAndRefreshRand (collection, n, fields, done) {
  var result = []
  var rand = Math.random()

  // Build a task that appends documents matching the criteria to the result.
  // The remaining limit is computed when the task runs, so earlier tasks
  // reduce how much later tasks fetch; when nothing remains, skip the call.
  var appender = function (criteria, options) {
    return function (done) {
      options.limit = n - result.length
      if (options.limit > 0) {
        collection.find(criteria, fields, options).toArray(
          function (err, docs) {
            if (!err && Array.isArray(docs)) {
              Array.prototype.push.apply(result, docs)
            }
            done(err)
          }
        )
      } else {
        async.nextTick(done)
      }
    }
  }

  async.series([
    // Fetch docs with uninitialized .rand.
    // NOTE: You can comment out this step if all docs have initialized .rand = Math.random()
    appender({ rand: { $exists: false } }, {}),
    // Fetch on one side of the random number.
    appender({ rand: { $gte: rand } }, { sort: { rand: 1 } }),
    // Continue the fetch on the other side.
    appender({ rand: { $lt: rand } }, { sort: { rand: -1 } }),
    // Refresh the rand value of the fetched docs, if any.
    function (done) {
      if (result.length > 0) {
        var batch = collection.initializeUnorderedBulkOp({ w: 0 })
        for (var i = 0; i < result.length; ++i) {
          // $set only the rand field rather than replacing the whole document.
          batch.find({ _id: result[i]._id }).updateOne({ $set: { rand: Math.random() } })
        }
        batch.execute(done)
      } else {
        async.nextTick(done)
      }
    }
  ], function (err) {
    done(err, result)
  })
}

// Example usage
mongodb.MongoClient.connect('mongodb://localhost:27017/core-development', function (err, db) {
  if (!err) {
    findAndRefreshRand(db.collection('profiles'), 1024, { _id: true, rand: true }, function (err, result) {
      if (!err) {
        console.log(result)
      } else {
        console.error(err)
      }
      db.close()
    })
  } else {
    console.error(err)
  }
})
The other widely given suggestion is to choose a random index to offset from. Because of the order that my documents were inserted in, that will result in one of the string fields being alphabetized, which won't feel very random to a user of my site.
Why? If you have 7,000 documents and you choose three random offsets from 0 to 6,999, the chosen documents will be random, even if the collection itself is sorted alphabetically.
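A sketch of that offset approach (the posts collection name is illustrative; keep in mind the O(n) cost of skip() pointed out in the previous answer):

// Pick one document at a random offset; repeat for as many samples as needed.
var count = db.posts.count();
var offset = Math.floor(Math.random() * count);
var doc = db.posts.find().skip(offset).limit(1).next();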
One could insert an id field (the _id field won't work because it's not an actual number) and use modulus math to get a random skip. If you have 10,000 records and you want 10 results, you could pick a modulus between 1 and 1,000 at random, such as 253, and then request documents where mod(id, 253) == 0; this is reasonably fast if id is indexed. Then randomly sort those 10 results client side, as sketched below. Sure, they are evenly spaced out instead of truly random, but it is close to what is desired.
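A hedged sketch of that modulus idea, using MongoDB's $mod operator (the records collection and numeric id field are assumptions of the example):

// Pick a random modulus and fetch the evenly spaced matches.
var modulus = Math.floor(Math.random() * 1000) + 1;   // e.g. 253
var picks = db.records.find({ id: { $mod: [modulus, 0] } }).limit(10).toArray();

// Fisher-Yates shuffle on the client so the picks don't come back in id order.
for (var i = picks.length - 1; i > 0; --i) {
    var j = Math.floor(Math.random() * (i + 1));
    var tmp = picks[i]; picks[i] = picks[j]; picks[j] = tmp;
}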
Both of the options seem like imperfect hacks to me: the random field will always keep the same value, and skip will return the same records for the same number.
Why not sort on a random field and then skip randomly? I admit it is also a hack, but in my experience it gives a better sense of randomness.
I am using mongoose with Node.js. I am using mapReduce to fetch data grouped by a field, so all it gives me as a collection is the key (the grouping field) from every row of the database.
I need to fetch all the fields from the database, grouped by one field and sorted on the basis of another. For example: I have a database holding details of places, the fare for travelling to those places, and a few other fields. Now I need to fetch the data grouped by place and sorted by fare. MapReduce helps me to get that, but I cannot get the other fields.
Is there a way to get all the fields using map reduce, rather than just the two fields as in the above example?
I must admit I'm not sure I understand completely what you're asking.
But maybe one of the following thoughts helps you:
either) When you iterate over your mapReduce results, you could fetch the complete document from MongoDB for each result. That would give you access to all fields in each document, at the cost of some network traffic.
or) The value that you send into emit(key, value) can be an object. So you could construct a value object that contains all your desired fields. Just be sure to use exactly the same object structure for your reduce method's return value.
Let me illustrate with an (untested) example.
map = function () {
    emit(this.place,
        {
            'field1': this.field1,
            'field2': this.field2,
            'count' : 1
        });
};

reduce = function (key, values) {
    // Keep the extra fields from the first value and sum up the counts.
    var result = {
        'field1': values[0].field1,
        'field2': values[0].field2,
        'count' : 0
    };
    for (var i = 0; i < values.length; i++) {
        result.count += values[i].count;
    }
    return result;
};