MongoDB: summing a field of documents during a query

I want to execute a MongoDB query that fetches documents until the sum of a field of those documents exceeds a value. For example, if I have the following documents
{id: 1, qty: 40}
{id: 2, qty: 50}
{id: 3, qty: 30}
and I have a set quantity of 80, I would want to retrieve ids 1 and 2, because 40 + 50 = 90, which is over 80. If I wanted a quantity of 90, I would also retrieve ids 1 and 2. Does anyone have any insight into how to query in this manner? (I'm using Go, by the way, but any general Mongo query advice would help tremendously.)

Since you're keeping a running sum of a certain field, the easiest way of doing this is to run a Find operation, get a cursor, and iterate it while keeping the sum yourself until the required total is reached. Then close the cursor and return:
cursor, err := coll.Find(context.Background(), query)
if err != nil {
    return err
}
defer cursor.Close(context.Background())

sum := 0
for cursor.Next(context.Background()) {
    var data struct {
        Qty int `bson:"qty"`
    }
    // Decode the current document and add its qty to the running sum.
    if err := cursor.Decode(&data); err != nil {
        return err
    }
    sum += data.Qty
    if sum >= 80 {
        break
    }
}

Related

How to determine the optimal query and limit size

I am running a MongoDB aggregate query to group the data of a collection, sum the values of a field, and insert the result into another collection.
ex: collection1: [
  { name: "foo", group_id: 1, marks: 10 },
  { name: "bar", group_id: 1, marks: 20 },
  { name: "Hello World", group_id: 2, marks: 40 }
]
So the group-by query should insert the following data into a new collection, ex: collection2:
collection2: [
  { group_id: 1, marks: 30 },
  { group_id: 2, marks: 40 }
]
I need to do these two operations:
1. Group the data and get the aggregate
2. Create a new collection with the data
Now comes the interesting part: the data being grouped is 5 billion rows, so the query to aggregate the marks will be very slow to execute.
Writing a Node script that fetches the data group by group and then inserts it into another collection will therefore not be very efficient. The other approach I was considering is to limit the data to batches of x (e.g. 1000), group each batch of 1000, insert that into collection2, then update collection2 for the next 1000, and so on.
So here are my questions: is aggregating the data in limited batches and iterating over them like this faster?
ex:
step 1: group and get the sum of the marks of 1000 rows
step 2: insert/update collection2 with this data
step 3: goto step1
Is the above method more useful than just grouping all 5 billion records in one aggregation and inserting the result into collection2? Assuming there is a Node API doing the above task, how do I determine/calculate the limit size for the fastest operation? And how do I use whenMatched to update/insert the marks into collection2?
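For the whenMatched part: the $merge stage (MongoDB 4.2+) can write the grouped result into collection2 and either insert or update in one pass. A sketch, assuming collection2 has a unique index on group_id (required when merging on a non-_id field); $$new refers to the incoming document:
db.collection1.aggregate([
  // Sum marks per group.
  { $group: { _id: "$group_id", marks: { $sum: "$marks" } } },
  { $project: { _id: 0, group_id: "$_id", marks: 1 } },
  // Upsert into collection2: add the new marks to the existing total
  // when the group already exists, insert a fresh document otherwise.
  { $merge: {
      into: "collection2",
      on: "group_id",
      whenMatched: [ { $set: { marks: { $add: ["$marks", "$$new.marks"] } } } ],
      whenNotMatched: "insert"
  } }
])
Run once per batch, this also covers the "update collection2 for the next 1000" step described in the question.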

Choosing the type of column value for indexing in mongo

document: { score: 123 }
I have a field in the document called score (an integer). I want to use a range query: db.collection.find({score: {$gte: 100, $lt: 200}}). I have a definite number of these ranges (approx. 20).
Should I introduce a new field in the document to identify the range, and then query on that identifier? E.g.:
document: {
  score: 123,
  scoreType: "type1"
}
So which query is better:
1. db.collection.find({score: {$gte: 100, $lt: 200}})
2. db.collection.find({scoreType: "type1"})
In either case I will have to create an index, on score or on scoreType.
Which index would tend to perform better?
It depends entirely on your situation: if you are sure your set of ranges will always remain the same, then use scoreType.
Keep in mind that scoreType is a fixed, precomputed value, so it will not help when you query over different ranges. It might work for 100 to 200 if the score type was created with that range in mind, but it will not work for other ranges, e.g. 100 to 500 (do you plan on adding a new scoreType2?). With flexibility in mind, this is a bad idea.
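To make the trade-off concrete, a sketch of the two options (field and collection names taken from the question):
// Option 1: index the numeric field; one index serves any of the
// ~20 ranges, including new ones added later.
db.collection.createIndex({ score: 1 })
db.collection.find({ score: { $gte: 100, $lt: 200 } })

// Option 2: precompute a bucket at write time; the equality match is
// cheap, but only for ranges that were baked into scoreType up front.
db.collection.createIndex({ scoreType: 1 })
db.collection.find({ scoreType: "type1" })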

MongoDB range query with a sort - how to speed up?

I have a query which routinely takes around 30 seconds to run on a collection with 1 million documents. This query is to form part of a search engine, where the requirement is that every search completes in under 5 seconds. Using a simplified example here (the actual docs have embedded documents and other attributes), let's say I have the following:
1 million docs in a Users collection, where each looks as follows:
{
  name: "Dan",
  age: 30,
  followers: 400
},
{
  name: "Sally",
  age: 42,
  followers: 250
}
... etc
Now, let's say I want to return the IDs of 10 users with a follower count between 200 and 300, sorted by age in descending order. This can be achieved with the following:
db.users.find(
  { followers: { $gt: 200, $lt: 300 } },
  { _id: 1 }
).sort({ age: -1 }).limit(10)
I have the following compound Index created, which winningPlan tells me is being used:
db.users.createIndex({ 'followed_by': -1, 'age': -1 })
But this query still takes ~30 seconds, as it has to examine thousands of docs, nearly equal to the number of docs that match the find query in this case. I have experimented with different indexes (with different field positions and sort orders) with no luck.
So my question is: what else can I do to either reduce the number of documents examined by the query, or speed up the process of examining them?
The query takes this long both in production and in my local dev environment, which somewhat rules out network and hardware factors. currentOp shows that the query is not waiting for locks while running, and that no other queries are running at the same time.
It looks to me like you have the wrong index for this query: { 'followed_by': -1, 'age': -1 } indexes a field (followed_by) that the query does not filter on. You should have an index on { 'followers': 1 } (but take the cardinality of that field into consideration). Even with that index you will need an in-memory sort, but if the field has high cardinality the query should still be much faster, because you will no longer scan the whole collection for the filtering step as you do with the followed_by index prefix.
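A sketch of that suggestion, reusing the query from the question:
// Index the field the query actually filters on.
db.users.createIndex({ followers: 1 })

// The range scan now uses the index; the sort on age is still done in
// memory (subject to MongoDB's memory limit for blocking sorts), but
// only over the documents that matched the range.
db.users.find(
  { followers: { $gt: 200, $lt: 300 } },
  { _id: 1 }
).sort({ age: -1 }).limit(10)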

MongoDB sort all and get specific range

I'm using MongoDB. I have a collection with:
String user_name,
Integer score
I would like to make a query that takes a user_name. The query should sort by score and return the range of 50 documents that the requested user_name falls into.
For example, if I have 110 documents with the user_names X1-X110 and the scores 1-110 respectively, and the input user_name was X72, I would like to get the range X51-X100.
EDIT:
An example of 3 documents:
{ "user_name": "X1", "score": 1}
{ "user_name": "X2", "score": 2}
{ "user_name": "X3", "score": 3}
Now, if I have the 110 documents described above and I search for X72, I want to get the following documents:
{ "user_name": "X50", "score": 50}
{ "user_name": "X51", "score": 51}
...
{ "user_name": "X100", "score": 100}
How can I do it?
Clarification: I don't have each document's rank stored. What I do have is document scores, which aren't necessarily consecutive (the example above is a little misleading). Here's a less misleading example:
{ "user_name": "X1", "score": 17}
{ "user_name": "X2", "score": 24}
{ "user_name": "X3", "score": 38}
When searching for "X72", I would like to get the slice of size 50 in which "X72" resides according to its rank. Again, the rank is not the element's score, but its index in a hypothetical array sorted by score.
Check out the MongoDB cursor operations sort, limit and skip. Used together, they can fetch elements n through m that match your query:
cursor = db.collection.find({...}).sort({score: 1}).skip(50).limit(50);
This should return documents 51 to 100 in order of score.
If I understood you correctly, you want to query the users whose scores are in the neighbourhood of another player's.
With three queries you can select the user, the 25 users above them, and the 25 users below.
First, you need to get the user itself and its score:
user = db.collection.findOne({user_name: "X72"});
Then you select the next 25 players with scores above them (ascending sort, so you get the closest scores first):
cursor = db.collection.find({score: {$gt: user.score}}).sort({score: 1}).limit(25);
//... iterate cursor
Then you select the next 25 players with scores below them (descending sort, for the same reason):
cursor = db.collection.find({score: {$lt: user.score}}).sort({score: -1}).limit(25);
//... iterate cursor
Unfortunately, there is no direct way to achieve what you want; you will need some processing on your client end to figure out the range.
First, fetch the score with a simple findOne/find:
db.sample.findOne({"user_name": "X72"})
Next, using the score value (72 in this case), calculate the range on the client:
lower = 72/50 => lower = 1.44
Take the integer part before the decimal and set it as lower:
lower = 1
upper = lower + 1 => upper = 2
Now multiply both lower and upper by 50 on the client, which gives:
lower = 50
upper = 100
Pass the lower and upper values to find to get the desired list:
db.sample.find({score:{$gt:50,$lte:100}}).sort({score:1})
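Put together as a runnable shell sketch (using the X72 example from the question):
var score = db.sample.findOne({ user_name: "X72" }).score;  // 72 in the example
var lower = Math.floor(score / 50) * 50;                    // 50
var upper = lower + 50;                                     // 100
db.sample.find({ score: { $gt: lower, $lte: upper } }).sort({ score: 1 })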
Partial solution with one query:
I tried to do this with one query, but unfortunately I could not complete it. I am providing the details below in the hope that someone may be able to expand on this and complete what I started. These are the steps I planned:
1. Project the documents to divide all scores by 50 and store the result in a new field _score. (This is as far as I got.)
2. Extract the value before the decimal point from _score. [Stuck here: I did not find any way to do this.]
3. Group the values based on _score (each group will give you one slot).
4. Find and return the group your score belongs to (using $match in the aggregation pipeline).
db.sample.aggregate([{ $project: { _id: 1, user_name: 1, score: 1, _score: { $divide: ["$score", 50] } } }])
I would be really interested to see how this is done!!!
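For what it's worth, the missing step exists on newer servers: $floor and $trunc were added in MongoDB 3.2. A sketch completing the planned pipeline under that assumption (it slots by score, as the partial answer does, and fetches the searched user's score first):
var userScore = db.sample.findOne({ user_name: "X72" }).score;
db.sample.aggregate([
  // Slot = integer part of score / 50.
  { $project: { user_name: 1, score: 1,
                _score: { $floor: { $divide: ["$score", 50] } } } },
  // Keep only the documents in the same slot as the searched user.
  { $match: { _score: Math.floor(userScore / 50) } },
  { $sort: { score: 1 } }
])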

MongoDB - Pagination based on non-unique fields

I am familiar with the best practice of range-based pagination on large MongoDB collections; however, I am struggling to figure out how to paginate a collection where the sort value is a non-unique field.
For example, I have a large collection of users, and there is a field for the number of times they have done something. This field is definitely non-unique and could have large groups of documents with the same value.
I would like to return results sorted by that numTimesDoneSomething field.
Here is a sample data set:
{_id: ObjectId("50c480d81ff137e805000003"), numTimesDoneSomething: 12}
{_id: ObjectId("50c480d81ff137e805000005"), numTimesDoneSomething: 9}
{_id: ObjectId("50c480d81ff137e805000006"), numTimesDoneSomething: 7}
{_id: ObjectId("50c480d81ff137e805000007"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000002"), numTimesDoneSomething: 15}
{_id: ObjectId("50c480d81ff137e805000008"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000009"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000004"), numTimesDoneSomething: 12}
{_id: ObjectId("50c480d81ff137e805000010"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000011"), numTimesDoneSomething: 1}
How would I return this data set sorted by 'numTimesDoneSomething' with 2 records per page?
@cubbuk shows a good example using offset (skip), but you can also mould that query for ranged pagination:
db.collection.find().sort({numTimesDoneSomething:-1, _id:1})
Since _id here is unique and you are using it as the secondary sort key, you can then range by _id, and the results, even between two records that both have numTimesDoneSomething of 12, will be consistent as to whether they land on one page or the next.
So doing something as simple as
var q = db.collection.find({_id: {$gt: last_id}}).sort({numTimesDoneSomething:-1, _id:1}).limit(2)
should work quite well for ranged pagination.
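One caveat worth sketching (not part of the original answer): with a non-unique primary sort key, a filter on _id alone can pull in documents from the wrong numTimesDoneSomething group. A fully robust range condition compares both keys; lastVal and lastId below are hypothetical names for the values taken from the last document of the previous page:
// Next page = smaller count, or same count with a larger _id tiebreaker.
var q = db.collection.find({
  $or: [
    { numTimesDoneSomething: { $lt: lastVal } },
    { numTimesDoneSomething: lastVal, _id: { $gt: lastId } }
  ]
}).sort({ numTimesDoneSomething: -1, _id: 1 }).limit(2)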
You can sort on multiple fields, in this case numTimesDoneSomething and the _id field. Since _id already increases with insertion time, you will be able to paginate through the collection without iterating over duplicate data, unless new data is inserted during the iteration.
db.collection.find().sort({numTimesDoneSomething:-1, _id:1}).skip(index).limit(2)