MongoDB sort all and get specific range

I'm using MongoDB. I have a collection whose documents contain:
String user_name,
Integer score
I would like to make a query that takes a user_name, sorts the collection by score, and returns the block of 50 documents in which the requested user_name falls.
For example, if I have 110 documents with the user_names X1-X110 and the scores 1-110 respectively, and the input user_name is X72, I would like to get the range X51-X100.
EDIT:
An example of 3 documents:
{ "user_name": "X1", "score": 1}
{ "user_name": "X2", "score": 2}
{ "user_name": "X3", "score": 3}
Now, if I have 110 documents as described above and I want to find X72, I want to get the following documents:
{ "user_name": "X50", "score": 50}
{ "user_name": "X51", "score": 51}
...
{ "user_name": "X100", "score": 100}
How can I do it?
Clarification: I don't have each document's rank stored. What I do have is document scores, which aren't necessarily consecutive (the example above is a little misleading). Here's a less misleading example:
{ "user_name": "X1", "score": 17}
{ "user_name": "X2", "score": 24}
{ "user_name": "X3", "score": 38}
When searching for "X72" I would like to get the slice of size 50 in which "X72" resides according to its rank. Again, the rank is not the element's score, but the element's index in a hypothetical array sorted by score.

Check out the MongoDB cursor methods sort, limit and skip. Used in conjunction, they can return elements n to m of a query. Note that MongoDB applies skip before limit regardless of the order in which they are chained, so to get documents 51 to 100 you skip 50 and limit 50:
cursor = db.collection.find({...}).sort({score: 1}).skip(50).limit(50);
This should return documents 51 to 100 in order of score.

If I understood you correctly, you want to query the users who are score-wise in the neighbourhood of another player.
With three queries you can select the user, the 25 users just above them and the 25 users just below.
First, you need to get the user itself and its score:
user = db.collection.findOne({user_name: "X72"});
Then you select the 25 players with the closest scores above it (sorting ascending, so the nearest scores come first):
cursor = db.collection.find({score: {$gt: user.score}}).sort({score: 1}).limit(25);
//... iterate cursor
Then you select the 25 players with the closest scores below it (sorting descending, for the same reason):
cursor = db.collection.find({score: {$lt: user.score}}).sort({score: -1}).limit(25);
//... iterate cursor
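If you need the whole neighbourhood as a single ordered list, a small shell sketch could look like this (it reverses the descending half before concatenating):
var user = db.collection.findOne({user_name: "X72"});
var above = db.collection.find({score: {$gt: user.score}}).sort({score: 1}).limit(25).toArray();
var below = db.collection.find({score: {$lt: user.score}}).sort({score: -1}).limit(25).toArray();
// "below" came back highest-first, so reverse it to get ascending score order overall
var neighbourhood = below.reverse().concat([user], above);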

Unfortunately, there is no direct way to achieve what you want. You will need some processing at your client end to figure out the range.
First, fetch the score by doing a simple findOne / find:
db.sample.findOne({"user_name": "X72"})
Next, using the score value (72 in this case), calculate the range on the client:
lower = 72/50 => lower = 1.44
Extract the number before the decimal and set it as lower:
lower = 1
upper = lower + 1 => upper = 2
Now multiply the lower and upper values by 50 on the client, which gives you the values below:
lower = 50
upper = 100
Pass the lower and upper values to find to get the desired list:
db.sample.find({score:{$gt:50,$lte:100}}).sort({score:1})
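In shell JavaScript, the whole client-side computation above can be sketched like this (Math.floor does the "number before the decimal" step):
var user = db.sample.findOne({user_name: "X72"});
var lower = Math.floor(user.score / 50) * 50;  // 72 -> 50
var upper = lower + 50;                        // -> 100
db.sample.find({score: {$gt: lower, $lte: upper}}).sort({score: 1});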
Partial solution with one query:
I tried to do this with one query, but unfortunately I could not complete it. I am providing details below in the hope that someone may be able to expand on this and complete what I started. These are the steps I planned:
Project the documents to divide all scores by 50 and store the result in a new field _score. (This is as far as I got.)
Extract the value before the decimal from _score. [Stuck here] (I did not find any way to do this; a possible completion is sketched after this answer.)
Group values based on _score (each group will give you one slot).
Find and return the group where your score belongs (by using $match in the aggregation pipeline).
db.sample.aggregate([{$project:{_id:1, user_name:1,score:1,_score:{$divide:["$score",50]}}}])
I would be really interested to see how this is done!!!
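On MongoDB 3.2 and later, the step I was stuck on can be done with the $floor aggregation operator ($trunc also works, and on 3.4+ $bucket is another option). A sketch of the completed pipeline, untested:
db.sample.aggregate([
    {$project: {user_name: 1, score: 1, _slot: {$floor: {$divide: ["$score", 50]}}}},
    {$group: {_id: "$_slot", users: {$push: {user_name: "$user_name", score: "$score"}}}},
    {$match: {"users.user_name": "X72"}}
])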

Related

Find the maximum delta of a field between date A and date B

I have been noodling around with an aggregation for hours now, trying to work out how to do this without writing code that walks the collection. There are millions of documents in this collection, so I'd prefer to write an aggregation rather than walk the entire collection and brute-force a result, but I am officially stumped. I am a newbie to MongoDB and aggregations in particular, so please be gentle. I do want to learn, so I'd also appreciate pointers to tutorials or similar to help me improve.
We have a collection whose documents include updated_at (when the document was created), faction_name and faction_id, system name and system_id, and influence. I am trying to create a sorted aggregation that will list the top faction influence delta changes between two updated_at dates (date A, such as today, and date B, such as yesterday) within the same system and faction. I'd like the output to contain the faction id or name, the system id or name, and the delta of the two influence values. influence_delta would be a value computed from the difference between two documents, and influence movement can be positive or negative. We believe the largest single-day change is 0.075 and the smallest is -0.075.
Basically, I'm trying to create a table in a web app that will display the biggest influence changes between date A and date B, but the system_id and faction_id need to be the same.
system_id                | faction_id               | influence_delta
605e573a68ad125bce5186b4 | 605e573a68ad125bce5186bd | 0.075
605e573a68ad125bce51868d | 605e573a68ad125bce518696 | 0.031
605e573a68...            | 605e573a68ad...          | 0.021
Here's a sample document. There will be at least one document per day for popular systems, and for less popular systems it could be weeks since a document was last recorded. For each system and faction, there will be the same faction_id and system_id values as long as the faction has not arrived recently or retreated (this is Elite Dangerous), and there will be different influence values. Stored influence values range from 0.0 to 1.0. For performance, please consider using system_id and faction_id, which are unique and indexed, rather than the system or faction name.
{
    "_id": { "$oid": "605e573bbfcfe2a2cdb7fefb" },
    "updated_at": { "$date": "2021-03-26T21:50:50.000Z" },
    "updated_by": "EDDN",
    "system": "Gliese 875",
    "system_lower": "gliese 875",
    "system_id": { "$oid": "605e573a68ad125bce5186b4" },
    "faction_id": { "$oid": "605e573a68ad125bce5186bd" },
    "faction_name": "Gliese 875 Allied Exchange",
    "faction_name_lower": "gliese 875 allied exchange",
    "state": "none",
    "influence": 0.039715,
    "happiness": "$faction_happinessband2;",
    "active_states": [],
    "pending_states": [],
    "recovering_states": [],
    "conflicts": [],
    "systems": [
        {
            "system_id": { "$oid": "605e573a68ad125bce5186b4" },
            "name": "Gliese 875",
            "name_lower": "gliese 875"
        }
    ],
    "__v": 0
}
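One possible shape for such a pipeline, as a sketch only: the collection name factions and the date variables dateA/dateB are assumptions, and it assumes the $match window contains just the two snapshots being compared per system and faction (with more snapshots in the window, $first/$last give the delta between the oldest and newest):
db.factions.aggregate([
    {$match: {updated_at: {$gte: dateB, $lte: dateA}}},
    // Oldest first, so $first below is the date-B value and $last the date-A value
    {$sort: {updated_at: 1}},
    {$group: {
        _id: {system_id: "$system_id", faction_id: "$faction_id"},
        first_influence: {$first: "$influence"},
        last_influence: {$last: "$influence"}
    }},
    {$project: {influence_delta: {$subtract: ["$last_influence", "$first_influence"]}}},
    {$sort: {influence_delta: -1}}
])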

MongoDB summing field of document during query

I want to execute a mongodb query that would fetch documents until the sum of a field of those documents exceeds a value. For example, if I have the following documents
{id: 1, qty: 40}
{id: 2, qty: 50}
{id: 3, qty: 30}
and I have a set quantity of 80, I would want to retrieve id1 and id2 because 40+50 is 90 and is now over 80. If I wanted a quantity of 90, I would also retrieve id1 and id2. Does anyone have any insight into how to query in this manner? (I'm using Go btw - but any general mongo query advice would help tremendously)
Since you're keeping a running sum of a certain field, the easiest way of doing this is to run a Find operation, get a cursor, and iterate the cursor while keeping the sum yourself until the required total is reached. Then close the cursor and return:
cursor, err := coll.Find(context.Background(), query)
if err != nil {
    // handle the error
}
defer cursor.Close(context.Background())
sum := 0
for cursor.Next(context.Background()) {
    var data struct {
        Qty int `bson:"qty"`
    }
    if err := cursor.Decode(&data); err != nil {
        // handle the error
    }
    sum += data.Qty // keep the running total client-side
    if sum >= 80 {
        break // required total reached; stop iterating
    }
}

MongoDb: how to create the right (composite) index for data with many searchable fields

UPDATE: I need to add that the point of this question is to allow me to define schemas for Json Rest Stores. The user can search by any one key, or several keys. So, I cannot easily predict what the users will search by -- it could be 1, 2, 5 fields (this is especially true for data-rich fields like people, bookings, etc.)
Imagine that I have an index as such:
{ "item": 1, "location": 1, "stock": 1 }
Following the MongoDb manual on indexes:
MongoDB can use this index to support queries that include:
the item field,
the item field and the location field,
the item field and the location field and the stock field, or
only the item and stock fields; however, this index would be less efficient than an index on only item and stock.
MongoDB cannot use this index to support queries that include:
only the location field,
only the stock field, or
only the location and stock fields.
Now, suppose I have a schema with exactly these fields:
item: String
location: String
stock: String
qty: number
And imagine I want to make sure every query is indeed indexed. I would do:
For item:
item, location, stock, qty
item, location, qty, stock
item, stock, qty, location
item, stock, location, qty
item, qty, location, stock
item, qty, stock, location
For location:
...you know the gist
Now... this seems a little insane. If you have a database with TEN searchable fields, this becomes clearly unworkable, as the number of required indexes grows factorially (n fields admit n! orderings).
Am I missing something? My idea was to define a schema, define which fields were searchable, and write a function that makes up all of the needed indexes regardless of what fields were present and what fields weren't. However, I am thinking about it, and... well, I must be missing something.
Am I?
I will try to explain what this means by example. B-tree based indexes are not something MongoDB-specific; on the contrary, they are a rather common concept.
So when you create an index, you give the database an easier way to find something. But the index is stored somewhere, with each entry pointing to the location of the original document. This information is ordered, and you can look at it as a binary tree with a really nice property: the search is reduced from O(n) (linear scan) to O(log(n)), which is much, much faster, because each step trims the search space in half (potentially reducing 10^6 comparisons to about 20 lookups). For example, we have a big collection with fields {a : some int, b: 'some other things'} and we index it by a. We end up with another data structure which is sorted by a. It looks this way (by this I do not mean that it is another collection; this is just for demonstration):
{a : 1, pointer: to the document with a = 1}, // if a = 1 is the smallest value in the collection
...
{a : 999, pointer: to the document with a = 999} // assuming that 999 is the biggest value
So right now we are searching for the document with a = 18. Instead of going one by one through all the elements, we take the element in the middle; if it is bigger than 18, we divide the lower part in half and check the element there. We continue until we find a = 18. Then we look at the pointer and, knowing it, extract the original document.
The situation with a compound index is similar (instead of ordering by one element, we order by many). For example, you have a collection:
{ "item": 5, "location": 1, "stock": 3, 'a lot of other fields' } // was stored at position 5 on the disk
{ "item": 1, "location": 3, "stock": 1, 'a lot of other fields' } // position 1 on the disk
{ "item": 2, "location": 5, "stock": 7, 'a lot of other fields' } // position 3 on the disk
... huge amount of other data
{ "item": 1, "location": 1, "stock": 1, 'a lot of other fields' } // position 9 on the disk
{ "item": 1, "location": 1, "stock": 2, 'a lot of other fields' } // position 7 on the disk
and you want the index { "item": 1, "location": 1, "stock": 1 }. The lookup table would look like this (one more time: this is not another collection, it is just for demonstration):
{ "item": 1, "location": 1, "stock": 1, pointer = 9 }
{ "item": 1, "location": 1, "stock": 2, pointer = 7 }
{ "item": 1, "location": 3, "stock": 1, pointer = 1 }
{ "item": 2, "location": 5, "stock": 7, pointer = 3 }
... huge amount of other data (but not necessarily here; if item were 1, the entry would be somewhere next to the other entries with item = 1)
{ "item": 5, "location": 1, "stock": 3, pointer = 5 }
See that here everything is basically sorted by item, then by location and then by stock.
The same way as with a single-field index, we do not need to scan everything. If we have a query which looks for item = 2, location = 5 and stock = 7, we can quickly identify where the documents with item = 2 are, and then just as quickly identify which among them have location = 5, and so on.
And now for the interesting part. Although we created just one index (a compound index, but still one index), we can use it to quickly find elements
by item alone. Really, all we need to do is only the first step. So there is no point in creating another index {item : 1}, because it is already covered by the compound index.
Also, we can quickly find documents by item and location together (we need only 2 steps).
Cool: one index, but it helps us in three different ways. But wait a minute: what if we want to find by item and stock? It looks like we can speed up this query as well: we can find all elements with a specific item in O(log(n)), and ... here we have to stop, the magic has finished. We still need to iterate through all of them to check the stock. But still pretty good.
But maybe it can help us with other queries. Let's look at a query by location, which at first glance looks like it is already ordered. But if you look at it, you see that it is a mess: one at the beginning and then one at the end. It cannot help you at all.
I hope this clarifies a few things:
why indexes are good (they reduce the search time from O(n) to potentially O(log(n)))
why a compound index can help with some queries even though we have not created a separate index on that particular field, yet not with some other queries
which queries are covered by a compound index
why indexes can harm (they create an additional data structure which has to be maintained)
And this should tell you another valid thing: an index is not a silver bullet. You cannot speed up all your queries, so it sounds silly to think that by creating indexes on all fields EVERYTHING will be super fast.
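A quick way to see which queries the compound index actually covers is explain() in the shell. A sketch (the collection name inventory is just for illustration):
db.inventory.createIndex({item: 1, location: 1, stock: 1})
db.inventory.find({item: 2}).explain()               // uses the index: "item" is a prefix
db.inventory.find({item: 2, location: 5}).explain()  // uses the index: still a prefix
db.inventory.find({location: 5}).explain()           // collection scan: "location" alone is not a prefix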
What are your real query patterns? It's very unlikely that you would need to create all of these possible index combinations. I also doubt that including qty in the index would be of much use. Do you need to search for things where qty == 4 independent of location and item type?
An index doesn't need to identify every single record, it just needs to be specific enough to make any final scan small. Given an item code or a stock value are there really that many locations that you'd also need to index on them?
I suspect in this case an index on item, an index on location and an index on stock would be sufficient to answer most likely queries with sufficient speed. (But we'd need to know more about what these field names mean and what the count and distribution of values within them is.)
Use explain with your queries and you can see how well they are performing. Add indices as necessary, don't create every possible ordering.

How to define a couchdb view that I can filter on attributes of different types of document?

I have two types of document in a couch database. There are Events and Occurences. One Event has many Occurences.
Event:
{
    "_id": "49bb92b8896515a2994e524b38a041d3",
    "type": "Event",
    "eventID": 1234,
    "area": "m1"
}
Occurence:
{
    "_id": "49bb92b8896515a2994e524b38a041d4",
    "type": "Occurence",
    "occurenceID": 7890,
    "eventID": 1234,
    "date": "2013-01-01"
}
I need to find the count of occurrences per date, filtered by an area name and by a range of dates. In SQL, I'd use this query:
SELECT Date, count(*)
FROM Event INNER JOIN Occurence ON Occurence.EventID = Event.EventID
WHERE Event.Area = "m1"
AND Occurence.Date BETWEEN '2013-01-01' AND '2013-02-01'
GROUP BY Date
I don't mind executing more than one query, but my database has over 300,000 occurrence documents (and will grow to 10 times that), so I need to get the results while transferring as few documents as possible. The app that queries this CouchDB is written with node.js.
Yeah, this would require two queries to get right, I think. You should consider denormalizing by copying the event's area into the occurrence documents; that would make it a lot easier.
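For illustration, a minimal sketch of the map/reduce view once the event's area has been copied onto each Occurence document (the area field name on the occurrence is an assumption):
// map function
function (doc) {
    if (doc.type === "Occurence" && doc.area && doc.date) {
        emit([doc.area, doc.date], 1);
    }
}
// reduce function: _count
Querying the view with group_level=2, startkey=["m1", "2013-01-01"] and endkey=["m1", "2013-02-01"] would then return one count per date in the range, without transferring the occurrence documents themselves.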

rank leaderboard in mongo with surrounding players

How would I create a query to get both the current player's rank and the surrounding players' ranks? For example, if I had a leaderboard collection with name and points:
{name: 'John', pts: 123}
If John was in 23rd place, I would want to show the names of users in the 22nd and 24th place as well.
I could query for a count of leaderboard items with pts greater than 123 to get John's rank, but how can I efficiently get the one player ranked just above and the one just below the current player? Can I get items based on index position alone?
I suppose I could make 2 queries: first to get the rank position of the user, then a skip/limit query, but that seems inefficient and doesn't seem to make efficient use of the index:
db.leaderboards.find({pts:{$gt:123}}).count();
-> 23
db.leaderboards.find().sort({pts: -1}).skip(21).limit(3)
The last query seems to scan across 24 records using its index; is there a way I can reasonably do this with a range query or something more efficient? I can see this becoming an issue if the user is ranked very low, like 50,000th place.
You'll need to do three queries:
var john = db.players.findOne({name: 'John'})
var next_player = db.players.find(
{_id: {$ne: john._id}, pts: {$gte: john.pts}}).sort({pts:1,name:1}).limit(-1)[0]
var previous_player = db.players.find(
{_id: {$ne: john._id}, pts: {$lte: john.pts}}).sort({pts:-1,name:-1}).limit(-1)[0]
Create indexes on name and pts.
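Concretely, that could look like this (a sketch; the compound index matches the {pts, name} sorts used above):
db.players.createIndex({name: 1})          // for the findOne lookup
db.players.createIndex({pts: 1, name: 1})  // supports both neighbour sorts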
A. Jesse Jiryu Davis's answer is fine, but I think there is another, better option: a geo/2d index could be used.
You could start by creating a 2d index on the pts field, and then query for the N documents near a given point, i.e. a given score. For example, if you want to fetch the 10 documents with points nearest to a score of, say, 123, you can do this:
db.players.find({ pts: { $near: [123, 123] } }).limit(10)
The points would probably need to be normalised to fit the 2d index coordinates, but this should work.