Couchbase composite keys search - nosql

I have the following array:
array(3) { [0]=> string(8) "xp is 20" [1]=> string(19) "level between 9, 50" [2]=> string(20) "cars between 100,200" }
The field comes first, then the operator, and then the value to search for.
My view is as follows:
function (doc) { emit([doc.data.xp, doc.data.level, doc.data.cars]) }
Basically I want to search for key xp equal to 20 AND level between 9 AND 50 AND cars BETWEEN 100 AND 200.
Can I do that in Couchbase and if so, how?

No, you can't do that, at least not at the moment. A Couchbase view has a single index, sorted by the emitted key, so a query can apply only one "between" (range) condition per view.
Instead, you can create two views, one for "xp equal to 20 AND level between 9 AND 50" and one for "xp equal to 20 AND cars between 100 AND 200", and then intersect the two result arrays on the application side.
For more information about composite keys, see this question.
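As a rough illustration of the application-side intersection, the sketch below simulates the two views in plain Python. The sample documents, the `range_scan` helper and the row layout are assumptions made up for this example; a real client would instead query each view with a `startkey`/`endkey` pair such as `[20, 9]` and `[20, 50]` and collect the returned document IDs.

```python
# Hypothetical documents; in Couchbase these would live in a bucket.
docs = {
    "a": {"xp": 20, "level": 10, "cars": 150},
    "b": {"xp": 20, "level": 60, "cars": 150},  # level out of range
    "c": {"xp": 20, "level": 30, "cars": 500},  # cars out of range
    "d": {"xp": 15, "level": 30, "cars": 150},  # xp does not match
}

# Simulated view rows: (composite key, document id), sorted by key like a view index.
view_xp_level = sorted(([d["xp"], d["level"]], doc_id) for doc_id, d in docs.items())
view_xp_cars = sorted(([d["xp"], d["cars"]], doc_id) for doc_id, d in docs.items())


def range_scan(rows, startkey, endkey):
    """Return the ids of all rows whose key lies in [startkey, endkey]."""
    return {doc_id for key, doc_id in rows if startkey <= key <= endkey}


# One range condition per view, with xp fixed to 20 as the key prefix.
by_level = range_scan(view_xp_level, [20, 9], [20, 50])
by_cars = range_scan(view_xp_cars, [20, 100], [20, 200])

# Intersect the two id sets on the application side.
matches = sorted(by_level & by_cars)
print(matches)
```

Only document "a" satisfies all three conditions, so it is the only id left after the intersection.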


Finding top N entries per group in Arango

I'm trying to efficiently find the top entries by group in Arango (AQL). I have a fairly standard object collection and an edge collection representing Departments and Employees in that department.
Example purpose: Find the top 2 employees in each department by most years of experience.
Sample Data:
"departments" is an object collection. Here are some entries:
_id           | name
--------------|------------
departments/1 | engineering
departments/2 | sales
"dept_emp_edges" is an edge collection connecting departments and employee objects by ids.
_id              | _from         | _to         | years_exp
-----------------|---------------|-------------|----------
dept_emp_edges/1 | departments/1 | employees/1 | 3
dept_emp_edges/2 | departments/1 | employees/2 | 4
dept_emp_edges/3 | departments/1 | employees/3 | 5
dept_emp_edges/4 | departments/2 | employees/1 | 6
I would like to end up with the top 2 employees per department by most years experience:
department    | employee    | years_exp
--------------|-------------|----------
departments/1 | employees/3 | 5
departments/1 | employees/2 | 4
departments/2 | employees/1 | 6
Long Working Query
The following query works, but it is a bit slow on larger collections and feels inefficient.
FOR dept IN departments
    LET top2earners = (
        FOR dep_emp_edge IN dept_emp_edges
            FILTER dep_emp_edge._from == dept._id
            SORT dep_emp_edge.years_exp DESC
            LIMIT 2
            RETURN {'department': dep_emp_edge._from,
                    'employee': dep_emp_edge._to,
                    'years_exp': dep_emp_edge.years_exp}
    )
    FOR row IN top2earners
        RETURN {'department': dep_emp_edge._from,
                'employee': dep_emp_edge._to,
                'years_exp': dep_emp_edge.years_exp}
I don't like this because there are three loops in here, which feels rather inefficient.
Short Query
However, I tried to write:
FOR dept IN departments
    FOR dep_emp_edge IN dept_emp_edges
        FILTER dep_emp_edge._from == dept._id
        SORT dep_emp_edge.years_exp DESC
        LIMIT 2
        RETURN {'department': dep_emp_edge._from,
                'employee': dep_emp_edge._to,
                'years_exp': dep_emp_edge.years_exp}
But this last query outputs only the top 2 results for the final department, not the top 2 for each department.
My questions are: (1) why doesn't the second shorter query give all results? and (2) I'm quite new to Arango and ArangoQL, what other things can I do to make sure this is efficient?
Your first query is incorrect as written: it fails with "AQL: collection or view not found: dep_emp_edge (while parsing)", because dep_emp_edge is not in scope in the final RETURN (the loop variable there is row). As I can only guess what you meant, I will ignore it for now.
Your shorter query limits the overall result to two documents, counterintuitively, because you are not grouping by department: SORT and LIMIT apply to the combined result of both loops rather than to each department separately.
I suggest a slightly different approach: use the edge collection as the central source and group by _from, returning one document per department that contains an array of its top two employees (should they exist), instead of one document per employee:
FOR edge IN dept_emp_edges
    SORT edge.years_exp DESC
    COLLECT dep = edge._from INTO deps
    LET emps = (
        FOR e IN deps
            LIMIT 2
            RETURN ZIP(["employee", "years_exp"], [e.edge._to, e.edge.years_exp])
    )
    RETURN {"department": dep, employees: emps}
For your example database this returns:
[
  {
    "department": "departments/1",
    "employees": [
      {
        "employee": "employees/3",
        "years_exp": 5
      },
      {
        "employee": "employees/2",
        "years_exp": 4
      }
    ]
  },
  {
    "department": "departments/2",
    "employees": [
      {
        "employee": "employees/1",
        "years_exp": 6
      }
    ]
  }
]
If the query is too slow, an index on the years_exp field of the dept_emp_edges collection could help (Explain suggests it would).
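To make the difference between a per-group limit and a global limit concrete, here is a small in-memory sketch in Python. It is not AQL and says nothing about ArangoDB's performance; the function name `top_n_per_department` is made up for this illustration, and the edge data is copied from the question.

```python
from collections import defaultdict

# Sample edges copied from the question's dept_emp_edges collection.
edges = [
    {"_from": "departments/1", "_to": "employees/1", "years_exp": 3},
    {"_from": "departments/1", "_to": "employees/2", "years_exp": 4},
    {"_from": "departments/1", "_to": "employees/3", "years_exp": 5},
    {"_from": "departments/2", "_to": "employees/1", "years_exp": 6},
]


def top_n_per_department(edges, n=2):
    """Group edges by department and keep the n most experienced employees of each."""
    # Sort once by experience, highest first (the SORT step).
    ordered = sorted(edges, key=lambda e: e["years_exp"], reverse=True)
    groups = defaultdict(list)
    for e in ordered:
        # Keep at most n entries per department (the LIMIT inside each group).
        if len(groups[e["_from"]]) < n:
            groups[e["_from"]].append(
                {"employee": e["_to"], "years_exp": e["years_exp"]}
            )
    return [{"department": dep, "employees": emps} for dep, emps in sorted(groups.items())]


result = top_n_per_department(edges)

# By contrast, sorting and limiting the whole list keeps only n rows overall.
global_top = sorted(edges, key=lambda e: e["years_exp"], reverse=True)[:2]
```

Here `result` has one entry per department, while `global_top` contains just two edges in total, which mirrors the behaviour of the shorter query.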

MongoDB compound shard key

I have a question about MongoDB compound shard keys. Suppose I have a document that is structured like this:
{
  "players": [
    {
      "id": "12345",
      "name": "John"
    },
    {
      "id": "23415",
      "name": "Doe"
    }
  ]
}
The players array is always present and always contains exactly two embedded documents. I think "players.0.id" and "players.1.id" would be a good choice of shard key because they are not monotonic and are evenly distributed.
What I can't understand from the documentation is if:
All documents with same "players.0.id" OR same "players.1.id" are supposed to be saved into the same Chunk, or
All documents with same "players.0.id" AND same "players.1.id" are supposed to be saved into the same Chunk.
In other words, if I query the collection to get all games played by John (as player 1 or player 2), will the query be sent to one chunk or to all chunks?
You cannot create a shard key where part of the key is a multikey index (i.e. index on an array field). This is mentioned in Shard Key Index Type:
A shard key index cannot be an index that specifies a multikey index, a text index or a geospatial index on the shard key fields.
If you have exactly two items under the players field, why not create two sub-documents instead of using an array? An array is typically useful for use cases where you have multiple items of indeterminate number in a document. For example, this structure might work for your use case:
{
  "players": {
    "player_1": {
      "id": 12345,
      "name": "John"
    },
    "player_2": {
      "id": 54321,
      "name": "Doe"
    }
  }
}
You can then create an index like:
> db.test.createIndex({'players.player_1.id':1, 'players.player_2.id':1})
To answer your questions, if you're using this shard key, then:
Documents that share only player_1.id, or only player_2.id, are not guaranteed to be on the same chunk; this depends on your data distribution. Only documents with identical values for both fields share a shard key value and therefore a chunk.
If you query for John as player_1 OR player_2, the query will be sent to all shards. This is because the shard key is a compound index, and one branch of the query is an exact match on the non-prefix field (player_2.id) alone.
To elaborate on question 2:
The query you're doing is this:
db.test.find({$or: [
    {'players.player_1.id': 123},
    {'players.player_2.id': 123}
]})
In a compound index, entries are sorted first by player_1.id and then, within each player_1.id value, by player_2.id. For example, if you have 10 documents with some combination of values for player_1.id and player_2.id, you can visualize the index like this:
player_1.id | player_2.id
------------|-------------
0 | 10
0 | 123
1 | 100
1 | 123
2 | 123
2 | 150
123 | 10
123 | 100
123 | 123
123 | 150
Note that the value player_2.id: 123 occurs multiple times in the table, once for each player_1.id. Also note that for each player_1.id value, the player_2.id values are sorted within it.
This is how MongoDB's compound index works and how it's sorted. There are more nuances to compound indexes than can be explained here, but the details are covered in the Compound Indexes page.
The effect of this ordering is that identical player_2.id values are spread all across the index. Since the index as a whole is sorted only by player_1.id, it is not possible to locate an exact player_2.id without also specifying player_1.id. Hence, the above query will be sent to all shards.
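The same point can be seen with an in-memory stand-in for the index. The sketch below is only a Python list of tuples, not how MongoDB stores or routes anything, and the two helper names are invented for the example. It reuses the ten pairs from the table above.

```python
from bisect import bisect_left, bisect_right

# The ten (player_1.id, player_2.id) pairs from the table, in index order.
index = [
    (0, 10), (0, 123), (1, 100), (1, 123), (2, 123),
    (2, 150), (123, 10), (123, 100), (123, 123), (123, 150),
]


def prefix_positions(index, p1):
    """Positions of entries with player_1.id == p1, located by binary search."""
    lo = bisect_left(index, (p1, float("-inf")))
    hi = bisect_right(index, (p1, float("inf")))
    return list(range(lo, hi))


def non_prefix_positions(index, p2):
    """Positions of entries with player_2.id == p2; every entry must be inspected."""
    return [i for i, (_, second) in enumerate(index) if second == p2]


prefix_hits = prefix_positions(index, 123)         # one contiguous block
scattered_hits = non_prefix_positions(index, 123)  # spread over the whole index
print(prefix_hits, scattered_hits)
```

A lookup on the prefix field lands in one contiguous run of positions, whereas matches on the second field alone are scattered, which is why that branch of the query cannot be targeted at a single range.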

Query Exact Matches in MongoDB

I am working on developing a web application feature that suggests prices for users based on previous orders in the database. I am using the MongoDB NoSQL database. Before I begin, I am trying to figure out the best way to set up the order object to return the correct results.
When a user places an order such as the following: 1 cheeseburger + 1 fry, McDonalds, 12345 E. Street, MyTown, USA... it should only return objects that are EXACT matches from the database.
For example, I would not want to receive an order that contained 1 cheeseburger + 1 fry + 1 shake. I will be keeping running averages of the prices and counts for that exact order.
{
  restaurantAddress: "12345 E. Street, MyTown, USA",
  restaurantName: "McDonald's",
  orders: {
    { cheeseburger: 1, fries: 2 }: {
      sumPaid: 1444.55,
      numTimesOrdered: 167,
      avgPaid: 8.65 (gets recomputed w/ each new order)
    },
    { /* repeat for each unique item config */ },
    { /* another unique item (or items) */ }
  }
}
Do you think this is a valid and efficient way to set up the document in MongoDB? Or should I be using multiple documents?
If this is valid, how can I query it to only return exact orders? I looked into $eq but it did not seem to be exactly what I was looking for.
So I believe we have solved the problem. The solution is to build, on the server side, a string that uniquely identifies the order. For example, we will write a function that transforms 1 cheeseburger + 2 fries into burger1fries2. To keep the database consistent, we will first sort the entries alphabetically, so that a query always hits the intended key: an order of 2 fries + 1 cheeseburger generates the string burger1fries2 as well.
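A minimal sketch of such a key function might look like the following. The name `order_key` and the dictionary input format are assumptions for illustration, and the sketch uses the full item names rather than the shortened "burger" from the example above.

```python
def order_key(items):
    """Build a canonical string for an order given as {item_name: quantity}."""
    # Sorting by item name makes the key independent of the order
    # in which the items were entered.
    return "".join(f"{name}{qty}" for name, qty in sorted(items.items()))


key_a = order_key({"cheeseburger": 1, "fries": 2})
key_b = order_key({"fries": 2, "cheeseburger": 1})
key_c = order_key({"cheeseburger": 1, "fries": 2, "shake": 1})
print(key_a, key_b, key_c)
```

Both orderings of the same order produce the same key, while the order with an extra shake gets a different one. If item names can themselves end in digits, a separator between name and quantity would be needed to keep keys unambiguous.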

How to fetch between a range of indexes in mongodb?

I need help. Is there a way to fetch documents within a range of indexes when using find in MongoDB, such as [2:10] (from 2 to 10)?
If you are talking about the "index" position within an array in your document, then you want the $slice operator. The first argument is the number of elements to skip (the index to start at) and the second is how many to return. With zero-based indexing, position 2 is the "third" element:
db.collection.find({}, { "list": { "$slice": [ 2, 8 ] } })
Within a collection itself, you can use the .skip() and .limit() modifiers to move through a range of documents:
db.collection.find({}).skip(2).limit(8)
Keep in mind that in the collection context MongoDB has no concept of "ordered" records; the result order depends on the query and/or the sort order that is given.
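For readers used to slice notation, the conversion from a [start:end] range to the skip and count values used above can be sketched in Python. The helper name `range_to_skip_limit` is hypothetical, and the range is assumed to be half-open, as in Python slicing.

```python
def range_to_skip_limit(start, end):
    """Convert a half-open range [start:end] into a (skip, limit) pair."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return start, end - start


skip, limit = range_to_skip_limit(2, 10)

# The same window applied to a plain list, for comparison with Python slicing.
items = list(range(20))
window = items[skip:skip + limit]
print(skip, limit, window)
```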

MongoDB sort all and get specific range

I'm using mongoDB. I have a collection with:
String user_name,
Integer score
I would like to write a query that takes a user_name as input. The results should be sorted by score, and the query should return the block of 50 documents that contains the requested user_name.
For example, if I have 110 documents with the user_names X1-X110 and the scores 1-110 respectively, and the input user_name is X72, I would like to get the range X51-X100.
EDIT:
An example of 3 documents:
{ "user_name": "X1", "score": 1}
{ "user_name": "X2", "score": 2}
{ "user_name": "X3", "score": 3}
Now if I have 110 documents as described above, and I want to find X72, I want to get the following documents:
{ "user_name": "X51", "score": 51}
{ "user_name": "X52", "score": 52}
...
{ "user_name": "X100", "score": 100}
How can I do it?
Clarification: I don't have each document rank stored. What I do have is document scores, which aren't necessarily consecutive (the example is a little bit misleading). Here's a less misleading example:
{ "user_name": "X1", "score": 17}
{ "user_name": "X2", "score": 24}
{ "user_name": "X3", "score": 38}
When searching for "X72" I would like to get a slice of size 50 in which "X72" resides according to its rank. Again, the rank is not the element score, but the element index in a hypothetical array sorted by scores.
Check out the MongoDB cursor operations sort, skip and limit. Used together, they return elements n to m of the documents that match your query:
cursor = db.collection.find({...}).sort({score: 1}).skip(50).limit(50);
This should return documents 51 to 100 in order of score.
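The skip value in the query above is hard-coded. One way it might be derived from the user's rank is sketched below in plain Python on an in-memory list. This is an assumption about how the pieces could fit together, not something stated in the answer: the rank is taken to be the number of documents with a strictly lower score (which would be a count query in MongoDB), scores are assumed to be distinct, and `page_for_user` is a made-up name.

```python
PAGE_SIZE = 50


def page_for_user(docs, user_name, page_size=PAGE_SIZE):
    """Return the block of page_size documents (by score rank) that contains user_name."""
    ordered = sorted(docs, key=lambda d: d["score"])  # like sort({score: 1})
    target = next(d for d in ordered if d["user_name"] == user_name)
    # Zero-based rank: how many documents have a strictly lower score.
    rank = sum(1 for d in ordered if d["score"] < target["score"])
    # Round the rank down to the start of its block.
    skip = (rank // page_size) * page_size
    return ordered[skip:skip + page_size]  # like skip(skip).limit(page_size)


# 110 sample users whose scores are increasing but not consecutive.
docs = [{"user_name": f"X{i}", "score": i * 3 + 7} for i in range(1, 111)]
page = page_for_user(docs, "X72")
print(page[0]["user_name"], page[-1]["user_name"], len(page))
```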
If I understood you correctly, you want to query the users whose scores are in the neighbourhood of another player's score.
With three queries you can select the user, the 25 users above it and the 25 users below.
First, you need to get the user itself and its score.
user = db.collection.findOne({user_name: "X72"});
Then you select the next 25 players with scores above that user's, sorted ascending so that the closest scores come first:
cursor = db.collection.find({score: {$gt: user.score}}).sort({score: 1}).limit(25);
//... iterate cursor
Then you select the next 25 players with scores below, sorted descending for the same reason:
cursor = db.collection.find({score: {$lt: user.score}}).sort({score: -1}).limit(25);
//... iterate cursor
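The three-query idea can be mimicked on an in-memory list to check which documents it selects. In this Python sketch the function name `neighbours` and the sample data are invented. Note the sort directions: ascending for the higher scores and descending for the lower ones, so that the limit keeps the nearest neighbours rather than the extremes.

```python
def neighbours(docs, user_name, n=25):
    """Return (below, user, above): the n nearest documents on each side by score."""
    user = next(d for d in docs if d["user_name"] == user_name)
    # Higher scores, closest first (ascending), then cut to n.
    above = sorted(
        (d for d in docs if d["score"] > user["score"]),
        key=lambda d: d["score"],
    )[:n]
    # Lower scores, closest first (descending), then cut to n.
    below = sorted(
        (d for d in docs if d["score"] < user["score"]),
        key=lambda d: d["score"],
        reverse=True,
    )[:n]
    return below, user, above


docs = [{"user_name": f"X{i}", "score": i} for i in range(1, 111)]
below, user, above = neighbours(docs, "X72")
print(below[-1]["user_name"], user["user_name"], above[-1]["user_name"])
```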
Unfortunately, there is no direct way to achieve what you want. You will need some processing on the client side to work out the range.
First, fetch the score with a simple findOne / find:
db.sample.findOne({"user_name": "X72"})
Next, using the score value (72 in this case), calculate the range in your client:
lower = 72/50 => lower = 1.44
Take the integer part and assign it to lower:
lower = 1
upper = lower+1 => upper = 2
Now multiply the lower and upper values by 50 in your client, which gives the following values:
lower = 50
upper = 100
Pass the lower and upper values to find to get the desired list:
db.sample.find({score:{$gt:50,$lte:100}}).sort({score:1})
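The client-side arithmetic can be written as a small Python helper. One caveat: flooring as described puts a score that is an exact multiple of 50 (for example 100) into the bucket 100 to 150, yet the query's `$gt`/`$lte` bounds would then exclude that score itself. The sketch below therefore rounds up instead, which is a deviation from the steps above; `score_bucket` is a hypothetical name.

```python
import math


def score_bucket(score, size=50):
    """Return (lower, upper) with lower < score <= upper, matching $gt / $lte."""
    # Round up to the next multiple of size to get the inclusive upper bound.
    upper = math.ceil(score / size) * size
    return upper - size, upper


lower, upper = score_bucket(72)
print(lower, upper)
```

For 72 this gives the same bounds as the worked example, and a score of exactly 100 stays inside the 50 to 100 bucket.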
Partial solution with one query:
I tried to do this with one query, but unfortunately I could not complete it. I am providing the details below in the hope that someone can expand on this and complete what I started. These are the steps that I planned:
Project the documents to divide all scores by 50 and store the result in a new field _score. (This is as far as I got.)
Extract the integer part of _score. [Stuck here] (So far I have not found any way to do this.)
Group the values based on _score. (Each group will give you one slot.)
Find and return the group to which your score belongs (by using $match in the aggregation pipeline).
db.sample.aggregate([{$project:{_id:1, user_name:1,score:1,_score:{$divide:["$score",50]}}}])
I would be really interested to see how this is done!!!
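The planned steps can be prototyped in plain Python before attempting them as an aggregation pipeline. The sketch below is an in-memory stand-in, with `slot_for_user` as a made-up name; `math.floor` plays the role of the step the author was stuck on. Newer MongoDB versions provide a `$floor` aggregation operator that may cover that step, though this sketch does not verify it against a server.

```python
import math
from collections import defaultdict


def slot_for_user(docs, user_name, size=50):
    """Group documents into slots of floor(score / size) and return the user's slot."""
    slots = defaultdict(list)
    for d in docs:
        # "project" plus "extract the integer part": one slot number per document.
        slots[math.floor(d["score"] / size)].append(d)
    # "match": pick the slot that holds the requested user.
    for members in slots.values():
        if any(d["user_name"] == user_name for d in members):
            return sorted(members, key=lambda d: d["score"])
    return []


docs = [{"user_name": f"X{i}", "score": i} for i in range(1, 111)]
slot = slot_for_user(docs, "X72")
print(slot[0]["user_name"], slot[-1]["user_name"], len(slot))
```

With flooring, each slot covers scores from a multiple of 50 up to, but not including, the next one, so the slot for X72 here runs from score 50 to score 99.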