Query Exact Matches in MongoDB - mongodb

I am working on developing a web application feature that suggests prices for users based on previous orders in the database. I am using the MongoDB NoSQL database. Before I begin, I am trying to figure out the best way to set up the order object to return the correct results.
When a user places an order such as the following: 1 cheeseburger + 1 fry, McDonalds, 12345 E. Street, MyTown, USA... it should only return objects that are EXACT matches from the database.
For example, I would not want to receive an order that contained 1 cheeseburger + 1 fry + 1 shake. I will be keeping running averages of the prices and counts for that exact order.
{
restaurantAddress: "12345 E. Street, MyTown, USA",
restaurantName: "McDonald's",
orders: {
{ cheeseburger: 1, fries: 2 }
: {
sumPaid: 1444.55,
numTimesOrdered: 167,
avgPaid: 8.65 (gets recomputed w/ each new order)
},
{ // repeat for each unique item config },
{ // another unique item (or items) }
}
Do you think this is a valid and efficient way to set up the document in MongoDB? Or should I be using multiple documents?
If this is valid, how can I query it to only return exact orders? I looked into $eq but it did not seem to be exactly what I was looking for.

So I believe we have solved the problem. The solution is to create a string that is unique for the order on the server side. For example, we will write a function that would transform the 1 cheeseburger + 2 fries into burger1fries2. In order to keep consistency in the database, we will first sort the entries alphabetically, so we will always hit what we intended with the query. A similar order of 2 fries + 1 cheeseburger would generate the string burger1fries2 as well.

Related

Finding top N entries per group in Arango

I'm trying to efficiently find the top entries by group in Arango (AQL). I have a fairly standard object collection and an edge collection representing Departments and Employees in that department.
Example purpose: Find the top 2 employees in each department by most years of experience.
Sample Data:
"departments" is an object collection. Here are some entries:
_id
name
departments/1
engineering
departments/2
sales
"dept_emp_edges" is an edge collection connecting departments and employee objects by ids.
_id
_from
_to
years_exp
dept_emp_edges/1
departments/1
employees/1
3
dept_emp_edges/2
departments/1
employees/2
4
dept_emp_edges/3
departments/1
employees/3
5
dept_emp_edges/4
departments/2
employees/1
6
I would like to end up with the top 2 employees per department by most years experience:
department
employee
years_exp
departments/1
employee/3
5
departments/1
employee/2
4
departments/2
employee/1
6
Long Working Query
The following query works! But is a bit slow on larger tables and feels inefficient.
FOR dept IN departments
LET top2earners = (
FOR dep_emp_edge IN dept_emp_edges
FILTER dep_emp_edge._from == dept._id
SORT dep_emp_edge.years_exp DESC
LIMIT 2
RETURN {'department': dep_emp_edge._from,
'employee': dep_emp_edge._to,
'years_exp': dep_emp_edge.years_exp}
)
FOR row in top2earners
return {'department': dep_emp_edge._from,
'employee': dep_emp_edge._to,
'years_exp': dep_emp_edge.years_exp}
I don't like this because there is 3 loops in here and feels rather inefficient.
Short Query
However, I tried to write:
FOR dept IN departments
FOR dep_emp_edge IN dept_emp_edges
FILTER dep_emp_edge._from == dept._id
SORT dep_emp_edge.years_exp DESC
LIMIT 2
RETURN {'department': dep_emp_edge._from,
'employee': dep_emp_edge._to,
'years_exp': dep_emp_edge.years_exp}
But this last query only outputs the final department top 2 results. Not all of the top 2 in each department.
My questions are: (1) why doesn't the second shorter query give all results? and (2) I'm quite new to Arango and ArangoQL, what other things can I do to make sure this is efficient?
Your first query is incorrect as written (Query: AQL: collection or view not found: dep_emp_edge (while parsing)) - as I could only guess what you mean, I ignore it for now.
Your smaller query limits the overall results to two - counter intuitively - as you are not grouping by department.
I suggest a slightly different approach: Use the edge collection as central source and group by _from, returning one document per department, containing an array of the two top resulting employees (should they exist), not one document per employee:
FOR edge IN dept_emp_edges
SORT edge.years_exp DESC
COLLECT dep = edge._from INTO deps
LET emps = (
FOR e in deps
LIMIT 2
RETURN ZIP(["employee", "years_exp"], [e.edge._to, e.edge.years_exp])
)
RETURN {"department": dep, employees: emps}
For your example database this returns:
[
{
"department": "departments/1",
"employees": [
{
"employee": "employees/3",
"years_exp": 5
},
{
"employee": "employees/2",
"years_exp": 4
}
]
},
{
"department": "departments/2",
"employees": [
{
"employee": "employees/1",
"years_exp": 6
}
]
}
]
If the query is too slow, an index on the year_exp-field of the dept_emp_edges collection could help (Explain suggests it would).

Optimising queries in mongodb

I am working on optimising my queries in mongodb.
In normal sql query there is an order in which where clauses are applied. For e.g. select * from employees where department="dept1" and floor=2 and sex="male", here first department="dept1" is applied, then floor=2 is applied and lastly sex="male".
I was wondering does it happen in a similar way in mongodb.
E.g.
DbObject search = new BasicDbObject("department", "dept1").put("floor",2).put("sex", "male");
here which match clause will be applied first or infact does mongo work in this manner at all.
This question basically arises from my background with SQL databases.
Please help.
If there are no indexes we have to scan the full collection (collection scan) in order to find the required documents. In your case if you want to apply with order [department, floor and sex] you should create this compound index:
db.employees.createIndex( { "department": 1, "floor": 1, "sex" : 1 } )
As documentation: https://docs.mongodb.org/manual/core/index-compound/
db.products.createIndex( { "item": 1, "stock": 1 } )
The order of the fields in a compound index is very important. In the
previous example, the index will contain references to documents
sorted first by the values of the item field and, within each value of
the item field, sorted by values of the stock field.

How to efficiently page batches of results with MongoDB

I am using the below query on my MongoDB collection which is taking more than an hour to complete.
db.collection.find({language:"hi"}).sort({_id:-1}).skip(5000).limit(1)
I am trying to to get the results in a batch of 5000 to process in either ascending or descending order for documents with "hi" as a value in language field. So i am using this query in which i am skipping the processed documents every time by incrementing the "skip" value.
The document count in this collection is just above 20 million.
An index on the field "language" is already created.
MongoDB Version i am using is 2.6.7
Is there a more appropriate index for this query which can get the result faster?
When you want to sort descending, you should create a multi-field index which uses the field(s) you sort on as descending field(s). You do that by setting those field(s) to -1.
This index should greatly increase the performance of your sort:
db.collection.ensureIndex({ language: 1, _id: -1 });
When you also want to speed up the other case - retrieving sorted in ascending order - create a second index like this:
db.collection.ensureIndex({ language: 1, _id: 1 });
Keep in mind that when you do not sort your results, you receive them in natural order. Natural order is often insertion order, but there is no guarantee for that. There are various events which can cause the natural order to get messed up, so when you care about the order you should always sort explicitly. The only exception to this rule are capped collections which always maintain insertion order.
In order to efficiently "page" through results in the way that you want, it is better to use a "range query" and keep the last value you processed.
You desired "sort key" here is _id, so that makes things simple:
First you want your index in the correct order which is done with .createIndex() which is not the deprecated method:
db.collection.createIndex({ "language": 1, "_id": -1 })
Then you want to do some simple processing, from the start:
var lastId = null;
var cursor = db.collection.find({language:"hi"});
cursor.sort({_id:-1}).limit(5000).forEach(funtion(doc) {
// do something with your document. But always set the next line
lastId = doc._id;
})
That's the first batch. Now when you move on to the next one:
var cursor = db.collection.find({ "language":"hi", "_id": { "$lt": lastId });
cursor.sort({_id:-1}).limit(5000).forEach(funtion(doc) {
// do something with your document. But always set the next line
lastId = doc._id;
})
So that the lastId value is always considered when making the selection. You store this between each batch, and continue on from the last one.
That is much more efficient than processing with .skip(), which regardless of the index will "still" need to "skip" through all data in the collection up to the skip point.
Using the $lt operator here "filters" all the results you already processed, so you can move along much more quickly.

mongo db count differs from aggregate sum

i have a query and when i validate it i see that the count command returns a different results from the aggregate result.
i have an array of sub-documents like so:
{
...
wished: [{'game':'dayz','appid':'1234'}, {'game':'half-life','appid':'1234'}]
...
}
i am trying to query a count of all games in the collection and return the name along with the count of how many times i found that game name.
if i go
db.user_info.count({'wished.game':'dayz'})
it returns 106 as the value and
db.user_info.aggregate([{'$unwind':'$wished'},{'$group':{'_id':'$wished.game','total':{'$sum':1}}},{'$sort':{'total':-1}}])
returns 110
i don't understand why my counts are different. the only thing i can think of is that it has to do with the data being in an array of sub-documents as opposed to being in an array or just in a document.
The $unwind statement will cause one user with multiple wished games to appear as several users. Imagine this data:
{
_id: 1,
wished: [{game:'a'}, {game:'b'}]
}
{
_id: 2,
wished: [{game:'a'}, {game:'c'}, {game:'a'}]
}
The count can NEVER be more than 2.
But with this same data, an $unwind will give you 5 different documents. Summing them up will then give you a:3, b:1, c:1.

CouchDB view, filtering with array keys

I am emitting an array-based key with 2 items, for reducing I am just using the built in _count function.
function (doc) {
emit([ doc.Name, doc.Date], null);
}
I want to filter by doc.Date while grouping for the count based only on doc.Name (group level 1), e.g between Jan 1st 2010 and Jan 1st 2011,
/_view/test?group_level=1&startkey=[{},20100101]&endkey=[{},20110101]
However the syntax I am using above doesn't seem to work. There are no results returned. I was under the impression that the {} placeholder is used to match to any key. If I enter a specific document name it works but filters only to that document name,
/_view/test?group_level=1&startkey=["MYDOC",20100101]&endkey=["MYDOC",20110101]
which comes out,
{"rows":[
{"key":["MYDOC"],"value":10}
]}
I do not want to filter at all on the document name, only on the underlying date range. Thanks.