CouchDB view, filtering with array keys - nosql

I am emitting an array-based key with two items. For the reduce step I am just using the built-in _count function.
function (doc) {
emit([ doc.Name, doc.Date], null);
}
I want to filter by doc.Date while grouping the count only on doc.Name (group level 1), e.g. between Jan 1st 2010 and Jan 1st 2011:
/_view/test?group_level=1&startkey=[{},20100101]&endkey=[{},20110101]
However, the syntax I am using above doesn't seem to work: no results are returned. I was under the impression that the {} placeholder matches any key. If I enter a specific document name it works, but it filters to only that document name:
/_view/test?group_level=1&startkey=["MYDOC",20100101]&endkey=["MYDOC",20110101]
which returns:
{"rows":[
{"key":["MYDOC"],"value":10}
]}
I do not want to filter at all on the document name, only on the underlying date range. Thanks.
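One likely reason the first request returns nothing can be sketched with a simplified model of CouchDB view collation. In that ordering objects sort after strings, so {} acts as a "higher than any string" sentinel rather than a wildcard, and a startkey of [{},20100101] begins past every [Name, Date] key. The helper names below (typeRank, compareKeys, rowsInRange) and the sample keys are hypothetical, and plain string comparison stands in for CouchDB's real string collation:

```javascript
// Simplified model of CouchDB view collation, covering only the key types
// used in this question: numbers < strings < arrays < objects.
function typeRank(v) {
  if (typeof v === "number") return 1;
  if (typeof v === "string") return 2;
  if (Array.isArray(v)) return 3;
  return 4; // plain object such as {}
}

function compareKeys(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 1) return a - b;
  if (ra === 2) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 3) {
    // Arrays compare element by element, then by length.
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = compareKeys(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // objects are treated as equal in this sketch
}

// A view range is one contiguous slice: startkey <= key <= endkey.
function rowsInRange(keys, startkey, endkey) {
  return keys.filter(
    k => compareKeys(k, startkey) >= 0 && compareKeys(k, endkey) <= 0
  );
}

// Hypothetical emitted keys of the form [doc.Name, doc.Date].
const keys = [
  ["MYDOC", 20100615],
  ["OTHER", 20100301],
  ["MYDOC", 20120101],
];

const withPlaceholder = rowsInRange(keys, [{}, 20100101], [{}, 20110101]);
const withName = rowsInRange(keys, ["MYDOC", 20100101], ["MYDOC", 20110101]);
console.log(withPlaceholder.length, withName.length);
```

Because a key range is a single contiguous slice of the sorted keys, a range on the second element cannot be applied independently of the first element.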

Related

Optimising queries in mongodb

I am working on optimising my queries in mongodb.
In a normal SQL query there is an order in which the WHERE clauses are applied. For example, in select * from employees where department="dept1" and floor=2 and sex="male", first department="dept1" is applied, then floor=2, and lastly sex="male".
I was wondering whether it happens in a similar way in MongoDB.
E.g.
DBObject search = new BasicDBObject("department", "dept1").append("floor", 2).append("sex", "male");
Here, which match clause will be applied first, or in fact does Mongo work in this manner at all?
This question basically arises from my background with SQL databases.
Please help.
If there are no indexes, MongoDB has to scan the full collection (a collection scan) to find the required documents. In your case, if you want the conditions applied in the order department, floor, sex, you should create this compound index:
db.employees.createIndex( { "department": 1, "floor": 1, "sex" : 1 } )
From the documentation (https://docs.mongodb.org/manual/core/index-compound/):
db.products.createIndex( { "item": 1, "stock": 1 } )
The order of the fields in a compound index is very important. In the
previous example, the index will contain references to documents
sorted first by the values of the item field and, within each value of
the item field, sorted by values of the stock field.
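To make the ordering concrete, here is a small in-memory sketch. It is not how MongoDB stores an index; it only reproduces the order that a compound key on department, floor, sex would impose. The sample employees and the compareBy helper are made up for illustration:

```javascript
// Hypothetical sample data, not taken from the question.
const employees = [
  { name: "e1", department: "dept2", floor: 1, sex: "male" },
  { name: "e2", department: "dept1", floor: 2, sex: "male" },
  { name: "e3", department: "dept1", floor: 1, sex: "female" },
  { name: "e4", department: "dept1", floor: 2, sex: "female" },
];

// Compare two documents field by field, in the order the fields are listed.
function compareBy(fields) {
  return (a, b) => {
    for (const f of fields) {
      if (a[f] < b[f]) return -1;
      if (a[f] > b[f]) return 1;
    }
    return 0;
  };
}

// Same field order as the compound index { department: 1, floor: 1, sex: 1 }.
const indexOrder = [...employees].sort(
  compareBy(["department", "floor", "sex"])
);

console.log(indexOrder.map(e => e.name).join(","));
```

All dept1 entries end up next to each other, and within dept1 the floor 2 entries are adjacent as well, which is why a query on a leading prefix of the indexed fields can read one contiguous range instead of scanning everything.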

Query Exact Matches in MongoDB

I am working on developing a web application feature that suggests prices for users based on previous orders in the database. I am using the MongoDB NoSQL database. Before I begin, I am trying to figure out the best way to set up the order object to return the correct results.
When a user places an order such as the following: 1 cheeseburger + 1 fry, McDonalds, 12345 E. Street, MyTown, USA... it should only return objects that are EXACT matches from the database.
For example, I would not want to receive an order that contained 1 cheeseburger + 1 fry + 1 shake. I will be keeping running averages of the prices and counts for that exact order.
{
restaurantAddress: "12345 E. Street, MyTown, USA",
restaurantName: "McDonald's",
orders: {
{ cheeseburger: 1, fries: 2 }
: {
sumPaid: 1444.55,
numTimesOrdered: 167,
avgPaid: 8.65 (gets recomputed w/ each new order)
},
{ // repeat for each unique item config },
{ // another unique item (or items) }
}
Do you think this is a valid and efficient way to set up the document in MongoDB? Or should I be using multiple documents?
If this is valid, how can I query it to only return exact orders? I looked into $eq but it did not seem to be exactly what I was looking for.
So I believe we have solved the problem. The solution is to create a string on the server side that is unique for the order. For example, we will write a function that transforms 1 cheeseburger + 2 fries into burger1fries2. To keep the database consistent, we will first sort the entries alphabetically, so the query always hits the key we intended. A similar order of 2 fries + 1 cheeseburger would generate the string burger1fries2 as well.
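A minimal sketch of such a key function, assuming the order arrives as a plain object mapping item name to quantity. The name orderKey is hypothetical, and this version keeps the full item name (cheeseburger1fries2) rather than the abbreviated burger1fries2 from the example above:

```javascript
// Build a canonical string for an order: sort the item names alphabetically,
// then append each name followed by its quantity.
function orderKey(items) {
  return Object.keys(items)
    .sort()
    .map(name => `${name}${items[name]}`)
    .join("");
}

const keyA = orderKey({ cheeseburger: 1, fries: 2 });
const keyB = orderKey({ fries: 2, cheeseburger: 1 });
const keyC = orderKey({ cheeseburger: 1, fries: 2, shake: 1 });
console.log(keyA, keyB, keyC);
```

The same items in a different order give the same key, while an order with an extra shake gives a different one, so an equality match on the key returns only exact orders. If item names can end in digits, adding a separator between entries would avoid ambiguous keys.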

How to efficiently page batches of results with MongoDB

I am using the below query on my MongoDB collection which is taking more than an hour to complete.
db.collection.find({language:"hi"}).sort({_id:-1}).skip(5000).limit(1)
I am trying to get the results in batches of 5000 to process, in either ascending or descending order, for documents with "hi" as the value of the language field. So I am using this query, in which I skip the already processed documents each time by incrementing the "skip" value.
The document count in this collection is just above 20 million.
An index on the field "language" is already created.
The MongoDB version I am using is 2.6.7.
Is there a more appropriate index for this query which can get the result faster?
When you want to sort descending, you should create a multi-field index which uses the field(s) you sort on as descending field(s). You do that by setting those field(s) to -1.
This index should greatly increase the performance of your sort:
db.collection.ensureIndex({ language: 1, _id: -1 });
When you also want to speed up the other case - retrieving sorted in ascending order - create a second index like this:
db.collection.ensureIndex({ language: 1, _id: 1 });
Keep in mind that when you do not sort your results, you receive them in natural order. Natural order is often insertion order, but there is no guarantee for that. Various events can change the natural order, so when you care about the order you should always sort explicitly. The only exception to this rule is capped collections, which always maintain insertion order.
In order to efficiently "page" through results in the way that you want, it is better to use a "range query" and keep the last value you processed.
Your desired "sort key" here is _id, so that makes things simple.
First you want your index in the correct order. Create it with .createIndex(), which is the non-deprecated method:
db.collection.createIndex({ "language": 1, "_id": -1 })
Then you want to do some simple processing, from the start:
var lastId = null;
var cursor = db.collection.find({language:"hi"});
cursor.sort({_id:-1}).limit(5000).forEach(function(doc) {
// do something with your document. But always set the next line
lastId = doc._id;
})
That's the first batch. Now when you move on to the next one:
var cursor = db.collection.find({ "language":"hi", "_id": { "$lt": lastId } });
cursor.sort({_id:-1}).limit(5000).forEach(function(doc) {
// do something with your document. But always set the next line
lastId = doc._id;
})
So that the lastId value is always considered when making the selection. You store this between each batch, and continue on from the last one.
That is much more efficient than processing with .skip(), which regardless of the index will "still" need to "skip" through all data in the collection up to the skip point.
Using the $lt operator here "filters" all the results you already processed, so you can move along much more quickly.
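The two steps can be folded into one loop. The sketch below simulates that loop over an in-memory array so it can run without a server. fetchBatch stands in for find(...).sort({_id:-1}).limit(n), and the numeric _id values, sample documents and function names are assumptions for illustration only:

```javascript
// Stand-in for:
//   db.collection.find({ language: "hi", _id: { $lt: lastId } })
//     .sort({ _id: -1 }).limit(batchSize)
// On the first call lastId is null, so no _id condition is applied.
function fetchBatch(docs, lastId, batchSize) {
  return docs
    .filter(d => d.language === "hi" && (lastId === null || d._id < lastId))
    .sort((a, b) => b._id - a._id)
    .slice(0, batchSize);
}

// Keep asking for the next batch below the last _id seen until none is left.
function processAll(docs, batchSize, handle) {
  let lastId = null;
  for (;;) {
    const batch = fetchBatch(docs, lastId, batchSize);
    if (batch.length === 0) break;
    batch.forEach(handle);
    lastId = batch[batch.length - 1]._id;
  }
}

// Hypothetical data: _id 1..7, odd ids have language "hi".
const docs = [1, 2, 3, 4, 5, 6, 7].map(id => ({
  _id: id,
  language: id % 2 === 1 ? "hi" : "en",
}));

const seen = [];
processAll(docs, 3, d => seen.push(d._id));
console.log(seen);
```

Each batch starts where the previous one ended, so no document is visited twice and nothing has to be skipped over.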

mongo db count differs from aggregate sum

I have a query, and when I validate it I see that the count command returns a different result from the aggregate result.
I have an array of sub-documents like so:
{
...
wished: [{'game':'dayz','appid':'1234'}, {'game':'half-life','appid':'1234'}]
...
}
I am trying to query a count of all games in the collection and return each name along with the number of times I found that game name.
If I run
db.user_info.count({'wished.game':'dayz'})
it returns 106, while
db.user_info.aggregate([{'$unwind':'$wished'},{'$group':{'_id':'$wished.game','total':{'$sum':1}}},{'$sort':{'total':-1}}])
returns 110.
I don't understand why my counts are different. The only thing I can think of is that it has to do with the data being in an array of sub-documents, as opposed to being in an array or just in a document.
The $unwind statement will cause one user with multiple wished games to appear as several users. Imagine this data:
{
_id: 1,
wished: [{game:'a'}, {game:'b'}]
}
{
_id: 2,
wished: [{game:'a'}, {game:'c'}, {game:'a'}]
}
count() counts matching documents, so with this data the count for any game can NEVER be more than 2.
But with this same data, an $unwind will give you 5 different documents. Summing them up will then give you a:3, b:1, c:1.
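The difference can be reproduced in memory with the two sample documents above. In this sketch countDocs mimics count({'wished.game': ...}) and unwindTotals mimics $unwind followed by $group with $sum: 1. Both function names are made up for illustration:

```javascript
const users = [
  { _id: 1, wished: [{ game: "a" }, { game: "b" }] },
  { _id: 2, wished: [{ game: "a" }, { game: "c" }, { game: "a" }] },
];

// Like count({'wished.game': game}): a document counts once,
// no matter how many of its array elements match.
function countDocs(docs, game) {
  return docs.filter(d => d.wished.some(w => w.game === game)).length;
}

// Like $unwind + $group with $sum: 1: every array element is tallied.
function unwindTotals(docs) {
  const totals = {};
  for (const d of docs) {
    for (const w of d.wished) {
      totals[w.game] = (totals[w.game] || 0) + 1;
    }
  }
  return totals;
}

const docCount = countDocs(users, "a");
const totals = unwindTotals(users);
console.log(docCount, totals);
```

Document 2 lists game a twice, so it contributes 2 to the unwound total but only 1 to the document count.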

MongoDB select subdocument with aggregation function

I have a mongo DB collection that looks something like this:
{
{
_id: objectId('aabbccddeeff'),
objectName: 'MyFirstObject',
objectLength: 0xDEADBEEF,
objectSource: 'Source1',
accessCounter: {
'firstLocationCode' : 283,
'secondLocationCode' : 543,
'ThirdLocationCode' : 564,
'FourthLocationCode' : 12,
}
}
...
}
Now, assuming that this is not the only record in the collection and that most or all of the documents contain the accessCounter subdocument, how do I select the first x documents that have the most accesses from a specific location?
A sample "query" will be something like:
"Select the first 10 documents From myCollection where the accessCounter.firstLocationCode are the highest"
So a sample result will be X documents where the accessCounter value for the chosen location is the greatest in the database.
Thank you for taking the time to read my question.
No need for an aggregation; this is a basic query:
db.collection.find().sort({"accessCounter.firstLocationCode":-1}).limit(10)
To speed this up, create an index on the field you sort by:
db.collection.createIndex({"accessCounter.firstLocationCode":-1})
An index on the whole accessCounter subdocument only supports equality matches against the entire subdocument, so it does not help a sort on one of its fields. If you want to run the same query for several locations, create one index per location field.
You can speed this up further if you only need the accessCounter value, by making this a so-called covered query: a query whose returned values come from the index itself. For example, when accessCounter.secondLocationCode is indexed and you query for the top secondLocationCode values, you should be able to do a covered query (on MongoDB versions that can cover fields inside embedded documents, 3.6 or later) with:
db.collection.find({},{_id:0,"accessCounter.secondLocationCode":1})
.sort({"accessCounter.secondLocationCode":-1}).limit(10)
which translates to "Get all documents ('{}'), don't return the _id field as you do by default ('_id:0'), get only the 'accessCounter.secondLocationCode' field ('accessCounter.secondLocationCode:1'). Sort the returned values in descending order and give me the first ten."