Server Side Looping - mongodb

I've solved this problem, but I'm looking for a better way to do it on the MongoDB server rather than on the client.
I have one collection of Orders with a placement datetime (iso date) and a product.
{ _id:1, datetime:"T1", product:"Apple"}
{ _id:2, datetime:"T2", product:"Orange"}
{ _id:3, datetime:"T3", product:"Pear"}
{ _id:4, datetime:"T4", product:"Pear"}
{ _id:5, datetime:"T5", product:"Apple"}
Goal: For a given time (or set of times) show the last order for EACH product in the set of my products before that time. Products are finite and known.
eg. query for time T6 will return:
{ _id:2, datetime:"T2", product:"Orange"}
{ _id:4, datetime:"T4", product:"Pear"}
{ _id:5, datetime:"T5", product:"Apple"}
T4 will return:
{ _id:1, datetime:"T1", product:"Apple"}
{ _id:2, datetime:"T2", product:"Orange"}
{ _id:4, datetime:"T4", product:"Pear"}
I've implemented this by creating a composite index on orders [datetime:descending, product:ascending]
Then on the java client:
findLastOrdersForTimes(times) {
for (time: times) {
for (product: products) {
db.orders.findOne({ product: product, datetime: { $lt: time } })
}
}
}
Now that is pretty fast since it hits the index and only fetches the data I need. However, I need to query for many time points (100000+), which will be a lot of calls over the network. Also my orders table will be very large. So how can I do this on the server in one hit, i.e. return a collection of time->array of products? If it was Oracle, I'd create a stored proc with a cursor that loops back in time and collects the results for every time point and breaks when it gets to the last product after the last time point. I've looked at the aggregation framework and mapReduce but can't see how to achieve this kind of loop. Any pointers?

If you truly want the last order for each product, then the aggregation framework comes in:
db.times.aggregate([
{ "$match": {
"product": { "$in": products },
}},
{ "$group": {
"_id": "$product",
"datetime": { "$max": "$datetime" }
}}
])
Example with an array of products:
var products = ['Apple', 'Orange', 'Pear'];
{ "_id" : "Pear", "datetime" : "T4" }
{ "_id" : "Orange", "datetime" : "T2" }
{ "_id" : "Apple", "datetime" : "T5" }
Or if the _id from the original document is important to you, use the $sort with $last instead:
db.times.aggregate([
{ "$match": {
"product": { "$in": products },
}},
{ "$sort": { "datetime": 1 } },
{ "$group": {
"_id": "$product",
"id": { "$last": "$_id" },
"datetime": { "$last": "$datetime" }
}}
])
And that is what you most likely really want to do in either of those last cases. But the index you really want there is on "product":
db.times.ensureIndex({ "product": 1 })
So even if you need to iterate that with an additional $match condition using $lt for a certain time point, this is still the better approach. Otherwise you can modify the "grouping" to include the "datetime" as well as keeping a set in the $match.
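To make the intent concrete, here is a minimal sketch in plain JavaScript (not a MongoDB call) of what a $match on product and datetime with $lt, followed by $sort and $group with $last, computes. The helper name lastOrdersBefore and the numbers standing in for T1..T5 are assumptions made purely for illustration.

```javascript
// Sample orders from the question; T1..T5 are replaced by the numbers 1..5 (assumption).
const orders = [
  { _id: 1, datetime: 1, product: "Apple" },
  { _id: 2, datetime: 2, product: "Orange" },
  { _id: 3, datetime: 3, product: "Pear" },
  { _id: 4, datetime: 4, product: "Pear" },
  { _id: 5, datetime: 5, product: "Apple" }
];

// Hypothetical helper: the last order per product strictly before `time`.
function lastOrdersBefore(docs, products, time) {
  const last = new Map();
  docs
    .filter(d => products.includes(d.product) && d.datetime < time) // like $match
    .sort((a, b) => a.datetime - b.datetime)                        // like $sort
    .forEach(d => last.set(d.product, d));                          // like $group with $last
  return [...last.values()];
}

console.log(lastOrdersBefore(orders, ["Apple", "Orange", "Pear"], 6));
```

For time 6 this yields the orders with _id 5, 2 and 4, which is the same set the question expects for T6.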
It seems better at any rate, so perhaps this helps at least to modify your thinking.
If I'm reading your notes correctly, you seem to simply be looking to turn this on its head and find the last product for each point in time. So the statement is not much different:
db.times.aggregate([
{ "$match": {
"datetime": { "$in": ["T4","T5"] },
}},
{ "$sort": { "product": 1, "datetime": 1 } },
{ "$group": {
"_id": "$datetime",
"id": { "$last": "$_id" },
"product": { "$last": "$product" }
}}
])
That is, in theory it works like that based on how you present the question. I have the feeling, though, that you are abstracting this and that "datetime" is possibly actual timestamps stored as date object types.
So you might not be aware of the date aggregation operators you can apply, for example to get the boundary of each hour:
db.times.aggregate([
{ "$group": {
"_id": {
"year": { "$year": "$datetime" },
"dayOfYear": { "$dayOfYear": "$datetime" },
"hour": { "$hour": "$datetime" }
},
"id": { "$last": "$_id" },
"datetime": { "$last": "$datetime" },
"product": { "$last": "$product" }
}}
])
Or even using date math instead of the operators, if working from an epoch-based timestamp:
db.times.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$datetime", new Date("1970-01-01") ] },
{ "$mod": [
{ "$subtract": [ "$datetime", new Date("1970-01-01") ] },
1000*60*60
]}
]
},
"id": { "$last": "$_id" },
"datetime": { "$last": "$datetime" },
"product": { "$last": "$product" }
}}
])
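The date math in that grouping key can be checked outside the database. The following is a small sketch in plain JavaScript of the same arithmetic; the function name hourBucket and the sample date are assumptions for illustration only.

```javascript
// Truncate a Date to the start of its hour with the same arithmetic as the pipeline:
// (ms since epoch) - ((ms since epoch) mod one hour).
const HOUR_MS = 1000 * 60 * 60;

function hourBucket(date) {
  // Equivalent of the inner $subtract: milliseconds since 1970-01-01 (UTC)
  const ms = date.getTime() - new Date("1970-01-01").getTime();
  // Equivalent of the outer $subtract combined with $mod
  return ms - (ms % HOUR_MS);
}

const bucket = hourBucket(new Date("2014-05-28T13:14:57Z"));
console.log(new Date(bucket).toISOString()); // 2014-05-28T13:00:00.000Z
```

Changing HOUR_MS to another interval (for example 15 minutes) gives buckets of that size with no other change.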
Of course you can add a range query for dates in the $match with $gt and $lt operators to keep the data within the range you are particularly looking at.
Your overall solution is probably a combination of ideas, but as I said, your question seems to be about matching the last entries on certain time boundaries, so the last examples, possibly in combination with filtering certain products, are what you need rather than looping .findOne() requests.

Related

How to sum values in a nested date range in MongoDB

I need to sum the values for 2018-06-01 through 2018-06-30 for each document in the collection. Each key in "days" is a different date and value. What should the mongo aggregate command look like? Result should look something like {
_id: Product_123 ,
June_Sum:
value}
That's really not a great structure for the sort of operation you now want to do. The whole point of keeping data in such a format is that you "increment" it as you go.
For example:
var now = Date.now(),
today = new Date(now - ( now % ( 1000 * 60 * 60 * 24 ))).toISOString().substr(0,10);
var product = "Product_123";
db.counters.updateOne(
{
"month": today.substr(0,7),
"product": product
},
{
"$inc": {
[`dates.${today}`]: 1,
"totals": 1
}
},
{ "upsert": true }
)
In that way the subsequent updates with $inc apply to both the "key" used for the "date" and also increment the "totals" property of the matched document. So after a few iterations you would end up with something like:
{
"_id" : ObjectId("5af395c53945a933add62173"),
"product": "Product_123",
"month": "2018-05",
"dates" : {
"2018-05-10" : 2,
"2018-05-09" : 1
},
"totals" : 3
}
If you're not actually doing that then you "should" be since it's the intended usage pattern for such a structure.
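As a rough illustration of that usage pattern, the sketch below reproduces the day-key truncation and the effect of the $inc on a plain in-memory object. The helpers dayKey and increment are hypothetical stand-ins, not MongoDB calls, and the timestamps are made up.

```javascript
const DAY_MS = 1000 * 60 * 60 * 24;

// Same truncation as in the update above: drop the time-of-day part (UTC).
function dayKey(now) {
  return new Date(now - (now % DAY_MS)).toISOString().slice(0, 10);
}

// Hypothetical in-memory stand-in for the $inc on a single counter document.
function increment(doc, today) {
  doc.dates[today] = (doc.dates[today] || 0) + 1; // like $inc on "dates.<today>"
  doc.totals = (doc.totals || 0) + 1;             // like $inc on "totals"
  return doc;
}

const doc = { product: "Product_123", month: "2018-05", dates: {} };
const t1 = Date.UTC(2018, 4, 9, 15, 30); // 2018-05-09 15:30 UTC
const t2 = Date.UTC(2018, 4, 10, 8, 0);  // 2018-05-10 08:00 UTC
increment(doc, dayKey(t1));
increment(doc, dayKey(t2));
increment(doc, dayKey(t2));
console.log(doc);
```

After the three calls the object has the same "dates" and "totals" content as the sample document shown above.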
Without keeping a "totals" or similar entry within the document(s) storing these keys, the only methods left for "aggregation" in processing are to effectively coerce the "keys" into an "array" form.
MongoDB 3.6 with $objectToArray
db.collection.aggregate([
// Only consider documents with entries within the range
{ "$match": {
"$expr": {
"$anyElementTrue": {
"$map": {
"input": { "$objectToArray": "$days" },
"in": {
"$and": [
{ "$gte": [ "$$this.k", "2018-06-01" ] },
{ "$lt": [ "$$this.k", "2018-07-01" ] }
]
}
}
}
}
}},
// Aggregate for the month
{ "$group": {
"_id": "$product", // <-- or whatever your key for the value is
"total": {
"$sum": {
"$sum": {
"$map": {
"input": { "$objectToArray": "$days" },
"in": {
"$cond": {
"if": {
"$and": [
{ "$gte": [ "$$this.k", "2018-06-01" ] },
{ "$lt": [ "$$this.k", "2018-07-01" ] }
]
},
"then": "$$this.v",
"else": 0
}
}
}
}
}
}
}}
])
Other versions with mapReduce
db.collection.mapReduce(
// Taking the same presumption on your un-named key for "product"
function() {
Object.keys(this.days)
.filter( k => k >= "2018-06-01" && k < "2018-07-01")
.forEach(k => emit(this.product, this.days[k]));
},
function(key,values) {
return Array.sum(values);
},
{
"out": { "inline": 1 },
"query": {
"$where": function() {
return Object.keys(this.days).some(k => k >= "2018-06-01" && k < "2018-07-01")
}
}
}
)
Both are pretty horrible since you need to calculate whether the "keys" fall within the required range even to select the documents and even then still filter through the keys in those documents again in order to decide whether to accumulate for it or not.
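The per-document work that both listings perform can be written out in a few lines of plain JavaScript. This is only a sketch: the document below is a hypothetical example shaped like the one in the question, and sumInRange is an illustrative name.

```javascript
// Hypothetical document with dates as keys under "days", as in the question.
const doc = {
  product: "Product_123",
  days: { "2018-05-31": 4, "2018-06-01": 1, "2018-06-15": 2, "2018-07-01": 8 }
};

// Sum the values whose key falls in [from, to); ISO date strings compare correctly as text.
function sumInRange(days, from, to) {
  return Object.keys(days)
    .filter(k => k >= from && k < to)
    .reduce((total, k) => total + days[k], 0);
}

console.log(sumInRange(doc.days, "2018-06-01", "2018-07-01")); // 3
```

Every key of every candidate document has to be visited this way, which is why no index can help with this structure.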
Also noting here that if your "Product_123" is also the "name of a key" in the document and NOT a "value", then you're performing even more "gymnastics" to simply convert that "key" into a "value" form, which is how databases do things and is the whole point about the unnecessary coercion going on here.
Better Option
So as opposed to the handling as originally shown where you "should" be accumulating "as you go" with every write to the document(s) at hand, the better option than needing "processing" in order to coerce into an array format is to simply put the data into an array in the first place:
{
"_id" : ObjectId("5af395c53945a933add62173"),
"product": "Product_123",
"month": "2018-05",
"dates" : [
{ "day": "2018-05-09", "value": 1 },
{ "day": "2018-05-10", "value": 2 }
],
"total" : 3
}
These are infinitely better for purposes of query and further analysis:
db.counters.aggregate([
{ "$match": {
// "month": "2018-05" // <-- or really just that, since it's there
"dates": {
"$elemMatch": {
"day": { "$gte": "2018-05-01", "$lt": "2018-06-01" }
}
}
}},
{ "$group": {
"_id": null,
"total": {
"$sum": {
"$sum": {
"$map": {
"input": {
"$filter": {
"input": "$dates",
"cond": {
"$and": [
{ "$gte": [ "$$this.day", "2018-05-01" ] },
{ "$lt": [ "$$this.day", "2018-06-01" ] }
]
}
}
},
"in": "$$this.value"
}
}
}
}
}}
])
Which is of course really efficient, and kind of deliberately avoiding the "total" field that is already there for demonstration only. But of course you keep the "running accumulation" on writes by doing:
db.counters.updateOne(
{ "product": product, "month": today.substr(0,7), "dates.day": today },
{ "$inc": { "dates.$.value": 1, "total": 1 } }
)
Which is really simple. Adding upserts adds a "little" more complexity:
// A "batch" of operations with bulkWrite
db.counters.bulkWrite([
// Incrementing the matched element
{ "updateOne": {
"filter": {
"product": product,
"month": today.substr(0,7),
"dates.day": today
},
"update": {
"$inc": { "dates.$.value": 1, "total": 1 }
}
}},
// Pushing a new "un-matched" element
{ "updateOne": {
"filter": {
"product": product,
"month": today.substr(0,7),
"dates.day": { "$ne": today }
},
"update": {
"$push": { "dates": { "day": today, "value": 1 } },
"$inc": { "total": 1 }
}
}},
// "Upserting" a new document where not matched
{ "updateOne": {
"filter": {
"product": product,
"month": today.substr(0,7)
},
"update": {
"$setOnInsert": {
"dates": [{ "day": today, "value": 1 }],
"total": 1
}
},
"upsert": true
}}
])
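To see why the three operations never double count, it may help to trace them against an in-memory array. The sketch below is plain JavaScript under that assumption; recordHit and the store array are hypothetical and only mimic the effect of the batch, they are not part of any MongoDB API.

```javascript
// Hypothetical in-memory store: an array of counter documents.
function recordHit(store, product, today) {
  const month = today.slice(0, 7);
  const doc = store.find(d => d.product === product && d.month === month);
  if (!doc) {
    // Third operation: no document for this product and month, so "upsert" one
    store.push({ product, month, dates: [{ day: today, value: 1 }], total: 1 });
    return;
  }
  const entry = doc.dates.find(e => e.day === today);
  if (entry) {
    entry.value += 1;                         // First operation: element matched
  } else {
    doc.dates.push({ day: today, value: 1 }); // Second operation: element not yet present
  }
  doc.total += 1;
}

const store = [];
recordHit(store, "Product_123", "2018-05-09");
recordHit(store, "Product_123", "2018-05-10");
recordHit(store, "Product_123", "2018-05-10");
console.log(JSON.stringify(store));
```

Exactly one of the three branches applies per call, which mirrors how only one of the three filters in the batch can have an effect for any given state of the document.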
But generally you're getting the "best of both worlds" by having something simple to accumulate "as you go" as well as something that's easy and efficient to query and do other analysis on later.
The overall moral of the story is to "choose the right structure" for what you actually want to do. Don't put things into "keys" which are clearly intended to be used as "values", since it's an anti-pattern which just adds complexity and inefficiency to the rest of your purposes, even if it seemed right for a "single" purpose when you originally stored it that way.
NOTE Also not really advocating storing "strings" for "dates" in any way here. As noted the better approach is to use "values" where you really mean "values" you intend to use. When storing date data as a "value" it is always far more efficient and practical to store as a BSON Date, and NOT a "string".

MongoDB: How to Get the Lowest Value Closer to a given Number and Decrement by 1 Another Field

Given the following document containing 3 nested documents...
{ "_id": ObjectId("56116d8e4a0000c9006b57ac"), "name": "Stock 1", "items": [
{ "price": 1.50, "description": "Item 1", "count": 10 },
{ "price": 1.70, "description": "Item 2", "count": 13 },
{ "price": 1.10, "description": "Item 3", "count": 20 }
]
}
... I need to select the sub-document with the lowest price closer to a given amount (here below I assume 1.05):
db.stocks.aggregate([
{$unwind: "$items"},
{$sort: {"items.price":1}},
{$match: {"items.price": {$gte: 1.05}}},
{$group: {
_id:0,
item: {$first:"$items"}
}},
{$project: {
_id: "$item._id",
price: "$item.price",
description: "$item.description"
}}
]);
This works as expected and here is the result:
"result" : [
{
"price" : 1.10,
"description" : "Item 3",
"count" : 20
}
],
"ok" : 1
Alongside returning the item with the lowest price closer to a given amount, I need to decrement count by 1. For instance, here below is the result I'm looking for:
"result" : [
{
"price" : 1.10,
"description" : "Item 3",
"count" : 19
}
],
"ok" : 1
It depends on whether you actually want to "update" the result or simply "return" the result with a decremented value. In the former case you will of course need to go back to the document and "decrement" the value for the returned result.
Also want to note that what you "think" is efficient here is actually not. Doing the "filter" of elements "post sort" or even "post unwind" really makes no difference at all to how the $first accumulator works in terms of performance.
The better approach is to basically "pre filter" the values from the array where possible. This reduces the document size in the aggregation pipeline, and the number of array elements to be processed by $unwind:
db.stocks.aggregate([
{ "$match": {
"items.price": { "$gte": 1.05 }
}},
{ "$project": {
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "item",
"in": {
"$cond": [
{ "$gte": [ "$$item.price", 1.05 ] },
"$$item",
false
]
}
}},
[false]
]
}
}},
{ "$unwind": "$items"},
{ "$sort": { "items.price":1 } },
{ "$group": {
"_id": 0,
"item": { "$first": "$items" }
}},
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description"
}}
]);
Of course that does require a MongoDB version 2.6 or greater server to have the available operators, and going by your output you may have an earlier version. If that is the case then at least lose the $match after the $sort, as it does not do anything of value and would be detrimental to performance.
Where a $match is useful, is in the document selection before you do anything, as what you always want to avoid is processing documents that do not even possibly meet the conditions you want from within the array or anywhere else. So you should always $match or use a similar query stage first.
At any rate, if all you wanted was a "projected result" then just use $subtract in the output:
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description",
"count": { "$subtract": [ "$item.count", 1 ] }
}}
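For reference, the whole "filter, sort, take the first, subtract one" chain amounts to the following when written in plain JavaScript against the sample document. This is a sketch only; closestAbove is a hypothetical name and nothing here touches the database.

```javascript
// The sample document from the question.
const stock = {
  name: "Stock 1",
  items: [
    { price: 1.50, description: "Item 1", count: 10 },
    { price: 1.70, description: "Item 2", count: 13 },
    { price: 1.10, description: "Item 3", count: 20 }
  ]
};

// Hypothetical helper: cheapest item priced at or above `amount`, shown with count - 1.
function closestAbove(items, amount) {
  const candidates = items
    .filter(i => i.price >= amount)      // the array "pre filter"
    .sort((a, b) => a.price - b.price);  // like $sort on price
  if (candidates.length === 0) return null;
  const { price, description, count } = candidates[0]; // like $first
  return { price, description, count: count - 1 };     // like $subtract in $project
}

console.log(closestAbove(stock.items, 1.05));
```

The stored count is left untouched here, which corresponds to the "projected result" case rather than an actual update.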
If you wanted however to "update" the result, then you would be iterating the array ( it's still an array even with one result ) to update the matched item and "decrement" the count via $inc:
var result = db.stocks.aggregate([
{ "$match": {
"items.price": { "$gte": 1.05 }
}},
{ "$project": {
"items": {
"$setDifference": [
{ "$map": {
"input": "$items",
"as": "item",
"in": {
"$cond": [
{ "$gte": [ "$$item.price", 1.05 ] },
"$$item",
false
]
}
}},
[false]
]
}
}},
{ "$unwind": "$items"},
{ "$sort": { "items.price":1 } },
{ "$group": {
"_id": 0,
"item": { "$first": "$items" }
}},
{ "$project": {
"_id": "$item._id",
"price": "$item.price",
"description": "$item.description"
}}
]);
result.forEach(function(item) {
db.stocks.update({ "items._id": item._id},{ "$inc": { "items.$.count": -1 }})
})
And on a MongoDB 2.4 shell, your same aggregate query applies ( but please make the changes ) however the result contains another field called result inside it with the array, so add the level:
result.result.forEach(function(item) {
db.stocks.update({ "items._id": item._id},{ "$inc": { "items.$.count": -1 }})
})
So either just $project for display only, or use the returned result to effect an .update() on the data as required.

How to find events that occurred in a timeframe (mongo)

I have the following document structure:
{ _id:ID1,
value: { data:{userData:{name:aaa,surname:bbb}},
events:[
{event1Name:{timestamp:UNIX_TIMESTAMP,value:NUMBER}},
{event2Name:{timestamp:UNIX_TIMESTAMP,value:NUMBER}},
{event3Name:{timestamp:UNIX_TIMESTAMP,value:NUMBER}},
{event4Name:{timestamp:UNIX_TIMESTAMP,value:NUMBER}},
],
activity:{countEvents:INTEGER,totalValue:NUMBER}
}
}
This is the output of a MapReduce pipe. Using aggregation, I need to find which users have a certain number of events and a certain total value (summed up) within a timeframe. Consider these to be online buyers: I need to find those that have made 3 purchases within the last month, or those that have bought for a total amount greater than $300.
Your question is a bit light on information, but the main thing is that as long as there is consistent "keyname" naming in the documents then this really is not an issue:
db.junk.aggregate([
// Match where type within timeframe
{ "$match": {
"value.events.confirmedSale.timestamp": {
"$gte": startTime, "$lt": endTime
}
}},
// Pre-filter the array for required data
{ "$project": {
"value": {
"data": "$value.data",
"events": {
"$setDifference": [
{"$map": {
"input": "$value.events",
"as": "el",
"in": {
"$cond": [
{ "$and": [
{ "$gte": [ "$$el.confirmedSale.timestamp", startTime ] },
{ "$lt": [ "$$el.confirmedSale.timestamp", endTime ] }
]},
"$$el",
false
]
}
}},
[false]
]
}
}
}},
// Unwind array elements for processing
{ "$unwind": "$value.events" },
// Group data
{ "$group": {
"_id": "$_id",
"value": { "$sum": "$value.events.confirmedSale.value"},
"count": { "$sum": 1 }
}},
// Filter results on totals
{ "$match": {
"value": { "$gte": 300 }, "count": { "$gte": 3 }
}}
])
However, due to the document structure you cannot really get more extensive than that. Such naming requires "path names" to embedded objects to be absolute, and this particular case does not do well for indexing either.
With some control over the document creation, then it should look more like this:
{ _id: 1,
value: {
data:{
userData:{name:"aaa",surname:"bbb"}
},
events:[
{ "type": "adClick", "timestamp": 1234, "value": 1234 },
{ "type": "confirmedSale", "timestamp": 5678, "value": 5678 },
{ "type": "confirmedSale", "timestamp": 4567, "value": 4567 },
{ "type": "something", "timestamp": 9876, "value": 9876}
]
}
}
Now that what you were using as a field name is just data under a consistent property, the query becomes more readable, can combine event types in ways that were not possible before, and can also make use of indexes for performance.
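As a sketch of how much simpler the logic becomes with that structure, here is the same count-and-sum in plain JavaScript over the restructured sample document. The function name buyers, its parameters, and the chosen time window are assumptions for illustration; both thresholds must be met, as in the pipeline above.

```javascript
// The restructured sample document from above.
const docs = [{
  _id: 1,
  value: {
    data: { userData: { name: "aaa", surname: "bbb" } },
    events: [
      { type: "adClick", timestamp: 1234, value: 1234 },
      { type: "confirmedSale", timestamp: 5678, value: 5678 },
      { type: "confirmedSale", timestamp: 4567, value: 4567 },
      { type: "something", timestamp: 9876, value: 9876 }
    ]
  }
}];

// Hypothetical helper: users with at least minCount sales totalling at least minTotal.
function buyers(docs, start, end, minCount, minTotal) {
  return docs
    .map(doc => {
      const sales = doc.value.events.filter(
        e => e.type === "confirmedSale" && e.timestamp >= start && e.timestamp < end
      );
      return {
        _id: doc._id,
        count: sales.length,
        value: sales.reduce((sum, e) => sum + e.value, 0)
      };
    })
    .filter(r => r.count >= minCount && r.value >= minTotal);
}

console.log(buyers(docs, 4000, 6000, 2, 300));
```

Because the event name is now a value, switching to another event type or combining several types only changes the filter condition, not the path being read.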
MongoDB is primarily a "database", if you do not keep consistent naming paths then you will have performance and feature loss as a consequence. The aggregation framework is the "high performance" option over mapReduce with JavaScript. Working with a set key pattern is fine for the aggregation framework, but if you vary that pattern, then your only option is mapReduce.

How to calculate difference between values of different documents using mongo aggregation?

Hi, my Mongo structure is as below:
{
"timemilliSec":1414590255,
"data":[
{
"x":23,
"y":34,
"name":"X"
},
{
"x":32,
"y":50,
"name":"Y"
}
]
},
{
"timemilliSec":1414590245,
"data":[
{
"x":20,
"y":13,
"name":"X"
},
{
"x":20,
"y":30,
"name":"Y"
}
]
}
Now I want to calculate the difference between the first document and the second, then the second and the third, and so on.
The calculation is as below:
diffX = ((data.x-data.x)/(data.y-data.y)) in our case ((23-20)/(34-13))
diffY = ((data.x-data.x)/(data.y-data.y)) in our case ((32-20)/(50-30))
Tough question in principle, but I'm going to stay with the simplified case you present of two documents and base a solution around that. The concepts should abstract, but are more difficult for expanded cases. Possible with the aggregation framework in general:
db.collection.aggregate([
// Match the documents in a pair
{ "$match": {
"timemilliSec": { "$in": [ 1414590255, 1414590245 ] }
}},
// Trivial, just keeping an order
{ "$sort": { "timemilliSec": -1 } },
// Unwind the arrays
{ "$unwind": "$data" },
// Group first and last
{ "$group": {
"_id": "$data.name",
"firstX": { "$first": "$data.x" },
"lastX": { "$last": "$data.x" },
"firstY": { "$first": "$data.y" },
"lastY": { "$last": "$data.y" }
}},
// Difference on the keys
{ "$project": {
"diff": {
"$divide": [
{ "$subtract": [ "$firstX", "$lastX" ] },
{ "$subtract": [ "$firstY", "$lastY" ] }
]
}
}},
// Not sure you want to take it this far
{ "$group": {
"_id": null,
"diffX": {
"$min": {
"$cond": [
{ "$eq": [ "$_id", "X" ] },
"$diff",
false
]
}
},
"diffY": {
"$min": {
"$cond": [
{ "$eq": [ "$_id", "Y" ] },
"$diff",
false
]
}
}
}}
])
Possibly overblown, not sure of the intent, but the output of this based on the sample would be:
{
"_id" : null,
"diffX" : 0.14285714285714285,
"diffY" : 0.6
}
Which matches the calculations.
You can adapt to your case, but the general principle is as shown.
The last "pipeline" stage there is a little "extreme" as all that is done is combine the results into a single document. Otherwise, the "X" and "Y" results are already obtained in two documents in the pipeline. Mostly by the $group operation with $first and $last operations to find the respective elements on the grouping boundary.
The subsequent operations in the $project pipeline stage perform the required math to determine the distinct results. See the aggregation operators for more details, particularly $divide and $subtract.
Whatever you do you follow this course. Get a "start" and "end" pair on your two keys. Then perform the calculations.
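The same "start and end pair, then calculate" course can be followed in plain JavaScript to check the numbers. This is a sketch over the two sample documents only; the function name diffs is illustrative.

```javascript
// The two sample documents from the question.
const docs = [
  { timemilliSec: 1414590255, data: [ { x: 23, y: 34, name: "X" }, { x: 32, y: 50, name: "Y" } ] },
  { timemilliSec: 1414590245, data: [ { x: 20, y: 13, name: "X" }, { x: 20, y: 30, name: "Y" } ] }
];

// Hypothetical helper: (x difference) / (y difference) for each named entry.
function diffs(newer, older) {
  const out = {};
  for (const cur of newer.data) {
    const prev = older.data.find(d => d.name === cur.name);
    out["diff" + cur.name] = (cur.x - prev.x) / (cur.y - prev.y);
  }
  return out;
}

// Newest first, as with the $sort on timemilliSec descending
const [newer, older] = [...docs].sort((a, b) => b.timemilliSec - a.timemilliSec);
console.log(diffs(newer, older)); // diffX is 3/21, diffY is 12/20
```

For more than two documents the same function could be applied to each consecutive pair of the sorted list.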

How to optimize mongoDB query?

I have the following sample document in MongoDB.
{
"location" : {
"language" : null,
"country" : "null",
"city" : "null",
"state" : null,
"continent" : "null",
"latitude" : "null",
"longitude" : "null"
},
"request" : [
{
"referrer" : "direct",
"url" : "http://www.google.com/",
"title" : "index page",
"currentVisit" : "1401282897",
"visitedTime" : "1401282905"
},
{
"referrer" : "direct",
"url" : "http://www.stackoverflow.com/",
"title" : "index page",
"currentVisit" : "1401282900",
"visitedTime" : "1401282905"
},
......
],
"uuid" : "109eeee0-e66a-11e3"
}
Note:
The database contains more than 10845 documents
Each document contains nearly 100 requests (100 objects in the request array).
Technology/Language - node.js
I had setProfiling to check the execution time
First Query - 13899ms
Second Query - 9024ms
Third Query - 8310ms
Fourth Query - 6858ms
There is not much difference when using indexing
Queries:
I am having the following aggregation queries to be executed to fetch the data.
var match = {"request.currentVisit":{$gte:core.getTime()[1].toString(),$lte:core.getTime()[0].toString()}};
For Example: var match = {"request.currentVisit":{$gte:"1401282905",$lte:"1401282935"}};
For third and fourth query request.visitedTime instead of request.currentVisit
First
[
{ "$project":{
"request.currentVisit":1,
"request.url":1
}},
{ "$match":{
"request.1": {$exists:true}
}},
{ "$unwind": "$request" },
{ "$match": match },
{ "$group": {
"_id": {
"url":"$request.url"
},
"count": { "$sum": 1 }
}},
{ "$sort":{ "count": -1 } }
]
Second
[
{ "$project": {
"request.currentVisit":1,
"request.url":1
}},
{ "$match": {
"request":{ "$size": 1 }
}},
{ "$unwind": "$request" },
{ "$match": match },
{ "$group": {
"_id":{
"url":"$request.url"
},
"count":{ "$sum": 1 }
}},
{ "$sort": { "count": -1} }
]
Third
[
{ "$project": {
"request.visitedTime":1,
"uuid":1
}},
{ "$match":{
"request.1": { "$exists": true }
}},
{ "$match": match },
{ "$group": {
"_id": "$uuid",
"count":{ "$sum": 1 }
}},
{ "$group": {
"_id": null,
"total": { "$sum":"$count" }
}}
]
Fourth
[
{ "$project": {
"request.visitedTime":1,
"uuid":1
}},
{ "$match":{
"request":{ "$size": 1 }
}},
{ "$match": match },
{ "$group": {
"_id":"$uuid",
"count":{ "$sum": 1 }
}},
{ "$group": {
"_id":null,
"total": { "$sum": "$count" }
}}
]
Problem:
It is taking more than 38091 ms to fetch the data.
Is there any way to optimize the query?
Any suggestions would be appreciated.
Well there are a few problems and you definitely need indexes, but you cannot have compound ones. It is the "timestamp" values that you are querying within the array that you want to index. It would also be advised that you either convert these to numeric values rather than the current strings, or indeed to BSON Date types. The latter form is actually internally stored as a numeric timestamp value, so there is a general storage size reduction, which also reduces the index size as well as being more efficient to match on the numeric values.
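A quick way to see why string timestamps are fragile is to compare values of different lengths. The snippet below is plain JavaScript and only illustrates the comparison semantics; treating the stored values as Unix seconds is an assumption based on how they look in the sample.

```javascript
// Strings compare character by character, so a shorter "number" can sort after a longer one.
console.log("999" > "1401282905"); // true  (compares "9" with "1")
console.log(999 > 1401282905);     // false (numeric comparison)

// Converting a stored string to a number, or to a Date (assuming Unix seconds).
const asNumber = Number("1401282905");
const asDate = new Date(asNumber * 1000);
console.log(asDate.toISOString()); // 2014-05-28T13:15:05.000Z
```

The range queries in the question only work because every stored string currently has the same number of digits.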
The big problem with each query is that you only dive into the "array" contents after processing an $unwind, and then "filter" that with $match. While this is what you want to do for your result, since you have not applied the same filter at an earlier stage, you have many documents in the pipeline that do not match these conditions when you $unwind. The result is "lots" of documents you do not need being processed in this stage. And here you cannot use an index.
Where you need this match is at the start of the pipeline stages. This narrows down the documents to the "possible" matches before the actual array is filtered.
So using the first as an example:
[
{ "$match":{
"request.currentVisit":{
"$gte":"1401282905", "$lte": "1401282935"
}
}},
{ "$unwind": "$request" },
{ "$match":{
"request.currentVisit":{
"$gte":"1401282905", "$lte": "1401282935"
}
}},
{ "$group": {
"_id": {
"url":"$request.url"
},
"count": { "$sum": 1 }
}},
{ "$sort":{ "count": -1 } }
]
So a few changes. There is a $match at the head of the pipeline. This narrows down documents and is able to use an index. That is the most important performance consideration. Golden rule, always "match" first.
The $project you had in there was redundant, as you cannot project "just" the fields of an array that is not yet unwound. There is also a common misconception that you should $project first to reduce the pipeline. The effect is minimal: if there is a later $project or $group stage that actually limits the fields, this will be "forward optimized", so fields do get taken out of the pipeline processing for you. The $match statement above still does more to optimize.
You can also drop the other $match stage that checks whether the array is actually there, as you are now "implicitly" doing that at the start of the pipeline. If more conditions make you more comfortable, then add them to that initial pipeline stage.
The rest remains unchanged, as you then $unwind the array and $match to filter the items that you actually want before moving on to your remaining processing. By now, the input documents have been significantly reduced, or reduced as much as they are going to be.
The other alternative that you can use with MongoDB 2.6 and greater is to "filter" the array content before you even $unwind it. This would produce a listing like this:
[
{ "$match":{
"request.currentVisit":{
"$gte":"1401282905", "$lte": "1401282935"
}
}},
{ "$project": {
"request": {
"$setDifference": [
{
"$map": {
"input": "$request",
"as": "el",
"in": {
"$cond": [
{
"$and":[
{ "$gte": [ "$$el.currentVisit", "1401282905" ] },
{ "$lte": [ "$$el.currentVisit", "1401282935" ] }
]
},
"$$el",
false
]
}
}
},
[false]
]
}
}},
{ "$unwind": "$request" },
{ "$group": {
"_id": {
"url":"$request.url"
},
"count": { "$sum": 1 }
}},
{ "$sort":{ "count": -1 } }
]
That may save you some processing by letting you "filter" the array before the $unwind, which is possibly better than doing the $match afterwards.
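The order of operations can be mimicked in plain JavaScript to make the idea concrete. This is a sketch with made-up documents; urlCounts is a hypothetical name, and the string comparison only works here because all timestamps have the same number of digits.

```javascript
// Hypothetical documents shaped like the one in the question.
const docs = [
  { uuid: "a", request: [
    { url: "http://www.google.com/", currentVisit: "1401282897" },
    { url: "http://www.stackoverflow.com/", currentVisit: "1401282910" }
  ]},
  { uuid: "b", request: [
    { url: "http://www.stackoverflow.com/", currentVisit: "1401282920" },
    { url: "http://www.google.com/", currentVisit: "1401282930" }
  ]},
  { uuid: "c", request: [
    { url: "http://www.google.com/", currentVisit: "1401282000" }
  ]}
];

function urlCounts(docs, from, to) {
  const inRange = r => r.currentVisit >= from && r.currentVisit <= to;
  const counts = {};
  docs
    .filter(d => d.request.some(inRange))   // initial $match: drop documents with no hit
    .forEach(d => d.request
      .filter(inRange)                      // filter the array before "unwinding" it
      .forEach(r => { counts[r.url] = (counts[r.url] || 0) + 1; })); // like $group
  return Object.entries(counts)
    .map(([url, count]) => ({ url, count }))
    .sort((a, b) => b.count - a.count);     // like $sort on count descending
}

console.log(urlCounts(docs, "1401282905", "1401282935"));
```

Document "c" never reaches the inner loop, which is the saving the initial $match gives you in the real pipeline, where that first step can also use an index.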
But this is the general rule for all of your statements. You need usable indexes and you need to $match first.
It is possible that the actual results you really want could be obtained in a single query, but as it stands your question is not presented that way. Try changing your processing as outlined, and you should see a notable improvement.
If you are still then trying to come to terms with how this could possibly be singular, then you can always ask another question.