MongoDB: how to find out how many objects were skipped in query - mongodb

I'm running MongoDB v.2-4-4-pre- under Linux. I use simple find() operation with "skip" parameter to select some elements from my DB.
Is there any way to find out how many objects were skipped by my query?

It skips the the number of objects that YOU provide as an argument to the skip(). Example from here
db.article.aggregate(
{ $skip : 5 }
);
This operation skips the first 5 documents passed to it by the
pipeline. $skip has no effect on the content of the documents it
passes along the pipeline.
EDIT #1
Based on your comment, I believe you need to execute the count command.

Related

What's the difference between following two mongodb queries?

I ran following two queries and they returned different results.
// my query 1
> db.events.count({"startTimeUnix":{$lt:1533268800000},"startTimeUnix":{$gte:1533182400000}})
131
// existing app query 2
> db.events.count({"startTimeUnix":{"$lt":1533268800000,"$gte":1533182400000}})
0
The query 2 is already being used in the batch application but it reported to pulling less records which I confirmed from these queries.
//these counts are confusing
> db.events.count()
2781
> db.events.count({"startTimeUnix":{$lt:1533268800000}})
361
> db.events.count({"startTimeUnix":{$gte:1533182400000}})
2780
Use the second query. You can add explain() to find out the query plans. The first query
db.events.count({"startTimeUnix":{$lt:1533268800000},"startTimeUnix":{$gte:1533182400000}})
is evaluated the same as
db.events.explain().count({"startTimeUnix":{$gte:1533182400000}})
Use the command below to view the query plans.
db.events.explain().count({"startTimeUnix":{$lt:1533268800000},"startTimeUnix":{$gte:1533182400000}})
query 2 is an impicit (and proper) way of building AND condition, query 1 is incorrect in terms of MongoDB syntax. The way it gets analyzed is pretty simple, MongoDB takes first condtion and then overrides it with second one so it has the same meaning as:
db.events.count({"startTimeUnix":{$gte:1533182400000}})
first condition simply gets ignored and that's why you're getting more results (described here)
The problem is that mongo doesn't parse operators if the are in quotes.
db.events.count({"startTimeUnix":{"$lt":1533268800000,"$gte":1533182400000}})
means that it looks for the entries where startTimeUnix is an object and contains fields "$lt" and "$gte"
If you'll the next command, this query starts returning 1:
db.events.insert({"startTimeUnix":{"$lt":1533268800000,"$gte":1533182400000}})

Avoid sending full table with MongoDB and Pymongo

I'm trying to get the min and max value from some fields inside a collection. I'm not sure if this:
result = collection.find(date_filter, expected_projection).sort({'attribute': -1}).limit(1)
is equivalent to this:
result_a = collection.find(date_filter, expected_projection)
result_b = result_a.sort({'attribute': -1}).limit(1)
I don't want the server to query all the data in result_a from the database. Is the first line of code actually fetching every document in my collection and THEN sorting it, or just fetching the max element in the attribute field?
No, they aren't equivalent; and MongoDB will not return the entire collection to the client - whether or not the attribute field is indexed.
When you chain operators together in a MongoDB command (e.g. find().sort().limit()), it is not treated by the MongoDB server as a set of separate functions to be called sequentially; it is treated as a single query which should be optimised as a whole and executed as a whole on the MongoDB server.
See the documentation on Combining Cursor Methods for another example of how the chaining is not taken as a sequence of independent operations:
The following statements chain cursor methods limit() and sort():
db.bios.find().sort( { name: 1 } ).limit( 5 )
db.bios.find().limit( 5 ).sort( { name: 1 } )
The two statements are equivalent; i.e. the order in which you chain the limit() and the sort() methods is not significant. Both statements return the first five documents, as determined by the ascending sort order on ‘name’.
The first line of code tells MongoDB to return only the document with the lowest value for "attribute". If "attribute" is indexed, then MongoDB can directly access only that one document, and not even consider the rest of the collection.
Do this once:
collection.create_index([('attribute', 1)])
Having that index in place means you can find the highest-sorting or lowest-sorting document practically instantly.

How can I remove the matching element from the array in mongo db?

So I'm using aggregation framework to sum some results.
My question is how do I avoid returning count: 0 every time if there was the sum result equal to 0.
I want to display only count: 20 and to get rid of the other count: 0.
Links:
ifNull helps to populate a null entry with a desired value
something similar but the reverse
What you are trying to do is remove a field from a document in the pipeline based on the value of the field. I do not believe this is possible with the aggregation framework (as of 2.6). It seems like a non-problem though, since it should be trivial to ignore the count : 0 results wherever you are using them. A full aggregation pipeline would be helpful to see, because there may be modifications earlier in the pipeline that would prevent any count fields being created with a value of 0, solving the problem from the other direction, so to speak.

difference between aggregate ($match) and find, in MongoDB?

What is the difference between the $match operator used inside the aggregate function and the regular find in Mongodb?
Why doesn't the find function allow renaming the field names like the aggregate function?
e.g. In aggregate we can pass the following string:
{ "$project" : { "OrderNumber" : "$PurchaseOrder.OrderNumber" , "ShipDate" : "$PurchaseOrder.ShipDate"}}
Whereas, find does not allow this.
Why does not the aggregate output return as a DBCursor or a List? and also why can't we get a count of the documents that are returned?
Thank you.
Why does not the aggregate output return as a DBCursor or a List?
The aggregation framework was created to solve easy problems that otherwise would require map-reduce.
This framework is commonly used to compute data that requires the full db as input and few document as output.
What is the difference between the $match operator used inside the aggregate function and the regular find in Mongodb?
One of differences, like you stated, is the return type. Find operations output return as a DBCursor.
Other differences:
Aggregation result must be under 16MB. If you are using shards, the full data must be collected in a single point after the first $group or $sort.
$match only purpose is to improve aggregation's power, but it has some other uses, like improve the aggregation performance.
and also why can't we get a count of the documents that are returned?
You can. Just count the number of elements in the resulting array or add the following command to the end of the pipe:
{$group: {_id: null, count: {$sum: 1}}}
Why doesn't the find function allow renaming the field names like the aggregate function?
MongoDB is young and features are still coming. Maybe in a future version we'll be able to do that. Renaming fields is more critical in aggregation than in find.
EDIT (2014/02/26):
MongoDB 2.6 aggregation operations will return a cursor.
EDIT (2014/04/09):
MongoDB 2.6 was released with the predicted aggregation changes.
I investigated a few things about the aggregation and find call:
I did this with a descending sort in a table of 160k documents and limited my output to a few documents.
The Aggregation command is slower than the find command.
If you access to the data like ToList() the aggregation command is faster than the find.
if you watch at the total times (point 1 + 2) the commands seem to be equal
Maybe the aggregation automatically calls the ToList() and does not have to call it again. If you dont call ToList() afterwards the find() call will be much faster.
7 [ms] vs 50 [ms] (5 documents)

How does MongoDB evaluates multiple $or statements?

How will MongoDB evaluate this query:
db.testCol.find(
{
"$or" : [ {a:1, b:12}, {b:9, c:15}, {c:10, d:"foo"} ]
});
When scanning values in a document if first OR statement is TRUE will the other statements be also be evaluated?
Logically if the MongoDB is optimized other values in OR statement should not be evaluated, but I don't know how MongoDB is implemented.
UPDATE:
I updated my query because it was wrong and it didn't explain correctly what I was trying to accomplish. I need to find a set of documents that have different properties and if an exact combination of these properties is found the document must be returned.
The SQL equivalent of my query would be:
SELECT * FROM testCol
WHERE (a = 1 AND b = 12) OR (b = 9 AND c = 15) OR (c = 10 AND d = 'foo');
MongoDB will execute each clause of the $or operation as a seperate query and remove duplicates as a post processing pass. As such each clause can use a seperate index which is often very useful.
In other words, it will NOT look at 1 document, see which of the OR clauses apply and do an early-out if the first clause is a match. Rather it does a full dataset query per clause and de-dupe after the fact. This may seem less than efficient but in practice it's almost always faster since the first approach would only be able to hit at most one index for all clauses which is rarely efficient.
EDIT: Mongo only skips documents during the de-duplication process, not during the table scans.
Mongo won't check documents that are already part of the result set. So if your first {a:1, b:12} returns 100% of the documents, Mongo is done.
You want to put whatever will grab the most documents as your first evaluated statement because of this. If your first item only grabs 1% of documents, the subsequent item will need to scan the other 99%.
That being said, you are using $or to look for values in a single key. I think you want to use $in for this.
See here for more:
http://books.google.com/books?id=BQS33CxGid4C&lpg=PA48&ots=PqvQJPRUoe&dq=mongo%20tips%20and%20tricks%20%22OR-query%22&pg=PA48#v=onepage&q&f=false