Is there a difference between $lt/$gt and $ne in MongoDB?

I am just getting started with MongoDB and trying to understand how indexes work. I have a list of items in a collection. Each item has a version that gets incremented. Then, all previous versions (less than current version) get removed (record is not updated so that both versions are available for a while). There is a compound index on item ID and version. For removing, does it make a difference (in terms of performance) whether you use $ne versus $lt?
I would assume no, but I just want to confirm.

Without knowing the details of the implementation, $lt can be more efficient than $ne. On a B-tree index, $ne would be two range scans ($lt and $gt), whereas $lt is just one.
But in your case $lt seems to be what you want anyway (to find the older versions). If you used $ne, you could accidentally also remove newer versions that you assume do not exist but that might actually have been created in the meantime. Remember that MongoDB does not support transactions or consistent views across documents (true when this answer was written; multi-document transactions were added in MongoDB 4.0). Concurrent updates might bite you here.
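To make the concurrency point concrete, here is a minimal sketch in plain JavaScript. It filters an in-memory array rather than calling MongoDB, and the documents, the itemId value "A" and the version numbers are all made up for illustration. It only shows which documents each predicate would select if a newer version appeared while the cleanup was running.

```javascript
// Plain JavaScript stand-in for a collection; these are not MongoDB API calls.
// Hypothetical data: version 3 was written concurrently while the cleanup
// still believes that version 2 is the current one.
const docs = [
  { itemId: "A", version: 1 },
  { itemId: "A", version: 2 },
  { itemId: "A", version: 3 },
];

const current = 2;

// Same selection as the filter { itemId: "A", version: { $lt: current } }
const matchedByLt = docs.filter(d => d.itemId === "A" && d.version < current);

// Same selection as the filter { itemId: "A", version: { $ne: current } }
const matchedByNe = docs.filter(d => d.itemId === "A" && d.version !== current);

console.log(matchedByLt.map(d => d.version)); // [ 1 ]
console.log(matchedByNe.map(d => d.version)); // [ 1, 3 ]
```

With $ne the newer version 3 would be removed along with version 1, which is the accident described above.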

Actually, there's a huge difference. The "$ne and $nin operators are not selective", which means that an index will often not speed up that part of the query much, because the predicate matches most of the index. So if you use $ne, the version part of the compound index will be of little help to MongoDB.

Related

Is it okay to have COLLSCAN for a collection with only few documents?

I have a collection which has just two documents in it, both are used to keep track of a certain count.
I know this will never have more than 2 documents, but when the counter value is increased, it uses findAndModify and shows COLLSCAN.
I believe it is okay to have COLLSCAN here, as having an index on the search key won't give any performance benefit. Any thoughts?
Indexes are not always good. The main things to understand about how they work are:
Indexes use memory in exchange for better performance. Every time you want to use an index, you need to load it into MongoDB's RAM (if it's not there yet).
When the Mongo engine gets a query, it needs to check which index to use (if there are any), and for each index check whether it can use it (whether it contains the relevant query parameters, which are the union of find, projection and sort). Mongo then decides whether to use the best index found, do a collection scan, or try both.
Indexes require maintenance: every insert/update/delete operation requires updating the index.
There is a lot of overhead in using an index, so the benefit should be several times greater than that of a simple collection scan.

Can an ordered array inside a document with MongoDB be guaranteed safe to maintain order in production [duplicate]

Simple question, do arrays keep their order when stored in MongoDB?
Yep, MongoDB keeps the order of the array, just like JavaScript engines.
Yes, in fact from a quick google search on the subject, it seems that it's rather difficult to re-order them: http://groups.google.com/group/mongodb-user/browse_thread/thread/1df1654889e664c1
I realise this is an old question, but the Mongo docs do now specify that all document properties retain their order as they are inserted. This naturally extends to arrays, too.
Document Field Order
MongoDB preserves the order of the document fields following write operations except for the following cases:
The _id field is always the first field in the document.
Updates that include renaming of field names may result in the reordering of fields in the document.
Changed in version 2.6: Starting in version 2.6, MongoDB actively attempts to preserve the field order in a document. Before version 2.6, MongoDB did not actively preserve the order of the fields in a document.

Query with $in operator and large list of Ids

I have a pretty large number of document Ids to iterate through (say 5k-10k). Starting from MongoDB version 2.6, the $in operator no longer limits that number. Earlier versions had a combinatorial limit of 4 million.
That said, does it make sense at all to do something like that in MongoDB, or is it an anti-pattern with performance penalties, so that I should iterate manually in the application layer?
It's somewhat of an anti-pattern, but sometimes there's no other choice.
If you can change the schema and make that query redundant, then you should. If you can't, doing it yourself will surely be slower than letting MongoDB do it.
However, there is another limit you need to consider. Each document in MongoDB is limited to 16MB and each query is sent as a document so with enough items you may reach that limit and get an exception.
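If a list ever grew large enough to approach that limit, one possible workaround is to split it into batches and issue one $in query per batch. The sketch below is plain JavaScript that only builds the filter documents; the helper names chunk and buildInFilters, the numeric ids and the batch size of 1,000 are assumptions for illustration, not part of any MongoDB API. A list of 5k-10k ids is normally nowhere near the limit, so this is only relevant for much larger lists.

```javascript
// Hypothetical helper: split a large id list into fixed-size batches.
function chunk(ids, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

// Build one filter document per batch; each one could be passed to find().
function buildInFilters(ids, batchSize) {
  return chunk(ids, batchSize).map(batch => ({ _id: { $in: batch } }));
}

// Example: 10,000 made-up numeric ids in batches of 1,000 (arbitrary size).
const ids = Array.from({ length: 10000 }, (_, i) => i);
const filters = buildInFilters(ids, 1000);
console.log(filters.length); // 10
```

Each filter stays small regardless of how long the full id list is, at the cost of several round trips instead of one.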

MongoDb index intersection usage

I have trouble understanding what MongoDB is doing with my queries. My documents contain almost exclusively array fields, keeping me from using compound indexes.
Every field is indexed with ensureIndex({FieldName:1}).
The queries are combined with AND like this:
{$and: [{FIELD1:"field1Val"},{FIELD2:"field2Val"},{FIELD3:"field3Val"}]}
If I run this query, MongoDB appears to be using only one index.
Why isn't MongoDB using all the indexes in parallel and then intersecting them?
The same problem solved with Lucene runs 8 times faster than my MongoDB implementation does now.
(Before v2.6, one of MongoDB's well-known limitations was that it could use only one index per query, except in some special cases using $or.
To improve query speed, you can use hint() to force the index used. Choose the most selective index.)
As the comments say, this is no longer true. Use index intersection. It seems that you can intersect at most 2 indexes. See: When are Compound Indexes still relevant in MongoDB 2.6, given the new Index Intersection feature?
@JohnnyHK Thanks for the comments, they taught me new things.
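For readers wondering what intersecting two indexes means in principle, here is a rough sketch in plain JavaScript. It treats each single-field index as a sorted list of document ids for a given value and merges two such lists. The index contents and the function name intersectSorted are made up, and this is only the general idea, not a description of how MongoDB implements it internally.

```javascript
// Illustration only: intersect two sorted lists of document ids, as a
// stand-in for combining the results of two single-field index scans.
function intersectSorted(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(a[i]); // id present in both index scans
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++; // advance whichever side is behind
    } else {
      j++;
    }
  }
  return out;
}

// Hypothetical index contents: ids of documents matching each value.
const field1Index = { field1Val: [1, 3, 5, 8, 13] };
const field2Index = { field2Val: [2, 3, 8, 9, 13, 21] };

const candidates = intersectSorted(field1Index.field1Val, field2Index.field2Val);
console.log(candidates); // [ 3, 8, 13 ]
```

With only two indexes intersected, a third condition such as the one on FIELD3 would still have to be checked against the candidate documents themselves.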

Skipping the first term of a compound index by using hint()

Suppose I have a Mongo collection with fields a and i. I've populated this collection with {a:'a', i: index}, where index increases iteratively from 0 to 1000.
I know this is very, very wrong, but can't explain (no pun intended) why:
collection.find({i:{$gt:500}}).explain() confirms that the index was not used (I can see that it scanned all 1,000 documents in the collection).
Somehow forcing Mongo to use the index seems to work though:
collection.find({i:{$gt:500}}).hint({a:1,i:1}).explain()
Edit
The Mongo documentation is very clear that it will only use compound indexes if one of your query terms matches the first term of the compound index. In this case, using hint, it appears that Mongo used the compound index {a:1,i:1} even though the query terms do NOT include a. Is this true?
The interesting part about the way MongoDB performs queries is that it may actually run multiple queries in parallel to determine the best plan. It may have chosen not to use the index because of other experimenting you've done from the shell, or because of when you added the data and whether it was in memory, etc. (or a few other factors). Looking at the performance numbers, it's not reporting that using the index was actually any faster than not using it (although you shouldn't put much stock in those numbers generally). In this case, the data set is really small.
But, more importantly, according to the MongoDB docs, the output from the hinted run also suggests that the query wasn't covered entirely by the index (indexOnly=false).
That's because your index is a:1, i:1, yet the query is for i. Compound indexes only support searches based on any prefix of the indexed fields (meaning they must be in the order they were specified).
http://docs.mongodb.org/manual/core/read-operations/#query-optimization
FYI: Use the verbose option to see a report of all plans that were considered for the find().
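The prefix rule can be sketched as a small check in plain JavaScript. The function name matchedPrefix is hypothetical and this is a simplification of the rule described above, not the query planner's actual logic: it returns the leading fields of a compound index that a query constrains, and an empty result means the query matches no prefix.

```javascript
// Hypothetical helper: return the leading index fields that a query constrains.
// An empty result means the query matches no prefix of the index.
function matchedPrefix(indexKeys, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const key of indexKeys) {
    if (!wanted.has(key)) break; // a gap ends the usable prefix
    prefix.push(key);
  }
  return prefix;
}

const indexKeys = ["a", "i"]; // field order of the compound index { a: 1, i: 1 }

console.log(matchedPrefix(indexKeys, ["i"]));      // []
console.log(matchedPrefix(indexKeys, ["a"]));      // [ 'a' ]
console.log(matchedPrefix(indexKeys, ["a", "i"])); // [ 'a', 'i' ]
```

For the query on i alone the prefix is empty, which matches the behaviour in the question: the index is only used when forced with hint(), and even then it cannot narrow the scan by a.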