Pymongo stop index updating while inserting new documents - mongodb

Is there a way to prevent index updates when inserting new documents (in a for loop)?
I have a multikey index and the collection holds about 2 million documents, so dropping and recreating the index is not practical. I'm inserting documents in a loop and I do not want the newly inserted ones to be indexed.

No, updates to indexes are done synchronously as part of the write operation itself.
What is your goal here though, to not index those new documents at all? If so, perhaps creating an appropriate Partial Index would be the correct approach here?
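A rough sketch of the partial-index idea, assuming MongoDB 3.2+ and two hypothetical field names (`tags` for the multikey field, `indexed` as a flag you set only on documents that should be indexed). The functions take a pymongo-style collection object; nothing here is specific to the asker's schema.

```python
# Sketch only: "tags" and "indexed" are hypothetical field names.
ASCENDING = 1  # same value as pymongo.ASCENDING


def create_partial_tags_index(collection):
    """Index `tags` only for documents that carry indexed=True."""
    return collection.create_index(
        [("tags", ASCENDING)],
        partialFilterExpression={"indexed": True},
    )


def find_by_tag(collection, tag):
    # The query repeats the filter condition so the planner
    # is able to pick the partial index.
    return collection.find({"tags": tag, "indexed": True})
```

Documents inserted without `indexed: True` are still written normally, they just never enter this index, so queries that rely on it will not see them.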

Related

Does createIndex() build a new index over the whole collection?

What happens when I add a new document to my collection and run the createIndex() function. Does MongoDB build a new index over the whole collection (time and resource consuming)? Or is MongoDB just updating the index for the single document (time and resource saving)? I am not sure because I found this in the documentation (3.0):
By default, MongoDB builds indexes in the foreground, which prevents all read and write operations to the database while the index builds. Also, no operation that requires a read or write lock on all databases (e.g. listDatabases) can occur during a foreground index build.
I am asking because I need a dynamic 2dsphere index which will be updated continuously. If it has to build the index over the whole collection every time, it would take too much time.
Building an index indexes all existing documents and also causes all future documents to be indexed as well. So if you insert new documents into an already indexed collection (or update the indexed fields of existing documents), the indexes will get updated.
When you call createIndex and an index of the same type over the same fields already exists, nothing happens.
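In pymongo terms this might look like the sketch below. The field name `loc` is an assumption; `collection` is expected to behave like a pymongo collection.

```python
GEOSPHERE = "2dsphere"  # same value as pymongo.GEOSPHERE


def setup_and_insert(collection, docs):
    """Create the 2dsphere index once, then insert documents one by one."""
    # Builds the index over the existing documents. Calling this again
    # with the same specification does not rebuild anything.
    index_name = collection.create_index([("loc", GEOSPHERE)])
    # Each insert updates the existing index for that one document.
    for doc in docs:
        collection.insert_one(doc)
    return index_name
```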

Efficient way to remove all entries from mongodb

Which is the better way to remove all entries from a mongodb collection?
db.collection.remove({})
or
db.collection.drop()
Remove all documents in a collection:
db.collection.remove({})
This removes only the data in the collection and leaves any indexes intact. If new documents are added to the collection, they will populate the existing indexes.
Drop the collection and all attached indexes:
db.collection.drop()
This drops the collection and all of its indexes. If the collection is recreated, the indexes will also need to be re-created.
From your question, if you only want to remove all the entries from a collection, then db.collection.remove({}) would be better, as it keeps the collection intact with all indexes still in place.
From a speed perspective, the drop() command is faster.
db.collection.drop() will delete the whole collection (very fast) and all indexes on the collection.
db.collection.remove({}) will delete all entries, but the collection and the indexes will still exist.
So the two operations differ. The first one is faster, but it deletes more metadata. If you want to keep the indexes, you should not use it.
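For pymongo users, a small sketch of the same choice. It assumes `delete_many({})` as the driver-side counterpart of the shell's `remove({})`; the helper name `clear_collection` is made up for illustration.

```python
def clear_collection(collection, keep_indexes=True):
    """Empty a collection, optionally keeping its index definitions."""
    if keep_indexes:
        # An empty filter matches every document; indexes stay defined.
        result = collection.delete_many({})
        return result.deleted_count
    # Removes the documents and every index in one step.
    collection.drop()
    return None
```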

When should I create my index in MongoDB?

This might seem like a stupid question, but when should I create an index on my collection?
To be more explicit, I was wondering if I just have to create it once, when I create my collection, and then it will be updated automatically when I add some new documents. Or do I have to regenerate it regularly in background ?
The index will be kept up-to-date by MongoDB as you update/insert documents.
Performance-wise, do not create an index until you need it (to speed up queries). And when doing massive bulk-inserts, it may be more efficient to drop the index and recreate it after you are done inserting.
MongoDB will maintain any and all indexes itself; in other words, you only need to create each index once.
This does, however, mean you need to be careful about just what indexes you ensure as each index will create significant overhead while performing write operations. The more indexes you have the more MongoDB will have to update to do a single write.
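The drop-and-recreate approach for a large bulk insert could be sketched like this with pymongo-style calls. `bulk_load` is a hypothetical helper, and whether it actually beats inserting with the index in place depends on your data, so measure before relying on it.

```python
def bulk_load(collection, docs, index_keys):
    """Drop an index, bulk-insert, then rebuild the index once."""
    if not docs:
        return 0  # insert_many rejects an empty list
    # drop_index accepts the same key specification used to create the index.
    collection.drop_index(index_keys)
    try:
        result = collection.insert_many(docs, ordered=False)
        return len(result.inserted_ids)
    finally:
        # Rebuild even if the insert failed, so queries get their index back.
        collection.create_index(index_keys)
```

Keep in mind that queries have no index to use while the load is running.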

How often shall we reindex the geospatial data in mongodb?

Do I have to reindex the geospatial data in MongoDB after new geo data has been inserted, in order to search it? Say we have a document which looks like:
{user:'a',loc:[363.236,-45.365]}, and it is indexed. Later on, I insert document b, which looks like: {user:'b',loc:[42.3654,-56.3]}. In order to search, do I have to reindex the collection (using ensureIndex()) every time a new document is inserted? Will frequent reindexing affect the overall application performance?
Thanks.
You only need to call ensureIndex once (createIndex in newer versions, where ensureIndex is deprecated); after that MongoDB maintains the index on every insert. The index is maintained for updates and deletes as well.
You can rebuild an index to defragment it and make it smaller, which is why the reindex functionality exists, but that is not needed to make new documents searchable. A useful post:
http://jasonwilder.com/blog/2012/02/08/optimizing-mongodb-indexes/
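One thing worth checking before inserting is the coordinates themselves. A small pure-Python sketch for building a GeoJSON point, assuming longitude comes first and the usual valid ranges apply; the helper name `geojson_point` is made up here. By these ranges, the 363.236 in the question's first sample would not be a valid longitude.

```python
def geojson_point(longitude, latitude):
    """Build a GeoJSON Point, rejecting out-of-range coordinates."""
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    # GeoJSON stores coordinates as [longitude, latitude].
    return {"type": "Point", "coordinates": [longitude, latitude]}
```

A document could then be built as `{"user": "b", "loc": geojson_point(42.3654, -56.3)}` and inserted without any reindexing step.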

Capped collections - BsonId, uniqueness and index

I want to use a capped collection as a cache store, I plan on selecting using a compound index - key and expiry-date. Since it's impossible to update/delete from a capped collection, I will add new entries with new expiry dates and just select the one with future expiry.
1) Is this the optimal way of creating the index if I'll be using Query.GTE("expiry", DateTime.Now) in the query?
cacheColl.EnsureIndex(new IndexKeysBuilder().Ascending("key").Descending("expiry"));
2) Do I need a [BsonId] attribute on the class? I know that "key" won't be unique. Does a record need to have a unique id entry??
3) My only motivation for using a capped collection is to limit the final size of the cache (both disk and memory) and not having to delete expired cache items myself. Is there a reason to prefer a regular collection and update items / delete expired ones? Even if I delete the documents, I read that space is not freed (would I need to compact?)
1) The index looks about right. You can also add a sort descending and limit 1 to the query if you only care about the latest.
2) No. On a normal collection, _id needs to be unique because a unique index on _id is created for that collection by default. On capped collections this depends on the server version: as far as I know, before MongoDB 2.2 no _id index was created automatically, while 2.2 and later create the _id field and its index by default.
3) There are pros and cons to both approaches, and which is better depends entirely on your needs. One thing you might want to consider about a capped collection is that it is not easy to resize once you have created it. This will be problematic if you realize later on that the size you initially set is too small to hold the time window you need.
P.S. You are right that the space used by the extents of a deleted document is not freed back. However, Mongo keeps track of these extents and reuses them whenever possible.
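The question uses the C# driver, but the same setup can be sketched with pymongo-style calls. The collection name `cache`, the 64 MB size and the helper names are all assumptions chosen for illustration.

```python
import datetime


def create_cache(db, name="cache", size_bytes=64 * 1024 * 1024):
    """Create a capped collection with a compound (key, expiry) index."""
    coll = db.create_collection(name, capped=True, size=size_bytes)
    coll.create_index([("key", 1), ("expiry", -1)])
    return coll


def get_cached(coll, key, now=None):
    """Return the unexpired entry with the latest expiry, or None."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # Sorting descending on expiry with find_one mirrors the
    # "sort descending and limit 1" suggestion above.
    return coll.find_one(
        {"key": key, "expiry": {"$gte": now}},
        sort=[("expiry", -1)],
    )
```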