Efficient way to remove all entries from MongoDB

Which is the better way to remove all entries from a mongodb collection?
db.collection.remove({})
or
db.collection.drop()

Remove all documents in a collection
db.collection.remove({})
Will only remove all the data in the collection but leave any indexes intact. If new documents are added to the collection they will populate the existing indexes.
Drop collection and all attached indexes
db.collection.drop()
Will drop the collection and all indexes. If the collection is recreated then the indexes will also need to be re-created.
From your question, if you only want to remove all the documents from a collection, then db.collection.remove({}) is the better choice, since it keeps the collection intact with all indexes still in place.
From a speed perspective, the drop() command is faster.
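To make the difference concrete, here is a minimal in-memory sketch in Python. It is a toy model, not MongoDB's implementation: the class ToyCollection and its methods are hypothetical names invented for this illustration. It only shows which metadata survives each operation.

```python
class ToyCollection:
    """Toy stand-in for a collection: a list of documents plus named indexes."""

    def __init__(self):
        self.docs = []
        # index field -> {indexed value: [documents with that value]}
        self.indexes = {}

    def create_index(self, field):
        entries = {}
        for doc in self.docs:
            entries.setdefault(doc.get(field), []).append(doc)
        self.indexes[field] = entries

    def insert(self, doc):
        self.docs.append(doc)
        # Every existing index is updated as part of the write.
        for field, entries in self.indexes.items():
            entries.setdefault(doc.get(field), []).append(doc)

    def remove_all(self):
        """Analogue of remove({}): data and index entries go, index definitions stay."""
        self.docs.clear()
        for entries in self.indexes.values():
            entries.clear()

    def drop(self):
        """Analogue of drop(): data and index definitions both go."""
        self.docs.clear()
        self.indexes.clear()


coll = ToyCollection()
coll.create_index("user")
coll.insert({"user": "ann"})

coll.remove_all()
coll.insert({"user": "bob"})   # lands in the still-existing "user" index
print(sorted(coll.indexes))    # ['user']

coll.drop()
print(sorted(coll.indexes))    # []
```

After remove_all() the "user" index is empty but still defined, so the next insert populates it. After drop() the index would have to be created again.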

db.collection.drop() will delete the whole collection (very fast) and all indexes on the collection.
db.collection.remove({}) will delete all entries, but the collection and the indexes will still exist.
So there is a difference between the two operations. The first one is faster, but it also deletes more metadata (the collection itself and its indexes). If you want to keep them, you should not use it.

Related

PyMongo: stop index updating while inserting new documents

Is there a way to prevent index updates when inserting new documents (in a for loop)?
I have a multikey index, and the collection is about 2 million documents, so removing the index and recreating it is not practical, since I'm inserting documents in a loop and I do not want an index for the newly inserted ones.
No, updates to indexes are done synchronously as part of the write operation itself.
What is your goal here though, to not index those new documents at all? If so, perhaps creating an appropriate Partial Index would be the correct approach here?
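As a rough illustration of how a partial index decides what to index, the following Python sketch models the filter check in memory. It is a toy, not MongoDB's matcher: the helpers matches and build_partial_index, the field names "tag" and "legacy", and the sample documents are all assumptions made up for this example. In MongoDB itself the filter is supplied through the partialFilterExpression option when creating the index.

```python
def matches(doc, filter_expr):
    """Tiny matcher: supports equality plus the $gt and $exists operators only."""
    for field, cond in filter_expr.items():
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op == "$gt":
                    if field not in doc or not doc[field] > arg:
                        return False
                elif op == "$exists":
                    if (field in doc) != arg:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif doc.get(field) != cond:
            return False
    return True


def build_partial_index(docs, key, filter_expr):
    """Index only the documents that satisfy the filter expression."""
    index = {}
    for doc in docs:
        if matches(doc, filter_expr):
            index.setdefault(doc.get(key), []).append(doc["_id"])
    return index


docs = [
    {"_id": 1, "tag": "a", "legacy": True},
    {"_id": 2, "tag": "b", "legacy": True},
    {"_id": 3, "tag": "a"},  # newly inserted document without the flag
]

index = build_partial_index(docs, "tag", {"legacy": True})
print(index)  # {'a': [1], 'b': [2]}
```

Document 3 does not satisfy the filter, so it never enters the index. The write still evaluates the filter, but no index entry is created for it.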

MongoDB: what happens when MongoDB drops indexes

Creating an index in MongoDB is slow on large collections, which is easy to understand. But why is the drop indexes operation so fast? Does the data structure change in any way after dropping indexes?
Creating a new index on a collection is like creating a new collection that is arranged as a B-tree, so that you can search on the key fields quickly. It amounts to copying part of the collection.
Deleting an index is therefore like deleting a collection: Mongo just removes the index collection, and then it's done.
I am not sure whether you know how file systems work, but you can think about this problem in the same way. When you copy a file to a disk, it takes time. But when you remove a file from a disk, it takes very little time, because the file system just marks the space as unused.

MongoDB 3.X : Does it make sense to have only one collection per database

Since MongoDB 3.x introduces locking per document rather than per collection or database, does it make sense to write all of your data to a single collection with one extra identifier field, "documentType"?
It will help simulate "join" through map-reduce operation.
Couchbase does the same thing with "buckets" instead of collection.
Does anybody see any disadvantages with this approach?
There's one big general-case disadvantage: indexes.
With Mongo, you generally want to set up indexes so that most, if not all, queries you make, use them. So in addition to the one on _id, you'll set up indexes on the primary fields you search by (often compounded with those you sort by).
If you're storing everything in one single collection, that means you need to have all those indexes on that collection. Which means two things:
The indexes are bigger, since there are more documents to index. Granted, this can be somewhat mitigated by using sparse indexes.
Inserting or modifying documents in the collection requires Mongo to update all these indexes (where it'd just update the relevant indexes in the standard use-many-collections approach). This kills your write performance.
Furthermore, if you have in your application a query that somehow doesn't use one of those many indexes, it needs to scan through the entire collection, which is O(n) where n is the number of documents in the collection -- in your case, that means the number of documents in the entire database.
Collections are cheap. Use them ;)
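The write-cost argument can be put into numbers with a small back-of-the-envelope sketch in Python. The document types and index fields below are hypothetical, and the model assumes plain (non-sparse, non-partial) indexes, which get an entry for every document in the collection even when the indexed field is missing.

```python
# Hypothetical schema: secondary indexes each document type needs.
indexes_per_type = {
    "user": ["email", "name"],
    "order": ["user_id", "created"],
    "product": ["sku"],
}


def index_updates_per_insert(doc_type, single_collection):
    """Count the indexes touched by one insert, including the index on _id."""
    if single_collection:
        # Every index lives on the one collection, so every insert touches all of them.
        return sum(len(fields) for fields in indexes_per_type.values()) + 1
    # One collection per type: only that type's own indexes are touched.
    return len(indexes_per_type[doc_type]) + 1


for doc_type in indexes_per_type:
    print(
        doc_type,
        index_updates_per_insert(doc_type, single_collection=False),
        index_updates_per_insert(doc_type, single_collection=True),
    )
# user 3 6
# order 3 6
# product 2 6
```

In this made-up schema, inserting a product touches 2 indexes in its own collection but 6 in a shared one, and the gap grows with every document type added.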

When to create my index in MongoDB

This might seem a stupid question, but when should I create an index on my collection?
To be more explicit, I was wondering whether I just have to create it once, when I create my collection, and it will then be updated automatically when I add new documents. Or do I have to regenerate it regularly in the background?
The index will be kept up-to-date by MongoDB as you update/insert documents.
Performance-wise, do not create an index until you need it (to speed up queries). And when doing massive bulk-inserts, it may be more efficient to drop the index and recreate it after you are done inserting.
MongoDB maintains all indexes itself; in other words, you only need to create each index once.
This does, however, mean you need to be careful about which indexes you create, as each index adds overhead to write operations. The more indexes you have, the more MongoDB has to update for a single write.
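The two loading strategies mentioned above (keeping the index up to date on every insert versus dropping it and rebuilding once after a bulk load) can be sketched with a sorted list standing in for the index. This is a simplified analogy in Python, not how MongoDB stores its B-tree indexes; the function names are made up for the illustration.

```python
import bisect
import random


def insert_with_live_index(keys):
    """Maintain the index on every write, as MongoDB does for an existing index."""
    index = []
    for key in keys:
        bisect.insort(index, key)  # one ordered insertion per document
    return index


def bulk_load_then_index(keys):
    """Load everything first with no index to maintain, then build the index once."""
    data = list(keys)
    return sorted(data)


random.seed(0)
keys = [random.randrange(1_000_000) for _ in range(10_000)]

# Both strategies end with the same index contents; only the amount of work differs.
assert insert_with_live_index(keys) == bulk_load_then_index(keys)
```

Which strategy is faster depends on the data volume and the number of indexes, so it is worth timing both (for example with the standard timeit module) before committing to dropping an index on a live system.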

Indexes for capped collections in MongoDB

I wonder if capped collections keep indexes for expired documents?
Removing documents from normal collection keeps indexes.
Capped collections remove documents by timer and do not allow db.collection.remove() at all.
I could not find anything in the docs about what happens with indexes for capped collections, and would appreciate help from anyone who knows.
TL;DR: The only way to remove documents from a capped collection is to drop the entire collection, which also removes the collection's indexes.
I wonder if capped collections keep indexes for expired documents?
No. Documents that are no longer stored never remain in the index.
Removing documents from normal collection keeps indexes.
This is a bit misleading. Removing all documents from a normal collection by using db.collection.remove() removes both the documents from the collection and also deletes those documents from the index. It does not, however, remove the indexes of the collection, i.e. once you add new documents they are being added to the respective indexes again (i.e. removing the index itself is different from deleting documents from the index).
Capped collections remove documents by timer and do not allow db.collection.remove() at all.
The TTL-feature you linked has nothing to do with capped collections, in fact, the documentation says:
You cannot create a TTL index on a capped collection, because MongoDB cannot remove documents from a capped collection.
A collection with a TTL index does allow db.collection.remove.
A capped collection, on the other hand, has a fixed size (in terms of data size) and the oldest documents of the collection are automatically overwritten once the collection is full. This is not based on time, but purely on the size of the collection. Capped collections are always kept in insertion order (natural order).
Since the only way to remove documents from a capped collection is to drop the entire collection, doing so also removes the indexes themselves from the collection.
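The size-based behaviour described above can be modelled with a small Python sketch. It is a toy: ToyCappedCollection is a hypothetical class, payload length stands in for document size, and the "index" is just a counter per key. The point it illustrates is that when the oldest document is overwritten, its index entry disappears with it.

```python
from collections import deque


class ToyCappedCollection:
    """Fixed-size collection that evicts the oldest documents when full."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.docs = deque()  # kept in insertion (natural) order
        self.index = {}      # key -> number of stored documents with that key

    def insert(self, key, payload):
        size = len(payload)
        if size > self.max_bytes:
            raise ValueError("document larger than the collection")
        # Make room by evicting the oldest documents; this depends on size, not time.
        while self.used + size > self.max_bytes:
            old_key, old_payload = self.docs.popleft()
            self.used -= len(old_payload)
            self.index[old_key] -= 1
            if self.index[old_key] == 0:
                del self.index[old_key]  # no entry remains for evicted documents
        self.docs.append((key, payload))
        self.used += size
        self.index[key] = self.index.get(key, 0) + 1


capped = ToyCappedCollection(max_bytes=10)
capped.insert("a", b"1234")
capped.insert("b", b"1234")
capped.insert("c", b"1234")  # 12 bytes would exceed the cap, so "a" is evicted

print([key for key, _ in capped.docs])  # ['b', 'c']
print(sorted(capped.index))             # ['b', 'c']
```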