Truncate a collection - mongodb

How do I truncate a collection in MongoDB or is there such a thing?
Right now I have to delete 6 large collections all at once, so I'm stopping the server, deleting the database files and then recreating the database and the collections in it. Is there a way to delete the data but leave the collections as they are? The delete operation takes a very long time. I have millions of entries in the collections.

To truncate a collection and keep its indexes, use:
db.<collection>.remove({})

You can efficiently drop all data and indexes for a collection with db.collection.drop(). Dropping a collection with a large number of documents and/or indexes will be significantly more efficient than deleting all documents using db.collection.remove({}). The remove() method does the extra housekeeping of updating indexes as documents are deleted, and would be even slower in a replica set environment where the oplog would include entries for each document removed rather than a single collection drop command.
Example using the mongo shell:
var dbName = 'nukeme';
db.getSiblingDB(dbName).getCollectionNames().forEach(function(collName) {
    // Drop all collections except system ones (indexes/profile)
    if (!collName.startsWith("system.")) {
        // Safety hat
        print("WARNING: going to drop ["+dbName+"."+collName+"] in 5s .. hit Ctrl-C if you've changed your mind!");
        sleep(5000);
        // Use getSiblingDB() so the right database is targeted even if the
        // shell's current db is a different one
        db.getSiblingDB(dbName)[collName].drop();
    }
})
It is worth noting that dropping a collection has different outcomes on storage usage depending on the configured storage engine:
WiredTiger (default storage engine in MongoDB 3.2 or newer) will free the space used by a dropped collection (and any associated indexes) once the drop completes.
MMAPv1 (default storage engine in MongoDB 3.0 and older) will not free up preallocated disk space. This may be fine for your use case; the free space is available for reuse when new data is inserted.
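If you want to check how much space is actually reclaimed, you can compare storage stats before and after the drop (a minimal sketch; "mydb" and "mycoll" are placeholder names):
var target = db.getSiblingDB("mydb");
print(target.mycoll.stats().storageSize);   // bytes on disk used by the collection
target.mycoll.drop();
print(target.stats().storageSize);          // database-level storage after the drop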
If you are instead dropping the database, you generally don't need to explicitly create the collections as they will be created as documents are inserted.
However, here is an example of dropping and recreating the database with the same collection names in the mongo shell:
var dbName = 'nukeme';
// Save the old collection names before dropping the DB
var oldNames = db.getSiblingDB(dbName).getCollectionNames();
// Safety hat
print("WARNING: going to drop ["+dbName+"] in 5s .. hit Ctrl-C if you've changed your mind!")
sleep(5000)
db.getSiblingDB(dbName).dropDatabase();
// Recreate database with the same collection names
oldNames.forEach(function(collName) {
db.getSiblingDB(dbName).createCollection(collName);
})

The query below will delete all records in a collection and keep the collection as is:
db.collectionname.remove({})

remove() is deprecated as of MongoDB 4.
You need to use deleteMany() or one of the other newer CRUD methods:
db.<collection>.deleteMany({})

There is no equivalent of a "truncate" operation in MongoDB. You can either remove all documents, which has a complexity of O(n), or drop the collection, which is O(1) but you will lose your indexes.
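If you need drop-speed but want to keep the index definitions, one workaround is to save the index specs, drop the collection, and rebuild the indexes (a sketch; "mydb"/"mycoll" are placeholders, only the key pattern and name are carried over here, and there is a brief window where the collection and its indexes are missing):
var coll = db.getSiblingDB("mydb").getCollection("mycoll");
var specs = coll.getIndexes();              // remember the existing index definitions
coll.drop();                                // O(1): removes data and indexes
specs.forEach(function(spec) {
    if (spec.name !== "_id_") {             // the _id index is recreated automatically
        coll.createIndex(spec.key, { name: spec.name });
    }
});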

Create the database and the collections, and then back up the database to BSON files using mongodump:
mongodump --db database-to-use
Then, when you need to drop the database and recreate the previous environment, just use mongorestore:
mongorestore --drop
mongodump saves the backup in the current working directory, in a folder named dump.
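A minimal round trip, assuming both commands run from the same working directory (mongorestore also accepts --nsInclude to restore only a subset, e.g. a single database):
mongodump --db database-to-use
mongorestore --drop --nsInclude 'database-to-use.*'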

The db.collection.drop() method obtains a write lock on the affected database and will block other operations until it has completed.
For that reason I think using the db.collection.remove({}) method is better than db.collection.drop().

Related

How to convert a MongoDB oplog file into an actual query

I want to convert the MongoDB local oplog file into an actual query, so I can execute that query and get an exact copy of the database.
Is there any package, built-in tool, or script for it?
It's not possible to get the exact query from the oplog entry because MongoDB doesn't save the query.
The oplog has an entry for each atomic modification performed. Multi-inserts/updates/deletes performed on the mongo instance using a single query are converted to multiple entries and written to the oplog collection. For example, if we insert 10,000 documents using Bulk.insert(), 10,000 new entries will be created in the oplog collection. Now the same can also be done by firing 10,000 Collection.insertOne() queries. The oplog entries would look identical! There is no way to tell which one actually happened.
Sorry, but that is impossible.
The reason is that the oplog doesn't contain queries. The oplog includes only the changes (insert, update, delete) to data, and it's there for replication and redo.
Getting an exact copy of a DB is called "replication", and that is of course supported by the system.
To "replicate" changes to, for example, one DB or collection, you can use https://www.mongodb.com/docs/manual/changeStreams/.
You can reconstruct the operations from the oplog. The oplog defines multiple op types, for instance op: "i", "u", "d", etc. for insert, update, and delete. For these types, check the "o"/"o2" fields, which hold the corresponding documents and filters.
Then, based on the op type, call the corresponding driver API: db.collection.insert()/update()/delete(), as in the sketch below.
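A minimal sketch of that approach in the mongo shell (assumes a replica set, a source namespace "mydb.mycoll", a target database "copydb", and the classic update entry format; newer server versions write "$v": 2 diff-style update entries, which this does not handle):
var target = db.getSiblingDB("copydb").getCollection("mycoll");
db.getSiblingDB("local").oplog.rs.find({ ns: "mydb.mycoll" }).forEach(function(entry) {
    if (entry.op === "i") {
        target.insertOne(entry.o);                // "o" holds the full document
    } else if (entry.op === "u") {
        target.updateOne(entry.o2, entry.o);      // "o2" is the filter, "o" the update
    } else if (entry.op === "d") {
        target.deleteOne(entry.o);                // "o" is the filter
    }
});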

Mongo dynamic collection creation and locking

I am working on an app where I am looking into creating MongoDB collections on the fly as they are needed.
The app consumes data from a data source and maps the data to a collection. If the collection does not exist, the app:
creates the collection
kicks off appropriate indexes in the background
shards the collection
and then inserts the data into the collection.
While this is going on, other processes will be reading and writing from the database.
Looking at this MongoDB locking FAQ, it appears that the reads and writes in other collections of the database should not be affected by the dynamic collection creation and setup, i.e. they won't end up waiting on a lock we acquired to create the collection.
Question: Is the above assumption correct?
Thank you in advance!
No, when you insert into a collection which does not exist, then a new collection is created automatically. Of course, this new collection does not have any index (apart from _id) and is not sharded.
So, you must ensure the collection is created before the application starts any inserts.
However, it is no problem to create indexes and enable sharding on a collection which already contains some data.
Note that when you enable sharding on an empty collection or a collection with a small amount of data, all data is initially written to the primary shard. You may use sh.splitAt() to pre-split for the upcoming data, as sketched below.
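A sketch of that setup order (names, shard key, and split point are example values; assumes sh.enableSharding() has already been run for the database):
var appDb = db.getSiblingDB("appdb");
appDb.createCollection("events");                      // create before any inserts happen
appDb.events.createIndex({ day: 1, key: 1 });          // index to back the shard key
sh.shardCollection("appdb.events", { day: 1, key: 1 });
// Pre-split so the initial writes don't all land on the primary shard
sh.splitAt("appdb.events", { day: "2018-06-01", key: MinKey });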

Is it mandatory to restart MongoDB after adding new index on collection?

A MongoDB collection has become slow to provide data as it has grown huge over time.
I need to add an index on a few fields and have it reflected immediately in searches. So I am seeking clarification on the following:
Is it mandatory to restart MongoDB after indexing?
If yes, then is there any way to add an index without restarting the server? I don't want any downtime...
MongoDB does not need to be restarted after indexing.
However, by default, the createIndex operation blocks reads/writes on the affected database (note that it is not only the collection but the whole db). You may change this behaviour using background mode like this:
db.collectionName.createIndex( { collectionKey: 1 }, { background: true } )
It might seem that your client is blocked when creating the index: the mongo shell session or connection where you are creating the index will block, but other connections to the database will still be able to query and operate on it. (Note that as of MongoDB 4.2, index builds use an optimized process that holds the exclusive lock only briefly, and the background option is ignored.)
Docs: https://docs.mongodb.com/manual/core/index-creation/
There is no need to restart MongoDB after you add an index!
However, an index could be created in the foreground, which is the default.
What does that mean? The MongoDB documentation states: "By default, creating an index on a populated collection blocks all other operations on a database. When building an index on a populated collection, the database that holds the collection is unavailable for read or write operations until the index build completes. Any operation that requires a read or write lock on all databases will wait for the foreground index build to complete."
For potentially long-running index building operations on standalone deployments, the background option should be used. In that case, the MongoDB database remains available during the index building operation.
To create an index in the background, a snippet like the one below should be used.
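The snippet is presumably the same { background: true } form shown in the previous answer (collection and field names are placeholders):
// Build the index in the background so the database remains available
db.mycollection.createIndex({ myField: 1 }, { background: true })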


Best way to drop big collection in MongoDB

I have a standalone MongoDB instance, version 3.2. The storage engine is WiredTiger. What is the best way to drop a big collection (>500 GB) to minimize the time of the exclusive database lock? Will there be a time difference between these 2 solutions?
Remove all documents from collection, drop index, drop collection
Just drop collection
Additional information that could be important:
The collection contains about 200,000,000 documents
The collection has only one index, on _id
_id looks like { _id: { day: "2018-01-01", key: "someuniquekeybyday" } }
The correct answer is probably: "the drop operation is not linear". It takes a few seconds on a 10 GB collection and absolutely the same time on a 500 GB collection.
I've deleted a 1 TB collection many times; it took several seconds.
P.S. To offer you something new, not seen in the comments: you have a third option - make a copy of the other collections in this database and then switch databases in your application.
I have dropped a collection of over 1.4 TB on version 4.0. The operation took less than 1 sec.
2022-03-15T01:17:25.688+0000 I REPL [replication-2] Completing collection drop for order.system.drop.1647307045i163t6.feeds with drop optime { ts: Timestamp(1647307045, 163), t: 6 } (notification optime: { ts: Timestamp(1647307045, 163), t: 6 })
As per the documentation, the drop operation obtains a lock on the affected database and blocks all other operations, so the application will suffer minor latencies in database operations for a short duration. Before dropping a large collection, make sure to:
remove all reads/writes from the application on that collection
rename it to a temporary collection to avoid any operations during the drop (see the sketch below)
choose the least-traffic time to drop the temporary collection
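A sketch of the rename-then-drop approach (database and collection names are placeholders; renameCollection works within the same database):
// Rename so the application immediately stops finding the collection
db.getSiblingDB("mydb").bigcollection.renameCollection("bigcollection_to_drop");
// Later, during the least-traffic window, drop the renamed copy
db.getSiblingDB("mydb").getCollection("bigcollection_to_drop").drop();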