Dropping Mongo collection not clearing disk space - mongodb

I have a collection with 750,000 documents; it's taking around 7 GB on disk.
I've dropped the collection, but the data files (test.0 ... test.11) are still on disk.
If I delete them, I lose all the collections, not just the one I dropped.
Shouldn't Mongo be deleting them?
I've just noticed that the database stats return an error:
{
    "ok" : 0,
    "errmsg" : "Collection [test.loadTest-2016-02-06 15:05:34Z] not found."
}

You have dropped a collection, but not the database containing it. Dropping a collection does not compact the data files, and neither does deleting documents. If you really want to compact the database, either drop it entirely and reimport it, or compact it with repairDatabase (see the docs). Beware though: as far as I know, you cannot compact the database online if you only have one node.
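For reference, both operations can be run from the mongo shell; this is a minimal sketch, assuming the database is named test and using a placeholder collection name. Note that repairDatabase blocks the database while it runs and needs free disk space roughly equal to the data set size plus 2 GB:
// Rebuild and compact all data files for the current database (blocking).
use test
db.repairDatabase()
// Or compact a single collection in place (also blocking; "mycollection" is a placeholder).
db.runCommand({ compact: "mycollection" })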
If you have a replica set, adding new nodes and removing the old ones is the safest way to compact the database while staying online. I do that from time to time and it's easy.
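The rolling approach boils down to two shell helpers (a sketch; hostnames are placeholders, and you should wait for the new member's initial sync to finish before removing the old one):
// Run on the primary. The new member's initial sync writes fully compacted data files.
rs.add("newnode.example.net:27017")
// Once rs.status() shows the new member as SECONDARY, retire the old one.
rs.remove("oldnode.example.net:27017")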

Related

Should I be removing tombstones from my Cloudant database?

I have a Cloudant database with 4 million documents and 27 million deleted documents ("tombstones"). Is having so many tombstones a problem and, if so, how do I get rid of them?
"Tombstones" occupy space and so contribute to your bill. They also increase the time for new replications to complete or new indexes to build.
So in general it is good practice to periodically remove these tombstones.
The best way to do it is to replicate your database with a filter that leaves deleted documents behind.
Replications are started by creating a document in the _replicator database like so:
{
    "_id": "myfirstreplication",
    "source" : "http://<username1>:<password1>@<account1>.cloudant.com/<sourcedb>",
    "target" : "http://<username2>:<password2>@<account2>.cloudant.com/<targetdb>",
    "selector": {
        "_deleted": {
            "$exists": false
        }
    }
}
where the source is the original database and the target is the new, empty database. The selector is the filter applied to each document before replication - in this case we only want documents without a _deleted attribute, i.e. documents that haven't been deleted.
This replication will result in a brand new database with no tombstones. Point your application to this new database and then delete the old one with the tombstones.
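If you prefer not to create the document by hand, the replication can be started with a single HTTP request to the _replicator database; a minimal sketch, with the accounts, credentials, and database names as placeholders:
curl -X PUT "http://<username1>:<password1>@<account1>.cloudant.com/_replicator/myfirstreplication" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "http://<username1>:<password1>@<account1>.cloudant.com/<sourcedb>",
    "target": "http://<username2>:<password2>@<account2>.cloudant.com/<targetdb>",
    "selector": { "_deleted": { "$exists": false } }
  }'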
In this blog post there are other, more complex scenarios that you may want to consider.

Mongo Collection with TTL index not reclaiming disk space

I have a mongo collection with a TTL index. I see the documents getting evicted as expected, but I don't see the disk space being reclaimed. Has anyone seen this kind of issue?
Let me know if you need more details.
We discussed this with the MongoDB team, and based on all the information there are a couple of things to understand; it's not that simple.
The TTL index will delete documents, but the space they occupied is not returned to the operating system.
If new documents come in, the freed space will be reused for them.
If the collection size is going to remain constant, you need to run the compact command; in a sharded cluster, it has to be run on each shard (see the sketch below).
Another option is to create a new collection, move your data into the newer collection, and drop the previous one once done.
Or take a backup of the collection, drop it, and then restore it.
After all of this, there is a possibility that mongo still holds the space, and you need to restart the cluster; once restarted, it will release the storage.
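A minimal sketch of the compact step in the mongo shell, with "mycollection" as a placeholder name; on a sharded cluster, connect to each shard's primary and run it there:
// Release unused space held by a single collection (WiredTiger).
// compact blocks operations on the collection, so prefer a maintenance
// window, or run it against secondaries one at a time.
db.runCommand({ compact: "mycollection" })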

Best way to remove all elements of a large mongo collection without lock and performance impact? [duplicate]

How do I truncate a collection in MongoDB or is there such a thing?
Right now I have to delete 6 large collections all at once, so I'm stopping the server, deleting the database files, and then recreating the database and the collections in it. Is there a way to delete the data but leave the collections as they are? The delete operation takes a very long time. I have millions of entries in the collections.
To truncate a collection and keep the indexes use
db.<collection>.remove({})
You can efficiently drop all data and indexes for a collection with db.collection.drop(). Dropping a collection with a large number of documents and/or indexes will be significantly more efficient than deleting all documents using db.collection.remove({}). The remove() method does the extra housekeeping of updating indexes as documents are deleted, and would be even slower in a replica set environment where the oplog would include entries for each document removed rather than a single collection drop command.
Example using the mongo shell:
var dbName = 'nukeme';
db.getSiblingDB(dbName).getCollectionNames().forEach(function(collName) {
    // Drop all collections except system ones (indexes/profile)
    if (!collName.startsWith("system.")) {
        // Safety hat
        print("WARNING: going to drop ["+dbName+"."+collName+"] in 5s .. hit Ctrl-C if you've changed your mind!");
        sleep(5000);
        // Resolve the collection via the named DB so this works regardless of the shell's current db
        db.getSiblingDB(dbName).getCollection(collName).drop();
    }
})
It is worth noting that dropping a collection has different outcomes on storage usage depending on the configured storage engine:
WiredTiger (the default storage engine in MongoDB 3.2 or newer) will free the space used by a dropped collection (and any associated indexes) once the drop completes.
MMAPv1 (the default storage engine in MongoDB 3.0 and older) will not free up preallocated disk space. This may be fine for your use case; the free space is available for reuse when new data is inserted.
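One way to check which behaviour you are getting is to compare the database's reported storage footprint before and after the drop; a minimal sketch, with "mycollection" as a placeholder name:
// In the mongo shell: record storageSize (bytes on disk), drop, then compare.
var before = db.stats().storageSize;
db.mycollection.drop();
var after = db.stats().storageSize;
print("storageSize: " + before + " -> " + after);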
If you are instead dropping the database, you generally don't need to explicitly create the collections as they will be created as documents are inserted.
However, here is an example of dropping and recreating the database with the same collection names in the mongo shell:
var dbName = 'nukeme';
// Save the old collection names before dropping the DB
var oldNames = db.getSiblingDB(dbName).getCollectionNames();
// Safety hat
print("WARNING: going to drop ["+dbName+"] in 5s .. hit Ctrl-C if you've changed your mind!")
sleep(5000)
db.getSiblingDB(dbName).dropDatabase();
// Recreate database with the same collection names
oldNames.forEach(function(collName) {
    db.getSiblingDB(dbName).createCollection(collName);
})
The query below will delete all records in a collection and keep the collection as is:
db.collectionname.remove({})
remove() is deprecated as of MongoDB 4. You need to use deleteMany() or one of the other delete methods instead:
db.<collection>.deleteMany({})
There is no equivalent to the "truncate" operation in MongoDB. You can either remove all documents, which has a complexity of O(n), or drop the collection, in which case the complexity is O(1) but you will lose your indexes.
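If you take the drop route and need the indexes back, each one is a single call to recreate on the (now empty) collection; a sketch with placeholder collection and field names:
// Drop is O(1); indexes must then be rebuilt explicitly.
db.mycollection.drop();
db.mycollection.createIndex({ createdAt: 1 });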
Create the database and the collections, then back up the database to BSON files using mongodump:
mongodump --db database-to-use
Then, when you need to drop the database and recreate the previous environment, just use mongorestore:
mongorestore --drop
When you use mongodump, the backup is saved in the current working directory, in a folder named dump.
The db.collection.drop() method obtains a write lock on the affected database and will block other operations until it has completed.
I think using db.collection.remove({}) is better than db.collection.drop().

MongoDB. Keep information about sharded collections when restoring

I am using mongodump and mongorestore in a replicated sharded cluster in MongoDB 2.2 to take a backup and restore it.
First, I use mongodump to create a dump of the whole system; then I drop a particular collection and restore it using mongorestore with the output of mongodump. After that, the collection is correct (its data and its indexes are both correct), but the information about whether the collection is sharded is lost. Before dropping it, the collection was sharded; after the restore, however, it was not sharded anymore.
I was wondering whether there is a way of keeping this information in backups. I thought that maybe the sharding information for a collection is kept in the admin database, but in the dump the admin folder is empty, and show collections for that database returns nothing. Then I thought it could be kept in the metadata, but that would be strange, because I know the metadata stores index information, and the indexes are correctly restored.
I would also like to know whether it's possible to keep this information by using filesystem snapshots instead of mongodump + mongorestore, or perhaps still using mongodump and mongorestore but stopping the system or locking writes. I don't think this last point is the cause, because I am not performing write operations while restoring, even without locking - but I mention it just to give ideas.
I would also like to know whether anyone is completely sure that this feature is simply not available in the current version.
Any ideas?
If you are using mongodump to back up your sharded collection, are you sure it really needs to be sharded? Usually sharded collections are very large, and mongodump would take too long to back them up.
What you can do to back up a large sharded collection is described here.
The key piece is to back up your config server as well as each shard - and to do it as close to "simultaneously" as possible, after having stopped the balancer. The config DB is small, so you should probably back it up very frequently anyway. The best way to back up large shards is via filesystem snapshots.
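Put together, the procedure might look roughly like this (a sketch; hostnames and ports are placeholders, and on older versions such as 2.2 you may need sh.setBalancerState(false) rather than sh.stopBalancer()):
# Stop the balancer so no chunks migrate mid-backup (run against a mongos).
mongo --host mongos.example.net --eval "sh.stopBalancer()"
# Back up the config database from one of the config servers.
mongodump --host cfg1.example.net --port 27019 --db config --out /backups/config
# Back up each shard; filesystem snapshots are preferable for large shards.
mongodump --host shard1.example.net --port 27018 --out /backups/shard1
mongodump --host shard2.example.net --port 27018 --out /backups/shard2
# Restart the balancer once all backups are complete.
mongo --host mongos.example.net --eval "sh.startBalancer()"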