How can I resize a mongodb capped collection without losing data?
Is there a command for that, or could somebody provide a script?
This is what I do to resize capped collections:
db.runCommand({"convertToCapped": "log", size: 1000000000});
I already have a Capped Collection named "log". So I just run the "convertToCapped" on it again, specifying a new size. I've not tried it on reducing the size of the collection. That may be something that you'd need to use Scott Hernandez's version on. But this works for increasing the size of your capped collections without losing any data or your indexes.
EDIT: @JMichal is correct. Data is preserved, but indexes are not and will need to be recreated.
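For example, if the "log" collection had an index on a hypothetical ts field before the conversion, you would recreate it afterwards:
db.log.createIndex({ ts: 1 }); // recreate each index the collection had before convertToCapped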
You basically need to create a new capped collection and copy the docs to it. This can be done very easily in the JavaScript shell, or your language of choice.
db.createCollection("new", {capped:true, size:1073741824}); /* size in bytes */
db.old.find().forEach(function (d) {db.new.insert(d)});
db.old.renameCollection("bak", true);
db.new.renameCollection("old", true);
Note: Just make sure nobody is inserting into or updating the old collection when you switch. If you run that code in a db.eval("....") it will lock the server while it runs (though note that db.eval was deprecated in MongoDB 3.0 and removed in 4.2).
There is a feature request for increasing the size of an existing collection: http://jira.mongodb.org/browse/SERVER-1864
https://www.mongodb.com/docs/manual/core/capped-collections/#change-a-capped-collection-s-size
New in version 6.0:
db.runCommand( { collMod: "log", cappedSize: 100000 } )
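You can verify the result from the shell afterwards; a quick check, assuming the "log" collection from above:
db.log.isCapped();      // true for a capped collection
db.log.stats().maxSize; // reports the configured maximum size in bytes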
Related
How do I truncate a collection in MongoDB or is there such a thing?
Right now I have to delete 6 large collections all at once, and I'm stopping the server, deleting the database files, and then recreating the database and the collections in it. Is there a way to delete the data and leave the collection as it is? The delete operation takes a very long time. I have millions of entries in the collections.
To truncate a collection and keep the indexes use
db.<collection>.remove({})
You can efficiently drop all data and indexes for a collection with db.collection.drop(). Dropping a collection with a large number of documents and/or indexes will be significantly more efficient than deleting all documents using db.collection.remove({}). The remove() method does the extra housekeeping of updating indexes as documents are deleted, and would be even slower in a replica set environment where the oplog would include entries for each document removed rather than a single collection drop command.
Example using the mongo shell:
var dbName = 'nukeme';
db.getSiblingDB(dbName).getCollectionNames().forEach(function(collName) {
    // Drop all collections except system ones (indexes/profile)
    if (!collName.startsWith("system.")) {
        // Safety hat
        print("WARNING: going to drop ["+dbName+"."+collName+"] in 5s .. hit Ctrl-C if you've changed your mind!");
        sleep(5000);
        // drop via getSiblingDB so the right database is targeted even if "use" was never run
        db.getSiblingDB(dbName).getCollection(collName).drop();
    }
})
It is worth noting that dropping a collection has different outcomes on storage usage depending on the configured storage engine:
WiredTiger (default storage engine in MongoDB 3.2 or newer) will free the space used by a dropped collection (and any associated indexes) once the drop completes.
MMAPv1 (default storage engine in MongoDB 3.0 and older) will not free up preallocated disk space. This may be fine for your use case; the free space is available for reuse when new data is inserted.
If you are instead dropping the database, you generally don't need to explicitly create the collections as they will be created as documents are inserted.
However, here is an example of dropping and recreating the database with the same collection names in the mongo shell:
var dbName = 'nukeme';

// Save the old collection names before dropping the DB
var oldNames = db.getSiblingDB(dbName).getCollectionNames();

// Safety hat
print("WARNING: going to drop ["+dbName+"] in 5s .. hit Ctrl-C if you've changed your mind!");
sleep(5000);
db.getSiblingDB(dbName).dropDatabase();

// Recreate database with the same collection names
oldNames.forEach(function(collName) {
    db.getSiblingDB(dbName).createCollection(collName);
})
The below query will delete all records in a collection and keep the collection as it is:
db.collectionname.remove({})
remove() is deprecated in MongoDB 4.x. You need to use deleteMany() or one of the other delete methods:
db.<collection>.deleteMany({})
There is no equivalent to the "truncate" operation in MongoDB. You can either remove all documents, which has a complexity of O(n), or drop the collection, which is O(1) but loses your indexes.
Create the database and the collections and then backup the database to bson files using mongodump:
mongodump --db database-to-use
Then, when you need to drop the database and recreate the previous environment, just use mongorestore:
mongorestore --drop
The backup will be saved in the current working directory, in a folder named dump, when you use the command mongodump.
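A concrete round trip might look like this (using the hypothetical database name nukeme from the earlier examples; --out just makes the dump location explicit):
mongodump --db nukeme --out ./dump
mongorestore --drop --db nukeme ./dump/nukeme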
The db.collection.drop() method obtains a write lock on the affected database and will block other operations until it has completed.
I think using the db.collection.remove({}) method is better than db.collection.drop().
Does anyone know a way to update a capped collection in Mongo 3.2? I had this working in 2.x, whereby I updated a collection and basically removed all its content so I knew it had been processed. This would then age out.
When I do the same in 3.2 I get the following error on the command line.
Cannot change the size of a document in a capped collection: 318 != 40
Here you can see I'm shrinking the document from 318 bytes to 40 bytes.
Is there a way to do this?
As mentioned in the MongoDB docs:
Changed in version 3.2.
If an update or a replacement operation changes the document size, the operation will fail.
https://docs.mongodb.com/manual/core/capped-collections/
So your operation is changing the size of a document in the capped collection, which is not allowed in MongoDB 3.2+.
Capped collections are fixed-size collections, hence the operation fails when an update changes a record's previously allocated size.
Changed in version 3.2.
If an update or a replacement operation changes the document size, the operation will fail.
For more information, Please visit the link: https://docs.mongodb.com/manual/core/capped-collections/
So the solution is to create a new collection without choosing the capped option (if this fits your requirement). It works.
Shrinking a document in a capped collection is no longer allowed in 3.2. I did not find anything related to this in the documentation, but there is a rationale at https://jira.mongodb.org/browse/SERVER-20529 (basically, shrinking a document cannot be rolled back).
Your only option is to find a same-size update, for example update a boolean.
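As a sketch of that idea (the processed flag here is a hypothetical field, not something from the question): write each document with the flag pre-allocated, then flip it in place, since true and false occupy the same space in BSON:
db.log.insert({ msg: "job 42 started", processed: false });
// later, mark it handled without changing the document size
db.log.update({ msg: "job 42 started" }, { $set: { processed: true } });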
[Solved] This worked for me: create a new collection that is not capped and copy all your documents into it.
Copy the snippet below, replace the collection name with yours, and paste it in your terminal.
// replace "mycol" with your collection's name
db.createCollection("mycol_temp");
var cur = db.mycol.find();
while (cur.hasNext()) {
    var doc = cur.next();
    db.mycol_temp.insert(doc);
}
db.mycol.drop();
db.mycol_temp.renameCollection("mycol");
Now update() and remove() on the documents are accepted.
I've got a MongoDB instance with a collection in it which has around 17 million records.
I wish to alter the document structure (to add a new attribute to the documents) for all 17 million documents, so that I don't have to programmatically deal with different structures, and to make queries easier to write.
I've been told though that if I run an update script to do that, it will lock the whole database, potentially taking down our website.
What is the easiest way to alter the document without this happening? (I don't mind if the update happens slowly, as long as it eventually happens)
The query I'm attempting to do is:
db.history.update(
{ type : { $exists: false }},
{
$set: { type: 'PROGRAM' }
},
{ multi: true }
)
You can update the collection in batches (say, half a million per batch); this will distribute the load.
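A minimal sketch of that approach, assuming the history collection and type field from the question (the batch size and the pause between batches are arbitrary):
var batchSize = 500000;
while (true) {
    // grab the _ids of the next batch of documents still missing the field
    var ids = db.history.find({ type: { $exists: false } }, { _id: 1 })
                        .limit(batchSize)
                        .toArray()
                        .map(function (d) { return d._id; });
    if (ids.length === 0) break;
    db.history.update({ _id: { $in: ids } }, { $set: { type: 'PROGRAM' } }, { multi: true });
    sleep(1000); // pause so other operations get a turn between batches
}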
I created a collection with 20,000,000 records and ran your query on it. It took ~3 minutes to update on a virtual machine and I could still read from the db in a separate console.
> for(var i=0;i<20000000;i++){db.testcoll.insert({"somefield":i});}
The locking in mongo is quite lightweight, and it is not going to be held for the whole duration of the update. Think of it like 20000000 separate updates. You can read more here:
http://docs.mongodb.org/manual/faq/concurrency/
You do actually care if your update query is slow, because of the write lock problem on the database that you are aware of; the two are tightly linked. It's not a simple read query here: you really want this write query to be as fast as possible.
Updating the "find" part is part of the key here. First, since your collection has millions of documents, it's a good idea to keep field names as short as possible (ideally a single character: type => t). This helps because of the schemaless nature of MongoDB collections: field names are stored in every single document.
Second, and more importantly, you need to make your query use a proper index. For that you need to workaround the $exists operator which is not optimized (several ways to do it there actually).
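One such workaround (my sketch, not part of the original answer): index the field and query for { type: null }, which matches documents where the field is missing or explicitly null, and can use the index:
db.history.createIndex({ type: 1 });
// matches missing *and* explicit-null values, unlike { $exists: false }
db.history.update({ type: null }, { $set: { type: 'PROGRAM' } }, { multi: true });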
Third, you can work on the field values themselves. Use http://bsonspec.org/#/specification to estimate the size of the value you want to store, and possibly pick a better choice (in your case, you could replace the 'PROGRAM' string with a numeric constant, for example, and gain a few bytes per document, multiplied by the number of documents in the multi-update). The smaller the data you want to write, the faster the operation will be.
A few links to other questions which can inspire you :
Can MongoDB use an index when checking for existence of a field with $exists operator?
Improve querying fields exist in MongoDB
I'm thinking about trying MongoDB to use for storing our stats but have some general questions about whether I'm understanding it correctly before I actually start learning it.
I understand the concept of using documents; what I'm not too clear about is how much data can be stored inside each document. The following diagram explains the layout I'm thinking of:
Website (document)
- some keys/values about the particular document
- statistics (tree)
- millions of rows where each record is inserted from a pageview (key/value array containing data such as timestamp, ip, browser, etc)
What got me excited about MongoDB was the grouping functions such as:
http://www.mongodb.org/display/DOCS/Aggregation
db.test.group(
{ cond: {"invoked_at.d": {$gte: "2009-11", $lt: "2009-12"}}
, key: {http_action: true}
, initial: {count: 0, total_time:0}
, reduce: function(doc, out){ out.count++; out.total_time+=doc.response_time }
, finalize: function(out){ out.avg_time = out.total_time / out.count }
} );
But my main concern is how hard that command, for example, would be on the server if there are, say, tens of millions of records across dozens of documents, on a 512MB-1GB RAM server on Rackspace for example? Would it still run with low load?
Is there any limit to the number of documents MongoDB can have (in separate databases)? Also, is there any limit to the number of records in a tree like the one I explained above? And does the query I showed above run instantly, or is it some sort of map/reduce query? I'm not very sure if I can execute that upon page load in our control panel to get those stats instantly.
Thanks!
Every document has a size limit of 4MB (which in text is A LOT).
It's recommended to run MongoDB in replication mode or to use sharding as you otherwise will have problems with single-server durability. Single-server durability is not given because MongoDB only fsync's to the disk every 60 seconds, so if your server goes down between two fsync's the data that got inserted/updated in that time will be lost.
There is no limit on the number of documents in MongoDB other than your disk space.
You should try to import a dataset that matches your data (or generate some test data) to MongoDB and analyse how fast your query executes. Remember to set indexes on those fields that you use heavily in your queries. Your above query should work pretty well even with a lot of data.
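For example, to support the group() query from the question, an index covering the date filter and the grouping key could look like this (a sketch; adjust to your real field names):
db.test.createIndex({ "invoked_at.d": 1, http_action: 1 });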
In order to analyze the speed of your query use the database profiler MongoDB comes with. On the mongo shell do:
db.setProfilingLevel(2); // to set the profiling level
[your query]
db.system.profile.find(); // to see the results
Remember to turn off profiling once you're finished (db.setProfilingLevel(0)), as the log will otherwise get pretty huge.
Regarding your database layout, I suggest changing the "schema" (yeah yeah, schemaless..) to:
website (collection):
- some keys/values about the particular document
statistics (collection)
- millions of rows where each record is inserted from a pageview (key/value array containing data such as timestamp, ip, browser, etc)
+ DBRef to website
See Database References
Documents in MongoDB are limited to a size of 4MB. Let's say a single page view results in 32 bytes being stored. Then you'll be able to store about 130,000 page views in a single document.
Basically the amount of page views a page can generate is infinite, and you indicated that you expect millions of them, so I suggest you store the log entries as separate documents. Each log entry should contain the _id of the parent document.
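A sketch of what such a log entry might look like (the collection and field names here are illustrative assumptions):
// one pageview stored as its own document in a hypothetical "pageviews" collection
db.pageviews.insert({
    website_id: website._id, // _id of the parent website document
    ts: new Date(),
    ip: "203.0.113.7",
    browser: "Firefox"
});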
The number of documents in a database is limited to 2GB of total space on 32-bit systems. 64-bit systems don't have this limitation.
The group() function is a map-reduce query under the hood. The documentation recommends you use a map-reduce query instead of group(), because it has some limitations with large datasets and sharded environments.
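For what it's worth, group() was later deprecated and eventually removed (in MongoDB 4.2); a rough aggregation-pipeline equivalent of the query in the question would be (a sketch):
db.test.aggregate([
    { $match: { "invoked_at.d": { $gte: "2009-11", $lt: "2009-12" } } },
    { $group: { _id: "$http_action",
                count: { $sum: 1 },
                total_time: { $sum: "$response_time" } } },
    { $project: { count: 1, total_time: 1,
                  avg_time: { $divide: ["$total_time", "$count"] } } }
]);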