MongoDB: back up old data then remove it from the database periodically

I have a project that currently stores GPS tracking data in MongoDB, and it grows really fast. To slow that growth, I want to automatically back up old data and then remove it from the database monthly. The old data must be older than 3 months.
Is there any solution to accomplish that?

This question was partially answered earlier.
After using that approach, you can back up the collection holding your old data with mongodump, for example:
mongodump --host yourhost [-u user] -d yourdb -c collOldDocs
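For the monthly job itself, a minimal sketch is given below. The database and collection names (yourdb, tracks), the timestamp field, and the use of GNU date and mongosh are assumptions, not something from the question: dump only documents older than three months with --query, verify the dump, then delete the same documents.

#!/usr/bin/env bash
# Sketch of a monthly "archive then purge" job.
# Assumptions: GNU date, mongosh, a collection "tracks" in "yourdb",
# and a BSON date field named "timestamp" on every GPS document.
set -euo pipefail

CUTOFF=$(date -u -d '3 months ago' +"%Y-%m-%dT%H:%M:%SZ")

# 1. Dump only the documents older than the cutoff.
mongodump --host yourhost --db yourdb --collection tracks \
  --query "{ \"timestamp\": { \"\$lt\": { \"\$date\": \"$CUTOFF\" } } }" \
  --out "/backups/gps-$(date +%Y-%m)"

# 2. Once the dump is safely on disk, remove the same documents.
mongosh --host yourhost yourdb --eval \
  "db.tracks.deleteMany({ timestamp: { \$lt: new Date('$CUTOFF') } })"

Run it from cron (or any scheduler) once a month; because both steps use the same cutoff, nothing newer than three months is ever touched.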

Related

mongodump vs mongoexport: which is better? [closed]

I want to export very large collections and import them into another database in another server. I found there are at least two ways: mongoexport and mongodump.
I searched previous posts about this issue, but I did not find a complete comparison/benchmark of the export speed and export file size for these two tools!
I would be grateful if anyone has experience to share.
As mentioned in the latest documentation
Avoid using mongoimport and mongoexport for full instance production backups. They do not reliably preserve all rich BSON data types, because JSON can only represent a subset of the types supported by BSON. Use mongodump and mongorestore as described in MongoDB Backup Methods for this kind of functionality.
Since you need to restore a large amount of data, prefer mongodump.
mongoexport is a command-line tool that produces a JSON or CSV export of data stored in a MongoDB instance.
mongodump is a utility for creating a binary export of the contents of a database. mongodump can export data from either mongod or mongos instances; i.e. can export data from standalone, replica set, and sharded cluster deployments.
One of the important differences is that mongodump is faster than mongoexport for backup purposes. mongodump stores data as binary BSON, whereas mongoexport stores data as JSON or CSV.
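To make the difference concrete, the two invocations look roughly like this (database, collection, and path names below are placeholders, not from the question):

# Binary BSON dump of a single collection (fast, preserves all BSON types)
mongodump --db mydb --collection mycoll --out /backups/dump/

# JSON export of the same collection (human-readable, loses some BSON type fidelity)
mongoexport --db mydb --collection mycoll --out /backups/mycoll.json

# CSV export needs an explicit field list
mongoexport --db mydb --collection mycoll --type=csv --fields name,location --out /backups/mycoll.csv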
The best answer here is to use file system snapshots, since for large clusters both mongoexport and mongodump can take significant time.
mongodump is preferable if a whole database or collection needs to be backed up.
Use mongorestore to restore the backed-up data; it is very fast because the dump is stored as BSON.
mongoexport is preferable for backing up a subset of the documents in a collection.
It is slow compared to mongodump; the data can be stored as either CSV or JSON, depending on the type specified in the command.
Use mongoimport to import the backed-up data into a specific collection within a database. Hope this may help.

MongoDB merging db

Is there a way to merge two mongodb databases?
That is, all records and files from DB2 should be merged into DB1.
I have a Java-based web application with several APIs to download file content from MongoDB, so I'm thinking of using bash and curl to download the files, read the record properties, then re-upload (merge) them to the destination DB1.
This, however, will have an issue: the same Mongo _id ObjectId("xxxx") from DB2 cannot be transferred to DB1. As I understand it, MongoDB automatically generates and assigns the ObjectId("xxxx") value.
Yes, use mongodump and mongorestore.
The chance of a duplicate document _id (assuming it's not the same document) is extremely low,
and in that case Mongo will let you know the insertion has failed, and you can choose to deal with it however you see fit.
You could also use the write concern flag with the restore to decide how to deal with it while uploading.
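A sketch of that mongodump/mongorestore route follows. The host names are placeholders; the --nsFrom/--nsTo remapping flags exist in mongorestore 3.4 and later, and the write concern syntax shown is one accepted form, so check your tool version:

# Dump only DB2 from the source server
mongodump --host source-host --db DB2 --out /tmp/db2dump

# Restore DB2's collections into DB1 on the destination server.
# Documents whose _id already exists in DB1 fail with duplicate-key errors
# and are skipped; all other documents are inserted.
mongorestore --host dest-host \
  --nsFrom 'DB2.*' --nsTo 'DB1.*' \
  --writeConcern '{w: "majority"}' \
  /tmp/db2dump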

mongoimport without dropping the data first

I reset my database every night with a mongoimport command. Unfortunately, I understand that it drops the database first and then fills it again.
This means that my database is being queried while half-filled. Is there a way to make the mongoimport atomic? This could be achieved by first filling another collection, dropping the first one, then renaming the second.
Is that a built-in feature of mongoimport?
Thanks,
It's unclear what behaviour you want from your nightly process.
If your nightly process is responsible for creating a new dataset then dropping everything first makes sense. But if your nightly process is responsible for adding to an existing dataset then that might suggest using mongorestore (without --drop) since mongorestore's behaviour is:
mongorestore can create a new database or add data to an existing database. However, mongorestore performs inserts only and does not perform updates. That is, if restoring documents to an existing database and collection and existing documents have the same value _id field as the to-be-restored documents, mongorestore will not overwrite those documents.
However, those concerns seem to be secondary to your need to import / restore into your database while it is still in use. I don't think either mongoimport or mongorestore is a viable 'write mechanism' for use when your database is online and available for reads. From your question, you are clearly aware that issues can arise from this, but there is no Mongo feature to resolve it for you. You can either:
Take your system offline during the mongoimport or mongorestore and then bring it back online once that process is complete and verified
Use mongoimport or mongorestore to create a side-by-side database and then, once this database is ready, switch your application to read from that database. This is a variant of a Blue/Green or A/B deployment model; a collection-level sketch of the same idea follows below.
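Here is a minimal sketch of the swap described in the question (fill a second collection, then rename it into place), done at the collection level rather than the database level. The names mydb, live, live_staging, and the data file are placeholders, and mongosh is assumed:

# 1. Load tonight's data into a staging collection, leaving the live one untouched.
mongoimport --db mydb --collection live_staging --drop --file /data/tonight.json

# 2. Swap the staging collection into place. renameCollection with
#    dropTarget=true replaces the old collection as part of the rename,
#    so readers see either the old data or the new data, never a half-filled collection.
mongosh mydb --eval 'db.live_staging.renameCollection("live", true)'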

Heroku pg_restore to a database that has changed

I'm wondering what I should do if I'm using Heroku Postgres and want to dump the data of App 1.0, then pg_restore that data into a new version of the app, App 2.0. The problem is that App 2.0 has new fields and tables, and the pg_restore documentation says:
... will issue the commands necessary to reconstruct the database to
the state it was in at the time it was saved.
I don't want to reconstruct the database to the state it was in on App 1.0; I only want to get the data and put it in the new database, and the tables and fields I added should not conflict with the data in the dump file.
One option would be to pg_restore and "reconstruct the database to the state it was in at the time it was saved" and then run the migrations again. Is that the best way to go? There might be a better way; thanks for your suggestions.
You can try a pg_dump --data-only which will skip the table creation and only dump the data rows. Then when you restore, your data will go into existing tables. So you'll need to make sure that they already exist in the new database. I'm not sure offhand what will happen if the table definitions are different.
Alternatively, you could do a pg_dump --table <table> for only the tables you want to keep.
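A sketch of the data-only route with plain pg_dump/pg_restore is below. The connection URLs are placeholders; on Heroku you would substitute each app's DATABASE_URL:

# Dump only the rows (no CREATE TABLE statements) from the App 1.0 database,
# in custom format so pg_restore can read it.
pg_dump --data-only -Fc --file app1_data.dump "$APP1_DATABASE_URL"

# Run App 2.0's migrations first so every table and column already exists,
# then load the App 1.0 rows into the new database.
pg_restore --data-only --no-owner --dbname "$APP2_DATABASE_URL" app1_data.dump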

How to reset MongoDB's collection statistics?

Today I've been working on a performance test with MongoDB. At one point I used up all the remaining space on my hard disk, so the test was halted in the middle. I removed some of the files and restarted the test after a db.dropDatabase();, but I noticed that the results of db.collection.stats(); now seem to be wrong.
My question is, how can I make MongoDB reset / recalculate statistics of a collection?
Sounds like mongodb is keeping space for the data and indexes it "knows" you will need when you run the test again, even though there is no data there at the moment.
What files did you delete? If you really don't need the data, you could stop mongod, and delete the other files corresponding to the database - but this is only safe if you are running in a test environment, and not sharing your database.
I think you're looking for the db.collectionName.drop() function. Then reimport your collection using mongoimport --db dbName --collection collectionName --file fileName and check whether those values are correct; my quick guess, though, is that they will be.
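A sketch of that drop-and-reimport cycle, using the placeholder names from the answer and assuming mongosh:

# Drop the collection so its statistics are rebuilt from scratch
mongosh dbName --eval 'db.collectionName.drop()'

# Reload the data
mongoimport --db dbName --collection collectionName --file fileName

# Inspect the recalculated statistics
mongosh dbName --eval 'printjson(db.collectionName.stats())'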