If I want to take a backup of the MongoDB data files and transfer it to a different server, how can I do that? In the data path I can see a lot of files prefixed with collection and index and ending with *.wt.
I tried copying all the files, but the service stopped.
I'm trying to take the data from version 3.2 and restore it in version 5.
Using mongoimport and mongoexport works, but that can't be done on the production data because the data size is 8 TB+.
So I'm looking for a solution where I can copy only the data files from the data path and move them to the version 5 data path.
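For context, the per-collection export/import flow that works (but is far too slow at 8 TB+) looks roughly like this; the database and collection names are placeholders:

# On the 3.2 source server: export one collection to JSON (slow for large data)
mongoexport --db mydb --collection mycoll --out mycoll.json

# Copy mycoll.json to the new server, then on the version 5 side:
mongoimport --db mydb --collection mycoll --file mycoll.json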
Is there a way to merge two MongoDB databases?
In other words, all records and files from DB2 should be merged into DB1.
I have a Java-based web application with several APIs to download file content from MongoDB. So I'm thinking of using bash and curl to download the files, read the record properties, and then re-upload (merge) them into the destination DB1.
However, this will be an issue, since the same Mongo _id ObjectID("xxxx") from DB2 cannot be transferred to DB1. As I understand it, MongoDB automatically generates and assigns the ObjectID("xxxx") value.
Yes, use mongodump and mongorestore.
The chance of a duplicate document _id (assuming it's not the same document) is extremely low.
In that case MongoDB will let you know the insertion has failed, and you can choose to deal with it however you see fit.
You could also use the write concern flag with the restore to decide how writes are handled while uploading.
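A minimal sketch of that approach, assuming both databases are reachable and using placeholder host and database names:

# Dump DB2 from its server (placeholder host/database names)
mongodump --host db2.example.com --db DB2 --out /tmp/db2dump

# Restore the dump into DB1; by default mongorestore reports documents whose
# _id already exists as duplicate-key errors and continues rather than aborting.
# --writeConcern can be added to control how writes are acknowledged.
mongorestore --host db1.example.com --db DB1 /tmp/db2dump/DB2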
My job is to improve the speed of reading a lot of small files (about 1 KB each) from disk and writing them into our database.
The database is open source to me, and I can change all the code from the client to the server.
The architecture is a simple master-slave distributed HDFS-based database, similar to HBase. Small files from disk can be inserted into the database and combined into bigger blocks automatically before being written into HDFS. (Big files can also be split into smaller blocks by the database and then written into HDFS.)
One way to change the client is to increase the number of threads.
I don't have any other ideas. Or perhaps you can suggest some ideas for doing the performance analysis.
One way to process such small files would be to convert them into a SequenceFile and store that in HDFS. Then use this file as the input to a MapReduce job that puts the data into HBase or a similar database.
This uses AWS as an example, but it could be any storage/queue setup:
If the files can live on shared storage such as S3, you could add one queue entry per file and then just keep throwing servers at the queue to add the files to the database. At that point the bottleneck becomes the database instead of the client.
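A rough sketch of the enqueue side, assuming the AWS CLI and placeholder bucket/queue names; each worker server would then poll the queue and insert whatever file key it receives into the database:

# Placeholder queue URL and bucket; list every small file in S3 and
# enqueue its key so any number of workers can drain the queue in parallel
QUEUE_URL="https://sqs.us-east-1.amazonaws.com/123456789012/small-files"
aws s3 ls s3://my-bucket/small-files/ --recursive | awk '{print $4}' |
while read -r key; do
  aws sqs send-message --queue-url "$QUEUE_URL" --message-body "$key"
done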
The MongoDB database generates files automatically after a certain period, as follows:
Doc.0
Doc.1
Doc.2
Doc.3
Doc.4
but the Doc.ns file is never regenerated like the files above.
I'm not sure exactly what, if anything, you are specifying as a problem. This is expected behavior. MongoDB allocates new data files as the data grows. The .ns file, which stores namespace information, does not grow like data files, and shouldn't need to.
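If you want to see how much of the allocated file space is actually in use, something like the following works (the database name here is a placeholder):

# Placeholder database name; fileSize is the space allocated on disk,
# dataSize is how much of it the documents actually occupy
mongo mydb --eval "printjson(db.stats())"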
I have a site that I created using MongoDB, but now I want to create a new site with MySQL. I want to retrieve the data from my old site (the one using MongoDB). I use the Robomongo software to connect to the MongoDB server, but I don't see my old data (*.pdf, *.doc). I think the data is stored in binary, isn't it?
How can I retrieve this data?
The binary data you've highlighted is stored using a convention called GridFS. Robomongo 0.8.x doesn't support decoding GridFS binary data (see: issue #255).
In order to extract the files you'll either need to:
use the command line mongofiles utility included with MongoDB. For example:
mongofiles list to see files stored
mongofiles get filename to get a specific file
use a different program or driver that supports GridFS
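Concretely, the mongofiles commands look roughly like this (the database name and filename are placeholders):

# List the files stored in GridFS for the given database (placeholder name)
mongofiles --db mydb list

# Download a specific GridFS file into the current directory
mongofiles --db mydb get report.pdf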
I have got two Mongo databases: 1. staging, 2. production.
In staging we have around 5 collections of seed data, on which we run some batch jobs that populate, say, 3 more collections.
These 8 collections become the seed data for production, which holds user information plus this seed data.
Are there any better patterns for managing the data push into staging and from staging to production? Right now we mongoexport all the collections, tar.gz them, archive the result on a network drive at each stage, and mongoimport them.
It's very painful and it takes a long time to export, import, and archive; gzipped, it is around 1.5 GB.
Are there any good patterns to solve this problem?
'mongoimport' and 'mongoexport' are meant to be used with data from outside systems: all data is translated into plain JSON and then back again into BSON.
If you use 'mongodump' and 'mongorestore' you should see much better performance, as both deal with BSON directly, which is more compact to store and does not require two translations (once to JSON and once from JSON).
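As a sketch, assuming placeholder hostnames and database names, and tool versions recent enough to support --gzip and --archive, the dump/restore round trip could look like this:

# Dump the staging database into a single compressed archive file
mongodump --host staging.example.com --db seeddb --gzip --archive=seeddb.archive.gz

# Copy the archive to the network drive / production host, then restore it
mongorestore --host production.example.com --gzip --archive=seeddb.archive.gz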