Is there a faster way to clone limited mongo database - mongodb

I want to clone remote Mongo Database to my local database with limited data. I know I can do it thanks to exporting each collection then import it to my local database. But is there any faster or efficient way that I can clone the mongo database?
Thanks.

To take a database dump, you need to:
connect to the server
issue the query requesting the data you want
receive query results (as bson)
perform the minimum amount of transformations on that bson to get to the format you want to save in
mongodump with bson output and filter conditions should do these operations no less efficiently than any hand-built system you can come up with.
Same thing for importing.

Related

how to get export data from oracle to mongoDB

I'm trying to change data from oracle data base to mongodb
the problem is :
my friend give me "file.sql" that have all the tables with data
and I want to get this data and add it to my mongo database
please I don't know how to do it
MongoDB is a NoSQL database, thus the SQL file will not help you.
Ask for CSV files, then you can use the mongoimport tool to import it into your MongoDB. Please note, usually it is a poor design when you migrate from Oracle to MongoDB and move tables to collections one-by-one. Typically in a MongoDB you will have much less collections than you have tables in Oracle.

Mongodb atlas backup/export data auto

I'm using Atlas mongodb
When I try to set a back it says Turn on Backup (M10 and up) which means that I cannot backup (Logical Size928.8 KB).
Is there a different method to backup Atlas mongodb?
I know that I can use Compas to export each collection but it's tedious, I would like to have simple method to backup my data daily, preferable automatic.
Is there a similar service I can use that offer backup for smaller DB as well?

Faster way to remove 2TB of data from single collection (without sharding)

We collect a lot of data and currently decided to migrate our data from mongodb into data lake. We are going to leave in mongo only some portion of our data and use it as our operational database that keeps only newest most relevant data. We have replica set, but we don't use sharding. I suspect, if we had sharded cluster, we could achieve necessary results much simpler, but it's one-time operation, so setting up cluster just for one operation looks like very complex solution (plus I also suspect, that it will be very long running operation to convert such collection into sharded collection, but I can be completely wrong here)
One of our collections has size of 2TB right now. We want to remove old data from original database as fast as possible, but looks like standard "remove" operation is very slow, even if we use unorderedBulkOperation.
I found a few suggestions to copy data into another collection and then just drop original collection instead of trying remove data (so migrate data that we want to keep instead of removing data that we don't want to keep). There are few different ways, that I found, to copy portion of data from original collection to another collection:
Extract data and insert it into other collection one by one. Or extract portion of data and insert it in bulk using insertMany(). It looks faster than just remove data, but still not enough fast.
Use $out operator with aggregation framework to extract portions of data. It's very fast! But it extracts every portion of data into separate collections and doesn't have ability to append data in current mongodb version. So we will need to combine all exported portions of data into one final collection, what is slow again. I see that $out will be able to append data in next release of mongo (https://jira.mongodb.org/browse/SERVER-12280). But we need some solution now, and unfortunately, we won't be able to do quick update of mongo version anyway.
mongoexport / mongoimport - it exports portion of data into json file and append to another collection using import. It's quite fast too, so looks like good option.
Currently it looks like the best choice to improve performance of migration is combination of $out + mongoexport/mongoimport approaches. Plus multithreading to perform multiple described operations at once.
But is there any even faster option that I might missed?

MongoDB. Keep information about sharded collections when restoring

I am using mongodump and mongorestore in a replicated shard cluster in MongoDB 2.2. to get a backup and restore it.
First, I use mongodump for creating the dump of all the system, then I drop a concrete collection and restore it using mongorestore with the output of mongodump. After that, the collection is correct (the data it contains is correct and also the indexes), but the information about if this collection is sharded is lost. Before dropping it, the collection was sharded. After the restore, however, the collection was not sharded anymore.
I was wondering then if a way of keeping this information in backups exist. I was thinking that maybe sharded information for collection is kept in the admin database, but in the dump, admin folder is empty, and using show collections for this database I get nothing. Then I thought it could be kept in the metadata, but this would be strange, because I know that, in the metadata, the information about indexes is stored and indexes are correctly restored.
Then, I would like to know if it could be possible to keep this information using instead of mongodump + mongorestore, filesystem snapshots; or maybe still using mongodump and mongorestore but stopping the system or locking writing. I don't thing this last option could be the reason, because I am not performing writing operations while restoring even not being locking it, but just to give ideas.
I also would like to know if anyone is completely sure about if it is the case that this feature is still not available in the current version.
Any ideas?
If you are using mongodump to back up your sharded collection, are you sure it really needs to be sharded? Usually sharded collections are very large and mongodump would take too long to back it up.
What you can do to back up a large sharded collection is described here.
The key piece is to back up your config server as well as each shard - and do it as close to "simultaneously" as possible after having stopped balancing. Config DB is small so you should probably back it up very frequently anyway. Best way to back up large shards is via file snapshots.

How can I discover a mongo database's structure

I have a Mongo database that I did not create or architect, is there a good way to introspect the db or print out what the structure is to start to get a handle on what types of data are being stored, how the data types are nested, etc?
Just query the database by running the following commands in the mongo shell:
use mydb //this switches to the database you want to query
show collections //this command will list all collections in the database
db.collectionName.find().pretty() //this will show all documents in the database in a readable format; do the same for each collection in the database
You should then be able to examine the document structure.
There is actually a tool to help you out here called Variety:
http://blog.mongodb.org/post/21923016898/meet-variety-a-schema-analyzer-for-mongodb
You can view the Github repo for it here: https://github.com/variety/variety
I should probably warn you that:
It uses MR to accomplish its tasks
It uses certain other queries that could bring a production set-up to a near halt in terms of performance.
As such I recommend you run this on a development server or a hidden node of a replica or something.
Depending on the size and depth of your documents it may take a very long time to understand the rough structure of your database through this but it will eventually give one.
This will print name and its type
var schematodo = db.collection_name.findOne()
for (var key in schematodo) { print (key, typeof key) ; }
I would recommend limiting the result set rather than issuing an unrestricted find command.
use mydb
db.collectionName.find().limit(10)
var z = db.collectionName.find().limit(10)
Object.keys(z[0])
Object.keys(z[1])
This will help you being to understand your database structure or lack thereof.
This is an open-source tool that I, along with my friend, have created - https://pypi.python.org/pypi/mongoschema/
It is a Python library with a pretty simple usage. You can try it out (even contribute).
One option is to use the Mongoeye. It is open-source tool similar to the Variety.
The difference is that Mongoeye is a stand-alone program (Mongo Shell is not required) and has more features (histograms, most frequent values, etc.).
https://github.com/mongoeye/mongoeye
Few days ago I found GUI client MongoDB Compass with some nice visualizations. See the product overview. It comes directly from the mongodb people and according to their doc:
MongoDB Compass is designed to allow users to easily analyze and understand the contents of their data collections within MongoDB...
You may've asked about validation schema. Here's the answer how to get it:
How to retrieve MongoDb collection validator rules?
Use Mongo Compass
which does a sample as explained here
Which does a random sample of 1000 documents to get you the schema - it could miss something but it's the only rational option if you database is several GBs.
Visualisation
The schema then can be exported as JSON
Documentation
You can use MongoDB's tool mongodump. On running it, a dump folder is created in the directory from which you executed mongodump. In that folder, there are multiple folders that correspond to the databases in MongDB, and there are subfolders that correspond to the collections, and files that correspond to the documents.
This method is the best I know of, as you can also make out the schema of empty collections.