mongodump - query for one collection - mongodb

I'm trying to write a mongodump / mongorestore script that would copy our data from the production environment to staging once a week.
Problem is, I need to filter out one of the collections.
I was sure I'd find a way to apply a query only on a specific collection during the mongodump, but it seems like the query statement affects all cloned collections.
So currently I'm running one dump-restore for all the other collections, and one for this specific collection with a query on it.
Am I missing something? Is there a better way to achieve this goal?
Thanks!

It is possible.
--excludeCollection=<string>
Excludes the specified collection from the mongodump output. To exclude multiple collections, specify the --excludeCollection multiple times.
Example
mongodump --db=test --excludeCollection=users --excludeCollection=salaries
See Details here.
Important: by default mongodump writes to a dump/ folder in the current directory. If it's already there, its contents will be overwritten.
If you need that data, rename the folder or give mongodump an --out directory. Otherwise you don't need to worry.
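For reference, a minimal sketch of the two-pass approach described in the question, combining --excludeCollection with a filtered per-collection dump (the database name, collection name, query, and paths below are made-up examples):
# dump everything except the collection that needs filtering
mongodump --db=myapp --excludeCollection=events --out=/backups/weekly
# dump only the filtered collection; -q requires --collection
mongodump --db=myapp --collection=events -q '{"archived": false}' --out=/backups/weekly
# restore the combined dump into staging
mongorestore --host=staging.example.com /backups/weekly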

Related

Mongo Restore fails to restore time series collection with the same name

I have a time series collection in MongoDB, liveSamples. I'm trying to dump some documents from this collection and then restore them later with mongorestore into the same time series collection, liveSamples.
The problem is that mongorestore doesn't allow restoring into an existing time series collection; it asks me to use --drop, but if I use --drop I'll lose all existing data in the collection.
Note that this works with a normal collection, but not with a time series collection.
It also works with mongoexport and mongoimport, but I'm relying on the ability of mongodump and mongorestore to compress the documents with gzip, which is critical for me.
How can I overcome this?
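For context, a rough sketch of the kind of commands being described (database name, query, and paths are assumptions for illustration only):
# dump a subset of the time series collection, compressed with gzip
mongodump --db=metrics --collection=liveSamples -q '{"deviceId": "sensor-1"}' --gzip --out=dump
# restoring back into the existing liveSamples collection is where it fails,
# unless --drop is given, which would wipe the current data
mongorestore --db=metrics --collection=liveSamples --gzip dump/metrics/liveSamples.bson.gz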

How to handle databases or collection being created accidentally in mongoDB? [duplicate]

Is there a way to switch off the ability of mongo to sporadically create dbs and collections as soon as it sees one in a query? I run queries in the mongo console all the time and mistype a db or collection name, causing mongo to just create one. There should be a switch to make mongo create dbs and collections only explicitly. I can't find one in the docs.
To be clear, MongoDB does not auto-create collections or databases on queries. Collections are auto-created when you actually save data to them. You can test this yourself: run a query on a previously unknown collection in a database like this:
use unknowndb
db.unknowncollection.find()
show collections
No collection named "unknowncollection" shows up until you insert or save into it.
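For example, the collection only appears once something is written to it (a minimal sketch continuing the same session; insertOne is the newer shell helper, older shells use insert):
use unknowndb
db.unknowncollection.insertOne({ a: 1 })  // this first write creates the collection
show collections
After the insert, "unknowncollection" shows up in the list.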
Databases are a bit more complex. A simple "use unknowndb" will not auto create the database. However, if after you do that you run something like "show collections" it will create the empty database.
I agree, an option to control this behavior would be great. I'd be happy to vote for it if you open a Jira ticket with MongoDB.
No, implicit creation of collections and DBs is a feature of the console and may not be disabled. You might take a look at the security/authorization/role features of 2.6 and see if anything might help (although there's not something that exactly matches your request as far as I know).
I'd suggest looking through the MongoDB issue/feature-request database here and optionally adding the feature request if it doesn't already exist.
For people who are using Mongoose, a new database will get created automatically if your Mongoose Schema contains any form of index. This is because Mongo needs to create a database before it can insert said index.
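A minimal sketch of that Mongoose behaviour (the connection string, database name "typodb", and schema are made up for illustration; it assumes mongoose is installed and a local mongod is running):
const mongoose = require('mongoose');
// a schema with an index makes Mongoose build that index on connect,
// which in turn creates the database even though nothing was inserted
const userSchema = new mongoose.Schema({ email: { type: String, index: true } });
const User = mongoose.model('User', userSchema);
mongoose.connect('mongodb://127.0.0.1:27017/typodb');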

MongoDB. Keep information about sharded collections when restoring

I am using mongodump and mongorestore in a replicated shard cluster on MongoDB 2.2 to take a backup and restore it.
First, I use mongodump to create a dump of the whole system; then I drop a specific collection and restore it using mongorestore with the output of mongodump. After that, the collection is correct (the data it contains is correct and so are the indexes), but the information about whether this collection is sharded is lost. Before dropping it, the collection was sharded; after the restore, however, it was not sharded anymore.
I was wondering, then, whether there is a way of keeping this information in backups. I was thinking that maybe the sharding information for a collection is kept in the admin database, but in the dump the admin folder is empty, and using show collections on that database I get nothing. Then I thought it could be kept in the metadata, but this would be strange, because I know the metadata stores the index information and indexes are correctly restored.
I would also like to know whether it would be possible to keep this information by using filesystem snapshots instead of mongodump + mongorestore, or by still using mongodump and mongorestore but stopping the system or locking writes. I don't think this last point is the cause, because I am not performing write operations while restoring, even without locking, but I mention it just to give ideas.
I would also like to know if anyone is completely sure that this feature is simply not available in the current version.
Any ideas?
If you are using mongodump to back up your sharded collection, are you sure it really needs to be sharded? Usually sharded collections are very large, and mongodump would take too long to back them up.
What you can do to back up a large sharded collection is described here.
The key piece is to back up your config server as well as each shard, and to do it as close to "simultaneously" as possible after having stopped the balancer. The config DB is small, so you should probably back it up very frequently anyway. The best way to back up large shards is via filesystem snapshots.
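A rough sketch of that sequence (host names, ports, and paths are placeholders; the per-shard backups could equally be filesystem snapshots as suggested above):
# 1. stop the balancer from a mongos (run in the mongo shell):
#      sh.stopBalancer()
# 2. dump the config database from a config server
mongodump --host=configsvr.example.com --port=27019 --db=config --out=/backups/cluster/config
# 3. back up each shard, as close together in time as possible
mongodump --host=shard1.example.com --port=27018 --out=/backups/cluster/shard1
mongodump --host=shard2.example.com --port=27018 --out=/backups/cluster/shard2
# 4. re-enable the balancer (in the mongo shell):
#      sh.startBalancer()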

How can I discover a mongo database's structure

I have a Mongo database that I did not create or architect. Is there a good way to introspect the db or print out its structure, to start to get a handle on what types of data are being stored, how the data types are nested, etc.?
Just query the database by running the following commands in the mongo shell:
use mydb //this switches to the database you want to query
show collections //this command will list all collections in the database
db.collectionName.find().pretty() //this will show all documents in the database in a readable format; do the same for each collection in the database
You should then be able to examine the document structure.
There is actually a tool to help you out here called Variety:
http://blog.mongodb.org/post/21923016898/meet-variety-a-schema-analyzer-for-mongodb
You can view the Github repo for it here: https://github.com/variety/variety
I should probably warn you that:
It uses map-reduce (MR) to accomplish its tasks.
It uses certain other queries that could bring a production setup to a near halt in terms of performance.
As such I recommend you run this on a development server or a hidden node of a replica or something.
Depending on the size and depth of your documents it may take a very long time to understand the rough structure of your database through this but it will eventually give one.
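If it helps, Variety's typical invocation looks roughly like this (database and collection names are placeholders), assuming you have downloaded variety.js locally:
mongo mydb --eval "var collection = 'users'" variety.js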
This will print each field name and the type of its value:
var schematodo = db.collection_name.findOne();
for (var key in schematodo) { print(key, typeof schematodo[key]); }
I would recommend limiting the result set rather than issuing an unrestricted find command.
use mydb
db.collectionName.find().limit(10)
var z = db.collectionName.find().limit(10)
Object.keys(z[0])
Object.keys(z[1])
This will help you begin to understand your database structure, or lack thereof.
This is an open-source tool that I, along with my friend, have created - https://pypi.python.org/pypi/mongoschema/
It is a Python library with a pretty simple usage. You can try it out (even contribute).
One option is to use Mongoeye. It is an open-source tool similar to Variety.
The difference is that Mongoeye is a stand-alone program (Mongo Shell is not required) and has more features (histograms, most frequent values, etc.).
https://github.com/mongoeye/mongoeye
A few days ago I found the GUI client MongoDB Compass, which has some nice visualizations. See the product overview. It comes directly from the MongoDB people, and according to their docs:
MongoDB Compass is designed to allow users to easily analyze and understand the contents of their data collections within MongoDB...
You may also have been asking about the validation schema. Here's an answer on how to retrieve it:
How to retrieve MongoDb collection validator rules?
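For reference, a quick way to pull the validator from the shell (the collection name "users" is a placeholder):
db.getCollectionInfos({ name: "users" })[0].options.validator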
Use MongoDB Compass, which infers the schema from a sample, as explained in its documentation. It takes a random sample of 1,000 documents to derive the schema, so it could miss something, but it's the only rational option if your database is several GBs in size. The schema can be visualised in Compass and then exported as JSON.
You can use MongoDB's tool mongodump. On running it, a dump folder is created in the directory from which you executed mongodump. In that folder there is a subfolder for each database in MongoDB, and inside each of those a .bson file (plus a .metadata.json file) for each collection.
This method is the best I know of, as you can also make out the schema of empty collections.
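For example, the layout looks roughly like this (database and collection names are placeholders):
dump/
  mydb/
    users.bson
    users.metadata.json
    orders.bson
    orders.metadata.json
The .metadata.json files list each collection's indexes and options, which is how even empty collections reveal part of their schema.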

Bulk update/upsert in MongoDB?

Is it possible to do bulk update/upsert (not insert) in MongoDB?
If yes, please point me to any docs related to this?
Thanks
You can use the command-line program mongoimport; it should be in your MongoDB bin directory ...
There are two options you'll want to look into to use upsert:
--upsert                insert or update objects that already exist
--upsertFields arg      comma-separated fields for the query part of the upsert; you should make sure this is indexed
More info here: http://www.mongodb.org/display/DOCS/Import+Export+Tools
Or just do ...
$ mongoimport --help
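A minimal usage sketch (database, collection, file, and field names are made-up examples; newer mongoimport versions spell the first flag --mode=upsert instead):
mongoimport --db=mydb --collection=users --file=users.json --upsert --upsertFields=email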
mongo can execute a .js file.
You can put all your update commands in a .js file.
t.js
db.record.update({md5:"a35f10a8339ab678612d1f86be08b81a"},{$set:{algres:[]}},false,true);
db.record.update({md5:"a35f10a8339ab678612d1f86be08b81b"},{$set:{algres:[]}},false,true);
then,
mongo 127.0.0.1/test t.js
Bulk updates can also be done in batches as found in the documentation:
MongoDB Bulk Methods
I use these to import CSV files that I need to massage a bit before importing the data. It's kind of slow when dealing with updates, but it did my 50K document updates in about 83 seconds, which is still far slower than the mongoimport command.
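As a rough sketch of the Bulk API in the mongo shell (collection and field names are made-up examples):
var bulk = db.items.initializeUnorderedBulkOp();
bulk.find({ sku: "A1" }).upsert().updateOne({ $set: { qty: 10 } });
bulk.find({ sku: "B2" }).upsert().updateOne({ $set: { qty: 5 } });
bulk.execute();   // sends the queued upserts to the server in batches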