I regularly reinstate prod data into testing environments on MongoDB Atlas. I delete it and perform a mongotools restore --collection myCollection. I have good reasons.not to replace the whole dB.
Is there a way to avoid hammering the oplog during such copy and hence generate warnings on the oplog window size?
In the first instance I thought just to disable them, since it doesn't really matter from a backup point of view. However this may cause replica sync issues (which matters little too, due to the nightly nature of the job, but still doesn't feel like I'm doing the right thing).
Thank you.
Related
In our Mongo configuration we have a replica set with a primary and 2 secondaries. We currently have a collection that is about 5TB in size that we want to drop completely. From reading docs it sounds like just dropping the collection would lock the database. Seems like it might take a bit to delete 5TB and anything more than a few minutes downtime really isn't an option.
I tried deleting records a little bit at a time via query and remove commands, but this still slowed the db down to a crawl.
I've thought about taking the primary out of the set, dropping the collection and then putting it back in the set as primary, but what will the impact be of having those changes replicate to the secondaries? Is it still just going to use a ton of cpu and lock things up?
The end goal is to move all of our mongo instances to smaller disks, so it would be nice if there was an option that allowed us to tackle both the migration and the deletion of the data at the same time.
Any advice is appreciated.
I am using replica set (2 mongo, 1 arbitor) for my Sitecore CD servers.
Assuming all mongo DB data get flushed to Reporting SQL DB; do we need to take backup of MongoDB database on production CD ?
If yes what is best approach and frequency to do it; considering My application is moderately using anaytics feature (Personalization , Campaign etc).
Unfortunately, your assumption is bad - the MongoDB is the definitive source of analytic data, not the reporting db. The reporting db contains only the aggregate info needed for generating the report (mostly). In fact, if (when) something goes wrong with the SQL DB, the idea is that it is rebuilt from the source MongoDB. Remember: You can't un-add two numbers after you've added them!
Backup vs Replication
A backup is a point-in-time view of the database, where replication is multiple active copies of a current database. I would advocate for replication over backup for this type of data. Why? Glad you asked!
Currency - under what circumstance would you want to restore a 50GB MongoDB? What if it was a week old? What if it was a month? Really the only useful data is current data, and websites are volatile places - log data backups are out of date within an hour. If you personalise on stale data is that providing a good user experience?
Cost - backing up large datasets is costly in terms of time, storage capacity and compute requirements; they are also a pain to restore and the bigger they are the more likely there's a corruption somewhere
Run of business
In a production MongoDB environment you really should have 2-3 replicas. That's going to save your arse if one of the boxes dies, which they sometimes do - MongoDB works the disks very hard.
These replicas are self-healing, and always current (pretty-much) so they are much better than taking backups. The chances that you lose all your replicas at once is really low except for one particular edge case... upgrades. So a backup is really only protection against hardware failure or data corruption which, in a multi-instance replica set, is already very effectively handled. Unless you're paranoid, you're never going to use that backup and it'll cost you plenty to have it.
Sitecore Upgrades
This is the killer edge-case - always make backups (see Back Up and Restore with MongoDB Tools) before running an upgrade because you can corrupt all of your replicas in one motion and you'll want to be able to roll back.
Data Trimming (side-note)
You didn't ask this, but at some point you'll be thinking "how the heck can I back up this 170GB monster db every day? this is ridiculous" - and you'll be right.
There are various schools of thought around how long this data should be persisted for - that's a question only you or your client can answer. I suggest keeping it until there's too much, then make a decision on how much you have to get rid of. Keep as much as you can tolerate.
I am trying to reconcile this kadira.io article which says that oplog tailing is a must for every Meteor production app with this compose.io article - see section "To Oplog or not Oplog" which says you should only use oplog in certain circumstances.
Basically I have a Meteor app which does not have a high volume of users or a massive amount of continuous writing to collections.
It does however need to read a lot of data from the DB which seems to be slowing things down.
As far as I know it is only running on one server.
I am wondering will adding oplog tailing speed things up?
Thanks in advance.
Basically no matter if you do it, replica set is always doing it to keep all nodes in sync. Usually if your system is not write heavy, the tailing shouldn't be an issue because with replication working, the latest oplog should be in memory. What causes stress is usually the first round when the program tries to find where to tail from. With no index, it has to be a COLLSCAN. Other than that there's no need to worry. But it's a one time thing so as long as you know what's going on, it should be fine.
Back to your question. Yes it's running on one server. Which one depends on your readPreference and replica set tag if any. And after the first time finding the tail point, it shouldn't be a problem.
I'm wondering about experiences people have had with MongoDB backups. Assuming a filesystem snapshot is not an option, what have your experiences been with mongodump/restore versus doing a write lock and backing up the files? Have you run into any bugs with one method that caused you to switch?
From the reading I've done so far, it seems like mongodump/restore has the advantage of being able to run it while the server is live, but I'm not sure how well it will scale.
Locking and copying files is only an option when you don't have heavy write load.
mongodump can be run against live server. It will create some additional load, so don't do it on peak hours. Also, it is advised to do it on a secondary node (if you don't use replica sets, you should).
There are some complications when you have a DB so large that no single machine can hold it. See this document.
Also, if you have replica set, you take down one of secondaries and copy its files directly. See http://www.mongodb.org/display/DOCS/Backups:
A simple approach is just to stop the database, back up the data files, and resume. This is safe but of course requires downtime. This can be done on a secondary without requiring downtime, but you must ensure your oplog is large enough to cover the time the secondary is unavailable so that it can catch up again when you restart it.
In an app that uses MongoDB, when/where is the best place to make database changes that would be migrations in a relational database?
For example, how should creating indexes or setting shard keys be managed? Where should this code go?
it's probably best to do this in the shell, conciously!, because you could cause havoc if you accidentally start such a command at the wrong moment and on the wrong instance.
Most importantly: do this offline on an extra slave instance if you add an index on an existing DB! For large data sets, building an index can take hours, even days!
see also:
http://www.mongodb.org/display/DOCS/Indexes
http://www.javabeat.net/articles/print.php?article_id=353
http://www.mongodb.org/display/DOCS/Indexing+as+a+Background+Operation
http://nosql.mypopescu.com/post/1312926692/mongodb-indexes-and-indexing
If you have a large data set, make sure to read up on the 4square outage last year..!!
http://www.infoq.com/news/2010/10/4square_mongodb_outage
http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/
http://highscalability.com/blog/2010/10/15/troubles-with-sharding-what-can-we-learn-from-the-foursquare.html
one of the main reasons for not wanting to put indexing in a script or config file of some sort is that in MongoDB the index operation is blocking(!) -- that means MongoDB will stop other operations on the database from proceeding until the indexing is completed. Just imagine an innocent change in the code, requiring a new index to improve performance -- and this change is carelessly checked-in and deployed to production ... and suddenly your production MongoDB is feezing up for your app-server, because MongoDB is internally adding the new index first before doing anything else... outch! Apparently that has happened to a couple of folks, that's why they keep reminding people at the MongoDB conferences to be careful to not 'programmatically' require indexes.
New versions of MongoDB allow background indexing -- you should always do that e.g. db.yourcollection.ensureIndex(..., {background: true})
otherwise, not-so-fun stuff happens:
https://jira.mongodb.org/browse/SERVER-1341
https://jira.mongodb.org/browse/SERVER-3067