MongoDB - Safeguard against .remove() entire database?

I'm using MongoDB as my database, and as a first-time back-end developer the ease with which I can delete an entire database/collection really bothers me.
Simply typing db.collection.remove() removes all records from that collection!
I know that an effective backup strategy should render this a non-issue, but I occasionally do run .remove() on some collections, and I'd hate to type in the wrong collection name by accident and (a) have to go through a backup restore, and (b) lose whatever data I had gathered between the backup and the restore, especially as my app gathers a lot of user data.
Is there any 'safeguard' I can set up my database to use, even if it's just a warning/confirmation that says
"Yo, are you sure you want to remove everything from <collectionname>? Choose: Yes/No"

User roles won't fix your problem. If your account has permissions to delete one user, you could accidentally delete them all. If your account has permissions to update an attribute for one user, you could accidentally update all of your users.
There's a simple fix for this however.
Step 0: Backup your database. And test your backups regularly. And make sure you get alerted if the backup did not run, or errored. Replica sets are not backups. I know this is obvious, but evidently it's not obvious to everybody.
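A minimal sketch of that "alert me if the backup failed" idea, assuming mongodump is on the PATH and that cron (or whatever runs this) notifies you on a non-zero exit; the backup path and database name are placeholders:

    #!/usr/bin/env python3
    # Nightly mongodump wrapper: take the backup and fail loudly if it didn't work.
    import datetime
    import pathlib
    import subprocess
    import sys

    BACKUP_ROOT = pathlib.Path("/var/backups/mongo")  # placeholder path
    DB_NAME = "myapp"                                  # placeholder database name

    stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
    target = BACKUP_ROOT / stamp
    target.mkdir(parents=True, exist_ok=True)
    try:
        # mongodump exits non-zero on failure; check=True turns that into an exception.
        subprocess.run(["mongodump", "--db", DB_NAME, "--out", str(target)], check=True)
    except (subprocess.CalledProcessError, FileNotFoundError) as exc:
        # The non-zero exit here is what makes cron/monitoring actually tell you about it.
        print(f"BACKUP FAILED for {DB_NAME}: {exc}", file=sys.stderr)
        sys.exit(1)
    print(f"Backup of {DB_NAME} written to {target}")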
Step 1: Write a web admin GUI interface for your database. This will only take a day or two -- and it should be simple enough that a secretary or intern could use it without fear for your data. (If you think this will take a long time, find a framework with more bells and whistles. Your admin console doesn't even need to be written in the same language as your app.)
Step 2: Data migrations (maintenance transformations of your database) should always be run from scripts checked into source control and tested on non-prod beforehand. The script could be as simple as mongo -e "foo.update(blah)", but you should run it as a script to avoid cut-n-paste errors. Ideally, you would even have a checklist for all migrations. (Check that you have a recent backup. Check the database log and system load beforehand. Write a before and after query that will tell you if the migration was successful...)
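To make that concrete, here is a minimal sketch of such a checked-in migration script using pymongo; the collection, field, and transformation are made up, and the before/after counts are the sanity check the checklist asks for:

    # One-off migration: lowercase every user's email address.
    # Collection and field names are illustrative; adapt to your schema.
    from pymongo import MongoClient

    users = MongoClient("mongodb://localhost:27017")["myapp"]["users"]
    needs_fix = {"email": {"$regex": "[A-Z]"}}

    # Before: how many documents still need the transformation?
    before = users.count_documents(needs_fix)
    print(f"documents with uppercase emails before: {before}")

    # The migration itself, document by document so it can safely be re-run.
    for doc in users.find(needs_fix):
        users.update_one({"_id": doc["_id"]}, {"$set": {"email": doc["email"].lower()}})

    # After: the same query should now match nothing.
    after = users.count_documents(needs_fix)
    print(f"documents with uppercase emails after: {after}")
    assert after == 0, "migration did not reach every document it should have"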
Step 3: You no longer need to use the production Mongo console. So don't. It's a useful tool for development, but it's only needed against local development databases.
The above-mentioned Roles might be useful for read-only queries. But you can already do that against the non-master replica set member.
tl;dr: You can go pretty far using cowboy admin techniques, but eventually you're going to figure out that it's better (and not much more work) to automate everything.

There is nothing you can do in the current version to provide this functionality.
In a future version, when user-defined roles are available, you could define a role which allows insert() and update() but not remove() or drop() etc., and make yourself log in as a different, higher-privileged user for destructive work -- but that's not available in the current (2.4) version.
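For reference, user-defined roles did arrive in MongoDB 2.6, and the kind of role described above could then be created roughly like this (shown with pymongo; the database, role, and user names are made up):

    # A role that can read, insert and update, but has no remove/drop privileges.
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["myapp"]

    db.command(
        "createRole",
        "writerNoDelete",
        privileges=[{
            "resource": {"db": "myapp", "collection": ""},   # every collection in myapp
            "actions": ["find", "insert", "update"],
        }],
        roles=[],
    )

    # Day-to-day work happens as this user; destructive operations require
    # logging in separately as a higher-privileged account.
    db.command(
        "createUser",
        "app_writer",
        pwd="change-me",
        roles=[{"role": "writerNoDelete", "db": "myapp"}],
    )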

Related

Event logging for auditing with replay

I need to implement an auditing log for GDPR compliance so that we have a record of every consent given or revoked (an event) per user of our system. It has to store how & when it happened alongside things like what the wording of the consent actually was at the time.
So that we can recover from a backup restore, this log will be stored separately from our main DB. We will then need to be able to update the state of the user consent so that it accurately reflects the event log (i.e. the last known value (true/false) of each consent question per user).
I could simply do this using a second postgres instance (our main DB is postgres) with a single table to store the information and then some simple application code to log each event as well as update the main DB. There could also be some simple application logic to find the last known states of each consent from the event log and update the master DB.
To me it seems like a bit of overkill to use postgres to store this info, though adding a new technology just for this also seems like overkill. Are there any technologies that are more suitable for this sort of thing? It sounds a lot like Event Sourcing to me.
If you're already running postgres, it doesn't seem like overkill, given that it needs to be online and queryable. Something like kafka is often a natural fit for this kind of problem, but that's even more overkill.
This bears a passing resemblance to event sourcing, but on a really small scale. Event sourcing usually means that all your data is expressed in terms of events, and replayed from beginning to end to materialize the current state.
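For what it's worth, the small-scale version described in the question can be as little as one append-only table plus a "latest value per user and question" query. A rough sketch with psycopg2, where the table and column names are illustrative:

    # Append-only consent log plus the "latest known state" query.
    import psycopg2

    DDL = """
    CREATE TABLE IF NOT EXISTS consent_events (
        id          bigserial   PRIMARY KEY,
        user_id     bigint      NOT NULL,
        question    text        NOT NULL,   -- which consent question
        granted     boolean     NOT NULL,   -- the value given or revoked
        wording     text        NOT NULL,   -- exact wording shown to the user
        recorded_at timestamptz NOT NULL DEFAULT now()
    );
    """

    # Newest event wins per (user, question) -- this is what you replay into the main DB.
    LATEST_STATE = """
    SELECT DISTINCT ON (user_id, question) user_id, question, granted, recorded_at
    FROM consent_events
    ORDER BY user_id, question, recorded_at DESC, id DESC;
    """

    conn = psycopg2.connect("dbname=consent_audit")   # placeholder: the separate audit DB
    with conn, conn.cursor() as cur:
        cur.execute(DDL)
        # Events are only ever inserted, never updated or deleted.
        cur.execute(
            "INSERT INTO consent_events (user_id, question, granted, wording) VALUES (%s, %s, %s, %s)",
            (42, "marketing_emails", True, "May we email you about offers?"),
        )
    with conn, conn.cursor() as cur:
        cur.execute(LATEST_STATE)
        for row in cur.fetchall():
            print(row)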
Could you elaborate on this?:
So that we can recover from a backup restore, this log will be stored separately from our main DB.
Doesn't your main database recover from a backup / restore?

Point in time recovery for just one schema or database

I'm going to develop a multi-tenant application, where each tenant lives in its own database or schema (I've not decided this yet).
In this scenario, if I wanted to use point-in-time recovery (PITR), I also want to have it per tenant. If a tenant has a problem, I want to be able to roll back only that tenant's database or schema and not the whole server.
While I found information on how to do backup/restore in such situations with pg_dump and pg_restore, I haven't found any information about PITR.
Is this even possible? If yes, only per database or even per schema?
I can imagine that postgres maybe stores the log of the whole server in a single file, which may be the reason why it's not possible. But I may be wrong...

Liferay deploy backup with changes in database

When I modify or create new tables in Liferay 6.1 and deploy to the production server, Liferay automatically makes a backup of each table.
This backup takes a long time when a table has more than 10k records, and an eternity when it has 100k, even though the table hasn't been modified.
What can I do to optimize the deployment to the server?
Many thanks in advance,
At the moment I think only two options are available:
(easy way) Set "build.auto.upgrade=false" in /WEB-INF/src/service.properties to avoid any automatic updates, and perform the db changes (if any) manually.
(hard way) Rewrite Liferay's ServiceBuilder so that it performs an update only on those tables which were changed. This will require EXT development, as it is a very core change, and for every subsequent Liferay version you will need to carefully review it and upgrade.
Liferay automatically makes a backup? Where to? This is new to me!
Also, you're describing a "no go" operation: You don't modify any tables in the database. Period. That's what the API is there for. If you do, prepare for disaster, sooner (if you're lucky) or later (according to Murphy, right when you've forgotten that you changed things manually -- and then you'll blame Liferay for the failure that you caused by manipulating the database).
Do you have your own backup routine implemented that runs on every server restart? This is the only thing that I can imagine to happen here - in that case you'll need to modify your backup strategy. Or the database that you use - maybe a transaction log backup makes more sense than duplicating the table content into the same database...

Mongo Import on a Production DB

I have a production system which is running with thousands of user profiles. These profiles were added by us at the time of launch after which users have added/updated more data into it.
Now, we are planning to add around 20K more profiles into the system. These 20K Profiles are present in our staging mongo instance.
What would be the best approach to get these profiles onto the Production instance, considering there will be constant reads/writes happening on the system? Also, if a user exists in both systems, we want to make sure his profile is not altered, as he might have made some changes to it.
I was thinking of using mongorestore, as it guarantees to insert documents and not update them. What would be the risks involved?
Thanks
As always - test it first!
If you're unhappy with mongorestore's impact on load, then a custom Python script will give you more granular control. This will allow you to put in whatever business logic and checks you need, and potentially add some flood limitation.
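A rough sketch of what such a script could look like with pymongo; the connection strings, database/collection names, the user_id matching key, and the pause length are all assumptions to adapt:

    # Copy staging profiles into production, skipping any user who already exists there.
    import time
    from pymongo import MongoClient
    from pymongo.errors import DuplicateKeyError

    staging = MongoClient("mongodb://staging-host:27017")["app"]["profiles"]
    prod = MongoClient("mongodb://prod-host:27017")["app"]["profiles"]

    PAUSE = 0.05   # crude flood limitation: brief pause between inserts
    inserted = skipped = 0

    for profile in staging.find():
        # Existing profiles (which users may have edited) are left completely alone.
        if prod.count_documents({"user_id": profile["user_id"]}, limit=1):
            skipped += 1
            continue
        try:
            prod.insert_one(profile)
            inserted += 1
        except DuplicateKeyError:
            # The user appeared between our check and the insert; leave them be.
            skipped += 1
        time.sleep(PAUSE)

    print(f"inserted {inserted} new profiles, skipped {skipped} existing ones")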

How to apply database updates after deployment?

I know this is an often-asked question on these boards, and usually the question has been about how to manage the changes being made to the database before you even get around to deploying them. Mostly the answer has been to script the database and keep it under source control, and then save any additional updates as scripts under version control too (e.g. Tool to upgrade SQL Express database after deployment).
My question is: when is it best to apply the database updates, in the installer or when the new version first runs and connects to the database? Note this is a WinApp that is deployed to customers, each of whom has their own database.
One thing to add to the script: Back up the database (or at least the tables you're changing!) before applying the changes.
As a user I think I'd prefer it to happen during the install, and, going a little further, that the installer be able to roll itself back in the event of a failure. My thinking here is that if I am installing an update, I'd like to know when the update is done that it actually is done and has succeeded. I don't want a message coming up the next time I run it informing me that something failed and I've potentially lost all my data. I would assume that a system admin would probably also appreciate install-time feedback (of course, that doesn't matter if your web app isn't something that will be installed on a network). Also, as ראובן said, backing up the database would be a nice convenience.
You haven't said much about the architecture of the application, but since an installer is involved I assume it's a client/server application.
If you have a server installer, that's where you want to put it, since the database structure is only going to change once. Since the client installers are going to need to know about the change, it would be nice to have a way to detect the database version change, and for the old client to be able to download the client update from the server automatically and apply it.
If you only have a client installer, I still think it's better to put it there (maybe as a custom action that fires off the executable for updating the database). But it really isn't going to matter, because conceptually one installer or first-time user of the new version is going to have to fire off the changes to the database anyway. The database changes are going to put structural locks on the database so, in practical terms, everyone is going to have to be kicked off the system at that time for the database update to be applied.
Of course, this is all BS if it's not client-server.
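In either case (installer or first run), the "detect the database version, then apply pending scripts in order" step looks the same whichever side fires it off. A small sketch of that pattern; it uses sqlite3 only so the example is self-contained, and the migrations folder and its numbering scheme (0001_*.sql, 0002_*.sql, ...) are made up:

    # Apply pending, numbered migration scripts and record them in schema_version.
    import pathlib
    import sqlite3

    SCRIPTS_DIR = pathlib.Path("migrations")   # 0001_add_orders.sql, 0002_..., under version control

    def current_version(conn):
        conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
        row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
        return row[0] or 0

    def apply_pending(conn):
        version = current_version(conn)
        for script in sorted(SCRIPTS_DIR.glob("*.sql")):
            number = int(script.stem.split("_")[0])
            if number <= version:
                continue                       # already applied on this database
            print(f"applying {script.name}")
            conn.executescript(script.read_text())
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (number,))
            conn.commit()

    apply_pending(sqlite3.connect("app.db"))   # placeholder connection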