I'm going to develop a multi-tenant application, where each tenant lives in its own database or schema (I've not decided this yet).
In this scenario, if I wanted to use point in time recovery (PITR), I also want to have it per-tenant. If a tenant has a problem, I want to be able to roll back only his database or schema and not the whole server.
While I found information how to do backup/restore in such situations with pg_dump and pg_restore, I haven't found any information for PITR.
Is this even possible? If yes, only per database or even per schema?
I can imagine that postgres maybe stores the log of the whole server in a single file, which may be the reason why it could not be possible. But I may be wrong..
Related
I'm looking to deploy moodle in the cloud however I have some 50 odd sites which require access to this moodle possibly even temporarily offline. So I'm looking into replicating moodle down onto each site. From what I understand there are 2 data stores that require replication, moodledata and the database, postgresql in our case. moodledata if I'm not mistaken contains the multimedia data and the database among other things all the user records. Luckily the multimedia data will be centralized and is thus synched only one way down to the nodes, that seems doable. Where I'm stuck is how do I handle the Postgres database where the sync will need to be bidirectional?
I'm trying to set up an architecture with 2 databases, say preview and live, that have the exact same schemas. The use case is that edits can be made to the preview database and then pushed to the live database after they are vetted and approved. The production application would read from the live database.
What would be the most appropriate way to push all data from the preview database to the live database without bringing the live database down? Ideally the copy from preview to live would be an atomic transaction.
I've worked with this type of setup in MSSQL, but I'm fairly new to Postgres. So I'm open to hearing other ways to architect this (with Schemas perhaps?).
EDIT: The main reason to use separate databases is that I may need more than 1 target database (not just a single "live" database). I also may need to switch target databases on the fly without altering the source database schema.
I think what you're looking for is a "hot standby". This would be a separate instance of Postgresql, possibly on the same server but usually not, which is a near-real-time replica of the primary server.
In broad strokes, this is done by shipping the binary transaction logs from the primary server to the backup server, and then "replaying" them there. The exact mechanism for transmitting the logs may vary depending on your requirements.
Fortunately, the docs on this are excellent:
https://www.postgresql.org/docs/9.3/static/warm-standby.html
https://www.postgresql.org/docs/9.0/static/hot-standby.html
In order to secure our database we create a schema for each new customer. We then create a user for this schema and when a customer logs in via the web we use their user and hence prevent them gaining access to other areas of the database.
Our issue is with connection pooling as it is a bit inefficient to keep creating/dropping new connections for these users. We would like to have a solution that can work across many hundreds of different database users.
We've looked at pg_bouncer, but the issue here is that we have to create a text record in an ini file for each user and restart pg_bouncer every time we set up a customer. This is not a great solution.
Is there an alternative solution that works in real time and would mean a customers connection/connection(s) would stay in the pool whilst they were active?
According to the latest release notes pgbouncer might actually do this. But I haven't tried.
Pooling mode can be configured both per-database and per-user.
As for use case in general. We also had this kind of issue a while ago. We just went with connection pooling with one user/database and multiple schemas. Before running psql query we just used SET search_path TO schemaName. As for logging, we had compliance mode, when we could log activity per customer and save it in appropriate schema.
I'm using MongoDB as my database, and as a first-time back-end developer the ease with which I can delete an entire database/collection really bothers me.
Simply typing db.collection.remove() removes all records from that collection!
I know that an effective backup strategy should render this a non-issue, but I occasionally do run .remove() on some collections, and I'd hate to type in the wrong collection name by accident and (a) have to go through a backup restore, and (b) lose whatever data I had gathered between the backup and the restore, especially as my app gathers a lot of user data.
Is there any 'safeguard' I can set up my database to use, even if it's just a warning/confirmation that says
"Yo, are you sure you want to remove everything from <collectionname>? Choose: Yes/No"
User roles won't fix your problem. If your account has permissions to delete one user, you could accidentally delete them all. If your account has permissions to update an attribute for one user, you could accidentally update all of your users.
There's a simple fix for this however.
Step 0: Backup your database. And test your backups regularly. And make sure you get alerted if the backup did not run, or errored. Replica sets are not backups. I know this is obvious, but evidentally it's not obvious to everybody.
Step 1: Write a web admin GUI interface for your database. This it will only take a day or two -- and it should be simple enough that a secretary or intern could use it without fear for your data. (If you think this will take a long time, find a framework with more bells and whistles. Your admin console doesn't even need to be written in the same language as your app.)
Step 2: Data migrations (maintenance transformations of your database) should always be run from scripts checked into source control and tested on non-prod beforehand. The script could be as simple as mongo -e "foo.update(blah)", but you should run it as a script to avoid cut-n-paste errors. Ideally, you would even have a checklist for all migrations. (Check that you have a recent backup. Check the database log and system load beforehand. Write a before and after query that will tell you if the migration was successful...)
Step 3: You now no longer need to use the production Mongo console. So don't. It's a useful tool for development, but that's only needed on local development databases.
The above-mentioned Roles might be useful for read-only queries. But you can already do that against the non-master replica set member.
tl;dr: You can go pretty far using cowboy admin techniques, but eventually you're going to figure out that it's better (and not much more work) to automate everything.
There is nothing you can do in the current version to provide this functionality.
In a future version when user defined roles are available you could define a role which allows insert() and update() but not remove() or drop() etc. and therefore make yourself log-in as a different higher-role user, but that's not available in the current (2.4) version.
I want to write a C++ administrative app to simplify management of DBs I am in charge of. Currently, when I want to tell if there are users connected to multiple Firebird databases operated by 2 different instances of it, I have to connect to every single DB and check. That's ok, but I don't want to register every new database that is being created when i don't look, I want some way to list databases that are currently open or otherwise in use by the server. Current 2 uses of this functionality I can think of are:
Auto-inclusion in backup procedure
Application update, which require users to log off (one-look and I would be able to tell whom to kick or at least which department to call)
Firebird does not have an API to list all available databases. Technically Firebird simply doesn't know about the existence of a database until you actually connect to it.
You might be able to find all databases that are being connected to using the Trace API or the monitoring tables, but that does not exclude the possibility that other databases exist on your system.