To secure our database, we create a schema for each new customer. We then create a database user for that schema, and when a customer logs in via the web we connect as their user, which prevents them from gaining access to other areas of the database.
Our issue is with connection pooling, as it is inefficient to keep creating and dropping connections for these users. We would like a solution that works across many hundreds of different database users.
We've looked at pgbouncer, but the issue there is that we have to add a text entry to an ini file for each user and restart pgbouncer every time we set up a customer. This is not a great solution.
Is there an alternative that works in real time, so that a customer's connection(s) would stay in the pool while they were active?
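As a rough illustration of what the question is after (connections that stay pooled per customer while in use), here is a minimal, single-threaded Python sketch of a pool keyed by database user. The `connect(user)` factory is a hypothetical callable you would supply (for example, a wrapper around your driver's connect call), and the idle timeout is an arbitrary choice. This is not how pgbouncer works internally, and it is not thread-safe.

```python
import time
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple


class PerUserPool:
    """Keep idle connections per database user and reuse them.

    Not thread-safe; a real pool would add locking and a size cap.
    """

    def __init__(self, connect: Callable[[str], Any], idle_timeout: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect            # hypothetical factory: user -> connection
        self._idle_timeout = idle_timeout  # seconds an idle connection may live
        self._clock = clock
        self._idle: DefaultDict[str, List[Tuple[Any, float]]] = defaultdict(list)

    def acquire(self, user: str) -> Any:
        """Return an idle connection for `user`, or open a new one."""
        self._evict_stale()
        idle = self._idle[user]
        if idle:
            conn, _ = idle.pop()
            return conn
        return self._connect(user)

    def release(self, user: str, conn: Any) -> None:
        """Put a connection back, remembering when it became idle."""
        self._idle[user].append((conn, self._clock()))

    def _evict_stale(self) -> None:
        """Close connections that have been idle longer than the timeout."""
        now = self._clock()
        for idle in self._idle.values():
            fresh = []
            for conn, since in idle:
                if now - since > self._idle_timeout:
                    conn.close()
                else:
                    fresh.append((conn, since))
            idle[:] = fresh
```

Each customer with an idle connection still holds a server connection, so with hundreds of users the pool needs a cap or a short timeout to stay under max_connections.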
According to the latest release notes, pgbouncer might actually do this, but I haven't tried it.
Pooling mode can be configured both per-database and per-user.
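Per-user pooling settings go in the `[users]` section of pgbouncer.ini. Here is a small Python sketch that regenerates that section from a list of tenant roles. It assumes your pgbouncer version accepts the `pool_mode` and `max_user_connections` keys there (check its configuration docs). The function name is hypothetical.

```python
from typing import Iterable, Optional

POOL_MODES = {"session", "transaction", "statement"}


def render_users_section(users: Iterable[str], pool_mode: str = "transaction",
                         max_user_connections: Optional[int] = None) -> str:
    """Build a pgbouncer.ini [users] section with the same settings for each user."""
    if pool_mode not in POOL_MODES:
        raise ValueError(f"unknown pool_mode: {pool_mode}")
    lines = ["[users]"]
    for user in sorted(users):  # sorted so the generated file diffs cleanly
        settings = [f"pool_mode={pool_mode}"]
        if max_user_connections is not None:
            settings.append(f"max_user_connections={max_user_connections}")
        lines.append(f"{user} = {' '.join(settings)}")
    return "\n".join(lines) + "\n"
```

pgbouncer also has a `RELOAD` command on its admin console (and reloads on SIGHUP), which re-reads the configuration without a full restart. Which settings take effect on reload depends on the version.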
As for the use case in general: we had this kind of issue a while ago too. We went with connection pooling using one user/database and multiple schemas. Before running an SQL query, we simply ran SET search_path TO schemaName. For logging, we had a compliance mode in which we could log activity per customer and save it in the appropriate schema.
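Here is a minimal Python sketch of the shared-pool approach. It assumes a DB-API connection with autocommit off (psycopg2's default), where the first `execute` opens a transaction. It deliberately uses `SET LOCAL` instead of the plain `SET` mentioned above. `SET LOCAL` lasts only until the transaction ends, so a pooled connection cannot carry one tenant's search_path over to the next client. This matters with pgbouncer's transaction pooling. `quote_ident` and `tenant_transaction` are hypothetical helpers; psycopg2's `sql.Identifier` is an alternative for quoting.

```python
from contextlib import contextmanager
from typing import Any, Iterator


def quote_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier, doubling any embedded quotes.

    Unlike PostgreSQL's own quote_ident(), this always quotes, so the
    schema name is treated case-sensitively.
    """
    return '"' + name.replace('"', '""') + '"'


@contextmanager
def tenant_transaction(conn: Any, schema: str) -> Iterator[Any]:
    """Run one transaction with search_path pointing at the tenant's schema."""
    cur = conn.cursor()
    try:
        # SET LOCAL is undone at COMMIT/ROLLBACK, so it cannot leak via the pool.
        cur.execute(f"SET LOCAL search_path TO {quote_ident(schema)}")
        yield cur
        conn.commit()
    except BaseException:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Usage would look like `with tenant_transaction(conn, "acme") as cur: cur.execute("SELECT * FROM orders")`, where unqualified table names resolve in the `acme` schema.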
Related
We have a PostgreSQL database with PostGIS, and today we ran into the problem that too few connections were available. We mostly use QGIS to access the database. We noticed the problem because multiple users got the following error:
FATAL: remaining connection slots are reserved for non-replication superuser connections
When checking the number of connections in pgAdmin, I noticed something I had seen before but never paid much attention to, since it had never caused problems:
QGIS creates multiple connections to PostgreSQL for the same user to the same database.
Now I am wondering why this is the case and how I might change that behaviour.
Could this happen, for example, if a person has access rights to a database through different user groups?
One possible explanation is an issue some users run into: if you add layers to a previously created QGIS project, QGIS may ask you for your login credentials several times if they have changed. To me this suggests that different credentials are saved with the project, and therefore multiple connections might be used. Can anyone confirm or disprove this? Suggestions for a test scenario to check this are also welcome.
Any ideas, hints or solutions are welcome.
By the way: yes, we increased max_connections, but I want to understand why this happens and get closer to the root of the problem.
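The error above appears once ordinary (non-superuser) sessions have used up every slot except the ones held back by `superuser_reserved_connections` (default 3). The Python sketch below covers that arithmetic and tallies connections per user and application. It assumes rows fetched with something like `SELECT usename, application_name FROM pg_stat_activity WHERE backend_type = 'client backend'` (`backend_type` exists from PostgreSQL 10). The helper names are hypothetical, and the sketch ignores the `reserved_connections` setting added in newer releases.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Row = Tuple[Optional[str], Optional[str]]  # (usename, application_name)


def connections_by_client(rows: Iterable[Row]) -> Counter:
    """Count open sessions per (user, application) pair."""
    return Counter(rows)


def ordinary_slots(max_connections: int, superuser_reserved: int = 3) -> int:
    """Slots available to non-superuser roles before the FATAL error appears."""
    return max_connections - superuser_reserved
```

If the client sets `application_name`, calling `connections_by_client(rows).most_common(5)` shows whether the extra sessions per user come from QGIS or another tool. That is one way to test the theories in the question.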
I would like to develop a dedicated app for accessing a database developed in PostgreSQL. The app performs calculations and requests the required data from the database server.
The user can download the app from a website if he has registered. After starting the app, the user has to log in to be able to use it.
Now the question:
What would be the most sensible solution in this example?
To be honest, I don't want to create a separate role for each user.
My idea is that the app only accesses the database through a general role, for example one named "usership". With this role, a user only has well-defined read access. Users may also need to save their own settings or measured values under their user name in certain tables. Access to those tables would then only be possible with the correct user name and password, supplied with each operation on them (this would not be necessary for read-only access to other tables with general data).
The question is whether there are any limits on how many instances of the app can communicate with the database at the same time using the same database credentials, i.e. the username "usership".
I don't want to have to create a separate DB role for each customer. Somehow that doesn't seem right to me, not least because adding or deleting employees would then mean major interventions in the database (CREATE ROLE / DROP ROLE). Basically, the app should do nothing more than a website where several users are logged in at the same time. The only difference is that the app does not run in a browser, and everything happens either on the client side at the application level or on the database server.
I'm not aware of any limits on sharing usernames and passwords in Postgres. You can have hundreds or thousands of concurrent connections using the same username and password.
There can be issues with many hundreds or thousands of concurrent connections, depending on your database hardware, especially RAM.
While Postgres supports thousands of concurrent connections in theory, in practice I've run into memory issues as the number of open connections increases. If this is a problem and a large percentage of your connections are idle at any given moment, you can add a layer of connection pooling with something like pgbouncer. Keep in mind, though, that this adds another process to monitor.
In general, however, I wouldn't recommend this approach. You'd be providing direct, essentially anonymous access to your shared database. I expect it would be difficult to secure the database credentials in the client, and with direct access it would be fairly easy to construct SQL queries that take down your database server. This would be difficult to monitor or guard against, since all users would look the same. You'd also have no way to revoke access in case of abuse, short of changing the password for everyone who has access.
From a security standpoint I'd definitely recommend being able to identify your users, monitor their usage separately and revoke access individually. I don't know of any performance issues with having many thousands of separate postgres users/credentials.
-- Scalability --
Using a Postgres cluster with read replicas and load balancing (e.g. https://aws.amazon.com/premiumsupport/knowledge-center/requests-rds-read-replicas/), you should be able to scale this horizontally fairly easily if the need arises.
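Load balancing of this kind is often done in the application by sending writes to the primary and spreading reads over the replicas. Below is a naive Python sketch with hypothetical host names. It classifies statements by a leading SELECT, which misses cases such as `SELECT ... FOR UPDATE` or functions with side effects. It also ignores replication lag, so reads may briefly see stale data.

```python
import itertools
from typing import Iterable, Iterator, Optional


class ReadWriteRouter:
    """Route statements to the primary or, for reads, to replicas in turn."""

    def __init__(self, primary: str, replicas: Iterable[str]) -> None:
        self.primary = primary
        replica_list = list(replicas)
        # Round-robin over replicas; None means "no replicas, use the primary".
        self._replicas: Optional[Iterator[str]] = (
            itertools.cycle(replica_list) if replica_list else None
        )

    def route(self, sql: str) -> str:
        # Naive check: only statements starting with SELECT count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```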
I'm trying to set up an architecture with 2 databases, say preview and live, that have the exact same schemas. The use case is that edits can be made to the preview database and then pushed to the live database after they are vetted and approved. The production application would read from the live database.
What would be the most appropriate way to push all data from the preview database to the live database without bringing the live database down? Ideally the copy from preview to live would be an atomic transaction.
I've worked with this type of setup in MSSQL, but I'm fairly new to Postgres. So I'm open to hearing other ways to architect this (with Schemas perhaps?).
EDIT: The main reason to use separate databases is that I may need more than 1 target database (not just a single "live" database). I also may need to switch target databases on the fly without altering the source database schema.
I think what you're looking for is a "hot standby". This would be a separate instance of PostgreSQL, possibly on the same server but usually not, which is a near-real-time replica of the primary server.
In broad strokes, this is done by shipping the binary transaction logs from the primary server to the backup server, and then "replaying" them there. The exact mechanism for transmitting the logs may vary depending on your requirements.
Fortunately, the docs on this are excellent:
https://www.postgresql.org/docs/9.3/static/warm-standby.html
https://www.postgresql.org/docs/9.0/static/hot-standby.html
I'm going to develop a multi-tenant application, where each tenant lives in its own database or schema (I've not decided this yet).
In this scenario, if I wanted to use point in time recovery (PITR), I also want to have it per-tenant. If a tenant has a problem, I want to be able to roll back only his database or schema and not the whole server.
While I found information on how to do backup/restore in such situations with pg_dump and pg_restore, I haven't found any for PITR.
Is this even possible? If so, only per database, or even per schema?
I imagine Postgres may keep the log for the whole server in a single file, which could be why it isn't possible. But I may be wrong.
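PostgreSQL's WAL-based PITR restores the whole cluster, so per-tenant rollback usually falls back to the logical pg_dump/pg_restore route the question mentions. The Python sketch below builds the per-schema commands. It assumes the schema-per-tenant layout and uses hypothetical database and file names. It gives snapshot restores (back to when the dump ran), not point-in-time recovery. Note that pg_dump's `--schema` takes a pattern, so names containing wildcard characters need care.

```python
from typing import List


def dump_tenant_cmd(dbname: str, schema: str, outfile: str) -> List[str]:
    """pg_dump argv for one tenant schema, in custom (compressed) format."""
    return ["pg_dump", "--format=custom", f"--schema={schema}",
            f"--file={outfile}", dbname]


def restore_tenant_cmd(dbname: str, schema: str, dumpfile: str) -> List[str]:
    """pg_restore argv that drops, then recreates, the dump's objects in that schema."""
    return ["pg_restore", "--clean", "--if-exists", f"--schema={schema}",
            f"--dbname={dbname}", dumpfile]
```

Each list can be run with `subprocess.run(cmd, check=True)`. Passing a list rather than a string avoids shell quoting issues.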
Our new security policies require restricting developers' data access to the production database. Using the -RO parameter does not work, for several reasons (extracts from 'Startup command and Parameter reference', http://documentation.progress.com/output/OpenEdge102b/pdfs/dpspr/dpspr.pdf):
1) "If you use the -RO parameter when other users are updating the database, you might see invalid data, such as stale data or index entries pointing to records that have been deleted."
2) "A read-only session is essentially a single-user session. Read-only users do not share database resources (database buffers, lock table, index cursors)."
3) "When a read-only session starts, it does not check for the existence of a lock file for the database. Furthermore, a read-only user opens the database file, but not the log or before-image files. Therefore, read-only user activity does not appear in the log file."
We would like to be able to access data in the production database from OpenEdge Architect, but without being able to edit it. Is this possible?
In most security conscious companies developers are not allowed to access production. Period. Full stop.
One thing you could do as a compromise: if the need is to occasionally query data, you could give them access to a replicated database via OpenEdge Replication Plus. This is a read-only database connection without the drawbacks of -RO. It is real-time and up to date, and access is controlled separately. You could, for instance, put the replicated database on a different server on a different subnet.
The short answer is no, they can't access it directly and read-only.
If you have an AppServer, you could write some code that provides a level of dynamic read-only data access via AppServer or web service calls.
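Here is a sketch of that idea, written in Python for illustration; in OpenEdge itself this would typically be ABL code on the AppServer. The service exposes only a fixed catalogue of named queries with bound parameters, so callers cannot run arbitrary statements. `run_query` is a hypothetical callable that executes a query against the database. The account it uses should itself have only read access, so the catalogue is not the only safeguard.

```python
from typing import Any, Callable, Dict, Mapping, Sequence


class ReadOnlyGateway:
    """Expose only a fixed catalogue of named, read-only queries."""

    def __init__(self, run_query: Callable[[str, Sequence[Any]], Any],
                 queries: Mapping[str, str]) -> None:
        self._run = run_query  # hypothetical: executes a query with bound params
        self._queries: Dict[str, str] = dict(queries)

    def fetch(self, name: str, *params: Any) -> Any:
        """Run a catalogued query by name; anything else is refused."""
        try:
            sql = self._queries[name]
        except KeyError:
            raise PermissionError(f"query {name!r} is not exposed") from None
        return self._run(sql, params)
```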
The other question I'd have is - what are your developers doing accessing the production database? That should be a big no-no.