OK, so there are views one can query to dig up information on this ...
pg_tables
pg_views
v_generate_user_grant_revoke_ddl
.. and a whole AWS page on the subject:
https://aws.amazon.com/premiumsupport/knowledge-center/redshift-user-cannot-be-dropped/
However, all of these rely on you querying the various views in each database that the user 'may' have permissions in, i.e. one database at a time.
A provisioned Redshift cluster can have up to 50 databases, and Serverless, I believe, allows 100+?
Ultimately, this means that we are really going to have to write some external scripts to query each of the cluster's databases, extract the contents of multiple system/admin views, and store/consolidate the results somewhere central.
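For reference, this is the kind of statement we currently have to run in every single database, one at a time (a sketch assuming the AWS Labs admin view from amazon-redshift-utils has been created in an admin schema with its grantee/ddltype/ddl columns; 'someuser' is a placeholder):
SELECT ddl
FROM admin.v_generate_user_grant_revoke_ddl
WHERE grantee = 'someuser'
  AND ddltype = 'revoke';
The same per-database dance applies to the pg_tables / pg_views ownership checks.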
I am seriously hoping that someone has a better idea ?
Thanks for insight ....
I have one machine on which several applications are hosted. The applications work on separate data and don't interact: each application only needs access to its own data. I want to use PostgreSQL as the RDBMS. Which one of the following is best, and why?
One global Postgres sever, one global database, one schema per application.
One global Postgres server, one database per application.
One Postgres server per application.
Feel free to suggest additional architectures if you think they would be better than the ones above.
The question you need to ask yourself: does any application ever need to access data from another application (in the same SQL statement)? If you can answer that with a clear NO, then you should at least go for separate databases. Cross-database queries aren't that straightforward in Postgres, so if the different applications do need a lot of data from other applications, then solution 1 might be the deployment layout to think about. If this only concerns very few tables, then using foreign data wrappers with different databases might still be a better solution.
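To illustrate the foreign data wrapper route, a minimal sketch, assuming two databases app_a and app_b on the same server and a table orders in app_b's public schema (all names here are made up):
-- run inside database app_a
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER app_b_server
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'localhost', dbname 'app_b');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER app_b_server
  OPTIONS (user 'app_b_user', password 'secret');

-- expose just the one table rather than the whole schema
IMPORT FOREIGN SCHEMA public LIMIT TO (orders)
  FROM SERVER app_b_server INTO public;
Queries in app_a can then join against orders as if it were local, at the cost of going through the FDW on every access.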
Solution 2 and 3 are more or less the same from the perspective of each application. One thing to keep in mind when deciding between 2 and 3 is availability. Some configuration changes to Postgres require a restart of the service. Is an outage of all applications acceptable in that case, even though the change was only necessary for one?
But you can always start with option 2 and then move databases to different servers later.
Another question to ask is whether all applications will always use the same (major) Postgres version. With solution 2 you must make sure that all applications are compatible with a new Postgres version if one of them wants to upgrade, e.g. because of new features that the application wants to use.
Solution 1 is a bad idea: a SQL schema is not a database. Use SQL schemas for one application that has multiple "parts", like "production", "sales", "marketing", "finances"...
As long as the final volume of data isn't too heavy and the number of users isn't too high, use only one PG cluster to simplify administration tasks.
If the volume of data or the number of users increases, it will be time to separate your databases onto new, distinct PG clusters...
In an effort to do some basic housekeeping on our Amazon RDS (Postgresql) instance, my team hopes to drop unused or rarely used tables from our database. In Redshift, I used the stl_query table to determine which tables were accessed frequently enough to remain.
The problem is, I can't seem to figure out an equivalent strategy for Postgres. I tried checking the log files in the console, but these don't appear to have the correct info.
Aside from searching our code base for references to used tables, is there a good strategy to find unused / infrequently used tables in Postgres? If sufficient logs exist, I am willing to write some sort of parsing script to get the necessary data - I just need to find a good source.
It turns out the statistics I need live in the statistics collector views, specifically pg_stat_user_tables.
This is the query I used to find infrequently accessed tables:
SELECT
relname,
schemaname
FROM
pg_stat_user_tables
WHERE
(idx_tup_fetch + seq_tup_read) < 5; --access threshold
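A variant of the same idea counts scans instead of tuples read; a sketch on the same pg_stat_user_tables view (the threshold of 5 is arbitrary):
SELECT
    schemaname,
    relname,
    seq_scan,
    idx_scan
FROM
    pg_stat_user_tables
WHERE
    COALESCE(seq_scan, 0) + COALESCE(idx_scan, 0) < 5 -- scan-count threshold
ORDER BY
    COALESCE(seq_scan, 0) + COALESCE(idx_scan, 0);
Bear in mind these counters reset whenever the statistics are reset (e.g. via pg_stat_reset()), so check how long the stats have been accumulating before trusting low numbers.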
We have a SaaS application where each tenant has its own database in Postgres. How would I apply a patch to all the databases? For example, if I want to add a table or add a column to a table, I have to either write a program that loops through all databases and executes the SQL against each of them, or go through them one by one using pgAdmin.
Is there smarter and/or faster way?
Any help is greatly appreciated.
Yes, there's a smarter way.
Don't create a new database for each tenant. If everything is in one database then you only need to alter one database.
Pick one database, alter each table to have the column TENANT and add this to the primary key. Then insert into this database every record for all tenants and drop the other databases (obviously considerably more work than this as your application will need to be changed).
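Per table, the migration is roughly the following (a sketch only; orders, id, the default tenant value, and the orders_pkey constraint name are placeholders assuming the default naming convention):
ALTER TABLE orders ADD COLUMN tenant_id integer NOT NULL DEFAULT 1; -- backfill existing rows with a provisional tenant
ALTER TABLE orders DROP CONSTRAINT orders_pkey;
ALTER TABLE orders ADD PRIMARY KEY (tenant_id, id);
ALTER TABLE orders ALTER COLUMN tenant_id DROP DEFAULT;
After that you load each tenant's rows with their real tenant_id and retire the per-tenant databases.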
The differences with your approach are extensively discussed elsewhere:
What problems will I get creating a database per customer?
What are the advantages of using a single database for EACH client?
Multiple schemas versus enormous tables
Practicality of multiple databases per client vs one database
Multi-tenancy - single database vs multiple database
If you don't put everything in one database then I'm afraid you have to alter them all individually, and doing it programmatically would be simplest.
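If you do keep one database per tenant, one way to script the change from inside Postgres itself is a PL/pgSQL loop over dblink (a sketch only; it assumes the dblink extension is installed, the tenant databases follow a tenant_% naming pattern, and the server allows these loopback connections; the table and column are placeholders):
DO $$
DECLARE
    db text;
BEGIN
    FOR db IN SELECT datname FROM pg_database WHERE datname LIKE 'tenant_%'
    LOOP
        -- run the same DDL in every tenant database
        PERFORM dblink_exec('dbname=' || db,
            'ALTER TABLE customers ADD COLUMN IF NOT EXISTS notes text');
    END LOOP;
END
$$;
In practice most teams drive this from a migration tool or a small shell loop instead, but the idea is the same: enumerate the databases and apply an identical script to each.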
At a higher level, all multi-tenant applications follow one of three approaches:
One tenant's data lives in one database,
One tenant's data lives in one schema, or
Add a tenant_id / account_id column to your tables (shared schema).
I usually find that developers use the following criteria when they evaluate these different approaches.
Isolation: Since you can put each tenant into its own database on one hand, and have tenants share the same tables on the other, this becomes the most apparent dimension. If you provide your users raw SQL access, or you're in a regulated industry such as healthcare, you may need strict guarantees from your database. That said, PostgreSQL 9.5 comes with row level security policies, which make this less of a concern for most applications.
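As an illustration of the row level security option, a minimal sketch for a shared orders table with a tenant_id column (the app.current_tenant setting name is just an example):
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.current_tenant')::int);

-- the application sets the tenant for its connection, e.g.:
SET app.current_tenant = '42';
Non-owner roles then only see (and can only modify) rows belonging to that tenant; the table owner still bypasses the policy unless you also ALTER TABLE ... FORCE ROW LEVEL SECURITY.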
Extensibility: If your tenants are sharing the same schema (approach #3), and your tenants have fields that vary between them, then you need to think about how to merge these fields.
This article on multi-tenant databases has a great summary of different approaches. For example, you can add a dozen columns, call them C1, C2, and so forth, and have your application infer the actual data in these columns based on the tenant_id. PostgreSQL 9.4 comes with JSONB support and natively allows you to use semi-structured fields to express variations between different tenants' data.
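A small sketch of that JSONB variant (table and field names here are hypothetical):
CREATE TABLE tickets (
    id        bigserial PRIMARY KEY,
    tenant_id integer NOT NULL,
    subject   text NOT NULL,
    extra     jsonb NOT NULL DEFAULT '{}'::jsonb -- tenant-specific fields live here
);

-- one tenant stores custom priority/building fields, another may not
INSERT INTO tickets (tenant_id, subject, extra)
VALUES (1, 'Printer on fire', '{"priority": "P1", "building": "HQ"}');

SELECT subject, extra->>'priority' AS priority
FROM tickets
WHERE tenant_id = 1
  AND extra @> '{"priority": "P1"}';
A GIN index on the extra column (CREATE INDEX ON tickets USING gin (extra)) keeps containment queries like the last one fast.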
Scaling: Another criterion is how easily your database will scale out. If you create a tenant per database or schema (#1 or #2 above), your application can make use of existing Ruby gems or Django packages to simplify app integration. That said, you'll need to manually manage your tenants' data and the machines they live on. Similarly, you'll need to build your own sharding logic to propagate foreign key constraints and ALTER TABLE commands.
With approach #3, you can use existing open source scaling solutions, such as Citus. For example, this blog post describes how to easily shard a multi-tenant app with Postgres.
It's time for me to give back to the community :) So after 4 years, our multi-tenant platform is in production, and I would like to share the following observations/experiences with all of you.
We used a database for each tenant. This has given us extreme flexibility, as the individual database backups are not huge, and hence we can easily import them into our staging environment to investigate customer issues.
We use Liquibase for database development and upgrades. This has been a tremendous help to us, allowing us to package the entire build into a simple war file. All changes are easily versioned and managed very efficiently. There is a bit of a learning curve here and there, but nothing substantial; 2-5 days of learning can save you significant time.
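For anyone curious what this looks like, Liquibase changelogs can be written as plain SQL; a small hypothetical changeset (author, id, and table names are made up):
--liquibase formatted sql

--changeset rex:add-notes-column
ALTER TABLE customers ADD COLUMN notes text;
--rollback ALTER TABLE customers DROP COLUMN notes;
Liquibase records which changesets have already been applied in each database, so the same build can be pointed at any tenant and only the missing changes run.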
Given that we use Spring/JPA/Hibernate, we use a technique called Dynamic Data Source Routing. So when a user logs in, we look up the related data source and connect their session to the right database. That's also when the Liquibase scripts get applied for updates.
That's it for now; I will come back with more later on.
Well, there would certainly be problems with one database for all tenants in our case:
The backup file gets huge and becomes impractical and hard to manage.
For troubleshooting, when we need to restore a customer's data in our dev environment, we just use that customer's backup file, and the file is usually nowhere near as big as it would be if we used one database for all customers.
Again, Liquibase has been key in allowing us to manage updates across all the tenants seamlessly and without any issues. Without Liquibase, I can see lots of complications with this approach. So Liquibase, Liquibase and more Liquibase.
I also suspect that we would need more powerful hardware to manage one huge database with large joins across millions of records, versus a much lighter database with much smaller queries.
In case of problems, the service doesn't go down for everyone; the impact is limited to one or a few tenants.
In general, for our purposes, this has been a great architectural decision and we are benefiting from it every day. One time we had a customer that didn't have their archiving active, and their database size grew to over 3 GB. With offshore teams and slower internet, as well as storage/bandwidth prices, one can see how things may become complicated very quickly.
Hope this helps someone.
--Rex
I'm working on a web application where I need to warn the user that they're running out of space in the given db user's tablespace.
The application doesn't know the credentials of the db's system user, so I can't query views like dba_users, dba_free_space, etc.
My question is, is there a way in Oracle for a user to find out how much space there is left for them in their tablespace?
Thanks!
Forgive my ignorance on the subject; I believed the only views available on data storage were dba_free_space, etc.
I realized that for the logged-in user there are corresponding user_free_space, etc. views.
A modified version of the query mentioned here is the answer to my question.
The query is as follows (getting the space left in the DEFAULT_TABLESPACE of the logged-in user):
SELECT
ts.tablespace_name,
TO_CHAR(SUM(NVL(fs.bytes,0))/1024/1024, '99,999,990.99') AS MB_FREE
FROM
user_free_space fs,
user_tablespaces ts,
user_users us
WHERE
fs.tablespace_name(+) = ts.tablespace_name
AND ts.tablespace_name(+) = us.default_tablespace
GROUP BY
ts.tablespace_name;
It returns the free space in MB.
Create a stored package as a user that has the necessary privileges. You may have to create a new user. Grant EXECUTE on the package to any user that needs it. The package needs to have all the procedures and functions needed to access the DBA views, but should be coded carefully to avoid exposing "too much" information. You may want to write a second package in the account of a non-privileged user to encapsulate the logic.
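A rough outline of what such a package could look like (a sketch only; names are made up, and the package owner needs SELECT on dba_free_space granted directly, not via a role):
CREATE OR REPLACE PACKAGE space_check AS
  FUNCTION free_mb (p_tablespace IN VARCHAR2) RETURN NUMBER;
END space_check;
/
CREATE OR REPLACE PACKAGE BODY space_check AS
  FUNCTION free_mb (p_tablespace IN VARCHAR2) RETURN NUMBER IS
    v_mb NUMBER;
  BEGIN
    -- definer's rights: runs with the privileged owner's access to dba_free_space
    SELECT NVL(SUM(bytes), 0) / 1024 / 1024
      INTO v_mb
      FROM dba_free_space
     WHERE tablespace_name = UPPER(p_tablespace);
    RETURN v_mb;
  END free_mb;
END space_check;
/
GRANT EXECUTE ON space_check TO app_user;
Keep the interface narrow (a tablespace name in, a number out) so callers can't use it to browse arbitrary DBA views.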
This is potentially very complex, as it's quite possible for the user to:
Receive an "out of space" error even though the tablespaces that they have privileges on, including their default tablespace, have plenty of space. This could happen when they insert into a table that is owned by a different user which is on a tablespace that your user has no quota on. In this case, your user probably does not have access to the views required to determine whether there is free space or not,
Be able to continue inserting data even though there is no free space on the tablespaces on which they have a quota -- they might not even have a quota on their default tablespaces.
So unless you have a rather simple case, you really have to be very aware of how the user interacts with the database at a deeper level, and look at free space from a more holistic, database-wide viewpoint.
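If the warning only needs to cover the quota the user actually consumes, the user_ts_quotas view is also worth a look (a sketch; max_bytes is -1 when the quota is unlimited):
SELECT tablespace_name,
       bytes / 1024 / 1024     AS used_mb,
       max_bytes / 1024 / 1024 AS quota_mb -- -1 means unlimited
  FROM user_ts_quotas;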
Our system will run on a local network with no more than 50 clients that connect to the same local server. We are creating a DB user for each client, to take advantage of the PostgreSQL privilege system.
1) Performance-wise, is it OK to have ~50 DB users instead of reimplementing a custom permission system?
2) (SOLVED) How can a user check (with what SQL statement) what permissions they have on a table?
Solution:
SELECT HAS_TABLE_PRIVILEGE('user','table','insert')
I prefer not to reimplement the system, since a good security system isn't trivial to implement.
To answer the user/performance question: ~50 DB users is unlikely to be a problem. The only real risk would depend on how many users have unique security permissions (for example, if every one of those 50 users had different permissions on each table/schema in the database). In practice this should never happen, and as long as you have a sane group system for permissions, you should be fine.
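To keep the permission model sane with ~50 users, group roles work well; a minimal sketch (role, schema, and password values are examples):
CREATE ROLE app_readers NOLOGIN;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readers;
-- also cover tables created later by the current role
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO app_readers;

CREATE ROLE alice LOGIN PASSWORD 'changeme';
GRANT app_readers TO alice;
The 50 login roles then carry almost no individual ACL entries, and HAS_TABLE_PRIVILEGE still reports the privileges they inherit through the group.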