Maximum number of users per cluster in Redshift - amazon-redshift

We are trying to create multiple users for our Redshift cluster to implement WLM. Can anyone please tell us the maximum number of users supported by Redshift per cluster?

Although I don't know the actual limit on the number of users, I can confirm that more than 10,000 users can be created on a Redshift cluster.
dev=# select count(*) from pg_user;
count
-------
10003
(1 row)
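Since the goal here is WLM, what usually matters more than the raw user count is how users are organized into groups that WLM queues can match on. A minimal sketch (the group and user names are illustrative, not from the question):
-- Sketch only: 'etl_group' and 'etl_user' are illustrative names.
-- A WLM queue can then be configured to match the user group 'etl_group'.
CREATE GROUP etl_group;
CREATE USER etl_user PASSWORD 'StrongPassw0rd' IN GROUP etl_group;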

Related

PostgreSQL Database size is not equal to sum of size of all tables

I am using an AWS RDS PostgreSQL instance. I am using the query below to get the size of all databases.
SELECT datname, pg_size_pretty(pg_database_size(datname))
from pg_database
order by pg_database_size(datname) desc
One database's size is 23 GB, but when I ran the query below to get the sum of the sizes of all individual tables in this particular database, the result was around 8 GB.
select pg_size_pretty(sum(pg_total_relation_size(table_schema || '.' || table_name)))
from information_schema.tables
As it is an AWS RDS instance, I don't have rights on pg_toast schema.
How can I find out which database objects are consuming the space?
Thanks in advance.
The documentation says:
pg_total_relation_size ( regclass ) → bigint
Computes the total disk space used by the specified table, including all indexes and TOAST data. The result is equivalent to pg_table_size + pg_indexes_size.
So TOAST tables are covered, and so are indexes.
One simple explanation could be that you are connected to a different database than the one that is shown to be 23GB in size.
Another likely explanation would be materialized views, which consume space, but do not show up in information_schema.tables.
Yet another explanation could be that there have been crashes that left some garbage files behind, for example after an out-of-space condition during the rewrite of a table or index.
This is of course harder to debug on a hosted platform, where you don't have shell access...
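If you want to chase it down from SQL alone, here is a rough sketch using the system catalogs (no shell access needed) that lists the largest relations including materialized views, which information_schema.tables omits:
-- Sketch: largest relations in the current database, including
-- materialized views (relkind 'm'), ordered by total size.
SELECT n.nspname AS schema_name,
       c.relname AS relation_name,
       c.relkind,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'm')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;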

How does Heroku Postgres count rows for purposes of enforcing plan limits?

I have received the following email from Heroku:
The database DATABASE_URL on Heroku app [redacted] has
exceeded its allocated storage capacity. Immediate action is required.
The database contains 12,858 rows, exceeding the Hobby-dev plan limit
of 10,000. INSERT privileges to the database will be automatically
revoked in 7 days. This will cause service failures in most
applications dependent on this database.
To avoid a disruption to your service, migrate the database to a Hobby
Basic ($9/month) or higher database plan:
https://hello.heroku.com/upgrade-postgres-c#upgrading-with-pg-copy
If you are unable to upgrade the database, you should reduce the
number of records stored in it.
My Postgres database had a single table with 5,693 rows at the time I received this email, which does not match the 12,858 rows mentioned in the email. What am I missing here?
It is perhaps worth mentioning that my DB also has a view of the table mentioned above, which Heroku might be adding to the count (despite it not being an actual table). That would double the row count from 5,693 to 11,386, which still does not match the 12,858 mentioned in the email, but it is closer.
TL;DR: the rows in views DO factor into the total row count, even when the view isn't materialized, despite the fact that views do not store data.
I ran heroku pg:info and saw the line:
Rows: 12858/10000 (Above limits, access disruption imminent)
I then dropped the view I mentioned in the original post, and ran heroku pg:info again:
Rows: 5767/10000 (Above limits, access disruption imminent)
So it seems indeed that views DO get counted in the total row count, which seems rather silly, since views don't actually store any data.
I also don't know why the (Above limits, access disruption imminent) string is still present after reducing the row number below the 10000 limit, but after running heroku pg:info again a minute later, I got
Rows: 5767/10000 (In compliance)
so apparently the compliance flag is not updated at the same time as the row number.
What's even stranger is that when I later re-created the same view that I had dropped and ran heroku pg:info again, the row count did not double back up to ~11,000; it stayed at ~5,500.
It is useful to note that the following SQL command will display the row counts of the various objects in the database:
select table_schema,
table_name,
(xpath('/row/cnt/text()', xml_count))[1]::text::int as row_count
from (
select table_name, table_schema,
query_to_xml(format('select count(*) as cnt from %I.%I', table_schema, table_name), false, true, '') as xml_count
from information_schema.tables
where table_schema = 'public' --<< change here for the schema you want
) t
The above query was copy-pasted from here.
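A lighter-weight alternative (approximate only, since it relies on the statistics collector rather than counting rows) is to read the planner's live-tuple estimates:
-- Approximate per-table row counts from statistics; cheaper than count(*)
-- but not exact, and it covers tables only, not views.
SELECT schemaname, relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;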
It sounds as if the usage of your PostgreSQL database was measured at two different moments: the first with higher values (12,858 rows, over the free limit, measured at Heroku) and the second with lower values (5,693 rows, within the free limit, perhaps measured in your local environment?).
Anyway - first things first: Take a look into your PostgreSQL database at Heroku - this can be done in two ways:
Connect your local Heroku CLI with your dyno and check the info of the related PostgreSQL database
HowTo see the database related info
Log in to your Heroku web GUI and check the size and rows there
Heroku Postgres
Heroku explains the background behind their database monitoring here:
Monitoring Heroku Postgres

Postgresql: checking if values exist in full table efficiently

We have a transaction table of sales to customers with over 2,000 million rows on Redshift. Each month's transactions add 5 million rows. For MIS (the monthly 5 million rows only), I need to check whether a customer is new based on mobile number, i.e. whether the mobile number already exists in the 2,000-million-row table, without joining against the full table, so that my query remains efficient.
What I have tried:
newtable = SELECT DISTINCT(mobile_no) as mobile_no, 'old' as category FROM table
maintable = SELECT maintable.*, coalesce(nq.category, 'new')
FROM maintable as maintable
LEFT JOIN (newtable) as nq on nq.mobile_no = maintable.mobile_no;
This is very slow and takes over 50 minutes. I also tried
SELECT exists (SELECT 1 FROM newtable WHERE mobile_no=maintable.mobile_no LIMIT 1) as category, but this gives an 'out of memory' error.
Amazon Redshift is a data warehouse, so by design it won't be fast on this kind of query. If you will be doing analysis on the data and expect faster results, you might want to explore other products they offer, such as EMR, to run your queries faster.
Here is a reference on what each service is intended for: https://aws.amazon.com/big-data/datalakes-and-analytics/
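As a rough sketch of how the original approach could be restructured on Redshift (the table names here are illustrative, not from the question): materialize the distinct mobile numbers once, distributed and sorted on the join key, so each monthly run joins only the 5 million new rows rather than rescanning the full history:
-- Sketch only: 'transactions', 'monthly_transactions' and 'known_customers'
-- are illustrative names. Build once (or refresh periodically), then reuse.
CREATE TABLE known_customers
DISTKEY (mobile_no)
SORTKEY (mobile_no)
AS
SELECT DISTINCT mobile_no
FROM transactions;   -- the full 2,000-million-row history

-- Monthly MIS query: only the 5 million monthly rows are joined.
SELECT m.*,
       CASE WHEN k.mobile_no IS NULL THEN 'new' ELSE 'old' END AS category
FROM monthly_transactions m
LEFT JOIN known_customers k
       ON k.mobile_no = m.mobile_no;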

Retrieve Redshift error messages

I'm running queries on a Redshift cluster using DataGrip that take upwards of 10 hours to run and unfortunately these often fail. Alas, DataGrip doesn't maintain a connection to the database long enough for me to get to see the error message with which the queries fail.
Is there a way of retrieving these error messages later, e.g. using internal Redshift tables? Alternatively, is there a way to make DataGrip maintain the connection for long enough?
Yes, you can!
Query the stl_connection_log table to find the pid: look at the recordtime column to see when your connection was initiated; the dbname, username, and duration columns also help to narrow it down.
select * from stl_connection_log order by recordtime desc limit 100
If you can find the pid, you can query the stl_query table to check that you are looking at the right query.
select * from stl_query where pid='XXXX' limit 100
Then, check the stl_error table for your pid. This will tell you the error you are looking for.
select * from stl_error where pid='XXXX' limit 100
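If it helps, the two lookups can be combined into a single pass (a sketch only; 'XXXX' stands for the pid found in stl_connection_log):
-- Sketch: query metadata plus any errors logged for the same pid within
-- the query's time window.
SELECT q.query,
       q.starttime,
       q.endtime,
       q.aborted,
       e.recordtime AS error_time,
       e.error
FROM stl_query q
LEFT JOIN stl_error e
       ON e.pid = q.pid
      AND e.recordtime BETWEEN q.starttime AND q.endtime
WHERE q.pid = 'XXXX'
ORDER BY q.starttime DESC
LIMIT 100;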
If I’ve made a bad assumption please comment and I’ll refocus my answer.

What are the pros and cons of using database schemas in Postgres?

The app I am working on is like Flickr but with a groups concept. Each group consists of multiple users, and users can do activities like upload, share, and comment within their group only.
I am thinking of creating a schema per group to organize data under a group-name namespace in order to manage it easily and efficiently.
Will it have any adverse effect on database backup plans ?
Is there any practical limits on number of schemas per database ?
When splitting identically structured data into schemas, you need to anticipate that you will never need to query the data as a global entity again, because doing so is as cumbersome and anti-SQL as having the data spread across different tables in the same schema.
As an example, say you have 100 groups of users, in 100 schemas named group1..group100, each with a photo table.
To get the total number of photos in your system, you'd need to do:
select sum(n) FROM
(
select count(*) as n from group1.photos
UNION ALL
select count(*) as n from group2.photos
UNION ALL
select count(*) as n from group3.photos
...
UNION ALL
select count(*) as n from group100.photos
) as totals;
This sort of query or view also needs to be rebuilt every time a group is added or removed.
This is neither easy nor efficient; it's a programmer's nightmare.
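For contrast, here is a sketch of the conventional single-schema design (column names are illustrative): one photos table keyed by group_id, where both global and per-group questions stay trivial:
-- Sketch: a single photos table instead of one schema per group.
CREATE TABLE photos (
    id        bigserial PRIMARY KEY,
    group_id  integer     NOT NULL,
    owner_id  integer     NOT NULL,
    uploaded  timestamptz NOT NULL DEFAULT now()
);

-- Total photos across the whole system:
SELECT count(*) FROM photos;

-- Photos per group:
SELECT group_id, count(*) FROM photos GROUP BY group_id;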