Is it possible to know which database takes up the most storage on my disk - postgresql

I found that the largest folder under my PostgreSQL storage directory is /usr/local/var/postgres/base/209510
How can I tell which table this data belongs to?
Put another way, is it possible to know which database or table takes up the most storage? There's almost no free space left on my SSD.

To find the largest database in PostgreSQL:
SELECT datname, pg_size_pretty(pg_database_size(datname)) AS db_size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
Note that you must order by the raw byte count, not by the pretty-printed db_size column, which would sort alphabetically rather than by size.

In psql, you can use the \l+ command to get a nice summary of databases with sizes.
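To answer the first part of the question directly: the folders under base/ are named after database OIDs, so you can map the directory name from the question straight back to a database:

```sql
-- The directory name under base/ is the database's OID (209510 in the question).
SELECT datname
FROM pg_database
WHERE oid = 209510;
```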

The best way to do this is from within an SQL prompt - there are several examples of queries you can run listed at https://wiki.postgresql.org/wiki/Disk_Usage and one for the largest tables in current database is copied below for posterity.
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;
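Note that pg_relation_size counts only the table's main fork. If you want indexes and TOAST data included, which is usually what matters when the disk is full, the same wiki page has a variant built on pg_total_relation_size, roughly:

```sql
-- Largest relations including indexes and TOAST data.
SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND C.relkind <> 'i'          -- exclude indexes (already counted in the total)
  AND nspname !~ '^pg_toast'    -- exclude TOAST tables (likewise)
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
```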

Related

ERROR: could not read block 240030 in file "pg_tblspc/16387/PG_9.1_201105231/16388/16597.1": read only 0 of 8192 bytes

My friend got the following error, so I imported his idb_dumps into my system:
ERROR: could not read block 240030 in file "pg_tblspc/16387/PG_9.1_201105231/16388/16597.1": read only 0 of 8192 bytes.
In pgadmin3 I ran the following query
"select n.nspname AS tableschema, c.relname AS tablename, c.relfilenode as rel_file_node from pg_class c
inner join pg_namespace n on (c.relnamespace = n.oid) ORDER BY rel_file_node;"
But in the output, I don't see any entry for "relfilenode = 16597.1".
Please help me find the corrupted table or index.
One more thing: why is relfilenode a float value?
The query above from the question won't work unless you copy the entire database from the other system. A dump of the database files doesn't help here, which I tried earlier.
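As for the .1 suffix: relfilenode is not a float. PostgreSQL splits relations larger than 1 GB into 1 GB segment files, so "16597.1" is the second segment of the relation whose relfilenode is 16597. On PostgreSQL 9.4 and later (the error message shows 9.1, where this function does not exist yet) you could map the filenode back to a relation while connected to the affected database:

```sql
-- 16387 is the tablespace OID and 16597 the filenode from the error message.
-- Returns NULL if no relation in the current database matches.
SELECT pg_filenode_relation(16387, 16597);
```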

Delete unused indexes

I ran this query to check whether there are any unused indexes in my database.
select
t.tablename AS "relation",
indexname,
c.reltuples AS num_rows,
pg_relation_size(quote_ident(t.tablename)::text) AS table_size,
pg_relation_size(quote_ident(indexrelname)::text) AS index_size,
idx_scan AS number_of_scans,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename=c.relname
LEFT OUTER JOIN
( SELECT c.relname AS ctablename, ipg.relname AS indexname, x.indnatts AS number_of_columns, psai.idx_scan, idx_tup_read, idx_tup_fetch, indexrelname, indisunique FROM pg_index x
JOIN pg_class c ON c.oid = x.indrelid
JOIN pg_class ipg ON ipg.oid = x.indexrelid
JOIN pg_stat_all_indexes psai ON x.indexrelid = psai.indexrelid )
AS foo
ON t.tablename = foo.ctablename
WHERE t.schemaname='public'
and idx_scan = 0
ORDER BY
--1,2
--6
5 desc
;
And I got a lot of rows where those fields are all zero:
number_of_scans,
tuples_read,
tuples_fetched
Does that mean I can drop them? Is there a chance that this metadata is out of date? How can I check?
I'm using Postgres version 9.6
Your query misses some uses of indexes that do not require them to be scanned:
they enforce primary key, unique and exclusion constraints
they influence statistics collection (for “expression indexes”)
Here is my gold standard query from my blog post:
SELECT s.schemaname,
s.relname AS tablename,
s.indexrelname AS indexname,
pg_relation_size(s.indexrelid) AS index_size
FROM pg_catalog.pg_stat_user_indexes s
JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid
WHERE s.idx_scan = 0 -- has never been scanned
AND 0 <>ALL (i.indkey) -- no index column is an expression
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint c
WHERE c.conindid = s.indexrelid)
ORDER BY pg_relation_size(s.indexrelid) DESC;
Anything that shows up there has not been used since the statistics were last reset and can be safely dropped.
There are a few caveats:
statistics collection must run (look for the “statistics collector” process and see if you have warnings about “stale statistics” in the log)
run the query against your production database
if your program is running at many sites, try it on all of them (different users have different usage patterns)
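On the "is the metadata out of date" worry: you can check when the cumulative statistics for your database were last reset. If idx_scan has been accumulating over a long, representative period, the zeros are meaningful:

```sql
-- NULL means the statistics have never been reset for this database.
SELECT stats_reset
FROM pg_stat_database
WHERE datname = current_database();
```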
It is possible you can delete them; however, you should make sure your query runs after a typical workload. That is, are some indexes that show no usage in this query only used during certain times, when specialized queries run? Month-end reporting, weekly runs, etc.? We ran into this a couple of times: several large indexes didn't get used during the day but supported month-end summaries.
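If you do decide to drop one, DROP INDEX CONCURRENTLY (available on the 9.6 version mentioned in the question) avoids holding a lock that blocks writes on the table while the index is removed; idx_name below is a placeholder:

```sql
-- Must be run outside a transaction block.
DROP INDEX CONCURRENTLY idx_name;
```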

Why my empty postgres database is 7MB?

I just created a new database and it already takes up 7 MB. Do you know what is taking up this much space? Is there a way to get the "real" size of the database, as in how much data is actually stored?
0f41ba72-a1ea-4516-a9f0-de8a3609bc4a=> select pg_size_pretty(pg_database_size(current_database()));
pg_size_pretty
----------------
7055 kB
(1 row)
0f41ba72-a1ea-4516-a9f0-de8a3609bc4a=> \dt
No relations found.
Well, even though you haven't created any relations yet, the new database is not empty. When a CREATE DATABASE is issued, Postgres copies a TEMPLATE database (which comes with catalog tables) to the new database. In fact, "Nothing is created, everything is transformed". You can use the commands below to inspect this:
--Size per table
SELECT pg_size_pretty(pg_total_relation_size(oid)), relname FROM pg_class WHERE relkind = 'r' AND NOT relisshared;
--Total size
SELECT pg_size_pretty(sum(pg_total_relation_size(oid))) FROM pg_class WHERE relkind = 'r' AND NOT relisshared;
--Total size of databases
SELECT pg_size_pretty(pg_database_size(oid)), datname FROM pg_database;
A quote from the docs:
By default, the new database will be created by cloning the standard
system database template1.
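A minimal sketch of that behavior: the template can also be named explicitly, so the two statements below are equivalent ways of cloning a template (mydb and mydb2 are placeholder names):

```sql
CREATE DATABASE mydb;                      -- implicitly clones template1
CREATE DATABASE mydb2 TEMPLATE template0;  -- clone the pristine template instead
```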
An empty database contains system catalogs and The Information Schema.
Execute this query to see them:
select nspname as schema, relname as table, pg_total_relation_size(c.oid)
from pg_class c
join pg_namespace n on n.oid = relnamespace
order by 3 desc;
schema | table | pg_total_relation_size
--------------------+-----------------------------+------------------------
pg_catalog | pg_depend | 1146880
pg_catalog | pg_proc | 950272
pg_catalog | pg_rewrite | 589824
pg_catalog | pg_attribute | 581632
... etc
You can get the total size of non-system relations with the query:
select sum(pg_total_relation_size(c.oid))
from pg_class c
join pg_namespace n on n.oid = relnamespace
where nspname not in ('information_schema', 'pg_catalog', 'pg_toast');
The query returns NULL on an empty database.
Every PostgreSQL database has its own system catalog, which takes about 7 MB, so your numbers are correct. PostgreSQL is designed for client-server architecture and for databases of 1 GB and larger, so this overhead is not significant.
If you need a smaller footprint, you can try embedded databases like SQLite or Firebird.

Incremental loading into amazon redshift from local mysql database - Automation process

We are beginning to use Amazon Redshift for our reporting purposes. We are able to load our entire data onto Redshift through S3 and also manually update the data for the everyday incremental load. Now we are automating the entire process, so the scripts can run at a particular time and the data gets updated with each day's data automatically.
The method we are using for incremental load is as suggested in the documentation,
http://docs.aws.amazon.com/redshift/latest/dg/merge-create-staging-table.html
this works fine manually, but while automating the process I am not sure how to obtain the primary key for each table, based on which the existing records are updated. In short, how do I obtain the primary key field from Redshift? Is there something like "index" or some other term that can be used to obtain the primary key, or even the distkey? Thanks in advance
I'm still working on the details of the query to extract the information easily, but you can use this query
select a.attname AS "column_name", format_type(a.atttypid, a.atttypmod) AS "column_type",
format_encoding(a.attencodingtype::integer) AS "encoding", a.attisdistkey AS "distkey",
a.attsortkeyord AS "sortkey", a.attnotnull AS "notnull", a.attnum, i.*
FROM pg_namespace n
join pg_class c on n.oid = c.relnamespace
join pg_attribute a on c.oid = a.attrelid AND a.attnum > 0 AND NOT a.attisdropped
left join pg_index i on c.oid = i.indrelid and i.indisprimary='true'
WHERE
c.relname = 'mytablename'
and n.nspname='myschemaname'
order by a.attnum
to find most of the interesting things about a table. If you look at the output, the pg_index.indkey is a space delimited concatenation of the primary key columns (since it may be a compound key) expressed as the column order number which ties back to the pg_attribute.attnum column.
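If your cluster exposes the information_schema views (Redshift is derived from PostgreSQL 8.0 and provides them), a simpler way to get the primary key columns by name, in key order, is the sketch below; mytablename and myschemaname are placeholders:

```sql
-- List primary key columns of a table in key order.
SELECT kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON tc.constraint_name = kcu.constraint_name
 AND tc.table_schema = kcu.table_schema
WHERE tc.constraint_type = 'PRIMARY KEY'
  AND tc.table_name = 'mytablename'
  AND tc.table_schema = 'myschemaname'
ORDER BY kcu.ordinal_position;
```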

What are the available options to identify and remove the invalid objects in Postgres (ex: corrupted indexes)

What are the available options to identify and remove the invalid objects in Postgres
If you're referring to detecting "invalid" (poorly created) indexes: apparently Postgres can "fail" in an attempt to create an index, and then the query planner won't use it even though it exists in your system. This query will detect such "failed" indexes:
https://www.enterprisedb.com/blog/pgupgrade-bug-invalid-concurrently-created-indexes
SELECT n.nspname, c.relname
FROM pg_catalog.pg_class c, pg_catalog.pg_namespace n,
pg_catalog.pg_index i
WHERE (i.indisvalid = false OR i.indisready = false) AND
i.indexrelid = c.oid AND c.relnamespace = n.oid AND
n.nspname != 'pg_catalog' AND
n.nspname != 'information_schema' AND
n.nspname != 'pg_toast'
though I suppose detecting TOAST table indexes wouldn't hurt, so you can remove that portion of the query :)
Related: for me, sometimes just running a fresh ANALYZE on a table makes indexes suddenly start being used in production (i.e. even if indexes aren't "invalid", they may go unused until an ANALYZE runs). Weird.
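Once an invalid index has been identified, the usual fix is to rebuild it, since the planner ignores it until then (myschema.myindex is a placeholder name):

```sql
-- Rebuild in place; note this locks the table for the duration.
REINDEX INDEX myschema.myindex;
-- Alternatively, drop it and recreate it with CREATE INDEX CONCURRENTLY
-- to avoid blocking writes.
```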
Have you tried running VACUUM FULL pg_class as superuser?
Also, autovacuum should take care of it eventually. Your objects seem to be temporary tables/indexes, and the catalog is (usually) not updated as frequently as your data.