PostgreSQL FS is full but tables aren't so big

I want to check something regarding PostgreSQL performance while my app is running.
My app does the following things on 20 tables in a loop:
truncate the table.
drop the constraints on the table.
drop the indexes on the table.
insert into local_table select * from remote_oracle_table.
Recently I have been getting an error during this part:
SQLERRM = could not extend file "base/16400/124810.23": wrote only 4096 of 8192 bytes at block 3092001
create the constraints on the table.
create the indexes on the table.
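For reference, each loop iteration boils down to statements along these lines (a minimal sketch; local_table, remote_oracle_table, and the constraint/index names are placeholders, and the remote read is assumed to go through something like oracle_fdw or dblink):
TRUNCATE TABLE local_table;
ALTER TABLE local_table DROP CONSTRAINT local_table_pk;       -- hypothetical constraint name
DROP INDEX local_table_idx1;                                  -- hypothetical index name
INSERT INTO local_table SELECT * FROM remote_oracle_table;    -- this is the step that fails
ALTER TABLE local_table ADD CONSTRAINT local_table_pk PRIMARY KEY (id);   -- hypothetical key column
CREATE INDEX local_table_idx1 ON local_table (some_column);   -- hypothetical column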
This operation runs every night. Most of the tables are small (500 MB-2 GB), but a few tables are pretty big (24 GB-45 GB).
My WAL and my data directory are on different filesystems. The data directory filesystem is 400 GB. During this operation the data directory filesystem becomes full; however, after the operation finishes about 100 GB are freed, which means roughly 300 GB of the 400 GB are still in use. Something about those sizes doesn't seem right.
When I check my database size:
mydb=# SELECT
mydb-# pg_database.datname,
mydb-# pg_size_pretty(pg_database_size(pg_database.datname)) AS size
mydb-# FROM pg_database;
datname | size
-----------+---------
template0 | 7265 kB
mydb | 246 GB
postgres | 568 MB
template1 | 7865 kB
(4 rows)
When I check all the tables in the mydb database:
mydb=# SELECT
mydb-# relname as "Table",
mydb-# pg_size_pretty(pg_total_relation_size(relid)) As "Size",
mydb-# pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as "External Size"
mydb-# FROM pg_catalog.pg_statio_user_tables
mydb-# ORDER BY pg_total_relation_size(relid) DESC;
Table | Size | External Size
-------------------+------------+---------------
table 1| 45 GB | 13 GB
table 2| 15 GB | 6330 MB
table 3| 9506 MB | 3800 MB
table 4| 7473 MB | 1838 MB
table 5| 7267 MB | 2652 MB
table 6| 5347 MB | 1701 MB
table 7| 3402 MB | 1377 MB
table 8| 3092 MB | 1318 MB
table 9| 2145 MB | 724 MB
table 10| 1804 MB | 381 MB
table 11| 293 MB | 83 MB
table 12| 268 MB | 103 MB
table 13| 225 MB | 108 MB
table 14| 217 MB | 40 MB
table 15| 172 MB | 47 MB
table 16| 134 MB | 36 MB
table 17| 102 MB | 27 MB
table 18| 86 MB | 22 MB
.....
Inside the data directory, the base directory's size is 240 GB. I have 16 GB of RAM on my machine.
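To see how much of the 240 GB base directory is actually accounted for by databases, a query like this (a minimal sketch; compare the result with du -sh on the base directory itself) sums pg_database_size over every database:
mydb=# SELECT pg_size_pretty(sum(pg_database_size(datname))) AS total_db_size
mydb-# FROM pg_database;
Anything the filesystem reports beyond that sum would be temporary files or files left behind by interrupted operations (WAL is excluded here, since it lives on a separate filesystem).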

Related

Postgres import issues - huge table shows 0 rows

I have an issue with a database import. Basically, I am doing this on Postgres 9.6 (production):
/usr/bin/pg_dump mydb | /bin/gzip | /usr/bin/ssh root@1.2.3.4 "cat > /root/20210130.sql.gz"
And on a remote machine I am importing on Postgres 11 like this:
step 1: import schema from a beta machine with Postgres 11
step 2: import data from that export like this: time zcat 20210130.sql.gz | psql mydb
The issue that I have is that one of the tables has 0 rows even though it uses a lot of disk space.
In the original db:
table_schema | table_name | row_estimate | total | index | toast | table
--------------------+--------------------------+--------------+------------+------------+------------+------------
public | test | 5.2443e+06 | 18 GB | 13 GB | 8192 bytes | 4864 MB
In the new db:
table_schema | table_name | row_estimate | total | index | toast | table
--------------------+--------------------------+--------------+------------+------------+------------+------------
public | test | 0 | 4574 MB | 4574 MB | 8192 bytes | 16 kB
What is going on here? How can I fix it?
I can't import the entire DB again because the import took ~7 hours.
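If only that one table came across broken, re-importing just it should be far quicker than the full 7-hour import. A minimal sketch, assuming the table is public.test and the same hosts as above:
# on the Postgres 11 machine: empty the broken copy first
psql mydb -c 'TRUNCATE TABLE public.test;'
# on the production (9.6) machine: dump only that table's data and stream it across
/usr/bin/pg_dump --data-only --table=public.test mydb | /bin/gzip | /usr/bin/ssh root@1.2.3.4 "gunzip | psql mydb"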

TRUNCATE and VACUUM FULL don't free space on PostgreSQL server

I imported data into 30 tables (every table also has indexes and constraints) in my PostgreSQL database. After some checks I truncated all the tables and performed VACUUM FULL ANALYZE on all of them. Some space was indeed freed, but I'm sure more should have been freed. Before the import my PostgreSQL data directory was about 20 GB; after the import it grew to 270 GB. Currently the size of the data directory is 215 GB.
I ran this select:
SELECT
    relname as "Table",
    pg_size_pretty(pg_total_relation_size(relid)) As "Size",
    pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as "External Size"
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
and the result was that the biggest table is 660 MB (and right now there are only four tables whose size is bigger than 100 MB).
Table | Size | External Size
-------------------------------+------------+---------------
my_table_1 | 660 MB | 263 MB
my_Table_2 | 609 MB | 277 MB
my_table_3 | 370 MB | 134 MB
my_table_4 | 137 MB | 37 MB
my_table_5 | 83 MB | 31 MB
my_table_6 | 5056 kB | 24 kB
mariel_test_table | 4912 kB | 8192 bytes
..........
The data/base directory size is 213 GB.
I also ran this select:
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;
Output:
relation | size
-----------------------------------+--------
my_table_1 | 397 MB
my_Table_2 | 332 MB
my_Table_3 | 235 MB
my_table_7 | 178 MB
my_table_4 | 100 MB
my_table_8 | 99 MB
The outputs of the selects aren't identical.
Temp file sizes:
SELECT temp_files AS "Temporary files"
     , temp_bytes AS "Size of temporary files"
FROM pg_stat_database db;
Temporary files | Size of temporary files
-----------------+-------------------------
0 | 0
0 | 0
0 | 0
100 | 47929425920
0 | 0
I also tried to restart the PostgreSQL instance and the Linux server. What can I try next?
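Since the relations themselves only add up to a few GB, one thing worth checking is whether the missing space belongs to files PostgreSQL still knows about at all. A minimal sketch (pg_relation_filepath is a stock function; comparing its output against an ls of data/base/<database oid> is a manual step, and remember that one relation can own several files, e.g. segment suffixes like .1 and the _fsm/_vm forks):
SELECT pg_relation_filepath(oid) AS file_path,
       relname,
       pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE pg_relation_filepath(oid) IS NOT NULL
ORDER BY pg_relation_size(oid) DESC
LIMIT 20;
Large files under data/base/<oid>/ that no relation points to are orphans (typically left over from a load that was interrupted or crashed); they are not reclaimed by TRUNCATE or VACUUM FULL and would explain a data directory far larger than the sum of the relations.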

Amazon Redshift table block allocation

Our cluster is a 4-node cluster. We have a table consisting of 72 columns. When we query the svv_diskusage table to check the allocation of columns on each slice, we observe that every column has been allocated two blocks (0 and 1). But a few columns have a datatype of varchar(1), which should not occupy two blocks of space.
Is it possible that if one of the columns occupies more than one block (in the case of varchar(1500)), then the same allocation is made for all the other columns of the table? If yes, how does this affect the overall database size of the cluster?
Each Amazon Redshift storage block is 1MB in size. Each block contains data from only one column within one table.
The SVV_DISKUSAGE system view contains a list of these blocks, e.g.:
select db_id, trim(name) as tablename, col, tbl, max(blocknum)
from svv_diskusage
where name='salesnew'
group by db_id, name, col, tbl
order by db_id, name, col, tbl;
db_id | tablename | col | tbl | max
--------+------------+-----+--------+-----
175857 | salesnew | 0 | 187605 | 154
175857 | salesnew | 1 | 187605 | 154
175857 | salesnew | 2 | 187605 | 154
175857 | salesnew | 3 | 187605 | 154
175857 | salesnew | 4 | 187605 | 154
175857 | salesnew | 5 | 187605 | 79
175857 | salesnew | 6 | 187605 | 79
175857 | salesnew | 7 | 187605 | 302
175857 | salesnew | 8 | 187605 | 302
175857 | salesnew | 9 | 187605 | 302
175857 | salesnew | 10 | 187605 | 3
175857 | salesnew | 11 | 187605 | 2
175857 | salesnew | 12 | 187605 | 296
(13 rows)
The number of blocks required to store each column depends upon the amount of data and the compression encoding used for that table.
Amazon Redshift also stores the minvalue and maxvalue of the data that is stored in each block. This is visible in the SVV_DISKUSAGE table. These values are often called Zone Maps and they are used to identify blocks that can be skipped when scanning data. For example, if a WHERE clause looks for rows with a value of 5 in that column, then blocks with a minvalue of 6 can be entirely skipped. This is especially useful when data is compressed.
To investigate why your data is consuming two blocks, examine:
The minvalue and maxvalue of each block
The number of values (num_values) stored in each block
Those values will give you an idea of how much data is stored in each block, and whether that matches your expectations.
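For example, a query along these lines against SVV_DISKUSAGE exposes those per-block details for a single column (col 0 of the salesnew table is just an illustration):
select slice, blocknum, num_values, minvalue, maxvalue
from svv_diskusage
where name = 'salesnew'
and col = 0
order by slice, blocknum;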
Also, take a look at the Distribution Key (DISTKEY) used on the table. If the DISTKEY is set to ALL, then table data is replicated between multiple nodes. This could also explain your block count.
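One way to confirm the distribution style is SVV_TABLE_INFO, e.g. (a sketch, using the same table name as above):
select "table", diststyle, size, tbl_rows
from svv_table_info
where "table" = 'salesnew';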
Finally, if data has been deleted from the table, then old values might be consuming disk space. Run a VACUUM command on the table to remove deleted data.
A good reference is: Why does a table in my Amazon Redshift cluster consume more disk storage space than expected?

How to get database size accurately in PostgreSQL?

We are running PostgreSQL version 9.1. Previously we had over 1 billion rows in one table, and they have since been deleted. However, it looks like the \l+ command still reports the database size inaccurately (it reports 568 GB, but in reality it's much, much less than that).
The proof that 568 GB is wrong is that the individual table sizes don't add up to that number: as you can see, the top 20 relations total 4292 MB, and the remaining 985 relations are all well below 10 MB. In fact, all of them add up to less than about 6 GB.
Any idea why PostgreSQL has so much bloat? If that's confirmed, how can I remove it? I am not super familiar with VACUUM; is that what I need to do? If so, how?
Much appreciated.
pmlex=# \l+
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
-----------------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
pmlex | pmlex | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 568 GB | pg_default |
pmlex_analytics | pmlex | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 433 MB | pg_default |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 5945 kB | pg_default | default administrative connection database
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +| 5841 kB | pg_default | unmodifiable empty database
| | | | | postgres=CTc/postgres | | |
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +| 5841 kB | pg_default | default template for new databases
| | | | | postgres=CTc/postgres | | |
(5 rows)
pmlex=# SELECT nspname || '.' || relname AS "relation",
pmlex-# pg_size_pretty(pg_relation_size(C.oid)) AS "size"
pmlex-# FROM pg_class C
pmlex-# LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
pmlex-# WHERE nspname NOT IN ('pg_catalog', 'information_schema')
pmlex-# ORDER BY pg_relation_size(C.oid) DESC;
relation | size
-------------------------------------+---------
public.page_page | 1289 MB
public.page_pageimagehistory | 570 MB
pg_toast.pg_toast_158103 | 273 MB
public.celery_taskmeta_task_id_key | 233 MB
public.page_page_unique_hash_uniq | 140 MB
public.page_page_ad_text_id | 136 MB
public.page_page_kn_result_id | 125 MB
public.page_page_seo_term_id | 124 MB
public.page_page_kn_search_id | 124 MB
public.page_page_direct_network_tag | 124 MB
public.page_page_traffic_source_id | 123 MB
public.page_page_active | 123 MB
public.page_page_is_referrer | 123 MB
public.page_page_category_id | 123 MB
public.page_page_host_id | 123 MB
public.page_page_serp_id | 121 MB
public.page_page_domain_id | 120 MB
public.celery_taskmeta_pkey | 106 MB
public.page_pagerenderhistory | 102 MB
public.page_page_campaign_id | 89 MB
...
...
...
pg_toast.pg_toast_4354379 | 0 bytes
(1005 rows)
Your options include:
1). Ensuring autovacuum is enabled and set aggressively.
2). Recreating the table as I mentioned in an earlier comment (create-table-as-select + truncate + reload the original table).
3). Running CLUSTER on the table if you can afford to be locked out of that table (exclusive lock); see the sketch after this list.
4). VACUUM FULL, though CLUSTER is more efficient and recommended.
5). Running a plain VACUUM ANALYZE a few times and leaving the table as-is, to eventually fill the space back up as new data comes in.
6). Dump and reload the table via pg_dump
7). pg_repack (though I haven't used it in production)
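For options 3 and 4 the commands themselves are short (a sketch with a hypothetical index name; both take an exclusive lock on the table for the duration of the rewrite):
-- CLUSTER rewrites the table in the order of the given index and discards the bloat
CLUSTER public.page_page USING page_page_pkey;   -- page_page_pkey is a hypothetical index name
-- VACUUM FULL rewrites the table without reordering it
VACUUM FULL ANALYZE public.page_page;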
It will likely look different if you use pg_total_relation_size instead of pg_relation_size; pg_relation_size doesn't give the total size of the table, see
https://www.postgresql.org/docs/9.5/static/functions-admin.html#FUNCTIONS-ADMIN-DBSIZE
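In other words, a variant like this sketch, which uses pg_total_relation_size (and skips indexes, since an index is already counted inside its table's total along with TOAST data), gives a better picture of where the space actually is:
SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_total_relation_size(C.oid)) AS "total size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND C.relkind <> 'i'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;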

How can pg_column_size be smaller than octet_length?

I'm looking to estimate the anticipated table size from the column types and their length values. I'm trying to use pg_column_size for this.
When testing the function, I realized something seems wrong with it.
The value returned by pg_column_size(...) is sometimes even smaller than the value returned by octet_length(...) on the same string.
There is nothing but numeric characters in the column.
postgres=# \d+ t5
Table "public.t5"
Column | Type | Modifiers | Storage | Stats target | Description
--------+-------------------+-----------+----------+--------------+-------------
c1 | character varying | | extended | |
Has OIDs: no
postgres=# select pg_column_size(c1), octet_length(c1) as octet from t5;
pg_column_size | octet
----------------+-------
2 | 1
704 | 700
101 | 7000
903 | 77000
(4 rows)
Is this a bug or something? Does someone have a formula to calculate the anticipated table size from the column types and their length values?
I'd say pg_column_size is reporting the compressed size of TOASTed values, while octet_length is reporting the uncompressed sizes. I haven't verified this by checking the function source or definitions, but it'd make sense, especially as strings of numbers will compress quite well. You're using EXTENDED storage so the values are eligible for TOAST compression. See the TOAST documentation.
As for calculating expected DB size, that's a whole new question. As you can see from the following demo, it depends on things like how compressible your strings are.
Here's a demonstration showing how octet_length can be bigger than pg_column_size, demonstrating where TOAST kicks in. First, let's get the results on query output where no TOAST comes into play:
regress=> SELECT octet_length(repeat('1234567890',(2^n)::integer)), pg_column_size(repeat('1234567890',(2^n)::integer)) FROM generate_series(0,12) n;
octet_length | pg_column_size
--------------+----------------
10 | 14
20 | 24
40 | 44
80 | 84
160 | 164
320 | 324
640 | 644
1280 | 1284
2560 | 2564
5120 | 5124
10240 | 10244
20480 | 20484
40960 | 40964
(13 rows)
Now let's store that same query output into a table and get the size of the stored rows:
regress=> CREATE TABLE blah AS SELECT repeat('1234567890',(2^n)::integer) AS data FROM generate_series(0,12) n;
SELECT 13
regress=> SELECT octet_length(data), pg_column_size(data) FROM blah;
octet_length | pg_column_size
--------------+----------------
10 | 11
20 | 21
40 | 41
80 | 81
160 | 164
320 | 324
640 | 644
1280 | 1284
2560 | 51
5120 | 79
10240 | 138
20480 | 254
40960 | 488
(13 rows)
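If the goal is sizing estimates where pg_column_size should track octet_length closely, one option is to turn off compression for that column by switching its storage from EXTENDED to EXTERNAL (out-of-line storage is still allowed, but no compression); a sketch against the demo table above:
-- EXTERNAL = out-of-line storage allowed, compression disabled,
-- so the stored size stays close to octet_length plus per-value overhead
ALTER TABLE blah ALTER COLUMN data SET STORAGE EXTERNAL;
Note that changing the storage mode only affects values written afterwards; existing rows keep their current representation until they are rewritten.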