Temp files in PostgreSQL - postgresql

My database has grown to 41 GB, but a backup of it is only 2.4 to 2.5 GB. Why is there such a big difference?
I have run VACUUM, reindexing, etc., but the size is still not shrinking. The statistics show 19 GB of temp files, yet when I check C:\Program Files\PostgreSQL\10\data\base\pgsql_tmp the folder is empty.
How can I delete those temp files? Is there a way to reduce my DB size?
EDIT
Query to get temp files
SELECT temp_files AS "Temporary files"
, temp_bytes AS "Size of temporary files"
FROM pg_stat_database db;

The values in pg_stat_database are cumulative, so temp_bytes is not the number of bytes currently in use. It is the number of bytes written to temp files since the statistics were last reset.
As this is a cumulative number, it will increase constantly and will never shrink unless you reset the statistics.
You can reset those statistics (as the superuser) using
select pg_stat_reset();
It seems that your queries are indeed using many temp files and a lot of temp space. That could indicate that you have not given Postgres enough memory (e.g. work_mem or temp_buffers). But that is a different question.
Those temp files are also not part of a database dump, as they are only used during query execution.

Related

PostgreSQL Database size is not equal to sum of size of all tables

I am using an AWS RDS PostgreSQL instance. I am using the query below to get the size of all databases.
SELECT datname, pg_size_pretty(pg_database_size(datname))
from pg_database
order by pg_database_size(datname) desc
One database's size is 23 GB, but when I ran the query below to get the sum of the sizes of all individual tables in that database, it was around 8 GB.
select pg_size_pretty(sum(pg_total_relation_size(table_schema || '.' || table_name)))
from information_schema.tables
As it is an AWS RDS instance, I don't have rights on the pg_toast schema.
How can I find out which database object is consuming the space?
Thanks in advance.
The documentation says:
pg_total_relation_size ( regclass ) → bigint
Computes the total disk space used by the specified table, including all indexes and TOAST data. The result is equivalent to pg_table_size + pg_indexes_size.
So TOAST tables are covered, and so are indexes.
One simple explanation could be that you are connected to a different database than the one that is shown to be 23GB in size.
Another likely explanation would be materialized views, which consume space, but do not show up in information_schema.tables.
Yet another explanation could be that there have been crashes that left some garbage files behind, for example after an out-of-space condition during the rewrite of a table or index.
This is of course harder to debug on a hosted platform, where you don't have shell access...

PostgreSQL "pg_prewarm" buffer size

The table orders contains a total of 1,500,000 tuples. After a fresh restart of the system, I ran the following queries:
SELECT pg_prewarm('orders');
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE o_totalprice < 100
This gave the following buffer output:
Buffers: shared hit=15768 read=10327
The select statement returns no records.
Now my question is: how did PostgreSQL arrive at 15768 blocks being found in the buffer cache?
Your shared_buffers is set to 128MB, right?
128 MB of shared buffers translates to 16384 blocks of size 8KB in the cache.
So when you run pg_prewarm('orders'), PostgreSQL will read the complete table into shared buffers. Now the table is bigger than your shared_buffers, so the first blocks “drop out” of the cache again when the last blocks are read, because shared_buffers cannot fit them all.
Increase shared_buffers if you want to have the whole table in the cache.
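The arithmetic behind this can be checked with a short sketch. It assumes the default 8 kB block size and the 128 MB shared_buffers guessed above; the variable names are illustrative only, and the hit/read counts are copied from the EXPLAIN output in the question.

```python
# Assumptions: default PostgreSQL block size (8 kB) and shared_buffers = 128 MB,
# as guessed in the answer. Hit/read counts come from the question's EXPLAIN output.
BLOCK_SIZE = 8 * 1024
SHARED_BUFFERS = 128 * 1024 * 1024

cache_blocks = SHARED_BUFFERS // BLOCK_SIZE    # blocks that fit in shared buffers
hit, read = 15768, 10327                       # "Buffers: shared hit=15768 read=10327"
table_blocks = hit + read                      # blocks touched by the full table scan
table_mb = table_blocks * BLOCK_SIZE / (1024 * 1024)

print(cache_blocks)          # 16384
print(table_blocks)          # 26095
print(round(table_mb, 1))    # 203.9
```

Under these assumptions the table is about 204 MB, well above 128 MB, so only part of it can stay cached. The hit count is a little below 16384, presumably because some buffers hold other data.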

Select * from table_name is running slow

The table contains around 700,000 rows. Is there any way to make the query run faster?
This table is stored on a server.
I have tried to run the query by taking the specific columns.
If select * from table_name is unusually slow, check for these things:
Network speed. How large is the data and how fast is your network? For large queries you may want to think about your data in bytes instead of rows. Run select bytes/1024/1024/1024 gb from dba_segments where segment_name = 'TABLE_NAME'; and compare that with your network speed.
Row fetch size. If the application or IDE is fetching one-row-at-a-time, each row has a large overhead with network lag. You may need to increase that setting.
Empty segment. In a few weird cases the table's segment size can increase and never shrink. For example, if the table used to have billions of rows, and they were deleted but not truncated, the space would not be released. Then a select * from table_name may need to read a lot of empty extents to get to the real data. If the GB size from the above query seems too large, run alter table table_name move; to rebuild the table and possibly save space.
Recursive query. A query that simple almost couldn't have a bad execution plan. It's possible, but rare, for a recursive query to have a bad execution plan. While the query is running, look at select * from gv$sql where users_executing > 0;. There might be a data dictionary query that's really slow and needs to be tuned.

Evaluate how much space will be freed by VACUUM in Redshift

According to AWS doc:
Amazon Redshift does not automatically reclaim and reuse space that is freed when you delete rows and update rows.
Before running VACUUM, is there a way to know or estimate how much disk space will be freed by the VACUUM?
Thx
References:
http://docs.aws.amazon.com/redshift/latest/dg/t_Reclaiming_storage_space202.html
http://docs.aws.amazon.com/redshift/latest/dg/r_VACUUM_command.html
You can calculate the amount of storage that will be freed up from a vacuum command by looking up the tbl_rows column in the svv_table_info view. This includes rows that are marked for deletion. Compare that to a select count(*) from the same table and you'll have a ratio. Something like this on a theoretical table named factsales.
select (select cast(count(*) as numeric(12,0)) from factsales) /
cast(tbl_rows as numeric(12,0))
as "percentage of non deleted rows"
from svv_table_info where "table" = 'factsales'
There doesn't appear to be a straightforward way to execute dynamic SQL and cursors, so to get this same ratio across all tables you'd have to execute the code from an external source or programming language, e.g. Python.
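Once tbl_rows and the count(*) result have been fetched for each table, the ratio itself is easy to compute outside the database. The following is only a sketch: the function name and the row counts are made up for illustration, and fetching the numbers from Redshift is left out. The fraction of deleted rows is a rough proxy for reclaimable space, not an exact figure.

```python
def deleted_fraction(tbl_rows: int, live_rows: int) -> float:
    """Fraction of stored rows that are marked for deletion.

    tbl_rows  -- tbl_rows from svv_table_info (includes rows marked for deletion)
    live_rows -- result of select count(*) on the same table
    """
    if tbl_rows <= 0:
        return 0.0          # empty table: nothing to reclaim
    return (tbl_rows - live_rows) / tbl_rows


# Hypothetical numbers, not taken from a real cluster.
tables = {
    "factsales": (1_000_000, 750_000),
    "dimcustomer": (40_000, 40_000),
}

for name, (tbl_rows, live_rows) in tables.items():
    print(name, f"{deleted_fraction(tbl_rows, live_rows):.1%}")
```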
It's not an extremely accurate way, but you can query svv_table_info and look for the column deleted_pct. This will give you a rough idea, in percentage terms, of what fraction of the table needs to be rebuilt using vacuum.
You can run it for all the tables in your system to get this estimate for the whole system.

PostgreSQL - restored database smaller than original

I have made a backup of my PostgreSQL database using pg_dump to ".sql" file.
When I restored the database, its size was 2.8GB compared with 3.7GB for the source (original) database. The application that accesses the database appears to work fine.
What is the reason for the smaller size of the restored database?
The short answer is that database storage is more optimised for speed than space.
For instance, if you inserted 100 rows into a table, then deleted every row with an odd numbered ID, the DBMS could write out a new table with only 50 rows, but it's more efficient for it to simply mark the deleted rows as free space and reuse them when you next insert a row. Therefore the table takes up twice as much space as is currently needed.
Postgres's use of "MVCC", rather than locking, for transaction management makes this even more likely, since an UPDATE usually involves writing a new row to storage, then marking the old row deleted once no transactions are looking at it.
By dumping and restoring the database, you are recreating a DB without all this free space. This is essentially what the VACUUM FULL command does - it rewrites the current data into a new file, then deletes the old file.
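The effect can be illustrated with a toy model. This is not how PostgreSQL stores data. It is a simplified sketch in which a table file is a list of slots that never shrinks, and the class and method names are invented for the example. Real VACUUM can sometimes give back empty pages at the end of a file, which the model ignores.

```python
class ToyTable:
    """Toy model of a table file: a list of slots that only ever grows."""

    def __init__(self):
        self.slots = []              # each slot is "live", "dead" or "free"

    def insert(self):
        # Reuse a free slot if one exists, otherwise extend the file.
        for i, state in enumerate(self.slots):
            if state == "free":
                self.slots[i] = "live"
                return i
        self.slots.append("live")
        return len(self.slots) - 1

    def delete(self, i):
        self.slots[i] = "dead"       # not reusable until vacuum has run

    def update(self, i):
        # MVCC-style update: write a new row version, then mark the old one dead.
        new = self.insert()
        self.delete(i)
        return new

    def vacuum(self):
        # Dead slots become reusable, but the file is not truncated.
        self.slots = ["free" if s == "dead" else s for s in self.slots]

    def live(self):
        return self.slots.count("live")


original = ToyTable()
ids = [original.insert() for _ in range(100)]
for i in ids[::2]:
    original.delete(i)               # delete every second row
original.vacuum()

# "Dump and restore": copy only the live rows into a fresh table.
restored = ToyTable()
for _ in range(original.live()):
    restored.insert()

print(len(original.slots), original.live())   # 100 50
print(len(restored.slots), restored.live())   # 50 50
```

Both tables hold the same 50 rows, but the original still occupies 100 slots, while the restored copy needs only 50.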
There is an extension distributed with Postgres called pg_freespacemap which lets you examine some of this, e.g. you can list the main table size (not including indexes and columns stored in separate "TOAST" tables) and the free space in each table with the query below:
Select oid::regclass::varchar as table,
pg_size_pretty(pg_relation_size(oid)/1024 * 1024) As size,
pg_size_pretty(sum(free)) As free
From (
Select c.oid,
(pg_freespace(c.oid)).avail As free
From pg_class c
Join pg_namespace n on n.oid = c.relnamespace
Where c.relkind = 'r'
And n.nspname Not In ('information_schema', 'pg_catalog')
) tbl
Group By oid
Order By pg_relation_size(oid) Desc, sum(free) Desc;
The reason is simple: during its normal operation, when rows are updated, PostgreSQL adds a new copy of the row and marks the old copy of the row as deleted. This is multi-version concurrency control (MVCC) in action. Then VACUUM reclaims the space taken by the old row for data that can be inserted in the future, but doesn't return this space to the operating system as it's in the middle of a file. Note that VACUUM isn't executed immediately, only after enough data has been modified in the table or deleted from the table.
What you're seeing is entirely normal. It just shows that a PostgreSQL database will be larger in size than the sum of the sizes of its rows. Your new database will most likely eventually grow to 3.7GB when you start actively using it.