Postgres data deletion not reducing the table size

I have a Postgres database whose size keeps increasing; the data is ~70 GB and the indexes ~305. We planned a cleanup of older data, but even after deleting 14 months of data and keeping only the last 6 months, the data and index sizes don't seem to be freeing up.
Please suggest anything I might be missing, as I am new to Postgres.
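In Postgres, DELETE only marks rows as dead; a plain VACUUM makes that space reusable for new rows but does not return it to the operating system. To actually shrink the files you need VACUUM FULL (which rewrites the table under an exclusive lock) or an online tool such as pg_repack. A minimal sketch, assuming a table named my_table (a placeholder name):

```sql
-- Check table and index sizes (the table name is a placeholder)
SELECT pg_size_pretty(pg_table_size('my_table'))   AS table_size,
       pg_size_pretty(pg_indexes_size('my_table')) AS index_size;

-- Reclaim disk space; VACUUM FULL rewrites the table and takes an
-- ACCESS EXCLUSIVE lock, so run it in a maintenance window.
VACUUM (FULL, ANALYZE) my_table;

-- Rebuild a bloated index without blocking writes (Postgres 12+)
REINDEX INDEX CONCURRENTLY my_table_some_idx;
```

If the table cannot be locked for that long, pg_repack performs an equivalent rewrite with only brief locks.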

Related

Postgres work_mem tuning

I have 2 Postgres instances in Azure.
Every hour I see:
1 temporary file created in one instance (about 8 MB max size)
4 temporary files created in the other instance (about 11 MB max size)
I thought this was frequent enough to justify increasing work_mem, so I raised it to 8 MB and 12 MB respectively in the two instances, but temporary files were still created.
This time each instance had one temporary file of 16 MB. This behavior confuses me:
I expected temporary file creation to stop.
I tried to refer to: https://pganalyze.com/docs/log-insights/server/S7
Are a few temporary files every hour not a big deal?
Should I not tune work_mem?
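One detail worth knowing: work_mem is a per-sort/per-hash limit, not a per-query one, and a spill writes a temporary file of roughly the size that no longer fit in memory, so raising work_mem slightly can just raise the size at which the spill is reported. A hedged sketch of how to find the queries that spill before tuning further:

```sql
-- Log every temporary file together with the statement that caused it
-- (0 means "log temp files of any size"; requires superuser/admin role)
ALTER SYSTEM SET log_temp_files = 0;
SELECT pg_reload_conf();

-- work_mem can also be raised for a single session or query
-- instead of globally, which is safer on small instances:
SET work_mem = '32MB';
```

A few small temporary files per hour are usually harmless; tune work_mem only if the logs show the same expensive query spilling repeatedly.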

Amazon RDS PostgreSQL: Sudden increase in Read IOPS

We are using Amazon RDS to host our PostgreSQL databases. Our production instance (db.t3.xlarge, Single-AZ) was running smoothly until suddenly Read IOPS, Read Latency, Read Throughput and Disk Queue Depth metrics in the AWS console increased rapidly and stayed high afterward (with a lower variability) whereas Write IOPS and Write Throughput were normal.
(Graphs attached: Read IOPS, Read Throughput, Disk Queue Depth, Write IOPS.)
There were no code changes or deployments on the date of the increase. There were no significant increases in user activity either.
About our DB structure: we have a single table that holds all of our data, with these fields: id as UUID (primary key), type as VARCHAR, data as JSONB (holds the actual data), and createdAt and updatedAt as timestamp with time zone. Most of our data values are larger than 2 KB, so most rows are stored in the TOAST table. We have 20 B-tree indexes created for frequently used fields inside the JSONB column.
So far we have tried VACUUM ANALYZE and also completely rebuilding our table: creating a new table, copying all data from the old table, creating all indexes. They didn't change the behavior.
We also tried increasing storage thus increasing IOPS performance. It helped a bit but it is still not the same as before.
What could be the root cause of this problem? How can we fix it permanently (without increasing storage or instance type)? For now, we are looking for easy changes and we will improve our data model in the future.
T3 instances are not suitable for production. Try moving to another family like a C or M type. You may have hit some burst limits that are now causing odd behaviour
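Beyond the instance family, it can help to confirm where the extra reads are actually landing (heap, indexes, or the TOAST table, which this schema uses heavily). The statistics views can show that; a sketch, no assumptions beyond a stock Postgres install:

```sql
-- Disk block reads vs. cache hits per table, including TOAST;
-- a hit ratio that dropped after the incident suggests the working
-- set no longer fits in shared_buffers / OS cache.
SELECT relname,
       heap_blks_read,  heap_blks_hit,
       idx_blks_read,   idx_blks_hit,
       toast_blks_read, toast_blks_hit
FROM pg_statio_user_tables
ORDER BY heap_blks_read + idx_blks_read + toast_blks_read DESC
LIMIT 10;
```

If TOAST reads dominate, every fetch of a large JSONB value is paying extra I/O, which matches the symptom of read-heavy metrics with normal writes.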

PostgreSQL: problem with a large number of partitions when using hash partitioning

I have a very large database with more than 1.5 billion records for device data and growing.
I manage this by having a separate table per device: about 1000 devices (tables), plus an index table for daily stats. Some devices produce much more data than others, so some tables have more than 20 million rows and others fewer than 1 million.
I use indexes, but queries and data processing get very slow on the large tables.
I just upgraded from PostgreSQL 9.6 to 13 and tried to consolidate everything into a single hash-partitioned table with at least 3600 partitions, importing all the per-device tables into it to speed up processing.
As soon as I did this I was able to insert some rows, but when I try to query or count rows I get "out of shared memory" and max_locks_per_transaction errors.
I tried to fine-tune but didn't succeed. I dropped the partition count to 1000, but certain operations still hit the error; just for testing I dropped down to 100 and it works, but queries are slower than against the same amount of data in a standalone table.
I tried range-partitioning each individual table by year, which improved things, but it will be very messy to maintain thousands of tables with yearly ranges (note: the server has 24 virtual processors and 32 GB RAM).
The question is: is it possible to have a hash-partitioned table with more than 1000 partitions? If so, what am I doing wrong?
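The lock errors are expected: a query that cannot prune partitions takes a lock on every partition and every one of its indexes, and the shared lock table holds roughly max_locks_per_transaction × (max_connections + max_prepared_transactions) entries, so 3600 partitions overflow it at the defaults. A sketch with placeholder table and column names:

```sql
-- Hash-partitioned parent (names are placeholders)
CREATE TABLE device_data (
    device_id bigint      NOT NULL,
    ts        timestamptz NOT NULL,
    payload   jsonb
) PARTITION BY HASH (device_id);

-- One child per modulus/remainder pair
CREATE TABLE device_data_p0 PARTITION OF device_data
    FOR VALUES WITH (MODULUS 64, REMAINDER 0);
-- ...repeat for remainders 1..63

-- Size the lock table for queries that touch many partitions
-- (requires a server restart)
ALTER SYSTEM SET max_locks_per_transaction = 256;
```

Hash partitioning only pays off when queries filter on the partition key (here device_id) so the planner can prune to a few partitions; a count or scan without that filter will always touch, and lock, everything.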

Slow bulk read from Postgres Read replica while updating the rows we read

We have on RDS a main Postgres server and a read replica.
We constantly write and update new data for the last couple of days.
Reading from the read replica works fine for older data, but reading data from the last couple of days, where we keep updating rows on the main server, is painfully slow.
Queries that take 2-3 minutes on old data can timeout after 20 minutes when querying data from the last day or two.
Looking at the monitors like CPU I don't see any extra load on the read replica.
Is there a solution for this?
You are accessing over 65 buffers for every visible row found in the index scan (and over 500 buffers for each row actually returned, since 90% are filtered out by the mmsi criterion).
One issue is that your index is not as selective as it could be. An index on (day, mmsi) rather than just (day) should be about 10 times faster.
But it also looks like you have a massive amount of bloat.
You are probably not vacuuming the table often enough. With your described UPDATE pattern, all the vacuum needs are accumulating in the newest data, but the activity counters are evaluated based on the full table size, so autovacuum is not done often enough to suit the needs of the new data. You could lower the scale factor for this table:
ALTER TABLE simplified_blips SET (autovacuum_vacuum_scale_factor = 0.01);
Or if you partition the data based on "day", then the partitions for newer days will naturally get vacuumed more often because the occurrence of updates will be judged against the size of each partition, it won't get diluted out by the size of all the older inactive partitions. Also, each vacuum run will take less work, as it won't have to scan all of the indexes of the entire table, just the indexes of the active partitions.
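A sketch of that day-based layout, using placeholder names consistent with the table above; each hot partition then accumulates dead tuples against its own size, so autovacuum visits it promptly:

```sql
-- Range-partition by day so recent, heavily-updated partitions are
-- vacuumed on their own schedule (names are placeholders)
CREATE TABLE simplified_blips_part (
    day  date    NOT NULL,
    mmsi integer NOT NULL,
    data jsonb
) PARTITION BY RANGE (day);

CREATE TABLE simplified_blips_2024_01_01
    PARTITION OF simplified_blips_part
    FOR VALUES FROM ('2024-01-01') TO ('2024-01-02');

-- The more selective index suggested above, created per the parent
CREATE INDEX ON simplified_blips_part (day, mmsi);
```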
As suggested, the problem was bloat.
When you update a row, PostgreSQL's MVCC creates a new version of the row alongside the old one.
After the update you are left with a "dead row" (AKA dead tuple).
Every so often autovacuum runs and cleans the dead tuples out of the table.
Usually the default autovacuum settings are fine, but if your table is really large and updated often, you should consider making the autovacuum thresholds more aggressive for that table.
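To check whether bloat is accumulating faster than autovacuum clears it, the standard statistics view shows dead-tuple counts and the last vacuum times, no assumptions beyond a stock install:

```sql
-- Tables with the most dead tuples, and when they were last vacuumed
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A table whose n_dead_tup stays high between autovacuum runs is a good candidate for a lower per-table autovacuum_vacuum_scale_factor.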

PostgreSQL - optimal hardware for retrieving 50 million records

I run PostgreSQL 9.6 on AWS RDS with an r4.xlarge instance. Initially the table had few records and queries were lightning fast; later the table grew to 8 GB with 50 million records and queries became extremely slow.
What is the optimal configuration for working with such data retrieval?
Server specs: 4 cores, 30 GB RAM
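Before reaching for bigger hardware, note that an 8 GB table fits comfortably in 30 GB of RAM, so a query that got slow as the table grew usually points at a missing index or a plan change rather than hardware limits. A hedged first step (table and predicate are placeholders):

```sql
-- Show the actual plan, row counts, timing, and buffer usage;
-- a Seq Scan over 50 million rows here means an index is missing
-- or not being used.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM my_table WHERE some_column = 42;
```

If the plan already uses an index, the BUFFERS numbers will show whether the time is going to disk reads or to visiting many heap pages per row.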